Commit e898ae1 by hotdogs · verified · 1 Parent(s): 0e96fd5

Upload README.md with huggingface_hub

  1. README.md +34 -0
README.md CHANGED
@@ -191,6 +191,40 @@ sudo docker run --rm -p 8080:8080 \
 | `-fa on` | Flash Attention enabled |
 | `--mlock` | Lock model in RAM (prevents swap) |
 | `--jinja` | Use Jinja2 chat templates |
+| `--lora` | Apply LoRA adapter (applied first, before scaled) |
+| `--lora-scaled` | Apply LoRA adapter with a custom scale, written as `FILE:SCALE` (comma-separated for multiple) |
+
+---
+
+### 🛡️ 3-Layer Stack with Refusal Removal LoRA
+
+For the **purest uncensored stack** using weight-diff extracted LoRAs:
+
+| Layer | Component | Purpose |
+|-------|-----------|---------|
+| 1 | Opus GGUF (base model) | Qwen3.6-35B + Opus reasoning |
+| 2 | [refusal-removal-lora](https://huggingface.co/hotdogs/qwen3.6-35b-refusal-removal-lora) | 🛡️ Remove refusals (uncensored) |
+| 3 | opus-to-kimi-lora (scale 0.5) | 🎨 Kimi K2.6 verbose style |
+
+```bash
+docker run --gpus all -p 8080:8080 \
+  -v /path/to/models:/models \
+  ghcr.io/ggml-org/llama.cpp:server-cuda \
+  -m /models/lordx64_Qwen3.6-35B-A3B-Claude-4.7-Opus-Q6_K.gguf \
+  --lora /models/qwen3.6-35b-refusal-removal-lora.gguf \
+  --lora-scaled /models/qwen3.6-35b-opus-to-kimi-lora.gguf:0.5 \
+  --host 0.0.0.0 --port 8080 \
+  --n-gpu-layers 999 \
+  --ctx-size 131072 \
+  --batch-size 4096 \
+  -fa on
+```
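To make the effect of the `0.5` scale concrete, here is a minimal sketch of how stacked adapters combine, assuming the standard additive LoRA formulation (each adapter contributes `scale * B @ A` on top of the base weight). The function `apply_lora_stack`, the toy shapes, and the random matrices are hypothetical illustrations, not taken from the README or from llama.cpp's source, and the per-adapter `alpha / rank` factor is left out for brevity.

```python
import numpy as np

def apply_lora_stack(base_weight, adapters):
    """Return base_weight plus the scaled low-rank update of every adapter.

    adapters: list of (A, B, scale) with A of shape (rank, d_in)
    and B of shape (d_out, rank).
    """
    merged = base_weight.copy()
    for lora_a, lora_b, scale in adapters:
        merged += scale * (lora_b @ lora_a)
    return merged

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2
W = rng.standard_normal((d_out, d_in))

# Toy stand-ins for the two adapters in the stack (random, illustration only).
refusal = (rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank)), 1.0)
style = (rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank)), 0.5)

stacked = apply_lora_stack(W, [refusal, style])
reordered = apply_lora_stack(W, [style, refusal])
```

Under this additive assumption the style adapter contributes half of its full update, and `stacked` equals `reordered` up to floating-point error, so the listing order does not change the resulting weights.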
+
+> 🔬 **Technical note**: The refusal-removal LoRA was extracted by taking the SVD of a weight difference: `huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated` minus `lordx64/...Opus`. It modifies **only `o_proj`** in 10 layers (3, 7, 11, 15, 19, 23, 27, 31, 35, 39), an extremely sparse signal compared to full distillation (the Kimi LoRA touches all 44 attention tensors).
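The weight-diff SVD idea in the note can be sketched for a single matrix as follows. This is an illustration under stated assumptions, not the author's extraction script: `extract_lora`, the rank, and the toy 16×16 matrices are hypothetical, and the README does not say which tool or rank was used or how the 10 layers were selected.

```python
import numpy as np

def extract_lora(w_tuned, w_base, rank):
    """Approximate (w_tuned - w_base) by a rank-`rank` product B @ A via truncated SVD."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s          # shape (d_out, rank): columns scaled by sqrt(s)
    lora_a = sqrt_s[:, None] * vt[:rank]   # shape (rank, d_in): rows scaled by sqrt(s)
    return lora_a, lora_b

rng = np.random.default_rng(0)
w_base = rng.standard_normal((16, 16))
# Toy "tuned" weight that differs from the base by an exactly rank-2 update.
true_delta = rng.standard_normal((16, 2)) @ rng.standard_normal((2, 16))
w_tuned = w_base + true_delta

lora_a, lora_b = extract_lora(w_tuned, w_base, rank=2)
recon_error = np.linalg.norm(lora_b @ lora_a - true_delta)
```

Because the toy difference has rank 2, a rank-2 extraction recovers it almost exactly. For a real model the same step would run per tensor, and a tensor whose difference is zero would produce no adapter entry, which is consistent with the note's observation that only `o_proj` in 10 layers is affected.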
+
+---
+
+**Old stack (uncensored GGUF base):**
 
 **Single GPU alternative:**
 ```bash