Text Generation
PEFT
Safetensors
GGUF
English
Thai
lora
qwen3.5-moe
qwen3.6
reasoning
kimi-k2.6
claude-opus
distillation
weight-diff
svd
Instructions to use hotdogs/qwen3.6-35b-opus-to-kimi-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use hotdogs/qwen3.6-35b-opus-to-kimi-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled") model = PeftModel.from_pretrained(base_model, "hotdogs/qwen3.6-35b-opus-to-kimi-lora") - Notebooks
- Google Colab
- Kaggle
Upload README.md with huggingface_hub
Browse files
README.md
CHANGED
|
@@ -191,6 +191,40 @@ sudo docker run --rm -p 8080:8080 \
|
|
| 191 |
| `-fa on` | Flash Attention enabled |
|
| 192 |
| `--mlock` | Lock model in RAM (prevents swap) |
|
| 193 |
| `--jinja` | Use Jinja2 chat templates |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 194 |
|
| 195 |
**Single GPU alternative:**
|
| 196 |
```bash
|
|
|
|
| 191 |
| `-fa on` | Flash Attention enabled |
|
| 192 |
| `--mlock` | Lock model in RAM (prevents swap) |
|
| 193 |
| `--jinja` | Use Jinja2 chat templates |
|
| 194 |
+
| `--lora` | Apply LoRA adapter (applied first, before scaled) |
|
| 195 |
+
| `--lora-scaled` | Apply LoRA with scale (comma-separated for multiple) |
|
| 196 |
+
|
| 197 |
+
---
|
| 198 |
+
|
| 199 |
+
### 🛡️ 3-Layer Stack with Refusal Removal LoRA
|
| 200 |
+
|
| 201 |
+
For the **purest uncensored stack** using weight-diff extracted LoRAs:
|
| 202 |
+
|
| 203 |
+
| Layer | Component | Purpose |
|
| 204 |
+
|-------|-----------|---------|
|
| 205 |
+
| 1 | Opus GGUF (base model) | Qwen3.6-35B + Opus reasoning |
|
| 206 |
+
| 2 | [refusal-removal-lora](https://huggingface.co/hotdogs/qwen3.6-35b-refusal-removal-lora) | 🛡️ Remove refusals (uncensored) |
|
| 207 |
+
| 3 | opus-to-kimi-lora (scale 0.5) | 🎨 Kimi K2.6 verbose style |
|
| 208 |
+
|
| 209 |
+
```bash
|
| 210 |
+
docker run --gpus all -p 8080:8080 \
|
| 211 |
+
-v /path/to/models:/models \
|
| 212 |
+
ghcr.io/ggml-org/llama.cpp:server-cuda \
|
| 213 |
+
-m /models/lordx64_Qwen3.6-35B-A3B-Claude-4.7-Opus-Q6_K.gguf \
|
| 214 |
+
--lora /models/qwen3.6-35b-refusal-removal-lora.gguf \
|
| 215 |
+
--lora-scaled /models/qwen3.6-35b-opus-to-kimi-lora.gguf:0.5 \
|
| 216 |
+
--host 0.0.0.0 --port 8080 \
|
| 217 |
+
--n-gpu-layers 999 \
|
| 218 |
+
--ctx-size 131072 \
|
| 219 |
+
--batch-size 4096 \
|
| 220 |
+
-fa on
|
| 221 |
+
```
|
| 222 |
+
|
| 223 |
+
> 🔬 **Technical note**: The refusal-removal LoRA was extracted via Weight-Diff SVD from `huihui-ai/Huihui-Qwen3.6-35B-A3B-Claude-4.7-Opus-abliterated` minus `lordx64/...Opus`. It modifies **only o_proj** in 10 layers (3,7,11,15,19,23,27,31,35,39) — an extremely sparse signal compared to full distillation (Kimi LoRA touches all 44 attention tensors).
|
| 224 |
+
|
| 225 |
+
---
|
| 226 |
+
|
| 227 |
+
**Old stack (uncensored GGUF base):**
|
| 228 |
|
| 229 |
**Single GPU alternative:**
|
| 230 |
```bash
|