| # Weight-Diff SVD Extraction: Universal Method | |
| ## How to Create LoRA Adapters from Weight Differences Between Two Models | |
| This technique works for **any LLM architecture**, given two fine-tuned models (merged LoRA adapters) derived from the same base model. | |
| No GPU required, no training data needed, runs in 1-3 minutes on CPU. | |
| ``` | |
| Model A (merged LoRA) Model B (merged LoRA) | |
| │ │ | |
| └──────────┬─────────────────────┘ | |
| │ W_B - W_A = Δ | |
| ▼ | |
| Truncated SVD (rank r) | |
| │ | |
| ▼ | |
| LoRA Adapter A→B (7 MB) | |
| ``` | |
| --- | |
| ## 1. Requirements | |
| ✅ Works when: | |
| - Both models share the **same base architecture and base weights** (same commit hash) | |
| - Both models were trained with **LoRA + merge** (not full fine-tune) | |
| - Tensor names match across both models | |
| - At least 4 GB RAM to load 2 tensors at a time | |
| ❌ Does NOT work when: | |
| - Different architectures (different base models) | |
| - Full fine-tune (delta may exceed low-rank assumption) | |
| - config.json / tokenizer was modified during fine-tuning | |
| - Less than 4 GB RAM | |
| --- | |
| ## 2. Step-by-Step Guide | |
| ### Step 1: Choose Two Models | |
| ```python | |
| MODEL_A = "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled" # Source | |
| MODEL_B = "lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled" # Target | |
| ``` | |
| Rule: Both models must have identical tensor names and identical config.json. | |
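The rule above can be checked before running any SVD. The following is a minimal sketch, assuming you have already collected a `{tensor_name: shape}` mapping for each model (for example from the checkpoint headers). The helper name `compare_tensor_specs` and the toy tensor names are illustrative and not part of the extraction script.

```python
def compare_tensor_specs(specs_a, specs_b):
    """Compare two {tensor_name: shape} mappings.

    Returns (missing_in_b, missing_in_a, shape_mismatches), each a sorted list.
    """
    names_a, names_b = set(specs_a), set(specs_b)
    missing_in_b = sorted(names_a - names_b)
    missing_in_a = sorted(names_b - names_a)
    shape_mismatches = sorted(
        name
        for name in names_a & names_b
        if tuple(specs_a[name]) != tuple(specs_b[name])
    )
    return missing_in_b, missing_in_a, shape_mismatches


# Toy specs standing in for the two real checkpoints (hypothetical shapes).
specs_a = {
    "model.layers.0.self_attn.q_proj.weight": (4096, 2048),
    "model.layers.0.self_attn.k_proj.weight": (512, 2048),
}
specs_b = {
    "model.layers.0.self_attn.q_proj.weight": (4096, 2048),
    "model.layers.0.self_attn.k_proj.weight": (512, 2048),
}

# The pair is usable only if nothing is missing and every shape agrees.
compatible = compare_tensor_specs(specs_a, specs_b) == ([], [], [])
print("compatible:", compatible)
```

If any of the three lists is non-empty, the models probably do not share a base and the difference is unlikely to be low-rank.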
| ### Step 2: Choose Target Modules | |
| Select only the linear layers you want to extract: | |
| ```python | |
| TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"] # attention only | |
| # Alternative: attention + MLP | |
| # TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", | |
| # "gate_proj", "up_proj", "down_proj"] | |
| ``` | |
| ⚠️ **Important:** Skip 3D tensors (e.g. MoE expert layers `[256, 2048, 512]`) — they require per-slice SVD which is more complex. | |
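One way to apply both the module selection and the 3D-tensor rule is to filter on the tensor name and the number of dimensions. This is a sketch only: `select_tensors` is a hypothetical helper, and the tensor names (including the expert tensor name) are made up for illustration.

```python
def select_tensors(specs, target_modules):
    """Return names of 2D weight tensors whose module is in target_modules."""
    selected = []
    for name, shape in specs.items():
        parts = name.split(".")
        # Expect names of the form "...<module>.weight".
        if len(parts) < 2 or parts[-1] != "weight":
            continue
        if parts[-2] not in target_modules:
            continue
        if len(shape) != 2:
            # 3D tensors (e.g. stacked MoE experts) would need per-slice SVD.
            continue
        selected.append(name)
    return selected


# Hypothetical tensor names and shapes.
specs = {
    "model.layers.0.self_attn.q_proj.weight": (4096, 2048),
    "model.layers.0.self_attn.o_proj.weight": (2048, 4096),
    "model.layers.0.mlp.experts.down_proj.weight": (256, 2048, 512),
    "model.layers.0.input_layernorm.weight": (2048,),
}
targets = ["q_proj", "k_proj", "v_proj", "o_proj",
           "gate_proj", "up_proj", "down_proj"]

selected = select_tensors(specs, targets)
print(selected)
```

Here the expert tensor is dropped even though `down_proj` is a target, because it is 3D.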
| ### Step 3: Choose LoRA Rank | |
| ```python | |
| RANK = 16 # standard: best balance of size vs quality | |
| # RANK = 8 # minimal: smaller, faster, higher reconstruction error | |
| # RANK = 32 # high quality: 2× larger, ~4% less error | |
| ``` | |
| Tip: Run reconstruction error analysis to find the optimal rank for your use case. | |
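A reconstruction error analysis can be run on the singular values of a single difference matrix. The sketch below assumes the singular values are already available in descending order. The values, the shape, and the candidate ranks are toy numbers chosen for illustration, not measurements from the models above.

```python
import math


def relative_error(singular_values, rank):
    """Relative Frobenius error of the best rank-`rank` approximation.

    The squared error of a truncated SVD is the sum of the squared
    singular values that were dropped.
    """
    total = sum(s * s for s in singular_values)
    tail = sum(s * s for s in singular_values[rank:])
    return math.sqrt(tail / total)


def lora_params(d_out, d_in, rank):
    """Parameter count of one LoRA pair: lora_B (d_out x r) + lora_A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical singular values of one difference matrix, largest first.
singular_values = [10.0, 6.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.1]
d_out, d_in = 4096, 2048

for rank in (2, 4, 8):
    err = relative_error(singular_values, rank)
    size = lora_params(d_out, d_in, rank)
    print(f"rank={rank} relative_error={err:.3f} params={size}")
```

The parameter count grows linearly with rank, while the error drops only as fast as the singular values decay, so the useful rank is where the error curve flattens.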
| ### Step 4: Run Extraction Script | |
| ```bash | |
| python3 extract_lora_diff.py \ | |
| --model_a lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled \ | |
| --model_b lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled \ | |
| --output ./my-lora-adapter \ | |
| --rank 16 \ | |
| --target_modules q_proj,k_proj,v_proj,o_proj | |
| ``` | |
| ### Step 5: Use the Adapter | |
| **Python (PEFT):** | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM | |
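| # Load Model A: the adapter encodes M_B - M_A, so it must be applied on top of Model A | |
| base = AutoModelForCausalLM.from_pretrained("lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled") | |
| model = PeftModel.from_pretrained(base, "./my-lora-adapter") | |
| # model now approximates Model B (style B) | |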
| ``` | |
| **llama.cpp (GGUF):** | |
| ```bash | |
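| # Convert the PEFT adapter to GGUF first | |
| python3 llama.cpp/convert_lora_to_gguf.py ./my-lora-adapter --outfile my-lora-adapter.gguf | |
| # Run inference (base-Q6_K.gguf should be a GGUF of Model A, the model the adapter was extracted against) | |
| llama-cli -m base-Q6_K.gguf --lora my-lora-adapter.gguf -p "prompt" | |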
| ``` | |
| --- | |
| ## 3. Mathematical Foundation | |
| ``` | |
| Given: M_A = W_base + Δ_A (Model A = base + LoRA A) | |
| M_B = W_base + Δ_B (Model B = base + LoRA B) | |
| Diff: D = M_B - M_A = Δ_B - Δ_A (base cancels, only delta remains) | |
| SVD: D ≈ U_r · Σ_r · V_r^T (rank-r approximation) | |
| LoRA: A = √Σ_r · V_r^T (lora_A) | |
| B = U_r · √Σ_r (lora_B) | |
| Forward: h = W_0·x + B·A·x (standard LoRA forward) | |
| ``` | |
| **Why it works:** | |
| - Models A and B were each trained with LoRA at rank r, so their difference Δ_B - Δ_A has rank ≤ 2r | |
| - A rank-r truncated SVD therefore captures most of the delta (91-95% energy retention); in principle a rank-2r truncation would capture all of it | |
| - No training needed — this is pure mathematical decomposition | |
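The decomposition can be reproduced on small random matrices. This is a sketch with NumPy under simplifying assumptions: the matrices are synthetic, the dimensions are arbitrary, and the function names are illustrative rather than taken from `extract_lora_diff.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4

# Simulate two models that share a base and each carry a merged rank-r LoRA.
w_base = rng.standard_normal((d_out, d_in))
delta_a = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
delta_b = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
m_a, m_b = w_base + delta_a, w_base + delta_b

# The base cancels in the difference, leaving a matrix of rank <= 2r.
diff = m_b - m_a
u, s, vt = np.linalg.svd(diff, full_matrices=False)
numerical_rank = int(np.sum(s > 1e-8 * s[0]))


def factorize(rank):
    """Split the rank-`rank` truncation into (lora_B, lora_A)."""
    sqrt_s = np.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s            # U_r · sqrt(Σ_r), shape (d_out, rank)
    lora_a = sqrt_s[:, None] * vt[:rank, :]  # sqrt(Σ_r) · V_r^T, shape (rank, d_in)
    return lora_b, lora_a


def energy_retained(rank):
    """Fraction of squared singular values kept by a rank-`rank` truncation."""
    return float(np.sum(s[:rank] ** 2) / np.sum(s ** 2))


# At rank 2r, applying the adapter on top of Model A reproduces Model B.
lora_b, lora_a = factorize(2 * r)
exact = bool(np.allclose(m_a + lora_b @ lora_a, m_b))

print("numerical rank:", numerical_rank)
print("exact at rank 2r:", exact)
print("energy retained at rank r:", round(energy_retained(r), 3))
```

The energy retained at rank r in this toy setup will not match the 91-95% reported above, since random factors have a flatter spectrum than trained ones. In PEFT the product `lora_B @ lora_A` is additionally scaled by `lora_alpha / r`, so this sketch corresponds to a scale of 1.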
| --- | |
| ## 4. Examples for Other Models | |
| ### Llama 3.1 8B — Style Transfer | |
| ```bash | |
| # Two models fine-tuned from the same Llama-3.1-8B base | |
| MODEL_A="user/llama3.1-8b-formal-style" # formal style | |
| MODEL_B="user/llama3.1-8b-casual-style" # casual style | |
| python3 extract_lora_diff.py \ | |
| --model_a "$MODEL_A" \ | |
| --model_b "$MODEL_B" \ | |
| --output ./llama-formal-to-casual \ | |
| --rank 16 \ | |
| --target_modules q_proj,k_proj,v_proj,o_proj | |
| ``` | |
| ### Mistral 7B — Domain Adaptation | |
| ```bash | |
| MODEL_A="mistralai/Mistral-7B-Instruct-v0.3" # general | |
| MODEL_B="user/Mistral-7B-medical-finetuned" # medical domain | |
| python3 extract_lora_diff.py \ | |
| --model_a "$MODEL_A" \ | |
| --model_b "$MODEL_B" \ | |
| --output ./mistral-medical-lora \ | |
| --rank 16 \ | |
| --target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj | |
| ``` | |
| ### Qwen2.5 72B — Safety Unlearning | |
| ```bash | |
| # Extract refusal delta between safe and uncensored versions | |
| MODEL_A="Qwen/Qwen2.5-72B-Instruct" # with safety | |
| MODEL_B="user/Qwen2.5-72B-uncensored" # without safety | |
| python3 extract_lora_diff.py \ | |
| --model_a "$MODEL_A" \ | |
| --model_b "$MODEL_B" \ | |
| --output ./qwen-safety-removal-lora \ | |
| --rank 16 | |
| ``` | |
| --- | |
| ## 5. Parameter Reference | |
| | Parameter | Default | Description | | |
| |-----------|---------|-------------| | |
| | `--rank` | 16 | LoRA rank. Higher = larger + better quality. Lower = smaller + faster | | |
| | `--target_modules` | q_proj,k_proj,v_proj,o_proj | Comma-separated modules to extract. Add gate_proj,up_proj,down_proj for MLP | | |
| | `--alpha` | 32 | LoRA alpha (scaling factor). Typically 2× rank | | |
| | `--skip_3d` | True | Automatically skip 3D tensors (MoE experts) | | |
| | `--output_format` | peft | `peft` or `gguf` or `both` | | |
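To show how these parameters fit together, here is a hypothetical `argparse` sketch that mirrors the table. It is not the parser from `extract_lora_diff.py`: the `--no-skip_3d` negation and the comma-splitting helper are assumptions made for the example.

```python
import argparse


def split_modules(value):
    """Turn "q_proj,k_proj" into ["q_proj", "k_proj"]."""
    return [m.strip() for m in value.split(",") if m.strip()]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Weight-diff SVD LoRA extraction (illustrative sketch)"
    )
    parser.add_argument("--model_a", required=True, help="source model")
    parser.add_argument("--model_b", required=True, help="target model")
    parser.add_argument("--output", required=True, help="adapter directory")
    parser.add_argument("--rank", type=int, default=16)
    parser.add_argument("--alpha", type=int, default=32)
    parser.add_argument(
        "--target_modules",
        type=split_modules,
        default=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    # BooleanOptionalAction (Python 3.9+) also provides --no-skip_3d.
    parser.add_argument(
        "--skip_3d", action=argparse.BooleanOptionalAction, default=True
    )
    parser.add_argument(
        "--output_format", choices=["peft", "gguf", "both"], default="peft"
    )
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["--model_a", "org/model-a", "--model_b", "org/model-b", "--output", "./out"]
)
print(args.rank, args.alpha, args.target_modules, args.skip_3d, args.output_format)
```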
| --- | |
| ## 6. Troubleshooting | |
| | Problem | Cause | Solution | | |
| |---------|-------|----------| | |
| | `KeyError: tensor name mismatch` | Different base models | Use models trained from same base | | |
| | `CUDA out of memory` | Loading full model | Use tensor-by-tensor mode (default) | | |
| | `ValueError: non contiguous tensor` | SVD output not contiguous | Add `.contiguous()` before saving | | |
| | `GGUF conversion failed` | Tensor name mismatch | PEFT uses `.lora_A.default`, GGUF expects `.lora_A.weight` — rename | | |
| | `Rank too high for tensor` | Tensor dimensions < rank | Reduce rank or skip that tensor | | |
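The rename suggested for the GGUF conversion failure can be sketched as a plain key rewrite. This assumes the adapter keys carry the PEFT adapter name (`default`) as a segment after `lora_A` / `lora_B`. The helper name and the example keys are illustrative, and the values are placeholders for tensors.

```python
def strip_adapter_name(state_dict, adapter_name="default"):
    """Drop the adapter-name segment from LoRA keys.

    "...q_proj.lora_A.default.weight" becomes "...q_proj.lora_A.weight".
    Keys without that segment are left unchanged.
    """
    renamed = {}
    for key, value in state_dict.items():
        for part in ("lora_A", "lora_B"):
            key = key.replace(f".{part}.{adapter_name}.", f".{part}.")
        renamed[key] = value
    return renamed


# Placeholder values stand in for the actual tensors.
state_dict = {
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight": "A0",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight": "B0",
}
fixed = strip_adapter_name(state_dict)
print(list(fixed))
```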
| --- | |
| ## 7. Limitations | |
| 1. **Attention-only bias**: Using only attention layers may miss FFN/MLP-level changes | |
| 2. **Low-rank assumption**: Works best with LoRA-merged models; full fine-tunes may exceed rank | |
| 3. **No quality guarantee**: The adapter is a mathematical reconstruction — no guarantee it matches direct training quality | |
| 4. **Single-style transfer**: Extracts only the difference between 2 styles — for 3+ styles, create multiple adapters | |
| --- | |
| ## 8. Extraction Script | |
| `extract_lora_diff.py` (193 lines) — production-ready extraction script available in this repo. | |
| --- | |
| ## 9. References & Credit | |
| - **Technique:** UKA (Hermes Agent, Nous Research) & hotdogs | |
| - **Paper:** [Weight-Diff SVD Extraction: Zero-Shot LoRA Adapter Synthesis](https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora/blob/main/paper.pdf) | |
| - **Code + Adapter:** https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora | |
| - **LoRA paper:** Hu et al., 2021 (arXiv:2106.09685) | |
| - **QLoRA paper:** Dettmers et al., 2023 (arXiv:2305.14314) | |