# Weight-Diff SVD Extraction: Universal Method

## How to Create LoRA Adapters from Weight Differences Between Two Models

This technique works for **any LLM architecture**, provided both models were fine-tuned from the same base model. No GPU required, no training data needed, runs in 1-3 minutes on CPU.

```
Model A (merged LoRA)          Model B (merged LoRA)
        │                                │
        └──────────┬─────────────────────┘
                   │  W_B - W_A = Δ
                   ▼
        Truncated SVD (rank r)
                   │
                   ▼
        LoRA Adapter A→B (7 MB)
```

---

## 1. Requirements

✅ Works when:
- Both models share the **same base architecture and base weights** (same commit hash)
- Both models were trained with **LoRA + merge** (not full fine-tune)
- Tensor names match across both models
- At least 4 GB RAM is available, enough to load 2 tensors at a time

❌ Does NOT work when:
- The models have different architectures (different base models)
- Either model is a full fine-tune (the delta may not be low-rank)
- config.json / tokenizer was modified during fine-tuning
- Less than 4 GB RAM is available

---

## 2. Step-by-Step Guide

### Step 1: Choose Two Models

```python
MODEL_A = "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled"  # Source
MODEL_B = "lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled"        # Target
```

Rule: Both models must have identical tensor names and identical config.json.

### Step 2: Choose Target Modules

Select only the linear layers you want to extract:

```python
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # attention only
# or
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"]     # attention + MLP
```

⚠️ **Important:** Skip 3D tensors (e.g. MoE expert layers `[256, 2048, 512]`). They require per-slice SVD, which is more complex.

### Step 3: Choose LoRA Rank

Pick one value:

```python
RANK = 16    # standard: best balance of size vs quality
# RANK = 8   # minimal: smaller, faster, higher reconstruction error
# RANK = 32  # high quality: 2× larger, ~4% less error
```

Tip: Run reconstruction error analysis to find the optimal rank for your use case.
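The reconstruction error analysis mentioned in the tip can be sketched as follows. This is a minimal illustration, not the method used by `extract_lora_diff.py`: `rank_report` is a hypothetical helper, and the `delta` matrix is a synthetic stand-in (the difference of two random rank-16 updates) for a real `W_B - W_A` tensor. It reports, for each candidate rank, the fraction of squared Frobenius norm ("energy") kept and the relative error of the truncated SVD.

```python
import numpy as np

def rank_report(delta, ranks=(8, 16, 32)):
    """Return {rank: (energy_retained, relative_error)} for a truncated SVD of delta."""
    s = np.linalg.svd(delta, compute_uv=False)  # singular values, descending
    total = float(np.sum(s ** 2))               # squared Frobenius norm of delta
    report = {}
    for r in ranks:
        energy = float(np.sum(s[:r] ** 2)) / total            # share kept by rank r
        rel_err = float(np.sqrt(np.sum(s[r:] ** 2) / total))  # ||D - D_r||_F / ||D||_F
        report[r] = (energy, rel_err)
    return report

# Synthetic stand-in for W_B - W_A (assumption: both updates are rank 16).
rng = np.random.default_rng(0)
d_out, d_in, r = 256, 128, 16
delta_a = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
delta_b = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
delta = delta_b - delta_a

for rank, (energy, err) in rank_report(delta).items():
    print(f"rank={rank:>2}  energy={energy:.4f}  rel_error={err:.4f}")
```

On real weights, running this per target tensor and averaging the energy gives a rough basis for choosing between ranks 8, 16, and 32. Only the singular values are computed here, which is cheaper than a full SVD when no factors are needed yet.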
### Step 4: Run Extraction Script

```bash
python3 extract_lora_diff.py \
  --model_a lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled \
  --model_b lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled \
  --output ./my-lora-adapter \
  --rank 16 \
  --target_modules q_proj,k_proj,v_proj,o_proj
```

### Step 5: Use the Adapter

The adapter encodes `W_B - W_A`, so load it on top of **Model A** (not the shared base) to approximate Model B.

**Python (PEFT):**

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load Model A: the adapter stores W_B - W_A, so it must sit on top of A's weights.
model_a = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled"
)
model = PeftModel.from_pretrained(model_a, "./my-lora-adapter")
# model now approximates style B
```

**llama.cpp (GGUF):**

```bash
# Convert to GGUF first
python3 llama.cpp/convert_lora_to_gguf.py ./my-lora-adapter --outfile my-lora-adapter.gguf

# Run inference on a GGUF build of Model A
llama-cli -m model-a-Q6_K.gguf --lora my-lora-adapter.gguf -p "prompt"
```

---

## 3. Mathematical Foundation

```
Given:    M_A = W_base + Δ_A           (Model A = base + LoRA A)
          M_B = W_base + Δ_B           (Model B = base + LoRA B)

Diff:     D = M_B - M_A = Δ_B - Δ_A    (base cancels, only delta remains)

SVD:      D ≈ U_r · Σ_r · V_r^T        (rank-r approximation)

LoRA:     A = √Σ_r · V_r^T             (lora_A)
          B = U_r · √Σ_r               (lora_B)

Forward:  h = W_0·x + (α/r)·B·A·x      (standard LoRA forward, with W_0 = M_A)
```

PEFT multiplies `B·A` by `α/r` at inference, so `B·A` reproduces `D` only when `α = r` or when the factors are divided by that scale before saving.

**Why it works:**
- Both models were trained with LoRA at rank r, so their difference `D` has rank ≤ 2r
- Truncating the SVD at rank r keeps most of `D` (91-95% energy retention); rank 2r captures it in full, up to rounding in the merged weights
- No training needed: this is pure mathematical decomposition

---

## 4. Examples for Other Models

### Llama 3.1 8B: Style Transfer

```bash
# Two models fine-tuned from the same Llama-3.1-8B base
MODEL_A="user/llama3.1-8b-formal-style"   # formal style
MODEL_B="user/llama3.1-8b-casual-style"   # casual style

python3 extract_lora_diff.py \
  --model_a "$MODEL_A" \
  --model_b "$MODEL_B" \
  --output ./llama-formal-to-casual \
  --rank 16 \
  --target_modules q_proj,k_proj,v_proj,o_proj
```

### Mistral 7B: Domain Adaptation

```bash
MODEL_A="mistralai/Mistral-7B-Instruct-v0.3"   # general
MODEL_B="user/Mistral-7B-medical-finetuned"    # medical domain

python3 extract_lora_diff.py \
  --model_a "$MODEL_A" \
  --model_b "$MODEL_B" \
  --output ./mistral-medical-lora \
  --rank 16 \
  --target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
```

### Qwen2.5 72B: Safety Unlearning

```bash
# Extract refusal delta between safe and uncensored versions
MODEL_A="Qwen/Qwen2.5-72B-Instruct"     # with safety
MODEL_B="user/Qwen2.5-72B-uncensored"   # without safety

python3 extract_lora_diff.py \
  --model_a "$MODEL_A" \
  --model_b "$MODEL_B" \
  --output ./qwen-safety-removal-lora \
  --rank 16
```

---

## 5. Parameter Reference

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--rank` | 16 | LoRA rank. Higher = larger + better quality. Lower = smaller + faster |
| `--target_modules` | `q_proj,k_proj,v_proj,o_proj` | Modules to extract. Add `gate_proj,up_proj,down_proj` for MLP |
| `--alpha` | 32 | LoRA alpha (scaling factor). Typically 2× rank |
| `--skip_3d` | True | Automatically skip 3D tensors (MoE experts) |
| `--output_format` | peft | `peft`, `gguf`, or `both` |

---

## 6. Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `KeyError: tensor name mismatch` | Different base models | Use models trained from the same base |
| `CUDA out of memory` | Loading full model | Use tensor-by-tensor mode (default) |
| `ValueError: non contiguous tensor` | SVD output not contiguous | Add `.contiguous()` before saving |
| `GGUF conversion failed` | Tensor name mismatch | PEFT uses `.lora_A.default`, GGUF expects `.lora_A.weight`; rename the tensors |
| `Rank too high for tensor` | Tensor dimensions < rank | Reduce rank or skip that tensor |

---

## 7. Limitations

1. **Attention-only bias**: Using only attention layers may miss FFN/MLP-level changes
2. **Low-rank assumption**: Works best with LoRA-merged models; full fine-tunes may produce deltas that exceed the chosen rank
3. **No quality guarantee**: The adapter is a mathematical reconstruction, with no guarantee that it matches direct training quality
4. **Single-style transfer**: Extracts only the difference between 2 styles; for 3+ styles, create multiple adapters

---

## 8. Extraction Script

`extract_lora_diff.py` (193 lines): production-ready extraction script available in this repo.

---

## 9. References & Credit

- **Technique:** UKA (Hermes Agent, Nous Research) & hotdogs
- **Paper:** [Weight-Diff SVD Extraction: Zero-Shot LoRA Adapter Synthesis](https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora/blob/main/paper.pdf)
- **Code + Adapter:** https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora
- **LoRA paper:** Hu et al., 2021 (arXiv:2106.09685)
- **QLoRA paper:** Dettmers et al., 2023 (arXiv:2305.14314)