Weight-Diff SVD Extraction: Universal Method

How to Create LoRA Adapters from Weight Differences Between Two Models

This technique works for any LLM architecture, given two models that were each fine-tuned with LoRA from the same base model and then merged. No GPU required, no training data needed, runs in 1-3 minutes on CPU.

    Model A (merged LoRA)          Model B (merged LoRA)
         │                                │
         └──────────┬─────────────────────┘
                    │ M_B - M_A = D
                    ▼
              Truncated SVD (rank r)
                    │
                    ▼
         LoRA Adapter A→B (7 MB)

1. Requirements

✅ Works when:

Both models share the same base architecture and base weights (same commit hash)
Both models were trained with LoRA + merge (not full fine-tune)
Tensor names match across both models
At least 4 GB RAM to load 2 tensors at a time

❌ Does NOT work when:

Different architectures (different base models)
Full fine-tune (delta may exceed low-rank assumption)
config.json or the tokenizer was modified during fine-tuning (e.g., added tokens that resize the embedding matrix)
Less than 4 GB RAM

2. Step-by-Step Guide

Step 1: Choose Two Models

MODEL_A = "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled"   # Source
MODEL_B = "lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled"         # Target

Rule: Both models must have identical tensor names and identical config.json.
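The rule above can be checked before downloading any weights. Below is a minimal sketch. It assumes both repos ship a sharded safetensors checkpoint with `model.safetensors.index.json`, and it treats `_name_or_path` and `transformers_version` as keys that may differ harmlessly. The helper names and this allow-list are illustrative and are not part of extract_lora_diff.py.

```python
import json

# Assumption: keys that may legitimately differ between two uploads of the
# same architecture. Everything else in config.json must match exactly.
IGNORED_CONFIG_KEYS = {"_name_or_path", "transformers_version"}


def check_compatible(config_a: dict, config_b: dict, names_a, names_b) -> list[str]:
    """Return a list of problems; an empty list means the pair looks compatible."""
    problems = []
    for key in sorted((set(config_a) | set(config_b)) - IGNORED_CONFIG_KEYS):
        if config_a.get(key) != config_b.get(key):
            problems.append(f"config.{key}: {config_a.get(key)!r} != {config_b.get(key)!r}")
    set_a, set_b = set(names_a), set(names_b)
    for name in sorted(set_a - set_b):
        problems.append(f"tensor only in Model A: {name}")
    for name in sorted(set_b - set_a):
        problems.append(f"tensor only in Model B: {name}")
    return problems


def load_metadata(repo_id: str):
    """Fetch config.json and the tensor names of a sharded safetensors repo."""
    # Lazy import so check_compatible works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    with open(hf_hub_download(repo_id, "config.json")) as f:
        config = json.load(f)
    # Assumes a sharded checkpoint; single-file repos have no index file, and
    # their tensor names would have to be read with safetensors.safe_open instead.
    with open(hf_hub_download(repo_id, "model.safetensors.index.json")) as f:
        names = list(json.load(f)["weight_map"])
    return config, names
```

Call `load_metadata(MODEL_A)` and `load_metadata(MODEL_B)`, then pass both results to `check_compatible`. Proceed only if it returns an empty list.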

Step 2: Choose Target Modules

Select only the linear layers you want to extract:

TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # attention only
# or
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", 
                  "gate_proj", "up_proj", "down_proj"]      # attention + MLP

⚠️ Important: Skip 3D tensors (e.g., fused MoE expert weights of shape [256, 2048, 512]). A single 2D SVD cannot factor them, so they would need a separate SVD per expert slice, which is more complex.
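Selecting target modules and skipping 3D tensors can be done from tensor shapes alone. The shapes can be read from the safetensors headers without loading tensor data, e.g. with `safe_open(path, framework="pt")` and `get_slice(name).get_shape()`. The sketch below assumes Hugging Face-style names ending in `<module>.weight`. The function name is hypothetical.

```python
def select_target_tensors(shapes: dict, target_modules: list[str]):
    """Split tensor names into (2D weights to extract, 3D+ weights to skip).

    `shapes` maps tensor name -> shape tuple. Assumes names such as
    "model.layers.0.self_attn.q_proj.weight".
    """
    selected, skipped = [], []
    for name, shape in sorted(shapes.items()):
        parts = name.split(".")
        # Match the module name just before ".weight"; biases and norms are ignored.
        if len(parts) < 2 or parts[-1] != "weight" or parts[-2] not in target_modules:
            continue
        if len(shape) == 2:
            selected.append(name)
        else:
            skipped.append(name)  # e.g. fused MoE experts [256, 2048, 512]
    return selected, skipped
```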

Step 3: Choose LoRA Rank

RANK = 16      # standard: best balance of size vs quality
# RANK = 8     # minimal: smaller, faster, higher reconstruction error
# RANK = 32    # high quality: 2× larger, ~4% less error

Tip: Measure the reconstruction error at several ranks on a few representative tensors to find the best rank for your use case.
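One way to do this analysis is to compute the singular values of one weight difference D = M_B - M_A and read off the retained energy for each candidate rank. This is a minimal NumPy sketch. `rank_report` is a hypothetical helper, and the delta is assumed to be already loaded as a 2D float array.

```python
import numpy as np


def rank_report(delta: np.ndarray, ranks=(4, 8, 16, 32)) -> dict:
    """Map each rank r to (energy retained, relative Frobenius error) of the rank-r SVD."""
    # Only singular values are needed, so skip computing U and V.
    s = np.linalg.svd(delta.astype(np.float64), compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    report = {}
    for r in ranks:
        kept = float(energy[:r].sum() / total) if total > 0 else 1.0
        # Eckart-Young: ||D - D_r||_F^2 equals the sum of the discarded sigma^2,
        # so the relative error follows directly from the retained energy.
        report[r] = (kept, float(np.sqrt(max(0.0, 1.0 - kept))))
    return report
```

Run it on a few representative tensors, for example `q_proj` in an early, a middle, and a late layer. Then pick the smallest rank whose error you can accept.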

Step 4: Run Extraction Script

python3 extract_lora_diff.py \
    --model_a lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled \
    --model_b lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled \
    --output ./my-lora-adapter \
    --rank 16 \
    --target_modules q_proj,k_proj,v_proj,o_proj
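For each tensor, the extraction amounts to the SVD factorization described in section 3, plus some bookkeeping for key names and the LoRA scaling. The following is a minimal NumPy sketch under stated assumptions, not the repo's extract_lora_diff.py. `svd_to_lora` and `build_adapter` are hypothetical names, and weights are passed as in-memory dicts. Tensor names are assumed to follow Hugging Face causal-LM naming, which PEFT prefixes with `base_model.model.` in saved adapters.

```python
import numpy as np


def svd_to_lora(delta: np.ndarray, rank: int, alpha: float):
    """Factor a 2D delta (out x in) into lora_A (rank x in) and lora_B (out x rank)."""
    u, s, vt = np.linalg.svd(delta.astype(np.float32), full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_a = sqrt_s[:, None] * vt[:rank]  # sqrt(Sigma_r) V_r^T
    lora_b = u[:, :rank] * sqrt_s         # U_r sqrt(Sigma_r)
    # PEFT applies (alpha / rank) * B @ A; pre-divide B so that product equals D_r.
    lora_b = lora_b / (alpha / rank)
    # Defensive: guarantee C-contiguous arrays before they are saved.
    return np.ascontiguousarray(lora_a), np.ascontiguousarray(lora_b)


def build_adapter(weights_a: dict, weights_b: dict, names: list[str], rank: int,
                  alpha: float, target_modules: list[str], model_a_id: str):
    """Return (PEFT-style state dict, adapter_config dict) for the A->B delta."""
    state = {}
    for name in names:
        delta = weights_b[name].astype(np.float32) - weights_a[name].astype(np.float32)
        if delta.ndim != 2 or min(delta.shape) < rank:
            continue  # 3D tensor, or "rank too high for tensor": skip it
        lora_a, lora_b = svd_to_lora(delta, rank, alpha)
        # "model.layers.0.self_attn.q_proj.weight" ->
        # "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"
        module = "base_model.model." + name.removesuffix(".weight")
        state[module + ".lora_A.weight"] = lora_a
        state[module + ".lora_B.weight"] = lora_b
    config = {
        "peft_type": "LORA",
        "task_type": "CAUSAL_LM",
        "r": rank,
        "lora_alpha": alpha,
        "target_modules": target_modules,
        # The adapter encodes M_B - M_A, so Model A plays the role of the base.
        "base_model_name_or_path": model_a_id,
        "lora_dropout": 0.0,
        "bias": "none",
    }
    return state, config
```

To write the adapter out, save `state` with `safetensors.numpy.save_file(state, "my-lora-adapter/adapter_model.safetensors")` and dump `config` as JSON to `my-lora-adapter/adapter_config.json`. These are the two files PEFT's `from_pretrained` reads. Real checkpoints are usually bfloat16, which NumPy lacks. In practice, each tensor pair would therefore be loaded with PyTorch and converted with `.float().numpy()`, one pair at a time.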

Step 5: Use the Adapter

Python (PEFT):

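from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load Model A, not the original base: the adapter encodes M_B - M_A.
base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "./my-lora-adapter")
# model now approximates Model B (style B)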

llama.cpp (GGUF):

# Convert to GGUF first
python3 llama.cpp/convert_lora_to_gguf.py ./my-lora-adapter --outfile my-lora-adapter.gguf

# Run inference on a GGUF of Model A (the adapter encodes A→B, not base→B)
llama-cli -m model-a-Q6_K.gguf --lora my-lora-adapter.gguf -p "prompt"

3. Mathematical Foundation

Given:  M_A = W_base + Δ_A    (Model A = base + LoRA A)
        M_B = W_base + Δ_B    (Model B = base + LoRA B)

Diff:   D = M_B - M_A = Δ_B - Δ_A    (base cancels, only delta remains)

SVD:    D ≈ U_r · Σ_r · V_r^T        (rank-r approximation)

LoRA:   A = √Σ_r · V_r^T              (lora_A)
        B = U_r · √Σ_r                (lora_B)

Forward: h = W_0·x + (α/r)·B·A·x      (standard LoRA forward, with W_0 = M_A)
         ≈ M_B·x                      (when α = r, or B is pre-divided by α/r)

Why it works:

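Both Model A and Model B were trained with LoRA rank r, so each delta has rank ≤ r and their difference has rank ≤ 2r
Truncated SVD gives the best rank-r approximation of D (Eckart–Young). At rank 2r it recovers D up to rounding in the stored weights, and at rank r it retains most of the delta (91-95% energy retention)
No training needed: this is a pure mathematical decomposition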
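The rank argument can be checked numerically. This small NumPy sketch uses random toy matrices, with all sizes chosen arbitrarily. It builds two rank-r updates on a shared base and confirms that D has rank 2r. It then compares extraction at ranks 2r and r, and shows that the factors reproduce Model B only when they are added to Model A.

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, r = 64, 48, 4

# Toy setup: one shared base plus two independent rank-r LoRA updates.
w_base = rng.standard_normal((out_dim, in_dim))
delta_a = rng.standard_normal((out_dim, r)) @ rng.standard_normal((r, in_dim))
delta_b = rng.standard_normal((out_dim, r)) @ rng.standard_normal((r, in_dim))
m_a, m_b = w_base + delta_a, w_base + delta_b

d = m_b - m_a                      # the base cancels: d == delta_b - delta_a
rank_d = np.linalg.matrix_rank(d)  # at most 2r


def lora_product(delta: np.ndarray, k: int) -> np.ndarray:
    """Return B @ A for the rank-k factors A = sqrt(S_k) V_k^T, B = U_k sqrt(S_k)."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    lora_a = np.sqrt(s[:k])[:, None] * vt[:k]
    lora_b = u[:, :k] * np.sqrt(s[:k])
    return lora_b @ lora_a


def rel_err(approx: np.ndarray) -> float:
    return float(np.linalg.norm(d - approx) / np.linalg.norm(d))


err_2r = rel_err(lora_product(d, 2 * r))  # rank 2r: exact up to float error
err_r = rel_err(lora_product(d, r))       # rank r: best rank-r fit, not exact

# The adapter encodes A->B, so it reproduces Model B only on top of Model A.
gap_on_a = float(np.linalg.norm(m_a + lora_product(d, 2 * r) - m_b))
gap_on_base = float(np.linalg.norm(w_base + lora_product(d, 2 * r) - m_b))
print(f"rank(D)={rank_d}  err@2r={err_2r:.2e}  err@r={err_r:.3f}")
print(f"gap on Model A: {gap_on_a:.2e}  gap on base: {gap_on_base:.1f}")
```

In this toy the two updates are independent, so rank r discards a noticeable share of D. The 91-95% retention quoted above is an empirical figure for real checkpoints, and it depends on how quickly the singular values of D decay.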

4. Examples for Other Models

Llama 3.1 8B — Style Transfer

# Two models fine-tuned from the same Llama-3.1-8B base
MODEL_A = "user/llama3.1-8b-formal-style"      # formal style
MODEL_B = "user/llama3.1-8b-casual-style"       # casual style

python3 extract_lora_diff.py \
    --model_a user/llama3.1-8b-formal-style \
    --model_b user/llama3.1-8b-casual-style \
    --output ./llama-formal-to-casual \
    --rank 16 \
    --target_modules q_proj,k_proj,v_proj,o_proj

Mistral 7B — Domain Adaptation

MODEL_A = "mistralai/Mistral-7B-Instruct-v0.3"           # general (also the base here, so Δ_A = 0)
MODEL_B = "user/Mistral-7B-medical-finetuned"            # medical domain

python3 extract_lora_diff.py \
    --model_a mistralai/Mistral-7B-Instruct-v0.3 \
    --model_b user/Mistral-7B-medical-finetuned \
    --output ./mistral-medical-lora \
    --rank 16 \
    --target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj

Qwen2.5 72B — Safety Unlearning

# Extract refusal delta between safe and uncensored versions
MODEL_A = "Qwen/Qwen2.5-72B-Instruct"                   # with safety
MODEL_B = "user/Qwen2.5-72B-uncensored"                 # without safety

python3 extract_lora_diff.py \
    --model_a Qwen/Qwen2.5-72B-Instruct \
    --model_b user/Qwen2.5-72B-uncensored \
    --output ./qwen-safety-removal-lora \
    --rank 16

5. Parameter Reference

Parameter	Default	Description
`--rank`	16	LoRA rank. Higher = larger adapter, lower reconstruction error. Lower = smaller, faster
`--target_modules`	q_proj,k_proj,v_proj,o_proj	Comma-separated modules to extract. Add gate_proj,up_proj,down_proj for MLP
`--alpha`	32	LoRA alpha. PEFT scales B·A by alpha/rank (2 at the defaults), so the factors reproduce D only if divided by that ratio or if alpha = rank
`--skip_3d`	True	Automatically skip 3D tensors (MoE experts)
`--output_format`	peft	`peft`, `gguf`, or `both`

6. Troubleshooting

Problem	Cause	Solution
`KeyError: tensor name mismatch`	Different base models	Use models trained from same base
`CUDA out of memory`	Loading full model	Use tensor-by-tensor mode (default)
`ValueError: non contiguous tensor`	SVD output not contiguous	Add `.contiguous()` before saving
`GGUF conversion failed`	Tensor name mismatch	Keys taken from an in-memory PEFT model end in `.lora_A.default.weight`; saved adapters use `.lora_A.weight`. Drop `.default` before saving
`Rank too high for tensor`	Smallest tensor dimension < rank	Reduce rank or skip that tensor
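For the GGUF key-name row above, the rename can be done with a small regex over the state dict before saving. This is a minimal sketch: `to_saved_peft_keys` is a hypothetical helper. For PyTorch tensors, you would also call `.contiguous()` on each value before `safetensors.torch.save_file`, as in the non-contiguous row.

```python
import re

# Matches the in-memory PEFT suffix ".lora_A.default.weight" / ".lora_B.default.weight".
_ADAPTER_SUFFIX = re.compile(r"\.(lora_[AB])\.default\.weight$")


def to_saved_peft_keys(state: dict) -> dict:
    """Rename ...lora_X.default.weight keys to the saved-adapter form ...lora_X.weight."""
    return {_ADAPTER_SUFFIX.sub(r".\1.weight", key): value for key, value in state.items()}
```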

7. Limitations

Attention-only bias: Using only attention layers may miss FFN/MLP-level changes
Low-rank assumption: Works best with LoRA-merged models; full fine-tunes usually produce deltas whose rank exceeds r
No quality guarantee: The adapter is a rank-r reconstruction of the weight difference; there is no guarantee it matches the quality of an adapter trained directly
Single-style transfer: Extracts only the difference between 2 styles; for 3+ styles, create multiple adapters

8. Extraction Script

extract_lora_diff.py (193 lines) — production-ready extraction script available in this repo.

9. References & Credit

Technique: UKA (Hermes Agent, Nous Research) & hotdogs
Paper: Weight-Diff SVD Extraction: Zero-Shot LoRA Adapter Synthesis
Code + Adapter: https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora
LoRA paper: Hu et al., 2021 (arXiv:2106.09685)
QLoRA paper: Dettmers et al., 2023 (arXiv:2305.14314)