Weight-Diff SVD Extraction: Universal Method
How to Create LoRA Adapters from Weight Differences Between Two Models
This technique works for any LLM architecture, provided both models were fine-tuned (with LoRA, then merged) from the same base model. No GPU or training data is required, and extraction runs in 1-3 minutes on CPU.
Model A (merged LoRA) + Model B (merged LoRA)
  → W_B - W_A = Δ
  → Truncated SVD (rank r)
  → LoRA Adapter A→B (7 MB)
1. Requirements
Works when:
- Both models share the same base architecture and base weights (same commit hash)
- Both models were trained with LoRA + merge (not full fine-tune)
- Tensor names match across both models
- At least 4 GB RAM to load 2 tensors at a time
Does NOT work when:
- Different architectures (different base models)
- Full fine-tune (delta may exceed low-rank assumption)
- config.json / tokenizer was modified during fine-tuning
- Less than 4 GB RAM
2. Step-by-Step Guide
Step 1: Choose Two Models
MODEL_A = "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled" # Source
MODEL_B = "lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled" # Target
Rule: Both models must have identical tensor names and identical config.json.
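Before extracting, you can check this rule mechanically. The sketch below is an illustration and is not part of extract_lora_diff.py. `load_config`, `load_shapes` and `compare_checkpoints` are hypothetical helpers. The set of config keys allowed to differ is an assumption. Shapes are read with safetensors' `safe_open`, which exposes tensor metadata without loading the weights.

```python
import json
from pathlib import Path

# Assumption: these config.json keys may legitimately differ between two
# fine-tunes of the same base. Adjust for your models.
IGNORED_CONFIG_KEYS = {"_name_or_path", "transformers_version", "torch_dtype"}


def load_config(model_dir):
    """Read config.json from a local model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text())


def load_shapes(safetensors_paths):
    """Map tensor name -> shape across all shards without loading any weights."""
    from safetensors import safe_open  # imported lazily; only needed for real checkpoints

    shapes = {}
    for path in safetensors_paths:
        with safe_open(str(path), framework="pt") as f:
            for name in f.keys():
                shapes[name] = tuple(f.get_slice(name).get_shape())
    return shapes


def compare_checkpoints(config_a, config_b, shapes_a, shapes_b):
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    for key in sorted((set(config_a) | set(config_b)) - IGNORED_CONFIG_KEYS):
        if config_a.get(key) != config_b.get(key):
            problems.append(f"config.json differs at {key!r}")
    for name in sorted(set(shapes_a) ^ set(shapes_b)):
        problems.append(f"tensor present in only one model: {name}")
    for name in sorted(set(shapes_a) & set(shapes_b)):
        if tuple(shapes_a[name]) != tuple(shapes_b[name]):
            problems.append(f"shape mismatch for {name}: {shapes_a[name]} vs {shapes_b[name]}")
    return problems
```

For local copies in hypothetical directories ./model_a and ./model_b, pass load_config("./model_a") and load_config("./model_b") together with load_shapes(sorted(Path(d).glob("*.safetensors"))) for each directory. Stop if the returned list is non-empty.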
Step 2: Choose Target Modules
Select only the linear layers you want to extract:
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"] # attention only
# or
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"] # attention + MLP
Important: Skip 3D tensors (e.g. MoE expert layers of shape [256, 2048, 512]). They require per-slice SVD, which is more complex.
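Here is a minimal sketch of the selection rule. It assumes checkpoint tensor names of the usual Hugging Face form `...<module>.weight`, for example `model.layers.0.self_attn.q_proj.weight`. `select_target_tensors` is a hypothetical helper. It keeps 2D matrices whose module name is in TARGET_MODULES and reports any tensor with more dimensions as skipped, without attempting per-slice SVD.

```python
def select_target_tensors(shapes, target_modules):
    """Split {tensor name: shape} into (2D tensors to extract, higher-rank tensors to skip)."""
    targets = set(target_modules)
    selected, skipped = [], []
    for name, shape in sorted(shapes.items()):
        parts = name.split(".")
        # Match on the module name just before the trailing ".weight"
        if len(parts) < 2 or parts[-1] != "weight" or parts[-2] not in targets:
            continue
        if len(shape) == 2:
            selected.append(name)
        else:
            skipped.append(name)  # e.g. stacked MoE experts: [num_experts, out, in]
    return selected, skipped
```

Matching is on the module name only. So if a checkpoint stores MoE experts as separate 2D matrices, those would also be selected whenever their module name (e.g. gate_proj) is in the list.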
Step 3: Choose LoRA Rank
RANK = 16 # standard: best balance of size vs quality
RANK = 8 # minimal: smaller, faster, higher reconstruction error
RANK = 32 # high quality: 2× larger, ~4% less error
Tip: Run reconstruction error analysis to find the optimal rank for your use case.
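One way to run that analysis is to look at the singular values of a layer's difference matrix D. For each rank, the sketch below computes the fraction of energy kept (sum of the top-r squared singular values over the total) and the relative Frobenius error of the best rank-r approximation. `rank_report` is a hypothetical helper. The random matrices in the demo are placeholders, so the printed numbers say nothing about real models. On a real layer you would pass W_B - W_A.

```python
import numpy as np


def rank_report(delta, ranks):
    """For each rank r: (fraction of energy kept, relative Frobenius error) of the
    best rank-r approximation of delta, computed from its singular values."""
    s = np.linalg.svd(np.asarray(delta, dtype=np.float64), compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    cumulative = np.cumsum(energy)
    report = {}
    for r in ranks:
        if r < 1:
            raise ValueError("rank must be >= 1")
        kept = 1.0 if total == 0 else float(cumulative[min(r, len(s)) - 1] / total)
        # Eckart-Young: the squared error of the best rank-r fit is the energy left over
        report[r] = (kept, float(np.sqrt(max(0.0, 1.0 - kept))))
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for one layer's D = delta_B - delta_A, each of rank 16
    delta_a = rng.standard_normal((512, 16)) @ rng.standard_normal((16, 256))
    delta_b = rng.standard_normal((512, 16)) @ rng.standard_normal((16, 256))
    for r, (kept, err) in rank_report(delta_b - delta_a, [8, 16, 32]).items():
        print(f"rank {r:>2}: energy kept {kept:.3f}, relative error {err:.3f}")
```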
Step 4: Run Extraction Script
python3 extract_lora_diff.py \
--model_a lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled \
--model_b lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled \
--output ./my-lora-adapter \
--rank 16 \
--target_modules q_proj,k_proj,v_proj,o_proj
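extract_lora_diff.py itself is not reproduced here. The following is a rough sketch of what tensor-by-tensor extraction can look like, not the script's actual code. `svd_to_lora` and `extract_adapter` are hypothetical names. `load_a` and `load_b` are any callables that return one weight matrix as a NumPy array. For real checkpoints, one option is a wrapper around safetensors' `safe_open(..., framework="pt")` followed by `.float().numpy()`. The output keys follow the naming PEFT uses in saved causal-LM adapters.

```python
import numpy as np


def svd_to_lora(delta, rank):
    """Truncated SVD delta ~= B @ A, with sqrt(singular values) split across A and B."""
    if rank > min(delta.shape):
        raise ValueError(f"rank {rank} exceeds tensor dimensions {delta.shape}")
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_b = u[:, :rank] * sqrt_s          # (out_features, rank)
    lora_a = sqrt_s[:, None] * vt[:rank]   # (rank, in_features)
    # Make sure the arrays are C-contiguous before they are converted and saved
    return np.ascontiguousarray(lora_a), np.ascontiguousarray(lora_b)


def extract_adapter(load_a, load_b, tensor_names, rank):
    """Process one tensor pair at a time so memory stays near a pair of tensors
    (plus the SVD workspace)."""
    adapter = {}
    for name in tensor_names:
        w_a = np.asarray(load_a(name), dtype=np.float32)
        w_b = np.asarray(load_b(name), dtype=np.float32)
        lora_a, lora_b = svd_to_lora(w_b - w_a, rank)   # D = M_B - M_A
        prefix = "base_model.model." + name.removesuffix(".weight")
        adapter[f"{prefix}.lora_A.weight"] = lora_a
        adapter[f"{prefix}.lora_B.weight"] = lora_b
        del w_a, w_b
    return adapter
```

You could then write the resulting dict with safetensors.numpy.save_file(adapter, "adapter_model.safetensors") and place it alongside an adapter_config.json.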
Step 5: Use the Adapter
Python (PEFT):
from peft import PeftModel
from transformers import AutoModelForCausalLM
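# The adapter encodes D = M_B - M_A, so load Model A (not the original base) underneath it
base = AutoModelForCausalLM.from_pretrained("lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled")
model = PeftModel.from_pretrained(base, "./my-lora-adapter")
# model now approximates Model B (style B)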
llama.cpp (GGUF):
# Convert to GGUF first
python3 llama.cpp/convert_lora_to_gguf.py ./my-lora-adapter
# Run inference; -m must be a GGUF of Model A, since the adapter encodes M_B - M_A
llama-cli -m model-a-Q6_K.gguf --lora my-lora-adapter.gguf -p "prompt"
3. Mathematical Foundation
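Given:
$$M_A = W_{\text{base}} + \Delta_A \qquad \text{(Model A = base + LoRA A)}$$
$$M_B = W_{\text{base}} + \Delta_B \qquad \text{(Model B = base + LoRA B)}$$
Diff (the base cancels, only the deltas remain):
$$D = M_B - M_A = \Delta_B - \Delta_A$$
SVD (rank-r approximation):
$$D \approx U_r \Sigma_r V_r^\top$$
LoRA factors:
$$A = \sqrt{\Sigma_r}\, V_r^\top \ (\texttt{lora\_A}), \qquad B = U_r \sqrt{\Sigma_r} \ (\texttt{lora\_B})$$
Forward (standard LoRA forward with scaling α/r; here W_0 = M_A, the model the adapter is applied to):
$$h = W_0 x + \frac{\alpha}{r} B A x \approx M_A x + D x = M_B x \qquad (\alpha = r)$$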
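Why it works:
- Model A and Model B were each trained with LoRA of rank r, so Δ_A and Δ_B each have rank ≤ r and their difference has rank ≤ 2r.
- Truncated SVD gives the best rank-r approximation of D in the Frobenius norm (Eckart-Young), so rank r reconstructs most of the delta (91-95% energy retention), and rank 2r reconstructs it exactly up to numerical precision.
- No training needed: this is a pure matrix decomposition.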
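You can check the derivation numerically on small random matrices. In this NumPy sketch, the sizes and the helper names `lora_factors` and `simulate` are arbitrary choices for illustration. It builds two merged rank-r deltas on a shared base and confirms that D has rank at most 2r. It then shows that adding B·A to Model A's weights reproduces Model B's output at rank 2r but only approximately at rank r, assuming α = r so the LoRA scaling is 1.

```python
import numpy as np


def lora_factors(d, rank):
    """A = sqrt(S_r) V_r^T and B = U_r sqrt(S_r) from the SVD of d."""
    u, s, vt = np.linalg.svd(d, full_matrices=False)
    root = np.sqrt(s[:rank])
    return root[:, None] * vt[:rank], u[:, :rank] * root


def simulate(d_out=64, d_in=48, r=4, seed=0):
    rng = np.random.default_rng(seed)
    w_base = rng.standard_normal((d_out, d_in))
    delta_a = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
    delta_b = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
    m_a, m_b = w_base + delta_a, w_base + delta_b   # merged Model A / Model B

    d = m_b - m_a                                   # W_base cancels
    x = rng.standard_normal(d_in)
    errors = {}
    for rank in (r, 2 * r):
        a, b = lora_factors(d, rank)
        h = m_a @ x + b @ (a @ x)                   # W_0 = M_A, alpha / rank = 1
        errors[rank] = float(np.linalg.norm(h - m_b @ x) / np.linalg.norm(m_b @ x))
    return int(np.linalg.matrix_rank(d)), errors


if __name__ == "__main__":
    rank_d, errors = simulate()
    print("rank(D) =", rank_d)            # at most 2r = 8
    print("relative output error:", errors)
```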
4. Examples for Other Models
Llama 3.1 8B: Style Transfer
# Two models fine-tuned from the same Llama-3.1-8B base
MODEL_A = "user/llama3.1-8b-formal-style" # formal style
MODEL_B = "user/llama3.1-8b-casual-style" # casual style
python3 extract_lora_diff.py \
--model_a user/llama3.1-8b-formal-style \
--model_b user/llama3.1-8b-casual-style \
--output ./llama-formal-to-casual \
--rank 16 \
--target_modules q_proj,k_proj,v_proj,o_proj
Mistral 7B: Domain Adaptation
MODEL_A = "mistralai/Mistral-7B-Instruct-v0.3" # general
MODEL_B = "user/Mistral-7B-medical-finetuned" # medical domain
python3 extract_lora_diff.py \
--model_a mistralai/Mistral-7B-Instruct-v0.3 \
--model_b user/Mistral-7B-medical-finetuned \
--output ./mistral-medical-lora \
--rank 16 \
--target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
Qwen2.5 72B: Safety Unlearning
# Extract refusal delta between safe and uncensored versions
MODEL_A = "Qwen/Qwen2.5-72B-Instruct" # with safety
MODEL_B = "user/Qwen2.5-72B-uncensored" # without safety
python3 extract_lora_diff.py \
--model_a Qwen/Qwen2.5-72B-Instruct \
--model_b user/Qwen2.5-72B-uncensored \
--output ./qwen-safety-removal-lora \
--rank 16
5. Parameter Reference
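| Parameter | Default | Description |
|---|---|---|
| --rank | 16 | LoRA rank. Higher = larger file and lower reconstruction error; lower = smaller and faster |
| --target_modules | q_proj,k_proj,v_proj,o_proj | Modules to extract. Add gate_proj,up_proj,down_proj for MLP |
| --alpha | 32 | LoRA alpha (scaling factor), typically 2× rank. PEFT multiplies the adapter output by alpha / rank, so the SVD factors reproduce D at its original magnitude only if alpha = rank or the factors are rescaled accordingly |
| --skip_3d | True | Automatically skip 3D tensors (MoE experts) |
| --output_format | peft | peft, gguf, or both |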
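This minimal sketch shows how these parameters might end up in a PEFT adapter. `make_adapter_config` and `compensate_scaling` are hypothetical helpers. The config holds only a subset of LoraConfig fields. The sketch assumes PEFT's default scaling of lora_alpha / r, with rsLoRA disabled. `base_model_name_or_path` points at Model A because the adapter encodes M_B - M_A.

```python
import json

import numpy as np


def make_adapter_config(base_model, rank, alpha, target_modules):
    """A subset of PEFT's LoraConfig fields, ready to dump as adapter_config.json."""
    return {
        "peft_type": "LORA",
        "task_type": "CAUSAL_LM",
        "base_model_name_or_path": base_model,   # Model A, the model the diff is applied to
        "r": rank,
        "lora_alpha": alpha,
        "target_modules": list(target_modules),
        "lora_dropout": 0.0,
        "bias": "none",
        "inference_mode": True,
    }


def compensate_scaling(lora_a, lora_b, rank, alpha):
    """Rescale SVD factors so that (alpha / rank) * B @ A equals the original B @ A."""
    factor = np.sqrt(rank / alpha)   # split the 1 / scaling correction evenly over A and B
    return lora_a * factor, lora_b * factor


if __name__ == "__main__":
    config = make_adapter_config(
        "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
        rank=16, alpha=32, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    print(json.dumps(config, indent=2))
```

Splitting the correction as a square root on both factors keeps A and B at comparable magnitudes. Scaling only one of them would give the same product.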
6. Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| KeyError: tensor name mismatch | Different base models | Use models trained from the same base |
| CUDA out of memory | Loading the full model | Use tensor-by-tensor mode (default) |
| ValueError: non contiguous tensor | SVD output is not contiguous | Call .contiguous() before saving |
| GGUF conversion failed | Tensor name mismatch | In-memory PEFT keys use .lora_A.default.weight, but the GGUF converter expects .lora_A.weight; rename the keys |
| Rank too high for tensor | Tensor dimension < rank | Reduce rank or skip that tensor |
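For the GGUF row, here is a minimal rename sketch. It assumes the adapter tensors are held in a plain dict keyed by PEFT's in-memory names. `strip_adapter_name` is a hypothetical helper. An adapter written with PeftModel.save_pretrained already has the `.default` part removed.

```python
import re

# Matches in-memory PEFT keys such as "...q_proj.lora_A.default.weight"
_ADAPTER_NAME = re.compile(r"\.(lora_[AB])\.default\.weight$")


def strip_adapter_name(state_dict):
    """Rename '.lora_A.default.weight' -> '.lora_A.weight' (same for lora_B)."""
    return {_ADAPTER_NAME.sub(r".\1.weight", key): value for key, value in state_dict.items()}
```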
7. Limitations
- Attention-only bias: Using only attention layers may miss FFN/MLP-level changes, as well as changes in any skipped 3D MoE expert tensors
- Low-rank assumption: Works best with LoRA-merged models; full fine-tunes may exceed the chosen rank
- No quality guarantee: The adapter is a mathematical reconstruction, so there is no guarantee it matches the quality of direct training
- Single-style transfer: Extracts only the difference between 2 styles; for 3+ styles, create multiple adapters
8. Extraction Script
extract_lora_diff.py (193 lines): production-ready extraction script, available in this repo.
9. References & Credit
- Technique: UKA (Hermes Agent, Nous Research) & hotdogs
- Paper: Weight-Diff SVD Extraction: Zero-Shot LoRA Adapter Synthesis
- Code + Adapter: https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora
- LoRA paper: Hu et al., 2021 (arXiv:2106.09685)
- QLoRA paper: Dettmers et al., 2023 (arXiv:2305.14314)