	# Weight-Diff SVD Extraction: Universal Method

	## Building a LoRA Adapter from the Weight Difference of Two Models

	This technique applies to any LLM architecture, provided the two models were produced by adapters trained from the same base.
	It needs no GPU and no training data, and takes only 1-3 minutes on a CPU.

	```
	Model A (merged LoRA)      Model B (merged LoRA)
	          │                          │
	          └────────────┬─────────────┘
	                       │  W_B - W_A = Δ
	                       ▼
	             Truncated SVD (rank r)
	                       │
	                       ▼
	             LoRA adapter A→B (7 MB)
	```

	---

	## 1. Requirements

	✅ Applicable when:
	- Both models share the same base architecture and the same base weights (same commit hash)
	- Both models were trained with LoRA and then merged (not fully fine-tuned)
	- Tensor names match across the two models
	- There is enough RAM to hold two tensors at a time (~2-4 GB)

	❌ Not applicable when:
	- The architectures differ (different base models)
	- The models were fully fine-tuned (the delta may have a rank well above r=16)
	- config.json or the tokenizer was changed during fine-tuning
	- Less than 4 GB of VRAM/RAM is available

	---

	## 2. Step-by-Step Procedure

	### Step 1: Choose the two models

	```python
	MODEL_A = "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled" # source
	MODEL_B = "lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled" # target
	```

	Rule: the two models must have identical tensor names and an identical config.json.
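
The rule above can be checked before any weights are downloaded in full. The following is a minimal sketch, assuming the tensor shapes and the parsed config.json of each model are already available as plain dictionaries; the helper name `check_compatible` and the example keys are hypothetical and are not part of `extract_lora_diff.py`.

```python
def check_compatible(shapes_a, shapes_b, config_a, config_b):
    """Return a list of problems; an empty list means the pair looks compatible.

    shapes_*: dict mapping tensor name -> shape tuple (hypothetical input format)
    config_*: dict parsed from each model's config.json
    """
    problems = []
    only_a = sorted(set(shapes_a) - set(shapes_b))
    only_b = sorted(set(shapes_b) - set(shapes_a))
    if only_a:
        problems.append(f"tensors only in A: {only_a}")
    if only_b:
        problems.append(f"tensors only in B: {only_b}")
    # Shared names must also agree on shape, otherwise W_B - W_A is undefined.
    for name in sorted(set(shapes_a) & set(shapes_b)):
        if tuple(shapes_a[name]) != tuple(shapes_b[name]):
            problems.append(
                f"shape mismatch for {name}: {shapes_a[name]} vs {shapes_b[name]}"
            )
    # Any differing config key is reported, including keys present on one side only.
    for key in sorted(set(config_a) | set(config_b)):
        if config_a.get(key) != config_b.get(key):
            problems.append(f"config.json differs on {key!r}")
    return problems


if __name__ == "__main__":
    shapes = {"model.layers.0.self_attn.q_proj.weight": (4096, 2048)}
    config = {"hidden_size": 2048, "num_hidden_layers": 40}
    print(check_compatible(shapes, dict(shapes), config, dict(config)))
```

An empty result is necessary but not sufficient: two models can share names, shapes, and config while still coming from different base checkpoints.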

	### Step 2: Choose the target modules

	Select only the linear layers you want to extract:

	```python
	TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"] # attention only
	# or
	TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
	                  "gate_proj", "up_proj", "down_proj"] # attention + MLP
	```

	⚠️ Important: skip 3D tensors (e.g. MoE expert layers of shape `[256, 2048, 512]`). They require a per-slice SVD, which is more involved.
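
The selection and the 3D-tensor skip can be expressed as a small filter. This is a sketch under assumptions: tensor names follow the usual `...<module>.weight` pattern, and shapes are given as a dictionary; the function name `select_tensors` and the example tensor names are hypothetical.

```python
ATTENTION_MODULES = ("q_proj", "k_proj", "v_proj", "o_proj")
MLP_MODULES = ("gate_proj", "up_proj", "down_proj")


def select_tensors(shapes, target_modules=ATTENTION_MODULES):
    """Return names of 2D weight tensors whose parent module is a target module."""
    selected = []
    for name, shape in shapes.items():
        parts = name.split(".")
        is_weight = parts[-1] == "weight"
        is_target = len(parts) >= 2 and parts[-2] in target_modules
        # Only plain 2D matrices are kept; 3D MoE expert stacks are skipped.
        if is_weight and is_target and len(shape) == 2:
            selected.append(name)
    return selected


# Hypothetical tensor names and shapes, for illustration only.
example_shapes = {
    "model.layers.0.self_attn.q_proj.weight": (4096, 2048),
    "model.layers.0.self_attn.q_proj.bias": (4096,),
    "model.layers.0.mlp.gate_proj.weight": (512, 2048),
    "model.layers.0.mlp.experts.down_proj.weight": (256, 2048, 512),
    "model.embed_tokens.weight": (1000, 2048),
}

attention_only = select_tensors(example_shapes)
attention_and_mlp = select_tensors(example_shapes, ATTENTION_MODULES + MLP_MODULES)
print(attention_only)
print(attention_and_mlp)
```

Matching on the second-to-last name component rather than on a substring avoids accidentally selecting modules whose names merely contain a target string.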

	### Step 3: Choose the LoRA rank

	```python
	RANK = 16 # standard: good balance between size and quality
	# RANK = 8 # minimal: smaller and faster, but higher reconstruction error
	# RANK = 32 # high quality: 2x larger, ~4% lower error
	```

	Test: run a reconstruction-error test to pick a suitable rank.
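
One way to run such a test is to compare the relative Frobenius error of the best rank-r approximation across candidate ranks. The sketch below uses NumPy and a synthetic matrix as a stand-in for a real `W_B - W_A` tensor; the function name `reconstruction_error` and the matrix sizes are illustrative assumptions.

```python
import numpy as np


def reconstruction_error(delta, rank):
    """Relative Frobenius error of the best rank-`rank` approximation of `delta`.

    By the Eckart-Young theorem this equals sqrt(sum of dropped squared
    singular values / sum of all squared singular values).
    """
    s = np.linalg.svd(delta, compute_uv=False)
    total = np.sum(s ** 2)
    dropped = np.sum(s[rank:] ** 2)
    return float(np.sqrt(dropped / total))


rng = np.random.default_rng(0)
# Synthetic stand-in for W_B - W_A: the difference of two rank-16 updates.
delta = (
    rng.standard_normal((256, 16)) @ rng.standard_normal((16, 128))
    - rng.standard_normal((256, 16)) @ rng.standard_normal((16, 128))
)

errors = {rank: reconstruction_error(delta, rank) for rank in (8, 16, 32)}
for rank, err in errors.items():
    print(f"rank={rank:2d}  relative error={err:.4f}")
```

On real weights, the error would be computed per tensor and then summarized (for example by mean or maximum) before settling on a rank.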

	### Step 4: Run the extraction script

	```bash
	python3 extract_lora_diff.py \
	--model_a lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled \
	--model_b lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled \
	--output ./my-lora-adapter \
	--rank 16 \
	--target_modules q_proj,k_proj,v_proj,o_proj
	```

	### Step 5: Use the adapter

	Python (PEFT):
	```python
	from peft import PeftModel
	from transformers import AutoModelForCausalLM

	base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-35B-A3B")
	model = PeftModel.from_pretrained(base, "./my-lora-adapter")
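	# model now carries the A→B shift (style B).
	# Note: the adapter encodes M_B - M_A, so it matches model B most closely when loaded on top of model A.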
	```
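
Because the adapter stores an approximation of `M_B - M_A`, the weights it is loaded onto determine what comes out. The following NumPy sketch, built on synthetic matrices and the definitions from Section 3, compares loading the extracted factors onto model A and onto the base; all sizes and variable names are illustrative assumptions, and the full rank 2r is used so that the reconstruction is exact.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4


def low_rank(rank):
    """Random rank-`rank` update, standing in for a merged LoRA delta."""
    return 0.1 * rng.standard_normal((d_out, rank)) @ rng.standard_normal((rank, d_in))


w_base = rng.standard_normal((d_out, d_in))
m_a = w_base + low_rank(r)  # model A = base + delta_A
m_b = w_base + low_rank(r)  # model B = base + delta_B

# Extract factors from D = M_B - M_A, keeping all 2r directions.
u, s, vt = np.linalg.svd(m_b - m_a, full_matrices=False)
k = 2 * r
lora_b = u[:, :k] * np.sqrt(s[:k])           # shape (d_out, k)
lora_a = np.sqrt(s[:k])[:, None] * vt[:k]    # shape (k, d_in)

on_a = m_a + lora_b @ lora_a        # adapter loaded on model A
on_base = w_base + lora_b @ lora_a  # adapter loaded on the base

err_on_a = np.linalg.norm(on_a - m_b) / np.linalg.norm(m_b)
err_on_base = np.linalg.norm(on_base - m_b) / np.linalg.norm(m_b)
print(f"loaded on A:    {err_on_a:.2e}")
print(f"loaded on base: {err_on_base:.2e}")
```

In this toy setting the result on the base differs from model B by exactly `-delta_A`, so how close it lands to model B depends on how large model A's own update is.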

	llama.cpp (GGUF):
	```bash
	# Convert to GGUF first
	python3 llama.cpp/convert_lora_to_gguf.py ./my-lora-adapter

	# Run inference
	llama-cli -m base-Q6_K.gguf --lora my-lora-adapter.gguf -p "prompt"
	```

	---

	## 3. The Math Behind It

	```
	Let      M_A = W_base + Δ_A            (model A = base + LoRA A)
	         M_B = W_base + Δ_B            (model B = base + LoRA B)

	Diff:    D = M_B - M_A = Δ_B - Δ_A     (the base cancels; only the deltas remain)

	SVD:     D ≈ U_r · Σ_r · V_r^T         (rank-r approximation)

	LoRA:    A = √Σ_r · V_r^T              (lora_A)
	         B = U_r · √Σ_r                (lora_B)

	Forward: h = W_0·x + B·A·x             (standard LoRA forward; W_0 is the weight the adapter is loaded onto, α/r scaling omitted)
	```

	Why it works:
	- Models A and B were both trained with LoRA at rank r, so each delta has rank ≤ r and their difference has rank ≤ 2r.
	- A rank-r truncated SVD reconstructs the delta almost completely (91-95% energy).
	- No training is needed, because the procedure is purely a mathematical decomposition.
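
The derivation above can be traced numerically. This is a minimal NumPy sketch on synthetic data: it builds two rank-r deltas, confirms that their difference has rank at most 2r, splits the truncated SVD into `lora_A` and `lora_B` as in the formulas, and reports the retained energy, taken here to mean the share of squared singular values. The function name `svd_to_lora` and the matrix sizes are assumptions for illustration.

```python
import numpy as np


def svd_to_lora(delta, rank):
    """Split the rank-`rank` truncated SVD of `delta` into LoRA factors.

    Returns (lora_a, lora_b, energy) with lora_b @ lora_a = U_r Σ_r V_r^T.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_a = sqrt_s[:, None] * vt[:rank]  # A = sqrt(Σ_r) · V_r^T, shape (rank, d_in)
    lora_b = u[:, :rank] * sqrt_s         # B = U_r · sqrt(Σ_r),  shape (d_out, rank)
    energy = float(np.sum(s[:rank] ** 2) / np.sum(s ** 2))
    return lora_a, lora_b, energy


rng = np.random.default_rng(0)
r, d_out, d_in = 8, 96, 64
delta_a = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
delta_b = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
d = delta_b - delta_a  # the base weights cancel, so they are not needed here

diff_rank = int(np.linalg.matrix_rank(d))
lora_a, lora_b, energy = svd_to_lora(d, r)
residual = np.linalg.norm(d - lora_b @ lora_a) ** 2 / np.linalg.norm(d) ** 2

print(f"rank of difference: {diff_rank} (bound: {2 * r})")
print(f"energy kept at rank {r}: {energy:.3f}")
print(f"relative squared residual: {residual:.3f}")
```

The squared residual and the retained energy sum to one, which is why the energy figure is a direct measure of how much of the difference the adapter can represent. Random deltas like these spread energy fairly evenly, so the percentage printed here is not comparable to the 91-95% figure quoted for real models.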

	---

	## 4. Examples: Applying It to Other Models

	### Llama 3.1 8B: style transfer

	```bash
	# Assume two models trained from the same Llama-3.1-8B base
	MODEL_A="user/llama3.1-8b-formal-style" # formal style
	MODEL_B="user/llama3.1-8b-casual-style" # casual style

	python3 extract_lora_diff.py \
	--model_a "$MODEL_A" \
	--model_b "$MODEL_B" \
	--output ./llama-formal-to-casual \
	--rank 16 \
	--target_modules q_proj,k_proj,v_proj,o_proj
	```

	### Mistral 7B: domain adaptation

	```bash
	MODEL_A="mistralai/Mistral-7B-Instruct-v0.3" # general
	MODEL_B="user/Mistral-7B-medical-finetuned" # medical domain

	python3 extract_lora_diff.py \
	--model_a "$MODEL_A" \
	--model_b "$MODEL_B" \
	--output ./mistral-medical-lora \
	--rank 16 \
	--target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
	```

	### Qwen2.5 72B: safety unlearning

	```bash
	# Extract the refusal delta between the safety-tuned release and the raw variant
	MODEL_A="Qwen/Qwen2.5-72B-Instruct" # with safety tuning
	MODEL_B="user/Qwen2.5-72B-uncensored" # without safety tuning

	python3 extract_lora_diff.py \
	--model_a "$MODEL_A" \
	--model_b "$MODEL_B" \
	--output ./qwen-safety-removal-lora \
	--rank 16
	```

	---

	## 5. Tuning the Parameters

	| Parameter | Default | Description |
	|-----------|---------|-------------|
	| `--rank` | 16 | LoRA rank. Higher means a larger adapter with better fidelity; lower means smaller and faster. |
	| `--target_modules` | `q_proj,k_proj,v_proj,o_proj` | Modules to extract. `gate_proj`, `up_proj`, and `down_proj` can be added. |
	| `--alpha` | 32 | LoRA alpha (scaling factor). Usually set to 2× the rank. |
	| `--skip_3d` | True | Automatically skip 3D tensors (MoE experts). |
	| `--output_format` | peft | `peft`, `gguf`, or `both`. |
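
In a standard LoRA forward pass, as in PEFT's default configuration, the update `B·A` is multiplied by `alpha / rank`, so with the defaults above the stored factors are scaled by 2 at inference time. Whether `extract_lora_diff.py` compensates for this is not stated here; the sketch below shows one possible way to do so, by folding the inverse scale into the singular values before splitting them. The function name `scaled_lora_factors` is hypothetical.

```python
import numpy as np


def scaled_lora_factors(delta, rank, alpha):
    """Return (lora_a, lora_b, scale) such that scale * lora_b @ lora_a
    equals the rank-`rank` truncated SVD of `delta`, where scale = alpha / rank."""
    scale = alpha / rank
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Divide the singular values by the scale so the runtime multiplication
    # by alpha / rank restores the intended update.
    sqrt_s = np.sqrt(s[:rank] / scale)
    lora_a = sqrt_s[:, None] * vt[:rank]
    lora_b = u[:, :rank] * sqrt_s
    return lora_a, lora_b, scale


rng = np.random.default_rng(0)
# A synthetic rank-4 difference, so a rank-4 extraction is exact.
delta = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 24))
lora_a, lora_b, scale = scaled_lora_factors(delta, rank=4, alpha=8)

effective_update = scale * (lora_b @ lora_a)
print(scale, np.allclose(effective_update, delta))
```

Setting `alpha` equal to `rank` makes the scale 1 and removes the need for any compensation, at the cost of departing from the 2× convention in the table.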

	---

	## 6. Troubleshooting

	| Problem | Cause | Fix |
	|---------|-------|-----|
	| `KeyError: tensor name mismatch` | The models come from different bases | Use models trained from the same base |
	| `CUDA out of memory` | The whole model is being loaded | Use tensor-by-tensor mode (default) |
	| `ValueError: non contiguous tensor` | The SVD output is not contiguous | Call `.contiguous()` before saving |
	| `GGUF conversion failed` | Tensor names do not match | PEFT uses `.lora_A.default`, GGUF uses `.lora_A.weight`; the keys must be renamed |
	| `Rank too high for tensor` | A tensor dimension is smaller than the rank | Lower the rank or skip that tensor |
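
Two of the fixes in the table, clamping the rank and making the factors contiguous, can be handled in the factorization step itself. The sketch below uses NumPy, where `np.ascontiguousarray` plays the role of PyTorch's `.contiguous()`; the function name `safe_factors` is hypothetical and the approach is an illustration, not the script's actual code.

```python
import numpy as np


def safe_factors(delta, rank):
    """Factor a 2D `delta` into LoRA matrices, guarding against two failure modes."""
    # A matrix cannot have more singular values than its smaller dimension,
    # so clamp the rank instead of failing on small tensors.
    rank = min(rank, *delta.shape)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    # In NumPy the multiplications already produce fresh arrays; the explicit
    # call is a cheap safeguard that mirrors `.contiguous()` in PyTorch.
    lora_b = np.ascontiguousarray(u[:, :rank] * sqrt_s)
    lora_a = np.ascontiguousarray(sqrt_s[:, None] * vt[:rank])
    return lora_a, lora_b


rng = np.random.default_rng(0)
small_delta = rng.standard_normal((8, 64))  # smaller dimension (8) is below rank 16
lora_a, lora_b = safe_factors(small_delta, rank=16)
print(lora_a.shape, lora_b.shape)

# A column slice of a larger matrix is a non-contiguous view, which is the
# kind of array that triggers the error in the table when saved directly.
u_full = np.linalg.svd(rng.standard_normal((64, 32)), full_matrices=False)[0]
column_slice = u_full[:, :4]
print(column_slice.flags["C_CONTIGUOUS"])
```

Clamping silently changes the adapter rank for that tensor, so a real script would probably log the affected tensor names.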

	---

	## 7. Limitations

	1. Attention-only bias: extracting only the attention layers may miss changes in the FFN/MLP layers.
	2. Low-rank assumption: the method works well for LoRA-merged models, but a full fine-tune may produce a delta of higher rank.
	3. No quality guarantee: the adapter is a mathematical reconstruction, with no guarantee that its quality matches direct training.
	4. Single-style transfer: only the difference between two styles is extracted. Switching among three or more styles requires multiple adapters.

	---

	## 8. Extraction Script

	```python
	# extract_lora_diff.py (193 lines)
	# Available for download from the HuggingFace repo:
	# https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora
	```

	---

	## 9. References & Credit

	- Technique: UKA (Hermes Agent, Nous Research) & hotdogs
	- Paper: [Weight-Diff SVD Extraction: Zero-Shot LoRA Adapter Synthesis](https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora/blob/main/paper.pdf)
	- Code + Adapter: https://huggingface.co/hotdogs/qwen3.6-35b-opus-to-kimi-lora
	- LoRA paper: Hu et al., 2021 (arXiv:2106.09685)
	- QLoRA paper: Dettmers et al., 2023 (arXiv:2305.14314)