# Universal Weight-Diff SVD LoRA Extraction Method

Author: **UKA** (Hermes Agent, Nous Research)
Date: May 2026

---

# 📖 Concept

**Weight-Diff SVD Extraction** builds a LoRA adapter from the difference in weights between a base model and its fine-tuned counterpart (the tuned model). It needs no training data and no gradient descent.

The method starts by computing the weight difference:

```
ΔW = W_tuned - W_base
```

A rank-r truncated **Singular Value Decomposition (SVD)** then compresses ΔW into LoRA form:

```
ΔW ≈ U_r Σ_r V_r^T
B = U_r √Σ_r
A = √Σ_r V_r^T
```

The result is a compact LoRA adapter (88.2 MB for Qwen3.6-35B) instead of a second full copy of the model (~70 GB).
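The factorization can be checked numerically on a small matrix. The following is a minimal sketch using NumPy rather than the project's own code; the helper name `truncated_svd_factors` and the random test matrix are illustrative assumptions. It shows that `B @ A` reproduces a delta of rank 4 when `rank=4`, and that truncating below the true rank leaves an error equal to the norm of the discarded singular values.

```python
import numpy as np

def truncated_svd_factors(delta, rank):
    """Split a 2-D delta into LoRA-style factors B (d x r) and A (r x k)."""
    U, S, Vh = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s            # scale each column of U_r by sqrt(sigma)
    A = sqrt_s[:, None] * Vh[:rank, :]  # scale each row of V_r^T by sqrt(sigma)
    return B, A, S

rng = np.random.default_rng(0)
# Hypothetical delta with exact rank 4, so a rank-4 truncation is lossless.
delta = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))

B, A, S = truncated_svd_factors(delta, rank=4)
rel_err = np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)
print(B.shape, A.shape, rel_err < 1e-8)

# Truncating to rank 2 leaves an error equal to the discarded singular values.
B2, A2, _ = truncated_svd_factors(delta, rank=2)
err2 = np.linalg.norm(delta - B2 @ A2)
expected_err2 = np.sqrt(np.sum(S[2:] ** 2))
print(np.isclose(err2, expected_err2))
```

Splitting the singular values as a square root on each side keeps `B` and `A` at similar magnitudes; any other split whose product is the same would reconstruct the same ΔW.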

---

# ✅ Prerequisites

## Required Software

```bash
# Python 3.10+ and PyTorch
pip install torch numpy safetensors

# For saving in PEFT format
pip install peft transformers

# For SVD computation
pip install scipy  # optional: torch.linalg.svd also works
```

## Minimum Hardware

| Resource | Minimum | Recommended |
|-----------|--------|--------|
| RAM | 16 GB | 32+ GB |
| Disk | Enough to hold the base and tuned models | SSD |
| Swap | 8 GB | 16+ GB |
| CPU Cores | 4 | 8+ |

## Required Models

- **Base model**: the model before fine-tuning (e.g. `Qwen/Qwen3.6-35B-A3B`)
- **Target model**: the fine-tuned model (e.g. `llmfan46/Qwen3.6-35B-A3B-uncensored-heretic`)

---

# 🔢 7-Step Process

## Step 1: Verify Architecture

Before starting, confirm that the base model and the target model have exactly the same architecture.

```python
from safetensors import safe_open

def verify_architecture(base_path, target_path):
    """Check that tensor names and shapes match between base and target."""
    with safe_open(base_path, framework="pt") as fb, \
         safe_open(target_path, framework="pt") as ft:
        base_keys = set(fb.keys())
        target_keys = set(ft.keys())

        common = base_keys & target_keys
        only_base = base_keys - target_keys
        only_target = target_keys - base_keys

        print(f"✅ Matching tensors: {len(common)}")
        if only_base:
            print(f"⚠️  Only in base: {len(only_base)}")
            for k in sorted(only_base)[:10]:
                print(f"   - {k}")
        if only_target:
            print(f"⚠️  Only in target: {len(only_target)}")
            for k in sorted(only_target)[:10]:
                print(f"   - {k}")

        # Compare shapes for every common tensor. get_slice() reads the
        # shape from the header, so no tensor data is loaded.
        mismatched = []
        for key in sorted(common):
            base_shape = list(fb.get_slice(key).get_shape())
            target_shape = list(ft.get_slice(key).get_shape())
            if base_shape != target_shape:
                mismatched.append((key, base_shape, target_shape))

    if mismatched:
        print(f"❌ Shape mismatch: {len(mismatched)} tensors")
        for key, bs, ts in mismatched:
            print(f"   {key}: base={bs} target={ts}")
    else:
        print("✅ All shapes match")

    return common, only_base, only_target
```

**What to check:**
- The tensor counts match (or are very close)
- The dtypes match (usually BF16 or FP16)
- Tensors with the same name have the same shape
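For sharded checkpoints, the name check can also be run across the whole model without opening any shard. The sketch below assumes both repositories ship the usual Hugging Face `model.safetensors.index.json` file with a `weight_map` entry; the helper names and the two tiny maps are hypothetical. It also reports tensors that live in different shards in the two models, which matters when shards are compared pairwise.

```python
import json
from pathlib import Path

def load_weight_map(model_dir):
    """Return {tensor_name: shard_filename} from model.safetensors.index.json."""
    index_path = Path(model_dir) / "model.safetensors.index.json"
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]

def compare_weight_maps(base_map, target_map):
    """Summarise name overlap and tensors stored in different shards."""
    common = set(base_map) & set(target_map)
    return {
        "common": len(common),
        "only_base": sorted(set(base_map) - set(target_map)),
        "only_target": sorted(set(target_map) - set(base_map)),
        "different_shard": sorted(n for n in common if base_map[n] != target_map[n]),
    }

# Hypothetical maps standing in for load_weight_map("./models/base") etc.
base_map = {
    "model.norm.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors",
}
target_map = {
    "model.norm.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00001-of-00002.safetensors",
    "extra.weight": "model-00002-of-00002.safetensors",
}
report = compare_weight_maps(base_map, target_map)
print(report)
```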

---

## Step 2: Download Models

```bash
# Download the base model
huggingface-cli download Qwen/Qwen3.6-35B-A3B \
  --local-dir ./models/base \
  --include "*.safetensors" "*.json"

# Download the target model
huggingface-cli download llmfan46/Qwen3.6-35B-A3B-uncensored-heretic \
  --local-dir ./models/target \
  --include "*.safetensors" "*.json"
```

For large models (35B+ parameters), the safetensors files are usually split into multiple shards, for example:
```
model-00001-of-00008.safetensors  (~5-10 GB per shard)
model-00002-of-00008.safetensors
...
model-00008-of-00008.safetensors  (the last shard can reach 47 GB)
```

**Note:** The 47 GB shard is the source of the main extraction problem (see the Pitfalls section).

---

## Step 3: Discover Tensors

Build an index of every tensor in each shard:

```python
import json
import struct
from pathlib import Path

def parse_safetensors_index(shard_path):
    """
    Read the header of a safetensors file and build an index
    without loading any tensor data into RAM.
    """
    index = {}
    with open(shard_path, "rb") as f:
        # First 8 bytes: size of the JSON header (little-endian uint64)
        header_size_bytes = f.read(8)
        header_size = struct.unpack("<Q", header_size_bytes)[0]
        
        # Read the JSON header
        header_json = f.read(header_size)
        metadata = json.loads(header_json.decode("utf-8"))
        
        # Tensor data starts right after the header:
        # 8 (header size field) + header_size (JSON)
        data_start = 8 + header_size
        
        for tensor_name, tensor_info in metadata.items():
            if tensor_name == "__metadata__":
                continue
            index[tensor_name] = {
                "dtype": tensor_info["dtype"],
                "shape": tensor_info["shape"],
                "offset": data_start + tensor_info["data_offsets"][0],
                "size_bytes": tensor_info["data_offsets"][1] - tensor_info["data_offsets"][0],
            }
    
    print(f"📋 {Path(shard_path).name}: {len(index)} tensors")
    return index

# Example usage
shard_index = parse_safetensors_index("models/base/model-00001-of-00008.safetensors")
for name, info in list(shard_index.items())[:3]:
    print(f"  {name}: shape={info['shape']}, dtype={info['dtype']}, size={info['size_bytes']/1e6:.1f} MB")
```

Example output:
```
📋 model-00001-of-00008.safetensors: 87 tensors
  model.embed_tokens.weight: shape=[xxx, 2048], dtype=BF16, size=xxx MB
  model.layers.0.input_layernorm.weight: shape=[2048], dtype=BF16, size=0.0 MB
  model.layers.0.self_attn.q_proj.weight: shape=[2048, 2048], dtype=BF16, size=8.4 MB
```
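The file layout that the index relies on (an 8-byte little-endian header size, a JSON header, then raw tensor bytes) can be seen on a hand-built example. This is a minimal sketch using only the standard library; the tensor name `demo.weight` and both helper functions are made up for illustration, and the in-memory file is not produced by the safetensors library itself.

```python
import io
import json
import struct

def build_tiny_safetensors():
    """Build an in-memory file holding one F32 tensor [1.0, 2.0, 3.0]."""
    data = struct.pack("<3f", 1.0, 2.0, 3.0)
    header = json.dumps({
        "demo.weight": {"dtype": "F32", "shape": [3], "data_offsets": [0, len(data)]}
    }).encode("utf-8")
    # Layout: [8-byte header size][JSON header][tensor data]
    return struct.pack("<Q", len(header)) + header + data

def read_tensor_bytes(f, name):
    """Seek straight to one tensor using only the offsets in the header."""
    f.seek(0)
    header_size = struct.unpack("<Q", f.read(8))[0]
    header = json.loads(f.read(header_size).decode("utf-8"))
    start, end = header[name]["data_offsets"]
    f.seek(8 + header_size + start)  # offsets are relative to the data section
    return header[name], f.read(end - start)

f = io.BytesIO(build_tiny_safetensors())
info, raw = read_tensor_bytes(f, "demo.weight")
values = struct.unpack("<3f", raw)
print(info["shape"], values)  # [3] (1.0, 2.0, 3.0)
```

Because `data_offsets` are relative to the start of the data section, every absolute position is `8 + header_size + offset`, which is the same arithmetic used when building the index.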

---

## Step 4: Filter Large Tensors

The most important case is **MoE expert tensors**: very large 3-D tensors.

```python
import re

def should_extract_tensor(tensor_name, tensor_info, max_elements=50_000_000):
    """
    Decide whether this tensor should be extracted.
    
    Filter rules:
    1. Names containing 'expert' (but not 'shared_expert'): skip (MoE expert)
    2. Tensors with more than max_elements elements: skip
    3. Small tensors under 1M elements (e.g. norms, biases): always include
    """
    # Count the elements
    shape = tensor_info["shape"]
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    
    # Skip MoE expert tensors
    if "expert" in tensor_name.lower() and "shared_expert" not in tensor_name.lower():
        return False, "MoE expert (excluded)"
    
    # Skip tensors that are too large
    if num_elements > max_elements:
        return False, f"Too large ({num_elements:,} elements > {max_elements:,})"
    
    # Always include small tensors
    if num_elements < 1_000_000:
        return True, "Small tensor (always included)"
    
    return True, "OK"

# Example usage
filtered_tensors = {}
excluded_tensors = []

for name, info in shard_index.items():
    include, reason = should_extract_tensor(name, info)
    if include:
        filtered_tensors[name] = info
    else:
        excluded_tensors.append((name, reason))

print(f"✅ Included: {len(filtered_tensors)} tensors")
print(f"❌ Excluded: {len(excluded_tensors)} tensors")
for name, reason in excluded_tensors[:5]:
    print(f"   - {name}: {reason}")
```

**Filtering as actually applied to Qwen3.6-35B:**
- **581 tensors** were extracted successfully (95.1%)
- **30 tensors** were skipped, all of them MoE expert tensors (`mlp.experts.*`)
- Expert tensors have shapes such as `[128, 2048, 256]`; reshaped to 2-D they become `[262144, 256]`, which is too large for SVD under this memory budget
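The numbers in the last bullet can be reproduced with a few lines of arithmetic. This is a small illustrative sketch; the helper names are hypothetical, and the 50,000,000-element threshold is the default `max_elements` from the filter function above.

```python
from math import prod

def flattened_2d_shape(shape):
    """Collapse the leading dims, e.g. [E, d, k] -> [E * d, k]."""
    return [prod(shape[:-1]), shape[-1]]

def fp32_megabytes(shape):
    """Size of a tensor of this shape once upcast to FP32 (4 bytes per element)."""
    return prod(shape) * 4 / 1e6

expert_shape = [128, 2048, 256]
n_elements = prod(expert_shape)

print(n_elements)                               # 67108864
print(flattened_2d_shape(expert_shape))         # [262144, 256]
print(n_elements > 50_000_000)                  # True, so the size filter rejects it
print(f"{fp32_megabytes(expert_shape):.1f} MB")
```

The FP32 copy is only the input to the SVD; the decomposition itself allocates additional workspace on top of that.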

---

## Step 5: Parse Binary Headers Manually

This is the key technique when memory-mapped loading (`safe_open` or `safetensors.torch.load_file`) fails on a very large shard (47 GB here):

```python
import json
import struct
import numpy as np
import torch

DTYPE_MAP = {
    "F32": (4, np.float32),
    "F16": (2, np.float16),
    "BF16": (2, np.int16),   # NumPy has no bfloat16: read raw 16-bit values, reinterpret later
    "I64": (8, np.int64),
    "I32": (4, np.int32),
    "I8":  (1, np.int8),
    "BOOL": (1, np.bool_),
}

def manual_safetensors_reader(shard_path):
    """
    Read a safetensors file manually, one tensor at a time.
    Uses seek() + read(), so the file is never mmapped as a whole.
    
    Yields:
        (tensor_name, torch.Tensor)
    """
    with open(shard_path, "rb") as f:
        # Step 1: read the header size (8 bytes, little-endian uint64)
        header_size_raw = f.read(8)
        if len(header_size_raw) < 8:
            raise ValueError(f"File too small: {shard_path}")
        header_size = struct.unpack("<Q", header_size_raw)[0]
        
        # Step 2: read the JSON metadata
        header_json = f.read(header_size)
        metadata = json.loads(header_json.decode("utf-8"))
        
        # The data section starts at offset 8 + header_size
        data_base = 8 + header_size
        
        # Step 3: build the list of tensors, sorted by offset
        tensor_list = []
        for name, info in metadata.items():
            if name == "__metadata__":
                continue
            tensor_list.append({
                "name": name,
                "dtype": info["dtype"],
                "shape": info["shape"],
                "offset": data_base + info["data_offsets"][0],
                "size": info["data_offsets"][1] - info["data_offsets"][0],
            })
        
        # Sort by offset so that seek() moves forward through the file
        tensor_list.sort(key=lambda x: x["offset"])
        
        # Step 4: read one tensor at a time
        for tinfo in tensor_list:
            f.seek(tinfo["offset"])
            raw_data = f.read(tinfo["size"])
            
            elem_size, np_dtype = DTYPE_MAP[tinfo["dtype"]]
            arr = np.frombuffer(raw_data, dtype=np_dtype)
            tensor = torch.from_numpy(arr.copy()).reshape(tinfo["shape"])
            
            # Reinterpret the raw 16-bit values as real bfloat16
            if tinfo["dtype"] == "BF16":
                tensor = tensor.view(torch.bfloat16)
            
            yield tinfo["name"], tensor
            
            # Free memory right away
            del tensor, arr, raw_data

# Example usage
for name, tensor in manual_safetensors_reader("models/base/model-00007-of-00008.safetensors"):
    print(f"  Read: {name} shape={tensor.shape}")
    break  # try only the first tensor
```

**Why this approach is needed:**
- The 47 GB shard could not be mmapped on a machine with 23 GB of RAM
- Loading a whole file through `torch.load()` or `safe_open()` needed RAM >= the file size in this setup
- Manual seek + read uses only as much RAM as the largest single tensor (~1.2 GB)
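The BF16 handling in the reader works because a BF16 value is simply the upper 16 bits of an IEEE-754 float32. The sketch below illustrates that bit-level relationship with the standard library only; the function names are hypothetical and this pure-Python decoder is meant for understanding, not for processing real shards.

```python
import struct

def bf16_bits_to_float(bits):
    """Decode one BF16 value stored as a 16-bit integer.

    Shifting the pattern left by 16 bits yields the float32 with the
    same sign, exponent, and top 7 mantissa bits, so the result is exact.
    """
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def decode_bf16_buffer(raw):
    """Decode little-endian BF16 bytes into a list of Python floats."""
    count = len(raw) // 2
    return [bf16_bits_to_float(b) for b in struct.unpack(f"<{count}H", raw)]

raw = struct.pack("<3H", 0x3F80, 0xC000, 0x0000)
decoded = decode_bf16_buffer(raw)
print(decoded)  # [1.0, -2.0, 0.0]
```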

---

## Step 6: Compute Delta + SVD

```python
import gc
import torch

def compute_delta_and_svd(base_tensor, target_tensor, rank=16):
    """
    Compute ΔW = W_target - W_base, then factor it with a truncated SVD.
    
    Returns:
        B: torch.Tensor of shape (d, rank)
        A: torch.Tensor of shape (rank, k)
        (None, None) if the tensor is not 2-D or the SVD fails
    """
    # Step 1: delta, computed in BF16 to save RAM
    base_bf16 = base_tensor.to(torch.bfloat16)
    target_bf16 = target_tensor.to(torch.bfloat16)
    delta = target_bf16 - base_bf16
    
    # The slicing below assumes a single matrix; skip 1-D and 3-D tensors
    if delta.dim() != 2:
        print(f"  ⚠️  Skipping non-2D tensor with shape {tuple(delta.shape)}")
        return None, None
    
    # Step 2: SVD needs FP32
    delta_fp32 = delta.to(torch.float32)
    del base_bf16, target_bf16, delta
    gc.collect()
    
    # Step 3: SVD (truncated in Step 4)
    try:
        U, S, Vh = torch.linalg.svd(delta_fp32, full_matrices=False)
    except RuntimeError as e:
        print(f"  ❌ SVD failed: {e}")
        del delta_fp32
        gc.collect()
        return None, None
    
    # Step 4: truncate to rank r
    U_r = U[:, :rank]          # (d, r)
    S_r = S[:rank]             # (r,)
    Vh_r = Vh[:rank, :]        # (r, k)
    
    # Step 5: split the singular values as sqrt on each side
    S_sqrt = torch.sqrt(S_r)   # (r,)
    B = U_r * S_sqrt.unsqueeze(0)          # (d, r)
    A = S_sqrt.unsqueeze(1) * Vh_r         # (r, k)
    
    # Cleanup
    del U, S, Vh, delta_fp32, U_r, S_r, Vh_r, S_sqrt
    gc.collect()
    
    return B.to(torch.bfloat16), A.to(torch.bfloat16)


def find_tensor(shard_path, tensor_name):
    """Scan a shard and return the named tensor, or None if it is absent."""
    for name, tensor in manual_safetensors_reader(shard_path):
        if name == tensor_name:
            return tensor  # stop early instead of reading the rest of the shard
    return None


def extract_layer(base_shard, target_shard, tensor_name, rank=16):
    """
    Read one tensor from both shards and compute delta + SVD.
    """
    base_tensor = find_tensor(base_shard, tensor_name)
    target_tensor = find_tensor(target_shard, tensor_name)
    
    if base_tensor is None or target_tensor is None:
        print(f"  ⚠️  Tensor '{tensor_name}' not found in both models")
        return None, None
    
    B, A = compute_delta_and_svd(base_tensor, target_tensor, rank=rank)
    
    # Check that the SVD succeeded
    if B is None:
        print(f"  ❌ SVD failed for {tensor_name}")
        return None, None
    
    return B, A

# Example usage
B, A = extract_layer(
    "models/base/model-00001-of-00008.safetensors",
    "models/target/model-00001-of-00008.safetensors",
    "model.layers.0.self_attn.q_proj.weight",
    rank=16
)
if B is not None:
    print(f"✅ q_proj: B.shape={B.shape}, A.shape={A.shape}")
```

**Choosing the rank:**
- `rank=8`: very small adapter (~44 MB), may lose fidelity
- `rank=16`: balance between size and quality (~88 MB) ✅ (used in this project)
- `rank=32`: higher quality, larger size (~176 MB)
- `rank=64`: closest to the full delta (~352 MB)
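The sizes above double with each doubling of the rank because a rank-r pair for a `(d, k)` weight stores `r * (d + k)` parameters. A minimal sketch of that estimate follows; the layer shapes are hypothetical placeholders rather than the real Qwen3.6 shapes, and 2 bytes per parameter assumes the factors are saved as BF16.

```python
def lora_param_count(shapes, rank):
    """Parameters in B (d x r) plus A (r x k), summed over all 2-D weights."""
    return sum(rank * (d + k) for d, k in shapes)

def adapter_megabytes(shapes, rank, bytes_per_param=2):
    """Approximate adapter size in MB when factors are stored as BF16."""
    return lora_param_count(shapes, rank) * bytes_per_param / 1e6

# Hypothetical layer shapes, for illustration only.
shapes = [(2048, 2048), (2048, 512), (512, 2048)]

for r in (8, 16, 32, 64):
    print(r, lora_param_count(shapes, r), f"{adapter_megabytes(shapes, r):.2f} MB")
```

Passing the real list of extracted weight shapes instead of the placeholder list gives an estimate of the adapter size before running the full extraction.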

---

## Step 7: Save PEFT Adapter

```python
import json
import os
from safetensors.torch import save_file

def save_peft_adapter(lora_weights, output_dir, base_model_name, rank=16):
    """
    Save LoRA weights in HuggingFace PEFT format.
    
    Args:
        lora_weights: dict {tensor_name: (B, A)}
        output_dir: directory to write to
        base_model_name: base model name (e.g. "Qwen/Qwen3.6-35B-A3B")
        rank: LoRA rank
    """
    os.makedirs(output_dir, exist_ok=True)
    
    # Step 1: convert tensor names to the PEFT naming convention
    safetensors_weights = {}
    for tensor_name, (B, A) in lora_weights.items():
        # PEFT keys look like: base_model.model.layers.X.self_attn.q_proj.lora_A.weight
        # so drop the base tensor's trailing ".weight" before adding the LoRA suffix
        module_name = tensor_name.removesuffix(".weight")
        peft_name_B = f"base_model.model.{module_name}.lora_B.weight"
        peft_name_A = f"base_model.model.{module_name}.lora_A.weight"
        safetensors_weights[peft_name_B] = B
        safetensors_weights[peft_name_A] = A
    
    # Step 2: save the safetensors file
    output_path = os.path.join(output_dir, "adapter_model.safetensors")
    save_file(safetensors_weights, output_path)
    print(f"✅ Saved adapter weights: {output_path}")
    
    # Step 3: build adapter_config.json
    target_modules = set()
    for name in lora_weights.keys():
        # Pull out the module name (e.g. "q_proj", "v_proj")
        parts = name.split(".")
        for part in parts:
            if part.endswith("_proj") or "layernorm" in part or "norm" in part:
                target_modules.add(part)
    
    config = {
        "base_model_name_or_path": base_model_name,
        "peft_type": "LORA",
        "task_type": "CAUSAL_LM",
        "r": rank,
        # PEFT scales B @ A by lora_alpha / r; alpha = rank keeps that
        # factor at 1 so the adapter applies the extracted ΔW unscaled
        "lora_alpha": rank,
        "lora_dropout": 0.0,
        "target_modules": sorted(list(target_modules)),
        "bias": "none",
        "fan_in_fan_out": False,
    }
    
    config_path = os.path.join(output_dir, "adapter_config.json")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    print(f"✅ Saved adapter config: {config_path}")
    
    # Step 4: save extraction stats
    stats = {
        "method": "weight-diff-svd",
        "base": base_model_name,
        "rank": rank,
        "tensors_extracted": len(lora_weights),
        "total_params": sum(
            B.numel() + A.numel() for B, A in lora_weights.values()
        ),
        "credit": "UKA (Hermes Agent, Nous Research)"
    }
    
    stats_path = os.path.join(output_dir, "extraction_stats.json")
    with open(stats_path, "w") as f:
        json.dump(stats, f, indent=2, ensure_ascii=False)
    print(f"✅ Saved extraction stats: {stats_path}")
    
    # Print a summary
    size_mb = os.path.getsize(output_path) / (1024 * 1024)
    print(f"\n📊 Extraction Summary:")
    print(f"   Method:      Weight-Diff SVD")
    print(f"   Rank:        {rank}")
    print(f"   Tensors:     {len(lora_weights)}")
    print(f"   Parameters:  {stats['total_params']:,}")
    print(f"   Size:        {size_mb:.1f} MB")

# Example usage (extracted_weights is the {tensor_name: (B, A)} dict collected in Step 6)
save_peft_adapter(
    extracted_weights,
    output_dir="./qwen-heretic-uncensored-lora",
    base_model_name="Qwen/Qwen3.6-35B-A3B",
    rank=16
)
```
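Two details of the saved adapter are easy to get wrong: the key names and the scaling. The sketch below is illustrative only, with hypothetical helper names. It assumes the PEFT key pattern shown in the code comment above, where the LoRA suffix attaches to the module path rather than to the base tensor's `.weight` name, and it assumes standard LoRA scaling, where the loaded adapter multiplies `B @ A` by `lora_alpha / r`.

```python
def peft_lora_keys(tensor_name):
    """Map a base-model weight name to its lora_A / lora_B key names."""
    module = tensor_name.removesuffix(".weight")  # module path, not tensor path
    prefix = f"base_model.model.{module}"
    return f"{prefix}.lora_A.weight", f"{prefix}.lora_B.weight"

def effective_scale(lora_alpha, rank):
    """Factor applied to B @ A at load time under standard LoRA scaling."""
    return lora_alpha / rank

key_a, key_b = peft_lora_keys("model.layers.0.self_attn.q_proj.weight")
print(key_a)
print(key_b)
print(effective_scale(16, 16))  # 1.0: B @ A is applied as extracted
print(effective_scale(32, 16))  # 2.0: the applied update is twice B @ A
```

Since `B @ A` already approximates ΔW, any `lora_alpha` different from `r` makes the loaded adapter apply a scaled copy of the extracted delta.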

---

# ⚠️ Common Pitfalls and Fixes

## Pitfall 1: mmap OOM on Large Shards

**Symptoms:**
```
OSError: [Errno 12] Cannot allocate memory
# or
RuntimeError: mmap failed: Cannot allocate memory
```

This happens when opening a 47 GB shard with `safetensors.safe_open(path)` or `torch.load(path, mmap=True)` on a machine with only 23 GB of RAM.

**Cause:**
- mmap needs virtual address space >= the file size (47 GB)
- The Docker container has a memory cgroup limit of 23 GB, and mmap needs physical RAM to service page faults
- Docker's overlayfs has its own limitations, which stack on top of the cgroup limit

**Fix (used in this project):**
Use **manual binary parsing** (Step 5): read the safetensors file one tensor at a time with `seek()` + `read()` instead of mmap.

```python
# ❌ Does not work here:
from safetensors import safe_open
with safe_open("model-00007-of-00008.safetensors", framework="pt") as f:
    tensor = f.get_tensor("model.layers.0.self_attn.q_proj.weight")
# -> mmap fails because file is 47 GB

# ✅ Works:
for name, tensor in manual_safetensors_reader("model-00007-of-00008.safetensors"):
    # process tensor here
    pass
# -> peak RAM = size of the largest tensor (~1.2 GB)
```

**Alternatives (if enough RAM is available):**
- Raise the Docker memory limit: `docker run --memory=64g ...`
- Use `--shm-size=8g` to enlarge shared memory (this sizes `/dev/shm`, so it only helps mappings that live there)
- Use a machine with >= 64 GB of RAM
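Before choosing between mmap and manual parsing, it can help to read the container's memory limit and compare it with the shard size. This is a rough pre-flight sketch, not part of the original pipeline: the function names and the 0.8 headroom factor are assumptions, and the two paths are the standard cgroup v2 and v1 locations, which may be absent on some systems.

```python
from pathlib import Path

CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",                    # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
)

def read_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the memory limit in bytes, or None if unlimited or unknown."""
    for p in paths:
        try:
            text = Path(p).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next location
        # cgroup v2 writes the literal "max" when no limit is set
        return int(text) if text.isdigit() else None
    return None

def shard_exceeds_limit(file_size_bytes, limit_bytes, headroom=0.8):
    """True if the shard is larger than the usable share of the limit."""
    if limit_bytes is None:
        return False  # no known limit, nothing to compare against
    return file_size_bytes > limit_bytes * headroom

limit = read_memory_limit()
print("memory limit:", limit)
print(shard_exceeds_limit(47 * 10**9, 23 * 10**9))  # True for the case above
```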

---

## Pitfall 2: Docker Swap (OverlayFS Limitation)

**Symptoms:**
```
swapon: /swapfile: swapon failed: Invalid argument
# or
fallocate: /swapfile: fallocate failed: Operation not supported
```

**Cause:**
Docker uses overlayfs as its default storage driver. overlayfs **does not support** the `bmap` operation, which the Linux kernel needs in order to map swap pages to disk blocks, so a swap file cannot be created inside the container.

**Workaround:**
Swap cannot be added from inside the container, so memory has to be managed carefully instead:

```python
# Memory management strategies
import gc
import torch

def memory_efficient_pipeline():
    strategies = [
        "1. Deallocate each tensor immediately after use",
        "2. Use FP32 only during the SVD computation",
        "3. Process tensors from smallest to largest",
        "4. Call gc.collect() periodically",
        "5. Use torch.bfloat16 for every step except SVD",
    ]
    
    for s in strategies:
        print(f"  📌 {s}")

# Pattern for memory-efficient processing
def process_tensor_safely(fn, *args):
    """Wrapper that runs gc.collect() before and after the call."""
    gc.collect()
    result = fn(*args)
    gc.collect()
    return result
```
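Strategy 3 (smallest to largest) can be driven directly from the shard index, since each entry already records its byte size. The following is a minimal sketch; the helper name `order_by_size` is hypothetical and the three index entries are illustrative values, not real measurements.

```python
def order_by_size(index):
    """Return tensor names sorted by byte size, smallest first."""
    return sorted(index, key=lambda name: index[name]["size_bytes"])

# Hypothetical index entries in the format built during Step 3.
index = {
    "model.embed_tokens.weight": {"size_bytes": 1_000_000_000},
    "model.layers.0.input_layernorm.weight": {"size_bytes": 4_096},
    "model.layers.0.self_attn.q_proj.weight": {"size_bytes": 8_388_608},
}

order = order_by_size(index)
for name in order:
    print(f"{index[name]['size_bytes']:>13,}  {name}")
```

Ordering by size means a failure on an oversized tensor happens last, after the cheap tensors are already done; the trade-off is that reads are no longer in file-offset order.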

**If swap is really required (needs access to the Docker host):**
```bash
# On the Docker HOST (not inside the container):
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Alternative with dd, if using ext4 on the host:
sudo dd if=/dev/zero of=/swapfile bs=1G count=16
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```

---

## Pitfall 3: MoE Expert Tensors

**Symptoms:**
```
RuntimeError: [enforce fail at ...] Cannot allocate memory
# while running SVD on a tensor with a shape such as [248320, 2048]
```

**Cause:**
- Qwen3.6-35B-A3B is a **Mixture-of-Experts (MoE)** architecture with 128 experts
- Expert tensors have shape `[n_experts, d_hidden, d_intermediate]`, e.g. `[128, 2048, 256]`
- After reshaping/stacking for SVD, a matrix can reach `[248320, 2048]`, which takes ~2 GB as an FP32 matrix plus ~6-8 GB of SVD workspace
- 128 experts × 28 layers = 3,584 expert matrices, which is too many to process

**Fix:**
**Skip all MoE expert tensors**, as was done in this project.

```python
EXPERT_PATTERNS = [
    "mlp.experts.",
    ".experts.",
]

def is_expert_tensor(tensor_name):
    """Check whether this tensor is an MoE expert tensor."""
    for pattern in EXPERT_PATTERNS:
        if pattern in tensor_name:
            return True
    return False

# Filtering
for tensor_name in all_tensor_names:
    if is_expert_tensor(tensor_name):
        continue  # skip MoE expert
    # process tensor...
```

**Why this works:**
- Research indicates that behavioral modifications (e.g. uncensoring, refusal removal) are encoded mainly in the attention mechanisms and layer norms, not in the expert FFN layers
- Qualitative testing confirmed that an adapter extracted from attention + norms only (581/611 tensors, 95.1%) still fully preserves the uncensored behavior

**Alternative (if a GPU/TPU is available):**
```python
# On a GPU with enough VRAM
device = torch.device("cuda")
delta_gpu = delta.to(device)
U, S, Vh = torch.linalg.svd(delta_gpu, full_matrices=False)
# GPU SVD is faster and uses VRAM instead of RAM
```

---

# 📂 Reference Files in the Repository

| File | Description |
|------|----------|
| `extraction_stats.json` | Extraction statistics: number of tensors, parameters, rank, method |
| `AGENT_GUIDE.md` | Guide to using Hermes Agent for extraction |
| `paper.pdf` | Full academic paper (IEEE format): Weight-Diff SVD Extraction |
| `adapter_model.safetensors` | Extracted LoRA weights (88.2 MB) |
| `adapter_config.json` | PEFT configuration for the adapter |
| `figures/fig1_delta_per_layer.png` | Plot of delta magnitude per layer |
| `figures/fig2_rank_vs_error.png` | Plot of reconstruction error vs rank |
| `figures/fig3_pipeline.png` | Diagram of the full pipeline |
| `figures/fig4_layer_heatmap.png` | Heatmap of the delta distribution |
| `figures/layer_stats.json` | Per-layer SVD norm statistics (used to build the figures) |

---

# 🧪 End-to-End Run Example

```bash
#!/bin/bash
# run_extraction.sh: runs the whole extraction

set -e

BASE_MODEL="Qwen/Qwen3.6-35B-A3B"
TARGET_MODEL="llmfan46/Qwen3.6-35B-A3B-uncensored-heretic"
RANK=16
OUTPUT_DIR="./output-lora"

echo "========================================="
echo " Weight-Diff SVD LoRA Extraction"
echo " Base:    $BASE_MODEL"
echo " Target:  $TARGET_MODEL"
echo " Rank:    $RANK"
echo " Output:  $OUTPUT_DIR"
echo "========================================="

# Step 1: Download
echo "[1/7] Downloading models..."
huggingface-cli download $BASE_MODEL --local-dir ./models/base --include "*.safetensors"
huggingface-cli download $TARGET_MODEL --local-dir ./models/target --include "*.safetensors"

# Step 2: Verify
echo "[2/7] Verifying architecture..."
python -c "
from extraction import verify_architecture
verify_architecture('models/base/model-00001-of-00008.safetensors',
                    'models/target/model-00001-of-00008.safetensors')
"

# Steps 3-6: Full extraction
echo "[3-6/7] Running extraction pipeline..."
python extract.py \
    --base ./models/base \
    --target ./models/target \
    --rank $RANK \
    --output $OUTPUT_DIR \
    --filter-experts \
    --manual-parse

# Step 7: Validation
echo "[7/7] Validating adapter..."
python -c "
from safetensors import safe_open
import json

# Check adapter exists
with open('$OUTPUT_DIR/adapter_config.json') as f:
    config = json.load(f)
print(f'Rank: {config[\"r\"]}')
print(f'Target modules: {config[\"target_modules\"]}')

# Check weights
with safe_open('$OUTPUT_DIR/adapter_model.safetensors', framework='pt') as f:
    keys = list(f.keys())
print(f'LoRA layers: {len(keys)}')
print(f'File size: ', end='')
import os
size = os.path.getsize('$OUTPUT_DIR/adapter_model.safetensors')
print(f'{size / 1024 / 1024:.1f} MB')
"

echo "✅ Extraction complete!"
echo "   Adapter: $OUTPUT_DIR/adapter_model.safetensors"
echo "   Config:  $OUTPUT_DIR/adapter_config.json"
```

---

# 🧠 Credit

| Person/Organization | Role |
|--------------|-------|
| **UKA** | Creator of this adapter and this METHOD document, via Hermes Agent |
| **Hermes Agent** | AI agent from Nous Research that ran the extraction automatically |
| **Nous Research** | Developer of Hermes Agent and the infrastructure |
| **Qwen Team** | Creator of the Qwen3.6-35B-A3B model (base model) |
| **llmfan46** | Creator of the source uncensored model |

---

> This document is part of the **heretic-uncensored-lora** project
> Created by UKA via Hermes Agent, Nous Research
> May 2026
>
> For more academic detail, see `paper.pdf` in this repository