# v30b delta-distilled single-adapter results

v30b is the first post-v9 **single standard PEFT LoRA adapter** in this experiment series to improve the direct VerilogEval pass count while preserving performance on the small external benchmarks.

```text
adapter_v30b_delta_distilled_from_v9
```

## Motivation

The v29 runtime selector reached 84/156 on VerilogEval by generating candidates from v9 plus multiple specialists and selecting with compile+simulation. That is a strong practical pipeline, but it is not a single deployable adapter.

v30 attempted broad distillation of v29 into one adapter but only matched v9 at 67/156. v30b instead focused training pressure on the delta set:

```text
v9 fails + v29 selector passes
```
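The delta is a plain set difference between two pass lists. The sketch below illustrates it with placeholder problem IDs; the function name and the toy sets are assumptions for illustration, not the contents of the builder script.

```python
def delta_wins(v9_pass: set[str], selector_pass: set[str]) -> list[str]:
    """Problems the selector solves that v9 alone does not."""
    return sorted(selector_pass - v9_pass)


# Placeholder IDs, not real results.
v9_pass = {"prob_a", "prob_b"}
selector_pass = {"prob_a", "prob_b", "prob_c"}

print(delta_wins(v9_pass, selector_pass))
```

The reported counts (84 selector passes, 67 v9 passes, 17 delta wins) are consistent with every v9 pass also being a selector pass, since 84 - 67 = 17.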

## Dataset

Builder:

```text
scripts/build_v30b_delta_distill_dataset.py
```

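Row counts (the total is after applying the repeat weights; the remaining lines are unique rows per bucket):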

```text
total rows: 3358
unique delta wins: 17
unique selector pass rows: 84
unique v9 pass retention rows: 67
unique clean/manual verified rows: 382
unique external functional rows: 18
unique synthetic verified rows: 316
```

Repeat weights:

```text
delta wins:          80x
all selector passes:  4x
v9 pass retention:    6x
clean verified:       2x
external functional: 10x
synthetic:            1x
```
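As a consistency check, multiplying each bucket's unique row count by its repeat weight reproduces the reported total. The snippet below is only arithmetic on the numbers in the two listings above; the dictionary layout is an illustrative choice.

```python
# (unique rows, repeat weight) per bucket, copied from the listings above.
buckets = {
    "delta wins": (17, 80),
    "all selector passes": (84, 4),
    "v9 pass retention": (67, 6),
    "clean verified": (382, 2),
    "external functional": (18, 10),
    "synthetic": (316, 1),
}

rows = {name: unique * weight for name, (unique, weight) in buckets.items()}
total = sum(rows.values())

for name, count in rows.items():
    print(f"{name:<20} {count:>5} rows  {count / total:6.1%}")
print(f"{'total':<20} {total:>5} rows")
```

The total comes to 3358, and the 17 delta-win problems alone account for 1360 of those rows, roughly 40% of the training set.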

The 17 delta-win problems were:

```text
Prob018_mux256to1
Prob028_m2014_q4a
Prob030_popcount255
Prob042_vector4
Prob045_edgedetect2
Prob050_kmap1
Prob054_edgedetect
Prob056_ece241_2013_q7
Prob060_m2014_q4k
Prob063_review2015_shiftcount
Prob075_counter_2bc
Prob088_ece241_2014_q5b
Prob097_mux9to1v
Prob098_circuit7
Prob121_2014_q3bfsm
Prob130_circuit5
Prob138_2012_q2fsm
```

Training used `--drop-overlength`, so any row exceeding the maximum length is dropped whole rather than truncated; no rows were truncated.
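A drop-instead-of-truncate filter can be sketched as follows. This is an assumption about what such a flag typically does, not the training script's implementation; `drop_overlength` and `whitespace_tokens` are hypothetical helpers, and a real run would count tokens with the model tokenizer.

```python
from typing import Callable, Sequence


def drop_overlength(
    rows: Sequence[str],
    count_tokens: Callable[[str], int],
    max_length: int = 2048,
) -> tuple[list[str], int]:
    """Keep rows that fit within max_length tokens; drop the rest whole."""
    kept = [row for row in rows if count_tokens(row) <= max_length]
    return kept, len(rows) - len(kept)


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter (whitespace split), for illustration only."""
    return len(text.split())


rows = ["module a; endmodule", "x " * 3000]
kept, dropped = drop_overlength(rows, whitespace_tokens)
print(len(kept), dropped)
```

Dropping avoids training on Verilog modules that are cut off mid-body, at the cost of losing those examples entirely.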

## Training

Launcher:

```text
scripts/run_v30b_delta_distill.sh
```

Settings:

```text
base model: Qwen/Qwen2.5-Coder-7B-Instruct
base adapter: adapter_v9_auto_distilled_direct
output: adapter_v30b_delta_distilled_from_v9
epochs: 0.75
learning rate: 7e-7
max length: 2048
LoRA r: 16
LoRA alpha: 32
batch size: 1
grad accum: 4
warmup steps: 40
```
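These settings imply a short, low-learning-rate run. The estimate below is a rough sketch that assumes a single device, that no rows were removed by `--drop-overlength`, and that the trainer rounds the step count down; none of these are confirmed by the launcher.

```python
import math

rows = 3358        # total rows from the Dataset section
epochs = 0.75
batch_size = 1
grad_accum = 4
warmup_steps = 40
lora_r = 16
lora_alpha = 32

# Sequences consumed per optimizer step.
effective_batch = batch_size * grad_accum
# Approximate number of optimizer steps in the run.
optimizer_steps = math.floor(rows * epochs / effective_batch)
# Standard LoRA scaling factor applied to the low-rank update.
lora_scale = lora_alpha / lora_r

print(f"effective batch: {effective_batch}")
print(f"optimizer steps: ~{optimizer_steps}")
print(f"warmup share:    {warmup_steps / optimizer_steps:.1%}")
print(f"LoRA scale:      {lora_scale}")
```

Under these assumptions the run is on the order of 630 optimizer steps, with warmup covering a small fraction of them.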

## Results

### VerilogEval v2 direct

| Adapter/system | Compile | Pass |
|---|---:|---:|
| v9 single adapter | — | 67/156 |
| v29 multi-adapter selector | 150/156 | 84/156 |
| v30 unified single adapter | 134/156 | 67/156 |
| **v30b delta-distilled single adapter** | **141/156** | **71/156** |

v30b improves the single-adapter pass count by 4 problems over v9 and v30 (71 vs. 67 of 156) and compiles 7 more problems than v30 (141 vs. 134).
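Expressed as pass rates, and as the share of the v9-to-selector gap recovered, the table works out as below. This is only arithmetic on the reported counts; the gap figure is a net count, since the table does not say which individual problems changed status.

```python
TOTAL = 156
passes = {"v9": 67, "v29 selector": 84, "v30": 67, "v30b": 71}

for name, count in passes.items():
    print(f"{name:<13} {count}/{TOTAL} = {count / TOTAL:.1%}")

gap = passes["v29 selector"] - passes["v9"]   # problems only the selector solves
gain = passes["v30b"] - passes["v9"]          # net improvement of v30b over v9
print(f"net gap recovered: {gain}/{gap} = {gain / gap:.1%}")
```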

### External checks

| Benchmark | Compile | Functional/task pass |
|---|---:|---:|
| Paper-style full | 30/30 | 26/30 task pass; 18/22 functional |
| Robust | 14/15 | 6/10 functional |
| Alt | 7/8 | 3/5 functional |

These match the v9 baseline on the same external checks, so v30b did not show a regression on these small non-VerilogEval suites.

## Interpretation

v30b validates the paper-inspired lesson: simple broad SFT/distillation did not beat v9, but **verification-guided delta distillation** can transfer some of the v29 selector's gains into one standard adapter.

Still, v30b remains well below the v29 verifier-selector pipeline (71 vs. 84 of 156), because a single n=1 generation cannot reproduce multi-candidate search and runtime verification.
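The structural difference can be sketched with a toy selector. The code below is a minimal illustration, assuming a hypothetical `verify` callable that stands in for compile and simulation; the actual v29 selector's candidate ordering and tie-breaking are not described here.

```python
from typing import Callable, Iterable, Optional


def select_first_verified(
    candidates: Iterable[str],
    verify: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate that passes verification, else None."""
    for code in candidates:
        if verify(code):
            return code
    return None


def toy_verify(code: str) -> bool:
    """Toy verifier: accepts only the string 'good'."""
    return code == "good"


# A pool of candidates can succeed where the first sample alone fails.
pool = ["bad", "good"]
print(select_first_verified(pool[:1], toy_verify))  # single n=1 generation
print(select_first_verified(pool, toy_verify))      # multi-candidate selection
```

A single adapter sampled once corresponds to the first call: if that one generation fails, there is no second candidate and no verifier to fall back on.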

## Caveat

v30b is a single PEFT LoRA adapter, but it was trained on benchmark-targeted v29 selector outputs for VerilogEval problems. Its VerilogEval result should be described as a targeted/distilled experiment, not as a clean zero-shot leaderboard claim.

## Published artifacts

- Hugging Face adapter: `Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v30b-delta-distilled-lora`
- Main scripts:
  - `scripts/build_v30b_delta_distill_dataset.py`
  - `scripts/run_v30b_delta_distill.sh`
- Result files:
  - `results/v30b_delta_distill/v30b_summary.json`
  - `results/v30b_delta_distill/verilogeval_direct_summary.json`
  - `results/v30b_delta_distill/verilogeval_direct_summary_results.jsonl`