v30b delta-distilled single-adapter results

v30b is the first single standard PEFT LoRA adapter after v9 in this experiment series to improve the direct VerilogEval pass count while preserving performance on the small external benchmarks.

adapter_v30b_delta_distilled_from_v9

Motivation

The v29 runtime selector reached 84/156 on VerilogEval by generating candidates from v9 plus multiple specialists and selecting among them with compile and simulation checks. That is a strong practical pipeline, but it is not a single deployable adapter.

v30 attempted broad distillation of v29 into one adapter and only matched v9 at 67/156. v30b instead focused training pressure on the delta, meaning the problems where:

v9 fails + v29 selector passes
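
The delta can be read as a set difference over per-problem pass lists. The following is a minimal sketch of that idea, not the actual builder logic; the function name `delta_wins` and the placeholder problem IDs `example_a` and `example_b` are hypothetical.

```python
def delta_wins(v9_passes: set[str], selector_passes: set[str]) -> list[str]:
    """Return problems that the selector passes but v9 does not."""
    return sorted(selector_passes - v9_passes)


# Placeholder pass sets for illustration only (not real results).
v9_passes = {"example_a", "example_b"}
selector_passes = {"example_a", "example_b", "Prob018_mux256to1"}

print(delta_wins(v9_passes, selector_passes))  # ['Prob018_mux256to1']
```

If every v9 pass is also a selector pass, the size of the delta is simply 84 - 67 = 17, which agrees with the dataset counts reported in this document.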

Dataset

Builder:

scripts/build_v30b_delta_distill_dataset.py

Rows:

total rows: 3358
unique delta wins: 17
unique selector pass rows: 84
unique v9 pass retention rows: 67
unique clean/manual verified rows: 382
unique external functional rows: 18
unique synthetic verified rows: 316

Repeat weights:

delta wins:          80x
all selector passes:  4x
v9 pass retention:    6x
clean verified:       2x
external functional: 10x
synthetic:            1x
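
As a consistency check, the total row count can be reproduced from the unique counts and repeat weights, assuming each category contributes (unique rows x repeat weight) independently. The dictionary keys below are hypothetical labels chosen for this sketch.

```python
# Unique row counts and repeat weights as reported above.
unique_rows = {
    "delta_wins": 17,
    "selector_passes": 84,
    "v9_pass_retention": 67,
    "clean_verified": 382,
    "external_functional": 18,
    "synthetic": 316,
}
repeat_weights = {
    "delta_wins": 80,
    "selector_passes": 4,
    "v9_pass_retention": 6,
    "clean_verified": 2,
    "external_functional": 10,
    "synthetic": 1,
}

# Rows contributed by each category after repetition.
weighted = {name: unique_rows[name] * repeat_weights[name] for name in unique_rows}
total_rows = sum(weighted.values())

print(total_rows)  # 3358
print(f"{weighted['delta_wins'] / total_rows:.1%}")  # 40.5%
```

Under this reading, the 17 delta-win problems account for 1360 of the 3358 rows, roughly 40% of the training mix.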

The 17 delta-win problems were:

Prob018_mux256to1
Prob028_m2014_q4a
Prob030_popcount255
Prob042_vector4
Prob045_edgedetect2
Prob050_kmap1
Prob054_edgedetect
Prob056_ece241_2013_q7
Prob060_m2014_q4k
Prob063_review2015_shiftcount
Prob075_counter_2bc
Prob088_ece241_2014_q5b
Prob097_mux9to1v
Prob098_circuit7
Prob121_2014_q3bfsm
Prob130_circuit5
Prob138_2012_q2fsm

Training used --drop-overlength, so any row longer than the max length would be dropped whole; no rows were truncated.
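
One plausible reading of --drop-overlength is a filter that discards rows exceeding the max length instead of cutting them. The sketch below illustrates that behavior only; the function `drop_overlength` is hypothetical, and whitespace splitting stands in for the real tokenizer.

```python
from typing import Callable, Iterable


def drop_overlength(
    rows: Iterable[str],
    max_length: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Keep rows that fit within max_length; never truncate a row."""
    return [row for row in rows if count_tokens(row) <= max_length]


# Whitespace splitting is an assumption standing in for the model tokenizer.
rows = ["module a; endmodule", "word " * 3000]
kept = drop_overlength(rows, max_length=2048, count_tokens=lambda text: len(text.split()))

print(len(kept))  # 1
```

Dropping rather than truncating avoids training on Verilog modules that end mid-body, at the cost of losing the longest examples.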

Training

Launcher:

scripts/run_v30b_delta_distill.sh

Settings:

base model: Qwen/Qwen2.5-Coder-7B-Instruct
base adapter: adapter_v9_auto_distilled_direct
output: adapter_v30b_delta_distilled_from_v9
epochs: 0.75
learning rate: 7e-7
max length: 2048
LoRA r: 16
LoRA alpha: 32
batch size: 1
grad accum: 4
warmup steps: 40
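
These settings imply a rough training budget, which the sketch below estimates. It assumes a single device and that all 3358 rows survived the overlength filter; neither is stated in the settings, so treat the step count as approximate.

```python
import math

total_rows = 3358        # from the Dataset section
epochs = 0.75
per_device_batch = 1
grad_accum = 4
num_devices = 1          # assumption: single-GPU training
warmup_steps = 40
lora_r = 16
lora_alpha = 32

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices
# Approximate number of optimizer steps for a partial epoch.
approx_steps = math.ceil(total_rows * epochs / effective_batch)
# Standard LoRA scaling factor applied to the low-rank update (alpha / r).
lora_scale = lora_alpha / lora_r

print(effective_batch)  # 4
print(approx_steps)     # 630
print(lora_scale)       # 2.0
print(f"{warmup_steps / approx_steps:.1%}")
```

With about 630 optimizer steps, the 40 warmup steps cover a small share of the run, and the low learning rate keeps the update from v9 conservative.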

Results

VerilogEval v2 direct

Adapter/system	Compile	Pass
v9 single adapter	—	67/156
v29 multi-adapter selector	150/156	84/156
v30 unified single adapter	134/156	67/156
v30b delta-distilled single adapter	141/156	71/156

v30b improves the single-adapter pass count by 4 problems over v9 and v30 (71/156 vs. 67/156).
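
For comparison across systems, the pass counts can be converted to rates, along with the share of the v9-to-v29 gap that v30b closes on net. This is plain arithmetic on the table values; the helper `pass_rate` is a hypothetical name.

```python
TOTAL_PROBLEMS = 156

# Pass counts from the VerilogEval v2 direct table.
passes = {
    "v9 single adapter": 67,
    "v29 multi-adapter selector": 84,
    "v30 unified single adapter": 67,
    "v30b delta-distilled single adapter": 71,
}


def pass_rate(passed: int, total: int = TOTAL_PROBLEMS) -> float:
    """Fraction of problems passed."""
    return passed / total


for name, passed in passes.items():
    print(f"{name}: {passed}/{TOTAL_PROBLEMS} = {pass_rate(passed):.1%}")

# Net share of the selector's advantage over v9 that v30b recovers.
gap_closed = (71 - 67) / (84 - 67)
print(f"gap closed: {gap_closed:.1%}")  # gap closed: 23.5%
```

The gap figure is a net count: it does not by itself show that the 4 additional passes all come from the 17 delta-win problems.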

External checks

Benchmark	Compile	Functional/task pass
Paper-style full	30/30	26/30 task pass; 18/22 functional
Robust	14/15	6/10 functional
Alt	7/8	3/5 functional

These match the v9 baseline on the same external checks, so v30b did not show a regression on these small non-VerilogEval suites.

Interpretation

v30b supports the paper-inspired lesson: simple broad SFT/distillation (v30) did not beat v9, but verification-guided delta distillation transferred some of the v29 selector's gains into one standard adapter.

Still, v30b (71/156) remains well below the v29 verifier-selector pipeline (84/156), most likely because a single n=1 generation cannot reproduce multi-candidate search and runtime verification.

Caveat

v30b is a single PEFT LoRA adapter, but it is trained on benchmark-targeted v29 selector outputs, including solutions to VerilogEval problems themselves. Its VerilogEval result should therefore be described as a targeted/distilled experiment, not a clean zero-shot leaderboard claim.

Published artifacts

Hugging Face adapter: Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v30b-delta-distilled-lora
Main scripts:
- scripts/build_v30b_delta_distill_dataset.py
- scripts/run_v30b_delta_distill.sh
Result summaries:
- results/v30b_delta_distill/v30b_summary.json
- results/v30b_delta_distill/verilogeval_direct_summary.json
- results/v30b_delta_distill/verilogeval_direct_summary_results.jsonl