# v30b delta-distilled single-adapter results

v30b is the first post-v9 **single standard PEFT LoRA adapter** in this experiment series to improve direct VerilogEval pass count while preserving small external benchmark performance.

```text
adapter_v30b_delta_distilled_from_v9
```

## Motivation

The v29 runtime selector reached 84/156 on VerilogEval by generating candidates from v9 plus multiple specialists and selecting with compile+simulation. That is a strong practical pipeline, but it is not a single deployable adapter.

v30 attempted broad distillation of v29 into one adapter and matched v9 at 67/156. v30b instead focused training pressure on the delta:

```text
v9 fails + v29 selector passes
```

## Dataset

Builder:

```text
scripts/build_v30b_delta_distill_dataset.py
```

Rows:

```text
total rows: 3358
unique delta wins: 17
unique selector pass rows: 84
unique v9 pass retention rows: 67
unique clean/manual verified rows: 382
unique external functional rows: 18
unique synthetic verified rows: 316
```

Repeat weights:

```text
delta wins: 80x
all selector passes: 4x
v9 pass retention: 6x
clean verified: 2x
external functional: 10x
synthetic: 1x
```

The 17 delta-win problems were:

```text
Prob018_mux256to1
Prob028_m2014_q4a
Prob030_popcount255
Prob042_vector4
Prob045_edgedetect2
Prob050_kmap1
Prob054_edgedetect
Prob056_ece241_2013_q7
Prob060_m2014_q4k
Prob063_review2015_shiftcount
Prob075_counter_2bc
Prob088_ece241_2014_q5b
Prob097_mux9to1v
Prob098_circuit7
Prob121_2014_q3bfsm
Prob130_circuit5
Prob138_2012_q2fsm
```

Training used `--drop-overlength`; no rows were truncated.

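The row counts and repeat weights above can be cross-checked with a little arithmetic. The sketch below is not the builder script; it assumes each category's unique rows are simply repeated by that category's weight and that the categories add up independently (so a delta-win problem would also receive its selector-pass copies). The dictionary keys and the `expanded_rows` helper are hypothetical names chosen for illustration.

```python
# Unique row counts per category, copied from the "Rows" listing.
UNIQUE_ROWS = {
    "delta_wins": 17,
    "selector_passes": 84,
    "v9_pass_retention": 67,
    "clean_verified": 382,
    "external_functional": 18,
    "synthetic": 316,
}

# Repeat weights per category, copied from the "Repeat weights" listing.
REPEAT_WEIGHTS = {
    "delta_wins": 80,
    "selector_passes": 4,
    "v9_pass_retention": 6,
    "clean_verified": 2,
    "external_functional": 10,
    "synthetic": 1,
}


def expanded_rows(unique, weights):
    """Return rows per category after repetition (assumed additive)."""
    return {name: unique[name] * weights[name] for name in unique}


rows = expanded_rows(UNIQUE_ROWS, REPEAT_WEIGHTS)
total = sum(rows.values())
delta_share = rows["delta_wins"] / total

print(f"total rows after repetition: {total}")
print(f"delta-win share of training rows: {delta_share:.1%}")
```

Under that assumption the products sum to the reported 3358 rows, and the 17 delta-win problems account for 1360 of them, which illustrates how strongly the weighting concentrates training on the delta.
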
## Training

Launcher:

```text
scripts/run_v30b_delta_distill.sh
```

Settings:

```text
base model: Qwen/Qwen2.5-Coder-7B-Instruct
base adapter: adapter_v9_auto_distilled_direct
output: adapter_v30b_delta_distilled_from_v9
epochs: 0.75
learning rate: 7e-7
max length: 2048
LoRA r: 16
LoRA alpha: 32
batch size: 1
grad accum: 4
warmup steps: 40
```

## Results

### VerilogEval v2 direct

| Adapter/system | Compile | Pass |
|---|---:|---:|
| v9 single adapter | — | 67/156 |
| v29 multi-adapter selector | 150/156 | 84/156 |
| v30 unified single adapter | 134/156 | 67/156 |
| **v30b delta-distilled single adapter** | **141/156** | **71/156** |

v30b improves single-adapter pass count by +4 over v9/v30.

### External checks

| Benchmark | Compile | Functional/task pass |
|---|---:|---:|
| Paper-style full | 30/30 | 26/30 task pass; 18/22 functional |
| Robust | 14/15 | 6/10 functional |
| Alt | 7/8 | 3/5 functional |

These match the v9 baseline on the same external checks, so v30b did not show a regression on these small non-VerilogEval suites.

## Interpretation

v30b validates the paper-inspired lesson: simple broad SFT/distillation did not beat v9, but **verification-guided delta distillation** can transfer some of the v29 selector's gains into one standard adapter.

Still, v30b remains far below the v29 verifier-selector pipeline because a single n=1 generation cannot reproduce multi-candidate search and runtime verification.

## Caveat

v30b is a single PEFT LoRA adapter, but it is trained with benchmark-targeted v29 selector outputs. Its VerilogEval result should be described as a targeted/distilled experiment, not a clean zero-shot leaderboard claim.

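The VerilogEval direct pass counts reported in the Results section can be restated as pass rates, together with the fraction of the v9-to-v29 gap that v30b closes. This is a minimal sketch using only the numbers reported in this document; `pass_rate` and `gap_recovered` are hypothetical helpers, and the gap fraction is a net count, so it does not by itself show which of the 17 delta-win problems v30b actually solves.

```python
TOTAL_PROBLEMS = 156  # VerilogEval problems in the direct evaluation

# Pass counts copied from the "VerilogEval v2 direct" table.
PASS_COUNTS = {
    "v9": 67,
    "v29_selector": 84,
    "v30": 67,
    "v30b": 71,
}


def pass_rate(passed, total=TOTAL_PROBLEMS):
    """Fraction of problems passed."""
    return passed / total


def gap_recovered(student, baseline, teacher):
    """Net share of the baseline-to-teacher gap closed by the student."""
    return (student - baseline) / (teacher - baseline)


for name, passed in PASS_COUNTS.items():
    print(f"{name}: {passed}/{TOTAL_PROBLEMS} = {pass_rate(passed):.1%}")

gap = gap_recovered(
    PASS_COUNTS["v30b"], PASS_COUNTS["v9"], PASS_COUNTS["v29_selector"]
)
print(f"net share of the v9-to-v29 gap recovered by v30b: {gap:.1%}")
```

On these numbers v30b recovers a net 4 of the 17 problems separating v9 from the v29 selector, which is the quantitative sense in which only some of the selector's gains transfer.
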
## Published artifacts

- Hugging Face adapter: `Pablo-Flores-Mollinedo/verilog-qwen2.5-coder-7b-v30b-delta-distilled-lora`
- Main scripts:
  - `scripts/build_v30b_delta_distill_dataset.py`
  - `scripts/run_v30b_delta_distill.sh`
- Result summary:
  - `results/v30b_delta_distill/v30b_summary.json`
  - `results/v30b_delta_distill/verilogeval_direct_summary.json`
  - `results/v30b_delta_distill/verilogeval_direct_summary_results.jsonl`