---
license: apache-2.0
base_model: samscrack/Qwopus3.6-27B-solidity-cpt-stageA
language:
- en
library_name: peft
tags:
- solidity
- smart-contracts
- code-generation
- instruction-tuning
- lora
- qlora
- peft
- blockchain
- ethereum
- foundry
- sft
datasets:
- braindao/solidity-base-sft-v2
- lohoz/Smart-Contract-MultiTask-Dataset
pipeline_tag: text-generation
inference: false
---

# Qwopus3.6-27B-solidity-sft-stage1B

> ⚠️ **Intermediate checkpoint: Stage 1 of 5.** This is the spec→contract instruction-following LoRA, layered on top of the Stage 0 CPT adapter. **Not intended for direct production use.** Audit/reasoning capability and last-mile pass-rate improvements come from Stages 2-4. Use the final-stage output for actual deployment.

A LoRA r=64 adapter on top of `Qwopus3.6-27B-solidity-cpt-merged` (the merged Stage 0 CPT base) that teaches the model to translate natural-language specs into idiomatic Solidity contracts, plus optionally Foundry test suites.

## Pipeline context

| # | Stage | Status | Output |
|---|---|---|---|
| 0 | Continued pretrain (DoRA on Solidity corpus) | ✅ done | [`Qwopus3.6-27B-solidity-cpt-stageA`](https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-cpt-stageA) |
| **1** | **SFT (instruction): spec → contract** | **✅ done (this repo)** | this repo |
| 2 | SFT (audit / Long-CoT reasoning) | 🟡 in progress | TBD |
| 3 | RFT (rejection-sampling FT against `forge test`) | ⬜ planned | TBD |
| 4 | GSPO (sequence-level RL with executor reward) | ⬜ planned | TBD |

Stage 1B is the **instruction-following head**: it makes the Solidity-pretrained base actually respond to user prompts in the qwen3-thinking chat template format. The `<think>...</think>` block is intentionally empty for this stage; per-token reasoning supervision comes in Stage 2.
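To make the "empty reasoning block" concrete, here is a minimal sketch of how one single-turn example could be laid out in a ChatML-style format. The helper names (`render_example`, `response_span`) are hypothetical, and the exact whitespace around the `<think>` tags is an assumption; the tokenizer's own chat template remains the source of truth.

```python
# Hypothetical helpers for illustration; not part of the training code.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"
# Assumed layout of an empty reasoning block (whitespace is a guess).
EMPTY_THINK = "<think>\n\n</think>\n\n"


def render_example(spec: str, contract: str) -> str:
    """Render one user/assistant pair with an empty reasoning block."""
    user = f"{IM_START}user\n{spec}{IM_END}\n"
    assistant = f"{IM_START}assistant\n{EMPTY_THINK}{contract}{IM_END}\n"
    return user + assistant


def response_span(text: str) -> tuple[int, int]:
    """Character span that follows the assistant marker.

    Under response-only loss masking, only this part would contribute
    to the loss; the user turn would be ignored.
    """
    marker = f"{IM_START}assistant\n"
    start = text.index(marker) + len(marker)
    return start, len(text)


example = render_example("Write an empty contract named Foo.", "contract Foo {}")
start, end = response_span(example)
print(example[start:end])
```

The span starts right after the assistant marker, so the empty reasoning block is part of what the model learns to emit before the contract itself.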
## Training data

After ruthless quality filtering (strict pragma ≥ 0.7, license-clean, no GPT-3.5 teachers, forge-verified Opus rows only):

| Source | Rows | Role |
|---|---|---|
| `braindao/solidity-base-sft-v2` (filtered) + `lohoz/Smart-Contract-MultiTask-Dataset[requirement_fsm_code]` | 65,100 | spec → contract, pragma ≥ 0.7 |
| Opus 4.7-synthesized (contract, Foundry test) pairs that compile AND pass `forge test` | 4,919 | spec → contract + test suite |
| **Total pool** | **70,019** | sampled to 14,000, then 12,796 after the ctx ≤ 8192 filter |

The 4,919 Opus rows were re-verified after a multi-solc patch to `verify_synth.py`. The initial run verified only 384 rows because of a hard-coded ≥0.8.13 preflight; the patched version accepts any 0.8.x and rewrites plain pragmas to `^0.8.0`.

Training set: **12,796 rows** of natural-language Solidity instructions paired with verified contract outputs.

## Recipe (Jack's recipe, Stage 1B)

- **LoRA**: r=64, α=64, dropout=0
- **Target modules**: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, out_proj`
- **Trainable parameters**: 353,370,112 (~2.34% of effective 15B param count)
- **Quantization**: base loaded in 4-bit (BnB NF4, QLoRA-style); adapter weights bf16
- **Effective batch size**: 72 (4 per device × 9 grad accum × 2 GPUs)
- **Sequence length**: 8,192
- **Optimizer**: 8-bit AdamW (`adamw_8bit`), weight decay 0.001
- **Learning rate**: 2e-4, linear schedule, warmup 5%
- **Epochs**: 1
- **Total steps**: 178
- **Chat template**: `qwen3-thinking` (with `<think>...</think>` empty for this stage)
- **Loss masking**: `train_on_responses_only` (loss only on assistant tokens after `<|im_start|>assistant\n`)

## Training metrics

- **Wall time**: 6h 48m
- **Train loss**: 0.367 → **0.289** (final), min 0.223
- **Hardware**: 2× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (96 GB each)
- **Distributed**: DDP via `torchrun --nproc-per-node=2`
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) 2026.4.7 with TRL 0.22.2

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Stage 0 CPT base. The adapter was trained on the merged CPT
# weights, so merge Stage 0 into its base model first if needed.
base = AutoModelForCausalLM.from_pretrained(
    "samscrack/Qwopus3.6-27B-solidity-cpt-stageA",
    torch_dtype="bfloat16",
    device_map="auto",
)

# Attach the Stage 1B LoRA adapter on top of the base.
model = PeftModel.from_pretrained(base, "samscrack/Qwopus3.6-27B-solidity-sft-stage1B")
tokenizer = AutoTokenizer.from_pretrained("samscrack/Qwopus3.6-27B-solidity-sft-stage1B")

messages = [{
    "role": "user",
    "content": "Implement an ERC-20 token with a 1% transfer tax that goes to a treasury address. "
               "Include events, ownership, and Solidity 0.8.20 syntax.",
}]

# Render the chat template and append the assistant generation prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; special tokens are kept so the chat markers stay visible.
out = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```

## Limitations

- **No reasoning depth.** The `<think>...</think>` block is empty by design for Stage 1. Expect direct contract emission, no chain-of-thought. Stage 2 adds Long-CoT.
- **No security analysis.** Stage 1 doesn't audit, doesn't flag vulnerabilities, and doesn't reason about MEV/reentrancy. Stage 2 covers this.
- **No `forge test` validation in the loss.** Stage 1 uses next-token cross-entropy only. RFT (Stage 3) introduces test-pass as a reward signal.
- **Solidity ≥ 0.7 bias** (training data was strict-filtered). Will be weaker on 0.4/0.5/0.6 idioms.
- **Audit / contest authorship not validated** for this stage's data; sources are spec→contract instruction pairs, not audit findings.
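Since nothing in Stage 1 checks generated contracts against a compiler or test suite, it can be useful to pull the contract out of the raw generation and run it through your own toolchain. The following is a minimal sketch, not part of this repo: `extract_solidity` is a hypothetical helper, and it assumes the generation contains an optional `<think>` block followed by a Markdown code fence tagged `solidity` (the model card does not specify the output format).

```python
import re

# Built programmatically so this example contains no literal code fence.
FENCE = "`" * 3

# Assumed output shape: optional reasoning block, then a fenced contract.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
FENCE_RE = re.compile(FENCE + r"(?:solidity|sol)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_solidity(generation: str) -> str:
    """Return the first fenced code block, or the cleaned text if none."""
    text = THINK_RE.sub("", generation).replace("<|im_end|>", "")
    match = FENCE_RE.search(text)
    return (match.group(1) if match else text).strip()


raw = (
    "<think>\n\n</think>\n\n"
    + FENCE + "solidity\n"
    + "pragma solidity ^0.8.20;\ncontract Foo {}\n"
    + FENCE + "<|im_end|>"
)
source = extract_solidity(raw)
print(source)
```

The extracted source can then be written into a Foundry project and checked with `forge build` and `forge test` before any further use.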
## Related artifacts

- **Stage 0 CPT**: [`samscrack/Qwopus3.6-27B-solidity-cpt-stageA`](https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-cpt-stageA)
- **Eval set (held-out, must not train on)**: [`samscrack/solidity-eval-2026`](https://huggingface.co/datasets/samscrack/solidity-eval-2026)
- **Stage 2 audit-CoT corpus** (Opus 4.7 traces): [`samscrack/solidity-audit-cot`](https://huggingface.co/datasets/samscrack/solidity-audit-cot)

## Citation

```bibtex
@misc{qwopus3-6-27b-solidity-sft-stage1b-2026,
  author       = {samscrack},
  title        = {Qwopus3.6-27B-solidity-sft-stage1B: Stage 1 instruction LoRA on the Solidity-pretrained base},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/samscrack/Qwopus3.6-27B-solidity-sft-stage1B}},
}
```