physics-v07c-sft-qwen3.5-4b-merged

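v07 supervised fine-tune (SFT) of Qwen/Qwen3.5-4B for EXACT-2026 Task-2 physics. For each problem the model emits a short reasoning preamble (5-10 lines), then exactly ONE Python code block that computes the FINAL ANSWER: and UNIT: values. This is the merged full model, with the LoRA weights folded into the base weights. It can be served directly with vLLM, and no separate adapter needs to be loaded.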
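
A client consuming this output format has to pull the single code block out of a completion, run it, and read back the two answer lines. The sketch below does that using only the standard library. It assumes the block is fenced as python and that the program prints lines beginning with FINAL ANSWER: and UNIT:. The helper names (extract_code_block, run_code, parse_answer) are hypothetical and are not part of any released evaluation harness.

```python
import re
import subprocess
import sys
from typing import Optional, Tuple

FENCE = "`" * 3  # built programmatically so this sketch can itself sit inside a fenced block

# Assumed completion shape: reasoning text, then exactly one block opened with FENCE + "python".
CODE_BLOCK_RE = re.compile(FENCE + r"python[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_code_block(completion: str) -> str:
    """Return the body of the single python block; raise if there is not exactly one."""
    blocks = CODE_BLOCK_RE.findall(completion)
    if len(blocks) != 1:
        raise ValueError(f"expected exactly one python block, found {len(blocks)}")
    return blocks[0]


def run_code(code: str, timeout: float = 10.0) -> str:
    """Execute the generated code in a fresh interpreter and return its stdout.

    This only isolates the model's code from the calling process; it is not a sandbox.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def parse_answer(stdout: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the last FINAL ANSWER: and UNIT: values printed by the program."""
    answer: Optional[str] = None
    unit: Optional[str] = None
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith("FINAL ANSWER:"):
            answer = line[len("FINAL ANSWER:"):].strip()
        elif line.startswith("UNIT:"):
            unit = line[len("UNIT:"):].strip()
    return answer, unit


# Usage with a made-up completion in the assumed shape.
completion = (
    "Free fall from rest for 2 s, so v = g*t.\n"
    + FENCE + "python\n"
    + "g = 9.81\nt = 2.0\nprint(f'FINAL ANSWER: {g * t:.2f}')\nprint('UNIT: m/s')\n"
    + FENCE + "\n"
)
print(parse_answer(run_code(extract_code_block(completion))))  # ('19.62', 'm/s')
```

Rejecting completions with zero or several code blocks up front mirrors the "ONE Python code block" contract, so format violations surface as errors rather than as silently wrong answers.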

Training

  • Method: 16-bit LoRA, train-on-completion (loss on the assistant turn only).
  • LoRA: r=8, alpha=16, dropout=0.05, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj'].
  • Epochs=6, lr=3e-05, eff_batch=16, max_seq_len=2048.
  • Data: 2634 train / 110 val trajectories (golden_60 held out).
  • Chat template: Qwen <|im_start|>/<|im_end|>; eos <|im_end|>, pad <|endoftext|>.
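
Train-on-completion can be sketched as label masking. Prompt tokens get the ignore label, so only the assistant turn, including its closing <|im_end|> (the EOS), contributes to the loss. The rendering below is a simplified ChatML assumption: the real Qwen template may insert additional tokens, for example around thinking content. The token ids in the usage are toy values, not real tokenizer output. The last lines derive the step count and LoRA scale from the listed hyperparameters, assuming one optimizer step per 16 examples with the final partial batch kept.

```python
import math
from typing import Dict, List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def render_chatml(messages: List[Dict[str, str]]) -> Tuple[str, str]:
    """Split a conversation ending in an assistant turn into (prompt, completion) text.

    Simplified ChatML (an assumption): each earlier turn is rendered as
    <|im_start|>{role}, newline, content, <|im_end|>, newline. The prompt ends
    with the assistant header; the completion is the assistant content plus
    <|im_end|>, which doubles as EOS so the model learns to stop.
    """
    *history, last = messages
    if last["role"] != "assistant":
        raise ValueError("last message must be the assistant turn")
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in history
    ) + "<|im_start|>assistant\n"
    completion = last["content"] + "<|im_end|>"
    return prompt, completion


def build_example(
    prompt_ids: List[int], completion_ids: List[int], max_len: int = 2048
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt + completion, mask the prompt in labels, truncate to max_len."""
    input_ids = (prompt_ids + completion_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_len]
    return input_ids, labels


def pad_batch(
    examples: List[Tuple[List[int], List[int]]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad to the longest example; padded positions never contribute to the loss."""
    width = max(len(ids) for ids, _ in examples)
    batch_ids, batch_labels = [], []
    for ids, labels in examples:
        n_pad = width - len(ids)
        batch_ids.append(ids + [pad_id] * n_pad)
        batch_labels.append(labels + [IGNORE_INDEX] * n_pad)
    return batch_ids, batch_labels


# Quantities implied by the listed hyperparameters
# (assuming one optimizer step per 16 examples and the last partial batch kept).
N_TRAIN, EFF_BATCH, EPOCHS = 2634, 16, 6
STEPS_PER_EPOCH = math.ceil(N_TRAIN / EFF_BATCH)  # 165
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS  # 990
LORA_SCALE = 16 / 8  # lora_alpha / r = 2.0 under standard (non-rsLoRA) scaling
```

Because padding is masked by position rather than by token id, the pad token (<|endoftext|>) and the EOS token (<|im_end|>) stay independent. A real EOS at the end of a completion is therefore always trained on.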

Metrics

  • train_loss: 0.26995
  • best_eval_loss: 0.26999
  • best_eval_accuracy (val_56, checkpoint-selection metric): 0.7857 (44/56)
  • Further evaluation results: not yet filled in (run eval.py to produce them).
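
Assuming best_eval_accuracy is the plain fraction of the 56 val_56 problems scored correct, the reported value maps back to an integer count. The check below recovers that count and confirms it reproduces the reported float.

```python
from fractions import Fraction

REPORTED_ACCURACY = 0.7857142857142857  # best_eval_accuracy as originally logged
N_VAL = 56  # size of the val_56 selection set

# Nearest integer count of correct answers, then check it reproduces the reported value.
n_correct = round(REPORTED_ACCURACY * N_VAL)
assert Fraction(n_correct, N_VAL) == Fraction(REPORTED_ACCURACY).limit_denominator(N_VAL)
print(n_correct, Fraction(n_correct, N_VAL))  # 44 11/14
```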

External hint/teacher data used during data generation is declared in the Data Disclosure Document.

Model details

  • Weights: Safetensors, 4B parameters, BF16 tensors.
  • Fine-tuned from: Qwen/Qwen3.5-4B
  • Repository: Laplaces-Red-Devils/physics-v07c-sft-qwen3.5-4b-merged