Qwen3-4B No-Think S1 (Full SFT)

Full-parameter supervised fine-tuning (SFT) of Qwen/Qwen3-4B-Base on oci_nothink_50k (50k OpenCodeInstruct samples) using the qwen3_nothink chat template (direct answers, no extended think blocks).

Model description

  • Method: full SFT (all weights trainable), DeepSpeed ZeRO-2, 4 GPUs
  • Dataset: oci_nothink_50k (50,000 examples)
  • Template: qwen3_nothink
  • Not LoRA / not QLoRA: entire 4B model was updated

Note: The final training save was interrupted by disk full; published weights were restored from checkpoint-782 (same step count as training completion).

Training details

Field Value
Epochs 1
Seed 42
cutoff_len 4096
packing false
per_device_train_batch_size 4
gradient_accumulation_steps 4
effective_batch_size 64 (4 x 4 x 4 GPUs)
learning_rate 3e-5
train_loss 0.1572
train_steps 782
finished_at 2026-06-10 06:18 CST
runtime ~53 min

Related models

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "modrill/qwen3-4b-nothink-s1-full-sft"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

License

Released under Apache 2.0 (see upstream Qwen model card if not bundled here).

Downloads last month
13
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for modrill/qwen3-4b-nothink-s1-full-sft

Finetuned
(323)
this model