---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/structured_data_with_cot_dataset_v5_6k
- daichira/structured-hard-sft-4k
- u-10bei/structured_data_with_cot_dataset_512_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- qlora
- lora
- structured-output
---

# qwen3-4b-h100-v5-hard-ep3

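A LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned for structured output following a top-ranker strategy.
It was trained on an H100 with a blend of three datasets (approx. 14k rows). The assistant outputs were heavily preprocessed with a custom `clean_assistant_output_v2` function (CoT stripping, markdown removal, TOML comment removal).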
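
The three cleaning steps named above can be illustrated with a minimal sketch. This is not the actual `clean_assistant_output_v2`; the function name `clean_assistant_output_sketch`, the `<think>...</think>` delimiter for CoT, and the `fmt` parameter are all assumptions made for illustration, and only full-line TOML comments are handled.

```python
import re

# Assumption: chain-of-thought is wrapped in <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# A line that consists only of a markdown code-fence marker, e.g. ```toml
FENCE_LINE = re.compile(r"^\s*```[\w+-]*\s*$")


def clean_assistant_output_sketch(text: str, fmt: str = "") -> str:
    """Hypothetical cleaner: strip CoT, markdown fences, and TOML comment lines."""
    # 1. CoT stripping: drop the reasoning block, keep the final answer.
    text = THINK_BLOCK.sub("", text)
    # 2. Markdown removal: drop fence marker lines, keep the payload inside.
    lines = [ln for ln in text.splitlines() if not FENCE_LINE.match(ln)]
    # 3. TOML comment removal: drop full-line comments only, so a '#'
    #    inside a quoted string value is never touched.
    if fmt.lower() == "toml":
        lines = [ln for ln in lines if not ln.lstrip().startswith("#")]
    return "\n".join(lines).strip()


raw = "<think>plan the table</think>\n```toml\n# server settings\n[server]\nport = 8080\n```"
print(clean_assistant_output_sketch(raw, fmt="toml"))
```

Training on targets cleaned this way is meant to teach the model to emit the bare structured payload rather than reasoning text or markdown wrappers.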

## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Max sequence length: 4096
- Epochs: 3
- Learning rate: 2e-5
- Effective batch size: 32 (batch size 8 × gradient accumulation 4)
- LoRA rank (r): 128
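
As a rough sketch of how these values relate, the snippet below collects them in a plain dataclass and derives the effective batch size and an approximate optimizer step count. The `TrainingSetup` class and `optimizer_steps` helper are hypothetical names, a single training device is assumed, and the 14,000 row count is only the rounded figure from this card. Settings the card does not state (LoRA alpha, dropout, target modules) are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Values copied from the list above; nothing else is assumed."""
    base_model: str = "Qwen/Qwen3-4B-Instruct-2507"
    max_seq_length: int = 4096
    epochs: int = 3
    learning_rate: float = 2e-5
    batch_size: int = 8
    grad_accum_steps: int = 4
    lora_r: int = 128

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer update (single device assumed).
        return self.batch_size * self.grad_accum_steps


def optimizer_steps(setup: TrainingSetup, num_rows: int) -> int:
    """Approximate total optimizer updates, counting a partial last batch."""
    steps_per_epoch = math.ceil(num_rows / setup.effective_batch_size)
    return steps_per_epoch * setup.epochs


setup = TrainingSetup()
print(setup.effective_batch_size)        # 32
print(optimizer_steps(setup, 14_000))    # 1314 for the rounded 14k row count
```

The real step count depends on the exact row count after preprocessing and on how the trainer handles the final partial batch.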