---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/structured_data_with_cot_dataset_v5_6k
- daichira/structured-hard-sft-4k
- u-10bei/structured_data_with_cot_dataset_512_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- qlora
- lora
- structured-output
---

# qwen3-4b-h100-v5-hard-ep3

Top-ranker strategy model. Trained on H100 with a blend of three datasets (approx. 14k rows). The training data was heavily preprocessed with a custom `clean_assistant_output_v2` step (CoT stripping, markdown removal, TOML comment removal).

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Max sequence length: 4096
- Epochs: 3
- Learning rate: 2e-5
- Effective batch size: 32 (BS=8, GradAccum=4)
- LoRA R: 128
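
The preprocessing named above (CoT stripping, markdown removal, TOML comment removal) is not shown in this card. The following is only a minimal sketch of what such a cleaning step might look like. The function name `clean_assistant_output_sketch`, the `<think>...</think>` delimiter, the choice to keep only the first fenced block, and the `strip_toml_comments` flag are all assumptions made for illustration, not the actual `clean_assistant_output_v2`.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this block readable


def clean_assistant_output_sketch(text: str, strip_toml_comments: bool = True) -> str:
    """Hypothetical stand-in for clean_assistant_output_v2 (real code not shown)."""
    # 1. CoT stripping: drop reasoning blocks. The <think> tag is an assumed delimiter.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)

    # 2. Markdown removal: if the answer is wrapped in a code fence,
    #    keep only the body of the first fenced block.
    fenced = re.search(r"`{3}[\w-]*\n(.*?)`{3}", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)

    # 3. TOML comment removal: drop full-line comments only, so a '#'
    #    inside a quoted value is left alone. Multi-line TOML strings
    #    containing lines that start with '#' are not handled here.
    if strip_toml_comments:
        kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
        text = "\n".join(kept)

    return text.strip()


raw = (
    "<think>plan the table</think>\n"
    + FENCE + "toml\n"
    + "# server settings\n"
    + "[server]\n"
    + "port = 8080\n"
    + FENCE + "\n"
)
cleaned = clean_assistant_output_sketch(raw)
print(cleaned)
```

For non-TOML targets such as YAML or Markdown-bearing outputs, a leading `#` can be meaningful, so a step like this would presumably be applied with `strip_toml_comments=False` for those rows.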