Qwen3 4B Structured Output Model

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth).

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV) by focusing on raw data generation.

Key Features & Training Strategy

Complexity-Aware Reasoning: The model is trained with a dynamic approach to Chain-of-Thought (CoT).
- For simple and medium tasks, reasoning is omitted to prioritize direct and high-speed structured data generation.
- For high-complexity tasks, the reasoning process is preserved to ensure accuracy and logical consistency during complex data transformations.
Noise Reduction (Forbidden Tokens): Common conversational fillers (e.g., "Here is the data...") and markdown code blocks (e.g., ```json) are masked during the training process. This forces the model to output clean, raw structured text suitable for programmatic parsing.
Assistant-Focused Learning: The training loss is applied exclusively to the final assistant responses. User instructions and internal reasoning steps are excluded from the gradient calculation, focusing the model's capacity on providing the correct final answer.

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: QLoRA (4-bit)
Max sequence length: 512
Epochs: 2
Learning rate: 7e-06
LoRA: r=64, alpha=128

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "zerg2187/lora_structeval_t_qwen3_penalty_tokens_v2_d1"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License. Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.

Downloads last month: 2

Model tree for zerg2187/lora_structeval_t_qwen3_penalty_tokens_v2_d1

Base model

Qwen/Qwen3-4B-Instruct-2507

Adapter

(5557)

this model

zerg2187
/

lora_structeval_t_qwen3_penalty_tokens_v2_d1