Qwen3-4B-Instruct-2507 TeichAI LoRA Adapter

This repository contains a LoRA adapter fine-tuned on top of Qwen/Qwen3-4B-Instruct-2507.

This is not a full merged model. To use it, load the original base model first, then apply this adapter with PEFT.

Model Details

  • Adapter type: LoRA / PEFT adapter
  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Training dataset: TeichAI/claude-haiku-4.5-1700x
  • Rows used: 1,688
  • Training steps: 156
  • Approx epochs: 1.47
  • Max sequence length: 2048
  • Precision during training: bf16 / 16-bit LoRA
  • Training GPU: NVIDIA L40S on Modal
  • EOS token used: <|im_end|>
  • EOS token ID: 151645

Training Configuration

base_model: Qwen/Qwen3-4B-Instruct-2507
dataset: TeichAI/claude-haiku-4.5-1700x
method: LoRA
rank: 16
alpha: 32
dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
max_length: 2048
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
effective_batch_size: 16
learning_rate: 2e-4
weight_decay: 0.01
scheduler: cosine
warmup_steps: 10
max_steps: 156
precision: bf16
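
As a rough guide, the LoRA settings above could be expressed as PEFT `LoraConfig` arguments as sketched below. The mapping from the card's field names to argument names is an assumption, as is `task_type`; the author's actual training script is not shown in this card.

```python
# Sketch: the LoRA settings from this card as PEFT LoraConfig keyword arguments.
# The field-to-argument mapping is an assumption, not the author's script.
lora_kwargs = {
    "r": 16,                   # rank
    "lora_alpha": 32,          # alpha
    "lora_dropout": 0.05,      # dropout
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumed; not stated in the card
}

# Scaling applied to the low-rank update in standard LoRA: alpha / rank.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Effective batch size on a single GPU: per-device batch x accumulation steps.
effective_batch_size = 2 * 8

try:
    from peft import LoraConfig  # only needed to build the real config object
    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    lora_config = None  # peft not installed; the dict still documents the settings

print(lora_scale, effective_batch_size)
```

With rank 16 and alpha 32, the adapter update is scaled by 2.0 before being added to the frozen base weights.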

Training Run Summary

raw_rows: 1688
rows_before_tokenize: 1688
rows_after_tokenize: 1688
rows_final: 1688
steps_for_one_pass: 106
max_steps: 156
warmup_steps: 10
save_steps: 78
train_runtime: 1082.7869 seconds
epoch: 1.47

The run completed successfully and saved the adapter files.
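
The summary figures are consistent with each other, which the short calculation below illustrates. It assumes a single GPU, so the effective batch size is 2 x 8 = 16, and that the final partial batch counts as one step.

```python
import math

rows = 1688
effective_batch_size = 16   # 2 per device x 8 accumulation steps (single GPU assumed)
max_steps = 156
train_runtime_s = 1082.7869

# One pass over the data, rounding the last partial batch up to a full step.
steps_for_one_pass = math.ceil(rows / effective_batch_size)

# Fraction of the dataset covered by the configured number of steps.
epochs = max_steps / steps_for_one_pass

# Average wall-clock time per optimizer step.
seconds_per_step = train_runtime_s / max_steps

print(steps_for_one_pass, round(epochs, 2), round(seconds_per_step, 2))
```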

Expected Files

A normal LoRA adapter repository should include files like:

adapter_config.json
adapter_model.safetensors
tokenizer.json
tokenizer_config.json
special_tokens_map.json
chat_template.jinja
README.md

This repository contains only the adapter, not the base weights, so it must be loaded together with the base model.
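
A quick local check of a downloaded adapter directory might look like the sketch below. The helper name `check_adapter_dir` is hypothetical, and the `base_model_name_or_path` key is the field PEFT normally writes into adapter_config.json; treat both as assumptions to verify against your copy of the files.

```python
import json
from pathlib import Path

# Files PEFT needs to load a LoRA adapter; tokenizer files are optional extras.
REQUIRED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors")


def check_adapter_dir(repo_dir):
    """Return (missing_files, base_model_id) for a local adapter directory.

    base_model_id is read from adapter_config.json and is None when the
    file or the key is absent.
    """
    repo = Path(repo_dir)
    missing = [name for name in REQUIRED_ADAPTER_FILES
               if not (repo / name).is_file()]

    base_model_id = None
    config_path = repo / "adapter_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        base_model_id = config.get("base_model_name_or_path")
    return missing, base_model_id
```

An empty `missing` list together with a `base_model_id` of Qwen/Qwen3-4B-Instruct-2507 would suggest the directory is a usable adapter for the expected base model.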

Chat Format

This adapter uses the Qwen chat template from the tokenizer.

Build prompts with tokenizer.apply_chat_template(...) rather than writing the special tokens by hand.

Expected Qwen-style chat structure:

<|im_start|>user
Your message here<|im_end|>
<|im_start|>assistant
Model response here<|im_end|>

Qwen3-4B-Instruct-2507 is a non-thinking instruct model, so this adapter is intended for normal assistant responses rather than explicit reasoning traces.
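
To make the chat structure concrete, the sketch below renders a message list into a ChatML-style string in plain Python. It is an illustration only: `render_chatml` is a hypothetical helper, and the tokenizer's real template may handle system prompts, tools, and other cases differently, so tokenizer.apply_chat_template(...) remains the right call for actual use.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages as a ChatML-style prompt string (illustration only)."""
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```

Generation is expected to stop when the model emits <|im_end|>, which is why that token is used as the EOS token.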

Usage with PEFT

Install dependencies:

pip install -U torch transformers peft accelerate safetensors sentencepiece protobuf

Load the adapter:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_id,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(
    base_model,
    adapter_id,
    torch_dtype=torch.bfloat16,
)

model.eval()

messages = [
    {"role": "user", "content": "Write a short cozy story about a tiny robot learning to paint."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens by slicing off the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

print(response)
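
Because output quality depends on the decoding settings, it can help to keep them in one place. The helper below is a sketch, not part of the adapter: `build_generation_kwargs` is a hypothetical name, and the defaults simply mirror the values used in the example above. It leaves out `temperature` and `top_p` when sampling is disabled, since greedy decoding does not use them.

```python
def build_generation_kwargs(eos_token_id, max_new_tokens=300, sample=True,
                            temperature=0.7, top_p=0.9):
    """Build keyword arguments for model.generate (illustrative defaults)."""
    kwargs = {
        "max_new_tokens": max_new_tokens,
        "eos_token_id": eos_token_id,
        "pad_token_id": eos_token_id,  # mirrors the example above
        "do_sample": sample,
    }
    if sample:
        # Sampling parameters only apply when do_sample is True.
        kwargs["temperature"] = temperature
        kwargs["top_p"] = top_p
    return kwargs


sampling = build_generation_kwargs(eos_token_id=151645)
greedy = build_generation_kwargs(eos_token_id=151645, sample=False)
```

Either dictionary can then be unpacked into the call, for example model.generate(**inputs, **greedy) for repeatable outputs when comparing prompts.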

Usage with Transformers PEFT Integration

Transformers can also load PEFT adapters directly onto a compatible pretrained model with model.load_adapter(...). The peft package must still be installed.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model.load_adapter(adapter_id)
model.eval()

Merge the Adapter

To create a standalone merged model, use PEFT:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"
merged_output_dir = "./qwen3-4b-teichai-merged"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_id,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(
    base_model,
    adapter_id,
    torch_dtype=torch.bfloat16,
)

# Fold the LoRA update into the base weights and drop the PEFT wrapper.
model = model.merge_and_unload()

model.save_pretrained(
    merged_output_dir,
    safe_serialization=True,
    max_shard_size="4GB",
)

tokenizer.save_pretrained(merged_output_dir)
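
Conceptually, merging folds the low-rank update into each targeted weight matrix: W_merged = W + (alpha / rank) * (B @ A) in the standard LoRA formulation. The toy example below shows that arithmetic with plain Python lists. The matrices are made up and tiny, and PEFT's real implementation also deals with dtypes, devices, and LoRA variants, so this is a sketch of the idea rather than of the library code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy shapes: 2x3 base weight with a rank-1 adapter (B is 2x1, A is 1x3).
# alpha=2 with rank=1 gives the same scale (2.0) as this card's 32 / 16.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [1.0]]

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

After merging, the model has the same shape as the base model and no longer needs PEFT at inference time.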

Intended Use

This adapter is intended for:

  • General assistant-style responses
  • Creative writing
  • Short-form writing tasks
  • Conversational text generation
  • Lightweight instruction following

Not Intended For

This adapter is not intended for:

  • Medical, legal, or financial decision-making
  • Safety-critical automation
  • Guaranteed factual answers
  • Formal benchmark claims
  • Production use without evaluation

Limitations

This adapter inherits limitations from:

  • The Qwen3 base model
  • The TeichAI dataset
  • LoRA fine-tuning
  • The small dataset size

Known limitations:

  • May hallucinate facts
  • May overfit to dataset style
  • May produce verbose creative outputs
  • Has not been formally benchmarked
  • May inherit biases from the base model or dataset
  • Performance may vary depending on inference settings

Evaluation

No formal benchmark evaluation has been run yet.

Recommended evaluations:

  • Manual chat quality checks
  • Creative writing tests
  • Instruction-following tests
  • Safety refusal tests
  • Comparison against the base model
  • Regression testing on common prompts
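
A comparison against the base model can start as a simple side-by-side harness like the sketch below. The function names are hypothetical and the two generators are stubs so the example runs without a model; in practice each would wrap a model.generate call. With a PeftModel, base outputs can usually be obtained from the same loaded weights inside a `with model.disable_adapter():` block.

```python
def compare_outputs(prompts, generate_base, generate_adapter):
    """Run both generators on each prompt and collect a side-by-side report."""
    rows = []
    for prompt in prompts:
        base_text = generate_base(prompt)
        adapter_text = generate_adapter(prompt)
        rows.append({
            "prompt": prompt,
            "base": base_text,
            "adapter": adapter_text,
            "changed": base_text != adapter_text,
            "length_delta": len(adapter_text) - len(base_text),
        })
    return rows


# Stub generators standing in for real model calls.
def fake_base(prompt):
    return f"Base reply to: {prompt}"


def fake_adapter(prompt):
    return f"Adapter reply to: {prompt}!"


report = compare_outputs(["Say hi.", "Name a color."], fake_base, fake_adapter)
for row in report:
    print(row["prompt"], row["changed"], row["length_delta"])
```

Saving such a report for a fixed prompt set, ideally with deterministic decoding, gives a baseline for later regression checks.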

Dataset

This adapter was trained on:

  • TeichAI/claude-haiku-4.5-1700x

It served as a small instruction- and style-tuning set (1,688 rows).

Citation

Base model:

@misc{qwen3_4b_instruct_2507,
  title = {Qwen3-4B-Instruct-2507},
  author = {Qwen Team},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507}}
}

Dataset:

@misc{teichai_claude_haiku_45_1700x,
  title = {TeichAI/claude-haiku-4.5-1700x},
  author = {TeichAI},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/TeichAI/claude-haiku-4.5-1700x}}
}

License

This adapter is provided under the license listed in the repository metadata. Users should also follow the license and usage terms of:

  • Qwen/Qwen3-4B-Instruct-2507
  • TeichAI/claude-haiku-4.5-1700x