Instructions for using Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA with libraries, inference servers, and local apps.

Libraries

How to use Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(base_model, "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA")

Transformers

How to use Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (peft must be installed, because this repository holds only an adapter)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA", dtype="auto")

vLLM

How to use Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
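# Start the vLLM server on the base model and attach this adapter.
# "haiku-lora" is an arbitrary adapter name chosen for this example:
vllm serve "Qwen/Qwen3-4B-Instruct-2507" \
	--enable-lora \
	--lora-modules haiku-lora=Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA
# Call the server using curl (OpenAI-compatible API), selecting the adapter by its name:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "haiku-lora",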
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA

SGLang

How to use Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA",
    max_seq_length=2048,
)

Qwen3-4B-Instruct-2507 TeichAI LoRA Adapter

This repository contains a LoRA adapter fine-tuned on top of Qwen/Qwen3-4B-Instruct-2507.

This is not a full merged model. To use it, load the original base model first, then apply this adapter with PEFT.

Model Details

Adapter type: LoRA / PEFT adapter
Base model: Qwen/Qwen3-4B-Instruct-2507
Training dataset: TeichAI/claude-haiku-4.5-1700x
Rows used: 1,688
Training steps: 156
Approx epochs: 1.47
Max sequence length: 2048
Precision during training: bf16 / 16-bit LoRA
Training GPU: NVIDIA L40S on Modal
EOS token used: <|im_end|>
EOS token ID: 151645

Training Configuration

base_model: Qwen/Qwen3-4B-Instruct-2507
dataset: TeichAI/claude-haiku-4.5-1700x
method: LoRA
rank: 16
alpha: 32
dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
max_length: 2048
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
effective_batch_size: 16
learning_rate: 2e-4
weight_decay: 0.01
scheduler: cosine
warmup_steps: 10
max_steps: 156
precision: bf16
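
As a rough sanity check on the size of this adapter, the rank and target modules above can be turned into a trainable-parameter count. The sketch below is only an estimate: the layer dimensions (hidden size, head counts, feed-forward width, layer count) are assumptions based on commonly reported Qwen3-4B values and are not stated in this card, so they should be checked against the base model's config.json.

```python
# Rough count of trainable LoRA parameters for the configuration above.
# ASSUMPTION: the layer shapes below are commonly reported Qwen3-4B values;
# they do not come from this card. Verify them against the base model's
# config.json before relying on the result.
HIDDEN = 2560        # hidden_size (assumed)
Q_OUT = 32 * 128     # num_attention_heads * head_dim (assumed)
KV_OUT = 8 * 128     # num_key_value_heads * head_dim (assumed)
FFN = 9728           # intermediate_size (assumed)
NUM_LAYERS = 36      # num_hidden_layers (assumed)

RANK = 16            # from the training configuration
ALPHA = 32           # from the training configuration

# (in_features, out_features) of each targeted projection in one layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, FFN),
    "up_proj": (HIDDEN, FFN),
    "down_proj": (FFN, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # multiplier applied to the low-rank update

print(f"trainable parameters per layer: {per_layer:,}")
print(f"trainable parameters in total:  {total_trainable:,}")
print(f"LoRA scaling (alpha / rank):    {scaling}")
```

Under these assumed shapes the adapter trains a small fraction of the roughly 4B base parameters, which is consistent with a short run on a single GPU.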

Training Run Summary

raw_rows: 1688
rows_before_tokenize: 1688
rows_after_tokenize: 1688
rows_final: 1688
steps_for_one_pass: 106
max_steps: 156
warmup_steps: 10
save_steps: 78
train_runtime: 1082.7869 seconds
epoch: 1.47
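
The step and epoch figures in this summary can be cross-checked with a few lines of arithmetic. The sketch below assumes a single training device (the card lists one L40S but does not state the device count explicitly) and assumes the reported epoch is completed steps divided by steps per pass.

```python
import math

rows = 1688               # rows_final
per_device_batch = 2      # per_device_train_batch_size
grad_accum = 8            # gradient_accumulation_steps
num_devices = 1           # ASSUMPTION: single GPU, not stated explicitly
max_steps = 156
train_runtime_s = 1082.7869

# Samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

# Optimizer steps needed to see every row once (last batch may be partial).
steps_for_one_pass = math.ceil(rows / effective_batch)

# Fraction of passes completed after max_steps.
epochs = max_steps / steps_for_one_pass

# Average wall-clock time per optimizer step.
seconds_per_step = train_runtime_s / max_steps

print(effective_batch, steps_for_one_pass, round(epochs, 2), round(seconds_per_step, 2))
```

The computed values agree with effective_batch_size, steps_for_one_pass, and epoch as reported above.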

The run completed successfully and saved the adapter files.

Expected Files

A normal LoRA adapter repository should include files like:

adapter_config.json
adapter_model.safetensors
tokenizer.json
tokenizer_config.json
special_tokens_map.json
chat_template.jinja
README.md

Because this repository contains only the adapter weights, not merged model weights, it must be loaded together with the base model.
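
One way to check a local copy of the adapter against the file list above is a small helper like the following. This is a minimal sketch: `missing_adapter_files` and the example directory name are hypothetical, and the list simply mirrors the files named in this section.

```python
from pathlib import Path

# File names taken from the list in this section.
EXPECTED_FILES = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
    "README.md",
]


def missing_adapter_files(repo_dir):
    """Return the expected file names that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example (hypothetical path to a downloaded copy of the adapter):
# print(missing_adapter_files("./Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"))
```

An empty result means every listed file is present. A missing adapter_config.json or adapter_model.safetensors would prevent PEFT from loading the adapter, while the tokenizer files can usually be taken from the base model instead.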

Chat Format

This adapter uses the Qwen chat template from the tokenizer.

Format inputs with tokenizer.apply_chat_template(...) rather than writing the special tokens by hand.

Expected Qwen-style chat structure:

<|im_start|>user
Your message here<|im_end|>
<|im_start|>assistant
Model response here<|im_end|>

Qwen3-4B-Instruct-2507 is a non-thinking instruct model, so this adapter is intended for normal assistant responses rather than explicit reasoning traces.
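
To make the structure concrete, the following sketch renders a message list into a ChatML-style string by hand. It is for illustration only: `render_chatml` is a hypothetical helper, and it assumes the simplest case of plain user and assistant turns. The tokenizer's real template may also handle system prompts, tools, and other details, so tokenizer.apply_chat_template(...) remains the recommended path.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages as a ChatML-style prompt (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

Generation stops when the model emits <|im_end|>, which is why that token is configured as the EOS token for this adapter.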

Usage with PEFT

Install dependencies:

pip install -U torch transformers peft accelerate safetensors sentencepiece protobuf

Load the adapter:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_id,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(
    base_model,
    adapter_id,
    torch_dtype=torch.bfloat16,
)

model.eval()

messages = [
    {"role": "user", "content": "Write a short cozy story about a tiny robot learning to paint."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

print(response)

Usage with Transformers PEFT Integration

Transformers also supports loading PEFT adapters directly on compatible pretrained models.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_id,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model.load_adapter(adapter_id)
model.eval()

Merge the Adapter

To create a standalone merged model, use PEFT:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "Tralalabs/Qwen3-2507-4B-Instruct-Haiku-4.5-LoRA"
merged_output_dir = "./qwen3-4b-teichai-merged"

tokenizer = AutoTokenizer.from_pretrained(
    adapter_id,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(
    base_model,
    adapter_id,
    torch_dtype=torch.bfloat16,
)

model = model.merge_and_unload()

model.save_pretrained(
    merged_output_dir,
    safe_serialization=True,
    max_shard_size="4GB",
)

tokenizer.save_pretrained(merged_output_dir)
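
Conceptually, merging folds each low-rank update into its base weight, W' = W + (alpha / rank) * B A, so that no adapter is needed at inference time. The toy example below shows that arithmetic on a tiny matrix using plain Python lists. It is a sketch of the standard LoRA merge formula, not PEFT's actual implementation; `matmul` and `merge_lora` are hypothetical helpers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 update. alpha / rank = 2.0 here, the same
# ratio as this adapter's alpha = 32 and rank = 16.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]       # rank x in_features
lora_b = [[0.5], [0.25]]    # out_features x rank

merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the result has the same shape as the original weight, which is why the merged model can be saved and loaded like an ordinary checkpoint.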

Intended Use

This adapter is intended for:

General assistant-style responses
Creative writing
Short-form writing tasks
Conversational text generation
Lightweight instruction following

Not Intended For

This adapter is not intended for:

Medical, legal, or financial decision-making
Safety-critical automation
Guaranteed factual answers
Formal benchmark claims
Production use without evaluation

Limitations

This adapter inherits limitations from:

The Qwen3 base model
The TeichAI dataset
LoRA fine-tuning
The small dataset size

Known limitations:

May hallucinate facts
May overfit to dataset style
May produce verbose creative outputs
Has not been formally benchmarked
May inherit biases from the base model or dataset
Performance may vary depending on inference settings

Evaluation

No formal benchmark evaluation has been run yet.

Recommended evaluations:

Manual chat quality checks
Creative writing tests
Instruction-following tests
Safety refusal tests
Comparison against the base model
Regression testing on common prompts
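
A comparison against the base model can start as a simple side-by-side harness. The sketch below assumes two callables that map a prompt to generated text. The stand-in generators are placeholders so the example runs without a GPU; in practice they would wrap the base model and the adapter-loaded model. All names here are hypothetical.

```python
def compare_models(prompts, generate_base, generate_adapter):
    """Run both generators on each prompt and collect side-by-side outputs."""
    rows = []
    for prompt in prompts:
        base_out = generate_base(prompt)
        adapter_out = generate_adapter(prompt)
        rows.append(
            {
                "prompt": prompt,
                "base": base_out,
                "adapter": adapter_out,
                "changed": base_out != adapter_out,
            }
        )
    return rows


# Placeholder generators standing in for real model calls.
def fake_base(prompt):
    return prompt.upper()


def fake_adapter(prompt):
    return prompt.upper() if "capital" in prompt else prompt.title()


report = compare_models(
    ["What is the capital of France?", "Write a haiku."],
    fake_base,
    fake_adapter,
)
for row in report:
    print(row["changed"], "|", row["prompt"])
```

Using deterministic decoding for both models makes the "changed" flag meaningful. With sampling enabled, outputs differ between runs regardless of the adapter.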

Dataset

This adapter was trained on:

TeichAI/claude-haiku-4.5-1700x

The dataset was used as a small instruction/style fine-tuning dataset.

Citation

Base model:

@misc{qwen3_4b_instruct_2507,
  title = {Qwen3-4B-Instruct-2507},
  author = {Qwen Team},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507}}
}

Dataset:

@misc{teichai_claude_haiku_45_1700x,
  title = {TeichAI/claude-haiku-4.5-1700x},
  author = {TeichAI},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/TeichAI/claude-haiku-4.5-1700x}}
}

License

This adapter is provided under the license listed in the repository metadata. Users should also follow the license and usage terms of:

Qwen/Qwen3-4B-Instruct-2507
TeichAI/claude-haiku-4.5-1700x
