🐾 qwen3-0.6b-cat-lingo-dpo
A Qwen3-0.6B model fine-tuned with Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO) to answer every question in pure cat lingo — meows, purrs, hisses, and all.
Training details
| Item | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Method | SFT + DPO (TRL) + LoRA (r=32) |
| Epochs | 10 SFT + 10 DPO |
| β (DPO temp.) | 0.1 |
| Dataset | LLM-generated (prompt, chosen-cat, rejected-plain) triples, 80/15 training/validation |
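To make the role of β concrete, here is a minimal sketch of the per-example DPO loss, assuming the standard sigmoid DPO objective (the exact loss variant used for this model is not stated above). The function name `dpo_loss` and the log-probability values are hypothetical and for illustration only; real training computes these log-probabilities from the policy and the frozen reference model over each (prompt, chosen-cat, rejected-plain) triple.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example sigmoid DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response
    (chosen = cat lingo, rejected = plain answer) under the policy
    or the frozen reference model. Illustrative helper, not TRL's API.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable -log(sigmoid(margin)) = log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, chosen only to show the direction of the loss.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))  # policy equals reference
print(dpo_loss(-4.0, -9.0, -5.0, -5.0))  # policy prefers the cat answer
print(dpo_loss(-9.0, -4.0, -5.0, -5.0))  # policy prefers the plain answer
```

A small β such as 0.1 scales the margin down, so the policy needs a large log-ratio gap between the chosen and rejected answers before the loss approaches zero.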
Quick start
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel
# Load the base model first, then attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "peluz/qwen3-0.6b-cat-lingo-dpo")
tokenizer = AutoTokenizer.from_pretrained("peluz/qwen3-0.6b-cat-lingo-dpo", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Explain black holes."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
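# Generate a completion (max_new_tokens is an illustrative cap; adjust as needed)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
)
# Keep only the newly generated tokens, dropping the prompt
generated = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
print(tokenizer.decode(generated, skip_special_tokens=True))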
Limitations
Trained on a tiny dataset; cat persona may be inconsistent on unusual topics. Increase dataset size for more reliable behaviour.
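One rough way to spot persona drift is to measure how much of a reply consists of cat sounds. The sketch below is a hypothetical heuristic, not part of this model's training or evaluation: the `CAT_WORD` pattern and the `cat_ratio` helper are assumptions made for illustration, and the word list would need tuning against real outputs.

```python
import re

# Hypothetical pattern for stretched cat sounds such as "meooow" or "purrrr".
CAT_WORD = re.compile(
    r"m+e+o+w+|m+r+o+w+|m+e+w+|p+u+r+r+|h+i+s+s+|n+y+a+",
    re.IGNORECASE,
)


def cat_ratio(text):
    """Return the fraction of alphabetic words that are cat sounds."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    hits = sum(1 for word in words if CAT_WORD.fullmatch(word))
    return hits / len(words)


# Replies with a low ratio may indicate the model fell back to plain language.
print(cat_ratio("Meooow purrr hiss!"))
print(cat_ratio("Black holes are very dense."))
```

Running this over replies to a set of unusual prompts gives a quick, if crude, signal of where the persona breaks down.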