🐾 qwen3-0.6b-cat-lingo-dpo

A Qwen3-0.6B model fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to answer every question in pure cat lingo: meows, purrs, hisses, and all.

Training details

| Item | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Method | SFT + DPO (TRL) + LoRA (r=32) |
| Epochs | 10 SFT + 10 DPO |
| β (DPO temp.) | 0.1 |
| Dataset | 80/15 training/validation, LLM-generated (prompt, chosen-cat, rejected-plain) triples |
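
To make the role of β concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not the training code used for this model (TRL computes this internally); the function name `dpo_loss` and the toy log-probabilities are assumptions chosen for illustration only.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Each argument is the summed log-probability of a full response
    (chosen = cat-lingo answer, rejected = plain answer) under either the
    policy being trained or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy numbers (made up): the policy matches the reference exactly,
# so the margin is 0 and the loss equals ln 2.
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Toy numbers (made up): the policy now prefers the cat-lingo answer
# more strongly than the reference does, so the loss drops below ln 2.
improved = dpo_loss(-5.0, -20.0, -10.0, -12.0)

print(start, improved)
```

A smaller β shrinks the margin for the same log-probability gap, so the policy has to move further from the reference model before the loss falls noticeably.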

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel

# Load the base model first; the LoRA adapter is attached to it below.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base, "peluz/qwen3-0.6b-cat-lingo-dpo")

tokenizer = AutoTokenizer.from_pretrained("peluz/qwen3-0.6b-cat-lingo-dpo", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Explain black holes."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

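# Generate a completion (max_new_tokens=256 is an illustrative cap; adjust as needed)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
)
# Drop the prompt tokens and decode only the newly generated ones
generated = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
print(tokenizer.decode(generated, skip_special_tokens=True))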

Limitations

The model was trained on a tiny dataset, so the cat persona may be inconsistent on unusual topics. Retraining on a larger dataset should give more reliable behaviour.

