🐾 qwen3-0.6b-cat-lingo-dpo

A Qwen3-0.6B model fine-tuned with Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO) to answer every question in pure cat lingo — meows, purrs, hisses, and all.

Training details

Item	Value
Base model	Qwen/Qwen3-0.6B
Method	SFT + DPO (TRL) + LoRA (r=32)
Epochs	10 SFT + 10 DPO
β (DPO temp.)	0.1
Dataset	LLM-generated (prompt, chosen-cat, rejected-plain) triples, 80/15 training/validation split
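To make the β row concrete, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not the training code behind this model (which used TRL), and the function name `dpo_loss` and its argument names are illustrative assumptions. Each argument is the summed log-probability of a full response under either the policy being trained or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    Hypothetical helper for illustration; beta=0.1 mirrors the table above.
    """
    # How much more (or less) likely each response is under the policy than the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a numerically stable way
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss is log(2)
baseline = dpo_loss(-7.0, -7.0, -7.0, -7.0)
# Policy favours the cat-lingo answer more than the reference does: the loss drops
improved = dpo_loss(-5.0, -10.0, -7.0, -7.0)
```

A smaller β makes the loss less sensitive to the gap between the two responses, so the policy is pushed less strongly away from the reference for the same preference margin.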

Quick start

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base, "peluz/qwen3-0.6b-cat-lingo-dpo")

tokenizer = AutoTokenizer.from_pretrained("peluz/qwen3-0.6b-cat-lingo-dpo", trust_remote_code=True)

messages = [
    {"role": "user", "content": "Explain black holes."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

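# Generate a reply; max_new_tokens caps its length (256 is an arbitrary choice, adjust as needed)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
)
# Drop the prompt tokens so only the newly generated reply is decoded
generated = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
print(tokenizer.decode(generated, skip_special_tokens=True))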

Limitations

Trained on a tiny dataset; cat persona may be inconsistent on unusual topics. Increase dataset size for more reliable behaviour.
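If you want to grow the dataset, the sketch below shows one way to build and split extra preference triples. It assumes the `prompt` / `chosen` / `rejected` field names commonly used for DPO preference data; the actual schema of this model's dataset is not documented here, and the helper names `make_triple` and `split_triples`, as well as the sample texts, are hypothetical.

```python
import random


def make_triple(prompt, chosen_cat, rejected_plain):
    """Build one preference record; field names are an assumption, not this model's documented schema."""
    for name, value in (("prompt", prompt), ("chosen", chosen_cat), ("rejected", rejected_plain)):
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return {"prompt": prompt, "chosen": chosen_cat, "rejected": rejected_plain}


def split_triples(triples, n_validation, seed=0):
    """Shuffle deterministically and hold out n_validation records for validation."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up sample texts, for illustration only
triple = make_triple(
    "Explain black holes.",
    "Mrrrow... purr purr. Hisss! Meow meow.",
    "A black hole is a region of spacetime where gravity is so strong that nothing can escape.",
)
train, validation = split_triples([triple] * 10, n_validation=2)
```

Covering more unusual topics in the new prompts is the part most likely to help with the inconsistency noted above.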


Model size: 0.6B params
Tensor type: BF16 (Safetensors)


Model tree for peluz/qwen3-0.6b-cat-lingo-dpo

Qwen/Qwen3-0.6B-Base (base model) → Qwen/Qwen3-0.6B (finetuned) → this model (finetuned)
