Qwen3-0.6B - KTO
Model Description
This model is a merged standalone model fine-tuned from Qwen/Qwen3-0.6B with KTO (Kahneman-Tversky Optimization), a binary preference optimization method based on Prospect Theory.
This model was developed as part of thesis research on LLM Alignment using Preference Optimization Methods.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-0.6B |
| Training Method | KTO |
| Model Type | Merged Standalone Model |
| Training Date | December 2025 |
| Framework | PyTorch + Transformers + PEFT |
Benchmark Results
| Benchmark | Score |
|---|---|
| HellaSwag (10-shot) | 0.264 |
| TruthfulQA (0-shot MC2) | 0.486 |
| MMLU-Mini (5-shot) | 0.269 |
Comparative Analysis
The following chart compares this method against other training approaches on the same base model:
Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch Size | 2 |
| Gradient Accumulation | 8 |
| Effective Batch Size | 16 |
| Learning Rate | 2e-4 |
| Max Sequence Length | 512 |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Dataset | Combined Preference Dataset (HH-RLHF + SHP + OpenAssistant) |
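The effective batch size in the table is the per-device batch size multiplied by the gradient accumulation steps. A minimal sketch of that arithmetic follows. `TrainConfig` is a hypothetical container introduced here for illustration, not part of the training code, and the single-device setting is an assumption, since the card does not state the device count. The `lora_scale` property uses the conventional LoRA scaling factor alpha / r.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container mirroring the table above (not the author's code)."""
    epochs: int = 1
    batch_size: int = 2          # per-device batch size
    grad_accum_steps: int = 8
    learning_rate: float = 2e-4
    max_seq_len: int = 512
    lora_rank: int = 16
    lora_alpha: int = 32
    num_devices: int = 1         # assumption: single device

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step.
        return self.batch_size * self.grad_accum_steps * self.num_devices

    @property
    def lora_scale(self) -> float:
        # Conventional LoRA scaling applied to the low-rank update: alpha / r.
        return self.lora_alpha / self.lora_rank


cfg = TrainConfig()
print(cfg.effective_batch_size)  # 16
print(cfg.lora_scale)            # 2.0
```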
Combined Preference Dataset (kto_combined)
Training uses a Combined Preference Dataset built via Round-Robin Sampling from three sources:
| Source | Total Samples | Interactions |
|---|---|---|
| Anthropic HH-RLHF | 321,600 | 61,568 |
| Stanford Human Preferences (SHP) | 697,436 | 38,984 |
| OpenAssistant Conversations v1 | 16,810 | 8,904 |
| Total | 1,035,846 | 109,456 |
Actual Training Statistics (subset split train_prefs[:32090]):
- Training samples: 13,300 (paired examples)
- Validation samples: 700 (5%)
- Round-Robin distribution: 1,130 interactions per source
- Seed: 42 (for reproducibility)
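Round-robin sampling and the seeded 95/5 split can be sketched as below. This is an illustration under stated assumptions, not the author's pipeline: the helper names `round_robin` and `train_val_split` are hypothetical, the three lists are toy stand-ins for the real sources, and shuffling before the split is assumed. The figure of 14,000 is the sum of the reported training and validation counts.

```python
import random
from itertools import zip_longest


def round_robin(sources):
    """Interleave items from several sources, taking one from each in turn."""
    sentinel = object()
    merged = []
    for group in zip_longest(*sources, fillvalue=sentinel):
        # Skip the padding that zip_longest adds for exhausted sources.
        merged.extend(item for item in group if item is not sentinel)
    return merged


def train_val_split(items, val_fraction=0.05, seed=42):
    """Shuffle with a fixed seed, then hold out a validation fraction."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy stand-ins for the three sources (hypothetical data).
hh = ["hh-0", "hh-1", "hh-2"]
shp = ["shp-0", "shp-1"]
oasst = ["oa-0", "oa-1", "oa-2"]

mixed = round_robin([hh, shp, oasst])
print(mixed)
# ['hh-0', 'shp-0', 'oa-0', 'hh-1', 'shp-1', 'oa-1', 'hh-2', 'oa-2']

train, val = train_val_split(range(14_000))
print(len(train), len(val))  # 13300 700
```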
Usage
Direct Loading (Merged Model)
from transformers import AutoModelForCausalLM, AutoTokenizer
# Hub repository ID of the merged model
model_id = "Nishef/Qwen3-0.6B-Full_KTO_20251225_102050-merged"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Training Methodology
KTO
Kahneman-Tversky Optimization - Binary preference optimization based on Prospect Theory
Key Features:
- Binary feedback signals (thumbs up/down)
- No need for paired preference data
- Reference model for KL divergence regularization
- Prospect Theory-inspired loss function
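The features above can be illustrated with a per-example sketch of the KTO loss as described in the KTO paper. This is not the training code used for this model: the function name `kto_loss` is hypothetical, and the values of `beta`, `lambda_d`, and `lambda_u` are illustrative, since the card does not report them. Each example carries only a binary desirable/undesirable label, and the reference point `kl_ref` stands in for the batch-level KL estimate between the policy and the reference model.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(policy_logp, ref_logp, desirable, kl_ref,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Per-example KTO loss (illustrative sketch).

    policy_logp, ref_logp: summed log-probabilities of the completion under
        the policy and the frozen reference model.
    desirable: True for a thumbs-up example, False for thumbs-down.
    kl_ref: reference point z0, a non-negative KL estimate.
    """
    reward = policy_logp - ref_logp  # implied reward: log(pi / pi_ref)
    if desirable:
        value = lambda_d * sigmoid(beta * (reward - kl_ref))
        return lambda_d - value
    value = lambda_u * sigmoid(beta * (kl_ref - reward))
    return lambda_u - value


# With zero reward and zero reference point the loss sits at its midpoint.
print(kto_loss(-10.0, -10.0, desirable=True, kl_ref=0.0))  # 0.5

# Raising the policy log-probability lowers the loss for a desirable
# example and raises it for an undesirable one.
print(kto_loss(-8.0, -10.0, desirable=True, kl_ref=0.0) < 0.5)   # True
print(kto_loss(-8.0, -10.0, desirable=False, kl_ref=0.0) > 0.5)  # True
```

Because each example is scored on its own against the reference point, no chosen/rejected pairing is required, which is what distinguishes this objective from pairwise methods.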
Citation
If you use this model in your research, please cite:
@misc{qwen3_0.6b_kto_2025,
title = {Qwen3-0.6B Fine-tuned with KTO},
author = {Thesis Research},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/Nishef/Qwen3-0.6B-Full_KTO_20251225_102050}
}
Repository Structure
.
├── adapter_config.json # LoRA configuration
├── adapter_model.safetensors # Model weights
├── tokenizer files # Tokenizer configuration
├── eval_summary.csv # Evaluation results
├── thesis_plots/ # Visualization assets
│ ├── benchmark_results.png
│ └── training_loss.png
└── README.md # This file
Acknowledgments
- Base Model: Qwen/Qwen3-0.6B
- Training Framework: Hugging Face Transformers
- Fine-tuning Library: PEFT
License
This model is released under the Apache 2.0 license.
