---
language:
- kk
- ru
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- kazakh
- multilingual
- instruction-tuning
- tool-calling
- function-calling
- agent
- conversational
base_model: nur-dev/farabi-0.6B-base
license: apache-2.0
---

# Farabi-0.6B

**Farabi-0.6B** is a compact, multilingual instruction-tuned language model with a
primary focus on **Kazakh**, alongside strong **Russian** and **English** support.
It is designed for everyday assistant use, reasoning, retrieval-grounded answering,
and **tool / function calling** in agentic applications.

The model is tuned to respond in fluent Kazakh and is intended to make high-quality
conversational AI more accessible for the Kazakh language, where well-aligned models remain scarce.

Created by **[Nurgali Kadyrbek](https://www.linkedin.com/in/nurgali-kadyrbek-504260231/)**.

It is built on **[`nur-dev/farabi-0.6B-base`](https://huggingface.co/nur-dev/farabi-0.6B-base)** —
a Kazakh-adapted base model that was itself continually pre-trained from Qwen3-0.6B — and then
instruction-tuned to produce this assistant.

---

## Highlights

- 🇰🇿 **Kazakh-first** — the majority of the instruction data is native Kazakh, with
  Russian and English mixed in for cross-lingual robustness.
- 🧠 **Reasoning** — supports an optional step-by-step "thinking" mode that can be toggled
  on or off at request time.
- 🔧 **Tool calling** — emits Hermes-style `<tool_call>` blocks and is compatible with
  the OpenAI-style function-calling interface and agent frameworks.
- 📚 **Grounded answering** — trained to answer from provided documents and context,
  including longer inputs.
- 🪶 **Small & deployable** — 0.6B parameters, runs comfortably on a single modest GPU.

---

## Languages

| Language | Approx. share of instruction data |
|----------|-----------------------------------|
| Kazakh (kk)  | ~56% |
| English (en) | ~33% |
| Russian (ru) | ~10% |

---

## Data coverage by domain

The model was instruction-tuned on a broad, internally curated mixture. The approximate
domain composition, described in general terms only, is:

| Domain | Approx. share |
|--------|---------------|
| General instruction following & multi-turn conversation | ~45% |
| Reasoning & step-by-step problem solving | ~27% |
| Retrieval-grounded answering, long context & document Q&A | ~13% |
| Tool use, function calling & agentic interaction | ~7% |
| Knowledge, culture, news & encyclopedic content | ~4% |
| Mathematics, language tasks (grammar / translation), safety & appropriate refusal, device & environment control, and assistant identity | ~4% |

*Shares are approximate and reflect general domain proportions rather than exact figures.*

---

## Data provenance & acknowledgments

The training datasets were **created internally by the author**, including original
synthesis as well as additionally processed and enriched material.

Approximately **5.4%** of all data used for instruction tuning was derived (with
additional processing and enrichment) from resources of two organizations, whose
contributions to the Kazakh language are gratefully acknowledged:

1. **Институт языкознания имени А. Байтурсынова** — *Institute of Linguistics named after A. Baitursynov*
2. **ННПЦ «Тіл-Қазына» имени Шайсултана Шаяхметова** — *Sh. Shayakhmetov National Research and Practical Center "Til-Qazyna"*

---

## Recommended sampling parameters

A good starting point for general use:

```json
{
  "temperature": 0.15,
  "top_p": 0.95,
  "max_tokens": 1024,
  "repetition_penalty": 1.05,
  "stream": true,
  "chat_template_kwargs": {
    "enable_thinking": true
  },
  "continue_final_message": true
}
```

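Set `"enable_thinking": false` to get direct answers without an explicit reasoning step.
Raise `temperature` for more creative or open-ended generation.

`stream`, `chat_template_kwargs`, and `continue_final_message` are request options rather
than sampling settings, and `repetition_penalty`, `chat_template_kwargs`, and
`continue_final_message` are vLLM extensions to the OpenAI API. In vLLM,
`continue_final_message` is meant for prefilling: it continues a partial assistant message
at the end of `messages` and cannot be combined with `add_generation_prompt`, so leave it
out for ordinary requests that end with a user turn.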
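The toggles described above can be wrapped in a small helper. This is a minimal sketch, assuming an OpenAI-compatible endpoint that accepts `chat_template_kwargs` and `repetition_penalty` through `extra_body` (as vLLM does). The names `RECOMMENDED` and `build_request` are hypothetical and not part of the model or any library.

```python
# Values copied from the recommended settings above.
RECOMMENDED = {
    "temperature": 0.15,
    "top_p": 0.95,
    "max_tokens": 1024,
}


def build_request(messages, *, thinking=True, temperature=None):
    """Return keyword arguments for client.chat.completions.create().

    Hypothetical helper: `thinking` maps to `enable_thinking`, and
    `temperature` overrides the recommended default when given.
    """
    params = dict(RECOMMENDED)
    if temperature is not None:
        params["temperature"] = temperature
    params["messages"] = messages
    # Non-standard options travel in extra_body with the OpenAI client.
    params["extra_body"] = {
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    return params


# Direct answer, slightly more creative than the default.
request = build_request(
    [{"role": "user", "content": "Hello!"}],
    thinking=False,
    temperature=0.7,
)
print(request["temperature"], request["extra_body"]["chat_template_kwargs"])
```

The resulting dictionary can be passed as `client.chat.completions.create(model=..., **request)`.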

---

## Serving with vLLM

Start an OpenAI-compatible server with tool-calling enabled:

```bash
vllm serve nur-dev/farabi-0.6B \
  --served-model-name farabi-0.6b \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Query it with the standard OpenAI client (and the recommended sampling params):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="farabi-0.6b",
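    messages=[
        # System prompt: "You are a helpful and accurate assistant."
        {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
        # User prompt: "Tell me briefly about Almaty."
        {"role": "user", "content": "Алматы туралы қысқаша айтып бер."},
    ],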
    temperature=0.15,
    top_p=0.95,
    max_tokens=1024,
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": True},
    },
    stream=True,
)
for chunk in resp:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Tool calling works through the standard `tools=[...]` argument — the model returns
function calls that the server parses into structured `tool_calls`.
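A minimal sketch of that flow, assuming the vLLM server from the command above is running on `localhost:8000`. The `get_weather` tool, `REGISTRY`, `run_tool_call`, and `ask` are hypothetical names introduced only for illustration. The dispatcher returns an error payload for unknown tool names instead of raising, which is a cautious choice given the over-triggering reported in the evaluation section.

```python
import json

# Hypothetical tool, described in the OpenAI function-calling schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "unknown"}


REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one parsed tool call; `arguments` is a JSON string."""
    func = REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(func(**json.loads(arguments)), ensure_ascii=False)


def ask(question: str) -> None:
    # Imported here so the helpers above work without the openai package.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="farabi-0.6b",
        messages=[{"role": "user", "content": question}],
        tools=TOOLS,
        temperature=0.15,
    )
    # tool_calls is None when the model answers in plain text.
    for call in resp.choices[0].message.tool_calls or []:
        result = run_tool_call(call.function.name, call.function.arguments)
        print(call.function.name, result)
```

Calling `ask("What is the weather in Almaty?")` sends the request; in a full agent loop each result would be appended as a `tool` message and the conversation sent back to the model.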

---

## Serving with PyTorch / Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

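messages = [
    # System prompt: "You are a helpful and accurate assistant."
    {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
    # User prompt: "Which city is the capital of Kazakhstan?"
    {"role": "user", "content": "Қазақстанның астанасы қай қала?"},
]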

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,        # set False for direct answers
    return_dict=True,            # return input_ids together with attention_mask
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.15,
    top_p=0.95,
    repetition_penalty=1.05,
)
# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```
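When generating with Transformers directly, no server turns the model's output into structured `tool_calls`, so the Hermes-style `<tool_call>` blocks have to be parsed from the decoded text. The sketch below assumes each block wraps a single JSON object with `name` and `arguments` keys and that the tags survive decoding. `extract_tool_calls` is a hypothetical helper, not part of Transformers.

```python
import json
import re

# Matches the JSON payload between <tool_call> and </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
    return calls


sample = (
    "Let me check.\n<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Almaty"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
# [{'name': 'get_weather', 'arguments': {'city': 'Almaty'}}]
```

Malformed payloads are dropped silently here; a stricter caller might log them or ask the model to retry.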

---

## Evaluation

> ⚠️ **Interim results.** The numbers below were measured on an early checkpoint
> (~17% through instruction tuning). They are expected to improve as training
> continues, but already show meaningful capability.

### Tool / function calling — BFCL v4

Berkeley Function-Calling Leaderboard (v4), 1,040 cases across four categories,
evaluated with the Hugging Face backend.

| Category | Accuracy | Correct / total | What it measures |
|----------|----------|---|------------------|
| Simple             | 80.5% | 322/400 | one call, one tool available |
| Multiple           | 71.5% | 143/200 | pick the right tool from several |
| Parallel           | 65.5% | 131/200 | several calls in one turn |
| Irrelevance        | 5.4%  | 13/240  | abstain when no tool fits |
| **Overall**            | **58.6%** | 609/1040 | |
| **Function-calling avg** | **74.5%** | 596/800 | pooled over Simple, Multiple, and Parallel; excludes Irrelevance |

**Takeaways:**
- **Strong calling ability for a 0.6B model.** When a call is warranted it is correct
  ~74.5% of the time — right tool, valid arguments, clean JSON — including 65.5% on the
  hard parallel / multi-call category.
- **The weakness is abstention, not calling.** On queries that match no available tool,
  the model still tends to emit a call: it abstains correctly in only 5.4% of irrelevance
  cases (13/240). This over-triggering is the main driver of the lower overall score and
  the clearest area for improvement.
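The aggregate rows can be reproduced from the per-category counts. The sketch below uses only the numbers in the table; `pooled_accuracy` is an illustrative helper. It also prints the unweighted mean of the three calling categories, which is not reported in the table and is shown only to make clear that the 74.5% figure is pooled over cases.

```python
# (correct, total) counts copied from the BFCL v4 table above.
counts = {
    "simple": (322, 400),
    "multiple": (143, 200),
    "parallel": (131, 200),
    "irrelevance": (13, 240),
}


def pooled_accuracy(names):
    """Accuracy over all cases in the named categories (micro average)."""
    correct = sum(counts[n][0] for n in names)
    total = sum(counts[n][1] for n in names)
    return correct / total


calling = ["simple", "multiple", "parallel"]

overall = pooled_accuracy(counts)        # all four categories
calling_avg = pooled_accuracy(calling)   # irrelevance excluded
# Unweighted mean of per-category accuracies, for comparison only.
macro = sum(counts[n][0] / counts[n][1] for n in calling) / len(calling)

print(f"overall      {overall:.1%}")      # 58.6%
print(f"calling avg  {calling_avg:.1%}")  # 74.5%
print(f"macro avg    {macro:.1%}")        # 72.5%
```

The roughly 16-point gap between the overall score and the calling average comes entirely from the 240 irrelevance cases.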

### Multilingual comprehension — 4-way multiple choice

Multiple-choice comprehension across the model's three languages (random baseline = 25%),
evaluated with the chat template and `enable_thinking=False`.

| Language | Accuracy |
|----------|----------|
| English  | 53.7% ±1.7 |
| Russian  | 50.0% ±1.7 |
| Kazakh   | 41.8% ±1.6 |

**Takeaways:**
- Well above the 25% random baseline in all three languages — real comprehension in
  English, Russian, and Kazakh.
- Resource ordering (en > ru > kk) is as expected; Kazakh at 41.8% is still 16.8 points above the random baseline.
- Evaluating with the chat template and `enable_thinking=False` adds ~5–6 points per
  language versus a raw prompt — another reason to serve the model with its chat template
  (see serving instructions above).

---

## Intended use & limitations

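Farabi-0.6B is intended as a helpful general-purpose and agentic assistant, with a
focus on Kazakh-language use cases. As a small model, it can make factual mistakes,
so outputs should be verified before use in high-stakes or fact-critical applications.
In agentic settings, validate tool calls before executing them: the interim evaluation
above shows that the model often emits a call even when no available tool fits. It
should be used responsibly and in accordance with applicable laws and the base model's
license.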

---

## Citation

If you use this model, please credit the author:

> Nurgali Kadyrbek — Farabi-0.6B.
> https://www.linkedin.com/in/nurgali-kadyrbek-504260231/