---
language:
- kk
- ru
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- kazakh
- multilingual
- instruction-tuning
- tool-calling
- function-calling
- agent
- conversational
base_model: nur-dev/farabi-0.6B-base
license: apache-2.0
---

# Farabi-0.6B

**Farabi-0.6B** is a compact, multilingual instruction-tuned language model with a primary focus on **Kazakh**, alongside strong **Russian** and **English** support. It is designed for everyday assistant use, reasoning, retrieval-grounded answering, and **tool / function calling** in agentic applications.

The model speaks fluent Kazakh and is intended to make high-quality conversational AI more accessible for the Kazakh language, where well-aligned models remain scarce.

Created by **[Nurgali Kadyrbek](https://www.linkedin.com/in/nurgali-kadyrbek-504260231/)**. It is built on **[`nur-dev/farabi-0.6B-base`](https://huggingface.co/nur-dev/farabi-0.6B-base)**, a Kazakh-adapted base model that was itself continually pre-trained from Qwen3-0.6B, and then instruction-tuned to produce this assistant.

---

## Highlights

- 🇰🇿 **Kazakh-first**: the majority of the instruction data is native Kazakh, with Russian and English mixed in for cross-lingual robustness.
- 🧠 **Reasoning**: supports an optional step-by-step "thinking" mode that can be toggled on or off at request time.
- 🔧 **Tool calling**: emits Hermes-style `<tool_call>` blocks and is compatible with the OpenAI-style function-calling interface and agent frameworks.
- 📚 **Grounded answering**: trained to answer from provided documents and context, including longer inputs.
- 🪶 **Small & deployable**: 0.6B parameters, runs comfortably on a single modest GPU.

---

## Languages

| Language | Approx. share of instruction data |
|----------|-----------------------------------|
| Kazakh (kk) | ~56% |
| English (en) | ~33% |
| Russian (ru) | ~10% |

---

## Data coverage by domain

The model was instruction-tuned on a broad, internally curated mixture.
Described in general terms (no technical specifics), the approximate domain composition is:

| Domain | Approx. share |
|--------|---------------|
| General instruction following & multi-turn conversation | ~45% |
| Reasoning & step-by-step problem solving | ~27% |
| Retrieval-grounded answering, long context & document Q&A | ~13% |
| Tool use, function calling & agentic interaction | ~7% |
| Knowledge, culture, news & encyclopedic content | ~4% |
| Mathematics, language tasks (grammar / translation), safety & appropriate refusal, device & environment control, and assistant identity | ~4% |

*Shares are approximate and reflect general domain proportions rather than exact figures.*

---

## Data provenance & acknowledgments

The training datasets were **created internally by the author**, including original synthesis as well as additionally processed and enriched material.

Approximately **5.4%** of all data used for instruction tuning was derived (with additional processing and enrichment) from resources of two organizations, whose contributions to the Kazakh language are gratefully acknowledged:

1. **Институт языкознания имени А. Байтурсынова** (*Institute of Linguistics named after A. Baitursynov*)
2. **ННПЦ «Тіл-Қазына» имени Шайсултана Шаяхметова** (*Sh. Shayakhmetov National Research and Practical Center "Til-Qazyna"*)

---

## Recommended sampling parameters

A good starting point for general use:

```json
{
  "temperature": 0.15,
  "top_p": 0.95,
  "max_tokens": 1024,
  "repetition_penalty": 1.05,
  "stream": true,
  "chat_template_kwargs": {
    "enable_thinking": true
  },
  "continue_final_message": true
}
```

Set `"enable_thinking": false` to get direct answers without an explicit reasoning step. Raise `temperature` for more creative / open-ended generation.
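
The two adjustments described above (turning thinking off, raising the temperature) can be wrapped in a small helper. This is a minimal sketch, assuming an OpenAI-compatible server that accepts `repetition_penalty` and `chat_template_kwargs` in the request body. `build_request_kwargs` is a hypothetical name, and the `0.7` temperature is an arbitrary illustrative value, not a recommendation from the author.

```python
from typing import Any


def build_request_kwargs(
    enable_thinking: bool = True,
    temperature: float = 0.15,
) -> dict[str, Any]:
    """Return keyword arguments for an OpenAI-style chat completion call.

    Hypothetical helper: fields that are not standard OpenAI arguments
    are grouped under `extra_body` so the server receives them as-is.
    """
    return {
        "temperature": temperature,
        "top_p": 0.95,
        "max_tokens": 1024,
        "extra_body": {
            "repetition_penalty": 1.05,
            "chat_template_kwargs": {"enable_thinking": enable_thinking},
        },
    }


# Default: recommended sampling with the reasoning step enabled.
reasoning = build_request_kwargs()
# Direct answers without an explicit reasoning step.
direct = build_request_kwargs(enable_thinking=False)
# A warmer setting for open-ended generation (0.7 is illustrative only).
creative = build_request_kwargs(temperature=0.7)
```

The resulting dictionary can be unpacked into a chat completion call, for example `client.chat.completions.create(model=..., messages=..., **direct)`.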

---

## Serving with vLLM

Start an OpenAI-compatible server with tool-calling enabled:

```bash
vllm serve nur-dev/farabi-0.6B \
  --served-model-name farabi-0.6b \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Query it with the standard OpenAI client (and the recommended sampling params):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="farabi-0.6b",
    messages=[
        # "You are a helpful and accurate assistant."
        {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
        # "Tell me briefly about Almaty."
        {"role": "user", "content": "Алматы туралы қысқаша айтып бер."},
    ],
    temperature=0.15,
    top_p=0.95,
    max_tokens=1024,
    # Fields outside the standard OpenAI arguments go through extra_body.
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": True},
    },
    stream=True,
)

# Print tokens as they arrive.
for chunk in resp:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Tool calling works through the standard `tools=[...]` argument: the model returns function calls that the server parses into structured `tool_calls`.
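
The `tools=[...]` flow can be sketched as follows. This is a minimal illustration, not the author's reference code: the `get_weather` tool, its schema, `REGISTRY`, `run_tool_call`, and `ask_with_tools` are all hypothetical names, and `client` is assumed to be an OpenAI-compatible client pointed at the server started above. Because the evaluation section reports that the model tends to emit a call even when no tool fits, the dispatcher rejects unknown tool names and malformed arguments instead of trusting the output.

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> str:
    """Stub implementation; a real tool would query a weather service."""
    return f"weather for {city}: (stub)"


# Local functions the model is allowed to trigger, keyed by tool name.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one parsed tool call; `arguments` is a JSON string."""
    if name not in REGISTRY:
        return f"error: unknown tool {name!r}"
    try:
        kwargs = json.loads(arguments)
        return REGISTRY[name](**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        return f"error: bad arguments for {name!r}: {exc}"


def ask_with_tools(client, question: str) -> list[str]:
    """Send one question with tools attached and run any returned calls."""
    resp = client.chat.completions.create(
        model="farabi-0.6b",
        messages=[{"role": "user", "content": question}],
        tools=TOOLS,
        temperature=0.15,
    )
    # `tool_calls` is None when the model answers in plain text.
    calls = resp.choices[0].message.tool_calls or []
    return [run_tool_call(c.function.name, c.function.arguments) for c in calls]
```

In a full agent loop, each result would be appended to `messages` as a `tool` role message and the request repeated so the model can produce its final answer.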

---

## Serving with PyTorch / Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nur-dev/farabi-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # "You are a helpful and accurate assistant."
    {"role": "system", "content": "Сіз пайдалы әрі дәл көмекшісіз."},
    # "Which city is the capital of Kazakhstan?"
    {"role": "user", "content": "Қазақстанның астанасы қай қала?"},
]

# return_dict=True yields input_ids and attention_mask together, so the
# call behaves the same regardless of the library's default return type.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for direct answers
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.15,
    top_p=0.95,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```

---

## Evaluation

> ⚠️ **Interim results.** The numbers below were measured on an early checkpoint
> (~17% through instruction tuning). They are expected to improve as training
> continues, but already show meaningful capability.

### Tool / function calling: BFCL v4

Berkeley Function-Calling Leaderboard (v4), 1,040 cases, evaluated with the HuggingFace backend.

| Category | Accuracy | Correct / total | What it measures |
|----------|----------|-----------------|------------------|
| Simple | 80.5% | 322/400 | one call, one tool available |
| Multiple | 71.5% | 143/200 | pick the right tool from several |
| Parallel | 65.5% | 131/200 | several calls in one turn |
| Irrelevance | 5.4% | 13/240 | abstain when no tool fits |
| **Overall** | **58.6%** | 609/1040 | |
| **Function-calling avg** | **74.5%** | 596/800 | excludes irrelevance |

**Takeaways:**

- **Strong calling ability for a 0.6B model.** When a call is warranted, it is correct ~74.5% of the time (right tool, valid arguments, clean JSON), including 65.5% on the hard parallel / multi-call category.
- **The weakness is abstention, not calling.** On queries that match no available tool, the model still tends to emit a call: it abstains correctly in only 5.4% of irrelevance cases (13/240). This over-triggering is the main driver of the lower overall score and the clearest area for improvement.

### Multilingual comprehension: 4-way multiple choice

Multiple-choice comprehension across the model's three languages (random baseline = 25%), evaluated with the chat template and `enable_thinking=False`.

| Language | Accuracy |
|----------|----------|
| English | 53.7% ±1.7 |
| Russian | 50.0% ±1.7 |
| Kazakh | 41.8% ±1.6 |

**Takeaways:**

- All three languages score well above the 25% random baseline, from about 17 points above chance for Kazakh to about 29 points for English.
- The ordering by resource level (en > ru > kk) is as expected.
- Evaluating with the chat template and `enable_thinking=False` adds ~5–6 points per language versus a raw prompt, which is another reason to serve the model with its chat template (see serving instructions above).

---

## Intended use & limitations

Farabi-0.6B is intended as a helpful general-purpose and agentic assistant, with a focus on Kazakh-language use cases. As a small model, it can make factual mistakes, so outputs should be verified in high-stakes or accuracy-critical applications. It should be used responsibly and in accordance with applicable laws and the base model's license.

---

## Citation

If you use this model, please credit the author:

> Nurgali Kadyrbek, Farabi-0.6B.
> https://www.linkedin.com/in/nurgali-kadyrbek-504260231/