metadata
language:
  - uz
  - en
license: cc-by-nc-4.0
datasets:
  - yakhyo/uz-wiki
  - tahrirchi/uz-books-v2
  - tahrirchi/uz-crawl
  - saillab/alpaca_uzbek_taco
  - behbudiy/alpaca-cleaned-uz
  - UAzimov/uzbek-instruct-llm
  - CohereLabs/aya_collection_language_split
  - med-alex/qa_mt_ru_to_uzn
  - med-alex/qa_mt_tr_to_uzn
library_name: gguf
pipeline_tag: text-generation
base_model: inspirebek/qwen3-4b-uzbek-v2
tags:
  - uzbek
  - qwen3
  - quantized
  - gguf
  - llama.cpp
  - ollama

qwen3-4b-uzbek-v2-gguf

gguf quantizations of inspirebek/qwen3-4b-uzbek-v2 for local inference on cpu, apple silicon, vulkan, or rocm backends via llama.cpp, ollama, lm studio, and other gguf-compatible runtimes.

files

| quant | size | notes |
| --- | --- | --- |
| f16 | 8.8 GB | reference fp16 |
| Q8_0 | 4.7 GB | near-lossless |
| Q6_K | 3.6 GB | recommended for quality |
| Q5_K_M | 3.2 GB | balanced |
| Q5_K_S | 3.1 GB | slightly lighter |
| Q4_K_M | 2.7 GB | recommended for most users |
| Q4_K_S | 2.6 GB | smaller, slight quality loss |
| Q3_K_M | 2.2 GB | aggressive |
| Q2_K | 1.8 GB | edge / low-ram only |
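
picking a quant by memory budget can be sketched as a small shell helper. this is only an illustration: `pick_quant` is a hypothetical name, the sizes are copied from the list above, and the budget you pass should already leave headroom for the kv cache and runtime overhead, which the file size does not include.

```shell
#!/bin/sh
# hypothetical helper (not part of this repo): print the largest quant
# whose file size fits within a budget given in GB.
# sizes are the file sizes listed in this card; they exclude kv cache
# and runtime overhead, so pass a budget below your free memory.
pick_quant() {
    budget_gb="$1"
    # rows are ordered largest to smallest, so the first fit is the best fit.
    # exits non-zero when nothing fits.
    awk -v budget="$budget_gb" '
        $2 + 0 <= budget + 0 { print $1; found = 1; exit }
        END { if (!found) exit 1 }
    ' <<'EOF'
f16 8.8
Q8_0 4.7
Q6_K 3.6
Q5_K_M 3.2
Q5_K_S 3.1
Q4_K_M 2.7
Q4_K_S 2.6
Q3_K_M 2.2
Q2_K 1.8
EOF
}
```

for example, `pick_quant 3.0` prints `Q4_K_M`, and `pick_quant 1` prints nothing and returns a non-zero status because even Q2_K does not fit.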

usage

llama.cpp:

llama-cli -m qwen3-4b-uzbek-v2-q4_k_m.gguf -p "Salom! Qalaysan?" -cnv

ollama:

ollama run hf.co/inspirebek/qwen3-4b-uzbek-v2-GGUF:Q4_K_M

quantization

converted from the bf16 merged model with llama.cpp's convert_hf_to_gguf.py, then quantized with llama-quantize. no calibration data was used: k-quants are computed from the weights alone.
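
the two-step flow above might look roughly like the sketch below. everything except the two tool names is an assumption: `MODEL_DIR`, `OUT_PREFIX`, the output file names, and the choice to quantize from an f16 intermediate are illustrative and not confirmed by this card. the script defaults to a dry run that only prints the commands; set `DRY_RUN=0` to execute them from a llama.cpp checkout.

```shell
#!/bin/sh
# sketch of a convert-then-quantize loop; paths and names are assumptions.
MODEL_DIR="${MODEL_DIR:-./qwen3-4b-uzbek-v2}"   # assumed local path to the merged bf16 checkpoint
OUT_PREFIX="${OUT_PREFIX:-qwen3-4b-uzbek-v2}"   # assumed output file prefix
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands, 0 = run them

# print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

quantize_all() {
    # step 1: convert the hugging face checkpoint to an f16 gguf reference file.
    run python convert_hf_to_gguf.py "$MODEL_DIR" --outtype f16 --outfile "$OUT_PREFIX-f16.gguf"

    # step 2: produce one file per quant type; no imatrix / calibration data is passed.
    for q in Q8_0 Q6_K Q5_K_M Q5_K_S Q4_K_M Q4_K_S Q3_K_M Q2_K; do
        lower=$(printf '%s' "$q" | tr '[:upper:]' '[:lower:]')
        run llama-quantize "$OUT_PREFIX-f16.gguf" "$OUT_PREFIX-$lower.gguf" "$q"
    done
}
```

calling `quantize_all` with the defaults prints one convert command and eight `llama-quantize` commands, which is a cheap way to review the plan before running it.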

datasets

stage a — fluency (continued pretraining):

stage b — instruct (sft):

⚠️ licensing note: saillab/alpaca_uzbek_taco is cc-by-nc-4.0, which restricts commercial use of derivative models. downstream users who need a fully permissive license should retrain without that subset.

sibling formats