---
language:
- uz
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: inspirebek/qwen3-4b-uzbek-v2
tags:
- uzbek
- qwen3
- quantized
- 4-bit
- awq
---

# qwen3-4b-uzbek-v2-awq

AWQ 4-bit activation-aware quant (~3.4 GB) of [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2). Fast GPU inference via vLLM / TGI / transformers.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("inspirebek/qwen3-4b-uzbek-v2-awq")
model = AutoModelForCausalLM.from_pretrained(
    "inspirebek/qwen3-4b-uzbek-v2-awq",
    device_map="auto",
)
```

With vLLM:

```bash
vllm serve inspirebek/qwen3-4b-uzbek-v2-awq --quantization awq --dtype float16
```

## Quantization

- method: AWQ (`autoawq` 0.2.9, GEMM version)
- `w_bit=4, q_group_size=128, zero_point=True`
- calibration: 128 Uzbek samples (2048 tokens each) from `fluency.jsonl`

## Sibling formats

- [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2)
- [`inspirebek/qwen3-4b-uzbek-v2-lora`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-lora)
- [`inspirebek/qwen3-4b-uzbek-v2-bnb-4bit`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-bnb-4bit)
- [`inspirebek/qwen3-4b-uzbek-v2-awq`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-awq)
- [`inspirebek/qwen3-4b-uzbek-v2-GGUF`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-GGUF)
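
## Reproducing the quantization (sketch)

The settings listed under Quantization could be reproduced roughly as follows. This is a minimal sketch and not the exact script used for this repo: the helper names `load_calibration` and `quantize` are hypothetical, and the `text` field name is an assumption, since the schema of `fluency.jsonl` is not documented here. The `autoawq` calls are the library's usual `from_pretrained` / `quantize` / `save_quantized` flow.

```python
import json
from pathlib import Path

# Settings copied from the Quantization section of this card.
QUANT_CONFIG = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}


def load_calibration(path, field="text", limit=128):
    """Read up to `limit` non-empty strings from a JSONL file.

    The `text` field name is an assumption about fluency.jsonl.
    """
    samples = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            value = json.loads(line).get(field)
            if isinstance(value, str) and value.strip():
                samples.append(value)
            if len(samples) >= limit:
                break
    return samples


def quantize(base_id, out_dir, calib_path):
    """Quantize `base_id` with AWQ and write the result to `out_dir`."""
    # Imported lazily so load_calibration works without autoawq installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model.quantize(
        tokenizer,
        quant_config=QUANT_CONFIG,
        calib_data=load_calibration(calib_path),  # list of raw strings
        max_calib_samples=128,
        max_calib_seq_len=2048,
    )
    model.save_quantized(out_dir)
    tokenizer.save_pretrained(out_dir)
```

A call such as `quantize("inspirebek/qwen3-4b-uzbek-v2", "qwen3-4b-uzbek-v2-awq", "fluency.jsonl")` needs a CUDA GPU and `autoawq` installed; the output directory name here is only an example.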