inspirebek commited on
Commit
5f19f5e
·
verified ·
1 Parent(s): b87fb51

docs: add model card

Browse files
Files changed (1) hide show
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - uz
4
+ - en
5
+ license: apache-2.0
6
+ library_name: transformers
7
+ pipeline_tag: text-generation
8
+ base_model: inspirebek/qwen3-4b-uzbek-v2
9
+ tags:
10
+ - uzbek
11
+ - qwen3
12
+ - quantized
13
+ - 4-bit
14
+ - awq
15
+ ---
16
+
17
+ # qwen3-4b-uzbek-v2-awq
18
+
19
+ AWQ 4-bit activation-aware quant (~3.4 GB) of [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2). Fast GPU inference via vLLM / TGI / transformers.
20
+
21
+ ## Usage
22
+
23
+ ```python
24
+ from transformers import AutoModelForCausalLM, AutoTokenizer
25
+
26
+ tok = AutoTokenizer.from_pretrained("inspirebek/qwen3-4b-uzbek-v2-awq")
27
+ model = AutoModelForCausalLM.from_pretrained(
28
+ "inspirebek/qwen3-4b-uzbek-v2-awq",
29
+ device_map="auto",
30
+ )
31
+ ```
32
+
33
+ With vLLM:
34
+
35
+ ```bash
36
+ vllm serve inspirebek/qwen3-4b-uzbek-v2-awq --quantization awq --dtype float16
37
+ ```
38
+
39
+ ## Quantization
40
+
41
+ - method: AWQ (`autoawq` 0.2.9, GEMM version)
42
+ - `w_bit=4, q_group_size=128, zero_point=True`
43
+ - calibration: 128 Uzbek samples (2048 tokens each) from `fluency.jsonl`
44
+
45
+ ## Sibling formats
46
+
47
+ - [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2)
48
+ - [`inspirebek/qwen3-4b-uzbek-v2-lora`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-lora)
49
+ - [`inspirebek/qwen3-4b-uzbek-v2-bnb-4bit`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-bnb-4bit)
50
+ - [`inspirebek/qwen3-4b-uzbek-v2-awq`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-awq)
51
+ - [`inspirebek/qwen3-4b-uzbek-v2-GGUF`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-GGUF)