inspirebek committed
Commit 93de1df · verified · 1 Parent(s): 5f19f5e

docs: add model card

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -16,9 +16,9 @@ tags:
 
 # qwen3-4b-uzbek-v2-awq
 
-AWQ 4-bit activation-aware quant (~3.4 GB) of [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2). Fast GPU inference via vLLM / TGI / transformers.
+awq 4-bit activation-aware quant (~3.4 gb) of [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2). fast gpu inference via vllm / tgi / transformers.
 
-## Usage
+## usage
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -30,19 +30,19 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 ```
 
-With vLLM:
+with vllm:
 
 ```bash
 vllm serve inspirebek/qwen3-4b-uzbek-v2-awq --quantization awq --dtype float16
 ```
 
-## Quantization
+## quantization
 
-- method: AWQ (`autoawq` 0.2.9, GEMM version)
+- method: awq (`autoawq` 0.2.9, gemm version)
 - `w_bit=4, q_group_size=128, zero_point=True`
-- calibration: 128 Uzbek samples (2048 tokens each) from `fluency.jsonl`
+- calibration: 128 uzbek samples (2048 tokens each) from `fluency.jsonl`
 
-## Sibling formats
+## sibling formats
 
 - [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2)
 - [`inspirebek/qwen3-4b-uzbek-v2-lora`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2-lora)
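
Note on the quantization settings in the model card: `w_bit=4, q_group_size=128, zero_point=True` means each group of 128 weights shares one scale and one zero-point, so the stored cost per weight is slightly above 4 bits. The sketch below works through that arithmetic. The helper name `effective_bits_per_weight` is hypothetical. The 16-bit scale and the zero-point stored at `w_bit` bits are assumptions about the packed layout, not something the model card states. It covers only the quantized linear weights, so it is not expected to reproduce the ~3.4 GB figure, which also includes whatever layers are left unquantized.

```python
def effective_bits_per_weight(
    w_bit: int,
    q_group_size: int,
    zero_point: bool,
    scale_bits: int = 16,  # assumption: one fp16 scale per group
) -> float:
    """Average storage cost per quantized weight, in bits."""
    # Per-group overhead: one scale, plus one zero-point when asymmetric
    # quantization is enabled (assumed to be stored at w_bit bits).
    overhead_bits = scale_bits + (w_bit if zero_point else 0)
    # The overhead is amortized over every weight in the group.
    return w_bit + overhead_bits / q_group_size


# Settings taken from the model card's quantization section.
bpw = effective_bits_per_weight(w_bit=4, q_group_size=128, zero_point=True)
print(f"{bpw:.5f} bits per weight")  # 4 + (16 + 4) / 128 = 4.15625
```

Under these assumptions, a smaller `q_group_size` raises the per-weight overhead, because each scale and zero-point is shared by fewer weights.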