inspirebek committed on
Commit
d4f8635
·
verified ·
1 Parent(s): 93de1df

docs: add model card

  1. README.md +30 -1
README.md CHANGED
@@ -2,7 +2,17 @@
  language:
  - uz
  - en
- license: apache-2.0
+ license: cc-by-nc-4.0
+ datasets:
+ - yakhyo/uz-wiki
+ - tahrirchi/uz-books-v2
+ - tahrirchi/uz-crawl
+ - saillab/alpaca_uzbek_taco
+ - behbudiy/alpaca-cleaned-uz
+ - UAzimov/uzbek-instruct-llm
+ - CohereLabs/aya_collection_language_split
+ - med-alex/qa_mt_ru_to_uzn
+ - med-alex/qa_mt_tr_to_uzn
  library_name: transformers
  pipeline_tag: text-generation
  base_model: inspirebek/qwen3-4b-uzbek-v2
@@ -42,6 +52,25 @@ vllm serve inspirebek/qwen3-4b-uzbek-v2-awq --quantization awq --dtype float16
  - `w_bit=4, q_group_size=128, zero_point=True`
  - calibration: 128 uzbek samples (2048 tokens each) from `fluency.jsonl`
 
+ ## datasets
+
+ **stage a — fluency (continued pretraining):**
+
+ - [`yakhyo/uz-wiki`](https://huggingface.co/datasets/yakhyo/uz-wiki) · MIT
+ - [`tahrirchi/uz-books-v2`](https://huggingface.co/datasets/tahrirchi/uz-books-v2) · MIT
+ - [`tahrirchi/uz-crawl`](https://huggingface.co/datasets/tahrirchi/uz-crawl) · Apache-2.0
+
+ **stage b — instruct (sft):**
+
+ - [`saillab/alpaca_uzbek_taco`](https://huggingface.co/datasets/saillab/alpaca_uzbek_taco) · CC-BY-NC-4.0
+ - [`behbudiy/alpaca-cleaned-uz`](https://huggingface.co/datasets/behbudiy/alpaca-cleaned-uz) · CC-BY-4.0
+ - [`UAzimov/uzbek-instruct-llm`](https://huggingface.co/datasets/UAzimov/uzbek-instruct-llm) · Apache-2.0
+ - [`CohereLabs/aya_collection_language_split`](https://huggingface.co/datasets/CohereLabs/aya_collection_language_split) · Apache-2.0
+ - [`med-alex/qa_mt_ru_to_uzn`](https://huggingface.co/datasets/med-alex/qa_mt_ru_to_uzn) · unspecified
+ - [`med-alex/qa_mt_tr_to_uzn`](https://huggingface.co/datasets/med-alex/qa_mt_tr_to_uzn) · unspecified
+
+ > ⚠️ licensing note: `saillab/alpaca_uzbek_taco` is cc-by-nc-4.0, which restricts commercial use of derivative models. downstream users who need a fully permissive license should retrain without that subset.
+
  ## sibling formats
 
  - [`inspirebek/qwen3-4b-uzbek-v2`](https://huggingface.co/inspirebek/qwen3-4b-uzbek-v2)
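
The license change in this commit follows from the per-dataset licenses listed in the new `## datasets` section. A minimal sketch of that check, assuming the license labels are copied by hand from the card: the `PERMISSIVE` set and the `flag_datasets` helper are illustrative names, not part of the repository, and which licenses count as "permissive" is this sketch's own classification.

```python
from typing import Dict, Optional

# License labels as listed in the model card; None marks "unspecified".
DATASET_LICENSES: Dict[str, Optional[str]] = {
    "yakhyo/uz-wiki": "MIT",
    "tahrirchi/uz-books-v2": "MIT",
    "tahrirchi/uz-crawl": "Apache-2.0",
    "saillab/alpaca_uzbek_taco": "CC-BY-NC-4.0",
    "behbudiy/alpaca-cleaned-uz": "CC-BY-4.0",
    "UAzimov/uzbek-instruct-llm": "Apache-2.0",
    "CohereLabs/aya_collection_language_split": "Apache-2.0",
    "med-alex/qa_mt_ru_to_uzn": None,
    "med-alex/qa_mt_tr_to_uzn": None,
}

# Assumption: licenses treated here as allowing commercial use.
PERMISSIVE = {"MIT", "Apache-2.0", "CC-BY-4.0"}


def flag_datasets(licenses: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return datasets whose license is non-commercial or not stated."""
    return {
        name: (lic if lic is not None else "unspecified")
        for name, lic in licenses.items()
        if lic not in PERMISSIVE
    }


flagged = flag_datasets(DATASET_LICENSES)
for name, lic in flagged.items():
    print(f"{name}: {lic}")
```

Besides `saillab/alpaca_uzbek_taco`, the sketch also flags the two `med-alex` datasets, since the card lists their licenses as unspecified.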