RthItalia committed
Commit a008b2c · verified · 1 Parent(s): 69b9998

Update model card for full single safetensors

Files changed (1)
  1. README.md +17 -47
README.md CHANGED
@@ -1,58 +1,28 @@
 ---
-language:
-- en
-- zh
-- it
 license: other
-tags:
-- quantization
-- qwen
-- qwen2.5
-- mixed-precision
-- inference
 library_name: transformers
-pipeline_tag: text-generation
+base_model: Qwen/Qwen2.5-3B-Instruct
+tags:
+- nanollm
+- qwen2.5
+- safetensors
+- text-generation
 ---
 
-# NanoLLM Qwen v3.1
-
-NanoLLM v3.1 artifacts are compact overlay artifacts for Qwen2.5 models. The loader starts from the base model in bitsandbytes 8-bit mode, then replaces the modules that passed the NanoLLM cascade with `TrueQuantLinear` modules.
+# NanoLLM Qwen2.5-3B-Instruct v3.1
 
-## Validated Artifacts
+Self-contained full NanoLLM model is in `full_single/`.
 
-| Model | Artifact | Zip size | Gate | Avg cosine | Min cosine | Locked / 8-bit pending |
-| --- | --- | ---: | --- | ---: | ---: | ---: |
-| Qwen2.5-3B-Instruct | `final_artifact_3B.zip` | 799,189,680 bytes | PASS | 0.990625 | 0.984375 | 143 / 109 |
-| Qwen2.5-7B-Instruct | `final_artifact_7B.zip` | 891,419,698 bytes | PASS | 0.990625 | 0.98046875 | 66 / 130 |
-| Qwen2.5-14B-Instruct | `final_artifact_Qwen2.5-14B-Instruct_pruned_pass.zip` | 1,482,019,132 bytes | PASS | 0.990625 | 0.98046875 | 76 / 260 |
-
-The current release gate checks average next-token-logit cosine similarity against the 8-bit reference: `avg >= 0.99`. Minimum cosine is reported as a diagnostic.
-
-## Quick Start
+Usage:
 
 ```python
-from load_artifact import load_artifact
-
-model, tokenizer, spec = load_artifact("final_artifact_Qwen2.5-14B-Instruct")
-prompt = "Write a Python function to sort a list using bubble sort."
-inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
-outputs = model.generate(**inputs, max_new_tokens=160, do_sample=False)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+from transformers import AutoModelForCausalLM, AutoTokenizer
+repo_id = "RthItalia/NanoLLM-Qwen2.5-3B-v3.1"
+tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="full_single", use_fast=True)
+model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="full_single", device_map="auto")
 ```
 
-Requirements:
-
-```bash
-pip install torch transformers accelerate bitsandbytes safetensors
-```
-
-## Runtime Notes
-
-- `build_reference_mode`: `8bit`
-- `reference_scope`: `original_baseline`
-- `pending_policy`: `leave_in_base_8bit`
-- `NANO_LOAD_4BIT=1` can be used experimentally to load the base model in 4-bit, but the release tests use 8-bit.
-
-## License
-
-The NanoLLM quantization pipeline is proprietary/internal. Generated artifacts are published for research and evaluation subject to the repository license terms.
+Validation against 8-bit reference:
+- avg cosine: 0.98984375
+- min cosine: 0.984375
+- gate: avg >= 0.985
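
The validation lines in this diff report an average and a minimum cosine similarity against an 8-bit reference, gated on the average. The following is a minimal sketch of how such a gate could be computed, not the NanoLLM validation code: the function names `cosine` and `gate_report` are hypothetical, and it assumes one next-token logit vector per prompt for both the candidate and the reference model. How the actual prompts and logits are collected is not stated in the model card.

```python
import math
from typing import Dict, Sequence, Union


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def gate_report(
    candidate_logits: Sequence[Sequence[float]],
    reference_logits: Sequence[Sequence[float]],
    threshold: float = 0.985,  # gate value from the updated model card
) -> Dict[str, Union[float, bool]]:
    """Hypothetical helper: per-prompt cosine, then average, minimum, and gate.

    The gate uses only the average; the minimum is returned as a diagnostic.
    """
    if not candidate_logits or len(candidate_logits) != len(reference_logits):
        raise ValueError("need the same non-zero number of prompts on both sides")
    sims = [cosine(c, r) for c, r in zip(candidate_logits, reference_logits)]
    avg = sum(sims) / len(sims)
    return {"avg_cosine": avg, "min_cosine": min(sims), "passed": avg >= threshold}
```

Note that the reported average of 0.98984375 satisfies the new gate (`avg >= 0.985`) but would fall just short of the `avg >= 0.99` gate stated in the removed README text.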