sh111111111111111 committed on
Commit dba8f88 · verified · 1 parent: 86ae5f9

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen3.5-4B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3.5-4B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3.5-4B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3.5-4B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3.5-4B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text

Qwen3.5-4B-Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d5c3667c3eb22674b5d89184f5903784f338ff3fe94f19eb8c19249aa9f6149f
+size 1980940384

Qwen3.5-4B-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:036473af3820990fc56ca6bff29d267acf3d0d0726ed1cfda8f2510ce683bdb3
+size 2606333024

Qwen3.5-4B-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ee3f496e4ad5f814409c0aa4eb1841e59106b666ba03ec7ae6816d2f50563d14
+size 2706322784

Qwen3.5-4B-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:57213a5168c922a21f6752a8c85593daf08c3618dfccaeae3560d08410266f77
+size 3251025504

Qwen3.5-4B-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6716f24df57ead0245b3a0582584934799dbff532d6959d37671b0d5293b5eb8
+size 4482403424

README.md ADDED
---
language: [en, zh]
license: apache-2.0
library_name: gguf
base_model: Qwen/Qwen3.5-4B
tags: [quantized, gguf, mixed-precision, shapelearn, shapelearn3, qwen3]
pipeline_tag: text-generation
---

# Qwen3.5-4B: ShapeLearn3 Mixed-Precision GGUF

Mixed-precision GGUF quantizations of [Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B),
the first ShapeLearn release of this model. ShapeLearn3 keeps the Hessian-sensitivity
front-end to set each quantization level's overall bit budget. It hands the per-tensor
allocation to an **error-minimizing solver** (built on llama.cpp's `--target-bpw`), which
distributes bits across *every* tensor, including the hybrid DeltaNet/SSM tensors. The
solver minimizes imatrix-weighted quantization error at the target size.
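
The following is a minimal sketch of what "imatrix-weighted quantization error" measures, written under stated assumptions. It computes a per-input-channel importance vector as the mean of squared calibration activations. That is the H_diag = mean(X²) proxy, and it is also the kind of statistic a llama.cpp imatrix accumulates. The sketch then scores a toy symmetric round-to-nearest quantizer by importance-weighted squared error. The function names, the one-scale-per-row quantizer and the data layout are hypothetical. Real k-quants use block-wise scales and are not reproduced here.

```python
def channel_importance(activations):
    """Diagonal Hessian proxy per input channel: mean of x**2 over calibration rows.

    activations: list of rows, each holding one activation per input channel.
    (Hypothetical layout chosen for this sketch.)
    """
    n_rows = len(activations)
    n_channels = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) / n_rows for j in range(n_channels)]


def quantize_rtn(row, bits):
    """Toy symmetric round-to-nearest quantizer with a single scale for the row.

    Integer levels run from -(2**(bits-1) - 1) to +(2**(bits-1) - 1).
    """
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in row)
    scale = amax / qmax if amax > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in row]


def weighted_error(weights, importance, bits):
    """Sum over rows i and channels j of importance[j] * (w_ij - q(w_ij))**2.

    weights: list of output rows, each holding one weight per input channel.
    """
    total = 0.0
    for row in weights:
        q_row = quantize_rtn(row, bits)
        total += sum(imp * (w - q) ** 2 for imp, w, q in zip(importance, row, q_row))
    return total


acts = [[1.0, 0.1], [-2.0, 0.2], [1.5, -0.1]]  # made-up calibration activations
imp = channel_importance(acts)                  # channel 0 dominates
W = [[0.31, -0.77], [0.05, 0.42]]               # made-up weight matrix
for bits in (3, 4, 8):
    print(bits, weighted_error(W, imp, bits))
```

Errors on channels that carry large activations count more. A solver minimizing this quantity therefore tends to protect weights that feed on high-energy inputs.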

## Available Quantizations

KL divergence (KLD) against the BF16 source is the primary quality metric, reported as
the mean and the robust 99.9th percentile. wikitext-2 perplexity is reported alongside.

| File | BPW | Size | wikitext-2 PPL ↓ | KL-mean ↓ | KL-99.9% ↓ | Use Case |
|---|---|---|---|---|---|---|
| [`Qwen3.5-4B-Q8_0.gguf`](./Qwen3.5-4B-Q8_0.gguf) | 8.5 | 4.48 GB | 8.638 | 0.0028 | 0.073 | Near-lossless reference |
| [`Qwen3.5-4B-Q6_K.gguf`](./Qwen3.5-4B-Q6_K.gguf) | 6.2 | 3.25 GB | 8.767 | 0.0125 | 0.396 | High quality |
| [`Qwen3.5-4B-Q5_K_M.gguf`](./Qwen3.5-4B-Q5_K_M.gguf) | 5.1 | 2.71 GB | 9.176 | 0.0255 | 0.803 | Balanced quality and size |
| [`Qwen3.5-4B-Q4_K_M.gguf`](./Qwen3.5-4B-Q4_K_M.gguf) | 5.0 | 2.61 GB | 9.246 | 0.0292 | 0.952 | Best quality-to-size ratio |
| [`Qwen3.5-4B-Q3_K_S.gguf`](./Qwen3.5-4B-Q3_K_S.gguf) | 3.8 | 1.98 GB | 9.358 | 0.0978 | 3.583 | Maximum compression |

**Recommended:** Q4_K_M (KL-mean 0.029 at 2.61 GB).
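
To pick a file by memory budget, a minimal helper over the table's numbers might look like the sketch below. `pick_quant` is a hypothetical name. The check compares file size only and ignores the KV cache, context length and runtime overhead, so leave headroom in practice.

```python
# File size (decimal GB) and KL-mean, copied from the table of available quantizations.
QUANTS = {
    "Qwen3.5-4B-Q8_0.gguf": (4.48, 0.0028),
    "Qwen3.5-4B-Q6_K.gguf": (3.25, 0.0125),
    "Qwen3.5-4B-Q5_K_M.gguf": (2.71, 0.0255),
    "Qwen3.5-4B-Q4_K_M.gguf": (2.61, 0.0292),
    "Qwen3.5-4B-Q3_K_S.gguf": (1.98, 0.0978),
}


def pick_quant(budget_gb):
    """Return the lowest-KL file whose size fits the budget, or None if none fits."""
    fitting = [(kl, name) for name, (size, kl) in QUANTS.items() if size <= budget_gb]
    return min(fitting)[1] if fitting else None


print(pick_quant(3.0))  # Qwen3.5-4B-Q5_K_M.gguf
print(pick_quant(1.5))  # None
```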

## How It Compares

Using the same harness and metrics, ShapeLearn3 was compared against a fixed per-suffix
LP-recipe baseline at matched BPW (within 0.5%). This head-to-head comparison is what
motivated ShapeLearn3's allocator:

| Level | LP recipe KL-mean | **ShapeLearn3** KL-mean |
|---|---|---|
| Q3_K_S | 0.1525 | **0.0978** |
| Q4_K_M | 0.0352 | **0.0292** |
| Q5_K_M | 0.0258 | **0.0255** |

The allocator achieves lower KL divergence at every level, with the largest gain at
aggressive quantization (Q3_K_S: −36% KL-mean). Our internal worst-token measurements also
improve at every level. A key reason is that the allocator assigns bits to the hybrid DeltaNet
tensors (`attn_qkv`, `attn_gate`, `ssm_*`), which a standard 7-suffix recipe leaves at
the base type. Our tensor-health scan shows that these same tensors are the statistical
outliers of this architecture.
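
The −36% figure follows directly from the comparison table. The short calculation below reproduces it and shows how the gain shrinks at higher bit widths. It is plain arithmetic on the published KL-mean values, not part of the benchmark harness.

```python
# KL-mean per level from the comparison table: (LP recipe, ShapeLearn3).
results = {
    "Q3_K_S": (0.1525, 0.0978),
    "Q4_K_M": (0.0352, 0.0292),
    "Q5_K_M": (0.0258, 0.0255),
}


def relative_change(baseline, new):
    """Percent change of `new` relative to `baseline` (negative means lower KL)."""
    return 100.0 * (new - baseline) / baseline


for level, (lp, sl3) in results.items():
    print(f"{level}: {relative_change(lp, sl3):+.1f}% KL-mean")
# Q3_K_S: -35.9% KL-mean
# Q4_K_M: -17.0% KL-mean
# Q5_K_M: -1.2% KL-mean
```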

## Key Sensitivity Findings (Qwen3.5-4B)

- **blk.3 (an early layer) is the most sensitive.** This matches the early-layer pattern of
  Qwen3.5-9B and is the opposite of dense Qwen3-4B-Instruct, whose most sensitive layer is
  blk.34. The hybrid Qwen3.5 family concentrates sensitivity early.
- Attention **K projections are consistently at least as sensitive as V projections** (K ≥ V).
- **DeltaNet/SSM tensors are distribution outliers** relative to same-role peers. `ssm_conv1d`
  has high kurtosis, while `ssm_alpha/beta/out`, `attn_qkv` and `attn_gate` have shifted
  distributions. Covering them in the allocation matters. `ssm_conv1d` itself is kept at
  F32 by llama.cpp.
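
The card does not specify how the tensor-health scan works. The sketch below is a hypothetical illustration only. It computes the excess kurtosis of a tensor's weights and flags tensors whose statistic sits far from their peer group's median, using the median/MAD modified z-score with the common 3.5 cutoff. The function names, the choice of statistic and the threshold are assumptions, not ShapeLearn3's actual scan. Loading weights from a GGUF file is left out.

```python
import statistics


def excess_kurtosis(values):
    """Population excess kurtosis m4 / m2**2 - 3 (about 0 for Gaussian data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def flag_outliers(stat_by_tensor, threshold=3.5):
    """Names whose statistic is an outlier within this (hypothetical) peer group.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(stat_by_tensor.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # peers (nearly) identical: no meaningful spread to compare against
    return sorted(
        name
        for name, v in stat_by_tensor.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    )


# Made-up kurtosis values for a made-up peer group.
peers = {
    "blk.0.attn_q.weight": 0.4,
    "blk.1.attn_q.weight": 0.5,
    "blk.2.attn_q.weight": 0.45,
    "blk.3.attn_q.weight": 0.42,
    "blk.0.attn_qkv.weight": 6.0,
}
print(flag_outliers(peers))  # ['blk.0.attn_qkv.weight']
```

A median/MAD score is used rather than mean/standard deviation so that the outliers being searched for do not inflate the spread they are measured against.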

## How It Works

1. **Hessian sensitivity**: compute H_diag = mean(X²) per layer on calibration data. This
   sets each level's overall bit budget.
2. **Error-minimizing per-tensor allocation**: an imatrix-weighted solver (llama.cpp
   `--target-bpw`) assigns a quant type to every tensor to minimize total quantization
   error at the target BPW. The allocation covers attention, FFN and the hybrid DeltaNet/SSM tensors.
3. **imatrix**: an importance matrix computed over wikitext guides the per-tensor error.
4. **GGUF export**: files are produced with stock `llama-quantize`.
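
llama.cpp's `--target-bpw` solver is not reproduced here. The sketch below is a hypothetical stand-in for the same kind of problem, using greedy marginal-gain allocation. Every tensor starts at its cheapest candidate type. The upgrade with the largest error reduction per extra bit is applied repeatedly until the average-BPW budget is spent. Under this assumption, the candidate (bpw, error) tables would come from measuring imatrix-weighted error per tensor and quant type. The tensor names and numbers in the example are made up.

```python
import heapq


def allocate(tensors, target_bpw):
    """Greedy bit allocation under an average bits-per-weight budget.

    tensors: {name: (n_weights, [(bpw, error), ...])} with candidates sorted by
    strictly increasing bpw. Returns ({name: chosen_bpw}, achieved_average_bpw).
    """
    total_weights = sum(n for n, _ in tensors.values())
    budget_bits = target_bpw * total_weights
    choice = {name: 0 for name in tensors}  # index of each tensor's current candidate
    used_bits = sum(n * cands[0][0] for n, cands in tensors.values())
    if used_bits > budget_bits:
        raise ValueError("target_bpw is below the cheapest possible assignment")

    heap = []  # entries: (-error_reduction_per_extra_bit, name, extra_bits)

    def push_next_upgrade(name):
        n, cands = tensors[name]
        i = choice[name]
        if i + 1 < len(cands):
            extra_bits = n * (cands[i + 1][0] - cands[i][0])
            gain = cands[i][1] - cands[i + 1][1]
            heapq.heappush(heap, (-gain / extra_bits, name, extra_bits))

    for name in tensors:
        push_next_upgrade(name)

    while heap:
        _, name, extra_bits = heapq.heappop(heap)
        if used_bits + extra_bits > budget_bits:
            continue  # does not fit: this tensor stays at its current type
        choice[name] += 1
        used_bits += extra_bits
        push_next_upgrade(name)

    chosen = {name: tensors[name][1][i][0] for name, i in choice.items()}
    return chosen, used_bits / total_weights


example = {
    "blk.0.attn_qkv.weight": (100, [(4.0, 10.0), (6.0, 3.0), (8.0, 1.0)]),
    "blk.0.ffn_down.weight": (300, [(4.0, 6.0), (6.0, 4.0), (8.0, 3.5)]),
}
print(allocate(example, target_bpw=5.0))
# ({'blk.0.attn_qkv.weight': 8.0, 'blk.0.ffn_down.weight': 4.0}, 5.0)
```

Greedy allocation is fast, but it is not guaranteed to be optimal when a tensor's error curve is not convex in bits. Exact formulations such as integer programming trade speed for optimality.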

## Usage

```bash
# Download the recommended Q4_K_M file into the current directory
hf download sh111111111111111/Qwen3.5-4B-ShapeLearn3-GGUF \
  Qwen3.5-4B-Q4_K_M.gguf --local-dir .

# Interactive chat in the terminal (conversation mode)
llama-cli -m Qwen3.5-4B-Q4_K_M.gguf -cnv
# Or serve the model over HTTP (OpenAI-compatible API) on port 8080
llama-server -m Qwen3.5-4B-Q4_K_M.gguf --port 8080
```

> Note: Qwen3.5 GGUFs are not currently runnable in Ollama (vision/mmproj handling is not
> yet supported there). Use llama.cpp or LM Studio instead.
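
After downloading, you can check the file against the Git LFS pointer recorded in this commit. Per the Git LFS spec, the pointer's `oid` is the SHA-256 of the file's contents and `size` is its length in bytes. The sketch below hard-codes the Q4_K_M pointer values from this commit, which would change if the file were re-uploaded. The helper names are hypothetical.

```python
import hashlib
import os

# Values from the Git LFS pointer for Qwen3.5-4B-Q4_K_M.gguf in this commit.
EXPECTED_OID = "036473af3820990fc56ca6bff29d267acf3d0d0726ed1cfda8f2510ce683bdb3"
EXPECTED_SIZE = 2606333024


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so multi-GB files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_oid=EXPECTED_OID, expected_size=EXPECTED_SIZE):
    """True if the file's size and SHA-256 both match the LFS pointer."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first: truncated or partial download
    return sha256_of(path) == expected_oid
```

For a complete, unmodified download of this revision, `verify("Qwen3.5-4B-Q4_K_M.gguf")` should return `True`.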

## Benchmark Details

Hardware: NVIDIA GB10 ATOM (128 GB unified memory, aarch64). Quantization used llama.cpp
with `--target-bpw` (PR #15550). KLD was measured with `llama-perplexity --kl-divergence`
against BF16-source logits over wikitext-2. Mean, median and 99.9th percentile are
reported. The single-token KL-max is omitted because it is an unstable order statistic.
wikitext-2 PPL was measured with `llama-perplexity -c 2048`. Downstream benchmarks
(HellaSwag / WinoGrande / ARC / MMLU) are tracked internally.
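
`llama-perplexity --kl-divergence` reports these statistics itself. The sketch below only illustrates what they measure. At each token position it computes KL(P_BF16 ‖ P_quant) = Σ p_ref · (log p_ref − log p_quant) from the two models' logits. It then summarizes the per-token values by mean, median and 99.9th percentile. The linear-interpolation percentile rule is an assumption and may differ from llama.cpp's exact implementation. The logits in the example are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(ref_logits, test_logits):
    """KL(P_ref || P_test) for one token position, in nats."""
    lp = log_softmax(ref_logits)
    lq = log_softmax(test_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def kl_summary(ref_rows, test_rows):
    """Mean, median and 99.9th percentile of per-token KL over a corpus."""
    kls = sorted(token_kl(r, t) for r, t in zip(ref_rows, test_rows))
    return {
        "mean": sum(kls) / len(kls),
        "median": percentile(kls, 50),
        "p99.9": percentile(kls, 99.9),
    }


ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]    # made-up BF16 logits, 2 positions
quant = [[1.9, 1.1, 0.1], [0.4, 0.7, 2.8]]  # made-up quantized-model logits
print(kl_summary(ref, quant))
```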

## License

Apache 2.0, inherited from [Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B).