prince-canuma committed
Commit b4b5c27 · verified · 1 Parent(s): 7290a05

Add MTP drafter model card

Files changed (1)
  1. README.md +56 -6
README.md CHANGED
@@ -1,19 +1,69 @@
  ---
- language: en
+ language:
+ - en
+ license: gemma
+ base_model:
+ - google/gemma-4-31B-it-qat-q4_0-unquantized-assistant
  library_name: mlx
+ pipeline_tag: text-generation
  tags:
  - mlx
- pipeline_tag: image-text-to-text
+ - mlx-vlm
+ - gemma4_assistant
+ - gemma-4
+ - gemma-4-31b
+ - gemma
+ - mtp
+ - speculative-decoding
+ - draft-model
+ - bf16
+ inference: false
  ---

- # mlx-community/gemma-4-31B-it-qat-assistant-bf16
+ # gemma-4-31B-it-qat-assistant-bf16

- ## Use with mlx
+ This repository contains Multi-Token Prediction (MTP) drafter weights split from `google/gemma-4-31B-it-qat-q4_0-unquantized-assistant` for use with `mlx-vlm` speculative decoding.
+
+ This is not a standalone chat or text-generation model. Load it as the draft model alongside a compatible Gemma 4 31B target checkpoint.
+
+ ## Use with mlx-vlm

  ```bash
- pip install -U mlx-vlm
+ uv run mlx_vlm.generate \
+ --model google/gemma-4-31B-it \
+ --draft-model mlx-community/gemma-4-31B-it-qat-assistant-bf16 \
+ --draft-kind mtp \
+ --prompt "Describe this image." \
+ --max-tokens 256
  ```

+ For local weights:
+
  ```bash
- python -m mlx_vlm.generate --model mlx-community/gemma-4-31B-it-qat-assistant-bf16 --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
+ uv run mlx_vlm.generate \
+ --model /path/to/target-model \
+ --draft-model /path/to/gemma-4-31B-mtp \
+ --draft-kind mtp \
+ --prompt "Describe this image." \
+ --max-tokens 256
  ```
+
+ ## Model Details
+
+ - Model type: `gemma4_assistant`
+ - Target architecture: Gemma 4 31B
+ - Precision: bf16
+ - Runtime: MLX / `mlx-vlm`
+ - Format: Safetensors with MLX-compatible config and tokenizer files
+
+ The stored tensors are bf16 MLX-compatible drafter weights.
+
+ ## Intended Use
+
+ Use this repo only as a speculative decoding drafter for compatible Gemma 4 31B checkpoints. The target model verifies drafted tokens, while this MTP model proposes candidate tokens per decoding step.
+
+ ## Limitations
+
+ This checkpoint requires runtime support for Gemma 4 MTP draft models in `mlx-vlm`. Standard standalone generation through generic Transformers APIs is not expected to work with this repository by itself.
+
+ Please refer to the upstream `google/gemma-4-31B-it` model card and license terms for model usage constraints.
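
The Intended Use section of the new card describes a draft-then-verify split: the drafter proposes candidate tokens and the target model verifies them. The following is a minimal sketch of that idea for greedy decoding only. It is not the `mlx-vlm` implementation and does not load this checkpoint. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the toy models stand in for real networks. A real runtime would typically verify all proposals in one batched forward pass of the target; the sequential calls here are only for readability.

```python
from typing import Callable, List

# Hypothetical model interface: maps a token prefix to the next token (greedy).
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Return the tokens accepted in one draft-then-verify step."""
    # 1. The drafter proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each proposal against its own greedy choice.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    # A step may overshoot, so trim to the requested length.
    return out[: len(prompt) + max_new_tokens]


# Toy stand-ins (assumptions for illustration): the target counts upward
# modulo 10, and the drafter agrees except right after token 4.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [1], max_new_tokens=8))
```

In this greedy setting the output is the same sequence the target would produce on its own. The drafter only changes how many tokens are accepted per step, which is why a drafter checkpoint is not useful without a compatible target.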