Will there be MTP support?

#2
by prjsingularity - opened

Qwen 3.5 supports MTP (multi-token prediction), which is a very good booster for tps, but it's present in the new checkpoints, maybe not in the ones this project was made from. So I'm curious, do you plan to implement MTP?

Ready.Art org

Can you provide a step-by-step guide with axolotl?

When this model was trained, we trained on all layers, so MTP should be supported. If it's not, it's because axolotl didn't support it at the time.

Ready.Art org

In fact, it does appear the MTP layers are in the model weights.

What issue are you running into?

Ready.Art org

Hmm... do we need to recreate the GGUF for MTP support?

@gecfdo thoughts?

Old llama.cpp used to just drop all MTP tensors, so the old GGUFs normally shouldn't contain any.

Ready.Art org

That's the conclusion I've come to after reviewing the GGUF data, yeah.
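
One way to check a given file is to list its tensor names and look for MTP entries. This is a minimal sketch, assuming the `gguf` Python package (gguf-py from the llama.cpp repo) is installed and that at least some MTP tensors carry `.nextn.` in their names, as `blk.64.nextn.eh_proj.weight` does. Both helper names are hypothetical.

```python
def find_mtp_tensors(names):
    """Return the tensor names that look like MTP (nextn) tensors."""
    # Assumption: MTP tensors are marked by a ".nextn." name component.
    # An MTP block may also hold ordinary attention/FFN tensors that this
    # filter does not catch, so treat a hit as "MTP present", not a full list.
    return [n for n in names if ".nextn." in n]


def gguf_tensor_names(path):
    """List all tensor names stored in a GGUF file."""
    # Imported lazily so find_mtp_tensors() works without the gguf package.
    from gguf import GGUFReader
    reader = GGUFReader(path)
    return [t.name for t in reader.tensors]


# Usage (path is a placeholder):
#   hits = find_mtp_tensors(gguf_tensor_names("model.gguf"))
#   print(len(hits), "MTP tensors found")
```

An empty result on an older GGUF would be consistent with the converter having dropped the MTP tensors.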

I'm mulling over how to handle this.

Also, getting ready to make training data for Qwen 3.6 27B so... ;)

FYI, it is also possible to graft the MTP heads on, as long as the finetune hasn't diverged too far from the norm. I successfully tested it out last night.

Grab convert.py and 27B_MTP.gguf from here: https://huggingface.co/havenoammo/Qwen3.6-27B-MTP-UD-GGUF

Then run it like this:
python.exe convert.py .\Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q5_K_S.gguf .\27B_MTP.gguf .\Qwen3.6-27B-NEO-CODE-HERE-2T-OT-MTP-Q5_K_S.gguf
Reading target: .\Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q5_K_S.gguf
Reading source: .\27B_MTP.gguf
Target tensors: 851, KVs: 42
Source tensors: 15, KVs: 24

Arch: qwen35
Target block_count: 64
Source block_count: 65, nextn_predict_layers: 1

Extra tensors to transplant: 15

Writing output: .\Qwen3.6-27B-NEO-CODE-HERE-2T-OT-MTP-Q5_K_S.gguf
Copying 866 tensors...
Copied 50/866 tensors
Copied 100/866 tensors
Copied 150/866 tensors
Copied 200/866 tensors
Copied 250/866 tensors
Copied 300/866 tensors
Copied 350/866 tensors
Copied 400/866 tensors
Copied 450/866 tensors
Copied 500/866 tensors
Copied 550/866 tensors
Copied 600/866 tensors
Copied 650/866 tensors
Copied 700/866 tensors
Copied 750/866 tensors
Copied 800/866 tensors
Copied 850/866 tensors
Copied 866/866 tensors

Output: .\Qwen3.6-27B-NEO-CODE-HERE-2T-OT-MTP-Q5_K_S.gguf
Size: 19.44 GB
Tensors: 866

Validating output...
Spot-checking tensor data integrity...
token_embd.weight: OK (28225bb43049ef5c)
blk.64.nextn.eh_proj.weight: OK (f89b048f279f958b)
OK — all checks passed

Done. Output: .\Qwen3.6-27B-NEO-CODE-HERE-2T-OT-MTP-Q5_K_S.gguf
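
The numbers in that log suggest the graft is additive: 851 target tensors plus 15 transplanted ones gives 866, and the source's 65 blocks match the target's 64 plus one nextn layer. Below is a rough sketch of that bookkeeping only, not the actual convert.py. The function name and the block-count check are assumptions made for illustration.

```python
def plan_transplant(target_names, source_names,
                    target_blocks, source_blocks, nextn_layers):
    """Pick the source tensors missing from the target (hypothetical helper).

    Returns (extra_names, expected_total_tensor_count).
    """
    # Assumed sanity check: the MTP source should describe the same model
    # depth as the target, plus its extra nextn (MTP) layers.
    if source_blocks != target_blocks + nextn_layers:
        raise ValueError(
            f"block mismatch: {source_blocks} != {target_blocks} + {nextn_layers}")
    existing = set(target_names)
    # Only tensors the target lacks are added; existing ones stay untouched.
    extra = [n for n in source_names if n not in existing]
    return extra, len(target_names) + len(extra)


# Placeholder names sized like the log above: 851 target and 15 source tensors.
target = [f"tensor_{i}" for i in range(851)]
source = [f"blk.64.extra_{i}" for i in range(15)]
extra, total = plan_transplant(target, source, 64, 65, 1)
print(len(extra), total)
```

Because the finetuned weights are left as they are, the transplanted head was trained against the original model's hidden states, which is presumably why this works only while the finetune stays close to it.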
