Does it support MTP?

by sharon8811 - opened 3 days ago

Discussion

sharon8811

3 days ago

Does it support MTP?

shakhizat

3 days ago

vllm serve nvidia/Qwen3.6-27B-NVFP4 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--trust-remote-code \
--kv-cache-dtype fp8 \
--attention-backend flashinfer \
--moe-backend marlin \
--gpu-memory-utilization 0.4 \
--max-model-len 262144 \
--max-num-seqs 4 \
--max-num-batched-tokens 8192 \
--enable-chunked-prefill \
--async-scheduling \
--enable-prefix-caching \
--speculative-config '{"method":"mtp","num_speculative_tokens":3,"moe_backend":"triton"}' \
--load-format fastsafetensors \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_xml \
--enable-auto-tool-choice
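If you launch this from a script, the JSON in `--speculative-config` is the easiest part to break with shell quoting. Below is a minimal sketch, assuming you start vLLM from Python: it passes the same flags as the command above as a list, one element per argument, and builds the speculative config with `json.dumps`. The names `build_vllm_argv` and the `--run` switch are hypothetical helpers for illustration, and the flag values (including `"moe_backend"` inside the speculative config) are copied from the post, not verified against any particular vLLM version.

```python
import json
import shlex
import subprocess
import sys

# Copied verbatim from the command above; "moe_backend" here is not verified.
SPEC_CONFIG = {"method": "mtp", "num_speculative_tokens": 3, "moe_backend": "triton"}


def build_vllm_argv(model: str = "nvidia/Qwen3.6-27B-NVFP4") -> list[str]:
    """Return the `vllm serve` argument list (hypothetical helper)."""
    return [
        "vllm", "serve", model,
        "--host", "0.0.0.0",
        "--port", "8000",
        "--tensor-parallel-size", "1",
        "--trust-remote-code",
        "--kv-cache-dtype", "fp8",
        "--attention-backend", "flashinfer",
        "--moe-backend", "marlin",
        "--gpu-memory-utilization", "0.4",
        "--max-model-len", "262144",
        "--max-num-seqs", "4",
        "--max-num-batched-tokens", "8192",
        "--enable-chunked-prefill",
        "--async-scheduling",
        "--enable-prefix-caching",
        # json.dumps always yields valid JSON; subprocess passes it as a single
        # argument, so no shell quoting is involved.
        "--speculative-config", json.dumps(SPEC_CONFIG),
        "--load-format", "fastsafetensors",
        "--reasoning-parser", "qwen3",
        "--tool-call-parser", "qwen3_xml",
        "--enable-auto-tool-choice",
    ]


if __name__ == "__main__":
    argv = build_vllm_argv()
    if "--run" in sys.argv[1:]:
        # Actually start the server (requires vLLM installed).
        subprocess.run(argv, check=True)
    else:
        # Default: print the equivalent, correctly quoted shell command.
        print(shlex.join(argv))
```

Running it without `--run` only prints the quoted command, so you can compare it with the one above before starting the server.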

darkmatter2222

2 days ago

Not on DGX Spark, see: https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4/discussions/11

batsclamp

2 days ago

Yes, MTP works on eugr's vLLM build. I tested it.

However, the model also hallucinated heavily on my setup, so I switched back to Qwopus.
