Does it support MTP?
#6
Opened by sharon8811
Does it support MTP?
vllm serve nvidia/Qwen3.6-27B-NVFP4 \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 1 \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --attention-backend flashinfer \
  --moe-backend marlin \
  --gpu-memory-utilization 0.4 \
  --max-model-len 262144 \
  --max-num-seqs 4 \
  --max-num-batched-tokens 8192 \
  --enable-chunked-prefill \
  --async-scheduling \
  --enable-prefix-caching \
  --speculative-config '{"method":"mtp","num_speculative_tokens":3,"moe_backend":"triton"}' \
  --load-format fastsafetensors \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_xml \
  --enable-auto-tool-choice
Yes, MTP works on eugr's vLLM. Tested.
The model also hallucinated heavily on my setup, so I switched back to Qwopus.
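If you launch the server from a script rather than a shell, the trickiest part is the `--speculative-config` value, which must reach vLLM as a single valid JSON string. Below is a minimal sketch, assuming you start vLLM from Python (e.g. via `subprocess`). `build_vllm_serve_args` is a hypothetical helper, not part of vLLM. The flags are copied verbatim from the command above; whether each one is accepted depends on your vLLM version or fork.

```python
import json
import shlex

# Speculative-decoding (MTP) settings copied from the command above.
SPECULATIVE_CONFIG = {
    "method": "mtp",
    "num_speculative_tokens": 3,
    "moe_backend": "triton",
}


def build_vllm_serve_args(model: str) -> list[str]:
    """Hypothetical helper: return the argv list for `vllm serve` with the thread's flags."""
    return [
        "vllm", "serve", model,
        "--host", "0.0.0.0",
        "--port", "8000",
        "--tensor-parallel-size", "1",
        "--trust-remote-code",
        "--kv-cache-dtype", "fp8",
        "--attention-backend", "flashinfer",
        "--moe-backend", "marlin",
        "--gpu-memory-utilization", "0.4",
        "--max-model-len", "262144",
        "--max-num-seqs", "4",
        "--max-num-batched-tokens", "8192",
        "--enable-chunked-prefill",
        "--async-scheduling",
        "--enable-prefix-caching",
        # json.dumps produces valid JSON, so no hand-written shell quoting is needed.
        "--speculative-config", json.dumps(SPECULATIVE_CONFIG, separators=(",", ":")),
        "--load-format", "fastsafetensors",
        "--reasoning-parser", "qwen3",
        "--tool-call-parser", "qwen3_xml",
        "--enable-auto-tool-choice",
    ]


if __name__ == "__main__":
    args = build_vllm_serve_args("nvidia/Qwen3.6-27B-NVFP4")
    # Print a shell-safe version of the command; pass `args` to subprocess.run() to launch it.
    print(shlex.join(args))
```

Passing the list straight to `subprocess.run(args)` avoids shell quoting entirely; `shlex.join` is only used here to show the equivalent one-line shell command.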