Doubled TPS: NVFP4 vs FP8
🔥 1
#15 opened about 6 hours ago
by
sSeeMyRolexXx
Fallback to Marlin kernel gives weird/incorrect and sometimes garbled results
#14 opened about 11 hours ago
by
sapjunior
this model hallucinates easily
#13 opened about 14 hours ago
by
batsclamp
vLLM + HGX B200: Your GPU does not have native support for FP4 computation but FP4 quantization is being used.
👍 3
#12 opened 1 day ago
by
cgelias
Works via Ollama, but the ~40 tok/s MTP speedup requires raw llama-server (Ollama has no --spec-type flag) - plus vLLM lockup findings on DGX Spark
1
#11 opened 1 day ago
by
darkmatter2222
Why is it all !!!!!!!!!
3
#10 opened 1 day ago
by
jiecong
Why does this say 18B?
1
#9 opened 1 day ago
by
darkmatter2222
NVFP4 vs FP8 on an RTX PRO 6000 Blackwell (vLLM 0.24): sadly, FP4 is still not faster than FP8 - Qwen3.6-27B
👀👍 11
1
#8 opened 1 day ago
by
janreges3
CUDA error: an illegal memory access was encountered using vLLM (sm120)
1
#7 opened 1 day ago
by
shakhizat
Does it support MTP?
➕👍 3
3
#6 opened 2 days ago
by
sharon8811
Will it work on an RTX 5090 with 32 GB VRAM?
8
#5 opened 2 days ago
by
arunsahu44
nvidia/Qwen3.6-27B-NVFP4 outputs repeated "!" / "d" tokens on Blackwell SM120
➕ 1
2
#4 opened 2 days ago
by
richgua
Could we include BF16 as well as FP8 in the benchmarks please?
➕👀 5
#3 opened 2 days ago
by
haydonryan
Qwen-AgentWorld-35B-A3B-NVFP4?
👍 1
#2 opened 3 days ago
by
nagelanping
Approved
👍 1
1
#1 opened 3 days ago
by
jasionkajakub