nvidia/Qwen3.6-27B-NVFP4

Text Generation

Model Optimizer

8-bit precision


Doubled TPS: NVFP4 vs FP8

#15 opened about 6 hours ago by

Fallback to the Marlin kernel gives weird/incorrect and sometimes garbled results

#14 opened about 11 hours ago by

This model hallucinates easily

#13 opened about 14 hours ago by

vLLM + HGX B200: Your GPU does not have native support for FP4 computation but FP4 quantization is being used.

#12 opened 1 day ago by

Works via Ollama, but the ~40 tok/s MTP speedup requires raw llama-server (Ollama has no --spec-type flag); plus vLLM lockup findings on DGX Spark

#11 opened 1 day ago by

Why is the output all "！！！！！！！！！"?

#10 opened 1 day ago by

Why does this say 18B?

#9 opened 1 day ago by

NVFP4 vs FP8 on an RTX PRO 6000 Blackwell (vLLM 0.24): sadly, FP4 is still not faster than FP8 (Qwen3.6-27B)

#8 opened 1 day ago by

CUDA error: an illegal memory access was encountered when using vLLM (SM120)

#7 opened 1 day ago by

Does it support MTP?

#6 opened 2 days ago by

Will it work on an RTX 5090 with 32 GB of VRAM?

#5 opened 2 days ago by

nvidia/Qwen3.6-27B-NVFP4 outputs repeated "!" / "d" tokens on Blackwell SM120

#4 opened 2 days ago by

Could we include BF16 as well as FP8 in the benchmarks please?

#3 opened 2 days ago by

Qwen-AgentWorld-35B-A3B-NVFP4?

#2 opened 3 days ago by

Approved

#1 opened 3 days ago by