vLLM - Looping prevention

#39
by janreges3 - opened

I tested the original FP8 version (Qwen3.5-35B-A3B-FP8) and the AWQ version from cyankiwi. Unfortunately, with the default settings, both get stuck in infinite loops during reasoning and on more complex prompts.

Overriding the default sampling parameters with the following vLLM flag helps. The min_p setting probably has the greatest positive impact.

--override-generation-config '{"temperature": 1.0, "top_p": 1.0, "top_k": 40, "min_p": 0.2}'
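For anyone wondering why min_p would matter here, below is a rough Python sketch of how min_p filtering is usually described: tokens whose probability falls below min_p times the probability of the most likely token are dropped, and the rest are renormalized. The function name `min_p_filter` and the example logits are made up for illustration; this is not vLLM's actual implementation, just the idea under that assumption.

```python
import math


def min_p_filter(logits, min_p=0.2, temperature=1.0):
    """Illustrative min_p filter (hypothetical helper, not vLLM code).

    Returns renormalized probabilities where every token with
    probability < min_p * max_probability has been zeroed out.
    """
    # Temperature-scaled, numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The cutoff scales with the model's confidence in its top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the surviving tokens so they sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


# Made-up logits for four candidate tokens.
probs = min_p_filter([2.0, 1.0, 0.0, -3.0], min_p=0.2)
print(probs)
```

With min_p at 0.2, the two low-probability tokens in this toy example are cut entirely, so sampling can only pick between the two plausible ones. Whether that is really what stops the looping is my guess, not something I have measured.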

@janreges3
Have you tried using structured outputs? With JSON, every output fails for me too; only the unquantized model works.
