26B with w4a16-ct

#3
by meganoob1337 - opened

Hello, thanks for this quantized model. Any chance you will do the 26B at w4a16-ct as well? Or did I just overlook it?

I just happened to see this on the vLLM page:
"Note: The 26B-A4B MoE model is not included — its small expert dimensions (704) cause excessive quality loss with 4-bit quantization. For the MoE model, use --quantization int8_per_channel_weight_only (online, no checkpoint needed) which provides ~47% memory savings with negligible quality impact."
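
For anyone wondering what "per-channel weight-only int8" means in practice, here is a rough sketch in plain Python. It is not vLLM's implementation, only an illustration of the general idea under my own assumptions: each output channel (row) of a weight matrix gets its own scale, weights are stored as int8, and activations are left alone. The function names and the toy matrix are made up for the example.

```python
def quantize_per_channel_int8(weights):
    """Symmetric int8 quantization with one scale per output channel (row).

    Illustrative only; real kernels operate on tensors, not nested lists.
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        quantized.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Recover approximate float weights from int8 values and row scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Toy 2x3 weight matrix (hypothetical values); the rows have very
# different ranges, which is the case per-channel scales help with.
weights = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]
q, s = quantize_per_channel_int8(weights)
restored = dequantize(q, s)
max_err = max(
    abs(a - b)
    for row_a, row_b in zip(weights, restored)
    for a, b in zip(row_a, row_b)
)
print(q, s, max_err)
```

Because every row has its own scale, the rounding error for a weight is bounded by half of that row's scale, so a row of small weights is not swamped by a large value elsewhere in the matrix.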
