
Is faster than intel´s autoround?

#1
by Trilogix1 - opened

In my test, your model is faster than Intel's AutoRound: 7.3 tps against 4 tps. Any idea why?
Any plans to AutoRound Qwen 397B?

Trilogix1 changed discussion title from Is faster then intel´s autoround? to Is faster than intel´s autoround?

Unfortunately, I don't plan to do more GGUF AutoRound models. It requires too much memory and takes a long time: Minimax required more than 128GB of RAM and took around 6 hours to complete.

Regarding it being faster: most of the layers in this model have been quantized to 4-bit, unlike the usual Q4_K_M, which mixes 4-bit and 6-bit layers. This happened because AutoRound's script outputs 4-bit-optimized layers, and there's no need to upscale 4-bit to 6-bit.
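
To put a rough number on the 4-bit vs. mixed 4-bit/6-bit point, here is a minimal sketch of the average bits per weight. The block sizes are those of llama.cpp's Q4_K and Q6_K formats (144 and 210 bytes per 256 weights). The share of weights stored as 6-bit (`ASSUMED_Q6_FRACTION`) is a made-up illustrative value, not measured from any real model, and this is only a size estimate; it does not by itself account for the 7.3 vs 4 tps difference reported above.

```python
# Bytes per 256-weight block in llama.cpp's K-quant formats.
Q4_K_BLOCK_BYTES = 144
Q6_K_BLOCK_BYTES = 210
BLOCK_WEIGHTS = 256

Q4_K_BPW = Q4_K_BLOCK_BYTES * 8 / BLOCK_WEIGHTS  # 4.5 bits per weight
Q6_K_BPW = Q6_K_BLOCK_BYTES * 8 / BLOCK_WEIGHTS  # 6.5625 bits per weight


def average_bpw(q6_fraction: float) -> float:
    """Average bits per weight when `q6_fraction` of the weights
    are stored as Q6_K and the rest as Q4_K."""
    if not 0.0 <= q6_fraction <= 1.0:
        raise ValueError("q6_fraction must be between 0 and 1")
    return q6_fraction * Q6_K_BPW + (1.0 - q6_fraction) * Q4_K_BPW


# Hypothetical share of 6-bit weights in a mixed quant (assumption, for illustration only).
ASSUMED_Q6_FRACTION = 0.25

if __name__ == "__main__":
    mixed = average_bpw(ASSUMED_Q6_FRACTION)
    all_4bit = average_bpw(0.0)
    print(f"mixed 4/6-bit: {mixed:.3f} bits per weight")
    print(f"all 4-bit:     {all_4bit:.3f} bits per weight")
    print(f"mixed is {mixed / all_4bit:.2f}x larger per weight")
```

Fewer bits per weight means less data to read from memory for each generated token, which is one plausible reason an all-4-bit file can generate faster than a mixed one.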

Felladrin changed discussion status to closed
