IQ3_XS spits out gibberish

#2
by Hansdudin202 - opened

Tested with the latest version of llama.cpp, no reasoning template.

Latitude org

Would you mind checking whether https://huggingface.co/bartowski/LatitudeGames_Equinox-31B-GGUF/tree/main 's quants suffer from the same issue?

I know Google models are weirdly fussy about which tensors get quantized below Q8, but I couldn't tell you exactly which ones offhand.

Edit: I think it's *.attn_q.weight? And you don't wanna go below Q6_K?
IDK, a lot of the quant cookers keep their analysis to themselves.
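A minimal sketch of the check described above, assuming you already have a mapping from tensor names to quant type names (for example, dumped from the GGUF with the `gguf` Python package that ships in the llama.cpp repo). The `*.attn_q.weight` pattern and the Q6_K floor are only the guesses from the comment above, not confirmed rules. The bits-per-weight table covers a handful of common GGML types, and the helper name `tensors_below` is hypothetical.

```python
from fnmatch import fnmatch

# Bits per weight for some common GGML quant types, derived from their block
# layouts (e.g. Q6_K stores 256 weights in 210 bytes -> 6.5625 bpw).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K": 5.5,
    "Q4_K": 4.5,
    "IQ4_XS": 4.25,
    "Q3_K": 3.4375,
    "IQ3_S": 3.4375,
    "IQ3_XXS": 3.0625,
    "Q2_K": 2.625,
}


def tensors_below(tensor_types, pattern="*.attn_q.weight", floor="Q6_K"):
    """Return the sorted names of tensors matching `pattern` whose quant type
    has fewer bits per weight than `floor` (hypothetical helper)."""
    limit = BITS_PER_WEIGHT[floor]
    return sorted(
        name
        for name, qtype in tensor_types.items()
        if fnmatch(name, pattern) and BITS_PER_WEIGHT[qtype] < limit
    )


example = {
    "blk.0.attn_q.weight": "IQ3_S",
    "blk.1.attn_q.weight": "Q6_K",
    "blk.0.ffn_up.weight": "IQ3_S",
}
print(tensors_below(example))
```

Unknown type names raise `KeyError` on purpose: silently skipping them would hide exactly the tensors you are trying to audit.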

Not the person you asked, but I noticed an immediate and clear difference in writing quality, word choice, and scene tracking when switching from the Latitude Q5 to the Bartowski Q5. (I did also go from Q5-M to Q5-L, but I'm not sure that would account for the improvement.) Maybe his imatrix thingy is super special?

EDIT: I'm using the LMStudio beta with the beta runtime, serving to SillyTavern, with the settings listed in the model card.
Both work, though. No gibberish.

Latitude org

It very well could be!

If more issues are reported, I'll update the README to point to Bartowski's quants, since he's been my #1 quant guy for years now.


Would you mind checking whether https://huggingface.co/bartowski/LatitudeGames_Equinox-31B-GGUF/tree/main 's quants suffer from the same issue?

I checked it out: same problem.

Have you checked that you're on the latest version of llama.cpp?
Have you checked that you're using the sampler settings suggested by either Latitude or Google? (For the latter: temperature 1.0, top-p 0.95, top-k 64, min-p 0.0.)
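If you run llama.cpp directly rather than through a frontend, a minimal sketch of pinning the Google-suggested values quoted above on the command line could look like this. It assumes the `llama-server` binary is on your PATH; `model.gguf` and the helper name `llama_server_args` are placeholders. The flags `--temp`, `--top-p`, `--top-k` and `--min-p` are llama.cpp's standard sampler options.

```python
import shlex

# Sampler values quoted in the thread as Google's suggestion.
GOOGLE_SAMPLERS = {"temp": 1.0, "top-p": 0.95, "top-k": 64, "min-p": 0.0}


def llama_server_args(model_path, samplers=GOOGLE_SAMPLERS):
    """Build an argv list for llama-server with explicit sampler flags
    (hypothetical helper; dict order determines flag order)."""
    args = ["llama-server", "-m", model_path]
    for name, value in samplers.items():
        args += [f"--{name}", str(value)]
    return args


# Print a shell-safe command line you could paste into a terminal.
print(shlex.join(llama_server_args("model.gguf")))
```

In LMStudio or SillyTavern you would set the same four values in the sampler settings instead.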

Have you checked the SHA-256 hash? LLM files, especially GGUFs, can still be loaded and run even if they were corrupted during download.
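A minimal sketch for verifying a download in Python, reading the file in chunks so a multi-gigabyte GGUF is never held in memory at once. The expected digest is assumed to come from the file's page on Hugging Face (listed as SHA256 under the Git LFS details); the helper names `sha256_of` and `matches` are hypothetical. On the command line, `sha256sum` (Linux) or `certutil -hashfile <file> SHA256` (Windows) does the same job.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare against a published digest, tolerating case and stray whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A mismatch means the file should be re-downloaded before any other debugging.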

I can confirm that the IQ quants are completely broken, Bartowski's too. QKM works just fine.

Latitude org

I went ahead and deleted all the IQ quants for now.

IIRC, all quants containing IQ3_S tensors (so basically every IQ3 quant) produce incorrect output with llama.cpp compiled against CUDA 13.2. Recompiling llama.cpp with an unaffected CUDA version should fix it. But apparently it affects LMStudio too?
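A rough sketch of the condition described above, assuming you have a mapping from tensor names to quant type names for the model and the output of `nvcc --version` from the toolkit llama.cpp was built with. That is an assumption: the `nvcc` on your PATH is not necessarily what a prebuilt binary used, in which case the release asset name tells you the CUDA version instead. CUDA 13.2 is the only release named in the thread, and only from memory, so `SUSPECT_RELEASES` and both helper names are hypothetical.

```python
import re

# Only release mentioned in the thread ("IIRC"); not a verified list.
SUSPECT_RELEASES = {(13, 2)}


def cuda_release(nvcc_version_output):
    """Extract (major, minor) from a line like
    'Cuda compilation tools, release 12.4, V12.4.x'."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_version_output)
    if m is None:
        raise ValueError("no CUDA release found in nvcc output")
    return int(m.group(1)), int(m.group(2))


def likely_affected(tensor_types, nvcc_version_output):
    """True if the model contains IQ3_S tensors and the CUDA release is suspect."""
    has_iq3_s = "IQ3_S" in tensor_types.values()
    return has_iq3_s and cuda_release(nvcc_version_output) in SUSPECT_RELEASES
```

The check is deliberately narrow: it only encodes the reported combination and says nothing about other quant types or CUDA versions.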

Can confirm. The SHA-256 checksum matches.
https://huggingface.co/mradermacher/Equinox-31B-i1-GGUF/blob/main/Equinox-31B.i1-IQ3_M.gguf

Update: llama.cpp, Windows x64 (CUDA 12) - CUDA 12.4 DLLs
