How, exactly, did you make these?

#1
by Downtown-Case - opened

I saw the ik_llama.cpp issue, and tried making an "unfused" bf16 GGUF by rolling back to Aes's branch before the fusing commit. But ik_llama.cpp still refuses to make a valid quantization.

Sorry, I just saw this. Not sure why it doesn't notify me when people comment on my repos sometimes.
Looks like these didn't upload properly (I'll resume them).

How, exactly, did you make these?

The quants here are just unmodified re-uploads of Aes's original quants from before he squashed the repo.

tried making an "unfused" bf16 GGUF

For my "unfused" MiMo-V2.5-Pro quants (gghfez/MiMo-V2.5-Pro-ik_llama-unfused-GGUF), I didn't quantize anything myself. I simply took some other people's mainline quants and un-fused attn_qkv.weight -> attn_q.weight, attn_k.weight, attn_v.weight.
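For readers wondering what the un-fusing step amounts to, here is a minimal sketch. It is not the script used for these quants. It assumes the fused tensor stores the Q, K and V projections stacked along the output-row dimension, in that order, and it models the tensor as a plain list of rows. The function name, the row counts and the "blk.0." prefix in the usage note are illustrative assumptions.

```python
def unfuse_qkv(name, fused_rows, n_q, n_k, n_v):
    """Split a fused QKV weight into separate Q, K and V weights.

    Assumption (not confirmed by the thread): the fused tensor is the
    row-wise concatenation [Q; K; V], so slicing whole rows is enough.
    `fused_rows` is a list of rows; each row may be a list of floats or an
    opaque bytes object holding one already-quantized row.
    """
    if "attn_qkv" not in name:
        raise ValueError(f"not a fused QKV tensor: {name}")
    if len(fused_rows) != n_q + n_k + n_v:
        raise ValueError(
            f"expected {n_q + n_k + n_v} rows, got {len(fused_rows)}"
        )

    # Slice the stacked rows back into the three projections.
    q = fused_rows[:n_q]
    k = fused_rows[n_q:n_q + n_k]
    v = fused_rows[n_q + n_k:]

    # Rename attn_qkv -> attn_q / attn_k / attn_v, keeping any prefix/suffix.
    return {
        name.replace("attn_qkv", "attn_q"): q,
        name.replace("attn_qkv", "attn_k"): k,
        name.replace("attn_qkv", "attn_v"): v,
    }
```

Calling `unfuse_qkv("blk.0.attn_qkv.weight", rows, n_q, n_k, n_v)` would return three entries named `blk.0.attn_q.weight`, `blk.0.attn_k.weight` and `blk.0.attn_v.weight`. Since only whole rows are moved, a split like this should not need to requantize anything, provided the stacking assumption above holds for the model in question.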

I didn't try to unfuse the BF16 and then quantize it, because my script requires 2 × the source GGUF size, and MiMo-V2.5-Pro is 2.05 TiB at BF16.
I just don't have 4.10 TiB to do it.
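The arithmetic behind that figure can be checked before starting a long job. The sketch below assumes the "2 ×" requirement refers to free disk space (the thread does not say whether it is disk or memory), and the helper names are hypothetical.

```python
import shutil

TIB = 2 ** 40  # bytes per tebibyte


def required_bytes(source_bytes, factor=2):
    """Space needed if the tool wants `factor` times the source size."""
    return factor * source_bytes


def has_room(path, source_bytes, factor=2):
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(source_bytes, factor)


# The BF16 size quoted in the thread: 2.05 TiB, so 2 x 2.05 = 4.10 TiB.
bf16_bytes = 2.05 * TIB
needed_tib = required_bytes(bf16_bytes) / TIB
```

`has_room(".", bf16_bytes)` then reports whether the current drive could hold such a run under that assumption.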

I've fixed the uploads here.
Those are the only models I downloaded before the original repo got squashed.
