31B-Dense model missing from HuggingFace

#2 · opened by smcleod

Hello! The model card and blog post mention a '31B-Dense' model, but it seems to be missing from HuggingFace. A 31B dense model (with MTP) in a single GGUF would be absolutely killer for agentic coding.

Maybe there's some US government request regarding this 31B model as well? :)

modelscope.cn doesn't have the 31B model either. Why?
"US government request"? Are you joking? They released the full 397B model.

Of course I'm joking 😀 (although nothing would surprise me anymore...)

This model solved what Qwen 27B Q8 XL could not, and it's fast. It's by no means a replacement, though.
But where is the 31B dense model, so I can test it against Qwen 27B Q8 XL?

I hope the 31B is delayed so they can also fine-tune a QAT version...

A 70B version would be the real killer. Distill it, please!

If these fine-tuned distillations aren't a scam altogether, why has nobody distilled the Kimi 2.7 coder 1T model to make Qwen 3.6 27B better? The model is open source, and nobody rents a server and does the job? Seriously? Meanwhile, people raise tons of money on Kickstarter for random crap.

Distillation is, in my opinion, misrepresented by some as a trivial process that can arbitrarily transfer broad model capability. Today's frontier models (even the open-weights ones) are so delicate that further training can easily "break" them, unless you only want to transfer or encourage some very narrow behavior or characteristic.

It seems the base is Qwen3.5. Maybe Qwen3.6-27B would be a better base for the dense model?

@weisunding Yes, I'm assuming they did the fine-tuning before Qwen 3.6 was released. Regardless, their 31B is based on Gemma 4, which typically hasn't been as good as Qwen 3.5/3.6 for coding. But since it has more active parameters, it would be very interesting to see how much they could improve on the base model.

From Qwen3.5 to Qwen3.6 or Gemma 4 took about one month, so...

I've been testing for days now, using NVFP4 with Q8 cache and MTP 3:

Qwen3.6:

  • no loops
  • never gets stuck
  • better reasoning
  • 100 t/s
  • 128K context

Ornith 1.0:

  • 300 t/s
  • some loops
  • more errors
  • sometimes gets stuck
  • 190K context

Well, it is fast, but prone to errors. I have hopes for the 31B dense model.

@darksidewalker That's not the 31B dense model we're talking about here, though?

Yes, I said I have high hopes for the dense model, since the MoE falls short.

At this point I'm not sure the 31B dense model actually exists.
