Architectural Comparison: Qwen3.5-35B-A3B vs. Qwen3.6-35B-A3B

#12
by BuiDoan - opened

Hi,
Is Qwen3.6-35B-A3B a post-trained version (SFT and RLHF) of Qwen3.5-35B-A3B?
I noticed that their architectures are completely identical.

Number of Parameters: 35B in total and 3B activated
Hidden Dimension: 2048
Token Embedding: 248320 (Padded)
Number of Layers: 40
Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
...
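A minimal sketch of how the hidden layout line expands into 40 layers. It assumes the pattern reads left to right, so each group is three Gated DeltaNet blocks followed by one Gated Attention block, and each block feeds an MoE layer. The string labels and the `expand_layout` helper are hypothetical and are not identifiers from any Qwen config file.

```python
# Hypothetical labels for the two block types named in the layout line.
DELTANET = "Gated DeltaNet -> MoE"
ATTENTION = "Gated Attention -> MoE"


def expand_layout(repeats: int = 10,
                  deltanet_per_group: int = 3,
                  attention_per_group: int = 1) -> list[str]:
    """Expand 'repeats x (n x DeltaNet -> m x Attention)' into a flat per-layer list."""
    # One group: three linear-attention (DeltaNet) blocks, then one full-attention block.
    group = [DELTANET] * deltanet_per_group + [ATTENTION] * attention_per_group
    # Repeating the group gives the full layer stack.
    return group * repeats


layers = expand_layout()
print(len(layers))               # 40
print(layers.count(ATTENTION))   # 10
# Zero-based indices of the full-attention layers: every fourth layer.
print([i for i, name in enumerate(layers) if name == ATTENTION][:3])  # [3, 7, 11]
```

Under this reading, 10 groups of 4 blocks give the stated 40 layers, and only one layer in four uses full attention.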

Additionally, thank you for sharing such amazing models!
I am a big fan of the 27B version because of its powerful performance.
The balance of efficiency and intelligence in your recent releases is truly impressive.


Yeah, point releases normally keep the same architecture. They are just trained on slightly different data, or trained for longer with additional data.
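One way to check the "same architecture" claim is to diff the architecture hyperparameters of the two releases. The sketch below assumes both sets of hyperparameters are already loaded as plain dictionaries. The key names are hypothetical stand-ins for whatever fields the real config files use, and the values are the ones listed in the original post.

```python
from typing import Any


def architecture_diff(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (a_value, b_value)} for every key that differs or exists on only one side.

    A key missing on one side is reported as None for that side.
    """
    missing = object()  # sentinel so a missing key never compares equal to a real value
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, missing) != b.get(key, missing)
    }


# Hypothetical key names; values as listed in the post.
qwen35 = {
    "total_params": "35B",
    "active_params": "3B",
    "hidden_size": 2048,
    "vocab_size_padded": 248320,
    "num_layers": 40,
}
qwen36 = dict(qwen35)  # the post reports identical values

print(architecture_diff(qwen35, qwen36))                        # {}
print(architecture_diff({"num_layers": 40}, {"num_layers": 48}))  # {'num_layers': (40, 48)}
```

An empty diff only says the listed hyperparameters match. The trained weights can still differ, which is exactly what a point release trained on different or additional data would produce.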
