⚠️ BIG WARNING ⚠️

NOT TO BE USED AS IS, AND REQUIRES FINE-TUNING.

This upscaled model produces gibberish out of the box and currently has a perplexity (PPL) of around 500k before any fine-tuning.

Send me your support to help me feed the data beast! Also taking commissions for universe-specific models.

Model Description

This model is an interleaved upscale of Qwen3.5-27B to 40B. It expands the base architecture from 64 to 96 layers using an interleaved copying technique.

Upscaling Details:

Layer Expansion: 64 to 96 layers.
Copying Strategy: Layers were copied in groups of 4 to preserve the 3:1 ratio of linear-attention to full-attention layers.
Added Noise: Noise was purposefully introduced during the upscaling process to aid in future fine-tuning recovery.
✦ The o_proj, down_proj, and out_proj weights received noise with σ = 0.000625.
✦ All remaining weights received noise with σ = 0.0025.
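
The details above can be illustrated with a rough sketch. This is not the actual upscaling script. It assumes that every second group of 4 layers is duplicated right after itself (the exact groups that were copied are not stated here), that the full-attention layer is the last one in each group of 4, and that the noise is zero-mean Gaussian. The names build_layer_map, sigma_for, and add_noise are hypothetical helpers.

```python
import random

GROUP = 4          # layers per copied group
SRC_LAYERS = 64    # layers in the base model
DST_LAYERS = 96    # layers after upscaling
LOW_NOISE_KEYS = ("o_proj", "down_proj", "out_proj")


def build_layer_map(src_layers=SRC_LAYERS, dst_layers=DST_LAYERS, group=GROUP):
    """Entry i is the source layer index used for new layer i.

    Assumption: one group out of every `stride` groups is duplicated
    directly after itself. This is only one possible interleaving.
    """
    n_groups = src_layers // group
    extra = (dst_layers - src_layers) // group
    assert extra > 0, "destination must have more layers than source"
    stride = n_groups // extra
    layer_map = []
    for g in range(n_groups):
        block = list(range(g * group, (g + 1) * group))
        layer_map.extend(block)
        if g % stride == stride - 1 and extra > 0:
            layer_map.extend(block)  # interleaved copy of the whole group
            extra -= 1
    return layer_map


def is_full_attention(layer_idx, group=GROUP):
    # Assumption: the last layer of each group of 4 uses full attention.
    return layer_idx % group == group - 1


def sigma_for(param_name):
    """Noise scale for a parameter, using the two values listed above."""
    if any(key in param_name for key in LOW_NOISE_KEYS):
        return 0.000625
    return 0.0025


def add_noise(weights, param_name, rng):
    """Return a noisy copy of a flat list of weights (Gaussian is assumed)."""
    sigma = sigma_for(param_name)
    return [w + rng.gauss(0.0, sigma) for w in weights]


layer_map = build_layer_map()
# Copying whole groups keeps every new layer on the same attention type
# as the position it occupies, so the 3:1 pattern is unchanged.
pattern_ok = all(
    is_full_attention(src) == is_full_attention(new)
    for new, src in enumerate(layer_map)
)
noisy = add_noise([0.0, 1.0], "layers.5.mlp.down_proj.weight", random.Random(0))
```

Because whole groups of 4 are copied, a full-attention layer always lands on a full-attention position in the larger stack, which is why the group size matters more than which groups are chosen.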

Acknowledgements

Credit to Qwen for the powerful Qwen3 architecture and for releasing their work openly.
