---
license: apache-2.0
base_model: Qwen/Qwen3.5-27B
tags:
- qwen
- qwen3.5
- reasoning
- chat
- text-only
- 40b
- upscale
---

Qwen3.5-27B to 40B Upscale

⚠️ BIG WARNING ⚠️

NOT TO BE USED AS-IS: REQUIRES FINE-TUNING.

Out of the box, this upscaled model produces gibberish. Before any fine-tuning, its perplexity (PPL) currently sits at 500k.

Send me your support to help me feed the data beast! I'm also taking commissions for universe-specific models.

Support on Ko-fi

Model Description

This model is an interleaved depth upscale of Qwen3.5-27B to 40B parameters. It expands the base architecture from 64 to 96 layers by copying existing layers and interleaving the copies into the original stack.
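To make the interleaving idea concrete, here is a minimal sketch that builds a 64 -> 96 layer index map from 4-layer blocks. The card does not say which blocks were duplicated or where the full-attention layer sits inside each block, so two things are assumptions here: full attention is taken to be the last layer of every block, and every other block is duplicated with its copy placed directly after the original. `interleaved_map` and `layer_type` are hypothetical helpers, not part of any library.

```python
# Hypothetical sketch of an interleaved 64 -> 96 layer expansion.
# Assumptions (not stated in the card): full attention sits at the last
# position of each 4-layer block, and every other block is duplicated,
# with the copy placed directly after its original.

BLOCK = 4  # layers per block: 3 linear-attention + 1 full-attention


def layer_type(idx: int) -> str:
    """Attention type of layer `idx` under the assumed 3:1 layout."""
    return "full" if idx % BLOCK == BLOCK - 1 else "linear"


def interleaved_map(src_layers: int = 64, dst_layers: int = 96) -> list[int]:
    """For each layer of the upscaled model, return the source layer it copies."""
    if src_layers % BLOCK or (dst_layers - src_layers) % BLOCK:
        raise ValueError("layer counts must be multiples of the block size")
    n_blocks = src_layers // BLOCK                # 16 blocks for 64 layers
    n_extra = (dst_layers - src_layers) // BLOCK  # 8 blocks to duplicate for 96
    if not 0 < n_extra <= n_blocks:
        raise ValueError("can duplicate between 1 and n_blocks blocks")
    step = n_blocks // n_extra                    # duplicate 1 block in every `step`
    mapping, duplicated = [], 0
    for b in range(n_blocks):
        block = list(range(b * BLOCK, (b + 1) * BLOCK))
        mapping.extend(block)                     # original block
        if b % step == 0 and duplicated < n_extra:
            mapping.extend(block)                 # interleaved copy
            duplicated += 1
    return mapping


mapping = interleaved_map()
print(len(mapping))   # 96
print(mapping[:12])   # [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7]
# Whole blocks land on block-aligned positions, so every new layer keeps
# the attention type of the layer it was copied from.
print(all(layer_type(i) == layer_type(src) for i, src in enumerate(mapping)))  # True
```

Copying whole blocks rather than single layers is what keeps the attention pattern intact: duplicating a lone linear-attention layer would shift every later full-attention layer off its slot.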

Upscaling Details:

  • Layer Expansion: 64 to 96 layers.
  • Copying Strategy: Layers were copied in groups of 4 so that the 3:1 ratio of linear-attention to full-attention layers is preserved.
  • Added Noise: Noise was deliberately added during upscaling to help the model recover during subsequent fine-tuning.
    ✦ The o_proj, down_proj, and out_proj weights received noise with σ = 0.000625.
    ✦ All remaining weights received noise with σ = 0.0025.
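The per-projection noise scales can be sketched as follows. The card only gives the σ values, so this rests on assumptions: the noise is taken to be zero-mean Gaussian and added with an absolute (not weight-relative) standard deviation, tensors are selected by substring matches on Hugging Face-style parameter names, and the parameter names in the example are hypothetical. Which tensors were actually perturbed (for example, whether norms or embeddings were included) is not specified in the card. `sigma_for` and `add_noise` are hypothetical helpers.

```python
import numpy as np

# σ values from the card; matching by parameter-name substring is an assumption.
LOW_SIGMA_KEYS = ("o_proj", "down_proj", "out_proj")
LOW_SIGMA = 0.000625
DEFAULT_SIGMA = 0.0025


def sigma_for(name: str) -> float:
    """Pick the noise standard deviation for a parameter based on its name."""
    return LOW_SIGMA if any(key in name for key in LOW_SIGMA_KEYS) else DEFAULT_SIGMA


def add_noise(state: dict[str, np.ndarray], seed: int = 0) -> dict[str, np.ndarray]:
    """Return a new dict with zero-mean Gaussian noise added to every tensor.

    Assumption: noise is additive with an absolute σ; the input is not modified.
    """
    rng = np.random.default_rng(seed)
    noisy = {}
    for name, weight in state.items():
        noise = rng.normal(0.0, sigma_for(name), size=weight.shape).astype(weight.dtype)
        noisy[name] = weight + noise  # creates a new array
    return noisy


# Usage with hypothetical parameter names:
state = {
    "model.layers.4.self_attn.o_proj.weight": np.zeros((8, 8), dtype=np.float32),
    "model.layers.4.mlp.up_proj.weight": np.zeros((8, 8), dtype=np.float32),
}
noisy = add_noise(state)
```

Giving the output projections (o_proj, down_proj, out_proj) a quarter of the default σ keeps the perturbation on what each block writes back into the residual stream smaller than the perturbation on its inputs.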

Acknowledgements

  • Credit to the Qwen team for the powerful Qwen3.5 architecture and for releasing their work openly.