Qwen3.5-9B-Franken-L24-27

A frankenmerged Qwen3.5-9B with layers 24-27 duplicated (32 → 36 layers). No retraining — just layer surgery.

Result: 4/10 → 7/10 on a coding benchmark, a 75% relative increase in problems solved from copying 4 layers.

What is this?

This model was created by duplicating layers 24-27 (the "reasoning core" at 75-84% depth) of a Qwen3.5-9B-abliterated model. The duplicated layers give the model a second pass through its strongest reasoning circuit before generating output.
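The depth percentages quoted on this card can be reproduced with a little arithmetic. The sketch below assumes depth is computed as layer index divided by the base layer count (32) and rounded down. That convention is a guess that happens to match the quoted figures, and the function name is purely illustrative.

```python
def depth_percent_range(first_layer, last_layer, n_layers):
    """Return (start %, end %) for a block of layers.

    Assumed convention: depth = layer_index / n_layers, rounded down.
    """
    return (100 * first_layer // n_layers, 100 * last_layer // n_layers)


N_LAYERS = 32  # layer count of the base model, per this card

reasoning_core = depth_percent_range(24, 27, N_LAYERS)
danger_zone = depth_percent_range(18, 21, N_LAYERS)  # block listed under Key Findings

print(reasoning_core)  # (75, 84)
print(danger_zone)     # (56, 65)
```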

The approach is based on research across 6 model architectures and 50+ experiments mapping where functional circuits live in transformers. Full writeup: r/LocalLLaMA post

Benchmark Results

15 LeetCode problems, 3 tiers, code executed against hidden test cases (not LLM-judged):

| Model | Score | Speed |
|---|---|---|
| Qwen3.5-9B (original) | 4/10 | 112 tok/s |
| This model (L24-27 dup) | 7/10 | ~102 tok/s |

Problems gained: three_sum, word_break, longest_common_prefix. Nothing lost from baseline.
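As a quick check on the headline numbers, the relative changes can be worked out from the scores and speeds reported above. This is only arithmetic on the figures in this card, and `relative_change` is an illustrative helper name.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (0.75 means +75%)."""
    return (after - before) / before


score_gain = relative_change(4, 7)        # problems solved: 4 -> 7
speed_change = relative_change(112, 102)  # tok/s: 112 -> ~102

print(f"score: {score_gain:+.0%}, speed: {speed_change:+.1%}")
# score: +75%, speed: -8.9%
```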

Key Findings

  • Layers 24-27 (75-84% depth) are the "reasoning core" in this architecture
  • Layers 18-21 (56-65%) are a "danger zone" — duplicating them drops score to 2/10
  • Stacking multiple circuits or tripling the best one makes things worse
  • Minimum 4 layers needed — 1-2 layers hurt rather than help
  • The danger zone at ~50% depth appears in every architecture tested (dense, MoE, hybrid)
  • Cross-model layer transplant does NOT work — matching dimensions isn't enough
  • Hybrid architectures (Mamba+MoE+Attention) are completely intolerant of duplication

Usage

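from mlx_lm import load, generate

# Load the merged 36-layer model and its tokenizer (fetched from the Hub if not cached)
model, tokenizer = load("RockTalk/Qwen3.5-9B-Franken-L24-27")
# Generate up to 500 new tokens for the prompt
response = generate(model, tokenizer, prompt="Write a function...", max_tokens=500)
print(response)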

Generation is ~9% slower than the 32-layer base (~102 vs 112 tok/s) because of the 4 extra layers.

How it was made

The weights of layers 24-27 were duplicated and the copies inserted at the same position, shifting all subsequent layers forward by four. The config was updated to 36 layers. No training, no optimization, no fine-tuning.
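The resulting layer order can be sketched as plain index bookkeeping. The example below assumes the copied block sits directly after the original block, which is one reading of the description above. It does not touch any weights or MLX APIs, and `duplicated_layer_order` is a hypothetical helper rather than the script used to build this model.

```python
def duplicated_layer_order(n_layers, first, last):
    """Source-layer index for each slot after duplicating layers first..last (inclusive).

    Assumption: the copy is placed immediately after the original block.
    """
    original = list(range(n_layers))
    block = original[first:last + 1]
    # Everything up to and including the block, then the copy, then the remaining layers.
    return original[:last + 1] + block + original[last + 1:]


order = duplicated_layer_order(32, 24, 27)

print(len(order))    # 36
print(order[22:34])  # [22, 23, 24, 25, 26, 27, 24, 25, 26, 27, 28, 29]
```

Each entry says which original layer supplies the weights for that slot, so original layers 28-31 end up in slots 32-35.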

Base model: lukey03/Qwen3.5-9B-abliterated-MLX-4bit

Drew Smith — Rocktalk Research

All experiments run on Mac Studio M3 Ultra (512GB) using MLX. No cloud compute. Just surgery.
