---
license: apache-2.0
language:
- en
- zh
base_model:
- janhq/Jan-v1-2509
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill
- TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill
- TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill
- DavidAU/Qwen3-4B-Agent-Claude-Gemini-heretic
- DavidAU/Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated
pipeline_tag: text-generation
library_name: transformers
tags:
- coding
- research
- deep thinking
- 1M context
- 256k context
- Qwen3
- All use cases
- creative
- creative writing
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- science fiction
- all genres
- story
- writing
- vivid prosing
- vivid writing
- fiction
- roleplaying
- bfloat16
- finetune
- mergekit
- merge
---

# Qwen3-4B-Element8

Brainwaves:
```brainwave
mxfp4    0.533,0.731,0.854,0.689,0.402,0.762,0.657
qx64-hi  0.531,0.728,0.857,0.702,0.410,0.764,0.671
qx86-hi  0.540,0.725,0.866,0.708,0.430,0.769,0.669
bf16     0.542,0.731,0.866,0.706,0.428,0.765,0.655
```

The last two models merged into Element8:

## Qwen3-4B-Element6d
Brings MiniMax-M2.1 traces
```
mxfp4    0.536,0.718,0.856,0.691,0.400,0.775,0.673
qx64-hi  0.533,0.727,0.865,0.696,0.412,0.766,0.673
qx86-hi  0.536,0.730,0.865,0.704,0.424,0.771,0.665
bf16     0.536,0.731,0.868,0.704,0.434,0.769,0.673
```
## Qwen3-4B-Element7
Brings MiMo-V2-Flash traces
```
mxfp4    0.532,0.733,0.854,0.690,0.392,0.764,0.661
qx64-hi  0.532,0.729,0.857,0.699,0.414,0.766,0.662
qx86-hi  0.538,0.722,0.864,0.707,0.424,0.768,0.670
bf16     0.540,0.726,0.868,0.707,0.416,0.770,0.657
```
One interesting side effect of the merge is its resistance to quantization: there is almost no difference in cognitive performance between the quants and full precision. The difference shows only in openbookqa, exactly where it should, because of that benchmark's direct dependence on representative depth.
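
To make the quantization claim easy to check, here is a minimal sketch that subtracts each quant's row from the bf16 row of the Element8 table. The card does not name the columns, so the code refers to them by index; treating the fifth column (index 4) as openbookqa is an assumption based on the sentence above. The helper names `gaps_vs_bf16` and `widest_gap` are made up for this illustration.

```python
# Element8 scores copied from the table at the top of this card.
# Column names are not given in the card, so columns are referenced by index.
ELEMENT8 = {
    "mxfp4":   [0.533, 0.731, 0.854, 0.689, 0.402, 0.762, 0.657],
    "qx64-hi": [0.531, 0.728, 0.857, 0.702, 0.410, 0.764, 0.671],
    "qx86-hi": [0.540, 0.725, 0.866, 0.708, 0.430, 0.769, 0.669],
    "bf16":    [0.542, 0.731, 0.866, 0.706, 0.428, 0.765, 0.655],
}


def gaps_vs_bf16(scores):
    """Return {quant: [bf16 - quant, per column]}, rounded to 3 decimals."""
    ref = scores["bf16"]
    return {
        name: [round(r - q, 3) for r, q in zip(ref, row)]
        for name, row in scores.items()
        if name != "bf16"
    }


def widest_gap(deltas):
    """Return (column_index, delta) for the largest absolute gap in one row."""
    idx = max(range(len(deltas)), key=lambda i: abs(deltas[i]))
    return idx, deltas[idx]


if __name__ == "__main__":
    for name, deltas in gaps_vs_bf16(ELEMENT8).items():
        print(name, deltas, "widest:", widest_gap(deltas))
```

Every gap stays under 0.03 in absolute value, and for mxfp4 the widest gap falls in the fifth column.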

Base models in the merge, shown at qx86-hi, their highest-performing quant.

The top 4 models are all Nightmedia models, each a successful merge similar to Element8. 

```
Agent               0.603,0.817,0.838,0.743,0.426,0.780,0.708
Agent-Claude        0.561,0.760,0.862,0.714,0.422,0.780,0.683
Engineer-trial17    0.569,0.774,0.849,0.705,0.440,0.773,0.642
Engineer3x-Trial122 0.556,0.757,0.850,0.642,0.436,0.754,0.611

Qwen3-4B-RA-SFT     0.515,0.715,0.856,0.615,0.436,0.754,0.629
Jan-v1-2509         0.435,0.540,0.729,0.588,0.388,0.730,0.633
Polaris-Alpha       0.488,0.653,0.846,0.515,0.378,0.683,0.576
Gemini-Flash        0.386,0.447,0.685,0.582,0.362,0.723,0.593
Apollo-Heretic      0.436,0.583,0.805,0.605,0.398,0.738,0.608
Claude-Haiku-4.5    0.469,0.607,0.842,0.560,0.394,0.705,0.585
Claude-Haiku-4.5-HI 0.416,0.513,0.719,0.581,0.372,0.722,0.603
Gemini-3-Pro        0.466,0.658,0.849,0.568,0.394,0.713,0.571
MiMo-V2-Flash       0.398,0.484,0.673,0.618,0.374,0.721,0.630
MiniMax-M2.1        0.391,0.449,0.627,0.588,0.356,0.718,0.609
```

Below is the performance of the Qwen base models, where everyone started their training. For a merge to have learned something in the process, it should stay well above these metrics. Merging traces from different sources dents the value of truth but moves it towards a safe average, which explains why quants don't mind the squeeze.
```
Qwen3-4B-Thinking-2507-qx86-hi 0.372,0.414,0.625,0.518,0.366,0.698,0.612
Qwen3-4B-Instruct-2507-qx86-hi 0.447,0.593,0.843,0.448,0.390,0.690,0.554
```
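
The "stay well above those metrics" criterion can be checked directly against the numbers in this card. The sketch below compares the Element8 qx86-hi row with the two Qwen base rows, column by column; the function names `margins` and `stays_above` are hypothetical and the columns are again referenced only by position.

```python
# qx86-hi scores copied from this card.
ELEMENT8_QX86 = [0.540, 0.725, 0.866, 0.708, 0.430, 0.769, 0.669]
BASES = {
    "Qwen3-4B-Thinking-2507": [0.372, 0.414, 0.625, 0.518, 0.366, 0.698, 0.612],
    "Qwen3-4B-Instruct-2507": [0.447, 0.593, 0.843, 0.448, 0.390, 0.690, 0.554],
}


def margins(merged, base):
    """Per-column margin of the merged model over one base model."""
    return [round(m - b, 3) for m, b in zip(merged, base)]


def stays_above(merged, bases):
    """True if the merged model beats every base model in every column."""
    return all(m > b for base in bases.values() for m, b in zip(merged, base))


if __name__ == "__main__":
    for name, base in BASES.items():
        print(name, margins(ELEMENT8_QX86, base))
    print("above both bases everywhere:", stays_above(ELEMENT8_QX86, BASES))
```

The same two functions can be pointed at any other row in the tables above to see which source models clear the base line and by how much.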

For coding, reasoning, and high-performance agentic work, the Engineer, Architect, and Agent series from Nightmedia are hard to beat. Models with fewer merge steps are usually better at coding; the others are better reasoners. Think tags do sometimes appear when the model hits a hard spot, but those are rare and far apart.

We have models in the same performance envelope as 8B, 14B, 30B, and 6B/8B/13B/21B/42B brainstorming versions by DavidAU.

The Element series are experimental, and generally fun as conversational models.

-G

P.S. It seems popular, so I made the source openly available


# Stages of development 

## Qwen3-4B-Engineer
multislerp
- janhq/Jan-v1-2509
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
> qx86-hi 0.605,0.828,0.843,0.748,0.416,0.777,0.706


## Qwen3-4B-Engineer3x
multislerp
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
> qx86-hi 0.615,0.835,0.852,0.745,0.420,0.780,0.704


## Qwen3-4B-Agent
multislerp, abliterated with heretic by DavidAU:
- Qwen3-Engineer3x-4B-Run2-Trial122-7-003
- Qwen3-Engineer-4b-run2-trial17-8-004
- Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated
> qx86-hi 0.603,0.817,0.838,0.743,0.426,0.780,0.708


## Qwen3-4B-Agent-Claude-Gemini-heretic
multislerp, abliterated with heretic by DavidAU:
- Qwen3-4B-Agent
- TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill
> qx86-hi 0.561,0.760,0.862,0.714,0.422,0.780,0.683


## Qwen3-4B-Element6d
nuslerp (1.3/0.7)
- Qwen3-4B-Agent-Claude-Gemini-heretic
- TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill
> qx86-hi 0.536,0.730,0.865,0.704,0.424,0.771,0.665
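
For readers unfamiliar with what a weight pair like 1.3/0.7 does, here is a toy sketch of spherical interpolation on plain Python lists. It is not the merge tool's implementation: the mapping from the two weights to an interpolation factor `t` (normalise, then take the second model's share) is an assumption, and `slerp` and `weights_to_t` are names invented for this example.

```python
import math


def slerp(a, b, t, eps=1e-8):
    """Spherical interpolation between two equal-length vectors, 0 <= t <= 1."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # No direction to interpolate along: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    # Angle between the two directions, clamped for numerical safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel or anti-parallel: linear interpolation is stable here.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]


def weights_to_t(w_first, w_second):
    """ASSUMED mapping: normalise the weights; t is the second model's share."""
    return w_second / (w_first + w_second)


if __name__ == "__main__":
    t = weights_to_t(1.3, 0.7)  # weight pair quoted for this stage
    print("t =", t)
    print(slerp([1.0, 0.0], [0.0, 1.0], t))
```

Under that assumed mapping, 1.3/0.7 keeps the result closer to the first model listed, and a more lopsided pair such as 1.5/0.5 pulls it closer still.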


## Qwen3-4B-Element7
nuslerp (1.3/0.7)
- Qwen3-4B-Agent-Claude-Gemini-heretic
- TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill
> qx86-hi 0.538,0.722,0.864,0.707,0.424,0.768,0.670


## Qwen3-4B-Element8
nuslerp (1.5/0.5)
- Qwen3-4B-Element6d
- Qwen3-4B-Element7
> qx86-hi 0.540,0.725,0.866,0.708,0.430,0.769,0.669
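
The stages above form a small tree, which can be sketched as a dictionary to count how many merge steps sit beneath a given model. The structure is taken from the lists in this section; the two Engineer trial checkpoints under Qwen3-4B-Agent are kept as leaves because the card does not state how they map to the Engineer and Engineer3x stages. `LINEAGE`, `merge_depth`, and `leaf_sources` are names made up for this illustration.

```python
# Merge lineage as listed in "Stages of development": model -> (method, parents).
LINEAGE = {
    "Qwen3-4B-Element8": ("nuslerp", [
        "Qwen3-4B-Element6d",
        "Qwen3-4B-Element7",
    ]),
    "Qwen3-4B-Element6d": ("nuslerp", [
        "Qwen3-4B-Agent-Claude-Gemini-heretic",
        "TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill",
    ]),
    "Qwen3-4B-Element7": ("nuslerp", [
        "Qwen3-4B-Agent-Claude-Gemini-heretic",
        "TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill",
    ]),
    "Qwen3-4B-Agent-Claude-Gemini-heretic": ("multislerp", [
        "Qwen3-4B-Agent",
        "TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill",
        "TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill",
        "TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill",
    ]),
    # The trial checkpoints below are treated as leaves (mapping not stated).
    "Qwen3-4B-Agent": ("multislerp", [
        "Qwen3-Engineer3x-4B-Run2-Trial122-7-003",
        "Qwen3-Engineer-4b-run2-trial17-8-004",
        "Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated",
    ]),
}


def merge_depth(name):
    """Number of merge steps on the longest path from `name` down to a leaf."""
    if name not in LINEAGE:
        return 0
    _, parents = LINEAGE[name]
    return 1 + max(merge_depth(p) for p in parents)


def leaf_sources(name):
    """Set of models under `name` that are not themselves listed as merges."""
    if name not in LINEAGE:
        return {name}
    _, parents = LINEAGE[name]
    return set().union(*(leaf_sources(p) for p in parents))


if __name__ == "__main__":
    print("merge depth:", merge_depth("Qwen3-4B-Element8"))
    for src in sorted(leaf_sources("Qwen3-4B-Element8")):
        print(" ", src)
```

Counting this way, and with the trial checkpoints treated as leaves, Element8 sits four merge steps above its deepest sources.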

Performance numbers vary slightly with accumulated model traces. 

It makes for very interesting lines of conversation.

...let's just say it's different.