---
license: apache-2.0
language:
- en
- zh
base_model:
- janhq/Jan-v1-2509
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill
- TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill
- TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill
- DavidAU/Qwen3-4B-Agent-Claude-Gemini-heretic
- DavidAU/Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated
pipeline_tag: text-generation
library_name: transformers
tags:
- coding
- research
- deep thinking
- 1M context
- 256k context
- Qwen3
- All use cases
- creative
- creative writing
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- science fiction
- all genres
- story
- writing
- vivid prosing
- vivid writing
- fiction
- roleplaying
- bfloat16
- finetune
- mergekit
- merge
---

# Qwen3-4B-Element8

Brainwaves:

```brainwave
mxfp4    0.533,0.731,0.854,0.689,0.402,0.762,0.657
qx64-hi  0.531,0.728,0.857,0.702,0.410,0.764,0.671
qx86-hi  0.540,0.725,0.866,0.708,0.430,0.769,0.669
bf16     0.542,0.731,0.866,0.706,0.428,0.765,0.655
```

The last two models merged into Element8:

## Qwen3-4B-Element6d

Brings MiniMax-M2.1 traces

```
mxfp4    0.536,0.718,0.856,0.691,0.400,0.775,0.673
qx64-hi  0.533,0.727,0.865,0.696,0.412,0.766,0.673
qx86-hi  0.536,0.730,0.865,0.704,0.424,0.771,0.665
bf16     0.536,0.731,0.868,0.704,0.434,0.769,0.673
```

## Qwen3-4B-Element7

Brings MiMo-V2-Flash traces

```
mxfp4    0.532,0.733,0.854,0.690,0.392,0.764,0.661
qx64-hi  0.532,0.729,0.857,0.699,0.414,0.766,0.662
qx86-hi  0.538,0.722,0.864,0.707,0.424,0.768,0.670
bf16     0.540,0.726,0.868,0.707,0.416,0.770,0.657
```

One of the interesting side effects of the merge is the resistance to quantization: there is almost no difference in
cognitive performance between quants and full precision; it shows only in openbookqa, exactly where it should, because of the direct dependence on representative depth.

Base models in the merge, shown at qx86-hi, the highest-performing quant. The top 4 models are all Nightmedia models, each a successful merge similar to Element8.

```
Agent                0.603,0.817,0.838,0.743,0.426,0.780,0.708
Agent-Claude         0.561,0.760,0.862,0.714,0.422,0.780,0.683
Engineer-trial17     0.569,0.774,0.849,0.705,0.440,0.773,0.642
Engineer3x-Trial122  0.556,0.757,0.850,0.642,0.436,0.754,0.611
Qwen3-4B-RA-SFT      0.515,0.715,0.856,0.615,0.436,0.754,0.629
Jan-v1-2509          0.435,0.540,0.729,0.588,0.388,0.730,0.633
Polaris-Alpha        0.488,0.653,0.846,0.515,0.378,0.683,0.576
Gemini-Flash         0.386,0.447,0.685,0.582,0.362,0.723,0.593
Apollo-Heretic       0.436,0.583,0.805,0.605,0.398,0.738,0.608
Claude-Haiku-4.5     0.469,0.607,0.842,0.560,0.394,0.705,0.585
Claude-Haiku-4.5-HI  0.416,0.513,0.719,0.581,0.372,0.722,0.603
Gemini-3-Pro         0.466,0.658,0.849,0.568,0.394,0.713,0.571
MiMo-V2-Flash        0.398,0.484,0.673,0.618,0.374,0.721,0.630
MiniMax-M2.1         0.391,0.449,0.627,0.588,0.356,0.718,0.609
```

This is the Qwen base model performance: where everyone started with their training. The goal of a merge is to stay well above those metrics if it is to have learned something in the process. Merging traces from different sources dents the value of truth but moves it towards a safe average, which explains why quants don't mind the squeeze.

```
Qwen3-4B-Thinking-2507-qx86-hi  0.372,0.414,0.625,0.518,0.366,0.698,0.612
Qwen3-4B-Instruct-2507-qx86-hi  0.447,0.593,0.843,0.448,0.390,0.690,0.554
```

For coding, reasoning, and high-performance agentic work, the Engineer, Architect, and Agent series from Nightmedia are hard to beat. Models with fewer merge steps are usually better at coding; the others are better reasoners.

Think tags do sometimes appear when the model hits a hard spot. Those are rare and far apart.
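The quantization claim above can be checked directly against the Element8 brainwave rows. The following is a minimal sketch, not part of the original evaluation tooling: the helper names `parse_brainwaves` and `max_delta_vs_reference` are made up for illustration, and because this card does not state the column order, the sketch reports column positions rather than benchmark names.

```python
# Element8 scores copied from the brainwave block in this card.
BRAINWAVES = """\
mxfp4    0.533,0.731,0.854,0.689,0.402,0.762,0.657
qx64-hi  0.531,0.728,0.857,0.702,0.410,0.764,0.671
qx86-hi  0.540,0.725,0.866,0.708,0.430,0.769,0.669
bf16     0.542,0.731,0.866,0.706,0.428,0.765,0.655
"""


def parse_brainwaves(text):
    """Parse 'name s1,s2,...' lines into {name: [float, ...]} (hypothetical helper)."""
    rows = {}
    for line in text.strip().splitlines():
        name, scores = line.split()
        rows[name] = [float(s) for s in scores.split(",")]
    return rows


def max_delta_vs_reference(rows, reference="bf16"):
    """For each column, return the largest absolute gap between any quant and the reference row."""
    ref = rows[reference]
    deltas = []
    for i, ref_score in enumerate(ref):
        gaps = [abs(scores[i] - ref_score)
                for name, scores in rows.items() if name != reference]
        deltas.append(max(gaps))
    return deltas


rows = parse_brainwaves(BRAINWAVES)
deltas = max_delta_vs_reference(rows)
# Zero-based index of the column where quants drift furthest from bf16.
worst_column = max(range(len(deltas)), key=deltas.__getitem__)

print("max gap per column:", [round(d, 3) for d in deltas])
print("largest gap in column index:", worst_column)
```

With these numbers the largest gap sits in the fifth column (0.026, between mxfp4 and bf16), and every other column stays within 0.017 of full precision.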
We have models in the same performance envelope as 8B, 14B, 30B, and 6B/8B/13B/21B/42B brainstorming versions by DavidAU.

The Element series models are experimental, and generally fun as conversational models.

-G

P.S. It seems popular, so I made the source openly available.

# Stages of development

## Qwen3-4B-Engineer

multislerp

- janhq/Jan-v1-2509
- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

> qx86-hi 0.605,0.828,0.843,0.748,0.416,0.777,0.706

## Qwen3-4B-Engineer3x

multislerp

- Gen-Verse/Qwen3-4B-RA-SFT
- TeichAI/Qwen3-4B-Instruct-2507-Polaris-Alpha-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill

> qx86-hi 0.615,0.835,0.852,0.745,0.420,0.780,0.704

## Qwen3-4B-Agent

multislerp, abliterated with heretic by DavidAU:

- Qwen3-Engineer3x-4B-Run2-Trial122-7-003
- Qwen3-Engineer-4b-run2-trial17-8-004
- Qwen3-4B-Apollo-V0.1-4B-Thinking-Heretic-Abliterated

> qx86-hi 0.603,0.817,0.838,0.743,0.426,0.780,0.708

## Qwen3-4B-Agent-Claude-Gemini-heretic

multislerp, abliterated with heretic by DavidAU:

- Qwen3-4B-Agent
- TeichAI/Qwen3-4B-Instruct-2507-Claude-Haiku-4.5-Distill
- TeichAI/Qwen3-4B-Instruct-2507-Gemini-3-Pro-Preview-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-Haiku-4.5-High-Reasoning-Distill

> qx86-hi 0.561,0.760,0.862,0.714,0.422,0.780,0.683

## Qwen3-4B-Element6d

nuslerp (1.3/0.7)

- Qwen3-4B-Agent-Claude-Gemini-heretic
- TeichAI/Qwen3-4B-Thinking-2507-MiniMax-M2.1-Distill

> qx86-hi 0.536,0.730,0.865,0.704,0.424,0.771,0.665

## Qwen3-4B-Element7

nuslerp (1.3/0.7)

- Qwen3-4B-Agent-Claude-Gemini-heretic
- TeichAI/Qwen3-4B-Thinking-2507-MiMo-V2-Flash-Distill

> qx86-hi 0.538,0.722,0.864,0.707,0.424,0.768,0.670

## Qwen3-4B-Element8

nuslerp (1.5/0.5)

- Qwen3-4B-Element6d
- Qwen3-4B-Element7

> qx86-hi 0.540,0.725,0.866,0.708,0.430,0.769,0.669

Performance numbers vary slightly with accumulated model traces. It makes for very interesting lines of conversation.
...let's just say it's different.
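
For readers unfamiliar with the nuslerp steps listed under the stages of development, here is a rough sketch of the underlying idea, spherical linear interpolation, on toy vectors. It rests on assumptions: that a pair such as (1.5/0.5) gives relative weights for the first and second model in the list, and that these normalize to a single interpolation factor. The function names `weights_to_t` and `slerp` are illustrative only; this is not mergekit's implementation, which works tensor by tensor and has further options.

```python
import math


def weights_to_t(w_first, w_second):
    """Turn two relative weights into an interpolation factor toward the second input.

    Assumption: the weights are simply normalized by their sum.
    """
    return w_second / (w_first + w_second)


def slerp(a, b, t, eps=1e-8):
    """Spherically interpolate between non-zero vectors a and b (t=0 gives a, t=1 gives b)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Angle between the two directions, clamped to guard against rounding error.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Directions are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    coeff_a = math.sin((1 - t) * omega) / sin_omega
    coeff_b = math.sin(t * omega) / sin_omega
    return [coeff_a * x + coeff_b * y for x, y in zip(a, b)]


# Toy stand-ins for two models' weights; real tensors have millions of entries.
t = weights_to_t(1.5, 0.5)
merged = slerp([1.0, 0.0], [0.0, 1.0], t)
print("t =", t)
print("merged =", merged)
```

Under these assumptions a 1.5/0.5 split places the result a quarter of the way along the arc from the first input to the second, so the first model dominates while the merged vector keeps unit length in this toy case.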