Nemo-12B-Crownelius-ST

Another experiment in surgical style-tuning, heavily inspired by Gryphe's methodology (Gemma-4-31B-StyleTune).

I stumbled upon the concept of Style-Tuning by chance, and since the initial results were incredibly promising, I wanted to test the boundaries of this method on the Mistral-NeMo-12B architecture using a highly specialized, modern prose dataset.

Methodology & Concept

Instead of a traditional full fine-tune that alters the core reasoning layers of the network, this approach surgically calibrates a single component:

Total Freeze: All attention and MLP layers (layers 0–39) remain completely frozen, so the weights that carry Mistral-NeMo's underlying logic, instruction-following capabilities, and general world knowledge are left untouched.
Targeted Vocabulary Shift: Training is strictly isolated to the lm_head (output projection).
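
A minimal sketch of the freeze described above, not the actual training script. It relies only on the `named_parameters()` / `requires_grad` interface that PyTorch modules expose, and it assumes the output projection's parameter names start with `lm_head.` (as in Hugging Face causal LM checkpoints). `FakeParam`, `FakeCausalLM`, and `freeze_all_but_lm_head` are hypothetical names used for illustration so the snippet runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter; only the requires_grad flag matters here."""
    requires_grad: bool = True


class FakeCausalLM:
    """Hypothetical toy model exposing named_parameters() like torch.nn.Module."""

    def __init__(self, num_layers: int = 2) -> None:
        names = ["model.embed_tokens.weight"]
        for i in range(num_layers):
            names.append(f"model.layers.{i}.self_attn.q_proj.weight")
            names.append(f"model.layers.{i}.mlp.gate_proj.weight")
        names += ["model.norm.weight", "lm_head.weight"]
        self._params = {name: FakeParam() for name in names}

    def named_parameters(self) -> Iterator[Tuple[str, FakeParam]]:
        return iter(self._params.items())


def freeze_all_but_lm_head(model, head_prefix: str = "lm_head.") -> List[str]:
    """Disable gradients everywhere except the output projection.

    Returns the names of the parameters that stay trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        # Assumption: the output projection is the only parameter under head_prefix.
        param.requires_grad = name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable


toy = FakeCausalLM(num_layers=2)
print(freeze_all_but_lm_head(toy))  # ['lm_head.weight']
```

The same function should work unchanged on a real PyTorch model, since it touches nothing beyond `named_parameters()` and `requires_grad`. One caveat worth checking before training: if a checkpoint ties the input embeddings to the output head, updating `lm_head` would also change the embeddings, which would break the "everything else frozen" premise.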

Retraining only the lm_head doesn't make the model inherently smarter or dumber; it changes which tokens the model prefers given the same internal representations. The goal is to swap the generic, predictable "AI-slop" vocabulary for rich, varied, and sophisticated prose.
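
One way to see why this should hold, written in my own notation rather than anything taken from the training code: let $f_\theta$ be the frozen transformer body (including the final norm), $h_t$ the final hidden state at position $t$, and $W$ the lm_head weight matrix.

```latex
h_t = f_{\theta}(x_{<t}), \qquad
p(x_t = v \mid x_{<t}) = \operatorname{softmax}\!\left(W h_t\right)_v
```

With $\theta$ frozen, $h_t$ is identical before and after training for any given prefix $x_{<t}$; only $W$, and therefore the ranking of candidate tokens $v$, changes. During free generation the chosen tokens differ, so later prefixes (and their hidden states) will diverge from the base model's, but every hidden state is still computed by the original, unmodified weights.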