Nemo-12B-Crownelius-ST

Another experiment with surgical style-tuning, heavily inspired by Gryphe's methodology (Gemma-4-31B-StyleTune).

I stumbled upon the concept of Style-Tuning by chance, and since the initial results were incredibly promising, I wanted to test the boundaries of this method on the Mistral-NeMo-12B architecture using a highly specialized, modern prose dataset.

Methodology & Concept

Instead of a traditional full fine-tune that alters the core reasoning layers of the network, this approach surgically calibrates a single component:

  1. Total Freeze: All attention mechanisms and MLP layers (Layers 0–39) remain completely frozen. The underlying logic, instruction-following capabilities, and general world knowledge of Mistral-NeMo are preserved.
  2. Targeted Vocabulary Shift: Training is strictly isolated to the lm_head (output projection).
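
The two steps above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the script used for this model: `ToyCausalLM` and `freeze_all_but_lm_head` are hypothetical names, and the toy network only stands in for the real 12B checkpoint. It also assumes the output projection is exposed as a top-level `lm_head` module and is not weight-tied to the input embeddings.

```python
import torch
from torch import nn


class ToyCausalLM(nn.Module):
    """Tiny stand-in for a decoder-only LM: a body plus an lm_head."""

    def __init__(self, vocab_size: int = 32, hidden_size: int = 8) -> None:
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(2)]
        )
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed_tokens(input_ids)
        for layer in self.layers:
            hidden = torch.tanh(layer(hidden))
        return self.lm_head(hidden)


def freeze_all_but_lm_head(model: nn.Module, head_prefix: str = "lm_head") -> list[str]:
    """Disable gradients everywhere except parameters under `head_prefix`."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable


model = ToyCausalLM()
trainable_names = freeze_all_but_lm_head(model)

# One backward pass: only the lm_head should end up with a gradient.
input_ids = torch.randint(0, 32, (1, 5))
logits = model(input_ids)
loss = nn.functional.cross_entropy(logits.view(-1, 32), input_ids.view(-1))
loss.backward()

frozen_with_grad = [
    name
    for name, param in model.named_parameters()
    if not name.startswith("lm_head") and param.grad is not None
]
print(trainable_names, frozen_with_grad)
```

On a real checkpoint the same loop would run over the loaded model's `named_parameters()`; whether the prefix match is sufficient depends on how that architecture names its output projection.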

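Retraining only the lm_head doesn't make the model inherently smarter or dumber. The frozen layers still produce the same hidden states; only the mapping from those hidden states to token probabilities changes, which shifts the model's linguistic preferences. The goal is to swap the generic, predictable "AI-slop" vocabulary for richer, more varied and sophisticated prose.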
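
To make that concrete, here is a toy illustration in plain Python. Everything in it is made up for the example (the four-word vocabulary, the two-dimensional hidden state, and both weight matrices); it only shows that the same hidden state can yield a different preferred token once the output projection changes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(weight_rows, hidden):
    """lm_head as a plain matrix-vector product: one logit per vocabulary row."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight_rows]


vocab = ["delve", "tapestry", "dig", "weave"]  # hypothetical toy vocabulary
hidden = [1.0, 0.5]  # output of the frozen body for some context; never changes

# Hypothetical lm_head weights before and after style-tuning.
head_before = [[2.0, 0.5], [1.0, 1.0], [0.0, 0.5], [0.0, 0.0]]
head_after = [[0.0, 0.0], [0.0, 0.5], [2.0, 0.5], [1.0, 1.0]]

probs_before = softmax(project(head_before, hidden))
probs_after = softmax(project(head_after, hidden))

top_before = vocab[probs_before.index(max(probs_before))]
top_after = vocab[probs_after.index(max(probs_after))]
print(top_before, "->", top_after)
```

The hidden vector, which stands for everything the frozen layers computed, is identical in both cases; only the ranking of candidate tokens differs.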


The Dataset: Crownelius/Opus-4.5-3000x

For this run, I used Crownelius/Opus-4.5-3000x.

Training Details & Parameters

Training was kept deliberately conservative so that the lm_head would absorb the stylistic texture of the dataset without overfitting to its content:

  • Epochs: 1
  • Learning Rate: 2e-4 (Linear Scheduler)
  • Target Modules: lm_head only
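
For reference, a linear schedule with the learning rate listed above can be written out in a few lines. This is only a sketch of the general shape: the card does not state the number of optimizer steps or whether warmup was used, so `total_steps=100` and `warmup_steps=0` below are assumptions, and `linear_lr` is a hypothetical helper rather than a library function.

```python
def linear_lr(step, total_steps, peak_lr=2e-4, warmup_steps=0):
    """Learning rate at `step` for linear warmup (optional) then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 100  # assumed; depends on dataset size and batch size
schedule = [linear_lr(step, total_steps) for step in range(total_steps + 1)]
print(schedule[0], schedule[total_steps // 2], schedule[-1])
```

With no warmup, the rate starts at the peak value, reaches half of it at the midpoint of the single epoch, and hits zero at the final step.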

Key Observations & Results

  • A noticeable style change compared to the vanilla model. The Opus style comes through clearly even after a single epoch at a 2e-4 learning rate.

Recommended Sampler Settings

  • Temperature: 0.7–0.9
  • Min_P: 0.05
  • Top_P: 0.95
  • Repetition Penalty: 1.05
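
As a rough guide to what these four settings do, the sketch below applies them to a made-up set of next-token logits in plain Python. The helper names are hypothetical, the order of operations is an assumption (inference backends differ in how they chain samplers), and the repetition penalty follows the common convention of dividing positive logits and multiplying negative ones for tokens already seen.

```python
import math


def apply_repetition_penalty(logits, seen_ids, penalty=1.05):
    """Make already-seen tokens slightly less likely."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.05):
    """Drop tokens whose probability is below min_p times the top probability."""
    cutoff = min_p * max(probs)
    return [p if p >= cutoff else 0.0 for p in probs]


def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [0.0] * len(probs), 0.0
    for i in order:
        if mass >= top_p:
            break
        kept[i] = probs[i]
        mass += probs[i]
    return kept


def renormalize(probs):
    total = sum(probs)
    return [p / total for p in probs]


def filtered_distribution(logits, seen_ids, temperature=0.8, min_p=0.05,
                          top_p=0.95, repetition_penalty=1.05):
    """Chain the four settings and return the distribution to sample from."""
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = softmax(logits, temperature)
    probs = renormalize(min_p_filter(probs, min_p))
    return renormalize(top_p_filter(probs, top_p))


toy_logits = [4.0, 3.5, 1.0, -2.0]  # hypothetical next-token logits
dist = filtered_distribution(toy_logits, seen_ids=[0])
print([round(p, 3) for p in dist])
```

In this toy case the two unlikely tokens are cut by the Min_P threshold before Top_P has anything left to trim, which is typical when both are enabled with values like these.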
