michel-nano-v2

An ultra-compact 9-million parameter LLM trained from scratch. Based on the LLAMA architecture and trained on 6.5 Billion tokens of high-quality web data (Fineweb-edu and Fineweb-HQ) and synthetic textbooks and stories (Cosmopedia). michel-nano-v2 has a context length of 1024 tokens.

Data mixture

Dataset	Weight
`HuggingFaceFW/fineweb-edu`	50%
`epfml/FineWeb-HQ`	30%
`HuggingFaceTB/cosmopedia` (stories split)	20%

Tokenizer

michel-nano-v2 uses a custom bpe tokenizer trained on 100_000 samples from the training data mixture with a vocab size of 10k + chatml special tokens.

Benchmarks

All benchmarks are zero-shot and use normalized accuracy.

Maker	Model	Hellaswag	ARC (easy)	PIQA	BLiMP	Average
finnianx	Michel-Nano-v2	27.40%	35.90%	56.75%	72.52%	48.14%
Axiomic Labs	GPT-S-5M	27.39%	33.16%	57.13%	72.21%	47.47%
EleutherAI	pythia-31m	27.14%	33.88%	56.26%	67.78%	46.27%
EleutherAI	pythia-14m	26.20%	32.28%	55.88%	66.75%	45.28%
SupraLabs	Supra-Mini-v5-8M	26.38%	33.33%	54.03%	63.83%	44.39%
LH-Tech-AI	Spark-5M-Base-v4	27.03%	33.21%	53.43%	62.17%	43.96%
SupraLabs	Supra-Mini-v4-2M	25.52%	30.98%	51.90%	60.57%	42.24%

Intended Uses

michel-nano-v2 is intended to be a base for finetuning. it is a base model with zero post-training, making it very versatile.

Below are some example usecases:

Lightweight grammar and spell checking
Classification
Text extraction
Model Routing

Out-of-scope

Any tasks requiring more than basic logic
Instruction following (this is a base model)
Multilingual usage (without finetuning)

Downloads last month: 176

Safetensors

Model size

9.94M params

Tensor type

F32

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

finnianx
/

michel-nano-v2

michel-nano-v2

Data mixture

Tokenizer

Benchmarks

Intended Uses

Below are some example usecases:

Out-of-scope

Datasets used to train finnianx/michel-nano-v2

Spaces using finnianx/michel-nano-v2 3

Collection including finnianx/michel-nano-v2

Michel V2