Safetensors
English
llama

michel-nano-v2

An ultra-compact 9-million parameter LLM trained from scratch. Based on the LLAMA architecture and trained on 6.5 Billion tokens of high-quality web data (Fineweb-edu and Fineweb-HQ) and synthetic textbooks and stories (Cosmopedia). michel-nano-v2 has a context length of 1024 tokens.

Data mixture

Dataset Weight
HuggingFaceFW/fineweb-edu 50%
epfml/FineWeb-HQ 30%
HuggingFaceTB/cosmopedia (stories split) 20%

Tokenizer

michel-nano-v2 uses a custom bpe tokenizer trained on 100_000 samples from the training data mixture with a vocab size of 10k + chatml special tokens.

Benchmarks

All benchmarks are zero-shot and use normalized accuracy.

Maker Model Hellaswag ARC (easy) PIQA BLiMP Average
finnianx Michel-Nano-v2 27.40% 35.90% 56.75% 72.52% 48.14%
Axiomic Labs GPT-S-5M 27.39% 33.16% 57.13% 72.21% 47.47%
EleutherAI pythia-31m 27.14% 33.88% 56.26% 67.78% 46.27%
EleutherAI pythia-14m 26.20% 32.28% 55.88% 66.75% 45.28%
SupraLabs Supra-Mini-v5-8M 26.38% 33.33% 54.03% 63.83% 44.39%
LH-Tech-AI Spark-5M-Base-v4 27.03% 33.21% 53.43% 62.17% 43.96%
SupraLabs Supra-Mini-v4-2M 25.52% 30.98% 51.90% 60.57% 42.24%

Intended Uses

michel-nano-v2 is intended to be a base for finetuning. it is a base model with zero post-training, making it very versatile.

Below are some example usecases:

  • Lightweight grammar and spell checking
  • Classification
  • Text extraction
  • Model Routing

Out-of-scope

  • Any tasks requiring more than basic logic

  • Instruction following (this is a base model)

  • Multilingual usage (without finetuning)

Downloads last month
176
Safetensors
Model size
9.94M params
Tensor type
F32
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Datasets used to train finnianx/michel-nano-v2

Spaces using finnianx/michel-nano-v2 3

Collection including finnianx/michel-nano-v2