SVEN-10M
A 10M-class (~11.5M parameter) language model trained from scratch on real data for ~$1.
SVEN-10M is the first checkpoint in the SVEN model family, built entirely from scratch - custom tokenizer, custom architecture, custom training loop. No fine-tuning. No LoRA. Trained on real public datasets on a single RTX 3090 GPU.
Model Details
| Property | Value |
|---|---|
| Architecture | Decoder-only transformer (LLaMA-style) |
| Parameters | 11,537,664 (~11.5M) |
| Context length | 256 tokens |
| Vocabulary | 32,000 (BPE, trained on this corpus) |
| Layers | 6 |
| Hidden size | 256 |
| Attention heads | 8 (2 KV heads, GQA) |
| Activation | SwiGLU |
| Positional encoding | RoPE |
| Normalization | RMSNorm |
| Training steps | 1,000 |
| Training loss | 10.41 → 6.90 |
| Precision | bfloat16 |
| GPU | 1× NVIDIA RTX 3090 (24GB) |
| Training cost | ~$1 |
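The reported parameter count can be sanity-checked from the table. The following is a minimal sketch, not the author's code: the feed-forward width `ffn_dim = 512` is an assumption that does not appear in this card, chosen because it makes the arithmetic land on the reported total. The head dimension (256 / 8 = 32) is likewise derived rather than stated.

```python
# Back-of-the-envelope parameter count for the configuration in the table.
# ASSUMPTION: ffn_dim = 512 is not stated in this card; it is picked here
# because it reproduces the reported total of 11,537,664.

vocab_size = 32_000
hidden = 256
n_layers = 6
n_heads = 8
n_kv_heads = 2
ffn_dim = 512  # assumed, see note above

head_dim = hidden // n_heads  # 32


def count_parameters() -> int:
    # Tied embeddings: the input embedding doubles as the output projection.
    embedding = vocab_size * hidden

    # Attention projections without bias; K and V use only the 2 KV heads (GQA).
    q_proj = hidden * n_heads * head_dim
    k_proj = hidden * n_kv_heads * head_dim
    v_proj = hidden * n_kv_heads * head_dim
    o_proj = n_heads * head_dim * hidden
    attention = q_proj + k_proj + v_proj + o_proj

    # SwiGLU uses three matrices: gate, up, and down projections.
    mlp = 3 * hidden * ffn_dim

    # Two RMSNorm weight vectors per layer, plus one final norm.
    norms = 2 * hidden
    per_layer = attention + mlp + norms
    final_norm = hidden

    return embedding + n_layers * per_layer + final_norm


print(f"{count_parameters():,}")
```

Under these assumptions, the tied embedding matrix alone accounts for 8,192,000 parameters, roughly 71% of the total.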
Training Data
Trained on a curated English-only mix of 116M tokens from 5 public sources:
| Source | Documents | Content |
|---|---|---|
| FineWeb-Edu | 49,993 | High-quality educational web text |
| Wikipedia (EN) | 19,981 | English Wikipedia articles |
| OpenWebMath | 14,918 | Mathematical reasoning and problems |
| Python code | 18,390 | Python programming instructions |
| JavaScript code | 15,000 | JavaScript code samples |
| Total | 118,282 | 116M tokens |
All data was filtered for English (ASCII ratio plus a common-word check), quality-filtered, and deduplicated before training.
Tokenizer: custom BPE tokenizer trained on this exact corpus with SentencePiece, with a vocabulary of 32,000 tokens.
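The English filter is described only as an ASCII-ratio test plus a common-word check. One way such a filter could look is sketched below. The function names, the word list, and both thresholds are hypothetical and are not taken from the SVEN pipeline.

```python
# Illustrative English filter: ASCII ratio plus a common-word check.
# ASSUMPTIONS: COMMON_WORDS, min_ascii, and min_common are made-up values
# for demonstration; the card does not state the real ones.

COMMON_WORDS = {"the", "and", "of", "to", "in", "is", "that", "for", "it", "with"}


def ascii_ratio(text: str) -> float:
    """Fraction of characters that are plain ASCII."""
    if not text:
        return 0.0
    return sum(ch.isascii() for ch in text) / len(text)


def common_word_ratio(text: str) -> float:
    """Fraction of whitespace-separated words found in COMMON_WORDS."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in COMMON_WORDS for w in words) / len(words)


def looks_english(text: str, min_ascii: float = 0.95, min_common: float = 0.05) -> bool:
    """Keep a document only if it passes both checks."""
    return ascii_ratio(text) >= min_ascii and common_word_ratio(text) >= min_common


print(looks_english("The cat sat in the garden and it is happy with the sun."))
print(looks_english("Der schnelle braune Fuchs springt über den faulen Hund."))
```

The second check matters because the ASCII ratio alone would pass most text in other Latin-script languages, as the German sentence in the example illustrates.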
Architecture Notes
SVEN-10M uses a modern LLaMA-style architecture rather than the original GPT-2 design:
- RoPE instead of learned positional embeddings
- RMSNorm instead of LayerNorm (faster, no mean subtraction)
- SwiGLU instead of GELU (better gradient flow)
- Grouped Query Attention (2 KV heads vs 8 Q heads, for a 4× smaller KV cache)
- Weight-tied embeddings (input and output projection share weights)
- No bias in linear layers
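The 4× figure for Grouped Query Attention follows from the head counts. The sketch below works it through for the KV cache at full context. It assumes the cache is stored in bfloat16 (2 bytes per value) and that the head dimension is 256 / 8 = 32; neither is stated explicitly in this card.

```python
# KV-cache size with 8 KV heads (standard multi-head attention) versus
# 2 KV heads (GQA), using the figures from this card.
# ASSUMPTIONS: bfloat16 cache entries and head_dim = hidden // n_heads.

hidden = 256
n_heads = 8
n_kv_heads = 2
n_layers = 6
context_len = 256
bytes_per_value = 2  # bfloat16

head_dim = hidden // n_heads  # 32


def kv_cache_bytes(kv_heads: int) -> int:
    # Factor 2 covers keys and values; one entry per layer and per position.
    return 2 * kv_heads * head_dim * bytes_per_value * n_layers * context_len


mha_bytes = kv_cache_bytes(n_heads)
gqa_bytes = kv_cache_bytes(n_kv_heads)
print(f"MHA: {mha_bytes:,} bytes, GQA: {gqa_bytes:,} bytes, ratio: {mha_bytes // gqa_bytes}x")
```

The saving applies to the cached keys and values only; the query and output projections keep all 8 heads.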
Training Details
Optimizer: AdamW
Learning rate: 3e-4 (cosine decay to 3e-5)
Warmup steps: 100
Weight decay: 0.1
Gradient clip: 1.0
Batch size: 8
Sequence length: 256
Training steps: 1,000
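The learning-rate settings above can be written as a single function of the step index. This is a sketch under two assumptions that the card does not state: the warmup is linear, and the cosine decay runs from the end of warmup to step 1,000. The function name `learning_rate` is illustrative.

```python
import math

MAX_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 1_000


def learning_rate(step: int) -> float:
    """Linear warmup followed by cosine decay from MAX_LR to MIN_LR."""
    if step < WARMUP_STEPS:
        # ASSUMPTION: linear warmup; the card only gives the number of steps.
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to 1.0 past the last step.
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


for step in (0, 99, 100, 550, 1000):
    print(step, f"{learning_rate(step):.2e}")
```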
Loss curve:
step 0: 10.41 (random initialization, expected ≈ ln(32000) ≈ 10.37)
step 100: 8.04 (fast early learning)
step 500: 7.00 (loss decrease slowing)
step 1000: 6.90 (final)
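The step-0 value can be checked directly: a model that guesses uniformly over the vocabulary has a cross-entropy of ln(vocab_size) nats. The short sketch below computes that baseline and, as an added illustration, converts the final loss into perplexity, assuming the reported losses are per-token cross-entropy in nats.

```python
import math

vocab_size = 32_000

# Cross-entropy (in nats) of a uniform guess over the whole vocabulary.
uniform_loss = math.log(vocab_size)

# Perplexity is exp(loss) when the loss is measured in nats.
final_loss = 6.90
final_perplexity = math.exp(final_loss)

print(f"uniform-guess loss: {uniform_loss:.2f}")
print(f"final perplexity:   {final_perplexity:.0f}")
```

A perplexity near 992 means the model is, on average, about as uncertain as a uniform choice among roughly a thousand tokens, down from 32,000 at initialization.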
Intended Use
SVEN-10M is a proof-of-concept research model and the smoke-test checkpoint for the SVEN model family.
It is intended for:
- Verifying the full from-scratch training pipeline works end to end
- Educational reference for how to train a small LLM from scratch
- Experimentation and architecture ablations at low cost
It is not intended for:
- Production use
- Any task requiring factual accuracy
- Code generation in real projects
- Replacing larger, properly trained models
At 11M parameters trained for 1,000 steps on 116M tokens, this model has seen far too little data to be reliable for any real task. It can generate English-like text and shows it has learned basic language patterns, but should be treated as a research artifact.
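How little data the model has seen can be estimated from the training settings. The sketch below assumes that the batch size counts sequences, that every sequence is a full 256 tokens, and that no gradient accumulation was used; the card does not confirm any of these.

```python
# Rough estimate of tokens processed during training.
# ASSUMPTIONS: batch size is in sequences, sequences are full length,
# and there is no gradient accumulation.

batch_size = 8
seq_len = 256
steps = 1_000
corpus_tokens = 116_000_000  # "116M tokens" as stated in the card

tokens_seen = batch_size * seq_len * steps
fraction_of_corpus = tokens_seen / corpus_tokens

print(f"{tokens_seen:,} tokens seen, {fraction_of_corpus:.1%} of the corpus")
```

Under these assumptions the run covers about 2 million tokens, which is less than 2% of a single pass over the corpus.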
Limitations
- 11M parameters is very small - the model cannot hold much world knowledge
- Only 1,000 training steps - significantly undertrained
- 116M tokens is a small corpus for pretraining
- Context window of 256 tokens limits long-form understanding
- No instruction tuning, RLHF, or alignment of any kind
- Not evaluated on standard benchmarks (ARC, HellaSwag, PIQA)
What's Next
SVEN-175M is the full-scale model in this family:
- 175M parameters
- Same architecture, scaled up
- Trained on the same corpus for 150,000+ steps
- Target: sriksven/sven-175m
Files
| File | Description |
|---|---|
| model.pt | Full model checkpoint (weights + optimizer state) |
| tokenizer.model | SentencePiece BPE tokenizer model |
| tokenizer.vocab | Tokenizer vocabulary |
| config.yaml | Model architecture config |
Citation
@misc{sven-10m,
author = {Sri Krishna Venkatesh},
title = {SVEN-10M: A 10M Parameter LLM Trained from Scratch},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/sriksven/sven-10m}
}
About
SVEN stands for Sri Krishna Venkatesh - hidden in plain sight.
Built from scratch. No shortcuts.