gpt2small-en-it-nanochat-lr2e4-bs6-wsd-fastdecay-step10000

This repo stages the best saved checkpoint from the local NanoChat EN/IT GPT-2-small-like run 20260517_stable-config-recipe-v5-gpt2small-lr2e4-batchmaxpossible-bs6-wsd-fastdecay.

What this is

model family: GPT-2-small-like decoder-only LM
parameters: ~136M
languages: English + Italian
context length: 2500
selected checkpoint: step_10000.pt
selection reason: lowest recorded validation loss among saved checkpoints in best_validation.json
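
The selection rule above (lowest recorded validation loss among saved checkpoints) can be sketched as a small scan over JSONL evaluation rows. This is only an illustration: the field names `step` and `val_loss` are assumptions, not the confirmed schema of eval_metrics.jsonl or best_validation.json, and the sample rows are made up.

```python
import json
from typing import Iterable, Optional


def best_checkpoint(lines: Iterable[str], loss_key: str = "val_loss") -> Optional[dict]:
    """Return the JSONL record with the lowest validation loss, or None if there is none."""
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        loss = record.get(loss_key)
        if loss is None:
            continue  # skip rows without a recorded loss
        if best is None or loss < best[loss_key]:
            best = record
    return best


# Illustrative rows only; these are NOT values from the actual run,
# and the keys "step" / "val_loss" are hypothetical.
sample = [
    '{"step": 1000, "val_loss": 5.0}',
    '{"step": 2000, "val_loss": 4.5}',
    '{"step": 3000, "val_loss": 4.7}',
]
print(best_checkpoint(sample))
```

To apply it to a real file, pass an open file handle as `lines` and adjust `loss_key` to whatever key the file actually uses. Only steps for which a checkpoint was saved are eligible, so rows for unsaved steps would need to be filtered out first.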

Best validation

step: 10000
validation loss: 3.8945770748
validation perplexity: 49.1352684243
validation batches: 128
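
The two headline numbers are related by perplexity = exp(loss). A quick check, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention, and consistent with the values above):

```python
import math

# Validation loss as reported above; assumed to be mean cross-entropy in nats per token.
val_loss = 3.8945770748

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(val_loss)
print(f"{perplexity:.4f}")
```

The result agrees with the reported validation perplexity to the precision shown.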

Important caveat

This checkpoint has the lowest recorded validation loss among the saved checkpoints in this run family; it was not chosen for downstream quality. It is a useful intermediate bilingual pretraining artifact, not a polished factual assistant model.

Training/data provenance

training config: training_config.yaml
tokenizer: tokenizer.json + tokenizer_meta.json
packed dataset root used by the run: /mnt/apps/llm-nanochat/datasets/202605011052_fresh_50_50_score100_2500_sourcebalanced
tokenizer root used by the run: /mnt/apps/llm-nanochat/tokenizers/tok_202605011052_fresh_50_50_score100_32k_fromscratch

Included files

step_10000.pt
step_10000.safetensors
step_10000.safetensors.json
training_config.yaml
tokenizer.json
tokenizer_meta.json
best_validation.json
eval_summary.json
probe_step10000_summary.json
full run telemetry snapshots: eval_metrics.jsonl, metrics.jsonl, probe_generations.jsonl
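
To sanity-check the ~136M parameter figure without installing the custom stack, the header of step_10000.safetensors can be read with the standard library alone. The sketch below relies on the published safetensors layout (an 8-byte little-endian header length, then a JSON header listing each tensor's dtype, shape, and data offsets). Whether tied weights are stored once or twice in this particular file is not known here, so the count may differ somewhat from the nominal figure.

```python
import json
import struct
from math import prod
from typing import BinaryIO


def read_safetensors_header(fh: BinaryIO) -> dict:
    """Read the JSON header of a safetensors file without loading tensor data."""
    (header_len,) = struct.unpack("<Q", fh.read(8))  # 8-byte little-endian length
    return json.loads(fh.read(header_len))


def count_parameters(header: dict) -> int:
    """Sum the element counts of all tensors listed in the header."""
    return sum(
        prod(entry["shape"])  # prod([]) == 1, which is correct for scalars
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )


# Usage (assumes the file has been downloaded to the current directory):
# with open("step_10000.safetensors", "rb") as fh:
#     header = read_safetensors_header(fh)
#     print(f"{count_parameters(header) / 1e6:.1f}M parameters")
```

Reading only the header keeps the check cheap: no tensor bytes are loaded, so it works even on machines without enough memory to hold the weights.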

Probe reading at step 10000

The run includes probe telemetry, but the stored payload for this experiment is a legacy, partial format: the probe_generations.jsonl entries at step 10000 keep the prompts and expected continuations, while the generated-text and target-rank fields are null. This release therefore makes no strong probe-quality claims based on those rows.
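
The null fields described above can be confirmed with a short scan of probe_generations.jsonl. This is a sketch under assumptions: the key `step` and the field names in the sample rows (`prompt`, `expected`, `generated`, `target_rank`) are hypothetical placeholders, not the file's confirmed schema, and the sample rows are invented for illustration.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def null_field_counts(lines: Iterable[str], step: int) -> Tuple[int, Counter]:
    """Return (number of rows at `step`, per-field count of null values in those rows)."""
    n_rows = 0
    nulls: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        if row.get("step") != step:  # "step" is an assumed key name
            continue
        n_rows += 1
        for key, value in row.items():
            if value is None:
                nulls[key] += 1
    return n_rows, nulls


# Invented rows with hypothetical field names, for illustration only.
sample_rows = [
    '{"step": 10000, "prompt": "a", "expected": "b", "generated": null, "target_rank": null}',
    '{"step": 10000, "prompt": "c", "expected": "d", "generated": null, "target_rank": null}',
    '{"step": 9000, "prompt": "e", "expected": "f", "generated": "g", "target_rank": 3}',
]
n_rows, nulls = null_field_counts(sample_rows, step=10000)
print(n_rows, dict(nulls))
```

A field whose null count equals the row count is empty for every probe at that step, which is the situation this section describes for the generated-text and target-rank fields.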

Usage

This project uses a custom NanoChat inference/training stack. The easiest local UI in the source repo is the Chainlit checkpoint tester documented in the repo README.

Limitations

factual recall is still limited
generations may become repetitive
the model was selected by validation loss inside this run family, not by broad downstream benchmark performance
dataset redistribution for the full training corpus may have separate licensing constraints; this repo contains model artifacts, not the raw/prepared training corpus
