gpt2small-en-it-nanochat-lr2e4-bs6-wsd-fastdecay-step10000

This repo stages the best saved checkpoint from the local NanoChat EN/IT GPT-2-small-like run 20260517_stable-config-recipe-v5-gpt2small-lr2e4-batchmaxpossible-bs6-wsd-fastdecay.

What this is

  • model family: GPT-2-small-like decoder-only LM
  • parameters: ~136M
  • languages: English + Italian
  • context length: 2500
  • selected checkpoint: step_10000.pt
  • selection reason: lowest validation loss among the saved checkpoints, as recorded in best_validation.json
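
The selection rule above (lowest validation loss among saved checkpoints) can be sketched as follows. This is a minimal illustration, not the run's actual selection code: the field names `step` and `val_loss`, and the idea that eval records are read line by line from a JSONL file such as eval_metrics.jsonl, are assumptions about the schema.

```python
import json


def pick_best_checkpoint(eval_lines, saved_steps):
    """Return (step, loss) with the lowest validation loss among saved steps.

    eval_lines:  iterable of JSON strings, one eval record per line.
    saved_steps: set of steps for which a checkpoint file exists.
    Field names "step" and "val_loss" are hypothetical.
    """
    best = None
    for line in eval_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        step, loss = record["step"], record["val_loss"]
        if step not in saved_steps:
            continue  # evaluated, but no checkpoint was kept for this step
        if best is None or loss < best[1]:
            best = (step, loss)
    return best
```

Restricting the search to saved steps matters because an evaluation can record a lower loss at a step whose weights were never written to disk.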

Best validation

  • step: 10000
  • validation loss: 3.8945770748
  • validation perplexity: 49.1352684243
  • validation batches: 128
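
The two headline numbers above are related by perplexity = exp(loss), assuming the validation loss is a mean per-token cross-entropy in nats. A short check of that relationship on the reported values:

```python
import math


def perplexity_from_loss(mean_nll_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


val_loss = 3.8945770748       # validation loss reported above
reported_ppl = 49.1352684243  # validation perplexity reported above

# The reported perplexity should agree with exp(loss) up to rounding.
assert math.isclose(perplexity_from_loss(val_loss), reported_ppl, rel_tol=1e-4)
```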

Important caveat

This checkpoint has the lowest validation loss among the saved checkpoints of this run family. It is a useful intermediate bilingual pretraining artifact, not a polished factual assistant model.

Training/data provenance

  • training config: training_config.yaml
  • tokenizer: tokenizer.json + tokenizer_meta.json
  • packed dataset root used by the run: /mnt/apps/llm-nanochat/datasets/202605011052_fresh_50_50_score100_2500_sourcebalanced
  • tokenizer root used by the run: /mnt/apps/llm-nanochat/tokenizers/tok_202605011052_fresh_50_50_score100_32k_fromscratch

Included files

  • step_10000.pt
  • step_10000.safetensors
  • step_10000.safetensors.json
  • training_config.yaml
  • tokenizer.json
  • tokenizer_meta.json
  • best_validation.json
  • eval_summary.json
  • probe_step10000_summary.json
  • full run telemetry snapshots: eval_metrics.jsonl, metrics.jsonl, probe_generations.jsonl
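
To inspect the safetensors file without the NanoChat stack, the header can be read with the standard library alone. The sketch below relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names are illustrative, and the count covers tensors as stored, so it may differ from the ~136M figure if the export stores tied weights once or twice.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor table of a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

For example, `count_parameters(read_safetensors_header("step_10000.safetensors"))` gives the number of stored elements, and iterating over the header shows each tensor's name, dtype, and shape.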

Probe reading at step 10000

The run includes probe telemetry, but the stored payload for this experiment is legacy and partial: the probe_generations.jsonl entries at step 10000 keep the prompts and expected continuations, while the generated-text and target-rank fields are null. This release therefore does not make strong probe-quality claims from those rows.
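
A reader who wants to confirm how much of the probe payload is populated could count non-null fields per step. The sketch below is only an illustration: the key names `step`, `generated_text`, and `target_rank` are hypothetical stand-ins, since the actual schema of probe_generations.jsonl is not documented here.

```python
import json


def probe_field_coverage(lines, step, fields=("generated_text", "target_rank")):
    """Count probe rows at a given step and how many have each field populated.

    lines:  iterable of JSON strings, one probe record per line.
    fields: keys to check for non-null values (hypothetical names).
    Returns (row_count, {field: populated_count}).
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    rows = [row for row in rows if row.get("step") == step]
    # A missing key and an explicit null are both treated as unpopulated.
    populated = {
        field: sum(1 for row in rows if row.get(field) is not None)
        for field in fields
    }
    return len(rows), populated
```

With a legacy payload like the one described above, the populated counts for both fields would be zero while the row count stays positive.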

Usage

This project uses a custom NanoChat inference/training stack. The easiest local UI in the source repo is the Chainlit checkpoint tester documented in the repo README.

Limitations

  • factual recall is still limited
  • generations may become repetitive
  • the model was selected by validation loss inside this run family, not by broad downstream benchmark performance
  • dataset redistribution for the full training corpus may have separate licensing constraints; this repo contains model artifacts, not the raw/prepared training corpus