---
language:
- en
- it
license: other
library_name: custom
pipeline_tag: text-generation
tags:
- nanochat
- gpt2-small
- bilingual
- english
- italian
- pretraining
---

# gpt2small-en-it-nanochat-lr2e4-bs6-wsd-fastdecay-step10000

This repo stages the best saved checkpoint from the local NanoChat EN/IT GPT-2-small-like run `20260517_stable-config-recipe-v5-gpt2small-lr2e4-batchmaxpossible-bs6-wsd-fastdecay`.

## What this is

- model family: GPT-2-small-like decoder-only LM
- parameters: ~136M
- languages: English + Italian
- context length: 2500 tokens
- selected checkpoint: `step_10000.pt`
- selection reason: lowest validation loss among the saved checkpoints, as recorded in `best_validation.json`

## Best validation

- step: 10000
- validation loss: 3.8945770748
- validation perplexity: 49.1352684243
- validation batches: 128
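
The two numbers above can be cross-checked against each other. The sketch below assumes the validation loss is a mean per-token cross-entropy in nats (the usual convention, though the run files are not quoted here to confirm it), in which case perplexity is simply `exp(loss)`:

```python
import math

# Value copied from the "Best validation" list above.
val_loss = 3.8945770748

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity = exp(loss).
perplexity = math.exp(val_loss)

# Should agree with the reported perplexity to several decimal places.
print(f"{perplexity:.4f}")
```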

## Important caveat

This checkpoint has the lowest validation loss **within this run family** only. It is a useful intermediate bilingual pretraining artifact, not a polished factual assistant model.

## Training/data provenance

- training config: `training_config.yaml`
- tokenizer: `tokenizer.json` + `tokenizer_meta.json`
- packed dataset root used by the run: `/mnt/apps/llm-nanochat/datasets/202605011052_fresh_50_50_score100_2500_sourcebalanced`
- tokenizer root used by the run: `/mnt/apps/llm-nanochat/tokenizers/tok_202605011052_fresh_50_50_score100_32k_fromscratch`

## Included files

- `step_10000.pt`
- `step_10000.safetensors`
- `step_10000.safetensors.json`
- `training_config.yaml`
- `tokenizer.json`
- `tokenizer_meta.json`
- `best_validation.json`
- `eval_summary.json`
- `probe_step10000_summary.json`
- full run telemetry snapshots: `eval_metrics.jsonl`, `metrics.jsonl`, `probe_generations.jsonl`
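
To inspect `step_10000.safetensors` without the NanoChat stack, the file header can be read with the standard library alone. This is a minimal sketch that relies only on the public safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names `read_safetensors_header` and `count_parameters` are illustrative and not part of this repo.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def count_parameters(header):
    """Sum element counts over all tensor entries, skipping optional metadata."""
    return sum(
        prod(entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"
    )
```

Calling `count_parameters(read_safetensors_header("step_10000.safetensors"))` gives a rough check against the ~136M figure above. Tied weights or stored non-parameter buffers, if any, could shift the count.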

## Probe reading at step 10000

The run includes probe telemetry, but the stored payload for this experiment is in a legacy, partial format: the `probe_generations.jsonl` entries at step `10000` keep the prompts and expected continuations, while the generated-text and target-rank fields are null. This release therefore does **not** make strong probe-quality claims from those rows.
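
The null fields described above can be confirmed with a short scan of the JSONL file. This sketch assumes each row is a JSON object with a `step` key. That key name is a guess, since the actual schema of `probe_generations.jsonl` is not documented here, and `null_field_counts` is a hypothetical helper.

```python
import json


def null_field_counts(lines, step, step_key="step"):
    """Count, per field, how many rows at `step` carry a null value.

    `step_key` is an assumed field name; adjust it to the real schema.
    Returns (number of matching rows, {field name: null count}).
    """
    counts = {}
    rows = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        if row.get(step_key) != step:
            continue
        rows += 1
        for key, value in row.items():
            if value is None:
                counts[key] = counts.get(key, 0) + 1
    return rows, counts
```

Passing an open file handle for `probe_generations.jsonl` with `step=10000` returns the number of matching rows and the fields that are null in them. A field whose count equals the row count is empty throughout that step.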

## Usage

This project uses a custom NanoChat inference/training stack. The easiest local UI in the source repo is the Chainlit checkpoint tester documented in the repo README.

## Limitations

- factual recall is still limited
- generations may become repetitive
- the model was selected by validation loss inside this run family, not by broad downstream benchmark performance
- redistribution of the full training corpus may be subject to separate licensing constraints; this repo contains model artifacts only, not the raw or prepared training corpus