---
license: apache-2.0
tags:
- build-small-hackathon
- pgsm
- exactstate-memory
- non-transformer
- language-model
- surprisal
- fineweb-edu
- tiny-model
- tiny-titan
- well-tuned
datasets:
- HuggingFaceFW/fineweb-edu
---

# PGSM Text Surprisal Editor Model

This repository contains the trained model weights used by the Hugging Face Space:

https://huggingface.co/spaces/build-small-hackathon/pgsm-text-surprisal-editor

## Model Summary

PGSM Text Surprisal Editor is powered by a compact non-Transformer language model based on a custom ExactState Memory / PGSM architecture.

The model is used to score whole-word surprisal by evaluating how predictable each removed word is from its left and right context.
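
As a rough illustration of the scoring described above: surprisal is conventionally the negative log probability of a word given its context, measured in bits when the logarithm is base 2. The sketch below is not this model's inference code. `prob_fn` is a hypothetical interface standing in for the model, and `toy_prob` is a made-up stand-in so the example runs on its own.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not the model's real API):
# (left context words, removed word, right context words) -> probability in (0, 1].
WordProb = Callable[[Sequence[str], str, Sequence[str]], float]


def word_surprisals(words: Sequence[str], prob_fn: WordProb) -> List[float]:
    """Return the surprisal in bits of each word given the words around it."""
    scores = []
    for i, word in enumerate(words):
        # Remove word i and ask how probable it is from both sides.
        p = prob_fn(words[:i], word, words[i + 1:])
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range for {word!r}: {p}")
        scores.append(-math.log2(p))
    return scores


def toy_prob(left: Sequence[str], word: str, right: Sequence[str]) -> float:
    """Made-up stand-in for a model: common function words are more predictable."""
    common = {"the", "a", "of", "and"}
    return 0.5 if word.lower() in common else 0.125


print(word_surprisals("the cat sat".split(), toy_prob))
```

Higher values mean the word was less predictable from its surroundings. A model that works on subword tokens would typically sum the token-level surprisals within each word to get a whole-word score.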

## Architecture

- Architecture: PGSM / ExactState Memory
- Transformer blocks: 0
- Self-attention layers: 0
- Parameters: approximately 4 million
- Vocabulary: approximately 2k tokens
- Model file: `final_infer.pt`

This model does not use Transformer self-attention. Context is propagated through learned state transitions rather than pairwise attention computations.
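
To make the contrast with attention concrete, here is a minimal sketch of a generic state recurrence. It is not the PGSM / ExactState Memory update, which this card does not specify. The function name `scan_states` and the fixed `decay` value are assumptions chosen for illustration, and a real model would learn its transition instead of using a constant.

```python
from typing import List


def scan_states(inputs: List[float], decay: float = 0.5) -> List[float]:
    """Generic linear recurrence: s_t = decay * s_(t-1) + x_t.

    Illustrative only. The fixed `decay` stands in for a learned transition.
    """
    state = 0.0
    states = []
    for x in inputs:
        # One update per position: earlier context reaches later positions
        # only through the running state, never through pairwise comparison.
        state = decay * state + x
        states.append(state)
    return states


print(scan_states([1.0, 0.0, 0.0]))
```

The loop makes one pass over the sequence and keeps a fixed-size state, so its cost grows linearly with length, whereas self-attention compares every position with every other position.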

## Training

The model was fully trained by the author on approximately 19 billion tokens from FineWeb-Edu.

Training details:

- Training source: FineWeb-Edu
- Training scale: approximately 19B tokens
- Training type: full custom training by the author
- Base architecture: PGSM / ExactState Memory
- Off-the-shelf Transformer checkpoint used: none
- Final inference weights: `final_infer.pt`
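
For a sense of scale, the approximate figures on this card can be combined into a tokens-per-parameter ratio. Both inputs are the rounded values stated on this card, so the result is a ballpark figure only.

```python
# Rounded figures from this card; both are approximate.
approx_params = 4_000_000
approx_train_tokens = 19_000_000_000

tokens_per_param = approx_train_tokens / approx_params
print(f"{tokens_per_param:,.0f} training tokens per parameter")
```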

## Intended Use

This model is intended for the PGSM Text Surprisal Editor Space, where it powers whole-word surprisal heatmaps for pasted text.

The model is designed for experimentation, visualization, and language-analysis demos rather than production writing assistance or factual generation.
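
One way a surprisal heatmap could be rendered is sketched below. This is not the Space's actual rendering code, which is not described here. The function name `heatmap_html`, the red color scale, and the `max_bits` cap are all assumptions for illustration.

```python
import html
from typing import Sequence


def heatmap_html(words: Sequence[str], bits: Sequence[float],
                 max_bits: float = 16.0) -> str:
    """Wrap each word in a span whose red background opacity tracks its surprisal."""
    if len(words) != len(bits):
        raise ValueError("words and bits must have the same length")
    spans = []
    for word, b in zip(words, bits):
        # Clamp to [0, 1] so unusually high surprisal saturates the color.
        alpha = min(max(b / max_bits, 0.0), 1.0)
        spans.append(
            f'<span style="background: rgba(255, 0, 0, {alpha:.2f})">'
            f"{html.escape(word)}</span>"
        )
    return " ".join(spans)


print(heatmap_html(["the", "cat"], [1.0, 8.0]))
```

Escaping each word with `html.escape` keeps pasted text that contains `<` or `&` from breaking the generated markup.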

## Limitations

- Very small model size compared with mainstream LLMs
- Compact vocabulary
- Designed for surprisal visualization, not general-purpose chat
- Outputs should be treated as model-analysis signals, not factual judgments
- Training and evaluation details are given only in summary form, for hackathon review

## Hackathon Context

This model supports the Hugging Face Build Small Hackathon submission:

- Track: Thousand Token Wood
- Badges: Tiny Titan, Well-Tuned, Off the Grid, Field Notes

The key goal is to demonstrate a very small, fully trained, non-Transformer language model running locally inside a Hugging Face Space.