nilmeruo/SurpriseLensModel

	---
	license: apache-2.0
	tags:
	- build-small-hackathon
	- pgsm
	- exactstate-memory
	- non-transformer
	- language-model
	- surprisal
	- fineweb-edu
	- tiny-model
	- tiny-titan
	- well-tuned
	datasets:
	- HuggingFaceFW/fineweb-edu
	---

	# PGSM Text Surprisal Editor Model

	This repository contains the trained model weights used by the Hugging Face Space:

	https://huggingface.co/spaces/build-small-hackathon/pgsm-text-surprisal-editor

	## Model Summary

	The PGSM Text Surprisal Editor is powered by a compact non-Transformer language model built on a custom PGSM / ExactState Memory architecture.

	The model scores whole-word surprisal: it evaluates how predictable each removed word is from its left and right context.
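
As a rough illustration of whole-word surprisal, the sketch below sums per-token surprisal over the tokens that make up one word. It assumes the model yields a conditional probability for each token of the removed word given its context. The card does not describe how words are split into tokens or how left and right context are combined, so the function name `word_surprisal_bits` and the probabilities are hypothetical.

```python
import math


def word_surprisal_bits(token_probs):
    """Sum per-token surprisal (in bits) over the tokens of one word.

    By the chain rule, the probability of a multi-token word is the product
    of its conditional token probabilities, so the surprisals add up.
    """
    if not token_probs:
        raise ValueError("a word must have at least one token")
    total = 0.0
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        total += -math.log2(p)
    return total


# Hypothetical per-token probabilities, not real model output.
example = {"the": [0.5], "cat": [0.25], "quantum": [0.125, 0.5]}
scores = {word: word_surprisal_bits(probs) for word, probs in example.items()}
```

A predictable word gets a low score and an unexpected one gets a high score, which is the signal a surprisal heatmap displays.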

	## Architecture

	- Architecture: PGSM / ExactState Memory
	- Transformer blocks: 0
	- Self-attention layers: 0
	- Parameters: approximately 4 million
	- Vocabulary: approximately 2k tokens
	- Model file: `final_infer.pt`

	This model does not use Transformer self-attention. Context is propagated through learned state transitions rather than pairwise attention computations.
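
To make the contrast with attention concrete, here is a generic per-channel linear recurrence in plain Python. It is not the PGSM / ExactState Memory update, which this card does not specify. The `decay` and `gain` values stand in for learned parameters and are purely illustrative.

```python
def step(state, x, decay, gain):
    """One state transition: each channel decays and absorbs the new input."""
    return [d * s + g * xi for s, xi, d, g in zip(state, x, decay, gain)]


def run(inputs, decay, gain):
    """Carry a fixed-size state across the sequence, one step per token."""
    state = [0.0] * len(decay)
    states = []
    for x in inputs:
        state = step(state, x, decay, gain)
        states.append(state)
    return states


# Illustrative values only: an impulse followed by two empty inputs.
decay = [0.5, 0.9]
gain = [1.0, 1.0]
states = run([[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]], decay, gain)
```

Each token costs a constant amount of work against a fixed-size state, so a sequence of n tokens takes n steps instead of the n × n pairwise comparisons of self-attention.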

	## Training

	The model was fully trained by the author on approximately 19 billion tokens from FineWeb-Edu.

	Training details:

	- Training source: FineWeb-Edu
	- Training scale: approximately 19B tokens
	- Training type: full custom training by the author
	- Base architecture: PGSM / ExactState Memory
	- Off-the-shelf Transformer checkpoint used: none
	- Final inference weights: `final_infer.pt`
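
For a sense of scale, the approximate figures above can be turned into a tokens-per-parameter ratio and a rough weight footprint. Both inputs are the card's rounded numbers. The storage precision of `final_infer.pt` is not stated, so the byte sizes below are assumptions.

```python
PARAMS = 4_000_000              # "approximately 4 million" parameters
TRAIN_TOKENS = 19_000_000_000   # "approximately 19B tokens"

# Training tokens seen per parameter.
tokens_per_param = TRAIN_TOKENS / PARAMS

# Weight footprint in megabytes under two assumed storage precisions.
fp32_mb = PARAMS * 4 / 1e6   # 4 bytes per parameter
fp16_mb = PARAMS * 2 / 1e6   # 2 bytes per parameter

print(f"{tokens_per_param:.0f} tokens per parameter")
print(f"{fp32_mb:.0f} MB in fp32, {fp16_mb:.0f} MB in fp16")
```

Under these assumptions the model sees roughly 4,750 training tokens per parameter, and its weights occupy about 16 MB in 32-bit floats.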

	## Intended Use

	This model is intended for the PGSM Text Surprisal Editor Space, where it powers whole-word surprisal heatmaps for pasted text.
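
One simple way to turn word-level surprisal into heatmap intensities is min-max scaling into a few discrete levels, sketched below. The Space's actual color mapping is not described in this card, so the function `heat_levels` and its bucket count are hypothetical.

```python
def heat_levels(surprisals, buckets=5):
    """Map each surprisal to an integer level in 0..buckets-1 via min-max scaling."""
    if buckets < 1:
        raise ValueError("buckets must be at least 1")
    if not surprisals:
        return []
    lo, hi = min(surprisals), max(surprisals)
    if hi == lo:
        # All words are equally surprising, so use the lowest level throughout.
        return [0] * len(surprisals)
    levels = []
    for s in surprisals:
        frac = (s - lo) / (hi - lo)
        # Clamp so the maximum value lands in the top bucket, not one past it.
        levels.append(min(buckets - 1, int(frac * buckets)))
    return levels


# Hypothetical surprisal values in bits, one per word.
levels = heat_levels([1.0, 2.0, 4.0, 9.0])
```

Each level can then be mapped to a background color, with higher levels marking the words the model found least predictable.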

	The model is designed for experimentation, visualization, and language-analysis demos rather than production writing assistance or factual generation.

	## Limitations

	- Very small model size compared with mainstream LLMs
	- Compact vocabulary
	- Designed for surprisal visualization, not general-purpose chat
	- Outputs should be treated as model-analysis signals, not factual judgments
	- Training and evaluation details are only summarized here, at the level needed for hackathon review

	## Hackathon Context

	This model supports the Hugging Face Build Small Hackathon submission:

	- Track: Thousand Token Wood
	- Badges: Tiny Titan, Well-Tuned, Off the Grid, Field Notes

	The key goal is to demonstrate a very small, fully trained, non-Transformer language model running locally inside a Hugging Face Space.