---
datasets:
- phonemetransformers/IPA-BabyLM
language:
- en
base_model:
- openai-community/gpt2
---

A GPT-2 model trained on the BabyLM 2024 training set using a BPE tokenizer with word boundaries removed.

This model was trained for the paper [From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes](https://arxiv.org/abs/2410.22906).
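
To illustrate what "word boundaries removed" means in practice, here is a minimal sketch of stripping boundary markers from an utterance before tokenization. It assumes the input is a space-separated sequence of symbols and that boundaries are marked with a dedicated token; the marker name `WORD_BOUNDARY`, the helper `remove_word_boundaries`, and the example utterance are illustrative assumptions, not the exact preprocessing used for this model.

```python
def remove_word_boundaries(utterance: str, boundary: str = "WORD_BOUNDARY") -> str:
    """Drop boundary markers from a space-separated symbol sequence.

    The default marker name is an assumption made for illustration.
    """
    symbols = utterance.split()
    # Keep every symbol except the boundary marker, preserving order.
    return " ".join(symbol for symbol in symbols if symbol != boundary)


# Hypothetical utterance with a marker after each word.
example = "ð ə WORD_BOUNDARY d ɔ ɡ WORD_BOUNDARY"
print(remove_word_boundaries(example))
```

With the markers removed, the tokenizer sees one continuous stream of symbols, so any learned merges are free to span what would otherwise be separate words.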