---
datasets:
- phonemetransformers/IPA-BabyLM
language:
- en
base_model:
- openai-community/gpt2
---

GPT-2 trained on the BabyLM 2024 training set using a BPE tokenizer with word boundaries removed.

This model was trained for the paper [From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes](https://arxiv.org/abs/2410.22906).
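
To make "word boundaries removed" concrete, here is a minimal sketch of what such preprocessing might look like. It is not the authors' pipeline: the function name `remove_word_boundaries`, the `WORD_BOUNDARY` marker, and the space-separated input format are all assumptions made for illustration, and the IPA symbols in the example are illustrative only.

```python
def remove_word_boundaries(stream: str, boundary: str = "WORD_BOUNDARY") -> str:
    """Drop boundary markers from a space-separated symbol stream.

    Assumption: symbols are separated by single spaces and word edges are
    marked with a dedicated token (hypothetically named WORD_BOUNDARY).
    """
    return " ".join(tok for tok in stream.split() if tok != boundary)


# Hypothetical input: "the cat" as symbols with explicit word-edge markers.
example = "ð ə WORD_BOUNDARY k æ t WORD_BOUNDARY"
print(remove_word_boundaries(example))  # ð ə k æ t
```

Once the markers are gone, a BPE tokenizer trained on the resulting stream has no signal about where words begin or end, so its merges may span what used to be word edges.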