Base + Language-Specific LangMAP — xglm-2_9b × fin_Latn

Unsupervised tokenization specialised for fin_Latn (Finnish, Latin script), derived from the xglm-2_9b base BPE tokenizer using the LangMAP framework.

This repository bundles:

Inference combines the base scores with the language-specific scores (the LangMAP variant). Do not use the language-specific overlay alone, and do not use the base alone.
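This card does not spell out how the two score sets are combined, so the following is only a toy sketch of the general idea, not the LangMAP implementation. It assumes that each token has an additive log-score in a base table, that the language-specific overlay adds a per-token adjustment (defaulting to 0.0 when a token has no overlay entry), and that the best segmentation is found with a plain Viterbi search. The function name `segment`, the two dictionaries, and every score value are hypothetical.

```python
import math

def segment(text, base_scores, lang_scores):
    """Viterbi segmentation maximising the summed base + overlay scores.

    Assumption for illustration only: scores are additive, and a token
    missing from the overlay gets an adjustment of 0.0.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece not in base_scores or best[start] == -math.inf:
                continue
            score = best[start] + base_scores[piece] + lang_scores.get(piece, 0.0)
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]

# Hypothetical toy vocabulary: single characters plus four longer pieces.
base = {ch: -6.0 for ch in set("talossa")}
base.update({"tal": -3.0, "ossa": -3.0, "talo": -4.0, "ssa": -4.0})
# Hypothetical overlay that favours the Finnish stem + case-ending split.
overlay = {"talo": 1.5, "ssa": 1.5, "tal": -1.0, "ossa": -1.0}

print(segment("talossa", base, {}))       # base scores only
print(segment("talossa", base, overlay))  # base + overlay scores
```

With these made-up numbers the base table alone prefers `tal` + `ossa`, while the summed scores prefer `talo` + `ssa`. That is the kind of shift a language-specific overlay is meant to produce, and it is why the two score sets are intended to be used together.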

Trained from job smoke.fin.xglm-2_9b.v256008 (vocab=256008, langs=[fin_Latn], iters=5, em_mode=soft, byte_fallback=True, seed-fix applied).

Loading

from tokenizers import Tokenizer
tok = Tokenizer.from_file("tokenizer.json")
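Once the tokenizer is loaded, a quick round-trip check can confirm that it behaves as expected. The sketch below uses only the `encode` and `decode` methods of the `tokenizers` library. The helper names `load_tokenizer` and `inspect_encoding` and the sample sentence are illustrative, and the code assumes that `tokenizer.json` from this repository is in the working directory.

```python
def load_tokenizer(path="tokenizer.json"):
    """Load the tokenizer file shipped with this repository."""
    # Imported here so the helper below does not require the library itself.
    from tokenizers import Tokenizer
    return Tokenizer.from_file(path)

def inspect_encoding(tok, text):
    """Return (tokens, ids, decoded_text) for a single input string."""
    enc = tok.encode(text)
    return enc.tokens, enc.ids, tok.decode(enc.ids)

# Usage (requires the tokenizers package and tokenizer.json):
# tok = load_tokenizer()
# tokens, ids, decoded = inspect_encoding(tok, "Hyvää huomenta, maailma!")
# print(tokens)
# print(decoded)
```

Comparing `decoded` with the input is a cheap sanity check. Whether the round trip is exact depends on the normalizer and decoder configured in `tokenizer.json`, which this card does not describe.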
