It is a truncated version of the NLLB-200-600M model (6 layers instead of 12, hidden size 512 instead of 1024) with 175M parameters, 131M of which are token embeddings.
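The embedding share of the parameter count can be checked with a quick calculation. This is a minimal sketch: the vocabulary size of 256,206 is an assumption taken from the original NLLB-200 tokenizer and is not stated in this card, and `embedding_params` is a hypothetical helper name.

```python
# Assumption: vocabulary size of the original NLLB-200 tokenizer
# (not stated in this model card).
VOCAB_SIZE = 256_206
HIDDEN_SIZE = 512  # hidden size stated in the card


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a single token embedding matrix."""
    return vocab_size * hidden_size


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
print(f"{emb:,} embedding parameters (~{emb / 1e6:.0f}M)")
# prints: 131,177,472 embedding parameters (~131M)
```

Under that assumption, the embeddings account for about 131M of the 175M parameters, leaving roughly 44M for the rest of the network.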

This model was fine-tuned on the slone/nllb-200-10M-sample subset of the NLLB dataset with 175 languages, using only the samples with BLASER score above 3.5.
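The score-based filtering described above could look roughly like the following. This is only an illustrative sketch: the field name `blaser_score`, the record layout, and the function `filter_by_blaser` are hypothetical and may not match the actual dataset schema or the preprocessing that was used.

```python
from typing import Any, Dict, Iterable, Iterator

BLASER_THRESHOLD = 3.5  # threshold stated in the card


def filter_by_blaser(
    samples: Iterable[Dict[str, Any]],
    score_key: str = "blaser_score",  # hypothetical field name
    threshold: float = BLASER_THRESHOLD,
) -> Iterator[Dict[str, Any]]:
    """Yield only the samples whose score is strictly above the threshold."""
    for sample in samples:
        score = sample.get(score_key)
        # Samples without a score are dropped rather than guessed at.
        if score is not None and score > threshold:
            yield sample


# Toy records for illustration only; these are not real dataset rows.
toy_samples = [
    {"id": 1, "blaser_score": 4.2},
    {"id": 2, "blaser_score": 3.5},  # not strictly above the threshold
    {"id": 3, "blaser_score": 2.9},
    {"id": 4},                       # missing score
]
kept = list(filter_by_blaser(toy_samples))
kept_ids = [s["id"] for s in kept]
```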

Because of its small size, it translates poorly on its own, but it can serve as a base model for further fine-tuning on a small number of languages. Before fine-tuning, it is recommended to prune the model's vocabulary so that only the tokens used by the intended languages are kept.
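The idea behind vocabulary pruning can be sketched on a toy embedding table. This is a simplified illustration in plain Python, not the procedure used for this model: `prune_vocabulary` is a hypothetical helper, the embedding matrix is a list of lists, and the reserved ids in `always_keep` are assumed for the example.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def prune_vocabulary(
    embeddings: Sequence[Sequence[float]],
    corpus_token_ids: Iterable[Sequence[int]],
    always_keep: Iterable[int] = (),
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only embedding rows for tokens seen in the corpus.

    Returns the pruned matrix and a mapping from old token ids to new ones.
    """
    used = set(always_keep)  # e.g. special tokens that must survive
    for sequence in corpus_token_ids:
        used.update(sequence)

    # Sort so that the relative order of surviving tokens is preserved.
    kept_ids = sorted(i for i in used if 0 <= i < len(embeddings))
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    pruned = [list(embeddings[old]) for old in kept_ids]
    return pruned, old_to_new


# Toy example: 8 tokens with 2-dimensional embeddings.
embeddings = [[float(i), float(i) * 10] for i in range(8)]
corpus = [[4, 5, 4], [7, 5]]  # token ids observed in the target languages
pruned, old_to_new = prune_vocabulary(embeddings, corpus, always_keep=(0, 1, 2, 3))

print(len(embeddings), "->", len(pruned))  # prints: 8 -> 7
```

In a real model, the tokenizer and any output projection that shares the embedding weights would need the same id remapping, so that token ids produced after pruning still point at the right rows.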
