NLLB-600M LoRA Adapter for Tajik → Persian Translation

This repository contains a LoRA adapter for facebook/nllb-200-distilled-600M, fine‑tuned on the TajikNLPWorld/TajPersParallelLexicalCorpus dataset to translate from Tajik (Cyrillic script) to Persian (Arabic script).

Model Description

Base model: facebook/nllb-200-distilled-600M
Fine‑tuning method: LoRA (rank=8, alpha=32, dropout=0.1)
Training data: 33,652 sentence pairs
Evaluation metrics (test set):
- chrF: 53.0
- METEOR: 0.167
- BERTScore (F1): 0.915
- BLEU: 10.5

The fine‑tuned model substantially outperforms the zero‑shot baseline on the same test set (chrF 0.0 → 53.0, BERTScore F1 0.69 → 0.915).

Usage

With PEFT (recommended)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

base_model_name = "facebook/nllb-200-distilled-600M"
adapter_path = "TajikNLPWorld/nllb-600m-tajik-persian-lora"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()   # optional: fold the adapter into the base weights

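# Translate
tokenizer.src_lang = "tgk_Cyrl"   # NLLB language code for Tajik (Cyrillic)
text = "ришк"
# Move the inputs to the same device as the model.
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    # Assumption: the target is Western Persian, NLLB code "pes_Arab".
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("pes_Arab"),
    max_length=64,
)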
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With merged model (if available)

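If you prefer a single standalone model, load the base model and the adapter as shown above, call merge_and_unload(), and save the result. The saved model can then be loaded with transformers alone, without PEFT.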

Files

adapter_config.json – LoRA configuration.
adapter_model.bin – LoRA weights.
results/ – Folder containing evaluation metrics, plots, and predictions.

Citation

If you use this model, please cite our work (to be added).

License

This adapter is released under the Apache 2.0 license. Note that the base model, facebook/nllb-200-distilled-600M, is distributed under CC-BY-NC 4.0 rather than Apache 2.0.

Developed by: Arabov Mullosharaf / TajikNLP
Contact: cool.araby@gmail.com
