Model Card: NLLB-200-distilled-600M Fine-tuned for Kikuyu-English Translation

Model Overview

Model Name: nickdee96/nllb-200-600M-kikuyu-english
Base Model: facebook/nllb-200-distilled-600M
Task: Neural Machine Translation (Kikuyu → English)
Language Pair: Kikuyu (kik_Latn) → English (eng_Latn)
Model Type: Sequence-to-Sequence Transformer
License: Same as base model (CC-BY-NC-4.0)

Model Description

This model is a fine-tuned version of Facebook's NLLB-200-distilled-600M (No Language Left Behind) specifically optimized for Kikuyu to English translation. The model leverages the multilingual capabilities of NLLB-200 and has been fine-tuned on a curated dataset of Kikuyu-English parallel text pairs to improve translation quality for this under-resourced language pair.

Key Features

Specialized for Kikuyu: Fine-tuned specifically for Kikuyu (Gĩkũyũ) language translation
High-quality training data: Trained on 17,514 carefully filtered translation pairs
Low validation loss: 0.0375 on the held-out validation split after fine-tuning
Easy to integrate: Published in HuggingFace Transformers format (Safetensors weights) and usable with the standard pipeline and generate APIs

Training Details

Dataset

Source: Kikuyu translation pairs dataset from LDRI language project
Total pairs: 17,514 valid translation pairs
Training split: 15,762 pairs (90%)
Validation split: 1,752 pairs (10%)
Data filtering: Removed short texts (<3 characters) and common non-linguistic responses
Domain: Conversational and agricultural content
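The filtering and 90/10 split described above can be sketched as follows. The field names (`kikuyu`, `english`), the `NON_LINGUISTIC` set and the random seed are hypothetical: the card does not publish the exact list of removed responses or how the split was drawn.

```python
import math
import random

# Hypothetical placeholders: the card does not list the actual
# "common non-linguistic responses" that were removed.
NON_LINGUISTIC = {"n/a", "[noise]", "[inaudible]", "[laughter]"}


def is_valid(pair, min_chars=3):
    """Keep a pair only if both sides have >= min_chars characters
    and neither side is a known non-linguistic response."""
    for text in (pair["kikuyu"], pair["english"]):
        cleaned = text.strip()
        if len(cleaned) < min_chars or cleaned.lower() in NON_LINGUISTIC:
            return False
    return True


def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out ceil(val_fraction * n) pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = math.ceil(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]


raw = [
    {"kikuyu": "Ũrendia atĩa", "english": "How you are selling"},
    {"kikuyu": "ok", "english": "ok"},            # too short, dropped
    {"kikuyu": "[noise]", "english": "[noise]"},  # non-linguistic, dropped
]
clean = [p for p in raw if is_valid(p)]
print(f"kept {len(clean)} of {len(raw)} pairs")
```

Holding out ceil(10% of 17,514) = 1,752 pairs leaves 15,762 for training, which matches the split sizes reported above.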

Training Configuration

Base model: facebook/nllb-200-distilled-600M
Training epochs: 3
Batch size: 4 per device
Gradient accumulation steps: 8 (effective batch size: 32)
Learning rate: 5e-5
Scheduler: Cosine annealing
Warmup steps: 100
Max sequence length: 512 tokens
Optimization: FP16 precision for memory efficiency
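The effective batch size and learning-rate schedule above can be made concrete with a short sketch. It assumes a single GPU (the card lists one CUDA-enabled GPU) and implements linear warmup followed by half-cosine decay to zero, the shape produced by Transformers' cosine scheduler with warmup. The card does not state the total number of optimizer steps, so the example uses an approximate value.

```python
import math

# Hyperparameters from the training configuration above
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 8
PEAK_LR = 5e-5
WARMUP_STEPS = 100


def effective_batch_size(per_device=PER_DEVICE_BATCH, accum=GRAD_ACCUM_STEPS, n_gpus=1):
    """Number of sequences contributing to each optimizer update."""
    return per_device * accum * n_gpus


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Approximate total steps (assumption): ceil(15,762 / 32) updates/epoch x 3 epochs
total_steps = math.ceil(15_762 / effective_batch_size()) * 3
for step in (0, 50, 100, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {cosine_lr(step, total_steps):.2e}")
```

With 15,762 training pairs and an effective batch of 32, one epoch is roughly 493 optimizer updates, or about 1,479 over three epochs. The exact count depends on how the trainer handles the final partial accumulation step.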

Training Infrastructure

Hardware: CUDA-enabled GPU
Framework: HuggingFace Transformers 4.x
Training time: ~1 hour 14 minutes (4,454.86 seconds)
Memory optimization: FP16, gradient accumulation, device mapping

Performance

Training Metrics

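Final training loss: 1.1482 (most likely the mean over all training steps as reported at the end of the run, which would explain why it is higher than the late-training losses below)
Final validation loss: 0.0375
Training progression: Loss decreased steadily from 3.64 to 0.036 over the course of training
Convergence: Validation loss remained stable toward the end of training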
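To put the loss values in perspective, they can be converted to perplexity with perplexity = exp(loss). This assumes each value is the mean per-token cross-entropy in nats, which is what Transformers seq2seq models compute by default. The sketch applies the conversion to the numbers reported above.

```python
import math


def perplexity(mean_token_loss):
    """Perplexity from a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_token_loss)


# Loss values as reported in this card
reported = {
    "first logged loss": 3.64,
    "final training loss (reported)": 1.1482,
    "final validation loss": 0.0375,
}
for name, loss in reported.items():
    print(f"{name}: loss {loss} -> perplexity {perplexity(loss):.2f}")
```

A validation loss of 0.0375 corresponds to a perplexity of about 1.04. In other words, the model assigns very high probability to the reference tokens in the validation split. This measures fit to the reference translations rather than translation quality directly, so it is best read alongside the translation examples.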

Translation Examples

Kikuyu	Model Translation	Reference
Ũrendia atĩa	How do you sell?	How you are selling
Nĩgetha ũkamenya kana hĩndĩ ĩrĩa ndahanda kĩmera gĩkĩ rĩ	So that you can know when I planted this summer	So that you can know when I plant in this season

Model Capabilities

Semantic adequacy: Preserves the core meaning of multi-clause sentences, though tense and word choice can drift (e.g., "when I planted this summer" for "when I plant in this season")
Context awareness: Handles conversational contexts and agricultural terminology
Code-switching handling: Can process mixed language inputs with [cs] markers
Length variation: Handles both short phrases and longer multi-clause sentences

Usage

HuggingFace Transformers

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("nickdee96/nllb-200-600M-kikuyu-english")
model = AutoModelForSeq2SeqLM.from_pretrained("nickdee96/nllb-200-600M-kikuyu-english")

# Create translation pipeline
translator = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="kik_Latn",
    tgt_lang="eng_Latn"
)

# Translate Kikuyu to English
result = translator("Ũrendia atĩa")
print(result[0]['translation_text'])
```

Direct Model Usage

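```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "nickdee96/nllb-200-600M-kikuyu-english"

# src_lang makes the tokenizer add the Kikuyu language tag to the input
# (the NLLB tokenizer defaults to eng_Latn otherwise)
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="kik_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Tokenize input
inputs = tokenizer("Ũrendia atĩa", return_tensors="pt")

# Generate translation, forcing English as the target language
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=512,
)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```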
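Because the card lists batched inference as supported, a small helper can send many sentences through the model at once. The sketch below is a generic, hypothetical helper (`translate_in_batches` is not part of Transformers). It sorts inputs by length so each batch needs less padding, calls a user-supplied batch translation function, and writes results back in the original order. The batch size of 16 is an arbitrary default, not a tuned value.

```python
from typing import Callable, List, Sequence


def translate_in_batches(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate texts in length-sorted batches, returning results in input order."""
    # Group inputs of similar length together to reduce padding per batch
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    results: List[str] = [""] * len(texts)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        outputs = translate_batch([texts[i] for i in idx])
        if len(outputs) != len(idx):
            raise ValueError("translate_batch must return one output per input")
        # Put each translation back at its original position
        for i, out in zip(idx, outputs):
            results[i] = out
    return results
```

With the pipeline from the HuggingFace Transformers example, `translate_batch` could be `lambda batch: [r["translation_text"] for r in translator(batch)]`, because the translation pipeline accepts a list of strings and returns one result dict per input.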

Limitations and Considerations

Known Limitations

Domain specificity: Primarily trained on conversational and agricultural content
Data size: Limited by the size of available Kikuyu-English parallel data
Dialectal variations: May not capture all Kikuyu dialectal variations
Formal register: May be less effective on highly formal or technical texts
One-way optimization: Primarily optimized for Kikuyu→English direction

Ethical Considerations

Language preservation: Contributes to digital language preservation for Kikuyu
Cultural sensitivity: Trained on authentic conversational data respecting cultural context
Bias mitigation: Filtered training data to remove inappropriate content
Intended use: Research and educational purposes

Recommended Use Cases

✅ Educational tools: Language learning applications
✅ Research: Computational linguistics and MT research
✅ Documentation: Helping preserve and translate Kikuyu texts
✅ Communication: Assisting with basic Kikuyu-English communication

Not Recommended For

❌ Critical translations: Legal, medical, or safety-critical contexts without human review
❌ Real-time interpretation: High-stakes real-time communication
❌ Commercial applications: Not permitted under the non-commercial (CC-BY-NC) license inherited from the base model
❌ Formal documents: Official government or business documents without review

Technical Specifications

Model Architecture

Architecture: M2M100ForConditionalGeneration (NLLB variant)
Parameters: ~600M (distilled version)
Encoder layers: 12
Decoder layers: 12
Hidden size: 1024
Attention heads: 16
Vocabulary size: 256,206 tokens (shared NLLB SentencePiece vocabulary, including language tags)
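As a sanity check on the ~600M figure, the parameter count can be estimated from the architecture listed above. The sketch uses two values that this card does not list: a feed-forward size of 4096 and a vocabulary of 256,206 tokens. Both are assumed from the base model's published configuration and should be checked against its config.json. It also assumes biased linear layers, parameter-free sinusoidal position embeddings, and a single embedding matrix shared by the encoder, the decoder, and the output layer, as in M2M100.

```python
def m2m100_param_count(d_model=1024, ffn_dim=4096, vocab=256_206,
                       enc_layers=12, dec_layers=12):
    """Approximate parameter count of an M2M100/NLLB-style encoder-decoder."""
    attn = 4 * (d_model * d_model + d_model)           # q, k, v, out projections (+bias)
    ffn = d_model * ffn_dim + ffn_dim + ffn_dim * d_model + d_model
    layer_norm = 2 * d_model                            # LayerNorm weight + bias
    enc_layer = attn + ffn + 2 * layer_norm
    dec_layer = 2 * attn + ffn + 3 * layer_norm         # self-attn + cross-attn
    embeddings = vocab * d_model                        # shared, tied with LM head
    final_norms = 2 * layer_norm                        # encoder + decoder final LayerNorm
    return embeddings + enc_layers * enc_layer + dec_layers * dec_layer + final_norms


total = m2m100_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate is about 615M parameters, which is consistent with the ~600M (0.6B) size reported for the model. Roughly 262M of these sit in the shared embedding matrix.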

Input/Output Specifications

Input language: Kikuyu (kik_Latn)
Output language: English (eng_Latn)
Max input length: 512 tokens
Tokenization: SentencePiece-based (NLLB tokenizer)
Special tokens: Language tags (kik_Latn for the source, eng_Latn for the target) identify the input language and select the output language

Performance Benchmarks

Inference speed: GPU inference recommended for interactive use
Memory requirements: ~2.5GB GPU memory for inference
Batch processing: Supports batched inference for efficiency
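The ~2.5GB figure is consistent with keeping roughly 600M parameters in 32-bit floats, plus some runtime overhead. Loading in 32-bit floats is what `from_pretrained` does by default in Transformers 4.x unless a `torch_dtype` is passed. The sketch below covers only the weight arithmetic. Activations, the decoder's key/value cache during generation, and the CUDA context all add to this, and the parameter count is the card's rounded figure.

```python
# Bytes needed to store one parameter in each numeric format
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(n_params, dtype="fp32"):
    """Memory to hold the model weights alone, in GB (1 GB = 10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


N_PARAMS = 600e6  # the card's rounded "~600M" figure
for dtype in ("fp32", "fp16"):
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, dtype):.1f} GB of weights")
```

Loading in half precision, for example with `AutoModelForSeq2SeqLM.from_pretrained(..., torch_dtype=torch.float16)`, roughly halves the weight footprint.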

Citation and Acknowledgments

Citation

If you use this model in your research, please cite:

```bibtex
@misc{kikuyu-english-nllb-2024,
  title={Fine-tuned NLLB-200 for Kikuyu-English Translation},
  author={nickdee96},
  year={2024},
  url={https://huggingface.co/nickdee96/nllb-200-600M-kikuyu-english},
  note={Fine-tuned from facebook/nllb-200-distilled-600M}
}
```

Acknowledgments

Base model: Facebook AI Research - NLLB-200 team
Training framework: HuggingFace Transformers
Dataset: LDRI Language Project
Community: Kikuyu language speakers and researchers

Related Work

NLLB-200: No Language Left Behind: Scaling Human-Centered Machine Translation
Original NLLB models: Available on HuggingFace Model Hub
Multilingual NMT: Research in low-resource language translation

Model Versions and Updates

Version History

v1.0 (Current): Initial fine-tuned release
- Training dataset: 17,514 pairs
- Base model: facebook/nllb-200-distilled-600M
- Training epochs: 3
- Validation loss: 0.0375

Future Updates

Potential expansion with additional training data
Bidirectional translation capability (English→Kikuyu)
Domain-specific fine-tuning variants
Performance optimizations and model compression

Contact and Support

For questions, issues, or collaboration opportunities:

Model creator: nickdee96
Repository: ldri-language
Issues: Please report issues through the repository's issue tracker

This model card was generated on September 26, 2025, and reflects the current state of the model at the time of publication.


Weights: Safetensors format, 0.6B parameters, F16 tensor type


Paper: No Language Left Behind: Scaling Human-Centered Machine Translation (arXiv:2207.04672, published Jul 11, 2022)