Model Card: NLLB-200-distilled-600M Fine-tuned for Kikuyu-English Translation

Model Overview

Model Name: nickdee96/nllb-200-600M-kikuyu-english
Base Model: facebook/nllb-200-distilled-600M
Task: Neural Machine Translation (Kikuyu → English)
Language Pair: Kikuyu (kik_Latn) → English (eng_Latn)
Model Type: Sequence-to-Sequence Transformer
License: Same as base model (CC-BY-NC 4.0)

Model Description

This model is a fine-tuned version of Facebook's NLLB-200-distilled-600M (No Language Left Behind), optimized for Kikuyu-to-English translation. It builds on the multilingual capabilities of NLLB-200 and was fine-tuned on a filtered dataset of Kikuyu-English parallel sentence pairs to improve translation quality for this under-resourced language pair.

Key Features

  • Specialized for Kikuyu: Fine-tuned specifically for Kikuyu (Gĩkũyũ) language translation
  • High-quality training data: Trained on 17,514 carefully filtered translation pairs
  • Optimized performance: Achieved low validation loss (0.0375) after fine-tuning
  • Ready to use: Provided in HuggingFace Transformers format together with an optimized inference setup

Training Details

Dataset

  • Source: Kikuyu translation pairs dataset from LDRI language project
  • Total pairs: 17,514 valid translation pairs
  • Training split: 15,762 pairs (90%)
  • Validation split: 1,752 pairs (10%)
  • Data filtering: Removed short texts (<3 characters) and common non-linguistic responses
  • Domain: Conversational and agricultural content
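The filtering and 90/10 split described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual preprocessing script: the `NON_LINGUISTIC` set, the shuffle seed, and the function names are hypothetical, since the card does not list the filtered responses or say how the split was drawn.

```python
import random

# Hypothetical examples of "non-linguistic responses"; the card does not list them.
NON_LINGUISTIC = {"mmm", "hmm", "...", "n/a"}
MIN_CHARS = 3  # the card removes texts shorter than 3 characters


def is_valid_pair(src: str, tgt: str) -> bool:
    """Keep a pair only if both sides are long enough and not filler responses."""
    src, tgt = src.strip(), tgt.strip()
    if len(src) < MIN_CHARS or len(tgt) < MIN_CHARS:
        return False
    return src.lower() not in NON_LINGUISTIC and tgt.lower() not in NON_LINGUISTIC


def split_pairs(pairs, train_fraction=0.9, seed=42):
    """Filter the pairs, shuffle them reproducibly, and split into train/validation."""
    valid = [pair for pair in pairs if is_valid_pair(*pair)]
    random.Random(seed).shuffle(valid)
    cut = int(len(valid) * train_fraction)
    return valid[:cut], valid[cut:]
```

With 17,514 valid pairs, `int(17514 * 0.9)` gives 15,762 training pairs and leaves 1,752 for validation, which matches the split sizes reported above.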

Training Configuration

  • Base model: facebook/nllb-200-distilled-600M
  • Training epochs: 3
  • Batch size: 4 per device
  • Gradient accumulation steps: 8 (effective batch size: 32)
  • Learning rate: 5e-5
  • Scheduler: Cosine annealing
  • Warmup steps: 100
  • Max sequence length: 512 tokens
  • Optimization: FP16 precision for memory efficiency
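To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step from the values listed above. It assumes a single GPU, a linear warmup followed by cosine decay to zero (the shape of the Transformers `cosine` scheduler), and an approximate total step count derived from the training split size. The exact step count depends on how the trainer rounds, so `TOTAL_STEPS` is an estimate.

```python
import math

BASE_LR = 5e-5
WARMUP_STEPS = 100
TRAIN_PAIRS = 15_762
EFFECTIVE_BATCH = 4 * 8  # per-device batch size x gradient accumulation steps
EPOCHS = 3
# Approximate number of optimizer steps; the trainer's own rounding may differ slightly.
TOTAL_STEPS = math.ceil(TRAIN_PAIRS / EFFECTIVE_BATCH) * EPOCHS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate climbs to its 5e-5 peak at step 100 and then decays smoothly for the rest of the run.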

Training Infrastructure

  • Hardware: CUDA-enabled GPU
  • Framework: HuggingFace Transformers 4.x
  • Training time: ~1 hour 14 minutes (4,454.86 seconds)
  • Memory optimization: FP16, gradient accumulation, device mapping

Performance

Training Metrics

  • Final training loss: 1.1482
  • Final validation loss: 0.0375
  • Training progression: Consistent loss reduction from 3.64 to 0.036 over training steps
  • Convergence: Model showed good convergence with stable validation loss

Translation Examples

  • Example 1
    • Kikuyu: Ũrendia atĩa
    • Model translation: How do you sell?
    • Reference: How you are selling
  • Example 2
    • Kikuyu: Nĩgetha ũkamenya kana hĩndĩ ĩrĩa ndahanda kĩmera gĩkĩ rĩ
    • Model translation: So that you can know when I planted this summer
    • Reference: So that you can know when I plant in this season

Model Capabilities

  • Strong semantic understanding: Captures meaning accurately even with complex sentence structures
  • Context awareness: Handles conversational contexts and agricultural terminology
  • Code-switching handling: Can process mixed language inputs with [cs] markers
  • Robust to variations: Performs well on different sentence lengths and structures

Usage

HuggingFace Transformers

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("nickdee96/nllb-200-600M-kikuyu-english")
model = AutoModelForSeq2SeqLM.from_pretrained("nickdee96/nllb-200-600M-kikuyu-english")

# Create translation pipeline
translator = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="kik_Latn",
    tgt_lang="eng_Latn"
)

# Translate Kikuyu to English
result = translator("Ũrendia atĩa")
print(result[0]['translation_text'])

Direct Model Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Set the source language so the tokenizer adds the Kikuyu language token
tokenizer = AutoTokenizer.from_pretrained("nickdee96/nllb-200-600M-kikuyu-english", src_lang="kik_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("nickdee96/nllb-200-600M-kikuyu-english")

# Tokenize input
inputs = tokenizer("Ũrendia atĩa", return_tensors="pt")

# Generate translation, forcing English as the target language
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_length=512,
)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)

Limitations and Considerations

Known Limitations

  • Domain specificity: Primarily trained on conversational and agricultural content
  • Data size: Limited by the size of available Kikuyu-English parallel data
  • Dialectal variations: May not capture all Kikuyu dialectal variations
  • Formal register: May be less effective on highly formal or technical texts
  • One-way optimization: Primarily optimized for Kikuyu→English direction

Ethical Considerations

  • Language preservation: Contributes to digital language preservation for Kikuyu
  • Cultural sensitivity: Trained on authentic conversational data respecting cultural context
  • Bias mitigation: Filtered training data to remove inappropriate content
  • Fair use: Intended for research and educational purposes

Recommended Use Cases

  • Educational tools: Language learning applications
  • Research: Computational linguistics and MT research
  • Documentation: Helping preserve and translate Kikuyu texts
  • Communication: Assisting with basic Kikuyu-English communication

Not Recommended For

  • Critical translations: Legal, medical, or safety-critical contexts without human review
  • Real-time interpretation: High-stakes real-time communication
  • Commercial applications: Without proper validation and testing
  • Formal documents: Official government or business documents without review

Technical Specifications

Model Architecture

  • Architecture: M2M100ForConditionalGeneration (NLLB variant)
  • Parameters: ~600M (distilled version)
  • Encoder layers: 12
  • Decoder layers: 12
  • Hidden size: 1024
  • Attention heads: 16
  • Vocabulary size: Model-specific NLLB vocabulary
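The parameter count gives a rough lower bound on the memory needed just to hold the weights. The sketch below does that arithmetic, assuming a round 600 million parameters and ignoring activations, the generation cache, and framework overhead, so it should not be expected to match the ~2.5GB inference figure quoted under Performance Benchmarks.

```python
NUM_PARAMS = 600_000_000  # assumed round figure for the "~600M" in this card


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # half precision: 2 bytes per parameter
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # single precision: 4 bytes per parameter
print(f"FP16 weights: {fp16_gb:.1f} GB, FP32 weights: {fp32_gb:.1f} GB")
```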

Input/Output Specifications

  • Input language: Kikuyu (kik_Latn)
  • Output language: English (eng_Latn)
  • Max input length: 512 tokens
  • Tokenization: SentencePiece-based (NLLB tokenizer)
  • Special tokens: Language-specific prefix tokens
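As an illustration of how the language prefix token and the 512-token limit interact, the sketch below lays out a source sequence by hand. It is not the tokenizer's implementation: the subword pieces are placeholder strings, the function name is hypothetical, and the layout (language code, then pieces, then `</s>`) is an assumption based on the default behaviour of the NLLB tokenizer in Transformers.

```python
MAX_INPUT_TOKENS = 512
EOS = "</s>"


def build_source_sequence(pieces, src_lang="kik_Latn", max_len=MAX_INPUT_TOKENS):
    """Lay out a source sequence as [language code] + pieces + [</s>], truncated to max_len."""
    budget = max_len - 2  # reserve room for the language code and the end-of-sequence token
    return [src_lang] + list(pieces[:budget]) + [EOS]
```

In practice the tokenizer handles this when called with `truncation=True` and `max_length=512`; the sketch only shows why an over-long input keeps at most 510 subword pieces.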

Performance Benchmarks

  • Inference speed: GPU-optimized for real-time translation
  • Memory requirements: ~2.5GB GPU memory for inference
  • Batch processing: Supports batched inference for efficiency
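Batched inference might look like the sketch below, which pads each batch to a common length and decodes all outputs together. The batch size of 8, the helper names, and the CPU fallback are illustrative assumptions rather than settings documented for this model.

```python
from typing import Iterator, List

MODEL_ID = "nickdee96/nllb-200-600M-kikuyu-english"


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(texts: List[str], batch_size: int = 8) -> List[str]:
    """Translate Kikuyu sentences to English, one padded batch at a time."""
    # Imported here so the chunking helper works without the heavy dependencies.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, src_lang="kik_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    results: List[str] = []
    for batch in chunks(texts, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=512).to(device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
                max_length=512,
            )
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Calling `translate_all(["Ũrendia atĩa"])` downloads the model on first use. Larger batches improve throughput at the cost of GPU memory, so the batch size is worth tuning to the available hardware.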

Citation and Acknowledgments

Citation

If you use this model in your research, please cite:

@misc{kikuyu-english-nllb-2024,
  title={Fine-tuned NLLB-200 for Kikuyu-English Translation},
  author={nickdee96},
  year={2024},
  url={https://huggingface.co/nickdee96/nllb-200-600M-kikuyu-english},
  note={Fine-tuned from facebook/nllb-200-distilled-600M}
}

Acknowledgments

  • Base model: Facebook AI Research - NLLB-200 team
  • Training framework: HuggingFace Transformers
  • Dataset: LDRI Language Project
  • Community: Kikuyu language speakers and researchers

Model Versions and Updates

Version History

  • v1.0 (Current): Initial fine-tuned release
    • Training dataset: 17,514 pairs
    • Base model: facebook/nllb-200-distilled-600M
    • Training epochs: 3
    • Validation loss: 0.0375

Future Updates

  • Potential expansion with additional training data
  • Bidirectional translation capability (English→Kikuyu)
  • Domain-specific fine-tuning variants
  • Performance optimizations and model compression

Contact and Support

For questions, issues, or collaboration opportunities:

  • Model creator: nickdee96
  • Repository: ldri-language
  • Issues: Please report issues through the repository's issue tracker

This model card was generated on September 26, 2025, and reflects the current state of the model at the time of publication.
