How to use DanielRegaladoCardoso/mayavoice-llama3.1-8b-lora-v2 with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/meta-llama-3.1-8b-instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "DanielRegaladoCardoso/mayavoice-llama3.1-8b-lora-v2")
🌽 MayaVoice — Machine Translation for Mayan Languages of Guatemala
MayaVoice is a machine translation system between Spanish and 14 Mayan languages of Guatemala. It is the first open-source large language model fine-tuned specifically for this linguistic family, designed to bridge the communication gap between Spanish-speaking services and the 6+ million Maya speakers across Guatemala.
Motivation
Guatemala is a multilingual country, and the 14 Mayan languages covered here are among its officially recognized national languages. Yet the linguistic barrier between Spanish and these languages severely limits access to healthcare, education, justice, and government services for millions of Maya speakers. MayaVoice aims to reduce this gap through AI-powered translation.
Supported Languages and Evaluation
The model was evaluated on a held-out test set of 300 sentences unseen during training:
| Language | Branch | Speakers (est.) | N (test sentences) | BLEU | chrF |
|---|---|---|---|---|---|
| Mam | Mamean | 530,000 | 70 | 47.07 | 61.30 |
| Kaqchikel | K'ichean | 450,000 | 57 | 41.34 | 56.29 |
| Chuj | Q'anjob'alan | 65,000 | 11 | 31.32 | 54.78 |
| Q'eqchi' | K'ichean | 800,000 | 22 | 27.25 | 52.03 |
| Q'anjob'al | Q'anjob'alan | 170,000 | 23 | 23.38 | 43.92 |
| K'iche' | K'ichean | 1,100,000 | 18 | 21.32 | 38.57 |
| Tektiteko | Mamean | 5,000 | 20 | 15.76 | 43.78 |
| Sipakapense | K'ichean | 8,000 | 7 | 15.75 | 48.87 |
| Awakateko | Mamean | 20,000 | 12 | 15.46 | 34.46 |
| Poqomchi' | K'ichean | 115,000 | 7 | 15.15 | 31.88 |
| Poqomam | K'ichean | 50,000 | 24 | 11.21 | 48.97 |
| Achi | K'ichean | 150,000 | 4 | 8.30 | 44.68 |
| Itza' | Yucatecan | 1,000 | 7 | 4.02 | 17.21 |
| Tz'utujil | K'ichean | 90,000 | 18 | 2.86 | 28.81 |
| Weighted avg | | ~3.5M | 300 | 40.28 | 55.07 |
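The summary row can be cross-checked against the per-language rows. The sketch below copies N, BLEU, and chrF from the table and computes two common aggregates: an unweighted (macro) mean and a mean weighted by test-sentence count. The helper names `macro_average` and `weighted_average` are illustrative and are not part of any MayaVoice code.

```python
# Per-language rows copied from the table: (language, N, BLEU, chrF).
ROWS = [
    ("Mam", 70, 47.07, 61.30),
    ("Kaqchikel", 57, 41.34, 56.29),
    ("Chuj", 11, 31.32, 54.78),
    ("Q'eqchi'", 22, 27.25, 52.03),
    ("Q'anjob'al", 23, 23.38, 43.92),
    ("K'iche'", 18, 21.32, 38.57),
    ("Tektiteko", 20, 15.76, 43.78),
    ("Sipakapense", 7, 15.75, 48.87),
    ("Awakateko", 12, 15.46, 34.46),
    ("Poqomchi'", 7, 15.15, 31.88),
    ("Poqomam", 24, 11.21, 48.97),
    ("Achi", 4, 8.30, 44.68),
    ("Itza'", 7, 4.02, 17.21),
    ("Tz'utujil", 18, 2.86, 28.81),
]


def macro_average(scores):
    """Unweighted mean: every language counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, weights):
    """Mean weighted by the number of test sentences per language."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


counts = [n for _, n, _, _ in ROWS]
bleu_scores = [bleu for _, _, bleu, _ in ROWS]
chrf_scores = [chrf for _, _, _, chrf in ROWS]

summary = {
    "sentences": sum(counts),
    "bleu_macro": macro_average(bleu_scores),
    "bleu_weighted": weighted_average(bleu_scores, counts),
    "chrf_macro": macro_average(chrf_scores),
    "chrf_weighted": weighted_average(chrf_scores, counts),
}

print(f"test sentences: {summary['sentences']}")
for name in ("bleu_macro", "bleu_weighted", "chrf_macro", "chrf_weighted"):
    print(f"{name}: {summary[name]:.2f}")
```

On these rows both aggregates of the per-language BLEU (roughly 20 macro and 29 weighted by N) come out below the 40.28 in the final row, and the same holds for chrF. The reported figures were therefore presumably computed another way, for example as corpus-level scores over all 300 sentences; the card does not say, so treat that reading as an assumption.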
Note on metrics: BLEU measures exact word n-gram overlap; chrF is an F-score over character n-grams, which is more appropriate for agglutinative languages like those of the Mayan family because it gives partial credit to word forms that share a stem or affixes. Languages with lower N have wider confidence intervals.
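To illustrate why a character-level metric is more forgiving of inflected forms, here is a simplified sentence-level chrF sketch. It averages character n-gram precision and recall over orders 1 to 6 and combines them with an F-beta score (beta = 2). It is a teaching aid and is not guaranteed to reproduce the numbers of any evaluation toolkit, nor is it the code used to produce the table. The function name `simple_chrf` and the Spanish toy strings are assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: F-beta over averaged char n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Toy example: two inflections of the same Spanish verb.
hypothesis, reference = "caminaremos", "caminamos"
shared_words = set(hypothesis.split()) & set(reference.split())

print("exact word matches:", len(shared_words))
print(f"simplified chrF: {simple_chrf(hypothesis, reference):.1f}")
```

The two strings share no exact word, so any word-level n-gram match is zero, while the character-level score is well above zero because most of the stem matches.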
Linguistic Observations
- Mam (Mamean, 47.07 BLEU) and Kaqchikel (K'ichean, 41.34 BLEU) show the strongest results, correlating with greater availability of parallel training data. Other members of the Mamean branch score far lower (Tektiteko and Awakateko at about 15 BLEU).
- Itza' (Yucatecan branch) has the lowest chrF (17.21) and the second-lowest BLEU (4.02, above only Tz'utujil at 2.86), consistent with its status as a critically endangered language with very few digitized texts.
- chrF scores are substantially higher than BLEU across all languages, which is expected for morphologically complex languages where partial word matches capture translation quality more faithfully.
- Performance variation across the K'ichean branch (Kaqchikel 41.34 BLEU vs. Tz'utujil 2.86) shows that genetic relatedness alone does not predict translation quality; data availability appears to be the dominant factor.
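The within-branch variation noted above can be read directly off the evaluation table. This sketch groups the reported BLEU scores by branch and prints the lowest and highest score in each. It only describes the spread; it cannot test the data-availability explanation, because per-language training set sizes are not given in this card.

```python
from collections import defaultdict

# (language, branch, BLEU) copied from the evaluation table.
BLEU_BY_LANGUAGE = [
    ("Mam", "Mamean", 47.07),
    ("Kaqchikel", "K'ichean", 41.34),
    ("Chuj", "Q'anjob'alan", 31.32),
    ("Q'eqchi'", "K'ichean", 27.25),
    ("Q'anjob'al", "Q'anjob'alan", 23.38),
    ("K'iche'", "K'ichean", 21.32),
    ("Tektiteko", "Mamean", 15.76),
    ("Sipakapense", "K'ichean", 15.75),
    ("Awakateko", "Mamean", 15.46),
    ("Poqomchi'", "K'ichean", 15.15),
    ("Poqomam", "K'ichean", 11.21),
    ("Achi", "K'ichean", 8.30),
    ("Itza'", "Yucatecan", 4.02),
    ("Tz'utujil", "K'ichean", 2.86),
]

by_branch = defaultdict(list)
for language, branch, bleu in BLEU_BY_LANGUAGE:
    by_branch[branch].append((bleu, language))

spread = {}
for branch, scored in by_branch.items():
    low_bleu, low_lang = min(scored)
    high_bleu, high_lang = max(scored)
    spread[branch] = high_bleu - low_bleu
    print(
        f"{branch}: {len(scored)} languages, "
        f"{low_lang} {low_bleu:.2f} to {high_lang} {high_bleu:.2f} "
        f"(spread {spread[branch]:.2f})"
    )
```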
Demo
Try MayaVoice live: 🌽 MayaVoice Space
Limitations
- Domain bias: The training corpus has a non-uniform domain distribution, which may affect quality on everyday conversational language.
- Low-resource languages: Itza', Tz'utujil, and Achi have significantly fewer training resources, reflected in their metrics.
- Hallucination risk: Like all generative models, MayaVoice may produce plausible-looking but incorrect translations. Human verification is recommended for critical use cases.
- Limited test set: 300 test sentences is a small evaluation set; per-language metrics with N < 10 are indicative, not conclusive.
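As a rough guide to how much less reliable the small per-language test sets are, the sketch below uses the rule that the width of a confidence interval for a mean shrinks with the square root of the sample size. This assumes independent sentence-level scores with similar variance across languages, which is a simplification: corpus-level BLEU is not a plain mean, and a bootstrap over the actual test sentences would be more faithful. The helper `relative_ci_width` is a name introduced here for illustration.

```python
import math

# Test-set sizes (N) copied from the evaluation table.
TEST_SIZES = {
    "Mam": 70, "Kaqchikel": 57, "Chuj": 11, "Q'eqchi'": 22,
    "Q'anjob'al": 23, "K'iche'": 18, "Tektiteko": 20,
    "Sipakapense": 7, "Awakateko": 12, "Poqomchi'": 7,
    "Poqomam": 24, "Achi": 4, "Itza'": 7, "Tz'utujil": 18,
}


def relative_ci_width(n, reference_n):
    """Interval width at sample size n relative to reference_n (1/sqrt(n) rule)."""
    return math.sqrt(reference_n / n)


reference_n = max(TEST_SIZES.values())  # largest test set, used as baseline
small_sets = sorted(lang for lang, n in TEST_SIZES.items() if n < 10)

for lang, n in sorted(TEST_SIZES.items(), key=lambda item: item[1]):
    factor = relative_ci_width(n, reference_n)
    print(f"{lang}: N={n}, interval about {factor:.1f}x the N={reference_n} baseline")

print("languages with N < 10:", ", ".join(small_sets))
```

Under these assumptions, the smallest test set (Achi, N = 4) has an interval roughly four times as wide as the largest (Mam, N = 70).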
Ethical Use
MayaVoice was developed to preserve and facilitate access to the Mayan languages of Guatemala. It should not be used to replace human translators in contexts where accuracy is critical (legal, medical), but rather as a support and accessibility tool.
Citation
@misc{regalado2025mayavoice,
author = {Regalado Cardoso, Daniel},
title = {MayaVoice: Machine Translation for 14 Mayan Languages of Guatemala},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/DanielRegaladoCardoso/mayavoice-llama3.1-8b-lora-v2}
}
Contact
- Developer: Daniel Regalado Cardoso
- Institution: University of Miami
- Demo: MayaVoice Space
Model tree for DanielRegaladoCardoso/mayavoice-llama3.1-8b-lora-v2
Base model: meta-llama/Llama-3.1-8B
Evaluation results
- BLEU (macro avg), self-reported: 40.28
- chrF (macro avg), self-reported: 55.07