---
license: mit
language:
- en
- es
- fr
- ru
tags:
- grammar-correction
- multilingual
- mt5
datasets:
- custom
pipeline_tag: text2text-generation
---

# Multilingual Grammar Corrector using mT5-small

This is a fine-tuned [`mT5-small`](https://huggingface.co/google/mt5-small) model for **multilingual grammar correction** in English, Spanish, French, and Russian. The reported per-language figures are 99% for English, 75% for Spanish, 70% for French, and 80% for Russian. The model was trained on synthetic and human-curated data to correct grammatical mistakes in short sentences.
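
Because the model targets short sentences, a longer passage is probably best split into sentences before correction. The sketch below shows one possible way to do that with a simple punctuation-based splitter. Both `split_sentences` and `correct_text` are hypothetical helpers, not part of this model or of `transformers`, and `correct_fn` stands for any callable that corrects a single sentence (for example, a thin wrapper around `model.generate`).

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split text after '.', '!' or '?' followed by whitespace.

    Assumption: a naive rule like this is good enough for short, simple
    sentences; abbreviations such as "e.g." will be split incorrectly.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def correct_text(text: str, correct_fn: Callable[[str], str]) -> str:
    """Correct each sentence separately and join the results with spaces."""
    return " ".join(correct_fn(sentence) for sentence in split_sentences(text))


# Stand-in corrector for demonstration only; swap in a call to the model.
print(correct_text("She go to school. He like apples!", str.upper))
# SHE GO TO SCHOOL. HE LIKE APPLES!
```

Passing the corrector in as a function keeps the splitting logic independent of how the model is loaded.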

## ✨ Example

**Input:**
> She go to school yesterday.

**Output:**
> She went to school yesterday.
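
To see exactly which words a correction changed, the input and output can be compared word by word. The following is a minimal sketch using Python's standard `difflib` module; `word_changes` is a hypothetical helper introduced here for illustration and is not shipped with the model.

```python
import difflib
from typing import List, Tuple


def word_changes(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for each span of words that differs."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions have an empty "before"; deletions an empty "after".
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


print(word_changes("She go to school yesterday.", "She went to school yesterday."))
# [('go', 'went')]
```

Splitting on whitespace keeps punctuation attached to the neighbouring word, so a change such as `yesterday` to `yesterday.` would be reported as a changed word.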

---

## 🧠 Model Details

- **Architecture:** mT5-small
- **Layers:** 8 encoder, 8 decoder
- **Heads:** 6
- **Languages supported:** English, Spanish, French, Russian
- **Tokenization:** SentencePiece with special tokens `<pad>`, `</s>`, `<unk>`

## 📦 How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Replace "your-username" with the account that hosts the model.
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/Multilingual-Grammar-Corrector")
tokenizer = AutoTokenizer.from_pretrained("your-username/Multilingual-Grammar-Corrector")

# Tokenize the sentence to correct into PyTorch tensors.
input_text = "She go to school yesterday."
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the correction and decode it back to plain text.
output = model.generate(**inputs, max_new_tokens=64)
corrected = tokenizer.decode(output[0], skip_special_tokens=True)

print(corrected)  # ➜ She went to school yesterday.