---
language:
- kk
license: cc-by-sa-4.0
library_name: transformers
tags:
- gec
- grammatical-error-correction
- kazakh
- mt5
- seq2seq
base_model: stukenov/kazakh-gec-mt5-base-run12-kazsandra-new
pipeline_tag: text2text-generation
---

# kazakh-gec-mt5-base-run13-finetune

Run 13: the latest mT5-base GEC model in this series, produced by the final fine-tuning stage.

## Overview

| Property | Value |
|----------|-------|
| **Task** | Kazakh Grammatical Error Correction |
| **Architecture** | mt5-base (seq2seq) |
| **Base model** | [stukenov/kazakh-gec-mt5-base-run12-kazsandra-new](https://huggingface.co/stukenov/kazakh-gec-mt5-base-run12-kazsandra-new) |
| **Training data** | [kazakh-synthetic-gec-datasets](https://huggingface.co/datasets/stukenov/kazakh-synthetic-gec-datasets) |
| **Language** | Kazakh (kk) |
| **License** | CC-BY-SA-4.0 |

This is the best mT5-base variant in the series.

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("stukenov/kazakh-gec-mt5-base-run13-finetune")
model = AutoModelForSeq2SeqLM.from_pretrained("stukenov/kazakh-gec-mt5-base-run13-finetune")

# Every input must start with the "gec: " task prefix
input_text = "gec: " + "Мен кеше мектепке бардым"
inputs = tokenizer(input_text, return_tensors="pt", max_length=128, truncation=True)

# Generate the corrected sentence and decode it back to text
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

- Fine-tuned from [stukenov/kazakh-gec-mt5-base-run12-kazsandra-new](https://huggingface.co/stukenov/kazakh-gec-mt5-base-run12-kazsandra-new)
- Training data: more than 1M synthetic GEC pairs (correct Kazakh sentences with introduced errors)
- Task prefix: `"gec: "`

## Project

Part of the [Kazakh GEC](https://huggingface.co/collections/stukenov) project, which builds grammatical error correction models for Kazakh.

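The Usage snippet above corrects one sentence at a time. A minimal sketch of a batched variant follows, assuming the same `gec: ` task prefix and 128-token limit. `build_inputs` and `correct_batch` are hypothetical helper names introduced here for illustration, not part of the model's API.

```python
from typing import List

TASK_PREFIX = "gec: "  # task prefix listed under Training Details


def build_inputs(sentences: List[str]) -> List[str]:
    """Strip surrounding whitespace and prepend the task prefix (hypothetical helper)."""
    return [TASK_PREFIX + s.strip() for s in sentences]


def correct_batch(sentences: List[str], model, tokenizer, max_length: int = 128) -> List[str]:
    """Run the model on several sentences at once (hypothetical helper).

    `model` and `tokenizer` are assumed to be loaded as in the Usage section.
    """
    # padding=True pads every sentence to the longest one in the batch
    inputs = tokenizer(
        build_inputs(sentences),
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
    outputs = model.generate(**inputs, max_new_tokens=max_length)
    # One decoded string per input sentence
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in Usage, `correct_batch(["Мен кеше мектепке бардым"], model, tokenizer)` returns a list with one decoded string per input sentence.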
## Citation

```bibtex
@misc{tukenov2026gec,
  title={Kazakh Grammatical Error Correction with mT5},
  author={Tukenov, Saken},
  year={2026},
  url={https://huggingface.co/stukenov/kazakh-gec-mt5-base-run13-finetune}
}
```

## License

CC-BY-SA-4.0

## Benchmark Results

Evaluated on a custom **100-example GEC test** (pure model inference, with no pre- or post-processing pipeline).

| Category | Score |
|----------|-------|
| Spelling | 0/30 (0%) |
| Grammar | 2/20 (10%) |
| Punctuation | 0/15 (0%) |
| Mixed | 0/20 (0%) |
| Identity preservation | 3/15 (20%) |
| **Total** | **5/100 (5%)** |

## Leaderboard (100-example custom benchmark)

| Model | Total | Spelling/30 | Grammar/20 | Punctuation/15 | Mixed/20 | Identity/15 |
|-------|-------|-------------|------------|----------------|----------|-------------|
| **[sozkz-core-llama-600m-kk-gec-v1](https://huggingface.co/stukenov/sozkz-core-llama-600m-kk-gec-v1)** | **47%** | 15 | 12 | 3 | 2 | 15/15 |
| [sozkz-fix-qwen-500m-kk-gec-v3](https://huggingface.co/stukenov/sozkz-fix-qwen-500m-kk-gec-v3) | 38% | 0 | 16 | 9 | 0 | 13/15 |
| [sozkz-core-llama-300m-kk-gec-v4](https://huggingface.co/stukenov/sozkz-core-llama-300m-kk-gec-v4) | 37% | 9 | 6 | 4 | 3 | 15/15 |
| [sozkz-fix-qwen-500m-kk-gec-v1](https://huggingface.co/stukenov/sozkz-fix-qwen-500m-kk-gec-v1) | 35% | 0 | 12 | 8 | 0 | 15/15 |
| [sozkz-fix-qwen-500m-kk-gec-v2](https://huggingface.co/stukenov/sozkz-fix-qwen-500m-kk-gec-v2) | 30% | 0 | 11 | 7 | 0 | 12/15 |
| [sozkz-core-llama-1b-kk-gec-v1](https://huggingface.co/stukenov/sozkz-core-llama-1b-kk-gec-v1) | 16% | 2 | 6 | 1 | 0 | 7/15 |
| [sozkz-fix-qwen-500m-kk-gec-v4](https://huggingface.co/stukenov/sozkz-fix-qwen-500m-kk-gec-v4) | 5% | 0 | 1 | 4 | 0 | 0/15 |
| [sozkz-fix-mt5b-kk-gec-run13-v1](https://huggingface.co/stukenov/sozkz-fix-mt5b-kk-gec-run13-v1) | 5% | 0 | 2 | 0 | 0 | 3/15 |
| [sozkz-nllb-1b-kk-gec-v1](https://huggingface.co/stukenov/sozkz-nllb-1b-kk-gec-v1) | 1% | 0 | 1 | 0 | 0 | 0/15 |
| [sozkz-nllb-1b-kk-pretrain-v1](https://huggingface.co/stukenov/sozkz-nllb-1b-kk-pretrain-v1) | 1% | 0 | 1 | 0 | 0 | 0/15 |
| [sozkz-core-llama-300m-kk-gec-v3](https://huggingface.co/stukenov/sozkz-core-llama-300m-kk-gec-v3) | 1% | 0 | 1 | 0 | 0 | 0/15 |
| sozkz-core-llama-300m-kk-gec-v1/v2a/v2b | 0–1% | 0 | 0 | 0 | 0 | 0–1 |
| sozkz-fix-mt5-50m-kk-gec-v1 | 0% | 0 | 0 | 0 | 0 | 0/15 |
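
The Total column appears to be the sum of the per-category correct counts over the 100 examples (for the top row, 15 + 12 + 3 + 2 + 15 = 47). A minimal sketch of that arithmetic, assuming this reading of the table: `CATEGORY_SIZES` and `total_percent` are illustrative names, not part of the project's evaluation code.

```python
from typing import Dict

# Category sizes taken from the leaderboard column headers (they sum to 100).
CATEGORY_SIZES: Dict[str, int] = {
    "spelling": 30,
    "grammar": 20,
    "punctuation": 15,
    "mixed": 20,
    "identity": 15,
}


def total_percent(correct: Dict[str, int]) -> float:
    """Return the overall score in percent from per-category correct counts."""
    for name, count in correct.items():
        # Reject counts that exceed the number of examples in a category
        if not 0 <= count <= CATEGORY_SIZES[name]:
            raise ValueError(f"{name}: {count} is outside 0..{CATEGORY_SIZES[name]}")
    return 100 * sum(correct.values()) / sum(CATEGORY_SIZES.values())


# Counts for this model, copied from the Benchmark Results table
run13 = {"spelling": 0, "grammar": 2, "punctuation": 0, "mixed": 0, "identity": 3}
print(total_percent(run13))  # 5.0
```

The same function reproduces the other rows that list exact counts, which is a quick consistency check when adding a new model to the table.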