---
language:
- ta
- en
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
pipeline_tag: text-generation
tags:
- tamil
- qwen2
- qlora
- instruction-tuning
- morphology
- dravidian
datasets:
- Tamil-ai/samacheer-kalvi-tamil
model-index:
- name: Tamil-Qwen2.5-7B-Instruct
  results: []
---
# Tamil-Qwen2.5-7B-Instruct

A Tamil-specialized, instruction-tuned LLM built on Qwen2.5-7B-Instruct and fine-tuned with QLoRA on 150K deduplicated Tamil instruction-response pairs.

**Paper:** *"A Thousand Language Problem: Morphological Understanding in Linguistic AI"*
## Model Details

| Property | Value |
|----------|-------|
| Base model | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
| Parameters | 7.6B |
| Method | QLoRA (r=64, alpha=128, dropout=0.05) |
| Training data | 150K deduplicated Tamil instruction-response pairs |
| Tokenizer efficiency | 4.62x ratio (best among tested models for Tamil) |
| Compute | RunPod RTX 5090, ~$5 total cost |
| Sequence length | 1024 |
| Batch size | 32 (effective) |
| Epochs | 1 |
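
The hyperparameters in the table imply a few derived quantities. The sketch below works them out in plain Python. It assumes the standard LoRA scaling factor `alpha / r` and that each of the 150K pairs is seen once per epoch with no sequence packing; the variable names are illustrative and are not taken from the actual training script.

```python
import math

# Values copied from the Model Details table.
lora_r = 64
lora_alpha = 128
num_examples = 150_000
effective_batch_size = 32
epochs = 1

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

# Optimizer steps for one pass over the data (assumes no sequence packing
# and that the final partial batch is kept rather than dropped).
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(f"LoRA scaling factor: {lora_scaling}")
print(f"Optimizer steps: {total_steps}")
```

If the real run used packing or dropped the last batch, the step count would differ slightly.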
## Training Data

150,000 deduplicated instruction-response pairs drawn from 5 Tamil datasets:

- Tamil Alpaca
- Tamil Orca
- Tamil Dolly
- Tamil-ai/samacheer-kalvi-tamil (morphological drills + grammar QA)
- Additional Tamil instruction sets
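
The card does not say how the pairs were deduplicated. As a minimal sketch of one common approach, exact-match deduplication after Unicode normalization, the following keeps the first occurrence of each pair. The `instruction`/`response` field names, the `dedupe_pairs` helper, and the choice of NFC normalization are assumptions for illustration, not the authors' pipeline.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so equivalent strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def dedupe_pairs(pairs):
    """Keep the first occurrence of each (instruction, response) pair."""
    seen = set()
    unique = []
    for pair in pairs:
        # The unit separator keeps instruction and response from running together.
        key_text = normalize(pair["instruction"]) + "\x1f" + normalize(pair["response"])
        key = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


question = "வீடு என்றால் என்ன?"
answer = "வீடு என்பது வாழும் இடம்."
examples = [
    {"instruction": question, "response": answer},
    # Same pair, differing only in whitespace.
    {"instruction": "  " + question.replace(" ", "  ") + " ", "response": answer},
    {"instruction": "மரம் என்றால் என்ன?", "response": "மரம் என்பது ஒரு தாவரம்."},
]
unique_examples = dedupe_pairs(examples)
print(len(unique_examples))
```

Exact matching of this kind does not catch paraphrases or near-duplicates; those would need a fuzzy method such as MinHash.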
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tamil-ai/tamil-qwen25-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place weights on available devices (needs accelerate)
)

messages = [
    {"role": "system", "content": "You are a helpful Tamil language assistant."},
    # Prompt: "State the case forms of the word வீடு (house)."
    {"role": "user", "content": "வீடு என்ற சொல்லின் வேற்றுமை வடிவங்களைக் கூறுக."},
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
### 4-bit Quantized (for limited VRAM)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Tamil-ai/tamil-qwen25-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit loading requires the bitsandbytes package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```
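
As a rough guide to why 4-bit loading helps, the sketch below estimates weight memory alone from the 7.6B parameter count in Model Details. It assumes 2 bytes per parameter for 16-bit weights and 0.5 bytes for 4-bit weights, and it ignores quantization constants, activations, and the KV cache, so real usage will be higher. The `weight_memory_gb` helper is hypothetical.

```python
PARAMS = 7.6e9  # parameter count from the Model Details table


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(PARAMS, 16)
int4_gb = weight_memory_gb(PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```

Under these assumptions, 4-bit weights take about a quarter of the space of 16-bit weights.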
## Why Qwen2.5?

Tokenizer analysis across 6 base models showed that Qwen2.5 has the best Tamil tokenization efficiency:

| Model | Tamil Token Ratio | Verdict |
|-------|------------------|---------|
| **Qwen2.5** | **4.62x** | Best for Tamil |
| Llama 3.1 | 5.8x | |
| Gemma 2 | 6.1x | |
| Mistral | 7.2x | |
| Falcon | 10.5x | Worst |

A lower ratio means fewer tokens per Tamil word, and therefore more efficient training and inference.
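
The card does not define the ratio beyond "tokens per Tamil word". A minimal sketch of one way to measure such a ratio, assuming it is total tokens divided by whitespace-separated words, is shown below. `tokens_per_word` and the toy `char_tokenize` stand-in are hypothetical names; the stand-in only makes the example runnable without downloading a model.

```python
from typing import Callable, Iterable, List


def tokens_per_word(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Total tokens divided by total whitespace-separated words."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def char_tokenize(text: str) -> List[str]:
    """Toy stand-in: one token per non-space code point (no subword merges)."""
    return [ch for ch in text if not ch.isspace()]


sample = ["வீடு மரம்"]
ratio = tokens_per_word(sample, char_tokenize)
print(f"{ratio:.2f} tokens per word")
```

To compare real models, pass a Hugging Face tokenizer's `tokenize` method in place of `char_tokenize` and run the function over the same Tamil corpus for each model.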
## Intended Use

- Tamil question answering and instruction following
- Tamil morphological analysis
- Tamil grammar and linguistics tasks
- Research on low-resource language LLMs
## Limitations

- Trained primarily on instructional Tamil; may underperform on colloquial Tamil and slang
- Morphological accuracy varies by category (see benchmark results)
- English capabilities may be degraded compared to the base Qwen2.5 model
## Citation

```bibtex
@misc{tamilai2026,
  title={A Thousand Language Problem: Morphological Understanding in Linguistic AI},
  author={Tamil-AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Tamil-ai/tamil-qwen25-7b-instruct}
}
```