# Bengali AI Model

## 📊 Model Details

- **Base Model**: microsoft/DialoGPT-medium
- **Language**: Bengali (Bangla)
- **Parameters**: ~355M
- **Training**: Adapted for Bengali instruction following
- **Format**: PyTorch weights

## 🚀 Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the local directory
tokenizer = AutoTokenizer.from_pretrained("./bangla_ai_ready")
model = AutoModelForCausalLM.from_pretrained("./bangla_ai_ready")

# GPT-2-style tokenizers ship without a pad token, so reuse the EOS token
tokenizer.pad_token = tokenizer.eos_token


def generate_bengali_response(instruction):
    """Generate a Bengali answer for a Bengali instruction."""
    # Prompt template: "নির্দেশনা" = "Instruction", "উত্তর" = "Answer"
    prompt = f"নির্দেশনা: {instruction} উত্তর:"
    input_ids = tokenizer.encode(prompt, return_tensors="pt", max_length=400, truncation=True)

    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            max_length=input_ids.shape[1] + 100,  # prompt plus up to 100 new tokens
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens so the prompt is not echoed back
    new_tokens = outputs[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Usage: "What is the capital of Bangladesh?"
response = generate_bengali_response("বাংলাদেশের রাজধানী কী?")
print(response)
```

## 📝 Example Usage

### Educational Queries

```python
# "State the basic principles of mathematics"
generate_bengali_response("গণিতের মৌলিক নীতি বলুন")
# "Describe the history of Bengali literature"
generate_bengali_response("বাংলা সাহিত্যের ইতিহাস বর্ণনা করুন")
```

### General Knowledge

```python
# "Tell me about the culture of Bangladesh"
generate_bengali_response("বাংলাদেশের সংস্কৃতি সম্পর্কে বলুন")
# "Tell me ways to stay healthy"
generate_bengali_response("স্বাস্থ্যকর থাকার উপায় বলুন")
```

### Practical Advice

```python
# "Give tips for time management in daily life"
generate_bengali_response("দৈনন্দিন জীবনে সময় ব্যবস্থাপনার টিপস দিন")
```

## 🔧 Model Configuration

- **Max Length**: 512 tokens (the Quick Start truncates the prompt to 400 tokens and generates up to 100 more)
- **Temperature**: 0.7 (for creative responses)
- **Input Format**: "নির্দেশনা: {instruction} উত্তর:" ("Instruction: {instruction} Answer:")
- **Language**: Bengali (Bangla script)

## 📁 Files

- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer configuration
- `vocab.json` - Vocabulary
- `merges.txt` - BPE merges
- `README.md` - This documentation

## 🎯 Performance

- **Speed**: ~1-2 seconds per response
- **Language**: Optimized for Bengali
- **Memory**: ~2GB RAM required
- **Compatibility**: Python 3.8+, PyTorch 2.0+

## 📜 License

This model is based on microsoft/DialoGPT-medium and adapted for Bengali language use.
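
## 🧮 Memory Estimate

As a rough sanity check on the ~2GB memory figure listed under Performance, the size of the weights alone can be estimated from the ~355M parameter count. The sketch below assumes 4 bytes per parameter (float32). That is an assumption, since this README only says "PyTorch weights". It also ignores activations, the tokenizer, and framework overhead, which is where the remaining headroom would go. The function name `estimate_weight_memory_gb` is hypothetical and not part of any library.

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Return the approximate size of the model weights in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# ~355M parameters, as listed under Model Details
APPROX_PARAMETERS = 355_000_000

# Assumption: float32 storage, 4 bytes per parameter
fp32_gb = estimate_weight_memory_gb(APPROX_PARAMETERS)
# Assumption: half-precision loading, 2 bytes per parameter
fp16_gb = estimate_weight_memory_gb(APPROX_PARAMETERS, bytes_per_parameter=2)

print(f"float32 weights: ~{fp32_gb:.2f} GB")  # ~1.42 GB
print(f"float16 weights: ~{fp16_gb:.2f} GB")  # ~0.71 GB
```

Under these assumptions the float32 weights take about 1.42 GB, which is consistent with the ~2GB RAM requirement once runtime overhead is included.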