🤖 Nayana VQA - Advanced Kannada Visual Question Answering Model
Developed by: CognitiveLab
License: Apache 2.0
Base Model: unsloth/gemma-3n-E4B-it
Architecture: Gemma 3n (4B parameters)
🌟 Model Overview
Nayana VQA is a vision-language model fine-tuned for Visual Question Answering (VQA) and Document Visual Question Answering (Document VQA). Built on the Gemma 3n architecture, it answers questions about visual content such as photographs and document pages, with a primary focus on Kannada-language support.
🌍 Supported Languages
- Kannada (kn) - Primary focus language
More languages coming soon! We are actively working on expanding language support to an additional 20 languages.
🎯 Key Features
- Visual Question Answering: Accurate question answering from images in Kannada
- Document Understanding: Advanced comprehension of document layouts and content
- Multimodal Reasoning: Combines visual and textual understanding for complex queries
- Fast Inference: Optimized for real-time applications
- High Accuracy: Fine-tuned on diverse VQA datasets
- Easy Integration: Compatible with Transformers and Modal deployment
📋 Model Specifications
| Parameter | Value |
|---|---|
| Model Size | 4B parameters |
| Context Length | 32K tokens |
| Image Resolution | Flexible (optimized for documents and general images) |
| Precision | BFloat16 |
| Framework | Transformers + Unsloth |
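As a rough guide to hardware needs, the figures in the table imply a lower bound on the memory required to hold the weights. The sketch below is only an estimate: it assumes the 4B parameter count and BFloat16 precision listed above, and it ignores activations, the KV cache, and any framework overhead, so real usage will be higher. The function name `weight_memory_gib` is illustrative and not part of any library.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# 4B parameters at 2 bytes each is about 7.5 GiB for the weights alone.
print(f"{weight_memory_gib(4e9):.1f} GiB")
```

Doubling the bytes per parameter (for example, loading in float32) doubles this estimate.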
🚀 Quick Start
Installation
pip install transformers torch pillow unsloth
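To confirm the installation before loading the model, a small check like the one below can help. It is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical. Note that the `pillow` package is imported under the name `PIL`.

```python
from importlib.util import find_spec

# Import names for the packages installed above; `pillow` is imported as `PIL`.
REQUIRED_MODULES = ["transformers", "torch", "PIL", "unsloth"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```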
Basic Usage
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
model_id = "Nayana-cognitivelab/NayanaVQA"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# System prompt
system_prompt = "You are Nayana, an advanced AI assistant developed by CognitiveLab. You specialize in vision-based tasks, particularly Visual Question Answering (VQA) and Document Visual Question Answering (Document VQA). You are highly accurate, fast, and reliable when working with visual content. You can understand and respond to questions about images in Kannada with high precision."

# Load the image and define the question
image = Image.open("your_image.jpg")
user_question = "ಈ ಚಿತ್ರದಲ್ಲಿ ಏನಿದೆ?"  # "What is in this image?" in Kannada

# Prepare messages
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": system_prompt}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_question},
            {"type": "image", "image": image}
        ]
    }
]

# Apply the chat template and move the tensors to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True
    )

# Decode only the newly generated tokens (skip the prompt tokens)
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)
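When asking several questions about the same image, which is common in Document VQA, the message list from Basic Usage can be built by a small helper. The sketch below is illustrative only: `build_messages` is a hypothetical name, and the string `"your_image.jpg"` stands in for the PIL image used above so that the snippet runs without any model or image file.

```python
def build_messages(question, image, system_prompt=None):
    """Build a chat-template message list for one question about one image."""
    messages = []
    if system_prompt:
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]}
        )
    messages.append(
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image", "image": image},
            ],
        }
    )
    return messages


# Placeholder for the PIL image loaded in Basic Usage (assumption for this sketch).
image = "your_image.jpg"
questions = [
    "ಈ ಚಿತ್ರದಲ್ಲಿ ಏನಿದೆ?",  # "What is in this image?" in Kannada
    "What is the title of this document?",  # hypothetical second question
]

# One independent message list per question, all sharing the same image.
conversations = [build_messages(q, image, system_prompt="You are Nayana.") for q in questions]
print(len(conversations), "conversations prepared")
```

Each list in `conversations` can then be passed to `processor.apply_chat_template` in the same way as `messages` in Basic Usage.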
This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
📜 Citation
@model{nayana_vqa_2024,
title={Nayana VQA: Advanced Kannada Visual Question Answering with Gemma 3n},
author={CognitiveLab},
year={2024},
url={https://huggingface.co/Nayana-cognitivelab/NayanaVQA}
}
Model tree for Nayana-cognitivelab/NayanaVQA
Base model
google/gemma-3n-E4B