Instructions for using Nayana-cognitivelab/NayanaVQA with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Nayana-cognitivelab/NayanaVQA with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Nayana-cognitivelab/NayanaVQA")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Nayana-cognitivelab/NayanaVQA")
model = AutoModelForImageTextToText.from_pretrained("Nayana-cognitivelab/NayanaVQA")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Nayana-cognitivelab/NayanaVQA with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Nayana-cognitivelab/NayanaVQA"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nayana-cognitivelab/NayanaVQA",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
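
The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started by `vllm serve` is listening on `http://localhost:8000`. The helper names (`build_chat_payload`, `image_bytes_to_data_url`, `ask`) are hypothetical and not part of vLLM. Passing a local image as a base64 data URL is an assumption about the server's image handling, so check it against your vLLM version.

```python
import base64
import json
import urllib.request

MODEL_ID = "Nayana-cognitivelab/NayanaVQA"


def image_bytes_to_data_url(data: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_chat_payload(question: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def ask(question: str, image_url: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(question, image_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Describe this image in one sentence.", "<image URL>")` returns the model's reply as a string. Changing `base_url` to `http://localhost:30000/v1` targets a server on port 30000 instead.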

Use Docker

docker model run hf.co/Nayana-cognitivelab/NayanaVQA

SGLang

How to use Nayana-cognitivelab/NayanaVQA with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Nayana-cognitivelab/NayanaVQA" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nayana-cognitivelab/NayanaVQA",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Nayana-cognitivelab/NayanaVQA" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Nayana-cognitivelab/NayanaVQA",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Nayana-cognitivelab/NayanaVQA with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Nayana-cognitivelab/NayanaVQA to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Nayana-cognitivelab/NayanaVQA to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Nayana-cognitivelab/NayanaVQA to start chatting

Load model with FastModel

# Install Unsloth from pip:
pip install unsloth
# Then load the model in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Nayana-cognitivelab/NayanaVQA",
    max_seq_length=2048,
)

Docker Model Runner
How to use Nayana-cognitivelab/NayanaVQA with Docker Model Runner:
```
docker model run hf.co/Nayana-cognitivelab/NayanaVQA
```

🤖 Nayana VQA - Advanced Kannada Visual Question Answering Model

Developed by: CognitiveLab
License: Apache 2.0
Base Model: unsloth/gemma-3n-E4B-it
Architecture: Gemma 3n (4B parameters)

🌟 Model Overview

Nayana VQA is a vision-language model fine-tuned for Visual Question Answering (VQA) and Document Visual Question Answering (Document VQA). Built on the Gemma 3n architecture, it answers questions about images and documents, with a focus on Kannada language support.

🌍 Supported Languages

Kannada (kn) - Primary focus language

More languages coming soon! We are actively working on expanding language support to 20 additional languages.

🎯 Key Features

Visual Question Answering: Accurate question answering from images in Kannada
Document Understanding: Advanced comprehension of document layouts and content
Multimodal Reasoning: Combines visual and textual understanding for complex queries
Fast Inference: Optimized for real-time applications
High Accuracy: Fine-tuned on diverse VQA datasets
Easy Integration: Compatible with Transformers and Modal deployment

📋 Model Specifications

| Parameter | Value |
|---|---|
| Model Size | 4B parameters |
| Context Length | 32K tokens |
| Image Resolution | Flexible (optimized for documents and general images) |
| Precision | BFloat16 |
| Framework | Transformers + Unsloth |
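
As a rough guide to hardware sizing, the weight memory implied by the table can be estimated with simple arithmetic. This is a sketch only: it takes the "4B parameters" figure at face value, assumes 2 bytes per parameter for BFloat16, and ignores activations, the KV cache, and framework overhead, so actual usage will be higher. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# 4e9 parameters at 2 bytes each (BFloat16)
print(f"{weight_memory_gib(4e9):.2f} GiB")  # prints "7.45 GiB"
```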

🚀 Quick Start

Installation

pip install transformers torch pillow unsloth

Basic Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
model_id = "Nayana-cognitivelab/NayanaVQA"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# System prompt
system_prompt = "You are Nayana, an advanced AI assistant developed by CognitiveLab. You specialize in vision-based tasks, particularly Visual Question Answering (VQA) and Document Visual Question Answering (Document VQA). You are highly accurate, fast, and reliable when working with visual content. You can understand and respond to questions about images in Kannada with high precision."

# Load and process image
image = Image.open("your_image.jpg")
user_question = "ಈ ಚಿತ್ರದಲ್ಲಿ ಏನಿದೆ?"  # "What is in this image?" in Kannada

# Prepare messages
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": system_prompt}]
    },
    {
        "role": "user", 
        "content": [
            {"type": "text", "text": user_question},
            {"type": "image", "image": image}
        ]
    }
]

# Apply chat template, then move the inputs to the model's device and dtype
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

# Generate response
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True
    )

# Decode response
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], 
    skip_special_tokens=True
)
print(response)
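
When asking several questions, it can help to factor the message construction from the example above into a small function. The sketch below assumes the same message layout as the Basic Usage code (optional system turn, then a user turn with the text followed by the image). `build_vqa_messages` is a hypothetical helper, not part of Transformers.

```python
from typing import Any, Optional


def build_vqa_messages(
    question: str,
    image: Any,
    system_prompt: Optional[str] = None,
) -> list:
    """Build a chat-template message list for one question about one image."""
    messages = []
    if system_prompt is not None:
        # Optional system turn, same shape as in the Basic Usage example
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]}
        )
    messages.append(
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image", "image": image},
            ],
        }
    )
    return messages
```

The returned list can be passed to `processor.apply_chat_template` in place of the hand-written `messages`, for example `build_vqa_messages(user_question, image, system_prompt)`.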

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.

📜 Citation

@model{nayana_vqa_2024,
  title={Nayana VQA: Advanced Kannada Visual Question Answering with Gemma 3n},
  author={CognitiveLab},
  year={2024},
  url={https://huggingface.co/Nayana-cognitivelab/NayanaVQA}
}