Instructions for using dheeyantra/dhee-nxtgen-qwen3-kannada-v2 with libraries and local apps.

Libraries

How to use dheeyantra/dhee-nxtgen-qwen3-kannada-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dheeyantra/dhee-nxtgen-qwen3-kannada-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dheeyantra/dhee-nxtgen-qwen3-kannada-v2")
model = AutoModelForCausalLM.from_pretrained("dheeyantra/dhee-nxtgen-qwen3-kannada-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
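
`model.generate` returns the prompt tokens followed by the newly generated ones, which is why the snippet above slices `outputs[0]` at the prompt length before decoding. The sketch below shows the same slicing on plain Python lists; the helper name `strip_prompt` and the token ids are hypothetical.

```python
def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Return only the tokens generated after the prompt.

    For decoder-only models, generate() returns prompt + completion,
    so dropping the first prompt_len ids leaves just the completion.
    """
    if prompt_len > len(output_ids):
        raise ValueError("prompt_len exceeds output length")
    return output_ids[prompt_len:]


# Hypothetical token ids: 4 prompt tokens followed by 3 generated tokens.
prompt_ids = [101, 102, 103, 104]
output_ids = prompt_ids + [7, 8, 9]
print(strip_prompt(output_ids, len(prompt_ids)))  # [7, 8, 9]
```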

Local Apps

vLLM

How to use dheeyantra/dhee-nxtgen-qwen3-kannada-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dheeyantra/dhee-nxtgen-qwen3-kannada-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dheeyantra/dhee-nxtgen-qwen3-kannada-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
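
The same request can be sent from Python without extra dependencies. This sketch uses only the standard library (`urllib.request`, `json`); the helper names are hypothetical, and `send_chat_request` only works once the `vllm serve` process above is listening on `localhost:8000`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request(
    "http://localhost:8000",
    "dheeyantra/dhee-nxtgen-qwen3-kannada-v2",
    [{"role": "user", "content": "What is the capital of France?"}],
)
# response = send_chat_request(req)  # uncomment once `vllm serve` is running
```

Pointing `base_url` at `http://localhost:30000` targets the SGLang server instead, since both expose the same OpenAI-compatible route.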

Use Docker

docker model run hf.co/dheeyantra/dhee-nxtgen-qwen3-kannada-v2

SGLang

How to use dheeyantra/dhee-nxtgen-qwen3-kannada-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dheeyantra/dhee-nxtgen-qwen3-kannada-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dheeyantra/dhee-nxtgen-qwen3-kannada-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
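
Both vLLM and SGLang answer in the OpenAI chat-completions shape, with the generated text at `choices[0].message.content`. The hypothetical helper below pulls out that field; the `sample` dictionary is a hand-written, trimmed illustration of the response shape, not real output from this model.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    # content can be null (e.g. for pure tool-call replies); normalise to "".
    return message.get("content") or ""


# Hand-written example of the response shape (fields trimmed for brevity).
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
}
print(extract_reply(sample))  # Paris.
```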

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dheeyantra/dhee-nxtgen-qwen3-kannada-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dheeyantra/dhee-nxtgen-qwen3-kannada-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Dhee-NxtGen-Qwen3-Kannada-v2

Model Description

Dhee-NxtGen-Qwen3-Kannada-v2 is a large language model designed for advanced Kannada language understanding and generation.
It is based on the Qwen3 architecture and fine-tuned for assistant-style, function-calling, and reasoning-based conversational tasks.

Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd., this model is intended for building Kannada chatbots, reasoning systems, and task-based dialogue agents.

Key Features

Fluent, context-aware Kannada text generation
Optimized for assistant-style and reasoning conversations
Handles open-ended generation, summarization, and Q&A
Fully compatible with 🤗 Hugging Face Transformers
Supports vLLM for high-performance inference
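
For reasoning-style conversations, upstream Qwen3 models emit their intermediate reasoning inside `<think>...</think>` tags before the final answer. Assuming this fine-tune keeps that convention (the card does not say so explicitly), the hypothetical helper below separates the two so an application can show only the answer.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as in upstream Qwen3.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    If no <think> block is present, reasoning is empty and the whole
    text is treated as the answer.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>\nUser asks about the capital.\n</think>\n\nParis.")
print(answer)  # Paris.
```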

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dhee-nxtgen-qwen3-kannada-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Example prompt; the user turn asks, in Kannada, "Can you schedule an appointment for me?"
prompt = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
ನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
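
The example above writes the ChatML markup by hand. When the exact raw prompt matters (for logging, or for a raw-text completion endpoint), a small helper keeps the layout consistent. This sketch reproduces only the layout shown in the example prompt; the model's bundled chat template, applied through `tokenizer.apply_chat_template` as in the Transformers snippet, may add further content (for example Qwen3's reasoning-related markup), so prefer it where available. `build_chatml_prompt` is a hypothetical name.

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    r"""Render chat messages in the ChatML layout used by the example prompt.

    Each turn becomes "<|im_start|>{role}\n{content}<|im_end|>\n"; an open
    assistant turn is appended so generation continues as the assistant.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # Kannada: "Can you schedule an appointment for me?"
    {"role": "user", "content": "ನೀವು ನನಗಾಗಿ ಒಂದು ಅಪಾಯಿಂಟ್ಮೆಂಟ್ ನಿಗದಿಪಡಿಸಬಹುದೇ?"},
]
print(build_chatml_prompt(messages))
```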

Intended Uses & Limitations

Intended Uses

Kannada conversational chatbots and assistants
Function-calling and structured response generation
Story generation and summarization in Kannada
Natural dialogue systems for Indic AI applications
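
Function-calling is listed as an intended use, but the card does not document the output format. Upstream Qwen3 chat templates wrap each call in `<tool_call>...</tool_call>` tags containing a JSON object with `name` and `arguments`; the sketch below assumes this fine-tune keeps that format. `parse_tool_calls` and the `schedule_appointment` tool are hypothetical.

```python
import json
import re

# Assumption: tool calls follow the upstream Qwen3 convention of a JSON
# object with "name" and "arguments" keys wrapped in <tool_call> tags.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for raw in _TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole reply
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Hypothetical model output calling a hypothetical appointment tool.
reply = (
    "<tool_call>\n"
    '{"name": "schedule_appointment", "arguments": {"date": "2025-01-15", "time": "10:00"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(reply))
```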

Limitations

May generate inaccurate or biased responses in rare cases
Performance can vary on out-of-domain or code-mixed inputs
Primarily optimized for Kannada; other languages may produce less fluent results

vLLM / High-Performance Serving Requirements

For high-throughput serving with vLLM, ensure the following environment:

GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100)
PyTorch 2.1+ and CUDA toolkit installed
For V100 GPUs (sm70), vLLM GPU inference is not supported; CPU fallback is possible but slower.
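
The compute-capability requirement above can be checked before launching vLLM. In this sketch the 8.0 threshold is taken from this card, `torch.cuda.get_device_capability` is PyTorch's call for reading the (major, minor) pair, and the helper names are hypothetical.

```python
from typing import Optional

# Threshold taken from this card's stated requirement (compute capability >= 8.0).
MIN_CAPABILITY = (8, 0)


def meets_vllm_gpu_requirement(capability: tuple[int, int], minimum: tuple[int, int] = MIN_CAPABILITY) -> bool:
    """Compare a CUDA compute capability (major, minor) with the minimum.

    Tuples compare element by element, so (8, 6) >= (8, 0) and (7, 0) < (8, 0).
    """
    return tuple(capability) >= tuple(minimum)


def local_gpu_capability() -> Optional[tuple[int, int]]:
    """Return (major, minor) of CUDA device 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


cap = local_gpu_capability()
if cap is None:
    print("No CUDA GPU detected.")
else:
    print(f"Compute capability {cap[0]}.{cap[1]}: meets requirement = {meets_vllm_gpu_requirement(cap)}")
```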

Install dependencies:

pip install torch transformers vllm sentencepiece

Run vLLM server:

vllm serve dheeyantra/dhee-nxtgen-qwen3-kannada-v2 --host 0.0.0.0 --port 8000
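
Loading the model can take a while after the server starts, so client scripts often wait for it before sending requests. This sketch polls the OpenAI-compatible `GET /v1/models` route with the standard library; the function name, timeout, and polling interval are assumptions, not vLLM settings.

```python
import time
import urllib.request


def wait_for_server(base_url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout` seconds pass."""
    url = f"{base_url.rstrip('/')}/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (run after starting `vllm serve` in another terminal):
# if wait_for_server("http://localhost:8000"):
#     print("vLLM server is ready")
```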

License

Released under the Apache 2.0 License.

Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd.


Model format: Safetensors

Model size: 2B params

Tensor type: F16
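
As a rough worked example, the weights alone need parameter count × bytes per parameter. With the rounded figure of 2B parameters stored as F16 (2 bytes each), that is about 4 × 10⁹ bytes, or roughly 3.7 GiB, before counting the KV cache, activations, and framework overhead. The sketch assumes the rounded 2B count; the exact parameter count may differ.

```python
# Bytes needed to store one parameter in each storage type.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2, "INT8": 1}


def weight_memory_gib(num_params: float, dtype: str = "F16") -> float:
    """Approximate memory for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumes the rounded "2B params" figure from this card.
print(f"F16 weights: {weight_memory_gib(2e9, 'F16'):.2f} GiB")  # F16 weights: 3.73 GiB
print(f"F32 weights: {weight_memory_gib(2e9, 'F32'):.2f} GiB")  # F32 weights: 7.45 GiB
```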

Collection including dheeyantra/dhee-nxtgen-qwen3-kannada-v2: Dhee-NxtGen-Qwen3-2B-v2

Dhee-NxtGen-Qwen3-1.7B-v2 is a multilingual LLM series by DheeYantra and NxtGen Cloud Technologies, based on Qwen3-1.7B and built for Indian languages (14 items, updated Mar 2).