Instructions to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis",
	filename="gguf/qwen3-vl-2b-instruct.BF16-mmproj.gguf",
)

llm.create_chat_completion(
	# messages must be a list of role/content dicts, not a bare string
	messages=[
		{"role": "user", "content": "No input example has been defined for this model task."}
	]
)

Local Apps

llama.cpp

How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
# Run inference directly in the terminal:
llama cli -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
# Run inference directly in the terminal:
llama cli -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
# Run inference directly in the terminal:
./llama-cli -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Use Docker

docker model run hf.co/minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
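
Once `llama serve` or `llama-server` from the steps above is running, the OpenAI-compatible endpoint can be called from a script. The following is a minimal sketch using only the Python standard library. It assumes the server listens on the default port 8080 and was started with its vision projector loaded. The helper names `build_payload` and `ask` are illustrative, not part of any library.

```python
import base64
import json
import urllib.request

# Assumption: llama-server is listening on its default port, 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16"


def build_payload(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat request carrying one image and one text prompt."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    # The image travels inline as a base64 data URI.
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "temperature": 0,
    }


def ask(image_path: str, prompt: str) -> str:
    """POST the request to the local server and return the assistant's text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A call might look like `ask("bill_001.jpg", "[BILL_OCR] ...")`, with the prompt text taken from the dataset examples further down this page.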

Ollama
How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with Ollama:
```
ollama run hf.co/minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
```

Unsloth Studio

How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis to start chatting

Pi

How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Run Hermes

hermes

Docker Model Runner
How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with Docker Model Runner:
```
docker model run hf.co/minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16
```

Lemonade

How to use minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis:BF16

Run and chat with the model

lemonade run user.Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis-BF16

List all available models

lemonade list

Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis (4-bit)

The Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis model is fine-tuned to extract information from Vietnamese invoices, receipts, prescriptions, and medical diagnoses.
The model understands both images and text, which makes it suitable for advanced OCR, document understanding, and information extraction tasks.

🔥 Highlights

✅ Optimized for Vietnamese
✅ Fine-tuned for bill / invoice / prescription / diagnosis extraction
✅ 4-bit (bnb) version reduces VRAM usage at inference time
✅ Can be converted to GGUF to run locally on CPU
✅ Compatible with transformers

📂 Repository Structure

/merged_16bit
Contains the bnb 4-bit weights for use with the transformers + bitsandbytes libraries.
/gguf
GGUF version for llama.cpp or other local inference engines.

Includes:
- Qwen3-VL-2B-Instruct-Vietnamese.Q4_K_M.gguf: high-quality 4-bit quantization
- Qwen3-VL-2B-Instruct-Vietnamese.mmproj.gguf: projector file for image processing

🚀 Usage

✅ With Transformers

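# Qwen3-VL checkpoints need the Qwen3-VL model class (available in recent
# transformers releases), not the Qwen2-VL one.
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "minhduc168/Qwen3-VL-2B-Instruct-Vietnamese",
    device_map="auto"  # place the weights on the available GPU(s) or CPU automatically
)

# The processor handles both image preprocessing and tokenization.
processor = AutoProcessor.from_pretrained(
    "minhduc168/Qwen3-VL-2B-Instruct-Vietnamese"
)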
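
With the model and processor loaded, a request is a chat turn that pairs an image with a task-tagged instruction such as `[BILL_OCR]` or `[DIAGNOSIS_EXTRACT]`, the tags that appear in the dataset examples on this page. The sketch below only builds that message structure. `build_messages` is a hypothetical helper, and the content-list layout is assumed to match what the Qwen-VL processors accept.

```python
def build_messages(image_path: str, task: str, instruction: str) -> list:
    """Return a one-turn chat pairing an image with a task-tagged instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                # The task tag mirrors the "[BILL_OCR] ..." prompts in the dataset examples.
                {"type": "text", "text": f"[{task}] {instruction}"},
            ],
        }
    ]


messages = build_messages(
    "bill_001.jpg",
    "BILL_OCR",
    "Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.",
)
# Assumption, not run here: the list is then passed to
# processor.apply_chat_template(messages, tokenize=True, add_generation_prompt=True,
#                               return_dict=True, return_tensors="pt")
# and the resulting tensors to model.generate(...).
```

Keeping the prompt wording close to the training prompts is likely to matter for a fine-tune of this size, so the instruction text is left in Vietnamese.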

⚠️ Important note when using GGUF (Vision Model)

For Vision-Language models such as Qwen3-VL, converting to GGUF produces two parts.

Two files are required:

1️⃣ Main model (.gguf)
2️⃣ Projector (mmproj.gguf)

👉 Without the projector file, the model cannot process images.

📊 Dataset

Includes:

Retail invoices
Receipts
Prescriptions
Vietnamese documents
Medical diagnoses

The instruction-following format helps the model extract structured data more accurately.

Sample format

📌 Example 1 – BILL_OCR

{
  "id": "bill_f18b567055ed4c5d85143770499b2f65",
  "image": "bill_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\nKhông suy đoán hoặc bổ sung thông tin không có trong ảnh."
    },
    {
      "from": "gpt",
      "value": "[
        {
            "stt": 1,
            "ten_dich_vu": "STTOMET 1000mg + 50mg",
            "so_luong": "56",
            "don_vi_tinh": "Viên",
            "don_gia": "5.390",
            "thanh_tien": "301.840"
        },
        {
            "stt": 2,
            "ten_dich_vu": "JARDIANCE DUO 12.5mg + 1000mg",
            "so_luong": "56",
            "don_vi_tinh": "Viên",
            "don_gia": "17.976",
            "thanh_tien": "1.006.656"
        },
        {
            "stt": 3,
            "ten_dich_vu": "Atovze* 20mg+10mg",
            "so_luong": "56",
            "don_vi_tinh": "Viên",
            "don_gia": "6.409",
            "thanh_tien": "358.921"
        },
        {
            "stt": 4,
            "ten_dich_vu": "SADAPRON 300mg",
            "so_luong": "28",
            "don_vi_tinh": "Viên",
            "don_gia": "2.750",
            "thanh_tien": "77.000"
        }
    ]"
    }
  ]
}
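
The answer in Example 1 is a JSON array whose amounts use Vietnamese number formatting, where "." separates thousands. Below is a minimal post-processing sketch, assuming the model returns valid JSON with the same keys and that amounts carry no decimal part. It normalizes the numbers and flags rows where quantity times unit price does not equal the line total. The function names are illustrative.

```python
import json


def parse_vn_int(text: str) -> int:
    """Convert a Vietnamese-formatted integer such as "1.006.656" to 1006656."""
    # Assumption: "." is only a thousands separator and amounts have no decimals.
    return int(text.replace(".", "").strip())


def check_rows(answer: str) -> list:
    """Parse the model's JSON answer; return the stt of rows where qty * price != total."""
    rows = json.loads(answer)
    mismatches = []
    for row in rows:
        expected = parse_vn_int(row["so_luong"]) * parse_vn_int(row["don_gia"])
        if expected != parse_vn_int(row["thanh_tien"]):
            mismatches.append(row["stt"])
    return mismatches


# Two rows copied from Example 1, serialized the way they would be stored in a JSON file.
rows = [
    {"stt": 1, "ten_dich_vu": "STTOMET 1000mg + 50mg", "so_luong": "56",
     "don_vi_tinh": "Viên", "don_gia": "5.390", "thanh_tien": "301.840"},
    {"stt": 3, "ten_dich_vu": "Atovze* 20mg+10mg", "so_luong": "56",
     "don_vi_tinh": "Viên", "don_gia": "6.409", "thanh_tien": "358.921"},
]
answer = json.dumps(rows, ensure_ascii=False)
print(check_rows(answer))  # [3]
```

Row 3 is flagged because 56 × 6409 = 358904, not 358921. A mismatch does not necessarily mean the extraction is wrong, since the printed unit price may itself be rounded on the bill, so a flag like this is best treated as a cue for manual review.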

📌 Example 2 – DIAGNOSIS_EXTRACT

{
  "id": "diag_db2888ccc0de4c4b8e9d39c3e08c104a",
  "image": "diag_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này."
    },
    {
      "from": "gpt",
      "value": "Răng 28: [K07 4] Răng mọc lệch"
    }
  ]
}
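
The diagnosis answer in Example 2 is a single line with a location, a bracketed code, and a description. The sketch below splits such a line into fields. It assumes answers follow the `<site>: [<code>] <description>` shape of that one example, which the source does not guarantee, so unmatched answers return `None` rather than raising.

```python
import re

# Assumption: answers look like "<site>: [<code>] <description>", as in Example 2.
# The "<site>: " prefix is treated as optional.
DIAGNOSIS_RE = re.compile(
    r"^\s*(?:(?P<site>[^:\[]+):\s*)?\[(?P<code>[^\]]+)\]\s*(?P<description>.+?)\s*$"
)


def parse_diagnosis(answer: str):
    """Split a diagnosis line into site, code and description; None if it does not match."""
    m = DIAGNOSIS_RE.match(answer)
    if m is None:
        return None
    return {key: (value.strip() if value else value) for key, value in m.groupdict().items()}


print(parse_diagnosis("Răng 28: [K07 4] Răng mọc lệch"))
```

For the example answer this yields the site "Răng 28", the code "K07 4", and the description "Răng mọc lệch". The code is kept exactly as the model produced it, with no attempt to normalize its spacing or punctuation.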

🎯 Use Cases

Automated invoice information extraction
Structured OCR
Vietnamese Document AI
Medical / pharmacy bill parsing
Fintech document processing

📌 Hardware recommendations

Quantization	Recommended VRAM / RAM
4-bit bnb	~6–8GB VRAM
GGUF Q4	Runs on CPU (≥16GB RAM recommended)

License

Apache-2.0

Downloads last month: 3

Format: GGUF
Model size: 2B params
Architecture: qwen3vl
Hardware compatibility: 4-bit


Model tree for minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis

Base model: Qwen/Qwen3-VL-2B-Instruct
Quantized: unsloth/Qwen3-VL-2B-Instruct-bnb-4bit
Quantized (3): this model