---
language:
- vi
tags:
- vision-language
- qwen
- vlm
- gguf
- pytorch
license: apache-2.0
datasets:
- minhduc168/dataset-qwen-vlm-extract-bill-diagnosis
base_model:
- unsloth/Qwen3-VL-2B-Instruct-bnb-4bit
---

# Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis (4-bit)

The **Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis** model is fine-tuned to **extract information from Vietnamese invoices, receipts, prescriptions, and medical diagnoses**.  
It understands both images and text, which makes it suitable for advanced OCR, document understanding, and information extraction.

---

## 🔥 Highlights
- ✅ Optimized for **Vietnamese**
- ✅ Fine-tuned for **bill / invoice / prescription / diagnosis extraction**
- ✅ The **4-bit (bnb)** version reduces VRAM usage during inference
- ✅ Can be converted to **GGUF** to run locally on CPU
- ✅ Compatible with `transformers`

---

## 📂 Repository structure

- **`/merged_16bit`**  
  Contains the **bnb 4-bit** weights for use with the `transformers` + `bitsandbytes` libraries.

- **`/gguf`**  
  GGUF version for `llama.cpp` or other local inference engines.

  **Includes:**
  - `Qwen3-VL-2B-Instruct-Vietnamese.Q4_K_M.gguf`: high-quality 4-bit quantization  
  - `Qwen3-VL-2B-Instruct-Vietnamese.mmproj.gguf`: projector file for image processing

---

## 🚀 Usage

### ✅ With Transformers

```python
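# Qwen3-VL needs its own model class (the Qwen2-VL class targets a different architecture)
# and a transformers release that includes Qwen3-VL support.
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Repository ID as published on the Hugging Face Hub.
MODEL_ID = "minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis"

# device_map="auto" places the weights on the available GPU(s), falling back to CPU.
# If loading from the repository root fails because the weights are stored under
# `merged_16bit/`, pass subfolder="merged_16bit" as well.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    device_map="auto",
)

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained(MODEL_ID)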
```
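
Once the model and processor are loaded, a single extraction request could look like the sketch below. It assumes the chat-message layout accepted by `processor.apply_chat_template` for Qwen-VL models, and that the chat template inserts the image placeholder itself, so the `<image>` tag from the dataset format is omitted. `build_messages` and `extract` are hypothetical helper names, `bill_001.jpg` is a placeholder path, and the prompt is the `[BILL_OCR]` instruction from the dataset examples in this card, kept in Vietnamese because that is what the model was trained on.

```python
def build_messages(image_path: str, prompt: str) -> list[dict]:
    """Build a single-turn chat with one image followed by the text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def extract(model, processor, image_path: str, prompt: str, max_new_tokens: int = 1024) -> str:
    """Run one generation pass and return only the newly generated text."""
    inputs = processor.apply_chat_template(
        build_messages(image_path, prompt),
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the model's answer is decoded.
    answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(answer_ids, skip_special_tokens=True)[0]


# Prompt taken from the BILL_OCR dataset example.
BILL_OCR_PROMPT = (
    "[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\n"
    "Không suy đoán hoặc bổ sung thông tin không có trong ảnh."
)

# Example call (requires the loaded model, processor, and a real image file):
# print(extract(model, processor, "bill_001.jpg", BILL_OCR_PROMPT))
```
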
## ⚠️ Important note on using GGUF (vision model)

For vision-language models such as **Qwen3-VL**, keep the following in mind when converting to GGUF:

### Two files are required:
1️⃣ **Main model** (`.gguf`)  
2️⃣ **Projector** (`mmproj.gguf`)  

👉 Without the projector file, the model **cannot process images**.
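
A small pre-flight check can catch a missing projector before llama.cpp is launched. The sketch below assumes the `llama-mtmd-cli` binary from a recent llama.cpp build and its `-m`, `--mmproj`, `--image`, and `-p` options. `build_mtmd_command` is a hypothetical helper, and the file names in the commented example are the ones listed in the repository structure section.

```python
from pathlib import Path


def build_mtmd_command(model_path: str, mmproj_path: str, image_path: str, prompt: str) -> list[str]:
    """Return the argument list for a llama.cpp multimodal run.

    Raises FileNotFoundError if either GGUF file is missing.
    """
    for label, path in (("main model", model_path), ("projector", mmproj_path)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} GGUF not found: {path}")
    return [
        "llama-mtmd-cli",          # assumed binary name; adjust to your llama.cpp build
        "-m", model_path,          # language-model weights
        "--mmproj", mmproj_path,   # vision projector, required for image input
        "--image", image_path,
        "-p", prompt,
    ]


# Example (pass the result to subprocess.run once both files are downloaded):
# cmd = build_mtmd_command(
#     "gguf/Qwen3-VL-2B-Instruct-Vietnamese.Q4_K_M.gguf",
#     "gguf/Qwen3-VL-2B-Instruct-Vietnamese.mmproj.gguf",
#     "diag_001.jpg",
#     "[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này.",
# )
```
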

---

## 📊 Dataset

**Includes:**
- Retail invoices
- Receipts
- Prescriptions
- Vietnamese documents
- Medical diagnoses

The **instruction-following** format helps the model extract structured data more accurately.

### Sample format

📌 Example 1 – BILL_OCR

```json
{
  "id": "bill_f18b567055ed4c5d85143770499b2f65",
  "image": "bill_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\nKhông suy đoán hoặc bổ sung thông tin không có trong ảnh."
    },
    {
      "from": "gpt",
      "value": "[
        {
            "stt": 1,
            "ten_dich_vu": "STTOMET 1000mg + 50mg",
            "so_luong": "56",
            "don_vi_tinh": "Viên",
            "don_gia": "5.390",
            "thanh_tien": "301.840"
        },
        {
            "stt": 2,
            "ten_dich_vu": "JARDIANCE DUO 12.5mg + 1000mg",
            "so_luong": "56",
            "don_vi_tinh": "Viên",
            "don_gia": "17.976",
            "thanh_tien": "1.006.656"
        },
        {
            "stt": 3,
            "ten_dich_vu": "Atovze* 20mg+10mg",
            "so_luong": "56",
            "don_vi_tinh": "Viên",
            "don_gia": "6.409",
            "thanh_tien": "358.921"
        },
        {
            "stt": 4,
            "ten_dich_vu": "SADAPRON 300mg",
            "so_luong": "28",
            "don_vi_tinh": "Viên",
            "don_gia": "2.750",
            "thanh_tien": "77.000"
        }
    ]"
    }
  ]
}
```
📌 Example 2 – DIAGNOSIS_EXTRACT

```json
{
  "id": "diag_db2888ccc0de4c4b8e9d39c3e08c104a",
  "image": "diag_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này."
    },
    {
      "from": "gpt",
      "value": "Răng 28: [K07 4] Răng mọc lệch"
    }
  ]
}
```
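
Because the `[BILL_OCR]` answer is a JSON array serialized as text, downstream code typically has to parse it and normalize the number strings. The sketch below assumes that the dot is a thousands separator (as in `1.006.656`) and that every row carries the keys shown in Example 1. `parse_amount`, `parse_bill_rows`, and `inconsistent_rows` are hypothetical helpers, not part of the model or the dataset. A row flagged by the consistency check is only a candidate for review, since the source document itself may contain rounding or discounts.

```python
import json


def parse_amount(text: str) -> int:
    """Convert a Vietnamese-formatted amount such as '1.006.656' to an integer."""
    return int(text.replace(".", "").strip())


def parse_bill_rows(raw_answer: str) -> list[dict]:
    """Parse the model's JSON answer and add integer versions of the numeric fields."""
    rows = json.loads(raw_answer)
    for row in rows:
        row["so_luong_int"] = parse_amount(row["so_luong"])
        row["don_gia_int"] = parse_amount(row["don_gia"])
        row["thanh_tien_int"] = parse_amount(row["thanh_tien"])
    return rows


def inconsistent_rows(rows: list[dict]) -> list[int]:
    """Return the `stt` of every row where quantity x unit price differs from the line total."""
    return [
        row["stt"]
        for row in rows
        if row["so_luong_int"] * row["don_gia_int"] != row["thanh_tien_int"]
    ]


# Two rows copied from Example 1, serialized the way the model would emit them.
sample_answer = json.dumps(
    [
        {"stt": 1, "ten_dich_vu": "STTOMET 1000mg + 50mg", "so_luong": "56",
         "don_vi_tinh": "Viên", "don_gia": "5.390", "thanh_tien": "301.840"},
        {"stt": 4, "ten_dich_vu": "SADAPRON 300mg", "so_luong": "28",
         "don_vi_tinh": "Viên", "don_gia": "2.750", "thanh_tien": "77.000"},
    ],
    ensure_ascii=False,
)

rows = parse_bill_rows(sample_answer)
print(rows[0]["thanh_tien_int"])   # 301840
print(inconsistent_rows(rows))     # []
```
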
---

## 🎯 Use Cases

- Automated invoice information extraction  
- Structured OCR  
- Vietnamese Document AI  
- Medical / pharmacy bill parsing  
- Fintech document processing  

---

## 📌 Hardware recommendations

| Quantization | Recommended memory |
|--------------|--------------------|
| 4-bit bnb    | ~6–8 GB VRAM |
| GGUF Q4      | Runs on CPU (≥16 GB RAM recommended) |
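
As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the parameter count and the bit width. The sketch below assumes about 2 billion parameters (inferred from the "2B" in the model name) and ignores quantization metadata, activations, the KV cache, and image tokens, which is why the recommendations in the table are considerably higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


ASSUMED_PARAMS = 2e9  # assumption based on the "2B" in the model name

for label, bits in (("16-bit", 16), ("4-bit", 4)):
    print(f"{label}: ~{weight_memory_gib(ASSUMED_PARAMS, bits):.2f} GiB")
# 16-bit: ~3.73 GiB
# 4-bit: ~0.93 GiB
```
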

---

## License

Apache-2.0