---
language:
- vi
tags:
- vision-language
- qwen
- vlm
- gguf
- pytorch
license: apache-2.0
datasets:
- minhduc168/dataset-qwen-vlm-extract-bill-diagnosis
base_model:
- unsloth/Qwen3-VL-2B-Instruct-bnb-4bit
---
# Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis (4-bit)
The **Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis** model is fine-tuned to **extract information from Vietnamese invoices, receipts, prescriptions, and medical diagnoses**.
The model takes both images and text as input, which makes it suitable for advanced OCR, document understanding, and information extraction.
---
## 🔥 Highlights
- ✅ Optimized for **Vietnamese**
- ✅ Fine-tuned for **bill / invoice / prescription / diagnosis extraction**
- ✅ The **4-bit (bnb)** version reduces VRAM usage during inference
- ✅ Can be converted to **GGUF** to run locally on CPU
- ✅ Compatible with `transformers`
---
## 📂 Repository Structure
- **`/merged_16bit`**
  Contains the **bnb 4-bit** weights for use with the `transformers` + `bitsandbytes` libraries.
- **`/gguf`**
  GGUF version for `llama.cpp` or other local inference engines.
  **Includes:**
  - `Qwen3-VL-2B-Instruct-Vietnamese.Q4_K_M.gguf`: high-quality 4-bit quantized model
  - `Qwen3-VL-2B-Instruct-Vietnamese.mmproj.gguf`: projector file that handles image input
---
## 🚀 Usage
### ✅ With Transformers
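```python
# Qwen3-VL checkpoints load through the image-text-to-text auto class;
# Qwen2VLForConditionalGeneration targets the older Qwen2-VL architecture.
from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "minhduc168/Qwen3-VL-2B-Instruct-Vietnamese",
    device_map="auto",  # place weights on the available GPU(s) or CPU
)
processor = AutoProcessor.from_pretrained(
    "minhduc168/Qwen3-VL-2B-Instruct-Vietnamese"
)
```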
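Continuing from the loading snippet, a minimal inference sketch might look like the following. It assumes a recent `transformers` release whose processor accepts `apply_chat_template(..., tokenize=True, return_dict=True)` and resolves local image files given under the `path` key. The helper names `build_messages` and `run_inference`, the image file name, and `max_new_tokens=512` are illustrative choices, not part of this repository. The prompt reuses the `[BILL_OCR]` task tag from the dataset samples.

```python
def build_messages(image_path: str, instruction: str) -> list[dict]:
    """Build a single-turn chat: one image followed by the text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "path": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def run_inference(model, processor, image_path: str, instruction: str,
                  max_new_tokens: int = 512) -> str:
    """Generate a reply for one image; `model` and `processor` come from from_pretrained."""
    inputs = processor.apply_chat_template(
        build_messages(image_path, instruction),
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]


# Hypothetical file name; the prompt text is copied from the BILL_OCR dataset sample.
BILL_PROMPT = (
    "[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\n"
    "Không suy đoán hoặc bổ sung thông tin không có trong ảnh."
)
messages = build_messages("bill_001.jpg", BILL_PROMPT)
```

Calling `run_inference(model, processor, "bill_001.jpg", BILL_PROMPT)` would then return the decoded reply as a string.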
## ⚠️ Important note when using GGUF (Vision Model)
For Vision-Language models such as **Qwen3-VL**, after conversion to GGUF:
### Two files are required:
1️⃣ **Main model** (`.gguf`)
2️⃣ **Projector** (`mmproj.gguf`)
👉 Without the projector file, the model **cannot process images**.
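As a sketch of the two-file requirement, the helper below checks that a local folder contains both a main model file and a projector before it builds a `llama-mtmd-cli` command (the multimodal CLI shipped with recent llama.cpp builds). The function name `build_mtmd_command`, the folder layout, and the rule that the projector is the file with `mmproj` in its name are assumptions for illustration.

```python
from pathlib import Path


def build_mtmd_command(gguf_dir: str, image: str, prompt: str) -> list[str]:
    """Return an argv list for llama-mtmd-cli, or raise if a required file is missing."""
    files = sorted(Path(gguf_dir).glob("*.gguf"))
    # Assumption: the projector is the file with "mmproj" in its name.
    projectors = [f for f in files if "mmproj" in f.name.lower()]
    models = [f for f in files if "mmproj" not in f.name.lower()]
    if not models:
        raise FileNotFoundError(f"no main model .gguf found in {gguf_dir}")
    if not projectors:
        raise FileNotFoundError(
            f"no mmproj .gguf found in {gguf_dir}; image input will not work"
        )
    return [
        "llama-mtmd-cli",
        "-m", str(models[0]),
        "--mmproj", str(projectors[0]),
        "--image", image,
        "-p", prompt,
    ]
```

The returned list can be passed to `subprocess.run`. `llama-server` accepts the same `--mmproj` flag when the model is served over HTTP.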
---
## 📊 Dataset
**Includes:**
- Retail invoices
- Receipts
- Prescriptions
- Vietnamese-language documents
- Medical diagnoses
The **instruction-following** format helps the model extract structured data more accurately.
### Sample format
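📌 Example 1 – BILL_OCR (the assistant `value` is a JSON array serialized as a string, shown expanded here for readability)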
```json
{
  "id": "bill_f18b567055ed4c5d85143770499b2f65",
  "image": "bill_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\nKhông suy đoán hoặc bổ sung thông tin không có trong ảnh."
    },
    {
      "from": "gpt",
      "value": "[
        {
          "stt": 1,
          "ten_dich_vu": "STTOMET 1000mg + 50mg",
          "so_luong": "56",
          "don_vi_tinh": "Viên",
          "don_gia": "5.390",
          "thanh_tien": "301.840"
        },
        {
          "stt": 2,
          "ten_dich_vu": "JARDIANCE DUO 12.5mg + 1000mg",
          "so_luong": "56",
          "don_vi_tinh": "Viên",
          "don_gia": "17.976",
          "thanh_tien": "1.006.656"
        },
        {
          "stt": 3,
          "ten_dich_vu": "Atovze* 20mg+10mg",
          "so_luong": "56",
          "don_vi_tinh": "Viên",
          "don_gia": "6.409",
          "thanh_tien": "358.921"
        },
        {
          "stt": 4,
          "ten_dich_vu": "SADAPRON 300mg",
          "so_luong": "28",
          "don_vi_tinh": "Viên",
          "don_gia": "2.750",
          "thanh_tien": "77.000"
        }
      ]"
    }
  ]
}
```
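Because the `BILL_OCR` reply is structured, it can be sanity-checked after parsing. The sketch below assumes the reply is a JSON array in a string and that amounts are integers with `.` as the thousands separator, as in Example 1; decimal values are not handled. The helper names `parse_vn_number` and `find_mismatches` are hypothetical.

```python
import json


def parse_vn_number(text: str) -> int:
    """Parse an integer that uses '.' as the thousands separator, e.g. '1.006.656'."""
    return int(text.replace(".", "").strip())


def find_mismatches(rows: list[dict], tolerance: int = 0) -> list[int]:
    """Return the `stt` of rows where so_luong * don_gia differs from thanh_tien."""
    bad = []
    for row in rows:
        expected = parse_vn_number(row["so_luong"]) * parse_vn_number(row["don_gia"])
        if abs(expected - parse_vn_number(row["thanh_tien"])) > tolerance:
            bad.append(row["stt"])
    return bad


# Two rows taken from Example 1, reduced to the fields the check needs.
reply = (
    '[{"stt": 1, "so_luong": "56", "don_gia": "5.390", "thanh_tien": "301.840"},'
    ' {"stt": 3, "so_luong": "56", "don_gia": "6.409", "thanh_tien": "358.921"}]'
)
rows = json.loads(reply)
print(find_mismatches(rows))  # [3], since 56 * 6409 = 358904 rather than 358921
```

A flagged row is not necessarily an extraction error: the printed unit price on the original bill may itself be rounded, so a small `tolerance` can be more practical than an exact match.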
📌 Example 2 – DIAGNOSIS_EXTRACT
```json
{
  "id": "diag_db2888ccc0de4c4b8e9d39c3e08c104a",
  "image": "diag_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này."
    },
    {
      "from": "gpt",
      "value": "Răng 28: [K07 4] Răng mọc lệch"
    }
  ]
}
```
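Records in this format can be mapped to role-based chat messages before they are passed to a processor or a training script. The following is a minimal sketch: the function name `to_chat_messages`, the `human`/`gpt` to `user`/`assistant` mapping, and the choice to turn the `<image>` placeholder into a separate image part are assumptions, not the author's preprocessing code.

```python
def to_chat_messages(record: dict) -> list[dict]:
    """Map one dataset record to role-based chat messages with an explicit image part."""
    role_map = {"human": "user", "gpt": "assistant"}
    messages = []
    for turn in record["conversations"]:
        text = turn["value"]
        content = []
        if "<image>" in text:
            # Replace the placeholder with an image part that points at the record's file.
            content.append({"type": "image", "path": record["image"]})
            text = text.replace("<image>", "").lstrip("\n")
        content.append({"type": "text", "text": text})
        messages.append({"role": role_map[turn["from"]], "content": content})
    return messages


# The record from Example 2.
record = {
    "id": "diag_db2888ccc0de4c4b8e9d39c3e08c104a",
    "image": "diag_001.jpg",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\n[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này.",
        },
        {"from": "gpt", "value": "Răng 28: [K07 4] Răng mọc lệch"},
    ],
}
messages = to_chat_messages(record)
```

The `image` field is treated here as a path relative to wherever the dataset images are stored, so it may need a directory prefix in practice.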
---
## 🎯 Use Cases
- Automated invoice information extraction
- Structured OCR
- Vietnamese Document AI
- Medical / pharmacy bill parsing
- Fintech document processing
---
## 📌 Hardware recommendations
| Quantization | Recommended memory |
|------------|------------|
| 4-bit bnb | ~6–8 GB VRAM |
| GGUF Q4 | Runs on CPU (≥16 GB RAM recommended) |
---
## License
Apache-2.0