Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis (4-bit)
Qwen3-VL-2B-Instruct Vietnamese Bill Diagnosis is fine-tuned to extract information from Vietnamese invoices, receipts, prescriptions, and medical diagnoses.
The model understands both images and text, which makes it suitable for advanced OCR, document understanding, and information extraction.
🔥 Highlights
- ✅ Optimized for Vietnamese
- ✅ Fine-tuned for bill / invoice / prescription / diagnosis extraction
- ✅ The 4-bit (bnb) version reduces VRAM usage during inference
- ✅ Can be converted to GGUF to run locally on CPU
- ✅ Compatible with transformers
📂 Repository Structure
/merged_16bit
Contains the bnb 4-bit weights for use with the transformers + bitsandbytes libraries.
/gguf
GGUF version for llama.cpp or other local inference engines. Includes:
- Qwen3-VL-2B-Instruct-Vietnamese.Q4_K_M.gguf: high-quality 4-bit quantized build
- Qwen3-VL-2B-Instruct-Vietnamese.mmproj.gguf: projector file for image processing
🚀 Usage
✅ With Transformers
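# Requires a transformers release with Qwen3-VL support, plus bitsandbytes for the 4-bit weights.
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "minhduc168/Qwen3-VL-2B-Instruct-Vietnamese"

# The Auto class picks the architecture from the checkpoint config;
# the Qwen2-VL class does not match a Qwen3-VL checkpoint.
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    device_map="auto",  # place the weights on the available device(s) automatically
)
# The processor handles both image preprocessing and text tokenization.
processor = AutoProcessor.from_pretrained(MODEL_ID)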
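A minimal inference sketch, assuming the model and processor loaded above work with the standard transformers chat-template API for vision-language models. The helper names `build_messages` and `run_inference`, the image path, and `max_new_tokens=512` are illustrative assumptions, not part of this repository. The prompt is copied from the BILL_OCR dataset example in this README; the `<image>` placeholder is left out because the chat template is expected to insert the image token itself.

```python
from typing import Any

# Prompt copied from the BILL_OCR dataset example in this README.
BILL_OCR_PROMPT = (
    "[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\n"
    "Không suy đoán hoặc bổ sung thông tin không có trong ảnh."
)


def build_messages(image_path: str, prompt: str) -> list[dict[str, Any]]:
    """Build a single-turn chat: one image followed by the text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_inference(model, processor, image_path: str,
                  prompt: str = BILL_OCR_PROMPT, max_new_tokens: int = 512) -> str:
    """Generate an answer for one image; expects an already loaded model and processor."""
    inputs = processor.apply_chat_template(
        build_messages(image_path, prompt),
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the newly generated answer is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

With a model and processor in scope, `run_inference(model, processor, "bill_001.jpg")` would return the raw response text, which can then be parsed as JSON for the BILL_OCR task.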
⚠️ Important note when using GGUF (Vision Model)
For Vision-Language models such as Qwen3-VL, the GGUF conversion is split across two files, and both are required:
1️⃣ Main model (.gguf)
2️⃣ Projector (mmproj.gguf)
👉 Without the projector file, the model cannot process images.
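The two-file requirement can be checked before launching an inference engine. The sketch below assumes both files sit in one local directory and that the projector is recognizable by `mmproj` in its file name, as in the file names listed in this README. The helper names are hypothetical. The `llama-server` invocation uses llama.cpp's `-m` and `--mmproj` options.

```python
from pathlib import Path


def find_gguf_pair(directory: str) -> tuple[Path, Path]:
    """Return (main_model, projector) GGUF paths, or raise if either one is missing."""
    files = sorted(Path(directory).glob("*.gguf"))
    # Assumption: the projector is the file whose name contains "mmproj".
    projectors = [f for f in files if "mmproj" in f.name.lower()]
    models = [f for f in files if "mmproj" not in f.name.lower()]
    if not models:
        raise FileNotFoundError(f"No main model .gguf found in {directory}")
    if not projectors:
        raise FileNotFoundError(
            f"No mmproj .gguf found in {directory}; image input would not work"
        )
    return models[0], projectors[0]


def llama_server_command(model: Path, projector: Path) -> list[str]:
    """Build a llama-server command line that loads the model together with its projector."""
    return ["llama-server", "-m", str(model), "--mmproj", str(projector)]
```

The returned list can be passed to `subprocess.run` once llama.cpp is installed and on the `PATH`.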
📊 Dataset
Includes:
- Retail invoices
- Receipts
- Prescriptions
- Vietnamese documents
- Medical diagnoses
The instruction-following format helps the model extract structured data more accurately.
Sample format
📌 Example 1 – BILL_OCR
{
  "id": "bill_f18b567055ed4c5d85143770499b2f65",
  "image": "bill_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[BILL_OCR] Trích xuất danh sách dịch vụ và thông tin liên quan từ bảng trong ảnh.\nKhông suy đoán hoặc bổ sung thông tin không có trong ảnh."
    },
    {
      "from": "gpt",
      "value": "[{\"stt\": 1, \"ten_dich_vu\": \"STTOMET 1000mg + 50mg\", \"so_luong\": \"56\", \"don_vi_tinh\": \"Viên\", \"don_gia\": \"5.390\", \"thanh_tien\": \"301.840\"}, {\"stt\": 2, \"ten_dich_vu\": \"JARDIANCE DUO 12.5mg + 1000mg\", \"so_luong\": \"56\", \"don_vi_tinh\": \"Viên\", \"don_gia\": \"17.976\", \"thanh_tien\": \"1.006.656\"}, {\"stt\": 3, \"ten_dich_vu\": \"Atovze* 20mg+10mg\", \"so_luong\": \"56\", \"don_vi_tinh\": \"Viên\", \"don_gia\": \"6.409\", \"thanh_tien\": \"358.921\"}, {\"stt\": 4, \"ten_dich_vu\": \"SADAPRON 300mg\", \"so_luong\": \"28\", \"don_vi_tinh\": \"Viên\", \"don_gia\": \"2.750\", \"thanh_tien\": \"77.000\"}]"
    }
  ]
}
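A BILL_OCR response in this shape can be parsed and sanity-checked after generation. The sketch below uses the first two rows of the example above and assumes that `.` is a thousands separator in the amount fields, which is the usual convention for Vietnamese dong amounts. The helper names `parse_vnd` and `check_rows` are illustrative.

```python
import json

# Response text shaped like the BILL_OCR example (first two rows only).
response = """[
  {"stt": 1, "ten_dich_vu": "STTOMET 1000mg + 50mg", "so_luong": "56",
   "don_vi_tinh": "Viên", "don_gia": "5.390", "thanh_tien": "301.840"},
  {"stt": 2, "ten_dich_vu": "JARDIANCE DUO 12.5mg + 1000mg", "so_luong": "56",
   "don_vi_tinh": "Viên", "don_gia": "17.976", "thanh_tien": "1.006.656"}
]"""


def parse_vnd(amount: str) -> int:
    """Convert an amount such as '1.006.656' to 1006656 (assumes '.' separates thousands)."""
    return int(amount.replace(".", ""))


def check_rows(rows: list[dict]) -> list[tuple[int, bool]]:
    """For each row, report whether so_luong * don_gia equals thanh_tien."""
    results = []
    for row in rows:
        expected = int(row["so_luong"]) * parse_vnd(row["don_gia"])
        results.append((row["stt"], expected == parse_vnd(row["thanh_tien"])))
    return results


rows = json.loads(response)
print(check_rows(rows))  # [(1, True), (2, True)]
```

A failed check only flags a row for review: the mismatch may come from the model's reading or from the printed bill itself.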
📌 Example 2 – DIAGNOSIS_EXTRACT
{
  "id": "diag_db2888ccc0de4c4b8e9d39c3e08c104a",
  "image": "diag_001.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "<image>\n[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này."
    },
    {
      "from": "gpt",
      "value": "Răng 28: [K07 4] Răng mọc lệch"
    }
  ]
}
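Records in this from/value layout usually need to be converted to role/content messages before they reach a chat template. The sketch below shows one way to do that and to read the task tag from the prompt. The role mapping (`human` to `user`, `gpt` to `assistant`), the tag pattern, and the helper names are assumptions for illustration, not code from this repository.

```python
import re
from typing import Optional

record = {
    "id": "diag_db2888ccc0de4c4b8e9d39c3e08c104a",
    "image": "diag_001.jpg",
    "conversations": [
        {"from": "human",
         "value": "<image>\n[DIAGNOSIS_EXTRACT] Hãy trích xuất và cho biết chẩn đoán trong hình này."},
        {"from": "gpt", "value": "Răng 28: [K07 4] Răng mọc lệch"},
    ],
}

ROLE_MAP = {"human": "user", "gpt": "assistant"}
TAG_PATTERN = re.compile(r"\[([A-Z_]+)\]")  # matches tags such as [BILL_OCR]


def task_tag(rec: dict) -> Optional[str]:
    """Return the task tag from the first human turn, or None if there is none."""
    first_human = next((t for t in rec["conversations"] if t["from"] == "human"), None)
    if first_human is None:
        return None
    match = TAG_PATTERN.search(first_human["value"])
    return match.group(1) if match else None


def to_chat_messages(rec: dict) -> list[dict]:
    """Map from/value turns to role/content turns, turning '<image>' into an image entry."""
    messages = []
    for turn in rec["conversations"]:
        text = turn["value"]
        content = []
        if "<image>" in text:
            content.append({"type": "image", "image": rec["image"]})
            text = text.replace("<image>", "").lstrip("\n")
        content.append({"type": "text", "text": text})
        messages.append({"role": ROLE_MAP[turn["from"]], "content": content})
    return messages


print(task_tag(record))  # DIAGNOSIS_EXTRACT
```

Grouping records by `task_tag` is a simple way to evaluate the BILL_OCR and DIAGNOSIS_EXTRACT tasks separately.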
🎯 Use Cases
- Automated invoice information extraction
- Structured OCR
- Vietnamese Document AI
- Medical / pharmacy bill parsing
- Fintech document processing
📌 Hardware recommendations
| Quantization | Recommended memory |
|---|---|
| 4-bit bnb | ~6–8 GB VRAM |
| GGUF Q4 | Runs on CPU (≥16 GB RAM recommended) |
License
Apache-2.0
Model tree for minhduc168/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit-Vietnamese-bill-diagnosis
Base model
Qwen/Qwen3-VL-2B-Instruct