---
license: apache-2.0
language:
- ko
- en
- zh
library_name: transformers
base_model: Qwen/Qwen3-VL-2B-Instruct
tags:
- document-parsing
- document-intelligence
- ocr
- vlm
- vision-language-model
- lora
- distillation
- korean
- qwen3-vl
- structure-preserving
- rag
- government-document
- table-extraction
pipeline_tag: image-text-to-text
datasets:
- Wigtn/KoGovDoc-Bench
metrics:
- teds
- hit@1
model-index:
- name: WigtnOCR-2B
  results:
  - task:
      type: document-parsing
    dataset:
      name: OmniDocBench
      type: opendatalab/OmniDocBench
    metrics:
    - name: Text NED
      type: ned
      value: 0.288
    - name: Table TEDS
      type: teds
      value: 0.649
    - name: Table TEDS-S
      type: teds
      value: 0.732
    - name: Formula CDM F1
      type: f1
      value: 0.884
    - name: Reading Order NED
      type: ned
      value: 0.211
  - task:
      type: document-parsing
    dataset:
      name: KoGovDoc-Bench
      type: Wigtn/KoGovDoc-Bench
    metrics:
    - name: NED
      type: ned
      value: 0.285
    - name: Hit@1
      type: accuracy
      value: 0.739
    - name: MRR@10
      type: mrr
      value: 0.788
---
WigtnOCR-2B: Pseudo-Label Distillation for Structure-Preserving Document Parsing
Built by WIGTN Crew
A 2B VLM distilled from a 30B teacher that matches the teacher's document parsing quality and achieves #1 retrieval among 6 parsers on Korean government documents.
⭐️ Base Model: Qwen3-VL-2B-Instruct
⭐️ Dataset: huggingface.co/datasets/Wigtn/KoGovDoc-Bench
⭐️ GitHub: github.com/Hyeongseob91/research-vlm-based-document-parsing
Key Features
- 30B → 2B Distillation: Matches or exceeds the 30B teacher in 4/5 OmniDocBench categories via quality-filtered pseudo-labeling
- Table TEDS +12.6pp: Surpasses the teacher on table structure recognition through selective training on high-quality GT
- #1 Retrieval: Best Hit@1 (0.739) and MRR@10 (0.788) among 6 parsers, evidence that structured parsing improves RAG retrieval
- Korean Government Documents: Optimized for complex Korean government layouts (tables, forms, multi-column)
- Production-Ready: Single-GPU serving via vLLM, 2B parameters, fast inference
Highlights
| Category | Metric | WigtnOCR-2B | vs 30B Teacher | vs PaddleOCR |
|---|---|---|---|---|
| Parsing | Text NED ↓ | 0.288 | -0.001 (matches) | n/a |
| Tables | Table TEDS ↑ | 0.649 | +12.6pp | n/a |
| Retrieval | Hit@1 ↑ | 0.739 | +2.3pp | +22.7pp |
| Retrieval | MRR@10 ↑ | 0.788 | +1.7pp | +19.6pp |
| Reliability | Skip Rate ↓ | 5.8% | -13.0pp from base | n/a |
Quick Start
Transformers (Direct Inference)
```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Wigtn/Qwen3-VL-2B-WigtnOCR"

# The checkpoint is Qwen3-VL based, so let the Auto class pick the matching
# architecture instead of hard-coding the Qwen2.5-VL class.
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open("document_page.png")
messages = [
    {"role": "system", "content": "You are WigtnOCR, a document parser. Convert the document image to well-structured Markdown."},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Convert this document page to Markdown. Preserve all headings, tables, formulas, and reading order."},
    ]},
]

# Render the chat prompt as text, then tokenize it together with the page image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
# Drop the prompt tokens so only the generated Markdown is decoded.
output = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output)
```
vLLM (Production Serving)
```bash
vllm serve Wigtn/Qwen3-VL-2B-WigtnOCR \
  --max-model-len 16384 \
  --trust-remote-code
```
```python
import base64

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key is not checked by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Send the page image inline as a base64 data URL.
with open("document_page.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Wigtn/Qwen3-VL-2B-WigtnOCR",
    messages=[
        {"role": "system", "content": "You are WigtnOCR, a document parser. Convert the document image to well-structured Markdown."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            {"type": "text", "text": "Convert this document page to Markdown."},
        ]},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```
OmniDocBench Results
Evaluated on OmniDocBench (CVPR 2025): 1,355 pages across 9 document types.
| Metric | Qwen3-VL-2B | WigtnOCR-2B | Qwen3-VL-30B | Marker | Direction |
|---|---|---|---|---|---|
| Text NED | 0.364 | 0.288 | 0.289 | 0.218 | lower=better |
| Table TEDS | 0.561 | 0.649 | 0.523 | 0.586 | higher=better |
| Table TEDS-S | 0.667 | 0.732 | 0.657 | 0.658 | higher=better |
| Formula CDM F1 | 0.865 | 0.884 | 0.939 | 0.863 | higher=better |
| Formula ExpRate | 0.504 | 0.600 | 0.692 | 0.582 | higher=better |
| Reading Order NED | 0.300 | 0.211 | 0.227 | 0.165 | lower=better |
| Skip Rate | 18.8% | 5.8% | 5.5% | 0.4% | lower=better |
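The student matches or exceeds the 30B teacher in 4 of 5 metric categories (Text NED, Table TEDS, Table TEDS-S, Reading Order NED) and trails it only on formula recognition. Table TEDS surpasses the teacher by +12.6pp, suggesting that quality-filtered distillation produces a stronger training signal than the teacher's average output.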
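The teacher comparison can be recomputed directly from the table. The sketch below is illustrative only: it assumes the five "categories" are the five metrics listed in the model-index metadata (Text NED, Table TEDS, Table TEDS-S, Formula CDM F1, Reading Order NED), and the helper names are hypothetical.

```python
# Scores copied from the OmniDocBench table: (student, teacher, higher_is_better).
# Assumption: these five metrics are the "categories" the summary refers to.
SCORES = {
    "Text NED": (0.288, 0.289, False),
    "Table TEDS": (0.649, 0.523, True),
    "Table TEDS-S": (0.732, 0.657, True),
    "Formula CDM F1": (0.884, 0.939, True),
    "Reading Order NED": (0.211, 0.227, False),
}


def student_gain(student, teacher, higher_is_better):
    """Signed improvement of the student over the teacher (positive = student better)."""
    return (student - teacher) if higher_is_better else (teacher - student)


# Gains in percentage points, rounded to one decimal place.
gains_pp = {name: round(student_gain(*row) * 100, 1) for name, row in SCORES.items()}
matched = sum(1 for g in gains_pp.values() if g >= 0)

for name, gain in gains_pp.items():
    print(f"{name}: {gain:+.1f}pp")
print(f"student matches or exceeds teacher in {matched} of {len(SCORES)} metrics")
```

For the lower-is-better NED metrics the sign is flipped, so a positive value always means the student is ahead.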
KoGovDoc Retrieval Results
Semantic chunking (BGE-M3) → FAISS retrieval on KoGovDoc-Bench: 294 validation pages, 564 queries, 6 parsers compared.
| Model | Type | Hit@1 ↑ | Hit@5 ↑ | MRR@10 ↑ | nDCG@10 ↑ |
|---|---|---|---|---|---|
| WigtnOCR-2B | VLM (ours) | 0.739 | 0.855 | 0.788 | 0.437 |
| Qwen3-VL-30B | VLM (teacher) | 0.716 | 0.839 | 0.771 | 0.411 |
| Marker | PDF parser | 0.711 | 0.853 | 0.771 | 0.412 |
| Qwen3-VL-2B | VLM (base) | 0.709 | 0.814 | 0.756 | 0.444 |
| MinerU | PDF parser | 0.608 | 0.789 | 0.682 | 0.384 |
| PaddleOCR | Pure OCR | 0.512 | 0.693 | 0.592 | 0.293 |
WigtnOCR-2B ranks #1 in Hit@1, Hit@5, and MRR@10 (the base Qwen3-VL-2B scores highest on nDCG@10), which indicates that structured VLM parsing directly improves RAG retrieval over traditional OCR pipelines.
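For readers unfamiliar with the retrieval metrics, here is a minimal sketch of Hit@k and MRR@k. It assumes each query has exactly one relevant chunk, which may not match the benchmark's actual labeling; the function names and toy data are hypothetical and this is not the evaluation code used for the table.

```python
def hit_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank_at_k(ranked_ids, relevant_id, k):
    """1/rank of the relevant chunk within the top-k results, else 0.0."""
    for rank, chunk_id in enumerate(ranked_ids[:k], start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k_hit=1, k_mrr=10):
    """runs: list of (ranked_ids, relevant_id) pairs, one per query."""
    n = len(runs)
    hit = sum(hit_at_k(ranked, rel, k_hit) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank_at_k(ranked, rel, k_mrr) for ranked, rel in runs) / n
    return hit, mrr


# Toy queries (hypothetical chunk ids): relevant chunk at rank 1, 2, missing, 3.
runs = [
    (["c1", "c2", "c3"], "c1"),
    (["c4", "c5", "c6"], "c5"),
    (["c7", "c8", "c9"], "c0"),
    (["c3", "c2", "c1"], "c1"),
]
hit1, mrr10 = evaluate(runs)
print(f"Hit@1={hit1:.3f}  MRR@10={mrr10:.3f}")
```

Hit@1 only rewards a correct top result, while MRR@10 gives partial credit for relevant chunks further down the list, which is why the two columns can rank parsers slightly differently.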
BC vs. Retrieval: An Interesting Finding
Chunk quality (BC/CS, MoC framework) does not predict retrieval performance.
| Model | BC ↑ | CS ↓ | Hit@1 ↑ |
|---|---|---|---|
| MinerU | 0.735 | 2.711 | 0.608 (5th) |
| WigtnOCR-2B | 0.706 | 2.859 | 0.739 (1st) |
| PaddleOCR | 0.654 | 3.420 | 0.512 (6th) |
MinerU produces the cleanest chunk boundaries but ranks 5th in retrieval. Text richness and structural fidelity matter more than boundary quality for end-to-end RAG.
KoGovDoc Parsing Quality
| Model | NED ↓ | Evaluated |
|---|---|---|
| WigtnOCR-2B | 0.285 | 289/294 |
| Qwen3-VL-30B (Teacher) | 0.334 | 294/294 |
| Qwen3-VL-2B (Base) | 0.390 | 294/294 |
WigtnOCR-2B surpasses its 30B teacher on Korean government documents.
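NED here is a normalized edit distance between the predicted and reference text, where lower is better. The sketch below shows one common definition (Levenshtein distance divided by the longer string's length); the exact normalization and text preprocessing used by the benchmarks may differ, so treat it as an assumption.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, and substitute operations."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ned(pred, ref):
    """Normalized edit distance in [0, 1]; 0.0 means identical strings."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))


print(ned("| a | b |", "| a | c |"))
```

Because the distance is divided by the longer string, a page that the model skips entirely scores close to 1.0 against a non-empty reference.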
Ablation Study
| Config | LoRA r | Epochs | Text NED ↓ | Table TEDS ↑ | TEDS-S ↑ | CDM F1 ↑ | RO NED ↓ | Skip % ↓ | Verdict |
|---|---|---|---|---|---|---|---|---|---|
| v1 (final) | 8 | 3 | 0.288 | 0.649 | 0.732 | 0.884 | 0.211 | 5.8% | Best overall |
| v2-best | 32 | 3 | 0.309 | 0.600 | 0.697 | n/a | 0.215 | 0.7% | Table regression |
| v2-last | 32 | 5 | 0.306 | 0.610 | 0.695 | 0.892 | 0.214 | 0.0% | Overfitting on text |
Key findings:
- LoRA rank 8 outperforms rank 32: larger capacity leads to table structure regression (-4.9pp TEDS) despite marginally better formula recognition
- 3 epochs is optimal: 5 epochs causes overfitting (eval loss rises after epoch 3)
- v2 improves the skip rate to 0%, but at the cost of core parsing quality
- v1 was selected as the final model for its superior table and text quality, which matters most for downstream RAG
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen3-VL-2B-Instruct |
| Teacher | Qwen3-VL-30B-A3B-Instruct (FP8) |
| Judge | Qwen3.5-122B-A10B-NVFP4 (text-only, 5-dim scoring) |
| Method | LoRA (rank=8, alpha=32, target=all linear layers) |
| Training samples | 2,667 (filtered from 4,501 pages, score ≥ 3/5) |
| Validation samples | 294 (held out) |
| Training time | 31 minutes |
| Framework | ms-swift + DeepSpeed ZeRO-2 |
| Epochs | 3 |
| Learning rate | 1e-4 |
| Batch size | 1 (gradient accumulation 8) |
| Hardware | 2 × NVIDIA RTX PRO 6000 (98GB each) |
| Trainable params | 8.7M (0.4% of total) |
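A few derived quantities follow from the table. The sketch below is back-of-the-envelope arithmetic, not output from the training run: it assumes both GPUs act as data-parallel workers (typical for DeepSpeed ZeRO-2), that the standard LoRA scaling `alpha / r` is used, and the layer shape passed to `lora_params` is a hypothetical example.

```python
import math

# Values from the Training Details table.
per_device_batch = 1
grad_accum = 8
num_gpus = 2            # assumption: both GPUs are data-parallel workers
train_samples = 2667
epochs = 3
lora_rank, lora_alpha = 8, 32

# Samples consumed per optimizer step across all devices.
effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Standard LoRA scaling applied to the low-rank update (assumed, not confirmed).
lora_scale = lora_alpha / lora_rank


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one linear layer: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)


print(effective_batch, steps_per_epoch, total_steps, lora_scale)
print(lora_params(2048, 2048, lora_rank))  # hypothetical 2048x2048 projection
```

Under these assumptions the run takes roughly 500 optimizer steps, which is consistent in scale with the short training time reported above.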
Training Data
| Dataset | Documents | Pages | Language | Source |
|---|---|---|---|---|
| KoGovDoc | 10 | 3,637 | Korean | Government publications |
| ArXivPapers | 39 | 864 | English | arXiv (cs.CL, cs.CV, cs.LG) |
| Total | 49 | 4,501 | Bilingual | n/a |
GT generated by Qwen3-VL-30B, validated by Qwen3.5-122B with a 74–75% pass rate. Quality filtering removes hallucinations, repetitions, and chain-of-thought contamination.
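The filtering step can be pictured as a simple predicate over judged pseudo-labels. The sketch below is a guess at the shape of such a filter, not the project's pipeline: the record fields (`page_id`, `markdown`, `judge_score`), the repetition cutoff, and the chain-of-thought markers are all hypothetical; only the `score >= 3` threshold comes from the Training Details table.

```python
from collections import Counter

MIN_SCORE = 3                           # threshold from the Training Details table
MAX_LINE_REPEATS = 5                    # hypothetical cutoff for degenerate repetition
COT_MARKERS = ("<think>", "</think>")   # hypothetical chain-of-thought markers


def has_repetition(markdown, max_repeats=MAX_LINE_REPEATS):
    """True if any non-empty line occurs more than max_repeats times."""
    lines = [line.strip() for line in markdown.splitlines() if line.strip()]
    if not lines:
        return False
    return max(Counter(lines).values()) > max_repeats


def keep_sample(sample):
    """Keep a pseudo-label only if it passes the score, CoT, and repetition checks."""
    text = sample["markdown"]
    if sample["judge_score"] < MIN_SCORE:
        return False
    if any(marker in text for marker in COT_MARKERS):
        return False
    return not has_repetition(text)


# Toy records (hypothetical): one clean, one low score, one with CoT, one repetitive.
samples = [
    {"page_id": "a", "markdown": "# Title\n\nBody text", "judge_score": 4},
    {"page_id": "b", "markdown": "# Title\n\nBody text", "judge_score": 2},
    {"page_id": "c", "markdown": "<think>hmm</think>\n# Title", "judge_score": 5},
    {"page_id": "d", "markdown": "\n".join(["| a | b |"] * 10), "judge_score": 5},
]
kept = [s["page_id"] for s in samples if keep_sample(s)]
print(kept)
```

A line-count heuristic like this would also reject legitimate tables with many identical rows, so a real filter would likely need something more careful for repetition detection.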
Evaluation Stack
| Component | Tool | Purpose |
|---|---|---|
| Preprocessing | PyMuPDF | PDF → page images (200 DPI) |
| Chunking | BGE-M3 (semantic) | Embedding-based boundary detection |
| BC/CS Metrics | Qwen2.5-1.5B | Perplexity computation (MoC, ACL 2025) |
| Embedding | BAAI/bge-m3 | Chunk → vector |
| Retrieval | FAISS | Cosine similarity search |
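The retrieval row relies on cosine similarity, which reduces to an inner product once vectors are L2-normalized. The pure-Python sketch below stands in for the FAISS index to show the idea on toy 2-D vectors; it is not the evaluation code, and the chunk ids and vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, chunks, k=10):
    """Return the ids of the k chunks with the highest cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for chunk_id, vec in chunks.items():
        v = l2_normalize(vec)
        # Inner product of unit vectors equals cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, v)), chunk_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy embeddings (hypothetical); real BGE-M3 dense vectors are much higher-dimensional.
chunks = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [2.0, 2.0]}
ranked = top_k([1.0, 0.2], chunks, k=2)
print(ranked)
```

Normalizing first means vector length does not affect the ranking, so `c3` is scored by direction only even though it has the largest magnitude.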
Intended Use
- Korean government document digitization and parsing
- RAG pipeline preprocessing (PDF → structured Markdown → chunks → retrieval)
- Academic paper parsing (tables, formulas, reading order)
- Bilingual (Korean + English) document processing
Limitations
- Optimized for Korean and English; other languages may have reduced quality
- Formula recognition still trails 30B teacher (CDM F1: 0.884 vs 0.939)
- Best results at 200 DPI; lower resolution degrades quality
- Skip rate is 5.8%: some complex pages may fail (v2 achieves 0% but with quality trade-offs)
Example Output
Comparison on a complex Korean government document page (kogov_001 p.9: survey tables + statistical charts + mixed layout).
| | 30B Teacher | WigtnOCR-2B (Ours) |
|---|---|---|
| Charts | `[Figure: ...]` placeholder | Extracts data into tables |
| Content | 1,582 chars | 1,912 chars (+21%) |
| Tables | 3 tables | 4 tables (chart → table) |
PDF Original
30B Teacher Output (Qwen3-VL-30B): 1,582 chars
- ์ง์ญ ์ฃผ๋ฏผ ์๊ฒฌ ๋ฐ ์์
## [๊ตฐ๋ฏผ ์ค๋ฌธ์กฐ์ฌ] ๊ตฐ๋ฏผ 478๋ช
๋์ ์ค๋ฌธ์กฐ์ฌ๋ก ๋์๋ฌธ์ ๋์ถ
- ๊ตฐ๋ฏผ ๋์ ์ค๋ฌธ์กฐ์ฌ ์ฌํญ
| No. | ์ค๋ฌธ ํญ๋ชฉ |
|-----|-----------|
| Q1 | ์ฑ๋ณ / ์ฐ๋ น / ์ง์ญ / ๋ถํธ์ฌํญ |
| Q2 | ์์ / ํ๊ฒฝ / ์๋์ง / ๊ตํต / ์ฐ์
/ ํ์ / ๋ณต์ง / ๋ฌธํ / ๊ด๊ด / ๋์
/ ๊ต์ก |
| Q3 | ์ค๋งํธ๋์ ์์ / ์ง์ญ / ์๋น์ค / ๋ฆฌ๋น๋ฉ |
### - ๊ตฐ๋ฏผ ์ค๋ฌธ๊ฒฐ๊ณผ
[Figure: ๋ณด๋ค ์์ ํ ๋ถ์ฌ๋ฅผ ์ํด ๊ฐ์ ํด์ผ ํ ๋ฌธ์ ]
[Figure: ์ค๋งํธ๋์ ์ฐ์ ๋์
์๋น์ค]
์๋ฃ : ๋ถ์ฌ๊ตฐ ์ค๋งํธ๋์๊ณํ(2023)
## [๋์ด์
์ธ ๋ณต์ง์คํ์กฐ์ฌ] ์ํ์์ ๊ฐ์ ์ ์ํด ํ์ํ ์ฌํญ ์ค๋ฌธ๊ฒฐ๊ณผ
| ํน์ฑ | ๋๋ก์์ ์์ค | ๋ณดํ์๊ธธ ์ ๋น | ๊ฐ๋ก๋ฑ ํ์ถฉ | CCTV ์ค์น | ์ฃผ๋ฏผ ๋ฐฉ๋ฒ ์์ฐฐ | ๋
ธํ์์ค | ์์ฌ ๊ท๊ฐ ์๋น์ค | ๊ธฐํ |
|------|-------------|-------------|------------|----------|--------------|---------|----------------|------|
| ๋์ด์ด | 10.1 | 21.0 | 23.1 | 25.7 | 8.1 | 8.2 | 3.4 | 0.3 |
| ์ | 10.7 | 20.8 | 20.5 | 28.1 | 8.4 | 7.2 | 4.2 | 0.1 |
| ๋ฉด | 9.5 | 21.2 | 25.8 | 23.3 | 7.8 | 9.3 | 2.7 | 0.4 |
| ๋์ด๊ฐ | 8.7 | 22.3 | 23.2 | 23.1 | 7.9 | 12.1 | 2.5 | 0.2 |
| ๋น๋์ด๊ฐ | 10.6 | 20.5 | 23.1 | 26.6 | 8.2 | 6.9 | 3.7 | 0.3 |
| 30๋ ์ดํ | 14.6 | 16.5 | 27.6 | 25.2 | 6.4 | 5.8 | 3.6 | 0.2 |
| 40๋ | 6.3 | 20.1 | 19.6 | 33.1 | 10.9 | 4.6 | 5.1 | 0.2 |
| 50๋ | 10.8 | 19.4 | 23.0 | 27.2 | 6.8 | 8.4 | 4.1 | 0.3 |
| 60๋ | 10.5 | 22.9 | 22.8 | 23.4 | 7.2 | 10.2 | 2.6 | 0.4 |
| 70๋ ์ด์ | 9.9 | 23.5 | 24.0 | 21.1 | 8.7 | 10.4 | 2.2 | 0.2 |
์๋ฃ : ๋์ด์งํฅ์ฒญ 2023 ๋์ด์
์ธ๋ฑ์ ๋ํ ๋ณต์ง์คํ์กฐ์ฌ
| ๊ตฌ๋ถ | ๋์๋ฌธ์ | ์ฃผ๋ฏผ ์์ | ์์ ์ฃผ๋ฏผ |
|------|----------|-----------|-----------|
| ๋ณต์ง | ๋
๊ฑฐ๋
ธ์ธ ๋๋ด | - ๋ถ์ฌ๊ตฐ ๋ณด๊ฑด๋ณต์ง ๋ถ์ผ ๊ฐ์ ์ฌํญ์ผ๋ก ์ง์ | 70๋ ๋จ์ฑ |
| ๋ณต์ง | ๋
๊ฑฐ๋
ธ์ธ ๋๋ด | - ์ธ๊ณต์ง๋ฅ ๋๋ด์๋น์ค ์๋ฒ ์ฌ์ฉ ํฌ๋ง | 60๋ ์ฌ์ฑ |
| ๋ณต์ง | ์์ค๋
ธํํ | - ๋ถ์ฌ๊ตฐ ์๋ฉด ๊ฒฝ๋ก๋น ๋ด ์์ค ๋
ธํํ | 80๋ ๋จ์ฑ |
| ๋ณต์ง | ์ฌ๊ฐ ์ฝํ
์ธ ๋ถ์กฑ | - ๊ฒฝ๋ก๋น ๋ด ์ฌ๊ฐ ์ฝํ
์ธ ๋ถ์กฑ | 60๋ ์ฌ์ฑ |
| ์์ | ์์ ์ธํ๋ผ ๋ถ์กฑ | - ๋ถ์ฌ์์ฅ ๊ทผ๊ต ๋
ธ์ธ ๋ณดํ์ ๊ตํต์ฌ๊ณ ์ํ ๋์ | 60๋ ๋จ์ฑ |
| ๊ด๊ด | ๊ด๊ด ์ฝํ
์ธ ๋ถ์กฑ | - ์๋
๋์ ์
๋ฐ์ดํธ๋์ง ์์ ๋ฐ๋ฌผ๊ด ๋ด ์ฝํ
์ธ | 50๋ ๋จ์ฑ |
WigtnOCR-2B Output (Ours): 1,912 chars
- ์ง์ญ ์ฃผ๋ฏผ ์๊ฒฌ ๋ฐ ์์
[๊ท ๋ฏผ ์ค๋ฌธ์กฐ์ฌ] ๊ท ๋ฏผ 478๋ช
๋์ ์ค๋ฌธ์กฐ์ฌ๋ก ๋์๋ฌธ์ ๋์ถ
- ๊ท ๋ฏผ ๋์ ์ค๋ฌธ์กฐ์ฌ ์ฌํญ
| No. | ์ค๋ฌธ ํญ๋ชฉ |
| --- | --- |
| Q1 | ์ฑ๋ณ / ์ฐ๋ น / ์ง์ญ / ๋ถํธ์ฌํญ |
| Q2 | ์์ / ํ๊ฒฝ / ์๋์ง / ๊ตํต / ์ฐ์
/ ํ์ / ๋ณด๊ฑด / ๋ณต์ง / ๋ฌธํ / ๊ด๊ด / ๋์
/ ๊ต์ก |
| Q3 | ์ค๋งํธ๋์ ์์ / ์ง์ญ / ์๋น์ค / ๋ฆฌ๋น๋ฉ |
- ๊ท ๋ฏผ ์ค๋ฌธ๊ฒฐ๊ณผ
| ๋ณด๋ค ์์ ํ ๋ถ์ฌ๋ฅผ ์ํด ๊ฐ์ ํด์ผ ํ ๋ฌธ์ | ์ค๋งํธ๋์ ์ฐ์ ๋์
์๋น์ค |
| --- | --- |
| ์์ค ๋
ธํํ | 34.1% |
| ๊ตํต์ฌ๊ณ ๋ค๋ฐ๊ตฌ๊ฐ | 13.7% |
| ์์ฐ์ฌํด๊ฐ์ | 12.8% |
| ์ฌ์ผ์๊ฐ ๋ฒ์ฃ | 10.0% |
| ํตํ ์์ | 9.3% |
| ์ธ์ฌ | 8.2% |
| ์ฌ๋ ์๊ฒฝ๋ณด | 8.7% |
| ๊ธฐํ | 3.4% |
| ์ค๋งํธ ๋ณด๊ฑด/์๋ฃ/๋ณต์ง | 17.4% |
| ์ค๋งํธ ๊ตํต | 15.7% |
| ์ค๋งํธ ํ๊ฒฝ/์๋์ง/์์์ | 10.5% |
| ์ค๋งํธ ๋ฌธํ/๊ด๊ด/์คํฌ์ธ | 10.1% |
| ์ค๋งํธ ๊ทผ๋ก/๊ณ ์ฉ | 9.9% |
| ์ค๋งํธ ํ์ | 8.9% |
| ์ค๋งํธ ๊ต์ก | 7.6% |
| ์ค๋งํธ ๋ฐฉ๋ฒ/๋ฐฉ์ฌ | 6.4% |
| ์ค๋งํธ ์์ค๋ฌผ๊ด๋ฆฌ | 4.5% |
| ์ค๋งํธ ์ฃผ๊ฑฐ | 3.2% |
| ์ค๋งํธ ๋ฌผ๋ฅ | 2.8% |
| ๊ธฐํ | 2.9% |
์๋ฃ : ๋ถ์ฌ๊ตฐ ์ค๋งํธ๋์๊ณํ(2023)
[๋์ด์
์ธ ๋ณต์ง์ค๋ก์กฐ์ฌ] ์ํ์์ ๊ฐ์ ์ ์ํด ํ์ํ ์ฌํญ ์ค๋ฌธ๊ฒฐ๊ณผ
| ํน์ฑ | ๋๋ก์์ ์์ค | ๋ณดํ์๊ธธ ์ ๋น | ๊ฐ๋ก๋ฑ ํ์ถฉ | CCTV ์ค์น | ์ฃผ๋ฏผ ๋ฐฉ๋ฒ์์ฐฐ | ๋
ธํ์์ค | ์์ฌ ๊ท๊ฐ ์๋น์ค | ๊ธฐํ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ๋์ด์ด | 10.1 | 21.0 | 23.1 | 25.7 | 8.1 | 8.2 | 3.4 | 0.3 |
| ์ | 10.7 | 20.8 | 20.5 | 28.1 | 8.4 | 7.2 | 4.2 | 0.1 |
| ๋ฉด | 9.5 | 21.2 | 25.8 | 23.3 | 7.8 | 9.3 | 2.7 | 0.4 |
| ๋์ด๊ฐ | 8.7 | 22.3 | 23.2 | 23.1 | 7.9 | 12.1 | 2.5 | 0.2 |
| ๋น๋์ด๊ฐ | 10.6 | 20.5 | 23.1 | 26.6 | 8.2 | 6.9 | 3.7 | 0.3 |
| 30๋ ์ดํ | 14.6 | 16.5 | 27.6 | 25.2 | 6.4 | 5.8 | 3.6 | 0.2 |
| 40๋ | 6.3 | 20.1 | 19.6 | 33.1 | 10.9 | 4.6 | 5.1 | 0.2 |
| 50๋ | 10.8 | 19.4 | 23.0 | 27.2 | 6.8 | 8.4 | 4.1 | 0.3 |
| 60๋ | 10.5 | 22.9 | 22.8 | 23.4 | 7.2 | 10.2 | 2.6 | 0.4 |
| 70๋ ์ด์ | 9.9 | 23.5 | 24.0 | 21.1 | 8.7 | 10.4 | 2.2 | 0.2 |
์๋ฃ : ๋์ด์งํฅ์ฒญ 2023 ๋์ด์
์ธ๋ฑ์ ๋ํ ๋ณต์ง์ค๋ก์กฐ์ฌ
| ๊ตฌ๋ถ | ๋์๋ฌธ์ | ์ฃผ๋ฏผ ์์ | ์์ ์ฃผ๋ฏผ |
| --- | --- | --- | --- |
| ๋ณต์ง | ๋
๊ฑฐ๋
ธ์ธ ๋๋ด | - ๋ถ์ฌ๊ตฐ ๋ณด๊ฑด๋ณต์ง ๋ถ์ผ ๊ฐ์ ์ฌํญ์ผ๋ก ์ง์ | 70๋ ๋จ์ฑ |
| ๋ณต์ง | ๋
๊ฑฐ๋
ธ์ธ ๋๋ด | - ์ธ๊ณต์ง๋ฅ ๋๋ด์๋น์ค ์๋ฒ ์ฌ์ฉ ํธํ | 60๋ ์ฌ์ฑ |
| ๋ณต์ง | ์์ค๋
ธํํ | - ๋ถ์ฌ๊ตฐ ์๋ฉด ๊ฒฝ๋ก๋น ๋ด ์์ค ๋
ธํํ | 80๋ ๋จ์ฑ |
| ๋ณต์ง | ์ฌ๊ฐ ์ฝํ
์ธ ๋ถ์กฑ | - ๊ฒฝ๋ก๋น ๋ด ์ฌ๊ฐ ์ฝํ
์ธ ๋ถ์กฑ | 60๋ ์ฌ์ฑ |
| ์์ | ์์ ์ธํ๋ผ ๋ถ์กฑ | - ๋ถ์ฌ์์ฅ ๊ทผ๊ต ๋
ธ์ธ ๋ณดํ์ ๊ตํต์ฌ๊ณ ์ํ ๋์ | 60๋ ๋จ์ฑ |
| ๊ด๊ด | ๊ด๊ด ์ฝํ
์ธ ๋ถ์กฑ | - ์๋
๋์ ์
๋ฐ์ดํธ๋์ง ์์ ๋ฐ๋ฌผ๊ด ๋ด ์ฝํ
์ธ | 50๋ ๋จ์ฑ |
Key difference: The 30B teacher replaces charts with `[Figure: ...]` placeholders, while WigtnOCR-2B extracts the actual data from charts into structured Markdown tables, producing 21% more content from the same page.
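The table and character counts in this comparison are easy to reproduce on any parser output. The sketch below counts Markdown tables by their header separator rows and recomputes the relative content gain from the two reported lengths; the regex and function names are illustrative and may not match how the counts above were actually produced.

```python
import re

# A pipe-table separator row such as "| --- | --- |" or "|-----|------|".
SEPARATOR_ROW = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def count_tables(markdown):
    """Count pipe tables by counting separator rows (one per table)."""
    return sum(
        1
        for line in markdown.splitlines()
        # Require a pipe so a plain "---" horizontal rule is not counted.
        if "|" in line and SEPARATOR_ROW.match(line)
    )


def relative_gain_pct(new_len, old_len):
    """Percentage increase of new_len over old_len, rounded to the nearest integer."""
    return round((new_len - old_len) / old_len * 100)


sample = """\
| No. | Item |
| --- | --- |
| Q1 | A |

---

| Key | Value |
|-----|------:|
| x | 1 |
"""
print(count_tables(sample), relative_gain_pct(1912, 1582))
```

Counting placeholders such as `[Figure: ...]` alongside tables would give a quick signal of how often a parser falls back to skipping chart content.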
Citation
If you use WigtnOCR in your research, please cite:
```bibtex
@software{wigtnocr2026,
  title  = {WigtnOCR: VLM-based Korean Government Document Parser using Teacher-Student Pseudo-GT Pipeline},
  author = {WIGTN Crew},
  year   = {2026},
  url    = {https://huggingface.co/Wigtn/Qwen3-VL-2B-WigtnOCR}
}
```
🏢 About WIGTN Crew
WIGTN Crew is an AI-native open-source research crew based in Korea.
We build practical, domain-specialized AI tools, starting with document intelligence for Korean government documents.
- Website: https://wigtn.com
- GitHub: https://github.com/wigtn
- HuggingFace: https://huggingface.co/Wigtn