metadata
license: apache-2.0
language:
  - ko
  - en
  - zh
library_name: transformers
base_model: Qwen/Qwen3-VL-2B-Instruct
tags:
  - document-parsing
  - document-intelligence
  - ocr
  - vlm
  - vision-language-model
  - lora
  - distillation
  - korean
  - qwen3-vl
  - structure-preserving
  - rag
  - government-document
  - table-extraction
pipeline_tag: image-text-to-text
datasets:
  - Wigtn/KoGovDoc-Bench
metrics:
  - teds
  - hit@1
model-index:
  - name: WigtnOCR-2B
    results:
      - task:
          type: document-parsing
        dataset:
          name: OmniDocBench
          type: opendatalab/OmniDocBench
        metrics:
          - name: Text NED
            type: ned
            value: 0.288
          - name: Table TEDS
            type: teds
            value: 0.649
          - name: Table TEDS-S
            type: teds
            value: 0.732
          - name: Formula CDM F1
            type: f1
            value: 0.884
          - name: Reading Order NED
            type: ned
            value: 0.211
      - task:
          type: document-parsing
        dataset:
          name: KoGovDoc-Bench
          type: Wigtn/KoGovDoc-Bench
        metrics:
          - name: NED
            type: ned
            value: 0.285
          - name: Hit@1
            type: accuracy
            value: 0.739
          - name: MRR@10
            type: mrr
            value: 0.788
WigtnOCR-2B: Pseudo-Label Distillation for Structure-Preserving Document Parsing

HF Model HF Dataset GitHub License: Apache 2.0 Python 3.11+ vLLM

Built by WIGTN Crew

A 2B VLM distilled from a 30B teacher that matches its document parsing quality and achieves #1 retrieval among 6 parsers on Korean government documents.

⭐️ Base Model: Qwen3-VL-2B-Instruct
⭐️ Dataset: huggingface.co/datasets/Wigtn/KoGovDoc-Bench
⭐️ GitHub: github.com/Hyeongseob91/research-vlm-based-document-parsing


Key Features

  • 30B → 2B Distillation: Matches or exceeds the 30B teacher in 4/5 OmniDocBench categories via quality-filtered pseudo-labeling
  • Table TEDS +12.6pp: Surpasses the teacher on table structure recognition through selective training on high-quality GT
  • #1 Retrieval: Best Hit@1 (0.739) and MRR@10 (0.788) among 6 parsers, suggesting that structured parsing improves RAG
  • Korean Government Documents: Optimized for complex Korean government layouts (tables, forms, multi-column)
  • Production-Ready: Single-GPU serving via vLLM, 2B params, fast inference

Highlights

| Category | Metric | WigtnOCR-2B | vs 30B Teacher | vs PaddleOCR |
| --- | --- | --- | --- | --- |
| Parsing | Text NED ↓ | 0.288 | -0.001 (matches) | n/a |
| Tables | Table TEDS ↑ | 0.649 | +12.6pp | n/a |
| Retrieval | Hit@1 ↑ | 0.739 | +2.3pp | +22.7pp |
| Retrieval | MRR@10 ↑ | 0.788 | +1.7pp | +19.6pp |
| Reliability | Skip Rate ↓ | 5.8% | -13.0pp from base | n/a |

Quick Start

Transformers (Direct Inference)

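# Requires a transformers release with Qwen3-VL support.
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image

# The base model is Qwen3-VL, so load it with the Qwen3-VL class rather than the Qwen2.5-VL one.
model = Qwen3VLForConditionalGeneration.from_pretrained(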
    "Wigtn/Qwen3-VL-2B-WigtnOCR",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Wigtn/Qwen3-VL-2B-WigtnOCR")

image = Image.open("document_page.png")

messages = [
    {"role": "system", "content": "You are WigtnOCR, a document parser. Convert the document image to well-structured Markdown."},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Convert this document page to Markdown. Preserve all headings, tables, formulas, and reading order."},
    ]},
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=4096)
output = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(output)

vLLM (Production Serving)

vllm serve Wigtn/Qwen3-VL-2B-WigtnOCR \
    --max-model-len 16384 \
    --trust-remote-code

# Query the OpenAI-compatible endpoint started above
from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

with open("document_page.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Wigtn/Qwen3-VL-2B-WigtnOCR",
    messages=[
        {"role": "system", "content": "You are WigtnOCR, a document parser. Convert the document image to well-structured Markdown."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            {"type": "text", "text": "Convert this document page to Markdown."},
        ]},
    ],
    max_tokens=4096,
)
print(response.choices[0].message.content)
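
When many pages are sent to the endpoint, the request payload from the client example can be factored into a small helper. The sketch below is illustrative only: `to_data_url` and `build_messages` are hypothetical names, the MIME type is guessed from the file name, and nothing here contacts the server.

```python
import base64
import mimetypes

SYSTEM_PROMPT = (
    "You are WigtnOCR, a document parser. "
    "Convert the document image to well-structured Markdown."
)


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL (MIME type guessed from the name)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file name: {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(image_bytes: bytes, filename: str) -> list:
    """Build chat messages in the same shape as the client example above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes, filename)}},
            {"type": "text", "text": "Convert this document page to Markdown."},
        ]},
    ]
```

The returned list can be passed as `messages=` to `client.chat.completions.create`, so the same helper serves PNG and JPEG pages.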

OmniDocBench Results

Evaluated on OmniDocBench (CVPR 2025): 1,355 pages across 9 document types.

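| Metric | Qwen3-VL-2B | WigtnOCR-2B | Qwen3-VL-30B | Marker | Direction |
| --- | --- | --- | --- | --- | --- |
| Text NED | 0.364 | 0.288 | 0.289 | 0.218 | lower=better |
| Table TEDS | 0.561 | 0.649 | 0.523 | 0.586 | higher=better |
| Table TEDS-S | 0.667 | 0.732 | 0.657 | 0.658 | higher=better |
| Formula CDM F1 | 0.865 | 0.884 | 0.939 | 0.863 | higher=better |
| Formula ExpRate | 0.504 | 0.600 | 0.692 | 0.582 | higher=better |
| Reading Order NED | 0.300 | 0.211 | 0.227 | 0.165 | lower=better |
| Skip Rate | 18.8% | 5.8% | 5.5% | 0.4% | lower=better |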

Student matches or exceeds 30B teacher in 4/5 metric categories. Table TEDS surpasses teacher by +12.6pp, suggesting quality-filtered distillation produces a stronger training signal than the teacher's average output.
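
The gaps quoted above follow directly from the table. The short check below uses only values copied from it; the dictionary layout and helper names are illustrative, and it assumes the five "categories" are the five metrics listed here (Formula ExpRate and Skip Rate are left out).

```python
# (student WigtnOCR-2B, teacher Qwen3-VL-30B, higher_is_better), copied from the table.
SCORES = {
    "Text NED": (0.288, 0.289, False),
    "Table TEDS": (0.649, 0.523, True),
    "Table TEDS-S": (0.732, 0.657, True),
    "Formula CDM F1": (0.884, 0.939, True),
    "Reading Order NED": (0.211, 0.227, False),
}


def delta_pp(student: float, teacher: float) -> float:
    """Student minus teacher in percentage points (scores are on a 0-1 scale)."""
    return round((student - teacher) * 100, 1)


def student_wins(student: float, teacher: float, higher_is_better: bool) -> bool:
    """True when the student matches or beats the teacher in the metric's direction."""
    return student >= teacher if higher_is_better else student <= teacher


wins = [name for name, (s, t, hib) in SCORES.items() if student_wins(s, t, hib)]
teds_gain = delta_pp(SCORES["Table TEDS"][0], SCORES["Table TEDS"][1])

print(f"Table TEDS gain: {teds_gain:+.1f}pp")
print(f"Student matches or exceeds teacher on {len(wins)} of {len(SCORES)} metrics")
```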


KoGovDoc Retrieval Results

Semantic chunking (BGE-M3) → FAISS retrieval on KoGovDoc-Bench: 294 val pages, 564 queries, 6 parsers compared.

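| Model | Type | Hit@1 ↑ | Hit@5 ↑ | MRR@10 ↑ | nDCG@10 ↑ |
| --- | --- | --- | --- | --- | --- |
| WigtnOCR-2B | VLM (ours) | 0.739 | 0.855 | 0.788 | 0.437 |
| Qwen3-VL-30B | VLM (teacher) | 0.716 | 0.839 | 0.771 | 0.411 |
| Marker | PDF parser | 0.711 | 0.853 | 0.771 | 0.412 |
| Qwen3-VL-2B | VLM (base) | 0.709 | 0.814 | 0.756 | 0.444 |
| MinerU | PDF parser | 0.608 | 0.789 | 0.682 | 0.384 |
| PaddleOCR | Pure OCR | 0.512 | 0.693 | 0.592 | 0.293 |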

WigtnOCR-2B ranks #1 in Hit@1, Hit@5, and MRR@10 (the Qwen3-VL-2B base is slightly ahead on nDCG@10, 0.444 vs 0.437), which suggests that structured VLM parsing improves RAG retrieval over traditional OCR pipelines.
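
For readers unfamiliar with the retrieval metrics, here is a minimal sketch of Hit@k and MRR@k. It assumes each query has exactly one relevant chunk ID and that the retriever returns a ranked list of chunk IDs; the function names and toy data are illustrative, not the project's evaluation code.

```python
def hit_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank_at_k(ranked_ids, relevant_id, k):
    """1/rank of the relevant chunk within the top-k, or 0.0 if it is absent."""
    for rank, chunk_id in enumerate(ranked_ids[:k], start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k_hit=1, k_mrr=10):
    """Average Hit@k and MRR@k over (ranked_ids, relevant_id) pairs."""
    n = len(runs)
    hit = sum(hit_at_k(ranked, rel, k_hit) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank_at_k(ranked, rel, k_mrr) for ranked, rel in runs) / n
    return hit, mrr


# Toy data: the relevant chunk is ranked 1st, ranked 2nd, and not retrieved.
runs = [
    (["c1", "c7", "c3"], "c1"),
    (["c4", "c2", "c9"], "c2"),
    (["c5", "c6", "c8"], "c0"),
]
hit1, mrr10 = evaluate(runs)
print(f"Hit@1={hit1:.3f}  MRR@10={mrr10:.3f}")
```

Hit@1 only rewards a correct top result, while MRR@10 gives partial credit for a relevant chunk lower in the list, which is why the two columns can rank parsers slightly differently.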


BC vs. Retrieval: An Interesting Finding

Chunk quality (BC/CS, MoC framework) does not predict retrieval performance.

| Model | BC ↑ | CS ↓ | Hit@1 ↑ |
| --- | --- | --- | --- |
| MinerU | 0.735 | 2.711 | 0.608 (5th) |
| WigtnOCR-2B | 0.706 | 2.859 | 0.739 (1st) |
| PaddleOCR | 0.654 | 3.420 | 0.512 (6th) |

MinerU produces the cleanest chunk boundaries but ranks 5th in retrieval. Text richness and structural fidelity matter more than boundary quality for end-to-end RAG.


KoGovDoc Parsing Quality

| Model | NED ↓ | Evaluated |
| --- | --- | --- |
| WigtnOCR-2B | 0.285 | 289/294 |
| Qwen3-VL-30B (Teacher) | 0.334 | 294/294 |
| Qwen3-VL-2B (Base) | 0.390 | 294/294 |

WigtnOCR-2B surpasses its 30B teacher on Korean government documents (NED 0.285 vs 0.334), with 289 of 294 pages evaluated for WigtnOCR-2B.
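
NED is read here as normalized edit distance. The sketch below assumes the common definition (Levenshtein distance divided by the length of the longer string, so 0 means a perfect match); the benchmark's exact normalization and text preprocessing may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def ned(prediction: str, reference: str) -> float:
    """Normalized edit distance in [0, 1]; lower is better."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, reference) / longest
```

Under this definition a page-level score such as 0.285 means that roughly 28.5% of the characters would need to be edited to turn the prediction into the reference.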


Ablation Study

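| Config | LoRA r | Epochs | Text NED ↓ | Table TEDS ↑ | TEDS-S ↑ | CDM F1 ↑ | RO NED ↓ | Skip % ↓ | Verdict |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| v1 (final) | 8 | 3 | 0.288 | 0.649 | 0.732 | 0.884 | 0.211 | 5.8% | Best overall |
| v2-best | 32 | 3 | 0.309 | 0.600 | 0.697 | n/a | 0.215 | 0.7% | Table regression |
| v2-last | 32 | 5 | 0.306 | 0.610 | 0.695 | 0.892 | 0.214 | 0.0% | Overfitting on text |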

Key findings:

  • LoRA rank 8 outperforms rank 32: larger capacity leads to table structure regression (-4.9pp TEDS) despite marginally better formula recognition
  • 3 epochs is optimal: 5 epochs causes overfitting (eval loss rises after epoch 3)
  • v2 improves skip rate to 0% but at the cost of core parsing quality
  • v1 selected as final model due to superior table/text quality which matters most for downstream RAG

Training Details

| Parameter | Value |
| --- | --- |
| Base model | Qwen3-VL-2B-Instruct |
| Teacher | Qwen3-VL-30B-A3B-Instruct (FP8) |
| Judge | Qwen3.5-122B-A10B-NVFP4 (text-only, 5-dim scoring) |
| Method | LoRA (rank=8, alpha=32, target=all linear layers) |
| Training samples | 2,667 (filtered from 4,501 pages, score ≥ 3/5) |
| Validation samples | 294 (held out) |
| Training time | 31 minutes |
| Framework | ms-swift + DeepSpeed ZeRO-2 |
| Epochs | 3 |
| Learning rate | 1e-4 |
| Batch size | 1 (gradient accumulation 8) |
| Hardware | 2 × NVIDIA RTX PRO 6000 (98GB each) |
| Trainable params | 8.7M (0.4% of total) |
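
A back-of-the-envelope reading of these settings. The LoRA scaling factor alpha/rank is the standard one; the effective batch size assumes that "Batch size 1" is per GPU and that both GPUs run data-parallel under ZeRO-2, which the table does not state explicitly, so the step counts are estimates only.

```python
import math

rank, alpha = 8, 32
per_device_batch, grad_accum = 1, 8
num_gpus = 2                      # assumption: both GPUs are data-parallel workers
train_samples, epochs = 2667, 3
trainable_params = 8.7e6          # "8.7M"
trainable_fraction = 0.004        # "0.4% of total"

lora_scaling = alpha / rank                                    # update scale applied to the LoRA branch
effective_batch = per_device_batch * grad_accum * num_gpus     # samples per optimizer step
steps_per_epoch = math.ceil(train_samples / effective_batch)   # assumes the last partial batch is kept
total_steps = steps_per_epoch * epochs
implied_total_params = trainable_params / trainable_fraction   # rough, since 0.4% is rounded

print(lora_scaling, effective_batch, steps_per_epoch, total_steps)
print(f"implied total parameters: about {implied_total_params / 1e9:.1f}B")
```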

Training Data

| Dataset | Documents | Pages | Language | Source |
| --- | --- | --- | --- | --- |
| KoGovDoc | 10 | 3,637 | Korean | Government publications |
| ArXivPapers | 39 | 864 | English | arXiv (cs.CL, cs.CV, cs.LG) |
| Total | 49 | 4,501 | Bilingual | n/a |

GT generated by Qwen3-VL-30B, validated by Qwen3.5-122B with 74–75% pass rate. Quality filtering removes hallucinations, repetitions, and chain-of-thought contamination.
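
The filtering step can be pictured as a threshold over judge scores. The sketch below is hypothetical: the record fields (`page_id`, `markdown`, `judge_score`) and the helper name are invented for illustration, and only the threshold (score of at least 3 out of 5) comes from the training details.

```python
from dataclasses import dataclass


@dataclass
class PseudoLabel:
    page_id: str        # hypothetical page identifier
    markdown: str       # teacher-generated Markdown used as pseudo ground truth
    judge_score: float  # judge rating on a 1-5 scale


MIN_SCORE = 3.0  # threshold from the training details (score >= 3/5)


def filter_pseudo_labels(samples, min_score=MIN_SCORE):
    """Split samples into (kept, rejected) by judge score."""
    kept = [s for s in samples if s.judge_score >= min_score]
    rejected = [s for s in samples if s.judge_score < min_score]
    return kept, rejected


samples = [
    PseudoLabel("doc_a_p1", "# Title\n\n| a | b |", 4.5),
    PseudoLabel("doc_a_p2", "repeated repeated repeated", 1.0),
    PseudoLabel("doc_b_p1", "## Section", 3.0),
]
kept, rejected = filter_pseudo_labels(samples)
pass_rate = len(kept) / len(samples)
print(f"kept {len(kept)} of {len(samples)} pages ({pass_rate:.0%})")
```

Keeping the rejected list around makes it easy to inspect what the judge discards, for example repetition loops or leaked reasoning text.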


Evaluation Stack

| Component | Tool | Purpose |
| --- | --- | --- |
| Preprocessing | PyMuPDF | PDF → page images (200 DPI) |
| Chunking | BGE-M3 (semantic) | Embedding-based boundary detection |
| BC/CS Metrics | Qwen2.5-1.5B | Perplexity computation (MoC, ACL 2025) |
| Embedding | BAAI/bge-m3 | Chunk → vector |
| Retrieval | FAISS | Cosine similarity search |
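
To make the retrieval step concrete, here is a brute-force cosine similarity search in plain Python. It is a stand-in for the idea only: the actual stack uses FAISS over BGE-M3 embeddings, whereas the vectors, chunk IDs, and function names below are toy assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, chunk_vecs, k=10):
    """Return the k chunk IDs most similar to the query, best first."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]


# Toy 2-D "embeddings"; real chunk vectors would come from the embedding model.
chunks = {"c1": [1.0, 0.0], "c2": [0.7, 0.7], "c3": [0.0, 1.0]}
ranking = top_k([0.9, 0.1], chunks, k=2)
print(ranking)
```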

Intended Use

  • Korean government document digitization and parsing
  • RAG pipeline preprocessing (PDF → structured Markdown → chunks → retrieval)
  • Academic paper parsing (tables, formulas, reading order)
  • Bilingual (Korean + English) document processing

Limitations

  • Optimized for Korean and English; other languages may have reduced quality
  • Formula recognition still trails 30B teacher (CDM F1: 0.884 vs 0.939)
  • Best results at 200 DPI; lower resolution degrades quality
  • Skip rate 5.8%: some complex pages may fail (v2 achieves 0% but with quality trade-offs)

Example Output

Comparison on a complex Korean government document page (kogov_001 p.9: survey tables + statistical charts + mixed layout).

| | 30B Teacher | WigtnOCR-2B (Ours) |
| --- | --- | --- |
| Charts | [Figure: ...] placeholder | Extracts data into tables |
| Content | 1,582 chars | 1,912 chars (+21%) |
| Tables | 3 tables | 4 tables (chart → table) |
30B Teacher Output (Qwen3-VL-30B): 1,582 chars
- 지역 주민 의견 및 수요

## [군민 설문조사] 군민 478명 대상 설문조사로 도시문제 도출
- 군민 대상 설문조사 사항

| No. | 설문 항목 |
|-----|-----------|
| Q1 | 성별 / 연령 / 지역 / 불편사항 |
| Q2 | 안전 / 환경 / 에너지 / 교통 / 산업 / 행정 / 복지 / 문화 / 관광 / 농업 / 교육 |
| Q3 | 스마트도시 요소 / 지역 / 서비스 / 리빙랩 |

### - 군민 설문결과

[Figure: 보다 안전한 부여를 위해 개선해야 할 문제]
[Figure: 스마트도시 우선도입 서비스]

자료 : 부여군 스마트도시계획(2023)

## [농어업인 복지실태조사] 생활안전 개선을 위해 필요한 사항 설문결과

| 특성 | 도로안전시설 | 보행자길 정비 | 가로등 확충 | CCTV 설치 | 주민 방범 순찰 | 노후시설 | 안심 귀가 서비스 | 기타 |
|------|-------------|-------------|------------|----------|--------------|---------|----------------|------|
| 농어촌 | 10.1 | 21.0 | 23.1 | 25.7 | 8.1 | 8.2 | 3.4 | 0.3 |
| 읍 | 10.7 | 20.8 | 20.5 | 28.1 | 8.4 | 7.2 | 4.2 | 0.1 |
| 면 | 9.5 | 21.2 | 25.8 | 23.3 | 7.8 | 9.3 | 2.7 | 0.4 |
| 농어가 | 8.7 | 22.3 | 23.2 | 23.1 | 7.9 | 12.1 | 2.5 | 0.2 |
| 비농어가 | 10.6 | 20.5 | 23.1 | 26.6 | 8.2 | 6.9 | 3.7 | 0.3 |
| 30대 이하 | 14.6 | 16.5 | 27.6 | 25.2 | 6.4 | 5.8 | 3.6 | 0.2 |
| 40대 | 6.3 | 20.1 | 19.6 | 33.1 | 10.9 | 4.6 | 5.1 | 0.2 |
| 50대 | 10.8 | 19.4 | 23.0 | 27.2 | 6.8 | 8.4 | 4.1 | 0.3 |
| 60대 | 10.5 | 22.9 | 22.8 | 23.4 | 7.2 | 10.2 | 2.6 | 0.4 |
| 70대 이상 | 9.9 | 23.5 | 24.0 | 21.1 | 8.7 | 10.4 | 2.2 | 0.2 |

자료 : 농촌진흥청 2023 농어업인등에 대한 복지실태조사

| 구분 | 도시문제 | 주민 수요 | 수요 주민 |
|------|----------|-----------|-----------|
| 복지 | 독거노인 돌봄 | - 부여군 보건복지 분야 개선사항으로 지적 | 70대 남성 |
| 복지 | 독거노인 돌봄 | - 인공지능 돌봄서비스 시범 사용 희망 | 60대 여성 |
| 복지 | 시설노후화 | - 부여군 읍면 경로당 내 시설 노후화 | 80대 남성 |
| 복지 | 여가 콘텐츠 부족 | - 경로당 내 여가 콘텐츠 부족 | 60대 여성 |
| 안전 | 안전 인프라 부족 | - 부여시장 근교 노인 보행자 교통사고 위험 높음 | 60대 남성 |
| 관광 | 관광 콘텐츠 부족 | - 수년 동안 업데이트되지 않은 박물관 내 콘텐츠 | 50대 남성 |
WigtnOCR-2B Output (Ours): 1,912 chars
- 지역 주민 의견 및 수요

[균민 설문조사] 균민 478명 대상 설문조사로 도시문제 도출
- 균민 대상 설문조사 사항

| No. | 설문 항목 |
| --- | --- |
| Q1 | 성별 / 연령 / 지역 / 불편사항 |
| Q2 | 안전 / 환경 / 에너지 / 교통 / 산업 / 행정 / 보건 / 복지 / 문화 / 관광 / 농업 / 교육 |
| Q3 | 스마트도시 요소 / 지역 / 서비스 / 리빙랩 |

- 균민 설문결과

| 보다 안전한 부여를 위해 개선해야 할 문제 | 스마트도시 우선도입 서비스 |
| --- | --- |
| 시설 노후화 | 34.1% |
| 교통사고 다발구간 | 13.7% |
| 자연재해감시 | 12.8% |
| 심야시간 범죄 | 10.0% |
| 통학 안전 | 9.3% |
| 인재 | 8.2% |
| 재난 예경보 | 8.7% |
| 기타 | 3.4% |
| 스마트 보건/의료/복지 | 17.4% |
| 스마트 교통 | 15.7% |
| 스마트 환경/에너지/수자원 | 10.5% |
| 스마트 문화/관광/스포츠 | 10.1% |
| 스마트 근로/고용 | 9.9% |
| 스마트 행정 | 8.9% |
| 스마트 교육 | 7.6% |
| 스마트 방법/방재 | 6.4% |
| 스마트 시설물관리 | 4.5% |
| 스마트 주거 | 3.2% |
| 스마트 물류 | 2.8% |
| 기타 | 2.9% |

자료 : 부여군 스마트도시계획(2023)

[농어업인 복지실례조사] 생활안전 개선을 위해 필요한 사항 설문결과

| 특성 | 도로안전시설 | 보행자길 정비 | 가로등 확충 | CCTV 설치 | 주민 방법순찰 | 노후시설 | 안심 귀가 서비스 | 기타 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 농어촌 | 10.1 | 21.0 | 23.1 | 25.7 | 8.1 | 8.2 | 3.4 | 0.3 |
| 읍 | 10.7 | 20.8 | 20.5 | 28.1 | 8.4 | 7.2 | 4.2 | 0.1 |
| 면 | 9.5 | 21.2 | 25.8 | 23.3 | 7.8 | 9.3 | 2.7 | 0.4 |
| 농어가 | 8.7 | 22.3 | 23.2 | 23.1 | 7.9 | 12.1 | 2.5 | 0.2 |
| 비농어가 | 10.6 | 20.5 | 23.1 | 26.6 | 8.2 | 6.9 | 3.7 | 0.3 |
| 30대 이하 | 14.6 | 16.5 | 27.6 | 25.2 | 6.4 | 5.8 | 3.6 | 0.2 |
| 40대 | 6.3 | 20.1 | 19.6 | 33.1 | 10.9 | 4.6 | 5.1 | 0.2 |
| 50대 | 10.8 | 19.4 | 23.0 | 27.2 | 6.8 | 8.4 | 4.1 | 0.3 |
| 60대 | 10.5 | 22.9 | 22.8 | 23.4 | 7.2 | 10.2 | 2.6 | 0.4 |
| 70대 이상 | 9.9 | 23.5 | 24.0 | 21.1 | 8.7 | 10.4 | 2.2 | 0.2 |

자료 : 농촌진흥청 2023 농어업인등에 대한 복지실례조사

| 구분 | 도시문제 | 주민 수요 | 수요 주민 |
| --- | --- | --- | --- |
| 복지 | 독거노인 돌봄 | - 부여군 보건복지 분야 개선사항으로 지적 | 70대 남성 |
| 복지 | 독거노인 돌봄 | - 인공지능 돌봄서비스 시범 사용 호평 | 60대 여성 |
| 복지 | 시설노후화 | - 부여군 읍면 경로당 내 시설 노후화 | 80대 남성 |
| 복지 | 여가 콘텐츠 부족 | - 경로당 내 여가 콘텐츠 부족 | 60대 여성 |
| 안전 | 안전 인프라 부족 | - 부여시장 근교 노인 보행자 교통사고 위험 높음 | 60대 남성 |
| 관광 | 관광 콘텐츠 부족 | - 수년 동안 업데이트되지 않은 박물관 내 콘텐츠 | 50대 남성 |

Key difference: The 30B teacher replaces charts with [Figure: ...] placeholders, while WigtnOCR-2B extracts the actual data from charts into structured Markdown tables, producing 21% more content from the same page.


📎 Citation

If you use WigtnOCR in your research, please cite:

@software{wigtnocr2026,
  title   = {WigtnOCR: VLM-based Korean Government Document Parser using Teacher-Student Pseudo-GT Pipeline},
  author  = {WIGTN Crew},
  year    = {2026},
  url     = {https://huggingface.co/Wigtn/Qwen3-VL-2B-WigtnOCR}
}

๐Ÿข About WIGTN Crew

WIGTN Crew is an AI-native open-source research crew based in Korea.
We build practical, domain-specialized AI tools, starting with document intelligence for Korean government documents.