Document Question Answering
Transformers
Safetensors
Vietnamese
internvl_chat
image-feature-extraction
custom_code
Instructions to use YuukiAsuna/Vintern-1B-v2-ViTable-docvqa with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use YuukiAsuna/Vintern-1B-v2-ViTable-docvqa with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("document-question-answering", model="YuukiAsuna/Vintern-1B-v2-ViTable-docvqa", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("YuukiAsuna/Vintern-1B-v2-ViTable-docvqa", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
Add report link and benchmarks
README.md
CHANGED
@@ -9,7 +9,12 @@ base_model:
 pipeline_tag: document-question-answering
 library_name: transformers
 ---
-#
+# Vintern-1B-v2-ViTable-docvqa
+
+<p align="center">
+<a href="https://drive.google.com/file/d/1MU8bgsAwaWWcTl9GN1gXJcSPUSQoyWXy/view?usp=sharing"><b>Report Link</b>👁️</a>
+</p>
+

 <!-- Provide a quick summary of what the model is/does. -->
 Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for the Vietnamese DocVQA (Table data)
@@ -17,11 +22,21 @@ Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2

 ## Benchmarks

-
+<div align="center">
+
+| Model                        | ANLS     | Semantic Similarity | MLLM-as-judge (Gemini) |
+|------------------------------|----------|---------------------|------------------------|
+| Gemini 1.5 Flash             | 0.35     | 0.56                | 0.40                   |
+| Vintern-1B-v2                | 0.04     | 0.45                | 0.50                   |
+| Vintern-1B-v2-ViTable-docvqa | **0.50** | **0.71**            | **0.59**               |
+
+</div>
+
+<!-- Code benchmark: to be written later -->

-##
+<!-- To be written later ## Usage

-
+You can use this notebook <a href="https://colab.research.google.com/"> <img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="30"></a> -->

 **Citation:**

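The benchmark table added by this commit reports ANLS (Average Normalized Levenshtein Similarity), but the commit leaves the benchmark code "to be written later". The sketch below shows how ANLS is commonly computed for DocVQA-style evaluation. It is not the author's evaluation script. The 0.5 threshold, the lowercase-and-strip normalization, and the function names `levenshtein` and `anls` are assumptions chosen for illustration, and the linked report may use different settings.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[Sequence[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best per-reference similarity.

    Similarity is 1 - NL when the normalized edit distance NL is below
    `threshold` (assumed 0.5 here), and 0 otherwise.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)


# Two toy questions: one near miss ("12.5" vs "12,5") and one exact match
# after case folding.
print(anls(["12.5", "Hà Nội"], [["12,5"], ["hà nội"]]))  # 0.875
```

Because answers whose normalized edit distance reaches the threshold score zero, a model that answers in a different format from the reference (for example, full sentences instead of a cell value) can score very low on ANLS while still doing reasonably on the semantic-similarity and judge-based columns.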