metadata
license: mit
datasets:
  - YuukiAsuna/VietnameseTableVQA
language:
  - vi
base_model:
  - 5CD-AI/Vintern-1B-v2
pipeline_tag: document-question-answering
library_name: transformers

Vintern-1B-v2-ViTable-docvqa

Report Link👁️

Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document visual question answering (DocVQA) on table data.

Benchmarks

| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
|---|---|---|---|
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
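
For readers unfamiliar with the ANLS column, the following is a minimal sketch of Average Normalized Levenshtein Similarity as it is commonly defined for DocVQA. This card does not state how the scores above were computed, so the 0.5 threshold, the lowercasing and whitespace stripping, and the function names `levenshtein` and `anls` are assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute all cost 1)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average, over questions, of the best similarity against any reference answer.

    predictions: list of predicted answer strings.
    references:  list of lists of acceptable answers, one list per question.
    threshold:   assumed cutoff; normalized distances at or above it score 0.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Under this definition a near-miss answer still earns partial credit, while an answer that differs from every reference in half or more of its characters scores zero.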

Usage

Try the 🤗 HF Demo, or open it in Colab:
Open In Colab
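
For local inference, a minimal sketch might look like the following. Several things here are assumptions rather than facts stated on this card: the repository id `YuukiAsuna/Vintern-1B-v2-ViTable-docvqa` is inferred from the model name, the `chat(tokenizer, pixel_values, question, generation_config)` method is assumed to be exposed by the model's remote code as in InternVL-style models, and the single 448x448 tile with ImageNet normalization is a simplification of whatever preprocessing the demo uses. The helper names `build_prompt` and `answer_question` are hypothetical.

```python
# Assumption: repository id inferred from the model name on this card.
MODEL_ID = "YuukiAsuna/Vintern-1B-v2-ViTable-docvqa"

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def build_prompt(question: str) -> str:
    """Prefix the question with the image placeholder assumed by InternVL-style chat."""
    return "<image>\n" + question.strip()


def answer_question(image_path: str, question: str, max_new_tokens: int = 256) -> str:
    """Load the model and answer one question about one table image."""
    # Heavy imports are kept local so build_prompt works without them installed.
    import torch
    import torchvision.transforms as T
    from PIL import Image
    from transformers import AutoModel, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModel.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).eval().to(device)
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=True, use_fast=False
    )

    # Simplification: one resized tile instead of multi-tile preprocessing.
    transform = T.Compose([
        T.Resize((448, 448), interpolation=T.InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
    image = Image.open(image_path).convert("RGB")
    pixel_values = transform(image).unsqueeze(0).to(device=device, dtype=torch.bfloat16)

    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    # Assumption: chat() is provided by the model's remote code.
    return model.chat(tokenizer, pixel_values, build_prompt(question), generation_config)
```

A call such as `answer_question("table.png", "Tổng doanh thu là bao nhiêu?")` would then return the model's answer as a string, where `table.png` is a placeholder path. The HF Demo and Colab notebook remain the authoritative reference for the exact preprocessing.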

Citation

@misc{doan2024vintern1befficientmultimodallarge,
      title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese}, 
      author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
      year={2024},
      eprint={2408.12480},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.12480}, 
}