mazafard/portugues_ocr_dataset_full
This repository contains a fine-tuned TrOCR model specifically trained for Optical Character Recognition (OCR) on Portuguese text. It's based on the microsoft/trocr-base-printed model and has been further trained on a dataset of Portuguese text images.
The model is a VisionEncoderDecoderModel from the Hugging Face Transformers library. It combines a vision encoder (to process images) and a text decoder (to generate text) for OCR tasks.
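To see the two halves described above, one can inspect a loaded model's `encoder` and `decoder` submodules. The following is a minimal sketch rather than part of the original card: the helper name `describe_components` is hypothetical, and it assumes only that the object exposes `.encoder` and `.decoder` attributes with a PyTorch-style `parameters()` method, as `VisionEncoderDecoderModel` does.

```python
def describe_components(model):
    """Summarize the encoder and decoder halves of an encoder-decoder model.

    Hypothetical helper: relies only on `model.encoder` / `model.decoder`
    and on each submodule offering a PyTorch-style `parameters()` iterator.
    """
    summary = {}
    for name in ("encoder", "decoder"):
        module = getattr(model, name)
        summary[name] = {
            "class": type(module).__name__,
            # numel() is the number of scalar weights in one parameter tensor
            "num_parameters": sum(p.numel() for p in module.parameters()),
        }
    return summary
```

Calling `describe_components(model)` on a loaded checkpoint returns, for each half, the class name and the parameter count, which makes the image-to-text split visible without reading the full config.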
This model is intended for extracting text from images containing Portuguese text.
1. Install Dependencies:
```bash
pip install transformers datasets Pillow requests
```
2. Load the Model and Processor:
```python
from transformers import VisionEncoderDecoderModel, TrOCRProcessor
from PIL import Image

# Load the fine-tuned model and its processor
model = VisionEncoderDecoderModel.from_pretrained("mazafard/trocr-finetuned_20250422_125947")
processor = TrOCRProcessor.from_pretrained("mazafard/trocr-finetuned_20250422_125947")

# Load the image and convert it to model inputs
image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate prediction
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
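The single-image steps above extend naturally to several images at once. The following is a minimal sketch under stated assumptions: the helper name `ocr_batch` and its `batch_size` default are hypothetical, and `model` and `processor` are taken to be the objects loaded in step 2.

```python
def ocr_batch(images, model, processor, batch_size=8):
    """Run OCR over a list of PIL images, `batch_size` images at a time.

    Hypothetical helper; `model` and `processor` are the objects loaded above.
    """
    texts = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        # The processor resizes and normalizes every image in the batch
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        # Keep inputs on the same device as the model weights (CPU or GPU)
        generated_ids = model.generate(pixel_values.to(model.device))
        # One decoded string per input image, in the original order
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

Passing a list of RGB images opened with `Image.open(...).convert("RGB")` returns one decoded string per image. Chunking by `batch_size` bounds memory use; a smaller value may be needed on limited hardware.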
The model was fine-tuned with the following `TrainingArguments`:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./trocr-finetuned",
    per_device_train_batch_size=56,
    num_train_epochs=3,
    save_steps=500,
    logging_steps=50,
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    fp16=True,
    save_total_limit=2,
    remove_unused_columns=False,
    dataloader_num_workers=2,
)
```
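The batch size, accumulation, and step settings above interact, so a short calculation can help when adapting them. The following is a rough sketch, not the Trainer's own logic: the function name `training_schedule`, the `num_examples` and `num_devices` parameters, and the rounding rule are assumptions, and the Trainer's exact step count may differ slightly.

```python
import math

def training_schedule(num_examples, num_devices=1, per_device_batch_size=56,
                      gradient_accumulation_steps=2, num_train_epochs=3,
                      save_steps=500, logging_steps=50):
    """Estimate optimizer steps implied by the training arguments (approximate)."""
    # Examples consumed per optimizer update across all devices
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    # Assumption: a final partial batch in an epoch still counts as one step
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # save_steps and logging_steps are counted in optimizer steps
        "step_checkpoints": total_steps // save_steps,
        "log_events": total_steps // logging_steps,
    }
```

On a single device the effective batch size is 56 × 2 = 112. With a hypothetical 10,000 training images, this estimate gives 90 steps per epoch and 270 steps in total, which is below `save_steps=500`, so under this approximation no step-based checkpoint would be written before training ends.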
Base model
microsoft/trocr-base-printed