--- base_model: Qwen/Qwen2.5-VL-7B-Instruct datasets: - custom-indian-invoice-dataset language: - en - hi - ta - ml - te - kn - bn license: apache-2.0 pipeline_tag: image-text-to-text tags: - qwen2.5-vl - vision-language-model - invoice-extraction - document-understanding - ocr - indian-invoices - gst - lora - peft - unsloth - fine-tuned --- --- # Qwen2.5-VL 7B — Indian Invoice Extraction Fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) specialized for extracting structured JSON from Indian GST invoices (B2B, B2C, export, IRN/ACK, multi-layout). Trained with QLoRA + Unsloth on an NVIDIA A100 80 GB. Merged via PEFT merge_and_unload(). --- ## Available Versions | Version | Link | Use case | |---|---|---| | Merged bfloat16 | [gouri100/Unsloth_Qwen-2.5_7B-Invoice-962](https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962) | Full precision inference | | GGUF Q4_K_M | [gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF](https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF) | llama.cpp / Ollama — light GPU | | GGUF Q8_0 | [gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF](https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF) | llama.cpp / Ollama — higher quality | --- ## Model Summary | Property | Value | |---|---| | **Base model** | Qwen/Qwen2.5-VL-7B-Instruct | | **Fine-tuning method** | QLoRA (r=64, alpha=128) | | **Merge method** | PEFT merge_and_unload() — bfloat16 safetensors | | **Framework** | Unsloth + TRL SFTTrainer | | **Hardware** | NVIDIA A100 80 GB | | **Task** | Invoice image to Structured JSON | | **Input types** | JPG, PNG, PDF (page 1 at 200 DPI) | | **Languages** | English, Hindi, Tamil, Malayalam, Telugu, Kannada, Bengali | | **License** | Apache 2.0 | --- ## Training Dataset | Property | Value | |---|---| | **Total samples** | 962 | | **File types** | JPG, PNG, PDF | | **PDF handling** | Page 1 extracted at 200 DPI, resized to max 1280px | | **Invoice types** | B2B GST, B2C, Export, IRN/ACK | | **Annotation** | Manually labeled JSON per invoice | --- ## Output JSON Schema ```json { "metadata": { "invoice_no": "string", "invoice_date": "YYYY-MM-DD", "irn": "string | null", "ack_no": "string | null", "ack_date": "string | null" }, "supplier": { "name": "string", "gstin": "string", "address": "string", "state_code": "string" }, "buyer": { "name": "string", "gstin": "string", "address": "string", "state_code": "string" }, "line_items": [{ "sl_no": "number", "description": "string", "hsn_sac": "string", "qty": "number", "unit": "string", "rate": "number", "amount": "number" }], "tax": { "taxable_value": "number", "cgst_rate": "number", "cgst_amount": "number", "sgst_rate": "number", "sgst_amount": "number", "igst_rate": "number", "igst_amount": "number", "total_tax": "number", "grand_total": "number", "round_off": "number" } } ``` --- ## Training Configuration | Hyperparameter | Value | |---|---| | **Epochs** | 3 | | **Learning rate** | 0.0002 | | **LR scheduler** | Cosine | | **Warmup ratio** | 0.05 | | **Per device batch size** | 2 | | **Gradient accumulation steps** | 8 | | **Effective batch size** | 16 | | **Max sequence length** | 2048 | | **Precision** | bfloat16 | | **LoRA rank (r)** | 64 | | **LoRA alpha** | 128 | | **LoRA dropout** | 0.05 | | **LoRA target modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | | **Vision layers fine-tuned** | Yes | | **Gradient checkpointing** | Unsloth optimized | --- ## Training Results | Metric | Value | |---|---| | **Final training loss** | 0.2594 | | **Total steps** | N/A | | **Training time** | 2243.16s (37.4 min) | | **Steps per second** | 0.082 | --- ## Inference ### With transformers (merged model) ```python from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor from PIL import Image import torch, json model = Qwen2_5_VLForConditionalGeneration.from_pretrained( "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962", torch_dtype = torch.bfloat16, device_map = 'auto', ) processor = AutoProcessor.from_pretrained("gouri100/Unsloth_Qwen-2.5_7B-Invoice-962") image = Image.open('invoice.jpg').convert('RGB') SYSTEM_PROMPT = ( 'You are an expert system for extracting structured data from invoices. ' 'Return ONLY valid JSON. Do NOT include explanations or extra text.' ) messages = [ {'role': 'system', 'content': [{'type': 'text', 'text': SYSTEM_PROMPT}]}, {'role': 'user', 'content': [ {'type': 'image', 'image': image}, {'type': 'text', 'text': 'Extract structured invoice data as JSON.'} ]} ] inputs = processor.apply_chat_template( messages, add_generation_prompt = True, tokenize = True, return_tensors = 'pt', return_dict = True, ).to(model.device) with torch.no_grad(): output_ids = model.generate( **inputs, max_new_tokens = 1024, temperature = 0.1, do_sample = False, ) decoded = processor.decode( output_ids[0][inputs['input_ids'].shape[1]:], skip_special_tokens = True, ) result = json.loads(decoded) print(json.dumps(result, indent=2, ensure_ascii=False)) ``` ### Load in 4-bit (lighter GPUs) ```python from transformers import BitsAndBytesConfig bnb_config = BitsAndBytesConfig( load_in_4bit = True, bnb_4bit_compute_dtype = torch.bfloat16, bnb_4bit_quant_type = 'nf4', bnb_4bit_use_double_quant = True, ) model = Qwen2_5_VLForConditionalGeneration.from_pretrained( "gouri100/Unsloth_Qwen-2.5_7B-Invoice-962", quantization_config = bnb_config, device_map = 'auto', ) ``` ### From PDF ```python from pdf2image import convert_from_path pages = convert_from_path('invoice.pdf', dpi=200) image = pages[0] # then follow inference code above ``` ### With Ollama (GGUF) ```bash ollama run gouri100/Unsloth_Qwen-2.5_7B-Invoice-962-GGUF ``` --- ## Limitations - Optimized for Indian GST invoice formats — may underperform on foreign layouts - Scans below 100 DPI or heavily skewed images reduce accuracy - Handwritten invoices are not supported - Multi-page invoices: only page 1 was used during training - Always validate extracted JSON against your business logic before use --- ## Citation ```bibtex @misc{qwen2.5-vl-7b-indian-invoice, title = {Qwen2.5-VL-7B Fine-tuned for Indian Invoice Extraction}, author = {Your Name}, year = {2025}, publisher = {HuggingFace}, howpublished = {\url{https://huggingface.co/gouri100/Unsloth_Qwen-2.5_7B-Invoice-962}} } ``` *Fine-tuned with [Unsloth](https://github.com/unslothai/unsloth) · Merged with [PEFT](https://github.com/huggingface/peft) · Trained on NVIDIA A100 80 GB*