---
title: ERP-DocIQ
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.9.0
app_file: gradio_app.py
pinned: false
license: mit
short_description: OCR/IDP + ERP NLQ chatbot on small models (MiniCPM)
tags:
  - build-small-hackathon
  - backyard-ai
  - best-minicpm-build
  - best-agent
  - tiny-titan
  - minicpm
  - ocr
  - idp
  - nlq
  - rag
  - agent
  - fine-tuning
---

# ERP-DocIQ: back-office automation on small models

> **Build Small Hackathon · Practical track (Backyard AI).** A useful, problem-solving app
> that runs on hardware you own, built entirely on **open models ≤32B**, with **OpenBMB
> MiniCPM** as the load-bearing model (and fine-tuned for the domain).

## 💡 The idea

Retail back-offices drown in paperwork and report requests. **ERP-DocIQ** is an open-source, UiPath-style assistant that **reads your documents**, **answers questions about your ERP in plain English**, and **automates the boring clicks**, all on small models, with no per-robot license and no data leaving your tenancy.

## 🧩 The business problem

A retail brand's IT team runs **UiPath** for invoice/PO processing and portal automation. It's **expensive** (per-robot / AI-Unit licensing, six figures/yr at volume), **brittle** (selector recorders break on every UI change), **locked-in** (closed IDP models you can't swap or fine-tune), and **slow to change** (new layouts wait on a closed retraining toolchain). On top of that, every "what did we spend in Q2?" becomes a **BI ticket**. They wanted an AI-native, open, cheaper alternative they control.

## 🛠️ The solution approach: one Gradio app

1. **Read any document (OCR + IDP).** A hybrid pipeline, OCR → classify → extract → normalize → enrich (RAG) → validate → post / human-review, reads orders, receipts, invoices, contracts and complex multi-layer forms, **even messy scans/photos**, into structured ERP records.
2. **Ask your ERP reports (ERP DocIQ chatbot).** Natural-language **NLQ → SQL**, analytics, summaries and **"why"** reasoning over a simulated retail ERP (vendors · POs · invoices · GL · inventory · returns). Every figure comes from **real SQL over the data**: the model only *phrases* the answer, it never invents numbers.
3. **Automate the clicks (agentic browser).** A self-correcting, multi-step browser agent drives a portal (dashboard → Procurement → Create Order → read the complex form). It is selector-free and self-healing, replacing fragile RPA recorders.

## 🤖 Models used: small (≤32B), the right one for each job

Three labs, eight models, all under the cap. **MiniCPM is the core.**

| Lab | Model | Params | Role in ERP-DocIQ |
|---|---|---:|---|
| **OpenBMB** | **MiniCPM-V-4.6** | 8B | **primary OCR + vision extraction** (reads messy/rotated/scanned docs → JSON) |
| **OpenBMB** | MiniCPM-o-4.5 | 8B | alt omni VLM |
| **OpenBMB** | **MiniCPM3-4B** | **4B** | **ERP reasoning · NLQ→SQL · summarization, and the fine-tune target** |
| **Cohere** | Aya-Vision-8B / 32B | 8–32B | alt OCR / VQA backend (23 languages) |
| **Cohere** | Command R7B | 7B | alt RAG · NLQ · grounded reasoning |
| **Black Forest Labs** | FLUX.1 [dev]/[schnell] | 12B | image *generation* → synthetic OCR stress docs (honestly, not an OCR model) |

A single ~8B MiniCPM-V powers OCR **and** vision extraction **and** the grounded chat phrasing, so the whole product runs on small, swappable, open weights. `GET /api/models` lists which are live.

## 🎯 Fine-tuning: adapting a small model to the ERP domain

We fine-tune **OpenBMB MiniCPM3-4B (4B)** on an instruction dataset built from the ERP knowledgebase (`results/erp_sft.jsonl`):

- **Production:** a LoRA (PEFT/TRL `SFTTrainer`) recipe: `scripts/finetune_erp.py --backend hf`.
- **Offline CPU demo (runs anywhere):** trains the ERP NLQ-routing head on the same data with a real train/test split: **untrained 8.3% → fine-tuned 91.7% (+83 pts)**, 100% routed-SQL execution. See `results/erp_finetune_report.json`.

## 📈 Benefits / value delivered

- **Significantly lower inference cost** vs per-document RPA licensing (model routing + prompt caching).
- **No vendor lock-in, data stays on-prem**: open weights you can run, read and fine-tune.
- **Self-service ERP analytics** (NLQ) deflects routine BI/report tickets off the queue.
- **Reads documents classic OCR can't**: MiniCPM-V cuts character error **~5.6× vs Tesseract** (CER **2.6%** vs 14.7%; see `results/ocr_quality_report.json`); field-exact **90.7%**.
- **Honest, measured**: every claim is backed by a published eval/quality/fine-tune report.

## ✅ How it meets the Build Small criteria

- **≤32B params**: every model is ≤32B; the reasoning/NLQ engine + fine-tune target is **4B**.
- **Best MiniCPM Build**: MiniCPM-V (OCR + vision) and **MiniCPM3-4B** (reasoning/NLQ + fine-tuned) are the core of the experience; vision/omni variants qualify.
- **Best Agent**: multi-step, self-healing agentic browser automation + an IDP state graph.
- **Ships as a Gradio Space** in the Build Small org; runs offline (deterministic ERP engine + sidecar OCR) and upgrades to hosted MiniCPM when keys are set.

## ▶️ Use it (tabs)

- **Process a document**: pick an OCR backend (`auto`, `minicpm`, `tesseract`) + a sample (try `extreme_receipt_photo` or `complex_invoice_messy`) or upload your own → multi-layer fields + KPIs.
- **ERP DocIQ (chat)**: ask *"Why did spend rise in Q2 2026?"*, *"Top vendors by spend"*, *"late-payment rate"* → grounded answer + SQL + the fine-tuning panel.
- **Search (RAG)**: semantic vendor-master retrieval.
- **Web Automation**: multi-step browser flow.
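
The "grounded answer + SQL" behavior of the chat tab can be illustrated with a minimal sketch. Everything in it is hypothetical: the `invoices` table, its columns and rows, the keyword-based `route()` lookup and the `answer()` helper are stand-ins invented for illustration, not the app's actual schema or routing head. It only shows the pattern described above: the question is routed to a fixed SQL template, the figures come from executing that SQL, and the text layer merely formats the rows it was given.

```python
import sqlite3
from typing import Optional

# Hypothetical mini-ERP: schema and rows are invented for illustration only.
def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (vendor TEXT, quarter TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [
            ("Acme Textiles", "2026-Q1", 1200.0),
            ("Acme Textiles", "2026-Q2", 1800.0),
            ("Bolt Logistics", "2026-Q2", 700.0),
        ],
    )
    return conn

# Keyword -> SQL template. A trained routing head would replace this lookup
# (assumption: the real router is a model, not a substring match).
ROUTES = {
    "top vendors": (
        "SELECT vendor, SUM(amount) AS spend FROM invoices "
        "GROUP BY vendor ORDER BY spend DESC"
    ),
    "spend by quarter": (
        "SELECT quarter, SUM(amount) AS spend FROM invoices "
        "GROUP BY quarter ORDER BY quarter"
    ),
}

def route(question: str) -> Optional[str]:
    """Return the SQL template whose keyword appears in the question, if any."""
    q = question.lower()
    for keyword, sql in ROUTES.items():
        if keyword in q:
            return sql
    return None

def answer(conn: sqlite3.Connection, question: str) -> dict:
    """Execute the routed SQL and phrase the rows; no number is generated freely."""
    sql = route(question)
    if sql is None:
        return {"text": "No grounded query matches this question.", "sql": None, "rows": []}
    rows = conn.execute(sql).fetchall()
    text = "; ".join(f"{label}: {value:,.2f}" for label, value in rows)
    return {"text": text, "sql": sql, "rows": rows}

if __name__ == "__main__":
    result = answer(build_demo_db(), "Top vendors by spend")
    print(result["text"])
    print(result["sql"])
```

Returning the SQL alongside the text mirrors what the chat tab displays, and refusing unrouted questions keeps the phrasing layer from producing figures that no query backs.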
## 📦 Published results (`results/`)

- `ocr_quality_report.json`: OCR CER/WER + field accuracy per backend.
- `erp_finetune_report.json` + `erp_sft.jsonl`: fine-tune metrics + the instruction dataset.

## ⚙️ Configure (Space → Settings → Variables and secrets)

- Variables: `MINICPM_BASE_URL=https://api.modelbest.cn/v1`, `MINICPM_MODEL=MiniCPM-V-4.6-Instruct`
- Secret: `MINICPM_API_KEY=…` (OpenBMB/ModelBest). Tesseract ships via `packages.txt`.
- **Runs fully offline without a key**: ERP DocIQ uses its deterministic SQL engine and OCR falls back to the sidecar, so every tab works.

## 🎥 Demo & social

- **Demo video:** https://youtu.be/mWs7eRVH_GM
- **Social post:** https://www.linkedin.com/posts/kaniskamandal_huggingface-buildsmall-buildsmall-share-7472163579094401024-fjp9/
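
As a footnote to the Configure section, here is a minimal sketch of how the three settings could be read and how the "offline without a key" fallback might be decided. The `MiniCPMConfig` class, `load_config()` and `pick_ocr_backend()` are hypothetical names, the use of the documented values as defaults is an assumption, and `"sidecar"` is only a placeholder label for whatever offline OCR path the app actually uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class MiniCPMConfig:
    base_url: str
    model: str
    api_key: Optional[str]

    @property
    def hosted(self) -> bool:
        # Hosted inference is only possible when a key is present.
        return bool(self.api_key)

def load_config(env: Mapping[str, str] = os.environ) -> MiniCPMConfig:
    """Read the Space variables/secret; defaults here are assumed, not confirmed."""
    return MiniCPMConfig(
        base_url=env.get("MINICPM_BASE_URL", "https://api.modelbest.cn/v1"),
        model=env.get("MINICPM_MODEL", "MiniCPM-V-4.6-Instruct"),
        api_key=env.get("MINICPM_API_KEY") or None,  # empty string counts as unset
    )

def pick_ocr_backend(cfg: MiniCPMConfig) -> str:
    """Use hosted MiniCPM when a key is set, otherwise the offline path."""
    return "minicpm" if cfg.hosted else "sidecar"

if __name__ == "__main__":
    cfg = load_config()
    print(f"model={cfg.model} hosted={cfg.hosted} ocr={pick_ocr_backend(cfg)}")
```

Passing the environment in as a mapping keeps the decision testable without touching real Space secrets.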