# Teaching small AI models to run the back office Earlier this season we took part in **Build Small**, a hackathon run by **Hugging Face** and **Gradio**. The premise was refreshingly contrarian: don't reach for the biggest model you can rent — build something genuinely useful on models small enough to run on hardware you already own. That single rule changed how we thought about the whole project, and we came away convinced that "small" is a feature, not a compromise. Here's what we built, why it matters, and a few things we learned along the way that might be new to you too. ![The app, live on Hugging Face](https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/resolve/main/blog/01-space.png) ## The business problem Walk into the back office of almost any retailer and you'll find two things in abundance: paperwork, and people waiting on answers. Invoices, purchase orders, receipts and contracts arrive as scans, photos and PDFs that someone has to read and key in by hand. And every time a manager asks "why did our spending jump last quarter?" or "which suppliers keep paying us late?", that question turns into a ticket for the analytics team and a wait of hours or days. Most companies patch this with heavy automation suites. They work, but they are expensive to license, they break the moment a vendor changes their invoice layout or a web page moves a button, and they lock you into one company's way of doing things. The intelligence lives somewhere else, on someone else's servers, behind someone else's bill. We wanted to know: could a small, open model — one light enough to run on a single machine — do this work instead? ## The solution approach We built an assistant for the retail back office that does three jobs, and we leaned on small models for every one of them. **It reads documents.** This was the first surprise. A modern *vision* model can look at a messy, crumpled, rotated invoice the way a person does — not by guessing at fonts, but by actually understanding the layout — and hand back clean, structured fields: vendor, dates, line items, totals. The trick we'd encourage anyone to borrow is to *combine* two sources of truth. When a document already carries a digital text layer, we use it directly because it's exact and free; when it's a scan or a photo, the vision model reads the image. Fusing the two gives you the accuracy of real text where it exists and the flexibility of a vision model where it doesn't. Classic optical character recognition alone simply can't keep up with the messy real-world documents that pile up in a back office. ![A messy scanned invoice read into clean fields](https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/resolve/main/blog/02-read-document.png) **It answers questions in plain English.** This is the part we're proudest of, and the idea worth taking away. Instead of building yet another dashboard, we let people just *ask* — "why did spend rise last quarter?", "who are our top vendors?", "what's our late-payment situation?" The model translates the question into a database query, runs that query against the real data, and then explains the result in ordinary language. The important detail is the order of those steps: the numbers come from the database, not from the model's imagination. Because the model only narrates figures it was handed, it can't quietly make one up — a property that matters enormously the moment you trust a system with real money. We also kept a plain, rule-based path that answers the common questions even with no model running at all, so the assistant is dependable first and clever second. ![Ask in plain English; the model writes a query and explains the real answer](https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/resolve/main/blog/03-ask-erp.png) **We taught a small model the language of the business.** Out of the box, a general model doesn't know your vendors, your accounts, or how your team phrases a question. So we *fine-tuned* one — we took a small open model and trained it further on examples drawn from this specific domain. Here's the encouraging lesson: fine-tuning is no longer the preserve of giant labs with warehouses of GPUs. Using a lightweight technique that only nudges a small slice of the model, we could adapt it cheaply — and we built a version of the training that even runs on an ordinary laptop, so the before-and-after improvement is something anyone can reproduce rather than take on faith. **It automates the clicks, too.** For the tasks that still live in a browser, a small model drives the page itself — reading what's on screen, deciding the next step, and acting — instead of following a brittle, pre-recorded script that snaps the first time the layout shifts. ### How we shipped it — Gradio and the open-AI toolkit The whole thing is wrapped in a **Gradio** app, which is what made it shareable in an afternoon: Gradio turned our Python functions into a clean, hosted interface with tabs, file uploads and a chat box, with no front-end work. Behind it, **OpenBMB's MiniCPM** family is the workhorse — one compact model handles the document reading *and* the reasoning, reached through a simple, standard interface that means we could swap in a local server without rewriting a line. We drew on **Cohere's** open models for language tasks, and used **Black Forest Labs'** image model in a neat sideways way — to *generate* deliberately nasty test documents (warped photos, faxes) so we could prove the reader holds up under pressure. Dependable open libraries did the plumbing: one for pulling text out of PDFs, another for traditional character recognition as a fallback, and small, honest building blocks for storage and search. Every model we used sits comfortably under the hackathon's size limit. ## The benefits When the intelligence is small enough to run on your own hardware, three good things follow, and none of them require a number to appreciate: - **It stays yours.** Sensitive documents and financials never have to leave your own machines. Privacy stops being a policy you hope for and becomes a property of where the software runs. - **It costs less to operate and never locks you in.** There's no per-robot meter ticking, and because everything is open and swappable, you can adopt a better model the week it ships instead of waiting for a vendor's roadmap. - **It changes at your pace.** A new document layout or a new kind of question is a quick adjustment — often just more examples — not a support case with an outside company. - **People get answers directly.** The folks closest to the work can ask their own questions and trust the replies, because every answer is grounded in the real data. The bigger lesson of Build Small is the one we didn't expect to feel so strongly: the frontier isn't the only place real work gets done. A handful of small, open models — chosen carefully and pointed at a concrete problem — can quietly run a back office, on a machine that fits under a desk. ## Thanks Our gratitude to the organizers, **Hugging Face** and **Gradio**, for running Build Small and for making the case that smaller, local and open is a direction worth taking seriously — and to the participating partners whose open models we leaned on, especially **OpenBMB**, **Cohere** and **Black Forest Labs**. ## Links & references - **Live app (try it):** https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ - **Project files & code on the Space:** https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/tree/main - **Demo video:** https://youtu.be/mWs7eRVH_GM - **Source code (GitHub):** https://github.com/agency-world/Project-Aperture - **The hackathon — Build Small field guide:** https://huggingface.co/spaces/build-small-hackathon/field-guide