# Teaching small AI models to run the back office

Earlier this season we took part in **Build Small**, a hackathon run by **Hugging Face** and
**Gradio**. The premise was refreshingly contrarian: don't reach for the biggest model you can
rent — build something genuinely useful on models small enough to run on hardware you already own.
That single rule changed how we thought about the whole project, and we came away convinced that
"small" is a feature, not a compromise.

Here's what we built, why it matters, and a few things we learned along the way that might be new
to you too.

![The app, live on Hugging Face](https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/resolve/main/blog/01-space.png)

## The business problem

Walk into the back office of almost any retailer and you'll find two things in abundance:
paperwork, and people waiting on answers. Invoices, purchase orders, receipts and contracts
arrive as scans, photos and PDFs that someone has to read and key in by hand. And every time a
manager asks "why did our spending jump last quarter?" or "which suppliers keep paying us late?",
that question turns into a ticket for the analytics team and a wait of hours or days.

Most companies patch this with heavy automation suites. They work, but they are expensive to
license, they break the moment a vendor changes their invoice layout or a web page moves a button,
and they lock you into one company's way of doing things. The intelligence lives somewhere else,
on someone else's servers, behind someone else's bill.

We wanted to know: could a small, open model — one light enough to run on a single machine — do
this work instead?

## The solution approach

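We built an assistant for the retail back office that does three jobs: reading documents,
answering questions, and automating browser clicks. We leaned on small models for every one of
them, and fine-tuned one of those models on the domain along the way.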

**It reads documents.** This was the first surprise. A modern *vision* model can look at a messy,
crumpled, rotated invoice the way a person does — not by guessing at fonts, but by actually
understanding the layout — and hand back clean, structured fields: vendor, dates, line items,
totals. The trick we'd encourage anyone to borrow is to *combine* two sources of truth. When a
document already carries a digital text layer, we use it directly because it's exact and free; when
it's a scan or a photo, the vision model reads the image. Fusing the two gives you the accuracy of
real text where it exists and the flexibility of a vision model where it doesn't. Classic optical
character recognition alone simply can't keep up with the messy real-world documents that pile up
in a back office.

![A messy scanned invoice read into clean fields](https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/resolve/main/blog/02-read-document.png)
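
The "two sources of truth" idea can be sketched in a few lines. This is a minimal illustration, not the app's actual code: `read_page`, `PageRead`, the `min_chars` threshold and the stand-in `fake_vision_reader` are all hypothetical names, and the sketch simply chooses one source per page, which is the simplest form the combination could take.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PageRead:
    text: str
    source: str  # "text_layer" or "vision"


def read_page(
    text_layer: Optional[str],
    image_bytes: bytes,
    vision_reader: Callable[[bytes], str],
    min_chars: int = 20,  # assumed threshold for "this page has a usable text layer"
) -> PageRead:
    """Prefer the exact embedded text; fall back to the vision model for scans."""
    embedded = (text_layer or "").strip()
    if len(embedded) >= min_chars:
        return PageRead(text=embedded, source="text_layer")
    return PageRead(text=vision_reader(image_bytes).strip(), source="vision")


def fake_vision_reader(image_bytes: bytes) -> str:
    # Stand-in for a real vision-model call (hypothetical output).
    return "Vendor: Acme Supplies  Total: 120.50"


digital = read_page(
    "Invoice 1042\nVendor: Northwind Traders\nTotal: 89.00", b"", fake_vision_reader
)
scanned = read_page(None, b"raw image bytes", fake_vision_reader)
print(digital.source, scanned.source)
```

Passing the vision reader in as a function keeps the routing logic testable without a model running.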

**It answers questions in plain English.** This is the part we're proudest of, and the idea worth
taking away. Instead of building yet another dashboard, we let people just *ask* — "why did spend
rise last quarter?", "who are our top vendors?", "what's our late-payment situation?" The model
translates the question into a database query, runs that query against the real data, and then
explains the result in ordinary language. The important detail is the order of those steps: the
numbers come from the database, not from the model's imagination. Because the model only narrates
figures it was handed, it has little room to quietly make one up, and any figure it states can be
checked against the query result. That matters enormously the moment you trust a system with real
money. We also kept a plain, rule-based path that answers the common
questions even with no model running at all, so the assistant is dependable first and clever second.

![Ask in plain English; the model writes a query and explains the real answer](https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/resolve/main/blog/03-ask-erp.png)
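
Here is a rough sketch of that order of operations, using Python's built-in `sqlite3`. The table, the `RULES` mapping and the function names are assumptions made for illustration. The narration here is a plain template rather than a model, and checking the rules before the model is our choice for the sketch.

```python
import sqlite3

# Hypothetical rule-based path: a known phrase maps to a fixed, reviewed query.
RULES = {
    "top vendors": (
        "SELECT vendor, SUM(amount) AS total FROM invoices "
        "GROUP BY vendor ORDER BY total DESC LIMIT 3"
    ),
}


def question_to_sql(question, model=None):
    """Try the rules first; only then ask the (optional) model for a query."""
    lowered = question.lower()
    for phrase, sql in RULES.items():
        if phrase in lowered:
            return sql
    return model(question) if model is not None else None


def answer(conn, question, model=None):
    sql = question_to_sql(question, model)
    if sql is None:
        return "I don't have a query for that question yet."
    if not sql.lstrip().lower().startswith("select"):
        return "Refusing to run anything other than a SELECT."
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    if not rows:
        return "The query returned no rows."
    # Narrate only what the database returned; no figure is invented here.
    facts = "; ".join(
        ", ".join(f"{name}={value}" for name, value in zip(columns, row))
        for row in rows
    )
    return f"{len(rows)} row(s) from the database: {facts}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("Acme", 100.0), ("Acme", 50.0), ("Northwind", 75.0), ("Globex", 20.0)],
)
result = answer(conn, "Who are our top vendors?")
print(result)
```

In a fuller version the final string would be handed to the model to rephrase, but the figures would still come from `rows`.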

**We taught a small model the language of the business.** Out of the box, a general model doesn't
know your vendors, your accounts, or how your team phrases a question. So we *fine-tuned* one — we
took a small open model and trained it further on examples drawn from this specific domain. Here's
the encouraging lesson: fine-tuning is no longer the preserve of giant labs with warehouses of
GPUs. Using a lightweight technique that only nudges a small slice of the model, we could adapt it
cheaply — and we built a version of the training that even runs on an ordinary laptop, so the
before-and-after improvement is something anyone can reproduce rather than take on faith.
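
To make "only nudges a small slice of the model" concrete, here is a small sketch of one widely used technique of this kind, a low-rank adapter (LoRA-style). We are assuming that technique for illustration only; the layer sizes, rank and function names below are hypothetical, and the code is plain Python rather than a training script.

```python
def full_finetune_params(d_in, d_out):
    """Weights updated when a whole d_out x d_in layer is trained."""
    return d_in * d_out


def low_rank_adapter_params(d_in, d_out, rank):
    """Weights updated when only two thin matrices A (rank x d_in) and B (d_out x rank) are trained."""
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def adapted_forward(x, W, A, B, scale=1.0):
    """y = W x + scale * B (A x). W stays frozen; only A and B would be trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


full = full_finetune_params(1024, 1024)
small = low_rank_adapter_params(1024, 1024, rank=8)
print(full, small, small / full)

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights
A = [[1.0, 1.0]]               # rank-1 adapter, input side
B_zero = [[0.0], [0.0]]        # a zero B leaves the base model unchanged
B_trained = [[0.5], [0.0]]     # a made-up "trained" value
print(adapted_forward([2.0, 3.0], W, A, B_zero))
print(adapted_forward([2.0, 3.0], W, A, B_trained))
```

For a 1024 by 1024 layer at rank 8, the adapter trains 16,384 weights instead of 1,048,576, which is why this kind of adaptation can fit on modest hardware.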

**It automates the clicks, too.** For the tasks that still live in a browser, a small model drives
the page itself — reading what's on screen, deciding the next step, and acting — instead of
following a brittle, pre-recorded script that snaps the first time the layout shifts.
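
The read, decide, act cycle is easy to sketch as a loop. Everything here is a stand-in: `run_task`, `FakePage` and `fake_decide` are hypothetical names. In a real setup `observe` and `act` would talk to a browser and `decide` would call the model.

```python
from typing import Callable, Dict, List

Action = Dict[str, str]  # e.g. {"type": "click", "target": "Export invoices"}


def run_task(
    goal: str,
    observe: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,  # hard cap so a confused model cannot loop forever
) -> List[Action]:
    """Look at the page, pick the next action, perform it, repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        page = observe()
        action = decide(goal, page, history)
        if action["type"] == "done":
            break
        act(action)
        history.append(action)
    return history


class FakePage:
    """Tiny stand-in for a browser: each action advances to the next screen."""

    def __init__(self):
        self.screens = ["[Login]", "[Export invoices]", "Export complete"]
        self.index = 0

    def observe(self) -> str:
        return self.screens[self.index]

    def act(self, action: Action) -> None:
        self.index = min(self.index + 1, len(self.screens) - 1)


def fake_decide(goal: str, page: str, history: List[Action]) -> Action:
    # Stand-in for the model: click whatever button label is on screen.
    if "[" in page:
        label = page[page.index("[") + 1 : page.index("]")]
        return {"type": "click", "target": label}
    return {"type": "done", "target": ""}


page = FakePage()
steps = run_task("export invoices", page.observe, fake_decide, page.act)
print([step["target"] for step in steps])
```

Because the decision is made from what is on screen at each step, a moved button changes the observation rather than breaking a recorded script.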

### How we shipped it: Gradio and the open AI toolkit

The whole thing is wrapped in a **Gradio** app, which is what made it shareable in an afternoon:
Gradio turned our Python functions into a clean, hosted interface with tabs, file uploads and a
chat box, with no front-end work. Behind it, **OpenBMB's MiniCPM** family is the workhorse — one
compact model handles the document reading *and* the reasoning. We reach it through a simple,
standard interface, so we could swap in a local server without rewriting a line. We drew on
**Cohere's** open models for language tasks, and used **Black Forest Labs'** image model in a neat
sideways way — to *generate* deliberately nasty test documents (warped photos, faxes) so we could
prove the reader holds up under pressure. Dependable open libraries did the plumbing: one for
pulling text out of PDFs, another for traditional character recognition as a fallback, and small,
honest building blocks for storage and search. Every model we used sits comfortably under the
hackathon's size limit.
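
To give a feel for how little glue this takes, here is a minimal Gradio sketch with two tabs, a file upload and a chat box. It is not the app's real layout: `read_document` and `answer` are placeholder handlers we made up, and a real version would call the document reader and the query pipeline instead.

```python
import os
from typing import Dict, List, Optional


def read_document(path: Optional[str]) -> Dict[str, str]:
    """Placeholder handler: a real one would return the extracted fields."""
    if not path:
        return {"status": "no file uploaded"}
    return {"status": "received", "file": os.path.basename(path)}


def answer(message: str, history: List) -> str:
    """Placeholder chat handler: a real one would run a query and narrate it."""
    return f"You asked: {message}"


def build_app():
    # Imported lazily so the handlers above can be tested without Gradio installed.
    import gradio as gr

    with gr.Blocks(title="Back-office assistant (sketch)") as demo:
        with gr.Tab("Read a document"):
            upload = gr.File(label="Invoice (PDF or image)", type="filepath")
            fields = gr.JSON(label="Extracted fields")
            gr.Button("Read").click(read_document, inputs=upload, outputs=fields)
        with gr.Tab("Ask"):
            gr.ChatInterface(fn=answer)
    return demo
```

With Gradio installed, `build_app().launch()` serves the interface locally.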

## The benefits

When the intelligence is small enough to run on your own hardware, four good things follow, and
none of them requires a number to appreciate:

- **It stays yours.** Sensitive documents and financials never have to leave your own machines.
  Privacy stops being a policy you hope for and becomes a property of where the software runs.
- **It costs less to operate and never locks you in.** There's no per-robot meter ticking, and
  because everything is open and swappable, you can adopt a better model the week it ships instead
  of waiting for a vendor's roadmap.
- **It changes at your pace.** A new document layout or a new kind of question is a quick
  adjustment — often just more examples — not a support case with an outside company.
- **People get answers directly.** The folks closest to the work can ask their own questions and
  trust the replies, because every answer is grounded in the real data.

The bigger lesson of Build Small is the one we didn't expect to feel so strongly: the frontier
isn't the only place real work gets done. A handful of small, open models — chosen carefully and
pointed at a concrete problem — can quietly run a back office, on a machine that fits under a desk.

## Thanks

Our gratitude to the organizers, **Hugging Face** and **Gradio**, for running Build Small and for
making the case that smaller, local and open is a direction worth taking seriously — and to the
participating partners whose open models we leaned on, especially **OpenBMB**, **Cohere** and
**Black Forest Labs**.

## Links & references

- **Live app (try it):** https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ
- **Project files & code on the Space:** https://huggingface.co/spaces/build-small-hackathon/ERP-DocIQ/tree/main
- **Demo video:** https://youtu.be/mWs7eRVH_GM
- **Source code (GitHub):** https://github.com/agency-world/Project-Aperture
- **The hackathon — Build Small field guide:** https://huggingface.co/spaces/build-small-hackathon/field-guide