title: HearthNet-Nemotron
emoji: π¬
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 6.16.0
python_version: '3.10'
app_file: app_nemotron.py
pinned: true
short_description: Nemotron document intelligence - HearthNet companion
tags:
- nemotron
- nvidia
- document-intelligence
- off-brand
- tiny-titan
license: apache-2.0
HearthNet · Document Intelligence
Companion Space to HearthNet, the main community AI mesh. This Space extends the mesh with NVIDIA Nemotron-powered document intelligence: structured extraction, Q&A, summarisation, and one-click RAG ingest into any mesh node. When no `NVIDIA_API_KEY` is set, it falls back to SmolLM2-135M running locally (no API key needed).
NVIDIA Nemotron Document Intelligence · Part of the HearthNet Mesh
Local-First · Peer-to-Peer · Offline-Capable · Emergency-Ready
Build Small Hackathon entry: Backyard AI track · Tiny Titan · Best Agent. Press `e` or `a` to see the easter egg.
Demo video: HF Space Recording · Simple Show Demo
Social post: tweet on X
June 14 bug-fix release: 8 critical bugs fixed. The seed corpus is now actually ingested, the node lifecycle is corrected (`stop()` previously silently no-oped), the sticky session memory leak is patched, and corpus writes go to the right directory. See hackathon_final_step.md for the full analysis.
The Idea
What happens to your neighbourhood's AI when the power grid flickers, the ISP goes down, or the cloud API bill hits?
HearthNet answers: nothing changes. It keeps running.
Every household with a Raspberry Pi, an old laptop, or any device running Python becomes a node in a local AI mesh. Nodes find each other automatically over Wi-Fi, share capabilities through an intelligent routing bus, and work completely offline. When the internet is available, nodes automatically route requests to the best provider, whether local, nearby on the LAN, or across the internet via relay. You see exactly which node answered.
- A neighbourhood of 10 homes gets up to 10× the AI capacity of any single device
- Offline-first: all features work without internet; internet is optional for mesh expansion
- Transparent routing: every Ask/Chat/RAG request shows which node served it (local or remote)
- Ask questions, share knowledge, send messages, coordinate emergency response, all offline
- No cloud account, no API key, no monthly bill: just hardware you already own
Features
Agent Mode (ReAct tool calling)
Flip the Agent mode toggle in the Ask tab and the model stops being a chatbot and starts being an agent: it plans, calls real mesh tools over several steps, reads the results, and only then answers. Every step is shown live: Thought → Tool → Observation → Answer.
The agent's tools are bound to real capabilities already on the bus (no mock handlers): `search_corpus` (RAG), `list_corpora`, `translate`, `list_marketplace`, `route_expert` (MoE), and `identify_plant` (vision). The loop uses a JSON `action:` protocol that works even on tiny models with no native function calling, so a 4B MiniCPM or Nemotron Mini can drive the same agent as a 49B reasoner. There is also a fully in-browser WebLLM agent (WebGPU, zero server) for true offline tool use.
Tip: to try the browser agent, press `a` (or just type `hearthnet`) anywhere on the dashboard to open the in-browser WebLLM agent showcase. Press `e` for the live mesh/news ticker and `Esc` to close.
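The Thought → Tool → Observation → Answer loop can be sketched in a few lines. This is an illustration under assumptions, not HearthNet's agent code: `run_agent`, the `{"action": ..., "input": ...}` reply shape, the `final_answer` action name, and the stub tool are all hypothetical. Only the tool name `search_corpus` comes from the text above.

```python
import json
from typing import Callable

# Hypothetical tool registry; in HearthNet these would be bus capabilities.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_corpus": lambda q: f"(stub) top passage for '{q}'",
}

def run_agent(llm: Callable[[str], str], question: str, max_steps: int = 4):
    """Drive a model that replies with exactly one JSON action per turn."""
    transcript = f"Question: {question}\n"
    trace: list[str] = []
    for _ in range(max_steps):
        reply = llm(transcript)
        try:
            step = json.loads(reply)
            action, arg = step["action"], step.get("input", "")
        except (json.JSONDecodeError, KeyError, TypeError):
            # Tiny models sometimes emit malformed JSON; feed the error back.
            transcript += "Observation: reply was not a valid JSON action\n"
            trace.append("invalid")
            continue
        if action == "final_answer":
            trace.append("answer")
            return arg, trace
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool '{action}'"
        trace.append(f"tool:{action}")
        transcript += f"Action: {reply}\nObservation: {observation}\n"
    return "No answer within the step budget.", trace

def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: search once, then answer."""
    if "Observation:" not in transcript:
        return json.dumps({"action": "search_corpus", "input": "water purification"})
    return json.dumps({"action": "final_answer", "input": "Boil water for one minute."})

answer, trace = run_agent(scripted_llm, "How do I purify water?")
```

Because the protocol is plain JSON in the model's text output, it needs no native function-calling support, which is what lets small models drive the loop.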
Intelligent Routing (NEW)
When you ask a question, the bus scores available LLM nodes by latency, load, and reliability. Your request goes to whichever node scores best at that moment, whether it's local, your neighbour's device, or a peer across the internet. Failover is automatic: if the preferred node can't help, the next-best provider takes over without any action on your part.
Routing Trace shows you exactly where your request was served:
- Local: Answered by this device
- Remote (node-id): Routed to a peer node (LAN or internet)
- Error: No suitable provider found
Chat Over LAN & Internet
Direct 1:1 messages work completely offline on your Wi-Fi. Connect to the internet (via relay hub on HF Space) and chat with anyone in the mesh, regardless of network. No accounts, no passwords: just show them your QR code.
Federated RAG
Share a corpus of documents with your community. Any node can search across all available corpora automatically, with results ranked by relevance. Works offline on local copies; syncs and queries remote corpora when internet is available.
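As a rough sketch of the "search all corpora, rank by relevance" step, the merge below combines per-corpus hits into one top-k list. The function name `merge_hits` and the `(score, text)` hit shape are assumptions for illustration, and the sketch presumes scores are comparable across corpora (for example, produced by the same embedding model).

```python
import heapq

def merge_hits(per_corpus: dict[str, list[tuple[float, str]]], k: int = 3) -> list[dict]:
    """Merge per-corpus (score, text) hits into one top-k list, best first."""
    tagged = (
        (score, corpus, text)
        for corpus, hits in per_corpus.items()
        for score, text in hits
    )
    top = heapq.nlargest(k, tagged, key=lambda hit: hit[0])
    return [{"corpus": c, "score": s, "text": t} for s, c, t in top]

results = merge_hits(
    {
        "community": [(0.91, "Boil water for one minute."), (0.40, "Store water in clean jugs.")],
        "first-aid": [(0.77, "Cool burns under running water.")],
    },
    k=2,
)
```

Keeping the corpus name on each hit preserves the provenance a user needs to judge a result that came from a remote node.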
MoE Expert Routing
Nodes advertise their specialisations. Queries automatically route to the best experts in your mesh for better answers.
Emergency Mode
When connectivity drops, the UI automatically switches to degraded mode and nodes keep working offline. When connectivity is restored, changes sync. Perfect for neighbourhood coordination during outages.
Screenshots
- Ask the Mesh
- Live Peer Topology
- Routing Trace
- Community Marketplace
- Direct Messages
- Invite QR Code
- Emergency Mode
- All 8 Tabs
Downloads & Builds
Get HearthNet for your platform:
| Platform | Download | Format | Size | Notes |
|---|---|---|---|---|
| Android (PWA) | Web App | Web | ~5MB | Install from browser - no download needed |
| Android (Native) | app-debug.apk | APK | 3.56MB | Native Android app via USB or direct install |
| Windows Desktop | HearthNet.exe | EXE | 212MB | Standalone executable - download & run |
| Linux Desktop | python build/quickstart.py linux | AppImage | ~120MB | Build on Linux or use script |
| macOS Desktop | python build/quickstart.py macos | .app | ~200MB | Native macOS app bundle |
| Python (Any OS) | Source | Python | - | python app.py - full mesh node |
| Docker | Dockerfile | Container | 2GB | docker run -p 7860:7860 hearthnet:latest |
| Guides & Docs | BUILD_GUIDE.md | Markdown | - | How to build for each platform |
Recommended Paths:
- Fastest (5 min): PWA Web App - instant, no install
- Desktop (3 min): Download EXE/AppImage and run
- Server: Docker container deployment
- See BUILD_GUIDE.md for detailed instructions
Quick Start
# Clone and install
git clone https://huggingface.co/spaces/build-small-hackathon/HearthNet
cd HearthNet
pip install -e ".[dev]"
# Run
python app.py # open http://127.0.0.1:7860
With llama.cpp (recommended: fast, offline)
# 1. Download a GGUF model
wget https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
# 2. Start llama.cpp server
./llama-server -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf --port 8080
# 3. Run HearthNet (auto-detects llama.cpp on port 8080)
python app.py
Why llama.cpp?
- Fast inference on CPU (no GPU required)
- Runs capable models offline (an 8B-parameter Q4 model fits in 16 GB of RAM)
- GGUF format is efficient and portable
- No API key, no cloud, no network round-trip
Alternative: Ollama
ollama pull llama3.2:3b # any Ollama model works
python app.py # auto-detects Ollama
On Android (PWA - Recommended)
# 1. Start HearthNet on your computer (Windows, Mac, or Linux)
python app.py
# 2. Find your computer IP address
# Windows: ipconfig | findstr IPv4
# Mac/Linux: ifconfig | grep "inet " | grep -v 127
# 3. Open on Android device in Chrome/Firefox:
# http://<YOUR_IP>:7860
# 4. Tap menu → "Install app" or "Add to Home screen"
Full Android Setup Guide: ANDROID_DEPLOYMENT_GUIDE.md
- PWA (instant, no build)
- Native APK (optional, advanced)
Connect your local node to the live HF Space
# Get an invite code from the Space Settings tab
# Then redeem it locally:
python -m hearthnet.cli invite redeem \
"hnvite://v1/hf-space-1c95381d?host=build-small-hackathon-hearthnet.hf.space&port=443&transport=https&level=member"
python -m hearthnet.cli peers # Space node should appear
How It Works
Capability Bus
Every feature is a named capability on the bus. Any node can call any capability; the bus routes to the best available provider automatically:
# LLM inference: routes to the fastest/best node in the mesh
result = await bus.call("llm.chat", (1, 0), {
    "input": {"messages": [{"role": "user", "content": "What plants grow near water?"}]}
})
# RAG: routes to the node holding that corpus
result = await bus.call("rag.query", (1, 0), {
    "params": {"corpus": "community"},
    "input": {"query": "emergency water purification", "k": 3}
})
# Or from the CLI (no Python needed)
python -m hearthnet.cli call llm.chat 1 0 '{"input":{"messages":[{"role":"user","content":"Hello!"}]}}'
python -m hearthnet.cli capabilities # list all available capabilities across mesh
Zero-Config Discovery
# Device 1 β already running
python app.py
# Device 2 β same Wi-Fi
python app.py
# Both nodes see each other in ~5 seconds (mDNS + UDP broadcast)
# No IP addresses, no router config, no firewall rules
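To make the UDP-broadcast half of discovery concrete, here is a minimal sketch of a beacon a node could announce on the LAN. Everything in it is an assumption for illustration: the port number, the JSON payload fields, and the function names are hypothetical and do not describe HearthNet's wire format. The mDNS half is not shown.

```python
import json
import socket
from typing import Optional

BEACON_PORT = 47474  # hypothetical port, not taken from HearthNet

def encode_beacon(node_id: str, http_port: int) -> bytes:
    """Serialise a tiny 'I am here' message."""
    return json.dumps({"hearthnet": 1, "node_id": node_id, "port": http_port}).encode("utf-8")

def decode_beacon(data: bytes) -> Optional[dict]:
    """Return the beacon payload, or None for unrelated or garbled datagrams."""
    try:
        msg = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(msg, dict) or msg.get("hearthnet") != 1:
        return None
    return msg

def announce(node_id: str, http_port: int) -> None:
    """Broadcast one beacon on the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(encode_beacon(node_id, http_port), ("255.255.255.255", BEACON_PORT))
```

A listener would bind a UDP socket to the same port, pass each datagram through `decode_beacon`, and add the sender's address to its peer list. A real deployment should also authenticate beacons, for example with the signed manifests described under Security.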
Intelligent Routing & Failover
When you ask a question:
- Scoring: The bus evaluates all LLM providers by latency, load, reliability, and local preference
- Selection: The request goes to the best provider
- Failover: If that node can't help (error or unavailable), the bus automatically tries the next-best alternative
- Tracing: The result includes `_routed_via`, showing which node served it
# Node A has no LLM backend (would normally fail)
# Node B has llama.cpp running
# You ask Node A a question → Node A routes to Node B → B answers → A shows you the result
# Tracing shows: "_routed_via": "node-b-id"
result = await bus.call("llm.chat", (1, 0), {...})
# result includes "_routed_via": "node-b-id", which shows the true origin
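The scoring, selection, failover, and tracing steps can be sketched as below. The `Provider` fields, the weights in `score`, and the `no_provider` error payload are illustrative assumptions. Only the four criteria and the `_routed_via` stamp come from the description above.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    node_id: str
    latency_ms: float
    load: float          # 0.0 (idle) to 1.0 (saturated)
    reliability: float   # fraction of recent calls that succeeded
    is_local: bool = False

def score(p: Provider) -> float:
    """Higher is better. The weights are assumptions, not HearthNet's values."""
    latency_term = 1.0 / (1.0 + p.latency_ms / 100.0)
    local_bonus = 0.1 if p.is_local else 0.0
    return 0.4 * latency_term + 0.3 * (1.0 - p.load) + 0.3 * p.reliability + local_bonus

def call_with_failover(providers, handler):
    """Try providers best-first and stamp the result with the serving node."""
    errors = []
    for p in sorted(providers, key=score, reverse=True):
        try:
            result = dict(handler(p))
        except Exception as exc:  # any provider failure triggers failover
            errors.append(f"{p.node_id}: {exc}")
            continue
        result["_routed_via"] = p.node_id
        return result
    return {"error": "no_provider", "details": errors}

def handler(p: Provider) -> dict:
    """Stand-in for a remote call: node-a has no LLM backend."""
    if p.node_id == "node-a":
        raise RuntimeError("no LLM backend")
    return {"output": "Reeds and willows grow near water."}

providers = [
    Provider("node-a", latency_ms=1, load=0.1, reliability=0.99, is_local=True),
    Provider("node-b", latency_ms=40, load=0.3, reliability=0.95),
]
result = call_with_failover(providers, handler)
```

Here node-a scores highest because it is local, fails, and the call falls through to node-b, which is the scenario in the comments above.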
MoE Expert Routing
Nodes advertise specialisations. Queries route to the best expert automatically:
# A medical Raspberry Pi registers itself:
await bus.call("moe.register", (1, 0), {
"input": {
"expert_id": "model:medical-pi",
"topic_tags": ["first_aid", "medication", "triage"],
"confidence_score": 0.90,
}
})
# Any node's medical query now routes there:
result = await bus.call("moe.route", (1, 0), {
"input": {"query": "emergency first aid for burns", "top_k": 3}
})
# → {"candidates": [{"expert_id": "model:medical-pi", "score": 0.94}]}
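One simple way to match queries to advertised specialisations is tag overlap weighted by confidence, sketched below. This is a stand-in technique chosen for illustration. The README does not say how `moe.route` computes its score, so the numbers here will not match the 0.94 shown above, and the in-memory `EXPERTS` registry and both function names are hypothetical.

```python
import re

EXPERTS = {}  # expert_id -> {"topic_tags": [...], "confidence_score": float}

def register(expert_id, topic_tags, confidence_score):
    """Record an expert's advertised tags and confidence."""
    EXPERTS[expert_id] = {"topic_tags": topic_tags, "confidence_score": confidence_score}

def route(query, top_k=3):
    """Rank experts by (fraction of tags found in the query) x confidence."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    candidates = []
    for expert_id, info in EXPERTS.items():
        # A tag such as "first_aid" matches when all of its words appear in the query.
        hits = sum(1 for tag in info["topic_tags"] if set(tag.split("_")) <= words)
        if hits:
            value = round(info["confidence_score"] * hits / len(info["topic_tags"]), 3)
            candidates.append({"expert_id": expert_id, "score": value})
    candidates.sort(key=lambda c: c["score"], reverse=True)
    return {"candidates": candidates[:top_k]}

register("model:medical-pi", ["first_aid", "medication", "triage"], 0.90)
register("model:garden-pi", ["plants", "soil"], 0.80)
routed = route("emergency first aid for burns")
```

Keyword overlap misses paraphrases ("scald" versus "burn"), so an embedding-based matcher would generalise better at the cost of a model load.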
Offline Model Distribution
A node without internet pulls model weights from a LAN peer, chunk by chunk:
models = await bus.call("model.list", (1, 0), {"input": {}})
job = await bus.call("model.pull", (1, 0), {
"input": {"model_name": "llama3.2:3b", "source_node": "peer-id"}
})
# Progress via model.status; BLAKE3 content-addressed so never duplicated
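The "content-addressed so never duplicated" property can be shown with a short sketch. BLAKE3 is not in the Python standard library, so this example substitutes `hashlib.blake2b` purely to stay self-contained. The chunk size and both helper names are also assumptions.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for the demo; a real transfer would use far larger chunks

def chunk_ids(blob: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a blob into chunks and return each chunk's content address."""
    # blake2b is a stand-in for BLAKE3 here.
    return [
        hashlib.blake2b(blob[i:i + chunk_size], digest_size=16).hexdigest()
        for i in range(0, len(blob), chunk_size)
    ]

def missing_chunks(wanted: list[str], store: dict[str, bytes]) -> list[str]:
    """Addresses the receiver still has to pull; held or repeated ones are skipped."""
    seen = set(store)
    to_pull = []
    for cid in wanted:
        if cid not in seen:
            seen.add(cid)
            to_pull.append(cid)
    return to_pull

weights = b"AAAABBBBAAAACCCC"
ids = chunk_ids(weights)
store = {ids[1]: b"BBBB"}            # receiver already holds the BBBB chunk
to_pull = missing_chunks(ids, store)
```

Identical chunks hash to the same address, so the repeated `AAAA` chunk is fetched once and the chunk already in the store is not fetched at all. The receiver can also re-hash each chunk on arrival to detect tampering.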
What Makes This "Tiny"
The HF Space demo uses MiniCPM3-4B: 4B params, strong instruction following, and well under the 32B Tiny Titan limit. Set MODEL_ID=HuggingFaceTB/SmolLM2-135M-Instruct to run the 135M ultra-light mode on Pi-class devices.
For local installs, any GGUF model works (1B–8B gives significantly better quality). The architecture is model-agnostic; the routing layer handles the rest.
Real semantic RAG, not a toy: when sentence-transformers is installed, the embedding service loads BAAI/bge-small-en-v1.5 (~130 MB, CPU-friendly) so rag.query performs genuine semantic retrieval. Without it, the service falls back to a deterministic hash embedder and says so; there is no silent fakery.
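For readers unfamiliar with hash embedders, the sketch below shows the general technique (feature hashing over tokens). It is not HearthNet's `SimpleHashBackend`: the dimension, the choice of SHA-256, and the function names are assumptions.

```python
import hashlib
import math

def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Deterministic bag-of-words embedding via feature hashing, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim   # which bucket the token lands in
        sign = 1.0 if digest[4] % 2 == 0 else -1.0         # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))

e = hash_embed("water")
```

Such vectors only register overlap between identical tokens, with no notion of synonyms or meaning, which is why it matters that the fallback is reported rather than used silently.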
Why this qualifies for Tiny Titan: A full mesh of 10 Raspberry Pi 4 nodes (4 GB RAM each) can run:
- 135M model locally per node (always available, no network hop)
- Load-balanced routing for larger models across the mesh
- Full offline capability: discovery, RAG, chat, marketplace β no internet needed
Local AI Backends
No mocks. No fake responses. Real local inference only.
HearthNet prioritizes local, private models. Cloud backends are opt-in only (env vars).
Local Backends (Primary)
| Backend | Activation | Notes |
|---|---|---|
| llama.cpp (recommended) | Start server on port 8080 + auto-detect | Any GGUF model; fastest on CPU |
| Ollama | ollama pull llama3.2:3b + auto-detect | 70+ models, easy management |
| HF Transformers | Default on HF Space (no config needed) | MiniCPM3-4B (override with MODEL_ID) |
| OpenBMB / MiniCPM | MINICPM_URL env var (local server) | Local-first, OpenAI-compatible API |
Optional Cloud Backends (Opt-In via Env Vars)
| Backend | Activation | Notes |
|---|---|---|
| NVIDIA Nemotron | NVIDIA_API_KEY env var | For RTX nodes: nemotron-70b/mini-4b |
| Modal | MODAL_ENDPOINT env var | Serverless GPU inference |
| OpenAI API | OPENAI_API_KEY env var | Fallback only; not recommended for offline mesh |
All configured backends are registered on the llm.chat capability. The routing bus selects the best backend based on:
- Local first: llama.cpp, Ollama, HF Transformers always preferred
- Load & latency: If you have multiple local nodes, asks route to the least-busy one
- Failover: If local is unavailable and you have internet, remote nodes or cloud backends are tried
If no suitable backend is available, a clear error message is returned. Never silent, never fabricated.
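The opt-in rule for cloud backends amounts to "only register what the environment explicitly enables". A minimal sketch, assuming the environment variable names from the tables above. The backend labels and the function name are hypothetical, and a real node would additionally probe whether llama.cpp or Ollama is actually running.

```python
def configured_backends(env: dict[str, str]) -> list[str]:
    """Return candidate backends in preference order: local first, cloud only if opted in."""
    backends = ["llama.cpp", "ollama", "hf-transformers"]  # local candidates, probed at runtime
    if env.get("MINICPM_URL"):
        backends.append("minicpm")
    opt_in = [
        ("NVIDIA_API_KEY", "nemotron"),
        ("MODAL_ENDPOINT", "modal"),
        ("OPENAI_API_KEY", "openai"),
    ]
    for var, name in opt_in:
        if env.get(var):  # unset or empty means the backend stays disabled
            backends.append(name)
    return backends

default_order = configured_backends({})
with_nvidia = configured_backends({"NVIDIA_API_KEY": "example-key"})
```

In an application you would pass `dict(os.environ)`. Taking the mapping as a parameter keeps the function easy to test.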
Security
- Ed25519: all node manifests and invite links signed with PyNaCl
- X3DH + Double Ratchet: end-to-end encrypted chat (M23)
- BLAKE3: content-addressed file blobs (tamper-evident)
- localhost-only CLI: all admin HTTP restricted to 127.0.0.1
- Capability token `exp` claim: checked in `bus.handle_call()` before routing; expired tokens receive `{"error": "token_expired"}` without hitting any handler
- Token signature verification: Ed25519 signature checking is implemented in `AuthService` (`auth.token.verify`) and is available on the bus. The HTTP transport (`/bus/v1/call`) currently passes tokens to `handle_call()` where expiry is enforced; full per-request signature verification on inbound HTTP calls is a planned hardening step.
- Bandit HIGH findings: 0 (verified in CI)
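The expiry gate described in the list can be sketched as follows. This is not the project's `handle_call()`: the function signatures are hypothetical, and treating a missing or non-numeric `exp` as expired is an assumption. Only the `{"error": "token_expired"}` payload and the "reject before any handler runs" behaviour come from the text.

```python
import time
from typing import Callable, Optional

def check_expiry(claims: dict, now: Optional[float] = None) -> Optional[dict]:
    """Return an error payload for expired tokens, or None if the call may proceed."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return {"error": "token_expired"}
    return None

def handle_call(claims: dict, handler: Callable[[dict], dict], payload: dict,
                now: Optional[float] = None) -> dict:
    """Gate a capability call on token expiry before routing to the handler."""
    rejected = check_expiry(claims, now)
    if rejected is not None:
        return rejected          # the handler is never invoked
    return handler(payload)

calls = []

def echo_handler(payload: dict) -> dict:
    calls.append(payload)
    return {"ok": True}

expired = handle_call({"exp": 100}, echo_handler, {"q": 1}, now=200)
valid = handle_call({"exp": 300}, echo_handler, {"q": 2}, now=200)
```

Expiry alone does not prove who issued a token, which is why signature verification on inbound calls is listed as the next hardening step.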
Architecture
┌───────────────────────────────────────────────────────────┐
│                    Gradio UI (8 tabs)                     │
│  Ask · Chat · Mesh · Marketplace · Files · Emergency ·    │
│  Settings · Getting Started                               │
└─────────────────────────┬─────────────────────────────────┘
                          │
             ┌────────────┴────────────┐
             │  Capability Bus (M03)   │
             │  route · score · trace  │
             └────┬──────┬──────┬──────┘
                  │      │      │
       ┌──────────┴┐  ┌──┴───┐ ┌┴──────────┐  ┌────────────┐
       │ LLM (M04) │  │ RAG  │ │ MoE (M27) │  │ Chat (M10) │
       │llama.cpp  │  │(M05) │ │  Expert   │  │ Marketplace│
       │ Ollama    │  │SQLite│ │ Registry  │  │ (M06) Files│
       │HF Transfm │  │Embed │ └───────────┘  └────────────┘
       └─────┬─────┘  └──┬───┘
             └─────┬─────┘
    ┌──────────────┴──────────────────────────────────────┐
    │  Transport (X01) · Discovery (M02 mDNS/UDP)         │
    │  Events (X02 SQLite/Lamport) · E2E Encrypt (M23)    │
    │  Identity (M01 Ed25519) · Observability (X03)       │
    └─────────────────────────────────────────────────────┘
Module Reference
Phase 1: Core (M01–M13, X01–X04) · 17 modules
| Module | Description | Status |
|---|---|---|
| M01 | Node identity (Ed25519, manifests, canonical JSON) | ✅ |
| M02 | Peer discovery (mDNS, UDP broadcast, PeerRegistry) | ✅ |
| M03 | Capability bus (schema validation, routing, tracing) | ✅ |
| M04 | LLM service (llama.cpp, Ollama, HF Transformers, cloud fallback) | ✅ |
| M05 | RAG (chunker, SQLite/ChromaDB vector store, IngestPipeline, federated scatter-gather) | ✅ |
| M06 | Marketplace (event-sourced, Lamport-clocked posts) | ✅ |
| M07 | File blobs (BLAKE3 hash, content-addressed, chunked transfer) | ✅ |
| M08 | Gradio UI (8 tabs: Ask, Chat, Mesh, Marketplace, Files, Emergency, Settings, Getting Started) | ✅ |
| M09 | Emergency mode (async connectivity probe, auto-degrade on offline) | ✅ |
| M10 | Chat (event-backed 1:1 direct messaging, Lamport delivery order) | ✅ |
| M11 | Embeddings (embed.text, SentenceTransformer bge-small-en-v1.5, SimpleHashBackend fallback, batch support) | ✅ |
| M12 | CLI (click, ask / peers / marketplace / call / capabilities) | ✅ |
| M13 | Onboarding (invite QR, hnvite:// deep links, PyNaCl signing) | ✅ |
| X01 | Transport (FastAPI server, 12 REST endpoints, TLS) | ✅ |
| X02 | Events (SQLite, Lamport clocks, ReplayEngine, snapshots) | ✅ |
| X03 | Observability (structured JSON logging, metrics, distributed tracing) | ✅ |
| X04 | Config (typed frozen dataclasses, TOML, env overlay) | ✅ |
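Several modules in the table above (M06, M10, X02) order events with Lamport clocks. For readers who have not met them, here is the textbook algorithm in a few lines. The class and the node names are illustrative and not taken from HearthNet's event store.

```python
class LamportClock:
    """Logical clock: counts events instead of reading wall-clock time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event or send; returns the event's timestamp."""
        self.time += 1
        return self.time

    def observe(self, remote_time: int) -> int:
        """Merge a timestamp received from a peer, then advance."""
        self.time = max(self.time, remote_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_post = a.tick()             # node A creates a marketplace post
t_seen = b.observe(t_post)    # node B receives it
t_reply = b.tick()            # node B replies

# Sorting by (timestamp, node id) gives every node the same order.
events = sorted([(t_reply, "node-b", "reply"), (t_post, "node-a", "post")])
```

Because a reply always carries a larger timestamp than the message it answers, nodes with drifting wall clocks (common on offline Raspberry Pis) still agree that the post precedes the reply.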
Phase 2: Advanced (M14–M25, X05–X07) · 15 modules
| Module | Description | Status |
|---|---|---|
| M14 | Federation (bilateral cross-community trust, manifest signing) | ✅ |
| M15 | Relay tier (NAT traversal, keepalive, push token registry) | ✅ |
| M16 | Capability tokens (Ed25519 JWS-style hntoken://v1/ format) | ✅ |
| M17 | OCR (Tesseract + TrOCR backends, graceful degradation) | ✅ |
| M18 | Translation (NLLB backend, LRU cache, 4000-char limit) | ✅ |
| M19 | STT/TTS (Whisper local STT, Edge TTS synthesis) | ✅ |
| M20 | Vision (Florence-2 image describe, structured output) | ✅ |
| M21 | Tool calls (LLM mid-generation bus dispatch, ToolExecutor, plant_identify) | ✅ |
| M22 | Mobile native (Flutter contract, hnapp:// invites, push authority) | ✅ |
| M23 | E2E encryption (X3DH key agreement, Double Ratchet, AEAD envelope) | ✅ |
| M24 | Reranking (BGE + CrossEncoder backends, 100-doc limit) | ✅ |
| M25 | Group chat (ThreadService, ThreadViewStore, event-sourced threads) | ✅ |
| X05 | DHT (Kademlia node, 256-bucket routing table, bootstrap) | ✅ |
| X06 | WebSocket upgrade (bidirectional pubsub, WsClient) | ✅ |
| X07 | Federated metrics (NodeMetricsTick, MetricsAggregator, OTLP) | ✅ |
Phase 3: Experimental (M26–M31, X08–X09) · feature-flag gated
| Module | Description | Status |
|---|---|---|
| M26 | Distributed inference (ShardDescriptor, PipelineOrchestrator, model.pull) | registered |
| M27 | MoE routing (ExpertRegistry, MoeRouter, moe.route/register/list) | registered |
| M28 | Federated learning (FedLearnCoordinator, RoundManifest, gradient aggregation) | experimental |
| M29 | LoRa beacons (32-byte frames, 868 MHz offline signaling) | experimental |
| M30 | Evidence graph / EBKH (ClaimStore, attestations, disputes) | experimental |
| M31 | Civil defense NRW (AuditChain, role certs, structured alerts) | experimental |
| X08 | Tensor transport (chunked binary tensor streaming) | experimental |
| X09 | Conformance suite (protocol test harness) | experimental |
Testing & Coverage
Comprehensive Test Suite: 390+ Tests
HearthNet includes rigorous tests for all core capabilities:
| Suite | Count | Coverage |
|---|---|---|
| Phase 1 Core (M01-M13, X01-X04) | 120+ | Bus routing, discovery, identity, emergency mode |
| Intelligent Routing (NEW) | 8+ | Failover, latency scoring, tracing, stamping |
| Chat & Messaging (M10) | 35+ | Direct messages, cross-node delivery, event-sourced |
| RAG & Search (M05) | 25+ | Local corpus, semantic search, federated queries |
| LLM Service (M04) | 20+ | Multiple backends (llama.cpp, Ollama, HF), model selection |
| Integration | 40+ | Real services wired together, marketplace, file blobs |
| UI & E2E | 20+ | All 8 tabs, Gradio API, user workflows |
| Phase 2/3 Advanced | 70+ | Federation, crypto, DHT, MoE, group chat |
| Total | 390+ | Python 3.13 · pytest-asyncio · Full async test suite |
Run Tests Locally
# Full suite
python -m pytest tests/ -v
# Specific module (e.g., routing tests)
python -m pytest tests/test_bus_failover.py -v
# With coverage report
python -m pytest tests/ --cov=hearthnet --cov-report=term-missing
# Skip slow E2E tests
python -m pytest tests/ --ignore=tests/test_e2e_user_stories.py -v
All tests pass on Python 3.13 with pytest-asyncio.
Focus areas:
- Well-covered: Bus routing, identity, chat, discovery, emergency mode
- Strong: LLM service, RAG pipeline, marketplace, event system
- Expanding: Transport layer, UI advanced features, observability metrics
Deployment & Source
| Resource | Purpose |
|---|---|
| HF Space (Primary) | https://huggingface.co/spaces/build-small-hackathon/HearthNet |
| GitHub (Mirror/CI) | https://github.com/ckal/HearthNet |
Deployment Architecture:
- HF Space: Live demo, PWA app, binary downloads (exe, apk, etc.)
- GitHub: Source repository, CI/CD, releases, issue tracking
- Sync: Changes push to both simultaneously
Build Artifacts Available:
- Windows EXE: dist/HearthNet.exe (212 MB)
- Android APK: build/android/.../app-debug.apk (3.56 MB)
- Build scripts: BUILD_GUIDE.md for EXE, AppImage, .app, Docker
Contributing & Docs
| Resource | Link |
|---|---|
| Architecture | ARCHITECTURE.md |
| System overview | docs/00-OVERVIEW.md |
| Capability contract | docs/CAPABILITY_CONTRACT.md |
| Roadmap | docs/roadmap.md |
| Task tracker | tasks.md |
| Phase 2+3 specs | docs/p2_p3/ |
Hackathon Entry
Track: Backyard AI (Practical)
Why HearthNet wins:
Tiny Titan: Runs on MiniCPM3-4B (4B params, under the 32B limit). Ultra-light mode with SmolLM2-135M (135M) via the MODEL_ID env var for Raspberry Pi and edge devices.
Best Agent: Capability bus + intelligent routing = distributed agentic system. Nodes score, select, and fail over to the best provider autonomously. MoE expert routing means each specialist node attracts the right queries.
Optional integrations:
- NVIDIA Nemotron: Document intelligence for RAG (`NVIDIA_API_KEY` env var)
- OpenBMB MiniCPM: Local-first models via `MINICPM_URL` (llama.cpp-compatible)
- Modal: Serverless GPU as remote node (`MODAL_ENDPOINT` env var)
Links
| Resource | Link |
|---|---|
| HF Space (Live) | https://huggingface.co/spaces/build-small-hackathon/HearthNet |
| GitHub (Source) | https://github.com/ckal/HearthNet |
| Architecture | docs/ARCHITECTURE.md |
| Tests | python -m pytest tests/ -v |
Built with open source models and the belief that communities should own their AI.
Small model. Big mesh. Real resilience.







