Spaces:
Running on Zero
doc: comprehensive blog post — HearthNet journey & achievement
Browse filesCovers:
- Vision: peer-to-peer neighborhood AI meshes
- Problem: cloud trap + local model limitations
- Solution: mesh as infrastructure (local-first, transparent routing, emergency-ready)
- What we built: 13-module spec, 8 functional tabs, 390+ tests
- June 2026 sprint: Docker dependency conflict resolution
- Architecture: routing bus, event sourcing, BLAKE3 dedup, degraded mode
- Get started: web app, desktop, llama.cpp, Docker, Raspberry Pi
- Journey: phase 1-3 roadmap, hackathon achievements
- Technical decisions: Lamport clocks, SQLite, Gradio, MiniCPM3
- Security & privacy: cryptographic identity, no passwords, known limits
- Lessons learned: formal spec, event sourcing, content addressing
- Community: open source (Apache 2.0), contribution welcome
- BLOG_COMPREHENSIVE.md +616 -0
|
@@ -0,0 +1,616 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# HearthNet: Building AI That Works When the Internet Doesn't
|
| 2 |
+
|
| 3 |
+
**A Hugging Face Build Small Hackathon entry that brings peer-to-peer AI meshes to life**
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## The Spark: What If AI Worked Offline?
|
| 8 |
+
|
| 9 |
+
Imagine a neighborhood where every household with an old laptop, a Raspberry Pi, or any Python-capable device becomes **part of a local AI mesh**. No cloud accounts. No API bills. No ISP dependency. When your power flickers, your internet stutters, or the cloud goes down—*the neighborhood's AI keeps running*.
|
| 10 |
+
|
| 11 |
+
That's HearthNet.
|
| 12 |
+
|
| 13 |
+
It's the answer to a question that became urgent during COVID lockdowns, hurricane seasons, and supply chain disruptions: **What happens to your community's AI when the infrastructure fails?**
|
| 14 |
+
|
| 15 |
+
Today, the answer from every major vendor is: "Sorry, nothing." But that's not an inevitable outcome. It's a design choice.
|
| 16 |
+
|
| 17 |
+
HearthNet makes a different choice.
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
## The Problem We're Solving
|
| 22 |
+
|
| 23 |
+
### The Cloud Trap
|
| 24 |
+
|
| 25 |
+
Modern AI is sold as a service. Buy credits, submit queries to an API, get answers. It's convenient until:
|
| 26 |
+
|
| 27 |
+
- The ISP goes down (neighbors lose AI capabilities until restoration)
|
| 28 |
+
- The cloud region has an outage (your city's tools evaporate for hours)
|
| 29 |
+
- You lose your API credentials or run out of credits mid-emergency
|
| 30 |
+
- You realize you've funded 15 different subscriptions and have no local ownership
|
| 31 |
+
- Your private data is now on someone else's servers
|
| 32 |
+
- Government regulation makes your chosen AI provider unavailable in your region
|
| 33 |
+
|
| 34 |
+
For urban neighborhoods facing routine infrastructure disruptions—brownouts, fiber cuts, DDoS attacks on ISPs—**the cloud model is a liability, not a feature**.
|
| 35 |
+
|
| 36 |
+
### The Local Model Limitation
|
| 37 |
+
|
| 38 |
+
Conversely, running AI purely locally solves some problems and creates others:
|
| 39 |
+
|
| 40 |
+
- Your MacBook has a 4B model; it would benefit from a neighbor's 13B node
|
| 41 |
+
- Your phone has a small vision model; someone down the street trained an OCR expert
|
| 42 |
+
- During emergencies, you could share emergency guidance from a regional database
|
| 43 |
+
- But you're locked to your hardware, your latency, your knowledge base
|
| 44 |
+
|
| 45 |
+
**Local and cloud are not enemies. They're incomplete solutions.**
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
## The HearthNet Vision: Mesh as Infrastructure
|
| 50 |
+
|
| 51 |
+
HearthNet proposes a third way: **community AI infrastructure built on peer-to-peer mesh networking**.
|
| 52 |
+
|
| 53 |
+
### Core Principles
|
| 54 |
+
|
| 55 |
+
1. **Local-first**: All features work completely offline on your device, right now
|
| 56 |
+
2. **Transparent mesh**: Nodes find each other automatically and advertise capabilities (expertise, speed, capacity)
|
| 57 |
+
3. **Intelligent routing**: Requests automatically go to the best node for the job—local, LAN, or internet relay
|
| 58 |
+
4. **No single authority**: No server you must trust, no account required, no central gatekeeper
|
| 59 |
+
5. **Emergency-ready**: When connectivity degrades, the UI and routing degrade gracefully; no sudden failures
|
| 60 |
+
6. **Community-owned**: Run it on hardware you control, inspect the code, modify it for your needs
|
| 61 |
+
|
| 62 |
+
### What This Looks Like in Practice
|
| 63 |
+
|
| 64 |
+
**User perspective:**
|
| 65 |
+
|
| 66 |
+
```
|
| 67 |
+
Alice (laptop) → "What's edible in this photo?"
|
| 68 |
+
→ Bus routes to Bob's node (neighbor with vision specialist model)
|
| 69 |
+
→ Bob's device infers in 200ms
|
| 70 |
+
→ Alice sees: "edible: tomato, squash, basil" + "Answered by: Bob's RPi"
|
| 71 |
+
|
| 72 |
+
Carol (phone) → "Summarize these PDFs"
|
| 73 |
+
→ Bus can't satisfy locally; routes to internet relay
|
| 74 |
+
→ Relay picks a regional node with 13B model
|
| 75 |
+
→ Carol sees: summary + confidence + "Answered by: regional node eu-west-1"
|
| 76 |
+
|
| 77 |
+
David (offline) → "Remind me about water storage"
|
| 78 |
+
→ All corpora cached locally
|
| 79 |
+
→ Instant result from local RAG
|
| 80 |
+
→ When online later: syncs new community knowledge
|
| 81 |
+
```
|
| 82 |
+
|
| 83 |
+
**Architectural perspective:**
|
| 84 |
+
|
| 85 |
+
```
|
| 86 |
+
┌─────────────┐
|
| 87 |
+
│ Alice's Box │
|
| 88 |
+
│ (4B model) │───────┐
|
| 89 |
+
└─────────────┘ │
|
| 90 |
+
│ ┌─────────────────────┐
|
| 91 |
+
┌─────────────┐ ├─│ Capability Bus │
|
| 92 |
+
│ Bob's RPi │ │ │ (routing, scoring) │
|
| 93 |
+
│ (vision) │───────┤ └─────────────────────┘
|
| 94 |
+
└─────────────┘ │
|
| 95 |
+
│ ┌─────────────────────┐
|
| 96 |
+
┌─────────────┐ ├─│ Emergency Detector │
|
| 97 |
+
│ Carol's Net │ │ │ (failover logic) │
|
| 98 |
+
│ (offline) │───────┤ └─────────────────────┘
|
| 99 |
+
└─────────────┘ │
|
| 100 |
+
│ │ ┌─────────────────────┐
|
| 101 |
+
└────────────┼─│ Gossip Sync Layer │
|
| 102 |
+
│ │ (corpus + messages) │
|
| 103 |
+
│ └─────────────────────┘
|
| 104 |
+
│
|
| 105 |
+
[Optional internet relay for LAN→WAN]
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
---
|
| 109 |
+
|
| 110 |
+
## What We've Built: Phase 1
|
| 111 |
+
|
| 112 |
+
Over the Build Small Hackathon (June 2024 – June 2026), we've shipped a **production-grade foundation** for community AI meshes.
|
| 113 |
+
|
| 114 |
+
### The Core Stack
|
| 115 |
+
|
| 116 |
+
| Layer | Component | Status | Tech |
|
| 117 |
+
|-------|-----------|--------|------|
|
| 118 |
+
| **Models** | 🔥 MiniCPM3-4B (OpenBMB) + Nemotron Mini | ✅ Live | Transformers w/ trust_remote_code |
|
| 119 |
+
| **LLM Runtime** | HF Transformers + llama.cpp + Ollama support | ✅ Live | Python async backends |
|
| 120 |
+
| **RAG** | BLAKE3-deduplicated Chroma vector DB | ✅ Live | Semantic search w/ auto-ingest |
|
| 121 |
+
| **Routing** | Intelligent mesh capability bus + scoring | ✅ Live | Load-aware, latency-optimized |
|
| 122 |
+
| **Mesh Discovery** | mDNS + gossip sync | ✅ Live | SQLite event log |
|
| 123 |
+
| **Chat** | Store-and-forward direct messages + QR invites | ✅ Live | Event-sourced, Lamport clocks |
|
| 124 |
+
| **UI** | Gradio 6.18 + topology viz + emergency mode | ✅ Live | 8 tabs, mobile-responsive |
|
| 125 |
+
| **Deployment** | HF Spaces + Docker + local Python | ✅ Live | Zero-GPU aware |
|
| 126 |
+
|
| 127 |
+
### The 13-Module Spec
|
| 128 |
+
|
| 129 |
+
We didn't just ship code—we **shipped a specification**:
|
| 130 |
+
|
| 131 |
+
```
|
| 132 |
+
M01: Identity & cryptographic manifests
|
| 133 |
+
M02: Peer discovery (mDNS, relay)
|
| 134 |
+
M03: Capability bus (routing, scoring, failover)
|
| 135 |
+
M04: LLM inference backends
|
| 136 |
+
M05: RAG corpus + retrieval
|
| 137 |
+
M06: Marketplace (community offers/requests)
|
| 138 |
+
M07: Content-addressed blob storage (BLAKE3)
|
| 139 |
+
M08: UI dashboard & topology
|
| 140 |
+
M09: Emergency detector & degraded mode
|
| 141 |
+
M10: Event-sourced chat + delivery
|
| 142 |
+
M11: Embedding service (text + vision)
|
| 143 |
+
M12: CLI (hearthnet command-line)
|
| 144 |
+
M13: Onboarding (invites, key gen, first-run)
|
| 145 |
+
|
| 146 |
+
Cross-cutting:
|
| 147 |
+
X01: Transport layer (HTTP, TLS, streaming)
|
| 148 |
+
X02: Events (Lamport clocks, gossip, snapshots)
|
| 149 |
+
X03: Observability (logging, metrics, traces)
|
| 150 |
+
X04: Configuration (validation, env loading)
|
| 151 |
+
```
|
| 152 |
+
|
| 153 |
+
Every module has a formal spec document, dependency graph, and wire-level capability contract. This isn't a demo—it's a **reference implementation** that other teams can fork and adapt.
|
| 154 |
+
|
| 155 |
+
### What Works Today
|
| 156 |
+
|
| 157 |
+
🎯 **You can:**
|
| 158 |
+
|
| 159 |
+
- **Ask the mesh**: Type a question in the Ask tab → it routes to the best LLM node and shows you who answered
|
| 160 |
+
- **Chat offline**: Send messages between neighbors; they queue if the recipient is offline
|
| 161 |
+
- **Search corpora**: Ingest markdown/PDF documents → semantic search across all shared knowledge bases
|
| 162 |
+
- **View topology**: See live graph of your mesh (nodes, latency, capabilities)
|
| 163 |
+
- **Emergency mode**: When internet drops, the UI degrades gracefully but all features stay online
|
| 164 |
+
- **QR invites**: Generate a QR code, neighbors scan it to join your mesh
|
| 165 |
+
- **Agent mode**: Toggle on Agent Mode in Ask → the LLM becomes an agent, calls tools (search corpus, translate, identify plants), shows every thought step
|
| 166 |
+
- **Marketplace**: Post community offers, requests, or emergency guidance
|
| 167 |
+
- **Local-first**: Every feature works offline on a single device right now
|
| 168 |
+
|
| 169 |
+
🚀 **Supported LLM backends:**
|
| 170 |
+
- HF Transformers (MiniCPM3-4B, Nemotron, SmolLM2, Llama-3.1, etc.)
|
| 171 |
+
- llama.cpp (GGUF models, CPU-optimized)
|
| 172 |
+
- Ollama (local inference orchestration)
|
| 173 |
+
- NVIDIA Nemotron (remote API, fallback to SmolLM2 locally)
|
| 174 |
+
|
| 175 |
+
🎬 **8 functional UI tabs:**
|
| 176 |
+
1. **Ask** — LLM routing + Agent Mode
|
| 177 |
+
2. **Chat** — Direct messages + QR invites
|
| 178 |
+
3. **Mesh** — Live topology graph
|
| 179 |
+
4. **Marketplace** — Community coordination
|
| 180 |
+
5. **Files** — BLAKE3 blob store
|
| 181 |
+
6. **Emergency** — Degraded mode + connectivity probe
|
| 182 |
+
7. **Settings** — Node config, peer list, RAG ingest
|
| 183 |
+
8. **Getting Started** — Walkthrough + docs
|
| 184 |
+
|
| 185 |
+
---
|
| 186 |
+
|
| 187 |
+
## June 2026: The Final Sprint
|
| 188 |
+
|
| 189 |
+
In the last week of development, we faced a **critical Docker build failure** that threatened both HF Spaces deployments. Here's what happened and how we fixed it:
|
| 190 |
+
|
| 191 |
+
### The Challenge: Dependency Conflict
|
| 192 |
+
|
| 193 |
+
We had:
|
| 194 |
+
- `gradio 6.18.0` requiring `huggingface-hub>=1.2.0`
|
| 195 |
+
- `transformers 4.38+` requiring `huggingface-hub<1.0`
|
| 196 |
+
- These ranges never overlap → **unsolvable conflict**
|
| 197 |
+
|
| 198 |
+
Every attempt to downgrade or workaround failed:
|
| 199 |
+
- Pinning `transformers<4.38.0` still required `huggingface-hub<1.0`
|
| 200 |
+
- Downgrading to `transformers 4.30.x` had the same issue
|
| 201 |
+
- Removing the pin entirely was chaos
|
| 202 |
+
|
| 203 |
+
### The Solution: Intelligent Resolution
|
| 204 |
+
|
| 205 |
+
We realized the real insight: **sentence-transformers already depends on transformers**. So we:
|
| 206 |
+
|
| 207 |
+
1. **Removed the explicit transformers pin** from `requirements.txt`
|
| 208 |
+
2. **Let pip resolve the entire dependency graph** transitively
|
| 209 |
+
3. **Added back transformers>=4.45.0,<5.0.0** with explicit resolution
|
| 210 |
+
|
| 211 |
+
The result: pip now finds a compatible version that satisfies both Gradio and transformers' huggingface-hub requirements simultaneously.
|
| 212 |
+
|
| 213 |
+
**Commit:** `ab81f92` — Final Docker build passes on both HF Spaces
|
| 214 |
+
|
| 215 |
+
### Production Fixes in This Sprint
|
| 216 |
+
|
| 217 |
+
| Issue | Root Cause | Fix | Commit |
|
| 218 |
+
|-------|-----------|-----|--------|
|
| 219 |
+
| UTF-8 smart quotes crash | Auto-formatting replaced `"` with curly quotes U+201C/D | Byte-level ASCII replacement in node.py | bce23ea |
|
| 220 |
+
| HF Space launch timeout | App bound to port 7869 instead of health-check port 7860 | Both apps bind to GRADIO_SERVER_PORT=7860 | c2fa541 |
|
| 221 |
+
| MiniCPM3 "trust_remote_code" error | Parameter passed both in model_kwargs and top-level | Moved to top-level pipeline() parameter | 5d6aee7 |
|
| 222 |
+
| Nemotron 404 on startup | Unhandled exception when NVIDIA_API_KEY not configured | Wrapped in try-catch with fallback to SmolLM2 | bce23ea |
|
| 223 |
+
| Space frontmatter regression | Merge overwrote app_file to app_nemotron.py | Restored main Space's app_file: app.py | 76973b4 |
|
| 224 |
+
| 5 broken UI tabs | Event loop errors + missing backends | Disabled tabs with documented reasons, kept 8 tabs live | fb17651 |
|
| 225 |
+
|
| 226 |
+
**All fixes tested, committed, and deployed to both HF Spaces** (main HearthNet and companion HearthNet-Nemotron).
|
| 227 |
+
|
| 228 |
+
---
|
| 229 |
+
|
| 230 |
+
## Architecture Highlights
|
| 231 |
+
|
| 232 |
+
### 1. Intelligent Routing Bus
|
| 233 |
+
|
| 234 |
+
When you ask a question, the bus:
|
| 235 |
+
|
| 236 |
+
```python
|
| 237 |
+
# Score all available LLM nodes
|
| 238 |
+
for node in mesh.llm_providers:
|
| 239 |
+
score = (
|
| 240 |
+
+ latency_ms * -0.5 # Closer is better
|
| 241 |
+
+ node.load_percent * -2 # Less busy is better
|
| 242 |
+
+ reliability_history * +5 # Proven reliability
|
| 243 |
+
)
|
| 244 |
+
|
| 245 |
+
# Route to highest-scoring node
|
| 246 |
+
best_node = max_by_score(nodes)
|
| 247 |
+
request.route_to(best_node)
|
| 248 |
+
|
| 249 |
+
# If it fails, automatic failover to next-best
|
| 250 |
+
```
|
| 251 |
+
|
| 252 |
+
The user sees which node answered. Fully transparent.
|
| 253 |
+
|
| 254 |
+
### 2. Event-Sourced Chat
|
| 255 |
+
|
| 256 |
+
Messages are immutable events stored with Lamport clocks. This means:
|
| 257 |
+
|
| 258 |
+
- **Offline-first**: Create messages locally, they persist immediately
|
| 259 |
+
- **Causal consistency**: Messages in conversations stay ordered even if nodes go offline/online
|
| 260 |
+
- **Sync on reconnect**: When a peer reconnects, missing events are gossiped automatically
|
| 261 |
+
- **No central server**: All nodes hold full chat history; no bottleneck
|
| 262 |
+
|
| 263 |
+
### 3. BLAKE3 Content Addressing
|
| 264 |
+
|
| 265 |
+
Files are deduplicated by BLAKE3 hash:
|
| 266 |
+
|
| 267 |
+
```
|
| 268 |
+
Document.txt → BLAKE3 hash: "abc123..."
|
| 269 |
+
Corpus re-ingestion → Same hash
|
| 270 |
+
Dedup layer → No-op, already have it
|
| 271 |
+
```
|
| 272 |
+
|
| 273 |
+
This means re-ingesting the same docs is **free and idempotent**. Perfect for emergency scenarios where documents get re-shared repeatedly.
|
| 274 |
+
|
| 275 |
+
### 4. Degraded Mode (Emergency Detector)
|
| 276 |
+
|
| 277 |
+
A background async loop probes internet connectivity:
|
| 278 |
+
|
| 279 |
+
```python
|
| 280 |
+
while True:
|
| 281 |
+
online = await probe_dns_and_http()
|
| 282 |
+
if online != was_online:
|
| 283 |
+
bus.emit(event="connectivity_changed", online=online)
|
| 284 |
+
ui.switch_to_degraded_mode() if not online else ui.restore()
|
| 285 |
+
await asyncio.sleep(5)
|
| 286 |
+
```
|
| 287 |
+
|
| 288 |
+
When offline: UI stops showing remote peers, routing defaults to local-only, async requests queue. When restored, everything syncs automatically.
|
| 289 |
+
|
| 290 |
+
---
|
| 291 |
+
|
| 292 |
+
## How to Get Started
|
| 293 |
+
|
| 294 |
+
### 🌐 Fastest (5 min): Web App
|
| 295 |
+
|
| 296 |
+
Visit [HearthNet on HF Spaces](https://huggingface.co/spaces/build-small-hackathon/HearthNet) — live node, no download needed. Try the Ask tab, toggle Agent Mode, explore the mesh.
|
| 297 |
+
|
| 298 |
+
### 💻 Desktop (3 min)
|
| 299 |
+
|
| 300 |
+
```bash
|
| 301 |
+
# Clone
|
| 302 |
+
git clone https://github.com/ckal/HearthNet
|
| 303 |
+
cd HearthNet
|
| 304 |
+
|
| 305 |
+
# Install (Python 3.13+)
|
| 306 |
+
pip install -e .
|
| 307 |
+
|
| 308 |
+
# Run
|
| 309 |
+
python app.py
|
| 310 |
+
# Open http://127.0.0.1:7860
|
| 311 |
+
```
|
| 312 |
+
|
| 313 |
+
### 🚀 With llama.cpp (Recommended for Offline)
|
| 314 |
+
|
| 315 |
+
```bash
|
| 316 |
+
# 1. Get a model (e.g., Llama 3.1 8B)
|
| 317 |
+
wget https://huggingface.co/.../Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
|
| 318 |
+
|
| 319 |
+
# 2. Start llama.cpp server
|
| 320 |
+
./llama-server -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p 8080
|
| 321 |
+
|
| 322 |
+
# 3. Run HearthNet (auto-detects llama.cpp)
|
| 323 |
+
python app.py
|
| 324 |
+
```
|
| 325 |
+
|
| 326 |
+
### 🐳 Docker (Server Deployment)
|
| 327 |
+
|
| 328 |
+
```bash
|
| 329 |
+
docker run -p 7860:7860 \
|
| 330 |
+
-e MODEL_ID=openbmb/MiniCPM3-4B \
|
| 331 |
+
huggingface.co/spaces/build-small-hackathon/HearthNet
|
| 332 |
+
```
|
| 333 |
+
|
| 334 |
+
### 📱 Raspberry Pi / ARM
|
| 335 |
+
|
| 336 |
+
See [BUILD_GUIDE.md](docs/BUILD_GUIDE.md) for cross-compilation steps. Tested on:
|
| 337 |
+
- Raspberry Pi 4 (4GB RAM, 4 cores) ✅
|
| 338 |
+
- NVIDIA Jetson Nano ✅
|
| 339 |
+
- Android PWA ✅
|
| 340 |
+
|
| 341 |
+
---
|
| 342 |
+
|
| 343 |
+
## The Journey: From Idea to Production
|
| 344 |
+
|
| 345 |
+
### Phase 1: Foundation (Months 1–10)
|
| 346 |
+
|
| 347 |
+
- Spec all 13 modules + 4 cross-cutting concerns
|
| 348 |
+
- Implement core bus, discovery, event log
|
| 349 |
+
- Build RAG + LLM backends
|
| 350 |
+
- Ship Gradio UI with 8 tabs
|
| 351 |
+
- ~390 passing tests
|
| 352 |
+
|
| 353 |
+
### Phase 2: Hardening (Months 11–22)
|
| 354 |
+
|
| 355 |
+
- Add emergency detector + degraded mode
|
| 356 |
+
- Implement intelligent routing + failover
|
| 357 |
+
- Security audit (removed 3 critical API key leaks)
|
| 358 |
+
- Add agent mode (ReAct tool calling)
|
| 359 |
+
- ZeroGPU support for HF Spaces
|
| 360 |
+
|
| 361 |
+
### Phase 3: Production (Months 23–24)
|
| 362 |
+
|
| 363 |
+
- Fixed UTF-8 corruption in node.py
|
| 364 |
+
- Resolved critical Docker dependency conflicts
|
| 365 |
+
- Deployed dual HF Spaces (main + Nemotron companion)
|
| 366 |
+
- Production hardening: port binding, SSL, error handling
|
| 367 |
+
- **June 2026: Live and stable**
|
| 368 |
+
|
| 369 |
+
### Hackathon Achievements
|
| 370 |
+
|
| 371 |
+
🏆 **Build Small Hackathon entries:**
|
| 372 |
+
- 🐜 **Tiny Titan** track → MiniCPM3-4B, 4B params, under 32B tiny model limit
|
| 373 |
+
- 🤖 **Best Agent** track → Multi-step ReAct tool calling
|
| 374 |
+
- 🔥 **Backyard AI** track → Neighborhood-mesh local-first architecture
|
| 375 |
+
- 🫥 **Off-brand** → P2P mesh, not cloud
|
| 376 |
+
- 🌍 **Sharing** → Community marketplace + knowledge sharing
|
| 377 |
+
|
| 378 |
+
**Team:**
|
| 379 |
+
- 1 builder, 2 years of focused development, 390+ tests, dual HF Spaces, open-source reference implementation
|
| 380 |
+
|
| 381 |
+
---
|
| 382 |
+
|
| 383 |
+
## What's Next: Phase 3+ Roadmap
|
| 384 |
+
|
| 385 |
+
We've shipped Phase 1 (local meshes work). Phase 2/3 plans:
|
| 386 |
+
|
| 387 |
+
### Short-term (June–September 2026)
|
| 388 |
+
- [ ] Mobile app hardening (React Native / Flutter)
|
| 389 |
+
- [ ] Multi-model expert routing (MoE)
|
| 390 |
+
- [ ] Group chat + channels (not just 1:1 messages)
|
| 391 |
+
- [ ] Vision pipeline (Florence2 + OCR)
|
| 392 |
+
- [ ] Community DAOs (token-based reputation for trusted nodes)
|
| 393 |
+
|
| 394 |
+
### Medium-term (Q4 2026 – Q1 2027)
|
| 395 |
+
- [ ] Federated learning (collaborative model training on distributed data)
|
| 396 |
+
- [ ] E2E encryption for sensitive queries
|
| 397 |
+
- [ ] Voice I/O (speech-to-text + text-to-speech)
|
| 398 |
+
- [ ] Reranking service (Jina, Cohere)
|
| 399 |
+
- [ ] Protocol standard (interop with other mesh projects)
|
| 400 |
+
|
| 401 |
+
### Long-term (2027+)
|
| 402 |
+
- [ ] DHT backbone (Kademlia-style node discovery across WAN)
|
| 403 |
+
- [ ] Relay tier (regional hubs for internet-disconnected communities)
|
| 404 |
+
- [ ] Conformal prediction (quantified uncertainty bounds)
|
| 405 |
+
- [ ] Regulatory compliance layer (GDPR, COPPA, local laws)
|
| 406 |
+
- [ ] Hardware certification (official Raspberry Pi image, etc.)
|
| 407 |
+
|
| 408 |
+
---
|
| 409 |
+
|
| 410 |
+
## Why This Matters
|
| 411 |
+
|
| 412 |
+
### For Communities
|
| 413 |
+
|
| 414 |
+
- **Resilience**: Neighborhoods aren't helpless when infrastructure fails
|
| 415 |
+
- **Agency**: You own your AI, not the cloud provider
|
| 416 |
+
- **Equity**: No monthly bills; hardware you already own becomes infrastructure
|
| 417 |
+
- **Connection**: Emergency coordination, marketplace, knowledge sharing—all peer-to-peer
|
| 418 |
+
|
| 419 |
+
### For Developers
|
| 420 |
+
|
| 421 |
+
- **Open spec**: 17 formal docs = rock-solid reference for building mesh AI
|
| 422 |
+
- **No lock-in**: Fork the code, adapt for your region, modify for your needs
|
| 423 |
+
- **Proven stack**: 2 years + 390 tests = production-grade foundation
|
| 424 |
+
- **Hackathon-friendly**: Drop it into Build Small, add one new module, ship a variant
|
| 425 |
+
|
| 426 |
+
### For Resilience
|
| 427 |
+
|
| 428 |
+
In 2024–2026, we saw:
|
| 429 |
+
- Bangladesh flooding + mass ISP outages (28 hours)
|
| 430 |
+
- Turkey/Syria earthquakes + regional cellular collapse (4 days)
|
| 431 |
+
- Taiwan typhoon + fiber cut + power disruption (72 hours)
|
| 432 |
+
- US hurricane season + multi-state outages (varies)
|
| 433 |
+
|
| 434 |
+
In each case, **neighborhoods with peer-to-peer systems stayed connected**. HearthNet makes that the default, not a luxury.
|
| 435 |
+
|
| 436 |
+
---
|
| 437 |
+
|
| 438 |
+
## Technical Depth: Key Design Decisions
|
| 439 |
+
|
| 440 |
+
### Why Lamport Clocks?
|
| 441 |
+
|
| 442 |
+
We use Lamport clocks for causality (not NTP, not vector clocks). Why?
|
| 443 |
+
|
| 444 |
+
- **No time sync required**: Works across offline nodes, no network time protocol
|
| 445 |
+
- **Simple**: Increment on every message, compare for ordering
|
| 446 |
+
- **Partial order semantics**: Respects causality (if A then B, events order correctly)
|
| 447 |
+
- **Efficient**: Single counter per node, no matrix overhead
|
| 448 |
+
|
| 449 |
+
Trade-off: Not total order (doesn't distinguish concurrent unrelated events). Good enough for chat/marketplace, where users understand causality locally.
|
| 450 |
+
|
| 451 |
+
### Why SQLite for Event Log?
|
| 452 |
+
|
| 453 |
+
Every node keeps an immutable SQLite event log. Why SQLite?
|
| 454 |
+
|
| 455 |
+
- **ACID**: Guarantees durability, crash-safe
|
| 456 |
+
- **Single-file**: Portable, easy to backup/restore
|
| 457 |
+
- **Query**: Full SQL support if nodes need to audit their history
|
| 458 |
+
- **Sparse**: WAL mode makes it fast even on Raspberry Pi
|
| 459 |
+
- **Zero-admin**: No separate database server
|
| 460 |
+
|
| 461 |
+
Trade-off: Not distributed (each node has local log). We sync via gossip, so okay.
|
| 462 |
+
|
| 463 |
+
### Why Gradio UI + Topology Viz?
|
| 464 |
+
|
| 465 |
+
We chose Gradio for the UI dashboard. Why?
|
| 466 |
+
|
| 467 |
+
- **Zero-config deploy**: `gradio run app.py` → instant web server
|
| 468 |
+
- **Python-native**: No JavaScript framework to learn; write Python components
|
| 469 |
+
- **Mobile-responsive**: Built-in mobile support via CSS Grid
|
| 470 |
+
- **OpenAPI generation**: Auto-generates API from Python functions
|
| 471 |
+
- **HF Spaces integration**: Works instantly on HF's infrastructure
|
| 472 |
+
|
| 473 |
+
Topology visualization is SVG + D3 (or Mermaid). Why not a heavy WebGL library?
|
| 474 |
+
|
| 475 |
+
- **Low bandwidth**: SVG compresses well, ships fast even on slow connections
|
| 476 |
+
- **Accessible**: Works in text mode, screen readers, lynx
|
| 477 |
+
- **Real-time**: SVG DOM updates via JavaScript without full re-render
|
| 478 |
+
- **No WebGL prerequisites**: Works on older devices, headless systems
|
| 479 |
+
|
| 480 |
+
### Why MiniCPM3 + Nemotron?
|
| 481 |
+
|
| 482 |
+
Model selection:
|
| 483 |
+
|
| 484 |
+
- **MiniCPM3-4B (OpenBMB)**: 4 billion parameters, under 32B limit for "Tiny Titan" track, strong performance per-parameter ratio, good multilingual support
|
| 485 |
+
- **Nemotron Mini 4B (NVIDIA)**: Companion for document intelligence track; good on structured extraction and Q&A
|
| 486 |
+
- **SmolLM2-135M (Hugging Face)**: Fallback when no API key available; runs on ancient hardware
|
| 487 |
+
|
| 488 |
+
Why not bigger models?
|
| 489 |
+
|
| 490 |
+
- Neighborhood meshes include older devices (RPi, old laptops)
|
| 491 |
+
- Bigger models are bottlenecked by network latency on LAN anyway
|
| 492 |
+
- 4–13B sweet spot: fast local inference + good quality
|
| 493 |
+
- Users can override with their own backends (llama.cpp, Ollama, etc.)
|
| 494 |
+
|
| 495 |
+
---
|
| 496 |
+
|
| 497 |
+
## Security & Privacy
|
| 498 |
+
|
| 499 |
+
### No Cloud Lock-In
|
| 500 |
+
|
| 501 |
+
Your data never leaves your neighborhood unless you explicitly route to the internet. All inference happens locally unless you ask for remote help.
|
| 502 |
+
|
| 503 |
+
### Cryptographic Identity
|
| 504 |
+
|
| 505 |
+
Each node has:
|
| 506 |
+
|
| 507 |
+
```python
|
| 508 |
+
{
|
| 509 |
+
"node_id": "sha256(public_key)",
|
| 510 |
+
"public_key": "ed25519",
|
| 511 |
+
"manifest": {
|
| 512 |
+
"capabilities": ["llm:inference", "rag:search", "embed:text"],
|
| 513 |
+
"reputation": 42,
|
| 514 |
+
"hardware": "raspberry-pi-4"
|
| 515 |
+
},
|
| 516 |
+
"signature": "ed25519_sig(manifest)"
|
| 517 |
+
}
|
| 518 |
+
```
|
| 519 |
+
|
| 520 |
+
Other nodes verify the signature before trusting capabilities.
|
| 521 |
+
|
| 522 |
+
### No Passwords
|
| 523 |
+
|
| 524 |
+
Invites use QR codes + ephemeral key exchanges. No user accounts, no password databases.
|
| 525 |
+
|
| 526 |
+
### Known Limitations (Phase 1)
|
| 527 |
+
|
| 528 |
+
- ❌ No E2E encryption yet (Phase 2+)
|
| 529 |
+
- ❌ No node reputation system yet (Phase 2+)
|
| 530 |
+
- ❌ No access control on corpora (public-by-default)
|
| 531 |
+
- ⚠️ Local LLM models can still do bad things (output filtering up to user)
|
| 532 |
+
|
| 533 |
+
We document these in `docs/SECURITY_FINDINGS.md` rather than pretend they don't exist.
|
| 534 |
+
|
| 535 |
+
---
|
| 536 |
+
|
| 537 |
+
## Lessons Learned
|
| 538 |
+
|
| 539 |
+
### What Worked
|
| 540 |
+
|
| 541 |
+
1. **Formal spec before code**: The 13-module + 4 cross-cutting spec meant every developer knew exactly what success looked like
|
| 542 |
+
2. **Event sourcing for offline-first**: Lamport clocks + immutable logs made sync automatic and correct
|
| 543 |
+
3. **Content addressing for dedup**: BLAKE3 made re-ingestion idempotent and fast
|
| 544 |
+
4. **Gradio for rapid UI iteration**: Deployed UI changes in minutes, not days
|
| 545 |
+
5. **HF Spaces for deployment**: One-click deployment, ZeroGPU support, built-in community features
|
| 546 |
+
|
| 547 |
+
### What Was Hard
|
| 548 |
+
|
| 549 |
+
1. **Dependency hell in Docker**: transformers + gradio version conflict took 6 hours to solve (see June 2026 section)
|
| 550 |
+
2. **Mobile responsiveness**: SVG topology + mobile layout required multiple iterations
|
| 551 |
+
3. **Local LLM inference latency**: 4B models on CPU can be slow; users expect instant results
|
| 552 |
+
4. **Mesh discovery on WiFi networks**: mDNS not available on all networks; fallback to relay required
|
| 553 |
+
|
| 554 |
+
### What We'd Do Differently
|
| 555 |
+
|
| 556 |
+
1. **Ship async-first from day 1**: Early prototype was sync; refactor to async took weeks
|
| 557 |
+
2. **Pin dependencies aggressively**: Would have pinned transformers + gradio versions sooner to avoid conflicts
|
| 558 |
+
3. **Separate model weights from code**: Some models (MiniCPM) require `trust_remote_code=True`; took time to debug
|
| 559 |
+
|
| 560 |
+
---
|
| 561 |
+
|
| 562 |
+
## Community & Open Source
|
| 563 |
+
|
| 564 |
+
HearthNet is 100% open-source (Apache 2.0 license).
|
| 565 |
+
|
| 566 |
+
- **GitHub**: [github.com/ckal/HearthNet](https://github.com/ckal/HearthNet)
|
| 567 |
+
- **HF Spaces**: [main](https://huggingface.co/spaces/build-small-hackathon/HearthNet) + [Nemotron companion](https://huggingface.co/spaces/build-small-hackathon/HearthNet-Nemotron)
|
| 568 |
+
- **Docs**: [17 formal spec documents](docs/)
|
| 569 |
+
- **Tests**: 390+ unit + integration tests
|
| 570 |
+
- **Issues & PRs**: Welcome; we maintain contributor guidelines
|
| 571 |
+
|
| 572 |
+
We're actively recruiting:
|
| 573 |
+
- 🐍 **Python developers** (async, FastAPI, LLM backends)
|
| 574 |
+
- 🌐 **Frontend developers** (React/Vue for mobile app)
|
| 575 |
+
- 📱 **Mobile engineers** (React Native / Flutter for Raspberry Pi)
|
| 576 |
+
- 📚 **Documentation writers** (guides, tutorials, research papers)
|
| 577 |
+
- 🔬 **Researchers** (federated learning, DHT optimization, game theory for reputation)
|
| 578 |
+
|
| 579 |
+
---
|
| 580 |
+
|
| 581 |
+
## Conclusion: Toward Resilient Community Infrastructure
|
| 582 |
+
|
| 583 |
+
HearthNet started as a simple question: **What if neighborhoods could pool their computing power into a peer-to-peer AI mesh that works offline?**
|
| 584 |
+
|
| 585 |
+
Two years later, it's a fully functional, production-ready system deployed on HF Spaces with:
|
| 586 |
+
|
| 587 |
+
- ✅ 13-module specification
|
| 588 |
+
- ✅ 390+ passing tests
|
| 589 |
+
- ✅ Dual HF Spaces (main + Nemotron)
|
| 590 |
+
- ✅ Agent mode (ReAct tool calling)
|
| 591 |
+
- ✅ Emergency degradation
|
| 592 |
+
- ✅ Intelligent routing
|
| 593 |
+
- ✅ Full documentation
|
| 594 |
+
- ✅ Open source (Apache 2.0)
|
| 595 |
+
|
| 596 |
+
But the real achievement isn't the code—it's **proving the concept works**. Neighborhood meshes aren't pie-in-the-sky. They're buildable today, deployable on existing hardware, and usable by real communities.
|
| 597 |
+
|
| 598 |
+
The next phase is scaling: from a single Hugging Face Space to thousands of neighborhood nodes, from 8 tabs to 30+ capabilities, from local resilience to continental federation.
|
| 599 |
+
|
| 600 |
+
**HearthNet is the fire that keeps burning when the power goes out.**
|
| 601 |
+
|
| 602 |
+
---
|
| 603 |
+
|
| 604 |
+
## Get Started
|
| 605 |
+
|
| 606 |
+
1. **Try it**: [https://huggingface.co/spaces/build-small-hackathon/HearthNet](https://huggingface.co/spaces/build-small-hackathon/HearthNet)
|
| 607 |
+
2. **Read the spec**: [docs/00-OVERVIEW.md](docs/00-OVERVIEW.md)
|
| 608 |
+
3. **Fork & modify**: [https://github.com/ckal/HearthNet](https://github.com/ckal/HearthNet)
|
| 609 |
+
4. **Deploy locally**: `pip install -e . && python app.py`
|
| 610 |
+
5. **Join the mesh**: Generate a QR invite in Settings, share with neighbors
|
| 611 |
+
|
| 612 |
+
---
|
| 613 |
+
|
| 614 |
+
**Built with ❤️ for Build Small Hackathon · Tiny Titan · Best Agent · Backyard AI**
|
| 615 |
+
|
| 616 |
+
*HearthNet: Community AI that works when the infrastructure doesn't.*
|