# HearthNet: Architecture Reference
> **Local-first community AI mesh.** Each participant runs a node on their own hardware.
> Nodes discover each other automatically and share AI capabilities, files, and community
> posts, with no central server required.
---
## High-Level Concept
```
┌──────────────────────────────────────────────────────────────────────┐
│                    Community Mesh (LAN / overlay)                    │
│                                                                      │
│  ┌─────────────┐    mDNS/UDP     ┌─────────────┐    mDNS/UDP         │
│  │   Node A    │◄───────────────►│   Node B    │◄──────────────      │
│  │  (anchor)   │                 │  (hearth)   │                     │
│  │             │   capability    │             │                     │
│  │  CapBus ◄───┼────bus.call─────┼──► CapBus   │                     │
│  │  LLM svc    │                 │  RAG svc    │                     │
│  │  RAG svc    │                 │  OCR svc    │                     │
│  │  Gradio UI  │                 │  Gradio UI  │                     │
│  └─────────────┘                 └─────────────┘                     │
└──────────────────────────────────────────────────────────────────────┘
```
HearthNet is structured around three ideas:
1. **Node**: a Python process on someone's hardware (Raspberry Pi, laptop, server).
2. **CapabilityBus**: a message bus where services register *capabilities* (e.g. `llm.chat@1.0`). Any code, local or remote, calls a capability by name.
3. **Services**: pure-Python objects that handle capability calls. A node installs whichever services its hardware supports.
---
## Module Map
### Phase 1: Foundation
| Module | Location | What it does |
|--------|----------|-------------|
| **M01 Identity** | `hearthnet/identity/` | Ed25519 node keys, community manifests, invite tokens |
| **M02 Discovery** | `hearthnet/discovery/` | mDNS + UDP multicast peer discovery |
| **M03 Bus** | `hearthnet/bus/` | Capability router, health ring buffer, trust levels |
| **M04 LLM** | `hearthnet/services/llm/` | Model backends, local and hosted (Ollama, llama.cpp, LM Studio, HF, Anthropic) |
| **M05 RAG** | `hearthnet/services/rag/` | Chunker → embedder → Chroma vector store + retrieval |
| **M06 Marketplace** | `hearthnet/services/marketplace/` | Event-sourced community board (posts, offers, requests) |
| **M07 Blobs** | `hearthnet/blobs/` | BLAKE3 content-addressed file store with chunked transfer |
| **M08 UI** | `hearthnet/ui/` | Gradio 8-tab interface + themes + topology component |
| **M09 Emergency** | `hearthnet/emergency/` | Async probe loop → emergency state machine |
| **M10 Chat** | `hearthnet/services/chat/` | Event-backed direct messages between nodes |
| **M11 Embedding** | `hearthnet/services/embedding/` | Sentence-transformer embeddings (BAAI/bge-small) |
| **M12 CLI** | `hearthnet/cli.py` | Click CLI: run, call, log, rag, invite, version, … |
| **M13 Onboarding** | `hearthnet/ui/onboarding.py` | Invite QR flow + first-run wizard |
### Phase 2: Resilience & Rich Services
| Module | Location | What it does |
|--------|----------|-------------|
| **M14 Federation** | `hearthnet/federation/` | Cross-community node manifests + signed bridges |
| **M15 Relay** | `hearthnet/relay/` | Public-IP relay tier for NAT traversal |
| **M16 Tokens** | `hearthnet/identity/tokens.py` | AuthToken / CapabilityToken scoped access |
| **M17 OCR** | `hearthnet/services/ocr/` | Tesseract / TrOCR text extraction |
| **M18 Translation** | `hearthnet/services/translation/` | NLLB-200 local translation |
| **M19 STT/TTS** | `hearthnet/services/stt_tts/` | Whisper STT + Coqui/pyttsx3 TTS |
| **M20 Vision** | `hearthnet/services/vision/` | Florence-2 image captioning / VQA |
| **M21 Tool Calls** | `hearthnet/services/tools/` | LLM tool-call executor (plant ID, search, …) |
| **M22 Mobile** | `hearthnet/ui/mobile/` | PWA manifest + service worker for home-screen install |
| **M23 E2E Encryption** | `hearthnet/crypto/` | X25519 ECDH + ChaCha20-Poly1305 channel encryption |
| **M24 Rerank** | `hearthnet/services/rerank/` | Cross-encoder reranking for RAG results |
| **M25 Group Chat** | `hearthnet/services/group_chat/` | Multi-party room-based chat |
### Phase 3: Experimental (opt-in via `config.toml`)
| Module | Location | Flag | What it does |
|--------|----------|------|-------------|
| **M26 Distributed Inference** | `hearthnet/distributed_inference/` | `research.distributed_inference` | Layer-shard a 7B model across LAN nodes (Petals-style) |
| **M27 MoE Routing** | `hearthnet/moe/` | `research.moe_routing` | Route queries to best expert (model/service/human) via learned scorer |
| **M28 FedLearn** | `hearthnet/fedlearn/` | `research.fedlearn` | FedAvg LoRA fine-tuning without sharing raw data |
| **M29 LoRa Beacons** | `hearthnet/lora/` | `research.lora_beacons` | 868 MHz offline "I'm alive" heartbeats via USB LoRa stick |
| **M30 Evidence Graph** | `hearthnet/evidence/` | `research.evidence` | Claim → attest → dispute provenance graph + EBKH bridge |
| **M31 Civil Defense** | `hearthnet/civdef/` | `research.civil_defense` | THW/DRK/KatS alert pipeline with role certs + audit chain |
| **M32 Protocol Standard** | `hearthnet/services/protocol/` | on by default | Protocol version list + conformance report |
### Cross-Cutting
| ID | Location | What it does |
|----|----------|-------------|
| **X01 Transport** | `hearthnet/transport/` | HTTP/SSE client, backpressure, rate limiting, frame types |
| **X02 Events** | `hearthnet/events/` | SQLite Lamport event log + gossip sync |
| **X03 Observability** | `hearthnet/observability/` | Tracing, metrics, Doctor health checks, TrackioExporter |
| **X04 Config** | `hearthnet/config.py` | Typed TOML config + ResearchConfig feature flags |
| **X05 DHT** | `hearthnet/dht/` | Kademlia-inspired DHT for cross-LAN peer lookup |
| **X06 WebSocket** | `hearthnet/transport/` | WebSocket pubsub (StateBus → live UI push) |
| **X07 Federated Metrics** | `hearthnet/observability/` | Opt-in aggregate mesh health metrics |
| **X08 Tensor Transport** | `hearthnet/transport/tensor/` | Chunked tensor stream for M26 distributed inference |
| **X09 Conformance Suite** | `hearthnet/conformance/` | 21-check black-box conformance runner |
---
## Composition Root
`HearthNode` in [hearthnet/node.py](hearthnet/node.py) is the single composition root.
```python
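import asyncio
from hearthnet.node import HearthNode

async def main() -> None:
    node = HearthNode(
        node_id="my-node",
        display_name="Alice's Pi",
        community_id="ed25519:abc123",
    )
    node.install_services(corpus="general")  # registers the services this hardware supports
    await node.start()

asyncio.run(main())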
```
`install_services()` registers every service the local hardware supports on the bus. Heavy optional dependencies (torch, chromadb, etc.) are imported lazily and fail gracefully: a node with no GPU still works but cannot answer GPU-only capabilities.
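
The lazy-import behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not HearthNet's actual loader: `try_import`, `installable_services`, and the `REQUIREMENTS` mapping are hypothetical names, and the stdlib `json` module stands in for a heavy dependency such as torch or chromadb.

```python
import importlib
from types import ModuleType
from typing import Dict, List, Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Import a module on demand; return None instead of raising if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def installable_services(requirements: Dict[str, List[str]]) -> List[str]:
    """Return the services whose optional dependencies are all importable."""
    return [
        service
        for service, modules in requirements.items()
        if all(try_import(name) is not None for name in modules)
    ]


# Hypothetical mapping: service name -> modules it needs at runtime.
REQUIREMENTS = {
    "always_there": ["json"],                    # stdlib stand-in, importable
    "gpu_only": ["surely_missing_module_xyz"],   # stand-in for an absent dependency
}

print(installable_services(REQUIREMENTS))
```

A service whose dependency is missing is simply left out of the list, so the node starts with a smaller capability set instead of crashing at import time.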
---
## Capability Bus
```
Caller ──── bus.call(name, version, body) ──────────┐
                                                    ▼
                                           ┌──────────────────┐
                                           │  CapabilityBus   │
                                           │                  │
                                           │  Registry        │
                                           │  ┌─────────────┐ │
                                           │  │ local route │─┼──► Service.handle()
                                           │  ├─────────────┤ │
                                           │  │ remote route│─┼──► HTTP POST /bus/v1/call
                                           │  └─────────────┘ │
                                           │  HealthMonitor   │
                                           │  TrustFilter     │
                                           └──────────────────┘
```
- **Local route**: the service is installed on this node → direct Python call.
- **Remote route**: the capability is advertised by a peer → HTTP POST to that peer's transport.
- **Version negotiation**: capabilities are registered with a `(major, minor)` version; the bus picks the highest compatible version.
- **Health monitoring**: each service's response times are tracked in a ring buffer; unhealthy services are quarantined for `BUS_QUARANTINE_SECONDS`.
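
Version negotiation and quarantine could look roughly like the sketch below. Several details are assumptions rather than documented behaviour: "compatible" is taken to mean same major with minor at least the requested one, a service is "unhealthy" when its mean latency over a full ring exceeds the hypothetical `SLOW_CALL_SECONDS`, and the 60-second value for `BUS_QUARANTINE_SECONDS` is a placeholder. `pick_version` and `HealthRing` are illustrative names.

```python
import time
from collections import deque
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int]          # (major, minor)

BUS_QUARANTINE_SECONDS = 60.0      # placeholder; the real default is defined by the project
SLOW_CALL_SECONDS = 2.0            # hypothetical "unhealthy" threshold


def pick_version(registered: Iterable[Version], requested: Version) -> Optional[Version]:
    """Pick the highest registered version with the same major and minor >= requested."""
    major, minor = requested
    compatible = [v for v in registered if v[0] == major and v[1] >= minor]
    return max(compatible) if compatible else None


class HealthRing:
    """Fixed-size ring buffer of response times with a quarantine deadline."""

    def __init__(self, size: int = 5) -> None:
        self.samples = deque(maxlen=size)   # oldest samples fall off automatically
        self.quarantined_until = 0.0

    def record(self, seconds: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.samples.append(seconds)
        ring_full = len(self.samples) == self.samples.maxlen
        if ring_full and sum(self.samples) / len(self.samples) > SLOW_CALL_SECONDS:
            self.quarantined_until = now + BUS_QUARANTINE_SECONDS

    def is_healthy(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now >= self.quarantined_until


print(pick_version([(1, 0), (1, 2), (2, 0)], requested=(1, 1)))
```

Passing `now` explicitly keeps the quarantine logic testable without sleeping.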
---
## Data Flow: LLM Chat Request
```
User types in Gradio UI
        │
        ▼
app.py  (Gradio event handler)
        │  bus.call("llm.chat@1.0", body)
        ▼
CapabilityBus.call()
        │
        ├─ local LlmService found?
        │       │  yes → LlmService.handle() → backend.chat() → yield Token
        │       │
        └─ no local service
                │  peer has llm.chat?
                ├─ yes → HTTP POST /bus/v1/call → remote node → stream tokens back
                └─ no  → CapabilityError("not_found")
```
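
The local-then-remote fallback in this flow can be sketched as a tiny router. `MiniBus` and `fake_remote` are hypothetical stand-ins for the real bus and HTTP transport, and the `CapabilityError` class here is a simplified re-creation of the error named in the diagram, not the project's definition.

```python
from typing import Callable, Dict, Iterable, Iterator


class CapabilityError(Exception):
    """Raised when no route exists; the code mirrors the "not_found" case above."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


Handler = Callable[[dict], Iterable[str]]
RemoteCall = Callable[[str, str, dict], Iterable[str]]  # (endpoint, capability, body)


class MiniBus:
    def __init__(self, post_remote: RemoteCall) -> None:
        self.local: Dict[str, Handler] = {}   # capability -> local handler
        self.peers: Dict[str, str] = {}       # capability -> peer endpoint
        self.post_remote = post_remote

    def call(self, name: str, body: dict) -> Iterator[str]:
        if name in self.local:                # 1. prefer a local service
            return iter(self.local[name](body))
        if name in self.peers:                # 2. otherwise forward to a peer
            return iter(self.post_remote(self.peers[name], name, body))
        raise CapabilityError("not_found")    # 3. nobody offers it


def fake_remote(endpoint: str, name: str, body: dict) -> Iterator[str]:
    """Stand-in for the HTTP POST; yields tokens as a remote stream would."""
    yield f"[{endpoint}]"
    yield body["prompt"].upper()


bus = MiniBus(post_remote=fake_remote)
bus.peers["llm.chat@1.0"] = "http://peer-b:7080"
print(list(bus.call("llm.chat@1.0", {"prompt": "hi"})))
```

Returning an iterator in both branches means the caller consumes tokens the same way whether they come from a local backend or a peer.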
---
## Discovery Flow
```
Node boots
    │
    ├── mDNS: register _hearthnet._tcp.local.  (LAN multicast DNS)
    ├── UDP:  send announce to 224.0.0.251:7079 every 15s
    │
    ▼
PeerRegistry receives announcements from other nodes
    │
    ├── new peer → RegistryEvent(kind="added", entry=...)
    ├── peer gone (TTL expired) → RegistryEvent(kind="removed", ...)
    └── ManifestPublisher re-publishes every 300s
```
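
The added/removed bookkeeping can be illustrated with a small TTL table. This is a simplified sketch: `PeerTable` is a hypothetical name, the `RegistryEvent` here carries only a `node_id` rather than a full entry, and the 45-second TTL (three missed 15-second announces) is an assumption, not a documented default.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegistryEvent:
    kind: str       # "added" or "removed"
    node_id: str


class PeerTable:
    def __init__(self, ttl_seconds: float = 45.0) -> None:  # assumed TTL
        self.ttl = ttl_seconds
        self.last_seen: Dict[str, float] = {}

    def on_announce(self, node_id: str, now: float) -> List[RegistryEvent]:
        """Refresh a peer's timestamp; emit "added" only the first time it is seen."""
        is_new = node_id not in self.last_seen
        self.last_seen[node_id] = now
        return [RegistryEvent("added", node_id)] if is_new else []

    def expire(self, now: float) -> List[RegistryEvent]:
        """Drop peers that have been silent for longer than the TTL."""
        gone = [n for n, seen in self.last_seen.items() if now - seen > self.ttl]
        for node_id in gone:
            del self.last_seen[node_id]
        return [RegistryEvent("removed", node_id) for node_id in gone]


table = PeerTable()
print(table.on_announce("node-b", now=0.0))
print(table.expire(now=46.0))
```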
---
## Emergency Mode
```
EmergencyDetector (async loop, 30s probe)
    │
    ├── probe connectivity endpoints
    │
    ├── ONLINE  → EmergencyState.NORMAL
    │       │   UI shows normal theme
    │
    └── OFFLINE → EmergencyState.EMERGENCY
            │   UI switches to emergency theme (red)
            │   emergency.llm.chat capability activated
            │   LoRa beacons sent if hardware available (M29)
            │   Civil defense alerts published if role cert present (M31)
```
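
The state transition itself is small enough to sketch. The rule used here (the node counts as online if at least one probe succeeds) and the action strings are assumptions for illustration; only the `EmergencyState.NORMAL` and `EmergencyState.EMERGENCY` names come from the diagram.

```python
import enum
from typing import Iterable, List, Tuple


class EmergencyState(enum.Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"


def on_probe(
    current: EmergencyState, probe_results: Iterable[bool]
) -> Tuple[EmergencyState, List[str]]:
    """Return the next state plus the side effects to run, only on a state change."""
    online = any(probe_results)  # assumed rule: one reachable endpoint is enough
    new_state = EmergencyState.NORMAL if online else EmergencyState.EMERGENCY
    if new_state is current:
        return new_state, []
    if new_state is EmergencyState.EMERGENCY:
        return new_state, ["theme:emergency", "activate:emergency.llm.chat"]
    return new_state, ["theme:normal"]


state, actions = on_probe(EmergencyState.NORMAL, [False, False])
print(state, actions)
```

Emitting actions only on a transition avoids re-activating the emergency capability on every 30-second probe while the node stays offline.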
---
## MoE Expert Routing (M27)
```
Query arrives at any node
    │
    ▼
MoeRouter.route(query, top_k=3)
    │
    ├── score all registered ExpertDescriptors against query
    │   (tag overlap + cosine similarity + recency weighting)
    │
    └── return ranked RouteResult
            │
            ├── expert_type="model"    → bus.call("llm.chat@1.0", ...) on that node
            ├── expert_type="service"  → bus.call(expert_capability, ...)
            ├── expert_type="human"    → notify via chat + start handoff timer (M27 §4)
            └── expert_type="external" → HTTP call to opt-in external API
```
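Enable it: set `moe_routing = true` in the research section of `~/.config/hearthnet/config.toml`. The sample configuration in this document places that flag under `[policy.research]`, next to the `enable` master switch.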
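
The three scoring signals named in the diagram can be combined as in the sketch below. The fixed weights, the Jaccard form of tag overlap, the exponential recency decay, and the field names on this simplified `ExpertDescriptor` are all assumptions; the real router is described as using a learned scorer, which this stand-in does not attempt to reproduce.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class ExpertDescriptor:              # simplified, hypothetical fields
    name: str
    tags: FrozenSet[str]
    embedding: Sequence[float]
    last_seen_age_s: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(expert: ExpertDescriptor, query_tags: FrozenSet[str],
          query_embedding: Sequence[float], half_life_s: float = 3600.0) -> float:
    union = expert.tags | query_tags
    tag_overlap = len(expert.tags & query_tags) / len(union) if union else 0.0
    recency = 0.5 ** (expert.last_seen_age_s / half_life_s)   # 1.0 when just seen
    # Hypothetical fixed weights standing in for a learned scorer.
    return 0.4 * tag_overlap + 0.4 * cosine(expert.embedding, query_embedding) + 0.2 * recency


def route(experts: List[ExpertDescriptor], query_tags: FrozenSet[str],
          query_embedding: Sequence[float], top_k: int = 3) -> List[ExpertDescriptor]:
    ranked = sorted(experts, key=lambda e: score(e, query_tags, query_embedding), reverse=True)
    return ranked[:top_k]


experts = [
    ExpertDescriptor("mechanic", frozenset({"cars"}), (0.0, 1.0), 0.0),
    ExpertDescriptor("gardener", frozenset({"plants", "garden"}), (1.0, 0.0), 0.0),
]
print([e.name for e in route(experts, frozenset({"plants"}), (1.0, 0.0), top_k=1)])
```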
---
## Distributed Inference (M26: BitTorrent-style LLM sharing)
```
Node A: layers 0–15  of Llama-3.2-3B
Node B: layers 16–27 of Llama-3.2-3B
Node C: layers 28–35 (lm_head) of Llama-3.2-3B
    │
    ▼
PipelineOrchestrator.plan(model_id="llama3.2:3b")
    │  → discovers shards via experimental.distributed_llm.shard.list
    │  → checks layer coverage: 0..35 ✓
    │
PipelineOrchestrator.run(pipeline, input_tokens)
    │  → sends activations A→B via X08 TensorTransport (1 MiB chunks)
    │  → B sends activations B→C
    │  → C returns final logits
    │
    └── caller gets streamed tokens like any local model
```
Model weights are shared chunk-by-chunk using BLAKE3 CID-addressed blob transfer, the same
mechanism as file blobs (M07), but optimised for `.gguf` / `.safetensors` files.
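
The layer-coverage check in the planning step can be sketched as an interval sweep. `covers_all_layers` and `pipeline_order` are hypothetical helper names, and the shard tuples simply mirror the three-node example above; this is not the orchestrator's actual code.

```python
from typing import List, Tuple

Shard = Tuple[int, int, str]   # (first_layer, last_layer inclusive, node name)


def covers_all_layers(shards: List[Shard], num_layers: int) -> bool:
    """True if the shards jointly cover layers 0..num_layers-1 with no gap."""
    next_needed = 0
    for first, last, _node in sorted(shards):
        if first > next_needed:          # a layer nobody holds
            return False
        next_needed = max(next_needed, last + 1)
    return next_needed >= num_layers


def pipeline_order(shards: List[Shard]) -> List[str]:
    """Nodes in the order activations flow through them."""
    return [node for _first, _last, node in sorted(shards)]


shards = [(16, 27, "B"), (0, 15, "A"), (28, 35, "C")]
print(covers_all_layers(shards, num_layers=36), pipeline_order(shards))
```

Sorting by first layer makes the check independent of the order in which shards are discovered.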
---
## File Tree
```
hearthnet/
├── node.py                  # HearthNode: composition root
├── types.py                 # Shared type aliases (NodeID, ShardID, AlertID, …)
├── constants.py             # All numeric defaults and limits
├── config.py                # HearthnetConfig + ResearchConfig (TOML-backed)
├── cli.py                   # Click CLI entry point
├── facades.py               # HearthFacade: thin high-level API for app.py
├── controller.py            # HearthController: legacy thin wrapper
│
├── bus/                     # M03 CapabilityBus
│   ├── router.py            # routing logic (local → remote)
│   ├── registry.py          # CapabilityEntry, RegistryEvent, Diff
│   ├── capability.py        # CapabilityEntry dataclass
│   └── health.py            # ring-buffer health monitor
│
├── identity/                # M01
│   ├── keys.py              # Ed25519 key generation + signing
│   ├── manifest.py          # NodeManifest, CommunityManifest, CommunityPolicy, …
│   └── tokens.py            # AuthToken, CapabilityToken
│
├── discovery/               # M02
│   └── peers.py             # mDNS + UDP multicast PeerRegistry
│
├── transport/               # X01 / X06 / X08
│   ├── client.py            # HTTP + SSE client
│   ├── streams.py           # Frame, SseReader
│   ├── backpressure.py      # FlowControl, RateCheck, RateLimiter
│   └── tensor/              # X08 tensor chunked transport
│
├── events/                  # X02
│   ├── log.py               # SQLite Lamport event log
│   └── sync.py              # Gossip SyncClient / SyncServer
│
├── observability/           # X03
│   ├── tracing.py           # attach/detach trace context
│   ├── metrics.py           # MetricsCollector, TrackioExporter
│   └── doctor.py            # DoctorResult, CheckResult, DoctorService
│
├── services/                # M04 – M21 + M32
│   ├── llm/                 # M04 backends: ollama, llama_cpp, lmstudio, hf_api, anthropic
│   ├── rag/                 # M05
│   ├── marketplace/         # M06
│   ├── chat/                # M10
│   ├── embedding/           # M11
│   ├── ocr/                 # M17
│   ├── translation/         # M18
│   ├── stt_tts/             # M19
│   ├── vision/              # M20
│   ├── tools/               # M21
│   ├── group_chat/          # M25
│   └── protocol/            # M32
│
├── ui/                      # M08
│   ├── app.py               # Gradio 8-tab entry point
│   ├── tabs/                # one file per tab
│   ├── theme.py             # hearthnet_theme, emergency_theme
│   ├── topology.py          # TopologyComponent (mesh graph)
│   ├── onboarding.py        # first-run wizard + invite QR
│   └── mobile/              # M22 PWA manifest + service worker
│
├── emergency/               # M09
│   ├── detector.py          # async probe loop
│   └── state.py             # EmergencyState enum
│
├── crypto/                  # M23
│   └── channel.py           # X25519 + ChaCha20-Poly1305
│
├── blobs/                   # M07
│   └── store.py             # BLAKE3 CID store + chunked reader
│
├── dht/                     # X05
├── federation/              # M14
├── relay/                   # M15
│
├── distributed_inference/   # M26 (experimental)
├── moe/                     # M27 (experimental)
├── fedlearn/                # M28 (experimental)
├── lora/                    # M29 (experimental)
├── evidence/                # M30 (experimental)
├── civdef/                  # M31 (experimental)
└── conformance/             # X09
```
---
## Configuration
`~/.config/hearthnet/config.toml` (created on first run with defaults):
```toml
[node]
node_id = "" # auto-generated Ed25519 key ID
display_name = "My Node"
data_dir = "~/.hearthnet"
[transport]
http_port = 7080
ui_port = 7860
[llm]
default_backend = "ollama" # "ollama" | "llama_cpp" | "lmstudio" | "hf_api" | "smollm"
[rag]
corpus_dir = "~/.hearthnet/corpus"
embedding_model = "BAAI/bge-small-en-v1.5"
[policy.research]
enable = false # master switch for all experimental modules
moe_routing = false # M27
distributed_inference = false # M26
fedlearn = false # M28
lora_beacons = false # M29
evidence = false # M30
civil_defense = false # M31
```
---
## Connecting a Local Node to the HF Space
The HF Space at `https://huggingface.co/spaces/build-small-hackathon/HearthNet` is a
single-node anchor you can peer with from any local machine.
```bash
# 1. Clone and install
git clone https://huggingface.co/spaces/build-small-hackathon/HearthNet
cd HearthNet
pip install -e .
# 2. Run your local node (pick a free port if 7080 is taken)
python -m hearthnet.cli run --http-port 7080 --ui-port 7860
# 3. Manually add the HF Space anchor as a peer (different network = manual)
python -m hearthnet.cli call discovery.peer.add 1 0 \
'{"endpoint":"https://build-small-hackathon-hearthnet.hf.space","node_id":"hf-space-anchor"}'
# 4. Verify peering
python -m hearthnet.cli call discovery.peers 1 0 '{}'
```
Or use the helper script:
```bash
python scripts/connect_to_hf.py
```
Once peered, your local node can:
- Route LLM queries **from** the HF Space to your local (better) model
- Push community posts that appear in the HF Space UI
- Share blob files across the connection
> **Note:** The HF Space runs on a public server without a static IP for inbound connections.
> Your local node initiates the connection; the HF Space cannot discover you via mDNS.
> Use `discovery.peer.add` or the invite flow to establish the bridge manually.
---
## Security Model
- **Node identity**: Ed25519 key pair generated locally; the private key never leaves the device.
- **Trust levels**: `unknown` → `member` → `trusted` → `anchor`. Capabilities can require a minimum trust level.
- **Capability scoping**: `AuthToken` restricts which capabilities a caller may invoke.
- **Channel encryption**: M23 X25519 ECDH + ChaCha20-Poly1305 for inter-node transport (opt-in, defaults off).
- **Experimental capabilities**: Phase 3 modules are off by default and require explicit opt-in. The bus refuses to register them unless the feature flag is on.
- **No central authority**: there is no HearthNet.com, no certificate authority, no registration server. Trust is established peer-to-peer via invite chains.
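
The trust-level and scoping rules combine naturally into one check. The sketch below is an assumption about how they might compose: `TrustLevel` as an ordered enum and `may_invoke` are illustrative names, and the token scope is modelled as a plain set of capability names rather than the real `AuthToken`.

```python
import enum
from typing import AbstractSet


class TrustLevel(enum.IntEnum):
    """Ordered so that a higher value means more trust."""
    UNKNOWN = 0
    MEMBER = 1
    TRUSTED = 2
    ANCHOR = 3


def may_invoke(
    caller_level: TrustLevel,
    required_level: TrustLevel,
    token_scope: AbstractSet[str],
    capability: str,
) -> bool:
    """Allow a call only if the caller is trusted enough AND the token covers it."""
    return caller_level >= required_level and capability in token_scope


print(may_invoke(TrustLevel.TRUSTED, TrustLevel.MEMBER, {"llm.chat"}, "llm.chat"))
```

Using `IntEnum` lets the minimum-trust requirement be a single comparison.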
---
## Testing
```bash
# Full suite (133 unit + integration tests):
pytest tests/ -q
# Skip slow E2E browser tests:
pytest tests/ -q -k "not e2e"
# Phase 3 experimental module tests only:
pytest tests/test_phase3_experimental.py -v
# Conformance runner (X09):
python -m hearthnet.conformance.runner --output conformance-report/
```
---
*This document is generated from the spec set in `docs/`. For per-module detail see:*
- *Phase 1+2: `docs/00-OVERVIEW.md`, `docs/CAPABILITY_CONTRACT.md`, `docs/M01-*.md` …*
- *Phase 3: `docs/p2_p3/IMPLEMENTATION_REFERENCE_p3.md`, `docs/p2_p3/M26-*.md` …*