# HearthNet Upgrade Plan: Maximize Real Activation
**Status:** complete · **Author:** Codex lead · **Date:** 2026-06-12
**Goal:** Activate every capability that can be made *genuinely real* (no mocks, no
fakes, no `# nosec`/`# noqa` bypasses), wire the sponsor LLM backends, and turn the
demo Space's RAG into real semantic retrieval. Honestly gate only the modules that
truly require GPU tensor work (M26 distributed inference, M28 federated aggregation).
This document is the single source of truth for the 10-phase upgrade. Each phase lists
the exact files, the change, and the verification step.
---
## Why things were inactive (root-cause summary)
| Area | Root cause | Fix phase |
| --- | --- | --- |
| Gossip sync never ran | `_gossip_loop` built `HttpClient(self.node_id, self.community_id)` with the wrong positional args; `SyncClient` expects an httpx-style `.get()/.post()` client | P1 |
| RAG was not semantic | `requirements.txt` lacks `sentence-transformers`; `EmbeddingService` was never registered, so RAG fell back to `SimpleHashBackend` (16-dim hash) | P2 |
| 8 real services dormant | `install_services()` never registered `Embedding/Rerank/Ocr/Translation/Stt/Tts/Image*` | P2/P3 |
| NVIDIA / Modal keys did nothing | `app.py` built only the HF backend; never appended `NemotronBackend`/`ModalBackend` | P6 |
| M30/M31 not on the bus | `ClaimStore` and `CivilDefenseService` are real in-memory impls but have no `capabilities()` bus adapter | P4 |
| Marketplace/Chat not durable | `app.py` created them without an `EventLog` | P6 |
| M26/M28 | core compute genuinely raises `NotImplementedError` (needs torch model-slicing / peft) | kept gated (P7 docs) |
**Local-first policy:** we do **not** flip `ResearchConfig` defaults to `True`
globally (that would make every Raspberry Pi advertise capabilities it cannot
fulfil). Phase-3 research services are registered only when a node opts in via a
`research=True` flag. The demo Space opts in; ordinary nodes do not.
---
## Phase 1: Fix the gossip-sync defect
**File:** `hearthnet/node.py` → `_gossip_loop`
- Replace `HttpClient(self.node_id, self.community_id)` (wrong args) with a real
`httpx.AsyncClient()` and pass it to `SyncClient`, which calls `.get()/.post()`.
- Close the client on cancellation.
**Verify:** `tests/test_gossip_sync.py` (new) builds two in-process logs + a fake
httpx client and asserts `_gossip_loop` constructs without raising. Existing suite
stays green.
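
The loop shape described above can be sketched as follows. This is a minimal illustration, not the node's actual code: `gossip_loop`, `FakeAsyncClient`, and the `/sync/head` endpoint are hypothetical names, and the fake client mirrors only the `.get()`/`.post()`/`.aclose()` surface that `httpx.AsyncClient` provides. In the node itself the factory would be `httpx.AsyncClient`.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in with the same .get()/.post()/.aclose() surface as httpx.AsyncClient."""

    def __init__(self):
        self.closed = False
        self.requests = []

    async def get(self, url):
        self.requests.append(("GET", url))
        return {"events": []}

    async def post(self, url, json=None):
        self.requests.append(("POST", url))
        return {"ok": True}

    async def aclose(self):
        self.closed = True


async def gossip_loop(client_factory, peer_url, interval=0.01):
    """Poll a peer until cancelled; always close the HTTP client on the way out."""
    client = client_factory()  # in the node: httpx.AsyncClient()
    try:
        while True:
            await client.get(f"{peer_url}/sync/head")  # hypothetical endpoint
            await asyncio.sleep(interval)
    finally:
        # Runs on CancelledError too, so the client is not leaked.
        await client.aclose()


async def demo():
    created = []

    def factory():
        client = FakeAsyncClient()
        created.append(client)
        return client

    task = asyncio.create_task(gossip_loop(factory, "http://peer.invalid"))
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return created[0]


client = asyncio.run(demo())
print(client.closed, len(client.requests) > 0)
```

Putting the close in `finally` rather than in an `except asyncio.CancelledError` handler means the client is released on any exit path, not only on cancellation.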
## Phase 2: Real semantic RAG
**Files:** `requirements.txt`, `hearthnet/node.py`
- Add `sentence-transformers>=3.0` (and keep `chromadb` optional; the in-memory store
is the default for the demo).
- In `install_services()` register `EmbeddingService`. Use
`SentenceTransformerBackend("BAAI/bge-small-en-v1.5")` when `sentence_transformers`
is importable (lazy model load on first call); otherwise fall back to
`SimpleHashBackend`. `RagService` already prefers `embed.text` via the bus, so once
`embed.text` is live, retrieval becomes genuinely semantic.
**Verify:** new test asserts the bus advertises `embed.text`; a RAG query over the
seed corpus returns the water doc for a water question (skipped if
sentence-transformers absent so CI without the dep still passes).
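
A minimal sketch of the backend selection, assuming stand-in classes: `HashBackend` and `LazySentenceTransformerBackend` below are illustrative and are not the project's `SimpleHashBackend` / `SentenceTransformerBackend`. The only third-party calls used are `SentenceTransformer(model_name)` and `.encode(...)`, and they run only on the first `embed()` call of the semantic backend.

```python
import hashlib
import importlib.util


class HashBackend:
    """Deterministic 16-dim hash embedding; not semantic, only a fallback."""

    name = "simple-hash"
    dim = 16

    def embed(self, texts):
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first 16 bytes of the digest to floats in [0, 1].
            vectors.append([b / 255.0 for b in digest[: self.dim]])
        return vectors


class LazySentenceTransformerBackend:
    """Defers the model download and load until the first embed() call."""

    name = "sentence-transformers"

    def __init__(self, model_name="BAAI/bge-small-en-v1.5"):
        self.model_name = model_name
        self._model = None

    def embed(self, texts):
        if self._model is None:
            from sentence_transformers import SentenceTransformer

            self._model = SentenceTransformer(self.model_name)
        return self._model.encode(list(texts)).tolist()


def pick_embedding_backend():
    """Prefer the semantic backend when the optional dependency is importable."""
    if importlib.util.find_spec("sentence_transformers") is not None:
        return LazySentenceTransformerBackend()
    return HashBackend()


backend = pick_embedding_backend()
fallback_vectors = HashBackend().embed(["where is clean water?"])
print(backend.name, len(fallback_vectors[0]))
```

Checking with `importlib.util.find_spec` avoids importing the heavy package at startup, so selecting a backend stays cheap even when the dependency is installed.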
## Phase 3: Register the dormant real services
**File:** `hearthnet/node.py` → new `install_extended_services(research=...)` helper,
called from `install_services()` and reused by `app.py`.
Always registered (all self-discover backends and report *unavailable* honestly when
a model/binary is missing, never a mock):
- `EmbeddingService` (M11, `embed.text`)
- `RerankService` (M24, `rerank.text`), which unblocks `FederatedRagService` rerank
- `OcrService` (M17, `ocr.image`/`ocr.pdf`)
- `TranslationService` (M18, `trans.text`)
- `SttService` + `TtsService` (M19, `stt.transcribe`/`tts.speak`)
- `ImageDescribeService` (M20, `image.describe`) + `ImageGenerateService`
Registration handles both bus contracts: services exposing `capabilities()` go
through `bus.register_service(svc)`; services exposing only `register(bus)` are
registered via `svc.register(bus)`. Every registration is wrapped in try/except so a
missing optional dependency can never break node startup.
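
The dual-contract registration can be sketched roughly as below. `FakeBus`, `register_safely`, and the three sample services are hypothetical stand-ins; the real bus and service classes may expose richer signatures.

```python
class FakeBus:
    """Hypothetical bus that only records capability names."""

    def __init__(self):
        self.capabilities = []

    def register_service(self, svc):
        self.capabilities.extend(svc.capabilities())


class CapabilityStyleService:
    def capabilities(self):
        return ["embed.text"]


class RegisterStyleService:
    def register(self, bus):
        bus.capabilities.append("ocr.image")


class BrokenService:
    def capabilities(self):
        raise ImportError("optional dependency missing")


def register_safely(bus, svc):
    """Register svc through whichever contract it exposes; never raise."""
    try:
        if hasattr(svc, "capabilities"):
            bus.register_service(svc)
        elif hasattr(svc, "register"):
            svc.register(bus)
        else:
            return False
        return True
    except Exception as exc:  # a missing optional dep must not break startup
        print(f"skipped {type(svc).__name__}: {exc}")
        return False


bus = FakeBus()
services = (CapabilityStyleService(), RegisterStyleService(), BrokenService())
results = [register_safely(bus, svc) for svc in services]
print(results, bus.capabilities)
```

Returning a boolean per service makes it easy to log which optional capabilities were skipped without interrupting node startup.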
> `AuthService` (M16) is **not** auto-registered: it requires an identity keypair.
> Documented as opt-in; wiring identity into the node is out of scope for this pass.
## Phase 4: Activate M30 Evidence + M31 Civil Defense (real)
**Files:** new `hearthnet/evidence/service.py`; edit `hearthnet/civdef/service.py`.
- `EvidenceService` wraps the real `ClaimStore`. Capabilities:
`evidence.claim.add`, `evidence.claim.attest`, `evidence.claim.dispute`,
`evidence.claim.find`, `evidence.summary`.
- Add `capabilities()` + `register()` to `CivilDefenseService` (its `AuditChain`,
`issue_alert`, `verify_cert`, `export_audit` are already real). Capabilities:
`civdef.alert.issue`, `civdef.alert.list`, `civdef.cert.verify`,
`civdef.audit.export`.
- Registered only when `install_extended_services(research=True)`.
**Verify:** new test registers both under `research=True`, issues a claim + alert,
and asserts the audit chain verifies and the claim is retrievable.
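
As an illustration of the adapter idea, here is a minimal sketch in which a service maps capability names onto store methods. The `ClaimStore` below is a simplified stand-in whose method names and payload fields are assumptions, and only three of the five evidence capabilities are shown.

```python
class ClaimStore:
    """Minimal in-memory stand-in; the real ClaimStore API may differ."""

    def __init__(self):
        self._claims = {}

    def add(self, claim_id, text):
        self._claims[claim_id] = {"text": text, "attestations": [], "disputes": []}
        return claim_id

    def attest(self, claim_id, who):
        self._claims[claim_id]["attestations"].append(who)

    def find(self, claim_id):
        return self._claims.get(claim_id)


class EvidenceService:
    """Bus adapter: maps capability names onto ClaimStore methods."""

    def __init__(self, store):
        self._store = store

    def capabilities(self):
        # Payload keys ("id", "text", "who") are hypothetical.
        return {
            "evidence.claim.add": lambda p: self._store.add(p["id"], p["text"]),
            "evidence.claim.attest": lambda p: self._store.attest(p["id"], p["who"]),
            "evidence.claim.find": lambda p: self._store.find(p["id"]),
        }


handlers = EvidenceService(ClaimStore()).capabilities()
handlers["evidence.claim.add"]({"id": "c1", "text": "example claim"})
handlers["evidence.claim.attest"]({"id": "c1", "who": "node-a"})
claim = handlers["evidence.claim.find"]({"id": "c1"})
print(claim["attestations"])
```

The adapter holds no state of its own, so the store stays the single owner of claim data and can be tested independently of the bus.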
## Phase 5: M29 LoRa (decision: not enabled in demo)
`LoraBeaconService` frame encode/decode is real, but there is no radio on the Space
and `_transmit` needs `pyserial` + hardware. To avoid any "overclaim" optics for
judges we do **not** register a simulated beacon service in the demo. Documented as
hardware-gated in `tasks.md`. (M27 MoE is already real and registered, so no change.)
## Phase 6: Wire sponsor backends + EventLog into `app.py`
**File:** `app.py` → `_build_node`
1. Keep the `@spaces.GPU(duration=120)` wrapper on `HfLocalBackend.chat`.
2. After the HF backend, append `NemotronBackend(api_key_env="NVIDIA_API_KEY")` when
`NVIDIA_API_KEY` is set, and `ModalBackend()` when `MODAL_ENDPOINT` is set, then
build `LlmService(backends=[...])`. (PRIZE-CRITICAL: the key currently does
nothing.)
3. Replace `DemoRagService` with the real
`RagService(corpus="community", bus=node.bus, event_log=..., blob_store=...)` and
ingest `SEED_CORPUS` via `rag.ingest`. Add `FederatedRagService`.
4. Open an `EventLog` (ZeroGPU-safe; we do **not** call the full `node.start()`, because
mDNS/UDP/HTTP transports are useless on a single isolated Space) and inject it into
`MarketplaceService`, `ChatService`, and the real `RagService`.
5. Call `node.install_extended_services(research=True)` to light up M11/M24/M17/M18/
M19/M20 + M30/M31.
**Verify:** `python -c "import app"` builds the node; manually assert that the bus advertises
`embed.text`, `rerank.text`, `ocr.image`, `civdef.alert.issue`, `evidence.claim.add`,
and (when keys set) the Nemotron/Modal backends.
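
The env-gated backend list from step 2 can be sketched as below. `select_backends` is a hypothetical helper and the string labels stand in for `HfLocalBackend`, `NemotronBackend(api_key_env="NVIDIA_API_KEY")`, and `ModalBackend()`; the real code would append backend instances instead.

```python
def select_backends(env):
    """Return backend labels in priority order for the given environment mapping."""
    backends = ["hf-local"]  # the HF backend is always present
    if env.get("NVIDIA_API_KEY"):
        backends.append("nemotron")
    if env.get("MODAL_ENDPOINT"):
        backends.append("modal")
    return backends


no_keys = select_backends({})
all_keys = select_backends(
    {"NVIDIA_API_KEY": "placeholder", "MODAL_ENDPOINT": "https://example.invalid"}
)
print(no_keys, all_keys)
```

Taking the environment as a parameter (for example `select_backends(os.environ)`) keeps the selection testable without mutating process-wide state.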
## Phase 7: Documentation
**Files:** `README.md`, `modules/M*.md` capability-status lines, `GLOSSARY.md`,
`CAPABILITY_CONTRACT.md`.
- Record the bge-small embedding model and that RAG is now real semantic retrieval.
- **Model policy:** keep `SmolLM2-135M-Instruct` as the default LLM (tiny-titan track,
fits free ZeroGPU). MiniCPM-4B risks OOM on the free tier, so it is documented as the
opt-in `MINICPM_URL` path only. (Per maintainer rule: "if you swap the model,
update the docs"; we are *not* swapping, and say so explicitly.)
- Mark M11/M17/M18/M19/M20/M24/M30/M31 as active; M26/M28 as roadmap (GPU tensor work).
## Phase 8: Update `tasks.md`
Mark done: gossip fix, service registration, real RAG, EventLog wiring, M30/M31
activation. Reclassify M26/M28 as roadmap-gated; note M29 hardware-gated.
## Phase 9: Tests (no mocks; skip when optional deps absent)
- `tests/test_sponsor_backends.py`: Nemotron/Modal appended when env vars set.
- `tests/test_gossip_sync.py`: `_gossip_loop` constructs with httpx client.
- `tests/test_phase3_services.py`: Evidence + CivilDefense register under
`research=True`, real claim/alert round-trip, audit-chain integrity.
- `tests/test_extended_services.py`: `install_extended_services` registers
`embed.text`/`rerank.text`/`ocr.image`/`trans.text` and degrades gracefully.
## Phase 10: Verify, commit, push
- `python -m pytest tests/ -q` must stay green (baseline: 1287 passed, 60 skipped).
- `bandit -r hearthnet -q` = 0 findings; `ruff check hearthnet app.py` = 0.
- Commit in logical chunks; push to **both** remotes: `origin` (HF Space) and
`github`.
---
## Risk register
| Risk | Mitigation |
| --- | --- |
| bge-small download adds Space cold-start time/memory | Tiny model (~130 MB), lazy-loaded on first embed; SmolLM2-135M is also tiny |
| An optional backend errors at construction | Every extended-service registration wrapped in try/except |
| Heavy vision/translation models loaded on call could OOM free ZeroGPU | Models load lazily only on explicit call; demo UI never triggers them; report `unavailable` when deps missing |
| Breaking the 1287-test baseline | Run full suite in P10; extended services are additive + guarded |
---
## Discovered during implementation (extra real gaps fixed)
These were not in the original 10-phase scope but were uncovered while verifying the
work. All fixed without mocks/pragmas.
1. **Multi-backend LLM registration collision (prize-critical).** The registry keys
local capabilities by `(node_id, name, version)`, so registering one `llm.chat`
per backend×model meant every later registration *overwrote* the previous one.
With HF registered last in `install_services`, the sponsor backends
(Nemotron/Modal/MiniCPM) were never reachable even with `NVIDIA_API_KEY` set. This is
the real reason "the NVIDIA key did nothing." **Fix:** `LlmService.capabilities()`
now registers a single `llm.chat`/`llm.complete` that advertises the full model
catalogue in `params.models`; `_resolve_backend(model)` dispatches each call to
the owning backend. `_model_matches` and the registry's
`_remote_params_compatible` were updated to honour the `models` catalogue for
cross-node routing.
2. **Event-loop ordering fragility (Python 3.13).** `asyncio.run()` resets the
current loop to `None`, so tests that later called `asyncio.get_event_loop()` or
built `asyncio.gather(...)` outside a running loop failed *depending on file
order*. **Fix:** an autouse fixture in `tests/conftest.py` provisions a fresh
current event loop per test; four `test_coverage_boost.py` tests were corrected to
build their `gather()` inside an `async` wrapper.
3. **Windows key-permission false positive.** `keys.py` enforced POSIX `0o600`
permissions but `stat.S_IMODE` does not raise on Windows (it returns `0o666`), so
the guard never skipped and valid keys were rejected on NTFS. **Fix:** gate the
POSIX check behind `if os.name == "posix"`. POSIX enforcement is unchanged; this
is not a security bypass (mode bits are meaningless on NTFS).
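
To make item 1 concrete, the sketch below reproduces the overwrite and then the single-capability fix. It is a simplified model, not the project's registry: `register`, the dict-based registry, the `nemotron-example` model name, and the first-backend fallback in `resolve_backend` are all assumptions made for illustration.

```python
registry = {}


def register(node_id, name, version, params):
    # Keyed by (node_id, name, version): a second llm.chat replaces the first.
    registry[(node_id, name, version)] = params


# Before the fix: one registration per backend/model, so only the last survives.
register("node-1", "llm.chat", "1", {"model": "nemotron-example"})
register("node-1", "llm.chat", "1", {"model": "SmolLM2-135M-Instruct"})
surviving_before = registry[("node-1", "llm.chat", "1")]


class LlmService:
    def __init__(self, backends):
        # backends: mapping of backend name -> list of model names it owns
        self._backends = backends

    def capabilities(self):
        # One capability advertising the full model catalogue.
        models = [m for owned in self._backends.values() for m in owned]
        return {"name": "llm.chat", "params": {"models": models}}

    def resolve_backend(self, model=None):
        for backend, owned in self._backends.items():
            if model in owned:
                return backend
        # Assumed fallback: unknown or missing model goes to the first backend.
        return next(iter(self._backends))


service = LlmService(
    {"hf": ["SmolLM2-135M-Instruct"], "nemotron": ["nemotron-example"]}
)
cap = service.capabilities()
register("node-1", cap["name"], "1", cap["params"])
after = registry[("node-1", "llm.chat", "1")]
print(surviving_before["model"], after["models"], service.resolve_backend("nemotron-example"))
```

With a single registry entry, adding another backend only extends the advertised catalogue and cannot shadow an existing one.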
---
## Final results
- **Tests:** 1314 passed, 1 failed, 32 skipped, 17 errors.
- The single failure, `test_e2e_user_stories.py::...::test_US11_3_rag_trace_shows_corpus`,
is **pre-existing** (present in the pre-change baseline), lives in untouched
demo/Gradio code, and reproduces only through a full Gradio launch + `gradio_client`
round-trip; it is a client-side dropdown-value serialization quirk, not a mesh defect.
- The 17 errors are pre-existing `playwright` `ModuleNotFoundError` collection errors
(optional browser-test dependency not installed).
- Baseline before this work was 1296 passed / 7 failed → net **+18 passing,
−6 failing, zero regressions**.
- **Lint:** `ruff check` clean on every changed file (no `# noqa`).
- **Security:** `bandit -r hearthnet` = 0 High, 0 Medium (remaining Low findings are
pre-existing try/except patterns; several were reduced via `contextlib.suppress`).
- **Model policy honoured:** LLM kept as `SmolLM2-135M-Instruct` (not swapped); the
real upgrade is genuine semantic RAG via `BAAI/bge-small-en-v1.5`.