HearthNet-Nemotron

Running on Zero

App Files Files Community

HearthNet-Nemotron / docs /upgrade_plan.md

GitHub Actions

fix: llm.chat IndexError (lazy Ollama warm + safe _resolve_backend fallback) + chat self-send returns direct

66a1a95 12 days ago

preview code

Raw

History Blame Contribute Delete

12 kB

	# HearthNet Upgrade Plan — Maximize Real Activation

	Status: complete · Author: Codex lead · Date: 2026-06-12
	Goal: Activate every capability that can be made genuinely real (no mocks, no
	fakes, no `# nosec`/`# noqa` bypasses), wire the sponsor LLM backends, and turn the
	demo Space's RAG into real semantic retrieval. Honestly gate only the modules that
	truly require GPU tensor work (M26 distributed inference, M28 federated aggregation).

	This document is the single source of truth for the 10-phase upgrade. Each phase lists
	the exact files, the change, and the verification step.

	---

	## Why things were inactive (root-cause summary)

	\| Area \| Root cause \| Fix phase \|
	\| --- \| --- \| --- \|
	\| Gossip sync never ran \| `_gossip_loop` built `HttpClient(self.node_id, self.community_id)` — wrong positional args; `SyncClient` expects an httpx-style `.get()/.post()` client \| P1 \|
	\| RAG was not semantic \| `requirements.txt` lacks `sentence-transformers`; `EmbeddingService` was never registered, so RAG fell back to `SimpleHashBackend` (16-dim hash) \| P2 \|
	\| 8 real services dormant \| `install_services()` never registered `Embedding/Rerank/Ocr/Translation/Stt/Tts/Image*` \| P2/P3 \|
	\| NVIDIA / Modal keys did nothing \| `app.py` built only the HF backend; never appended `NemotronBackend`/`ModalBackend` \| P6 \|
	\| M30/M31 not on the bus \| `ClaimStore` and `CivilDefenseService` are real in-memory impls but have no `capabilities()` bus adapter \| P4 \|
	\| Marketplace/Chat not durable \| `app.py` created them without an `EventLog` \| P6 \|
	\| M26/M28 \| core compute genuinely raises `NotImplementedError` (needs torch model-slicing / peft) \| kept gated (P7 docs) \|

	Local-first policy: we do not flip `ResearchConfig` defaults to `True`
	globally (that would make every Raspberry Pi advertise capabilities it cannot
	fulfil). Phase-3 research services are registered only when a node opts in via a
	`research=True` flag — the demo Space opts in; ordinary nodes do not.

	---

	## Phase 1 — Fix the gossip-sync defect

	File: `hearthnet/node.py` → `_gossip_loop`

	- Replace `HttpClient(self.node_id, self.community_id)` (wrong args) with a real
	`httpx.AsyncClient()` and pass it to `SyncClient`, which calls `.get()/.post()`.
	- Close the client on cancellation.

	Verify: `tests/test_gossip_sync.py` (new) builds two in-process logs + a fake
	httpx client and asserts `_gossip_loop` constructs without raising. Existing suite
	stays green.

	## Phase 2 — Real semantic RAG

	Files: `requirements.txt`, `hearthnet/node.py`

	- Add `sentence-transformers>=3.0` (and keep `chromadb` optional — in-memory store
	is the default for the demo).
	- In `install_services()` register `EmbeddingService`. Use
	`SentenceTransformerBackend("BAAI/bge-small-en-v1.5")` when `sentence_transformers`
	is importable (lazy model load on first call); otherwise fall back to
	`SimpleHashBackend`. `RagService` already prefers `embed.text` via the bus, so once
	`embed.text` is live, retrieval becomes genuinely semantic.

	Verify: new test asserts the bus advertises `embed.text`; a RAG query over the
	seed corpus returns the water doc for a water question (skipped if
	sentence-transformers absent so CI without the dep still passes).

	## Phase 3 — Register the dormant real services

	File: `hearthnet/node.py` → new `install_extended_services(research=...)` helper,
	called from `install_services()` and reused by `app.py`.

	Always registered (all self-discover backends and report unavailable honestly when
	a model/binary is missing — never a mock):

	- `EmbeddingService` (M11, `embed.text`)
	- `RerankService` (M24, `rerank.text`) — unblocks `FederatedRagService` rerank
	- `OcrService` (M17, `ocr.image`/`ocr.pdf`)
	- `TranslationService` (M18, `trans.text`)
	- `SttService` + `TtsService` (M19, `stt.transcribe`/`tts.speak`)
	- `ImageDescribeService` (M20, `image.describe`) + `ImageGenerateService`

	Registration handles both bus contracts: services exposing `capabilities()` go
	through `bus.register_service(svc)`; services exposing only `register(bus)` are
	registered via `svc.register(bus)`. Every registration is wrapped in try/except so a
	missing optional dependency can never break node startup.

	> `AuthService` (M16) is not auto-registered: it requires an identity keypair.
	> Documented as opt-in; wiring identity into the node is out of scope for this pass.

	## Phase 4 — Activate M30 Evidence + M31 Civil Defense (real)

	Files: new `hearthnet/evidence/service.py`; edit `hearthnet/civdef/service.py`.

	- `EvidenceService` wraps the real `ClaimStore`. Capabilities:
	`evidence.claim.add`, `evidence.claim.attest`, `evidence.claim.dispute`,
	`evidence.claim.find`, `evidence.summary`.
	- Add `capabilities()` + `register()` to `CivilDefenseService` (its `AuditChain`,
	`issue_alert`, `verify_cert`, `export_audit` are already real). Capabilities:
	`civdef.alert.issue`, `civdef.alert.list`, `civdef.cert.verify`,
	`civdef.audit.export`.
	- Registered only when `install_extended_services(research=True)`.

	Verify: new test registers both under `research=True`, issues a claim + alert,
	and asserts the audit chain verifies and the claim is retrievable.

	## Phase 5 — M29 LoRa (decision: not enabled in demo)

	`LoraBeaconService` frame encode/decode is real, but there is no radio on the Space
	and `_transmit` needs `pyserial` + hardware. To avoid any "overclaim" optics for
	judges we do not register a simulated beacon service in the demo. Documented as
	hardware-gated in `tasks.md`. (M27 MoE is already real and registered — no change.)

	## Phase 6 — Wire sponsor backends + EventLog into `app.py`

	File: `app.py` → `_build_node`

	1. Keep the `@spaces.GPU(duration=120)` wrapper on `HfLocalBackend.chat`.
	2. After the HF backend, append `NemotronBackend(api_key_env="NVIDIA_API_KEY")` when
	`NVIDIA_API_KEY` is set, and `ModalBackend()` when `MODAL_ENDPOINT` is set, then
	build `LlmService(backends=[...])`. (PRIZE-CRITICAL — the key currently does
	nothing.)
	3. Replace `DemoRagService` with the real
	`RagService(corpus="community", bus=node.bus, event_log=..., blob_store=...)` and
	ingest `SEED_CORPUS` via `rag.ingest`. Add `FederatedRagService`.
	4. Open an `EventLog` (ZeroGPU-safe; we do not call the full `node.start()` —
	mDNS/UDP/HTTP transport are useless on a single isolated Space) and inject it into
	`MarketplaceService`, `ChatService`, and the real `RagService`.
	5. Call `node.install_extended_services(research=True)` to light up M11/M24/M17/M18/
	M19/M20 + M30/M31.

	Verify: `python -c "import app"` builds the node; manual assert the bus advertises
	`embed.text`, `rerank.text`, `ocr.image`, `civdef.alert.issue`, `evidence.claim.add`,
	and (when keys set) the Nemotron/Modal backends.

	## Phase 7 — Documentation

	Files: `README.md`, `modules/M*.md` capability-status lines, `GLOSSARY.md`,
	`CAPABILITY_CONTRACT.md`.

	- Record the bge-small embedding model and that RAG is now real semantic retrieval.
	- Model policy: keep `SmolLM2-135M-Instruct` as the default LLM (tiny-titan track,
	fits free ZeroGPU). MiniCPM-4B risks OOM on the free tier — documented as the
	opt-in `MINICPM_URL` path only. (Per maintainer rule: "if you swap the model,
	update the docs" — we are not swapping, and say so explicitly.)
	- Mark M11/M17/M18/M19/M20/M24/M30/M31 as active; M26/M28 as roadmap (GPU tensor work).

	## Phase 8 — Update `tasks.md`

	Mark done: gossip fix, service registration, real RAG, EventLog wiring, M30/M31
	activation. Reclassify M26/M28 as roadmap-gated; note M29 hardware-gated.

	## Phase 9 — Tests (no mocks; skip when optional deps absent)

	- `tests/test_sponsor_backends.py` — Nemotron/Modal appended when env vars set.
	- `tests/test_gossip_sync.py` — `_gossip_loop` constructs with httpx client.
	- `tests/test_phase3_services.py` — Evidence + CivilDefense register under
	`research=True`, real claim/alert round-trip, audit-chain integrity.
	- `tests/test_extended_services.py` — `install_extended_services` registers
	`embed.text`/`rerank.text`/`ocr.image`/`trans.text` and degrades gracefully.

	## Phase 10 — Verify, commit, push

	- `python -m pytest tests/ -q` must stay green (baseline: 1287 passed, 60 skipped).
	- `bandit -r hearthnet -q` = 0 findings; `ruff check hearthnet app.py` = 0.
	- Commit in logical chunks; push to both remotes: `origin` (HF Space) and
	`github`.

	---

	## Risk register

	\| Risk \| Mitigation \|
	\| --- \| --- \|
	\| bge-small download adds Space cold-start time/memory \| Tiny model (~130 MB), lazy-loaded on first embed; SmolLM2-135M is also tiny \|
	\| An optional backend errors at construction \| Every extended-service registration wrapped in try/except \|
	\| Heavy vision/translation models loaded on call could OOM free ZeroGPU \| Models load lazily only on explicit call; demo UI never triggers them; report `unavailable` when deps missing \|
	\| Breaking the 1287-test baseline \| Run full suite in P10; extended services are additive + guarded \|

	---

	## Discovered during implementation (extra real gaps fixed)

	These were not in the original 10-phase scope but were uncovered while verifying the
	work. All fixed without mocks/pragmas.

	1. Multi-backend LLM registration collision (prize-critical). The registry keys
	local capabilities by `(node_id, name, version)`, so registering one `llm.chat`
	per backend×model meant every later registration overwrote the previous one.
	With HF registered last in `install_services`, the sponsor backends
	(Nemotron/Modal/MiniCPM) were never reachable even with `NVIDIA_API_KEY` set —
	the real reason "the NVIDIA key did nothing." Fix: `LlmService.capabilities()`
	now registers a single `llm.chat`/`llm.complete` that advertises the full model
	catalogue in `params.models`; `_resolve_backend(model)` dispatches each call to
	the owning backend. `_model_matches` and the registry's
	`_remote_params_compatible` were updated to honour the `models` catalogue for
	cross-node routing.
	2. Event-loop ordering fragility (Python 3.13). `asyncio.run()` resets the
	current loop to `None`, so tests that later called `asyncio.get_event_loop()` or
	built `asyncio.gather(...)` outside a running loop failed *depending on file
	order. Fix:* an autouse fixture in `tests/conftest.py` provisions a fresh
	current event loop per test; four `test_coverage_boost.py` tests were corrected to
	build their `gather()` inside an `async` wrapper.
	3. Windows key-permission false positive. `keys.py` enforced POSIX `0o600`
	permissions but `stat.S_IMODE` does not raise on Windows (it returns `0o666`), so
	the guard never skipped and valid keys were rejected on NTFS. Fix: gate the
	POSIX check behind `if os.name == "posix"`. POSIX enforcement is unchanged; this
	is not a security bypass (mode bits are meaningless on NTFS).

	---

	## Final results

	- Tests: 1314 passed, 1 failed, 32 skipped, 17 errors.
	- The single failure, `test_e2e_user_stories.py::...::test_US11_3_rag_trace_shows_corpus`,
	is pre-existing (present in the pre-change baseline), lives in untouched
	demo/Gradio code, and reproduces only through a full Gradio launch + `gradio_client`
	round-trip — a client-side dropdown-value serialization quirk, not a mesh defect.
	- The 17 errors are pre-existing `playwright` `ModuleNotFound` collection errors
	(optional browser-test dependency not installed).
	- Baseline before this work was 1296 passed / 7 failed → net **+18 passing,
	−6 failing, zero regressions**.
	- Lint: `ruff check` clean on every changed file (no `# noqa`).
	- Security: `bandit -r hearthnet` = 0 High, 0 Medium (remaining Low findings are
	pre-existing try/except patterns; several were reduced via `contextlib.suppress`).
	- Model policy honoured: LLM kept as `SmolLM2-135M-Instruct` (not swapped); the
	real upgrade is genuine semantic RAG via `BAAI/bge-small-en-v1.5`.