Spaces:
Running on Zero
Running on Zero
GitHub Actions
fix: llm.chat IndexError (lazy Ollama warm + safe _resolve_backend fallback) + chat self-send returns direct
66a1a95 | # HearthNet Upgrade Plan β Maximize Real Activation | |
| **Status:** complete Β· **Author:** Codex lead Β· **Date:** 2026-06-12 | |
| **Goal:** Activate every capability that can be made *genuinely real* (no mocks, no | |
| fakes, no `# nosec`/`# noqa` bypasses), wire the sponsor LLM backends, and turn the | |
| demo Space's RAG into real semantic retrieval. Honestly gate only the modules that | |
| truly require GPU tensor work (M26 distributed inference, M28 federated aggregation). | |
| This document is the single source of truth for the 10-phase upgrade. Each phase lists | |
| the exact files, the change, and the verification step. | |
| --- | |
| ## Why things were inactive (root-cause summary) | |
| | Area | Root cause | Fix phase | | |
| | --- | --- | --- | | |
| | Gossip sync never ran | `_gossip_loop` built `HttpClient(self.node_id, self.community_id)` β wrong positional args; `SyncClient` expects an httpx-style `.get()/.post()` client | P1 | | |
| | RAG was not semantic | `requirements.txt` lacks `sentence-transformers`; `EmbeddingService` was never registered, so RAG fell back to `SimpleHashBackend` (16-dim hash) | P2 | | |
| | 8 real services dormant | `install_services()` never registered `Embedding/Rerank/Ocr/Translation/Stt/Tts/Image*` | P2/P3 | | |
| | NVIDIA / Modal keys did nothing | `app.py` built only the HF backend; never appended `NemotronBackend`/`ModalBackend` | P6 | | |
| | M30/M31 not on the bus | `ClaimStore` and `CivilDefenseService` are real in-memory impls but have no `capabilities()` bus adapter | P4 | | |
| | Marketplace/Chat not durable | `app.py` created them without an `EventLog` | P6 | | |
| | M26/M28 | core compute genuinely raises `NotImplementedError` (needs torch model-slicing / peft) | kept gated (P7 docs) | | |
| **Local-first policy:** we do **not** flip `ResearchConfig` defaults to `True` | |
| globally (that would make every Raspberry Pi advertise capabilities it cannot | |
| fulfil). Phase-3 research services are registered only when a node opts in via a | |
| `research=True` flag β the demo Space opts in; ordinary nodes do not. | |
| --- | |
| ## Phase 1 β Fix the gossip-sync defect | |
| **File:** `hearthnet/node.py` β `_gossip_loop` | |
| - Replace `HttpClient(self.node_id, self.community_id)` (wrong args) with a real | |
| `httpx.AsyncClient()` and pass it to `SyncClient`, which calls `.get()/.post()`. | |
| - Close the client on cancellation. | |
| **Verify:** `tests/test_gossip_sync.py` (new) builds two in-process logs + a fake | |
| httpx client and asserts `_gossip_loop` constructs without raising. Existing suite | |
| stays green. | |
| ## Phase 2 β Real semantic RAG | |
| **Files:** `requirements.txt`, `hearthnet/node.py` | |
| - Add `sentence-transformers>=3.0` (and keep `chromadb` optional β in-memory store | |
| is the default for the demo). | |
| - In `install_services()` register `EmbeddingService`. Use | |
| `SentenceTransformerBackend("BAAI/bge-small-en-v1.5")` when `sentence_transformers` | |
| is importable (lazy model load on first call); otherwise fall back to | |
| `SimpleHashBackend`. `RagService` already prefers `embed.text` via the bus, so once | |
| `embed.text` is live, retrieval becomes genuinely semantic. | |
| **Verify:** new test asserts the bus advertises `embed.text`; a RAG query over the | |
| seed corpus returns the water doc for a water question (skipped if | |
| sentence-transformers absent so CI without the dep still passes). | |
| ## Phase 3 β Register the dormant real services | |
| **File:** `hearthnet/node.py` β new `install_extended_services(research=...)` helper, | |
| called from `install_services()` and reused by `app.py`. | |
| Always registered (all self-discover backends and report *unavailable* honestly when | |
| a model/binary is missing β never a mock): | |
| - `EmbeddingService` (M11, `embed.text`) | |
| - `RerankService` (M24, `rerank.text`) β unblocks `FederatedRagService` rerank | |
| - `OcrService` (M17, `ocr.image`/`ocr.pdf`) | |
| - `TranslationService` (M18, `trans.text`) | |
| - `SttService` + `TtsService` (M19, `stt.transcribe`/`tts.speak`) | |
| - `ImageDescribeService` (M20, `image.describe`) + `ImageGenerateService` | |
| Registration handles both bus contracts: services exposing `capabilities()` go | |
| through `bus.register_service(svc)`; services exposing only `register(bus)` are | |
| registered via `svc.register(bus)`. Every registration is wrapped in try/except so a | |
| missing optional dependency can never break node startup. | |
| > `AuthService` (M16) is **not** auto-registered: it requires an identity keypair. | |
| > Documented as opt-in; wiring identity into the node is out of scope for this pass. | |
| ## Phase 4 β Activate M30 Evidence + M31 Civil Defense (real) | |
| **Files:** new `hearthnet/evidence/service.py`; edit `hearthnet/civdef/service.py`. | |
| - `EvidenceService` wraps the real `ClaimStore`. Capabilities: | |
| `evidence.claim.add`, `evidence.claim.attest`, `evidence.claim.dispute`, | |
| `evidence.claim.find`, `evidence.summary`. | |
| - Add `capabilities()` + `register()` to `CivilDefenseService` (its `AuditChain`, | |
| `issue_alert`, `verify_cert`, `export_audit` are already real). Capabilities: | |
| `civdef.alert.issue`, `civdef.alert.list`, `civdef.cert.verify`, | |
| `civdef.audit.export`. | |
| - Registered only when `install_extended_services(research=True)`. | |
| **Verify:** new test registers both under `research=True`, issues a claim + alert, | |
| and asserts the audit chain verifies and the claim is retrievable. | |
| ## Phase 5 β M29 LoRa (decision: not enabled in demo) | |
| `LoraBeaconService` frame encode/decode is real, but there is no radio on the Space | |
| and `_transmit` needs `pyserial` + hardware. To avoid any "overclaim" optics for | |
| judges we do **not** register a simulated beacon service in the demo. Documented as | |
| hardware-gated in `tasks.md`. (M27 MoE is already real and registered β no change.) | |
| ## Phase 6 β Wire sponsor backends + EventLog into `app.py` | |
| **File:** `app.py` β `_build_node` | |
| 1. Keep the `@spaces.GPU(duration=120)` wrapper on `HfLocalBackend.chat`. | |
| 2. After the HF backend, append `NemotronBackend(api_key_env="NVIDIA_API_KEY")` when | |
| `NVIDIA_API_KEY` is set, and `ModalBackend()` when `MODAL_ENDPOINT` is set, then | |
| build `LlmService(backends=[...])`. (PRIZE-CRITICAL β the key currently does | |
| nothing.) | |
| 3. Replace `DemoRagService` with the real | |
| `RagService(corpus="community", bus=node.bus, event_log=..., blob_store=...)` and | |
| ingest `SEED_CORPUS` via `rag.ingest`. Add `FederatedRagService`. | |
| 4. Open an `EventLog` (ZeroGPU-safe; we do **not** call the full `node.start()` β | |
| mDNS/UDP/HTTP transport are useless on a single isolated Space) and inject it into | |
| `MarketplaceService`, `ChatService`, and the real `RagService`. | |
| 5. Call `node.install_extended_services(research=True)` to light up M11/M24/M17/M18/ | |
| M19/M20 + M30/M31. | |
| **Verify:** `python -c "import app"` builds the node; manual assert the bus advertises | |
| `embed.text`, `rerank.text`, `ocr.image`, `civdef.alert.issue`, `evidence.claim.add`, | |
| and (when keys set) the Nemotron/Modal backends. | |
| ## Phase 7 β Documentation | |
| **Files:** `README.md`, `modules/M*.md` capability-status lines, `GLOSSARY.md`, | |
| `CAPABILITY_CONTRACT.md`. | |
| - Record the bge-small embedding model and that RAG is now real semantic retrieval. | |
| - **Model policy:** keep `SmolLM2-135M-Instruct` as the default LLM (tiny-titan track, | |
| fits free ZeroGPU). MiniCPM-4B risks OOM on the free tier β documented as the | |
| opt-in `MINICPM_URL` path only. (Per maintainer rule: "if you swap the model, | |
| update the docs" β we are *not* swapping, and say so explicitly.) | |
| - Mark M11/M17/M18/M19/M20/M24/M30/M31 as active; M26/M28 as roadmap (GPU tensor work). | |
| ## Phase 8 β Update `tasks.md` | |
| Mark done: gossip fix, service registration, real RAG, EventLog wiring, M30/M31 | |
| activation. Reclassify M26/M28 as roadmap-gated; note M29 hardware-gated. | |
| ## Phase 9 β Tests (no mocks; skip when optional deps absent) | |
| - `tests/test_sponsor_backends.py` β Nemotron/Modal appended when env vars set. | |
| - `tests/test_gossip_sync.py` β `_gossip_loop` constructs with httpx client. | |
| - `tests/test_phase3_services.py` β Evidence + CivilDefense register under | |
| `research=True`, real claim/alert round-trip, audit-chain integrity. | |
| - `tests/test_extended_services.py` β `install_extended_services` registers | |
| `embed.text`/`rerank.text`/`ocr.image`/`trans.text` and degrades gracefully. | |
| ## Phase 10 β Verify, commit, push | |
| - `python -m pytest tests/ -q` must stay green (baseline: 1287 passed, 60 skipped). | |
| - `bandit -r hearthnet -q` = 0 findings; `ruff check hearthnet app.py` = 0. | |
| - Commit in logical chunks; push to **both** remotes: `origin` (HF Space) and | |
| `github`. | |
| --- | |
| ## Risk register | |
| | Risk | Mitigation | | |
| | --- | --- | | |
| | bge-small download adds Space cold-start time/memory | Tiny model (~130 MB), lazy-loaded on first embed; SmolLM2-135M is also tiny | | |
| | An optional backend errors at construction | Every extended-service registration wrapped in try/except | | |
| | Heavy vision/translation models loaded on call could OOM free ZeroGPU | Models load lazily only on explicit call; demo UI never triggers them; report `unavailable` when deps missing | | |
| | Breaking the 1287-test baseline | Run full suite in P10; extended services are additive + guarded | | |
| --- | |
| ## Discovered during implementation (extra real gaps fixed) | |
| These were not in the original 10-phase scope but were uncovered while verifying the | |
| work. All fixed without mocks/pragmas. | |
| 1. **Multi-backend LLM registration collision (prize-critical).** The registry keys | |
| local capabilities by `(node_id, name, version)`, so registering one `llm.chat` | |
| per backendΓmodel meant every later registration *overwrote* the previous one. | |
| With HF registered last in `install_services`, the sponsor backends | |
| (Nemotron/Modal/MiniCPM) were never reachable even with `NVIDIA_API_KEY` set β | |
| the real reason "the NVIDIA key did nothing." **Fix:** `LlmService.capabilities()` | |
| now registers a single `llm.chat`/`llm.complete` that advertises the full model | |
| catalogue in `params.models`; `_resolve_backend(model)` dispatches each call to | |
| the owning backend. `_model_matches` and the registry's | |
| `_remote_params_compatible` were updated to honour the `models` catalogue for | |
| cross-node routing. | |
| 2. **Event-loop ordering fragility (Python 3.13).** `asyncio.run()` resets the | |
| current loop to `None`, so tests that later called `asyncio.get_event_loop()` or | |
| built `asyncio.gather(...)` outside a running loop failed *depending on file | |
| order*. **Fix:** an autouse fixture in `tests/conftest.py` provisions a fresh | |
| current event loop per test; four `test_coverage_boost.py` tests were corrected to | |
| build their `gather()` inside an `async` wrapper. | |
| 3. **Windows key-permission false positive.** `keys.py` enforced POSIX `0o600` | |
| permissions but `stat.S_IMODE` does not raise on Windows (it returns `0o666`), so | |
| the guard never skipped and valid keys were rejected on NTFS. **Fix:** gate the | |
| POSIX check behind `if os.name == "posix"`. POSIX enforcement is unchanged; this | |
| is not a security bypass (mode bits are meaningless on NTFS). | |
| --- | |
| ## Final results | |
| - **Tests:** 1314 passed, 1 failed, 32 skipped, 17 errors. | |
| - The single failure, `test_e2e_user_stories.py::...::test_US11_3_rag_trace_shows_corpus`, | |
| is **pre-existing** (present in the pre-change baseline), lives in untouched | |
| demo/Gradio code, and reproduces only through a full Gradio launch + `gradio_client` | |
| round-trip β a client-side dropdown-value serialization quirk, not a mesh defect. | |
| - The 17 errors are pre-existing `playwright` `ModuleNotFound` collection errors | |
| (optional browser-test dependency not installed). | |
| - Baseline before this work was 1296 passed / 7 failed β net **+18 passing, | |
| β6 failing, zero regressions**. | |
| - **Lint:** `ruff check` clean on every changed file (no `# noqa`). | |
| - **Security:** `bandit -r hearthnet` = 0 High, 0 Medium (remaining Low findings are | |
| pre-existing try/except patterns; several were reduced via `contextlib.suppress`). | |
| - **Model policy honoured:** LLM kept as `SmolLM2-135M-Instruct` (not swapped); the | |
| real upgrade is genuine semantic RAG via `BAAI/bge-small-en-v1.5`. | |