| # M09 - Emergency Mode Detector | |
| **Spec version:** v1.0 | |
| **Depends on:** M03 (bus, to deregister internet-dependent capabilities), X04 (config), X03 (observability), `httpx`, `socket` | |
| **Depended on by:** M08 (UI shows banner), M04 (re-registers internet backends on restore), M02 (increases discovery cadence) | |
| --- | |
| ## 1. Responsibility | |
| Detect whether the node has working internet access. Publish state transitions locally. Cause the bus to deregister/re-register internet-dependent capabilities and let other modules react. | |
| Out of scope: | |
| - VPN / overlay status | |
| - Per-service connectivity checks | |
| - Cellular signal strength | |
| --- | |
| ## 2. File layout | |
| ``` | |
| hearthnet/emergency/ | |
| ├── __init__.py | |
| ├── detector.py # Detector: probe loop, state machine | |
| └── state.py # EmergencyState dataclass + StateBus | |
| ``` | |
| --- | |
| ## 3. Public API | |
| ### 3.1 `state.py` | |
| ```python | |
| # hearthnet/emergency/state.py | |
| from dataclasses import dataclass | |
| from typing import AsyncIterator, Literal | |
| Mode = Literal["online", "degraded", "offline"] | |
| @dataclass(frozen=True) | |
| class EmergencyState: | |
|     mode: Mode | |
|     since: str  # RFC 3339 | |
|     last_probe: str | |
|     probe_results: dict[str, bool]  # target → success | |
| class StateBus: | |
|     """In-process pubsub for state changes. UI and other modules subscribe.""" | |
|     def __init__(self): ... | |
|     def current(self) -> EmergencyState: ... | |
|     async def subscribe(self) -> AsyncIterator[EmergencyState]: ... | |
|     def _emit(self, state: EmergencyState) -> None: ...  # internal | |
| ``` | |
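The `StateBus` contract above could be met in several ways. The following is a minimal sketch, assuming one `asyncio.Queue` per subscriber and a bus seeded with an initial state (the spec's `__init__` takes no arguments, so the `initial` parameter is an assumption made for illustration). New subscribers receive the current state first, then every later change.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Literal

Mode = Literal["online", "degraded", "offline"]


@dataclass(frozen=True)
class EmergencyState:
    mode: Mode
    since: str  # RFC 3339
    last_probe: str
    probe_results: dict[str, bool]


class StateBus:
    """Sketch: fan-out pubsub with one queue per subscriber."""

    def __init__(self, initial: EmergencyState) -> None:
        # Assumption: seeded with an initial state; not part of the spec.
        self._current = initial
        self._queues: set = set()

    def current(self) -> EmergencyState:
        return self._current

    async def subscribe(self) -> AsyncIterator[EmergencyState]:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        try:
            yield self._current  # replay the latest state first
            while True:
                yield await queue.get()
        finally:
            # Runs when the consumer closes or abandons the iterator.
            self._queues.discard(queue)

    def _emit(self, state: EmergencyState) -> None:
        self._current = state
        for queue in self._queues:
            queue.put_nowait(state)
```

A consumer would iterate with `async for state in bus.subscribe(): ...`. Unbounded queues keep `_emit` non-blocking, at the cost of memory if a subscriber stalls.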
| ### 3.2 `detector.py` | |
| ```python | |
| # hearthnet/emergency/detector.py | |
| # EmergencyConfig (X04) and CapabilityBus (M03) come from their own modules. | |
| from hearthnet.emergency.state import StateBus | |
| class Detector: | |
|     def __init__( | |
|         self, | |
|         config: EmergencyConfig, | |
|         bus: CapabilityBus, | |
|         state_bus: StateBus, | |
|     ): | |
|         ... | |
|     async def run(self) -> None: | |
|         """Main loop. Cancel-safe. | |
|         Probe cadence: | |
|         - online → every EMERGENCY_PROBE_INTERVAL_ONLINE (10s) | |
|         - degraded → every EMERGENCY_PROBE_INTERVAL_OFFLINE (2s) | |
|         - offline → every EMERGENCY_PROBE_INTERVAL_OFFLINE (2s) | |
|         Each tick: | |
|         1. probe all targets concurrently, each with a 2s timeout (EMERGENCY_PROBE_TIMEOUT_SECONDS) | |
|         2. compute new mode | |
|         3. apply debounce (EMERGENCY_TRANSITION_DEBOUNCE_SECONDS, anti-flap) | |
|         4. if mode changed: | |
|            - state_bus._emit(new_state) | |
|            - if entered offline: bus deregisters internet-dependent capabilities | |
|            - if entered online: bus re-registers them | |
|            - emit log + metric | |
|         """ | |
|     async def shutdown(self) -> None: ... | |
|     # --- probe primitives --- | |
|     async def _probe_dns(self, host: str) -> bool: ... | |
|     async def _probe_http(self, url: str) -> bool: ... | |
| ``` | |
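Step 1 of each tick (probe all targets concurrently with a per-probe timeout) can be sketched as below. This is only an illustration under stated assumptions: `probe_all`, `_guarded`, and the `ProbeFn` alias are hypothetical names, and each probe is passed in as a zero-argument coroutine function so the sketch stays independent of how `_probe_dns` and `_probe_http` are actually written. A timeout or any exception is treated as a failed probe, which matches the rule that probe failures are the normal signal rather than errors.

```python
import asyncio
from typing import Awaitable, Callable

ProbeFn = Callable[[], Awaitable[bool]]  # hypothetical alias for one probe


async def _guarded(probe: ProbeFn, timeout: float) -> bool:
    """Run one probe; a timeout or any exception counts as a failure."""
    try:
        return bool(await asyncio.wait_for(probe(), timeout=timeout))
    except Exception:
        # CancelledError is a BaseException, so task cancellation still
        # propagates and the loop stays cancel-safe.
        return False


async def probe_all(probes: dict[str, ProbeFn], timeout: float = 2.0) -> dict[str, bool]:
    """Run every probe concurrently and map target name to success."""
    names = list(probes)
    results = await asyncio.gather(*(_guarded(probes[n], timeout) for n in names))
    return dict(zip(names, results))
```

The returned mapping has the same shape as `EmergencyState.probe_results`.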
| --- | |
| ## 4. State machine | |
| ``` | |
| ┌────────┐  any probe fails  ┌──────────┐ | |
| │ ONLINE ├──────────────────►│ DEGRADED │ | |
| └───┬────┘                   └─────┬────┘ | |
|     ▲                              │ ≥2 probes fail for 30s | |
|     │ all probes pass for 10s      ▼ | |
|     │                        ┌──────────┐ | |
|     └────────────────────────┤ OFFLINE  │ | |
|                              └──────────┘ | |
| ``` | |
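The dwell times in the diagram can be read as a small timer-driven machine. The sketch below is one possible reading, not the module's implementation: `ModeMachine` and its attributes are hypothetical names, the caller supplies `now` (for example from `time.monotonic()`), and it assumes the "all probes pass for 10s" recovery rule also applies from `degraded`, which the diagram does not show explicitly.

```python
OFFLINE_DWELL_S = 30.0  # ">=2 probes fail for 30s"
ONLINE_DWELL_S = 10.0   # "all probes pass for 10s"


class ModeMachine:
    """Tracks the mode and how long the pending condition has held."""

    def __init__(self) -> None:
        self.mode = "online"
        self._pending = None       # "recover", "worsen", or None
        self._pending_since = 0.0

    def _enter(self, mode: str) -> None:
        self.mode = mode
        self._pending = None

    def step(self, failures: int, now: float) -> str:
        """Feed one tick's failure count; return the resulting mode."""
        if self.mode == "online":
            if failures == 0:
                return self.mode
            self._enter("degraded")  # any probe failure leaves online at once

        if failures == 0:
            pending = "recover"
        elif failures >= 2 and self.mode == "degraded":
            pending = "worsen"
        else:
            pending = None

        # Restart the timer whenever the pending condition changes.
        if pending != self._pending:
            self._pending, self._pending_since = pending, now
        held = now - self._pending_since
        if pending == "recover" and held >= ONLINE_DWELL_S:
            self._enter("online")
        elif pending == "worsen" and held >= OFFLINE_DWELL_S:
            self._enter("offline")
        return self.mode
```

Passing the clock in as an argument keeps the machine deterministic, which suits tests driven by synthetic probes.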
| Anti-flap: if more than 3 transitions occur within 60 seconds, the detector stays in the more pessimistic state (degraded or offline) until the window passes. | |
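A sliding window of transition timestamps is one way to express the anti-flap rule. The sketch below is an interpretation under assumptions: `FlapGuard`, `resolve`, and `SEVERITY` are hypothetical names, three transitions per 60 s window are allowed, and once that budget is used a further transition is accepted only if it moves to a more pessimistic mode.

```python
from collections import deque

SEVERITY = {"online": 0, "degraded": 1, "offline": 2}


class FlapGuard:
    def __init__(self, max_transitions: int = 3, window_s: float = 60.0) -> None:
        self.max_transitions = max_transitions
        self.window_s = window_s
        self._stamps: deque = deque()  # times of accepted transitions

    def resolve(self, current: str, proposed: str, now: float) -> str:
        """Return the mode to adopt, holding the pessimistic one when flapping."""
        # Forget transitions that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if proposed == current:
            return current
        if len(self._stamps) >= self.max_transitions:
            # Budget used up: keep whichever mode is more pessimistic.
            proposed = max(current, proposed, key=SEVERITY.__getitem__)
            if proposed == current:
                return current
        self._stamps.append(now)
        return proposed
```

In this reading the guard sits between the computed mode and the state bus, so a suppressed recovery is simply retried on a later tick once the window has cleared.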
| --- | |
| ## 5. Behaviour | |
| ### 5.1 Probes | |
| Default targets (from `EmergencyConfig.probe_targets`): | |
| - `1.1.1.1` (DNS A query) | |
| - `8.8.8.8` (DNS A query) | |
| - `cloudflare.com` (HTTPS HEAD) | |
| - `quad9.net` (HTTPS HEAD) | |
| Mode rule: | |
| - `online` requires all 4 probes to succeed | |
| - `offline` requires ≥ 2 probes to fail | |
| - anything in between is `degraded` | |
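The mode rule reduces to counting failures. A minimal sketch, assuming a hypothetical helper named `compute_mode` that receives the per-target results of one tick and returns the raw mode before any dwell time or anti-flap logic is applied:

```python
def compute_mode(probe_results: dict[str, bool]) -> str:
    """Map one tick's probe results to a raw mode."""
    failures = sum(1 for ok in probe_results.values() if not ok)
    if failures == 0:
        return "online"
    if failures >= 2:
        return "offline"
    return "degraded"  # exactly one failed probe
```

With the four default targets, `degraded` therefore means exactly one probe failed.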
| ### 5.2 Effects on the bus | |
| When entering `offline`: | |
| ```python | |
| # Snapshot first so deregistering cannot mutate the collection mid-iteration. | |
| for entry in list(bus.registry.all_local()): | |
|     if entry.descriptor.params.get("requires_internet"): | |
|         bus.registry.deregister_local(entry.descriptor.name, entry.descriptor.version) | |
|         log.info("offline.deregistered", capability=entry.descriptor.name) | |
| ``` | |
| When returning to `online`: | |
| ```python | |
| for backend in llm_service._backends: | |
|     if backend.requires_internet: | |
|         llm_service._register_backend(backend)  # re-emit descriptors | |
| ``` | |
| `requires_internet` is a convention: services that wrap remote APIs (`anthropic_api`, `hf_api`) set this flag on their `BackendModel` and inject it into the capability descriptor params at registration time. | |
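To make the convention concrete, the sketch below walks one flagged and one unflagged capability through an offline/online round trip. Everything here is a stand-in: `Descriptor`, `Entry`, `RegistrySketch`, `describe_backend`, and the `local_llm` name are hypothetical and only mirror the attribute names used in the snippets above; the real registry lives in M03.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Descriptor:
    name: str
    version: str
    params: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Entry:
    descriptor: Descriptor


class RegistrySketch:
    """Hypothetical stand-in for bus.registry, keyed by (name, version)."""

    def __init__(self) -> None:
        self._local: dict = {}

    def register_local(self, descriptor: Descriptor) -> None:
        self._local[(descriptor.name, descriptor.version)] = Entry(descriptor)

    def deregister_local(self, name: str, version: str) -> None:
        self._local.pop((name, version), None)

    def all_local(self) -> list:
        return list(self._local.values())


def describe_backend(name: str, version: str, requires_internet: bool) -> Descriptor:
    """Inject the backend's flag into the descriptor params at registration."""
    return Descriptor(name, version, {"requires_internet": requires_internet})


def deregister_internet_dependent(registry: RegistrySketch) -> list:
    """Offline entry: drop flagged capabilities and return what was removed."""
    removed = [e.descriptor for e in registry.all_local()
               if e.descriptor.params.get("requires_internet")]
    for descriptor in removed:
        registry.deregister_local(descriptor.name, descriptor.version)
    return removed


registry = RegistrySketch()
registry.register_local(describe_backend("anthropic_api", "1.0", True))
registry.register_local(describe_backend("local_llm", "1.0", False))
removed = deregister_internet_dependent(registry)  # entering offline
remaining = [e.descriptor.name for e in registry.all_local()]
```

Because the flag travels in the descriptor params, the detector needs no knowledge of individual services to decide what to withdraw.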
| ### 5.3 Effects on M02 discovery | |
| The detector also calls `peer_registry.set_pruning_aggressive(offline)`, where `offline` is true while the node is in `offline` mode: | |
| - Offline: prune stale peers after 30 s instead of 90 | |
| - Online: standard 90 s | |
| This makes offline mode adapt faster to neighbour churn. | |
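The effect of the toggle can be illustrated with a tiny stand-in for the peer registry. This is a sketch under assumptions: `PeerRegistrySketch`, `seen`, and `prune` are hypothetical names, and only `set_pruning_aggressive` and the 30 s / 90 s thresholds come from the text above.

```python
class PeerRegistrySketch:
    STANDARD_TTL_S = 90.0
    AGGRESSIVE_TTL_S = 30.0

    def __init__(self) -> None:
        self._last_seen: dict = {}  # peer id -> time last heard from
        self._ttl = self.STANDARD_TTL_S

    def set_pruning_aggressive(self, aggressive: bool) -> None:
        self._ttl = self.AGGRESSIVE_TTL_S if aggressive else self.STANDARD_TTL_S

    def seen(self, peer_id: str, now: float) -> None:
        self._last_seen[peer_id] = now

    def prune(self, now: float) -> list:
        """Drop peers not heard from within the current TTL; return their ids."""
        stale = [p for p, t in self._last_seen.items() if now - t > self._ttl]
        for peer_id in stale:
            del self._last_seen[peer_id]
        return stale
```

A peer last seen 60 s ago survives under the standard threshold but is pruned as soon as the aggressive one applies.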
| ### 5.4 UI surface (M08 consumes) | |
| The state bus is the source for the amber `INTERNET OFFLINE - LOKAL AKTIV` banner ("LOKAL AKTIV" is German for "local active"). The UI subscribes to it, flips the theme, and visibly switches LLM passthrough to local-only backends. | |
| ### 5.5 Clock sanity probe (only when online) | |
| When the node has been online for ≥ 30 s, send one extra HEAD request to a single anchor host and read the `Date` response header. If the system clock differs from it by more than 60 s, log a warning. We do NOT auto-correct. | |
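The comparison itself needs only the standard library. A minimal sketch, assuming hypothetical helpers `clock_skew_seconds` and `clock_is_sane` that receive the raw `Date` header string from whatever HTTP client performed the HEAD request:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_SKEW_S = 60.0


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Local time minus the server's Date header, in seconds."""
    remote = parsedate_to_datetime(date_header)
    if remote.tzinfo is None:
        # HTTP dates are always GMT; treat a zone-less parse as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    return (local_now - remote).total_seconds()


def clock_is_sane(date_header: str, local_now: Optional[datetime] = None) -> bool:
    """True when the local clock is within MAX_SKEW_S of the server's."""
    now = local_now or datetime.now(timezone.utc)
    return abs(clock_skew_seconds(date_header, now)) <= MAX_SKEW_S
```

The helpers only report the skew; logging the warning and leaving the clock untouched stays with the caller.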
| ### 5.6 No on-wire pubsub | |
| `emergency.mode.changed` is local only ([CONTRACT §8](../CAPABILITY_CONTRACT.md)). Other nodes do their own detection. | |
| --- | |
| ## 6. Errors | |
| This module raises nothing externally; all failures are logged. Internal probe failures are the *normal* signal that drives state. | |
| --- | |
| ## 7. Configuration | |
| From [X04 §3](../cross-cutting/X04-config.md): | |
| ```python | |
| config.emergency.probe_targets # list[str] | |
| ``` | |
| Constants: `EMERGENCY_PROBE_INTERVAL_ONLINE`, `EMERGENCY_PROBE_INTERVAL_OFFLINE`, `EMERGENCY_PROBE_TIMEOUT_SECONDS`, `EMERGENCY_TRANSITION_DEBOUNCE_SECONDS`. | |
| --- | |
| ## 8. Tests | |
| ### Unit | |
| - `test_state_transitions_with_synthetic_probes` | |
| - `test_anti_flap_holds_pessimistic_state` | |
| - `test_deregister_called_on_offline_entry` | |
| - `test_reregister_called_on_online_entry` | |
| ### Integration | |
| - `test_demo_unplug_triggers_banner_within_5s`: simulate a WAN drop with an `iptables` rule and observe the state change | |
| --- | |
| ## 9. Cross-references | |
| | What | Where | | |
| |------|-------| | |
| | Online/offline pubsub topic (local) | [CONTRACT §8](../CAPABILITY_CONTRACT.md) | | |
| | LLM internet-dependent backends | [M04 §4.3](M04-llm.md) | | |
| | Discovery cadence change | [M02 §4.3](M02-discovery.md) | | |
| | UI banner | [M08 §5.5](M08-ui.md) | | |
| --- | |
| ## 10. Open questions | |
| 1. **Captive portal detection.** Phase 2: probe a known-content URL and compare the body hash. MVP: false positives accepted. | |
| 2. **IPv6-only networks.** Current probes are dual-stack via the OS. Should work; not yet tested. | |
| 3. **Custom probe scripts.** Phase 2: let users add their own targets. | |