M09 - Emergency Mode Detector
Spec version: v1.0
Depends on: M03 (bus, to deregister internet-dependent capabilities), X04 (config), X03 (observability), httpx, socket
Depended on by: M08 (UI shows banner), M04 (re-registers internet backends on restore), M02 (increases discovery cadence)
1. Responsibility
Detect whether the node has working internet access. Publish state transitions locally. Cause the bus to deregister/re-register internet-dependent capabilities and let other modules react.
Out of scope:
- VPN / overlay status
- Per-service connectivity checks
- Cellular signal strength
2. File layout
hearthnet/emergency/
├── __init__.py
├── detector.py        # Detector: probe loop, state machine
└── state.py           # EmergencyState dataclass + StateBus
3. Public API
3.1 state.py
# hearthnet/emergency/state.py
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


@dataclass(frozen=True)
class EmergencyState:
    mode: Mode
    since: str                      # RFC 3339
    last_probe: str
    probe_results: dict[str, bool]  # target → success


class StateBus:
    """In-process pubsub for state changes. UI and other modules subscribe."""

    def __init__(self): ...
    def current(self) -> EmergencyState: ...
    async def subscribe(self) -> AsyncIterator[EmergencyState]: ...
    def _emit(self, state: EmergencyState) -> None: ...  # internal
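The stubs above leave the fan-out mechanism open. One way to back them is a per-subscriber asyncio.Queue, as in the minimal sketch below. It is an illustration, not the module's implementation: the `initial` constructor argument and the `_queues` attribute are assumptions (the spec's `__init__` takes no arguments and does not say how the first state is seeded).

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


@dataclass(frozen=True)
class EmergencyState:
    mode: Mode
    since: str
    last_probe: str
    probe_results: dict[str, bool]


class StateBus:
    """Sketch: every subscriber gets its own unbounded queue."""

    def __init__(self, initial: EmergencyState) -> None:
        # `initial` is an assumption; the spec does not define the seed state.
        self._current = initial
        self._queues: set[asyncio.Queue] = set()

    def current(self) -> EmergencyState:
        return self._current

    async def subscribe(self) -> AsyncIterator[EmergencyState]:
        # The queue is registered when iteration starts, so states emitted
        # before that point are not replayed; use current() for a snapshot.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        try:
            while True:
                yield await queue.get()
        finally:
            # Runs on cancellation or aclose(), so dead subscribers do not leak.
            self._queues.discard(queue)

    def _emit(self, state: EmergencyState) -> None:
        self._current = state
        for queue in self._queues:
            queue.put_nowait(state)
```

A consumer would typically read `bus.current()` once and then loop with `async for state in bus.subscribe(): ...`. Unbounded queues keep `_emit` non-blocking at the cost of unbounded memory if a subscriber stalls; a bounded queue that drops old states would be an equally reasonable choice.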
3.2 detector.py
# hearthnet/emergency/detector.py
class Detector:
    def __init__(
        self,
        config: EmergencyConfig,
        bus: CapabilityBus,
        state_bus: StateBus,
    ):
        ...

    async def run(self) -> None:
        """Main loop. Cancel-safe.

        Probe cadence:
        - online   → every EMERGENCY_PROBE_INTERVAL_ONLINE (10s)
        - degraded → every EMERGENCY_PROBE_INTERVAL_OFFLINE (2s)
        - offline  → every EMERGENCY_PROBE_INTERVAL_OFFLINE (2s)

        Each tick:
        1. probe all targets concurrently with 2s timeout
        2. compute new mode
        3. apply debounce (EMERGENCY_TRANSITION_DEBOUNCE_SECONDS, anti-flap)
        4. if mode changed:
           - state_bus._emit(new_state)
           - if entered offline: bus deregisters internet-dependent capabilities
           - if entered online: bus re-registers them
           - emit log + metric
        """

    async def shutdown(self) -> None: ...

    # --- probe primitives ---
    async def _probe_dns(self, host: str) -> bool: ...
    async def _probe_http(self, url: str) -> bool: ...
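Step 1 of each tick (probe all targets concurrently with a 2 s timeout) could be sketched roughly as follows. The names `probe_all`, `_guarded`, and the `Probe` alias are illustrative and not part of the public API; the sketch assumes each probe is a zero-argument coroutine function returning a bool.

```python
import asyncio
from collections.abc import Awaitable, Callable

PROBE_TIMEOUT_SECONDS = 2.0  # mirrors the "2s timeout" in the docstring

Probe = Callable[[], Awaitable[bool]]


async def _guarded(probe: Probe, timeout: float) -> bool:
    """Run one probe; a timeout or any exception counts as a failed probe."""
    try:
        return bool(await asyncio.wait_for(probe(), timeout))
    except Exception:
        # asyncio.CancelledError is a BaseException, so cancellation of the
        # whole tick still propagates and the loop stays cancel-safe.
        return False


async def probe_all(
    probes: dict[str, Probe],
    timeout: float = PROBE_TIMEOUT_SECONDS,
) -> dict[str, bool]:
    """Run all probes concurrently and return a target -> success mapping."""
    names = list(probes)
    results = await asyncio.gather(*(_guarded(probes[n], timeout) for n in names))
    return dict(zip(names, results))
```

Because every probe is wrapped individually, one slow target cannot delay the tick beyond the timeout, and the returned mapping has the same shape as EmergencyState.probe_results.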
4. State machine
┌────────┐  any probe fails  ┌──────────┐
│ ONLINE │──────────────────►│ DEGRADED │
└────────┘                   └─────┬────┘
    ▲                              │ ≥2 probes fail for 30s
    │ all probes pass for 10s      ▼
    │                        ┌──────────┐
    └────────────────────────┤ OFFLINE  │
                             └──────────┘
Anti-flap: if more than 3 transitions occur within 60 seconds, the detector stays in the more pessimistic state (degraded or offline) until the window passes.
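The anti-flap rule can be read as a sliding-window counter over recent transitions. The sketch below is one possible interpretation, not the specified algorithm: the class name `FlapGuard`, the `SEVERITY` ordering, and the choice to still allow moves toward a more pessimistic state while flapping are all assumptions.

```python
from collections import deque

# Higher number = more pessimistic. The ordering is an assumption of this sketch.
SEVERITY = {"online": 0, "degraded": 1, "offline": 2}


class FlapGuard:
    """Hold the more pessimistic state once too many transitions pile up."""

    def __init__(self, max_transitions: int = 3, window_seconds: float = 60.0) -> None:
        self._max = max_transitions
        self._window = window_seconds
        self._stamps: deque[float] = deque()  # times of accepted transitions

    def resolve(self, current: str, proposed: str, now: float) -> str:
        """Return the mode to adopt at time `now` (seconds, monotonic)."""
        # Forget transitions that have left the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if proposed == current:
            return current
        flapping = len(self._stamps) > self._max
        if flapping and SEVERITY[proposed] < SEVERITY[current]:
            return current  # recovery is blocked until the window passes
        self._stamps.append(now)
        return proposed
```

With the defaults, the fifth recovery attempt inside one minute is rejected, and the node stays degraded or offline until older transitions age out of the window.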
5. Behaviour
5.1 Probes
Default targets (from EmergencyConfig.probe_targets):
- 1.1.1.1 (DNS A query)
- 8.8.8.8 (DNS A query)
- cloudflare.com (HTTPS HEAD)
- quad9.net (HTTPS HEAD)
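For the two resolver targets, a DNS A query can be sent with the standard library alone. The following is a rough sketch under stated assumptions: the queried name `example.com`, the helper names, and the rule that any well-formed reply with a matching transaction ID counts as success are all illustrative choices, not taken from this spec.

```python
import asyncio
import secrets
import socket
import struct


def build_a_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD=1, QDCOUNT=1
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def is_valid_reply(reply: bytes, txid: int) -> bool:
    """Accept any reply that echoes our transaction ID and has the QR bit set."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == txid and bool(flags & 0x8000)


def _query_blocking(resolver_ip: str, name: str, timeout: float) -> bool:
    txid = secrets.randbelow(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_a_query(name, txid), (resolver_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and "network unreachable"
            return False
    return is_valid_reply(reply, txid)


async def probe_dns(resolver_ip: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Run the blocking UDP exchange in a worker thread."""
    return await asyncio.to_thread(_query_blocking, resolver_ip, name, timeout)
```

Calling `await probe_dns("1.1.1.1")` returns True only if the resolver answered within the timeout. The sketch covers IPv4 resolvers only and does not inspect the answer section, so it measures reachability of the resolver rather than correctness of the resolved address.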
Mode rule:
- online requires all 4 probes to succeed
- offline requires ≥ 2 probes to fail
- everything between is degraded
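Expressed as code, the mode rule is a count of failed probes. The function name `compute_mode` is illustrative. The sketch covers only the raw per-tick classification, before the dwell times in the state machine and the debounce are applied.

```python
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


def compute_mode(probe_results: dict[str, bool]) -> Mode:
    """Classify one tick's probe results (target -> success)."""
    failures = sum(1 for ok in probe_results.values() if not ok)
    if failures == 0:
        return "online"    # every probe succeeded
    if failures >= 2:
        return "offline"   # two or more probes failed
    return "degraded"      # exactly one probe failed
```

With the four default targets, a single failing probe yields degraded, and any two failures yield offline regardless of which targets they are.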
5.2 Effects on the bus
When entering offline:
for entry in bus.registry.all_local():
    if entry.descriptor.params.get("requires_internet"):
        bus.registry.deregister_local(entry.descriptor.name, entry.descriptor.version)
        log.info("offline.deregistered", capability=entry.descriptor.name)
When returning to online:
for backend in llm_service._backends:
    if backend.requires_internet:
        llm_service._register_backend(backend)  # re-emit descriptors
requires_internet is a convention: services that wrap remote APIs (anthropic_api, hf_api) set this flag on their BackendModel and inject it into the capability descriptor params at registration time.
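To make the convention concrete, the sketch below shows how a backend flag could end up in descriptor params and later be used for filtering. The field layout of `BackendModel` and `CapabilityDescriptor`, the `llm.` name prefix, the helper names, and the `local_llama` backend are hypothetical; only the `requires_internet` key and the `name`, `version`, and `params` attributes appear in this spec.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BackendModel:
    # Hypothetical minimal shape; the real model is defined by M04.
    name: str
    requires_internet: bool = False


@dataclass(frozen=True)
class CapabilityDescriptor:
    # Hypothetical minimal shape; the real descriptor is defined by M03.
    name: str
    version: str
    params: dict[str, object] = field(default_factory=dict)


def descriptor_for(backend: BackendModel, version: str = "1.0") -> CapabilityDescriptor:
    """Inject the backend flag into descriptor params at registration time."""
    params: dict[str, object] = {"backend": backend.name}
    if backend.requires_internet:
        params["requires_internet"] = True
    return CapabilityDescriptor(name=f"llm.{backend.name}", version=version, params=params)


def internet_dependent(descriptors: list[CapabilityDescriptor]) -> list[CapabilityDescriptor]:
    """Select the descriptors that the offline transition would deregister."""
    return [d for d in descriptors if d.params.get("requires_internet")]
```

Keeping the flag in params means the detector needs no knowledge of individual services: it filters on the descriptor alone.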
5.3 Effects on M02 discovery
Detector also calls peer_registry.set_pruning_aggressive(offline):
- Offline: prune stale peers after 30 s instead of 90 s
- Online: standard 90 s
This makes offline mode adapt faster to neighbour churn.
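On the M02 side, the switch could be as small as swapping the staleness threshold. The class below is a hypothetical stand-in for the real peer registry: only `set_pruning_aggressive` and the 30 s / 90 s values come from this section, while `seen`, `prune`, and the internal fields are assumptions.

```python
class PeerRegistry:
    """Hypothetical stand-in showing the two pruning thresholds."""

    STANDARD_TTL_SECONDS = 90.0
    AGGRESSIVE_TTL_SECONDS = 30.0

    def __init__(self) -> None:
        self._last_seen: dict[str, float] = {}
        self._ttl = self.STANDARD_TTL_SECONDS

    def set_pruning_aggressive(self, aggressive: bool) -> None:
        self._ttl = self.AGGRESSIVE_TTL_SECONDS if aggressive else self.STANDARD_TTL_SECONDS

    def seen(self, peer_id: str, now: float) -> None:
        """Record that a peer was heard from at time `now` (seconds)."""
        self._last_seen[peer_id] = now

    def prune(self, now: float) -> list[str]:
        """Remove and return peers not seen within the current threshold."""
        stale = [p for p, t in self._last_seen.items() if now - t > self._ttl]
        for peer_id in stale:
            del self._last_seen[peer_id]
        return stale
```

A peer last seen 60 s ago survives a prune in standard mode but is dropped as soon as the detector reports offline.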
5.4 UI surface (M08 consumes)
The state bus is the source for the amber INTERNET OFFLINE - LOKAL AKTIV banner. The UI subscribes to it, flips its theme, and visibly switches LLM passthrough to local-only backends.
5.5 Clock sanity probe (only when online)
When online for ≥ 30 s, send an extra HEAD to a single anchor and check the Date header. If our system clock differs by > 60 s, log a warning. We do NOT auto-correct.
5.6 No on-wire pubsub
emergency.mode.changed is local only (CONTRACT §8). Other nodes do their own detection.
6. Errors
This module raises nothing externally; all failures are logged. Internal probe failures are the normal signal that drives state.
7. Configuration
From X04 §3:
config.emergency.probe_targets # list[str]
Constants: EMERGENCY_PROBE_INTERVAL_ONLINE, EMERGENCY_PROBE_INTERVAL_OFFLINE, EMERGENCY_PROBE_TIMEOUT_SECONDS, EMERGENCY_TRANSITION_DEBOUNCE_SECONDS.
8. Tests
Unit
- test_state_transitions_with_synthetic_probes
- test_anti_flap_holds_pessimistic_state
- test_deregister_called_on_offline_entry
- test_reregister_called_on_online_entry
Integration
- test_demo_unplug_triggers_banner_within_5s: simulate a WAN drop with an iptables rule, observe the state change
9. Cross-references
| What | Where |
|---|---|
| Online/offline pubsub topic (local) | CONTRACT §8 |
| LLM internet-dependent backends | M04 §4.3 |
| Discovery cadence change | M02 §4.3 |
| UI banner | M08 §5.5 |
10. Open questions
- Captive portal detection. Phase 2: probe a known-content URL and compare the body hash. MVP: false positives accepted.
- IPv6-only networks. Current probes are dual-stack via the OS. Should work; not yet tested.
- Custom probe scripts. Phase 2: let users add their own targets.