# M09: Emergency Mode Detector
**Spec version:** v1.0
**Depends on:** M03 (bus, to deregister internet-dependent capabilities), X04 (config), X03 (observability), `httpx`, `socket`
**Depended on by:** M08 (UI shows banner), M04 (re-registers internet backends on restore), M02 (increases discovery cadence)
---
## 1. Responsibility
Detect whether the node has working internet access and publish state transitions locally. On each transition, have the bus deregister or re-register internet-dependent capabilities, and let other modules react through the state bus.
Out of scope:
- VPN / overlay status
- Per-service connectivity checks
- Cellular signal strength
---
## 2. File layout
```
hearthnet/emergency/
├── __init__.py
├── detector.py   # Detector: probe loop, state machine
└── state.py      # EmergencyState dataclass + StateBus
```
---
## 3. Public API
### 3.1 `state.py`
```python
# hearthnet/emergency/state.py
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


@dataclass(frozen=True)
class EmergencyState:
    mode: Mode
    since: str                      # RFC 3339, when the current mode began
    last_probe: str                 # time of the most recent probe
    probe_results: dict[str, bool]  # target → success


class StateBus:
    """In-process pubsub for state changes. The UI and other modules subscribe."""

    def __init__(self): ...
    def current(self) -> EmergencyState: ...
    async def subscribe(self) -> AsyncIterator[EmergencyState]: ...
    def _emit(self, state: EmergencyState) -> None: ...  # internal
```
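One possible shape for `StateBus`, sketched under assumptions: each subscriber gets its own `asyncio.Queue`, a new subscriber first receives the current state, and, unlike the signature above, the constructor takes an initial state so that `current()` always has something to return. The queue fan-out is an illustrative choice, not something this spec mandates.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


@dataclass(frozen=True)
class EmergencyState:
    mode: Mode
    since: str
    last_probe: str
    probe_results: dict[str, bool]


class StateBus:
    """Fan-out pubsub: every subscriber owns an unbounded queue (sketch)."""

    def __init__(self, initial: EmergencyState) -> None:  # assumed: initial state passed in
        self._current = initial
        self._queues: set = set()

    def current(self) -> EmergencyState:
        return self._current

    async def subscribe(self) -> AsyncIterator[EmergencyState]:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        try:
            yield self._current  # replay current state so late subscribers start in sync
            while True:
                yield await queue.get()
        finally:
            self._queues.discard(queue)  # runs on aclose() or cancellation

    def _emit(self, state: EmergencyState) -> None:
        self._current = state
        for queue in self._queues:
            queue.put_nowait(state)
```

Unbounded queues mean a stalled subscriber keeps accumulating states. Since transitions are rare (debounced and anti-flapped), this is probably acceptable; a queue that keeps only the latest state would be the alternative.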
### 3.2 `detector.py`
```python
# hearthnet/emergency/detector.py
from __future__ import annotations  # EmergencyConfig (X04) and CapabilityBus (M03) live elsewhere

from hearthnet.emergency.state import StateBus


class Detector:
    def __init__(
        self,
        config: EmergencyConfig,
        bus: CapabilityBus,
        state_bus: StateBus,
    ) -> None:
        ...

    async def run(self) -> None:
        """Main loop. Cancel-safe.

        Probe cadence:
          - online:   every EMERGENCY_PROBE_INTERVAL_ONLINE (10 s)
          - degraded: every EMERGENCY_PROBE_INTERVAL_OFFLINE (2 s)
          - offline:  every EMERGENCY_PROBE_INTERVAL_OFFLINE (2 s)

        Each tick:
          1. Probe all targets concurrently, each with an
             EMERGENCY_PROBE_TIMEOUT_SECONDS (2 s) timeout.
          2. Compute the new mode.
          3. Apply debounce (EMERGENCY_TRANSITION_DEBOUNCE_SECONDS, anti-flap).
          4. If the mode changed:
             - state_bus._emit(new_state)
             - on entering offline: the bus deregisters internet-dependent capabilities
             - on entering online: the bus re-registers them
             - emit a log line and a metric
        """

    async def shutdown(self) -> None: ...

    # --- probe primitives ---
    async def _probe_dns(self, host: str) -> bool: ...
    async def _probe_http(self, url: str) -> bool: ...
```
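Step 1 of each tick ("probe all targets concurrently with a timeout") could look like the sketch below. The probe callables are injected so the helper can be unit-tested without network access; `probe_all` and `ProbeFn` are hypothetical names. Any exception or timeout is folded into `False`, matching §6 (probe failures are the normal signal, not errors), while `asyncio.CancelledError` still propagates because it is not an `Exception` subclass on Python 3.8+, which keeps `run()` cancel-safe.

```python
import asyncio
from collections.abc import Awaitable, Callable

# A probe takes a target (host or URL) and resolves to True on success.
ProbeFn = Callable[[str], Awaitable[bool]]


async def probe_all(targets: dict[str, ProbeFn], timeout: float = 2.0) -> dict[str, bool]:
    """Run every probe concurrently; a timeout or any exception counts as failure."""

    async def one(target: str, probe: ProbeFn) -> bool:
        try:
            return bool(await asyncio.wait_for(probe(target), timeout))
        except Exception:  # TimeoutError, OSError, HTTP client errors, ...
            return False   # CancelledError is a BaseException, so it still propagates

    names = list(targets)
    results = await asyncio.gather(*(one(name, targets[name]) for name in names))
    return dict(zip(names, results))
```

A real `_probe_http` might, for example, wrap `httpx.AsyncClient.head(...)` and treat any response as reachability; that wiring is not shown here.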
---
## 4. State machine
```
┌────────┐  any probe fails   ┌──────────┐
│ ONLINE │───────────────────►│ DEGRADED │
└───┬────┘                    └─────┬────┘
    ▲                               │ ≥2 probes fail for 30s
    │ all probes pass for 10s       ▼
    │                         ┌──────────┐
    └─────────────────────────┤ OFFLINE  │
                              └──────────┘
```
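A sketch of the timed transitions in the diagram, assuming the detector feeds it one set of probe results per tick together with a monotonic timestamp. The diagram does not draw a DEGRADED → ONLINE edge; this sketch assumes the same "all probes pass for 10 s" rule applies there. `ModeTracker` and its constants are illustrative names, and the anti-flap rule is deliberately left out.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Mode = Literal["online", "degraded", "offline"]

OFFLINE_AFTER_S = 30.0  # ≥2 failures held this long: DEGRADED → OFFLINE
ONLINE_AFTER_S = 10.0   # all probes passing this long: → ONLINE


@dataclass
class ModeTracker:
    mode: Mode = "online"
    multi_fail_since: Optional[float] = None  # start of the current ≥2-failure streak
    all_pass_since: Optional[float] = None    # start of the current all-pass streak

    def step(self, results: dict[str, bool], now: float) -> Mode:
        """Feed one tick of probe results; `now` is a monotonic clock in seconds."""
        failures = sum(1 for ok in results.values() if not ok)

        # Track how long each condition has held without interruption.
        if failures >= 2:
            if self.multi_fail_since is None:
                self.multi_fail_since = now
        else:
            self.multi_fail_since = None
        if failures == 0:
            if self.all_pass_since is None:
                self.all_pass_since = now
        else:
            self.all_pass_since = None

        if self.mode == "online" and failures >= 1:
            self.mode = "degraded"  # any failure: leave ONLINE immediately
        elif (self.mode == "degraded" and self.multi_fail_since is not None
              and now - self.multi_fail_since >= OFFLINE_AFTER_S):
            self.mode = "offline"
        elif (self.mode != "online" and self.all_pass_since is not None
              and now - self.all_pass_since >= ONLINE_AFTER_S):
            self.mode = "online"  # DEGRADED → ONLINE here is an assumption
        return self.mode
```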
Anti-flap: if more than 3 transitions occur within 60 seconds, the detector stays in the more pessimistic state (degraded or offline) until the window passes.
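Anti-flap can be layered on top as a filter over proposed modes. In this sketch (a hypothetical `AntiFlap` class), once more than 3 accepted transitions fall inside the trailing 60 s window, moves toward a more optimistic mode are held back, while moves toward a more pessimistic mode still go through. That reading of "stays in the more pessimistic state" is an interpretation.

```python
from collections import deque

PESSIMISM = {"online": 0, "degraded": 1, "offline": 2}
FLAP_WINDOW_S = 60.0
MAX_TRANSITIONS_IN_WINDOW = 3


class AntiFlap:
    """Filter proposed modes; `now` is a monotonic clock in seconds."""

    def __init__(self, initial: str = "online") -> None:
        self.mode = initial
        self._transitions: deque = deque()  # timestamps of accepted transitions

    def apply(self, proposed: str, now: float) -> str:
        # Forget transitions that have aged out of the trailing window.
        while self._transitions and now - self._transitions[0] > FLAP_WINDOW_S:
            self._transitions.popleft()
        if proposed == self.mode:
            return self.mode
        flapping = len(self._transitions) > MAX_TRANSITIONS_IN_WINDOW
        if flapping and PESSIMISM[proposed] < PESSIMISM[self.mode]:
            return self.mode  # hold the more pessimistic state until the window passes
        self.mode = proposed
        self._transitions.append(now)
        return self.mode
```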
---
## 5. Behaviour
### 5.1 Probes
Default targets (from `EmergencyConfig.probe_targets`):
- `1.1.1.1` (DNS A query)
- `8.8.8.8` (DNS A query)
- `cloudflare.com` (HTTPS HEAD)
- `quad9.net` (HTTPS HEAD)
Mode rule:
- `online` requires all 4 to succeed
- `offline` requires ≥ 2 to fail
- everything between is `degraded`
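The mode rule above classifies a single round of results; the hold times from §4 apply on top of it. A minimal sketch (`compute_mode` is a hypothetical name); treating an empty result set as `offline` is an assumption the spec does not cover.

```python
from typing import Literal

Mode = Literal["online", "degraded", "offline"]

DEFAULT_PROBE_TARGETS = ["1.1.1.1", "8.8.8.8", "cloudflare.com", "quad9.net"]


def compute_mode(results: dict[str, bool]) -> Mode:
    """Classify one round of probe results (target → success)."""
    if not results:
        return "offline"  # assumption: no probe data is not evidence of connectivity
    failures = sum(1 for ok in results.values() if not ok)
    if failures == 0:
        return "online"
    if failures >= 2:
        return "offline"
    return "degraded"  # exactly one failure
```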
### 5.2 Effects on the bus
When entering `offline`:
```python
# Snapshot first: deregistering while iterating is unsafe if all_local() returns a live view.
for entry in list(bus.registry.all_local()):
    if entry.descriptor.params.get("requires_internet"):
        bus.registry.deregister_local(entry.descriptor.name, entry.descriptor.version)
        log.info("offline.deregistered", capability=entry.descriptor.name)  # log: X03 logger
```
When returning to `online`:
```python
for backend in llm_service._backends:
    if backend.requires_internet:
        llm_service._register_backend(backend)  # re-emit descriptors
```
`requires_internet` is a convention: services that wrap remote APIs (`anthropic_api`, `hf_api`) set this flag on their `BackendModel` and inject it into the capability descriptor params at registration time.
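A sketch of this convention, with hypothetical `BackendModel` and `CapabilityDescriptor` shapes and a made-up `llm.<backend>` capability naming scheme; the real M04 and M03 types may differ. The point is only that the flag is copied into `params` at registration time, which is what the offline deregistration loop filters on.

```python
from dataclasses import dataclass, field


@dataclass
class BackendModel:  # hypothetical shape of an M04 backend
    name: str
    requires_internet: bool = False


@dataclass
class CapabilityDescriptor:  # hypothetical shape of an M03 descriptor
    name: str
    version: str
    params: dict[str, object] = field(default_factory=dict)


def descriptor_for(backend: BackendModel, version: str = "1") -> CapabilityDescriptor:
    """Build the descriptor, copying the requires_internet flag into params."""
    params: dict[str, object] = {}
    if backend.requires_internet:
        params["requires_internet"] = True
    # "llm.<name>" is an illustrative naming scheme, not the real one.
    return CapabilityDescriptor(name=f"llm.{backend.name}", version=version, params=params)
```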
### 5.3 Effects on M02 discovery
The detector also calls `peer_registry.set_pruning_aggressive(offline)`, where `offline` is `True` on entering offline mode and `False` otherwise:
- Offline: prune stale peers after 30 s instead of 90 s
- Online: standard 90 s
This makes offline mode adapt faster to neighbour churn.
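A minimal stand-in for the registry side of this call. `PeerRegistry`, `seen()` and `prune()` are hypothetical; only `set_pruning_aggressive` and the 30 s / 90 s thresholds come from this section.

```python
PRUNE_AFTER_ONLINE_S = 90.0
PRUNE_AFTER_OFFLINE_S = 30.0


class PeerRegistry:  # hypothetical minimal stand-in for M02's peer registry
    def __init__(self) -> None:
        self._last_seen: dict[str, float] = {}  # peer id → last announcement time
        self._prune_after = PRUNE_AFTER_ONLINE_S

    def set_pruning_aggressive(self, aggressive: bool) -> None:
        self._prune_after = PRUNE_AFTER_OFFLINE_S if aggressive else PRUNE_AFTER_ONLINE_S

    def seen(self, peer_id: str, now: float) -> None:
        self._last_seen[peer_id] = now

    def prune(self, now: float) -> list[str]:
        """Drop and return peers not seen within the current threshold."""
        stale = [p for p, t in self._last_seen.items() if now - t > self._prune_after]
        for peer_id in stale:
            del self._last_seen[peer_id]
        return stale
```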
### 5.4 UI surface (M08 consumes)
The state bus drives the amber `INTERNET OFFLINE - LOKAL AKTIV` banner ("lokal aktiv" is German for "local active"). The UI subscribes to it, switches to the amber theme, and visibly restricts LLM passthrough to local-only backends.
### 5.5 Clock sanity probe (only when online)
When the node has been online for ≥ 30 s, send one extra HEAD request to a single anchor and read its `Date` header. If the system clock differs from it by more than 60 s, log a warning. We do NOT auto-correct.
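The skew check needs only the standard library: `email.utils.parsedate_to_datetime` parses the HTTP `Date` format. A sketch with hypothetical helper names; the caller is assumed to have already fetched the header. The header's 1 s resolution and network latency are far below the 60 s threshold.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_CLOCK_SKEW_S = 60.0  # warn above this; never auto-correct


def clock_skew_seconds(date_header: str, now: Optional[datetime] = None) -> float:
    """Local clock minus server clock, in seconds (positive means we are ahead)."""
    server = parsedate_to_datetime(date_header)
    if server.tzinfo is None:  # a "-0000" zone parses as naive; treat it as UTC
        server = server.replace(tzinfo=timezone.utc)
    local = now if now is not None else datetime.now(timezone.utc)
    return (local - server).total_seconds()


def clock_is_skewed(date_header: str, now: Optional[datetime] = None) -> bool:
    return abs(clock_skew_seconds(date_header, now)) > MAX_CLOCK_SKEW_S
```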
### 5.6 No on-wire pubsub
`emergency.mode.changed` is local only ([CONTRACT §8](../CAPABILITY_CONTRACT.md)). Other nodes run their own detection.
---
## 6. Errors
This module raises nothing externally; all failures are logged. Internal probe failures are the *normal* signal that drives state.
---
## 7. Configuration
From [X04 §3](../cross-cutting/X04-config.md):
```python
config.emergency.probe_targets # list[str]
```
Constants: `EMERGENCY_PROBE_INTERVAL_ONLINE`, `EMERGENCY_PROBE_INTERVAL_OFFLINE`, `EMERGENCY_PROBE_TIMEOUT_SECONDS`, `EMERGENCY_TRANSITION_DEBOUNCE_SECONDS`.
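For orientation, the values this spec states elsewhere can be collected as below. This is a sketch, not X04's actual layout: the `EmergencyConfig` field defaults and the `probe_interval` helper are assumptions, and `EMERGENCY_TRANSITION_DEBOUNCE_SECONDS` is omitted because its value is not given here.

```python
from dataclasses import dataclass, field

# Values restated from §3.2 and §5.1; X04 remains authoritative.
EMERGENCY_PROBE_INTERVAL_ONLINE = 10.0   # seconds between ticks while online
EMERGENCY_PROBE_INTERVAL_OFFLINE = 2.0   # seconds between ticks while degraded or offline
EMERGENCY_PROBE_TIMEOUT_SECONDS = 2.0    # per-probe timeout


@dataclass(frozen=True)
class EmergencyConfig:  # assumed shape; only probe_targets is named in this spec
    probe_targets: list[str] = field(
        default_factory=lambda: ["1.1.1.1", "8.8.8.8", "cloudflare.com", "quad9.net"]
    )


def probe_interval(mode: str) -> float:
    """Pick the sleep between ticks: fast while degraded/offline, slow while online."""
    if mode == "online":
        return EMERGENCY_PROBE_INTERVAL_ONLINE
    return EMERGENCY_PROBE_INTERVAL_OFFLINE
```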
---
## 8. Tests
### Unit
- `test_state_transitions_with_synthetic_probes`
- `test_anti_flap_holds_pessimistic_state`
- `test_deregister_called_on_offline_entry`
- `test_reregister_called_on_online_entry`
### Integration
- `test_demo_unplug_triggers_banner_within_5s`: simulate a WAN drop with an `iptables` rule and observe the state change
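As an example of the unit-test style, `test_deregister_called_on_offline_entry` could be written against a hand-rolled fake registry. The fake's method names mirror the `all_local` / `deregister_local` calls in §5.2; `FakeRegistry`, `on_enter_offline` and the descriptor shapes are hypothetical test scaffolding.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:  # hypothetical stand-in for an M03 capability descriptor
    name: str
    version: str
    params: dict = field(default_factory=dict)


@dataclass
class Entry:
    descriptor: Descriptor


class FakeRegistry:
    """Test double exposing only the two registry calls used in §5.2."""

    def __init__(self, entries: list) -> None:
        self.entries = list(entries)
        self.deregistered: list = []

    def all_local(self) -> list:
        return self.entries

    def deregister_local(self, name: str, version: str) -> None:
        self.deregistered.append((name, version))
        self.entries = [e for e in self.entries
                        if (e.descriptor.name, e.descriptor.version) != (name, version)]


def on_enter_offline(registry) -> None:
    """Same filtering logic as the §5.2 offline loop, minus logging."""
    for entry in list(registry.all_local()):
        if entry.descriptor.params.get("requires_internet"):
            registry.deregister_local(entry.descriptor.name, entry.descriptor.version)


def test_deregister_called_on_offline_entry() -> None:
    registry = FakeRegistry([
        Entry(Descriptor("anthropic_api", "1", {"requires_internet": True})),
        Entry(Descriptor("local_llm", "1")),
    ])
    on_enter_offline(registry)
    assert registry.deregistered == [("anthropic_api", "1")]
    assert [e.descriptor.name for e in registry.all_local()] == ["local_llm"]
```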
---
## 9. Cross-references
| What | Where |
|------|-------|
| Online/offline pubsub topic (local) | [CONTRACT §8](../CAPABILITY_CONTRACT.md) |
| LLM internet-dependent backends | [M04 §4.3](M04-llm.md) |
| Discovery cadence change | [M02 §4.3](M02-discovery.md) |
| UI banner | [M08 §5.5](M08-ui.md) |
---
## 10. Open questions
1. **Captive portal detection**. Phase 2: probe a known-content URL and compare the body hash. MVP: false positives are accepted.
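2. **IPv6-only networks**. Hostname probes go through the OS resolver and are dual-stack, but `1.1.1.1` and `8.8.8.8` are IPv4 literals and may fail on an IPv6-only network without NAT64/464XLAT, counting toward `offline`. Not yet tested.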
3. **Custom probe scripts**. Phase 2: let users add their own targets.