# M09: Emergency Mode Detector

**Spec version:** v1.0
**Depends on:** M03 (bus, to deregister internet-dependent capabilities), X04 (config), X03 (observability), `httpx`, `socket`
**Depended on by:** M08 (UI shows banner), M04 (re-registers internet backends on restore), M02 (increases discovery cadence)

---

## 1. Responsibility

Detect whether the node has working internet access. Publish state transitions locally. Cause the bus to deregister/re-register internet-dependent capabilities and let other modules react.

Out of scope:
- VPN / overlay status
- Per-service connectivity checks
- Cellular signal strength

---

## 2. File layout

```
hearthnet/emergency/
├── __init__.py
├── detector.py        # Detector: probe loop, state machine
└── state.py           # EmergencyState dataclass + StateBus
```

---

## 3. Public API

### 3.1 `state.py`

```python
# hearthnet/emergency/state.py
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal

Mode = Literal["online", "degraded", "offline"]

@dataclass(frozen=True)
class EmergencyState:
    mode:        Mode
    since:       str           # RFC 3339
    last_probe:  str
    probe_results: dict[str, bool]   # target → success

class StateBus:
    """In-process pubsub for state changes. UI and other modules subscribe."""

    def __init__(self): ...
    def current(self) -> EmergencyState: ...
    async def subscribe(self) -> AsyncIterator[EmergencyState]: ...
    def _emit(self, state: EmergencyState) -> None: ...    # internal
```
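
A minimal sketch of how `StateBus` could be implemented, assuming one unbounded `asyncio.Queue` per subscriber. The `initial` constructor argument and the `_queues` attribute are assumptions made for illustration; the spec above only fixes the method names.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


@dataclass(frozen=True)
class EmergencyState:
    mode: Mode
    since: str
    last_probe: str
    probe_results: dict[str, bool]


class StateBus:
    """In-process pubsub sketch: every subscriber gets its own queue."""

    def __init__(self, initial: EmergencyState) -> None:  # `initial` is an assumption
        self._current = initial
        self._queues: set[asyncio.Queue] = set()

    def current(self) -> EmergencyState:
        return self._current

    async def subscribe(self) -> AsyncIterator[EmergencyState]:
        # The queue is registered on first iteration, not when subscribe() is called.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        try:
            while True:
                yield await queue.get()
        finally:
            # Runs when the consumer stops iterating or is cancelled.
            self._queues.discard(queue)

    def _emit(self, state: EmergencyState) -> None:
        self._current = state
        for queue in self._queues:
            queue.put_nowait(state)
```

A consumer such as the UI would iterate with `async for state in state_bus.subscribe(): ...`. Because registration happens on the first iteration, a late subscriber should read `current()` once before entering the loop.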

### 3.2 `detector.py`

```python
# hearthnet/emergency/detector.py
class Detector:
    def __init__(
        self,
        config: EmergencyConfig,
        bus: CapabilityBus,
        state_bus: StateBus,
    ):
        ...

    async def run(self) -> None:
        """Main loop. Cancel-safe.
        Probe cadence:
          - online → every EMERGENCY_PROBE_INTERVAL_ONLINE (10s)
          - degraded → every EMERGENCY_PROBE_INTERVAL_OFFLINE (2s)
          - offline → every EMERGENCY_PROBE_INTERVAL_OFFLINE (2s)
        Each tick:
          1. probe all targets concurrently with 2s timeout
          2. compute new mode
          3. apply debounce (EMERGENCY_TRANSITION_DEBOUNCE_SECONDS, anti-flap)
          4. if mode changed:
              - state_bus._emit(new_state)
              - if entered offline: bus deregisters internet-dependent capabilities
              - if entered online: bus re-registers them
              - emit log + metric
        """

    async def shutdown(self) -> None: ...

    # --- probe primitives ---

    async def _probe_dns(self, host: str) -> bool: ...
    async def _probe_http(self, url: str) -> bool: ...
```
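
Step 1 of each tick (probe all targets concurrently with a timeout) could look roughly like the sketch below. `probe_all` and the `ProbeFn` alias are hypothetical names; the probe callables stand in for `_probe_dns` and `_probe_http`, whose bodies the spec leaves open.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical alias: a probe takes a target string and reports success.
ProbeFn = Callable[[str], Awaitable[bool]]


async def probe_all(targets: dict[str, ProbeFn], timeout: float = 2.0) -> dict[str, bool]:
    """Run every probe concurrently; a timeout or any error counts as a failure."""

    async def guarded(target: str, probe: ProbeFn) -> bool:
        try:
            return await asyncio.wait_for(probe(target), timeout)
        except Exception:
            # Timeouts, DNS errors and connection errors all mean "probe failed".
            # CancelledError is a BaseException, so cancellation still propagates.
            return False

    names = list(targets)
    results = await asyncio.gather(*(guarded(name, targets[name]) for name in names))
    return dict(zip(names, results))
```

The returned mapping has the same shape as `EmergencyState.probe_results`, so it can be stored directly on the next state.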

---

## 4. State machine

```
              ┌────────┐  any probe fails  ┌──────────┐
              │ ONLINE ├──────────────────►│ DEGRADED │
              └───┬────┘                   └─────┬────┘
                  ▲                              │  ≥2 probes fail for 30s
                  │ all probes pass for 10s      ▼
                  │                         ┌──────────┐
                  └─────────────────────────┤ OFFLINE  │
                                            └──────────┘
```
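
The "for 30s" and "for 10s" conditions in the diagram require a condition to hold continuously before a transition fires. A minimal sketch of such a hold timer follows; the `Hold` class is a hypothetical helper, not part of the public API, and it takes the current time as an argument so it can be driven by synthetic timestamps in tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hold:
    """Reports True once a condition has held continuously for `seconds`."""

    seconds: float
    _since: Optional[float] = None  # time at which the condition first became true

    def update(self, condition: bool, now: float) -> bool:
        if not condition:
            self._since = None  # any interruption restarts the hold
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.seconds


# Hold times taken from the diagram above.
offline_hold = Hold(seconds=30.0)  # ≥2 probes failing
online_hold = Hold(seconds=10.0)   # all probes passing
```

In the probe loop, `now` would typically come from `time.monotonic()` so that wall-clock adjustments do not shorten or lengthen a hold.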

Anti-flap: if more than 3 transitions occur within 60 seconds, the detector stays in the more pessimistic state (degraded or offline) until the window passes.
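
One possible reading of the anti-flap rule, sketched with a sliding window of transition timestamps. The `AntiFlap` class, its `filter` method and the `SEVERITY` ordering are assumptions for illustration; the spec only fixes the thresholds (3 transitions, 60 seconds).

```python
from collections import deque

# Higher number = more pessimistic.
SEVERITY = {"online": 0, "degraded": 1, "offline": 2}


class AntiFlap:
    def __init__(self, max_transitions: int = 3, window: float = 60.0) -> None:
        self.max_transitions = max_transitions
        self.window = window
        self._times: deque = deque()  # timestamps of accepted transitions

    def filter(self, current: str, proposed: str, now: float) -> str:
        """Return the mode to adopt, given the current and the proposed mode."""
        # Forget transitions that have left the window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        if proposed == current:
            return current
        if len(self._times) >= self.max_transitions:
            # Flapping: only moves toward the more pessimistic mode are allowed.
            result = max(current, proposed, key=SEVERITY.__getitem__)
        else:
            result = proposed
        if result != current:
            self._times.append(now)
        return result
```

With the defaults, a fourth transition inside 60 seconds is accepted only if it moves toward `degraded` or `offline`; a move back toward `online` waits until enough earlier transitions have aged out of the window.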

---

## 5. Behaviour

### 5.1 Probes

Default targets (from `EmergencyConfig.probe_targets`):

- `1.1.1.1` (DNS A query)
- `8.8.8.8` (DNS A query)
- `cloudflare.com` (HTTPS HEAD)
- `quad9.net` (HTTPS HEAD)

Mode rule:

- `online` requires all 4 probes to succeed
- `offline` requires ≥ 2 probes to fail
- everything in between (exactly 1 failure) is `degraded`
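
The mode rule can be written as a small pure function. This is a sketch: `compute_mode` is a hypothetical name, and it returns the raw mode for one tick, before any debounce or anti-flap handling.

```python
from typing import Literal

Mode = Literal["online", "degraded", "offline"]


def compute_mode(probe_results: dict[str, bool]) -> Mode:
    """Map one tick's probe results (target -> success) to a raw mode."""
    failures = sum(1 for ok in probe_results.values() if not ok)
    if failures == 0:
        # Assumption: an empty target list also counts as online.
        return "online"
    if failures >= 2:
        return "offline"
    return "degraded"


results = {"1.1.1.1": True, "8.8.8.8": True, "cloudflare.com": False, "quad9.net": True}
print(compute_mode(results))  # one failure out of four
```

Keeping this function free of timing logic makes `test_state_transitions_with_synthetic_probes` straightforward: the test can feed it hand-built result dictionaries.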

### 5.2 Effects on the bus

When entering `offline`:

```python
for entry in list(bus.registry.all_local()):  # snapshot: the loop mutates the registry
    if entry.descriptor.params.get("requires_internet"):
        bus.registry.deregister_local(entry.descriptor.name, entry.descriptor.version)
        log.info("offline.deregistered", capability=entry.descriptor.name)
```

When returning to `online`:

```python
for backend in llm_service._backends:
    if backend.requires_internet:
        llm_service._register_backend(backend)        # re-emit descriptors
```

`requires_internet` is a convention: services that wrap remote APIs (`anthropic_api`, `hf_api`) set this flag on their `BackendModel` and inject it into the capability descriptor params at registration time.

### 5.3 Effects on M02 discovery

The detector also calls `peer_registry.set_pruning_aggressive(offline)`:

- Offline: prune stale peers after 30 s instead of 90 s
- Online: standard 90 s

This lets offline mode adapt faster to neighbour churn.

### 5.4 UI surface (M08 consumes)

The state bus drives the amber `INTERNET OFFLINE — LOKAL AKTIV` banner (German for "local active"). The UI subscribes to it, flips the theme, and visibly switches LLM passthrough to local-only backends.

### 5.5 Clock sanity probe (only when online)

When the node has been online for ≥ 30 s, send an extra HEAD request to a single anchor and check the `Date` response header. If the system clock differs from it by > 60 s, log a warning. The detector does NOT auto-correct the clock.
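
The skew computation itself needs only the standard library. The sketch below assumes the `Date` header value has already been read from the HEAD response; `clock_skew_seconds` and `MAX_SKEW_SECONDS` are hypothetical names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW_SECONDS = 60.0  # warning threshold from the paragraph above


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus remote time, in seconds (positive = local clock ahead)."""
    remote = parsedate_to_datetime(date_header)
    if remote.tzinfo is None:
        # HTTP dates are always UTC; treat a naive result as UTC.
        remote = remote.replace(tzinfo=timezone.utc)
    return (local_now - remote).total_seconds()


skew = clock_skew_seconds(
    "Wed, 21 Oct 2015 07:28:00 GMT",
    datetime(2015, 10, 21, 7, 29, 30, tzinfo=timezone.utc),
)
if abs(skew) > MAX_SKEW_SECONDS:
    print(f"clock skew warning: {skew:+.0f}s")  # warn only, never correct
```

In the detector, `local_now` would be `datetime.now(timezone.utc)`. Since the `Date` header has one-second resolution and the request adds latency, a threshold as coarse as 60 s avoids false warnings.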

### 5.6 No on-wire pubsub

`emergency.mode.changed` is local only ([CONTRACT §8](../CAPABILITY_CONTRACT.md)). Other nodes do their own detection.

---

## 6. Errors

This module raises nothing externally; all failures are logged. Internal probe failures are the *normal* signal that drives state.

---

## 7. Configuration

From [X04 §3](../cross-cutting/X04-config.md):

```python
config.emergency.probe_targets    # list[str]
```

Constants: `EMERGENCY_PROBE_INTERVAL_ONLINE`, `EMERGENCY_PROBE_INTERVAL_OFFLINE`, `EMERGENCY_PROBE_TIMEOUT_SECONDS`, `EMERGENCY_TRANSITION_DEBOUNCE_SECONDS`.

---

## 8. Tests

### Unit
- `test_state_transitions_with_synthetic_probes`
- `test_anti_flap_holds_pessimistic_state`
- `test_deregister_called_on_offline_entry`
- `test_reregister_called_on_online_entry`

### Integration
- `test_demo_unplug_triggers_banner_within_5s`: simulate a WAN drop with an `iptables` rule and observe the state change

---

## 9. Cross-references

| What | Where |
|------|-------|
| Online/offline pubsub topic (local) | [CONTRACT §8](../CAPABILITY_CONTRACT.md) |
| LLM internet-dependent backends | [M04 §4.3](M04-llm.md) |
| Discovery cadence change | [M02 §4.3](M02-discovery.md) |
| UI banner | [M08 §5.5](M08-ui.md) |

---

## 10. Open questions

1. **Captive portal detection.** Phase 2: probe a known-content URL and compare the body hash. MVP: false positives are accepted.
2. **IPv6-only networks.** Current probes are dual-stack via the OS. They should work but are not yet tested.
3. **Custom probe scripts.** Phase 2: let users add their own targets.