File size: 12,072 Bytes
70650b7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
# X05 β€” Distributed Hash Table (DHT)

**Spec version:** v1.0 (Phase 2)
**Depends on:** M01 (identity), X01 (transport), X04 (config), X03 (observability)
**Depended on by:** M14 (federation discovery), M07 ext (background blob replication via content routing), M02 ext (cross-LAN peer discovery), M15 (relay bootstrap)

---

## 1. Responsibility

Provide a Kademlia-style DHT over the internet that lets:

- A node find peers of its own community across LANs (cross-LAN extension of M02)
- A node find sources of a specific CID across communities (extension of M07's local source index)
- A node bootstrap into a federation (find an anchor of community X without knowing its IP)

Out of scope:
- Permanent storage in the DHT β€” DHT entries are TTL'd advertisements only
- Anonymity (no onion routing β€” there's no anonymity goal)
- Sybil resistance β€” communities are the trust roots; DHT is an unreliable hint layer

---

## 2. File layout

```
hearthnet/dht/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ kademlia.py        # KademliaNode, routing table
β”œβ”€β”€ routing.py         # FindNode, FindValue, Store, Ping RPCs
β”œβ”€β”€ storage.py         # local DHT k/v store with TTL
└── bootstrap.py       # bootstrap peer list, NAT-aware reachability
```

---

## 3. Concepts

### 3.1 Key space

XOR-distance over a 256-bit key space. Keys are derived as:

- For peers: `key = blake3(node_id_full)[:32]`
- For CIDs: `key = blake3(cid_string)[:32]`
- For communities: `key = blake3(community_id_full)[:32]`

### 3.2 Bucket structure

Standard Kademlia: 256 buckets of size `DHT_REPLICATION_K = 8` (from Phase 2 constants). Concurrent lookups: `DHT_ALPHA = 3`.

### 3.3 Values stored

The DHT does **not** store community state. It stores small, signed advertisements:

| Value type | Key | Value (signed) | TTL |
|------------|-----|----------------|-----|
| Peer presence | `blake3(node_id)` | `{endpoints, community_id, expires_at}` | matches manifest TTL (30s) |
| CID source | `blake3(cid)` | `{node_id, last_seen}` | 1 hour |
| Community bootstrap | `blake3(community_id)` | `{anchor_node_ids, endpoints, manifest_url}` | 24 hours |

The DHT is a **hint cache**. Authoritative state lives in community event logs.

---

## 4. Public API

### 4.1 `kademlia.py`

```python
# hearthnet/dht/kademlia.py
from dataclasses import dataclass

@dataclass(frozen=True)
class DhtContact:
    node_key:    bytes            # 32 bytes
    node_id_full: str
    endpoint:    Endpoint
    last_seen:   float

@dataclass(frozen=True)
class DhtValue:
    """A stored advertisement. The payload is a signed dict."""
    key:        bytes
    payload:    dict              # has 'signature' field
    expires_at: int               # unix seconds

class KademliaNode:
    """One node's view of the DHT.
       Provides high-level find_node / find_value / store APIs."""

    def __init__(
        self,
        kp: KeyPair,
        endpoint: Endpoint,
        transport_client: HttpClient,
        bootstrap_endpoints: list[Endpoint],
    ):
        ...

    async def start(self) -> None:
        """Bootstrap: ping bootstrap_endpoints, populate routing table."""

    async def stop(self) -> None: ...

    # --- public lookups ---

    async def find_node(self, target_key: bytes) -> list[DhtContact]:
        """Return the k closest contacts to target_key."""

    async def find_value(self, key: bytes) -> list[DhtValue]:
        """Return values stored at this key (or empty)."""

    async def store(self, value: DhtValue) -> int:
        """Replicate to k closest nodes. Returns count of successful stores."""

    # --- maintenance ---

    async def refresh_buckets(self) -> None:
        """Per DHT_REFRESH_SECONDS: ping a random key in each bucket to liveness-check it."""

    async def republish_values(self) -> None:
        """Per DHT_REPUBLISH_SECONDS: re-store our own advertisements so TTL doesn't expire them."""

    # --- introspection ---

    def routing_table_size(self) -> int: ...
    def stored_values(self) -> int: ...
```

### 4.2 `routing.py`

Wire RPCs exposed by the bus transport (X01) as additional endpoints under `/dht/v1/`:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/dht/v1/ping` | POST | Liveness, exchange contact info |
| `/dht/v1/find_node` | POST | Return k closest contacts to a key |
| `/dht/v1/find_value` | POST | Return values at a key OR closest contacts if absent |
| `/dht/v1/store` | POST | Accept a value into local storage (if we're among the k closest) |

```python
# hearthnet/dht/routing.py
async def serve_ping(req: dict) -> dict: ...
async def serve_find_node(req: dict, kademlia: KademliaNode) -> dict: ...
async def serve_find_value(req: dict, kademlia: KademliaNode) -> dict: ...
async def serve_store(req: dict, kademlia: KademliaNode) -> dict: ...

# Request / response shapes documented inline in each function.
```

### 4.3 `storage.py`

```python
# hearthnet/dht/storage.py
class DhtStore:
    """Local key-value store with TTL eviction.
       Backing: SQLite in <DATA>/dht/store.sqlite."""

    def __init__(self, db_path: Path):
        ...

    def put(self, value: DhtValue) -> bool:
        """Idempotent. Returns True if stored (we're in k closest), False if rejected."""

    def get(self, key: bytes) -> list[DhtValue]:
        """Return non-expired values for this key."""

    def evict_expired(self) -> int: ...

    def size(self) -> int: ...
```

### 4.4 `bootstrap.py`

```python
# hearthnet/dht/bootstrap.py

DEFAULT_BOOTSTRAP_NODES: list[Endpoint] = [
    # Filled at packaging time with community-run bootstrap endpoints.
    # Christof's relay.hearthnet.de will be a default.
]

async def is_reachable(endpoint: Endpoint, timeout_seconds: float = 5) -> bool:
    """Send a ping; return True if responded."""

async def discover_external_ip() -> str | None:
    """Use STUN against a public STUN server to learn our external IP.
       Used by relay-assisted bootstrap to advertise reachable endpoints."""
```

---

## 5. Behaviour

### 5.1 Advertisement lifecycle

```
Node starts β†’ KademliaNode.start()
  β†’ ping bootstrap_endpoints; build initial routing table
  β†’ store our peer presence: store(DhtValue(blake3(node_id), {...}, ttl=30s))
  β†’ store community bootstrap: store(DhtValue(blake3(community_id), {anchors, ...}, ttl=24h))
  β†’ for each pinned CID: store(DhtValue(blake3(cid), {node_id, ...}, ttl=1h))
  ↓
Every MANIFEST_REPUBLISH_INTERVAL_SECONDS: re-store peer presence
Every DHT_REPUBLISH_SECONDS: re-store all our advertisements
Every DHT_REFRESH_SECONDS: refresh routing table buckets
```

### 5.2 Lookup integration with M02

When [M02 PeerRegistry](../../modules/M02-discovery.md) doesn't find a peer for a known community on the LAN:

```python
# M02 extension (Phase 2)
async def find_remote_peers(community_id: str) -> list[PeerRecord]:
    if dht is None:
        return []
    contacts = await dht.find_value(blake3(community_id))
    candidates = [parse_community_bootstrap(v.payload) for v in contacts]
    return await fetch_manifests_and_filter(candidates, community_id)
```

### 5.3 Lookup integration with M07

When [M07 TransferManager](../../modules/M07-file-blobs.md) needs sources for a CID and the local `file.cid.advertised` index is empty:

```python
# M07 extension (Phase 2)
async def find_remote_sources(cid: str) -> list[str]:  # NodeIDs
    contacts = await dht.find_value(blake3(cid))
    return [parse_source_advert(v.payload).node_id for v in contacts]
```

### 5.4 Signature requirement on stored values

Every DHT value's `payload` must contain a `signature` field signed by the advertiser. Receivers reject values whose signature does not validate against the advertiser's claimed NodeID. Cost is small; protection is essential β€” without it, anyone can poison the DHT.

### 5.5 NAT traversal hooks

The DHT itself does not do hole-punching. It cooperates with [M15 Relay Tier](../M15-relay-tier.md):

- If our advertised endpoint is unreachable (NAT'd), we additionally advertise `via_relay: "<relay_url>"` in the value payload
- Peers wanting to reach us see the relay hint and route through it
- Direct peer-to-peer over NAT (STUN/TURN) is Phase 3

### 5.6 Privacy of the DHT

The DHT is a public-internet-facing component (by definition). It leaks:
- Which NodeIDs exist
- Which communities exist
- Which CIDs are popular

It does **not** leak:
- The contents of any blob
- The contents of community event logs
- Who's actually a member of a community (membership is in the signed manifest, fetched out of band)

This is acceptable for a system whose goal is community resilience, not anonymity.

### 5.7 Anti-spam

- Per-source rate limit on `store` calls: max 100 per minute per node
- Stored value size cap: 4 KB
- Per-bucket eviction prefers values with higher signature reputation (Phase 3)

### 5.8 Bootstrap reachability

`bootstrap_endpoints` (from config) are tried in order. If all fail, the node logs a warning and continues with mDNS+UDP only. The DHT is best-effort.

---

## 6. Wire format (request/response examples)

### 6.1 `POST /dht/v1/find_value`

Request:
```json
{
  "key":      "blake3:<hex of 32 bytes>",
  "from":     "ed25519:<our NodeID>",
  "trace_id": "01HXR...",
  "signature": "ed25519:<over the above three fields canonicalised>"
}
```

Response (value found):
```json
{
  "values": [
    {
      "key":     "blake3:...",
      "payload": {"node_id":"...","endpoints":[...],"signature":"ed25519:..."},
      "expires_at": 1717942800
    }
  ]
}
```

Response (not found, get closer contacts):
```json
{
  "values":  [],
  "closer":  [{"node_id_full":"ed25519:...","endpoint":{"host":"...","port":7080}}, "..."]
}
```

---

## 7. Errors

`DhtError`:

- `bootstrap_failed` β€” no bootstrap endpoint reachable
- `lookup_timeout` β€” couldn't find value or contacts within DHT_LOOKUP_TIMEOUT
- `store_unauthorized` β€” payload signature invalid
- `value_too_large` β€” > 4 KB
- `rate_limited` β€” per-source store rate exceeded

These don't always map to wire codes β€” most DHT activity is internal to the node. When they bubble up to a caller, `dht_lookup_failed` is the wire code.

---

## 8. Configuration

```python
config.dht.enabled                  = False  # opt-in; phase 1 default off
config.dht.bootstrap_endpoints      = [...]
config.dht.public_endpoint_override = None   # for nodes behind NAT, manual override
config.dht.advertise_cids           = True   # also advertise pinned CIDs
config.dht.advertise_community      = True
```

Constants used: `DHT_REPLICATION_K=8`, `DHT_ALPHA=3`, `DHT_REFRESH_SECONDS=3600`, `DHT_REPUBLISH_SECONDS=86400`.

---

## 9. Tests

### Unit
- `test_xor_distance_metric`
- `test_routing_table_insert_eviction`
- `test_signed_value_verification`
- `test_unsigned_value_rejected`
- `test_ttl_eviction`

### Integration
- `test_three_node_dht_find_value` β€” three KademliaNodes in process, store, find
- `test_bootstrap_picks_up_existing_dht`
- `test_partition_then_reconnect_converges`
- `test_value_republish_keeps_alive`

### Property-based
- `test_kademlia_eventual_consistency_under_churn` (Hypothesis-driven)

---

## 10. Cross-references

| What | Where |
|------|-------|
| Used by federation bootstrap | [M14 Β§4.3](../modules/M14-federation.md) |
| Used by background blob replication | M07 ext (see [00-OVERVIEW Β§1](../00-OVERVIEW.md)) |
| Wire error code | `dht_lookup_failed` in [CAP2 Β§9](../CAPABILITY_CONTRACT_v2.md) |
| Phase 1 alternative (mDNS/UDP) | [M02](../../modules/M02-discovery.md) |
| Phase 3 sybil resistance | TBD |

---

## 11. Open questions

1. **libp2p reuse vs custom Python.** libp2p has a Python port but it's heavyweight. A focused 1000-LOC Kademlia matches our needs and stays auditable. Decision: custom for now; can swap.
2. **NAT hole punching.** Currently relay-only. STUN/TURN integration is Phase 3.
3. **Public DHT vs federated DHTs.** Should the DHT itself be federated (per-community DHT joined via cross-sig)? Maybe. Defer.
4. **Onion routing.** Out of scope. HearthNet has no anonymity goal.