HearthNet-Nemotron / docs /p2_p3 /M14-federation.md
Chris4K's picture
p2, p3
70650b7
|
Raw
History Blame
16 kB

M14 β€” Federation

Spec version: v1.0 (Phase 2) Depends on: M01 (identity), M16 (tokens), X02 (events), X05 (DHT, for cross-LAN bootstrap), X01 (transport), M03 (bus), M15 (relay, for NAT traversal) Depended on by: M22 (mobile uses federation for cross-community discovery), all services indirectly (federated calls route through M14)


1. Responsibility

Let two communities establish a trust link and grant each other scoped access to specific capabilities. Federation is the bridge from "your house" to "the neighbourhood".

  • Establish a federation between two communities (mutual cross-signature)
  • Enforce scope on cross-community capability calls
  • Route federated calls through a designated Bridge node profile
  • Heartbeat between federated communities so liveness is visible
  • Issue federation tokens (via M16) for individual calls

Out of scope:

  • Many-community mesh federation β€” Phase 3 (this module handles bilateral only)
  • Anonymous federation β€” never

2. File layout

hearthnet/federation/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ manifest.py         # FederationManifest builder + verifier
β”œβ”€β”€ peering.py          # the cross-sig handshake
β”œβ”€β”€ relay_client.py     # outbound calls into peer community
└── service.py          # FederationService β€” registers federation.* capabilities

3. Federation manifest

Format defined in CAP2 Β§6.1. Key properties:

  • Lives in both communities' event logs
  • Signed by anchors of both, with per-community min_signatures_to_federate co-signatures
  • Carries the scope each side grants the other
  • Has an expiry (expires_at) β€” typically 1 year, with auto-renewal via heartbeats

A node may belong to one community (Phase 1) and that community may be federated with N other communities. Federation is a community-level relationship, not per-node.


4. Public API

4.1 manifest.py

# hearthnet/federation/manifest.py
from dataclasses import dataclass

@dataclass(frozen=True)
class FederationScope:
    """What one community grants the other."""
    capabilities:        list[str]                   # ["rag.query@1.0"]
    params_constraints:  dict[str, list[str]]         # {"corpus":["public-emergency"]}
    rate_limit_per_minute: int                        # cross-community budget
    data_visibility:     str                          # "public_corpora_only"|"members_only"|"open"

@dataclass(frozen=True)
class FederationManifest:
    schema_version:  int
    federation_id:   str
    community_a:     str
    community_b:     str
    established_at:  str
    expires_at:      str
    scope_a_to_b:    FederationScope                  # what A grants B
    scope_b_to_a:    FederationScope                  # what B grants A
    bootstrap_endpoints_a: list[Endpoint]
    bootstrap_endpoints_b: list[Endpoint]
    signature_a:     dict                              # {signed_by, signature, co_signers}
    signature_b:     dict

    def grants_to(self, calling_community_id: str) -> FederationScope | None:
        """Returns the scope grant *to* the calling community (if federated, else None)."""

    def is_expired(self, now: datetime | None = None) -> bool: ...

def build_federation_proposal(
    our_community_manifest: CommunityManifest,
    peer_community_manifest_url: str,
    proposed_scope_to_grant: FederationScope,
    proposed_scope_to_receive: FederationScope,
    bootstrap_endpoints: list[Endpoint],
) -> 'FederationProposal':
    """Step 1: prepare a proposal. Not yet a manifest β€” just a draft for the other side."""

@dataclass(frozen=True)
class FederationProposal:
    community_a:           str
    community_b:           str
    scope_a_to_b:          FederationScope
    scope_b_to_a:          FederationScope
    bootstrap_endpoints_a: list[Endpoint]
    bootstrap_endpoints_b: list[Endpoint]
    proposer_signature:    str

def co_sign_federation(
    proposal: FederationProposal,
    signing_kp: KeyPair,
    role: str,                 # "a" or "b"
) -> dict:
    """Returns {signed_by, signature, co_signers[]} payload."""

def finalize_federation_manifest(
    proposal: FederationProposal,
    sig_a: dict,
    sig_b: dict,
) -> FederationManifest:
    """Assemble fully-signed manifest after both sides have signed."""

def parse_federation_manifest(blob: bytes | dict) -> FederationManifest: ...
def verify_federation_manifest(
    m: FederationManifest,
    community_a_manifest: CommunityManifest,
    community_b_manifest: CommunityManifest,
) -> None:
    """Verify both sides signed, anchors are valid in their communities,
    co-signer counts meet policy, expiry is in the future."""

4.2 peering.py

# hearthnet/federation/peering.py
class FederationHandshake:
    """Conducts the multi-step cross-signing handshake.
       Stateful; one instance per active proposal."""

    def __init__(
        self,
        our_community_manifest: CommunityManifest,
        our_kp: KeyPair,
        transport_client: HttpClient,
        event_log: EventLog,
    ):
        ...

    async def initiate(
        self,
        peer_endpoints: list[Endpoint],
        scope_to_grant: FederationScope,
        scope_to_receive: FederationScope,
    ) -> FederationProposal:
        """1. Fetch peer's community manifest.
        2. Build proposal.
        3. Sign as community A's anchor.
        4. POST to peer.
        5. Receive peer's signed proposal back.
        6. Verify both signatures and gather more local co-signers if policy requires.
        Returns the fully-signed proposal ready to finalize."""

    async def accept(self, proposal: FederationProposal) -> FederationManifest:
        """The other side accepting an incoming proposal.
        Returns the finalized manifest (publishable to event log)."""

    async def publish(self, manifest: FederationManifest) -> None:
        """Append federation.peer.added event to local log.
        Push the manifest to peer so they can do the same."""

4.3 relay_client.py

# hearthnet/federation/relay_client.py
class FederationCaller:
    """Outbound side: makes calls into federated communities.
       Used by services when their request triggers a federated lookup
       (e.g. rag.query across federated corpora)."""

    def __init__(
        self,
        bus: CapabilityBus,
        our_kp: KeyPair,
        our_community_id: str,
        federation_manifests_provider: Callable[[], list[FederationManifest]],
    ):
        ...

    async def call_in_peer(
        self,
        peer_community_id: str,
        capability: str,
        version: Version,
        body: dict,
        *,
        timeout_seconds: float | None = None,
    ) -> dict:
        """1. Look up federation manifest for peer_community_id.
        2. Verify scope includes (capability, params).
        3. Issue an auth.token via local M16 with capability scope.
        4. Pick a peer Bridge endpoint (from manifest.bootstrap_endpoints_b).
        5. POST /bus/v1/call to peer's federation.proxy@1.0 with token + body.
        Returns the result. Raises FederationError on scope/auth issues."""

    async def stream_in_peer(...) -> AsyncIterator[Frame]:
        """Streaming variant."""

4.4 service.py

# hearthnet/federation/service.py
class FederationService:
    name    = "federation"
    version = "1.0"

    def __init__(
        self,
        bus: CapabilityBus,
        event_log: EventLog,
        replay_engine: ReplayEngine,
        author_kp: KeyPair,
        community_manifest_provider: Callable[[], CommunityManifest],
        revocation_cache: RevocationCache,
    ):
        ...

    def capabilities(self) -> list[tuple[CapabilityDescriptor, Callable, ParamsPredicate]]:
        """Registers: federation.peer.add, federation.peer.remove,
        federation.peer.list, federation.proxy (all @1.0)."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    def health(self) -> dict: ...

    # --- handlers ---

    async def handle_peer_add(self, req: RouteRequest) -> dict:
        """CAP2 Β§4.1. Run handshake; check co-signatures; emit event."""

    async def handle_peer_remove(self, req: RouteRequest) -> dict:
        """CAP2 Β§4.2."""

    async def handle_peer_list(self, req: RouteRequest) -> dict:
        """CAP2 Β§4.3."""

    async def handle_proxy(self, req: RouteRequest) -> dict | AsyncIterator[dict]:
        """CAP2 Β§4.4. Forward a federated call to the local bus.
        1. Verify token attached to request.
        2. Look up federation manifest, check scope.
        3. Call bus.call(target_capability, version, body) internally.
        4. Return result (or stream frames)."""

    # --- maintenance ---

    async def heartbeat_loop(self) -> None:
        """Per FEDERATION_HEARTBEAT_SECONDS, ping each federated peer's
        federation.peer.list. Update last_heartbeat. Emit
        federation.heartbeat event."""

5. Behaviour

5.1 Two-phase handshake

Community A's anchor decides to federate with B.
  ↓
A: peering.initiate(B_endpoints, scope_a_to_b, scope_b_to_a)
   β†’ fetch B's community manifest
   β†’ build proposal; sign as A
   β†’ POST /federation/proposal to B's bridge
   ↓
B receives proposal, presents to a trusted member.
  human decision (or auto-policy)
   ↓
B: peering.accept(proposal)
   β†’ sign as B
   β†’ return signed proposal
   ↓
A: gather more local co-signers if our policy.min_signatures_to_federate > 1
   ↓
A: finalize_federation_manifest(proposal, sig_a, sig_b)
   ↓
A: publish federation.peer.added event locally
   ↓
A: POST manifest to B so they publish too
   ↓
both communities heartbeat each other periodically

5.2 Bridge node profile

A community designates one or more nodes with profile: "bridge". Bridge nodes:

  • Always-on (best-effort)
  • Have a publicly-reachable endpoint or a relay-tier (M15) registration
  • Run FederationService and act as the proxy for inbound federated calls
  • Hold the bandwidth budget for cross-community traffic

Non-bridge nodes can still call into federated communities (via M14 FederationCaller); they just don't serve cross-community calls.

5.3 Scope enforcement (inbound)

When federation.proxy is invoked:

  1. Caller signature verified (Phase 1 Β§1.3)
  2. Caller's community is parsed from token's iss
  3. Federation manifest lookup; absent β†’ not_federated
  4. Scope check: (capability, version) ∈ scope and params allowed β†’ else federation_forbidden
  5. Token's signature verified against issuer's community anchors
  6. Token's aud must match our community
  7. Token's scope βŠ† federation manifest's scope (caller's community can't grant themselves more than they were granted)
  8. Dispatch internally via bus
  9. Record metrics: hearthnet_federation_calls_total{peer_community, capability, result}

5.4 Heartbeats and expiry

  • Every FEDERATION_HEARTBEAT_SECONDS (300), each bridge calls federation.peer.list on each federated peer
  • If a heartbeat fails 3 times in a row, the peer is marked degraded in the local view
  • Federation manifests have expires_at. 30 days before expiry, a renewal handshake is auto-initiated. If renewal fails by expiry, the federation lapses; calls return not_federated.

5.5 Revocation

To break a federation:

  • Either side may call federation.peer.remove
  • Co-signature requirements: same as creation (policy.min_signatures_to_federate)
  • Event federation.peer.removed is published locally; peer is notified and publishes their own
  • All outstanding tokens issued under this federation are implicitly revoked (M16 verifies federation still exists)

5.6 Identity import (Phase 2.5 hook)

A federated user with NodeID in community A wishing to access community B's services as themselves (not via their community A anchor's token) can use federation.identity.attest (reserved capability, Phase 2.5). Out of scope for first cut.

5.7 Trust transitivity

Not transitive. A↔B and B↔C do not imply A↔C. Each pair establishes its own manifest. This is intentional β€” explicit consent.

5.8 Conflict: federation with revoked member's community

If community A has federated with B, and later A's anchor (the signer) is revoked from A:

  • The federation manifest's signature was valid at sign time
  • Going forward, A's community may renew with a new anchor signature
  • B verifies federation against A's current anchor set on every call β€” if no current anchor co-signs, the federation is invalid

6. Discovery integration

A community wishing to find a federated peer they haven't talked to in a while:

  1. Look up bootstrap_endpoints in their stored federation manifest
  2. Try each; if all fail, fall back to X05 DHT: find_value(blake3(peer_community_id))
  3. If DHT also returns nothing, try M15 relay: relay.lookup_community(peer_community_id)
  4. Only after all three fail β†’ mark federation as unreachable; UI shows offline indicator

7. Errors

FederationError:

  • not_federated β€” no manifest for this peer
  • federation_expired β€” manifest past expires_at
  • scope_violation β€” request outside granted scope
  • bridge_unreachable β€” couldn't reach any of peer's bridges
  • co_signer_insufficient β€” proposal lacks required signature count
  • peer_community_invalid β€” peer's manifest failed verification

Wire mapping per CAP2 Β§9.


8. Configuration

config.federation.enabled              = False           # opt-in
config.federation.bridge_node          = False           # we serve cross-community calls
config.federation.relay_url            = None            # M15 hosted relay for NAT
config.federation.auto_renew_days_before = 30
config.federation.max_peer_communities = 16
config.federation.heartbeat_seconds    = FEDERATION_HEARTBEAT_SECONDS
config.federation.scope_default_rate_limit_per_minute = 60

Constants: FEDERATION_MANIFEST_TTL_SECONDS, FEDERATION_HEARTBEAT_SECONDS.


9. Tests

Unit

  • test_federation_proposal_builds_correctly
  • test_co_sign_signature_verifies
  • test_finalize_requires_min_signers
  • test_grants_to_returns_scope_for_correct_direction
  • test_expired_federation_rejects_calls

Integration

  • test_two_community_federation_round_trip β€” A and B in different processes federate, then A queries B's RAG via proxy
  • test_scope_violation_returns_403
  • test_heartbeat_marks_degraded_after_3_failures
  • test_revocation_breaks_existing_tokens
  • test_renewal_30_days_before_expiry

Chaos

  • test_partition_during_federation_handshake_resumable

10. Cross-references

What Where
Federation manifest schema CAP2 Β§6.1
federation.* capabilities CAP2 Β§4.1–4.4
Token issuance for cross-community M16 Β§5.1
DHT bootstrap X05 Β§4.3
Relay tier NAT traversal M15
Bridge node profile Phase 1 PRD Β§5.4 + this module Β§5.2
Phase 3 transitive federation TBD

11. Open questions

  1. Multi-party federation (mesh of N>2 communities) β€” currently bilateral only. Phase 3 candidate.
  2. Federated marketplace β€” should market.list cross federations? Reserved scope param; default off.
  3. Federated identity β€” single-sign-on across federated communities. Phase 2.5; design depends on token-on-token.
  4. Federation revocation event propagation β€” if A↔B and A↔C, and B unilaterally revokes A, should C see this? MVP: no, each pair is independent.
  5. Audit log for federation activity β€” should there be a separate "federation_audit" log so cross-community activity is easy to surface to operators? Yes, Phase 2.5.