Distributed LLM inference across browser peers over WebRTC. Open this URL in two tabs (or send it to a friend), connect the peers, load the model in each, and run inference together.
Signaling. This demo uses the public PeerJS broker at 0.peerjs.com, which is shared and rate-limited. For production, run your own PeerServer (npm install peer, or the Docker image) and pass {host, port, path, key} to new Peer(). Alternatively, write your own signaling server, for example in PHP: it only needs a WebSocket endpoint that relays messages between the peers in the same room ID.
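As a rough illustration of the self-hosted setup, the sketch below builds the options object that would be passed to new Peer(). The helper name buildPeerOptions, the host peer.example.com, the path /myapp, and the default values are assumptions made for the example, not values used by this demo.

```javascript
// Hypothetical helper: builds the options object for PeerJS's `new Peer(id, options)`.
// The host, path, and defaults are placeholders, not values taken from this demo.
function buildPeerOptions({ host, port = 443, path = "/", key = "peerjs", secure = true }) {
  if (!host) {
    throw new Error("host is required for a self-hosted PeerServer");
  }
  return { host, port, path, key, secure };
}

const peerOptions = buildPeerOptions({ host: "peer.example.com", path: "/myapp" });

// In the browser, with PeerJS loaded, the options would be used like this:
//   const peer = new Peer(undefined, peerOptions); // undefined = let the server assign an ID
//   peer.on("open", (id) => console.log("registered as", id));
```

Keeping the options in one place makes it easy to switch between the public broker (no options at all) and a private PeerServer without touching the connection logic.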
NAT traversal. The default ICE configuration uses Google's public STUN server. If two peers can't connect (for example, behind a corporate or symmetric NAT), you need a TURN server to relay the traffic. Coturn on a small VPS handles thousands of users.
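A minimal sketch of an ICE configuration with a TURN fallback might look like the following. The helper name buildIceConfig, the host turn.example.com, and the credentials are placeholders; port 3478 is the standard STUN/TURN port, but your coturn deployment may use a different one.

```javascript
// Sketch of an RTCConfiguration with public STUN plus an optional TURN relay.
// turn.example.com and the credentials are placeholders for your own coturn server.
function buildIceConfig({ turnHost, username, credential } = {}) {
  const iceServers = [{ urls: "stun:stun.l.google.com:19302" }];
  if (turnHost) {
    iceServers.push({
      // Offer both UDP and TCP; TCP helps on networks that block UDP.
      urls: [
        `turn:${turnHost}:3478?transport=udp`,
        `turn:${turnHost}:3478?transport=tcp`,
      ],
      username,
      credential,
    });
  }
  return { iceServers };
}

const rtcConfig = buildIceConfig({
  turnHost: "turn.example.com",
  username: "demo-user",
  credential: "demo-secret",
});

// With PeerJS, the object goes into the `config` option:
//   const peer = new Peer(undefined, { config: rtcConfig });
```

Because TURN credentials shipped in page source are visible to every visitor, short-lived credentials issued by your own backend are usually preferable to a static username and password.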
CORS / COEP. transformers.js downloads model files from the huggingface.co CDN, which is CORS-enabled. No special headers are required unless you later add SharedArrayBuffer or threaded WASM; in that case, serve the page with Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp.
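If you do need cross-origin isolation, the two headers have to be set by whatever serves the page. The sketch below assumes a Node-style response object with a setHeader method; the helper name applyIsolationHeaders is hypothetical, and other servers (nginx, Apache, a PHP script) would set the same two headers in their own way.

```javascript
// The two response headers that enable cross-origin isolation.
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical helper: works with any response object exposing setHeader(name, value),
// such as the `res` passed to a Node http request handler.
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Usage inside a Node http handler (not executed here):
//   http.createServer((req, res) => { applyIsolationHeaders(res); res.end(html); });
//
// In the page, `self.crossOriginIsolated` reports whether isolation took effect.
```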
Pipeline / MoE. The protocol layer (capability advertisement, request routing, error handling, circuit breaker) is fully wired. The inference handlers currently return { unimplemented: true }; implementing them requires a custom transformers.js fork that exposes the intermediate hidden states between transformer blocks. See the handlePipelineStage and handleMoeExpert stubs at the bottom of the script for the TODO contract.
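For readers unfamiliar with the circuit-breaker pattern mentioned above, here is a generic sketch of how one could wrap requests to a peer. It is not the implementation in this script: the class name, thresholds, and the choice to count a { unimplemented: true } reply as a failure are all assumptions made for illustration.

```javascript
// Generic circuit breaker sketch; names and thresholds are hypothetical.
class CircuitBreaker {
  constructor({ maxFailures = 3, cooldownMs = 5000, now = () => Date.now() } = {}) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, useful for testing
    this.failures = 0;
    this.openedAt = null;
  }

  // True while the breaker is open and the cooldown has not yet elapsed.
  isOpen() {
    return this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs;
  }

  // Runs fn unless the breaker is open; tracks consecutive failures.
  async call(fn) {
    if (this.isOpen()) {
      throw new Error("circuit open: peer temporarily skipped");
    }
    try {
      const result = await fn();
      // Assumption: a stubbed handler reply counts as a failure for routing purposes.
      if (result && result.unimplemented) {
        throw new Error("peer handler unimplemented");
      }
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openedAt = this.now(); // open (or re-open) the breaker
      }
      throw err;
    }
  }
}
```

After the cooldown elapses, the next call is let through as a trial: a success resets the failure count, while another failure re-opens the breaker immediately.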