Reachy Mini Home Assistant Voice Assistant - Architecture Design
1. System Architecture Overview
    ┌─────────────────────────────────────────────────────────────────┐
    │                        Application Layer                        │
    │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
    │  │ Home         │    │ Web UI       │    │ Console      │       │
    │  │ Assistant    │    │ (Gradio)     │    │ Interface    │       │
    │  └──────────────┘    └──────────────┘    └──────────────┘       │
    └─────────────────────────────────────────────────────────────────┘
                                     ↓
    ┌─────────────────────────────────────────────────────────────────┐
    │                      Business Logic Layer                       │
    │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
    │  │ Voice        │    │ Motion       │    │ State        │       │
    │  │ Manager      │    │ Controller   │    │ Manager      │       │
    │  └──────────────┘    └──────────────┘    └──────────────┘       │
    │  ┌──────────────┐    ┌──────────────┐                           │
    │  │ ESPHome      │    │ Event        │                           │
    │  │ Handler      │    │ Dispatcher   │                           │
    │  └──────────────┘    └──────────────┘                           │
    └─────────────────────────────────────────────────────────────────┘
                                     ↓
    ┌─────────────────────────────────────────────────────────────────┐
    │                         Services Layer                          │
    │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
    │  │ Wake Word    │    │ Audio        │    │ Motion       │       │
    │  │ Detector     │    │ Processor    │    │ Queue        │       │
    │  └──────────────┘    └──────────────┘    └──────────────┘       │
    │  ┌──────────────────────────────────────────────────────┐       │
    │  │ ESPHome Protocol (Audio Streaming to/from HA)        │       │
    │  └──────────────────────────────────────────────────────┘       │
    └─────────────────────────────────────────────────────────────────┘
                                     ↓
    ┌─────────────────────────────────────────────────────────────────┐
    │                   Hardware Abstraction Layer                    │
    │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
    │  │ Audio        │    │ Motion       │    │ Camera       │       │
    │  │ Adapter      │    │ Adapter      │    │ Adapter      │       │
    │  └──────────────┘    └──────────────┘    └──────────────┘       │
    │  ┌──────────────┐    ┌──────────────┐                           │
    │  │ Reachy Mini  │    │ ESPHome      │                           │
    │  │ SDK Wrapper  │    │ Protocol     │                           │
    │  └──────────────┘    └──────────────┘                           │
    └─────────────────────────────────────────────────────────────────┘
                                     ↓
    ┌─────────────────────────────────────────────────────────────────┐
    │              Reachy Mini Hardware + Home Assistant              │
    │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
    │  │ Microphone   │    │ Head Motors  │    │ Camera       │       │
    │  │ Array (4)    │    │ (6 DOF)      │    │ (Wide)       │       │
    │  └──────────────┘    └──────────────┘    └──────────────┘       │
    │  ┌──────────────┐    ┌──────────────┐                           │
    │  │ Speaker      │    │ Antennas     │                           │
    │  │ (5W)         │    │ (2)          │                           │
    │  └──────────────┘    └──────────────┘                           │
    │                                                                 │
    │  ┌──────────────────────────────────────────────────────┐       │
    │  │ Home Assistant (STT/TTS Processing)                  │       │
    │  └──────────────────────────────────────────────────────┘       │
    └─────────────────────────────────────────────────────────────────┘
2. Core Design Principles
2.1 Based on linux-voice-assistant
This project is based on the architecture of OHF-Voice/linux-voice-assistant. Key features:
- STT/TTS Handled by Home Assistant: Audio is streamed to Home Assistant via the ESPHome protocol for speech recognition and synthesis
- Local Wake Word Detection: Uses microWakeWord or openWakeWord for offline wake word detection
- ESPHome Protocol Communication: All communication with Home Assistant goes through the ESPHome protocol
- Motion Control Enhancement: Integrates Reachy Mini's motion control capabilities
2.2 Architecture Characteristics
- Modular Design: Audio, voice, motion, and ESPHome modules are independent
- Asynchronous Processing: Uses asyncio for high-performance asynchronous processing
- State Management: Centralized state management (ServerState)
- Event-Driven: Event-based communication mechanism
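To make the asyncio-based, event-driven style listed above more concrete, here is a minimal publish/subscribe sketch. The `EventDispatcher` class, its `subscribe` and `publish` methods, and the `"wake_word"` event name are hypothetical illustrations and are not taken from the project code.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[[Any], Awaitable[None]]


class EventDispatcher:
    """Hypothetical minimal pub/sub dispatcher (illustration only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    async def publish(self, event_name: str, payload: Any = None) -> None:
        # Run every handler registered for this event concurrently.
        await asyncio.gather(*(h(payload) for h in self._handlers[event_name]))


async def demo() -> List[str]:
    log: List[str] = []
    dispatcher = EventDispatcher()

    async def react_with_motion(payload: Any) -> None:
        log.append(f"motion: look up ({payload})")

    async def start_streaming(payload: Any) -> None:
        log.append(f"esphome: start streaming ({payload})")

    dispatcher.subscribe("wake_word", react_with_motion)
    dispatcher.subscribe("wake_word", start_streaming)
    await dispatcher.publish("wake_word", "example_wake_word")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape, modules only need to agree on event names and payloads, which is one way to keep the audio, voice, motion, and ESPHome modules independent of each other.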
3. Module Design
3.1 Audio Module (audio/)
Responsibilities:
- Audio device management (microphone, speaker)
- Audio recording and playback
- Audio format conversion (16 kHz mono PCM)
Interfaces:
    from abc import ABC, abstractmethod
    from typing import Callable, List

    # AudioDevice is the device descriptor type (definition not shown here).


    class AudioAdapter(ABC):
        """Audio device adapter abstract base class"""

        @abstractmethod
        async def list_input_devices(self) -> List["AudioDevice"]:
            """List available audio input devices"""
            pass

        @abstractmethod
        async def list_output_devices(self) -> List["AudioDevice"]:
            """List available audio output devices"""
            pass

        @abstractmethod
        async def start_recording(self, device: str, callback: Callable) -> None:
            """Start audio recording"""
            pass

        @abstractmethod
        async def stop_recording(self) -> None:
            """Stop audio recording"""
            pass

        @abstractmethod
        async def play_audio(self, audio_data: bytes, device: str) -> None:
            """Play audio"""
            pass
Key Components:
- adapter.py: Audio device adapter implementation
- processor.py: Audio processor (format conversion, buffering)
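The buffering role of the audio processor can be sketched as follows. This is an illustration only: `ChunkBuffer` is a hypothetical helper, and the 16-bit sample width is an assumption (the document specifies 16 kHz mono PCM but not the bit depth). The 1024-sample chunk size matches the wake word flow in section 4.1.

```python
from typing import List

SAMPLE_WIDTH_BYTES = 2   # assumption: 16-bit samples
CHUNK_SAMPLES = 1024     # chunk size used in the wake word flow (section 4.1)
CHUNK_BYTES = CHUNK_SAMPLES * SAMPLE_WIDTH_BYTES


class ChunkBuffer:
    """Hypothetical helper: accumulate raw PCM bytes and emit fixed-size chunks."""

    def __init__(self, chunk_bytes: int = CHUNK_BYTES) -> None:
        self._chunk_bytes = chunk_bytes
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add captured bytes and return every complete chunk now available."""
        self._buffer.extend(data)
        chunks: List[bytes] = []
        while len(self._buffer) >= self._chunk_bytes:
            chunks.append(bytes(self._buffer[: self._chunk_bytes]))
            del self._buffer[: self._chunk_bytes]
        return chunks

    @property
    def pending_bytes(self) -> int:
        """Number of buffered bytes that do not yet form a full chunk."""
        return len(self._buffer)


if __name__ == "__main__":
    buf = ChunkBuffer()
    ready = buf.feed(b"\x00" * 3000)
    print(len(ready), buf.pending_bytes)
```

Audio devices rarely deliver data in exactly the block size a detector expects, so a small re-chunking step like this keeps the downstream consumers simple.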
3.2 Voice Module (voice/)
Responsibilities:
- Wake word detection (local offline)
- STT (Speech-to-Text) - backup implementation
- TTS (Text-to-Speech) - backup implementation
Interfaces:
    from abc import ABC, abstractmethod


    class WakeWordDetector(ABC):
        """Wake word detector abstract base class"""

        @abstractmethod
        async def load_model(self, model_path: str) -> None:
            """Load wake word model"""
            pass

        @abstractmethod
        async def detect(self, audio_chunk: bytes) -> bool:
            """Detect wake word in audio chunk"""
            pass

        @abstractmethod
        async def set_sensitivity(self, sensitivity: float) -> None:
            """Set detection sensitivity"""
            pass
Key Components:
- detector.py: Wake word detector (microWakeWord/openWakeWord)
- stt.py: STT engine (Whisper - backup)
- tts.py: TTS engine (Piper - backup)
3.3 Motion Module (motion/)
Responsibilities:
- Head motion control (6 DOF)
- Antenna animation
- Motion queue management (priority-based)
- Speech-reactive motions
Interfaces:
    from abc import ABC, abstractmethod

    # HeadPose is the head pose type (definition not shown here).


    class MotionController(ABC):
        """Motion controller abstract base class"""

        @abstractmethod
        async def connect(self, host: str, wireless: bool) -> None:
            """Connect to Reachy Mini"""
            pass

        @abstractmethod
        async def move_head(self, pose: "HeadPose", duration: float) -> None:
            """Move head to specified pose"""
            pass

        @abstractmethod
        async def set_antenna(self, antenna_id: int, angle: float) -> None:
            """Set antenna angle"""
            pass

        @abstractmethod
        async def play_emotion(self, emotion: str) -> None:
            """Play emotion"""
            pass
Key Components:
- controller.py: Motion controller implementation
- queue.py: Motion queue manager (priority-based)
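One possible shape for a priority-based motion queue is sketched below, using the standard library `heapq` module. The `PriorityMotionQueue` class, the motion names, and the convention that a lower number means higher urgency are all assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class _QueuedMotion:
    priority: int
    sequence: int
    name: str = field(compare=False)


class PriorityMotionQueue:
    """Hypothetical motion queue: lower priority number = more urgent."""

    def __init__(self) -> None:
        self._heap: List[_QueuedMotion] = []
        self._counter = itertools.count()

    def push(self, name: str, priority: int) -> None:
        # The sequence number keeps FIFO order among equal priorities.
        heapq.heappush(self._heap, _QueuedMotion(priority, next(self._counter), name))

    def pop(self) -> Optional[str]:
        """Return the most urgent motion name, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap).name

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = PriorityMotionQueue()
    queue.push("idle_breathing", 10)
    queue.push("wake_nod", 0)
    queue.push("speech_sway", 5)
    queue.push("antenna_wiggle", 5)
    while len(queue):
        print(queue.pop())
```

In this sketch a wake word reaction can jump ahead of idle animations, while motions with the same priority play in the order they were queued.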
3.4 ESPHome Module (esphome/)
Responsibilities:
- ESPHome protocol server implementation
- Audio streaming to/from Home Assistant
- Event handling (wake word, TTS start/end, STT result)
- mDNS service discovery
Interfaces:
    from abc import ABC, abstractmethod

    # ESPHomeEvent is the event type (definition not shown here).


    class ESPHomeServer(ABC):
        """ESPHome server abstract base class"""

        @abstractmethod
        async def start(self, host: str, port: int) -> None:
            """Start ESPHome server"""
            pass

        @abstractmethod
        async def stop(self) -> None:
            """Stop ESPHome server"""
            pass

        @abstractmethod
        async def send_audio(self, audio_data: bytes) -> None:
            """Send audio to Home Assistant"""
            pass

        @abstractmethod
        async def on_event(self, event: "ESPHomeEvent") -> None:
            """Handle ESPHome event"""
            pass
Key Components:
- protocol.py: ESPHome protocol definitions
- server.py: ESPHome server implementation
3.5 Configuration Module (config/)
Responsibilities:
- Configuration file management
- Environment variable management
- Default configuration
Interfaces:
    from typing import Any, Dict


    class ConfigManager:
        """Configuration manager"""

        def __init__(self, config_path: str):
            """Initialize configuration manager"""
            pass

        def load(self) -> Dict:
            """Load configuration"""
            pass

        def save(self, config: Dict) -> None:
            """Save configuration"""
            pass

        def get(self, key: str, default: Any = None) -> Any:
            """Get configuration value"""
            pass
Key Components:
- manager.py: Configuration manager implementation
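The interface above leaves the storage format open. As a sketch only, the following variant assumes a JSON file with two-level sections, hypothetical default values, and dotted keys such as `"esphome.port"`. None of these choices are confirmed by the project.

```python
import json
import os
import tempfile
from typing import Any, Dict

DEFAULTS: Dict[str, Dict[str, Any]] = {  # hypothetical default configuration
    "esphome": {"port": 6053},
    "audio": {"sample_rate": 16000},
}


class JsonConfigManager:
    """Illustrative ConfigManager variant; JSON storage is an assumption."""

    def __init__(self, config_path: str) -> None:
        self._path = config_path
        self._config: Dict[str, Any] = {}

    def load(self) -> Dict[str, Any]:
        # Start from the defaults, then overlay each section found in the file.
        config = {section: dict(values) for section, values in DEFAULTS.items()}
        if os.path.exists(self._path):
            with open(self._path, "r", encoding="utf-8") as f:
                for section, values in json.load(f).items():
                    config.setdefault(section, {}).update(values)
        self._config = config
        return config

    def save(self, config: Dict[str, Any]) -> None:
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=2)
        self._config = config

    def get(self, key: str, default: Any = None) -> Any:
        # Dotted keys walk nested dictionaries one level per segment.
        node: Any = self._config
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manager = JsonConfigManager(os.path.join(tmp, "config.json"))
        manager.load()
        print(manager.get("esphome.port"))
```

Overlaying the file on top of defaults means a partial configuration file is enough; missing keys fall back to the built-in values.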
4. Data Flow
4.1 Wake Word Detection Flow
    Microphone Input (16 kHz PCM)
        ↓
    Audio Chunk (1024 samples)
        ↓
    Wake Word Detector
        ├─ microWakeWord Features
        └─ openWakeWord Features
        ↓
    Detection
        ├─ microWakeWord: probability > cutoff
        └─ openWakeWord: probability > 0.5
        ↓
    Refractory Period Check (2 seconds)
        ↓
    Trigger Wakeup Event
        ↓
    ESPHome Server → Home Assistant
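The detection and refractory steps in this flow can be sketched as a small gate that sits behind the detector. `WakeWordGate` and its parameter names are hypothetical; the 0.5 cutoff and the 2-second refractory period come from the flow above. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class WakeWordGate:
    """Hypothetical threshold + refractory-period check for detector scores."""

    def __init__(
        self,
        cutoff: float = 0.5,
        refractory_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cutoff = cutoff
        self._refractory = refractory_seconds
        self._clock = clock
        self._last_trigger: Optional[float] = None

    def should_trigger(self, probability: float) -> bool:
        """Return True only for an above-cutoff score outside the refractory window."""
        if probability <= self._cutoff:
            return False
        now = self._clock()
        if self._last_trigger is not None and now - self._last_trigger < self._refractory:
            return False  # still inside the refractory period, so ignore it
        self._last_trigger = now
        return True


if __name__ == "__main__":
    fake_time = [0.0]
    gate = WakeWordGate(clock=lambda: fake_time[0])
    print(gate.should_trigger(0.9))  # first detection
    fake_time[0] = 1.0
    print(gate.should_trigger(0.9))  # one second later, inside the refractory window
```

The refractory check prevents a single utterance of the wake word, which usually spans several audio chunks, from raising multiple wakeup events.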
4.2 Audio Streaming Flow (to Home Assistant)
    Microphone Input
        ↓
    Audio Chunk
        ↓
    ESPHome Server
        ↓
    VoiceAssistantAudio Message
        ↓
    Home Assistant (STT Processing)
        ↓
    VoiceAssistantEvent (STT Result)
4.3 TTS Audio Flow (from Home Assistant)
    Home Assistant (TTS Processing)
        ↓
    VoiceAssistantEvent (TTS Start)
        ↓
    ESPHome Server
        ↓
    Motion Controller (Speech-reactive motions)
        ↓
    VoiceAssistantAudio (TTS Audio)
        ↓
    Speaker Playback
        ↓
    VoiceAssistantEvent (TTS End)
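A simplified view of how the TTS start and end events in this flow might drive the playback flag and speech-reactive motion is sketched below. `AssistantEvent`, `PlaybackState`, and `handle_event` are hypothetical stand-ins, not the actual ESPHome message types or project classes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class AssistantEvent(Enum):
    """Hypothetical stand-in for the VoiceAssistantEvent types in the flow."""

    TTS_START = auto()
    TTS_END = auto()


@dataclass
class PlaybackState:
    is_playing_tts: bool = False
    motion_log: List[str] = field(default_factory=list)


def handle_event(state: PlaybackState, event: AssistantEvent) -> None:
    """Update the playback flag and record which motion would be requested."""
    if event is AssistantEvent.TTS_START:
        state.is_playing_tts = True
        state.motion_log.append("start speech-reactive motion")
    elif event is AssistantEvent.TTS_END:
        state.is_playing_tts = False
        state.motion_log.append("stop speech-reactive motion")


if __name__ == "__main__":
    state = PlaybackState()
    handle_event(state, AssistantEvent.TTS_START)
    handle_event(state, AssistantEvent.TTS_END)
    print(state.motion_log)
```

In a real implementation the log entries would be replaced by calls into the motion controller or motion queue.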
5. State Management
5.1 ServerState
Centralized state management:
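    from __future__ import annotations  # defer evaluation of the annotations below

    from typing import Dict, List, Optional

    # Queue, WakeWordDetector, MotionController, MotionQueue, and ESPHomeServer
    # are defined or imported elsewhere (not shown here).


    class ServerState:
        """Server global state"""

        # Application info
        name: str
        mac_address: str

        # Audio
        audio_queue: Queue
        audio_input_device: Optional[str]
        audio_output_device: Optional[str]

        # Voice
        wake_words: Dict[str, WakeWordDetector]
        active_wake_words: List[str]
        stop_word: WakeWordDetector

        # Motion
        motion_controller: MotionController
        motion_queue: MotionQueue

        # ESPHome
        esphome_server: ESPHomeServer
        esphome_connected: bool

        # Status
        is_streaming_audio: bool
        is_playing_tts: bool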
6. Deployment Architecture
6.1 Running on Reachy Mini
    Reachy Mini (Raspberry Pi 4)
    ├── Application (This Project)
    │   ├── Audio Module
    │   ├── Voice Module
    │   ├── Motion Module
    │   └── ESPHome Module
    ├── Reachy Mini Hardware
    │   ├── 4 Microphones
    │   ├── 5W Speaker
    │   ├── Head Motors (6 DOF)
    │   └── Antennas (2)
    └── Network
        └── ESPHome Protocol (Port 6053)
            └── Home Assistant
6.2 Home Assistant Integration
    Home Assistant
    ├── ESPHome Integration
    │   └── Reachy Mini (ESPHome Server)
    ├── Voice Assistant
    │   ├── STT Service
    │   └── TTS Service
    └── Automations
        └── Voice Commands
7. Performance Considerations
7.1 Latency Targets
- Wake Word Detection: < 500ms
- Audio Streaming: < 100ms
- TTS Playback: < 200ms
- Motion Response: < 100ms
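To put these targets in context, it helps to work out how long one audio chunk takes to capture. The calculation below uses the 16 kHz sample rate and 1024-sample chunk size from section 4.1; the 16-bit sample width is an assumption.

```python
SAMPLE_RATE_HZ = 16_000
CHUNK_SAMPLES = 1024
SAMPLE_WIDTH_BYTES = 2  # assumption: 16-bit PCM

# Time needed to capture one chunk, and its size on the wire.
chunk_duration_ms = CHUNK_SAMPLES / SAMPLE_RATE_HZ * 1000
chunk_size_bytes = CHUNK_SAMPLES * SAMPLE_WIDTH_BYTES
raw_bitrate_kbps = SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * 8 / 1000

print(f"{chunk_duration_ms:.0f} ms per chunk")        # 64 ms
print(f"{chunk_size_bytes} bytes per chunk")          # 2048 bytes
print(f"{raw_bitrate_kbps:.0f} kbit/s raw stream")    # 256 kbit/s
```

If the 100 ms audio streaming target is measured from the start of capture, filling a single chunk already uses 64 ms of it, which leaves roughly 36 ms for processing and network transfer. A smaller chunk size would trade more per-chunk overhead for lower latency.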
7.2 Resource Requirements
- CPU: Raspberry Pi 4 (4 cores)
- RAM: 4GB minimum
- Network: Stable WiFi/Ethernet connection
8. Security Considerations
8.1 ESPHome Security
- Use encrypted connections (ESPHome native API encryption, which is based on the Noise protocol with a pre-shared key)
- Implement authentication (if required)
- Validate all incoming messages
8.2 Audio Privacy
- Audio data is transmitted only after the wake word is detected
- Support for local-only mode (no audio transmission)
- Clear audio recording indicators
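The first two privacy points can be expressed as a gate between the microphone and the network. The sketch below is hypothetical: `AudioStreamGate`, its method names, and the `local_only` flag are illustrative and not part of the project code.

```python
from typing import Callable, List


class AudioStreamGate:
    """Hypothetical gate that forwards audio only between wake word and end of turn."""

    def __init__(self, send: Callable[[bytes], None], local_only: bool = False) -> None:
        self._send = send
        self._local_only = local_only
        self._open = False

    def on_wake_word(self) -> None:
        # In local-only mode the gate never opens, so no audio leaves the device.
        self._open = not self._local_only

    def on_turn_finished(self) -> None:
        self._open = False

    def feed(self, chunk: bytes) -> bool:
        """Forward the chunk if the gate is open; return whether it was sent."""
        if not self._open:
            return False
        self._send(chunk)
        return True


if __name__ == "__main__":
    sent: List[bytes] = []
    gate = AudioStreamGate(sent.append)
    gate.feed(b"before wake word")   # dropped
    gate.on_wake_word()
    gate.feed(b"command audio")      # forwarded
    gate.on_turn_finished()
    print(sent)
```

Keeping this decision in one place makes it easier to audit that no audio is sent before the wake word fires.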
9. Future Extensions
9.1 Additional Features
- Face tracking (camera integration)
- Visual recognition (SmolVLM2)
- Advanced emotions (dance library)
- Multi-language support
9.2 Performance Optimizations
- GPU acceleration for wake word detection
- Audio preprocessing on hardware
- Motion trajectory optimization
Note: This document is the English version of the architecture design. For the Chinese version, see ARCHITECTURE.md.