Reachy Mini for Home Assistant - Project Plan
Project Overview
Integrate Home Assistant voice assistant functionality into the Reachy Mini Wi-Fi robot, communicating with Home Assistant via the ESPHome protocol.
Local Reference Directories (DO NOT modify any files in reference directories)
- linux-voice-assistant - Linux-based Home Assistant voice assistant app for reference
- Reachy Mini SDK - Reachy Mini SDK local directory for reference
- reachy_mini_conversation_app - Reachy Mini conversation app for reference
- reachy-mini-desktop-app - Reachy Mini desktop app for reference
- sendspin - Sendspin client for reference
Core Design Principles
- Zero Configuration - Users only need to install the app, no manual configuration required
- Native Hardware - Use robot's built-in microphone and speaker
- Home Assistant Centralized Management - All configuration done on Home Assistant side
- Motion Feedback - Provide head movement and antenna animation feedback during voice interaction
- Project Constraints - Strictly follow Reachy Mini SDK architecture design and constraints
- Code Quality - Follow Python development standards with consistent code style, clear structure, complete comments, comprehensive documentation, high test coverage, high code quality, readability, maintainability, extensibility, and reusability
- Feature Priority - Voice conversation with Home Assistant is highest priority; other features are auxiliary and must not affect voice conversation functionality or response speed
- No LED Functions - LEDs are hidden inside the robot; all LED control is ignored
- Preserve Functionality - Code modifications should optimize while preserving completed features; do not remove features to solve problems. When issues occur, consult the reference examples first and fix the root cause rather than adding more log output
Technical Architecture
Reachy Mini (ARM64)
│
├── AUDIO INPUT
│   ReSpeaker XVF3800 (16kHz)
│   ├── 4-Mic Array → XVF3800 DSP
│   │   • Echo Cancellation (AEC)
│   │   • Noise Suppression (NS)
│   │   • Auto Gain Control (AGC, max 30dB)
│   │   • Direction of Arrival (DOA)
│   │   • Voice Activity Detection (VAD)
│   │   ▼
│   └── Wake Word Detection (microWakeWord)
│       • "Okay Nabu" / "Hey Jarvis"
│       • Stop word detection
│
├── AUDIO OUTPUT
│   ├── TTS Player
│   │   • Voice assistant speech
│   │   • Sound effects
│   │   • Priority over music
│   ├── Music Player (Sendspin)
│   │   • Multi-room audio streaming
│   │   • Auto-discovery via mDNS
│   │   • Auto-pause during conversation
│   │   ▼
│   └── ReSpeaker Speaker (16kHz)
│
├── VISION & TRACKING
│   ├── Camera (VPU accelerated)
│   │   • MJPEG stream server
│   │   • ESPHome Camera entity
│   │   ▼
│   └── YOLO Face Detection
│       • AdamCodd/YOLOv11n-face
│       • Adaptive frame rate:
│         - 15fps: conversation/face
│         - 3fps: idle (power saving)
│       • look_at_image() pose calc
│       • Smooth return after face lost
│
├── MOTION CONTROL
│   MovementManager (10Hz Control Loop)
│   ├── Motion Layers (Priority: Move > Action > SpeechSway > Breath)
│   │   ├── Move Queue (Emotions)
│   │   ├── Actions (Nod/Shake)
│   │   ├── SpeechSway (Voice VAD)
│   │   └── Breathing (Idle anim)
│   ├── Face Tracking Offsets (Secondary Pose Overlay)
│   │   • Pitch offset: +9° (down compensation)
│   │   • Yaw offset: -7° (right compensation)
│   └── State Machine: on_wakeup → on_listening → on_speaking → on_idle
│
├── TAP DETECTION
│   IMU Accelerometer (Wireless version only, 20Hz polling)
│   • Tap-to-wake: Enter continuous conversation mode
│   • Second tap: Exit continuous conversation mode
│   • Threshold: 0.5g (configurable, persisted)
│
└── ESPHOME SERVER
    Port 6053 (mDNS auto-discovery)
    • 43+ entities (sensors, controls, media player, camera)
    • Voice Assistant pipeline integration
    • Real-time state synchronization
        │
        │ ESPHome Protocol (protobuf)
        ▼
Home Assistant
├── STT Engine (User configured)
├── Intent Processing (Conversation)
└── TTS Engine (User configured)
Completed Features
Core Features
- ESPHome protocol server implementation
- mDNS service discovery (auto-discovered by Home Assistant)
- Local wake word detection (microWakeWord)
- Tap-to-wake (IMU acceleration detection, wireless version only)
- Audio stream transmission to Home Assistant
- TTS audio playback
- Stop word detection
Reachy Mini Integration
- Use Reachy Mini SDK microphone input
- Use Reachy Mini SDK speaker output
- Head motion control (nod, shake, gaze)
- Antenna animation control
- Voice state feedback actions
- YOLO face tracking (replaces DOA sound source localization)
- 5Hz unified motion control loop
Application Architecture
- Compliant with Reachy Mini App architecture
File List
reachy_mini_ha_voice/
├── reachy_mini_ha_voice/
│   ├── __init__.py  # Package initialization
│   ├── __main__.py  # Command line entry
│   ├── main.py  # ReachyMiniApp entry
│   ├── voice_assistant.py  # Voice assistant service
│   ├── satellite.py  # ESPHome protocol handling
│   ├── audio_player.py  # Audio player
│   ├── camera_server.py  # MJPEG camera stream server + face tracking
│   ├── head_tracker.py  # YOLO face detector
│   ├── motion.py  # Motion control (high-level API)
│   ├── movement_manager.py  # Unified movement manager (20Hz control loop, optimized to prevent daemon crash)
│   ├── models.py  # Data models
│   ├── entity.py  # ESPHome base entity
│   ├── entity_extensions.py  # Extended entity types
│   ├── reachy_controller.py  # Reachy Mini controller wrapper
│   ├── api_server.py  # API server
│   ├── zeroconf.py  # mDNS discovery
│   └── util.py  # Utility functions
├── wakewords/  # Wake word models (auto-download)
│   ├── okay_nabu.json
│   ├── okay_nabu.tflite
│   ├── hey_jarvis.json
│   ├── hey_jarvis.tflite
│   ├── stop.json
│   └── stop.tflite
├── sounds/  # Sound effect files (auto-download)
│   ├── wake_word_triggered.flac
│   └── timer_finished.flac
├── pyproject.toml  # Project configuration
├── README.md  # Documentation
└── PROJECT_PLAN.md  # Project plan
Dependencies
dependencies = [
    "reachy-mini",  # Reachy Mini SDK
    "sounddevice>=0.4.6",  # Audio processing (backup)
    "soundfile>=0.12.0",  # Audio file reading
    "numpy>=1.24.0",  # Numerical computation
    "pymicro-wakeword>=2.0.0,<3.0.0",  # Wake word detection
    "pyopen-wakeword>=1.0.0,<2.0.0",  # Backup wake word
    "aioesphomeapi>=42.0.0",  # ESPHome protocol
    "zeroconf>=0.100.0",  # mDNS discovery
    "scipy>=1.10.0",  # Motion control
    "pydantic>=2.0.0",  # Data validation
]
Usage Flow
Install App
- Install `reachy-mini-ha-voice` from Reachy Mini App Store
Start App
- App auto-starts ESPHome server (port 6053)
- Auto-downloads required models and sounds
Connect Home Assistant
- Home Assistant auto-discovers device (mDNS)
- Or manually add: Settings → Devices & Services → Add Integration → ESPHome
Use Voice Assistant
- Say "Okay Nabu" to wake
- Speak command
- Reachy Mini provides motion feedback
ESPHome Entity Planning
Based on deep analysis of Reachy Mini SDK, the following entities are exposed to Home Assistant:
Implemented Entities
| Entity Type | Name | Description |
|---|---|---|
| Media Player | `media_player` | Audio playback control |
| Voice Assistant | `voice_assistant` | Voice assistant pipeline |
Implemented Control Entities (Read/Write)
Phase 1-3: Basic Controls and Pose
| ESPHome Entity Type | Name | SDK API | Range/Options | Description |
|---|---|---|---|---|
| Number | `speaker_volume` | `AudioPlayer.set_volume()` | 0-100 | Speaker volume |
| Select | `motor_mode` | `set_motor_control_mode()` | enabled/disabled/gravity_compensation | Motor mode selection |
| Switch | `motors_enabled` | `enable_motors()` / `disable_motors()` | on/off | Motor torque switch |
| Button | `wake_up` | `mini.wake_up()` | - | Wake robot action |
| Button | `go_to_sleep` | `mini.goto_sleep()` | - | Sleep robot action |
| Number | `head_x` | `goto_target(head=...)` | ±50mm | Head X position control |
| Number | `head_y` | `goto_target(head=...)` | ±50mm | Head Y position control |
| Number | `head_z` | `goto_target(head=...)` | ±50mm | Head Z position control |
| Number | `head_roll` | `goto_target(head=...)` | -40° ~ +40° | Head roll angle control |
| Number | `head_pitch` | `goto_target(head=...)` | -40° ~ +40° | Head pitch angle control |
| Number | `head_yaw` | `goto_target(head=...)` | -180° ~ +180° | Head yaw angle control |
| Number | `body_yaw` | `goto_target(body_yaw=...)` | -160° ~ +160° | Body yaw angle control |
| Number | `antenna_left` | `goto_target(antennas=...)` | -90° ~ +90° | Left antenna angle control |
| Number | `antenna_right` | `goto_target(antennas=...)` | -90° ~ +90° | Right antenna angle control |
Phase 4: Gaze Control
| ESPHome Entity Type | Name | SDK API | Range/Options | Description |
|---|---|---|---|---|
| Number | `look_at_x` | `look_at_world(x, y, z)` | World coordinates | Gaze point X coordinate |
| Number | `look_at_y` | `look_at_world(x, y, z)` | World coordinates | Gaze point Y coordinate |
| Number | `look_at_z` | `look_at_world(x, y, z)` | World coordinates | Gaze point Z coordinate |
Implemented Sensor Entities (Read-only)
Phase 1 & 5: Basic Status and Audio Sensors
| ESPHome Entity Type | Name | SDK API | Description |
|---|---|---|---|
| Text Sensor | `daemon_state` | `DaemonStatus.state` | Daemon status |
| Binary Sensor | `backend_ready` | `backend_status.ready` | Backend ready status |
| Text Sensor | `error_message` | `DaemonStatus.error` | Current error message |
| Sensor | `doa_angle` | `DoAInfo.angle` | Sound source direction angle (°) |
| Binary Sensor | `speech_detected` | `DoAInfo.speech_detected` | Speech detection status |
Phase 6: Diagnostic Information
| ESPHome Entity Type | Name | SDK API | Description |
|---|---|---|---|
| Sensor | `control_loop_frequency` | `control_loop_stats` | Control loop frequency (Hz) |
| Text Sensor | `sdk_version` | `DaemonStatus.version` | SDK version |
| Text Sensor | `robot_name` | `DaemonStatus.robot_name` | Robot name |
| Binary Sensor | `wireless_version` | `DaemonStatus.wireless_version` | Wireless version flag |
| Binary Sensor | `simulation_mode` | `DaemonStatus.simulation_enabled` | Simulation mode flag |
| Text Sensor | `wlan_ip` | `DaemonStatus.wlan_ip` | Wireless IP address |
Phase 7: IMU Sensors (Wireless version only)
| ESPHome Entity Type | Name | SDK API | Description |
|---|---|---|---|
| Sensor | `imu_accel_x` | `mini.imu["accelerometer"][0]` | X-axis acceleration (m/s²) |
| Sensor | `imu_accel_y` | `mini.imu["accelerometer"][1]` | Y-axis acceleration (m/s²) |
| Sensor | `imu_accel_z` | `mini.imu["accelerometer"][2]` | Z-axis acceleration (m/s²) |
| Sensor | `imu_gyro_x` | `mini.imu["gyroscope"][0]` | X-axis angular velocity (rad/s) |
| Sensor | `imu_gyro_y` | `mini.imu["gyroscope"][1]` | Y-axis angular velocity (rad/s) |
| Sensor | `imu_gyro_z` | `mini.imu["gyroscope"][2]` | Z-axis angular velocity (rad/s) |
| Sensor | `imu_temperature` | `mini.imu["temperature"]` | IMU temperature (°C) |
Phase 8-12: Extended Features
| ESPHome Entity Type | Name | Description |
|---|---|---|
| Select | `emotion` | Emotion selector (Happy/Sad/Angry/Fear/Surprise/Disgust) |
| Number | `microphone_volume` | Microphone volume (0-100%) |
| Camera | `camera` | ESPHome Camera entity (live preview) |
| Number | `led_brightness` | LED brightness (0-100%) |
| Select | `led_effect` | LED effect (off/solid/breathing/rainbow/doa) |
| Number | `led_color_r` | LED red component (0-255) |
| Number | `led_color_g` | LED green component (0-255) |
| Number | `led_color_b` | LED blue component (0-255) |
| Switch | `agc_enabled` | Auto gain control switch |
| Number | `agc_max_gain` | AGC max gain (0-30 dB) |
| Number | `noise_suppression` | Noise suppression level (0-100%) |
| Binary Sensor | `echo_cancellation_converged` | Echo cancellation convergence status |
Note: Head position (x/y/z), head angles (roll/pitch/yaw), body yaw, and antenna angles are all controllable entities that use the `Number` type for bidirectional control. Setting a new value calls `goto_target()`; reading the current value calls `get_current_head_pose()` and the corresponding getters.
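The bidirectional pattern in the note can be sketched as a small wrapper that pairs a read callback with a write callback. This is only an illustration: `NumberEntity`, `get_state`, `set_state`, and the `fake_pose` stand-in are hypothetical names, not the project's entity classes, and the real callbacks would wrap `get_current_head_pose()` and `goto_target()`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NumberEntity:
    """Hypothetical bidirectional Number entity (not the project's actual class)."""

    name: str
    min_value: float
    max_value: float
    read: Callable[[], float]        # would wrap e.g. get_current_head_pose()
    write: Callable[[float], None]   # would wrap e.g. goto_target(...)

    def get_state(self) -> float:
        """Return the current value to report to Home Assistant."""
        return self.read()

    def set_state(self, value: float) -> float:
        """Clamp the requested value to the allowed range, then apply it."""
        clamped = max(self.min_value, min(self.max_value, value))
        self.write(clamped)
        return clamped


# Stand-in for the robot: a dict instead of real SDK calls.
fake_pose = {"head_pitch": 0.0}

head_pitch = NumberEntity(
    name="head_pitch",
    min_value=-40.0,
    max_value=40.0,
    read=lambda: fake_pose["head_pitch"],
    write=lambda v: fake_pose.__setitem__("head_pitch", v),
)
```

Clamping in `set_state` is a design choice of this sketch: it keeps out-of-range requests from Home Assistant within the ranges listed in the tables above.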
Implementation Priority
Phase 1 - Basic Status and Volume (High Priority) ✅ Completed
- `daemon_state` - Daemon status sensor
- `backend_ready` - Backend ready status
- `error_message` - Error message
- `speaker_volume` - Speaker volume control
Phase 2 - Motor Control (High Priority) ✅ Completed
- `motors_enabled` - Motor switch
- `motor_mode` - Motor mode selection (enabled/disabled/gravity_compensation)
- `wake_up` / `go_to_sleep` - Wake/sleep buttons
Phase 3 - Pose Control (Medium Priority) ✅ Completed
- `head_x/y/z` - Head position control
- `head_roll/pitch/yaw` - Head angle control
- `body_yaw` - Body yaw angle control
- `antenna_left/right` - Antenna angle control
Phase 4 - Gaze Control (Medium Priority) ✅ Completed
- `look_at_x/y/z` - Gaze point coordinate control
Phase 5 - DOA (Direction of Arrival) ✅ Re-added for wakeup turn-to-sound
- `doa_angle` - Sound source direction (degrees, 0-180°, where 0°=left, 90°=front, 180°=right)
- `speech_detected` - Speech detection status
- Turn-to-sound at wakeup (robot turns toward speaker when wake word detected)
- Direction correction: `yaw = π/2 - doa` (fixed left/right inversion)
- Note: DOA only read once at wakeup to avoid daemon pressure; face tracking takes over after
Phase 6 - Diagnostic Information (Low Priority) ✅ Completed
- `control_loop_frequency` - Control loop frequency
- `sdk_version` - SDK version
- `robot_name` - Robot name
- `wireless_version` - Wireless version flag
- `simulation_mode` - Simulation mode flag
- `wlan_ip` - Wireless IP address
Phase 7 - IMU Sensors (Optional, wireless version only) ✅ Completed
- `imu_accel_x/y/z` - Accelerometer
- `imu_gyro_x/y/z` - Gyroscope
- `imu_temperature` - IMU temperature
Phase 8 - Emotion Control ✅ Completed
- `emotion` - Emotion selector (Happy/Sad/Angry/Fear/Surprise/Disgust)
Phase 9 - Audio Control ✅ Completed
- `microphone_volume` - Microphone volume control (0-100%)
Phase 10 - Camera Integration ✅ Completed
- `camera` - ESPHome Camera entity (live preview)
Phase 11 - LED Control ❌ Disabled (LEDs hidden inside robot)
- `led_brightness` - LED brightness (0-100%) - Commented out
- `led_effect` - LED effect (off/solid/breathing/rainbow/doa) - Commented out
- `led_color_r/g/b` - LED RGB color (0-255) - Commented out
Phase 12 - Audio Processing Parameters ✅ Completed
- `agc_enabled` - Auto gain control switch
- `agc_max_gain` - AGC max gain (0-30 dB)
- `noise_suppression` - Noise suppression level (0-100%)
- `echo_cancellation_converged` - Echo cancellation convergence status (read-only)
Phase 13 - Sendspin Audio Playback Support ✅ Completed
- `sendspin_enabled` - Sendspin switch (Switch)
- `sendspin_url` - Sendspin server URL (Text Sensor)
- `sendspin_connected` - Sendspin connection status (Binary Sensor)
- AudioPlayer integrates aiosendspin library
- TTS audio sent to both local speaker and Sendspin server
Phase 1-13 Entities Completed!
Total Completed: 45 entities
- Phase 1: 4 entities (Basic status and volume)
- Phase 2: 4 entities (Motor control)
- Phase 3: 9 entities (Pose control)
- Phase 4: 3 entities (Gaze control)
- Phase 5: 2 entities (Audio sensors)
- Phase 6: 6 entities (Diagnostic information)
- Phase 7: 7 entities (IMU sensors)
- Phase 8: 1 entity (Emotion control)
- Phase 9: 1 entity (Microphone volume)
- Phase 10: 1 entity (Camera)
- Phase 11: 0 entities (LED control - Disabled)
- Phase 12: 4 entities (Audio processing parameters)
- Phase 13: 3 entities (Sendspin audio output)
Voice Assistant Enhancement Features Implementation Status
Phase 14 - Emotion Action Feedback System (Partial) 🟡
Implementation Status: Basic infrastructure ready, supports manual trigger, uses voice-driven natural micro-movements during conversation
Implemented Features:
- ✅ Phase 8 Emotion Selector entity (`emotion`)
- ✅ Basic emotion action playback API (`_play_emotion`)
- ✅ Emotion mapping: Happy/Sad/Angry/Fear/Surprise/Disgust
- ✅ Integration with HuggingFace action library (`pollen-robotics/reachy-mini-emotions-library`)
- ✅ SpeechSway system for natural head micro-movements during conversation (non-blocking)
- ✅ Tap detection disabled during emotion playback (polls daemon API for completion)
Design Decisions:
- 🎯 No auto-play of full emotion actions during conversation to avoid blocking
- 🎯 Use voice-driven head sway (SpeechSway) for natural motion feedback
- 🎯 Emotion actions retained as manual trigger feature via ESPHome entity
- 🎯 Tap detection waits for actual move completion via `/api/move/running` polling
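Waiting for a move to finish by polling can be sketched as a loop with a timeout. This is a minimal sketch, not the project's `_wait_for_move_completion()`: the `is_running` callable is a hypothetical stand-in for whatever queries the daemon's `/api/move/running` endpoint, and the timeout and interval defaults are assumptions.

```python
import time
from typing import Callable


def wait_for_move_completion(
    is_running: Callable[[], bool],
    timeout_s: float = 10.0,        # assumed default, not from the project
    poll_interval_s: float = 0.1,   # assumed default, not from the project
) -> bool:
    """Poll until the move finishes; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not is_running():
            return True
        time.sleep(poll_interval_s)
    return False
```

Injecting the status check as a callable keeps the loop testable without a running daemon, and the timeout ensures tap detection cannot stay disabled forever if the daemon stops answering.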
Not Implemented:
- ❌ Auto-trigger emotion actions based on voice assistant response (decided not to implement to avoid blocking)
- ❌ Intent recognition and emotion matching
- ❌ Dance action library integration
- ❌ Context awareness (e.g., weather query - sunny plays happy, rainy plays sad)
Code Locations:
- `entity_registry.py:633-658` - Emotion Selector entity
- `satellite.py:_play_emotion()` - Emotion playback with move UUID tracking
- `satellite.py:_wait_for_move_completion()` - Polls daemon API for move completion
- `motion.py:132-156` - Conversation start motion control (uses SpeechSway)
- `movement_manager.py:541-595` - Move queue management (allows SpeechSway overlay)
Actual Behavior:
| Voice Assistant Event | Actual Action | Implementation Status |
|---|---|---|
| Wake word detected | Turn toward sound source + nod confirmation | ✅ Implemented |
| Conversation start | Voice-driven head micro-movements (SpeechSway) | ✅ Implemented |
| During conversation | Continuous voice-driven micro-movements + breathing animation | ✅ Implemented |
| Conversation end | Return to neutral position + breathing animation | ✅ Implemented |
| Manual emotion trigger | Play via ESPHome `emotion` entity | ✅ Implemented |
Technical Details:
# motion.py - Use SpeechSway instead of full emotion actions during conversation
def on_speaking_start(self):
    self._is_speaking = True
    self._movement_manager.set_state(RobotState.SPEAKING)
    # SpeechSway automatically generates natural head micro-movements based on audio loudness
    # No full emotion actions played to avoid blocking conversation experience

# movement_manager.py - Motion layering system
# 1. Move queue (emotion actions) - Sets base pose
# 2. Action (nod/shake etc.) - Overlays on base pose
# 3. SpeechSway - Voice-driven micro-movements, can coexist with Move
# 4. Breathing - Idle breathing animation
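One way to read the layering comments above is as additive pose composition: the move sets the base pose, and the other layers add offsets on top. The sketch below is an assumption about how such layers could combine, not the `MovementManager` implementation; `Pose`, `compose_pose`, and the rule that breathing only applies when every other layer is idle are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pose:
    """Head orientation in degrees (hypothetical simplified pose)."""

    roll: float = 0.0
    pitch: float = 0.0
    yaw: float = 0.0

    def __add__(self, other: "Pose") -> "Pose":
        return Pose(self.roll + other.roll, self.pitch + other.pitch, self.yaw + other.yaw)


def compose_pose(
    move: Optional[Pose],
    action: Optional[Pose],
    sway: Optional[Pose],
    breathing: Optional[Pose],
) -> Pose:
    """Combine the four layers into one target pose."""
    # 1. Move queue sets the base pose (neutral when no move is playing).
    pose = move if move is not None else Pose()
    # 2. Action offsets overlay on the base pose.
    if action is not None:
        pose = pose + action
    # 3. SpeechSway can coexist with a move.
    if sway is not None:
        pose = pose + sway
    # 4. Breathing is treated as idle-only here (an assumption of this sketch).
    if move is None and action is None and sway is None and breathing is not None:
        pose = pose + breathing
    return pose
```

For example, a move pitched 5° with a 1° yaw sway yields a pose of pitch 5°, yaw 1°, and the breathing offset is ignored until all other layers are inactive.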
Original Plan (Decided not to implement to avoid blocking conversation):
| Voice Assistant Event | Original Planned Action | Reason Not Implemented |
|---|---|---|
| Positive response received | Play "happy" action | Full action would block conversation fluency |
| Negative response received | Play "sad" action | Full action would block conversation fluency |
| Play music/entertainment | Play "dance" action | Full action would block conversation fluency |
| Timer completed | Play "alert" action | Full action would block conversation fluency |
| Error/cannot understand | Play "confused" action | Full action would block conversation fluency |
Manual Emotion Trigger Example:
# Home Assistant automation example - Manual emotion trigger
automation:
  - alias: "Reachy Good Morning Greeting"
    trigger:
      - platform: time
        at: "07:00:00"
    action:
      - service: select.select_option
        target:
          entity_id: select.reachy_mini_emotion
        data:
          option: "Happy"
Phase 15 - Face Tracking (Complements DOA Turn-to-Sound) ✅ Completed
Goal: Implement natural face tracking so robot looks at speaker during conversation.
Design Decision:
- ✅ DOA (Direction of Arrival): Used once at wakeup to turn toward sound source
- ✅ YOLO face detection: Takes over after initial turn for continuous tracking
- Reason: DOA provides quick initial orientation, face tracking provides accurate continuous tracking
Wakeup Turn-to-Sound Flow:
- Wake word detected → Read DOA angle once (avoid daemon pressure)
- If DOA angle > 10°: Turn head toward sound source (80% of angle, conservative)
- Face tracking takes over for continuous tracking during conversation
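The wakeup flow above can be sketched as a single conversion from DOA reading to head yaw, using the direction correction `yaw = π/2 - doa` from Phase 5. This is a sketch under stated assumptions, not the body of `_turn_to_sound_source()`: the function name is hypothetical, the 10° threshold is assumed to apply to the offset from straight ahead, and positive yaw is assumed to mean "left" because 0° DOA is documented as left.

```python
import math
from typing import Optional


def doa_to_head_yaw(
    doa_deg: float,
    min_turn_deg: float = 10.0,  # ignore small offsets (threshold from the flow above)
    gain: float = 0.8,           # turn 80% of the angle (conservative)
) -> Optional[float]:
    """Return the head yaw in radians, or None when no turn is needed.

    DOA convention: 0 deg = left, 90 deg = front, 180 deg = right.
    """
    yaw = math.pi / 2 - math.radians(doa_deg)  # direction correction
    if abs(math.degrees(yaw)) <= min_turn_deg:
        return None
    return gain * yaw
```

A speaker straight ahead (90°) produces no turn, a speaker fully to the left (0°) produces 80% of +π/2, and a speaker to the right produces a negative yaw.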
Implemented Features:
| Feature | Description | Implementation Location | Status |
|---|---|---|---|
| DOA turn-to-sound | Turn toward speaker at wakeup | `satellite.py:_turn_to_sound_source()` | ✅ Implemented |
| YOLO face detection | Uses `AdamCodd/YOLOv11n-face-detection` model | `head_tracker.py` | ✅ Implemented |
| Adaptive frame rate tracking | 15fps during conversation, 3fps when idle without face | `camera_server.py` | ✅ Implemented |
| `look_at_image()` | Calculate target pose from face position | `camera_server.py` | ✅ Implemented |
| Smooth return to neutral | Smooth return within 1 second after face lost | `camera_server.py` | ✅ Implemented |
| `face_tracking_offsets` | As secondary pose overlay to motion control | `movement_manager.py` | ✅ Implemented |
| DOA entities | `doa_angle` and `speech_detected` exposed to Home Assistant | `entity_registry.py` | ✅ Implemented |
| Model download retry | 3 retries, 5 second interval | `head_tracker.py` | ✅ Implemented |
| Conversation mode integration | Auto-switch tracking frequency on voice assistant state change | `satellite.py` | ✅ Implemented |
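The "3 retries, 5 second interval" policy for the model download can be sketched as a generic retry helper. The helper below is illustrative only: `retry` is a hypothetical name, and it assumes "3 retries" means one initial attempt plus up to three more, which the table does not spell out.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    retries: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with a fixed delay between attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(1 + retries):
        try:
            return func()
        except Exception as exc:  # a real download would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(delay_s)
    assert last_error is not None
    raise last_error
```

The `sleep` parameter exists so the delay can be replaced in tests; production code would leave it at `time.sleep`.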
Resource Optimization (v0.5.1):
- During conversation (listening/thinking/speaking): High-frequency tracking 15fps
- Idle with face detected: High-frequency tracking 15fps
- Idle without face for 10s: Low-power mode 3fps (only detect if someone appears)
- Immediately restore high-frequency tracking when face detected
Code Locations:
- `satellite.py:_turn_to_sound_source()` - DOA turn-to-sound at wakeup
- `head_tracker.py` - YOLO face detector (`HeadTracker` class)
- `camera_server.py:_capture_frames()` - Adaptive frame rate face tracking
- `camera_server.py:set_conversation_mode()` - Conversation mode switch API
- `satellite.py:_set_conversation_mode()` - Voice assistant state integration
- `movement_manager.py:set_face_tracking_offsets()` - Face tracking offset API
Technical Details:
# camera_server.py - Adaptive frame rate face tracking
class MJPEGCameraServer:
    def __init__(self):
        self._fps_high = 15  # During conversation/face detected
        self._fps_low = 3  # Idle without face
        self._low_power_threshold = 10.0  # 10s without face switches to low power
        self._current_fps = self._fps_high
        self._in_conversation = False
        self._last_check_time = 0.0  # Assumed name: time of the last low-power detection

    def _should_run_face_tracking(self, current_time):
        # Conversation mode: Always high-frequency tracking
        if self._in_conversation:
            return True
        # High-frequency mode: Track every frame
        if self._current_fps == self._fps_high:
            return True
        # Low-power mode: Periodic detection
        return current_time - self._last_check_time >= 1 / self._fps_low

# satellite.py - Voice assistant state integration
def _reachy_on_listening(self):
    self._set_conversation_mode(True)  # Start conversation, high-frequency tracking

def _reachy_on_idle(self):
    self._set_conversation_mode(False)  # End conversation, adaptive tracking
Phase 16 - Cartoon Style Motion Mode (Partial) 🟡
Goal: Use SDK interpolation techniques for more expressive robot movements.
SDK Support: `InterpolationTechnique` enum
- `LINEAR` - Linear, mechanical feel
- `MIN_JERK` - Minimum jerk, natural and smooth (default)
- `EASE_IN_OUT` - Ease in-out, elegant
- `CARTOON` - Cartoon style, with bounce effect, lively and cute
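To make the difference between these techniques concrete, the sketch below implements three textbook easing curves on a normalized time `t` in [0, 1]. These are standard formulas and may not match the SDK's own curves; the cosine form used for ease in-out is one common choice among several, and `CARTOON` is omitted because its bounce formula is not given here. All function names are illustrative.

```python
import math
from typing import Callable


def linear(t: float) -> float:
    """Constant velocity: mechanical feel."""
    return t


def min_jerk(t: float) -> float:
    """Classic minimum-jerk profile: zero velocity and acceleration at both ends."""
    return 10 * t**3 - 15 * t**4 + 6 * t**5


def ease_in_out(t: float) -> float:
    """Cosine ease in-out (one common definition, assumed here)."""
    return 0.5 * (1.0 - math.cos(math.pi * t))


def interpolate(
    start: float,
    end: float,
    t: float,
    curve: Callable[[float], float] = min_jerk,
) -> float:
    """Blend between start and end using the chosen easing curve."""
    t = max(0.0, min(1.0, t))  # clamp normalized time
    return start + (end - start) * curve(t)
```

All three curves start at 0 and end at 1; they differ only in how velocity is distributed, which is what produces the mechanical versus smooth feel described above.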
Implemented Features:
- ✅ 20Hz unified control loop (`movement_manager.py`) - Reduced from 100Hz to prevent daemon crash
- ✅ Pose change detection - Only send commands on significant changes (threshold 0.001)
- ✅ State query caching - 100ms TTL, reduces daemon load
- ✅ Smooth interpolation (ease in-out curve)
- ✅ Breathing animation - Idle Z-axis micro-movement + antenna sway (`BreathingAnimation`)
- ✅ Command queue mode - Thread-safe external API
- ✅ Error throttling - Prevents log explosion
- ✅ Connection health monitoring - Auto-detect and recover from connection loss
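Two of the load-reduction items above, pose change detection and state query caching, can be sketched in a few lines. This is not the code in `movement_manager.py`: `pose_changed` and `TTLCache` are hypothetical names, and the sketch assumes the 0.001 threshold is compared per pose component.

```python
import time
from typing import Callable, Generic, Optional, Sequence, TypeVar

T = TypeVar("T")


def pose_changed(
    previous: Optional[Sequence[float]],
    current: Sequence[float],
    threshold: float = 0.001,
) -> bool:
    """Return True when any component moved by more than the threshold."""
    if previous is None:
        return True  # nothing sent yet, so always send the first pose
    return any(abs(c - p) > threshold for p, c in zip(previous, current))


class TTLCache(Generic[T]):
    """Cache one fetched value for ttl_s seconds (100 ms by default)."""

    def __init__(
        self,
        fetch: Callable[[], T],
        ttl_s: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()  # only hit the daemon when stale
            self._expires_at = now + self._ttl_s
        return self._value  # type: ignore[return-value]
```

With a 20Hz loop (50 ms period) and a 100 ms TTL, roughly every second iteration would reuse the cached state instead of querying the daemon.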
Not Implemented:
- ❌ Dynamic interpolation technique switching (CARTOON/EASE_IN_OUT etc.)
- ❌ Exaggerated cartoon bounce effects
Code Locations:
- `movement_manager.py:192-243` - BreathingAnimation class
- `movement_manager.py:246-697` - MovementManager class
Scene Implementation Status:
| Scene | Recommended Interpolation | Effect | Status |
|---|---|---|---|
| Wake nod | `CARTOON` | Lively bounce effect | ❌ Not implemented |
| Thinking head up | `EASE_IN_OUT` | Elegant transition | ✅ Implemented (smooth interpolation) |
| Speaking micro-movements | `MIN_JERK` | Natural and fluid | ✅ Implemented (SpeechSway) |
| Error head shake | `CARTOON` | Exaggerated denial | ❌ Not implemented |
| Return to neutral | `MIN_JERK` | Smooth return | ✅ Implemented |
| Idle breathing | - | Subtle sense of life | ✅ Implemented (BreathingAnimation) |
Phase 17 - Antenna Sync Animation During Speech (Partial) 🟡
Goal: Antennas sway with audio rhythm during TTS playback, simulating "speaking" effect.
Implemented Features:
- ✅ Voice-driven head sway (`SpeechSwayGenerator`)
- ✅ VAD detection based on audio loudness
- ✅ Multi-frequency sine wave overlay (Lissajous motion)
- ✅ Smooth envelope transitions
Code Locations:
- `movement_manager.py:124-189` - SpeechSwayGenerator class
- `motion.py:212-222` - update_audio_loudness() method
Technical Details:
# Speech sway parameters
SWAY_A_PITCH_DEG = 3.0 # Pitch amplitude (degrees)
SWAY_A_YAW_DEG = 2.0 # Yaw amplitude
SWAY_A_ROLL_DEG = 2.0 # Roll amplitude
SWAY_F_PITCH = 0.8 # Pitch frequency Hz
SWAY_F_YAW = 0.6 # Yaw frequency
SWAY_F_ROLL = 0.5 # Roll frequency
# VAD thresholds
VAD_DB_ON = -35 # Start detection threshold
VAD_DB_OFF = -45 # Stop detection threshold
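The parameters above suggest two mechanisms: a loudness gate with hysteresis (separate on and off thresholds) and a sum of per-axis sine waves. The sketch below shows how they could fit together; it is not the `SpeechSwayGenerator` code. `SpeechGate` and `sway_offsets` are hypothetical names, the sine phases are assumed to be zero, and the `envelope` argument stands in for the smooth envelope whose shape is not specified.

```python
import math
from typing import Tuple

# Constants repeated from the parameters above so the sketch is self-contained.
SWAY_A_PITCH_DEG, SWAY_A_YAW_DEG, SWAY_A_ROLL_DEG = 3.0, 2.0, 2.0
SWAY_F_PITCH, SWAY_F_YAW, SWAY_F_ROLL = 0.8, 0.6, 0.5
VAD_DB_ON = -35
VAD_DB_OFF = -45


class SpeechGate:
    """Loudness gate with hysteresis: on above VAD_DB_ON, off below VAD_DB_OFF."""

    def __init__(self) -> None:
        self.active = False

    def update(self, loudness_db: float) -> bool:
        if self.active and loudness_db < VAD_DB_OFF:
            self.active = False
        elif not self.active and loudness_db > VAD_DB_ON:
            self.active = True
        return self.active


def sway_offsets(t: float, envelope: float = 1.0) -> Tuple[float, float, float]:
    """Return (pitch, yaw, roll) offsets in degrees at time t seconds."""

    def wave(amplitude_deg: float, freq_hz: float) -> float:
        return envelope * amplitude_deg * math.sin(2.0 * math.pi * freq_hz * t)

    return (
        wave(SWAY_A_PITCH_DEG, SWAY_F_PITCH),
        wave(SWAY_A_YAW_DEG, SWAY_F_YAW),
        wave(SWAY_A_ROLL_DEG, SWAY_F_ROLL),
    )
```

The 10 dB gap between the two thresholds keeps the gate from flickering when loudness hovers near a single cutoff, and the three different frequencies give the Lissajous-like motion mentioned in the feature list.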
Not Implemented:
- ❌ Antenna sway with audio rhythm (currently only head sway)
- ❌ Audio spectrum analysis driven animation
Phase 18 - Visual Gaze Interaction (Not Implemented) ❌
Goal: Use camera to detect faces for eye contact.
SDK Support:
- `look_at_image(u, v)` - Look at point in image
- `look_at_world(x, y, z)` - Look at world coordinate point
- `media.get_frame()` - Get camera frame (✅ Already implemented in `camera_server.py:146`)
Not Implemented Features:
| Feature | Description | Status |
|---|---|---|
| Face detection | Use OpenCV/MediaPipe to detect faces | ❌ Not implemented |
| Eye tracking | Look at speaker's face during conversation | ❌ Not implemented |
| Multi-person switching | When multiple people detected, look at current speaker | ❌ Not implemented |
| Idle scanning | Randomly look around when idle | ❌ Not implemented |
Phase 19 - Gravity Compensation Interactive Mode (Partial) 🟡
Goal: Allow users to physically touch and guide robot head for "teaching" style interaction.
SDK Support: enable_gravity_compensation() - Motors enter gravity compensation mode, can be manually moved
Implemented Features:
- ✅ Gravity compensation mode switch (`motor_mode` Select entity, option "gravity_compensation")
- ✅ `reachy_controller.py:236-237` - Gravity compensation API call
Not Implemented:
- ❌ Teaching mode - Record motion trajectory
- ❌ Save/playback custom actions
- ❌ Voice command triggered teaching flow
Application Scenarios:
- ❌ User says "Let me teach you a move" → Enter gravity compensation mode
- ❌ User manually moves head → Record motion trajectory
- ❌ User says "Remember this" → Save action
- ❌ User says "Do that action again" → Playback recorded action
Phase 20 - Environment Awareness Response (Partial) 🟡
Goal: Use IMU sensors to sense environment changes and respond.
SDK Support:
- ✅ `mini.imu["accelerometer"]` - Accelerometer (Phase 7 implemented as entity)
- ✅ `mini.imu["gyroscope"]` - Gyroscope (Phase 7 implemented as entity)
Implemented Features:
| Detection Event | Response Action | Status |
|---|---|---|
| Tap-to-wake | Enter continuous conversation mode | ✅ Implemented |
| Second tap | Exit continuous conversation mode | ✅ Implemented |
Tap-to-wake vs Voice Wake:
| Wake Method | Conversation Mode | Description |
|---|---|---|
| Voice wake (Okay Nabu) | Single conversation | Need to say wake word for each conversation |
| Tap-to-wake | Continuous conversation | Auto-continue listening after TTS ends, tap again to exit |
Technical Implementation:
- `tap_detector.py` - IMU acceleration spike detection
- `satellite.py:_tap_conversation_mode` - Continuous conversation mode flag
- Threshold: 2.0g (configurable)
- Cooldown: 1.0s (prevent repeated triggers)
- Wireless version only
# satellite.py - Continuous conversation mode
def wakeup_from_tap(self):
    if self._tap_conversation_mode:
        # Second tap - Exit continuous conversation
        self._tap_conversation_mode = False
        self._reachy_on_idle()
    else:
        # First tap - Enter continuous conversation
        self._tap_conversation_mode = True
        self.send_messages([VoiceAssistantRequest(start=True)])

def _tts_finished(self):
    if self._tap_conversation_mode:
        # Continuous conversation mode: Auto-continue listening
        self.send_messages([VoiceAssistantRequest(start=True)])
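The spike detection behind tap-to-wake, with the threshold and cooldown listed under Technical Implementation, might look roughly like the sketch below. It is not the contents of `tap_detector.py`: the `TapDetector` class shown here is hypothetical, and it assumes a tap is detected when the acceleration magnitude deviates from 1g (gravity at rest) by at least the threshold, with accelerometer readings in m/s².

```python
import math
from typing import Optional, Sequence

G = 9.81  # standard gravity in m/s^2


class TapDetector:
    """Detect taps as acceleration spikes, with a cooldown between triggers."""

    def __init__(self, threshold_g: float = 2.0, cooldown_s: float = 1.0) -> None:
        self.threshold_g = threshold_g
        self.cooldown_s = cooldown_s
        self._last_tap_time: Optional[float] = None

    def update(self, accel: Sequence[float], now: float) -> bool:
        """Feed one accelerometer sample (m/s^2); return True when a tap fires."""
        magnitude_g = math.sqrt(sum(a * a for a in accel)) / G
        spike_g = abs(magnitude_g - 1.0)  # deviation from resting gravity
        if spike_g < self.threshold_g:
            return False
        in_cooldown = (
            self._last_tap_time is not None
            and now - self._last_tap_time < self.cooldown_s
        )
        if in_cooldown:
            return False  # prevent repeated triggers from one physical tap
        self._last_tap_time = now
        return True
```

Each `True` result would correspond to one call to `wakeup_from_tap()`, so the cooldown is what stops a single tap from entering and immediately exiting continuous conversation mode.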
Not Implemented:
| Detection Event | Response Action | Status |
|---|---|---|
| Being shaken | Play dizzy action + voice "Don't shake me~" | ❌ Not implemented |
| Tilted/fallen | Play help action + voice "I fell, help me" | ❌ Not implemented |
| Long idle | Enter sleep animation | ❌ Not implemented |
Phase 21 - Home Assistant Scene Integration (Not Implemented) ❌
Goal: Trigger robot actions based on Home Assistant scenes/automations.
Implementation: Via ESPHome service calls
Not Implemented Scenes:
| HA Scene | Robot Response | Status |
|---|---|---|
| Good morning scene | Play wake action + "Good morning!" | ❌ Not implemented |
| Good night scene | Play sleep action + "Good night~" | ❌ Not implemented |
| Someone home | Turn toward door + wave + "Welcome home!" | ❌ Not implemented |
| Doorbell rings | Turn toward door + alert action | ❌ Not implemented |
| Play music | Sway with music rhythm | ❌ Not implemented |
Feature Implementation Summary
✅ Completed Features
Core Voice Assistant (Phase 1-12)
- 45+ ESPHome entities - All implemented
- Basic voice interaction - Wake word detection, STT/TTS integration
- Motion feedback - Nod, shake, gaze and other basic actions
- Audio processing - AGC, noise suppression, echo cancellation
- Camera stream - MJPEG live preview
Partially Implemented Features (Phase 14-21)
- Phase 14 - Emotion action API infrastructure (manual trigger available)
- Phase 19 - Gravity compensation mode switch (teaching flow not implemented)
❌ Not Implemented Features
High Priority
- Phase 13 - Sendspin audio playback support ✅ Completed
- Phase 14 - Auto emotion action feedback (needs voice assistant event association)
- Phase 15 - Continuous sound source tracking (only turn toward at wakeup)
Medium Priority
- Phase 16 - Cartoon style motion mode (needs dynamic interpolation switching)
- Phase 17 - Antenna sync animation
- Phase 18 - Face tracking and eye contact interaction
Low Priority
- Phase 19 - Teaching mode record/playback functionality
- Phase 20 - IMU environment awareness response
- Phase 21 - Home Assistant scene integration
Feature Priority Summary (Updated)
High Priority (Completed ✅)
- ✅ Phase 1-12: Basic ESPHome entities (45+)
- ✅ Core voice assistant functionality
- ✅ Basic motion feedback (nod, shake, gaze)
High Priority (Partial 🟡)
- 🟡 Phase 13: Emotion action feedback system
  - ✅ Emotion Selector entity and API infrastructure
  - ❌ Auto-trigger emotion actions based on voice assistant response
  - ❌ Intent recognition and emotion matching
  - ❌ Dance action library integration
High Priority (Not Implemented ❌)
- ❌ Phase 14: Smart sound source tracking enhancement
  - ✅ Turn toward sound source at wakeup
  - ❌ Continuous sound source tracking
  - ❌ Multi-person conversation switching
  - ❌ Sound source visualization
Medium Priority (Partial 🟡)
- 🟡 Phase 15: Cartoon style motion mode
  - ✅ 20Hz unified control loop architecture (optimized to prevent daemon crash)
  - ✅ Pose change detection + state query caching (reduces daemon load)
  - ✅ Smooth interpolation + breathing animation
  - ❌ Dynamic interpolation technique switching (CARTOON etc.)
- 🟡 Phase 16: Antenna sync during speech
  - ✅ Voice-driven head sway (SpeechSwayGenerator)
  - ❌ Antenna sway with audio rhythm
Medium Priority (Not Implemented ❌)
- ❌ Phase 17: Visual gaze interaction - Eye contact
Low Priority (Partial 🟡)
- 🟡 Phase 18: Gravity compensation interactive mode
  - ✅ Gravity compensation mode switch
  - ❌ Teaching style interaction (record/playback functionality)
Low Priority (Not Implemented ❌)
- ❌ Phase 19: Environment awareness response - IMU triggered actions
- ❌ Phase 20: Home Assistant scene integration - Smart home integration
Completion Statistics
| Phase | Status | Completion | Notes |
|---|---|---|---|
| Phase 1-12 | ✅ Complete | 100% | 40 ESPHome entities implemented (Phase 11 LED disabled) |
| Phase 13 | 🟡 Partial | 30% | API infrastructure ready, missing auto-trigger |
| Phase 14 | ❌ Not done | 20% | Only turn toward at wakeup implemented |
| Phase 15 | 🟡 Partial | 70% | 20Hz control loop + pose change detection + state cache + breathing animation implemented |
| Phase 16 | 🟡 Partial | 50% | Voice-driven head sway implemented |
| Phase 17 | ❌ Not done | 10% | Camera implemented, missing face detection |
| Phase 18 | 🟡 Partial | 40% | Mode switch implemented, missing teaching flow |
| Phase 19 | ❌ Not done | 10% | IMU data exposed, missing trigger logic |
| Phase 20 | ❌ Not done | 0% | Not implemented |
Overall Completion: Phase 1-12: 100% | Phase 13-20: ~35%
🔧 Daemon Crash Fix (2025-01-05)
Problem Description
During long-running operation, the reachy_mini daemon would crash, leaving the robot unresponsive.
Root Cause
- 100Hz control loop too frequent - Calling robot.set_target() every 10ms, even when pose hasn't changed
- Frequent state queries - Every entity state read calls get_status(), get_current_head_pose() etc.
- Missing change detection - Even when pose hasn't changed, continues sending same commands
- Zenoh message queue blocking - Accumulated 150+ messages per second, daemon cannot process in time
Fix Solution
1. Reduce control loop frequency (movement_manager.py)
# Reduced from 100Hz to 20Hz
CONTROL_LOOP_FREQUENCY_HZ = 20 # 80% reduction in messages
2. Add pose change detection (movement_manager.py)
# Only send commands on significant pose changes
if self._last_sent_pose is not None:
    max_diff = max(abs(pose[k] - self._last_sent_pose.get(k, 0.0)) for k in pose.keys())
    if max_diff < 0.001:  # Threshold: 0.001 rad or 0.001 m
        return  # Skip sending
3. State query caching (reachy_controller.py)
# Cache daemon status query results
self._cache_ttl = 0.1 # 100ms TTL
self._last_status_query = 0.0
def _get_cached_status(self):
    now = time.time()
    if now - self._last_status_query < self._cache_ttl:
        return self._state_cache.get('status')  # Use cache
    # ... query and update cache
4. Head pose query caching (reachy_controller.py)
# Cache get_current_head_pose() and get_current_joint_positions() results
def _get_cached_head_pose(self):
    # Reuse cached results within 100ms
    ...
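The caching in steps 3 and 4 follows a common time-to-live pattern. The sketch below shows a self-contained version, under assumptions. The `TTLCache` class and its `fetch` callback are hypothetical and are not taken from reachy_controller.py. The injectable clock is there only so the expiry logic can be tested without sleeping.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical single-value cache that refetches after `ttl` seconds."""

    def __init__(self, fetch: Callable[[], Any], ttl: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # expensive call, e.g. a daemon status query
        self._ttl = ttl          # 0.1 s matches the 100ms TTL above
        self._clock = clock
        self._value: Any = None
        self._stamp: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._stamp is not None and now - self._stamp < self._ttl:
            return self._value   # still fresh: no daemon round trip
        self._value = self._fetch()
        self._stamp = now
        return self._value
```

`time.monotonic` is used as the default clock here because it is not affected by system clock adjustments. The excerpt above uses `time.time()`.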
Fix Results
| Metric | Before Fix | After Fix | Improvement |
|---|---|---|---|
| Control message frequency | ~100 msg/s | ~20 msg/s | ↓ 80% |
| State query frequency | ~50 msg/s | ~5 msg/s | ↓ 90% |
| Total Zenoh messages | ~150 msg/s | ~25 msg/s | ↓ 83% |
| Daemon CPU load | Sustained high load | Normal load | Significantly reduced |
| Expected stability | Crash within hours | Stable for days | Major improvement |
Related Files
- DAEMON_CRASH_FIX_PLAN.md - Detailed fix plan and test plan
- movement_manager.py - Control loop optimization
- reachy_controller.py - State query caching
Future Optimization Suggestions
- ⏳ Dynamic frequency adjustment - 50Hz during motion, 5Hz when idle
- ⏳ Batch state queries - Get all states at once
- ⏳ Performance monitoring and alerts - Real-time daemon health monitoring
🔧 Daemon Crash Deep Fix (2026-01-07)
Problem Description
During long-running operation, the reachy_mini daemon still crashed; the previous fix was not thorough enough.
Root Cause Analysis
Through deep analysis of SDK source code:
1. Each set_target() sends 3 Zenoh messages
   - set_target_head_pose() - 1 message
   - set_target_antenna_joint_positions() - 1 message
   - set_target_body_yaw() - 1 message
2. Daemon control loop is 50Hz
   - See reachy_mini/daemon/backend/robot/backend.py: control_loop_frequency = 50.0
   - If message send frequency exceeds 50Hz, daemon may not process in time
3. Previous 20Hz control loop still too high
   - 20Hz × 3 messages = 60 messages/second
   - Already exceeds daemon's 50Hz processing capacity
4. Pose change threshold too small (0.002)
   - Breathing animation, speech sway, face tracking continuously produce tiny changes
   - Almost every loop triggers set_target()
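The message budget in the analysis above can be checked with a short sketch. The figures (3 messages per set_target() call, 50Hz daemon loop) come from the analysis; the function names are hypothetical.

```python
MESSAGES_PER_SET_TARGET = 3  # head pose + antenna positions + body yaw
DAEMON_LOOP_HZ = 50.0        # control_loop_frequency reported above


def max_messages_per_second(loop_hz: float) -> float:
    """Worst case: every loop iteration calls set_target()."""
    return loop_hz * MESSAGES_PER_SET_TARGET


def exceeds_daemon_rate(loop_hz: float) -> bool:
    """True if the worst-case message rate is above the daemon loop rate."""
    return max_messages_per_second(loop_hz) > DAEMON_LOOP_HZ


for hz in (100, 20, 10):
    print(hz, max_messages_per_second(hz), exceeds_daemon_rate(hz))
```

This is a worst-case bound. With pose change detection in place, the actual rate is lower whenever the pose is static.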
Fix Solution
1. Further reduce control loop frequency (movement_manager.py)
# Reduced from 20Hz to 10Hz
# 10Hz × 3 messages = 30 messages/second, safely below daemon's 50Hz capacity
CONTROL_LOOP_FREQUENCY_HZ = 10
2. Increase pose change threshold (movement_manager.py)
# Increased from 0.002 to 0.005
# 0.005 rad ≈ 0.29 degrees, still smooth enough
self._pose_change_threshold = 0.005
3. Reduce camera/face tracking frequency (camera_server.py)
# Reduced from 15fps to 10fps
fps: int = 10
4. Reduce IMU polling frequency (tap_detector.py)
# Reduced from 50Hz to 20Hz
TAP_DETECTION_RATE_HZ = 20
5. Increase state cache TTL (reachy_controller.py)
# Increased from 1 second to 2 seconds
self._cache_ttl = 2.0
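To illustrate how the threshold from step 2 gates outgoing commands, here is a minimal sketch. The `pose_changed` helper and the dictionary-shaped pose are assumptions made for the example; only the 0.005 threshold and the max-difference comparison come from the text.

```python
import math
from typing import Dict, Optional

POSE_CHANGE_THRESHOLD = 0.005  # rad (or m), value from step 2


def pose_changed(new_pose: Dict[str, float],
                 last_pose: Optional[Dict[str, float]],
                 threshold: float = POSE_CHANGE_THRESHOLD) -> bool:
    """Return True if any component moved by at least `threshold`."""
    if last_pose is None:
        return True  # nothing sent yet, so always send the first pose
    max_diff = max(
        (abs(value - last_pose.get(key, 0.0)) for key, value in new_pose.items()),
        default=0.0,
    )
    return max_diff >= threshold


# 0.005 rad expressed in degrees, as quoted in the comment above
print(round(math.degrees(POSE_CHANGE_THRESHOLD), 2))
```

A control loop would call `set_target()` only when `pose_changed(...)` returns True, then store the pose it just sent.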
Fix Results
| Metric | Before (20Hz) | After (10Hz) | Improvement |
|---|---|---|---|
| Control loop frequency | 20 Hz | 10 Hz | ↓ 50% |
| Max Zenoh messages | 60 msg/s | 30 msg/s | ↓ 50% |
| Actual messages (with change detection) | ~40 msg/s | ~15 msg/s | ↓ 62% |
| Face tracking frequency | 15 Hz | 10 Hz | ↓ 33% |
| IMU polling frequency | 50 Hz | 20 Hz | ↓ 60% |
| State cache TTL | 1 second | 2 seconds | ↑ 100% |
| Expected stability | Crash within hours | Stable operation | Major improvement |
Key Finding
The reference reachy_mini_conversation_app uses a 100Hz control loop, but it is an official app that may have special optimizations or run on more powerful hardware. Our app needs more conservative settings.
Related Files
- movement_manager.py - Control loop frequency and pose threshold
- camera_server.py - Face tracking frequency
- tap_detector.py - IMU polling frequency
- reachy_controller.py - State cache TTL
🔧 Tap-to-Wake and Microphone Sensitivity Fix (2026-01-07)
Problem Description
- Tap-to-wake blocking - Conversation stalls after a tap wake because audio processing is blocked
- Low microphone sensitivity - Need to be very close for voice recognition
Root Cause
- Audio playback blocking - _tap_continue_feedback() plays sound in continuous conversation mode, blocking audio stream processing
- AGC settings not optimized - ReSpeaker XVF3800 default settings not suitable for distant voice recognition
Fix Solution
1. Remove audio playback in continuous conversation feedback (satellite.py)
def _tap_continue_feedback(self) -> None:
    """Provide feedback when continuing conversation in tap mode.

    Triggers a nod to indicate ready for next input.
    Sound is NOT played here to avoid blocking audio streaming.
    """
    # NOTE: Do NOT play sound here - it blocks audio streaming
    if self.state.motion_enabled and self.state.motion:
        self.state.motion.on_continue_listening()
2. Add exception handling to tap callback (voice_assistant.py)
def _on_tap_detected(self) -> None:
    """Callback when tap is detected on the robot.

    NOTE: This is called from the tap_detector background thread.
    """
    try:
        self._state.satellite.wakeup_from_tap()
        # ... motion feedback
    except Exception as e:
        _LOGGER.error("Error in tap detection callback: %s", e)
3. Comprehensive microphone optimization (voice_assistant.py) - Updated 2026-01-07
def _optimize_microphone_settings(self) -> None:
    """Optimize ReSpeaker XVF3800 microphone settings for voice recognition."""
    # ========== 1. AGC (Automatic Gain Control) Settings ==========
    # Enable AGC for automatic volume normalization
    respeaker.write("PP_AGCONOFF", [1])
    # Increase AGC max gain for better distant speech pickup (default ~15dB -> 30dB)
    respeaker.write("PP_AGCMAXGAIN", [30.0])
    # Set AGC desired output level (default ~-25dB -> -18dB for stronger output)
    respeaker.write("PP_AGCDESIREDLEVEL", [-18.0])
    # Optimize AGC time constant for voice commands
    respeaker.write("PP_AGCTIME", [0.5])
    # ========== 2. Base Microphone Gain ==========
    # Increase base microphone gain (default 1.0 -> 2.0)
    respeaker.write("AUDIO_MGR_MIC_GAIN", [2.0])
    # ========== 3. Noise Suppression Settings ==========
    # Reduce noise suppression to preserve quiet speech (default ~0.5 -> 0.15)
    respeaker.write("PP_MIN_NS", [0.15])
    respeaker.write("PP_MIN_NN", [0.15])
    # ========== 4. Echo Cancellation & High-pass Filter ==========
    respeaker.write("PP_ECHOONOFF", [1])
    respeaker.write("AEC_HPFONOFF", [1])
Fix Results
| Parameter | Before | After | Notes |
|---|---|---|---|
| Tap continuous conversation | Blocking | Working | Removed blocking audio playback |
| Microphone sensitivity | ~30cm | ~2-3m | Comprehensive AGC and gain optimization |
| AGC switch | Off | On | Auto volume normalization |
| AGC max gain | ~15dB | 30dB | Better distant speech pickup |
| AGC target level | -25dB | -18dB | Stronger output signal |
| Microphone gain | 1.0x | 2.0x | Base gain doubled |
| Noise suppression | ~0.5 | 0.15 | Reduced speech mis-suppression |
| Echo cancellation | On | On | Maintain clarity during TTS playback |
| High-pass filter | Off | On | Remove low-frequency noise |
XVF3800 Parameter Reference
| Parameter Name | Type | Range | Description |
|---|---|---|---|
| PP_AGCONOFF | int32 | 0/1 | AGC switch |
| PP_AGCMAXGAIN | float | 0-40 dB | AGC max gain |
| PP_AGCDESIREDLEVEL | float | dB | AGC target output level |
| PP_AGCTIME | float | seconds | AGC time constant |
| AUDIO_MGR_MIC_GAIN | float | 0-4.0 | Microphone gain multiplier |
| PP_MIN_NS | float | 0-1.0 | Minimum noise suppression (lower = less suppression) |
| PP_MIN_NN | float | 0-1.0 | Minimum noise estimation |
| PP_ECHOONOFF | int32 | 0/1 | Echo cancellation switch |
| AEC_HPFONOFF | int32 | 0/1 | High-pass filter switch |
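One way to use the ranges in the table above is to clamp a requested value before it is written to the device. The sketch below assumes a hypothetical `clamp_param` helper and range dictionary; neither is part of the SDK. Parameters whose range is not given in the table are passed through unchanged.

```python
# Ranges copied from the parameter table above. PP_AGCDESIREDLEVEL and
# PP_AGCTIME are omitted because the table lists no numeric bounds for them.
XVF3800_RANGES = {
    "PP_AGCONOFF": (0, 1),
    "PP_AGCMAXGAIN": (0.0, 40.0),
    "AUDIO_MGR_MIC_GAIN": (0.0, 4.0),
    "PP_MIN_NS": (0.0, 1.0),
    "PP_MIN_NN": (0.0, 1.0),
    "PP_ECHOONOFF": (0, 1),
    "AEC_HPFONOFF": (0, 1),
}


def clamp_param(name: str, value: float) -> float:
    """Clamp `value` to the documented range for `name`, if one is listed."""
    bounds = XVF3800_RANGES.get(name)
    if bounds is None:
        return value  # no documented range: leave the value untouched
    low, high = bounds
    return min(max(value, low), high)
```

For example, `clamp_param("PP_AGCMAXGAIN", 55.0)` yields the table's upper bound of 40.0, which matches the 0-40dB range mentioned for the AGC max gain entity.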
Related Files
- satellite.py - Removed blocking audio playback
- voice_assistant.py - Comprehensive microphone optimization
- reachy_controller.py - AGC entity default value updates
- entity_registry.py - AGC max gain range update (0-40dB)
- reachy_mini/src/reachy_mini/media/audio_control_utils.py - SDK reference
🔧 v0.5.1 Bug Fixes (2026-01-08)
Issue 1: Music Not Resuming After Voice Conversation
Problem: Music doesn't resume after voice conversation ends.
Root Cause: Sendspin was incorrectly connected to tts_player instead of music_player.
Fix:
- voice_assistant.py: Sendspin discovery now connects to music_player
- satellite.py: duck()/unduck() now call music_player.pause_sendspin()/resume_sendspin()
Issue 2: tap_sensitivity Not Persisted
Problem: The tap_sensitivity value set in ESPHome was lost after restart.
Fix:
- models.py: Added tap_sensitivity field to Preferences dataclass
- entity_registry.py: Entity setter now saves to preferences.json
- Load saved value on startup
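The persistence flow described in this fix can be sketched as below. This is an illustration under assumptions, not the project's models.py. The `load_preferences` and `save_preferences` helpers are hypothetical, the dataclass shows only the one field discussed here, and the 0.5 default is taken from the expected value stated in Issue 5.

```python
import json
from dataclasses import asdict, dataclass, fields
from pathlib import Path

TAP_THRESHOLD_G_DEFAULT = 0.5  # expected default in g, per Issue 5


@dataclass
class Preferences:
    # Only the field relevant to this fix; the real dataclass has more.
    tap_sensitivity: float = TAP_THRESHOLD_G_DEFAULT


def load_preferences(path: Path) -> Preferences:
    """Load saved preferences, falling back to defaults if the file is missing."""
    if not path.exists():
        return Preferences()
    data = json.loads(path.read_text(encoding="utf-8"))
    known = {f.name for f in fields(Preferences)}
    # Ignore unknown keys so older or newer files still load.
    return Preferences(**{k: v for k, v in data.items() if k in known})


def save_preferences(prefs: Preferences, path: Path) -> None:
    """Write preferences as JSON; called from the entity setter in this sketch."""
    path.write_text(json.dumps(asdict(prefs), indent=2), encoding="utf-8")
```

On startup the app would call `load_preferences(Path("preferences.json"))` and apply `tap_sensitivity` to the tap detector before it starts polling.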
Issue 3: Audio Conflict During Voice Assistant Wakeup
Problem: Audio streaming (Sendspin or ESPHome audio) conflicts when voice assistant wakes up.
Fix:
- audio_player.py: Added pause_sendspin() and resume_sendspin() methods
- satellite.py: duck() now pauses Sendspin, unduck() resumes it
- Improved pause() method to actually stop audio output
Issue 4: AttributeError for _camera_server
Problem: _set_conversation_mode() referenced non-existent _camera_server attribute.
Fix: Changed self._camera_server to self.camera_server (removed underscore prefix)
Issue 5: tap_sensitivity Default Value Wrong
Problem: tap_sensitivity default was still 2.0g instead of expected 0.5g.
Fix: Use TAP_THRESHOLD_G_DEFAULT constant as default value
Issue 6: Sendspin Sample Rate Optimization
Problem: ReSpeaker hardware I/O is 16kHz (hardware limitation), but Sendspin might try higher sample rates.
Fix: Prioritize 16kHz in Sendspin supported formats list to avoid unnecessary resampling
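Putting 16kHz first in a supported-formats list can be done with a stable sort. The sketch below represents each format as a plain sample rate in Hz, which is an assumption: the real Sendspin format descriptors are not shown in this document, and `prioritize_sample_rate` is a hypothetical helper.

```python
PREFERRED_SAMPLE_RATE_HZ = 16_000  # ReSpeaker hardware I/O rate


def prioritize_sample_rate(rates, preferred=PREFERRED_SAMPLE_RATE_HZ):
    """Return `rates` with the preferred rate moved to the front.

    sorted() is stable, so all other rates keep their original order.
    """
    return sorted(rates, key=lambda rate: rate != preferred)


print(prioritize_sample_rate([48_000, 44_100, 16_000, 24_000]))
```

If the preferred rate is absent, the list is returned in its original order.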
SDK Data Structure Reference
# Motor control mode
class MotorControlMode(str, Enum):
    Enabled = "enabled"  # Torque on, position control
    Disabled = "disabled"  # Torque off
    GravityCompensation = "gravity_compensation"  # Gravity compensation mode

# Daemon state
class DaemonState(Enum):
    NOT_INITIALIZED = "not_initialized"
    STARTING = "starting"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    ERROR = "error"

# Full state
class FullState:
    control_mode: MotorControlMode
    head_pose: XYZRPYPose  # x, y, z (m), roll, pitch, yaw (rad)
    head_joints: list[float]  # 7 joint angles
    body_yaw: float
    antennas_position: list[float]  # [right, left]
    doa: DoAInfo  # angle (rad), speech_detected (bool)

# IMU data (wireless version only)
imu_data = {
    "accelerometer": [x, y, z],  # m/s²
    "gyroscope": [x, y, z],  # rad/s
    "quaternion": [w, x, y, z],  # Attitude quaternion
    "temperature": float  # °C
}

# Safety limits
HEAD_PITCH_ROLL_LIMIT = [-40°, +40°]
HEAD_YAW_LIMIT = [-180°, +180°]
BODY_YAW_LIMIT = [-160°, +160°]
YAW_DELTA_MAX = 65°  # Max difference between head and body yaw
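As an illustration of how the safety limits above might be applied to a target pose, here is a minimal sketch working in degrees. The `clamp_targets` function, the `_DEG` constant names, and the order in which the limits are applied are assumptions for the example. How the SDK itself enforces these limits is not shown in this document.

```python
# Limits from the reference above, expressed in degrees.
HEAD_PITCH_ROLL_LIMIT_DEG = (-40.0, 40.0)
HEAD_YAW_LIMIT_DEG = (-180.0, 180.0)
BODY_YAW_LIMIT_DEG = (-160.0, 160.0)
YAW_DELTA_MAX_DEG = 65.0  # max difference between head and body yaw


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def clamp_targets(head_pitch, head_roll, head_yaw, body_yaw):
    """Return (pitch, roll, head_yaw, body_yaw) clamped to the limits above."""
    pitch = clamp(head_pitch, *HEAD_PITCH_ROLL_LIMIT_DEG)
    roll = clamp(head_roll, *HEAD_PITCH_ROLL_LIMIT_DEG)
    body = clamp(body_yaw, *BODY_YAW_LIMIT_DEG)
    yaw = clamp(head_yaw, *HEAD_YAW_LIMIT_DEG)
    # Assumption: keep head yaw within YAW_DELTA_MAX of the clamped body yaw.
    yaw = clamp(yaw, body - YAW_DELTA_MAX_DEG, body + YAW_DELTA_MAX_DEG)
    return pitch, roll, yaw, body
```

For instance, a request of 120° head yaw with the body at 0° would be reduced to 65° under this ordering.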
ESPHome Protocol Implementation Notes
ESPHome protocol communicates with Home Assistant via protobuf messages. The following message types need to be implemented:
from aioesphomeapi.api_pb2 import (
    # Number entity (volume/angle control)
    ListEntitiesNumberResponse,
    NumberStateResponse,
    NumberCommandRequest,
    # Select entity (motor mode)
    ListEntitiesSelectResponse,
    SelectStateResponse,
    SelectCommandRequest,
    # Button entity (wake/sleep)
    ListEntitiesButtonResponse,
    ButtonCommandRequest,
    # Switch entity (motor switch)
    ListEntitiesSwitchResponse,
    SwitchStateResponse,
    SwitchCommandRequest,
    # Sensor entity (numeric sensors)
    ListEntitiesSensorResponse,
    SensorStateResponse,
    # Binary Sensor entity (boolean sensors)
    ListEntitiesBinarySensorResponse,
    BinarySensorStateResponse,
    # Text Sensor entity (text sensors)
    ListEntitiesTextSensorResponse,
    TextSensorStateResponse,
)