---
title: EMMA CVPR 2026
emoji: 🚀
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---
# EMMA: Extracting Multiple physical parameters from Multimodal Data
**CVPR 2026**

[Farhat Shaikh](https://scholar.google.com/citations?hl=en&user=mbAOSW0AAAAJ), [Ayan Banerjee](https://scholar.google.com/citations?user=UAlc7tEAAAAJ&hl=en), [Sandeep K. S. Gupta](https://scholar.google.com/citations?user=U9bcQkMAAAAJ&hl=en)

**IMPACT Lab, School of Computing & Augmented Intelligence (SCAI), Arizona State University**

[**Project page**](https://impactlabasu.github.io/EMMA-CVPR2026/) · [**Demo video**](https://youtu.be/Uo79pVlM6Rk)

---
## Overview
EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches that struggle with occluded states, hidden actuation inputs, and assumptions about known initial conditions, EMMA performs joint inference of **explicit parameters**, **implicit dynamical components**, and **calibration invariants** within a unified continuous-time model.
The user supplies the parametric structure of the governing ODE; EMMA solves the inverse problem of recovering its parameters, along with any latent forcing and invariants, from multimodal observations.
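
As a minimal illustration of this kind of inverse problem (not EMMA's actual method), consider Torricelli drainage, assuming the governing ODE has the form dh/dt = -k·sqrt(h). Its solution satisfies sqrt(h(t)) = sqrt(h0) - (k/2)·t, so an ordinary least-squares line through sqrt(h) recovers both `k` and the unknown initial height `h0`. The function name and the synthetic data are hypothetical.

```python
import math


def fit_torricelli(times, heights):
    """Least-squares fit of sqrt(h) = sqrt(h0) - (k / 2) * t.

    Assumes the drainage ODE dh/dt = -k * sqrt(h); returns (k, h0).
    """
    ys = [math.sqrt(h) for h in heights]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return -2.0 * slope, intercept ** 2


# Synthetic observations generated from the closed-form solution (assumed values).
k_true, h0_true = 0.0128, 0.30
times = [0.5 * i for i in range(40)]
heights = [(math.sqrt(h0_true) - 0.5 * k_true * t) ** 2 for t in times]

k_est, h0_est = fit_torricelli(times, heights)
print(f"k = {k_est:.4f}, h0 = {h0_est:.3f}")
```

Because the initial height is fitted alongside `k`, the sketch needs no known initial condition. Real observations would add noise and a pixel-to-metre scale.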
## Key contributions
- **Multi-modal dynamical parameter extraction** from video, audio, and time-series reconstructed from visual charts.
- **Recovery under unobserved forcing inputs** by inferring latent actuation (e.g. wheel speed) from audio.
- **Estimation of implicit dynamics** associated with unmeasured physical effects (e.g. frictional drag).
- **Invariant calibration from raw video**, eliminating assumptions about known initial conditions or coordinate frames.
- **Extensive validation** on 100+ scenarios: Delfys benchmark (75 videos), real-world rover and quadrotor, and simulation charts.
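
To make the idea of recovering a hidden input from audio concrete, the sketch below estimates the dominant frequency of a short signal with a plain discrete Fourier transform. This README does not describe how EMMA maps audio to wheel speed, so treat this purely as an assumed illustration: the function name, the synthetic tone, and the premise that a dominant pitch tracks actuation are all hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude DFT bin, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic stand-in for a motor hum: a pure 125 Hz tone sampled at 1 kHz.
sample_rate = 1000
tone_hz = 125.0
samples = [math.sin(2 * math.pi * tone_hz * i / sample_rate) for i in range(400)]

print(dominant_frequency(samples, sample_rate))
```

A practical pipeline would use an FFT over sliding windows to obtain a time-varying estimate rather than a single value.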
## Architecture

EMMA follows a three-step pipeline: **Sense · Learn · Verify**.
1. **Sense.** Video, audio, and chart images are converted into time-aligned signals through modality-specific pipelines.
2. **Learn.** A Liquid Time-Constant (LTC) network models the system's latent dynamics in continuous time.
3. **Verify.** A differentiable ODE solver simulates the recovered parameters and checks them against the observations under a physics-informed loss.
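
The Verify step can be pictured with a small sketch. It is a simplification, not EMMA's implementation: it swaps the differentiable solver for a plain RK4 integrator and a grid search, assumes a damped-pendulum ODE of the form θ'' = -(g/L)·sin θ - c·θ', and assumes a release from rest with a known initial angle, which EMMA itself does not require. All function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def simulate_pendulum(length, damping, theta0, omega0, dt, steps):
    """Integrate theta'' = -(G/length)*sin(theta) - damping*theta' with RK4."""

    def deriv(theta, omega):
        return omega, -(G / length) * math.sin(theta) - damping * omega

    theta, omega = theta0, omega0
    trajectory = [theta]
    for _ in range(steps):
        k1t, k1w = deriv(theta, omega)
        k2t, k2w = deriv(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1w)
        k3t, k3w = deriv(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2w)
        k4t, k4w = deriv(theta + dt * k3t, omega + dt * k3w)
        theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6.0
        omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6.0
        trajectory.append(theta)
    return trajectory


def trajectory_loss(params, observed, dt):
    """Mean squared error between simulated and observed angles."""
    length, damping = params
    simulated = simulate_pendulum(
        length, damping, observed[0], 0.0, dt, len(observed) - 1
    )
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


# Synthetic "observations" from a 0.90 m pendulum released from rest.
dt = 0.02
observed = simulate_pendulum(0.90, 0.10, 0.3, 0.0, dt, 300)

# Check candidate lengths against the observations (coarse grid, 0.70 m to 1.10 m).
candidates = [0.70 + 0.02 * i for i in range(21)]
best_length = min(
    candidates, key=lambda length: trajectory_loss((length, 0.10), observed, dt)
)
print(f"best length: {best_length:.2f} m")
```

With a differentiable solver, the same loss could be minimized by gradient descent over all parameters at once instead of a one-dimensional grid.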
## Results
EMMA delivers accurate multi-parameter recovery across diverse physical systems. Full tables and ablations are in the [paper](docs/42612.pdf).
| System | Parameters recovered | EMMA error | Best baseline |
|--------|----------------------|------------|---------------|
| Pendulum (90 cm) | Length *L*, damping *τ* | **L = 0.86 ± 0.07 m** (GT 0.90) | Delfys, PySINDy |
| Torricelli (med.) | Drainage *k* | **0.0132 ± 0.0008** (GT 0.0128) | matches Delfys |
| Sliding block (med.) | Angle *α*, friction *μ* | **α = 24.72°, μ = 0.205** (GT 25°, 0.20) | Delfys, PySINDy |
| LED decay (med.) | γ | **0.91 ± 0.0** (GT 0.92) | matches Delfys |
| Rover | 9 params (5 with known ground truth) | **8.8 % ± 1.7 %** mean error | *first work under hidden forcing* |
| Quadrotor | 12 params (7 with known ground truth) | **15.9 % ± 7.4 %** mean error | *first work under hidden forcing* |
| Simulation charts | Lotka-Volterra, Lorenz, F8 Crusader, HIV, AID | **>10× lower error** than PySINDy on implicit dynamics | PySINDy |

EMMA is compared against **PAIG**, **NIRPI**, and **Delfys** on the video benchmarks and against **PySINDy** on the chart-based simulations.
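
For readers who want to put the benchmark rows on a common scale, the snippet below computes the absolute relative error for each estimate that has a ground-truth value in the table above. This README does not state how the paper defines its mean error, so the formula here is an assumption, and the helper name is hypothetical.

```python
def relative_error_pct(estimate, ground_truth):
    """Absolute relative error, in percent (assumed definition)."""
    return abs(estimate - ground_truth) / abs(ground_truth) * 100.0


# (estimate, ground truth) pairs copied from the results table.
reported = {
    "Pendulum L (m)": (0.86, 0.90),
    "Torricelli k": (0.0132, 0.0128),
    "Sliding block alpha (deg)": (24.72, 25.0),
    "Sliding block mu": (0.205, 0.20),
    "LED decay gamma": (0.91, 0.92),
}

for name, (est, gt) in reported.items():
    print(f"{name}: {relative_error_pct(est, gt):.2f} %")
```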
## Supported systems
| Category | Systems |
|----------|---------|
| Delfys benchmark | Pendulum, Torricelli drainage, Sliding block, LED decay, Free fall |
| Real-world platforms | Differential-drive rover (9 params), 6-DoF quadrotor (12 params) |
| Simulation charts | Lotka-Volterra, Chaotic Lorenz, F8 Crusader, HIV therapy, AID (Type-1 diabetes) |
## Installation
Tested with **Python 3.10+** on macOS and Linux.
```bash
git clone https://github.com/ImpactLabASU/EMMA-CVPR2026.git
cd EMMA-CVPR2026
python3 -m venv .venv && source .venv/bin/activate # optional but recommended
pip install -r requirements.txt
```
**System tools**
- [FFmpeg](https://ffmpeg.org/) on your `PATH` (MoviePy uses it for audio extraction): `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu).
- YOLO weights (default `yolo11m.pt`): run `pip install ultralytics`; Ultralytics normally downloads official weights automatically the first time they are loaded by name. Alternatively, download the file from the Ultralytics releases page.
- A CUDA GPU is optional; every script falls back to CPU automatically.
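
A quick preflight check along these lines can catch a missing tool before a long run. It is a hypothetical helper, not part of the repository; the device check additionally assumes the scripts use PyTorch, which this README does not state.

```python
import shutil
from pathlib import Path


def check_environment(weights_path="yolo11m.pt"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not Path(weights_path).is_file():
        problems.append(f"YOLO weights not found: {weights_path}")
    return problems


def pick_device():
    """Prefer CUDA when PyTorch reports it; otherwise fall back to CPU."""
    try:
        import torch  # assumption: the pipelines are PyTorch-based
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
    print("device:", pick_device())
```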
## Repository layout
| Folder | Purpose | Entry points |
| --- | --- | --- |
| `Baseline/` | Physics-informed EMMA pipelines (Free Fall, LED, Pendulum, Sliding Block, Torricelli) plus ablation utilities. | `FreeFall/free_fall.py`, `LED/led.py`, `Pendulum/run-*.py`, `Sliding block/sliding_block*.py`, `Torricelli/toricelli*.py`, `architecture_ablation.py`, `run_additional_ablations.py` |
| `Rover/` | Rover perception, parameter estimation, multimodal ablations, helper shell script. | `run.py`, `rover-ablation.py`, `rover_multimodal_ablation.py`, `run_rover_ablation.sh` |
| `Drone/` | Drone pipeline orchestrator (vision + audio + EMMA optimization). | `new_run.py` |
| `CGM/` | Continuous glucose monitor chart digitizer. | `extract_cgm_data.py` |
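
Chart digitization ultimately comes down to mapping curve pixels to data values. The sketch below shows that mapping under the assumption of linear axes calibrated by two reference ticks each. It is not taken from `extract_cgm_data.py`; the pixel positions, value ranges, and function name are hypothetical.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Linear map from a pixel coordinate to a data value, fixed by two ticks."""
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Hypothetical calibration: x pixels 100..900 span 0..24 h;
# y pixels 500..100 span 40..400 mg/dL (image y grows downward).
x_to_hours = make_axis_mapper(100, 0.0, 900, 24.0)
y_to_mgdl = make_axis_mapper(500, 40.0, 100, 400.0)

# Pixel coordinates of points detected on the plotted curve (made up here).
curve_pixels = [(100, 400), (500, 300), (900, 350)]
series = [(x_to_hours(px), y_to_mgdl(py)) for px, py in curve_pixels]
print(series)
```

Passing the tick pixels in image order handles the inverted y-axis without a special case, since the scale simply comes out negative.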
## Data
- **Baseline datasets** come from the Delfys "Physical Parameter Prediction" set on Kaggle (https://www.kaggle.com/datasets/jaswar/physical-parameter-prediction). Download it and copy the experiment folders into `Baseline/`; the scripts discover the data automatically.
- **Sample rover and drone videos** are available here: **[Dropbox](https://www.dropbox.com/scl/fo/cjiym1h53puvv2ml6o8vn/APkfhTz64DnkYkHt554ZPj0?rlkey=hw3odtpzn6vl2nsfbe4pkekcq&dl=0)**. Place them under `Rover/` and `Drone/`.
## Usage
### Baseline pipelines
Each baseline follows the same recipe:
1. `cd Baseline//`
2. Edit the configuration block inside `main()`:
   - `video_path`: path to the source video; leave empty to reuse existing data files.
   - `weights_path`: YOLO weights (`yolo11m.pt` by default).
   - `pixel_to_meter` (Free Fall, Torricelli, Sliding Block): set from your calibration grid.
   - `output_folder`: a unique run directory (e.g. `run_01`); the script creates `output/` and `data/` under it.
3. Run `python3