---
title: EMMA CVPR 2026
emoji: 🚀
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

# EMMA: Extracting Multiple physical parameters from Multimodal Data

**CVPR 2026**

[Farhat Shaikh](https://scholar.google.com/citations?hl=en&user=mbAOSW0AAAAJ), [Ayan Banerjee](https://scholar.google.com/citations?user=UAlc7tEAAAAJ&hl=en), [Sandeep K. S. Gupta](https://scholar.google.com/citations?user=U9bcQkMAAAAJ&hl=en)

**IMPACT Lab, School of Computing & Augmented Intelligence (SCAI), Arizona State University**

[**Project page**](https://impactlabasu.github.io/EMMA-CVPR2026/) · [**Demo video**](https://youtu.be/Uo79pVlM6Rk)

---

## Overview

EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Prior video-only approaches struggle with occluded states and hidden actuation inputs, and they assume known initial conditions. EMMA instead performs joint inference of **explicit parameters**, **implicit dynamical components**, and **calibration invariants** within a unified continuous-time model.

The user supplies the parametric structure of the governing ODE; EMMA solves the inverse problem of recovering its parameters, along with any latent forcing and invariants, from multimodal observations.

## Key contributions

- **Multimodal dynamical parameter extraction** from video, audio, and time series reconstructed from visual charts.
- **Recovery under unobserved forcing inputs** by inferring latent actuation (e.g., wheel speed) from audio.
- **Estimation of implicit dynamics** associated with unmeasured physical effects (e.g., frictional drag).
- **Invariant calibration from raw video**, eliminating assumptions about known initial conditions or coordinate frames.
- **Extensive validation** on 100+ scenarios: the Delfys benchmark (75 videos), a real-world rover and quadrotor, and simulation charts.

## Architecture

*Figure: EMMA architecture.*

EMMA follows a three-step pipeline: **Sense · Learn · Verify**.

1. **Sense.** Video, audio, and chart images are converted into time-aligned signals through modality-specific pipelines.
2. **Learn.** A Liquid Time-Constant (LTC) network models the system's latent dynamics in continuous time.
3. **Verify.** A differentiable ODE solver simulates the recovered parameters and checks them against the observations under a physics-informed loss.

## Results

EMMA delivers accurate multi-parameter recovery across diverse physical systems. Full tables and ablations are in the [paper](docs/42612.pdf).

| System | Parameters recovered | EMMA result | Baseline comparison |
|--------|----------------------|-------------|---------------------|
| Pendulum (90 cm) | Length *L*, damping *τ* | **L = 0.86 ± 0.07 m** (GT 0.90 m) | Delfys, PySINDy |
| Torricelli (med.) | Drainage *k* | **0.0132 ± 0.0008** (GT 0.0128) | matches Delfys |
| Sliding block (med.) | Angle *α*, friction *μ* | **α = 24.72°, μ = 0.205** (GT α = 25°, μ = 0.20) | Delfys, PySINDy |
| LED decay (med.) | *γ* | **0.91 ± 0.0** (GT 0.92) | matches Delfys |
| Rover | 9 params (5 with known ground truth) | **8.8 % ± 1.7 %** mean error | *first work under hidden forcing* |
| Quadrotor | 12 params (7 with known ground truth) | **15.9 % ± 7.4 %** mean error | *first work under hidden forcing* |
| Simulation charts | Lotka-Volterra, Lorenz, F8 Crusader, HIV, AID | **>10× lower error** than PySINDy on implicit dynamics | PySINDy |

EMMA is compared against **PAIG**, **NIRPI**, and **Delfys** on the video benchmarks, and against **PySINDy** on the chart-based simulations.
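The idea behind the Verify step (simulate candidate parameters with an ODE solver and score them against observations) can be illustrated on a single benchmark-style system. The sketch below is *not* EMMA's implementation and makes several loudly stated assumptions: it skips the Sense and Learn stages entirely, assumes the tank follows the common Torricelli form dh/dt = -k√h, and uses synthetic noisy heights in place of video-derived measurements. A fixed-step RK4 integrator stands in for the differentiable ODE solver, plain mean squared error stands in for the physics-informed loss, and a derivative-free golden-section search over k stands in for gradient-based optimization. All function names (`simulate_torricelli`, `golden_section_min`, `fit_drainage_rate`) are hypothetical, and `k_true = 0.0128` is borrowed from the table's ground-truth column purely for illustration; the benchmark's actual parameterization and units may differ.

```python
import math
import random


def simulate_torricelli(k, h0, times, substeps=10):
    """Integrate the assumed model dh/dt = -k * sqrt(h) with fixed-step RK4.

    Returns the height at every entry of `times` (times[0] maps to h0).
    """

    def rate(h):
        # Clamp at zero: an empty tank cannot drain further.
        return -k * math.sqrt(max(h, 0.0))

    h = h0
    heights = [h0]
    for t_start, t_end in zip(times, times[1:]):
        dt = (t_end - t_start) / substeps
        for _ in range(substeps):
            s1 = rate(h)
            s2 = rate(h + 0.5 * dt * s1)
            s3 = rate(h + 0.5 * dt * s2)
            s4 = rate(h + dt * s3)
            h += dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6
        heights.append(h)
    return heights


def golden_section_min(fun, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fun(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fun(d)
    return (a + b) / 2


def fit_drainage_rate(times, observed, k_bounds=(1e-3, 5e-2)):
    """Recover k by matching simulated heights to observations (mean squared error).

    The simulation starts from the first observed height instead of a known
    initial condition.
    """

    def loss(k):
        simulated = simulate_torricelli(k, observed[0], times)
        return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)

    return golden_section_min(loss, *k_bounds)


# Synthetic "observations": an illustrative true k plus Gaussian measurement noise.
k_true = 0.0128
times = [float(t) for t in range(101)]
rng = random.Random(0)
clean = simulate_torricelli(k_true, 1.0, times)
observed = [h + rng.gauss(0.0, 0.005) for h in clean]

k_hat = fit_drainage_rate(times, observed)
print(f"true k = {k_true}, recovered k = {k_hat:.5f}")
```

Golden-section search is used here only because this toy problem has a single scalar parameter. With many coupled parameters, as in the rover and quadrotor rows above, differentiating the loss through the solver and using a gradient-based optimizer is the more natural fit.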
## Supported systems

| Category | Systems |
|----------|---------|
| Delfys benchmark | Pendulum, Torricelli drainage, Sliding block, LED decay, Free fall |
| Real-world platforms | Differential-drive rover (9 params), 6-DoF quadrotor (12 params) |
| Simulation charts | Lotka-Volterra, Chaotic Lorenz, F8 Crusader, HIV therapy, AID (Type-1 diabetes) |

## Installation

Tested with **Python 3.10+** on macOS and Linux.

```bash
git clone https://github.com/ImpactLabASU/EMMA-CVPR2026.git
cd EMMA-CVPR2026
python3 -m venv .venv && source .venv/bin/activate  # optional but recommended
pip install -r requirements.txt
```

**System tools**

- [FFmpeg](https://ffmpeg.org/) on your `PATH` (MoviePy uses it for audio extraction): `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu).
- YOLO weights (default `yolo11m.pt`): run `pip install ultralytics`; Ultralytics downloads `yolo11m.pt` automatically the first time the model is loaded (e.g., `python3 -c "from ultralytics import YOLO; YOLO('yolo11m.pt')"`). You can also download the file from the Ultralytics releases page.
- A CUDA GPU is optional; every script falls back to CPU automatically.

## Repository layout

| Folder | Purpose | Entry points |
| --- | --- | --- |
| `Baseline/` | Physics-informed EMMA pipelines (Free Fall, LED, Pendulum, Sliding Block, Torricelli) plus ablation utilities. | `FreeFall/free_fall.py`, `LED/led.py`, `Pendulum/run-*.py`, `Sliding block/sliding_block*.py`, `Torricelli/toricelli*.py`, `architecture_ablation.py`, `run_additional_ablations.py` |
| `Rover/` | Rover perception, parameter estimation, multimodal ablations, helper shell script. | `run.py`, `rover-ablation.py`, `rover_multimodal_ablation.py`, `run_rover_ablation.sh` |
| `Drone/` | Drone pipeline orchestrator (vision + audio + EMMA optimization). | `new_run.py` |
| `CGM/` | Continuous glucose monitor chart digitizer. | `extract_cgm_data.py` |

## Data

- **Baseline datasets** come from the Delfys "Physical Parameter Prediction" set on Kaggle (https://www.kaggle.com/datasets/jaswar/physical-parameter-prediction).
  Download it and copy the experiment folders into `Baseline/`; the scripts discover the data automatically.
- **Sample rover and drone videos** are available here: **[Dropbox](https://www.dropbox.com/scl/fo/cjiym1h53puvv2ml6o8vn/APkfhTz64DnkYkHt554ZPj0?rlkey=hw3odtpzn6vl2nsfbe4pkekcq&dl=0)**. Place them under `Rover/` and `Drone/`.

## Usage

### Baseline pipelines

Each baseline follows the same recipe:

1. `cd Baseline//`
2. Edit the configuration block inside `main()`:
   - `video_path`: path to the source video; leave empty to reuse existing data files.
   - `weights_path`: YOLO weights (`yolo11m.pt` by default).
   - `pixel_to_meter` (Free Fall, Torricelli, Sliding Block): set from your calibration grid.
   - `output_folder`: a unique run directory (e.g. `run_01`); the script creates `output/` and `data/` under it.
3. Run `python3