SMPLx Anything (DINOv3)
Multi-Person Expressive 3D Body Estimation from Any Video.
SMPLx Anything is a streaming multi-person SMPL-X regression model that detects, tracks, and estimates expressive 3D human body parameters (body, hands, face) from video frames. The backbone is a frozen DINOv3 ViT-B/16.
This repository contains:
- The full training/inference codebase (`src/`, `scripts/`, `configs/`).
- A pretrained checkpoint (`checkpoints/smplx_anything_best.pt`).
- An automatic data downloader that fetches the `Yong-Hoon/smplx_anything` dataset (~280 GB) on the first training run.
Highlights
- Multi-person: query-based detection handles up to 8 people per frame.
- Expressive: full SMPL-X estimation of body (22 joints), hands (30), and face (3).
- Streaming: temporal slot memory keeps persistent track IDs across frames.
- Self-contained: `bash scripts/train_8gpu_worker.sh` downloads the data and starts training; set `RESUME=1` to resume from the pretrained checkpoint instead.
Architecture
```text
Video frame
    │
    ▼
DINOv3 ViT-B/16 (frozen): multi-scale features [5, 7, 9, 11]
    │
    ▼
Feature neck (FPN, 768-d)
    │
    ▼
Query head: heatmap-based person detection, ≤8 queries
    │
    ▼
Temporal memory: cross-frame slot attention, persistent IDs
    │
    ▼
Cross-attention decoder: person queries × image feature map
    │
    ▼
SMPL-X head: pose (6D) + shape + translation + confidence
```
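The query head is a learned module. As a rough intuition for the "heatmap-based person detection, ≤8 queries" stage, the sketch below picks at most 8 local maxima from a dense score map (the model also exposes such a map as `smpl_scores`). It is not the repository's implementation: the function name `select_person_peaks`, the 3×3 non-maximum suppression, and the 0.3 threshold are assumptions made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[int, int, float]  # (row, col, score)


def select_person_peaks(
    heatmap: List[List[float]],
    max_people: int = 8,     # query budget: "up to 8 people per frame"
    threshold: float = 0.3,  # hypothetical minimum score for a candidate
) -> List[Peak]:
    """Return up to `max_people` local maxima of `heatmap` scoring >= `threshold`."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks: List[Peak] = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # 3x3 non-maximum suppression: keep the cell only if no neighbour
            # is larger (flat plateaus may yield several adjacent peaks).
            window_max = max(
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            )
            if score >= window_max:
                peaks.append((r, c, score))
    # Strongest candidates first, truncated to the query budget.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:max_people]
```

A learned query head can resolve overlapping people that simple peak picking would merge, which is one reason to prefer it over a fixed rule like this one.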
Outputs for every frame (per person, except the dense `smpl_scores` map):

| Output | Description |
|---|---|
| `smpl_rotmat` | 55 joint rotations (body 22 + hands 30 + face 3) as 3×3 matrices |
| `smpl_shape` | SMPL-X β (shape) parameters (10-d) |
| `smpl_transl` | Camera-space 3D translation |
| `smpl_scores` | Dense person-location heatmap |
| `smpl_person_scores` | Per-person confidence |
| `smpl_id` | Persistent track ID for streaming inference |
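The SMPL-X head predicts pose in a 6D rotation representation, while `smpl_rotmat` holds 3×3 matrices. The usual way to map 6D to a rotation matrix is Gram-Schmidt orthonormalization of two 3D vectors. The pure-Python sketch below assumes that scheme with the two vectors stacked as columns; the repository may use a different layout (some libraries stack them as rows), so treat it as illustrative rather than as the repo's conversion code.

```python
import math
from typing import List, Sequence

Vec3 = List[float]


def _normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rot6d_to_rotmat(d6: Sequence[float]) -> List[Vec3]:
    """Map a 6D rotation (two stacked 3D vectors) to a 3x3 rotation matrix."""
    a1, a2 = d6[:3], d6[3:6]
    b1 = _normalize(a1)                       # first axis: normalized a1
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = _normalize([y - dot * x for x, y in zip(b1, a2)])  # remove b1 component
    b3 = _cross(b1, b2)                       # third axis completes a right-handed frame
    # Return rows of the matrix whose columns are b1, b2, b3.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]
```

The network output does not need to be orthonormal: any two non-parallel vectors yield a valid rotation, which is what makes this representation convenient to regress.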
Installation
```bash
# Clone this repo (install git-lfs first so the LFS-tracked checkpoint is fetched)
git clone https://huggingface.co/Yong-Hoon/smplx_anything_dinov3
cd smplx_anything_dinov3

# PyTorch (pick the CUDA version that matches your driver)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# Python deps
pip install -r requirements.txt
# or, for an editable install
pip install -e .
```
Prerequisites
- Python ≥ 3.10
- PyTorch ≥ 2.2 with CUDA
- `huggingface_hub` ≥ 1.0 (for auto-download; included in `requirements.txt`)
- The SMPL-X neutral model file `SMPLX_NEUTRAL.npz`, downloaded separately from smpl-x.is.tue.mpg.de due to licensing. Place it at `smplx_models/smplx/SMPLX_NEUTRAL.npz` or update `model.smpl.smpl_model_path` in your config.
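A quick preflight check can catch a missing SMPL-X file before a long job starts. This is a hypothetical helper (`check_prerequisites` is not part of the repo); it only checks the Python version, whether `torch` and `huggingface_hub` are importable, and whether the model file exists at the default path listed above. It does not verify CUDA availability or package versions.

```python
import importlib.util
import sys
from pathlib import Path
from typing import List

# Default location from this README; change it if you edited model.smpl.smpl_model_path.
DEFAULT_SMPLX_PATH = Path("smplx_models/smplx/SMPLX_NEUTRAL.npz")


def check_prerequisites(smplx_path: Path = DEFAULT_SMPLX_PATH) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    for module in ("torch", "huggingface_hub"):
        # find_spec only checks importability; it does not import the package.
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing Python package: {module}")
    if not Path(smplx_path).is_file():
        problems.append(f"SMPL-X model not found at {smplx_path}")
    return problems
```

Run it from the repository root, e.g. `print(check_prerequisites() or "basic checks passed")`.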
Pretrained checkpoint
checkpoints/smplx_anything_best.pt is a baseline trained on all four
sub-datasets for 212 epochs (val_loss = 0.2465). Load it for inference or
resume training from it:
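```python
# Run from the repo root with `src` on PYTHONPATH, or after `pip install -e .`.
import torch

from smplx_anything.config import load_config
from smplx_anything.models.model import SmplxAnythingModel

config = load_config("configs/server_8gpu_baseline.yaml")
model = SmplxAnythingModel(config)

# Model weights are stored under the "model" key. On PyTorch >= 2.6, torch.load
# defaults to weights_only=True; if loading fails on non-tensor objects, pass
# weights_only=False, but only for files you trust.
state = torch.load("checkpoints/smplx_anything_best.pt", map_location="cpu")
model.load_state_dict(state["model"])
model.eval()  # inference mode (disables dropout and similar training-only behavior)
```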
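The snippet above loads `state["model"]` directly. If you save your own checkpoints from a model wrapped in PyTorch's `DistributedDataParallel`, parameter names can carry the wrapper's `module.` prefix and `load_state_dict` will report missing and unexpected keys. A minimal sketch of a fix, assuming that is the only mismatch (`strip_ddp_prefix` is a hypothetical helper, not part of the repo):

```python
from typing import Any, Dict


def strip_ddp_prefix(state_dict: Dict[str, Any], prefix: str = "module.") -> Dict[str, Any]:
    """Drop one leading DDP prefix from each key; keys without it are kept as-is."""
    # str.removeprefix is available on Python >= 3.9 (this project requires >= 3.10).
    return {key.removeprefix(prefix): value for key, value in state_dict.items()}
```

Usage: `model.load_state_dict(strip_ddp_prefix(state["model"]))`.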
Training
One-line launch (8 GPUs, single node)
```bash
bash scripts/train_8gpu_worker.sh
```
The first run will:
- Download `Yong-Hoon/smplx_anything` (~280 GB) into `$SMPLX_DATA_ROOT` (default: `~/smplx_anything_datasets`).
- Verify SHA-256 checksums.
- Extract the four sub-datasets in place.
- Remove the tar parts to reclaim disk space (use `--keep-tar-parts` to retain them).
- Start training.
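How `hf_download` stores and checks its checksums is not described here. As a minimal sketch of the verification step, assuming a hypothetical manifest that maps each tar part's file name to its hex SHA-256 digest, the parts can be hashed in fixed-size chunks so that multi-gigabyte files never have to fit in memory:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in `chunk_size` pieces (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_parts(expected: Dict[str, str], root: Path) -> List[str]:
    """Return names of parts that are missing or whose digest differs from `expected`."""
    bad: List[str] = []
    for name, want in expected.items():
        part = root / name
        if not part.is_file() or sha256_of_file(part) != want.lower():
            bad.append(name)
    return bad
```

Checking every part before extraction means a single corrupted download is caught early instead of surfacing later as a truncated archive.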
To resume from the shipped checkpoint instead of starting from scratch:
```bash
RESUME=1 bash scripts/train_8gpu_worker.sh
```
To use a custom dataset location:
```bash
SMPLX_DATA_ROOT=/mnt/big-ssd/smplx bash scripts/train_8gpu_worker.sh
```
Pre-download data without training
```bash
python -m smplx_anything.data.hf_download \
  --root "$HOME/smplx_anything_datasets"
```
Subset selection is supported:
```bash
python -m smplx_anything.data.hf_download \
  --root "$HOME/smplx_anything_datasets" \
  --subsets processed_agora_resized processed_bedlam_resized
```
Custom torchrun launch
```bash
PYOPENGL_PLATFORM=egl PYTHONPATH=src torchrun \
  --nproc_per_node=8 scripts/train.py \
  --config configs/server_8gpu_baseline.yaml \
  --data-roots $SMPLX_DATA_ROOT/processed_bedlam_resized \
               $SMPLX_DATA_ROOT/processed_bedlam2_resized \
               $SMPLX_DATA_ROOT/processed_agora_resized \
               $SMPLX_DATA_ROOT/processed_annyone_resized \
  --single-frame-roots $SMPLX_DATA_ROOT/processed_agora_resized \
                       $SMPLX_DATA_ROOT/processed_annyone_resized \
  --output-dir runs_baseline
```
To skip auto-download (e.g., when the data is already on a mounted volume):

```bash
torchrun ... scripts/train.py --skip-data-download ...
```
Monitoring
```bash
tensorboard --logdir runs_baseline/tb_logs
```
Key training config
`configs/server_8gpu_baseline.yaml` (used for the shipped checkpoint):

| Parameter | Value | Description |
|---|---|---|
| `batch_size` | 32 | per GPU |
| `lr` | 2e-4 → 1e-6 | cosine annealing over 500 epochs |
| `clip_length` | 4 | temporal sequence length |
| `freeze_backbone` | true | DINOv3 stays frozen |
| `amp` | false | full fp32 |
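Assuming the scheduler follows the standard closed-form cosine annealing (as in PyTorch's `CosineAnnealingLR`), the learning rate at epoch e is lr_min + 0.5 · (lr_max - lr_min) · (1 + cos(π · e / T)) with lr_max = 2e-4, lr_min = 1e-6 and T = 500. Whether the repo steps the schedule per epoch or per iteration is not stated; the sketch below evaluates it per epoch and also records the global batch of 32 × 8 = 256 samples per optimizer step, assuming no gradient accumulation.

```python
import math

# Values from configs/server_8gpu_baseline.yaml as listed in the table above.
LR_MAX, LR_MIN, TOTAL_EPOCHS = 2e-4, 1e-6, 500
PER_GPU_BATCH, NUM_GPUS = 32, 8


def cosine_lr(epoch: float, total: int = TOTAL_EPOCHS,
              lr_max: float = LR_MAX, lr_min: float = LR_MIN) -> float:
    """Closed-form cosine annealing from lr_max (epoch 0) down to lr_min (epoch `total`)."""
    progress = min(max(epoch / total, 0.0), 1.0)  # clamp epochs outside [0, total]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Samples per optimizer step across the node, assuming no gradient accumulation.
GLOBAL_BATCH = PER_GPU_BATCH * NUM_GPUS  # 256
```

Halfway through the schedule (epoch 250) the rate sits at the midpoint of the two endpoints, about 1.0e-4.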
Inference
```bash
python scripts/demo_stream.py \
  --input video.mp4 \
  --config configs/server_8gpu_baseline.yaml \
  --checkpoint checkpoints/smplx_anything_best.pt
```
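`scripts/demo_stream.py` handles the full pipeline. If you consume the model outputs in your own code, persistent `smpl_id` values make it straightforward to build per-person trajectories. The sketch below assumes each frame's output has already been converted to plain Python lists keyed by the output names from the table above (`smpl_id`, `smpl_person_scores`, `smpl_transl`) and uses an arbitrary 0.5 confidence cutoff; `collect_tracks` is a hypothetical helper, and the real outputs may be tensors with a fixed number of query slots, so adapt the indexing accordingly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Transl = Tuple[float, float, float]


def collect_tracks(frames: Iterable[dict],
                   min_score: float = 0.5) -> Dict[int, List[Tuple[int, Transl]]]:
    """Group camera-space translations by persistent track ID across frames."""
    tracks: Dict[int, List[Tuple[int, Transl]]] = defaultdict(list)
    for frame_idx, out in enumerate(frames):
        for tid, score, transl in zip(out["smpl_id"], out["smpl_person_scores"], out["smpl_transl"]):
            if score >= min_score:  # drop low-confidence detections
                tracks[int(tid)].append((frame_idx, tuple(transl)))
    return dict(tracks)
```

Each track is a list of `(frame_index, translation)` pairs, so gaps where a person was occluded or below the cutoff stay visible as missing frame indices.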
Project layout
```text
smplx_anything_dinov3/
├── configs/                      # YAML training configs
├── scripts/
│   ├── train.py                  # training entry with DDP
│   ├── train_8gpu_worker.sh      # one-line 8-GPU launcher
│   ├── demo_stream.py            # video inference demo
│   └── preprocess_*.py           # dataset preprocessing utilities
├── src/smplx_anything/
│   ├── models/                   # backbone, neck, query, temporal, SMPL head
│   ├── data/
│   │   ├── bedlam_dataset.py     # multi-dataset loader
│   │   └── hf_download.py        # HF Hub auto-download helper
│   ├── runtime/                  # streaming inference state
│   ├── losses.py
│   └── visualization.py
├── datasets/                     # placeholder; populated on first run
├── checkpoints/
│   └── smplx_anything_best.pt    # pretrained weights (~2.3 GB, LFS)
├── tests/
├── train.py                      # thin wrapper around scripts/train.py
└── requirements.txt
```
Citation
```bibtex
@misc{smplx_anything_dinov3_2026,
  title  = {SMPLx Anything: Multi-Person Expressive 3D Body Estimation},
  author = {Yong-Hoon Kwon},
  year   = {2026},
  url    = {https://huggingface.co/Yong-Hoon/smplx_anything_dinov3}
}
```
License
Code: see LICENSE (TBD). Pretrained weights and the linked dataset bundle
inherit the licenses of the four underlying datasets (BEDLAM, BEDLAM 2.0,
AGORA, Anny-One). Please review and comply with each upstream license before
use.