SMPLx Anything (DINOv3)

Multi-Person Expressive 3D Body Estimation from Any Video.

SMPLx Anything is a streaming multi-person SMPL-X regression model that detects, tracks, and estimates expressive 3D human body parameters (body, hands, face) from video frames. The backbone is a frozen DINOv3 ViT-B/16.

This repository contains:

The full training/inference codebase (src/, scripts/, configs/).
A pretrained checkpoint (checkpoints/smplx_anything_best.pt).
An automatic data downloader that fetches the Yong-Hoon/smplx_anything dataset (~280 GB) on the first training run.

Highlights

Multi-person — query-based detection handles up to 8 people per frame.
Expressive — full SMPL-X estimation: body (22 joints) + hands (30) + face (3).
Streaming — temporal slot memory keeps persistent track IDs across frames.
Self-contained — bash scripts/train_8gpu_worker.sh downloads the data and starts training; set RESUME=1 to continue from the pretrained checkpoint.

Architecture

Video frame
    │
    ▼
DINOv3 ViT-B/16 (frozen)        ← multi-scale features [5, 7, 9, 11]
    │
    ▼
Feature neck (FPN, 768-d)
    │
    ▼
Query head                       ← heatmap-based person detection, ≤8 queries
    │
    ▼
Temporal memory                  ← cross-frame slot attention, persistent IDs
    │
    ▼
Cross-attention decoder          ← person queries × image feature map
    │
    ▼
SMPL-X head                      ← pose (6D) + shape + translation + confidence
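
The query head in the diagram above is described as heatmap-based with at most 8 queries. One common way to turn a dense heatmap into a small set of person locations is 3×3 local-maximum suppression followed by top-k selection. The sketch below illustrates that idea in NumPy; the function name, the 0.3 threshold, and the 3×3 window are assumptions for illustration and are not taken from this repository's code.

```python
import numpy as np

MAX_PEOPLE = 8  # query budget stated in the architecture diagram


def pick_person_peaks(heatmap, max_people=MAX_PEOPLE, threshold=0.3):
    """Return (row, col, score) for the strongest local maxima of a 2D float heatmap.

    Hypothetical helper: `threshold` and the 3x3 neighbourhood are assumptions.
    """
    h, w = heatmap.shape
    # Pad with -inf so border pixels are compared only against real neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views and reduce: this is a 3x3 max filter.
    neighbourhood = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    local_max = neighbourhood.max(axis=0)
    # A peak equals the maximum of its neighbourhood and clears the threshold.
    is_peak = (heatmap == local_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(is_peak)
    scores = heatmap[rows, cols]
    # Keep the highest-scoring peaks, at most `max_people` of them.
    order = np.argsort(-scores)[:max_people]
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]
```

Each returned location could then seed one person query; how the actual model builds queries from peaks is not specified here.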

Per-person outputs every frame:

Output	Description
`smpl_rotmat`	55 joint rotations (body 22 + hands 30 + face 3) as 3×3 matrices
`smpl_shape`	SMPL-X β parameters (10-d)
`smpl_transl`	Camera-space 3D translation
`smpl_scores`	Dense person-location heatmap
`smpl_person_scores`	Per-person confidence
`smpl_id`	Persistent track ID for streaming inference
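
The SMPL-X head predicts pose in a 6D representation and exposes it as 55 rotation matrices. A minimal NumPy sketch of the usual 6D to 3×3 conversion (Gram-Schmidt on two 3-vectors) and of splitting the 55 joints into named groups follows. Two things are assumptions here: that the two 3-vectors form the first two matrix columns, and that `smpl_rotmat` follows the joint order used by the `smplx` Python package (global orient, 21 body joints, jaw, two eyes, 15 joints per hand). Check both against the repository code before relying on them.

```python
import numpy as np


def rot6d_to_rotmat(x6d):
    """Map (..., 6) continuous rotation vectors to (..., 3, 3) rotation matrices."""
    a1, a2 = x6d[..., :3], x6d[..., 3:]
    b1 = a1 / np.linalg.norm(a1, axis=-1, keepdims=True)
    # Remove the b1 component from a2, then normalise (Gram-Schmidt step).
    a2_orth = a2 - np.sum(b1 * a2, axis=-1, keepdims=True) * b1
    b2 = a2_orth / np.linalg.norm(a2_orth, axis=-1, keepdims=True)
    b3 = np.cross(b1, b2)
    # Assumption: b1, b2, b3 are the matrix columns.
    return np.stack([b1, b2, b3], axis=-1)


# Assumed joint layout (order of the `smplx` package): name -> slice into 55 joints.
SMPLX_GROUPS = {
    "global_orient": slice(0, 1),
    "body_pose": slice(1, 22),
    "jaw_pose": slice(22, 23),
    "leye_pose": slice(23, 24),
    "reye_pose": slice(24, 25),
    "left_hand_pose": slice(25, 40),
    "right_hand_pose": slice(40, 55),
}


def split_smplx_rotmat(rotmat):
    """Split a (55, 3, 3) array into named joint groups under the assumed order."""
    if rotmat.shape[-3:] != (55, 3, 3):
        raise ValueError(f"expected (..., 55, 3, 3), got {rotmat.shape}")
    return {name: rotmat[..., sl, :, :] for name, sl in SMPLX_GROUPS.items()}
```

The group sizes add up to the counts in the table: 1 + 21 body, 3 face, and 30 hand joints.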

Installation

# Clone this repo
git clone https://huggingface.co/Yong-Hoon/smplx_anything_dinov3
cd smplx_anything_dinov3

# PyTorch (pick the CUDA version that matches your driver)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# Python deps
pip install -r requirements.txt
# or, for editable installation
pip install -e .

Prerequisites

Python ≥ 3.10
PyTorch ≥ 2.2 with CUDA
huggingface_hub ≥ 1.0 (for auto-download; included in requirements.txt)
SMPL-X neutral model file SMPLX_NEUTRAL.npz. Because of licensing it must be downloaded separately from smpl-x.is.tue.mpg.de. Place it at smplx_models/smplx/SMPLX_NEUTRAL.npz, or set model.smpl.smpl_model_path in your config to its location.
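
Since the SMPL-X file is not shipped with the repository, a quick pre-flight check can save a failed launch. The helper below is a hypothetical convenience, not part of the codebase; it only tests that the file exists at the default path named above, or at a path you pass in.

```python
from pathlib import Path

# Default location named in the prerequisites.
DEFAULT_SMPLX_PATH = Path("smplx_models/smplx/SMPLX_NEUTRAL.npz")


def check_smplx_model(path=DEFAULT_SMPLX_PATH):
    """Return the resolved path to the SMPL-X model file, or raise if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"SMPL-X model not found at {path}. Download SMPLX_NEUTRAL.npz "
            "from smpl-x.is.tue.mpg.de and place it there, or update "
            "model.smpl.smpl_model_path in your config."
        )
    return path.resolve()
```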

Pretrained checkpoint

checkpoints/smplx_anything_best.pt is a baseline trained on all four sub-datasets for 212 epochs (val_loss = 0.2465). Load it for inference or resume training from it:

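# Requires the package on the import path (pip install -e . or PYTHONPATH=src).
import torch
from smplx_anything.models.model import SmplxAnythingModel
from smplx_anything.config import load_config

# Build the model from the same config that produced the checkpoint.
config = load_config("configs/server_8gpu_baseline.yaml")
model = SmplxAnythingModel(config)

# Load on CPU first; the weights are stored under the "model" key.
state = torch.load("checkpoints/smplx_anything_best.pt", map_location="cpu")
model.load_state_dict(state["model"])
model.eval()  # switch to inference mode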

Training

One-line launch (8 GPUs, single node)

bash scripts/train_8gpu_worker.sh

The first run will:

Download Yong-Hoon/smplx_anything (~280 GB) into $SMPLX_DATA_ROOT (default: `~/smplx_anything_datasets`).
Verify SHA-256 checksums.
Extract the four sub-datasets in place.
Remove the tar parts to reclaim disk space (pass --keep-tar-parts to retain them).
Start training.
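
The checksum step above can be reproduced by hand when a download looks suspect. The sketch below streams each file through SHA-256 and compares it with an expected digest. The manifest format (a dict of relative path to hex digest) and both function names are assumptions; the repository's downloader may store and check its checksums differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose digest does not match.

    `manifest` is a hypothetical {relative_path: expected_hex_digest} mapping.
    """
    root = Path(root)
    bad = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            bad.append(rel_path)
    return bad
```

An empty return value means every listed file is present and intact.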

To resume from the shipped checkpoint instead of starting from scratch:

RESUME=1 bash scripts/train_8gpu_worker.sh

To use a custom dataset location:

SMPLX_DATA_ROOT=/mnt/big-ssd/smplx bash scripts/train_8gpu_worker.sh
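
Before pointing SMPLX_DATA_ROOT at a volume, it may be worth confirming that the volume has room for the roughly 280 GB download. The sketch below uses the standard library only. The 2× headroom factor is an assumption, chosen because tar parts and extracted files can coexist on disk until the parts are removed; the true peak usage is not documented here.

```python
import shutil

DATASET_GB = 280  # approximate download size quoted in this README


def free_gb(path):
    """Free space on the filesystem containing `path`, in gigabytes (1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb=DATASET_GB, headroom=2.0):
    """True if `path` has at least `required_gb * headroom` GB free.

    `headroom` is an assumed safety factor, not a measured requirement.
    """
    return free_gb(path) >= required_gb * headroom
```

For example, `has_room("/mnt/big-ssd")` checks the volume used in the command above; the path must already exist.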

Pre-download data without training

python -m smplx_anything.data.hf_download \
    --root "$HOME/smplx_anything_datasets"

Subset selection is supported:

python -m smplx_anything.data.hf_download \
    --root "$HOME/smplx_anything_datasets" \
    --subsets processed_agora_resized processed_bedlam_resized

Custom torchrun launch

PYOPENGL_PLATFORM=egl PYTHONPATH=src torchrun \
    --nproc_per_node=8 scripts/train.py \
    --config configs/server_8gpu_baseline.yaml \
    --data-roots $SMPLX_DATA_ROOT/processed_bedlam_resized \
                 $SMPLX_DATA_ROOT/processed_bedlam2_resized \
                 $SMPLX_DATA_ROOT/processed_agora_resized \
                 $SMPLX_DATA_ROOT/processed_annyone_resized \
    --single-frame-roots $SMPLX_DATA_ROOT/processed_agora_resized \
                         $SMPLX_DATA_ROOT/processed_annyone_resized \
    --output-dir runs_baseline

To skip auto-download (e.g. data is already on a mounted volume):

torchrun ... scripts/train.py --skip-data-download ...

Monitoring

tensorboard --logdir runs_baseline/tb_logs

Key training config

configs/server_8gpu_baseline.yaml — used for the shipped checkpoint.

Parameter	Value	Description
`batch_size`	32	per-GPU
`lr`	2e-4 → 1e-6	cosine annealing over 500 epochs
`clip_length`	4	temporal sequence length
`freeze_backbone`	true	DINOv3 stays frozen
`amp`	false	full fp32

Inference

python scripts/demo_stream.py \
    --input video.mp4 \
    --config configs/server_8gpu_baseline.yaml \
    --checkpoint checkpoints/smplx_anything_best.pt
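
Because `smpl_id` stays stable across frames, downstream code can group per-frame results into per-person trajectories. The sketch below shows one way to do that. The input layout (a list of per-frame dicts with parallel `smpl_id` and `smpl_transl` sequences) is an assumption for illustration; the actual output structure of scripts/demo_stream.py is not documented here.

```python
from collections import defaultdict


def collect_tracks(frames):
    """Group camera-space translations by persistent track ID.

    `frames` is a hypothetical list of dicts, one per video frame, each holding
    parallel sequences under "smpl_id" and "smpl_transl".
    Returns {track_id: [(frame_index, (x, y, z)), ...]}.
    """
    tracks = defaultdict(list)
    for frame_idx, frame in enumerate(frames):
        for person_id, transl in zip(frame["smpl_id"], frame["smpl_transl"]):
            xyz = tuple(float(v) for v in transl)
            tracks[int(person_id)].append((frame_idx, xyz))
    return dict(tracks)
```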

Project layout

smplx_anything_dinov3/
├── configs/                       # YAML training configs
├── scripts/
│   ├── train.py                   # training entry with DDP
│   ├── train_8gpu_worker.sh       # one-line 8-GPU launcher
│   ├── demo_stream.py             # video inference demo
│   └── preprocess_*.py            # dataset preprocessing utilities
├── src/smplx_anything/
│   ├── models/                    # backbone, neck, query, temporal, SMPL head
│   ├── data/
│   │   ├── bedlam_dataset.py      # multi-dataset loader
│   │   └── hf_download.py         # HF Hub auto-download helper
│   ├── runtime/                   # streaming inference state
│   ├── losses.py
│   └── visualization.py
├── datasets/                      # placeholder; populated on first run
├── checkpoints/
│   └── smplx_anything_best.pt     # pretrained weights (~2.3 GB, LFS)
├── tests/
├── train.py                       # thin wrapper around scripts/train.py
└── requirements.txt

Citation

@misc{smplx_anything_dinov3_2026,
  title  = {SMPLx Anything: Multi-Person Expressive 3D Body Estimation},
  author = {Yong-Hoon Kwon},
  year   = {2026},
  url    = {https://huggingface.co/Yong-Hoon/smplx_anything_dinov3}
}

License

Code: see LICENSE (TBD). Pretrained weights and the linked dataset bundle inherit the licenses of the four underlying datasets (BEDLAM, BEDLAM 2.0, AGORA, Anny-One). Please review and comply with each upstream license before use.
