HY-Motion T2M 1.0

Tencent Hunyuan's open-source text-to-motion model, integrated into the hftrainer Model Zoo as a first-class HyMotionT2MBundle / HyMotionT2MPipeline.

The release ships two variants that share representation, text encoder, training recipe, and inference protocol, and differ only in MMDiT size.

| | |
| --- | --- |
| Task | Text-to-Motion (T2M) |
| Bundle / Pipeline | HyMotionT2MBundle / HyMotionT2MPipeline |
| Processed HF artifact | full, lite |
| Architecture | HunyuanMotionMMDiT flow matching, Euler ODE, 50 steps, CFG scale 5.0 |
| Native representation | 201-dim HY-Motion feature at 30 fps; hftrainer renders/scores from the motion_135 slice |
| Text encoder | Qwen3-8B token context + CLIP-L sentence embedding, frozen and stored in the hftrainer artifact |
| Original weights | tencent/HY-Motion-1.0, mirrored locally under checkpoints/HY-Motion-1.0/ |
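
As a rough illustration of the sampling settings listed above (flow matching, Euler ODE, 50 steps, CFG scale 5.0), the sketch below integrates a velocity field with classifier-free guidance. It is not the hftrainer implementation. The names velocity_fn, guided_velocity, euler_sample, and toy_velocity are hypothetical, the time axis is assumed to run from 0 (noise) to 1 (data), and the guidance rule is the common v_null + scale * (v_cond - v_null) form, which may differ from what HY-Motion does internally.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical signature: velocity_fn(x, t, cond) -> dx/dt.
# cond=None stands in for the null (unconditional) embedding.
VelocityFn = Callable[[Vector, float, Optional[str]], Vector]


def guided_velocity(velocity_fn: VelocityFn, x: Vector, t: float,
                    cond: Optional[str], scale: float = 5.0) -> Vector:
    """Blend conditional and unconditional velocities (assumed CFG rule)."""
    v_cond = velocity_fn(x, t, cond)
    v_null = velocity_fn(x, t, None)
    return [vn + scale * (vc - vn) for vc, vn in zip(v_cond, v_null)]


def euler_sample(velocity_fn: VelocityFn, x0: Vector, cond: Optional[str],
                 num_steps: int = 50, scale: float = 5.0) -> Vector:
    """Fixed-step Euler integration from t=0 to t=1 (assumed direction)."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = guided_velocity(velocity_fn, x, t, cond, scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, cond: Optional[str]) -> Vector:
    # Toy field for demonstration only: constant when conditioned, zero otherwise.
    return [1.0, 2.0] if cond is not None else [0.0, 0.0]


sample = euler_sample(toy_velocity, [0.0, 0.0], "a person walks forward.")
```

With the toy field, guidance at scale 5.0 multiplies the conditional displacement by five; at scale 1.0 the result is the plain conditional trajectory.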

Weights

Self-contained hftrainer artifacts are stored locally and reload with HyMotionT2MBundle.from_pretrained. They include the motion transformer, classifier-free null embeddings, 201-dim Mean/Std stats, and frozen Qwen3-8B / CLIP-L text encoder directories. The artifact writes model_index.json metadata for hftrainer discovery, but it is not a native diffusers DiffusionPipeline repo.

| Variant | Local artifact | Processed Hugging Face artifact | Contents |
| --- | --- | --- | --- |
| HY-Motion T2M 1.0 | checkpoints/hymotion_t2m/1.0b | ZeyuLing/hftrainer-hymotion-t2m-1.0 | motion_transformer.safetensors, hymotion_t2m_config.json, model_index.json, Mean.npy, Std.npy, text_encoder/llm/, text_encoder/sentence/ |
| HY-Motion T2M 1.0-Lite | checkpoints/hymotion_t2m/0.46b | ZeyuLing/hftrainer-hymotion-t2m-1.0-lite | same layout |
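
Before loading, it can be handy to confirm that a local artifact directory has the layout listed above. The following is a minimal sketch using only the standard library; the helper name missing_artifact_entries is hypothetical and not part of hftrainer, and the check looks only at file and directory presence, not at contents.

```python
from pathlib import Path
from typing import List

# Entries taken from the Contents column above; a trailing "/" marks a directory.
EXPECTED_ENTRIES = [
    "motion_transformer.safetensors",
    "hymotion_t2m_config.json",
    "model_index.json",
    "Mean.npy",
    "Std.npy",
    "text_encoder/llm/",
    "text_encoder/sentence/",
]


def missing_artifact_entries(artifact_dir: str) -> List[str]:
    """Return the expected entries that are absent from artifact_dir."""
    root = Path(artifact_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

For example, missing_artifact_entries("checkpoints/hymotion_t2m/1.0b") returns an empty list when every listed file and directory is present.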

Use a local artifact:

from hftrainer.pipelines.motion.hymotion_t2m_pipeline import HyMotionT2MPipeline

pipe = HyMotionT2MPipeline.from_pretrained(
    "checkpoints/hymotion_t2m/1.0b",
    device="cuda",
    text_dtype="bf16",
    num_steps=50,
    text_guidance_scale=5.0,
    should_apply_smoothing=True,
)
out = pipe({"caption": ["a person walks forward."], "num_frames": [196]})
rot6d = out["rot6d"]      # (B, T, 22, 6)
transl = out["transl"]    # (B, T, 3)

Use the processed Hugging Face artifact:

from hftrainer.pipelines.motion.hymotion_t2m_pipeline import HyMotionT2MPipeline

pipe = HyMotionT2MPipeline.from_pretrained(
    "ZeyuLing/hftrainer-hymotion-t2m-1.0",
    device="cuda",
    text_dtype="bf16",
    num_steps=50,
    text_guidance_scale=5.0,
    should_apply_smoothing=True,
)
out = pipe({"caption": ["a person walks forward."], "num_frames": [196]})
rot6d = out["rot6d"]      # (B, T, 22, 6)
transl = out["transl"]    # (B, T, 3)
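
The rot6d output stores six numbers per joint. A common way to turn such a 6D rotation into an orthonormal frame is Gram-Schmidt orthogonalization followed by a cross product, sketched below in plain Python for a single joint. The helper rot6d_to_basis is hypothetical, and whether the three returned vectors are the rows or the columns of the rotation matrix depends on the convention hftrainer uses, which is not stated here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    norm = math.sqrt(sum(c * c for c in v))
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rot6d_to_basis(r6: Sequence[float]) -> List[Vec3]:
    """Gram-Schmidt on the two 3-vectors in r6, plus their cross product.

    Assumes r6 = (a1, a2) with a1 = r6[0:3] and a2 = r6[3:6].
    """
    a1, a2 = r6[0:3], r6[3:6]
    b1 = _normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    # Remove the component of a2 along b1, then normalize.
    b2 = _normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)
    return [b1, b2, b3]
```

Applied per frame and per joint, this maps one (22, 6) slice of out["rot6d"] to 22 orthonormal frames.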

The same config can be reconstructed without loading weights:

# HyMotionT2MBundle must be imported first; its module path is not shown in this README.
cfg_bundle = HyMotionT2MBundle.from_config(
    "checkpoints/hymotion_t2m/1.0b/hymotion_t2m_config.json"
)
assert not cfg_bundle.text_encoder_requires_external_weights()

Artifacts are produced with:

python3 scripts/eval/convert_hymotion_checkpoint.py \
    --out_dir checkpoints/hymotion_t2m/1.0b --variant 1.0b --verify

python3 scripts/eval/convert_hymotion_checkpoint.py \
    --config configs/hymotion_t2m/hymotion_t2m_201dim_046b.py \
    --ckpt checkpoints/HY-Motion-1.0/HY-Motion-1.0-Lite/latest.ckpt \
    --out_dir checkpoints/hymotion_t2m/0.46b --variant 0.46b --verify

Variants

| | HY-Motion T2M 1.0 | HY-Motion T2M 1.0-Lite |
| --- | --- | --- |
| feat_dim | 1280 | 1024 |
| num_layers | 27 | 18 |
| num_heads | 20 | 16 |
| input_dim / output_dim | 201 / 201 | 201 / 201 |
| config | configs/hymotion_t2m/hymotion_t2m_201dim_full.py | configs/hymotion_t2m/hymotion_t2m_201dim_046b.py |
| upstream checkpoint | checkpoints/HY-Motion-1.0/HY-Motion-1.0/latest.ckpt | checkpoints/HY-Motion-1.0/HY-Motion-1.0-Lite/latest.ckpt |
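
The numbers above can be captured in a small record for quick sanity checks. This is an illustrative sketch only: MMDiTVariant is a hypothetical class, not an hftrainer type, and the head_dim property assumes feat_dim is split evenly across attention heads, which this document does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MMDiTVariant:
    name: str
    feat_dim: int
    num_layers: int
    num_heads: int
    io_dim: int = 201  # input_dim and output_dim are both 201 for each variant

    @property
    def head_dim(self) -> int:
        # Assumption: feat_dim is divided evenly across heads.
        if self.feat_dim % self.num_heads != 0:
            raise ValueError("feat_dim is not divisible by num_heads")
        return self.feat_dim // self.num_heads


FULL = MMDiTVariant("HY-Motion T2M 1.0", feat_dim=1280, num_layers=27, num_heads=20)
LITE = MMDiTVariant("HY-Motion T2M 1.0-Lite", feat_dim=1024, num_layers=18, num_heads=16)
```

Under that assumption, both variants work out to the same per-head width (1280 / 20 and 1024 / 16 are both 64), so the size difference comes from depth and overall width.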

Evaluation Protocol

Published Model-Zoo metrics use the official HY-Motion inference path:

  • CFG scale 5.0
  • 50 Euler ODE steps
  • MMDiT / ODE / null embeddings / Mean-Std in fp32
  • text encoder in bf16, with text features upcast to fp32 before MMDiT
  • decode smoothing enabled: SLERP on rot6d and Savitzky-Golay on root translation
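
To give a feel for the Savitzky-Golay step on root translation, here is a minimal pure-Python sketch. It uses the standard 5-point quadratic kernel (-3, 12, 17, 12, -3) / 35 and leaves the first and last two frames untouched. The window length, polynomial order, and edge handling used by HY-Motion are not stated here, so all three are assumptions, and the SLERP step on rot6d is not covered. The function names are hypothetical.

```python
from typing import List, Sequence

# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3.0 / 35.0, 12.0 / 35.0, 17.0 / 35.0, 12.0 / 35.0, -3.0 / 35.0)


def smooth_channel(values: Sequence[float]) -> List[float]:
    """Smooth one coordinate over time; edge frames are copied unchanged."""
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window))
    return out


def smooth_translation(transl: Sequence[Sequence[float]]) -> List[List[float]]:
    """transl holds T frames of (x, y, z); each axis is smoothed independently."""
    channels = [smooth_channel([frame[k] for frame in transl]) for k in range(3)]
    return [list(xyz) for xyz in zip(*channels)]
```

A quadratic filter of this kind reproduces constant, linear, and quadratic trajectories exactly at interior frames, so it mainly attenuates frame-to-frame jitter rather than changing the overall path.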

HY-Motion outputs SMPL motion_135; for MotionStreamer comparison it is encoded to MotionStreamer-272 and scored with MotionStreamer272Evaluator. HumanML3D-263 cross-eval converts the same indexed MS272 predictions and paired MS272 GT clips through motion272_to_hml263, then scores with HumanML263Evaluator.

MotionStreamer-272 Evaluator

Results for HY-Motion T2M 1.0 with decode smoothing enabled, on the full HumanML3D test split. Metrics file: outputs/evaluation/hymotion_h3d272/metrics_smooth.json.

| Metric | HY-Motion T2M 1.0 | MS272 GT/Real |
| --- | --- | --- |
| FID ↓ | 16.021 | 0.000 |
| R-Precision Top-1 ↑ | 0.737 | 0.706 |
| R-Precision Top-2 ↑ | 0.881 | 0.857 |
| R-Precision Top-3 ↑ | 0.929 | 0.911 |
| MM-Dist ↓ | 14.789 | 15.007 |
| Diversity → | 27.187 | 27.367 |

HY-Motion T2M 1.0-Lite MS272 metrics are pending; the expected results file is outputs/evaluation/hymotion_h3d272/metrics_lite_smooth.json.

HumanML3D-263 Cross-Eval

Cross-eval results for HY-Motion T2M 1.0 with decode smoothing enabled. Metrics file: outputs/evaluation/hymotion_h3d272/metrics_smooth_h3d263.json.

This is not a native HumanML3D-263 generation run. It converts the indexed MotionStreamer-272 predictions and their paired MS272 GT clips to HML263 and scores the aligned population.

| Metric | HY-Motion T2M 1.0 | Converted GT/Real |
| --- | --- | --- |
| FID ↓ | 0.103 | 0.000 |
| R-Precision Top-1 ↑ | 0.561 | 0.522 |
| R-Precision Top-2 ↑ | 0.761 | 0.725 |
| R-Precision Top-3 ↑ | 0.853 | 0.823 |
| MM-Dist ↓ | 2.532 | 2.691 |
| Diversity → | 10.031 | 9.876 |

Run details: n_samples = 7340, n_repeats = 20, caption_selection = first, drop_last = true.
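
For readers unfamiliar with the R-Precision rows, the ranking step can be sketched as follows. Given a matrix of text-to-motion distances in which entry [i][i] is the true pair, a query counts as a Top-k hit when fewer than k candidates are strictly closer than its true match. This is a generic illustration with a hypothetical r_precision helper; the pool size, embedding model, and tie handling of MotionStreamer272Evaluator and HumanML263Evaluator are not described here and may differ.

```python
from typing import List, Sequence


def r_precision(dist: Sequence[Sequence[float]], top_k: int = 3) -> List[float]:
    """Return [Top-1, ..., Top-k] retrieval accuracy for a square distance matrix.

    dist[i][j] is the distance between text i and motion j; motion i is the
    ground-truth match for text i. Ties are resolved in favor of the match.
    """
    n = len(dist)
    hits = [0] * top_k
    for i, row in enumerate(dist):
        # Rank of the true match = number of candidates strictly closer.
        rank = sum(1 for d in row if d < row[i])
        for k in range(rank, top_k):
            hits[k] += 1
    return [h / n for h in hits]


# Toy 3x3 example: only the first text retrieves its own motion first.
toy = [
    [0.1, 0.5, 0.9],
    [0.2, 0.3, 0.1],
    [0.4, 0.2, 0.6],
]
scores = r_precision(toy)
```

In the toy matrix, the second and third queries rank their true motion last of three, so they count only toward Top-3.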

HY-Motion T2M 1.0-Lite HML263 metrics are pending; the expected results file is outputs/evaluation/hymotion_h3d272/metrics_lite_smooth_h3d263.json.

Reproduce the full-variant cross-eval:

python3 scripts/eval/eval_272dir_h3d263.py \
    --pred_dir outputs/evaluation/hymotion_h3d272/hy_272_smooth \
    --out_json outputs/evaluation/hymotion_h3d272/metrics_smooth_h3d263.json \
    --with_fid --workers 16 --caption_selection first

Implementation Notes

  • HyMotionT2MBundle.save_pretrained writes a self-contained hftrainer artifact with motion_transformer.safetensors, null CFG embeddings, Mean/Std, text_encoder/llm/, text_encoder/sentence/, and model_index.json metadata. Passing include_text_encoder=False is a legacy lightweight export mode and is not used for Model-Zoo publishing.
  • HyMotionT2MBundle.from_pretrained accepts a local path or HF Hub id and keeps text encoder loading lazy; new artifacts resolve Qwen3-8B and CLIP-L from the artifact-local text_encoder/ directories.
  • HyMotionT2MBundle.from_config accepts either the raw bundle config or the saved hymotion_t2m_config.json, matching the hftrainer ModelBundle API.
  • Raw/no-smoothing outputs are diagnostic only. The previous bf16-ODE / wrong-CFG metrics are deprecated and must not be used for Model-Zoo reporting.
  • The h3d272 suffix in scripts and output paths is historical; the evaluator space is MotionStreamer-272 on the HumanML3D test split.