
A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

  • Yujun Cai
  • Yiwei Wang
  • Yiheng Zhu
  • Tat Jen Cham
  • Jianfei Cai
  • Junsong Yuan
  • Jun Liu
  • Chuanxia Zheng
  • Sijie Yan
  • Henghui Ding
  • Xiaohui Shen
  • Ding Liu
  • Nadia Magnenat Thalmann
  • Nanyang Technological University
  • National University of Singapore
  • ByteDance Research
  • Monash University
  • SUTD
  • Chinese University of Hong Kong
  • University of Geneva

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

67 Scopus citations

Abstract

We present a unified and flexible framework to address the generalized problem of 3D motion synthesis, which covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and differing fidelity and diversity requirements, most existing approaches either cater to a single task or use different architectures for different tasks. Here we propose a unified framework based on a Conditional Variational Auto-Encoder (CVAE), in which any arbitrary input is treated as a masked motion series. Notably, by casting the problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which we sample to synthesize the full motion series. To further allow the motion style of the generated series to be manipulated, we design an Action-Adaptive Modulation (AAM) that propagates the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism that exploits distant relations between decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted to the given action labels.
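The abstract's central idea, casting every task as a masked motion series and filling the missing entries by sampling from a conditional latent distribution, can be illustrated with a small sketch. This is not the authors' implementation. The mask layouts for each task, the toy conditional prior (`toy_prior`), the toy decoder inside `synthesize`, and the FiLM-style scale/shift used as a stand-in for Action-Adaptive Modulation are all hypothetical; the learned encoder, decoder, and cross-attention are replaced by trivial placeholders so that only the data flow remains.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]                              # T frames x D pose features
FiLMTable = Dict[str, Tuple[List[float], List[float]]]  # action -> (gamma, beta)


def make_mask(T: int, D: int, task: str, n_obs: int = 1,
              drop_p: float = 0.3, seed: int = 0) -> List[List[int]]:
    """Hypothetical task masks: 1 = observed input, 0 = region to synthesize."""
    if task == "prediction":      # observe the first n_obs frames, predict the rest
        return [[1 if t < n_obs else 0] * D for t in range(T)]
    if task == "interpolation":   # observe only the first and last frames
        return [[1 if t in (0, T - 1) else 0] * D for t in range(T)]
    if task == "completion":      # hide a contiguous block of frames in the middle
        lo, hi = T // 3, 2 * T // 3
        return [[0 if lo <= t < hi else 1] * D for t in range(T)]
    if task == "recovery":        # drop individual entries across space and time
        rng = random.Random(seed)
        return [[0 if rng.random() < drop_p else 1 for _ in range(D)] for _ in range(T)]
    raise ValueError(f"unknown task: {task}")


def toy_prior(x: Matrix, mask: List[List[int]],
              latent_dim: int) -> Tuple[List[float], List[float]]:
    """Toy conditional prior (assumption): a stand-in for a learned encoder.
    Conditions only on observed entries and returns (mu, logvar)."""
    obs = [xi for frame, m in zip(x, mask) for xi, mi in zip(frame, m) if mi]
    mean = sum(obs) / len(obs) if obs else 0.0
    return [mean] * latent_dim, [0.0] * latent_dim


def sample_latent(mu: List[float], logvar: List[float],
                  rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def modulate(h: List[float], action: str, film: FiLMTable) -> List[float]:
    """FiLM-style per-feature scale and shift chosen by the action label
    (an assumed stand-in for Action-Adaptive Modulation)."""
    gamma, beta = film[action]
    return [g * v + b for g, v, b in zip(gamma, h, beta)]


def synthesize(x: Matrix, mask: List[List[int]], z: List[float],
               action: str, film: FiLMTable) -> Matrix:
    """Keep observed entries; fill masked entries from a toy decoder output."""
    out = []
    for t, (frame, m) in enumerate(zip(x, mask)):
        # Toy decoder: a real model would use a learned network with cross-attention.
        h = [z[d % len(z)] + 0.1 * t for d in range(len(frame))]
        h = modulate(h, action, film)
        out.append([xi if mi else gen for xi, mi, gen in zip(frame, m, h)])
    return out


if __name__ == "__main__":
    T, D = 6, 3
    x = [[float(t + d) for d in range(D)] for t in range(T)]
    film = {"walking": ([1.0] * D, [0.0] * D), "jumping": ([1.5] * D, [0.2] * D)}
    mask = make_mask(T, D, "prediction", n_obs=2)
    mu, logvar = toy_prior(x, mask, latent_dim=4)
    z = sample_latent(mu, logvar, random.Random(42))
    print(synthesize(x, mask, z, "jumping", film))
```

The point of the design is that switching tasks only changes the mask passed to `make_mask`, while the sampling and synthesis path stays the same; drawing a fresh `z` yields a different plausible completion of the same masked input, and swapping the action key changes the modulation applied to the generated entries.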

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11625-11635
Number of pages: 11
ISBN (Electronic): 9781665428125
State: Published - 2021
Event: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: Oct 11 2021 to Oct 17 2021

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print): 1550-5499

Conference

Conference: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/Territory: Canada
City: Virtual, Online
Period: 10/11/21 to 10/17/21
