DetPoseNet: Improving Multi-Person Pose Estimation via Coarse-Pose Filtering

  • Lipeng Ke
  • Ming Ching Chang
  • Honggang Qi
  • Siwei Lyu
  • SUNY Buffalo
  • University at Albany, SUNY
  • University of Chinese Academy of Sciences

Research output: Contribution to journal, Article, peer-review

35 Scopus citations

Abstract

Human detection and pose estimation are essential for understanding human activities in images and videos. Mainstream multi-human pose estimation methods take a top-down approach: human detection is performed first, and each detected person bounding box is then fed into a pose estimation network. This top-down approach suffers from early commitment to the initial detections in crowded scenes and in other cases with ambiguities or occlusions, leading to pose estimation failures. We propose DetPoseNet, an end-to-end multi-human detection and pose estimation framework in a unified three-stage network. Our method consists of a coarse-pose proposal extraction sub-net, a coarse-pose based proposal filtering module, and a multi-scale pose refinement sub-net. The coarse-pose proposal sub-net extracts whole-body bounding boxes and body keypoint proposals in a single shot. The coarse-pose filtering step, based on the person and keypoint proposals, can effectively rule out unlikely detections, thus improving subsequent processing. The pose refinement sub-net performs cascaded pose estimation on each refined proposal region. Multi-scale supervision and multi-scale regression are used together in the pose refinement sub-net to strengthen context feature learning. Structure-aware loss and keypoint masking are applied to further improve the robustness of pose refinement. Our framework is flexible: most existing top-down pose estimators can serve as the pose refinement sub-net. Experiments on the COCO and OCHuman datasets demonstrate the effectiveness of the proposed framework. The proposed method is computationally efficient (5-6x speedup), estimating multi-person poses with refined bounding boxes in under a second.
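
The abstract does not state the rule used by the coarse-pose filtering module, so the following is only an illustrative sketch of the general idea: a person proposal is kept when enough confident keypoint proposals fall inside its bounding box. The function names, the (x, y, confidence) keypoint layout, and both thresholds are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Keypoint = Tuple[float, float, float]    # (x, y, confidence)


def count_keypoints_in_box(box: Box, keypoints: Sequence[Keypoint],
                           min_conf: float) -> int:
    """Count keypoint proposals that are confident enough and lie inside the box."""
    x_min, y_min, x_max, y_max = box
    return sum(
        1
        for x, y, conf in keypoints
        if conf >= min_conf and x_min <= x <= x_max and y_min <= y <= y_max
    )


def filter_proposals(boxes: Sequence[Box], keypoints: Sequence[Keypoint],
                     min_keypoints: int = 3, min_conf: float = 0.5) -> List[Box]:
    """Keep boxes supported by at least `min_keypoints` confident keypoints.

    Hypothetical rule for illustration only; the thresholds are arbitrary
    and the paper's actual filtering criterion may differ.
    """
    return [
        box for box in boxes
        if count_keypoints_in_box(box, keypoints, min_conf) >= min_keypoints
    ]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    keypoints = [(1, 1, 0.9), (5, 5, 0.8), (9, 9, 0.7),
                 (25, 25, 0.9), (26, 26, 0.2)]
    # The second box has only one confident keypoint, so it is filtered out.
    print(filter_proposals(boxes, keypoints))
```

In this toy setting the surviving boxes would then be passed on to whichever top-down pose estimator plays the role of the refinement sub-net.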

Original language: English
Pages (from-to): 2782-2795
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 31
State: Published - 2022

Keywords

  • bi-directional refinement
  • coarse-pose filtering
  • COCO
  • DetPoseNet
  • Human detection
  • human pose estimation
  • keypoint masking
  • multi-scale learning
  • multi-stage joint learning
  • structure-aware loss
  • top-down
  • unified network
