Abstract
We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local feature-based or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoint-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairs of components. The resulting descriptive middle-level components and pairwise components thereby capture the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework to two popular and challenging video datasets: the Weizmann dataset and the UT-Interaction dataset. We demonstrate experimentally that our middle-level representation, combined with a χ²-SVM classifier, matches or outperforms the state-of-the-art results on these datasets.
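As an illustration of the classification step only, the following is a minimal sketch of an exponential χ² kernel computed between histogram-style video descriptors. The abstract does not specify which χ² kernel variant, normalization, or parameters are used, so the function name `chi2_kernel`, the `gamma` parameter, and the toy histograms are assumptions made for this example and not the authors' implementation.

```python
import numpy as np

def chi2_kernel(X, Y, gamma=1.0, eps=1e-12):
    """Exponential chi-squared kernel between the rows of X and Y.

    Assumes each row is a non-negative histogram (hypothetical descriptor).
    k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None, :] - Y[None, :, :]
    total = X[:, None, :] + Y[None, :, :]
    # Bins that are empty in both histograms contribute zero distance.
    terms = np.where(total > eps, diff ** 2 / np.maximum(total, eps), 0.0)
    dist = terms.sum(axis=2)
    return np.exp(-gamma * dist)

# Hypothetical toy descriptors: three videos, three histogram bins each.
hists = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.5, 0.0],
                  [0.0, 0.0, 1.0]])
K = chi2_kernel(hists, hists)
print(K)
```

A Gram matrix of this kind could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; whether the original experiments were set up this way is not stated in the abstract.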
| Original language | English |
|---|---|
| Pages (from-to) | 168-180 |
| Number of pages | 13 |
| Journal | Lecture Notes in Computer Science |
| Volume | 6553 LNCS |
| Issue number | PART 1 |
| DOIs | |
| State | Published - 2012 |
| Event | 11th European Conference on Computer Vision, ECCV 2010 - Heraklion, Crete, Greece; Duration: Sep 10 2010 → Sep 11 2010 |
| Title | Middle-level representation for human activities recognition: The role of spatio-temporal relationships |