Skip to main navigation Skip to search Skip to main content

Content extraction from lecture video via speaker action classification based on pose information

  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Scopus citations

Abstract

Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.

Original languageEnglish
Title of host publicationProceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
PublisherIEEE Computer Society
Pages1047-1054
Number of pages8
ISBN (Electronic)9781728128610
DOIs
StatePublished - Sep 2019
Event15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019 - Sydney, Australia
Duration: Sep 20 2019Sep 25 2019

Publication series

NameProceedings of the International Conference on Document Analysis and Recognition, ICDAR
ISSN (Print)1520-5363

Conference

Conference15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
Country/TerritoryAustralia
CitySydney
Period09/20/1909/25/19

Keywords

  • Action Classification
  • Key-frame Extraction
  • Lecture Video Summarization
  • Temporal Segmentation

Fingerprint

Dive into the research topics of 'Content extraction from lecture video via speaker action classification based on pose information'. Together they form a unique fingerprint.

Cite this