
Synthetic Data Generation for Semantic Segmentation of Lecture Videos

  • Universidad Tecnológica Centroamericana
  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Lecture videos have become a valuable resource for students and teachers. These videos are a vast source of information, but most search engines index them only by their audio. To make these videos searchable by their handwritten content, accurate methods for analyzing such content at scale are needed. However, training deep neural networks to their full potential requires large-scale lecture video datasets. In this paper, we use synthetic data generation to improve binarization of lecture videos. We also use it to semantically segment pixels into five classes: background, speaker, text, mathematical expressions, and graphics. Our synthetic data generation method renders content from multiple handwritten and typeset datasets and blends it into real images using random tight layouts and the locations of the people in the image. In addition, we propose a mixed-data approach that trains networks on two detection tasks at once: person detection and text detection. Both binarization and semantic segmentation are carried out with fully convolutional neural networks that use a typical encoder-decoder architecture with residual connections. Our experiments show that pretraining on both synthetic and mixed data leads to better performance than training with real data alone. While the final results are promising, more work is needed to reduce the domain shift between synthetic and real data. Our code and data are publicly available.
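As a rough illustration of the compositing step described above (rendering handwritten or typeset content and blending it into real frames while avoiding the speaker), the following minimal Python sketch alpha-blends a rendered ink patch into a grayscale frame and writes matching per-pixel labels. It assumes the ink coverage map of the rendered content and a person mask are already available. The function names, the class ids, the 0.5 labeling threshold, and the simple random non-overlapping placement are hypothetical simplifications, not the authors' released code or their tight-layout algorithm.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical class ids for the five classes named in the abstract.
BACKGROUND, SPEAKER, TEXT, MATH, GRAPHICS = 0, 1, 2, 3, 4

Image = List[List[float]]   # grayscale intensities in [0, 255]
Labels = List[List[int]]    # per-pixel class ids


def init_labels(person_mask: List[List[bool]]) -> Labels:
    """Start the label map from the person mask: speaker pixels vs background."""
    return [[SPEAKER if p else BACKGROUND for p in row] for row in person_mask]


def region_is_free(labels: Labels, top: int, left: int, h: int, w: int) -> bool:
    """True if the h x w box at (top, left) contains only background labels."""
    return all(labels[r][c] == BACKGROUND
               for r in range(top, top + h) for c in range(left, left + w))


def paste_content(image: Image, labels: Labels, alpha: List[List[float]],
                  cls: int, ink: float = 0.0,
                  rng: Optional[random.Random] = None,
                  max_tries: int = 50) -> Optional[Tuple[int, int]]:
    """Alpha-blend a rendered content patch into a free region of the image.

    alpha[r][c] in [0, 1] is the ink coverage of the rendered content.
    Pixels with coverage >= 0.5 are labeled with `cls` (assumed threshold).
    Returns the chosen (top, left), or None if no free spot was found.
    """
    rng = rng or random.Random()
    H, W = len(image), len(image[0])
    h, w = len(alpha), len(alpha[0])
    if h > H or w > W:
        return None  # patch larger than the frame
    for _ in range(max_tries):
        top, left = rng.randint(0, H - h), rng.randint(0, W - w)
        if not region_is_free(labels, top, left, h, w):
            continue  # box touches speaker pixels or already-labeled content
        for r in range(h):
            for c in range(w):
                a = alpha[r][c]
                y, x = top + r, left + c
                # Standard alpha compositing of ink over the real background.
                image[y][x] = (1.0 - a) * image[y][x] + a * ink
                if a >= 0.5:
                    labels[y][x] = cls
        return top, left
    return None
```

For example, on a tiny 6 x 8 frame whose two rightmost columns belong to the speaker, a 2 x 2 "text" stroke is only ever placed in the left six columns, and the label map records exactly the strongly inked pixels:

```python
rng = random.Random(0)
frame = [[200.0] * 8 for _ in range(6)]
person = [[c >= 6 for c in range(8)] for _ in range(6)]
labels = init_labels(person)
pos = paste_content(frame, labels, [[1.0, 1.0], [0.0, 1.0]], TEXT, rng=rng)
```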

Original language: English
Title of host publication: Frontiers in Handwriting Recognition - 18th International Conference, ICFHR 2022, Proceedings
Editors: Utkarsh Porwal, Alicia Fornés, Faisal Shafait
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 468–483
Number of pages: 16
ISBN (Print): 9783031216473
State: Published - 2022
Event: 18th International Conference on Frontiers in Handwriting Recognition, ICFHR 2022 - Hyderabad, India
Duration: Dec 4 2022 to Dec 7 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13639 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Frontiers in Handwriting Recognition, ICFHR 2022
Country/Territory: India
City: Hyderabad
Period: 12/4/22 to 12/7/22

Keywords

  • Lecture videos
  • Semantic Segmentation
  • Synthetic data
