Multimodal Attentive Learning for Real-time Explainable Emotion Recognition in Conversations

  • SUNY Buffalo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

Human emotion recognition plays a pivotal role in building intelligent conversational agents that provide real-time automated support services in various problem settings. Recent research has explored the temporal patterns in conversations to understand the content and context of a conversation from a video clip, but these approaches do not fully leverage the multimodal information (facial expressions of the participants, speech tone, and the content and context of the discussion) or its temporal evolution. To address this, we propose a multimodal attentive learning framework that keeps track of the spatio-temporal states of the participants and their conversation dynamics. By designing a novel contrastive loss-based optimization framework, the proposed method shows promise in identifying the emotional state of each individual speaker in real time and can identify the top-k words in the conversation that influence emotion recognition. Its consistently superior performance over other state-of-the-art methods on two large-scale datasets, MELD and IEMOCAP, demonstrates the feasibility of our approach.
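
As a rough illustration of how attention weights can surface the top-k influential words mentioned in the abstract, the following is a minimal sketch using plain scaled dot-product attention. It is not the paper's architecture: the function names, the two-dimensional embeddings, and the single audio/visual query vector are all hypothetical values chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_words(query, word_vectors, words, k=2):
    """Rank words by scaled dot-product attention from one query vector.

    query        : hypothetical summary vector of a non-text modality
    word_vectors : one hypothetical embedding per word
    words        : the words of the utterance
    """
    d = len(query)
    # Attention score of each word: dot(query, word) / sqrt(d)
    scores = [sum(q * w for q, w in zip(query, vec)) / math.sqrt(d)
              for vec in word_vectors]
    weights = softmax(scores)
    # Sort words by attention weight, highest first, and keep the top k
    ranked = sorted(zip(words, weights), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Toy utterance with made-up embeddings (assumptions, not real data)
words = ["i", "am", "so", "happy", "today"]
word_vectors = [
    [0.1, 0.0],
    [0.0, 0.1],
    [0.5, 0.2],
    [2.0, 1.0],
    [0.3, 0.1],
]
query = [1.0, 1.0]

top = top_k_words(query, word_vectors, words, k=2)
print([w for w, _ in top])  # ['happy', 'so']
```

In this toy setting the words with the largest attention weights are reported as the most influential; a trained model would learn the query and embeddings rather than take them as fixed inputs.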

Original language: English
Title of host publication: IEEE International Symposium on Circuits and Systems, ISCAS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1210-1214
Number of pages: 5
ISBN (Electronic): 9781665484855
State: Published - 2022
Event: 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022 - Austin, United States
Duration: May 27 2022 to Jun 1 2022

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
Volume: 2022-May
ISSN (Print): 0271-4310

Conference

Conference: 2022 IEEE International Symposium on Circuits and Systems, ISCAS 2022
Country/Territory: United States
City: Austin
Period: 05/27/22 to 06/1/22

Keywords

  • Cross-modal Attention
  • Emotion Recognition
  • Explainable Decision Visualization
  • Multimodal
  • SpatioTemporal Feature Representation
