TY - GEN
T1 - Active Multi-Modal Approach for Enhanced User Recognition in Social Robots
AU - Kale, Ninad
AU - Ratha, Nalini
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The field of Human-Robot Interaction (HRI) is swiftly expanding, driven by notable advancements in artificial intelligence (AI). Humanoid robots, now capable of being equipped with advanced AI models, are being considered for a wide array of applications due to their ability to perform complex tasks and interact with humans in both natural and intelligent ways. In the domain of social robotics, the capability to selectively interact with authorized users is crucial for ensuring security and providing personalized user experiences. Unimodal user recognition methods, such as audio-based user recognition, are commonly used in social robots. However, these methods can be susceptible to ambient noise and might exhibit reduced accuracy. Although face recognition modality is often employed to enhance accuracy, the majority of audio-visual person recognition methods are trained on datasets with only a single user in the frame. This paper introduces a method for audio-visual user recognition with multiple users in the frame, utilizing an additional sound localization modality. The proposed method is evaluated using a dataset created from interactions between the social robot Pepper and multiple users. The results demonstrated that the proposed method significantly outperformed unimodal user recognition methods.
AB - The field of Human-Robot Interaction (HRI) is swiftly expanding, driven by notable advancements in artificial intelligence (AI). Humanoid robots, now capable of being equipped with advanced AI models, are being considered for a wide array of applications due to their ability to perform complex tasks and interact with humans in both natural and intelligent ways. In the domain of social robotics, the capability to selectively interact with authorized users is crucial for ensuring security and providing personalized user experiences. Unimodal user recognition methods, such as audio-based user recognition, are commonly used in social robots. However, these methods can be susceptible to ambient noise and might exhibit reduced accuracy. Although face recognition modality is often employed to enhance accuracy, the majority of audio-visual person recognition methods are trained on datasets with only a single user in the frame. This paper introduces a method for audio-visual user recognition with multiple users in the frame, utilizing an additional sound localization modality. The proposed method is evaluated using a dataset created from interactions between the social robot Pepper and multiple users. The results demonstrated that the proposed method significantly outperformed unimodal user recognition methods.
UR - https://www.scopus.com/pages/publications/85182025270
U2 - 10.1109/WNYISPW60588.2023.10349468
DO - 10.1109/WNYISPW60588.2023.10349468
M3 - Conference contribution
AN - SCOPUS:85182025270
T3 - 2023 IEEE Western New York Image and Signal Processing Workshop, WNYISPW 2023
BT - 2023 IEEE Western New York Image and Signal Processing Workshop, WNYISPW 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Western New York Image and Signal Processing Workshop, WNYISPW 2023
Y2 - 3 November 2023
ER -