Multi-modal emotion recognition with temporal-band attention based on LSTM-RNN

J Liu, Y Su, Y Liu - Advances in Multimedia Information Processing – PCM 2017: 18th Pacific-Rim …, 2018 - Springer
Abstract
Emotion recognition is a key problem in Human-Computer Interaction (HCI). In this paper, we discuss multi-modal emotion recognition based on untrimmed visual signals and EEG signals. We propose a model for emotion recognition built on a multi-layer long short-term memory recurrent neural network (LSTM-RNN) with two attention mechanisms: temporal attention and band attention. At each time step, the LSTM-RNN takes a video slice and an EEG slice as inputs and generates representations of the two signals, which are fed into a multi-modal fusion unit. Based on the fused representation, the network predicts the emotion label and selects the next time slice to analyze. Through band attention, the model applies different levels of attention to the different frequency bands of the EEG signal; through temporal attention, it decides which part of the signal to analyze next, suppressing redundant information. Experiments on the Mahnob-HCI database show encouraging results: the proposed method achieves higher accuracy and improves computational efficiency.
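The band-attention idea described above can be sketched as a softmax-weighted combination of per-band EEG features. This is a minimal illustrative sketch, not the paper's actual implementation: the scoring function, parameter shapes, and band count (theta, alpha, beta, gamma) are assumptions for the example.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def band_attention(band_feats, w, b=0.0):
    """Weight per-band EEG features by a learned attention distribution.

    band_feats: (n_bands, d) array, one feature row per frequency band
    w: (d,) scoring vector; b: scalar bias (hypothetical parameters,
    which in the paper's setting would be learned jointly with the LSTM)
    Returns the attended (d,) representation and the band weights.
    """
    scores = band_feats @ w + b      # one scalar score per band
    alpha = softmax(scores)          # attention weights over bands, sum to 1
    return alpha @ band_feats, alpha # weighted sum of band features

# toy example: 4 bands (e.g. theta, alpha, beta, gamma), 8-dim features
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
rep, alpha = band_attention(feats, rng.standard_normal(8))
```

In the full model, the attended EEG representation would be fused with the video representation at each LSTM step; the temporal attention would act analogously, but over candidate time slices rather than frequency bands.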