Applied Mathematics & Information Sciences

Multi-Modal Emotion Recognition Fusing Video and Audio

Chao Xu, School of Computer Software, Tianjin University, 300072 Tianjin, China
Pufeng Du
Zhiyong Feng
Zhaopeng Meng
Tianyi Cao
Caichao Dong

Author Country (or Countries)

China

Abstract

Emotion plays an important role in human communication. We construct a framework for multi-modal emotion recognition that fuses facial expression and speech. Facial expression features are extracted from image sequences, and speech features from speech signals. To locate and track facial feature points, we construct an Active Appearance Model for facial images with various expressions. Facial Animation Parameters, calculated from the motion of the facial feature points, serve as expression features. We extract short-term mean energy, fundamental frequency and formant frequencies from each frame as speech features. An emotion classifier based on Hidden Markov Models and a Multi-layer Perceptron is designed to fuse facial expression and speech. Experiments indicate that the multi-modal fusion algorithm presented in this paper achieves relatively high recognition accuracy, and that it performs better and more robustly than methods using video or audio alone.
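
As an illustration of one of the frame-level speech features named in the abstract, the sketch below computes short-term mean energy as the mean squared amplitude of each frame. The abstract does not give the framing parameters, so the function name, the 400-sample frame length, and the 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are assumptions chosen for illustration, not the authors' settings.

```python
import math


def short_term_mean_energy(signal, frame_len=400, hop=160):
    """Return the mean squared amplitude of each full frame of `signal`.

    frame_len and hop are in samples. The defaults are illustrative
    assumptions (25 ms / 10 ms at 16 kHz), not values from the paper.
    Trailing samples that do not fill a whole frame are ignored.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


# Usage: one second of a 200 Hz unit-amplitude sine sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
energies = short_term_mean_energy(tone)
```

Each 400-sample frame here spans five whole periods of the tone, so every frame's mean energy is close to 0.5, the mean square of a unit-amplitude sine. Fundamental frequency and formant estimation would need separate routines and are not sketched here.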

Recommended Citation

Xu, Chao; Du, Pufeng; Feng, Zhiyong; Meng, Zhaopeng; Cao, Tianyi; and Dong, Caichao (2013) "Multi-Modal Emotion Recognition Fusing Video and Audio," Applied Mathematics & Information Sciences: Vol. 07: Iss. 2, Article 4.
Available at: https://digitalcommons.aaru.edu.jo/amis/vol07/iss2/4
