A Two-Stage Hierarchical Bilingual Emotion Recognition System Using a Hidden Markov Model and Neural Networks

  • Mohamed Deriche*
  • Ahmed H. Abo absa

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Speech emotion recognition continues to attract considerable research attention, especially in mixed-language scenarios. Here, we show that emotion is language dependent and that enhanced emotion recognition systems can be built when the language is known. We propose a two-stage emotion recognition system that first identifies the language and then applies a dedicated language-dependent recognition system to identify the type of emotion. The system accurately recognizes the four main types of emotion, namely neutral, happy, angry, and sad. These emotional states are widely used in practical setups. To keep the computational complexity low, we identify the language using a feature vector consisting of energies from a basic wavelet decomposition. A hidden Markov model (HMM) is then used to track the changes of this vector to identify the language, achieving recognition accuracy close to 100%. Once the language is identified, a set of speech processing features including pitch and MFCCs is used with a neural network (NN) architecture to identify the emotion type. The results show that identifying the language first can substantially improve the overall accuracy in identifying emotions. The overall accuracy achieved with the proposed system reached more than 93%. To test the robustness of the proposed methodology, we also used a Gaussian mixture model (GMM) for both language identification and emotion recognition. Our proposed HMM-NN approach performed better than the GMM-based approach. More importantly, we tested the proposed algorithm with six emotions; the results showed that its overall accuracy remains excellent, while the performance of the GMM-based approach deteriorates substantially. It is worth noting that the performance we achieved is close to that attained by single-language emotion recognition systems and far exceeds that of recognition systems without language identification (around 60%).
The work shows the strong correlation between language and type of emotion, and can further be extended to other scenarios including gender-based, facial expression-based, and age-based emotion recognition.
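
The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Haar wavelet (the abstract says only "a basic wavelet decomposition"), a three-level decomposition, and hypothetical names throughout (`haar_step`, `wavelet_energies`, `recognize_emotion`, and the language labels `lang_a` and `lang_b`). The HMM language identifier and the per-language neural networks are passed in as plain callables, since their details are not given here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    n = len(signal) - len(signal) % 2  # ignore a trailing odd sample
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, n, 2)]
    return approx, detail


def wavelet_energies(frame: Sequence[float], levels: int = 3) -> List[float]:
    """Energy of each detail band, followed by the final approximation band.

    Assumption: Haar wavelet and three levels; the source names neither.
    """
    energies: List[float] = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def recognize_emotion(
    frames: Sequence[Sequence[float]],
    identify_language: Callable[[List[List[float]]], str],
    emotion_models: Dict[str, Callable[[Sequence[Sequence[float]]], str]],
    levels: int = 3,
) -> Tuple[str, str]:
    """Stage 1 picks the language; stage 2 routes to that language's model.

    `identify_language` stands in for the HMM over the energy-vector
    sequence. Each entry of `emotion_models` stands in for a
    language-dependent network that extracts its own features
    (for example pitch and MFCCs) from the raw frames.
    """
    features = [wavelet_energies(frame, levels) for frame in frames]
    language = identify_language(features)
    emotion = emotion_models[language](frames)
    return language, emotion


if __name__ == "__main__":
    # Toy stand-ins only; a real system would plug in trained models.
    frames = [[float(i) for i in range(1, 9)] for _ in range(4)]
    models = {"lang_a": lambda fr: "happy", "lang_b": lambda fr: "sad"}
    print(recognize_emotion(frames, lambda feats: "lang_a", models))
```

Keeping the language identifier and the emotion classifiers behind simple callables mirrors the hierarchy in the abstract: the inexpensive energy features are computed once for stage 1, and only the classifier for the identified language runs in stage 2.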

Original language: English
Pages (from-to): 5231-5249
Number of pages: 19
Journal: Arabian Journal for Science and Engineering
Volume: 42
Issue number: 12
State: Published - 1 Dec 2017

Bibliographical note

Publisher Copyright:
© 2017, King Fahd University of Petroleum & Minerals.

Keywords

  • Hidden Markov model
  • Language recognition
  • Neural networks
  • Pattern recognition
  • Speech emotion recognition

ASJC Scopus subject areas

  • General
