Abstract
Communication is the means by which people express their feelings, ideas, and thoughts, and speech is its primary medium. In many human-interactive applications, such as call centres, entertainment, e-learning between teachers and students, medicine, and communication between clinicians and patients (especially important in psychiatry), it is crucial to identify people's emotions to better understand what they are feeling and how they might react in a range of situations. Automated systems that recognise emotions from analysis of speech using Artificial Intelligence (AI) or Machine Learning (ML) approaches are gaining momentum in recent research. This research aims to recognise a range of emotional states, such as happy, sad, calm, angry, fear, disgust, surprise, and neutral, from input speech signals with greater accuracy than reported in contemporary research. To achieve this aim, we use the Support Vector Machine (SVM) classification algorithm and form a feature vector from speech features extracted from the signal: Mel Frequency Cepstral Coefficients (MFCC), Chroma, Mel-spectrogram, Spectral Centroid, Spectral Bandwidth, Spectral Roll-off, Root Mean Squared Energy (RMSE), and Zero Crossing Rate (ZCR). The system is tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Toronto Emotional Speech Set (TESS), and the Surrey Audio-Visual Expressed Emotion (SAVEE) datasets. Our proposed approach achieves an overall accuracy of 99.59% on the RAVDESS dataset, 99.82% on the TESS dataset, and 98.95% on the SAVEE dataset for the SVM classifier. A mixed dataset created from the three speech emotion datasets also achieves significantly higher classification accuracy than state-of-the-art methods.
This model performs well on a large dataset, is ready to be tested on even larger datasets, and can be used in a range of diverse applications, including education and clinical settings. GitHub: https://github.com/Mostafiz24/Speech-Emotion-Recognition.
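The pipeline described in the abstract (extract a feature vector per utterance, then classify it with an SVM) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes only three of the listed features (ZCR, RMSE, and Spectral Centroid) with plain NumPy, and uses synthetic tones and noise as stand-ins for two emotion classes; in practice, a library such as librosa would supply MFCC, Chroma, and the other spectral features from real labelled recordings.

```python
import numpy as np
from sklearn.svm import SVC

def extract_features(y, sr=22050):
    """Compute a small subset of the paper's feature set with plain NumPy:
    Zero Crossing Rate, Root Mean Squared Energy, and Spectral Centroid."""
    # ZCR: fraction of consecutive samples where the signal changes sign.
    zcr = np.mean(np.abs(np.diff(np.sign(y)))) / 2.0
    # RMSE: overall energy of the signal.
    rmse = np.sqrt(np.mean(y ** 2))
    # Spectral centroid: magnitude-weighted mean frequency.
    mag = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-10)
    return np.array([zcr, rmse, centroid])

# Toy corpus: low-frequency tones vs. white noise stand in for two classes.
rng = np.random.default_rng(0)
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
tones = [np.sin(2 * np.pi * f * t) for f in (200, 220, 240, 260)]
noises = [rng.standard_normal(sr) for _ in range(4)]
X = np.array([extract_features(y, sr) for y in tones + noises])
labels = np.array([0] * 4 + [1] * 4)

# Fit an SVM on the feature vectors, as in the paper's classification stage.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, labels)

# Classify unseen examples of each kind.
pred_tone = clf.predict(extract_features(np.sin(2 * np.pi * 230 * t), sr).reshape(1, -1))[0]
pred_noise = clf.predict(extract_features(rng.standard_normal(sr), sr).reshape(1, -1))[0]
```

The same structure scales to the real task by replacing the synthetic signals with labelled utterances from RAVDESS, TESS, or SAVEE and widening `extract_features` to the full feature vector named in the abstract.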
| Original language | English |
|---|---|
| Title of host publication | Applied Intelligence and Informatics - 3rd International Conference, AII 2023, Revised Selected Papers |
| Editors | Mufti Mahmud, Hanene Ben-Abdallah, M. Shamim Kaiser, Muhammad Raisuddin Ahmed, Ning Zhong |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 32-46 |
| Number of pages | 15 |
| ISBN (Print) | 9783031686382 |
| DOIs | |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 3rd International Conference on Applied Intelligence and Informatics, AII 2023 - Dubai, United Arab Emirates Duration: 29 Oct 2023 → 31 Oct 2023 |
Publication series
| Name | Communications in Computer and Information Science |
|---|---|
| Volume | 2065 CCIS |
| ISSN (Print) | 1865-0929 |
| ISSN (Electronic) | 1865-0937 |
Conference
| Conference | 3rd International Conference on Applied Intelligence and Informatics, AII 2023 |
|---|---|
| Country/Territory | United Arab Emirates |
| City | Dubai |
| Period | 29/10/23 → 31/10/23 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Keywords
- AI
- Emotion Recognition
- Feature Extraction
- MFCC
- ML
- RAVDESS
- RMSE
- SAVEE
- SVM
- TESS
- ZCR
ASJC Scopus subject areas
- General Computer Science
- General Mathematics