Abstract
With the widespread use of smartphones and social media platforms, video logging has been gaining popularity, especially since the advent of YouTube in 2005, which now attracts hundreds of millions of views per day. It has drawn the interest of many people and enabled numerous emerging applications for filmmakers, journalists, product advertisers, entrepreneurs, educators and many others. Nowadays, people express and share their opinions online on various daily issues using different forms of content, including text, audio, images and videos. This study presents a multimodal approach for recognizing a speaker's age group from social media videos. Several structures of Artificial Neural Networks (ANNs) are presented and evaluated using standalone modalities. Moreover, a two-stage ensemble network is proposed to combine multiple modalities. In addition, a corpus of videos has been collected and prepared for multimodal age-group recognition, with a focus on Arabic-language speakers. The experimental results demonstrate that combining different modalities can mitigate the limitations of unimodal recognition systems and lead to significant improvements in the results.
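The abstract describes a two-stage design: unimodal ANN classifiers in the first stage and an ensemble network that fuses their outputs in the second. The following is a minimal sketch of that general idea in PyTorch; the feature dimensions, layer sizes, number of age groups, and the probability-level late-fusion scheme are assumptions for illustration, not the authors' exact architecture.

```python
# Hypothetical sketch of a two-stage multimodal ensemble for age-group
# recognition. All dimensions and the fusion scheme are assumed, not taken
# from the paper.
import torch
import torch.nn as nn

NUM_AGE_GROUPS = 3  # assumed number of target age groups


class UnimodalANN(nn.Module):
    """Stage 1: a simple feed-forward classifier for one modality."""

    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, NUM_AGE_GROUPS),
        )

    def forward(self, x):
        return self.net(x)  # unnormalized class scores


class TwoStageEnsemble(nn.Module):
    """Stage 2: fuse per-modality class probabilities with a small ANN."""

    def __init__(self, unimodal_models):
        super().__init__()
        self.unimodal_models = nn.ModuleList(unimodal_models)
        fused_dim = len(unimodal_models) * NUM_AGE_GROUPS
        self.fusion = nn.Sequential(
            nn.Linear(fused_dim, 32),
            nn.ReLU(),
            nn.Linear(32, NUM_AGE_GROUPS),
        )

    def forward(self, inputs):
        # inputs: one feature tensor per modality (e.g. acoustic, visual, textual)
        probs = [m(x).softmax(dim=-1) for m, x in zip(self.unimodal_models, inputs)]
        return self.fusion(torch.cat(probs, dim=-1))


# Example with assumed feature sizes: acoustic statistics, visual face
# descriptors, and averaged word embeddings of the transcript.
model = TwoStageEnsemble([UnimodalANN(40), UnimodalANN(512), UnimodalANN(300)])
acoustic = torch.randn(4, 40)
visual = torch.randn(4, 512)
textual = torch.randn(4, 300)
logits = model([acoustic, visual, textual])  # shape: (4, NUM_AGE_GROUPS)
```

In this kind of late-fusion setup, each unimodal branch can be trained and evaluated on its own, and the second-stage fusion network only needs the per-modality predictions, which keeps the ensemble lightweight.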
| Original language | English |
|---|---|
| Pages (from-to) | 371-378 |
| Number of pages | 8 |
| Journal | International Journal of Advanced Computer Science and Applications |
| Volume | 10 |
| Issue number | 4 |
| State | Published - 2019 |
Bibliographical note
Publisher Copyright: © 2018 The Science and Information (SAI) Organization Limited.
Keywords
- Acoustic features
- Age groups
- Arabic speakers
- Ensemble learning
- Information fusion
- Multimodal recognition
- Opinion mining
- Visual features
- Word embedding
ASJC Scopus subject areas
- General Computer Science