Machine-learning based classification of speech and music

M. Kashif Saeed Khan, Wasfi G. Al-Khatib*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

41 Scopus citations

Abstract

The need to classify audio into categories such as speech or music is an important aspect of many multimedia document retrieval systems. In this paper, we investigate audio features that have not been previously used in music-speech classification, such as the mean and variance of the discrete wavelet transform, the variance of Mel-frequency cepstral coefficients, the root mean square of a lowpass signal, and the difference between the maximum and minimum zero-crossings. We then apply fuzzy C-means clustering to the problem of selecting a viable set of features that enables better classification accuracy. Three different classification frameworks have been studied: multi-layer perceptron (MLP) neural networks, radial basis function (RBF) neural networks, and hidden Markov models (HMM); the results of each framework have been reported and compared. Our extensive experimentation has identified a subset of features that contributes most to accurate classification, and has shown that MLP networks are the most suitable classification framework for the problem at hand.
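Two of the features named in the abstract — the difference between the maximum and minimum zero-crossing rates, and the RMS of a lowpass-filtered signal — can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact configuration: the frame length, hop size, and the simple one-pole lowpass filter here are assumptions chosen for clarity.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (illustrative framing)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

def zcr_diff(x, frame_len=256, hop=128):
    """Difference between the max and min per-frame zero-crossing rates."""
    rates = [zero_crossing_rate(f) for f in frame_signal(x, frame_len, hop)]
    return max(rates) - min(rates)

def lowpass_rms(x, alpha=0.05):
    """RMS of a one-pole lowpass-filtered signal (filter choice is assumed)."""
    y = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(x):
        acc = alpha * s + (1 - alpha) * acc  # simple exponential smoothing
        y[i] = acc
    return np.sqrt(np.mean(y ** 2))

# Toy contrast: a pure tone (music-like) vs. white noise (fricative-like).
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(sr)
print(zcr_diff(tone), zcr_diff(noise))
print(lowpass_rms(tone), lowpass_rms(noise))
```

The intuition behind the zero-crossing feature is visible in the toy example: a steady tone crosses zero at a nearly constant rate, so its max-min difference is small, while noisy, speech-like frames show much more frame-to-frame variation.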

Original language: English
Pages (from-to): 55-67
Number of pages: 13
Journal: Multimedia Systems
Volume: 12
Issue number: 1
DOIs
State: Published - Aug 2006

Keywords

  • Audio features
  • Audio signal processing
  • Fuzzy c-means clustering
  • Hidden Markov Models
  • Neural networks
  • Speech-music classification

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications
