Multi-Speaker Diarization using Long-Short Term Memory Network

Nayyer Aafaq, Usama Qamar, Sohaib Ali Khan, Zeashan Hameed Khan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The task of multi-speaker diarization involves detecting the number of speakers and segregating the audio segments corresponding to each speaker. Despite the tremendous advancements in deep learning, the problem of multi-speaker diarization is still far from achieving acceptable performance. In this work, we address the problem by first obtaining timestamps using voice activity detection and sliding window techniques. We then extract Mel-spectrograms / Mel-frequency cepstral coefficients (MFCC) and train a Long Short-Term Memory (LSTM) network to obtain audio embeddings called d-vectors. Subsequently, we employ K-Means and spectral clustering techniques to segment all the speakers in the given audio file. We evaluate the proposed framework on the publicly available VoxConverse dataset and report results in comparison with similar benchmarks in the existing literature. The proposed model performs better than or on par with existing techniques despite its simpler framework.
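The windowing and clustering stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The window length, hop size, function names, and toy d-vectors below are all assumptions made for illustration. In the paper's pipeline the embeddings would come from the trained LSTM. Only K-Means is shown, and the spectral clustering variant is omitted.

```python
import math

def sliding_windows(segments, win=1.5, hop=0.75):
    """Split VAD speech segments (start, end), in seconds, into fixed-length windows.
    The 1.5 s window and 0.75 s hop are assumed values, not taken from the paper."""
    windows = []
    for start, end in segments:
        t = start
        while t + win <= end + 1e-9:
            windows.append((round(t, 3), round(t + win, 3)))
            t += hop
    return windows

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain K-Means on L2-normalized vectors with deterministic
    farthest-point initialisation (hypothetical helper)."""
    points = [l2_normalize(v) for v in vectors]
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Two speech regions, as a voice activity detector might return them (toy values).
segments = [(0.0, 3.0), (4.0, 7.0)]
windows = sliding_windows(segments)

# Toy stand-ins for d-vectors, one per window; a real system would use LSTM outputs.
dvectors = [
    [0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0], [0.0, 1.0, 0.1], [0.2, 0.8, 0.0],
]
labels = kmeans(dvectors, k=2)

# Pair each window with its speaker label to form the diarization output.
diarization = list(zip(windows, labels))
for (start, end), speaker in diarization:
    print(f"{start:5.2f}-{end:5.2f} s  speaker {speaker}")
```

In this sketch the number of speakers `k` is passed in by hand. The abstract states that the speaker count is itself detected, so a full system would need an additional estimation step that is not shown here.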

Original language: English
Title of host publication: 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-169
Number of pages: 6
ISBN (Electronic): 9798350322125
State: Published - 2023
Externally published: Yes
Event: 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023 - Islamabad, Pakistan
Duration: 22 Feb 2023 to 23 Feb 2023

Publication series

Name: 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023

Conference

Conference: 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023
Country/Territory: Pakistan
City: Islamabad
Period: 22/02/23 to 23/02/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • LSTM
  • neural networks
  • segmentation
  • speaker diarization
  • spectral clustering

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
