Abstract
Multi-speaker diarization involves detecting the number of speakers in a recording and segregating the audio segments that correspond to each speaker. Despite the tremendous advancements in deep learning, multi-speaker diarization is still far from achieving acceptable performance. In this work, we address the problem by first obtaining timestamps using voice activity detection and sliding-window techniques. We then extract Mel-spectrograms or Mel-frequency cepstral coefficients (MFCCs) and train a Long Short-Term Memory (LSTM) network to obtain audio embeddings, known as d-vectors. Subsequently, we employ K-Means and spectral clustering to segment all the speakers in the given audio file. We evaluate the proposed framework on the publicly available VoxConverse dataset and report results in comparison with similar benchmarks in the existing literature. The proposed model performs better than, or on par with, existing techniques despite its simpler framework.
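The windowing and clustering stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sliding_windows` and `kmeans_labels`, the 1.5 s window, the 0.75 s hop, and the farthest-point initialisation are all assumptions made for illustration. The embeddings are synthetic stand-ins for LSTM d-vectors, and the number of speakers is assumed to be known in advance.

```python
import numpy as np


def sliding_windows(start, end, win=1.5, hop=0.75):
    """Split one voiced interval [start, end] (in seconds) into overlapping windows."""
    windows = []
    t = start
    while t + win <= end:
        windows.append((t, t + win))
        t += hop
    if not windows:
        # Interval shorter than one window: keep it as a single segment.
        windows.append((start, end))
    return windows


def kmeans_labels(embeddings, k, n_iter=20):
    """Cluster L2-normalised embeddings with Lloyd's k-means; one label per row."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.stack(centers)
    for _ in range(n_iter):
        # Assign each embedding to its nearest centre, then recompute the centres.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Synthetic stand-ins for d-vectors: two "speakers", ten windows each.
rng = np.random.default_rng(0)
speaker_a = np.eye(8)[0] + 0.05 * rng.standard_normal((10, 8))
speaker_b = np.eye(8)[1] + 0.05 * rng.standard_normal((10, 8))
labels = kmeans_labels(np.vstack([speaker_a, speaker_b]), k=2)
windows = sliding_windows(0.0, 3.0)
```

In a full pipeline, each window returned by `sliding_windows` would be converted to MFCC or Mel-spectrogram features and passed through the trained LSTM to produce one d-vector, and the resulting label per window would be mapped back to its timestamps to form the diarization output.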
| Original language | English |
|---|---|
| Title of host publication | 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 164-169 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798350322125 |
| State | Published - 2023 |
| Externally published | Yes |
| Event | 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023 - Islamabad, Pakistan. Duration: 22 Feb 2023 → 23 Feb 2023 |
Publication series
| Name | 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023 |
|---|---|
Conference
| Conference | 3rd IEEE International Conference on Artificial Intelligence, ICAI 2023 |
|---|---|
| Country/Territory | Pakistan |
| City | Islamabad |
| Period | 22/02/23 → 23/02/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- LSTM
- neural networks
- segmentation
- speaker diarization
- spectral clustering
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Computer Vision and Pattern Recognition