Speaker Diarization from Bangla Conversation

Chowdhury, Kibtia

View/Open

012211056_Thesis_Book.pdf (959.2Kb)

Date

2024-11-22

Author

Chowdhury, Kibtia

Abstract

Speaker diarization is a fundamental task in speech processing that aims to identify and segment the different speakers in an audio recording; that is, it determines "who spoke when" in a conversation or speech. It has various applications, such as meeting transcription, speaker tracking in broadcast news, audio indexing, and speaker profiling in forensics. The task is particularly challenging for languages with diverse phonetic characteristics, such as Bangla. In this study, we investigate speaker diarization techniques tailored specifically to Bangla conversations. We explore three feature extraction methods, namely Gammatonegram, Constant-Q Transform (CQT), and Mel-Frequency Cepstral Coefficients (MFCC), combined with Gaussian Mixture Models (GMM) for clustering. The systems are evaluated using the Diarization Error Rate (DER) and several other metrics. DER is a widely used metric in the speaker diarization community for measuring the overall performance of a diarization system. It accounts for missed speaker errors, false alarm speaker errors, and speaker confusion errors; a lower DER indicates better performance, and a DER of 0% represents a perfect diarization system. Among the approaches studied, the ANN+MFCC+GMM method stands out, achieving a DER of 0.193 (19.3%) and an accuracy of 0.807, which indicates that it is effective at identifying speakers in Bangla conversations. These findings underscore the potential of the proposed methods for Bangla speaker diarization. Future research will aim to refine these techniques and address Bangla-specific challenges in order to improve the accuracy and robustness of speaker diarization systems for Bangla conversations.
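The three error types that make up DER can be illustrated with a minimal sketch. It assumes the conventional definition, in which the summed durations of missed speech, false alarm speech, and speaker confusion are divided by the total reference speech time; the abstract does not state the exact scoring setup used in the thesis. The function name and the durations below are hypothetical and chosen only for illustration.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction of the total reference speech time.

    All arguments are durations in the same unit (for example, seconds).
    This follows the conventional definition and is an assumption about
    how DER is scored, not the thesis's own implementation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations in seconds, for illustration only.
der = diarization_error_rate(
    missed=4.0,        # reference speech that the system labeled as non-speech
    false_alarm=3.0,   # non-speech that the system labeled as speech
    confusion=5.0,     # speech attributed to the wrong speaker
    total_speech=100.0,
)
print(f"DER = {der:.3f}")
```

Under this definition the reported accuracy of 0.807 is consistent with 1 - DER for a DER of 0.193, although the abstract does not say how accuracy was computed.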

URI

http://dspace.uiu.ac.bd/handle/52243/3086

Collections

M.Sc Thesis/Project [166]