dc.contributor.author: Chowdhury, Kibtia
dc.date.accessioned: 2024-11-23T05:40:40Z
dc.date.available: 2024-11-23T05:40:40Z
dc.date.issued: 2024-11-22
dc.identifier.citation: CSE [en_US]
dc.identifier.uri: http://dspace.uiu.ac.bd/handle/52243/3086
dc.description: PDF file [en_US]
dc.description.abstract: Speaker diarization is a fundamental speech-processing task that identifies and segments the different speakers in an audio recording; that is, it determines "who spoke when" in a conversation. Its applications include meeting transcription, speaker tracking in broadcast news, audio indexing, and speaker profiling in forensics. The task is particularly challenging for languages with diverse phonetic characteristics, such as Bangla. In this study, we investigate speaker diarization techniques tailored specifically to Bangla conversations. We explore three feature extraction methods, Gammatonegram, Constant-Q Transform (CQT), and Mel-Frequency Cepstral Coefficients (MFCC), each combined with Gaussian Mixture Models (GMM) for clustering. We evaluate the systems using Diarization Error Rate (DER) and several other metrics. DER is widely used in the speaker diarization community to measure the overall performance of a diarization system; it accounts for missed speaker errors, false alarm speaker errors, and speaker confusion errors. A lower DER indicates better performance, with a DER of 0% representing a perfect diarization system. Among the approaches studied, the ANN+MFCC+GMM method performs particularly well, achieving a DER of 0.193 and an accuracy of 0.807, which indicates that it identifies speakers in Bangla conversations effectively. These findings underscore the potential of the proposed methods for Bangla speaker diarization. Future research aims to refine these techniques and address Bangla-specific challenges, ultimately improving the accuracy and robustness of speaker diarization systems for Bangla conversations. [en_US]
dc.description.sponsorship: UIU CSE [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: United International University [en_US]
dc.subject: Speaker diarization [en_US]
dc.subject: audio recording [en_US]
dc.subject: Bangla [en_US]
dc.subject: Diarization Error Rate [en_US]
dc.title: Speaker Diarization from Bangla Conversation [en_US]
dc.type: Thesis [en_US]
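
Note on the DER metric named in the abstract: the abstract lists the three error types that DER combines but gives no formula. The sketch below uses the conventional definition, in which the three error durations are summed and divided by the total reference speech time. Whether the thesis computes DER in exactly this way is an assumption; the function name `diarization_error_rate` and the durations are hypothetical and chosen only for illustration.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Conventional DER: summed error time divided by total reference speech time.

    All arguments are durations in seconds. This is an illustrative sketch,
    not the scoring code used in the thesis.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations (seconds), not taken from the thesis.
der = diarization_error_rate(
    missed=4.0,        # reference speech that the system left unlabelled
    false_alarm=3.0,   # non-speech that the system labelled as speech
    confusion=5.0,     # speech attributed to the wrong speaker
    total_speech=100.0,
)
print(f"DER = {der:.3f}")  # prints "DER = 0.120"
```

The reported DER of 0.193 and accuracy of 0.807 sum to 1, which would be consistent with accuracy being computed as 1 - DER, although the abstract does not say so.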

