Speaker Diarization from Bangla Conversation

UIU Institutional Repository


    012211056_Thesis_Book.pdf (959.2Kb)
    Date
    2024-11-22
    Author
    Chowdhury, Kibtia
    Abstract
    Speaker diarization is a fundamental task in speech processing that aims to identify and segment the different speakers within an audio recording, that is, to determine "who spoke when" in a conversation or speech. Its applications include meeting transcription, speaker tracking in broadcast news, audio indexing, and speaker profiling in forensics. The task is particularly challenging for languages with diverse phonetic characteristics, such as Bangla. In this study, we investigate speaker diarization techniques tailored specifically to Bangla conversations. We explore three feature extraction methods, Gammatonegram, Constant-Q Transform (CQT), and Mel-Frequency Cepstral Coefficients (MFCC), each combined with Gaussian Mixture Models (GMM) for clustering. We evaluate the systems using the Diarization Error Rate (DER), a metric widely used in the speaker diarization community to measure overall system performance, along with various other metrics. DER accounts for missed speaker errors, false alarm speaker errors, and speaker confusion errors; a lower DER indicates better performance, and a DER of 0% represents a perfect diarization system. Among the approaches studied, the ANN+MFCC+GMM method stands out, achieving a DER of 0.193 and an accuracy of 0.807, which indicates that it identifies speakers in Bangla conversations effectively. These findings underscore the potential of the proposed methods for Bangla speaker diarization. Future research aims to refine these techniques and address Bangla-specific challenges, ultimately enhancing the accuracy and robustness of speaker diarization systems for Bangla conversations.
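    The three error types that make up DER can be illustrated with a minimal frame-level sketch. This is not the thesis's evaluation code: the function name, the frame-label representation, and the example sequences are assumptions made for illustration. It also assumes the hypothesis speaker labels have already been mapped to the reference labels, whereas standard scoring tools search for the best mapping first.

```python
def frame_level_der(reference, hypothesis):
    """Compute a frame-level Diarization Error Rate.

    Each sequence holds one label per frame: a speaker ID, or None for
    non-speech. Assumes hypothesis labels are already mapped to the
    reference speaker IDs (an illustrative simplification).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1                 # frame counts toward total reference speech
            if hyp is None:
                missed += 1             # speech present, system output none
            elif hyp != ref:
                confusion += 1          # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1            # system output speech where there was none

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Hypothetical ten-frame conversation between speakers "A" and "B".
reference = ["A", "A", "A", "B", "B", None, None, "B", "A", "A"]
hypothesis = ["A", "A", "B", "B", None, None, "A", "B", "A", "A"]

der = frame_level_der(reference, hypothesis)
print(f"DER = {der:.3f}")  # one confusion, one miss, one false alarm over 8 speech frames
```

    In this toy example the three errors over eight reference speech frames give a DER of 0.375. The abstract's reported accuracy of 0.807 equals 1 − 0.193, which is consistent with accuracy being reported as the complement of DER, although the abstract does not state this explicitly.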
    URI
    http://dspace.uiu.ac.bd/handle/52243/3086
    Collections
    • M.Sc Thesis/Project [151]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
