Recognition of Bangla and English Words in Bangla Texts Using a Modified BERT-base-NER Model

UIU Institutional Repository



    File
    UIU_MSCSE_Thesis_Parvez_Final_Report.pdf (1.280Mb)
    Date
    2026-01-12
    Author
    Hossain, Md. Parvez
    Abstract
    Mixing Bangla and English words in the same text is common, particularly on social media. This tendency greatly hampers the next generation’s ability to learn Bangla. This study proposes an approach that labels each word in a Bangla text as either Bangla or English, and then translates the identified English words into standard Bangla. Bidirectional Encoder Representations from Transformers (BERT) is built on the Transformer architecture, which uses an attention mechanism to capture the relationships between words and their contexts within a text. In this study, we modify the BERT-base-NER model by training it on the input dataset. For the named entity recognition (NER) task, the proposed BERT-base-NER model achieves state-of-the-art performance. We evaluate with a holdout validation procedure: 80% of the data is used for training and 20% for testing. The identified English words are translated into standard Bangla using the Google Translate API (application programming interface). To assess the modified BERT-base-NER model, we also applied the same dataset to existing machine learning (ML) and deep learning (DL) techniques. The ML baselines are support vector machines (SVM) and Naive Bayes (NB); the DL baselines are a convolutional neural network (CNN), long short-term memory (LSTM), and bidirectional LSTM (BiLSTM). According to the experimental results, the modified BERT-base-NER model identifies Bangla and English words accurately and efficiently. With an accuracy of 95%, it achieves the best result among the methods compared.
    For Bangla–English code-mixed text, this study presents a reliable BERT-based word-level language identification system that resolves Banglish ambiguity and supports downstream Bangla language processing applications such as standard Bangla conversion, machine translation, and information extraction.
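    The evaluation protocol described in the abstract (a single 80/20 holdout split, scored by word-level accuracy) can be sketched roughly as follows. This is only an illustration using the Python standard library, not the thesis code: the function names, the "BN"/"EN" tag set, the fixed random seed, and the toy data are all assumptions made for the example.

```python
import random


def holdout_split(sentences, train_fraction=0.8, seed=0):
    """Shuffle labeled sentences once and split them into train and test sets."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def word_accuracy(gold, predicted):
    """Fraction of words whose predicted language tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


# Hypothetical toy corpus: each item is (words, tags), tags are "BN" or "EN".
data = [([f"w{i}a", f"w{i}b"], ["BN", "EN"]) for i in range(10)]
train, test = holdout_split(data)

# Hypothetical gold tags and model predictions for two short sentences.
gold = [["BN", "EN"], ["BN", "BN"]]
pred = [["BN", "EN"], ["EN", "BN"]]
score = word_accuracy(gold, pred)
```

    With a single split like this, the reported accuracy depends on which sentences land in the test set, so fixing the seed keeps the comparison between the BERT model and the baselines on identical data.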
    URI
    http://dspace.uiu.ac.bd/handle/52243/3391
    Collections
    • M.Sc Thesis/Project [163]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
