
dc.contributor.author    Hossain, Md. Parvez
dc.date.accessioned    2026-01-12T04:44:18Z
dc.date.available    2026-01-12T04:44:18Z
dc.date.issued    2026-01-12
dc.identifier.citation    CSE    en_US
dc.identifier.uri    http://dspace.uiu.ac.bd/handle/52243/3391
dc.description    CSE UIU    en_US
dc.description.abstract    Mixing Bangla and English words in the same text is common, particularly on social media, and this tendency hampers the next generation's ability to learn standard Bangla. This study proposes an approach that identifies, at the word level, which words in a Bangla text are Bangla and which are English, and then translates the identified English words into standard Bangla. The approach is built on bidirectional encoder representations from transformers (BERT), a Transformer architecture that uses an attention mechanism to capture the relationships between words and their context within a text. We fine-tune the BERT-base-NER model on the training portion of the input dataset, treating word-level language identification as a named entity recognition (NER) tagging task, and the modified model achieves state-of-the-art performance on this task. Training and testing follow a holdout procedure, with 80% of the data used for training and 20% for testing, and the Google Translate API (application programming interface) converts the identified English words into standard Bangla. To assess the modified BERT-base-NER model, we applied the same dataset to existing machine learning (ML) and deep learning (DL) techniques: support vector machines (SVM) and Naive Bayes (NB) on the ML side, and long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and convolutional neural network (CNN) models on the DL side. The experimental results show that the modified BERT-base-NER model identifies Bangla and English words accurately and efficiently, achieving 95% accuracy, the best result among the compared methods. For Bangla–English code-mixed text, this study therefore presents a reliable BERT-based word-level language identification system that resolves Banglish ambiguity and enables downstream Bangla language processing applications such as standard Bangla conversion, machine translation, and information extraction.    en_US
dc.description.sponsorship    CSE UIU    en_US
dc.language.iso    en_US    en_US
dc.publisher    UIU    en_US
dc.subject    Bangla and English words, standard Bangla words, BERT-base-NER model, Google Translate API    en_US
dc.title    Recognition of Bangla and English Words in Bangla Texts Using a Modified BERT-base-NER Model    en_US
dc.type    Thesis    en_US
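
For illustration, the following sketch outlines the pipeline the abstract describes: a BERT token-classification model tags each word of a code-mixed sentence as Bangla or English after an 80/20 holdout split, and the words tagged English are then sent to the Google Translate API. This is a minimal sketch of the general technique, not the thesis's actual code; the label set {BN, EN}, the bert-base-multilingual-cased base checkpoint, the toy sentences, and the predict_word_labels helper are assumptions introduced here.

# Minimal illustrative sketch: word-level Bangla/English identification with a
# BERT token-classification head and an 80/20 holdout split. The label set,
# base checkpoint, and toy data below are assumptions, not the thesis's resources.
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

LABELS = ["BN", "EN"]                               # assumed tag set: Bangla vs. English word
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

# Toy code-mixed sentences with word-level gold labels (illustrative only).
sentences = [
    (["আমি", "office", "যাচ্ছি"], ["BN", "EN", "BN"]),
    (["sorry", "আমি", "একটু", "late"], ["EN", "BN", "BN", "EN"]),
    (["আজ", "meeting", "আছে"], ["BN", "EN", "BN"]),
    (["তুমি", "কেমন", "আছ"], ["BN", "BN", "BN"]),
]

# Holdout evaluation as in the abstract: 80% of the data for training, 20% for testing.
train_data, test_data = train_test_split(sentences, test_size=0.2, random_state=42)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",                 # assumed base; the thesis fine-tunes BERT-base-NER
    num_labels=len(LABELS), id2label=id2label, label2id=label2id,
)
# Fine-tuning on train_data (e.g. with transformers.Trainer) is omitted from this sketch.

def predict_word_labels(words):
    """Tag each word of a pre-tokenized sentence as BN or EN with the (fine-tuned) model."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**enc).logits[0].argmax(dim=-1).tolist()
    labels, seen = [], set()
    for pos, word_id in enumerate(enc.word_ids()):  # map sub-token predictions back to words
        if word_id is not None and word_id not in seen:
            seen.add(word_id)                       # keep each word's first sub-token prediction
            labels.append(id2label[pred[pos]])
    return labels

words = ["আমি", "office", "যাচ্ছি"]
print(list(zip(words, predict_word_labels(words))))  # after fine-tuning, "office" should be tagged EN

The words tagged EN can then be converted to standard Bangla through the Google Cloud Translation client, one plausible way to call the Google Translate API named in the abstract; this assumes the google-cloud-translate package is installed and credentials are configured.

from google.cloud import translate_v2 as translate   # assumes GOOGLE_APPLICATION_CREDENTIALS is set
client = translate.Client()
print(client.translate("office", target_language="bn")["translatedText"])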

