Bangla Content Categorization Using Text Based Supervised Learning Methods
Al Mostakim, Sadek
Hasan, Syeda Mahdiea
The widespread and increasing availability of text documents in electronic form makes automatic methods for analyzing their content increasingly important. In particular, Bangla content generation has grown substantially in recent years as the number of social media users has increased. In this paper, we present a supervised-learning-based method for Bangla content classification. We created a large Bangla content dataset and made it publicly available. We tested several machine learning algorithms on this dataset using text-based features. In our experiments, logistic regression performed best among the algorithms compared. We have developed an online tool based on our method and made it available for content categorization at: http://samspark1-001site1.etempurl.com/. Our data extraction tool and the dataset are also available to other researchers at: https://github.com/sspaarkk/BanglaNLP.
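To illustrate the general idea of classifying text with logistic regression over text-based features, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the whitespace tokenizer, the raw term-count features, the training settings, the category names, and the four toy sentences are all assumptions made for illustration, and none of them come from the thesis or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace splitting; real Bangla text may need
    # punctuation handling, normalization, or stemming.
    return text.split()


def build_vocab(docs):
    # Map each distinct token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(doc, vocab):
    # Sparse term-count features: {feature_index: count}.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(feats, weights, biases):
    # One linear score per class.
    return [b + sum(w[i] * v for i, v in feats.items())
            for w, b in zip(weights, biases)]


def train(docs, labels, epochs=200, lr=0.5):
    # Multinomial logistic regression fitted with stochastic gradient descent.
    vocab = build_vocab(docs)
    classes = sorted(set(labels))
    weights = [[0.0] * len(vocab) for _ in classes]
    biases = [0.0] * len(classes)
    data = [(featurize(d, vocab), classes.index(y))
            for d, y in zip(docs, labels)]
    for _ in range(epochs):
        for feats, y in data:
            probs = softmax(class_scores(feats, weights, biases))
            for k, p in enumerate(probs):
                grad = p - (1.0 if k == y else 0.0)  # d(loss)/d(score_k)
                biases[k] -= lr * grad
                for i, v in feats.items():
                    weights[k][i] -= lr * grad * v
    return vocab, classes, weights, biases


def predict(doc, model):
    vocab, classes, weights, biases = model
    probs = softmax(class_scores(featurize(doc, vocab), weights, biases))
    return classes[probs.index(max(probs))]


# Hypothetical toy data (not from the thesis dataset).
docs = [
    "ক্রিকেট ম্যাচ দল",
    "ফুটবল খেলা গোল",
    "নির্বাচন ভোট সরকার",
    "সংসদ মন্ত্রী সরকার",
]
labels = ["sports", "sports", "politics", "politics"]

model = train(docs, labels)
print(predict("ক্রিকেট খেলা", model))
print(predict("ভোট মন্ত্রী", model))
```

In practice, a library implementation with TF-IDF weighting and regularization would likely be preferable to this hand-written training loop; the sketch only shows how word-level features feed a logistic regression classifier.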
- B.Sc Thesis/Project