Text Classification Using Self-Structure Extended Multinomial Naive Bayes

Arun Solanki (Gautam Buddha University, India) and Rajat Saxena (Gautam Buddha University, India)
DOI: 10.4018/978-1-5225-9643-1.ch006

Abstract

With the advent of neural networks and their subtypes, such as deep neural networks and convolutional neural networks, it is possible to make text classification predictions with high accuracy. Among the many subtypes of naive Bayes, multinomial naive Bayes is the one used for text classification. Many attempts have been made to develop an algorithm that keeps the simplicity of multinomial naive Bayes and at the same time incorporates feature dependency. One such effort is structure extended multinomial naive Bayes, which uses one-dependence estimators to incorporate dependencies. In essence, a one-dependence estimator takes one of the attributes as the parent and all other attributes as its children. This chapter proposes self structure extended multinomial naive Bayes, a hybrid model that combines multinomial naive Bayes and structure extended multinomial naive Bayes. It aims to correctly classify the instances that structure extended multinomial naive Bayes misclassifies because there is no direct dependency between their attributes.
Chapter Preview

Literature Survey

Text classification using multinomial naive Bayes (MNB) (McCallum, A. et al., 1998) assumes that all attributes, or features, are independent of each other. However, this assumption is seldom valid in real life. Reading through any document shows that many words are related to each other in one way or another. This naive assumption is the prime reason MNB does not achieve high accuracy. Under it, every attribute is connected only to the class node and not to any other attribute. However, this limitation can be overcome by extending the structure of the model.
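The independence assumption described above can be made concrete with a minimal sketch of MNB. This is an illustration only, not the chapter's implementation: the function names `train_mnb` and `predict_mnb`, the Laplace smoothing constant `alpha`, and the toy documents are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    vocab = sorted({w for doc in docs for w in doc})
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Laplace smoothing (assumed here) keeps unseen words from zeroing a class.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def predict_mnb(doc, log_prior, log_like):
    """Return the class with the highest posterior score for a tokenized doc."""
    scores = {}
    for c in log_prior:
        # Independence assumption: each word contributes its own term,
        # conditioned only on the class and never on another word.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in doc if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only.
docs = [
    ["cheap", "pills", "buy"],
    ["meeting", "agenda", "notes"],
    ["buy", "cheap", "now"],
    ["project", "meeting", "today"],
]
labels = ["spam", "ham", "spam", "ham"]
log_prior, log_like = train_mnb(docs, labels)
```

Because the score is a plain sum of per-word terms, the model cannot express that two words tend to occur together, which is the limitation structure extension is meant to address.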

Extending the structure allows attribute nodes to be connected to one another. This can help classify real-life documents and improve accuracy. However, structure extension is challenging when documents contain a vast number of attributes. High-dimensional data can incur very high time and space complexity (Chickering, D.M., 1996).
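A rough way to see the space cost is to count probability table entries. The sketch below assumes dense tables and a model that stores one conditional probability for every ordered pair of distinct words per class, as a one-dependence scheme would in the worst case; the function names and the example sizes are hypothetical.

```python
def mnb_table_size(num_classes: int, vocab_size: int) -> int:
    """Entries needed for P(word | class): one per word per class."""
    return num_classes * vocab_size


def one_dependence_table_size(num_classes: int, vocab_size: int) -> int:
    """Entries needed for P(child | parent, class) over all ordered word pairs."""
    return num_classes * vocab_size * (vocab_size - 1)


# Hypothetical sizes: 2 classes and a 10,000-word vocabulary.
mnb_entries = mnb_table_size(2, 10_000)
pair_entries = one_dependence_table_size(2, 10_000)
print(mnb_entries, pair_entries)
```

The pairwise table grows quadratically with the vocabulary, while the MNB table grows linearly, so the gap widens quickly as documents contain more distinct words.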
