Class-Based Weighted NB for Text Categorization

Mahsa Paknezhad (Shiraz University of Technology, Iran) and Marzieh Ahmadzadeh (Shiraz University of Technology, Iran)
Copyright: © 2014 | Pages: 9
DOI: 10.4018/978-1-4666-5202-6.ch041

Chapter Preview



In what follows, we review several enhancements proposed to improve the performance of the Naïve Bayes algorithm.

Joshi and Nigam (2011) applied Naïve Bayes classification in two different ways: flat and hierarchical. In flat classification the general Naïve Bayes approach was used, while in hierarchical classification the classes in the training dataset were arranged in a hierarchy according to the relationships among them. This approach did not decrease training time, but it made classifying new documents faster since fewer comparisons were required. Experiments showed that the hierarchical technique performed better than the flat technique, except in some special cases in which the two performed equally.
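To see why the hierarchical arrangement needs fewer comparisons at prediction time, consider the following minimal sketch (not the authors' code; the class names and the two-level hierarchy are illustrative assumptions). A flat classifier scores a new document against every class, whereas a hierarchical classifier first scores the superclasses and then only the leaves of the winning superclass:

```python
# Illustrative classes and hierarchy (assumed, not from the paper).
flat_classes = ["baseball", "hockey", "autos", "motorcycles", "mac", "windows"]

# Hypothetical two-level hierarchy: a root classifier picks a superclass,
# then a second classifier picks a leaf within that superclass.
hierarchy = {
    "sports": ["baseball", "hockey"],
    "vehicles": ["autos", "motorcycles"],
    "computers": ["mac", "windows"],
}

# Flat classification scores the document against every class.
flat_comparisons = len(flat_classes)

# Hierarchical classification scores the superclasses, then the leaves of
# the chosen superclass (worst case: the largest branch).
hier_comparisons = len(hierarchy) + max(len(v) for v in hierarchy.values())

print(flat_comparisons)   # 6
print(hier_comparisons)   # 5
```

The saving grows with the number of classes: with many classes split into balanced branches, the hierarchical scheme scores only a logarithmic number of candidates per document, while training time is unchanged because every class model must still be estimated.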

Lee et al. (2011) proposed a feature weighting method that uses information gain to measure the significance of features. That is, “a feature with a higher information gain deserves higher weight.” Furthermore, to remove the bias toward features with a wide range of values, they incorporated the split information measure when defining feature weights. This measure, also used in decision-tree learners such as C4.5, assigns a large split information value to features with many values. They showed that this algorithm outperforms regular Naïve Bayes, Tree-Augmented Naïve Bayes, NBTree, and decision trees.
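The interaction between information gain and split information can be sketched as follows (an illustrative reconstruction, not Lee et al.'s code; the toy dataset is an assumption). Dividing the gain by the split information, as C4.5's gain ratio does, penalizes a many-valued "ID-like" feature that raw information gain would over-reward:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_and_split_info(values, labels):
    """Information gain of a discrete feature, plus its split information
    (the entropy of the feature's own value distribution, as in C4.5)."""
    n = len(labels)
    gain = entropy(labels)
    split = 0.0
    for v in set(values):
        idx = [i for i, x in enumerate(values) if x == v]
        p = len(idx) / n
        gain -= p * entropy([labels[i] for i in idx])
        split -= p * math.log2(p)
    return gain, split

# Toy data (assumed): one informative binary feature and one many-valued
# feature; both predict the class perfectly.
labels  = ["pos", "pos", "neg", "neg"]
binary  = ["a", "a", "b", "b"]     # 2 values, perfectly predictive
id_like = ["1", "2", "3", "4"]     # 4 values, also "perfect"

for feat in (binary, id_like):
    gain, split = info_gain_and_split_info(feat, labels)
    weight = gain / split if split else 0.0   # gain ratio as the weight
    print(gain, split, weight)
```

Both features have an information gain of 1.0 bit, but the four-valued feature carries a split information of 2.0 bits, so its weight is halved relative to the binary feature.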

Similarly, Turhan and Bener (2007) proposed using heuristics to improve software defect prediction performance. They examined GainRatio, InfoGain, and PCA to measure the importance of software metrics, and evaluated them with a weighted Naïve Bayes classifier. The results showed that InfoGain and GainRatio outperform both standard Naïve Bayes and the PCA-based heuristic. Generally speaking, they concluded that “linear methods lack the ability to improve the performance of Naïve Bayes while non-linear methods give promising results.”
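A weighted Naïve Bayes classifier of the kind used in these evaluations scales each feature's log-likelihood contribution by its heuristic weight. The sketch below is an illustration under assumed numbers, not Turhan and Bener's implementation; the feature names, weights, priors, and likelihoods are all hypothetical:

```python
import math

def weighted_nb_score(features, prior, likelihoods, weights):
    """log P(c) + sum over features f of w_f * log P(f | c)
    for one candidate class c."""
    score = math.log(prior)
    for f in features:
        score += weights[f] * math.log(likelihoods[f])
    return score

features = ["loc", "complexity"]
# Heuristic weights, e.g. derived from InfoGain or GainRatio (assumed values).
weights = {"loc": 0.8, "complexity": 0.3}

# Per-class priors and feature likelihoods (illustrative numbers only).
defective = weighted_nb_score(features, 0.3,
                              {"loc": 0.7, "complexity": 0.6}, weights)
clean = weighted_nb_score(features, 0.7,
                          {"loc": 0.2, "complexity": 0.4}, weights)

prediction = "defective" if defective > clean else "clean"
print(prediction)
```

With all weights equal to 1 this reduces to the standard Naïve Bayes decision rule; the weights let the classifier discount metrics the heuristic judges uninformative.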

Key Terms in this Chapter

Text Categorization: Given a fixed set of topics, the task of determining which topic a text belongs to.

Weight Adjusting: The process of finding optimal values for feature weights so as to minimize incorrect class predictions.

Suffix Stripping: The process of reducing words to their roots.

Naïve Bayes Classifier: A classification method based on Bayes' theorem.

Feature Weighting: Defining weights for features, since different features can have different levels of importance in class prediction.

Feature Selection: Selecting the features which have the highest level of importance in predicting the class of a text.

Class-Based Weighted Naïve Bayes: Using Naïve Bayes classifier while considering different weights for features in different classes.
