Studying and Analysis of a Vertical Web Page Classifier Based on Continuous Learning Naïve Bayes (CLNB) Algorithm

H. A. Ali, Ali I. El Desouky, Ahmed I. Saleh
Copyright: © 2009 | Pages: 45
ISBN13: 9781605666181 | ISBN10: 1605666181 | EISBN13: 9781605666198
DOI: 10.4018/978-1-60566-618-1.ch012
Cite Chapter

MLA

Ali, H. A., et al. "Studying and Analysis of a Vertical Web Page Classifier Based on Continuous Learning Naïve Bayes (CLNB) Algorithm." Agent Technologies and Web Engineering: Applications and Systems, edited by Ghazi I. Alkhatib and David C. Rine, IGI Global, 2009, pp. 210-254. https://doi.org/10.4018/978-1-60566-618-1.ch012

APA

Ali, H. A., El Desouky, A. I., & Saleh, A. I. (2009). Studying and Analysis of a Vertical Web Page Classifier Based on Continuous Learning Naïve Bayes (CLNB) Algorithm. In G. Alkhatib & D. Rine (Eds.), Agent Technologies and Web Engineering: Applications and Systems (pp. 210-254). IGI Global. https://doi.org/10.4018/978-1-60566-618-1.ch012

Chicago

Ali, H. A., Ali I. El Desouky, and Ahmed I. Saleh. "Studying and Analysis of a Vertical Web Page Classifier Based on Continuous Learning Naïve Bayes (CLNB) Algorithm." In Agent Technologies and Web Engineering: Applications and Systems, edited by Ghazi I. Alkhatib and David C. Rine, 210-254. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-60566-618-1.ch012

Abstract

Web page classification is considered one of the most challenging research areas. The Web holds a huge volume of unstructured, distributed documents related to a wide variety of domains, so relying on a single basis for all classification tasks is extremely difficult. In addition, the Web is full of noise that harms classifier performance, especially when that noise appears in the classifier's training data. It is therefore more worthwhile to build domain-oriented classifiers (vertical classifiers) that classify pages related to a specific domain, and to equip those classifiers with novel learning techniques to achieve better performance. The contribution of this paper is threefold. First, a novel learning technique called "Continuous Learning" is introduced. Second, the paper presents a new trend in Web page classification, domain-oriented (vertical) classifiers: a new way of applying the Bayes and K-Nearest Neighbor algorithms is introduced to build Domain Oriented Naïve Bayes (DONB) and Domain Oriented K-Nearest Neighbor (DOKNN) classifiers. Third, the two contributions are combined in a novel classification strategy that adds continuous learning ability to Bayes' theorem, yielding a Continuous Learning Domain Oriented Naïve Bayes (CLNB) classifier. Because overfitting strongly affects most Web page classification techniques, continuous learning is proposed as a remedy: it allows the classifier to adapt itself continuously to achieve better performance. The proposed classifiers were tested, and experimental results show that CLNB significantly outperforms both DONB and DOKNN, with its accuracy exceeding 94.1% after testing 1000 pages.
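The abstract does not state CLNB's actual update rule. As a rough illustration of the two ideas it combines (a domain-oriented Naïve Bayes classifier, and a classifier that keeps learning after deployment), the following is a minimal sketch that assumes a multinomial Naïve Bayes over page tokens plus a hypothetical self-training rule: each newly classified page is folded back into the counts of its predicted class only when the posterior confidence reaches a chosen threshold. The class name `ContinuousNB`, the `threshold` parameter, and the confidence gate are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


class ContinuousNB:
    """Multinomial Naive Bayes whose counts keep growing after deployment.

    Hypothetical sketch: the confidence-gated self-update in
    classify_and_learn() is an assumed stand-in for CLNB's continuous
    learning, not the rule published in the chapter.
    """

    def __init__(self, classes, alpha=1.0, threshold=0.9):
        self.classes = list(classes)
        self.alpha = alpha          # additive smoothing constant (Laplace when 1.0)
        self.threshold = threshold  # hypothetical confidence gate for self-updates
        self.doc_counts = Counter()                               # pages seen per class
        self.term_counts = {c: Counter() for c in self.classes}  # term frequencies per class
        self.total_terms = Counter()                              # total tokens per class
        self.vocab = set()

    def learn(self, tokens, label):
        """Fold one labeled page's token counts into the model."""
        self.doc_counts[label] += 1
        self.term_counts[label].update(tokens)
        self.total_terms[label] += len(tokens)
        self.vocab.update(tokens)

    def log_scores(self, tokens):
        """Unnormalized log P(c | page) for each class; unknown tokens are skipped."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            # add-one smoothed prior, so a class with no pages yet is not log(0)
            score = math.log((self.doc_counts[c] + 1) / (n_docs + len(self.classes)))
            denom = self.total_terms[c] + self.alpha * v
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.term_counts[c][t] + self.alpha) / denom)
            scores[c] = score
        return scores

    def classify_and_learn(self, tokens):
        """Classify a page, then learn from it only if the posterior is confident."""
        scores = self.log_scores(tokens)
        best = max(scores, key=scores.get)
        # softmax over the log scores, shifted by the max for numerical stability
        top = scores[best]
        z = sum(math.exp(s - top) for s in scores.values())
        confidence = 1.0 / z
        if confidence >= self.threshold:
            self.learn(tokens, best)  # the assumed continuous-learning step
        return best, confidence


# Usage: two labeled seed pages, then one unlabeled page.
clf = ContinuousNB(["sports", "other"], threshold=0.75)
clf.learn(["football", "match", "goal", "team"], "sports")
clf.learn(["stock", "market", "price", "bank"], "other")
label, conf = clf.classify_and_learn(["goal", "team", "win"])
print(label, round(conf, 2))  # sports 0.8
```

Gating the update on confidence is a design choice made for this sketch: pages the model is unsure about are classified but not learned from, which limits how much of the Web noise the abstract warns about leaks into the model's statistics. Lowering `threshold` lets the classifier adapt faster at the cost of admitting more misclassified pages into its counts.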
