Studying and Analysis of a Vertical Web Page Classifier Based on Continuous Learning Naïve Bayes (CLNB) Algorithm

H. A. Ali (Mansoura University, Egypt), Ali I. El Desouky (Mansoura University, Egypt) and Ahmed I. Saleh (Mansoura University, Egypt)
Copyright: © 2009 | Pages: 45
DOI: 10.4018/978-1-60566-618-1.ch012

Web page classification is considered one of the most challenging research areas. The Web holds a huge volume of unstructured, distributed documents that relate to a variety of domains, so relying on a single basis for all classification tasks is extremely difficult. In addition, the Web is full of noise that harms classifier performance, especially when it appears in the classifier's training data. Generally, it is more valuable to build domain-oriented classifiers (vertical classifiers) that classify pages related to a specific domain, and to complement those classifiers with novel learning techniques to achieve better performance. The contribution of this paper is threefold. First, a novel learning technique called "Continuous Learning" is introduced. Second, the paper presents a new trend for Web page classification by presenting domain-oriented classifiers (vertical classifiers): a new way of applying the Bayes and K-Nearest Neighbor algorithms is introduced in order to build Domain Oriented Naïve Bayes (DONB) and Domain Oriented K-Nearest Neighbor (DOKNN) classifiers. Third, both disciplines are combined in a novel classification strategy that adds the continuous learning ability to Bayes' theorem to build a Continuous Learning domain-oriented Naïve Bayes (CLNB) classifier. Because the overfitting problem has a great impact on most Web page classification techniques, continuous learning is proposed as a solution: it allows the classifier to adapt itself continuously to achieve better performance. The proposed classifiers are tested; experimental results show that CLNB demonstrates a significant performance improvement over both DONB and DOKNN, with its accuracy exceeding 94.1% after testing 1000 pages.
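The abstract does not spell out how continuous learning is attached to Naïve Bayes, so the following is only an illustrative sketch, not the authors' CLNB algorithm. It assumes a multinomial Naïve Bayes model with Laplace smoothing whose word counts can be updated one page at a time, and it assumes that a page is folded back into the model only when the gap between the two best class scores is large. The class name `IncrementalNaiveBayes`, the method names, the `min_margin` threshold, and the class labels in the usage note are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes whose counts can be updated page by page.

    Illustrative only: the update rule is an assumption, not the
    procedure described in the chapter.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_docs = Counter()               # pages seen per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()                        # all words seen so far

    def learn(self, words, label):
        """Fold one labelled page (a list of words) into the counts."""
        self.class_docs[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_scores(self, words):
        """Return log P(class) + sum of log P(word | class) per class."""
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, words):
        """Return the class with the highest log score."""
        scores = self.log_scores(words)
        return max(scores, key=scores.get)

    def classify_and_learn(self, words, min_margin=1.0):
        """Classify a page, then learn from it if the decision is clear.

        Assumption: a page is reused as training data only when the best
        class beats the runner-up by at least min_margin in log score.
        """
        scores = self.log_scores(words)
        label = max(scores, key=scores.get)
        ranked = sorted(scores.values(), reverse=True)
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else float("inf")
        if margin >= min_margin:
            self.learn(words, label)
        return label
```

In use, a model seeded with a few labelled pages (for example `nb.learn(["course", "lecture", "exam"], "education")` and `nb.learn(["price", "cart", "checkout"], "other")`) can then be fed unlabelled pages through `classify_and_learn`, so that its counts keep changing as more pages arrive. The margin check is one simple guard against reinforcing uncertain decisions; other acceptance rules are equally plausible.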

Complete Chapter List

International Editorial Advisory Board
Table of Contents
Chapter 1
John Gekas, Maria Fasli
The Web services paradigm has enabled an increasing number of providers to host remotely accessible services. However, the true potential of such a...
Employing Graph Network Analysis for Web Service Composition
Chapter 2
Vedran Podobnik, Krunoslav Trzec, Gordan Jezic
This paper presents an application of multi-agent system in ubiquitous computing scenarios characteristic of next-generation networks....
Context-Aware Service Provisioning in Next-Generation Networks: An Agent Approach
Chapter 3
Zhiyong Weng, Thomas Tran
This paper proposes a mobile, intelligent agent-based e-business architecture that allows buyers and sellers to perform business at remote...
A Mobile, Intelligent Agent-Based Architecture for E-Business
Chapter 4
Elhadi Shakshuki, André Trudel, Yiqing Xu
Many real-world problems can be viewed and represented as a constraint satisfaction problem (CSP). In addition, many of these problems are...
A Multi-Agent Temporal Constraint Satisfaction System Based on Allen's Interval Algebra and Probabilities
Chapter 5
Vasco Furtado, Leonardo Ayres, Gustavo Fernandes
In this paper, we describe a multiagent approach that configures semantic Web services following a design problem solving method. For that, a...
A Multiagent Approach for Configuring and Explaining Workflow of Semantic Web Services
Chapter 6
Nitin Agarwal, Ehtesham Haque, Huan Liu, Lance Parsons
Researchers spend considerable time searching for relevant papers on the topic in which they are currently interested. Often, despite having similar...
A Subspace Clustering Framework for Research Group Collaboration
Chapter 7
Emilie Conté, Guy Gouardères
In Vocational and Educational Training, new trends are to social learning and more precisely to informal learning. In such settings, the article...
E-Portfolio to Promote the Virtual Learning Group Communities on the Grid
Chapter 8
Ding-Yi Chen, Xue Li, Zhao Yang Dong, Xia Chen
In this paper, we propose a framework namely, Prediction-Learning-Distillation (PLD) for interactive document classification and distilling the...
Incremental Learning for Interactive E-Mail Filtering
Chapter 9
Maytham Safar, Dariush Ebrahimi
The continuous K nearest neighbor (CKNN) query is an important type of query that finds continuously the KNN to a query point on a given path. We...
eDAR Algorithm for Continuous KNN Queries Based on Pine
Chapter 10
Dunren Che
This article reports the result of the author’s recent work on XML query processing/optimization, which is a very important issue in XML data...
A Deterministic Approach to XML Query Processing with Efficient Support for Pure and Negated Containments
Chapter 11
Jihad M. ALJa’am, Ali M. Jaoua, Ahmad M. Hasnah, F. Hassan, H. Mohamed, T. Mosaid, H. Saleh, F. Abdullah
In this paper, we present an original approach for text summarization using conceptual data classification. We show how a given text can be...
Text Summarization Based on Conceptual Data Classification
Chapter 12
H. A. Ali, Ali I. El Desouky, Ahmed I. Saleh
Web page classification is considered one of the most challenging research areas. Where the Web has a huge volume of unstructured and distributed...
Studying and Analysis of a Vertical Web Page Classifier Based on Continuous Learning Naïve Bayes (CLNB) Algorithm
Chapter 13
Zhonghua Yang, Jing Bing Zhang, Robert Gay, Liqun Zhuang
Service-orientation has emerged as a new promising paradigm for enterprise integration in the manufacturing sector. In this paper, we focus on the...
Building a Semantic-Rich Service-Oriented Manufacturing Environment
Chapter 14
Le Duy Ngane, Angela Goh, Cao Hoang Tru
Web services form the core of e-business and hence, have experienced a rapid development in the past few years. This has led to a demand for a...
A Survey of Web Service Discovery Systems
Chapter 15
Carsten Stolz, Michael Barth
With growing importance of the internet, Web sites have to be continuously improved. Web metrics help to identify improvement potentials....
Web Site Performance Analysis Success Assessment of Information Driven Web Site on User Traces
About the Editors