Web Page Classification Using MDAWkNN

J. Alamelu Mangai, V. Santhosh Kumar, Karthik Ramesh
Copyright: © 2014 |Pages: 11
DOI: 10.4018/978-1-4666-5202-6.ch239

Chapter Preview



Many approaches to automatic Web page classification have been reported in the literature over the years. Without preprocessed data, there are no quality mining results. Because Web pages are high-dimensional and contain noisy information, they need to be properly preprocessed; otherwise the learning time and complexity of the classifiers increase. Feature selection is one way of addressing the curse of dimensionality for content-based Web page classifiers. Web page classification has been improved by selecting features through various methods, as in (Indra Devi, Rajaraman, & Selvakuberan, 2008; Han, Lim, & Alhashmi, 2010; Selamat & Omata, 2004; Chen, Ming, & Chang, 2009; Wakaki, Itakura, & Tamura, 2004; Jensen & Shen, 2006; Peng, Ming, & Wang, 2008; Farhoodi, Yari, & Mahmoudi, 2009; Xu & Wang, 2011).

Key Terms in this Chapter

K-Nearest Neighbor Classification: A data mining algorithm that assigns each item in a given data set to one of a set of pre-defined classes, based on the classes of its k nearest neighbors among the labeled examples. This algorithm is an example of supervised learning.
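
As a rough illustration of the idea, the sketch below classifies a page by majority vote among its k nearest labeled neighbors. It is plain k-nearest neighbor with Euclidean distance, not the MDAWkNN method of the chapter; the term names, counts, and the function names `euclidean` and `knn_classify` are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical term counts for the words ("goal", "match", "stock", "market").
train = [
    ([3, 2, 0, 0], "sports"),
    ([2, 3, 0, 1], "sports"),
    ([0, 0, 3, 2], "finance"),
    ([0, 1, 2, 3], "finance"),
]

label = knn_classify(train, [2, 2, 0, 0], k=3)
print(label)
```

The training pairs carry the class labels, which is what makes the approach supervised: the query page receives the label most common among its closest labeled pages.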

Pre-Processing: The transformation of raw data into a format suitable for the mining task. It is done prior to the mining task and is significant because, without quality data, there are no quality mining results.
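
One possible form of pre-processing for a Web page is sketched below, assuming a simple pipeline of tag stripping, lowercasing, tokenizing, and stop-word removal. The stop-word list and the function name `preprocess` are illustrative assumptions, not the steps used in the chapter.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(html):
    """Turn raw HTML into a list of lowercase content words."""
    text = re.sub(r"<[^>]+>", " ", html)         # drop markup tags
    tokens = re.findall(r"[a-z]+", text.lower())  # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]


page = "<html><body><h1>The Match</h1><p>A goal in the final minute!</p></body></html>"
tokens = preprocess(page)
print(tokens)
```

The resulting tokens can then be counted to build the feature vector that a classifier works on.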

Feature Selection: The process of identifying the features that are most significant to the mining task. It is a part of pre-processing and is one of the solutions to the curse of dimensionality. Features that are redundant or irrelevant to the mining task are eliminated.
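
A minimal sketch of filter-style feature selection follows. It scores each term by how much its mean count differs between classes and keeps the top-scoring terms; this scoring rule is a simple stand-in chosen for illustration, not one of the methods cited in the chapter, and the data and the name `select_features` are hypothetical.

```python
def select_features(rows, labels, names, top_n=2):
    """Keep the `top_n` features whose mean value differs most across classes."""
    classes = sorted(set(labels))
    scores = []
    for j, name in enumerate(names):
        means = []
        for c in classes:
            values = [row[j] for row, lab in zip(rows, labels) if lab == c]
            means.append(sum(values) / len(values))
        # A feature with the same mean in every class scores 0 (irrelevant).
        scores.append((max(means) - min(means), name))
    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scores[:top_n]]


# Hypothetical term counts per page.
names = ["goal", "page", "stock", "click"]
rows = [
    [3, 1, 0, 1],
    [2, 1, 0, 1],
    [0, 1, 3, 1],
    [0, 1, 2, 1],
]
labels = ["sports", "sports", "finance", "finance"]

selected = select_features(rows, labels, names, top_n=2)
print(selected)
```

Here "page" and "click" occur equally often in both classes, so they carry no information for the task and are dropped, which halves the dimensionality of this toy data set.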

Machine Learning: A branch of artificial intelligence that deals with the study and construction of intelligent systems by analyzing data. Data mining has its roots in machine learning.

Web Page Classification: A process of assigning labels to Web pages based on the kind of content they have.

Curse of Dimensionality: When processing Big Data with a very large number of dimensions, many of the objects appear sparse after they are pre-processed. They are also dissimilar in many ways, which prevents common data organization strategies from being efficient. This problem, faced by the statistics community, is known as the curse of dimensionality.
