Feature Selection for Web Page Classification

K. Selvakuberan (Tata Consultancy Services, India), M. Indra Devi (Thiagarajar College of Engineering, India) and R. Rajaram (Thiagarajar College of Engineering, India)
Copyright: © 2010 | Pages: 16
DOI: 10.4018/978-1-60566-982-3.ch078

Abstract

The World Wide Web serves as a huge, widely distributed, global information service center for news, advertisements, customer information, financial management, education, government, e-commerce, and many other services. The Web contains a rich and dynamic collection of hyperlink information, and Web page access and usage information provides a rich source for data mining. Web pages are classified based on the content and/or contextual information embedded in them. Because Web pages contain many irrelevant words, infrequent words, and stop words that reduce the performance of the classifier, selecting relevant, representative features from the Web page is an essential preprocessing step. It also supports secure access to the required information. Web access and usage information can be mined to predict whether the user accessing a Web page is authentic. This information may be used to personalize the information presented to users and to preserve their privacy by hiding personal details. The issue lies in selecting the features that represent the Web pages and in processing the details of the users who need that information. In this article we focus on feature selection, the issues that arise in feature selection, and the most important feature selection techniques described and used by researchers.
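The preprocessing step the abstract describes (dropping stop words and infrequent words before classification) can be illustrated with a minimal sketch. It is not the chapter's method: the stop-word list, the document-frequency threshold `min_doc_freq`, the function names, and the sample pages are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def select_features(pages, min_doc_freq=2):
    """Return the sorted vocabulary kept after dropping stop words and
    terms that occur in fewer than min_doc_freq pages."""
    doc_freq = Counter()
    for text in pages:
        # Count each term once per page (document frequency).
        doc_freq.update(set(tokenize(text)) - STOP_WORDS)
    return sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)

def to_vector(text, vocabulary):
    """Represent one page as term counts over the selected vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in vocabulary]

# Hypothetical page texts, used only to exercise the functions.
pages = [
    "Latest news on the football league and match results",
    "Football match report: the league leaders win again",
    "Stock market news and financial results for investors",
]
vocabulary = select_features(pages)
vectors = [to_vector(p, vocabulary) for p in pages]
print(vocabulary)
print(vectors)
```

The resulting vectors are what a classifier would consume. A document-frequency cutoff is only one simple filter; it is used here because the abstract names infrequent words as a source of noise.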
