Classification of Web Pages Using Machine Learning Techniques

K. Selvakuberan (Innovation Labs (Web2.0), TATA Consultancy Service, India), M. Indra Devi (Thiagarajar College of Engineering, Madurai, India) and R. Rajaram (Thiagarajar College of Engineering, Madurai, India)
Copyright: © 2012 | Pages: 16
DOI: 10.4018/978-1-60960-818-7.ch105

Abstract

The explosive growth of the Web has made it a valuable information resource for all types of users. People access the Internet for many purposes, and their main demand is to retrieve the required information within a limited time. Yet a single search term can return millions of Web pages, which makes it difficult to find the results that are actually relevant. Web page classification, a current research problem, addresses this by assigning documents to categories that search engines use when producing results. This chapter focuses on different machine learning techniques and on how Web pages can be classified with them. Automatic classification of Web pages using machine learning techniques is the most efficient way for search engines to provide accurate results to users. Machine learning classifiers may also be trained to protect personal details from unauthenticated users and to support privacy-preserving data mining.
Chapter Preview

Literature Survey

Susan Dumais and Hao Chen (2000) explore the use of hierarchical structure for classifying Web pages with support vector machine (SVM) classifiers. The hierarchy is first used to train a separate set of second-level classifiers: in the hierarchical case, each model learns to distinguish a second-level category only from the other categories under the same top-level category, rather than from every category in the collection.
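
The two-level routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: to keep the example self-contained, a simple cosine-similarity centroid classifier stands in for the SVM, and the class `CentroidClassifier`, the functions `train_hierarchy` and `classify`, and the toy categories and texts are all hypothetical names and data introduced here for illustration.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Turn a text into a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Assumption: a centroid classifier used as a stand-in for an SVM."""

    def fit(self, texts, labels):
        # One summed term vector per label.
        self.centroids = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.centroids[label].update(bag_of_words(text))
        return self

    def predict(self, text):
        vec = bag_of_words(text)
        # sorted() makes tie-breaking deterministic.
        return max(sorted(self.centroids),
                   key=lambda label: cosine(vec, self.centroids[label]))


def train_hierarchy(examples):
    """examples: list of (text, top_label, second_label) tuples."""
    top = CentroidClassifier().fit([e[0] for e in examples],
                                   [e[1] for e in examples])
    second = {}
    for top_label in {e[1] for e in examples}:
        # Each second-level model sees only pages from its own top level,
        # so it separates a category from its siblings only.
        subset = [e for e in examples if e[1] == top_label]
        second[top_label] = CentroidClassifier().fit(
            [e[0] for e in subset], [e[2] for e in subset])
    return top, second


def classify(text, top, second):
    """Route a page through the top level, then the matching second level."""
    top_label = top.predict(text)
    return top_label, second[top_label].predict(text)


# Toy training data (hypothetical).
EXAMPLES = [
    ("football match goal league", "Sports", "Soccer"),
    ("tennis racket serve court", "Sports", "Tennis"),
    ("laptop processor memory keyboard", "Computers", "Hardware"),
    ("compiler python code library", "Computers", "Software"),
]

top_clf, second_clfs = train_hierarchy(EXAMPLES)
result = classify("python code for a new library", top_clf, second_clfs)
print(result)
```

Because each second-level model is trained only on pages from its own top-level category, it has fewer and more closely related classes to separate than a single flat classifier over all second-level categories would.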
