Classification of Web Pages Using Machine Learning Techniques

K. Selvakuberan, M. Indra Devi, R. Rajaram
DOI: 10.4018/978-1-60566-196-4.ch008


The explosive growth of the Web has made it a valuable information resource for all types of users. People access the Internet for many purposes, and their main demand is to retrieve the required information within a limited time. A single search term, however, can return millions of Web pages, so finding the interesting and relevant results is difficult. Web page classification, a current research problem, addresses this by assigning documents to categories that search engines then use when producing results. This chapter focuses on different machine learning techniques and on how Web pages can be classified with them. Automatic classification of Web pages using machine learning is an efficient way for search engines to provide accurate results to users. Machine learning classifiers may also be trained to protect personal details from unauthenticated users and to support privacy-preserving data mining.

Literature Survey

Susan Dumais and Hao Chen (2000) explore the use of hierarchical structure for classifying Web pages with Support Vector Machine (SVM) classifiers. The hierarchical structure is first used to train separate second-level classifiers: in the hierarchical case, a model is learned to distinguish a second-level category from the other categories within the same top-level category.
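
The two-level scheme can be illustrated with a minimal sketch. It is not Dumais and Chen's implementation: the category names, the toy documents, and every function and class name below are hypothetical, and a small pure-Python linear SVM (subgradient descent on the hinge loss) stands in for a production SVM library. The point it shows is that each second-level model is trained only on documents from its own top-level category, and that prediction first picks a top-level category and then a second-level category within it.

```python
from collections import defaultdict

def featurize(text, vocab):
    """Binary term-presence vector over a fixed vocabulary."""
    words = set(text.lower().split())
    return [1.0 if term in words else 0.0 for term in vocab]

def score(model, x):
    """Signed distance-like score w.x + b of a linear model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Binary linear SVM (labels +1/-1) fitted by subgradient descent
    on the L2-regularised hinge loss. Hyperparameters are assumptions."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * score((w, b), xi) < 1.0
            step = lr * yi if violated else 0.0
            w = [wj * (1.0 - lr * lam) + step * xj for wj, xj in zip(w, xi)]
            b += step
    return w, b

class OneVsRestSVM:
    """One binary SVM per category; predict the highest-scoring one."""
    def fit(self, X, labels):
        self.models = {
            c: train_linear_svm(X, [1.0 if l == c else -1.0 for l in labels])
            for c in sorted(set(labels))
        }
        return self

    def predict(self, x):
        return max(self.models, key=lambda c: score(self.models[c], x))

class HierarchicalSVM:
    """Top-level classifier plus one second-level classifier per top-level
    category, each trained only on that category's documents."""
    def fit(self, X, paths):
        self.top = OneVsRestSVM().fit(X, [top for top, _ in paths])
        groups = defaultdict(lambda: ([], []))
        for x, (top, second) in zip(X, paths):
            groups[top][0].append(x)
            groups[top][1].append(second)
        self.second = {top: OneVsRestSVM().fit(xs, ys)
                       for top, (xs, ys) in groups.items()}
        return self

    def predict(self, x):
        top = self.top.predict(x)
        return top, self.second[top].predict(x)

# Invented toy corpus: (page text, (top-level, second-level) category).
DOCS = [
    ("python compiler code software", ("Computers", "Software")),
    ("java code software bug", ("Computers", "Software")),
    ("cpu memory chip hardware", ("Computers", "Hardware")),
    ("gpu chip hardware board", ("Computers", "Hardware")),
    ("football goal league match", ("Sports", "Soccer")),
    ("goal striker football cup", ("Sports", "Soccer")),
    ("tennis racket serve match", ("Sports", "Tennis")),
    ("serve tennis court set", ("Sports", "Tennis")),
]
VOCAB = sorted({word for text, _ in DOCS for word in text.split()})
MODEL = HierarchicalSVM().fit([featurize(t, VOCAB) for t, _ in DOCS],
                              [path for _, path in DOCS])

def classify(text):
    """Return the (top-level, second-level) category predicted for a page."""
    return MODEL.predict(featurize(text, VOCAB))

print(classify("tennis serve court racket"))
```

Because each second-level model only has to separate sibling categories, it sees a smaller and more focused training set than a flat classifier over all second-level categories would.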

In the past, news was classified manually. Chee-Hong Chan, Aixin Sun, and Ee-Peng Lim (2001) experiment with an automated approach to news classification based on an SVM classifier, which achieves good classification accuracy. Online news is a type of Web information that is frequently referenced. The categorizer uses SVM to classify Web pages into predefined categories (general categories) or user-defined categories (special categories). In personalized classification, users define their own categories using specific keywords. By constructing search queries from these keywords, the categorizer obtains positive and negative examples and performs classification. With personalized categories, users can find related articles with minimum effort.
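
The keyword-driven labeling step can be sketched as follows. This is an illustration under stated assumptions, not the categorizer described by Chan, Sun, and Lim: a plain keyword match over a local list of articles stands in for their search queries, treating every non-matching article as a negative example is an assumption, and the function names, keywords, and articles are all hypothetical.

```python
def matches(article, keywords):
    """True if the article contains at least one of the category keywords."""
    words = set(article.lower().split())
    return any(keyword.lower() in words for keyword in keywords)

def build_training_set(articles, keywords):
    """Split articles into positive and negative examples for a
    user-defined category described only by its keywords."""
    positives = [a for a in articles if matches(a, keywords)]
    negatives = [a for a in articles if not matches(a, keywords)]
    return positives, negatives

# Invented example: a user defines a "politics" category with two keywords.
ARTICLES = [
    "parliament passes new budget",
    "striker scores twice in cup final",
    "election results announced tonight",
    "new phone released this week",
]
POSITIVES, NEGATIVES = build_training_set(ARTICLES, ["election", "parliament"])
print(len(POSITIVES), "positive and", len(NEGATIVES), "negative examples")
```

The two lists would then serve as training data for a binary classifier, such as an SVM, for that user-defined category.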
