In the span of a decade, the World Wide Web has been transformed from a tool for information sharing among researchers into an indispensable part of everyday activities. This transformation has been characterized by an explosion of heterogeneous data and information available electronically, as well as by increasingly complex applications driving a variety of systems for content management, e-commerce, e-learning, collaboration, and other Web services. This growth, in turn, has necessitated the development of more intelligent tools that help end users and information providers extract relevant information and discover actionable knowledge more effectively. The potential for extracting valuable knowledge from the Web has been evident from its very beginning. Web mining, the application of data mining techniques to extract knowledge from Web content, structure, and usage, is the collection of technologies that fulfills this potential. In this article, we briefly summarize each of the three primary areas of Web mining (Web usage mining, Web content mining, and Web structure mining) and discuss some of the primary applications in each area.
Knowledge discovery on and from the Web has been characterized by four different but related types of activities (Kosala & Blockeel, 2000):
Resource Discovery: Locating unfamiliar documents and services on the Web.
Information Extraction: Automatically extracting specific information from newly discovered Web resources.
Generalization: Uncovering general patterns at individual Web sites or across multiple sites.
Personalization: Presenting the information requested by an end user of the Web.
The goal of Web mining is to discover global as well as local structures, models, patterns, or relations within and between Web pages. Research and practice in Web mining have evolved over the years from a process-centric view, which defined Web mining as a sequence of tasks (Etzioni, 1996), to a data-centric view, which defines Web mining in terms of the types of Web data used in the mining process (Cooley et al., 1997).
The evolution of Web mining as a discipline has been characterized by a number of efforts to define and expand its underlying components and processes (Cooley et al., 1997; Kosala & Blockeel, 2000; Madria et al., 1999; Srivastava et al., 2002). These efforts have led to three commonly distinguished areas of Web mining: Web usage mining, Web content mining, and Web structure mining.
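To make the data-centric distinction concrete, the following minimal Python sketch shows the kind of raw data each of the three areas typically starts from: page text (content), hyperlinks (structure), and server log entries (usage). It is an illustration only, not a method from the cited literature. The sample page, the log line, and the names PageParser, content_and_structure, and usage_record are all hypothetical, and the log line is assumed to follow the Common Log Format.

```python
import re
from html.parser import HTMLParser

# Hypothetical inputs, invented purely for illustration.
PAGE_HTML = (
    '<html><body><h1>Laptops</h1><p>Compare prices.</p>'
    '<a href="/cart">Cart</a><a href="/help">Help</a></body></html>'
)
LOG_LINE = (
    '10.0.0.1 - - [12/Mar/2004:10:15:32 +0000] '
    '"GET /laptops HTTP/1.1" 200 5120'
)


class PageParser(HTMLParser):
    """Collects page text (content data) and href targets (structure data)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Outgoing hyperlinks are the raw material of structure mining.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Text between tags is the raw material of content mining.
        self.words.extend(data.lower().split())


def content_and_structure(html):
    """Return (words, links) extracted from one HTML page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.words, parser.links


# Assumed layout: host, identity, user, [timestamp], "request", status, size.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$'
)


def usage_record(line):
    """Parse one log line into a usage record, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, timestamp, method, url, status, _size = match.groups()
    return {"host": host, "time": timestamp, "method": method,
            "url": url, "status": int(status)}


if __name__ == "__main__":
    words, links = content_and_structure(PAGE_HTML)
    print("content:", words)
    print("structure:", links)
    print("usage:", usage_record(LOG_LINE))
```

In practice, each area applies data mining techniques to collections of such records rather than to a single page or request. The sketch only separates the three data sources.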