The State of the Art in Web Mining

Tad Gonsalves
Copyright: © 2015 | Pages: 11
DOI: 10.4018/978-1-4666-5888-2.ch187




Data mining deals with extracting valuable and useful information and knowledge from large datasets (Hand et al., 2001). Three types of mining are well known in the mining research community: data mining, web mining, and text mining. Data mining mainly deals with structured data organized in databases; text mining mainly deals with unstructured text. Web mining lies in between and copes with semi-structured and/or unstructured data. It calls for creative use of data mining and/or text mining techniques. Mining web data is one of the most challenging tasks for data mining researchers because the web is a huge warehouse of heterogeneous and semi-structured data.

Web mining is categorized into Web content mining, Web usage mining, and Web structure mining. Web content mining deals with knowledge discovery in which the main objects are collections of text documents and, more recently, collections of multimedia documents. Web usage mining deals with the discovery of interesting patterns in users’ usage of data on the web. Web structure mining deals with the analysis of the connection structure of a web site. Each of these categories may be further divided into several sub-categories. In practice, the three web mining tasks can be used in isolation or combined in an application; content and structure mining in particular are often combined, since a web document may also contain links.
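To make the three categories concrete, the following is a minimal sketch in Python, not a method taken from the chapter. It assumes a hypothetical four-page site (PAGES) and hypothetical click sessions (SESSIONS), and reduces each category to its simplest possible form: word counts for content mining, page-to-page transition counts for usage mining, and incoming-link counts for structure mining. All names and data are illustrative assumptions.

```python
from collections import Counter
import re

# Hypothetical toy site (an assumption for illustration, not data from the
# chapter): each page maps to its text and its list of outgoing links.
PAGES = {
    "/home": ("welcome to our book shop", ["/books", "/about"]),
    "/books": ("books on data mining and web mining", ["/home", "/cart"]),
    "/about": ("about our shop", ["/home"]),
    "/cart": ("your cart", ["/books"]),
}

# Hypothetical click sessions: one list of visited pages per user visit.
SESSIONS = [
    ["/home", "/books", "/cart"],
    ["/home", "/about"],
    ["/home", "/books", "/cart"],
]


def content_terms(pages):
    """Web content mining (simplified): count words across all page text."""
    counts = Counter()
    for text, _links in pages.values():
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def usage_transitions(sessions):
    """Web usage mining (simplified): count page-to-page transitions."""
    counts = Counter()
    for session in sessions:
        # Pair each visited page with the page visited immediately after it.
        counts.update(zip(session, session[1:]))
    return counts


def structure_in_degree(pages):
    """Web structure mining (simplified): count incoming links per page."""
    counts = Counter({page: 0 for page in pages})
    for _text, links in pages.values():
        counts.update(links)
    return counts


if __name__ == "__main__":
    print("frequent terms:", content_terms(PAGES).most_common(3))
    print("transitions:", usage_transitions(SESSIONS))
    print("in-degree:", structure_in_degree(PAGES))
```

Because the page text and the links live in the same PAGES record, content and structure mining can be run over the same documents, which is the kind of combination the paragraph above describes. Real systems would replace these counters with techniques such as text classification, sequential pattern mining, or link analysis.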

Kosala and Blockeel (2000) present a survey of web mining research for each of the three web mining categories presented above, and distinguish web mining from information retrieval and information extraction. They hold that web mining techniques are not the only tools for solving information overload problems, whether directly or indirectly. They claim that techniques and work from other research areas, such as databases, information retrieval, and natural language processing, could also be used.

This article introduces some state-of-the-art Web mining applications developed by academia and industry. These include highly successful applications in e-commerce (data mining applications in online business), e-search (web search), e-education (distance learning), and e-auction (online auctions).

Finally, three areas are suggested where current Web mining technology needs to develop further: Semantic Web mining, privacy policy, and Web application security. Current applications collect a lot of data about individual users in order to design and present a personalized page to each user and thereby improve the enterprise’s business. However, there is a danger of violating users’ privacy. This is one of the pressing issues the Web mining community should address.



In informatics, data, information, and knowledge form a pyramid with data at the base, information in the middle, and knowledge on top. Data are the facts that describe the world, information is captured data, and knowledge is our mental map or model of the world, which helps us make informed decisions. The three are related by processing: data can be processed into information, and information in turn can be processed into knowledge.

One of the major problems of our data-ridden age is succinctly described by John Naisbitt (1988): “We are drowning in information, but starving for knowledge.” We can extend this statement: “We are drowning in data, but starving for information.” Data mining, the science of extracting useful information (knowledge) from large data sets, attempts to bridge the gaps among data, information, and knowledge.

Key Terms in this Chapter

Web Usage Mining: The process of extracting useful usage patterns from the Web data.

Web Mining: The process of discovering previously unknown and potentially useful information from the Web data.

Data Mining: The process of extracting useful information from large datasets.

Semantic Web: The Web formed by semantically structured information which is machine-readable.

Web Content Mining: The process of extracting useful information from the contents of the Web documents.

Web Structure Mining: The process of extracting useful structure information from the Web.

Information Retrieval: The discovery of resources or documents on the Web.
