Web Usage Mining and the Challenge of Big Data: A Review of Emerging Tools and Techniques

Abubakr Gafar Abdalla, Tarig Mohamed Ahmed, Mohamed Elhassan Seliaman

Source Title: Big Data: Concepts, Methodologies, Tools, and Applications

DOI: 10.4018/978-1-4666-9840-6.ch042

Abstract

The web is a rich, dynamic, and fast-growing source of data for mining, offering great opportunities that are often not exploited. Web data pose a real challenge to traditional data mining techniques because of their huge volume and unstructured nature. Web logs contain information about the interactions between visitors and the website. Analyzing these logs provides insights into visitors' behavior, usage patterns, and trends. Web usage mining, also known as web log mining, is the process of applying data mining techniques to discover useful information hidden in web server logs. Web logs are primarily used by web administrators to measure traffic and to detect broken links and other types of errors. Web usage mining extracts information that can benefit a number of application areas, such as web personalization, website restructuring, system performance improvement, and business intelligence. The web usage mining process involves three main phases: pre-processing, pattern discovery, and pattern analysis. Various pre-processing techniques have been proposed to extract information from log files and group primitive data items into meaningful, lighter-level abstractions that are suitable for mining, usually in the form of visitors' sessions. The major data mining techniques in web usage mining pattern discovery are clustering, association analysis, classification, and sequential pattern discovery. This chapter discusses the process of web usage mining, its procedure, methods, and pattern discovery techniques. The chapter also presents a practical example using real web log data.

Chapter Preview

Introduction

The explosive growth of the internet and the substantial amount of information generated daily have turned the web into a huge information store. The relationships between the data available online are often not exploited. Web mining analyzes web data to help create a more useful environment in which users and organizations manage information in more intelligent ways (Srivastava, Cooley, Deshpande, & Tan, 2000).

The internet has become an important medium for conducting business transactions. The application of data mining techniques to the web has therefore become increasingly important to organizations, which use it to extract knowledge for purposes such as improving web system performance, restructuring website design, providing personalized web pages, and deriving business intelligence. Web data mining methods have strong practical applications in E-Systems and form the basis for marketing and e-commerce activities. They can be used to provide fast and efficient services to customers and to build intelligent web sites for businesses. Data mining in e-business is considered a very promising research area.

Web data mining deals with a different type of data, called web data, which is semi-structured or even unstructured. Web data can be divided into three categories: content data, structure data, and usage data. This type of data differentiates web mining from data mining.

Web data represent a new challenge to traditional data mining algorithms, which work with structured data. Because web data are less structured and the amount of information generated daily is growing rapidly, users need automated tools to find the information they require. There are several commercial web analysis tools, but most of them provide explicit statistics without real knowledge. These tools are also considered slow and inflexible, and they provide only limited features. Some tools that use data mining techniques are being developed, but the research is still in its first stages and faces real challenges such as large storage requirements and scalability problems (Rana, 2012).

The main objectives of this chapter are:

1. To extensively review the web usage mining methods and types;
2. To identify the main web usage mining challenges due to the Big Data phenomenon;
3. To describe the Big Data solutions for web usage mining;
4. To evaluate the different emerging methodologies and implementation tools for Big Data web usage mining.

This chapter discusses the web usage mining process, also known as web log mining, which has three phases: pre-processing, pattern discovery, and pattern analysis. Of the many data sources for web usage mining, the web server's log file is the most widely used. The chapter also covers the following major techniques in web usage mining pattern discovery in relation to Big Data:

Association Rules

Association rule mining is the process of finding associations within the data in the log file. This technique can be used to identify pages that are most often accessed together. Association rules can be useful for many mining purposes, such as predicting the next page a visitor will request and preloading it from the remote server to speed up browsing.
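As a rough illustration of pages "accessed together", the sketch below counts how often pairs of pages co-occur in visitor sessions and keeps the pairs that pass support and confidence thresholds. The function name `pair_rules`, the threshold values, and the sample sessions are all hypothetical; it is a minimal pairwise sketch, not a full Apriori implementation and not the method used in the chapter.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules between page pairs."""
    n = len(sessions)
    page_counts = Counter()  # number of sessions containing each page
    pair_counts = Counter()  # number of sessions containing each pair of pages
    for session in sessions:
        pages = set(session)  # ignore repeated visits within one session
        page_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical sessions, as they might come out of log pre-processing.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

A rule such as `/cart -> /products` with high confidence suggests that the consequent page is a reasonable candidate for preloading once the antecedent page has been requested.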

Clustering

In web usage mining, there are two kinds of clusters: user clusters and page clusters. User clustering can be exploited to perform market segmentation in e-commerce web sites or to provide personalized web pages. Page clustering identifies pages with related content and can be exploited by search engines and web recommendation systems (Srivastava, Cooley, Deshpande, & Tan, 2000).
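User clustering can be sketched by grouping visitors whose sets of visited pages overlap. The example below uses Jaccard similarity with a greedy one-pass ("leader") clustering, chosen here only because it is short and deterministic; the text does not prescribe a particular algorithm, and the names `jaccard`, `leader_clusters`, the threshold, and the sample data are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of pages."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def leader_clusters(user_pages, threshold=0.5):
    """Greedy one-pass clustering: each user joins the first cluster
    whose leader (its first member) is similar enough, else starts a new one."""
    clusters = []  # list of (leader_pages, [user ids])
    for user, pages in user_pages.items():
        for leader_pages, members in clusters:
            if jaccard(pages, leader_pages) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((set(pages), [user]))
    return [members for _, members in clusters]

# Hypothetical per-user page sets derived from sessions.
user_pages = {
    "u1": {"/shoes", "/boots", "/cart"},
    "u2": {"/shoes", "/boots"},
    "u3": {"/blog", "/about"},
    "u4": {"/blog", "/about", "/contact"},
}
groups = leader_clusters(user_pages)
print(groups)
```

The result of a leader-style pass depends on the order in which users are processed, so a production system would more likely use an order-independent method such as k-means or hierarchical clustering over the same similarity measure.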

Classification

Classification is the process of assigning each data item to one of a set of predefined classes. In web mining, classification can be used, for example, to develop a profile of users belonging to a particular class or to classify HTTP requests as normal or abnormal (Srivastava, Cooley, Deshpande, & Tan, 2000).
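The normal/abnormal request example can be sketched with a nearest-centroid classifier: each labeled request is turned into a small feature vector, the vectors of each class are averaged, and a new request receives the label of the closest average. The features (path length, count of suspicious characters, error status), the training requests, and the function names are all hypothetical, and nearest-centroid is only one of many classifiers that could be used here.

```python
import math

def features(request):
    """Map a (path, status) request to a small numeric feature vector."""
    path, status = request
    special = sum(path.count(ch) for ch in "'<>;%")  # characters common in injection attempts
    return (len(path) / 100.0, float(special), 1.0 if status >= 400 else 0.0)

def train_centroids(labeled_requests):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for request, label in labeled_requests:
        vec = features(request)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(request, centroids):
    """Assign the label of the nearest class centroid (Euclidean distance)."""
    vec = features(request)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

# Hypothetical labeled requests taken from a web server log.
training = [
    (("/index.html", 200), "normal"),
    (("/products?id=7", 200), "normal"),
    (("/about", 200), "normal"),
    (("/login?user=admin'--;", 403), "abnormal"),
    (("/search?q=<script>alert(1)</script>", 400), "abnormal"),
    (("/item?id=1%27%20OR%201=1", 500), "abnormal"),
]
centroids = train_centroids(training)
print(classify(("/contact", 200), centroids))
print(classify(("/page?x=<b>';", 404), centroids))
```

Because the features are on different scales, the suspicious-character count dominates the distance in this sketch; a real classifier would normalize the features and be trained on far more labeled traffic.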
