Fuzzy Clustering of Web User Profiles for Analyzing their Behavior and Interests

Stanislav Kreuzer (Goethe University, Germany) and Natascha Hoebel (Goethe University, Germany)
DOI: 10.4018/978-1-4666-0095-9.ch005

Abstract

One of the keys to building effective e-customer relationships is an understanding of consumer behavior online. However, analyzing the behavior of customers online is not necessarily an indicator of their interests. Therefore, building profiles of registered users of a website is of importance if it goes beyond collecting obvious information the user is willing to give at the time of the registration. These user profiles can contribute to the analysis of the users’ interests. Important tools for the analysis are data-mining techniques, for example, the clustering of collected user information. This chapter addresses the problem of how to define, calculate, and visualize fuzzy clusters of Web visitors with respect to their behavior and supposed interests. This chapter shows how to cluster Web users based on their profile and by their similar interests in several topics using the fuzzy and hybrid CORD (Clustering of Ordinal Data) clustering system, which is part of the Gugubarra Framework.
Introduction

Companies today operate in an increasingly competitive environment. Therefore, finding and retaining customers is a critical factor for most businesses, offline and online. One of the keys to building effective e-customer relationships is an understanding of user behavior online (Turban et al., 2004). However, the observed behavior of users online is not necessarily an indicator of their interests.

Content providers compete for the user's attention, and at the same time, they have to convince their customers to keep using specific content sources or e-services. When exploring the Internet marketplace, various strategies can be observed for attracting new customers or maintaining a “significantly sized” customer base that may generate revenue directly through payments or indirectly through advertisements (Shapiro & Varian, 1998).

Efficient optimization of a service or resource of any kind requires a quantitative survey and assessment of its usage. Data-mining techniques can achieve remarkable results in extracting essential information from such seemingly amorphous data. Since the development of data-mining technologies, clustering algorithms have evolved greatly in terms of significance, performance, and quality. In particular, given the constantly growing number of modern clustering algorithms, research on their usability and on how well they adapt to the data at hand seems inevitable in a practical environment. No single algorithm can provide an efficient solution to all clustering problems.

Consequently, it is necessary to expand the field of research to a set of partial solutions. We therefore introduced our fuzzy clustering approach, named CORD (Clustering of Ordinal Data), at the 6th International Conference on Web Information Systems and Technologies in 2010. The paper (Hoebel & Kreuzer, 2010) was published by INSTICC Press and online by SciTePress.

CORD was designed to support the analysis of web site users’ interests. Clustering web users by their behavior can also be useful for measuring “trends” in a web community and for electronic customer relationship management (e-CRM), namely in the process of building long-term relationships and increasing e-customer loyalty, which is the degree to which a web customer will stay with a specific vendor or brand (Turban et al., 2004).

As with all clustering algorithms, CORD must deal with the issue of distance computation for the observed elements. Most of the known approaches focus on numerical data that allows measuring the distance by arithmetic operations, for example, using a Euclidean or chi-squared distance metric. Others focus on categorical data (Gan et al., 2005; Huang, 1997; Parmar, 2007) and measure the distance by counting the number of different attribute values. This metric is known as the Hamming distance in information theory. The focus of the CORD approach is to efficiently cluster large amounts of ordinal data, which, unlike pure categorical data, possesses an inherent order.
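
The difference between the two kinds of data can be illustrated with a minimal Python sketch. The rating scale, the example profiles, and the rank-difference measure below are assumptions made for illustration only; the text above does not state which distance CORD actually uses.

```python
# Hypothetical ordinal scale, invented for this example (not taken from CORD).
SCALE = ["none", "low", "medium", "high"]
RANK = {label: position for position, label in enumerate(SCALE)}


def hamming_distance(a, b):
    """Count the attributes on which two profiles differ (categorical view)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def ordinal_distance(a, b):
    """Sum of absolute rank differences (one possible order-aware measure)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of attributes")
    return sum(abs(RANK[x] - RANK[y]) for x, y in zip(a, b))


# Interest levels of three invented users in three topics.
u1 = ["low", "high", "none"]
u2 = ["medium", "high", "high"]
u3 = ["high", "high", "low"]

print(hamming_distance(u1, u2), hamming_distance(u1, u3))  # 2 2
print(ordinal_distance(u1, u2), ordinal_distance(u1, u3))  # 4 3
```

The Hamming distance treats u2 and u3 as equally far from u1, because it only counts mismatches. The rank-based measure separates them, because it also accounts for how far apart the mismatching values lie on the scale.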

The CORD approach combines modifications of three modern clustering approaches to create a hybrid solution that is able to efficiently process very large sets of ordinal data.

The NCSOM algorithm, described by Chen and Marques (2005) in An Extension of Self-organizing Maps to Categorical Data, is used for a rough pre-clustering. NCSOM improves the result of the clustering task by helping to decide how many centroids (clusters) should be used and where the k centroids should be placed.

The main clustering task utilizes a k-modes algorithm and its fuzzy set extension described by Kim et al. (2004) in Fuzzy Clustering of Categorical Data using Fuzzy Centroids.
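
What separates a fuzzy clustering from a hard one is that each object receives a degree of membership in every cluster. The following Python sketch shows the membership update commonly used in fuzzy c-means-style algorithms, including fuzzy k-modes. It is not the CORD implementation: the function name, the fuzziness parameter m, and the example distances are assumptions, and the fuzzy centroids of Kim et al. (2004) are not modeled here.

```python
def fuzzy_memberships(distances, m=2.0):
    """Return the membership degree of one object in each cluster.

    distances: distance from the object to each cluster centroid.
    m: fuzziness exponent, must be greater than 1 (assumed default 2.0).
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")

    # An object that coincides with one or more centroids belongs
    # entirely to those clusters, shared equally among them.
    zero = [i for i, d in enumerate(distances) if d == 0]
    if zero:
        share = 1.0 / len(zero)
        return [share if i in zero else 0.0 for i in range(len(distances))]

    # Standard update: u_i = 1 / sum_k (d_i / d_k) ** (1 / (m - 1))
    exponent = 1.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# An object three times closer to the first centroid than to the second.
print(fuzzy_memberships([1.0, 3.0]))  # [0.75, 0.25]
```

The memberships of one object always sum to 1, and values of m closer to 1 push the result toward a hard assignment.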

Finally, to deal with large amounts of data, the BIRCH algorithm, described by Zhang et al. (1996) in BIRCH: An Efficient Data Clustering Method for Very Large Databases, was adapted to ordinal data.