User Behaviour Pattern Mining from Weblog

Vishnu Priya, A. Vadivel

Source Title: International Journal of Data Warehousing and Mining (IJDWM) 8(2)

DOI: 10.4018/jdwm.2012040101

Abstract

In this paper, the authors build a tree in a single scan using both frequent and non-frequent items, named the Revised PLWAP with Non-frequent Items tree (RePLNI-tree). While mining sequential patterns, the links related to the non-frequent items are virtually discarded. Hence, it is not necessary to delete or maintain node information while revising the tree to mine an updated weblog. Because the algorithm supports both incremental and interactive mining, the tree need not be reconstructed from scratch, nor the patterns recomputed, each time the weblog is updated or the minimum support is changed. The proposed tree performs well even when the incremental database is more than 50% of the size of the existing one, which is not the case for a recently proposed algorithm. For evaluation, the authors used a benchmark weblog and found the performance of the proposed tree encouraging compared with some recently proposed approaches.

Introduction

Data mining has attracted tremendous interest as a way to explore large volumes of data and extract useful information for knowledge discovery. Association rule mining is considered one of the most popular problems among the well-known data mining techniques for knowledge discovery. The primary step is to identify all valuable patterns in the dataset, where the items occurring in each transaction have no pre-defined order.

The relationships between the items in the extracted patterns yield both interesting and identical (redundant) knowledge. Ashrafi et al. (2007) identify many rules produced by the mining process that have identical meaning. These redundant rules are removed to discover interesting knowledge/rules.

The mined interesting knowledge, i.e., the association rules, can be either positive or negative. Tjioe and Taniar (2005) proposed algorithms that use measurements of summarized data to mine association rules in data warehouses, which are represented in a multidimensional model. The data are initialized efficiently using four algorithms, namely VAvg, HAvg, WMAvg, and ModusFilter, which mine association rules in data warehouses by concentrating on the measurement of aggregate data.

Like positive association rules, negative association rules have been found to be important for generating candidate exception rules (Daly & Taniar, 2004). The candidate exception rules are evaluated using an exceptionality measure, and the candidates with high exceptionality form the final set of exception rules. Later, both positive and negative association rules were used for mining exception rules (Taniar et al., 2008). There, the relationships between exceptions and positive/negative association rules are considered when forming the negative and positive association rules, and the candidate exception rules are again evaluated using an exceptionality measure, with the highly exceptional candidates forming the final set of exception rules.

If the transactional datasets are distributed in nature, the conventional approach to mining may not perform well. This situation is handled by Optimized Distributed Association Mining (ODAM), which is designed for geographically distributed datasets (Ashrafi et al., 2004). It considerably reduces the average transaction size and the dataset size, and the messages needed to generate the support counts of candidate item sets are exchanged very quickly.

One of the important issues in data mining is extracting sequential patterns, where items occur in a certain order and may recur many times. The mining process discovers the set of frequent sequences whose supports are greater than or equal to a user-specified minimum support. Usually, the items in a frequent sequence indicate which items will occur in the future after certain items have occurred. For instance, consider the sequence {a, b, c} obtained after mining with support = 40%, which means that in 40% of the sequences in the database, “c” occurred after “a” and “b”.
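The support computation in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy `weblog` are hypothetical, and a pattern is assumed to match a sequence when its items appear in the same order, not necessarily contiguously.

```python
from typing import Hashable, Sequence


def is_subsequence(pattern: Sequence[Hashable], sequence: Sequence[Hashable]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in the same order.

    Assumption: gaps between matched items are allowed.
    """
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[Hashable], database: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    hits = sum(1 for sequence in database if is_subsequence(pattern, sequence))
    return hits / len(database)


def is_frequent(pattern, database, min_support: float) -> bool:
    """A pattern is frequent when its support reaches the minimum support."""
    return support(pattern, database) >= min_support


# Hypothetical toy access sequences, one per user session.
weblog = [
    ["a", "b", "c"],
    ["a", "d", "b", "c"],
    ["b", "a", "c"],
    ["a", "b"],
    ["c", "a", "b"],
]

print(support(["a", "b", "c"], weblog))  # 2 of the 5 sequences match
```

Here two of the five sequences contain "a", "b", "c" in that order, so the pattern is frequent for any minimum support up to 40%.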

Mining sequential patterns is time-consuming because the number of items in an item set is not known in advance. Patterns are formed from permutations or combinations of the possible items in the database. Kumar et al. (2010) proposed a Sequence and Set Similarity Measure (S3M) that captures both the order of occurrence of items in sequences and the constituent items of sequences.
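A similarity of this kind can be sketched as follows. The formulation is an assumption for illustration and is not given in the text: order is scored by the length of the longest common subsequence divided by the longer sequence length, content is scored by the Jaccard similarity of the item sets, and the two are mixed with a weight `p`. All function and parameter names are hypothetical.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_set_similarity(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.5) -> float:
    """Weighted mix of an order-based score and a content-based score.

    Assumed form: p * LCS(a, b) / max(|a|, |b|) + (1 - p) * Jaccard(set(a), set(b)).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    longest = max(len(a), len(b))
    order_score = lcs_length(a, b) / longest if longest else 0.0
    union = set(a) | set(b)
    content_score = len(set(a) & set(b)) / len(union) if union else 0.0
    return p * order_score + (1.0 - p) * content_score


# Same pages visited, but in a different order: full content score, partial order score.
print(sequence_set_similarity(["a", "b", "c"], ["a", "c", "b"]))
```

With `p = 1` only the order of items matters, and with `p = 0` only the set of items matters, which is why a single weight is used in this sketch.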

Sequential pattern mining is used in various applications, such as discovery of access patterns in weblogs, biological/DNA sequence analysis, improving storage performance, design of structured pattern mining methods, network alarm pattern mining, XML query access pattern analysis, system performance, telecommunication networks, and financial and scientific data analysis. In this paper, we discover sequential patterns in a weblog by considering both its frequent and non-frequent items.

Nearly one million pages are added every day and several hundred gigabytes are changed every month on the web. To handle the continuously evolving web environment and categories of online content, Giannikopoulos et al. (2010) proposed the Frequent Generalized Pattern (FGP) algorithm. It takes transactional data and hierarchical categories as input and generates generalized association rules with transaction items. This approach is found to be useful for Web 2.0 applications. FGP+, an extended version of FGP, has been proposed to handle the taxonomic nature of the web. However, the effectiveness of this approach has not been discussed or demonstrated for weblog data. Hence, the problem of extracting interesting knowledge from this dynamic repository has gained significant attention among researchers and is known as web mining.
