Introduction
Data mining has attracted tremendous interest as a means of exploring large volumes of data to extract useful information for knowledge discovery. Association rule mining is one of the most popular problems among the well-known data mining techniques for knowledge discovery. Its primary step is to identify all valuable patterns in the dataset, where the items occurring in each transaction have no predefined order.
The relationships between the items in the extracted patterns yield both interesting and redundant knowledge. Ashrafi et al. (2007) identify many rules with identical meaning that are produced by the mining process. These redundant rules are removed in order to discover interesting knowledge/rules.
The mined interesting knowledge, i.e., the association rules, can be either positive or negative. Tjioe and Taniar (2005) proposed algorithms that mine association rules in data warehouses, which are represented in a multidimensional model, by concentrating on the measurement of aggregate (summarized) data. Four algorithms, namely VAvg, HAvg, WMAvg, and ModusFilter, are used to initialize the data efficiently for this purpose.
Like positive association rules, negative association rules have been found to be important for generating candidate exception rules (Daly & Taniar, 2004). Later, both positive and negative association rules were used for mining exception rules (Taniar et al., 2008). The relationship between exception rules and positive/negative association rules is taken into account in forming the negative and positive association rules. The candidate exception rules are evaluated using an exceptionality measure, and the candidates with high exceptionality form the final set of exception rules.
If the transactional datasets are distributed in nature, the conventional mining approach may not perform well. This situation is handled by Optimized Distributed Association Mining (ODAM), which targets geographically distributed datasets (Ashrafi et al., 2004). ODAM considerably reduces the average transaction size and the dataset size, and it exchanges messages quickly when generating the support counts of candidate item sets.
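The idea of deriving global support counts from distributed sites can be illustrated with a minimal sketch. This is not ODAM itself: the optimizations described above (reducing transactions and speeding up message exchange) are not modeled, and the function names, the two example sites, and the 50% threshold are all assumptions made for illustration. Each site counts its candidate item sets locally, and only those counts, not the raw transactions, are combined.

```python
from collections import Counter
from itertools import combinations


def local_support_counts(transactions, size):
    """Count every item set of the given size within one site's transactions."""
    counts = Counter()
    for transaction in transactions:
        # Items in a transaction are unordered, so sort them to get a canonical key.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def global_frequent_itemsets(sites, size, min_support):
    """Sum the local counts from all sites and keep the globally frequent item sets."""
    total_counts = Counter()
    total_transactions = 0
    for transactions in sites:
        # Only the per-site counts are "exchanged" here, not the transactions.
        total_counts.update(local_support_counts(transactions, size))
        total_transactions += len(transactions)
    threshold = min_support * total_transactions
    return {s: c for s, c in total_counts.items() if c >= threshold}


# Hypothetical data held at two sites.
site_a = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
site_b = [["a", "b"], ["b", "c"]]

frequent_pairs = global_frequent_itemsets([site_a, site_b], size=2, min_support=0.5)
print(frequent_pairs)
```

With these five transactions, only the pair (a, b) appears in at least half of them, so it is the only pair reported as globally frequent.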
One of the important issues in data mining is the extraction of sequential patterns, in which items appear in a certain order and may recur many times. The mining process discovers the set of frequent sequential item sets, whose supports are greater than or equal to a user-specified minimum support. Usually, a frequent sequence indicates which items tend to occur after certain other items have occurred. For instance, consider the mined sequential item set {a, b, c} with support = 40%, which means that in 40% of the sequences, “c” occurred after “a” and “b”.
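The support of a sequential pattern can be made concrete with a short sketch. It assumes that a pattern is supported by a sequence when its items appear in the same order, with gaps allowed; the helper names and the five example sequences are hypothetical and chosen so that the pattern {a, b, c} reaches the 40% support used in the example above.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain pattern as an ordered subsequence."""
    if not sequences:
        return 0.0
    hits = sum(1 for sequence in sequences if is_subsequence(pattern, sequence))
    return hits / len(sequences)


# Hypothetical sequence database.
sequences = [
    ["a", "b", "c"],
    ["a", "d", "b", "c"],
    ["c", "b", "a"],
    ["a", "b"],
    ["b", "c", "a"],
]

pattern = ["a", "b", "c"]
min_support = 0.4  # user-specified minimum support (assumed value)

pattern_support = support(pattern, sequences)
is_frequent = pattern_support >= min_support
print(pattern_support, is_frequent)
```

Here only the first two sequences contain "a", "b", and "c" in that order, so the support is 2/5 = 40% and the pattern just meets the minimum support.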
Mining sequential patterns is time-consuming because the number of items in an item set is not known in advance. Patterns are formed from the permutations or combinations of the possible items in the database. Kumar et al. (2010) proposed a Sequence and Set Similarity Measure (S3M) that captures both the order of occurrence of items in sequences and the constituent items of sequences.
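One way to combine order and content in a single similarity score is sketched below. It is an illustration under stated assumptions rather than the authors' definition: order similarity is taken to be the length of the longest common subsequence divided by the longer sequence length, content similarity is taken to be the Jaccard coefficient of the item sets, and the two are mixed with a hypothetical weight `order_weight`.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def sequence_set_similarity(a, b, order_weight=0.5):
    """Weighted mix of order similarity (LCS) and content similarity (Jaccard).

    The weighting scheme and normalization are assumptions for illustration.
    """
    if not a and not b:
        return 1.0  # two empty sequences are treated as identical
    order_similarity = lcs_length(a, b) / max(len(a), len(b))
    items_a, items_b = set(a), set(b)
    content_similarity = len(items_a & items_b) / len(items_a | items_b)
    return order_weight * order_similarity + (1 - order_weight) * content_similarity


# Same items in reverse order: identical content, very different order.
print(sequence_set_similarity(["a", "b", "c"], ["c", "b", "a"]))
```

Two sequences with the same items in reverse order score well on content but poorly on order, which is the distinction a purely set-based or purely sequence-based measure would miss.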
Sequential pattern mining is used in various applications, such as the discovery of access patterns in weblogs, biological/DNA sequence analysis, storage performance improvement, the design of structured pattern mining methods, network alarm pattern mining, XML query access pattern analysis, system performance analysis, telecommunication networks, and financial and scientific data analysis. In this paper, we discover sequential patterns in weblogs by considering both frequent and non-frequent items.
Nearly one million pages are added to the Web every day, and several hundred gigabytes change every month. To handle the continuously evolving web environment and the categories of online content, Giannikopoulos et al. (2010) proposed the Frequent Generalized Pattern (FGP) algorithm. It takes transactional data and hierarchical categories as input and generates generalized association rules over the transaction items. This approach has been found useful for Web 2.0 applications. FGP+, an extended version of FGP, has been proposed to handle the taxonomic nature of the Web. However, the effectiveness of this approach has not been discussed or demonstrated for weblog data. Hence, the problem of extracting interesting knowledge from this dynamic repository, known as web mining, has gained considerable attention among researchers.