Overview of Pattern Mining
Over the last few decades, information has been generated at a rapid pace. If an analogue of Moore's law applied to information generation, the growth rate would arguably be 100 or 1000 times faster than the rate at which chipset design advances. Owing to this huge amount of information, database systems have been, and are still being, developed to manage it. Storing information is one thing; making use of it is another. Recognizing and extracting hidden knowledge and potentially interesting patterns from these large databases is the task of Data Mining (DM), an essential step of Knowledge Discovery in Databases (KDD) (Han & Kamber, 2001). Association Rule Mining (ARM) (Agrawal et al., 1993), a classical KDD technique, can generate many rules and patterns, but not all the rules generated by mining algorithms are interesting. The rules generated by association rule mining algorithms depend only on statistical significance, which, given the diversity of the information, cannot by itself produce useful results for contemporary Market-Basket Analysis. ARM has been used by many researchers and industry professionals to uncover the insights that matter most for market strategy, i.e., to find the strongest statistically significant correlations among objects, which then govern the generation of rules that were previously hidden in the raw data.
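The two statistical measures that ARM conventionally relies on, support and confidence, can be sketched in a few lines. This is a minimal illustration only: the transactions and the helper names (`count`, `support`, `confidence`) are hypothetical and are not taken from any of the cited works.

```python
# Toy market-basket data (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_count = count(antecedent, transactions)
    if antecedent_count == 0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / antecedent_count


# Rule under test: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Both numbers describe only how often items co-occur. Neither says whether acting on the rule is worthwhile for the business, which is the limitation discussed in this section.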
Figure 1. Hierarchy of profit pattern mining
ARM created a buzz when it first came to light. Since then, as database technology has grown to let more businesses process and store their data in databases, ARM can still lead to knowledge discovery, but much of its original significance has been lost. As databases become ever more pervasive, it is important to consider parameters other than just support and confidence. The objective of Profit Pattern Mining (PPM) (Wang et al., 2002) is directly tied to business goals, and PPM is one of the most important application areas of association rule mining. Figure 1 shows the hierarchy of PPM.
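A small sketch can illustrate why a measure beyond support and confidence may matter. Every value and name here is hypothetical, and the scoring rule (confidence multiplied by the unit profit of the recommended item) is a deliberately simplified assumption. It is not the profit mining method of Wang et al. (2002).

```python
# Hypothetical candidate rules: (antecedent, recommended item, confidence).
rules = [
    ("bread", "milk", 0.75),
    ("bread", "jam", 0.25),
    ("milk", "butter", 0.25),
]

# Hypothetical profit earned per unit sold of each recommended item.
unit_profit = {"milk": 0.20, "jam": 1.50, "butter": 0.40}


def expected_profit(rule):
    """Assumed score: confidence of the rule times the unit profit of its target item."""
    _, item, conf = rule
    return conf * unit_profit[item]


# Ranking by the purely statistical measure versus the profit-aware score.
by_confidence = sorted(rules, key=lambda r: r[2], reverse=True)
by_profit = sorted(rules, key=expected_profit, reverse=True)

print("top rule by confidence:", by_confidence[0][:2])
print("top rule by expected profit:", by_profit[0][:2])
```

With these made-up numbers the most confident rule recommends a low-margin item, while the profit-aware score prefers a less frequent but more profitable recommendation. A realistic profit measure would also have to account for prices, quantities, and costs.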
Discovering profit patterns in a huge volume of data is one of the most desired capabilities of Data Mining. The growth of data mining gives rise to a multitude of complex applications. Mining frequent sets over data streams presents attractive new challenges compared with traditional mining in static databases (Tiwari et al., 2010), in particular retrieving the desired information from large databases and turning it into knowledge.
A study confirms that interestingness measures differ across applications (Tew et al., 2014) and that substantial domain knowledge is needed to select an appropriate measure for a particular business objective. Since the calculated risk of any business is taken in order to generate profit, profit can be used as one of the measures, together with an appropriate mining technique, to support business decision-making. Earlier data mining technology followed a traditional approach that offered only statistical analysis and rule discovery. With recent advances in database technology, the data stored in databases are closer to the real world and more vague, and are therefore prone to some amount of uncertainty (Motro, 1995). Dealing with uncertainty is a challenge in many research areas, including database systems and data mining. Many works use mathematical approaches to deal with uncertainty, among them probability theory, fuzzy logic (Hong et al., 2003; Weng & Chen, 2010; Hong & Lee, 2008; Ma et al., 2011), rough sets (Guan et al., 2003; Jiao et al., 2012), soft sets (Herawan & Deris, 2010) and soft computing (Mitra & Acharya, 2003; Lal & Mahanti, 2011). Each of these theories deals with a different kind of uncertainty. To deal with vagueness, the mathematical model of Vague Set Theory (VST) (Gau & Buehrer, 1993) can be applied to databases, with a few new expressions and formulas defined so that more effective vague association rules can be generated; these rules fully satisfy the principles of vague set theory and give better results than the traditional approach. Because these databases hold information from various sources, they are liable to contain some degree of uncertainty and vagueness.
To handle this vagueness, Vague Association Rule (VAR) generation (Lu et al., 2007) is an innovative direction for finding correlations and rules that ultimately maximize business profit and support the decision-making process.
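As background for the vague set model mentioned above, the following minimal sketch represents a single vague value in the sense of Gau & Buehrer (1993). Such a value has a truth-membership t (evidence for) and a false-membership f (evidence against) with t + f <= 1, which bounds the grade of membership to the interval [t, 1 - f]. The class name `VagueValue` and its attribute names are hypothetical, chosen only for this illustration. How such values feed into vague association rules is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VagueValue:
    """A vague value: evidence for (t) and evidence against (f) membership."""

    t: float  # truth-membership: lower bound on membership from supporting evidence
    f: float  # false-membership: lower bound on non-membership from opposing evidence

    def __post_init__(self):
        # Both degrees lie in [0, 1] and together may not exceed 1.
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0):
            raise ValueError("t and f must each lie in [0, 1]")
        if self.t + self.f > 1.0:
            raise ValueError("t + f must not exceed 1")

    @property
    def interval(self):
        """Interval [t, 1 - f] that bounds the grade of membership."""
        return (self.t, 1.0 - self.f)

    @property
    def unknown(self):
        """Width of the interval: the part that is neither for nor against."""
        return 1.0 - self.t - self.f


value = VagueValue(t=0.5, f=0.25)
print(value.interval, value.unknown)
```

When t + f = 1 the interval collapses to a single point and the value behaves like an ordinary fuzzy membership grade. The wider the interval, the more is left undetermined by the available evidence.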