1. Introduction
Breast cancer (BC) is a malignant tumor that develops from cells in the breast. It begins as a group of cancer cells that can invade the surrounding tissues or spread to other areas of the body. In general, cancer-related death from BC is the consequence of tumor cells that spread from the primary tumor and form metastases in other organs.
Cancer metastasis is the main cause of cancer-related death, and the dissemination of tumor cells through the blood circulation is an important intermediate step that marks the switch from localized to systemic disease. Circulating tumor cells in the peripheral blood (PB) originate from the primary tumor and are indicative of tumor aggressiveness and metastasis. Several discriminant factors must be identified to detect BC.
Normal cells and cancer cells can be distinguished by the large number of dividing cells, large and variably shaped nuclei, small cytoplasmic volume relative to the nuclei, variation in cell size and shape, loss of normal specialized features, disorganized cell features, and a poorly defined tumor boundary (Figure 1). Breast cancer is the second most common cancer in western countries and affects both women and men. Women are affected more often than men because of endogenous and exogenous hormone exposure. BRCA1 and BRCA2 have been identified as genes involved in repairing damaged DNA; inherited mutations in these genes increase the risk of breast cancer.
Figure 1. Normal cells vs. cancer cells
Breast cancer data can also be analyzed by applying data mining techniques to the datasets. Data mining is the process of extracting valuable ("golden") information from raw data. Here, the data are collected from the Wisconsin databases and the GEO databases. Raw data are not directly suitable for analysis, so they are first pre-processed; pre-processing omits missing values and attributes, leaving data that are richer in information. Data modelling then derives a logical solution with the help of decision trees and decision rules, and provides the interpretation and conclusions of the whole process.
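As a minimal illustration of the pre-processing step, the Python sketch below keeps a chosen subset of attribute columns and drops every record that has a missing value in one of them. It assumes records are sequences of strings with missing values encoded as `"?"`; the function `preprocess`, its `keep` parameter, and the sample records are hypothetical and are not taken from the Wisconsin or GEO data.

```python
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, ...]


def preprocess(records: Iterable[Sequence[str]],
               keep: Sequence[int],
               missing: str = "?") -> List[Record]:
    """Keep only the attribute columns listed in `keep` (hypothetical helper)
    and drop any record with a missing value in one of those columns."""
    cleaned: List[Record] = []
    for rec in records:
        selected = tuple(rec[i] for i in keep)  # omit unused attributes
        if missing not in selected:             # omit incomplete records
            cleaned.append(selected)
    return cleaned


if __name__ == "__main__":
    # Hypothetical raw records: (sample id, feature A, feature B, class label).
    raw = [
        ("id1", "5", "1", "benign"),
        ("id2", "8", "?", "malignant"),
        ("?", "4", "4", "malignant"),
        ("id4", "3", "2", "benign"),
    ]
    print(preprocess(raw, keep=[1, 2, 3]))
```

Dropping the identifier column before checking for `"?"` means a missing value in an attribute that is not kept does not cause the record to be discarded; whether that is appropriate depends on which attributes the analysis actually uses.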
Association rule mining is the discovery of association relationships among a set of items in a dataset. It has become an important data mining technique that correlates the presence of one set of items with another range of values for a set of variables. Association rule mining was proposed by Agrawal et al. (1993) to extract associations from market basket data, and it has since proved useful in many other domains, such as microarray data analysis, recommender systems, and network intrusion detection.

An association rule has the form $X \Rightarrow Y$, where $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are sets of items (here, genes), with $x_i$ and $y_j$ being distinct items for all $i$ and all $j$. The rule states that when the genes in $X$ occur in a record, the genes in $Y$ are also likely to occur. In general, any association rule has the form LHS $\Rightarrow$ RHS (left-hand side implies right-hand side), where LHS and RHS are sets of items. Every rule is characterized by two measures: its support, the fraction of records that contain $X \cup Y$, and its confidence, $\text{support}(X \cup Y)/\text{support}(X)$, the fraction of records containing $X$ that also contain $Y$.
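To make support and confidence concrete, the sketch below computes both over transactions represented as frozen sets of items. The gene identifiers `G1` to `G4` and the four transactions are hypothetical toy data, and `support` and `confidence` are illustrative helper names, not functions from any particular library.

```python
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[Itemset]) -> float:
    """Confidence of lhs => rhs: support(lhs | rhs) / support(lhs)."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    base = support(lhs_set, transactions)
    return support(lhs_set | rhs_set, transactions) / base if base else 0.0


if __name__ == "__main__":
    # Hypothetical records: each lists the genes observed in one sample.
    data = [
        frozenset({"G1", "G2", "G3"}),
        frozenset({"G1", "G2"}),
        frozenset({"G1", "G4"}),
        frozenset({"G2", "G3"}),
    ]
    print(support({"G1", "G2"}, data))        # 0.5
    print(confidence({"G1"}, {"G2"}, data))   # about 0.667
```

With these records, $\{G1, G2\}$ appears in 2 of 4 records (support 0.5), and $G1$ appears in 3 of 4, so the rule $G1 \Rightarrow G2$ has confidence $0.5/0.75 \approx 0.67$.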
2. Association Rule Generation
The goal of association rule mining is to generate all rules whose support and confidence exceed user-specified minimum thresholds. The problem therefore decomposes into two sub-problems:
1. Generate all item sets whose support exceeds the minimum support threshold. These item sets are called large item sets; "large" here refers to their support, not their size.
2. For each large item set $X$ and each non-empty proper subset $Y \subset X$ (read "$Y$ is a subset of $X$"), let $Z = X - Y$. If $\text{support}(X)/\text{support}(Z) \geq$ minimum confidence, then the rule $Z \Rightarrow Y$ (i.e., $X - Y \Rightarrow Y$) is a valid rule.
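The two sub-problems can be sketched in Python as follows. Step 1 is implemented here as a level-wise search that only extends large item sets, relying on the fact that every subset of a large item set is itself large (the idea behind the Apriori algorithm); the text does not specify how large item sets are found, so this is an assumed choice. Step 2 applies the rule test from step 2 directly. Item sets whose support equals the threshold are treated as large, and all function names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (Z, Y, confidence) for the rule Z => Y


def large_itemsets(transactions: List[Itemset],
                   min_support: float) -> Dict[Itemset, float]:
    """Step 1: map every item set with support >= min_support to its support."""
    if not transactions:
        return {}
    n = len(transactions)

    def sup(s: Itemset) -> float:
        return sum(1 for t in transactions if s <= t) / n

    # Level-1 candidates: every single item.
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    large: Dict[Itemset, float] = {}
    k = 1
    while current:
        supports = {c: sup(c) for c in current}
        level = {c: s for c, s in supports.items() if s >= min_support}
        large.update(level)
        k += 1
        # Join pairs of large (k-1)-sets into k-item candidates, then prune any
        # candidate with a (k-1)-subset that is not large (it cannot be large).
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return large


def generate_rules(large: Dict[Itemset, float],
                   min_confidence: float) -> List[Rule]:
    """Step 2: for each large X and non-empty proper subset Y, keep Z => Y
    (Z = X - Y) when support(X) / support(Z) >= min_confidence."""
    rules: List[Rule] = []
    for x, sup_x in large.items():
        for r in range(1, len(x)):
            for y in map(frozenset, combinations(sorted(x), r)):
                z = x - y
                # Z is a subset of a large set, so it is large and its support is known.
                conf = sup_x / large[z]
                if conf >= min_confidence:
                    rules.append((z, y, conf))
    return rules


if __name__ == "__main__":
    data = [
        frozenset({"G1", "G2", "G3"}),
        frozenset({"G1", "G2"}),
        frozenset({"G1", "G4"}),
        frozenset({"G2", "G3"}),
    ]
    found = large_itemsets(data, min_support=0.5)
    print(generate_rules(found, min_confidence=0.7))
    # [(frozenset({'G3'}), frozenset({'G2'}), 1.0)]
```

On this toy data the large item sets are $\{G1\}$, $\{G2\}$, $\{G3\}$, $\{G1, G2\}$ and $\{G2, G3\}$; only $G3 \Rightarrow G2$ reaches a confidence of 0.7, since $\text{support}(\{G2, G3\})/\text{support}(\{G3\}) = 0.5/0.5 = 1$.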