Detection of Breast Cancer by the Identification of Circulating Tumor Cells Using Association Rule Mining

Jananee S. (Sri Venkateswara College of Engineering, Chennai, India) and Nedunchelian R. (Sri Venkateswara College of Engineering, Chennai, India)
Copyright: © 2016 |Pages: 9
DOI: 10.4018/IJKDB.2016010102

Abstract

Circulating tumor cells (CTCs) are cells that have shed into the vasculature from the primary tumor and circulate in the bloodstream. In this proposed work, the major genes causing breast cancer are identified using the principle of association rules. The trained set and the training set are uploaded to the data store. Each row of the training set is associated with all rows of the trained data, and a report is generated. The Baum-Welch process is then called to estimate the actual probabilities and emission probabilities by calculating the log-likelihood factor, which gives the high-priority gene values that are responsible for causing cancer. Based on this, the cell category is split into three clusters: carcinoma level, metastasis level, and Kaposi sarcoma. Within each cluster, the highest priority value is found and the values are classified as high, medium, or low. Extracting the higher gene values yields the major genes responsible for causing breast cancer. Finally, the obtained results are validated through hierarchical clustering.

1. Introduction

Breast cancer (BC) refers to a malignant tumor that has developed from cells in the breast. It starts within the cells of the breast as a group of cancer cells that can invade the surrounding tissues or spread to other areas of the body. In general, BC-related death is the consequence of tumor cells that spread from the primary tumor and form metastases in the organs where they settle.

Cancer metastasis is the main cause of cancer-related death, and the dissemination of tumor cells through the blood circulation is an important intermediate step that also exemplifies the switch from localized to systemic disease. Circulating tumor cells in the peripheral blood (PB) arise from the primary tumor and are indicative of tumor aggressiveness and metastasis. Several discriminating factors have to be identified in order to detect BC.

Cancer cells can be distinguished from normal cells by their large number of dividing cells, large and variably shaped nuclei, small cytoplasmic volume relative to the nuclei, variation in cell size and shape, loss of normal specialized features, disorganized cell features, and poorly defined tumor boundary (Figure 1). Breast cancer is the second most common cancer and affects both women and men in western countries. Women are affected at a much higher rate than men because of endogenous and exogenous hormone exposure in their bodies. BRCA1 and BRCA2 have been identified as genes involved in repairing damaged DNA.

Figure 1. Normal cells vs. cancer cells

These factors are also processed by applying data mining techniques to the datasets. Data mining is the process of obtaining valuable information from raw data. The data are collected from the Wisconsin databases and the GEO databases. Raw data alone are not suitable for analysis, so data pre-processing is performed. The pre-processed data are rich in information, with missing values and attributes omitted. Data modelling derives a logical solution with the help of decision trees and decision rules, and it gives an interpretation and conclusion to the whole process.

Association rule mining is the discovery of association relationships among a set of items in a dataset. It has become an important data mining technique that correlates the presence of a set of items with another range of values for a set of variables. Association rule mining was suggested by Agrawal et al. (1993) for extracting associations from market basket data. It has also proved useful in many other domains, such as microarray data analysis, recommender systems, and network intrusion detection.

An association rule is of the form $X \Rightarrow Y$, where $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are sets of gene items, with $x_i$ and $y_j$ being distinct items for all $i$ and all $j$. This association states that if the gene set $X$ is chosen, $Y$ is also likely to be chosen. In general, any association rule has the form LHS (left-hand side) $\Rightarrow$ RHS (right-hand side), where LHS and RHS are sets of items. Association rules should supply both support and confidence.
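The two measures named above can be illustrated with a minimal sketch. It assumes the usual definitions: the support of an item set is the fraction of records containing it, and the confidence of LHS ⇒ RHS is support(LHS ∪ RHS) / support(LHS). The gene names and sample records are hypothetical toy data, and the function names `support` and `confidence` are chosen for illustration only; none of this is taken from the article's dataset or implementation.

```python
# Toy data: each record is the set of genes flagged in one sample.
# Gene names and records are hypothetical, for illustration only.
samples = [
    {"BRCA1", "TP53", "ERBB2"},
    {"BRCA1", "TP53"},
    {"BRCA1", "ERBB2"},
    {"TP53"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(lhs, rhs, records):
    """support(LHS ∪ RHS) / support(LHS) for the rule LHS => RHS."""
    lhs_support = support(lhs, records)
    if lhs_support == 0:
        return 0.0  # LHS never occurs, so the rule has no evidence
    return support(set(lhs) | set(rhs), records) / lhs_support


# Two of the four samples contain both genes.
print(support({"BRCA1", "TP53"}, samples))  # 0.5
# Of the three samples containing BRCA1, two also contain TP53.
print(confidence({"BRCA1"}, {"TP53"}, samples))
```

A rule with high confidence but very low support rests on few records, which is why both values are reported together.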

2. Association Rule Generation

The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence thresholds. The problem is thus decomposed into two subproblems:

  1. Generate all item sets that have a support that exceeds the threshold. These sets of items are called large item sets. Note that large here means large support.

  2. For each large item set, generate all the rules that have a minimum confidence as follows: for a large item set $X$ and $Y \subset X$, let $Z = X - Y$. Then if $\text{support}(X)/\text{support}(Z) \geq \text{minimum confidence}$, the rule $Z \Rightarrow Y$ (i.e., $X - Y \Rightarrow Y$) is a valid rule. [Note: $Y \subset X$ reads “Y is a subset of X.”]
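The two subproblems can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the article's implementation: it enumerates candidate item sets level by level rather than using an optimized algorithm such as Apriori's candidate pruning, it treats an item set as large when its support is at least the threshold, and the gene names, sample records, thresholds, and function names (`large_itemsets`, `generate_rules`) are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy records: the set of genes flagged in each sample.
samples = [
    {"BRCA1", "TP53", "ERBB2"},
    {"BRCA1", "TP53"},
    {"BRCA1", "ERBB2"},
    {"TP53"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for record in records if itemset <= record) / len(records)


def large_itemsets(records, min_support):
    """Step 1: collect every item set whose support meets the threshold."""
    items = sorted(set().union(*records))
    large = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            s = support(combo, records)
            if s >= min_support:
                large[frozenset(combo)] = s
                found_any = True
        if not found_any:
            break  # no large set of this size, so none of a bigger size
    return large


def generate_rules(records, min_support, min_confidence):
    """Step 2: for each large X and non-empty proper subset Y, test Z => Y."""
    large = large_itemsets(records, min_support)
    rules = []
    for x, support_x in large.items():
        for size in range(1, len(x)):
            for y in combinations(sorted(x), size):
                y = frozenset(y)
                z = x - y
                # Z is a subset of a large set, so it is large as well.
                conf = support_x / large[z]
                if conf >= min_confidence:
                    rules.append((z, y, conf))
    return rules


for z, y, conf in generate_rules(samples, min_support=0.5, min_confidence=0.7):
    print(sorted(z), "=>", sorted(y), conf)
```

With these toy records and thresholds, the only rule that survives is ERBB2 ⇒ BRCA1, since every sample containing ERBB2 also contains BRCA1. Lowering the confidence threshold admits further rules.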
