Data Mining and Analysis of Lung Cancer


Guoxin Tang (University of Louisville, USA)
DOI: 10.4018/978-1-61520-723-7.ch006


Lung cancer is the leading cause of cancer death in the United States and the world, with more than 1.3 million deaths worldwide per year. However, because effective tools to diagnose lung cancer are lacking, more than half of all cases are diagnosed at an advanced stage, when surgical resection is unlikely to be feasible. The purpose of this study is to examine the relationship between patient outcomes and the conditions of patients undergoing different treatments for lung cancer, and to develop models that estimate the population burden and the cost of cancer and that help physicians and patients determine appropriate treatment in clinical decision-making. We use a national database and claims data to investigate treatments for lung cancer.
Chapter Preview


Lung Cancer

Lung cancer is a disease of uncontrolled cell growth in tissues of the lung. This growth may lead to metastasis, which is an invasion of adjacent tissue and infiltration beyond the lungs. It is usually suspected in individuals who have abnormal chest radiograph findings or have symptoms caused by either local or systemic effects of the tumor. There are two main types of lung carcinoma categorized by the size and appearance of the malignant cells seen by a histopathologist under a microscope: non-small cell lung carcinoma (NSCLC) (80.4%) and small-cell lung carcinoma (SCLC) (16.8%) (Travis, WD, 1995).

At the end of the 20th century, lung cancer had become one of the world’s leading causes of preventable death. It was a rare disease at the start of that century, but exposures to new etiologic agents and an increasing lifespan combined to make lung cancer a scourge of the 20th century. Table 1 shows the estimated numbers of cases and deaths for 26 different types of cancer in men and women, together with the standardized incidence and mortality rates and the cumulative risk (%) between ages 0 and 64. There are some differences in the profile of cancers worldwide, depending on whether the incidence or mortality is the focus of interest. Lung cancer is the main cancer in the world today, whether considered in terms of number of cases (1.35 million) or deaths (1.18 million), because of the high case fatality (ratio of mortality to incidence, 0.87) (Parkin, D, 2005).
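The case fatality ratio quoted above is simple arithmetic on the two worldwide totals. A minimal Python sketch reproduces it from the figures cited from Parkin (2005); the function name `case_fatality_ratio` is chosen here purely for illustration and does not come from the chapter:

```python
def case_fatality_ratio(deaths: float, cases: float) -> float:
    """Ratio of mortality to incidence (illustrative helper, not from the source)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Worldwide lung cancer totals quoted in the text.
lung_cases = 1.35e6
lung_deaths = 1.18e6

ratio = case_fatality_ratio(lung_deaths, lung_cases)
print(f"Case fatality ratio: {ratio:.2f}")  # prints "Case fatality ratio: 0.87"
```

The same helper could be applied to any row of a site-level table that reports both cases and deaths, although a ratio of deaths to cases in a single year is only a rough proxy for fatality.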

Table 1.
Incidence and mortality by sex and cancer site worldwide, 2002
Risk = cumulative risk (%), ages 0-64; - = not applicable

                        Incidence, men      Incidence, women    Mortality, men      Mortality, women
Site                    Cases     Risk      Cases     Risk      Deaths    Risk      Deaths    Risk
Oral Cavity             175,916   0.4       98,373    0.2       80,736    0.2       46,723    0.1
Other pharynx           106,219   0.3       24,077    0.1       67,964    0.2       16,029    0.0
Melanoma of Skin        79,043    0.2       81,134    0.2       21,952    0         18,829    0
Cervix uteri            -         -         493,243   1.3       -         -         273,505   0.7
Corpus uteri            -         -         198,783   0.4       -         -         50,327    0.1
Brain, nervous system   108,221   0.2       81,264    0.2       80,034    0.2       61,616    0.1
Non-Hodgkin lymphoma    175,123   0.3       125,448   0.2       98,865    0.2       72,955    0.1
Hodgkin Disease         38,218    0.1       24,111    0.1       14,460    0         8,352     0
Multiple myeloma        46,512    0.1       39,192    0.1       32,696    0.1       29,839    0

There were 1.35 million new cases of lung cancer, representing 12.4% of all new cancers. Lung cancer is also the most common cause of death from cancer, with 1.18 million deaths, or 17.6% of the world total. Almost half (49.9%) of the cases occur in the developing countries of the world; this is a marked change from 1980, when an estimated 69% were in developed countries. Worldwide, it is by far the most common cancer in men, with the highest rates observed in North America and Europe (especially Eastern Europe). Moderately high rates are also seen in Australia and New Zealand and in eastern Asia (China and Japan). In women, incidence rates are lower (globally, 12.1 per 100,000 women compared with 35.5 per 100,000 men). The highest rates are in North America and Northern Europe. Notably, the incidence in China is rather high (approximately 19.0 per 100,000); this rate is similar to that in, for example, Australia and New Zealand, at 17.4 per 100,000 (Parkin, D, 2005).
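The shares and rates in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above. The helper names are illustrative, and the implied totals are back-of-the-envelope values derived from rounded percentages, not figures reported in the chapter:

```python
def implied_total(count: float, share_percent: float) -> float:
    """Total implied by a count that makes up share_percent of it (illustrative helper)."""
    if not 0 < share_percent <= 100:
        raise ValueError("share_percent must be in (0, 100]")
    return count / (share_percent / 100.0)


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two rates expressed per the same population base (illustrative helper)."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Lung cancer counts and their shares of all cancers, as quoted in the text.
all_new_cancers = implied_total(1.35e6, 12.4)    # implied total of new cancer cases
all_cancer_deaths = implied_total(1.18e6, 17.6)  # implied total of cancer deaths

# Global incidence rates per 100,000, men versus women, as quoted in the text.
men_to_women = rate_ratio(35.5, 12.1)

print(f"Implied new cancer cases: {all_new_cancers / 1e6:.1f} million")
print(f"Implied cancer deaths:    {all_cancer_deaths / 1e6:.1f} million")
print(f"Male-to-female incidence rate ratio: {men_to_women:.1f}")
```

Because the inputs are rounded, the implied totals should be read as approximate consistency checks, not as estimates in their own right.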

Lung cancer remains a highly lethal disease. Survival at 5 years measured by the Surveillance, Epidemiology and End Results (SEER) program in the United States is 15%, the best recorded at the population level. The average survival in Europe is 10%, not much better than the 8.9% observed in developing countries (Alberg, Anthony J, 2003).

Because of the high case fatality rate of lung cancer, the incidence and mortality rates are nearly equivalent; consequently, routinely collected vital statistics provide a long record of the occurrence of lung cancer. Figure 1 shows the epidemic of lung cancer, which dates to the mid-20th century (Wingo, Phyllis A, 2003).

Figure 1.

Lung cancer mortality rates for the United States from 1930 to 1998, age-standardized to the 1970 US population


Lung cancer was rare until a sharp rise began around 1930, culminating by mid-century in lung cancer becoming the leading cause of cancer death among men. The epidemic among women followed that among men, with a sharp rise in rates from the 1960s to the present that made lung cancer the most frequent cause of female cancer mortality. The epidemic among women not only occurred later but also would not peak at a level as high as that among men. Note that the rate for men is declining, whereas the rate for women has leveled off but has not yet begun to decline.
