Data Mining in Atherosclerosis Risk Factor Data

Petr Berka (University of Economics and the Academy of Sciences of the Czech Republic, Czech Republic), Jan Rauch (University of Economics and the Academy of Sciences of the Czech Republic, Czech Republic) and Marie Tomečková (Academy of Sciences, Czech Republic)
DOI: 10.4018/978-1-60566-218-3.ch018

The aim of this chapter is to describe the goals, current results, and further plans of a long-term activity concerning the application of data mining and machine learning methods to a complex medical data set. The analyzed data set comes from a longitudinal study of atherosclerosis risk factors. The structure and main features of this data set, as well as the methodology used to observe the risk factors, are introduced. The important first steps of the analysis of the atherosclerosis data are described in detail, together with a large set of analytical questions defined on the basis of the first results. Experience in solving these tasks is summarized, and further directions of analysis are outlined.
Chapter Preview


Atherosclerosis is a slow, complex disease that typically starts in childhood and often progresses as people grow older. In some people it progresses rapidly, even in their third decade. Many scientists think it begins with damage to the innermost layer of the artery. Atherosclerosis involves the slow buildup of deposits of fatty substances, cholesterol, cellular waste products, calcium, and fibrin (a clotting material in the blood) in the inner lining of an artery. This buildup, referred to as plaque, can partially or totally block the flow of blood through the artery, and so can a blood clot (thrombus) that forms on the surface of the plaque. If either of these events blocks the entire artery, a heart attack, a stroke, or another life-threatening event may result.

People with a family history of premature cardiovascular disease (CVD) and with other atherosclerosis risk factors have an increased risk of atherosclerotic complications. Research shows the benefits of reducing the controllable risk factors for atherosclerosis: high blood cholesterol, cigarette smoking and exposure to tobacco smoke, high blood pressure, diabetes mellitus, obesity, and physical inactivity.

Atherosclerosis-related diseases are a leading cause of death and impairment in the United States, affecting over 60 million people. Additionally, 50% of Americans have cholesterol levels that place them at high risk of developing coronary artery disease. A similar situation can be observed in other countries, so educating patients about the prevention of atherosclerosis is very important.

In the early 1970s, an extensive epidemiological study of the primary prevention of atherosclerosis was designed in the former Czechoslovakia under the name National Preventive Multifactorial Study of Heart Attacks and Strokes. The aims of the study were to:

  • Identify the prevalence of atherosclerosis risk factors in the population generally considered most endangered by possible complications of atherosclerosis, i.e., middle-aged men.

  • Follow the development of these risk factors and their impact on the health of the examined men, especially with respect to atherosclerotic cardiovascular diseases.

  • Study the impact of a complex intervention against the risk factors on their further development and on cardiovascular morbidity and mortality.

  • 10–12 years into the study, compare the risk factor profile and health of selected men who originally showed no atherosclerosis risk factors with those of a group of men who showed risk factors from the beginning of the study.

Men born between 1926 and 1937 who lived in the centre of Prague, the capital of Czechoslovakia, were selected from electoral lists in 1975. The invitation to the examination included a short explanation of the purpose and procedure of the first examination and of the later observations, and asked for co-operation. At that time, no signed informed consent was required from the respondents. Entry examinations were performed in 1976–1979: 1,419 of the 2,370 invited men came for the first examination, and their atherosclerosis risk factors were classified according to a well-defined methodology.

The primary data cover both the entry and the control examinations. At the entry examination, 244 attributes were surveyed for each patient; 219 of these attributes have values that are codes or results of size measurements of different variables. A further 10,610 control examinations were made, each covering 66 attributes. Some additional, irregular data collections concerning these men were also performed. The study is named STULONG (LONGitudinal STUdy) and continued for twenty years.
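The cohort figures quoted above can be put into perspective with a few lines of arithmetic. This Python sketch only restates numbers from the text; the helper names are hypothetical. The per-man average assumes that every control examination belongs to one of the 1,419 entrants (the text does not say how control examinations were distributed among the men), and the cell counts assume one row per examination and one column per attribute.

```python
# Figures quoted in the text; everything else is derived arithmetic.
INVITED = 2370             # men invited for the entry examination
ENTRY_EXAMINED = 1419      # men who attended the entry examination
ENTRY_ATTRIBUTES = 244     # attributes surveyed per man at entry
CONTROL_EXAMS = 10610      # control examinations made later
CONTROL_ATTRIBUTES = 66    # attributes recorded per control examination


def participation_rate(examined: int, invited: int) -> float:
    """Fraction of invited men who came for the entry examination."""
    return examined / invited


def mean_controls_per_man(controls: int, men: int) -> float:
    """Average control examinations per entrant (assumes every control
    belongs to an entrant; the real distribution across men is not given)."""
    return controls / men


def matrix_cells(rows: int, columns: int) -> int:
    """Cells in a rows x columns data matrix (assumed layout: one row per
    examination, one column per attribute)."""
    return rows * columns


if __name__ == "__main__":
    print(f"participation: {participation_rate(ENTRY_EXAMINED, INVITED):.1%}")
    print(f"controls per man: {mean_controls_per_man(CONTROL_EXAMS, ENTRY_EXAMINED):.2f}")
    print(f"entry matrix cells: {matrix_cells(ENTRY_EXAMINED, ENTRY_ATTRIBUTES)}")
    print(f"control matrix cells: {matrix_cells(CONTROL_EXAMS, CONTROL_ATTRIBUTES)}")
```

Running it prints a participation rate of 59.9% and about 7.48 control examinations per entrant. The latter is only an upper-level average, since some men may have attended many control examinations and others none.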

The observations resulted in a data set consisting of four data matrices that are suitable both for classical statistical data analysis and for data mining and machine learning methods. The project of analyzing these data with data mining methods started by formulating a large set of analytical questions. The goal of this chapter is to describe the first steps in applying data mining and machine learning methods to the STULONG data. We also summarize the additional analyses inspired by this set of analytical questions and outline further planned work.
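Many analytical questions about such longitudinal data ask whether a characteristic recorded at the entry examination is associated with something observed later at the control examinations. The Python sketch below shows one generic way to make such a question computable: link an entry matrix and a control matrix through a patient identifier, collapse the one-to-many control rows into one flag per patient, and count a four-fold (2×2) contingency table. The schema (`patient_id`, `smoker`, `hypertension`, `cvd_event`) is hypothetical and is not the actual STULONG attribute set; the data set has four matrices, of which only two hypothetical ones are modelled here, and this is not presented as the authors' analysis method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class EntryRecord:
    """One row of a hypothetical entry-examination matrix."""
    patient_id: int
    smoker: bool        # hypothetical attribute
    hypertension: bool  # hypothetical attribute


@dataclass(frozen=True)
class ControlRecord:
    """One row of a hypothetical control-examination matrix."""
    patient_id: int
    cvd_event: bool     # hypothetical attribute: CVD noted at this control


def events_by_patient(controls: Iterable[ControlRecord]) -> Dict[int, bool]:
    """Collapse one-to-many control rows into one flag per patient:
    True if any control examination of that patient recorded an event."""
    events: Dict[int, bool] = defaultdict(bool)
    for c in controls:
        events[c.patient_id] |= c.cvd_event
    return events


def fourfold_table(
    entries: Iterable[EntryRecord],
    controls: Iterable[ControlRecord],
    condition: Callable[[EntryRecord], bool],
) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): a = condition and event, b = condition only,
    c = event only, d = neither. Patients without controls count as no event."""
    events = events_by_patient(controls)
    a = b = c = d = 0
    for e in entries:
        cond, event = condition(e), events.get(e.patient_id, False)
        if cond and event:
            a += 1
        elif cond:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def confidence(table: Tuple[int, int, int, int]) -> Optional[float]:
    """Share of patients meeting the condition who later had an event;
    None when no patient meets the condition."""
    a, b, _, _ = table
    return a / (a + b) if a + b else None
```

For example, `fourfold_table(entries, controls, lambda e: e.smoker)` compares men who smoked at entry with those who did not; any entry-level condition can be passed the same way, which keeps the question separate from the counting logic.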
