Medical Applications of Intelligent Data Analysis: Research Advancements

Rafael Magdalena-Benedito (Intelligent Data Analysis Laboratory, University of Valencia, Spain), Emilio Soria-Olivas (Intelligent Data Analysis Laboratory, University of Valencia, Spain), Juan Guerrero Martínez (Intelligent Data Analysis Laboratory, University of Valencia, Spain), Juan Gómez-Sanchis (Intelligent Data Analysis Laboratory, University of Valencia, Spain) and Antonio Jose Serrano-López (Intelligent Data Analysis Laboratory, University of Valencia, Spain)
Indexed In: SCOPUS
Release Date: June, 2012|Copyright: © 2012 |Pages: 372|DOI: 10.4018/978-1-4666-1803-9
ISBN13: 9781466618039|ISBN10: 1466618035|EISBN13: 9781466618046

Description

One major challenge facing modern medicine is to fulfill the promise and potential of the enormous increase in medical data sets of all kinds.

Medical Applications of Intelligent Data Analysis: Research Advancements explores the potential of utilizing this medical data through the implementation of developed models in practical applications. This premier reference source features chapters contributed by specialists in a variety of intelligent data analysis (IDA) fields related to medicine, health, and bioinformatics. All of these contributions are devoted to describing the practical medical applications of IDA and how the theory described in many other books can be implemented to improve medical care.

Topics Covered

The many academic areas covered in this publication include, but are not limited to:

  • Artificial intelligence in medical education
  • Computational intelligence in bio- and clinical medicine
  • Decision and regression trees
  • Discovery techniques
  • Graphical models
  • Information Retrieval
  • Intelligent medical information systems
  • Medical knowledge engineering
  • Neural Networks
  • Swarm Intelligence

Reviews and Testimonials

The authors of the different chapters in this handbook show the applications of those techniques [soft computing, data mining, machine learning, intelligent data analysis, extracting knowledge of massive data sets] to different clinical problems. In summary, the reader holds in his/her hands a complete exposition of the state of the art of the applications of those techniques in clinical practice.

– Josep Redón I Mas, European Society of Hypertension, & University of Valencia, Spain

Table of Contents and List of Contributors


Preface

What is Intelligent Data Analysis? And furthermore, what can Intelligent Data Analysis bring to Health Sciences? These questions are well grounded, because the name “Intelligent Data Analysis” itself is very ambiguous. The main idea underlying this concept is extracting knowledge from data. 

Man now lives in the Age of Information. Health Sciences are fully embedded in Information Technologies. Technology is ubiquitous; technology is cheap. Technology is everything nowadays. Moore’s Law has brought the world to the Information Technology Society, and even the remotest corner of the world is today covered by telecommunications technology. A high-end cellular phone has more computing power than the computer that took man to the moon more than 40 years ago, and we use it for playing bird-killing online games!

It is easy and cheap to acquire, monitor, and measure any set of variables, digitize the data, and store it on a hard disk or in the Cloud. Because it is easy and cheap, society and management boards are prone to do it. But to whom much has been given, much will be expected, or at least that is how it should be. The cheap, powerful computing capabilities of nearly every appliance, the fast data highways that cross the Earth, and the nearly unlimited storage resources available everywhere, at any time, are flooding us with digital data. The Age of Information could also be called the Curse of Data, because it is so cheap and easy to gather and store data. But people need information, so they chase knowledge. They have the haystack, but they want the needle.

Biology and Health Sciences are very complex fields. These sciences have come a long way since ancient times, but the processes involved in biology, medicine, and physiology are much too intricate to be faithfully modeled. It is not easy to extract knowledge from raw data, and it is also not cheap. The curse of cheap hardware, cheap bandwidth, and cheap processors is an extraordinarily large amount of data, a very large number of variables, and very little knowledge about what is hidden inside this data.

Until recently, scientists and technologists relied on traditional statistics to cope with the task of extracting information from data. Statistics has been deeply rooted in Mathematics since the seventeenth century, but during the last few decades this enormous amount of data and variables has overwhelmed the capabilities of classical statistics. Classical methods cannot deal with such amounts of data. It is impossible to visualize even a small part of the information, and man is unable to extract knowledge from these brand-new datasets.

Mathematics is now also coming to help, building on classical statistics and bringing tools that enable us to extract some information from these huge datasets. These new tools are called “Intelligent Data Analysis.” But Mathematics is not the only discipline involved in Data Analysis. Engineering, computing sciences, database science, machine learning, and even artificial intelligence are bringing their powers to this newly born data analysis discipline.

So Intelligent Data Analysis is defined as the set of tools that make it possible to extract the information underlying a very large amount of data with a very large number of variables: data that represents very complex, non-linear, real-life problems that are intractable with the old tools people were used to. People must be able to cope with high dimensionality, sparse data, very complex and unknown relationships, biased, wrong, or incomplete data, and mathematical algorithms or methods that lie on the foggy frontier of Mathematics, Engineering, Physics, Computer Science, Statistics, Biology, or even Philosophy.

Moreover, Intelligent Data Analysis can start from the raw data and cope with prediction tasks without knowing the theoretical description of the underlying process, with the classification of new events based on past ones, or with modeling the aforementioned unknown process. Classification, prediction, and modeling are the cornerstones that Intelligent Data Analysis can bring to us.

And in this brave new Information World, information is the key. It is the key, the power, and the engine that moves the economy. The world is moving with market data, medical epidemiological datasets, Internet browsing records, geological survey data, complex engineering models, and so on. Nearly every human activity nowadays generates a large amount of data that can be easily gathered and stored, and the greatest value of that data is the information that lies behind it.

This book approaches Intelligent Data Analysis from a very practical point of view. There are many theoretical, academic books on data mining and analysis, but the approach in this book comes from a real-world health view: solving real-life problems with data analysis tools. It is an “engineering” point of view, in the sense that the book presents a real problem, usually defined by complex, non-linear, and unknown processes, and then offers a Data Analysis based solution that makes it possible to solve the problem or even to infer the process underlying the raw data. The book conveys practical experience with intelligent data analysis.

So this book is aimed at scientists and engineers in medicine and biology who carry out research in very complex, non-linear areas, such as medicine, genetics, biology, and data processing, and who need to extract knowledge from large amounts of data, knowledge that can take the form of prediction, classification, or modeling. But this book also brings a valuable point of view to engineers and businessmen working in companies who are trying to solve practical, economic, or technical problems in their company's field of activity or expertise. The purely practical approach helps the authors communicate how to approach and cope with problems that would be intractable in any other way. Finally, final-year courses of academic degrees in Engineering, Mathematics, Medicine, or Biology can use this book to provide students with a new point of view for approaching and solving real, practical problems when the underlying processes are not clear.

Obviously, prior knowledge of statistics, discrete mathematics, and machine learning is desirable, although the authors provide several references to help engineers and scientists use the experience and know-how described in every chapter to their own benefit. The book is structured as follows.

Chapter 1, “Intelligent Management of Sepsis in the Intensive Care Unit,” by Ribas, Ruiz, and Vellido, is about sepsis. Sepsis is a pathology that can affect anyone and is one of the main causes of death in the Intensive Care Unit. Indeed, it is the tenth most common cause of death in western countries (with death rates of up to 60% for its most severe stages). The aim of this chapter is to provide interpretable and actionable methods for the assessment of the Risk of Death in Severe Sepsis. Three different methods are presented: Relevance Vector Machines (a sparse Bayesian counterpart of Support Vector Machines), which provide an automated ranking of the relevance of the mortality predictors; Logistic-Regression models, which are widely used by the medical community; and Logistic-Regression over Latent Factors (i.e., Logistic-Regression combined with Factor Analysis). The new methods are compared against other state-of-the-art methods widely used in clinical practice, such as APACHE II.

Chapter 2, “Statistical Pattern Recognition Techniques for Early Diagnosis of Diabetic Neuropathy by Posturographic Data,” by Diamantini, Fioretti, and Potena, describes the use of Statistical Pattern Recognition techniques for the early diagnosis of Peripheral Diabetic Neuropathy, with the twofold aim of distinguishing between non-neuropathic and neuropathic patients and of recognizing the severity of the neuropathy. The chapter presents two experimental methodologies, which are based on Linear Discriminant Analysis and Bayes Vector Quantizer algorithms, respectively. 

In Chapter 3, “Preprocessing MRS Information for Classification of Human Brain Tumours,” Arizmendi, Vellido, and Romero analyze Magnetic Resonance Spectroscopy data from a brain tumor database in order to demonstrate the importance of data preprocessing prior to diagnostic classification.

In Chapter 4, “Semi-supervised Clustering for the Identification of Different Cancer Types using the Gene Expression Profiles,” Martín-Merino covers DNA microarrays, which allow for monitoring the expression level of thousands of genes simultaneously across a collection of related samples. Supervised learning algorithms such as k-NN or SVM (Support Vector Machines) have been applied to the classification of cancer samples using gene expression profiles. However, they are not able to discover new subtypes of diseases. This chapter studies several unsupervised clustering algorithms suitable for discovering new subtypes of cancer. Next, a semi-supervised clustering algorithm is introduced that allows for incorporating a priori knowledge provided by human experts. The performance of the algorithms is illustrated on several complex human cancer problems.

Chapter 5, “Real-Time Robust Heart Rate Estimation Based on Bayesian Framework and Grid Filters,” by Bortel and Sovka, describes a robust real-time algorithm for the estimation of heart rate (HR) from strongly corrupted electrocardiogram records. The problem of HR estimation is formulated as a problem of inference in a Bayesian network, which utilizes prior information about the probability distribution of HR changes. From this formulation, an inference procedure is derived and implemented as a grid filter. The resulting algorithm can then follow even a rapidly changing HR, whilst withstanding a series of missed or false QRS detections. Additionally, the computational complexity of this algorithm is acceptable for battery powered portable devices. 
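
The grid-filter idea summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of Bortel and Sovka: the heart rate is discretized on a 1 bpm grid, HR changes between beats follow a Gaussian prior, and each observed beat-to-beat interval is explained as a correct detection, a single missed beat (doubled interval), or a spurious detection (flat likelihood). The function names and every parameter value are hypothetical.

```python
import math

def gaussian(x, mean, sigma):
    """Unnormalised Gaussian bump; a shared scale factor cancels on normalisation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def grid_filter_step(belief, grid, rr_interval,
                     change_sigma=3.0, obs_sigma=0.03,
                     p_missed=0.10, p_false=0.02):
    """One predict/update cycle of a discrete Bayes (grid) filter.

    All parameter values are illustrative assumptions.
    belief      -- probability of each HR value in `grid` (bpm)
    rr_interval -- observed time between two detected beats (seconds)
    """
    # Prediction: spread the belief using the prior on HR changes.
    predicted = normalise([
        sum(b * gaussian(hr_new, hr_old, change_sigma)
            for b, hr_old in zip(belief, grid))
        for hr_new in grid
    ])
    # Update: weight each HR hypothesis by the likelihood of the interval.
    p_ok = 1.0 - p_missed - p_false
    posterior = []
    for prior, hr in zip(predicted, grid):
        beat = 60.0 / hr  # expected interval for this HR hypothesis
        likelihood = (p_ok * gaussian(rr_interval, beat, obs_sigma)
                      + p_missed * gaussian(rr_interval, 2.0 * beat, obs_sigma)
                      + p_false)  # flat term: a false detection is uninformative
        posterior.append(prior * likelihood)
    return normalise(posterior)

def track_heart_rate(rr_intervals, grid):
    """Return the most probable HR (bpm) after each observed interval."""
    belief = [1.0 / len(grid)] * len(grid)  # start from a uniform belief
    estimates = []
    for rr in rr_intervals:
        belief = grid_filter_step(belief, grid, rr)
        estimates.append(grid[belief.index(max(belief))])
    return estimates

# Intervals of 0.8 s correspond to 75 bpm; the 1.6 s interval mimics a missed beat.
print(track_heart_rate([0.8, 0.8, 1.6, 0.8, 0.8], list(range(40, 181))))
```

Because the belief is kept over the whole grid rather than as a single number, one implausible interval only reweights the hypotheses instead of dragging the estimate away, which is the robustness property the summary describes.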

In Chapter 6, “Automated Diagnostics of Coronary Artery Disease: Long-term Results and Recent Advancements,” Kukar, Kononenko, and Groselj analyze the clinical diagnostics of coronary artery disease using advanced analytical and decision support tools. They study various topics, such as improving the predictive power of clinical tests by utilizing pre-test and post-test probabilities, texture representation, multi-resolution feature extraction, feature construction, and data mining algorithms that significantly outperform current medical practice. Finally, they present the results of a long-term study.

Chapter 7, “The Use of Prediction Reliability Estimates on Imbalanced Datasets: A Case Study of Wall Shear Stress in the Human Carotid Artery Bifurcation,” by Kosir, Bosnic, and Kononenko, demonstrates the positive effects of using the proposed algorithms on artificial datasets. The authors then apply the developed methodology to the problem of predicting the maximal wall shear stress (MWSS) in the human carotid artery bifurcation. The results indicate that it is feasible to improve the classifier's performance by balancing the data with the authors’ versions of the SMOTE algorithm.
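
For readers unfamiliar with SMOTE, the basic oversampling step can be sketched as below. This is plain SMOTE as generally described, not the authors' own versions mentioned above: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function names, the choice of k, and the toy data are assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic minority samples (basic SMOTE sketch).

    minority -- list of feature tuples from the under-represented class
                (at least two samples are needed)
    k        -- number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class with two features per sample (made-up values).
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
for sample in smote(minority, n_new=3):
    print(sample)
```

Since every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class.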

In Chapter 8, “Pattern Mining for Outbreak Discovery Preparedness,” Long, Hamdan, Bakar, and Sahani review data mining techniques, focusing on frequent and outlier mining, to develop a generic outbreak detection process model named the “Frequent-outlier” model. The process model was tested on a real dengue dataset obtained from FSK, UKM, and on a synthetic respiratory dataset obtained from AUTON LAB. ROC analysis was used to compare the overall performance of the “Frequent-outlier” model with that of CUSUM and Moving Average (MA).
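
The two baseline detectors named above, CUSUM and Moving Average, can be sketched in a few lines. This is a textbook-style illustration and not the chapter's implementation or its “Frequent-outlier” model; the function names, thresholds, and daily case counts are all made up for the example.

```python
def cusum_alarms(counts, baseline, slack, threshold):
    """One-sided CUSUM: accumulate excess over baseline + slack, alarm above threshold."""
    alarms = []
    cumulative = 0.0
    for t, count in enumerate(counts):
        cumulative = max(0.0, cumulative + (count - baseline - slack))
        if cumulative > threshold:
            alarms.append(t)
            cumulative = 0.0  # restart accumulation after raising an alarm
    return alarms

def moving_average_alarms(counts, window, factor):
    """Alarm when a count exceeds `factor` times the mean of the previous `window` counts."""
    alarms = []
    for t in range(window, len(counts)):
        average = sum(counts[t - window:t]) / window
        if count_exceeds(counts[t], factor * average):
            alarms.append(t)
    return alarms

def count_exceeds(count, limit):
    return count > limit

# Hypothetical daily case counts with a surge on days 6 to 8.
daily_cases = [5, 6, 5, 4, 6, 5, 15, 18, 20, 6, 5]
print(cusum_alarms(daily_cases, baseline=5, slack=1, threshold=8))
print(moving_average_alarms(daily_cases, window=3, factor=2))
```

On this toy series the moving-average detector stops alarming once the surge has inflated its own recent average, while CUSUM keeps comparing against a fixed baseline; trade-offs of this kind are what an ROC comparison makes visible.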

In Chapter 9, “Development of Surrogate Models of Orthopedic Screws to Improve Biomechanical Performance: Comparisons of Artificial Neural Networks and Multiple Linear Regressions,” Hsu evaluates the strengths and limitations of surrogate methods in developing the objective functions of the lag screws used in double screw nails and investigates design improvements for this orthopaedic device.

Chapter 10, “Dashboard to Support the Decision-making within a Chronic Disease: A Framework for Automatic Generation of Alerts and KPIs,” by Teixeira, Saavedra, and Simoes, starts from the importance of real-time information for clinical decisions, which is even greater in the context of chronic diseases, and discusses the role of an application for monitoring real-time data in a specific chronic disease, based on alerts and KPIs. These concepts are demonstrated by a practical application, developed in collaboration with the Haematology Service of Coimbra Hospital Centre (SH_CHC), that provides a quick reading of the information relevant for decision-making through a set of alerts and KPIs, based on a push strategy and displayed on a dashboard.

Chapter 11, “Identification of Motor Functions Based on an EEG Analysis,” by Belic and Logar, deals with the phase characteristics of electroencephalographic (EEG) signals, which have been getting a lot of attention lately because phase locking seems to be one of the most important mechanisms for binding brain regions during complex activity. Working memory and motor activity tasks have been used extensively in the search for phase-locking activity in the EEG. Work on 2D image compression has also shown that, with frequency-based methods, the phase properties of an image carry the most important information about its composition, while amplitude is of secondary importance and does not affect the ability to recognise the decompressed image. However, phase properties are relatively difficult to extract from signals in real time and are strongly affected by measurement noise. The chapter presents and illustrates a phase demodulation method for the analysis of EEG signals. The method provides promising results with respect to brain-computer interface development.

Chapter 12, “Visual Data Mining in Physiotherapy Using Self-Organizing Maps: A New Approximation to the Data Analysis,” by Alakhdar, Martínez, Guimerá, Escandell, Benitez, and Soria, deals with Anterior Cruciate Ligament (ACL) injury, which is the most frequent lesion of the knee joint; most ligament tears occur during sports activities. Among the different surgical techniques, most authors consider the intra-articular reconstruction techniques. In this study, the semitendinosus tendon graft was used. After surgery, the subject must undergo a period of rehabilitation, which is considered at least as important as the surgery itself. Thus, in order to facilitate the functional recovery of the affected knee, monitoring, control, and evaluation of the patient are crucial. For this purpose, the chapter presents a neural approach based on a multidimensional visual data mining method, the self-organizing map (SOM), applied to the assessment of the knee in athletes before and after anterior cruciate ligament surgery, studying force variables and measurements taken at different distances from the knee. The goal is to check whether the analysis of these variables reveals if the recovery process has achieved its final aim. Together with the measurements of the thigh contour and muscle strength, the SOM analysis also includes the age, weight, and height of each patient.

In Chapter 13, “Kernel Generative Topographic Mapping of Protein Sequences,” Cárdenas, Vellido, Olier, Rovira, and Giraldo address the field of pharmacology, which is becoming increasingly dependent on advances in genomics and proteomics. The -omics sciences bring about the challenge of how to deal, from an intelligent data analysis perspective, with the large amounts of complex data they generate. In this chapter, the authors focus on the analysis of a specific type of protein, the G protein-coupled receptors (GPCRs), which regulate the function of most cells in living organisms. They describe a kernel method of the manifold learning family to analyze the grouping of the receptors' amino acid symbolic sequences. This grouping into types and subtypes, based on sequence analysis, may significantly contribute to drug design and to a better understanding of the molecular processes involved in receptor signaling in both normal and pathological conditions.

In Chapter 14, “Medical Critiquing Systems,” Douglas provides an overview of the history of the critiquing approach to knowledge systems, which illustrates a more human-centered approach. It is an approach that, unlike traditional knowledge-based systems, aims to provide a check on human reasoning rather than a replacement for it. The chapter also discusses future possibilities for research, in particular the use of social networking and recommender systems as a means to enhance the approach.

Chapter 15, “Learning Probabilistic Graphical Models: a review of techniques and applications in medicine,” by Alonso, Nielsen, de la Ossa, and Puerta, first gives a brief introduction to the most important and most commonly used types of probabilistic graphical models, along with their specification, parametrization, and interpretation, with a special focus on the Bayesian Belief Network model. It then gives an overview of the most typical frameworks for learning models from data and discusses the ideas behind the development of these frameworks. A review follows of recent and classical applications of probabilistic graphical models and learning in the areas of diagnostic and prognostic reasoning and the automatic discovery of causal relationships and regulatory networks in genetics. The chapter concludes with a brief discussion of the most interesting publicly available software packages for learning and modelling using probabilistic graphical models.

Chapter 16, “Natural Language Processing and Machine Learning Techniques Help Achieve a Better Medical Practice,” by Frunza and Inkpen, presents several natural language processing and machine learning techniques that can help the medical practice by means of extracting relevant medical information from the wealth of textual data. The chapter describes three major tasks: building intelligent tools that can help in the clinical decision making, tools that can automatically identify relevant medical information from the life-science literature, and tools that can extract semantic relations between medical concepts. The chapter also presents methodological settings accompanied by representative results obtained on real-life data sets for all three tasks. 

Chapter 17, “Modeling Interpretable Fuzzy Rule-Based Classifiers for Medical Decision Support,” by Alonso, Castielo, Lucarelli, and Mencar, starts from the observation that intelligent systems for medical decision support may be of little use if the knowledge underlying their decisions is not easily comprehensible to physicians (and patients). This chapter describes a methodology for designing fuzzy rule-based classifiers based on linguistic rules that are easy to read and understand. Moreover, it shows a proof of concept based on a real-world case study for predicting the evolution of end-stage renal disease in subjects affected by Immunoglobulin A Nephropathy.

In Chapter 18, “Extraction of Medical Pathways from Electronic Patient Records,” Antonelli, Baralis, Bruno, Chiusano, Mahoto, and Petrigni note that a huge amount of data recording the medical history of patients has been made available in recent years by the introduction of electronic medical records. A current problem in this domain is to reverse engineer the medical treatment process in order to highlight the medical pathways typically adopted for specific health conditions and to discover deviations from predefined care guidelines. This information can support healthcare organizations in improving the current treatment process or assessing new guidelines. The chapter addresses the ability of sequential data mining techniques to reconstruct the actual medical pathways followed by patients. Detected medical pathways take the form of sets of exams frequently done together, sequences of exam sets frequently followed by patients, and frequent correlations between exam sets. The analysis shows that the majority of the extracted pathways are consistent with the medical guidelines, but also reveals some unexpected results that can be useful both to enrich existing guidelines and to improve the public health service.

Finally, in Chapter 19, “Building a Lazy Domain Theory for Characterizing Malignant Melanoma,” Armengol and Puig describe an application focused on building a model able to characterize (and distinguish) early malignant melanoma from benign skin lesions. The model is constructed using lazy learning methods instead of inductive learning methods, which are the most usual approach. The authors experimentally compared the performance of the domain theories generated by two lazy learning methods (k-NN and LID) with the ones generated by decision trees. Results show that lazy learning theories have aspects that allow practitioners to consider them better than the inductive domain theories. In addition, when the predictive ability of the theories is compared, the lazy domain theories prove to be better than the inductive ones.

The Editors

Editorial Board

  • Ales Belic, University of Ljubljana, Slovenia
  • Eva Armengol, Artificial Intelligence Research Institute, Spain 
  • Pavel Sovka, Czech Technical University in Prague, Czech Republic
  • Leonor Teixeira, University of Aveiro, Portugal
  • Oana Frunza, University of Ottawa, Canada