A Data Representation Model for Personalized Medicine

Hafid Kadi, Mohammed Rebbah, Boudjelal Meftah, Olivier Lézoray
DOI: 10.4018/IJHISI.295822

Abstract

Personalized medicine exploits patient data such as genetic composition and key biomarkers. During the data mining process, the key challenges are information loss, the heterogeneity of data types, and time-series representation. In this paper, a novel data representation model for personalized medicine is proposed in light of these challenges. The proposed model accounts for structured data, both temporal and non-temporal, and for its types, namely numeric, nominal, date, and Boolean. After the date and Boolean data are transformed, the nominal data are treated by dispersion, while several clustering techniques are deployed to control the numeric data distribution. Ultimately, the transformation process yields three homogeneous representations, each having only two dimensions to ease exploration of the represented dataset. Compared to the Symbolic Aggregate Approximation technique, the proposed model preserves the time-series information, conserves as much data as possible, and offers multiple simple representations to explore.

1. Introduction

Personalized medicine (PM) refers to the individualization of medical treatments based on the unique dataset of each patient. It generates and exploits stored patient data, which are often captured digitally in an Electronic Health Record (EHR) comprising the profiles of many different patients. Essentially, an EHR is a long-standing, comprehensive health database resource that stores and manages all patient data files digitally under the custody of a licensed health entity. More specifically, it provides a digitalized view of the patient’s demographics, data associated with the patient’s clinical and medication history, diagnostic trajectory, social and economic environmental conditioning, geographical relocation, if any, as well as the patient’s genetic data, if these exist (Jensen et al., 2012).

Taken together, the massive data resource available via the EHR often includes not only homogeneous, heterogeneous, structured, unstructured and/or semi-structured data, but also temporal and non-temporal data. Mixed into this data are many recorded medical events of individual patients, such as body temperature measurements, blood pressure recordings and other time-series information, in differing types and forms. Following Ghazi (2015), we consider time-series data to include all the observational sequences captured for a patient vis-à-vis a medical event. Moreover, the EHR data resource contains a great deal of hidden information and knowledge waiting to be mined and/or discovered. The process of reporting, evaluation, and medical decision-making based on the EHR data involves the extraction of relevant information and knowledge via specialized methods known as data mining techniques. The quality of information processing and knowledge discovery is thus directly linked to the availability, accessibility, type and form of the data to be extracted and aggregated for analysis. The objective of our work is to produce a high-fidelity model for the representation of PM structured data. This is a challenging problem, and our proposed model addresses several important scientific gaps: data heterogeneity, loss of data during data transformation, and interpretability of the representation over the course of a data mining process. To accomplish this non-trivial task, we represent the data in two parts. The first is dedicated to the representation of numeric data with clustering techniques, whereas the second considers the representation of nominal data with respect to its dispersion. These two bodies of information are then joined into a single global representation table.
Thanks to the simplicity of the obtained representation, healthcare specialists will be able to identify in the dataset both the key patient events and the variations in the information conveyed by the data series. The representation is also intended for use within automated medical decision-making processes such as disease prevention and/or the prediction of adverse drug events. Importantly, this paper emphasizes the need to explore the EHR data mining process that informs and challenges PM, which will ultimately enhance the ability of physicians and other care professionals to personalize high-quality care for the afflicted individuals.
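
To make the two-part idea concrete, the following is a minimal sketch, not the authors' implementation. The preview names neither the clustering algorithms nor a formula for dispersion, so the sketch assumes a plain 1-D k-means for the numeric part and reads "dispersion" as the relative frequency of each category. The patient records, field names and helper functions are all hypothetical.

```python
from collections import Counter


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (an assumed stand-in for the paper's clustering step).

    Returns the centroids sorted in ascending order.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Initialise centroids at evenly spaced positions of the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Empty clusters keep their previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cluster_label(value, centroids):
    """Index of the nearest centroid; 0 is the lowest-valued cluster."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def nominal_dispersion(values):
    """Relative frequency per category (one assumed reading of 'dispersion')."""
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}


# Hypothetical records: one numeric time series and one nominal series each.
records = {
    "p1": {"temperature": [36.6, 36.8, 39.1],
           "symptom": ["cough", "cough", "fever"]},
    "p2": {"temperature": [37.0, 38.9, 39.4],
           "symptom": ["fever", "fever", "fever"]},
}

# Part 1: cluster all numeric observations across patients.
all_temps = [t for r in records.values() for t in r["temperature"]]
centroids = kmeans_1d(all_temps, k=2)

# Part 2 and join: one row per patient in a single global table.
table = {}
for patient_id, r in records.items():
    table[patient_id] = {
        # Label sequence keeps the temporal order of the observations.
        "temperature_clusters": [cluster_label(t, centroids)
                                 for t in r["temperature"]],
        "symptom_dispersion": nominal_dispersion(r["symptom"]),
    }

for patient_id, row in table.items():
    print(patient_id, row)
```

Each numeric observation is replaced by a cluster label while its position in the sequence is kept, which is one simple way to retain the ordering of a time series in a joined table.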

The rest of the paper is organized as follows. Section 2 explains the limitations of the time-series representation process. A novel data representation model for PM is then detailed in Section 3, and Section 4 discusses the experimentation, the evaluation of the proposed model, and the analysis of the results. Finally, Section 5 provides concluding remarks and offers insights into practical implications and potential future work.
