Healthcare big data streams from multiple information sources at an alarming volume, velocity, and variety. The challenge facing the healthcare industry is extracting meaningful value from such sources. This chapter investigates the diversity and forms of data in the healthcare sector, reviews the methods used to search and analyze these data over the past years, and examines the use of machine learning and data mining techniques to mine useful knowledge from such data. The chapter also highlights innovative systems and tools that identify suitable approaches for different kinds of healthcare data and raise the standard of care, and it recaps the tools and data collection methods. The authors also emphasize some of the ethical and data privacy issues involved in processing these records.
Medical data are at once the most rewarding and challenging of all biological data. For decades, people have been preoccupied with keeping every record and collecting every possible piece of information about their lives. The healthcare industry has followed the same practice, generating and keeping large amounts of data, driven by record keeping at physicians’ clinics; these are referred to as patient records. They include forms filled in by patients with their personal information and the findings of oral examinations recorded by physicians during visits. Results of other checkups and laboratory examinations, as well as CT scans and X-ray images, are also kept when patients are examined in a hospital’s emergency room. Moreover, data about compliance and regulatory requirements and about patient care are also emerging from the national and international organizations that monitor and administer the healthcare industry.
Electronic health records (EHRs) have been the subject of several studies. A drug safety study (Trifirò et al., 2009) investigated adverse drug reactions in connection with other diseases; Jensen, Jensen, & Brunak (2012) combined the EHR with genetic data to reveal gene-disease associations; and Almodaifer, Hafez, & Mathkour (2011) discovered interesting and concise medical rules for prediction purposes to assist medical decision makers.
Research on medical diagnosis has proven a great success, because data about the disease and the patient under examination are always available. In fact, medical diagnostic knowledge can be derived automatically from the descriptions of cases solved in the past. Kumar, Sathyadevi, & Sivanesh (2011) proposed an intelligent clinical decision support system to assist physicians in diagnosis. An automatic diagnosis system was presented in (Karabatak & Ince, 2009b). Several works (Soni & Ansari, 2011; Kharya, 2012; Huang, Chen, & Lee, 2007; Ha, 2011; Kononenko, 2001) summarized machine learning techniques used for classifying diseases, such as naïve Bayesian classifiers and neural networks; the last of these also highlighted the specific requirements that machine learning algorithms must meet to perform well on medical diagnostic tasks.
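To make the idea of deriving diagnostic knowledge from past cases more concrete, the following is a minimal sketch of a naïve Bayesian classifier over categorical findings. It is not taken from any of the cited works: the symptom values, the labels, and the function names `train_naive_bayes` and `predict` are hypothetical, and the six "solved cases" are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of past cases
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # feature_values[feature_index] -> set of values seen for that feature
    feature_values = defaultdict(set)
    for features, label in zip(records, labels):
        for i, value in enumerate(features):
            value_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, features):
    """Return the label with the highest log-posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_label in class_counts.items():
        # Log prior of the class.
        score = math.log(n_label / total)
        for i, value in enumerate(features):
            count = value_counts[label][i][value]
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (n_label + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical past cases: (fever, cough) -> diagnosis. Invented toy data.
past_cases = [
    ("high", "yes"), ("high", "yes"), ("high", "no"),
    ("none", "yes"), ("none", "yes"), ("none", "no"),
]
diagnoses = ["flu", "flu", "flu", "cold", "cold", "cold"]

model = train_naive_bayes(past_cases, diagnoses)
print(predict(model, ("high", "yes")))
print(predict(model, ("none", "no")))
```

The classifier treats the findings as conditionally independent given the diagnosis, which is the simplifying assumption that gives the method its name. Real clinical data would also need handling for missing and noisy values, which this sketch omits.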
Key Terms in this Chapter
Electronic Health (Medical) Record: The narrative record written by the nurse or the physician during patient examination.
Big Data: A large amount of data, in many different formats, that flows rapidly in real time. Such data should undergo some form of analysis in order to extract useful information or knowledge.
Machine Learning: The subfield of artificial intelligence that uses learning algorithms in handling problems.
Medical Diagnosis: Investigating the symptoms and causes of a certain disease, either by oral examination or by laboratory tests.
Hadoop: An open-source software framework from the Apache Software Foundation, capable of storing and processing big data in a distributed fashion on large clusters.
Data Mining: The process of extracting new, useful, understandable and previously unknown knowledge from information that might help in decision making.
Cloud Computing: A high-performance computing service that does not depend on a fixed location but instead uses a grid of computers over the internet.