Intelligent Techniques for Analysis of Big Data About Healthcare and Medical Records

Pinar Kirci (Istanbul University, Turkey)
DOI: 10.4018/978-1-5225-3232-3.ch029

Abstract

The term big data is used to describe huge datasets. These “4 V” datasets are characterized by volume, variety, velocity and value in many areas, especially medical images, electronic medical records (EMR) and biometric data. Processing and managing such datasets at the storage, analysis and visualization stages is challenging. Recent improvements in communication and transmission technologies provide efficient solutions. Big data solutions should be multithreaded, and data access approaches should be tailored to large amounts of semi-structured and unstructured data. To cope with these difficulties, software programming frameworks are used together with a distributed file system (DFS) whose storage units are larger than the disk blocks of an operating system, so that computing tasks can be multithreaded. Huge datasets in the data storage and analysis of the healthcare industry need new solutions because traditional analytic tools are no longer adequate.

Introduction

According to Mayer-Schönberger and Cukier (2013), people have collected vast amounts of data in libraries since ancient times, so collecting and accumulating data is not new for the human race. Today, they want to keep all of that data. Companies want to gather data on their suppliers, customers and staff, as well as many years of data on business transactions, purchases and sales, and expenses and profits. Nearly twenty years ago, this kind of data was kept in books and files. Today it is impractical to keep such valuable data in paper files because it is not safe and there is not enough space. Instead, electronic storage in databases is used. Thus, people save time, space, money and effort by keeping the past and present data of their companies in larger databases, with faster computers and greater storage capacities. Larger databases are made possible by ever-improving computer technology. A database can be viewed as a two-dimensional table of data, so it grows in two ways: in the number of rows and in the number of columns. These rows and columns hold various types of data about customers, goods, purchases and staff, including names, expiration dates, addresses and salaries. Over time the number of entries increases, but there is no need to remove older entries because storage is no longer a problem, so gathering and keeping data is easier. Moreover, with a single barcode scanner, much vital data about a product can easily be gathered, such as the time and date of the transaction, the payment type and the customer’s data. The internet also offers large amounts of data that can be accessed, gathered and stored with a single click.
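
The two directions of growth described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table name `customers`, its columns and the sample values are hypothetical and do not come from the chapter.

```python
import sqlite3

# Hypothetical in-memory table; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT)")


def shape(connection):
    """Return (row_count, column_count) for the customers table."""
    rows = connection.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    cols = len(connection.execute("PRAGMA table_info(customers)").fetchall())
    return rows, cols


before = shape(conn)

# Growth in rows: new entries are appended and older ones are kept.
conn.executemany(
    "INSERT INTO customers (name, address) VALUES (?, ?)",
    [("Ada", "Istanbul"), ("Bora", "Ankara")],
)

# Growth in columns: a new attribute is recorded for every customer.
conn.execute("ALTER TABLE customers ADD COLUMN salary REAL")

after = shape(conn)
print("before:", before, "after:", after)
```

Existing rows receive an empty (NULL) value in the new column, which is one reason wide tables collected over many years tend to contain gaps.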

According to Mayer-Schönberger and Cukier (2013), in addition to ever-growing databases, many databases are combined to form data warehouses. Nowadays, even small companies need data warehouses rather than a single database because of the variety of data they need. Connecting databases makes it easier to reach large amounts of data, on the order of several petabytes. Relationships then emerge among many variables, and vast amounts of data can be retrieved from data warehouses, which gives rise to data mining. However, data warehouses are expected to become insufficient within a few years because of the growing volume of data. Therefore, cloud computing is offered, together with online storage of data on vast networks of servers. Data that cannot be handled by traditional database technology because of its unstructured nature or large volume is called big data. It is characterized by three V’s: volume, velocity and variety. Velocity is the speed of data collection, storage and analysis. As for variety, numerical data is easy to handle, but unstructured data such as video clips, music and text is more complex and difficult to process. Sometimes a fourth V, veracity, is added because the truthfulness of data is an important factor. In a single day, Google handles nearly 24 petabytes of data, 3 billion comments are left on Facebook and 400 million messages are posted on Twitter. This means that nearly five exabytes of data circulate around the world every minute.

Obaidat et al. (2011) state that security, efficiency, patient-centeredness, timeliness, impartiality, and efficiency across different nations are basic goals that should be ensured in the development of healthcare. These goals were set out by the American Institute of Medicine. A mobile data infrastructure that includes telemedicine systems and information processing techniques tailored to patients’ needs can help achieve these aims. Lately, wearable and pervasive sensors working with personal mobile devices have led to new e-health systems known as pervasive healthcare systems. Such a system combines patient-based and hospital-based systems, supporting the medical workflow inside the facility while collecting more data.

Key Terms in this Chapter

Volume, Variety, and Velocity: The three Vs, three commonly used characteristics of Big Data.

Visualization: The display of acquired data in graphical form.

Data Cleansing: Identifying, removing or correcting incorrect data.

Aggregation: The process by which particulars are collected based on a classification.

Unstructured Data: Data that does not conform to a predefined structure.

Tools: The main building blocks through which many assets are developed.

Structured Query Language (SQL): A language used to store data in and retrieve data from relational databases.

Big Data: A term referring to datasets that have large amounts of data (volume), varied data (variety), and a rising speed of generation (velocity).

The Internet of Things (IoT): Objects that are recognizable, locatable and controllable over the Internet.

Value: The real worth of an outcome to an individual.

Network: Two or more subnets connected by routers.

Data Management Process: The process of taking source data, performing some operations on it and transmitting it to a predefined location.

Structured Data: Data that conforms to a predefined structure.

Algorithm: A finite series of steps, including deterministic or random elements, that provides a desired outcome.

Sensor Data: Machine-produced data.

Analytics: A data-driven process, including manual analysis and optimization models, that creates insight.

Data Warehouse: A group of data from different sources organized to present useful guidance to an organization’s decision makers. A shared repository of data.
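
Two of the terms above, data cleansing and aggregation, can be made concrete with a minimal sketch. It assumes a hypothetical list of heart-rate readings keyed by patient identifier; the data, the validity rule (positive whole numbers only) and the function names `cleanse` and `aggregate` are illustrative assumptions, not part of the chapter.

```python
from collections import defaultdict

# Hypothetical sensor readings: (patient_id, heart_rate as text). Values are made up.
raw_readings = [
    ("p1", "72"),
    ("p1", "n/a"),  # incorrect entry: not a number
    ("p2", "80"),
    ("p2", "-5"),   # incorrect entry: impossible value
    ("p1", "76"),
]


def cleanse(readings):
    """Data cleansing: keep only entries that are positive whole numbers."""
    clean = []
    for patient_id, value in readings:
        if value.isdigit() and int(value) > 0:
            clean.append((patient_id, int(value)))
    return clean


def aggregate(readings):
    """Aggregation: group readings by patient and average each group."""
    groups = defaultdict(list)
    for patient_id, value in readings:
        groups[patient_id].append(value)
    return {pid: sum(vals) / len(vals) for pid, vals in groups.items()}


clean = cleanse(raw_readings)
averages = aggregate(clean)
print(averages)
```

Here cleansing simply removes invalid entries; a real pipeline might instead correct them or flag them for review, which the definition above also allows.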
