NLP for Clinical Data Analysis: Handling the Unstructured Clinical Information

Partha Sarathy Banerjee, Jaya Banerjee
DOI: 10.4018/978-1-7998-2120-5.ch018

Abstract

This work focuses on natural language processing for clinical data analysis. As information is generated at an exponential rate, the handling and management of that information has drawn wide attention. Most of the data being generated is unstructured, and structured information is relatively easier to process than semi-structured or unstructured data. In clinical data, the larger share is unstructured, such as a patient's case study and history. This chapter provides deeper insight into this class of data and presents various ways in which it can be interpreted and represented to improve healthcare for the general public. The authors also discuss a generic system developed for handling unstructured data: the Natural Language Information Interpretation and Representation System (NLIIRS).

Background

As the population has grown at an enormous rate, the need for doctors has risen sharply. Because skilled doctors have to handle a higher volume of patients, they need assistive technologies that can process information better and thereby support better treatment of the patient. With the emergence of Artificial Intelligence and its dominant role in the health care sector, the specialized area that finds the highest utility is natural language processing (NLP). This work therefore focuses on the broader aspects of NLP and clinical data analysis. Since the work deals with handling unstructured information in the clinical domain, we first look at some of the previous work done in this area. In (Iyyer, M. et al., 2014), the authors provide a neural-network-based approach for handling unstructured data. As early as 1961, a module was developed for extracting information related to the game of baseball, as stated in (Green, B. et al., 1961). In 1973, a comprehensive body of work was carried out, as stated in (Woods, W. A. et al., 1973). The authors in (Androutsopoulos, I. et al., 1994) proposed an interface between natural language input and a standard SQL back end. The work in (Androutsopoulos, I. et al., 1995) proceeds in a similar direction, creating a natural language interface between a traditional RDBMS and natural language input. The purpose of these interfaces is to reduce the complexity that would be involved in removing the RDBMS altogether. The work proposed in (Yan-hong, F. et al., 2018) highlights the utility of the skip-gram model in place of a Named Entity Recognizer (NER). The work in (Banerjee, P. S. et al., 2019) presents a state-of-the-art approach to information retrieval from a completely unstructured corpus using question answering.
In that work, the authors removed the use of SQL at the back end for information extraction; instead, information is retrieved using questions posed as natural language text. In (Banerjee, P. S. et al., 2015), the authors worked at both the semantic and the syntactic levels of linguistic analysis so that natural language information can be processed more effectively. The authors in (Sharma, J. A. et al., 2020) offer a debugger option for better query handling in the case of natural language. In most cases, as in (Ranjan, P., & Balabantaray, R. C., et al., 2020) or (Ma, R., Zhang, et al., 2018), the emphasis is on factoid output from the processed data, which makes these approaches more applicable to clinical data handling. The works in (Yulianti, E., et al., 2018) and (Iyyer, M., Boyd-Graber, et al., 2014) also address the summarization of information, which is discussed later in the chapter. Although much work has been reported, this chapter addresses clinical records, most of which are handwritten. The best way to manage this kind of data is to make use of natural language understanding.

Key Terms in this Chapter

Electronic Medical Records (EMR): A combination of structured medical history and handwritten, unstructured information.

Text Heavy Information (THI): Unstructured data that is mostly in the form of natural language text.

Long Short-Term Memory (LSTM): An artificial recurrent neural network (RNN) architecture used in the field of deep learning.

Natural Language Information Interpretation and Representation System (NLIIRS): An aggregation of modules that accept information in unstructured format and represent it in a usable form. It supports question answering.
