RDF Model Generation for Unstructured Dengue Patients' Clinical and Pathological Data

Runumi Devi (Amity University, Noida, India), Deepti Mehrotra (Amity University, Noida, India) and Hajer Baazaoui-Zghal (University of Manouba, Manouba, Tunisia)
Copyright: © 2019 | Pages: 19
DOI: 10.4018/IJISMD.2019100104


The automatic extraction of triplets from unstructured patient records and their transformation into resource description framework (RDF) models remains a major challenge. Solving it would benefit applications such as knowledge discovery, machine interoperability, and ontology design in the health care domain. This article describes an approach that extracts semantics (triplets) from dengue patient case-sheets and clinical reports and transforms them into an RDF model. A Text2Ontology framework was used for extracting relations from text and was found to have limited capability. A TypedDependency parsing-based algorithm is therefore designed for extracting RDF facts from patients' case-sheets and converting them into RDF models. A mapping-driven semantifying approach is also designed for mapping clinical details extracted from patients' reports to their corresponding triplet components and generating RDF models from them. The exhaustiveness of the generated RDF models is measured by the number of axioms generated with respect to the facts available.


Dengue fever has emerged as the most infectious mosquito-borne viral disease. It is spreading all over the world and may lead to an epidemic in the absence of proper precautions. As reported by the World Health Organization (WHO), India had its worst outbreak in 2015, with more than 15,000 confirmed dengue cases in Delhi alone (Siddiqui, Chakravarti, & Abhishek, 2016). In India, most hospitals still maintain dengue patients’ clinical data as case-sheets filled in by doctors and pathological records as diagnostic (clinical test) reports, both of which are available only in unstructured form. When patients are referred to other hospitals, they have to carry their records with them, and some vital information is lost through negligence. If these data are stored as asserted units of RDF (Resource Description Framework) knowledge, by defining attribute-value pairs for each patient instance and making the triplets visible to multiple processors across multiple organizations, the data become more powerful: they provide opportunities not only for machine interoperability but also for ontology design and the discovery of new knowledge. Extracting information from unstructured healthcare text, such as patients’ case-sheets and diagnostic reports, has always been a challenging step towards a smart data continuum for clinical decision support systems. The domain has attracted many researchers; however, an exhaustive RDF interpretation of unstructured text that offers interoperability is still an open challenge (Brahim, Claire, & Anne, 1989). Thus, generating RDF models of dengue patients’ case-sheets and diagnostic reports, so that semantics are stored along with the data in the form of triplets, has become critical.
The present work addresses this objective with an approach for generating an RDF model that formalizes unstructured dengue patients’ case-sheets and diagnostic reports, so that data can be shared easily across an interoperable environment. Dengue patients’ medical records serve as the case study. Two types of documents containing dengue patients’ information are considered: case-sheets and diagnostic reports. The hospital (Swami Dayanand Hospital, Dilshad Garden, Delhi-110095, India) maintains case-sheets in a textual format, whereas diagnostic reports are kept in portable document format (PDF) and are available in the pathology department.
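The idea of storing a patient's attribute-value pairs as triplets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the namespace `http://example.org/dengue/`, the patient identifier, the attribute names, and the values are all hypothetical and chosen only for illustration. Each attribute-value pair becomes one (subject, predicate, object) triplet, and the result is written in N-Triples syntax so that other systems can read it.

```python
# Hypothetical namespace; the article does not specify one.
BASE = "http://example.org/dengue/"


def iri(local_name):
    """Wrap a local name in the (assumed) base namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def literal(value):
    """Render a value as a plain N-Triples string literal with basic escaping."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def patient_to_triples(patient_id, attributes):
    """Turn attribute-value pairs of one patient instance into triplets."""
    subject = iri(f"patient/{patient_id}")
    return [(subject, iri(name), literal(value))
            for name, value in attributes.items()]


def to_ntriples(triples):
    """Serialize triplets, one statement per line, in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Made-up record, not taken from the hospital data described in the article.
record = {"hasTemperature": "102 F", "hasPlateletCount": 85000}
triples = patient_to_triples("P001", record)
print(to_ntriples(triples))
```

Because every statement carries its own subject and predicate IRIs, a receiving hospital can interpret a single line without access to the original case-sheet layout.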

Information extraction through RDF model generation from textual data available in the healthcare domain (Wang et al., 2018) requires natural language processing (NLP) techniques for semantic analysis.

Semantic analysis enables sentences to be represented in a formal language that supports automated reasoning. English sentences follow a fixed word order: subject, verb, object. The order varies slightly when an auxiliary verb (do, have, be, will, can, etc.) or an indirect object is present. A language has different sentence types for expressing different kinds of information: declarative, interrogative, imperative, and exclamatory. Because the case-sheets maintained in the hospital record facts about patients, this experiment is limited to declarative sentences, in both active and passive forms. English uses much the same structure for positive and negative declarative sentences, the only difference being the word “not” in a negative declarative sentence.
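The sentence forms described above (active, passive, and negative declaratives) can be mapped to triplets from typed dependencies, as in the following sketch. It is an illustration under stated assumptions, not the algorithm from the article: the relation names (`nsubj`, `dobj`, `nsubjpass`, `agent`, `neg`) follow the Stanford typed dependencies convention, the dependency lists are written by hand rather than produced by a parser, and the `not_` prefix for negated predicates is a hypothetical encoding.

```python
def extract_triple(dependencies):
    """Build a (subject, predicate, object) triplet from typed dependencies.

    `dependencies` is a list of (relation, governor, dependent) tuples for
    one declarative sentence. Returns None if a component is missing.
    """
    subject = verb = obj = agent = None
    negated = False
    passive = False

    for relation, governor, dependent in dependencies:
        if relation == "nsubj":          # active voice: grammatical subject
            subject, verb = dependent, governor
        elif relation == "dobj":         # direct object of the verb
            obj = dependent
        elif relation == "nsubjpass":    # passive voice: subject is the logical object
            obj, verb, passive = dependent, governor, True
        elif relation == "agent":        # passive voice: the "by ..." phrase
            agent = dependent
        elif relation == "neg":          # auxiliary word "not"
            negated = True

    if passive:
        subject = agent
    if subject is None or verb is None or obj is None:
        return None

    predicate = f"not_{verb}" if negated else verb  # assumed negation encoding
    return (subject, predicate, obj)


# "The patient has fever." (active, positive)
active = [("nsubj", "has", "patient"), ("dobj", "has", "fever")]

# "The patient does not have rash." (active, negative)
negative = [("nsubj", "have", "patient"), ("aux", "have", "does"),
            ("neg", "have", "not"), ("dobj", "have", "rash")]

# "The patient was examined by the doctor." (passive)
passive_dep = [("nsubjpass", "examined", "patient"),
               ("auxpass", "examined", "was"),
               ("agent", "examined", "doctor")]

for deps in (active, negative, passive_dep):
    print(extract_triple(deps))
```

The positive and negative sentences share the same dependency structure apart from the `neg` relation, which is why one routine can handle both; the passive form only swaps which dependent fills the subject and object slots.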
