Learning Models for Concept Extraction From Images With Drug Labels for a Unified Knowledge Base Utilizing NLP and IoT Tasks

Sukumar Rajendran (Vellore Institute of Technology, Vellore, India) and Prabhu J. (Vellore Institute of Technology, Vellore, India)
DOI: 10.4018/IJITWE.2020070102


Humankind has evolved through the exchange of information and the extraction of knowledge from available information. The exchange process varies with the medium through which the information is exchanged. The Internet of Things (IoT) comprises millions of sensor-equipped devices that simultaneously transfer real-time information as rapid streams of data, which must be processed on the go. This creates a need for effective and efficient approaches to segregating data by class, relatedness, and differences in the information. Text is extracted from images with Tesseract, irrespective of the language. SciBERT models are used to extract scientific information and are evaluated on a suite of tasks, especially classifying drugs based on free data (tweets, images, etc.). Semantic similarity analysis of the images and text groups similar drugs together by composition or manufacturer.
Article Preview


Concept extraction is used to extract concepts and named entities from freely available text; it is also referred to as named entity recognition (NER). It is widely used in the clinical domain to recommend treatments or classify diseases by predicting the concepts present in electronic healthcare records (EHR).

NER spots locations, persons, and organizations in the available text and categorizes them into a predefined set of classes. NER uses a knowledge base containing the named entities extracted from the vocabulary that is supplied as text.
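The knowledge-base lookup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `KNOWLEDGE_BASE` entries, the `DRUG` class, and the `tag_entities` helper are hypothetical names chosen for illustration, and a practical NER system would use a trained model rather than exact token lookup.

```python
import re

# Hypothetical toy knowledge base: lowercase surface form -> entity class.
KNOWLEDGE_BASE = {
    "vellore": "LOCATION",
    "twitter": "ORGANIZATION",
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
}


def tag_entities(text, knowledge_base=KNOWLEDGE_BASE):
    """Return (token, class, start offset) for each token found in the knowledge base."""
    entities = []
    # Split the text into alphabetic tokens and look each one up case-insensitively.
    for match in re.finditer(r"[A-Za-z]+", text):
        label = knowledge_base.get(match.group().lower())
        if label is not None:
            entities.append((match.group(), label, match.start()))
    return entities


print(tag_entities("Aspirin was discussed on Twitter."))
```

Tokens that are absent from the knowledge base are simply skipped, which is why limited annotated sources restrict what such a lookup can recognize.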

Structured data drawn from the massive repository of freely available text is urgently needed for making specific decisions, as in a clinical decision support system (CDSS). The issues with CDSS that make concept extraction difficult are as follows:

  • Access to the healthcare data

  • Sensitive information (PII)

  • Limited annotated source

  • Large variations in data

Furthermore, terms that refer to clinical symptoms are annotated differently, and some annotations are specific to the physicians of a particular hospital. Such concepts cannot be transferred directly without training, and a standard corpus that can decode the annotations is needed.

Clinical text also differs in that it can be very noisy, with acronyms, spelling mistakes, and handwriting that make parts of the data unusable. The available text may come from a selective group of EHR with limited data, missing features that are critical in a different assessment context.

The components that might inhibit data from being assessed by regular NLP and NER are:

  • Extraction from clinical literature

  • De-identified data

  • Clustered entities

Recent advances in online social networks (OSN) have made massive public data sets available. In 2018, about 500 million tweets were published per day on the popular micro-blogging platform Twitter. These data sets offer numerous research opportunities for deriving meaningful cause-effect relationships in many valuable application domains, including drugs and their side effects. Large-scale studies are usually needed to find out why people take certain drugs or why certain drugs cause unpredicted effects. Drugs may cause different side effects, most of which are discovered during drug development. Pharmaceutical companies, for example, traditionally rely on clinical trials to establish the efficacy and side effects of drugs. However, some side effects might not be revealed at that stage because of the limited sample size of clinical trials. Post-marketing monitoring of approved drugs relies on patients reporting undesired side effects to their healthcare providers. Many minor side effects might remain unreported through these established channels, yet they often appear in social media postings.

Several methods have been proposed for extracting text and meaningful relationships from publicly available data sets. Among them, sentiment analysis, also called opinion mining, has attracted much attention. Sentiment analysis is a field of NLP that focuses on finding how people feel about entities, events, or other objects. The focus of this paper is on finding tweets that describe how people feel about drugs and their side effects. More specifically, the goal is to find tweets that contain information about why people take drugs and about the effects those drugs cause. Extracting cause-effect relations from Twitter is challenging. First, many, if not most, tweets are incomplete sentences that often do not follow syntactic or semantic rules. Second, automatic extraction of a drug-effect relationship is often hindered by the complexity of drug names, which frequently leads to misspellings. Finally, processing drug-related tweets requires a robust spam filtering procedure, because a significant number of drug advertisements and news items appear on Twitter.
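One way to cope with misspelled drug names is approximate string matching against a drug lexicon. The sketch below is an illustration, not the method used in the paper: it relies on `difflib.get_close_matches` from the Python standard library, and the `DRUG_LEXICON` contents, the `normalize_drug_name` helper, and the `0.8` similarity cutoff are assumptions.

```python
from difflib import get_close_matches

# Hypothetical lexicon of canonical drug names (illustrative only).
DRUG_LEXICON = ["ibuprofen", "paracetamol", "amoxicillin", "metformin"]


def normalize_drug_name(token, lexicon=DRUG_LEXICON, cutoff=0.8):
    """Map a possibly misspelled token to the closest lexicon entry, or None."""
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and keeps only those at or above the cutoff.
    matches = get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize_drug_name("ibuprofin"))
print(normalize_drug_name("coffee"))
```

A higher cutoff reduces false matches between unrelated words at the cost of missing heavier misspellings, so the value would need tuning on real tweets.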

To address these challenges, this work presents a pipeline for extracting drug-related cause-effect relations from Twitter data. The pipeline consists of four sub-tasks:

  1. Data streaming

  2. Spam filtering

  3. Data preprocessing

  4. Relationship classification
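The four sub-tasks can be chained as in the following minimal sketch. Everything in it is a stand-in under stated assumptions: an in-memory list replaces the live Twitter stream, and the keyword lists `SPAM_MARKERS`, `CAUSE_CUES`, and `EFFECT_CUES` are hypothetical rules, not the classifiers used by the authors.

```python
import re

# Hypothetical keyword lists standing in for trained models.
SPAM_MARKERS = ("buy now", "discount", "free shipping")
CAUSE_CUES = ("for my", "to treat")               # drug taken because of a condition
EFFECT_CUES = ("gave me", "makes me", "caused")   # drug led to an effect


def stream_tweets(source):
    """Stage 1: yield tweets one at a time (in-memory stand-in for a live stream)."""
    for tweet in source:
        yield tweet


def is_spam(tweet):
    """Stage 2: flag tweets that contain an advertising marker."""
    lowered = tweet.lower()
    return any(marker in lowered for marker in SPAM_MARKERS)


def preprocess(tweet):
    """Stage 3: lowercase, drop URLs, mentions, and hashtags, collapse whitespace."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", tweet.lower())
    return re.sub(r"\s+", " ", text).strip()


def classify_relation(text):
    """Stage 4: label a cleaned tweet as 'effect', 'cause', or 'none'."""
    if any(cue in text for cue in EFFECT_CUES):
        return "effect"
    if any(cue in text for cue in CAUSE_CUES):
        return "cause"
    return "none"


def run_pipeline(source):
    """Run all four stages and return (cleaned tweet, relation label) pairs."""
    results = []
    for tweet in stream_tweets(source):
        if is_spam(tweet):
            continue
        cleaned = preprocess(tweet)
        results.append((cleaned, classify_relation(cleaned)))
    return results


sample = [
    "Buy now! Ibuprofen discount http://x.example",
    "Ibuprofen gave me a headache @doc",
    "Taking metformin for my diabetes #health",
]
for cleaned, label in run_pipeline(sample):
    print(label, "|", cleaned)
```

Filtering spam before preprocessing keeps advertising text out of the later stages; each stage is a plain function, so any of the keyword rules could be swapped for a learned model without changing `run_pipeline`.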
