A Named Entity Recognition Approach for Electronic Medical Records Using BERT Semantic Enhancement and BiLSTM

Xuewei Lai, Qingqing Jie
Copyright: © 2023 | Pages: 14
DOI: 10.4018/IJSWIS.333711

Abstract

To address the problems of missing local context features, single word-vector representations, and low entity recognition accuracy, this paper proposes a named entity recognition method for electronic medical records (EMR) based on BERT and model fusion. First, the pre-trained BERT model fuses preceding and following contextual information to enhance the semantic representation of words and alleviate the problem of polysemy. Second, a bidirectional long short-term memory (BiLSTM) network produces the sequence feature matrix, and a conditional random field (CRF) model generates the globally optimal label sequence. Finally, data augmentation is used to alleviate class imbalance and improve the model's generalization ability. Experimental results show that the proposed model reaches an F1 score of 0.8548 on the CCKS21 dataset, which is 0.51% and 0.08% higher than the ID-CNNs-CRF and multi-task RNN models, respectively. These results indicate that the proposed method improves named entity recognition performance.

Introduction

Named entity recognition (NER) is a challenging task in natural language processing (NLP) (Pathoee et al., 2022; Capuano et al., 2022). In the medical field, staff enter patients' information into computers, which store it in the medical institution's information system, creating electronic medical records (EMR) (Alomani et al., 2022; Zhan et al., 2022; Zemmouchi-Ghomari, 2021). EMR named entity recognition is an important application and extension of NER to EMR text analysis (Pareek et al., 2022; Hu et al., 2022; Ismail et al., 2022). Its purpose is to automatically recognize and classify the named entities in EMR (Xiao et al., 2021; Lample et al., 2016; Chen et al., 2023; Marrero et al., year). The recognized entities are then used by downstream applications, such as clinical decision support systems and medical knowledge graphs, to analyze and study EMR information (Yadav & Bethard, 2019; Li et al., 2020; Liu et al., 2021; Güneş & Tantu, 2018; Dong et al., 2016; Wu et al., 2015).

Electronic medical records mainly record important information related to a patient's health status, such as past medical history, diseases and symptoms, physical examination data, diagnostic opinions, and treatment outcomes (Song et al., 2021; Wu et al., 2017; Yadav & Bethard, 2017; Li et al., 2020). Early research on EMR named entity recognition used dictionary- and rule-based methods, relying only on existing dictionaries and manually written rules to recognize medical named entities (Shen et al., 2017; Habibi et al., 2017; Ji et al., 2019). Efficient and accurate NER is necessary to fully mine the hidden features and disease associations in patient diagnosis and treatment data. Although there has been significant research on named entity recognition for EMR, relatively few studies address Chinese-language EMR (Yu et al., 2019). Chinese EMR texts are difficult to process because they contain many domain-specific terms, nonstandard sentence structures, heavily nested entities, and fuzzy word boundaries, since written Chinese does not separate words with spaces. As a result, traditional named entity recognition models struggle to achieve satisfactory results.
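Because Chinese EMR text has no explicit word boundaries, NER on such text is commonly framed as character-level sequence labeling, so that no word segmentation step is needed. The sketch below illustrates this framing under stated assumptions: the BIO tagging scheme, the hypothetical `SYM` (symptom) entity type, the sample sentence, and the helper names `spans_to_bio` and `bio_to_spans` are illustrative choices, not details taken from this paper or from the CCKS21 annotation guidelines.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, entity_type); a hypothetical span format for illustration
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Assign one BIO label per character; characters outside any span get 'O'."""
    labels = ["O"] * len(text)
    for start, end, etype in spans:
        labels[start] = f"B-{etype}"
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"
    return labels


def bio_to_spans(labels: List[str]) -> List[Span]:
    """Recover entity spans; an I- tag only continues an open entity of the same type."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, label in enumerate(labels + ["O"]):  # trailing "O" closes a final entity
        continues = label.startswith("I-") and start is not None and label[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
            # open a new entity on B-, otherwise reset
            start, etype = (i, label[2:]) if label.startswith("B-") else (None, None)
    return spans


# Hypothetical example: "Patient has had fever for three days, with cough."
text = "患者发热三天，伴咳嗽。"
spans = [(2, 4, "SYM"), (8, 10, "SYM")]  # 发热 (fever), 咳嗽 (cough)
labels = spans_to_bio(text, spans)
print(list(zip(text, labels)))
```

Each character receives exactly one label, so the model never has to decide where a Chinese word begins or ends; entity boundaries are encoded directly by the B- and I- tags.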

NER methods based on deep neural networks often suffer from missing local context features, single word-vector representations (one vector per word regardless of context), and low entity recognition accuracy. To address these problems, this paper proposes an EMR named entity recognition method based on BERT and model fusion. The pre-trained BERT model fuses preceding and succeeding contextual information to enhance semantic representation. A BiLSTM network then produces the sequence feature matrix, and a conditional random field (CRF) model generates the globally optimal label sequence. Finally, data augmentation alleviates the class imbalance problem.
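In a BiLSTM-CRF tagger, generating the globally optimal sequence is typically done with Viterbi decoding over per-position emission scores and tag-to-tag transition scores. The following is a minimal pure-Python sketch assuming a standard linear-chain CRF without start/end transitions; the tag set, the emission scores (standing in for BiLSTM outputs), and the transition matrix are hypothetical numbers chosen to show how transition constraints can override a per-position greedy choice, not values from this paper.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the tag indices maximizing the sum of emission and transition scores.

    emissions[t][j]   : score of tag j at position t (assumes at least one position)
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    best = list(emissions[0])  # best[j]: best path score ending in tag j so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # predecessor tag i that maximizes the path score into tag j
            i_best = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    path.reverse()
    return path


TAGS = ["O", "B-SYM", "I-SYM"]  # hypothetical tag set
NEG = -1e4                      # large penalty marking a disallowed transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-SYM is disallowed
    [0.0, 0.0, 0.0],  # from B-SYM
    [0.0, 0.0, 0.0],  # from I-SYM
]
emissions = [           # hypothetical stand-in for BiLSTM output scores
    [1.0, 0.8, 0.0],
    [0.0, 0.5, 2.0],
    [2.0, 0.0, 0.0],
]

greedy = [max(range(len(TAGS)), key=lambda j: row[j]) for row in emissions]
path = viterbi_decode(emissions, transitions)
print([TAGS[k] for k in greedy])  # per-position argmax, ignores transitions
print([TAGS[k] for k in path])    # globally optimal sequence
```

Picking the highest-scoring tag at each position independently yields O, I-SYM, O, which is an invalid BIO sequence; the large negative O -> I-SYM transition score makes Viterbi return B-SYM, I-SYM, O instead. In a trained model, both the emission and transition scores would be learned rather than set by hand.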

The remainder of the paper reviews related work in this field, describes in detail the proposed technical scheme (the EMR named entity recognition method based on BERT and model fusion), and presents the experiments designed to verify the effectiveness of the proposed method.
