Introduction
Natural language processing (NLP) using named entity recognition (NER) is challenging (Pathoee et al., 2022; Capuano et al., 2022). In the medical field, staff enter patient information into computers, and the medical institution’s information system stores it as electronic medical records (EMR) (Alomani et al., 2022; Zhan et al., 2022; Zemmouchi-Ghomari, 2021). EMR named entity recognition is an important application and extension of NER to EMR text analysis (Pareek et al., 2022; Hu et al., 2022; Ismail et al., 2022). Its purpose is to automatically recognize and classify the named entities in EMR (Xiao et al., 2021; Lample et al., 2016; Chen et al., 2023; Marrero et al, year). The recognized entities support downstream applications, such as clinical decision support systems and medical knowledge graphs, that analyze and study the information in EMR (Yadav & Bethard, 2019; Li et al., 2020; Liu et al., 2021; Güneş & Tantu, 2018; Dong et al., 2016; Wu et al., 2015).
An electronic medical record mainly stores important information about a patient's health status, such as past medical history, diseases and symptoms, physical examination data, diagnostic opinions, and treatment effects (Song et al., 2021; Wu et al., 2017; Yadav & Bethard, 2017; Li et al., 2020). Early research on EMR named entity recognition used dictionary- and rule-based methods, relying solely on existing dictionaries and manually written rules to recognize medical named entities (Shen et al., 2017; Habibi et al., 2017; Ji et al., 2019). To fully mine the hidden features and disease associations in patient diagnosis and treatment data, efficient and accurate NER is necessary. Although there has been significant research on named entity recognition for EMR, there are relatively few studies on EMR in China (Yu et al., 2019). Chinese EMR text is challenging to process because of the complex structure of the Chinese language: it contains many specialized terms, nonstandard language structure, heavily nested entities, and fuzzy word boundaries. Traditional named entity recognition models therefore struggle to achieve satisfactory classification results.
NER methods based on deep neural networks often suffer from a lack of local context features, a single word-vector representation, and low entity recognition accuracy. To address these problems, the author proposes a named entity recognition method for EMR based on BERT model fusion. The pre-trained BERT model fuses preceding and succeeding contextual information to enhance the semantic representation. A BiLSTM network produces the sequence feature matrix, and a conditional random field (CRF) model generates the globally optimal tag sequence. Data augmentation alleviates the class imbalance problem.
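To illustrate what generating a globally optimal sequence with a conditional random field means in practice, the following minimal sketch runs Viterbi decoding over a matrix of per-token tag scores. It is not the author's implementation: the tag set, the emission scores (standing in for the BiLSTM feature matrix), and the transition scores are all made-up values chosen for illustration, and the function name `viterbi_decode` is hypothetical.

```python
# Hypothetical tag set for a single entity type (disease), BIO scheme.
TAGS = ["O", "B-DIS", "I-DIS"]


def viterbi_decode(emissions, transitions, start, end):
    """Return (best_score, best_path) for a linear-chain CRF.

    emissions[t][k]   : score of tag k at position t (e.g. BiLSTM output)
    transitions[j][k] : score of moving from tag j to tag k
    start[k], end[k]  : score of starting / ending the sequence with tag k
    """
    n_tags = len(start)
    # score[k] = best score of any partial path that ends in tag k
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            best_prev = max(range(n_tags),
                            key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k]
                             + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Pick the best final tag, then follow the backpointers to the start.
    last = max(range(n_tags), key=lambda k: score[k] + end[k])
    best_score = score[last] + end[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best_score, path


# Illustrative scores for a four-token sentence (assumed values).
emissions = [
    [2.0, 0.5, 0.1],
    [0.3, 1.0, 1.2],
    [0.2, 0.4, 1.5],
    [1.8, 0.1, 0.3],
]
transitions = [
    [0.5, 0.2, -10.0],   # from O: "O -> I-DIS" is heavily penalized
    [-0.5, -1.0, 1.0],   # from B-DIS
    [0.3, -0.5, 0.8],    # from I-DIS
]
start = [0.0, 0.0, -10.0]  # a sequence should not start with I-DIS
end = [0.0, 0.0, 0.0]

best_score, best_path = viterbi_decode(emissions, transitions, start, end)
# Per-token argmax, ignoring transitions, for comparison.
greedy = [max(range(len(TAGS)), key=lambda k: row[k]) for row in emissions]

print([TAGS[k] for k in best_path])  # ['O', 'B-DIS', 'I-DIS', 'O']
print([TAGS[k] for k in greedy])     # ['O', 'I-DIS', 'I-DIS', 'O']
```

With these assumed scores, picking the best tag for each token independently yields an entity that starts with `I-DIS`, which is invalid under the BIO scheme. Decoding over the whole sequence with transition scores recovers a well-formed entity span instead, which is the motivation for placing a CRF layer on top of the BiLSTM features.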
The paper reviews related work in this field, describes in detail the proposed technical scheme (the named entity recognition method for electronic medical records based on BERT model fusion), and presents the experiments designed to confirm the effectiveness of the proposed method.