Named Entity Recognition Method of Chinese Legal Documents Based on Parallel Instance Query Network

Rui Lu (Liaoning Police College, China) and Linying Li (Dalian University of Foreign Languages, China)
Copyright: © 2024 | Pages: 19
DOI: 10.4018/IJDCF.367470

Abstract

Legal Named Entity Recognition (NER) is crucial in intelligent judiciary systems, focusing on identifying case-specific entities in legal texts. It helps convert unstructured legal documents into structured data, improving e-discovery efficiency. However, challenges arise from insufficient understanding of legal terminology, leading to errors in identifying long and nested entity boundaries. To address this, a Legal NER method based on a parallel instance query network is proposed. This method uses learnable instance queries to extract entities in parallel, with a BERT+BiLSTM+attention structure to encode context and query information. Entity prediction is performed using a pointer network to identify span boundaries and entity types. A linear label assignment mechanism aligns legal entities with queries for more accurate labeling. Experimental results show that the model outperforms existing methods, and further validation through ablation experiments and case studies supports its effectiveness, offering valuable insights for advancing legal NER research.

Introduction

Named entity recognition (NER) is a fundamental task in natural language processing (NLP) that identifies general entities such as persons, times, and locations in text. In the judicial domain, legal named entity recognition (LNER) is a specialized task that focuses on case-specific entities closely tied to legal proceedings. These entities are typically terms or phrases that carry particular meaning in the legal context, such as “suspect” and “victim,” which are subtypes of the “person” entity identified in general NER. With advances in NLP, extracting named entities from large volumes of unstructured legal text has become a critical step in constructing legal knowledge graphs and developing intelligent justice systems (Correia et al., 2021; Guo et al., 2021). LNER also underpins downstream tasks such as judicial summarization, question answering, and case recommendation. However, specialized terminology, unclear entity boundaries, long entities, and nested entities make legal texts difficult to process. Most existing NER models do not handle these issues well, which results in suboptimal performance in legal entity recognition (Shen et al., 2022).

Chinese legal texts pose additional challenges because they frequently contain multi-word phrases and lengthy noun entities. Such entities are hard to segment because Chinese, unlike English, does not separate words with spaces. Legal texts also often contain nested entities, where one entity is embedded within another. For instance, in the phrase “a gold ring from the victim Lin’s home,” the stolen item “a gold ring” appears alongside the location “the victim Lin’s home,” which in turn contains the nested victim entity “Lin.” General NER methods may correctly identify “a gold ring” as a stolen item but miss the overlapping victim and location entities, which leads to incomplete recognition and a limited understanding of the relationships among location, person, and stolen item.
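To make the nesting problem concrete, the following minimal Python sketch marks up the example phrase with typed spans and lists the tokens that would need more than one label. The word-level tokenization, the span offsets, and the helper name `labels_per_token` are assumptions made for illustration on the English rendering of the phrase; they are not taken from the paper's dataset or annotation scheme.

```python
# Assumed word-level tokenization of the English rendering of the example.
tokens = ["a", "gold", "ring", "from", "the", "victim", "Lin", "'s", "home"]

# Illustrative (start, end, type) spans with inclusive token indices.
spans = [
    (0, 2, "stolen item"),
    (4, 8, "location"),
    (6, 6, "victim"),
]

def labels_per_token(tokens, spans):
    """Collect every entity type whose span covers each token."""
    covering = [[] for _ in tokens]
    for start, end, label in spans:
        for i in range(start, end + 1):
            covering[i].append(label)
    return covering

covering = labels_per_token(tokens, spans)

# Tokens covered by more than one entity cannot be described by a
# tagging scheme that assigns exactly one label per token.
conflicts = [tokens[i] for i, labels in enumerate(covering) if len(labels) > 1]
print(conflicts)
```

A sequence labeler that emits one tag per token has to drop one of the two labels on such tokens, whereas a span-based output keeps both entities.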

LNER faces the following main challenges:

  • Due to the specificity of the legal domain, legal documents contain long entities and nested entities. Long entities are composed of multiple nouns or phrases, which complicates their segmentation. Nested entities, on the other hand, have multi-layer structures where entity boundaries intertwine and overlap, making their recognition particularly challenging.

  • General NER methods primarily predict entity labels based on context, often overlooking the semantic relationships between the textual context and entity label types. While the machine reading comprehension (MRC) approach addresses some of these limitations, it is inefficient as it can only identify one entity type per inference. Furthermore, the quality of manually constructed queries in this approach can vary significantly, further affecting its accuracy.
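The efficiency gap between MRC-style querying and parallel instance queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the query wording, the `[Q0]`-style placeholder tokens, the sample context string, and both function names are hypothetical, and in an actual model the instance queries would be learnable embedding vectors rather than text tokens.

```python
ENTITY_TYPES = ["suspect", "victim", "location", "stolen item"]

def mrc_inputs(context, entity_types):
    """MRC style: one hand-written query per entity type, so one encoder pass each."""
    return [(f"Find every {t} mentioned in the text.", context) for t in entity_types]

def instance_query_input(context, num_queries):
    """Instance-query style: all query slots share a single pass with the context."""
    query_slots = [f"[Q{i}]" for i in range(num_queries)]  # stand-ins for learned vectors
    return query_slots + list(context)  # character-level context

context = "a gold ring from the victim Lin's home"

passes_mrc = len(mrc_inputs(context, ENTITY_TYPES))
single_input = instance_query_input(context, num_queries=3)
print(passes_mrc, len(single_input))
```

With four entity types the MRC formulation needs four separate inference passes, while the instance-query formulation builds one input regardless of how many types exist.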

To address these issues, this paper introduces an LNER method designed specifically for recognizing entities in Chinese legal documents. The method, PIQN-NER, is based on the parallel instance query network (PIQN), which uses trainable queries to replace the fixed queries in MRC and extract entities simultaneously. Unlike previous methods, these queries can be constructed in advance without relying on external knowledge. A linear label assignment mechanism is employed to align gold entities with the instance queries. First, PIQN-NER fine-tunes Bidirectional Encoder Representations from Transformers (BERT) to encode character sequences. Then, a bidirectional long short-term memory (BiLSTM) network combined with an attention mechanism assigns different attention weights to the context and the instance queries, which improves the model's ability to determine entity boundaries correctly. Finally, the entity prediction component uses a pointer network to capture both the span boundaries and the types of legal entities. Experiments demonstrate that the proposed method outperforms related methods when applied to legal datasets.

The main contributions of this work are as follows:
