Information Structure Parsing for Chinese Legal Texts: A Discourse Analysis Perspective

Bo Sun (Hefei Normal University, Hefei, China)
Copyright: © 2019 | Pages: 19
DOI: 10.4018/IJTHI.2019010104


Information processing is one of the main concerns of artificial intelligence because it benefits many related downstream tasks. Information structure parsing is assumed to be of great significance for facilitating information processing. This article proposes a discourse-analysis-based approach to recognizing the information structure of Chinese legal texts automatically. It employs Discourse Information Theory to explore the information features of Chinese legal texts. The texts used in this study fall into 6 types, each comprising 60 training texts and 30 testing texts. A set of rules is then formulated to classify legal texts and identify the categories of information units. Finally, to examine the performance of the rules, they are compared with a Support Vector Machine classifier and a Viterbi algorithm decoder. The experiment demonstrates that the rule-based approach outperforms the statistics-based approaches. This research suggests that discourse analysis may provide linguistic features conducive to discourse parsing.
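One of the statistical baselines above is a Viterbi decoder, which assigns a category to each information unit by choosing the single most probable label sequence for a whole text rather than labelling units independently. The sketch below shows generic Viterbi decoding under stated assumptions: the label names (FACT, CLAIM, RULING) and all probabilities are hypothetical and made up for illustration; they are not the information-unit categories, features or trained parameters of this study. In practice the start, transition and emission probabilities would have to be estimated from training texts.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    emissions: Sequence[Dict[str, float]],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the most probable label sequence for a sequence of units.

    emissions[t][s] is P(unit_t | label s); start[s] is P(first label = s);
    trans[(p, s)] is P(label s | previous label p). Missing entries count as 0.
    Scores are kept in log space to avoid numerical underflow.
    """
    if not emissions:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-score of any label path that ends in s at step t
    best: List[Dict[str, float]] = [
        {s: log(start.get(s, 0.0)) + log(emissions[0].get(s, 0.0)) for s in states}
    ]
    back: List[Dict[str, str]] = [{}]  # back-pointers to the best predecessor

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for s in states:
            # Choose the predecessor label that maximises the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log(trans.get((p, s), 0.0))) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + log(emissions[t].get(s, 0.0))
            ptrs[s] = prev
        best.append(scores)
        back.append(ptrs)

    # Trace back from the highest-scoring final label
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical labels and toy probabilities, for illustration only
STATES = ["FACT", "CLAIM", "RULING"]
START = {"FACT": 0.6, "CLAIM": 0.3, "RULING": 0.1}
TRANS = {
    ("FACT", "FACT"): 0.5, ("FACT", "CLAIM"): 0.4, ("FACT", "RULING"): 0.1,
    ("CLAIM", "FACT"): 0.2, ("CLAIM", "CLAIM"): 0.5, ("CLAIM", "RULING"): 0.3,
    ("RULING", "RULING"): 1.0,
}
EMISSIONS = [
    {"FACT": 0.7, "CLAIM": 0.2, "RULING": 0.1},
    {"FACT": 0.3, "CLAIM": 0.6, "RULING": 0.1},
    {"FACT": 0.1, "CLAIM": 0.2, "RULING": 0.7},
]

if __name__ == "__main__":
    print(viterbi(EMISSIONS, STATES, START, TRANS))  # ['FACT', 'CLAIM', 'RULING']
```

With these toy numbers the decoder labels three consecutive units FACT, CLAIM, RULING. The transition table is what lets a sequence decoder exploit ordering regularities (here, that a ruling is never followed by another label), which an independent per-unit classifier such as an SVM does not model directly.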


With the advancement of the Free Access to Law Movement across the world (Greenleaf, 2011), legal information has become increasingly available to lay people, legal professionals and lawmakers. Portals and databases, including but not limited to China Judgements Online and Westlaw, provide users with thousands of judicial cases, court decisions and regulations.

Meanwhile, text-based legal information processing has long been a prominent topic in the field of artificial intelligence and law (Bench-Capon et al., 2012). Researchers have paid much attention to issues such as legal information retrieval (Rissland & Daniels, 1995; van Opijnen & Santos, 2017), legal information extraction (Moens et al., 1999; Webber et al., 2005), Online Dispute Resolution (Carneiro et al., 2014; Zeleznikow, 2017) and eMediation (Jelali et al., 2015). From legal texts, litigants can estimate the possible outcomes of an impending trial and evaluate their potential risks; attorneys can anticipate the counterclaims opposing counsel is likely to raise; and judges can gain a good understanding of other judges' inferences and attitudes in similar cases, especially when faced with high-profile, complicated and difficult cases.

However, because the exponential growth of legal texts in the Internet age has led to information overload (Koniaris et al., 2017; van Opijnen & Santos, 2017), useful information has to be extracted through text mining techniques (Andrade & Santos, 2017). Although text classification can help exclude legal texts irrelevant to users' queries (Ashley & Brueninghaus, 2009), the remaining texts still contain redundant information, since users may need only part of a text. Therefore, to enhance user satisfaction, legal texts should be processed at a finer granularity.

To this end, techniques from several disciplines, such as linguistics, machine learning and Natural Language Processing (NLP), can be exploited. Among these, discourse analysis may offer particular insight, because discourse is not a random combination of sentences or words. Moens et al. (1999) formulated rules through discourse analysis and built the SALOMON system to summarize Belgian criminal cases. Kunc et al. (2013) argued that knowledge of discourse structure in dialogue could help optimize human-computer interaction. Taboada et al. (2009) demonstrated that the accuracy of sentiment analysis could be improved if discourse structure was taken into account. Lin et al. (2005) even used discourse-based features such as cue phrases to segment lecture videos, which are a kind of multimodal discourse. All these studies demonstrate the usefulness of discourse factors in text processing.

In this paper, the author addresses the problem of automatically identifying the information structure of Chinese legal texts by proposing a discourse-analysis-based approach. The analysis is conducted within the framework of Discourse Information Theory (Du, 2014).
