Introduction
With the advance of the Free Access to Law Movement across the world (Greenleaf, 2011), legal information is becoming increasingly accessible to lay people, legal professionals and lawmakers. Portals and databases, including but not limited to China Judgements Online (http://www.pkulaw.cn/) and Westlaw, provide users with thousands of judicial cases, court decisions and regulations.
Meanwhile, text-based legal information processing has long been an active topic within the field of artificial intelligence and law (Bench-Capon et al., 2012). Researchers have paid much attention to issues such as legal information retrieval (Rissland & Daniels, 1995; van Opijnen & Santos, 2017), legal information extraction (Moens et al., 1999; Webber et al., 2005), Online Dispute Resolution (Carneiro et al., 2014; Zeleznikow, 2017) and eMediation (Jelali et al., 2015). From legal texts, litigants can learn the possible outcomes of an impending trial and evaluate their potential risks; attorneys can anticipate the counterclaims that opposing counsel are likely to put forward; judges can gain a good understanding of other judges’ inferences and attitudes in similar cases, especially when they face high-profile, complicated and difficult cases.
However, as the exponential growth of legal texts in the Internet age causes information overload (Koniaris et al., 2017; van Opijnen & Santos, 2017), useful information has to be extracted through text mining techniques (Andrade & Santos, 2017). Although text classification can help to exclude legal texts irrelevant to users’ queries (Ashley & Brueninghaus, 2009), the remaining texts still contain redundant information, as users might need only part of a text. Therefore, to enhance user satisfaction, legal texts should be processed in a more fine-grained way.
To this end, techniques from several disciplines, such as linguistics, machine learning and Natural Language Processing (NLP), can be exploited. Among these, discourse analysis may offer special insight, because discourse is not a random combination of sentences or words. Moens et al. (1999) derived rules through discourse analysis and built the SALOMON system to abstract Belgian criminal cases. Kunc et al. (2013) argued that knowledge of discourse structure in dialogue could help to optimize human-computer interaction. Taboada et al. (2009) demonstrated that the accuracy of sentiment analysis could be improved if discourse structure was taken into account. Lin et al. (2005) even used discourse-based features such as cue phrases to segment lecture videos, which are a kind of multimodal discourse. All these studies demonstrate the usefulness of discourse factors in text processing.
In this paper, the author addresses the problem of automatically identifying the information structure of Chinese legal texts by proposing an approach based on discourse analysis. The analysis is conducted within the framework of Discourse Information Theory (Du, 2014).