Deep Learning Approach for Extracting Catch Phrases from Legal Documents

Kayalvizhi S. (SSN College of Engineering, India) and Thenmozhi D. (SSN College of Engineering, India)
Copyright: © 2020 | Pages: 16
DOI: 10.4018/978-1-7998-1159-6.ch009

Abstract

Catch phrases are the important phrases that precisely describe a document and represent the context of the whole document. In the domain of law, judges and lawyers can also use them to retrieve relevant prior cases, which helps in assuring justice. Currently, catch phrases are extracted using statistical methods, machine learning techniques, and deep learning techniques. The authors propose a sequence-to-sequence (Seq2Seq) deep neural network to extract catch phrases from legal documents. The network is built from several layers, namely an embedding layer, an encoder-decoder layer, a projection layer, and a loss layer. The methodology is evaluated on the IRLeD@FIRE-2017 dataset, where it obtains scores of 0.787 and 0.607 for mean average precision and recall, respectively. The results show that the proposed method outperforms the existing systems.

Background

Natural Language Processing

Text processing refers to the automated manipulation of text. Such manipulation can include correcting errors, analyzing mood, classifying e-mails, tagging important terms, summarizing content, answering questions, correcting answers, translating sentences, etc. Natural Language Processing (NLP) is concerned with making the computer perform these text manipulation tasks on natural language. The basic steps in NLP of text include sentence segmentation, word tokenization, Parts-of-Speech (POS) prediction, text lemmatization, text cleaning, dependency parsing, and named entity recognition. Sentence segmentation refers to splitting the text into individual sentences. Word tokenization is the process of breaking the sentences into tokens. POS prediction is the step where the parts of speech, such as nouns, adjectives, and verbs, are predicted for each token. In lemmatization, the different forms of a word, such as tenses and plurals, are reduced to a single base form. Cleaning the text refers to the removal of stop words and unwanted punctuation marks. In dependency parsing, a single parent word is selected for each dependent word in the sentence. Named Entity Recognition is the process of labeling noun phrases with named entity types such as person, location, etc.
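A few of the steps described above (sentence segmentation, word tokenization, and text cleaning) can be illustrated with a minimal sketch using only the Python standard library. The regular expressions, the tiny stop-word list, the function names, and the sample sentence are all assumptions made for illustration and are not taken from the chapter. POS prediction, lemmatization, dependency parsing, and named entity recognition are omitted because they normally rely on trained models.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "was", "and", "by"}

def segment_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def clean(tokens):
    """Lowercase the tokens, then drop stop words and punctuation."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t.isalnum() and t not in STOP_WORDS]

# Hypothetical sample text, invented for this sketch.
text = "The court dismissed the appeal. The appellant was ordered to pay costs!"
sentences = segment_sentences(text)
tokens = [tokenize(s) for s in sentences]
cleaned = [clean(t) for t in tokens]
print(cleaned)
```

This naive splitter breaks on abbreviations such as "v." or "Sec.", which are common in legal text, so a practical system would need a more robust segmenter.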

Consider the scenario of sentiment analysis of text data, for which there are many approaches and classification methods. A typical machine learning approach proceeds in steps: the NLP phases above (segmentation, tokenization, POS prediction, lemmatization, and parsing), then vectorization of the text by a method such as Bag of Words or word embeddings (for example, GloVe), and finally classification using a classifier.
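The vectorize-then-classify part of this pipeline can be sketched as follows. The example uses Bag of Words vectors and a small multinomial Naive Bayes classifier with Laplace smoothing. The choice of classifier, the function names, and the toy training data are assumptions made for illustration, since the text does not name a specific classifier.

```python
import math
from collections import Counter

# Toy labeled data, invented purely for illustration.
train = [
    ("good helpful clear judgment", "pos"),
    ("clear and fair ruling", "pos"),
    ("bad confusing judgment", "neg"),
    ("unfair and confusing ruling", "neg"),
]

def bag_of_words(text):
    """Vectorize text as a word -> count mapping (Bag of Words)."""
    return Counter(text.lower().split())

def fit(samples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts, doc_counts = {}, Counter()
    for text, label in samples:
        word_counts.setdefault(label, Counter()).update(bag_of_words(text))
        doc_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(text, model):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word, n in bag_of_words(text).items():
            if word in vocab:  # ignore words never seen in training
                score += n * math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)
print(predict("clear and fair judgment", model))
print(predict("confusing and unfair ruling", model))
```

Word embeddings such as GloVe would replace `bag_of_words` with dense vectors loaded from a pretrained file. That variant is not shown here.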

Key Terms in this Chapter

LSTM: Long Short-Term Memory is a type of Recurrent Neural Network (RNN) that is capable of learning long-term dependencies in a sequence of terms.

Catch Phrases: The keywords that precisely describe a whole document in the legal domain.

Convolutional Neural Network: A neural network that, in addition to the other layers of a deep neural network, has a convolutional layer which performs the mathematical operation of convolution.

Deep Neural Network: A neural network with more than two layers in depth.

Deep Learning: A sub-field of machine learning that is based on algorithms built from layers of artificial neural networks.

Information Extraction: Automated retrieval of the needed information from whole documents, databases, etc.

Recurrent Neural Network: A deep neural network with a recurrent operation in which the output of the previous state is given as input to the next state, so that the inputs and outputs are all dependent on each other.

Machine Learning: A field of study of algorithms and statistical methods that allow a software application to make accurate predictions.

Neural Network: A fully connected network with a minimum of three layers, namely an input layer, a hidden layer, and an output layer.
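
As an illustration of the Recurrent Neural Network entry above, the following minimal sketch runs a single scalar recurrent cell over a sequence, feeding each state back in as input to the next step. The function name, the weights `w_x`, `w_h`, and `b`, the tanh activation, and the input sequence are all assumptions chosen for illustration. A real RNN uses weight matrices learned from data.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a scalar recurrent cell over a sequence and return all hidden states.

    Each step computes h_t = tanh(w_x * x_t + w_h * h_prev + b),
    starting from an initial state of 0.
    """
    h = 0.0
    states = []
    for x in inputs:
        # The previous state h is fed back in together with the new input x.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the first input is carried forward through the recurrence.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

An LSTM extends this recurrence with gates that control what is kept in and dropped from the state, which is what allows it to retain long-term dependencies.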
