1. Introduction
With the recent focus on search engines, it has become necessary to improve the efficiency of search engines and their related applications. One such application is summary generation, which gives a quick overview of a large document. To form a summary, it is necessary to identify the semantic relations between the text fragments in the document and to extract the lines that are semantically relevant to the user query. This paper proposes one such approach, in which a summary generator is constructed on top of a language-independent discourse structure built using three semantic text representation techniques, namely Universal Networking Language (UNL), Rhetorical Structure Theory (RST) and saṅgatis.
The language-independent discourse parser proposed by Krishnan and Parthasarathi (2014), which uses UNL and RST, is extended in this paper to use saṅgatis as well. Saṅgatis are discourse relations, identical to those of RST, that have been used in ancient Sanskrit literature (Charya, 1989). A clause and sentence level discourse parser using UNL, RST and saṅgatis has been proposed by Subalalitha and Parthasarathi (2012). The proposed UNL-RST-saṅgati discourse parser can construct discourse structure at the clause, sentence, paragraph and document levels. The discourse structure constructed by the UNL-RST-saṅgati discourse parser is indexed by a concept called sūtra, which has been used in both Tamil and Sanskrit literature (Subalalitha & Ranjani, 2014). In this paper, the factors used in the current version of sūtra construction have been modified, and the modified technique is tested with a query-focused summary generation system.
To sum up, this paper puts forth the following three main contributions:
1. Construction of a summary generation system that uses a unique UNL-RST-saṅgati discourse structure and a unique discourse structure indexing technique based on sūtra;
2. Extension of the language-independent UNL-RST-saṅgati discourse parser, which currently handles text at the clause and sentence levels, to handle text at the paragraph and document levels;
3. Modification of the factors involved in the existing sūtra-based indexing technique.
The rest of the paper is organized as follows. Section 2 gives an overview of saṅgatis, RST and UNL. Section 3 describes existing work on text-level discourse parsing and on indexing techniques for discourse structures. Section 4 illustrates the proposed work. Section 5 gives the details of the evaluation, and Section 6 concludes the paper.