Building Text Summary Generation System Using Universal Networking Language, Rhetorical Structure Theory, Sangatis and Sutra: Summary Generation Using Discourse Structures

Subalalitha C. N. (SRM Institute of Science and Technology, India)
Copyright: © 2020 | Pages: 22
DOI: 10.4018/978-1-7998-1021-6.ch006

Abstract

This chapter discusses how text summaries can be generated from a high-level semantic representation. The semantic representation is built on a discourse structure that combines three text representation techniques, namely, universal networking language (UNL), rhetorical structure theory (RST), and saṅgatis. Saṅgati is an ancient concept used in Sanskrit literature to capture coherence. The discourse structure is indexed using a concept called sūtra, which has been used in both Tamil and Sanskrit literature. The chapter mainly focuses on how a summary can be generated using this discourse structure and the sūtra-based indexing technique. The Forum for Information Retrieval Evaluation (FIRE) corpus has been used to test the system, and its performance has been compared with one of the state-of-the-art summary generation systems built on discourse structure.
Chapter Preview

Background

Universal Networking Language

The UNL expresses the information and knowledge present in a natural language (NL) text as a semantic network, represented as a directed graph (Uchida et al., 1999). The UNL graph is composed of Universal Words (UWs) and UNL relations. Each UW is the conceptual representation of an NL word, and together the UWs constitute the UNL vocabulary. A UW consists of components such as a headword, semantic constraints, and UNL attributes. The headword is an English expression (a word, a compound word, a phrase, or a sentence) that represents the concept of the NL word. The semantic constraint restricts the interpretation of a UW to a specific concept, and the UNL attributes are normally used to represent information conveyed by natural language grammatical categories, such as tense, mood, aspect, and number. Identifying these components mainly requires resources such as the UNL Knowledge Base (UNL KB) and a UW dictionary. Figure 1 shows the UNL graph for the English sentence given in Example 1.

  • Example 1: He won the race

Figure 1. UNL graph for Example 1
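
As a rough illustration of how a UNL graph for Example 1 might be held in code, the following Python sketch models UWs (headword, semantic constraint, attributes) and labelled relations between them. The class names `UniversalWord` and `Relation` and the function `build_example_graph` are hypothetical and not part of any UNL toolkit. The relations `agt` (agent) and `obj` (object) and the attributes `@entry`, `@past`, and `@def` follow common UNL conventions, but the specific constraints chosen here (`icl>do`, `icl>person`, `icl>event`) are assumptions and may differ from the graph in Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniversalWord:
    """A UW: headword plus optional semantic constraint and attributes."""
    headword: str
    constraint: str = ""      # assumed notation, e.g. "icl>do"
    attributes: tuple = ()    # e.g. ("@entry", "@past")

    def __str__(self):
        uw = self.headword
        if self.constraint:
            uw += f"({self.constraint})"
        return uw + "".join(f".{a}" for a in self.attributes)


@dataclass(frozen=True)
class Relation:
    """A directed, labelled edge between two UWs."""
    label: str
    source: UniversalWord
    target: UniversalWord

    def __str__(self):
        return f"{self.label}({self.source}, {self.target})"


def build_example_graph():
    """Return an assumed UNL graph for 'He won the race' as a list of edges."""
    win = UniversalWord("win", "icl>do", ("@entry", "@past"))
    he = UniversalWord("he", "icl>person")
    race = UniversalWord("race", "icl>event", ("@def",))
    return [
        Relation("agt", win, he),    # 'he' is the agent of 'win'
        Relation("obj", win, race),  # 'race' is the object of 'win'
    ]


if __name__ == "__main__":
    for relation in build_example_graph():
        print(relation)
```

Storing the graph as a flat list of edges keeps the sketch close to the textual form in which UNL relations are usually written; an adjacency map keyed by UW would suit graph traversal better.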
