Scaling and Semantically-Enriching Language-Agnostic Summarization

George Giannakopoulos (NCSR Demokritos, Greece & SciFY PNPC, Greece), George Kiomourtzis (SciFY PNPC, Greece & NCSR Demokritos, Greece), Nikiforos Pittaras (NCSR Demokritos, Greece & National and Kapodistrian University of Athens, Greece) and Vangelis Karkaletsis (NCSR Demokritos, Greece)
Copyright: © 2020 | Pages: 49
DOI: 10.4018/978-1-5225-9373-7.ch009

Abstract

This chapter describes the evolution of a real, multi-document, multilingual news summarization methodology and application, named NewSum, the research problems behind it, and the steps taken to solve these problems. The system uses the n-gram graph representation to perform sentence selection and redundancy removal for summary generation. In addition, it tackles problems related to topic and subtopic detection (via clustering), and demonstrates multilingual applicability and, through recent advances, scalability to big data. Furthermore, recent developments to the algorithm allow it to utilize semantic information to better identify and outline events, offering an overall improvement over the base approach.

Introduction

Automatic summarization has been an active research topic since the late 1950s (Luhn, 1958) and has tackled a variety of interesting real-world problems. These range from news summarization (Barzilay & McKeown, 2005; Huang, Wan, & Xiao, 2013; Kabadjov, Atkinson, Steinberger, Steinberger, & Goot, 2010; D. Radev, Otterbacher, Winkel, & Blair-Goldensohn, 2005; Wu & Liu, 2003) to scientific summarization (Baralis & Fiori, 2010; Teufel & Moens, 2002; Yeloglu, Milios, & Zincir-Heywood, 2011) and meeting summarization (Erol, Lee, Hull, Center, & Menlo Park, 2003; Niekrasz, Purver, Dowding, & Peters, 2005). More recently, document summarization has moved on to specific genres and domains, such as (micro-)review summarization (Nguyen, Lauw & Tsaparas, 2015; Gerani, Carenini & Ng, 2019) and financial summarization (Isonuma et al., 2017).

The significant increase in the rate of content creation brought about by the Internet and social media moved automatic summarization research toward a multi-document requirement, taking into account the redundancy of information across sources (Afantenos, Doura, Kapellou, & Karkaletsis, 2004; Barzilay & McKeown, 2005; J. M Conroy, Schlesinger, & Stewart, 2005; Erkan & Radev, 2004; Farzindar & Lapalme, 2003). More recently, the clearly multilingual nature of the content generated by people around the world has prompted researchers to revisit summarization from a multilingual perspective (Evans, Klavans, & McKeown, 2004; Giannakopoulos et al., 2011; Saggion, 2006; Turchi, Steinberger, Kabadjov, & Steinberger, 2010; Wan, Jia, Huang, & Xiao, 2011).

However, this volume of summarization research does not appear to have reached a wider audience, possibly because of the evaluated performance of automatic systems, which consistently perform worse than humans (John M Conroy & Dang, 2008; Hoa Trang Dang & Owczarzak, 2009; Giannakopoulos et al., 2011). We should note at this point, though, that summary evaluation is itself a challenging scientific topic (Lloret, Aker & Plaza, 2018).

Key Terms in this Chapter

Summary Evaluation: The process of assessing the quality of a summary.

N-Gram Graph Framework: The set of algorithms applicable on the n-gram graph representation, together with the representation itself, usable as an analysis method and toolkit.

N-Gram Graph: A text representation that captures how n-grams co-occur within a given text.

Mobile Application: A software application which runs on a mobile platform (e.g., Android).

Summary: A reductive transformation of a text, keeping as much information as possible.

Multi-Document Summarization: The process of applying summarization to a set of documents to create one representative summary for the whole set.

Multilingual Summarization: The process of applying a summarization algorithm to texts of different languages (possibly not simultaneously).
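
As an illustration of the N-Gram Graph term above, the following minimal Python sketch builds a graph whose nodes are n-grams and whose weighted edges record how often two n-grams co-occur. It is not the NewSum implementation: the use of character n-grams, the parameters `n` and `window`, the count-based edge weights, and the helper names `ngram_graph` and `edge_overlap` are all assumptions made for this example.

```python
from collections import Counter


def ngram_graph(text: str, n: int = 3, window: int = 3) -> Counter:
    """Build an illustrative character n-gram graph.

    Nodes are character n-grams. An undirected edge links two n-grams
    whose start positions are at most `window` apart, and its weight is
    the number of times that happens. `n` and `window` are assumed
    parameters for this sketch, not values taken from the chapter.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        # Pair each n-gram with the n-grams that follow it inside the window.
        for neighbor in grams[i + 1:i + 1 + window]:
            edge = tuple(sorted((gram, neighbor)))  # undirected edge key
            edges[edge] += 1
    return edges


def edge_overlap(g1: Counter, g2: Counter) -> float:
    """Share of the smaller graph's edges that also occur in the other graph.

    A deliberately simple, hypothetical comparison that ignores weights.
    """
    if not g1 or not g2:
        return 0.0
    shared = len(g1.keys() & g2.keys())
    return shared / min(len(g1), len(g2))


if __name__ == "__main__":
    a = ngram_graph("news summarization")
    b = ngram_graph("news summarisation")
    print(len(a), len(b), edge_overlap(a, b))
```

Because the nodes in this sketch are character sequences rather than words, it needs no tokenizer or language-specific resources. A graph comparison such as `edge_overlap` gives a rough idea of how two texts could be checked for shared content, for example when looking for redundant sentences.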
