NewSum: “N-Gram Graph”-Based Summarization in the Real World


George Giannakopoulos (NCSR “Demokritos”, Greece & SciFY Not-for-Profit Company, Greece), George Kiomourtzis (SciFY Not-for-Profit Company, Greece & NCSR “Demokritos”, Greece) and Vangelis Karkaletsis (NCSR “Demokritos”, Greece)
DOI: 10.4018/978-1-4666-5019-0.ch009


This chapter describes a real, multi-document, multilingual news summarization application, named NewSum, the research problems behind it, and the novel methods proposed and tested to solve these problems. The system uses n-gram graphs in a novel manner to perform sentence selection and redundancy removal for the summaries. It also addresses problems of topic and subtopic detection (via clustering) and multilingual applicability, which arise from the nature of real-world news sources.
Chapter Preview


Automatic summarization has been under research since the late 1950s (Luhn, 1958) and has tackled a variety of interesting real-world problems. The problems faced range from news summarization (Barzilay & McKeown, 2005; Huang, Wan, & Xiao, 2013; Kabadjov, Atkinson, Steinberger, Steinberger, & Goot, 2010; D. Radev, Otterbacher, Winkel, & Blair-Goldensohn, 2005; Wu & Liu, 2003) to scientific summarization (Baralis & Fiori, 2010; Teufel & Moens, 2002; Yeloglu, Milios, & Zincir-Heywood, 2011) and meeting summarization (Erol, Lee, Hull, Center, & Menlo Park, 2003; Niekrasz, Purver, Dowding, & Peters, 2005).

The significant increase in the rate of content creation, driven by the Internet and its social media aspect, moved automatic summarization research toward a multi-document setting that takes into account the redundancy of information across sources (Afantenos, Doura, Kapellou, & Karkaletsis, 2004; Barzilay & McKeown, 2005; J. M Conroy, Schlesinger, & Stewart, 2005; Erkan & Radev, 2004; Farzindar & Lapalme, 2003). More recently, the fact that the content generated by people around the world is clearly multilingual has prompted researchers to revisit summarization from a multilingual perspective (Evans, Klavans, & McKeown, 2004; Giannakopoulos et al., 2011; Saggion, 2006; Turchi, Steinberger, Kabadjov, & Steinberger, 2010; Wan, Jia, Huang, & Xiao, 2011).

However, this volume of summarization research does not appear to have reached a wider audience, possibly because of the evaluated performance of automatic systems, which consistently perform worse than humans (John M Conroy & Dang, 2008; Hoa Trang Dang & Owczarzak, 2009; Giannakopoulos et al., 2011).

In this chapter, we show how a novel, multilingual multi-document news summarization method, without the need for training, can be used as an everyday tool. We show how we designed and implemented an automatic summarization solution, named NewSum, which summarizes news from a variety of sources, using language-agnostic methods. We describe the requirements studied during the design and implementation of NewSum, how these requirements were met and how people evaluated the outcome of the effort.

Our main contributions in this chapter are, thus, as follows:

  • We briefly study the requirements of a real-world summarization application, named NewSum. We describe task-aware specifications based on user and application context limitations (e.g. device, communication), source limitations and legal limitations.

  • We describe a generic, language-agnostic method for extractive summarization, taking into account redundancy constraints. The method needs no training and minimizes the effort of crossing language boundaries, since it functions at the character level.

  • We describe an open architecture for responsive summarization in a mobile setting.

  • We provide an evaluation of the system based on non-expert evaluations, to represent market applicability of the system.
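
The second contribution above (training-free, character-level extractive summarization with a redundancy constraint) can be illustrated with a small sketch. This preview does not give the details of the NewSum algorithm, so the following is only an assumption-laden illustration, not the authors' implementation: the function names (`ngram_graph`, `value_similarity`, `summarize`), the n-gram size, the window size, the similarity formula and the redundancy threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a simplified character n-gram graph (illustrative assumption).

    Nodes are character n-grams; an undirected edge links two n-grams that
    start within `window` positions of each other, weighted by how often
    that happens. The graph is stored as a Counter of edges.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        for h in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((g, h)))] += 1
    return edges


def value_similarity(g1, g2):
    """Return a score in [0, 1]: shared edges, each weighted by the ratio of
    its smaller to its larger weight, normalized by the larger graph size."""
    if not g1 or not g2:
        return 0.0
    shared = sum(min(w, g2[e]) / max(w, g2[e]) for e, w in g1.items() if e in g2)
    return shared / max(len(g1), len(g2))


def summarize(sentences, max_sentences=2, redundancy_threshold=0.5):
    """Greedy extractive selection (hypothetical): rank sentences by their
    similarity to the graph of the whole text, and skip any sentence that is
    too similar to one already selected."""
    whole = ngram_graph(" ".join(sentences))
    graphs = [ngram_graph(s) for s in sentences]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: value_similarity(graphs[i], whole),
                    reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        if all(value_similarity(graphs[i], graphs[j]) < redundancy_threshold
               for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Because the sketch works only on character sequences, it needs no tokenizer, stop-word list or training data, which is the property the contribution emphasizes for crossing language boundaries.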

In the following section we provide some background on automatic summarization to sketch the related summarization research.



In this section, we briefly discuss summarization methods and systems that have been available either as research efforts or as real applications. We refer to projects that aim at summarization and sketch the current state of the art in the summarization sub-domains of salience detection and redundancy removal.
