Multi-Document Summarization by Extended Graph Text Representation and Importance Refinement

Uri Mirchev, Mark Last

Source Title: Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding

DOI: 10.4018/978-1-4666-5019-0.ch002

Abstract

Automatic multi-document summarization aims to identify important text content in a collection of topic-related documents and to represent it in the form of a short abstract or extract. This chapter presents a novel approach to the multi-document summarization problem, focusing on the generic summarization task. The proposed SentRel (Sentence Relations) multi-document summarization algorithm assigns importance scores to the documents and sentences in a collection based on two aspects: static and dynamic. In the static aspect, the significance score is recursively inferred from a novel tripartite graph representation of the text corpus. In the dynamic aspect, the significance score is continuously refined with respect to the current summary content. The resulting summary consists of complete sentences exactly as they appear in the summarized documents, which ensures the summary's grammatical correctness. The proposed algorithm is evaluated on the TAC 2011 dataset, using DUC 2001 for training and DUC 2004 for parameter tuning. SentRel's ROUGE-1 and ROUGE-2 scores are comparable to those of state-of-the-art summarization systems, which require a different set of textual entities.

Chapter Preview

1. Introduction

The amount of information on the web is huge and continues to grow dramatically, causing data overload. The purpose of multi-document summarization is to extract the important information from an input collection of topic-related documents and to present it in a concise and usable form. Since one cause of data overload is that many documents share the same or similar topics, automatic multi-document summarization has drawn much attention in recent years. Text summarization is challenging because of its cognitive nature and interesting because of its practical applications. For example, every day many news websites publish articles about the same hot topic of the day. A reader can go through all of these articles to gain a complete understanding of the news topic or, alternatively, read a single multi-document summary that covers the topic comprehensively. Summarization can also be applied to information retrieval: running a summarizer on search engine output produces a unified summary of the information contained in the result pages, saving the user the time otherwise spent viewing these pages.

Manual summarization of large document collections is a time-consuming and difficult task that requires significant intellectual effort; therefore, the summarization process needs to be automated. McKeown et al. (2005) conducted experiments to determine whether multi-document summaries measurably improve user performance and experience. Four groups of users were asked to perform the same fact-gathering tasks by reading online news under different conditions: no summaries at all, single-sentence summaries drawn from one of the articles, automated summaries, and human summaries. The results showed that the quality of the submitted reports was significantly better, and user satisfaction was higher, with either automated or human multi-document summaries than with the source documents alone.

The field of automated text summarization has been extensively explored during the last decade, largely owing to the annual DUC (Document Understanding Conference) and TAC competitions. Thousands of studies on multi-document generic summarization have been conducted and published. However, despite the significant effort devoted to designing novel summarization approaches, automated summary quality is still far from perfect. For example, on the English dataset of the TAC 2011 competition (Text Analysis Conference, www.nist.gov/tac), the best summarization system (ID2) achieved a ROUGE-1 recall of 0.46, versus an upper bound of 0.52 obtained by the topline system based on human summaries.
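ROUGE-N recall, the measure behind the 0.46 and 0.52 figures above, is the fraction of reference-summary n-grams that also occur in the candidate summary, with each n-gram's match count clipped to its count in the candidate. The sketch below is a simplified illustration only: tokenization is a plain lowercase whitespace split (an assumption), and the official ROUGE package supports options such as stemming and stopword removal that are omitted here, so its values are not directly comparable to the published scores.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase + whitespace split, no stemming or stopword removal.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 1) -> float:
    """Clipped n-gram matches divided by the total number of reference n-grams."""
    cand = ngrams(tokenize(candidate), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(tokenize(ref), n)
        # A reference n-gram is matched at most as many times as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For instance, the candidate "the cat sat on the mat" matches 5 of the 6 unigrams in the reference "the cat is on the mat" (ROUGE-1 recall 5/6) and 3 of its 5 bigrams (ROUGE-2 recall 0.6).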

In this chapter, we offer a fresh look at the summarization process by enhancing the graph representation of a document collection. We also propose that the decision to include a sentence in a summary should be influenced by the previously selected sentences; this is expressed through continuous refinement of the sentence importance scores. We introduce an algorithm called SentRel (Sentence Relations) for automated summarization of a topic-related document collection. The algorithm addresses the generic summarization task, where the goal is to reflect the most important information described by the input collection. To this end, the proposed extractive summarization algorithm distills the most relevant sentences of the collection into a short extract that the end user can digest quickly.
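The preview does not specify how SentRel refines importance scores against the current summary, so the following is only a sketch of the general pattern: greedy extraction in which each candidate's score is re-evaluated against the partial summary before every pick. The refinement rule here is a hypothetical stand-in (an overlap penalty in the spirit of redundancy-aware selection, not SentRel's own rule): the static score is multiplied by one minus the candidate's maximum word-set Jaccard similarity to any already selected sentence. The names `greedy_summary`, `refined_score`, and `word_limit` are illustrative.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap between two sentences, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refined_score(i: int, static_scores: list[float], words: list[set[str]], selected: list[int]) -> float:
    """Hypothetical dynamic refinement: discount the static score by the
    candidate's maximum overlap with any sentence already in the summary."""
    overlap = max((jaccard(words[i], words[j]) for j in selected), default=0.0)
    return static_scores[i] * (1.0 - overlap)


def greedy_summary(sentences: list[str], static_scores: list[float], word_limit: int) -> list[int]:
    """Greedily pick sentence indices (assumes non-negative static scores)."""
    words = [set(s.lower().split()) for s in sentences]
    selected: list[int] = []
    length = 0
    remaining = set(range(len(sentences)))
    while remaining:
        # Re-score every remaining candidate against the current partial summary;
        # iterating in sorted order sends ties to the lower index.
        best = max(sorted(remaining), key=lambda i: refined_score(i, static_scores, words, selected))
        if refined_score(best, static_scores, words, selected) <= 0.0:
            break  # every remaining candidate is fully redundant
        remaining.discard(best)
        n_words = len(sentences[best].split())
        if length + n_words <= word_limit:  # skip sentences that would exceed the budget
            selected.append(best)
            length += n_words
    return selected
```

With the sentences "the storm hit the coast", "the storm hit the coast today", and "officials opened shelters" and static scores 1.0, 0.9, and 0.5, the near-duplicate second sentence drops to a refined score of about 0.18 once the first is chosen, so the third sentence is picked next even though its static score is lower.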

Our summarization approach is based on the mutual reinforcement principle, which is used to compute the global importance of the sentences, with the text corpus represented as a tripartite graph. In addition, the importance scores of textual entities (i.e., documents and sentences) are iteratively updated according to the current summary content. The ordinal and chronological dependencies of sentences in a multi-document summary are calculated beforehand from one or more training datasets accompanied by gold-standard summaries. The SentRel algorithm is greedy: it iteratively chooses the most important sentences for the summary. The choice of each summary sentence is based on global information recursively inferred from the tripartite graph, taking into account the partial summary built during the previous iterations.
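The preview names the mutual reinforcement principle but gives neither SentRel's update equations nor the exact node types and edge weights of its tripartite graph. As a rough illustration only, the sketch below runs a HITS-style power iteration on a hypothetical graph whose three layers are documents, sentences, and words (the word layer is an assumption). Document-sentence edges have weight 1 for containment and sentence-word edges are weighted by term count (both assumptions). Documents and words are reinforced by the sentences linked to them, sentences are reinforced by their document and their words, and each layer is rescaled to unit L2 norm after every update.

```python
import math
from collections import Counter


def _normalize(values: list[float]) -> list[float]:
    """Scale a score vector to unit L2 norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else values


def mutual_reinforcement(docs: list[list[str]], max_iter: int = 100, tol: float = 1e-9):
    """HITS-style scoring on a hypothetical document-sentence-word graph.

    Returns (document_scores, sentence_scores); sentence scores are listed in
    document order.
    """
    sentences = [(d, text) for d, doc in enumerate(docs) for text in doc]
    counts = [Counter(text.lower().split()) for _, text in sentences]
    vocab = {w: k for k, w in enumerate(sorted(set().union(*counts)))}

    doc_scores = [1.0] * len(docs)
    sent_scores = _normalize([1.0] * len(sentences))

    for _ in range(max_iter):
        # Documents and words are reinforced by the sentences linked to them.
        new_doc = [0.0] * len(docs)
        new_word = [0.0] * len(vocab)
        for i, (d, _) in enumerate(sentences):
            new_doc[d] += sent_scores[i]
            for w, c in counts[i].items():
                new_word[vocab[w]] += c * sent_scores[i]
        doc_scores, word_scores = _normalize(new_doc), _normalize(new_word)

        # Sentences are reinforced by their document and by the words they contain.
        new_sent = _normalize([
            doc_scores[d] + sum(c * word_scores[vocab[w]] for w, c in counts[i].items())
            for i, (d, _) in enumerate(sentences)
        ])
        delta = max((abs(a - b) for a, b in zip(new_sent, sent_scores)), default=0.0)
        sent_scores = new_sent
        if delta < tol:
            break
    return doc_scores, sent_scores
```

In a toy corpus such as `[["storm hits coast", "storm damage coast"], ["storm hits coast again", "local team wins"]]`, the off-topic sentence "local team wins" shares no words with the others and ends up with the lowest score, which is the intuition behind using graph-wide reinforcement to estimate global sentence importance.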

The goal of the current research is to explore the contribution of the following features:
