Exploiting Semantic Term Relations in Text Summarization

Kamal Sarkar, Santanu Dam
Copyright: © 2022 |Pages: 18
DOI: 10.4018/IJIRR.289607

Abstract

The traditional frequency-based approach to creating multi-document extractive summaries ranks sentences by scores computed by summing the TF*IDF weights of the words they contain. In this approach, TF (term frequency) reflects how frequently a term (word) occurs in the input, and TF calculated in this way does not take the semantic relations among terms into account. In this paper, we propose methods that exploit semantic term relations to improve the sentence-ranking and redundancy-removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer improves significantly when a distributional term similarity measure is used for finding semantic term relations. Our multi-document text summarizer also outperforms several well-known summarization baselines to which it is compared.

Introduction

Information overload is a critical problem on the Internet, and text summarization is one of the most effective mechanisms for managing it. Text summarization reduces the input document(s) to a summary, a condensed version of the input. Summarization helps users in several ways: (1) it enables readers to quickly understand what the document(s) is about; (2) summaries can be presented alongside search results to help users judge whether the linked documents are relevant; (3) bandwidth can be saved by sending a summary, rather than the whole document, to small-screen devices first. Summaries are also useful in many other applications, such as text clustering and classification. Although researchers have been working on the text summarization problem for many years, there is still scope for finding better solutions, because summarization is a human ability that is very difficult to model.

According to previous research (Goldstein et al., 2000; Gupta & Siddiqui, 2012; Sarkar, 2014; Sarkar, 2009a), summaries can be of two types: extracts and abstracts. An extract is a summary created by selecting text segments (sentences) from the input, whereas an abstract is created by reformulating text segments selected from the input; an abstract may therefore contain words that are not present in the input. Most existing abstractive summarization methods deal with generating very short or ultra-short summaries (Sarkar & Bandyopadhyay, 2005; Zajic, Dorr & Schwartz, 2002; Rush, Chopra & Weston, 2015; Nallapati et al., 2016; Nallapati, Zhai, & Zhou, 2017). In this paper, we focus on generating extractive multi-document summaries, which are relatively longer than very short summaries.

Most previous work on extraction-based summarization is sentence-ranking based. This approach ranks sentences by scores, where the score of a sentence is computed by combining various feature-based scores such as term frequency, sentence position, and/or cue phrases (Luhn, 1959; Sarkar, 2009b; Sarkar, Nasipuri & Ghosh, 2011). After ranking, the top n sentences are chosen according to the compression ratio.
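The traditional frequency-based scoring just described can be sketched as follows. This is a minimal illustration, not the implementation used in any of the cited systems; the naive sentence splitting and the exact TF/IDF formulas are simplifying assumptions.

```python
import math
import re
from collections import Counter

def tfidf_sentence_ranking(documents, n_top=3):
    """Rank sentences by the sum of the TF*IDF weights of their words."""
    # Naive sentence splitting on sentence-ending punctuation.
    sentences = []
    for doc in documents:
        sentences.extend(
            s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()
        )
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]

    # TF: how frequently each term occurs in the whole input.
    tf = Counter(w for toks in tokenized for w in toks)

    # IDF over the input documents (smoothed so that terms appearing in
    # every document still keep a nonzero weight).
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(re.findall(r"[a-z]+", doc.lower())))
    idf = {w: math.log(1.0 + n_docs / df[w]) for w in df}

    # Sentence score = sum of TF*IDF weights of the sentence's words.
    scored = [
        (sum(tf[w] * idf.get(w, 0.0) for w in toks), s)
        for s, toks in zip(sentences, tokenized)
    ]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [s for _, s in scored[:n_top]]
```

Note that TF here is computed over the whole input, matching terms purely by surface form; this is exactly the syntactic matching the paper argues against.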

The centroid-based summarization approach (Radev et al., 2004) is also an extractive approach to multi-document summarization: it ranks sentences by their similarity to a centroid, which is built by choosing a set of the most important words from the input cluster of documents. Word importance is measured by TF*IDF weight, and, as above, TF is calculated from how frequently a term (word) occurs in the input.
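A minimal sketch of centroid construction and sentence scoring in the spirit of Radev et al. (2004) is shown below. This is an illustration of the general idea, not their exact implementation; the `top_k` pruning parameter and the assumption that TF*IDF vectors are already available are simplifications.

```python
from collections import Counter

def build_centroid(doc_term_weights, top_k=10):
    """Average the documents' TF*IDF vectors and keep the top_k words.

    doc_term_weights: one {term: TF*IDF weight} dict per document.
    """
    totals = Counter()
    for weights in doc_term_weights:
        totals.update(weights)
    n = float(len(doc_term_weights))
    averaged = {term: weight / n for term, weight in totals.items()}
    # Keep only the most important words as the cluster centroid.
    top = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(top)

def centroid_sentence_score(sentence_tokens, centroid):
    """Score a sentence by summing the centroid weights of its words."""
    return sum(centroid.get(w, 0.0) for w in sentence_tokens)
```

Sentences are then ranked by `centroid_sentence_score`, so a sentence scores highly only if its words syntactically match the centroid's words.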

Not only the centroid-based approach but many other summarization approaches as well compute term frequency using purely syntactic term matching, without taking the semantic relations among terms into account; for example, occurrences of "car" and "automobile" are counted as unrelated terms even though they express the same concept. As a result, the traditional TF*IDF-based sentence-ranking approach places some summary-worthy sentences far below the top-ranked sentences, and those sentences are then excluded from the summary by the predefined summary-length restriction.
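The paper's specific distributional term similarity measure is not detailed in this preview. The general idea of distributional similarity, that terms occurring in similar contexts are semantically related, can be sketched as follows; the windowed co-occurrence counting and cosine comparison here are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(tokenized_sentences, window=2):
    """For each term, count the terms occurring within +/- window positions."""
    vectors = defaultdict(Counter)
    for toks in tokenized_sentences:
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][toks[j]] += 1
    return vectors

def distributional_similarity(w1, w2, vectors):
    """Cosine similarity of the two terms' co-occurrence vectors."""
    u, v = vectors.get(w1, Counter()), vectors.get(w2, Counter())
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Under such a measure, "car" and "automobile" receive a high similarity whenever they appear in similar contexts, so semantically related occurrences can reinforce one another instead of being counted as unrelated terms.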

Since the input to a multi-document summarizer is a set of related documents, a multi-document summary may contain redundancy. Redundancy is a crucial issue in multi-document summarization because redundant information makes a summary less informative. Maximal marginal relevance (MMR) is a popular technique for removing redundancy while selecting the top n sentences for an extract (Carbonell & Goldstein, 1998). MMR uses cosine similarity between sentences to identify similar ones, with each sentence represented by a TF*IDF-based bag-of-words model. Hence, the term-mismatch problem leads to data sparseness, which degrades redundancy-removal performance.
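Greedy MMR selection can be sketched as below. This is a minimal version under stated assumptions: sentences are represented as sparse TF*IDF dictionaries, and the lambda trade-off value is a tunable parameter, not a value from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors ({term: weight} dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(candidates, query, lam=0.3, k=3):
    """Greedy MMR (Carbonell & Goldstein, 1998): balance relevance to the
    query against similarity to already-selected sentences.

    candidates: list of (sentence, sparse TF*IDF vector) pairs.
    query: sparse vector, e.g. the document-cluster centroid.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            relevance = cosine(vec, query)
            redundancy = max((cosine(vec, sv) for _, sv in selected),
                             default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [s for s, _ in selected]
```

Because `cosine` matches terms only by surface form, two sentences expressing the same fact with different words can score as dissimilar, which is the sparseness problem the paper's semantic term relations are meant to address.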

Beyond the cases mentioned above, a number of existing extractive summarization methods (discussed in the next section) use the TF*IDF-based bag-of-words model for text representation, with term weights calculated by the traditional TF*IDF method.
