Named Entity Recognition in Document Summarization

Sandhya P., Mahek Laxmikant Kantesaria

Source Title: Trends and Applications of Text Summarization Techniques

DOI: 10.4018/978-1-5225-9373-7.ch005

Abstract

Named entity recognition (NER) is a subtask of information extraction. An NER system reads text and highlights the entities in it, separating them into categories that depend on the project. NER is a two-step process: detecting names and classifying them. The first step involves segmentation; the second requires choosing an ontology by which to organize the entities into categories. Document summarization, also called automatic summarization, is a process in which software creates a summary of a text document by selecting the important points of the original text. In this chapter, the authors explain how document summarization is performed using named entity recognition. They discuss the different types of summarization techniques, how NER works, and its applications. The libraries available for NER-based information extraction are explained. Finally, they explain how NER is applied to document summarization.

Chapter Preview

Introduction

Named-entity recognition (NER) is the process of extracting entities from text so that textual information can be searched, sorted, and stored in categories such as names of organizations, places, and persons, expressions of time, and quantities or other measurable values. An NER system extracts entities from plain text in English or in any other language. NER is also called entity extraction or entity identification. It finds entities in raw, unstructured data and then assigns them to different categories. NER behaves differently in different systems, so the output of one project may not be the same as the output of another, because the outputs the two systems require differ.

NER is a subtask of information extraction. It is also a significant component of natural language processing applications. Part-of-speech tagging, semantic parsers, and thematic meaning representations all perform better when NER is integrated. NER plays a vital role in systems such as question answering, textual entailment, automatic forwarding, and news and document search, where it helps produce sound analytical results. NER is carried out using different learning methods, depending on the system in which it is used. There are three learning methods: supervised learning (SL), unsupervised learning (UL), and semi-supervised learning (SSL) (Sekine & Ranchhod, 2007). Supervised learning needs a large dataset. Because such datasets are scarce, the other two methods are preferred over supervised learning.

Document summarization is a process by which text is automatically condensed into a summary containing the most important information. Ordinarily, a human must read the documents and then summarize them. Automating this lets us extract vital information and use it in cases such as dates from a feedback system, a popular product or model of an item, and reviews of locations. There are many ways to identify phrases in text. The simplest method is to use a dictionary of words.
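The dictionary method can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the `GAZETTEER` contents, the category labels, and the function name `find_phrases` are all hypothetical, and a real dictionary would be far larger.

```python
import re

# Hypothetical dictionary (gazetteer) mapping known phrases to a category.
GAZETTEER = {
    "new york": "LOCATION",
    "galaxy s10": "PRODUCT",
    "march 2019": "DATE",
}

def find_phrases(text, gazetteer=GAZETTEER):
    """Return (phrase, category, start) for every dictionary phrase found in text."""
    matches = []
    for phrase, category in gazetteer.items():
        # \b keeps a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.group(0), category, m.start()))
    # Report matches in the order they appear in the text.
    return sorted(matches, key=lambda item: item[2])

review = "I bought the Galaxy S10 in New York in March 2019."
for phrase, category, start in find_phrases(review):
    print(f"{start:>3}  {category:<8}  {phrase}")
```

A lookup like this only finds phrases that are already in the dictionary, which is one reason learned NER models are used instead.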

NER can also be used to process the document. It extracts words, called entities, and categorizes them as persons, organizations, places, times, measurements, and so on. The most important words are then selected, and these words serve as a summary of the given document.
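One simple way to read "most important" is "most frequently mentioned". The sketch below takes that reading as an assumption; the chapter preview does not say how importance is measured. The input list stands in for the output of an NER step that is not shown, and the entity names and the function `entity_summary` are hypothetical.

```python
from collections import Counter

def entity_summary(entities, top_k=3):
    """Pick the top_k most frequent entities as a keyword-style summary.

    `entities` is a list of (text, category) pairs, assumed to come from
    some NER step that is not shown here.
    """
    # Lowercase the text so "Acme Corp" and "acme corp" count as one entity.
    counts = Counter((text.lower(), category) for text, category in entities)
    # most_common sorts by count; ties keep first-seen order.
    return [(text, category, n) for (text, category), n in counts.most_common(top_k)]

# Hypothetical NER output for a short news article.
entities = [
    ("Acme Corp", "ORGANIZATION"), ("Jane Doe", "PERSON"),
    ("Acme Corp", "ORGANIZATION"), ("Berlin", "LOCATION"),
    ("Jane Doe", "PERSON"), ("Acme Corp", "ORGANIZATION"),
    ("Monday", "TIME"),
]
for text, category, n in entity_summary(entities, top_k=2):
    print(f"{text} ({category}): {n} mentions")
```

Other importance measures, such as weighting by entity category or by position in the document, would slot into the same structure.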

In this chapter we explain how document summarization is performed using named entity recognition. First, we discuss named-entity recognition. Then we explain document summarization and the evaluation techniques for text summarization. We then explain how NER works in practice, along with its applications. Next, we describe how NER is applied to document summarization and the issues that arise. Finally, recent advances are explained.

Key Terms in this Chapter

Abstraction-Based Summary: Abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might express.

Document Summarization: Automatic summarization is the process of shortening a text document with software, to create a summary with the major points of the original document.

ROUGE: ROUGE, or recall-oriented understudy for gisting evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing.

Natural Language Processing: Natural language processing is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.

Information Extraction: Information extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.

Named-Entity Recognition: Named-entity recognition is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as the person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Extraction-Based Summarization: In extraction-based summarization an extract is constructed by selecting pieces of text (words, phrases, sentences, paragraphs) from the original source and organizing them in a way to produce a coherent summary.
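Of the terms above, ROUGE lends itself to a short worked example. The sketch below computes only ROUGE-1 recall (unigram overlap) with whitespace tokenization, which is a simplification: the full ROUGE package covers further variants such as ROUGE-N and ROUGE-L and reports precision and F-scores as well. The function name and the sample sentences are illustrative.

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Unigram recall: overlapping reference words / total reference words."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Clip each word's credit to the number of times the candidate contains it.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())

reference = "the cat sat on the mat"   # human-written summary
candidate = "the cat lay on the mat"   # system-generated summary
print(rouge_1_recall(candidate, reference))
```

Here five of the six reference words are matched by the candidate, so the recall is 5/6.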
