Effective Entity Linking and Disambiguation Algorithms for User-Generated Content (UGC)

Senthil Kumar Narayanasamy, Dinakaran Muruganantham

Source Title: Handbook of Research on Contemporary Perspectives on Web-Based Systems

DOI: 10.4018/978-1-5225-5384-7.ch018

Abstract

The exponential growth of data emerging from social media poses challenges for decision-making systems and is a critical hindrance to searching for relevant information. The major objective of this chapter is to convert unstructured social media data into a meaningful structured format, which in turn makes the information extraction process more robust. The approach can also extract named entities from the unstructured data and store them in a knowledge base of important facts. In this chapter, the authors explain the methods for identifying the named entities in Twitter streams, along with their possible interpretations, and the techniques for linking them to appropriate knowledge sources such as DBpedia.

Introduction

The conventional methods for extracting information from text documents, which are structured and well-formed, differ entirely from those needed for social media content. Social media content is mostly unstructured and often ill-formed, which makes information hard to extract from it. According to Laere, Schockaert, Tanasescu, Dhoedt, & Jones (2014) and Giridhar, Abdelzaher, George, & Kaplan (2015, March), precision on structured documents is estimated at about 89%, whereas on unstructured documents it stays below 64%. To narrow this gap, several approaches and techniques have been proposed to boost precision and recall on unstructured documents, as given by Lee, Ganti, Srivatsa, & Li (2014, December) and Imran, Castillo, Diaz, & Vieweg (2015); nevertheless, problems persist in many situations. To improve both precision and recall, we propose here some methods to raise precision and new strategies to overcome the remaining difficulties.

To start the extraction process, the principal task is to find the potential named entities in the unstructured text. In our case, we take Twitter content and identify the named entities in its streams. The difficulty arises with real-world entities that map to knowledge sources with one-to-many cardinality, which causes major setbacks for subsequent processing. Moreover, because tweets are very short and in most instances informal, finding potential named entities in a tweet is a crucial task for any automated system. This kind of ambiguity is very common in information retrieval and creates great difficulties for Named Entity Recognition (NER) systems. To carry out entity identification, we use the Markov Network (Lee et al., 2014, December), which has been deployed for many conventional information extraction tasks and has yielded high accuracy. In our case, the entities found in the Twitter streams are represented as nodes, and the edges capture the conditional dependencies between the selected named entities. On closer inspection, the network resembles a Bayesian Network, except that its edges are undirected and may form cycles. For any document, each entity is mapped to the interpretation of the selected named entity suggested by the knowledge source. In the worst cases observed in our empirical results, a few entities had no link to the knowledge source, which led to ambiguous connections and poor search results. We identified this as a research gap in the extraction process and address it in the following sections.
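The idea of entities as nodes joined by undirected edges can be illustrated with a minimal sketch. It is not the chapter's implementation: the mentions, the candidate DBpedia resources, and every potential value below are hypothetical, and the exhaustive search is only practical because a tweet contains few mentions. Each mention has several candidate interpretations (the one-to-many mapping), and the joint assignment with the highest product of node and edge potentials is selected.

```python
from itertools import product

# Hypothetical candidate interpretations per mention (one-to-many mapping).
CANDIDATES = {
    "Apple": ["dbr:Apple_Inc.", "dbr:Apple"],
    "Jobs": ["dbr:Steve_Jobs", "dbr:Employment"],
}

# Hypothetical node potentials: how plausible each candidate is on its own.
NODE_POTENTIAL = {
    "dbr:Apple_Inc.": 0.6,
    "dbr:Apple": 0.4,
    "dbr:Steve_Jobs": 0.5,
    "dbr:Employment": 0.5,
}

# Hypothetical edge potentials: symmetric compatibility of two candidates.
# frozenset keys make the lookup independent of order (undirected edges).
EDGE_POTENTIAL = {
    frozenset({"dbr:Apple_Inc.", "dbr:Steve_Jobs"}): 5.0,
    frozenset({"dbr:Apple_Inc.", "dbr:Employment"}): 1.5,
}
DEFAULT_EDGE = 1.0  # neutral compatibility for pairs not listed above


def score(assignment, edges):
    """Unnormalised joint score: product of node and edge potentials."""
    total = 1.0
    for entity in assignment.values():
        total *= NODE_POTENTIAL[entity]
    for left, right in edges:
        pair = frozenset({assignment[left], assignment[right]})
        total *= EDGE_POTENTIAL.get(pair, DEFAULT_EDGE)
    return total


def best_assignment(mentions, edges):
    """Try every joint assignment and keep the highest-scoring one."""
    best, best_score = None, float("-inf")
    for combo in product(*(CANDIDATES[m] for m in mentions)):
        assignment = dict(zip(mentions, combo))
        current = score(assignment, edges)
        if current > best_score:
            best, best_score = assignment, current
    return best, best_score


mentions = ["Apple", "Jobs"]
edges = [("Apple", "Jobs")]  # the two mentions co-occur in one tweet
result, result_score = best_assignment(mentions, edges)
print(result)
```

With these made-up potentials, the edge between the two mentions pulls "Apple" towards the company reading because that reading is highly compatible with "Steve Jobs". A real system would estimate the potentials from data and use approximate inference instead of enumeration.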

The Hidden Markov Model is used in many language processing tasks, such as POS tagging, named entity detection, and classification. In the proposed approach, we take Twitter as the social media site and identify the potential named entities in Twitter streams. Because tweets are very short and noisy, finding named entities is challenging, and linking them to the appropriate knowledge base entries is another cumbersome task. Hence, for the proposed system, we explain the mechanism that links entities to the knowledge base, removes the ambiguity that persists over the extracted named entities, and makes searching easier than before by using Semantic Web technologies such as RDF and SPARQL.
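As an illustration of how SPARQL can support the linking step, the sketch below builds a query that looks up knowledge base resources whose label matches an extracted mention. It is an assumption-laden example, not the chapter's mechanism: the helper names `build_candidate_query` and `build_request_url` are hypothetical, exact `rdfs:label` matching is only one possible candidate lookup strategy, and the public DBpedia endpoint URL and its `format` parameter are assumed to be available. The code only constructs the query and the request URL; it does not contact the endpoint.

```python
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint; replace with a local mirror if needed.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_candidate_query(label, lang="en", limit=5):
    """Return a SPARQL query for resources whose rdfs:label equals `label`."""
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?entity WHERE {\n"
        f'  ?entity rdfs:label "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


query = build_candidate_query("Steve Jobs")
url = build_request_url(query)
print(query)
```

Each resource returned for a mention would be one candidate interpretation; a mention that returns several resources is ambiguous and still needs disambiguation, while a mention that returns none corresponds to the unlinked entities described above.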

Key Terms in this Chapter

RDF: The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model.

SPARQL: SPARQL (pronounced “sparkle,” a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

NER: Named-entity recognition (NER; also known as entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into predefined categories such as the names of persons, organizations, and locations, time expressions, and quantities.

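DBpedia: DBpedia is a community project that extracts structured information from Wikipedia and publishes it as linked data. The DBpedia DataID vocabulary is a metadata system for detailed descriptions of datasets and their physical instances, as well as their relation to agents such as persons or organizations with regard to their rights and responsibilities.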

Word Sense Disambiguation: In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural language processing and ontology. WSD is identifying which sense of a word (i.e., meaning) is used in a sentence, when the word has multiple meanings.

LDA: In natural language processing, latent Dirichlet allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
