Effective and Efficient Classification of Topically-Enriched Domain-Specific Text Snippets: The TETSC Method

Marco Spruit, Bas Vlug
Copyright: © 2015 |Pages: 17
DOI: 10.4018/IJSDS.2015070101

Abstract

Due to the explosive growth in the number of text snippets over the past few years, combined with their textual sparsity, organizations are unable to classify them effectively and efficiently, missing out on business opportunities. This paper presents TETSC: the Topically-Enriched Text Snippet Classification method. TETSC aims to solve the classification problem for text snippets in any domain. TETSC recognizes that there are different types of text snippets and therefore allows for stop word removal, named-entity recognition, and topical enrichment per snippet type. TETSC has been implemented in the production systems of a personal finance organization, where it reduced classification error by over 21%. Highlights: the authors create the TETSC method for classifying topically-enriched text snippets; they differentiate between types of text snippets; they show a successful application of named-entity recognition to text snippets; and they find that combining multiple enrichment strategies appears to reduce effectiveness.

1. Introduction: The Wicked Problem Of Classifying Text Snippets

Recent years have witnessed an unprecedented growth in the number of text snippets. The Washington Post reported that in March 2013 over 400 million tweets were sent per day, up from 200 million in 2011 (Tsukayama, 2013; Twitter Engineering, 2011). And this is only the growth from a single source: in today's society, text snippets appear in many places. Beyond Twitter, search engines and banks, for instance, also produce large numbers of text snippets daily, in the form of search result snippets and financial transactions.

Most of these text snippets are treated as belonging to no domain at all. This, however, is far from the truth. Plenty of domain-related tweets are sent on a daily basis, companies' customer service tweets being one example. Furthermore, there even exist domain-specific search engines, such as MEDLINE, which yield better results precisely because they target a specific domain.

While many text snippets are created and generated on a daily basis, even summarizing them through classification remains a problem. The classification of large documents has reached effectiveness levels comparable to those of trained professionals, but the classification of short texts, in this research denoted as text snippets, is a different matter (Sebastiani, 2002). Chen, Xiaoming & Shen (2011) identify the main reason: text snippets are of short length and therefore suffer from sparsity.
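The sparsity problem can be illustrated with a small sketch (a hypothetical example, not drawn from the paper): two snippets describing the same kind of event can share almost no terms, so their bag-of-words vectors barely overlap and a term-based classifier has little to work with.

```python
# Two snippets about the same topic share almost no terms, so their
# bag-of-words vectors barely overlap -- the sparsity problem that
# hampers snippet classification.
from collections import Counter
import math

def bow(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

snippet_a = bow("card payment at grocery store")
snippet_b = bow("supermarket purchase with debit card")

# Only "card" is shared, so the similarity is low (0.2) despite the
# snippets describing the same kind of transaction.
print(cosine(snippet_a, snippet_b))
```

With longer documents the shared vocabulary grows and such term overlap becomes a usable signal; for snippets of five to ten words it often does not, which is what motivates enriching the snippet with additional terms before classification.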

By failing to classify text snippets correctly, companies miss out on business opportunities. Correctly classifying tweets, for instance, could provide information for identifying trends, and correctly classifying financial transactions could give account owners valuable overviews of their expenses, in turn putting them more in control of their finances. Another application domain well known to suffer from valuable information being locked in unstructured text snippets is healthcare, where doctors often record a patient's diagnosis and/or prognosis only in the dossier's comment field (Spruit, Vroon & Batenburg, 2014).

This paper attempts to solve the problem of correctly classifying domain-specific text snippets into predefined categories. A vast body of literature aims to solve this problem, most of it by enriching text snippets through various means:

  1. Search query results (e.g. Sahami & Heilman, 2006; Shen et al., 2006);
  2. The categorical structure of an intermediary such as Wikipedia or Yahoo (e.g. Shen et al., 2006; Gabrilovich & Markovitch, 2005);
  3. An external corpus (e.g. Gabrilovich & Markovitch, 2006; Wang & Domeniconi, 2008);
  4. Topic models (e.g. Phan, Nguyen & Horiguchi, 2008; Ramage, Dumais & Liebling, 2010); or
  5. Lexical information (e.g. Hu et al., 2009).
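All five strategies share the same underlying idea: append related terms to the snippet so that its feature vector becomes denser before classification. The sketch below illustrates this in the spirit of strategy 4, with the strong simplification that the trained topic model is replaced by a hand-written topic-keyword table; the topic names, keywords, and function are hypothetical and not part of any cited method.

```python
# Toy illustration of topical enrichment: topics that match a snippet
# contribute their keywords, so a bag-of-words classifier downstream
# sees a denser feature vector. A real system would use a trained
# topic model here instead of this hand-written keyword table.
TOPIC_KEYWORDS = {
    "groceries": {"supermarket", "grocery", "food", "store"},
    "transport": {"train", "bus", "fuel", "ticket"},
}

def enrich(snippet):
    """Append the keywords of every topic the snippet touches."""
    tokens = set(snippet.lower().split())
    extra = []
    for topic, keywords in sorted(TOPIC_KEYWORDS.items()):
        if tokens & keywords:          # snippet shares a term with this topic
            extra.extend(sorted(keywords))
    return snippet if not extra else snippet + " " + " ".join(extra)

# "grocery" and "store" trigger the groceries topic, so the enriched
# snippet also contains "food" and "supermarket".
print(enrich("payment at grocery store"))
```

The design choice matters: enrichment adds recall (more terms to match on) at the risk of adding noise, which is consistent with the paper's finding that stacking multiple enrichment strategies can reduce effectiveness.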
