Web Service Architectures for Text Mining: An Exploration of the Issues via an E-Science Demonstrator

Neil Davis

doi:10.4018/978-1-59904-990-8.ch047

Source Title: Handbook of Research on Text and Web Mining Technologies

Abstract

Text mining technology can help users find relevant or novel information in large volumes of unstructured data, such as the data increasingly available in the electronic scientific literature. However, publishers are not text mining specialists, and neither, typically, are the end-user scientists who consume their products. This situation suggests a Web-services-based solution, in which text mining specialists process the literature obtained from publishers and make their results available to remote consumers (research scientists). In this chapter we discuss the integration of Web services and text mining within the domain of scientific publishing, and explore the strengths and weaknesses of three generic architectural designs for delivering text mining Web services. We argue for the superiority of one of these and demonstrate its viability by reference to an application designed to provide access to the results of text mining over the PubMed database of scientific abstracts.

Introduction

With the explosion of scientific publications, it has become increasingly difficult for researchers to keep abreast of advances in their own field, let alone comprehend advances in related fields. Because of this rapid increase in the quantity of electronic text made available by both publishers and third-party providers, there is growing interest in automatic text mining to extract and collate information and so make the scientific researcher’s job easier. Some publishers are already beginning to make textual data available via Web services, and this trend seems likely to increase as new uses for data provided in this manner are discovered. Not only does the Internet provide a means to accelerate the publishing cycle, it also offers opportunities for new services to be provided to readers, such as search and content-based information access over huge text collections.

It is not envisioned that publishers themselves will provide technically complex text mining functionality; rather, such functionality will be supplied by specialist text processors via “value added” services layered on top of the basic Web services supplied by the publishers. These specialist text processors will need domain expertise in the scientific area for which they are producing text mining applications. However, because building text mining applications requires specialised knowledge, they are unlikely to be the research scientists who use the resulting information. Starting from the presumption of three interacting entities (publishers, text mining application providers, and consumers of both published material and text mining results), we discuss in this chapter a variety of architectural designs for delivering text mining using Web services, and describe a prototype application based on one of them. In the rest of this section we review some of the context and related work pertaining to this project.
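The three-party layering can be sketched in a few lines. This is a minimal in-process illustration only: plain Python objects stand in for the remote Web services, and the class and method names (`PublisherService`, `get_text`, `TextMiningService`, `annotate`), the dictionary-lookup "mining" step and the sample text are all hypothetical. None of them are taken from the chapter's own design.

```python
from dataclasses import dataclass
from typing import Dict, List


class PublisherService:
    """Hypothetical stand-in for a publisher's basic Web service: returns raw text by ID."""

    def __init__(self, documents: Dict[str, str]):
        self._documents = documents

    def get_text(self, doc_id: str) -> str:
        return self._documents[doc_id]


@dataclass
class Annotation:
    doc_id: str
    term: str
    start: int  # character offset of the match within the document text


class TextMiningService:
    """Hypothetical 'value added' layer: fetches text from the publisher and annotates it."""

    def __init__(self, publisher: PublisherService, vocabulary: List[str]):
        self._publisher = publisher
        self._vocabulary = vocabulary

    def annotate(self, doc_id: str) -> List[Annotation]:
        text = self._publisher.get_text(doc_id)  # would be a remote call in practice
        results = []
        for term in self._vocabulary:
            # Record every occurrence of the term, including repeated ones.
            start = text.find(term)
            while start != -1:
                results.append(Annotation(doc_id, term, start))
                start = text.find(term, start + 1)
        return sorted(results, key=lambda a: a.start)


if __name__ == "__main__":
    # The consumer (research scientist) talks only to the mining layer.
    publisher = PublisherService({"doc1": "p53 binds MDM2; MDM2 regulates p53."})
    miner = TextMiningService(publisher, ["p53", "MDM2"])
    for a in miner.annotate("doc1"):
        print(a.term, a.start)
```

In this arrangement the consumer needs no text mining expertise, and the mining step could be replaced by something far more sophisticated without changing the publisher's interface.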

Text Mining

Text mining is a term currently used to mean various things by various people. In its broadest sense it may refer to any process of revealing information, regularities, patterns or trends in textual data. Text mining can be seen as an umbrella term covering a number of established research areas such as information extraction (IE), information retrieval (IR), natural language processing (NLP), knowledge discovery from databases (KDD), and so on. In a narrower sense it requires the discovery of new information, not just the provision of access to information already present in a text or to vague trends in text (Hearst, 1999). In this chapter we use the term in its broadest sense. We believe that, while the end goal may be the discovery of new information from text, services which accomplish more modest tasks are essential components of more sophisticated systems. These components are therefore part of the text mining enterprise, and they lend themselves more readily to deployment in a Web services architecture.

Text mining is particularly relevant to bioinformatics applications, where the explosive growth of the biomedical literature over the last few years has made searching that literature for information an increasingly difficult task for biologists. For example, the 2004 baseline release of Medline contains 12,421,396 abstracts published between 1902 and 2004, of which 4,391,392 (around 35 percent) were published between 1994 and 2004.
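The percentage quoted above follows directly from the two Medline counts:

```latex
\frac{4\,391\,392}{12\,421\,396} \approx 0.354 \quad\Longrightarrow\quad \text{about } 35\% \text{ of all abstracts date from } 1994\text{--}2004
```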

Depending on the complexity of the task, text mining systems may have to employ a range of text processing techniques, from simple information retrieval to sophisticated natural language analysis, or any combination of these. Text mining systems tend to be constructed as pipelines of components, such as tokenisers, lemmatisers, part-of-speech taggers, parsers, n-gram analysers, and so on. New applications may require modification of one or more of these components, or the addition of new bespoke components; however, different applications can often re-use existing components. The exploration of the potential of text mining systems has so far been hindered by non-standardised data representations, by the diversity of processing resources across different platforms at different sites, and by the fact that linguistic expertise for developing or integrating natural language processing components is still not widely available. All this suggests that, in the current era of information sharing across networks, an approach based on Web services may be better suited to rapid system development and deployment.
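The pipeline idea can be sketched as a list of interchangeable functions, each of which takes a document record and returns it with new fields added. Applications then differ only in which components they chain together. This is a toy illustration under stated assumptions. The dict-based record, the regex tokeniser and the suffix-stripping `crude_lemmatise` are hypothetical placeholders, not components from the chapter or from any particular NLP toolkit.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical document record: a dict that each component reads from and extends.
Doc = Dict[str, Any]
Component = Callable[[Doc], Doc]


def tokenise(doc: Doc) -> Doc:
    # Toy tokeniser: each run of word characters becomes a token.
    doc["tokens"] = re.findall(r"\w+", doc["text"])
    return doc


def crude_lemmatise(doc: Doc) -> Doc:
    # Placeholder lemmatiser: lowercase, then drop a trailing "s" from longer words.
    lemmas = []
    for token in doc["tokens"]:
        word = token.lower()
        lemmas.append(word[:-1] if len(word) > 3 and word.endswith("s") else word)
    doc["lemmas"] = lemmas
    return doc


def bigrams(doc: Doc) -> Doc:
    # n-gram analyser for n = 2 over the lemma sequence.
    lemmas = doc["lemmas"]
    doc["bigrams"] = list(zip(lemmas, lemmas[1:]))
    return doc


def run_pipeline(doc: Doc, components: List[Component]) -> Doc:
    # Apply each component in order; each one extends and returns the same record.
    for component in components:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    # Two "applications" re-using the same first two components.
    indexing_app = [tokenise, crude_lemmatise]
    ngram_app = [tokenise, crude_lemmatise, bigrams]
    print(run_pipeline({"text": "Proteins bind receptors"}, indexing_app)["lemmas"])
    print(run_pipeline({"text": "Proteins bind receptors"}, ngram_app)["bigrams"])
```

Each component only reads existing fields and adds its own. A new application can therefore be assembled by re-ordering or appending components, and this kind of re-use makes exposing individual components as separate Web services attractive.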
