Text Mining for Business Intelligence

Konstantinos Markellos
Copyright: © 2009 |Pages: 10
DOI: 10.4018/978-1-60566-010-3.ch298

Introduction

Business executives now understand that timely and accurate knowledge has become a crucial factor in making better and faster business decisions, and thus in giving companies a competitive advantage. With the vast majority of corporate information stored as text in various databases, the need to extract actionable knowledge from these assets efficiently is growing rapidly. Existing approaches cannot handle the constantly increasing volumes of textual data, and only a small percentage of that data can be analyzed effectively.

Business Intelligence (BI) provides a broad set of techniques, tools, and technologies that facilitate the management of business knowledge, performance, and strategy through automated analytics or human-computer interaction. It unlocks the “hidden” knowledge in the data and enables companies to gain better insight into customers, markets, and business information by combing through vast quantities of data quickly, thoroughly, and with sharp analytical precision.

A critical component of business performance is the evaluation of the competition. Measuring and assessing technological and scientific innovation, and producing related indicators, can give a clear view of progress. Information about these activities is usually stored in large databases and falls into two categories: research information, stored in publications or scientific magazines, and development-production information, stored in patents.

Patents are closely related to Technology Watch, the activity of surveying the development of new technologies, new products, and technology trends, as well as measuring their impact on current technologies, organizations, or people. Statistical analysis of patent data may lead to useful conclusions about technological development, trends, or innovation (Chappelier et al., 2002).

Traditional methods of extracting knowledge from patent databases rely on manual analysis carried out by experts. These methods have become impractical as patent databases grow exponentially. Text Mining (TM) extends the more traditional Data Mining approach to unstructured textual data and is primarily concerned with extracting information that is implicitly contained in collections of documents. Automatic analysis techniques make it possible to exploit more efficiently the potential wealth of information that textual databases represent (Hotho et al., 2005).

This article describes a methodological approach and an implemented system that combines efficient TM techniques and tools. The BI platform enables users to access, query, analyze, and report on patents. Future trends and challenges are also illustrated, and some new research that we are pursuing to enhance the approach is discussed.

Background

Patents are closely related to technological and scientific activities (Narin, 1995). They give an indication of the structure and evolution of innovative activities in countries, regions, or industries. In this framework, patents are linked to Research and Development (R&D) and can be considered indicators of R&D activities (Schmoch et al., 1998).

A patent is a legal title granting its holder the exclusive right to make use of an invention for a limited area and time by stopping others from, amongst other things, making, using, or selling it without authorization (EPO, 2006). The patent applicant must provide a detailed technical description of the invention and also identify the points that make it an original application with innovative elements.

A patent can be decomposed into, and described by, several fields (Table 1). Each field contains specific information, and each patent is assigned a code (in many cases more than one code) that describes its technical characteristics. These codes are assigned according to the International Patent Classification (IPC) system or other classification systems. Patent documents can be retrieved either from online patent databases or from patent databases available on CD-ROM.
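
As a rough illustration of the field-and-code structure described above, the following Python sketch models a patent record and splits IPC symbols into their hierarchical parts. It is not the system described in this chapter: the field names (`number`, `title`, `abstract`, `ipc_codes`) are assumptions, since the contents of Table 1 are not reproduced here, and the parser assumes full IPC symbols of the form section, class, subclass, main group, and subgroup (for example `G06F 17/30`).

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed IPC symbol layout: section (A-H), class (2 digits), subclass (letter),
# main group (1-4 digits), "/", subgroup (2 or more digits), e.g. "G06F 17/30".
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")


@dataclass
class Patent:
    # Hypothetical field names, chosen for illustration only.
    number: str
    title: str
    abstract: str
    ipc_codes: List[str] = field(default_factory=list)


def parse_ipc(code: str) -> Dict[str, str]:
    """Split one full IPC symbol into its hierarchical parts."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a full IPC symbol: {code!r}")
    section, cls, subclass, group, subgroup = match.groups()
    return {
        "section": section,
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": group,
        "subgroup": subgroup,
    }


def count_by_subclass(patents: List[Patent]) -> Counter:
    """Count patents per IPC subclass, counting each patent once per subclass."""
    counts: Counter = Counter()
    for patent in patents:
        # A set ensures a patent with several codes in one subclass counts once.
        subclasses = {parse_ipc(code)["subclass"] for code in patent.ipc_codes}
        counts.update(subclasses)
    return counts


if __name__ == "__main__":
    # Made-up records, used only to exercise the functions.
    sample = [
        Patent("EX-001", "Example A", "...", ["G06F 17/30", "G06F 17/27"]),
        Patent("EX-002", "Example B", "...", ["G06F 17/30", "H04L 29/08"]),
    ]
    print(count_by_subclass(sample).most_common())
```

Counting each patent once per subclass, rather than once per code, keeps a patent with several closely related codes from inflating a simple technology indicator. Whether that is the right choice depends on the indicator being built.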
