Heterogeneous Text and Numerical Data Mining with Possible Applications in Business and Financial Sectors

Farid Bourennani, Shahryar Rahnamayan
DOI: 10.4018/978-1-60960-126-3.ch004


Nowadays, many universities, research centers, and companies worldwide share their data electronically. Naturally, these data are of heterogeneous types, such as text, numerical data, multimedia, and others. From the user's side, these data should be accessible in a uniform manner, which implies a unified approach to representing and processing them. Furthermore, unified processing of heterogeneous data types can lead to richer semantic results. In this chapter, we present a unified pre-processing approach that leads to the generation of richer semantics from qualitative and quantitative data.

Literature Review on Heterogeneous Data Type Mining

To the best of our knowledge, Back et al. (2001) were the first researchers to work on heterogeneous data mining (HDM). Their project focused on the HDM of text and numerical data for benchmarking activities. The same researchers also applied HDM to financial and business report data (Ecklund et al., 2001), (Kloptchenko et al., 2004), (Magnussona et al., 2005). The reason for using data mining in such projects is that the tremendous amount of available financial data simply exceeds the interest of managers and investors in analyzing it (Adriaans and Zantinge, 1996). Furthermore, “the purpose of benchmarking is to compare the activities of one company to those of another, using quantitative or qualitative measures, in order to discover ways in which effectiveness could be increased” (Ecklund et al., 2001). In these works, the qualitative data are drawn from the companies’ CEO reports or business reports, depending on the project. The quantitative data are nine financial ratios, including Return on Total Assets (ROTA) and Return on Equity (ROE). The number of companies ranged from 3 to 76, depending on the project. The Self-Organizing Map (SOM), an unsupervised clustering algorithm, was used to process the heterogeneous data types. The selection of the SOM is judicious because it permits clustering the data without knowing the expected number of clusters prior to the data mining operations. In addition, the SOM’s trained map facilitates the visual exploration of the clusters and of the relationships among the data. The SOM was successfully applied to purely quantitative (homogeneous) data (Back et al., 2001); however, the clustering results were divergent when the SOM was applied to heterogeneous textual and numerical data. A couple of inappropriate configurations contributed to these divergent clustering results.
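
To make the role of the SOM more concrete, the following is a minimal sketch of how a small map could be trained on concatenated numerical and textual features. It is not the configuration used in the works cited above: the toy ratios, the term counts, the four-word vocabulary, the min-max scaling, and all parameter values (map size, learning rate, neighborhood width) are assumptions chosen only for illustration, and the helper names `min_max_scale`, `best_matching_unit`, and `train_som` are hypothetical.

```python
import math
import random


def min_max_scale(rows):
    """Scale each column of a list of rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def best_matching_unit(weights, x):
    """Return (row, col) of the map unit whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Units close to the winner on the grid are updated more strongly.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Invented toy data (assumption): [ROTA, ROE] for six hypothetical companies.
ratios = [[0.12, 0.20], [0.11, 0.18], [0.13, 0.22],
          [0.01, 0.02], [0.02, 0.01], [0.00, 0.03]]
# Invented term counts (assumption) over a hypothetical vocabulary:
# ["growth", "profit", "loss", "restructuring"].
term_counts = [[5, 4, 0, 0], [4, 5, 1, 0], [6, 3, 0, 0],
               [0, 1, 5, 4], [1, 0, 4, 5], [0, 0, 6, 3]]

# Naive unification (assumption): scale both views to [0, 1] and concatenate them.
features = [n + t for n, t in zip(min_max_scale(ratios), min_max_scale(term_counts))]
som = train_som(features)
units = [best_matching_unit(som, x) for x in features]
print(units)  # map position assigned to each company
```

Note that the number of clusters is never specified: companies that land on the same or neighboring map units are interpreted as similar. How the two feature types are scaled and weighted before concatenation is exactly the kind of configuration choice that can change the resulting clusters.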
