Correlation between the Economy News and Stock Market in Turkey

Sadi Evren Seker (Department of Computer Engineering, Istanbul University, Istanbul, Republic of Turkey), Cihan Mert (Department of Electrical Engineering, University of Texas at Dallas, TX, USA), Khaled Al-Naami (Department of Computer Science, University of Texas at Dallas, TX, USA), Nuri Ozalp (Turkish Science Foundation, Istanbul, Republic of Turkey) and Ugur Ayan (Turkish Science Foundation, Istanbul, Republic of Turkey)
Copyright: © 2013 | Pages: 21
DOI: 10.4018/ijbir.2013100101

Abstract

Depending on market strength and structure, stock market values are known to correlate with the content of newspapers. The correlation increases in weak and speculative markets, and it never falls to zero even in the strongest markets. This research focuses on the correlation between the economic news published in a high-circulation newspaper in Turkey and the stock market closing values in Turkey. Several feature extraction methodologies are applied to both data sources, the stock market values and the economic news. Since the economic news is in natural language, the text mining technique term frequency-inverse document frequency (TF-IDF) is applied to it. Time series analysis methods such as random walk, Bollinger bands, moving average, and differencing are applied to the stock market values. After the feature extraction step, classification is performed with the well-known classifiers support vector machine (SVM), k-nearest neighbor, and decision tree. Moreover, an ensemble classifier based on majority voting is built on top of these classifiers. The success rates are satisfactory enough to suggest that the methods implemented in this study can be extended to future research with similar data sets from other countries.

2. Problem Statement

We have two datasets, the economy news and the stock market closing values, and we apply a layered approach in this study, as illustrated in Figure 1. At the bottom level, starting from the datasets, we build a feature extraction methodology. For the economic news, which is in natural language, we apply sentiment analysis via text mining methodologies. For the stock market closing values, we apply the random walk method for feature extraction.

Figure 1. Bottom-up overview of the study
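
The time series features named in the abstract (moving average, Bollinger bands, differencing) can be illustrated with a minimal sketch. The function names, the window size, the band width `k`, and the up/down labelling below are assumptions made for illustration; the preview does not state the authors' parameters or their exact random walk formulation.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Simple moving average: one value per full window."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def bollinger_bands(values, window, k=2.0):
    """Return (lower, middle, upper) per full window.

    The middle band is the moving average; the outer bands sit k population
    standard deviations away. k=2.0 is a common convention, assumed here.
    """
    bands = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1:i + 1]
        mid = mean(chunk)
        spread = k * pstdev(chunk)
        bands.append((mid - spread, mid, mid + spread))
    return bands


def differences(values):
    """First difference between consecutive closing values."""
    return [b - a for a, b in zip(values, values[1:])]


def direction_labels(values):
    """Hypothetical class labels: +1 for a rise, -1 for a fall, 0 for no change."""
    return [(d > 0) - (d < 0) for d in differences(values)]


closes = [10.0, 11.0, 12.0, 11.0, 13.0]  # made-up closing values
print(moving_average(closes, 3))
print(bollinger_bands(closes, 3))
print(direction_labels(closes))
```

Labels of this kind are one plausible target for the classifiers, since each trading day can then be paired with the features of the news published around it.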

The correlation between news and the stock market is one of the indicators of speculative markets (Nikfarjam, Emadzadeh, & Muthaiyah, 2010).

One of the difficulties in this study is dealing with the natural language data source, which requires feature extraction. Another difficulty is dealing with the stock market values, which are treated as a signal. The size of the data, which can be considered big data, is also problematic. The dataset holds 131,248 distinct words, and when the feature vectors of all economic news items are collected, their total size exceeds 2.5 GB, which is beyond the computation capacity of a single computer running these classification algorithms. For example, one of the classification algorithms alone, the support vector machine (SVM), requires slightly more than 1 TB of memory in this case.
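
To make the size problem concrete, the following minimal sketch computes TF-IDF weights and stores each news item as a sparse mapping from term to weight. With 131,248 distinct words, a dense vector would need 131,248 entries per news item, whereas a sparse mapping keeps only the terms that actually occur. The function name, the tokenized input, and the unsmoothed weighting formula are assumptions for illustration; the preview does not describe how the authors stored or weighted their vectors.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one sparse {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up, already tokenized news items.
news = [["market", "rises"], ["market", "falls", "falls"]]
for vector in tf_idf(news):
    print(vector)
```

A term that appears in every document, such as "market" in this toy input, receives a weight of zero, so it carries no information for the classifier.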

Research on stock markets has always been an interesting area of study because of its impact on the business and financial world. Furthermore, developments in computer science have opened the door to research that studies stock markets alongside text-based news since the 2000s. Most of these works apply text mining tools to the news. For example, one popular method is the bag of words (2-6), where some studies use local dictionaries (2,6) and others use text mining tools such as IBM Text Miner (Fung, Yu, & Lam, 2002). Others use the TF-IDF approach (3,6), predefined term dictionaries (Rachlin, Last, Alberg, & Kandel, 2007), part-of-speech tagging (Mahajan, Dey, & Haque, 2008), or concept maps (Soni, Eck, Jan, & Uzay, 2007). The classification methods also vary, from latent Dirichlet allocation (Mahajan, Dey, & Haque, 2008) to SVM (3-8) or decision trees (Rachlin, Last, Alberg, & Kandel, 2007). The success rates vary from 45% to 82%, and the data sources are Reuters market 3000 extra (Fung, Yu, & Lam, 2002), PRNewswire (Mittermayer & Knolmayer, NewsCATS: A News Categorization and Trading System, 2006), FT Intelligence (Soni, Eck, Jan, & Uzay, 2007), Australian Financial Review (Halgamuge, Y, & Hsu, 2007), the Forbes and Reuters web sites (Rachlin, Last, Alberg, & Kandel, 2007), Yahoo Finance (Schumaker & Chen, 2009), and the Wall Street Journal (Mahajan, Dey, & Haque, 2008).
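
The abstract describes an ensemble based on majority voting on top of the individual classifiers. A minimal sketch of that idea follows, assuming each classifier has already produced one label per sample; the function name and the example predictions are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equally long label lists, one per classifier.
    Ties cannot occur with three classifiers and two classes; otherwise the
    label seen first among the tied ones wins (an arbitrary assumption).
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs: +1 = market rises, -1 = market falls.
svm_out = [1, -1, -1]
knn_out = [1, 1, -1]
tree_out = [-1, -1, 1]
print(majority_vote([svm_out, knn_out, tree_out]))
```

Using an odd number of base classifiers, as in the three-classifier setup described in the abstract, avoids ties for a two-class problem.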
