A Dynamic Modeling and Validation Framework for the Market Direction Prediction

Xugang Ye (Johns Hopkins University, USA) and Jingjing Li (University of Colorado, USA)
DOI: 10.4018/978-1-5225-1759-7.ch015


Many research papers describe building machine learning models to predict the market index. However, very little attention has been paid to effectively validating or calibrating the prediction results. The focus of this paper is to present a dynamic modeling and validation framework for market direction prediction. The central idea is to calibrate the probabilistic prediction by estimating two conditional probabilities of correct forecast from the dynamic validation data set. The calibration method can be combined with any predictive model that generates a probabilistic prediction of the market direction.

1. Introduction

Modeling and forecasting market direction to facilitate trading strategies have been receiving more and more attention from academia and industry (Christoffersen, 2006; Granger, 1992; Tsay, 2010). Because it aims at forecasting the price movement (up or down) of the general stock market, market direction prediction is also called “market timing” (Henriksson & Merton, 1981). Undoubtedly, this task is difficult because of high market volatility. Many factors, such as political events, economic conditions, trade mismatches, rumors, news, and investors’ sentiment and mentality, all contribute to the high degree of uncertainty and fluctuation in the stock market (Harris, 2008).

Numerous efforts have been dedicated to finding adequate modeling techniques to capture market volatility. For example, time series models such as Autoregressive Conditional Heteroscedasticity (ARCH) (Engle, 1982) and generalized ARCH (GARCH) (Bollerslev, 1986) have been used extensively in economic and finance research. These models, which originate from financial time series theory, often assume a stationary linear correlation structure (Tsay, 2010) among the time series data, and so may not capture non-stationary nonlinear patterns or the impact of external events. In contrast, machine learning approaches focus on finding patterns in data and usually make far fewer assumptions. Various types of machine learning models have been applied to stock market forecasting, including Artificial Neural Network (ANN) (Nicholas Refenes et al., 1994; Yoon & Swales, 1991; Zhang & Wu, 2009), Support Vector Machine (SVM) (Cao & Tay, 2003; Huang et al., 2005; Rao & Hong, 2010), AdaBoost (Rodríguez & Sosvilla-Rivero, 2006), and Hidden Markov Model (HMM) (Hassan & Nath, 2005; Rao & Hong, 2010). Even so, machine learning methods became dominant only recently, because the technologies that enable people to generate and handle massive data are themselves recent.

Over the past few years, building machine learning models to predict financial markets has become more and more popular (Atsalakis & Valavanis, 2009; Yoo et al., 2005). However, the existing market prediction literature shows that most work centers on static models. That is, one builds a model under a variety of parameter settings from a training data set covering a long time window, picks the best setting according to its performance on the validation data set (a part of the training data set), and then tests it on a hold-out data set covering a relatively short time window for evaluation. Although this is a sound procedure for many machine learning applications, it has two problems for market prediction. First, several market regime changes may occur over a long time horizon, so a model trained and validated on data from a long historical window may not be able to capture the market dynamics. Second, the models and parameters discarded by the one-time validation never participate in future recursive prediction tasks. Consequently, the market prediction depends solely on a narrow range of model settings, which hardly adapt to future market changes. One often-neglected fact is that, for market prediction, a model with very poor validation performance should not necessarily be discarded: one can use the opposite of its prediction. It follows that a truly bad model is one that has an equal chance of predicting right or wrong. We say that such models generate no “signal” and perform the same as random guessing. Likewise, a model that performs very well on the validation set does not necessarily do as well in continual tests. One simple reason is that the market is driven by many more factors than the finite number of features that any model can accommodate. Sticking tightly to one model will likely lead to catastrophe at some point in the future (Harris, 2008).
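The remark about poorly performing models can be illustrated with a small sketch. It is not the calibration method of this chapter. It only shows, under stated assumptions, why a validation hit rate far below 0.5 still carries information while a hit rate of exactly 0.5 carries none. The function names (`hit_rates`, `apply_hit_rates`, `signal_strength`), the encoding of directions as +1 and -1, the choice to condition on the predicted direction, and the toy validation data are all hypothetical and chosen for illustration.

```python
from typing import Dict, Sequence

UP, DOWN = 1, -1  # assumed encoding of market direction


def hit_rates(preds: Sequence[int], actual: Sequence[int]) -> Dict[int, float]:
    """Fraction of correct calls, computed separately for predicted-up
    and predicted-down observations in a validation window."""
    rates = {}
    for direction in (UP, DOWN):
        outcomes = [a == direction for p, a in zip(preds, actual) if p == direction]
        # Assumption: fall back to 0.5 (no signal) if a direction was never predicted.
        rates[direction] = sum(outcomes) / len(outcomes) if outcomes else 0.5
    return rates


def signal_strength(rate: float) -> float:
    """Distance of a hit rate from 0.5; zero means no better than a random guess."""
    return abs(rate - 0.5)


def apply_hit_rates(pred: int, rates: Dict[int, float]) -> int:
    """Keep a call whose validation hit rate is at least 0.5, otherwise flip it."""
    return pred if rates[pred] >= 0.5 else -pred


# Toy validation window (made-up data): the model is usually wrong when it
# says "up" and usually right when it says "down".
val_preds = [UP, UP, UP, UP, DOWN, DOWN, DOWN, DOWN]
val_actual = [DOWN, DOWN, DOWN, UP, DOWN, DOWN, UP, DOWN]

rates = hit_rates(val_preds, val_actual)
print(rates)                        # hit rate per predicted direction
print(apply_hit_rates(UP, rates))   # an "up" call is flipped to "down"
print(signal_strength(rates[UP]), signal_strength(rates[DOWN]))
```

In this toy window, the "up" calls are right a quarter of the time and the "down" calls three quarters of the time, so both are equally far from 0.5 and equally informative once the "up" calls are inverted. In a dynamic setting, the validation window would be moved forward over time, so the hit rates would be re-estimated as the market changes.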
