However, using computational approaches to predict stock prices from financial data is not unique. In recent years, interest has increased in quantitative funds, or quants, that automatically sift through numeric financial data and issue stock recommendations. While these systems are based on proprietary technology, they differ in the amount of trading control they have, ranging from simple stock recommenders to trade executors. Because they rely on historical market data and complex mathematical models, these methods are constrained to make assessments within the scope of existing information. This weakness means that they are unable to react to unexpected events falling outside of historical norms. However, this disadvantage has not stopped fund managers at Federated, Janus, Schwab, and Vanguard from entrusting billions of dollars of assets to the decisions of these computational systems.
In this chapter, we introduce a different type of quant trader called AZFinText (Arizona Financial Text system), which focuses on making discrete numeric predictions based on the combination of financial news articles and stock price quotes. Our contribution lies in building the AZFinText system, in which trends and patterns are machine learned from stock quotes and textual financial news. While prior textual financial research has relied on tracking price direction alone, AZFinText leverages statistical learning to generate numeric price predictions and then make trading decisions from them. We further demonstrate that AZFinText outperforms the market average and performs well against existing quant funds.
We also investigate the roles that different linguistic representations, such as bag of words, noun phrases, proper nouns and named entities, can play in prediction. Bag of words in particular has been the de facto representational technique in financial news analysis. Our work shows that this representation actually performs the worst among them.
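To make the difference between representations concrete, the following minimal sketch contrasts a bag-of-words view of a sentence with a proper-noun view. It is an illustration only and not the AZFinText implementation: the stopword list, the sample sentence, the company names and the capitalization heuristic for proper nouns are all assumptions, and a real system would more likely rely on a part-of-speech tagger or named entity recognizer.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "its", "after"}


def bag_of_words(text):
    """Return the sorted set of lowercased words, minus stopwords."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return sorted({t.lower() for t in tokens} - STOPWORDS)


def proper_nouns(text):
    """Crude stand-in for a tagger (an assumption): keep capitalized
    tokens that do not begin a sentence."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z]+", sentence)
        for tok in tokens[1:]:  # skip the sentence-initial word
            if tok[0].isupper():
                found.add(tok)
    return sorted(found)


# Invented sample sentence with invented company names.
article = "Shares of Acme rose after the company settled its dispute with Globex."

bow_terms = bag_of_words(article)
pn_terms = proper_nouns(article)
```

Here the bag-of-words view keeps eight terms, including generic words such as "with" and "company", while the proper-noun view keeps only the two company names. The sketch shows only how the term sets differ in size and content; it says nothing about which representation predicts better.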
Our chapter builds upon the previously proven AZFinText system and explores the many branches of research that have been recently conducted using this platform. We feel that this topic is current and relevant and neatly blends together the finance and information systems sectors in an interesting way.
Predicting changes in the stock market has always had a certain appeal to researchers. While numerous attempts have been made (Chan et al., 1996; Cho et al., 1999; Gidofalvi, 2001; Mittermayer, 2004; Seo et al., 2002; Wuthrich et al., 1998; Yoon & Swales, 1991), the difficulty has always centered on the behaviors of human traders within a socially constructed system. With parameters ill-defined and constantly shifting, prediction has been difficult at best. To add to the confusion, there have been two diametrically opposed philosophies of stock market research: fundamental and technical analysis techniques (Technical-Analysis, 2005). Fundamental analysis leverages the security’s relative data, ratios and earnings, while technical analysis utilizes charts and modeling techniques based on historic trading volume and prices. The main difference between them comes down to one question: can the market be timed or not?
Regardless of the technique used, both strategies rely on information. This desire for information is changing the way brokerage houses and securities analysts are approaching securities trading (Baldwin & Rice, 1997). With the advent of cheaper processing and knowledge acquisition techniques, the role of computers in stock prediction has increased dramatically, and these systems have become mostly automated versions of existing fundamental and/or technical strategies. The goal of these systems is to achieve better returns than their human counterparts by removing the elements of emotion and bias from trading (Jelveh, 2006). The drawback is that these systems lack intuition, context and exterior channels of information; they have been known to continue buying battered stocks even after unfavorable news events, such as losing a costly court battle. These systems rely on news events being translated into numeric data before appropriate decisions can be made. This information transcription problem introduces serious lag time into decisions, and in some cases trades must be overridden by human analysts.
In stock market prediction, this problem of information lag is most apparent with textual data. Information from quarterly reports or breaking news stories can dramatically affect the share price of a security. Most existing literature on financial text mining relies on identifying a predefined set of keywords and applying machine learning techniques. These methods typically assign weights to keywords in proportion to the movement of a share price and have shown a definite but weak ability to forecast the direction of share prices.
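As a rough illustration of the keyword-weighting idea described above, the sketch below assigns each predefined keyword a weight equal to the average price change observed after articles containing it, and then predicts direction from the sum of the weights that match a new article. This is a simplified sketch under stated assumptions, not the method of any cited study: the function names, the keyword list, the training sentences and the price changes are all invented for illustration.

```python
from collections import defaultdict


def learn_keyword_weights(articles, keywords):
    """articles: list of (text, price_change) pairs.
    Returns each seen keyword's mean price change (its weight)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, change in articles:
        tokens = set(text.lower().split())
        for kw in keywords:
            if kw in tokens:
                totals[kw] += change
                counts[kw] += 1
    return {kw: totals[kw] / counts[kw] for kw in counts}


def predict_direction(text, weights):
    """Sum the weights of matching keywords and report the sign."""
    tokens = set(text.lower().split())
    score = sum(w for kw, w in weights.items() if kw in tokens)
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "flat"


# Invented training data: (article text, percent price change afterwards).
TRAINING = [
    ("acme wins lawsuit and raises forecast", 0.8),
    ("acme loses lawsuit", -1.2),
    ("acme raises dividend", 0.4),
    ("acme recall widens", -0.6),
]
KEYWORDS = ["wins", "loses", "raises", "recall"]  # hypothetical keyword set

weights = learn_keyword_weights(TRAINING, KEYWORDS)
up_call = predict_direction("acme wins contract", weights)
down_call = predict_direction("acme loses contract and faces recall", weights)
flat_call = predict_direction("acme holds meeting", weights)
```

Note that the output is only a direction (up, down or flat), which mirrors the limitation of this family of methods: the weights summarize how prices moved in the past but do not by themselves yield a numeric price prediction.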