Bibliography

It needs to be emphasized that this is a personal survey from the point of view of someone who has a good understanding of probability theory, statistics, signal processing, etc. The emphasis is on estimating future returns of financial instruments based on past price information, relative-value strategies, and other signals that are combined via state-of-the-art machine learning methods.

[1] The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. H. Friedman, July 2003. This book can be considered the bible of quantitative finance for anyone who's looking for predictable patterns in financial markets. Key notions of overlearning, overfitting, and the generalization ability of learners are discussed. Lots (and, I mean, lots) of practical algorithms are presented. A very valuable reference and a good starting point. What's the catch? The area of machine learning & pattern recognition is huge, and this book is just a starting point. You need to read the original papers to understand what's going on at a deeper level.

[2] ECE 901: Statistical Learning Theory, lecture notes by Rob Nowak. I would recommend this if you want to understand the theory of statistical learning. Theory and practice tend to differ a lot when it comes to machine learning (as is true for many disciplines). Theory gives a lot of insight into the design of algorithms, yet the performance of most algorithms is demonstrated via simulations. At heart, I tend to agree with Vladimir N. Vapnik, who said "there's nothing more practical than a good theory." ;)

[3] News and trading rules, J.D. Thomas, PhD Thesis, Carnegie Mellon Univ., Pittsburgh, 2003. I believe that financial markets are by and large efficient, and any single signal has little predictive power. What's the consequence? Statistical learners tend to operate in the low-SNR regime. In other words, your signals will be weakly correlated with future returns and misclassification errors will be large.
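To make the low-SNR point concrete, here is a small self-contained simulation (my own illustration, not from the thesis): even when tomorrow's return genuinely depends on today's, the measurable edge is tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2500  # roughly ten years of daily returns

# Returns with a tiny AR(1) component: tomorrow's return depends only
# weakly on today's, so the predictable part is buried in noise.
phi = 0.05                      # hypothetical predictive coefficient
eps = rng.normal(0.0, 0.01, n)  # 1% daily noise
r = np.zeros(n)
for t in range(1, n):
    r[t] = phi * r[t - 1] + eps[t]

signal, target = r[:-1], r[1:]
corr = np.corrcoef(signal, target)[0, 1]
# Hit rate of the best possible sign predictor, sign(signal):
hit_rate = np.mean(np.sign(signal) == np.sign(target))
print(f"correlation: {corr:.3f}, hit rate: {hit_rate:.3f}")
```

With a true coefficient of 0.05 the measured correlation typically hovers near 0.05 and the directional hit rate only a percent or two above 50%, which is the regime a robust learner has to be designed for.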
This thesis gave me the valuable perspective that the goal of learning-algorithm design should be robustness to noise. You particularly appreciate this point of view when working with daily or weekly data sets (small sample size). The noisiness of the data suggests a different way of looking at the problem: the goal is not to find the best single (and potentially complex) predictor, but to find lots of simple predictors, each marginally powerful, that when combined provide significant predictive power & robustness. My experience has been that an optimized neural network (i.e. a complex learner) can have much worse generalization ability than an ensemble of equal-weight-combined linear predictors (searched randomly and filtered based on performance on the training set). The thesis also teaches a good lesson on how to do quant research: you should have a null hypothesis (H0) and compute p-values and z-scores based on H0 to test predictive power. I tend to use a heteroskedastic random walk as H0 in my tests. If you have a large enough data set, you may skip H0 and rely solely on cross-validation.

[4] L. Breiman, "Stacked regressions," Machine Learning, Springer, 1996. Ensemble methods are the name of the game in statistical learning. You should definitely check out papers on bagging, boosting & their variants. This paper proposes combining multiple predictors linearly with non-negative weights. The weights are chosen via leave-one-out cross-validation. The method is very intuitive and relatively easy to implement.

[5]
A. W. Lo and C. MacKinlay, "Stock Market Prices Do Not Follow
Random Walks: Evidence from a Simple Specification Test," Review of
Financial Studies, vol. 1, pp. 41-66, 1988. This
paper introduced the Variance Ratio (VR) test. What's VR test? Suppose
you have a time-series and you want to understand its "character". The
question you want to answer is: i) Is the time-series "trending" (i.e.
positive returns tend to be followed by positive returns, and negative
returns by negative ones); ii) is it "mean reverting" (i.e. signs of
consecutive returns tend to be opposite---negative serial correlation);
or, iii) is there no correlation between consecutive returns (i.e.
random walk). The VR test answers this by comparing the variance of
q-period returns with q times the variance of one-period returns: the
ratio is near 1 for a random walk, above 1 for a trending series, and
below 1 for a mean-reverting one. The paper also
finds an asymptotic formula for the statistical significance of the
result (z-score). When applied to stocks, I find it pretty remarkable
that the VR test can reveal this information, whereas observing the
auto-correlation function directly barely shows any temporal
correlation. See reference [8] below for an excellent introduction to
VR with examples.

[6] Marketsci Blog (marketsci.wordpress.com). I love this blog. Suppose you ran a VR test on the S&P 500 index time-series and found that it's strongly mean-reverting over 5-15 days (from the year 2000 onwards, this is true!). How are you going to exploit this seeming inefficiency? This blog finds & back-tests a multitude of indicators that can help you. The strategies presented tend to be contrarian (i.e. they bet on mean reversion one way or another). What are the indicators? Some are very simple: i) daily follow-through (bet that the S&P will go up [down] tomorrow if it went down [up] today); ii) various forms of moving average (interpreted in a contrarian manner); iii) RSI(2), etc. The remarkable thing is that these indicators gave good & statistically-significant performance in the last 10 years across all time periods. My back-testing shows that combining some of these indicators using ensemble techniques gives decent out-of-sample performance. The author's own trading performance, audited independently, can also be found on the marketsci web site. I want to add a brief note on the usefulness of technical indicators. While academic papers debunk the utility of technical indicators, I find it pretty staggering that the strategies outlined in this blog are working. Based on my estimates, the demonstrated performance is statistically significant and can be much improved out-of-sample by ensemble techniques. My own take is that a good technical indicator should be based on price information that is short-term relative to the sample size: market rules tend to change over time, and ensuring statistical significance in the recent past is the key. "Ensemble of technical indicators" is a much larger class of learners than individual technical indicators, so if a price series is predictable from its past, it's likely to be predictable via an ensemble of technical indicators.

[7] Pairs Trading: Quantitative Methods and Analysis, G.
Vidyamurthy, Wiley Finance, 2004. One
of the best quant/trading books I've read. The author gives an
excellent overview of pairs trading methodology. Written in simple
language for a reader with a signal processing background. Has a nice
section on Arbitrage Pricing Theory. This book inspired me to try
principal component analysis on stock-price time series (or,
stock-returns) to decompose them into uncorrelated components.

[8]
A Computational Methodology for Modelling the Dynamics of Statistical
Arbitrage, Andrew Neil Burgess, PhD Thesis, London Business School,
1999. This
not-so-well-known thesis is an excellent reference on cointegration
(i.e. relative value) strategies for stocks. It also introduces the
Variance Ratio test with excellent visuals & examples. What's
cointegration? Two or more time-series are called cointegrated if a
linear combination of the series is stationary, while the series
themselves are not. This means that profit is possible by betting on
the direction of linear combinations of stock prices. In this sense,
cointegration is a generalization of pairs trading. While pairs trading
bets on the relative moves of two time-series (it's usually a
convergence bet), cointegration bets on linear combinations of two or
more time-series. It
should be noted that a cointegration bet doesn't always have to be for
convergence. As long as the statistics of the stationary time-series are
known, one can bet in either direction (convergence or divergence)
depending on price history & statistics of the series. I would
recommend checking out Principal Component Analysis (PCA) and other
blind source separation techniques for finding linear combining weights
that yield stationarity.

[9]
E. O. Thorp, "The Kelly Criterion in Blackjack, Sports Betting, and the
Stock Market," The 10th International Conference on Gambling and Risk
Taking, Montreal, June 1997. As
you must have noticed from the order of references, the most basic
prediction problem deals with estimating future returns for a single
time-series. The next step is to bet on relationships between two
time-series (pairs trading) and then to several (cointegration). Assuming
you estimated the mean and covariance of future returns of multiple
instruments, how do you allocate your funds? The Kelly criterion addresses
this question, taking as its objective the maximization of the long-term
growth rate of your capital. You need to know (or have an estimate of) the joint
distribution of future returns. Knowing the mean and covariance of
future returns suffices up to a second-order approximation. A
not-so-well-known property of the Kelly criterion is that it maximizes the
Sharpe ratio under an L2-norm constraint on the portfolio weights. I
proved this mathematically myself and haven't seen it mentioned
anywhere. Ping me if you want to learn more.

[10]
Quantitative Equity Portfolio Management: An Active Approach to
Portfolio Construction and Management, L. B Chincarini and Daehwan Kim,
McGraw-Hill Library of Investment and Finance, 2006. Excellent
introduction to factor models. I particularly liked the sections on market anomalies and popular
fundamental factors. Much more readable than the standard reference in
this area (Grinold and Kahn). The presented approach is not very useful
if you lack quantitative fundamental data.

[11] How markets slowly digest changes in supply and demand, J.P. Bouchaud, J.D. Farmer, F. Lillo. A market
microstructure survey. A must read if you want to understand the price
formation process in stock markets. The first author has lots of
interesting articles; they are the best place to find new and interesting finance research.

Research projects

These are some research projects I've worked on. Pointers to my approach are
outlined above. - S&P 500 index daily return estimation from past prices
- Find price-based and seasonal signals that are correlated with tomorrow’s return (linear base learner).
- Distinguish “fake patterns” from real ones by computing p-values and z-scores with respect to heteroschedastic random walk.
- Combine multiple estimators via ensemble methods.
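The significance-testing step can be sketched as follows (the function name and the sign-flip bootstrap are my own sketch, not the original code): under a heteroskedastic random-walk null, randomly flipping the sign of each return keeps the volatility profile |r_t| but destroys any predictability, giving a null distribution for a signal's performance.

```python
import numpy as np

def zscore_vs_hrw_null(signal, returns, n_boot=2000, seed=0):
    """Z-score of a signal's average daily PnL against a heteroskedastic
    random-walk null: flipping return signs preserves the magnitudes
    |r_t| (and hence volatility clustering) but kills predictability."""
    rng = np.random.default_rng(seed)
    pnl = np.mean(np.sign(signal) * returns)        # realized statistic
    flips = rng.choice([-1.0, 1.0], size=(n_boot, len(returns)))
    null = np.mean(np.sign(signal) * flips * np.abs(returns), axis=1)
    return (pnl - null.mean()) / null.std()

# Toy data: a weakly mean-reverting series and a contrarian signal.
rng = np.random.default_rng(1)
eps = rng.normal(0.0, 0.01, 3000)
r = np.empty_like(eps)
r[0] = eps[0]
for t in range(1, len(r)):
    r[t] = -0.1 * r[t - 1] + eps[t]

z = zscore_vs_hrw_null(-r[:-1], r[1:])   # bet against yesterday's move
print(f"z-score: {z:.1f}")
```

A genuine contrarian edge shows up as a clearly positive z-score, while a "fake pattern" lands within a couple of standard deviations of the null mean.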
- Trading cointegrated large-cap ETFs
  - Principal component analysis to decompose daily returns.
  - Optimal linear estimation of the return of each component.
  - Kelly betting based on estimated mean and covariance of returns.
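A minimal sketch of this pipeline on synthetic data (the factor structure and the placeholder mean estimate are illustrative; the second-order Kelly allocation f = Σ⁻¹μ follows the mean-covariance approximation mentioned under [9]):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily returns for 4 hypothetical ETFs driven by 2 factors.
n_days, n_assets = 1000, 4
factors = rng.normal(0.0, 0.01, (n_days, 2))
loadings = rng.normal(0.0, 1.0, (2, n_assets))
returns = factors @ loadings + rng.normal(0.0, 0.002, (n_days, n_assets))

# PCA via eigen-decomposition of the sample covariance: projecting the
# returns on the eigenvectors yields uncorrelated components.
cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
components = returns @ eigvecs          # columns are uncorrelated

# Suppose a predictor supplies expected next-day returns mu; with the
# sample covariance, the second-order Kelly allocation is f = cov^{-1} mu
# (before any leverage cap).
mu = returns.mean(axis=0)               # placeholder estimate
f = np.linalg.solve(cov, mu)
print("Kelly weights:", np.round(f, 2))
```

In practice mu would come from the per-component return estimators, and the raw Kelly weights would be scaled down (fractional Kelly) to account for estimation error.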
- k-Nearest-Neighbor (kNN) classifier from daily bars
  - Branch-and-bound optimized C++ implementation of the kNN classifier.
  - Daily bars seem to have statistically-significant predictive power for Russell 2000 (small-cap) stocks but not for large caps.
  - Predictive power is strongest in 2000-2004 and diminishes over time.
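A bare-bones version of such a classifier (pure NumPy, brute-force search instead of branch-and-bound, and the "daily bar" features simplified to recent returns):

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=5):
    """Brute-force kNN: majority vote over the k training points closest
    to x in Euclidean distance. A branch-and-bound search prunes most
    distance computations but returns the same neighbors."""
    dist = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return 1.0 if train_y[nearest].sum() >= 0 else -1.0

# Toy data: features = the last 3 daily returns, label = sign of the
# next day's return. Real daily bars would add open/high/low/volume.
rng = np.random.default_rng(3)
r = rng.normal(0.0, 0.01, 500)
X = np.stack([r[i:i + 3] for i in range(len(r) - 3)])
y = np.sign(r[3:])
pred = knn_predict(X[:-1], y[:-1], X[-1], k=7)
print("predicted direction for the last day:", pred)
```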
This page was made before I took a full-time job at an algorithmic trading company in Dec 2009. Good luck!