Strategy backtesting is often used to test an investment strategy against historical stock or ETF data (usually prices). The belief is that if a strategy performed well in the past, it should continue to perform well in the future.
That sounds fine in theory. In practice, however, backtesting is often misleading, in the sense that past returns (backtested returns) are usually much better than future returns (those you get when you actually invest your money!). Note that I’m not referring here to cheating by looking ahead at the numbers during the backtest, as the unscrupulous people at F-Squared Investments did. Please see my post on A $35M Backtesting Error to learn more about that story.
The problem I’m referring to here arises from a basic pitfall in statistics called overfitting. It is perhaps the most important issue that statisticians worry about when analyzing a claim. In a sense, it is what the old joke is getting at: “there are lies, damned lies, and statistics”. 😉
What is Overfitting Anyway?
Overfitting is best explained through a simple example. Let’s say you play a game of coin toss with a friend. Your objective is to get 3 heads in a row. With a fair coin, the probability of achieving this is 1 / (2*2*2) = 1/8 = 12.5% in any given set of 3 coin tosses.
Now, let’s say you keep trying until you eventually get 3 heads in a row. Perhaps you were lucky and you got it on the first set of 3 tosses. Maybe you had to do 8 sets of 3 tosses. Or worse, you happened to be “unlucky” and you had to do 25 sets of 3 tosses before you finally got 3 heads in a row.
Here’s the million dollar question: can you honestly say with a straight face that you finally figured it out, so that the next time you toss the coin three times in a row, you will again get three heads? And then repeat that feat once more after that?
Of course not. The odds are always 1 in 8 or 12.5% for any given set of three tosses, no matter what you achieved in the past.
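To put numbers on this, here is a minimal Python sketch, assuming a fair coin and independent sets of three tosses. The function names are my own, made up for illustration. It shows that while any single set succeeds only 12.5% of the time, the chance of eventually seeing a success if you just keep trying is 1 - (7/8)^n, which climbs to roughly 66% after 8 sets and roughly 96% after 25 sets.

```python
import random

P_THREE_HEADS = 0.5 ** 3  # 1 in 8, or 12.5%, for any single set of 3 tosses


def chance_of_success_within(n_sets: int) -> float:
    """Probability of at least one all-heads set among n_sets independent sets."""
    return 1 - (1 - P_THREE_HEADS) ** n_sets


def sets_until_three_heads(rng: random.Random) -> int:
    """Toss sets of 3 coins until one set is all heads; return how many sets it took."""
    n = 0
    while True:
        n += 1
        if all(rng.random() < 0.5 for _ in range(3)):
            return n


if __name__ == "__main__":
    for n in (1, 8, 25):
        print(f"{n:>2} sets: {chance_of_success_within(n):.1%} chance of at least one success")

    # Simulate many players who each keep tossing until they "succeed".
    rng = random.Random(0)
    trials = [sets_until_three_heads(rng) for _ in range(10_000)]
    print("average number of sets needed:", sum(trials) / len(trials))
```

Success after enough tries is close to guaranteed, yet the chance on the very next set is still 12.5%. Persistence changed how many attempts you made, not the coin.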
What About Strategy Backtesting?
Let’s apply this idea to the investment strategy backtesting process. Let’s say you come up with an ETF trading strategy, and you backtest it over the past 10 years. You get uninspiring results. So you go back and modify the rules a bit. Perhaps you modify one of your buy criteria. Or maybe it’s one of the sell criteria. Perhaps you select a different universe of ETFs to choose from, for instance by adding one “good performing” ETF and removing a “bad performing” one. There are lots of ways you can fine-tune your strategy every time you iterate.
And so you try again, and you still get uninspiring results, albeit different from the first time. So you tweak something again, and try it out. Repeat that process 20 or even 50 times, until you finally get compelling results: say, an annualized return of 25% over the 10-year time frame. Wow, you just found a pot of gold!
Now ask yourself this: how does that approach differ from the coin tossing example above?
Answer: It doesn’t, because you kept trying until you got the results you wanted to see.
Why is Backtesting Different from Tossing a Coin?
In the coin tossing example above, you kept trying until you got 3 heads in a row. Every time you tossed the coin, you made a slightly different movement, which then gave you a different result (either heads or tails). Eventually, you got lucky and got three heads in a row.
Similarly, when you fine-tuned your ETF trading strategy through backtesting, you essentially refined your approach, by changing a rule here or there, to get the results you wanted to see. So far so good. However, you were using the exact same price data and time frame at every try. This means that if you try, say, a million times, you are likely to eventually find the exact fine-tuned set of rules that tells you to buy your ETFs at their absolute lowest points and sell them at their highest points, because the data never changes. But future price movements are just about guaranteed to behave differently, so your perfect set of rules for the past, fixed data set is unlikely to be useful.
In other words, by doing extensive data mining, you find rules that performed extremely well in the past, but have essentially no predictive value for the future.
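The effect is easy to reproduce with a toy experiment. The Python sketch below is only an illustration under loud assumptions: the "prices" are pure random noise (so there is nothing real to discover), and a "strategy" is a deliberately crude stand-in, just an arbitrary in-or-out decision for each day. All the function names are hypothetical. The point is the procedure: try many rules on the same fixed past data, keep the best one, then see how it does on data it has never seen.

```python
import random


def random_returns(n_days, rng):
    """Daily returns drawn from pure noise: no real pattern exists to be found."""
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]


def random_rule(n_days, rng):
    """A made-up 'strategy': an arbitrary in-the-market (1) or out (0) choice per day."""
    return [rng.randint(0, 1) for _ in range(n_days)]


def total_return(rule, returns):
    """Compound the returns of the days on which the rule is invested."""
    growth = 1.0
    for held, r in zip(rule, returns):
        if held:
            growth *= 1.0 + r
    return growth - 1.0


def mine_best_rule(past, n_tries, rng):
    """Try n_tries rules on the SAME past data and keep only the best performer."""
    best_rule, best_score = None, float("-inf")
    for _ in range(n_tries):
        rule = random_rule(len(past), rng)
        score = total_return(rule, past)
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule, best_score


if __name__ == "__main__":
    rng = random.Random(42)
    past = random_returns(250, rng)    # the fixed data set used for backtesting
    future = random_returns(250, rng)  # what actually happens afterwards

    for n_tries in (1, 50, 2000):
        rule, backtested = mine_best_rule(past, n_tries, rng)
        realized = total_return(rule, future)
        print(f"{n_tries:>5} tries: backtested {backtested:+.1%}, future {realized:+.1%}")
```

One would expect the backtested figure to look better and better as the number of tries grows, while the future figure keeps hovering around zero, since the winning rule only memorized the quirks of one particular stretch of noise.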
Handpicking rules, stocks or ETFs based on past performance always leads to some form of overfitting. This means that future performance cannot be reliably predicted from the backtested results. The problem isn’t with the strategy backtesting approach per se, but rather with the method of trying and repeating on the same data until you get what you want, while discarding the results you don’t like. Backtesting isn’t the problem. The method used to evaluate your strategy is the problem. But no worries, there are better ways. 🙂