the (misplaced?) importance of recent performance

Let’s say you had to choose between two strategies. Both are twenty-stock models that are very easy to trade (no microcaps).

Strategy one has beaten the market every year since 1999, has a CAGR of 43% (a 44,470% total return), and has never had a negative return in a calendar year. EXCEPT for 2017: since January 1, it has lost 6.5%, underperforming the S&P 1500 by 24%.

Strategy two has beaten the market every year since 1999 except for 2009, has a CAGR of 30% (an 8,991% total return), and has never had a negative return in a calendar year. Since January 1, it has gained 19%, outperforming the S&P 1500 by 1%.

Which would you choose?

I would choose strategy #1. After all, it earned FIVE TIMES the amount of money as strategy #2 over the last 18 years. F**k the recent performance–it’s a fluke.

But I get the feeling that most people out there would look at the 2017 performance and say “no way!”

Now what if I told you the strategies were EXACTLY THE SAME and the only difference was the universe they were applied to? Strategy one was applied to the S&P 1500; strategy two was applied to all stocks with a market capitalization between $1 billion and $5 billion. Everything else was the same–all the buy rules, sell rules, ranking rules, and universe exclusion rules.

Now which would you choose? Would 2017’s performance matter in this case?

The way people go in and out of ports depending on recent performance, the way people judge performance based on the last six months of out-of-sample results–it all strikes me as so short-sighted. If you’re not looking at the five-year or ten-year performance of a strategy, you’re looking at NOTHING. If you looked at the universe of all P123 strategies, you would probably find almost NO correlation between one six-month return and the next.

But the way designer models are judged almost purely on OOS performance contradicts everything about the way strategies are designed to work. In short, we design them with extensive look-back periods because we want them to perform well over extensive forward periods, not because we want them to perform well for the next six months. Nobody (I hope) designs models looking only at the performance over the past six months. So why should OOS performance, which is necessarily short, outweigh backtested performance so heavily? Shouldn’t there be more of a balance between them?

Ahhh… The Recentists versus the Frequentists. In other parlance, the battle between statistical significance (power) and Bayesian inference.

If you believe that the market is a fixed phenomenon, then you will favor the frequentists’ view. If, however, you believe that markets are dynamic, then you believe that the most recently observed phenomena are more representative of the underlying drivers than historical patterns–i.e., you will subscribe to the recentists’ view.

I prefer neither view, instead subscribing to the view of scientific refutationism, which says that data, whether or not recent, and whether or not significant, proves no theory. One contradictory data point, however, can turn an entire world view on its head. This view does not eschew data, but uses it as a means of refuting, corroborating, and/or harmonizing beliefs about the truth.

I start with what I believe to be true, and then build a model around that belief. Real-world data informs and calibrates the model; parameters are forward-looking whenever possible. The model is designed to be inherently flexible to randomness. However, it is not robust to unforeseen uncertainty. The presence of one “impossible” event–or a clustering of very unlikely events–invalidates this model of reality. The defunct model then serves as a learning point: no model is reality; otherwise it would be indistinguishable from that reality.

Primus,
well said!!

So what are the practical consequences now?

One could infer from your post that we always need to prepare for Black Swans. By their very nature, we cannot see them in advance, but they will SURELY come at some point in time. My ports are therefore hedged (most of the time), depending on perceived market risk (unemployment, the yield curve, and a few other timers).
This costs me some alpha, but I can sleep much better, since a sudden, crash-like downturn will leave me mostly unscathed.
There could be other ways to protect one’s assets.
More ideas about this are welcome.

Werner

Nobody believes in just one or the other. Everyone knows that both have to be taken into account. It’s a question of weight. The question was about weighing a five- or six-month performance against a multiyear performance. And I’m honestly curious about how many users would give the recent five- or six-month performance more weight than the multiyear performance.

I would say it really depends on what is under the hood. If the model is a small-cap (curve-fit) model with a 50%+ backtest and 5 stocks, and is now lagging, I’m going to be pretty impatient. If it is a model with good liquidity, 50+ stocks, and a ranking system with a few robust, simple factors, I am going to have much more confidence to hold out.

Good answer. For what it’s worth, it’s a model with very good liquidity but with a lot of small caps (no microcaps, though); the 50-stock results are very similar to the 20-stock results; and the ranking system is made up of 29 robust factors, most with a weight of 2.5%, but a few with weights of 5% or 7.5%. The model does not take into account past prices in its ranking system, buy rules, or sell rules–i.e. it is completely free of technical analysis.

Small-cap value has sucked this year so far. Since small value is pretty much universal, or at least a significant component of P123 folks’ models, most are struggling (mine included). Again, a 50-stock model is going to have much less stock-specific risk, so if you have confidence in your factors, I would hold tight.

Although, to be honest, I have added a fair bit to the large-cap side of my portfolios, with the assumption being that if my factors are legit I should outperform. However, with 25-50 large-cap stocks, while my factors may help me outperform, I have a good bit of confidence that I won’t, at least, underperform in the long run.

Chasing 30% annual returns going forward seems like a pipe dream regardless of the universe or ranking system.

I just did a quick study. I downloaded the results of the designer models. I then took the 2-year excess performance and subtracted the 1-year excess performance to get the excess performance of the models during the period 5/24/2015 to 5/24/2016. I then did a simple correlation between the 1-year excess performance and the prior year’s excess performance. The correlation was -0.26. This study was of the 152 designer models that have been running since 5/24/2015.
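For anyone who wants to replicate this, the arithmetic is simple. Here is a minimal sketch in Python; the function name and the idea of passing the downloaded 1-year and cumulative 2-year excess returns as arrays are my assumptions about the data layout, not P123’s actual download format.

```python
import numpy as np

def prior_year_correlation(excess_1y, excess_2y):
    """Correlate last year's excess return with the year before it.

    excess_2y is assumed to be the cumulative 2-year excess return,
    so subtracting excess_1y approximates the 5/24/2015-5/24/2016
    excess return (a simplification that ignores compounding).
    """
    excess_1y = np.asarray(excess_1y, dtype=float)
    prior_year = np.asarray(excess_2y, dtype=float) - excess_1y
    # Pearson correlation between the two consecutive-year series
    return np.corrcoef(excess_1y, prior_year)[0, 1]
```

Run over the 152 designer models’ figures, this is the calculation that would produce the -0.26 reported above.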

In other words, a designer model that has outperformed one year is more likely to underperform the next year, and vice-versa. Recent performance has absolutely nothing to do with future performance.

I would like to do a similar test with BACKTESTED results so that I could see if ten-year performance correlates with recent performance. But backtested results are not available.

  • Yuval

Yuval,
You can place Designer Models in a book. Run the book with only one asset, i.e. the DM of interest. You can get performance over any time period this way.

Thanks, Georg. That’s very cool–I did not know that.

So I did another study. If I restrict the designer models to those with more than five stocks, the correlation of performance between last year and the year before last is -0.16 (i.e. an inverse correlation) while the correlation of performance between last year and the ten years before that (including backtested results) is 0.26.

Including five-stock models, which I simply cannot trust, reduces the correlation to 0.

You’re better off looking at ten-year figures than at the last few months.

Isn’t there a possibility you are picking up the potential 0 correlation between backtested and OOS performance, since most of the 10-year data is the backtest?

It depends. If you include five-stock models–and there are plenty of those–I do believe that there is 0 correlation between backtest and OOS performance. If you don’t, though, I think there’s a lot better correlation between backtest and OOS performance than between one year of OOS performance and the next.

  • Yuval

I like the idea of giving more weight to recent memory, but retaining a long-term tail of memory.

In terms of modelling, the mean may be thought of as an autoregressive integrated moving average (ARIMA) whereby the expectation is revised with each new data point (vis-à-vis Bayesian inference), but which also has a tendency to revert to a stationary expected value. The ubiquitous exponentially weighted moving average (EWMA) is a special case of ARIMA without the long-term constant (i.e., ARIMA(0,1,1)). For modelling variance, an equivalent approach is expressed in the exponential generalized autoregressive conditional heteroskedasticity (EGARCH) model.

Anyhow, whether or not we subscribe to the idea of exponential weighting, I think it’s helpful to examine how much weight a simple EWMA gives to recent versus far-dated history. For example, over a 7-year period, the most recent daily sample receives .0619% of the total weight of all observations (versus .0391% under an unweighted average). The most recent quarter gets 5.5%; the most recent year, 21%. At the far end, the final (earliest) year receives 8.9%, the final quarter 2.1%, and the final day .0228%.
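As a sanity check, those figures are easy to reproduce. This is a sketch rather than the original calculation: the per-day decay factor of 0.999609 is my assumption, chosen because it reproduces the quoted percentages over 7 years of calendar-day samples.

```python
import numpy as np

# Normalized EWMA weights over 7 years of daily (calendar-day) samples.
# The decay factor ~0.999609 is an assumed parameter that matches the
# figures quoted above; the post does not state its exact value.
N = 7 * 365              # 2555 daily observations
lam = 0.999609           # per-day decay factor (assumed)

i = np.arange(N)         # i = 0 is the most recent observation
w = lam ** i
w /= w.sum()             # normalize so the weights sum to 1

print(f"most recent day:     {w[0]:.4%}")          # ~0.0619%
print(f"most recent quarter: {w[:91].sum():.1%}")  # ~5.5%
print(f"most recent year:    {w[:365].sum():.1%}") # ~21%
print(f"final year:          {w[-365:].sum():.1%}")# ~8.9%
print(f"final day:           {w[-1]:.4%}")         # ~0.0228%
```

Note how little is left for the early years: exponential weighting is far more aggressive about forgetting than most people intuit.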

To me, at least, this doesn’t seem like an unreasonable framework for calibrating the expected mean using historical observations. I’ve toyed with some variations which use a sigmoidal lag to model the intuition that short-term memory is fairly consistent until it falls off a cliff (a generational gap?) at an inflection point, after which “ancient” events receive about the same weight as one-generation-removed ones. A sigmoidal curve enjoys broad academic support: work in behavioral economics (e.g., “Prospect Theory”), cognitive neuroscience (e.g., synaptic models), computer science (neural network decision models), and the simple observation that man’s failure to learn from the past is prologue to “history repeats itself”.
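To make that concrete, here is a minimal sketch of what a sigmoidal-lag weighting might look like. The 3-year inflection point and 90-day steepness are invented parameters for illustration, not values from the post.

```python
import numpy as np

# Sigmoidal-lag weights: nearly flat for recent history, a cliff at the
# inflection point, then nearly flat (and small) again for ancient history.
# The inflection point t0 and steepness s are hypothetical choices.
N = 7 * 365                  # 7 years of daily observations
age = np.arange(N)           # age of each observation in days (0 = today)
t0, s = 3 * 365, 90          # inflection at ~3 years; ~90-day transition

w = 1.0 / (1.0 + np.exp((age - t0) / s))  # ~1 when recent, ~0 when old
w /= w.sum()                 # normalize the weights to sum to 1
```

The recent plateau gets near-uniform weight, the weight collapses around the inflection point, and everything older shares a small, roughly equal weight, matching the “falls off a cliff” intuition above.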

But then again, none of these methods is inherently forward looking.

Thank you Yuval.

The negative correlation between last year’s returns and the previous year’s returns was not immediately obvious to me (until your post).

Quite the argument for making all excess returns annualized and providing annualized excess returns since inception, I think.

See Chaim’s feature request here: [url=https://www.portfolio123.com/mvnforum/viewthread_thread,10558]https://www.portfolio123.com/mvnforum/viewthread_thread,10558[/url]

BTW, here is the plot of Yuval’s results (I get the same results).

Maybe I am missing something but I think this is interesting. This goes beyond regression to the mean and ACTUALLY HAS A NEGATIVE CORRELATION.

It might be worth considering whether there might be some survivorship bias. Perhaps some bad ports got lucky the first year, were not withdrawn, and then could not continue their lucky streak. I am not sure that answers this.


David, this would make sense IF the most recent numbers showed a slightly higher correlation to OOS performance than long-ago numbers. But they don’t. In my correlation studies, a 10-year look-back period shows the highest correlation to future performance. Anything less than six years is definitely suboptimal. On the other hand, it’s probable that a 30-year look-back period would also be suboptimal. If you could weight 10 years higher than 20 and higher than 5, that kind of weighting would make more sense than weighting the most recent period highest. The example I began this post with shows the problem of weighting recent results too heavily.

[quote]
So I did another study. If I restrict the designer models to those with more than five stocks, the correlation of performance between last year and the year before last is -0.16 (i.e. an inverse correlation) while the correlation of performance between last year and the ten years before that (including backtested results) is 0.26.
[/quote]
What about the correlation between 2015 performance and the ten years before that?

Was 2015 an atypical year?

Chaim,

Good point. I wonder if some of this effect might disappear if the value models are compared to the S&P 1500 Pure Value Benchmark.

-Jim

I’ll do another study tonight or tomorrow and let you know!

“I start with what I believe to be true, and then build a model around that belief.”

Primus - Theories don’t provide “the truth,” but they are often used as such, even when subsequent observations contradict them. http://stockmarketstudent.com/stock-market-student-blog/christopher-columbus-was-wrong