Rallan,
I agree with caution in investing in any R2G or system you did not build and test yourself, including mine. But… just wanted to ‘waste’ some time and mull over the possible factors involved.
‘Over optimization’ is only one source of potential underperformance, and it’s probably way too simplistic to treat it as the sole cause whenever a system has underperformed.
Look at, for example, the WisdomTree ETFs. They are based on the research of the very prominent (and methodologically rigorous) Prof. Jeremy Siegel. They are basically ‘fundamentally weighted’ indexes. So, for example, EPS weights the SP500 on earnings instead of market cap. There is a very long backtest history to suggest this single factor has alpha. It holds 500 positions or so. Since launch in 2007, the index has matched the ETF SPY (64% or so returns), but significantly lagged an equal-weighted SP500 index. The actual ETF has had some style drift and has underperformed (60% vs. 65%). Even with over 7 years of history and 500 holdings, style drift has occurred. Many of the WisdomTree ETFs have lagged - with hundreds of stocks and one factor. I don’t think they are ‘over fit.’
Or, look at GTAA, an ETF based on the work of Mebane Faber and very simple in theory. It’s returned 3.45% total since launching in 10/2010, vs. 69% for SPY. That makes investors unhappy, but doesn’t make it curve fit. It does mean an investor likely did poorly, but it doesn’t negate the validity of the research that grounds it.
Or Cambria’s (Mebane Faber’s) Foreign Shareholder Yield ETF, FYLD, which has returned -4.08% in 2 years vs. 14+% for SPY. But SPY is the wrong benchmark for it.
Or USCI - which is based on a simple system of commodity excess returns accruing to momentum and value factors. The underlying index rests on sound research, but it has done poorly since launching 4-5 years ago. I’ve looked at the core research. I think it’s good research.
So… re: R2G’s, one or two years means close to nothing in assessing a system and determining if it’s ‘overoptimized’ or not. Especially systems with 5 or 10 holdings. And market timing. And/or hedging.
For example, do you think Hemmerling’s “Hemmerling Value Rockets” is not curve fit (32% out-of-sample excess return), but his Russell 1000 rockets is (-14% out-of-sample excess return)?
Or, looking at Shiguang, do you think his “Alpha Max - 10 Large Cap Stocks w/ Improved Metrics-V4 - No Hedge” with 7% out of sample excess return is not curve fit, but his “Alpha Max - 10 Large Cap Value Stocks w/ Improved Metrics-V5 - No Hedge” with -10% out of sample excess return is curve fit?
Or using my own launched models. Do you think my “*Tom’s SX20 - 20 Value-Quality SP1500 Stocks $2MM liquidity + Hedge” (7% excess return since launch) is not curve fit, but my “*Tom’s SX10 - 10 Contrarian US Mid & Large Cap SP Stocks + Hdge” is curve fit? I don’t think either one is curve fit. I’ve reviewed what’s ‘inside’. That’s me. They may not make people money. But, I don’t think overfitting is the issue.
In terms of the Contrarian, I’ve ‘decomposed the sources of poor performance.’ They come down to -8% from a hedge loss, -5% from the ranking system factors lagging this year and -2% or so from ‘style drift’ of the 10 stock system around the ranking. I’ve reviewed everything inside and think the system remains sound. I have updated the hedge to weekly rather than monthly, but I’ve also looked back at the basic underpinnings of the system and find them sound.
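A rough sketch of that decomposition arithmetic, for anyone who wants to redo it on another model. The three figures are the ones quoted above; treating them as simply additive is an assumption (it ignores compounding and any interaction between the hedge and the stock picks), and the function name is just illustrative.

```python
# Additive attribution of the Contrarian model's shortfall.
# The three figures are from the post ("-2% or so" is approximate);
# summing them directly is a simplifying assumption.
components = {
    "hedge loss": -8.0,
    "ranking factors lagging": -5.0,
    "style drift around the ranking": -2.0,
}

total = sum(components.values())


def share_of_total(value, total):
    """Fraction of the overall shortfall explained by one component."""
    return value / total


for name, pct in components.items():
    share = share_of_total(pct, total)
    print(f"{name:32s} {pct:6.1f}%  ({share:.0%} of shortfall)")
print(f"{'total (approx.)':32s} {total:6.1f}%")
```

On these numbers the hedge accounts for a bit more than half of the shortfall, which is the piece that has nothing to do with how the ranking was fit.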
Most R2G’s may be curve fit to some degree. But poor out-of-sample performance is as likely, or more likely, to come from a) the use of market timing and hedging, b) the small number of holdings, c) varying start dates and random fluctuations, d) the short time since inception and e) (often) the lack of a proper benchmark. These are much greater sources of year-to-year variation.
If I had it to do over again, I’d probably launch mostly R2G’s with 20-50 holdings. But, no one would sign up. But style drift around the underlying indexes would be much less. And, I’d separate all timing and hedging into stand-alone modules. And limit rankings to 10 factors and remove most buy and sell rules - except for a small number of quality and liquidity filters. But, that’s just me. That’s what I’ve done on most of my systems with updates since launch. That’s where most of my money is invested. But, there are many other ways to build and test profitable systems - including 1,000’s of factors. So, I don’t claim that’s the only way. It’s just where my personal conviction is heaviest.
Some systems work after launch and some don’t. But, very good short-term performance likely shouldn’t give that much more confidence that a system is not overfit than mediocre performance does. The number of holdings is too low and the time frame too short.
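To put a rough number on "too few holdings, too short a time frame", here is a back-of-the-envelope sketch. It is not a model of any actual R2G: it assumes an equal-weighted portfolio, independent and normally distributed stock-specific returns, a made-up 30% annual stock-specific volatility and a made-up 5% true annual alpha. Under those assumptions it gives the chance that a genuinely good system still shows negative excess return.

```python
from math import sqrt
from statistics import NormalDist


def prob_lagging(true_alpha, idio_vol, n_holdings, years):
    """Chance that realized annualized excess return comes out below zero.

    Assumes equal weights and independent, normal stock-specific returns,
    so the noise shrinks with sqrt(n_holdings * years).
    """
    std_err = idio_vol / sqrt(n_holdings * years)
    return NormalDist(mu=true_alpha, sigma=std_err).cdf(0.0)


# Illustrative inputs only (assumptions, not estimates from any real model).
TRUE_ALPHA = 0.05
IDIO_VOL = 0.30

for n_holdings in (5, 10, 20, 50):
    for years in (1, 2, 5):
        p = prob_lagging(TRUE_ALPHA, IDIO_VOL, n_holdings, years)
        print(f"{n_holdings:3d} holdings, {years} yr: "
              f"P(negative excess return) = {p:.1%}")
```

With these inputs a 10 stock system with real alpha still lags its benchmark in roughly three years out of ten, and under these assumptions 10 holdings over 5 years carry the same noise as 50 holdings over 1 year. Timing and hedging add further noise that this sketch leaves out.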
Best,
Tom