Excess Return of Selected Quant Strategies - Backtested vs Live Performance (Repost)

I am reposting this from another thread “Tactical Investment Algorithms” due to its importance.

Similar to the rule that now applies to new Designer Models in Portfolio123 (as mentioned by Jim in the thread), we should all consider setting a three-to-six-month evaluation period to observe out-of-sample performance before putting any money into newly created backtested strategies (unless you are already doing additional model validation, such as the walk-forward method).

Regards
James

Suhonen, Lennkh, and Perez analyzed the backtested and live excess returns of 215 quantitative strategies issued by fifteen investment banks between 2005 and 2015. The universe includes strategies from equities, fixed income, currencies, commodities, and multi-assets. The research paper shows a significant difference between the in-sample and out-of-sample performance.

Naturally some strategies are expected to generate worse returns than during backtesting as no strategy performs consistently. However, all strategies generate significantly lower returns once live, which highlights that the strategies are either the result of data mining or are impacted more negatively by transaction and market impact costs than expected by the investment banks.

The difference between backtested and live performance was greatest for equity strategies, which perhaps highlights the intricacy of dealing with thousands of individual stocks versus only a few commodities or currencies. The researchers conclude that all backtested returns require a discount, which should be proportional to the complexity of a strategy.



James,

Interesting.

Looks like Marc was right to de-emphasize the backtests for the Designer Models and to require some out-of-sample performance.

Perhaps, even the pros should be following Marc’s advice, as you suggest.

Best,

Jim

A three-to-six-month evaluation period may be OK if you’re rebalancing daily or weekly and are trading lots of stocks. It won’t tell you a thing if you’re rebalancing quarterly or yearly and holding a handful of stocks. And even if you’re in-between, like I am, it won’t tell you much. Three years would be preferable. But who wants to wait that long?
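To put rough numbers on why a few months of live results say so little, here is a minimal sketch. It assumes the common approximation that the t-statistic of an observed Sharpe ratio is about the annualized Sharpe ratio times the square root of the number of years observed (independent, identically distributed returns). The function names and the t = 2 target are illustrative assumptions, not anything from the study.

```python
import math


def approx_t_stat(annual_sharpe: float, years: float) -> float:
    """Rough t-statistic of a Sharpe ratio observed over `years`.

    Assumes i.i.d. returns, so t is approximately Sharpe * sqrt(years).
    """
    return annual_sharpe * math.sqrt(years)


def years_needed(annual_sharpe: float, t_target: float = 2.0) -> float:
    """Years of live data needed to reach `t_target` under the same assumption."""
    if annual_sharpe <= 0:
        raise ValueError("annual_sharpe must be positive")
    return (t_target / annual_sharpe) ** 2


if __name__ == "__main__":
    # Hypothetical strategy with an annualized Sharpe ratio of 1.0.
    for years in (0.25, 0.5, 3.0):
        print(f"{years:>4} years -> t ~ {approx_t_stat(1.0, years):.2f}")
    print(f"Years to reach t = 2: {years_needed(1.0):.1f}")
```

Under these assumptions, even a strategy with a true Sharpe ratio of 1.0 needs about four years to reach t = 2, which is in line with the point that three to six months is thin evidence for anything but very high-turnover models.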

Regarding the study, it’s heartening to see that all six of these strategies earned excess returns in the out-of-sample period. It just goes to show that quantitative stock-picking does work, even if some of our own designer models have been struggling lately.

Yuval,

I was wondering what the optimal period for paper trading is myself. I think I agree: “may be OK if you’re rebalancing daily or weekly.”

Personally, I think it could take longer even with a weekly rebalance.

And you are right that the study reports excess returns. I have amended that in my post.

I also removed the above. Looks like we are doing worse than the pros with our Designer Models. We do not beat our benchmarks (on average) even with survivorship bias, I think. I welcome any corrections if I am wrong about that.

This may be purely the backtest of the models, as the graphs suggest. But in deciding whether to go ahead with the models, I would be surprised if they did not use more than just the backtest results. What they could have used would not be limited to validation studies or a hold-out test sample, of course. It could include input from someone trained in finance on how reasonable the factors are, or on the market conditions, for example.

Whatever I think the pros are doing in this study, I might try to emulate some of it when I develop my own personal models. I do like to beat the benchmark.

Best regards,

Jim

Jim,

I have done some research on the availability of additional model validation tools (other than pure backtesting) on the common trading platforms.

It seems that Amibroker, Ninjatrader, Tradestation and Multicharts all have walk-forward and Monte Carlo simulation tools on their respective platforms (although most of them do not offer fundamental analysis).

Their platforms are not as user-friendly as Portfolio123, but it appears there is room for improvement to make additional backtesting and model validation easier in Portfolio123. I am sure this would also lead to better performance in the Designer Models, based on the additional validation that could be performed.

Regards
James

James,

I think I can learn from what those platforms do. I do agree that much of this type of thing is generally accepted, taught and encouraged on Quantopian. I am going through some of their lectures and videos.

For example, you and I have done a few things with pairs trading and Quantopian has several videos (with notes and code) about this—as you know.

But I love P123 and I am truly not trying to recommend anything for the platform at this time.

We all do some things, on our own, both in and out of the platform. I am very sure that some people do more with Python or with spreadsheets than I do.

Within the platform one can:

  1. Use even/odd universes for example.

  2. One can optimize a ranking system up until 2015 and then test it out-of-sample from 2015 until now.

I highlight the second option only because it will give very close to the same results as walk-forward, and it is already easily implemented on this platform if people think this is useful. I am not really recommending this to anyone. There is the potential for "data snooping" with this method (as there is with walk-forward validation). And this, of course, is far from a complete list of essential in-platform methods.
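For anyone who does this kind of thing outside the platform, the two splits above can be sketched in a few lines of Python. This is only an illustration: the integer stock IDs, the `(date, value)` tuples, and the function names are hypothetical and are not Portfolio123 functions.

```python
from datetime import date


def split_by_date(observations, cutoff):
    """Split (date, value) pairs into in-sample (before cutoff) and
    out-of-sample (on or after cutoff) lists."""
    in_sample = [(d, v) for d, v in observations if d < cutoff]
    out_of_sample = [(d, v) for d, v in observations if d >= cutoff]
    return in_sample, out_of_sample


def split_even_odd(stock_ids):
    """Split hypothetical integer stock IDs into even and odd universes."""
    even = [s for s in stock_ids if s % 2 == 0]
    odd = [s for s in stock_ids if s % 2 == 1]
    return even, odd


if __name__ == "__main__":
    # Made-up observations purely to show the mechanics.
    obs = [
        (date(2014, 6, 30), 0.012),
        (date(2015, 1, 2), -0.004),
        (date(2016, 3, 1), 0.007),
    ]
    in_sample, out_of_sample = split_by_date(obs, date(2015, 1, 1))
    print(len(in_sample), "in-sample,", len(out_of_sample), "out-of-sample")

    even, odd = split_even_odd([101, 102, 103, 104])
    print("even universe:", even, "odd universe:", odd)
```

The idea is to optimize only on the in-sample dates (or the even universe) and look at the other half once. Checking the held-out half repeatedly while tweaking brings back the data-snooping problem mentioned above.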

But again, a lot of people do some things outside of the platform. Many much more than I do.

Marco has already said he will probably be implementing Python for a power user. Of course there is work involved and there will be increased computer resources involved—with increased costs.

I will have the option of:

  1. finding something that I think works well with the platform–as it is–and keeping my cost down

  2. Paying more for Python when it becomes available

  3. Waiting for some of the developments that will be coming to this platform with time.

I truly apologize to anyone if I implied that I do not like the platform as it is or if it even appeared that I was advocating for any changes.

Changes are coming and a lot of debate from me will not change that. I do not believe I have the slightest bit of impact on the future developments for the P123 platform, in fact.

This is all just in the interest of discussion and perhaps learning from that discussion.

Best,

Jim

Jim,

Based on the different sources (including the findings in this new attached paper), I have decided to focus on risk-on/risk-off investing using ETFs (QQQ/XLK for risk-on and GLD/TLT for risk-off).

The recommended haircut/discount of 33%-50% (as suggested in both papers) that needs to be taken off the Sharpe ratios and excess returns from backtest results is large enough to offset any gains that can be obtained from simply investing in the above ETF combinations.
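As a quick illustration of what a 33%-50% haircut does to a backtested figure, here is a minimal sketch. The 1.2 Sharpe ratio is a made-up input, and the function names are hypothetical; only the 33% and 50% discount levels come from the papers as summarized above.

```python
def apply_haircut(backtested_value: float, discount: float) -> float:
    """Reduce a backtested Sharpe ratio or excess return by `discount` (0 to 1)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return backtested_value * (1.0 - discount)


def haircut_range(backtested_value: float, low: float = 0.33, high: float = 0.50):
    """Return (pessimistic, optimistic) values after the high and low discounts."""
    return apply_haircut(backtested_value, high), apply_haircut(backtested_value, low)


if __name__ == "__main__":
    # Hypothetical backtested Sharpe ratio, for illustration only.
    worst, best = haircut_range(1.2)
    print(f"Backtested Sharpe 1.20 -> expected live range {worst:.2f} to {best:.2f}")
```

The useful comparison is then between the discounted figure and whatever the simple ETF alternative is expected to deliver, rather than between the raw backtest and the alternative.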

In the meantime, my portfolio is 100% risk-off in GLD/TLT and I will not be cloning ETFs/hedge funds even when we are back in risk-on mode.

Regards
James


Evaluating Trading Strategies.pdf (622 KB)

James,

That is a good paper I think. I will study it closely.

Looks like you have thought about what the “Bonferroni test” or correction does to the significance of all of the backtest results, given the number that I have done anyway. At the usual 0.05 level, I need a p-value < 0.00005 (0.05/1000) if I have done 1000 backtests for it to approach significance.

Or more simply I have some good backtests that do well due to pure chance alone.
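The arithmetic behind that threshold is short enough to sketch. This assumes the conventional 0.05 significance level and, for the second function, that the backtests are independent, which real backtests on overlapping data are not, so treat it as a rough guide only. The function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def prob_any_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests passes by luck alone
    when each is judged at the uncorrected level `alpha`."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    print(f"Threshold for 1000 backtests: {bonferroni_threshold(0.05, 1000):.5f}")
    print(f"P(at least one lucky backtest in 20): {prob_any_false_positive(0.05, 20):.2f}")
```

With 1000 backtests judged at the uncorrected 0.05 level, at least one "significant" result from chance alone is practically guaranteed, which is the point about good backtests that do well by luck.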

Just another way of saying Marc has a point about backtests put into statistical terms. Nothing controversial there I think.

Plus, I am liking simple/easy these days.

Best,

Jim