Are we overfitting?

I strongly recommend reading this presentation
http://www.davidhbailey.com/dhbtalks/dhb-london-quant.pdf

or this presentation
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2606462

or this paper http://www.ams.org/notices/201405/rnoti-p458.pdf

Then think about it and ask yourself: “Am I overfitting?”

And even if I am not overfitting, am I perhaps being fooled by intragenerational or intergenerational overfitting?
(Meaning: am I basing my systems on publications by people who have done the overfitting, or who have themselves been victims of intra/intergenerational overfitting?)

This post is an appeal to everyone to engage in some serious soul-searching about their own sims and work so far.
Just reflect on it and on your own approach.

Do not go off in anger and start posting your defense of why your ports are not overfit…

“Overfitting” in and of itself is not bad, provided that the investor understands what he/she is doing and why. What is bad is when that person tries to market the results as performance. For example, if I were to claim that my market timer will produce an equity curve straight up to the heavens, as in Market Timer: Summary Of Ranking System Factors, that would be bad. But laying out clearly what was done and what assumptions were made, and letting readers decide what, if anything, is of interest to them, isn’t bad, even if the model is overfit.

Your second link provides some insight:

“Paradoxically, some of the best hedge funds are math-driven:
–Financial firms can conduct research in terms analogous to Scientific laboratories. E.g., deploy an execution algorithm and experiment with alternative configurations (market interaction).
–Financial firms can control for the increased probability of false positives that results from multiple testing. Their research protocols can legally enforce the accounting of the results from all trials carried out by employees.
–In the Industry, out-of-sample testing is the peer-review. If you don’t make money, you are out of business. Corollary: Backtest carefully or die.
–Financial firms do not necessarily report their empirical discoveries, thus discovered effects are more likely to persist.
Unless this state of affairs changes, true discoveries in Empirical Finance are more likely to come from the Industry than Academia.”

Now, I honestly don’t believe that counting the number of iterations is practical (certainly not on P123), and it leads to a kind of false analytics: it just moves the uncounted iterations up a level. That is: do your test, count the iterations, calculate the deflated Sharpe ratio, pass or fail the result (likely fail), then repeat until you pass. You will always have uncounted iterations.
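To see why those uncounted iterations matter in the first place, here is a small Monte Carlo sketch using purely synthetic data (no real strategies, and the parameters are just illustrative): the best in-sample Sharpe ratio among N noise-only “strategies” grows with N, so any selection step that goes uncounted inflates the reported number.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_in_sample_sharpe(n_trials, n_periods=252):
    """Best annualized Sharpe found among n_trials pure-noise 'strategies'."""
    # Daily returns with zero true edge: any positive Sharpe is pure luck.
    returns = rng.normal(0.0, 0.01, size=(n_trials, n_periods))
    sharpes = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)
    return float(sharpes.max())

for n in (1, 10, 100, 1000):
    print(f"{n:5d} uncounted trials -> best in-sample Sharpe {max_in_sample_sharpe(n):.2f}")
```

With one trial the expected Sharpe is zero; by a thousand trials the best one typically looks like a Sharpe above 3, despite every strategy being noise.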

Take care
Steve


My statistics/econometrics education has become a bit rusty, but in general I try to pay attention to the following rules to avoid over-fitting:

1. Limit the number of independent variables and rules
The more independent variables (such as cash flow per share or sales growth) and rules (such as SMA50>SMA100) you introduce, the higher the chances of over-fitting.

2. Do not run too many iterations
The more iterations you run, the higher the chances of arriving at a “great” model by chance.

3. Have a large number of transactions
Assuming key return/risk results are the same, I would rather invest my money into a stock simulation with 1000 trades over 15 years than an ETF sector rotation model that trades 10 times in 15 years.

4. Check parameter stability
I stay away from a model that falls apart when I change one of my (few!) parameters a little. For example: my universe consists of the top 10% of S&P 500 companies by EPS growth and delivers 25% p.a., but widening it to the top 15% drops the return to 10% p.a.

5. Exclude top performers
P123 enables you to exclude the stocks that perform best in a simulation. If a model’s performance collapses after removing its best performers, odds are it was “lucky” (a.k.a. overfit).
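As a toy illustration of rule 5, here is a sketch using made-up trade lists (the numbers and helper functions are hypothetical, not P123 output): a model built on many small, similar wins survives removal of its best trades; a model carried by a handful of outliers does not.

```python
import numpy as np

def cagr(trade_returns, years):
    """Compound a list of per-trade returns into an annualized rate."""
    growth = float(np.prod(1.0 + np.asarray(trade_returns)))
    return growth ** (1.0 / years) - 1.0

def cagr_without_top(trade_returns, years, k=5):
    """Rule 5: recompute CAGR after dropping the k best trades."""
    trimmed = sorted(trade_returns)[:-k] if k else list(trade_returns)
    return cagr(trimmed, years)

# A "robust" model: many small, similar wins.
robust = [0.02] * 100
# A "lucky" model: flat except for five huge winners.
lucky = [0.0] * 95 + [0.50] * 5

print(cagr(robust, 5), cagr_without_top(robust, 5))  # ~0.49 -> ~0.46
print(cagr(lucky, 5), cagr_without_top(lucky, 5))    # 0.50 -> 0.00
```

Both toy models show roughly the same headline CAGR, but only the robust one keeps most of it after its five best trades are excluded.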

Tobias,
Thanks for the links to these articles.

All,
The last article makes a point that I have been making for a while: some types of optimization are more harmful than others. This is best illustrated by the images below from the article. In the first image the out-of-sample results are not as good as the in-sample results, but following the strategy does not make the out-of-sample results worse, on average, than following no strategy at all (illustrated by the horizontal regression line). In the second image the strategy causes you to do worse than you would have with no strategy at all, on average (illustrated by the downward-sloping regression line).
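That slope can be estimated directly if you record the in-sample and out-of-sample results of each configuration you tried. A minimal sketch with synthetic pairs (all coefficients and distributions here are invented for illustration): a near-zero slope matches the first image, a negative slope matches the harmful second image.

```python
import numpy as np

rng = np.random.default_rng(1)

def oos_vs_is_slope(is_perf, oos_perf):
    """Slope of out-of-sample performance regressed on in-sample performance."""
    slope, _intercept = np.polyfit(is_perf, oos_perf, 1)
    return float(slope)

n = 500
is_sharpe = rng.normal(1.0, 0.5, n)  # in-sample results of n tried configurations

# "Benign" case: OOS performance is unrelated noise -> roughly flat line.
oos_benign = rng.normal(0.0, 0.5, n)

# "Harmful" case: the harder you fit in-sample, the worse OOS gets.
oos_harmful = -0.5 * is_sharpe + rng.normal(0.0, 0.2, n)

print(oos_vs_is_slope(is_sharpe, oos_benign))   # near 0
print(oos_vs_is_slope(is_sharpe, oos_harmful))  # near -0.5
```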

So what factors are more harmful than others? I’m not sure I have a good answer to this but I think it is important and I will try to start the discussion.

I will really have to think about which factors are not harmful, or not as harmful. Maybe they are all harmful. But the second image has an answer as to what is almost certainly harmful. It talks about serial correlation, which usually refers to a time series. So market timing, and using get-series too much, could cause harmful over-optimization.

BTW, I am more and more convinced that CyberJoe is right about all of that. More right with some factors than others.

-Jim



Jim et al. - There are problems with such analysis. They are making generalizations out of selective (and sometimes silly) cases. It isn’t difficult to reference Fibonacci or astrology and then throw everything out (baby included) with the bathwater. For example, there are market timing strategies that have worked for decades, such as New NYSE 3 Month Hi/Los.

Steve

Steve,

I do not think the article said testing and optimization of any factor is always bad. The goal would be to avoid OVER optimization.

I certainly do not have all of the answers on how to do this. If the authors do they have not given out all of their secrets in this short article. Actually, I think CyberJoe has a good start on this topic: how to avoid the overuse of potentially harmful factors.

-Jim

Some of the more successful/profitable hedge funds are currently avidly hiring quants (scientists, mathematicians, and programmers) to mine the data for profitable trading/investing strategies. They are looking for the optimization “sweet spot” and likely employing the ideas in Bailey’s latest paper: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2819847

Scott

Jim - I appreciate the distinction CyberJoe and you are making. However, I have doubts that we can discern the difference between optimization and over-optimization. Think about it for a minute. The first author says that the markets are made effectively efficient by HFTs and arbitrage strategies. With markets apparently this difficult, how does one propose a theory, test it with extremely few iterations, and conclude that it “works”? In my opinion, this is never going to happen, and it is not a practical suggestion. These authors are blind to the fact that one has to optimize/over-optimize at some level: if not at the backtest level, then at the theory-proposal level. In other words, somewhere along the line, they are not counting iterations.

If I read you correctly, you were suggesting that market timing was detrimental. Hence my last response.

Steve

None of the three papers consider the differences between technical analysis and fundamental analysis, which are huge. Garbage in, garbage out needs to be the first principle of backtesting. If you’re trying to optimize a strategy based on stuff that makes no financial sense (stop losses, Sharpe ratios, best days of the week, trends, and so on), you’re going to be ensnared by all sorts of traps. Overfitting when backtesting with fundamentals can happen too, but there are lots of ways to make backtests more robust. The answer isn’t to backtest less, but to examine every measure to make sure it makes logical financial sense. If it doesn’t, either throw it out or improve it. No matter how good the data is, you can’t stand by a measure that you can’t explain.

Personally, I have found that measures relating today’s price to a price at any point in the past have produced much more unreliable results than sticking with fundamentals. I would bet that if the authors of these papers compared backtesting using past prices to backtesting using fundamentals, and if they stopped relying on the Sharpe ratio (which has almost no predictive power) and relied instead on a simple CAGR, they’d find that their conclusions need to be altered.

Yuval - everything you say is right on, but there are some issues with “examine every measure to make sure it makes logical financial sense”. The problem is that what makes financial sense doesn’t account for market behavior; what makes sense may still result in significant losses. That is why we try to capture market behavior as best we can through optimization.

Steve

Steve,

You make an excellent point(s). Too many good points to comment on all of them. But I think I am restating one of your excellent points when I phrase it this way:

If there are only a few gems out there you are going to have to sort through a lot of sand to find one of them.

If it is an efficient market, or a pretty efficient one, then you should expect nothing from your first backtest, whether it is optimized or not. You are going to have to backtest more than once to find a truly good strategy, especially if you are trying to find something that hasn’t already been published.

I do not claim to know–or to be able to articulate in one post if I did–how to balance the need for a lot of tests with the problem of overfitting.

Mine is a slightly different point: it helps if the factors that are overfit are not harmful when I fail to find the proper balance, like the first image that I borrowed from the article.

All my early models were horribly overfit. I’m convinced some/many SA models are overfit due to fantastical historic simulation and overly-precise and overly-mathematical description from the designer.

More and more, I ask myself “what are the characteristics of the stocks I want to own in this strategy” and pay less and less attention to the historic simulation.

Very good point, Steve. But does “market behavior” have predictive power the way that “financial sense” does? After all, the whole point of backtesting and optimization is to create models that will survive out of sample. “Market behavior” has to be just as explainable as “financial sense” for it to have any out-of-sample power, and in addition there should be a good reason for it to persist.

Let’s take two examples of market behavior. One: stocks with a high short ratio tend to perform worse, on the whole, than stocks with a low short ratio. That, to me, is not only explainable, but likely to persist. You’re going to find some wonderfully profitable counterexamples of short squeezes, but as a rule of thumb, the short ratio is a rather nice measure of market sentiment, and there’s no good reason why that would suddenly reverse. Two: stocks whose price has risen over the last month are more likely to fall over the next week than to continue rising due to short-term mean reversion. This is also explainable by market behavior: investors overreact to news and are then corrected by the market. This has remained constant for 90-odd years now, according to a study I read. But is it likely to persist? Perhaps in the short-term, but given how differently information is assimilated today than how it used to be, that variable is rapidly changing. I would not put my faith in it.

More quants being hired. From today’s Bloomberg:
http://www.bloomberg.com/news/articles/2016-09-09/ubs-wealth-management-s-haefele-joins-battle-to-recruit-quants

Are all these quants just trying to perfect market timing or are they stock picking?

Market timing, stock picking, data mining? I don’t know, but here’s a recent Python meetup posting:

I redacted the contact information. If anyone wants that please let me know.

Walter

Yuval - so does the Short Interest have predictive power? Or is it the tenth fundamental factor that you tested but the first one that actually backtests the way you would think it should? When you really think about it, with all the supercomputers out there and hungry hedge funds, why would there be any advantage with short interest? It is a bit too simple and easy to exploit in this very sophisticated world. BTW SiRatio is one of my favorite factors, I’ve used it for a long time. I use it because it works, not because someone with a theory says it should :slight_smile:

Steve

They are making a lot of money playing with numbers. Where do I sign up? Oh I forgot, I don’t have a PhD :slight_smile:

Gee, is it really that hard?
Value, momentum, size, low liquidity, and diversification have worked since 1870 and will work in the future.
Why?
Because it is mentally hard to buy a micro-cap stock (where you need a GTC limit price to keep slippage from exploding; it’s ten times easier to use a market order on FB), one that has shown momentum (for example, it is hard to buy at an all-time high: “Oh my god, this has run much too far, I cannot buy this”), that has value (hard to detect, and also hard to buy the right kind of value rather than a falling knife; e.g., combine value only with momentum and size and be on the safe side), and that has growing EPS momentum.

It is very hard to bear the volatility of these kinds of stocks, and only 0.1% of all investors want to manage a 100-stock portfolio with regular drawdowns of over 20%.

You can only be successful if you do things that are hard and counterintuitive, and go the extra mile (100 stocks, for example) that almost nobody goes. And use the competitive advantage nobody talks and writes about: the relatively small size of your portfolio.

Map these criteria to the science of Fama and French and you know what to do:

Implement the method, train your mind to trust a 100% counterintuitive investment and trading strategy, and keep it KISS. Let us just do it and stop overcomplicating things.

Regards

Andreas

Hello Andreas,

WISE words! But stock market aficionados are “fiddlers”. We keep fiddling at the edges to “smooth out” the equity curve, which creates the very problems we want to solve. KISS and “keeping the rough edges” are basic principles, and counterintuitive ones. Therefore, nobody likes them.

Just like you, I think sticking to these principles will let us mine gold if we can tolerate “feeling terrible” when executing a strategy as you outlined above.

Werner