Isn’t testing for over-fit-ness different from testing for robustness?
Being over-fit has to do with having too few data points and too many rules. You test for it by setting aside data for out-of-sample testing: for example, designing and optimizing with even-numbered stocks and then testing with odd-numbered stocks, or designing with small caps and testing with mid caps. When we get international stocks, they will be tremendously valuable, if for nothing other than out-of-sample testing.
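A minimal sketch of that kind of split (the tickers and the even/odd convention here are just illustrative assumptions, not anyone's actual universe):

```python
# Hypothetical sketch: split a universe into a design set and a holdout
# set by alternating tickers, so the holdout half is never touched
# during optimization and can serve as out-of-sample data.
def split_universe(tickers):
    """Alternate sorted tickers into two disjoint halves."""
    ordered = sorted(tickers)          # sort for a deterministic split
    design = ordered[0::2]             # "even" stocks: design and optimize here
    holdout = ordered[1::2]            # "odd" stocks: test out-of-sample here
    return design, holdout

design, holdout = split_universe(["AAPL", "MSFT", "GE", "IBM", "XOM", "KO"])
```

The point is only that the two sets are disjoint and together cover the whole universe; any other deterministic partition (small caps vs. mid caps, say) plays the same role.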
Robustness has to do with the results being insensitive to variation. But there is a wide variety of variation you can test. You can vary the rank weightings, the buy rules, the sell rules, the universe, the number of positions, etc. Wherever you have a number, you can vary it. If you have a rule like PE < 13, the way to vary it is to try PE < 12, then PE < 14, etc. If you vary factor A by 10% and the results change by 50%, and you vary factor B by 10% and the results change by 1%, then the system is robust to factor B but fragile to factor A. (This assumes varying your factors by 10% is reasonable; you have to justify why 10% is better than, say, 5%.)
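That bump-one-parameter-and-compare procedure is easy to sketch. The `toy_backtest` below is a made-up stand-in for a real backtest, deliberately built so one parameter dominates; everything here is illustrative, not a real model:

```python
def sensitivity(backtest, base_params, key, bump=0.10):
    """Largest fractional change in the backtest result when one
    parameter is bumped by +/- `bump` (e.g. 10%), all else held fixed."""
    base = backtest(base_params)
    changes = []
    for sign in (+1, -1):
        varied = dict(base_params)
        varied[key] = base_params[key] * (1 + sign * bump)
        changes.append(abs(backtest(varied) - base) / abs(base))
    return max(changes)

# Toy stand-in for a backtest: cubic in "a", nearly flat in "b",
# so it is fragile to a and robust to b by construction.
def toy_backtest(p):
    return 100 * p["a"] ** 3 + 0.1 * p["b"]

params = {"a": 13.0, "b": 13.0}
fragile = sensitivity(toy_backtest, params, "a")  # large fractional change
robust = sensitivity(toy_backtest, params, "b")   # tiny fractional change
```

In a real workflow you would sweep every parameter this way and flag any whose sensitivity is far out of proportion to the 10% bump.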
You may also want robustness to macroeconomic conditions, in which case you need more data covering more types of environments.
It’s not clear what using Random < 0.8 actually tests for. It pushes you down the ranking system, right? So it’s like randomly removing highly ranked stocks that you would have bought. Does this test for over-fitting or robustness? I’m scratching my head. It doesn’t actually vary the rank weights or the rules. I guess it’s varying the universe. So at best, it varies one dimension, but leaves all other dimensions fixed. As we know, each model is multi-dimensional, and it’s very easy to unintentionally neglect important dimensions.
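My reading of what a Random < 0.8 rule does, sketched in code (the stock names and the keep-probability mechanics are my assumptions about how the rule behaves, not a documented implementation):

```python
import random

def pick_with_random_rule(ranked, n, keep_prob=0.8, rng=None):
    """Take the top n from a ranked list, but first drop each candidate
    with probability (1 - keep_prob), mimicking a 'Random < 0.8' buy rule.
    Dropped high-ranked names force picks from further down the ranking."""
    rng = rng or random.Random()
    survivors = [s for s in ranked if rng.random() < keep_prob]
    return survivors[:n]

rng = random.Random(42)                             # seeded for repeatability
ranked = [f"STOCK{i:03d}" for i in range(100)]      # best-ranked first
picks = pick_with_random_rule(ranked, 10, rng=rng)
```

Run it many times with different seeds and you get a distribution of portfolios drawn from slightly different slices of the ranking, which is one narrow perturbation of the universe, and only that.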
And the future can always change in unpredictable ways that no amount of statistical rigor can account for. But if a model is over-fit or fragile, then you shouldn’t even consider it, because it’s not realistic. So statistical rigor is necessary, but not sufficient, for a model to work in real time.
Seems like “over-fit” is a clear, unambiguous term. If you have two data points and fit a quadratic equation through them, it is over-fit, no confusion there. Robustness is much more murky. Robust to what? Robust to the collapse of capitalism and a return of communism? Robust to a nuclear bomb in New York City? Different people will demand different levels of robustness, just like different people have different risk requirements.