So…the purpose of this thread is to see if there is sufficient interest to find 10-20 participants to invest 5-10 hours each to conduct a study aiming to find a mechanical model for better system selection; i.e. predicting benchmark-relative system performance out of sample (OS).
I will lay out the case and proposed framework below.
There seem to be three major stages to ‘beating the market’ with quant. strat’s.
- Generating good looking sim’s.
- Generating a process for reliably choosing among those sim’s.
- Generating a reliable process for weighting, combining and allocating to that basket of sim’s.
This thread deals with issue #2.
- Why a simple mechanical model?
A significant body of research shows that mechanical models nearly always do as well as expert forecasters, and in a significant number of cases and contexts outperform them by a wide margin. Some of the specific reasons hypothesized for this are:
- Humans fail to identify the most salient variables that influence outcomes.
- Humans suboptimally weight decision variables.
- Humans are unduly influenced by the most recent events.
- Humans fail to gather / monitor outcomes and therefore fail to learn from decisions and improve decision processes.
- Humans have emotions, get tired, etc…and these moods affect decision making.
A sample meta study from the research is attached. There are many other studies in this area.
Expert forecasters tend to do most poorly when input variables are numerous. Experts often do worse than naive forecasters in these situations, but are much more confident in their forecasts (for example in clinical diagnosis, experts do worse…if they have conducted an interview with the patient).
- Can a simple mechanical system be developed for the question of ‘How likely is a system to outperform a benchmark out of sample?’
I think yes. This question is very analogous to the following questions that have been posed and answered rather well within the field of finance by simple mechanical models:
A. Can we predict if a firm is likely engaging in financial reporting manipulation? (Accruals, Richard Sloan; and ‘The Detection of Earnings Manipulation’.)
B. Can we predict if a firm is likely to experience financial distress? (‘In Search of Distress Risk’.)
C. Can we predict if a firm is likely to experience bankruptcy?
All of these questions have been answered with a very similar methodology. The predictive formulas created by these studies have, in many cases, held up reasonably well out of sample for a decade or so.
- How would the study work?
THE DATA SET. Half of the data set, from 1/2000 to 12/31/2006, would be in-sample (IS). The other half, from 1/2007 to 12/25/2013, would be out-of-sample (OS). Within the in-sample date range, we would look for 100-200 sim’s to test. Half would be used for training and half for testing the resulting models.
These sim’s would (ideally) be drawn 50% from public sim’s and 50% from private sim’s. No one would see the rules of any sim. Ideally half the sim’s would have continued to work in some way out of sample and half would have failed. Ideally sim’s would come from a variety of liquidity ranges…although they may all focus on micro/small caps. That’s TBD.
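As a rough sketch of the split described above (not a fixed procedure), the date ranges and the half/half division of sim's into training and testing groups could look like this. The exact start days, the `sim_###` IDs, the seed and the function name are all made up for illustration.

```python
import random
from datetime import date

# Date ranges from the proposal; the day of month for the start dates is assumed.
IS_START, IS_END = date(2000, 1, 1), date(2006, 12, 31)
OS_START, OS_END = date(2007, 1, 1), date(2013, 12, 25)


def split_sims(sim_ids, seed=2013):
    """Randomly split sim IDs into (training, testing) halves, reproducibly."""
    ids = sorted(sim_ids)               # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)    # local RNG, does not touch global random state
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical IDs standing in for 100 agreed-upon sim's.
sim_ids = [f"sim_{i:03d}" for i in range(1, 101)]
train_ids, test_ids = split_sims(sim_ids)
```

Fixing the seed means everyone in the group can reproduce the same training/testing assignment from the same list of sim's.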
THE PREDICTED VARIABLE. The predicted variable would be ‘outperforming the SP500 (or Russell2000)’ out-of-sample (OS). I would suggest measuring this out-of-sample outperformance by (SIM_AR%*SIM_Sortino)-(SP500AR%*SP500Sortino).
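A minimal sketch of that measure, assuming the benchmark term is meant to be a product (AR% times Sortino) just like the sim term. The function names and the sample numbers are invented for illustration and are not results from any real sim.

```python
def risk_adjusted_score(annual_return_pct: float, sortino: float) -> float:
    """Annualized return (in %) multiplied by the Sortino ratio."""
    return annual_return_pct * sortino


def os_outperformance(sim_ar_pct: float, sim_sortino: float,
                      bench_ar_pct: float, bench_sortino: float) -> float:
    """(SIM_AR% * SIM_Sortino) - (benchmark AR% * benchmark Sortino), all measured OS."""
    return (risk_adjusted_score(sim_ar_pct, sim_sortino)
            - risk_adjusted_score(bench_ar_pct, bench_sortino))


# Made-up example: sim returns 20%/yr with Sortino 1.5, benchmark 8%/yr with Sortino 0.9.
score = os_outperformance(20.0, 1.5, 8.0, 0.9)   # 30.0 - 7.2, i.e. roughly 22.8
```

One thing to watch with a product like this: if both AR% and Sortino are negative, the product comes out positive, so losing systems may need to be handled separately.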
THE INPUT VARIABLES. We could debate this endlessly. But…I would initially suggest (IS=In sample):
a) IS AR%
b) IS Sortino
c) Total number of factors in the ranking system
d) Ranking system weights optimized or not (1=optimized, 0=not).
e) Number of positions / holdings in the sim
f) Min. Liquidity of the system
g) Annual Turnover
h) Estimated annual trading costs (run sim once with zero slippage and once with variable slippage and then take the difference).
i) Number of buy rules
j) Total number of market timing / hedge rules
k) Total number of sell rules
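To keep everyone's spreadsheet rows consistent, the variables a) to k) could be written down as one record per sim, along the lines of the sketch below. The field names, units and sample values are hypothetical, and the trading cost estimate assumes ‘the difference’ means the gap in AR% between the zero-slippage and variable-slippage runs.

```python
from dataclasses import dataclass, asdict


@dataclass
class SimRecord:
    sim_id: str                  # anonymous label; no rules are shared
    is_ar_pct: float             # a) IS AR%
    is_sortino: float            # b) IS Sortino
    n_rank_factors: int          # c) factors in the ranking system
    weights_optimized: int       # d) 1 = optimized, 0 = not
    n_positions: int             # e) positions / holdings
    min_liquidity: float         # f) min. liquidity (units to be agreed)
    annual_turnover_pct: float   # g) annual turnover
    est_trading_cost_pct: float  # h) estimated annual trading costs
    n_buy_rules: int             # i) buy rules
    n_timing_hedge_rules: int    # j) market timing / hedge rules
    n_sell_rules: int            # k) sell rules


def estimated_trading_cost(ar_zero_slippage_pct: float,
                           ar_variable_slippage_pct: float) -> float:
    """Assumed definition: AR% with zero slippage minus AR% with variable slippage."""
    return ar_zero_slippage_pct - ar_variable_slippage_pct


# Made-up example row.
row = SimRecord(
    sim_id="sim_001", is_ar_pct=35.0, is_sortino=1.8, n_rank_factors=12,
    weights_optimized=1, n_positions=10, min_liquidity=50000.0,
    annual_turnover_pct=400.0,
    est_trading_cost_pct=estimated_trading_cost(41.0, 35.0),
    n_buy_rules=4, n_timing_hedge_rules=1, n_sell_rules=3,
)
spreadsheet_row = asdict(row)   # plain dict, easy to dump to CSV / a spreadsheet
```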
How the study would work, step by step.
- Agree on a list of the systems to include and divide them up based on people participating.
- Each person codes the above variables for their systems in an Excel or Google spreadsheet ‘data set’.
- One person runs analysis to find an IS ‘prediction equation’ and presents the top 2-3 equations back to the group. Either SPSS (statistical analysis software) or machine learning ‘predictive analytics’ software can be used. I know how to use these, but no longer have access to the software since my licenses expired.
- Each person then runs an analysis on their systems OS (out of sample) and reports back the predicted SP500 out-performance vs. the actual.
- From this, one person combines the results to see which of the top 2-3 prediction equations worked best, if at all, out of sample.
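The last two steps could be sketched roughly as follows: apply each candidate equation to the pooled OS rows, then compare predicted vs. actual outperformance. The two equations, their coefficients and the three data rows are entirely made up. Mean absolute error and sign hit rate are just one possible pair of scoring choices, not something the study has settled on.

```python
from statistics import mean


def evaluate(equation, rows):
    """Score one equation on rows of (inputs dict, actual OS outperformance)."""
    errors = []
    hits = 0
    for inputs, actual in rows:
        predicted = equation(inputs)
        errors.append(abs(predicted - actual))
        # Count a hit when the predicted and actual outperformance have the same sign.
        hits += (predicted > 0) == (actual > 0)
    return {"mae": mean(errors), "hit_rate": hits / len(rows)}


# Hypothetical candidate equations, standing in for the top 2-3 found in-sample.
candidates = {
    "eq_1": lambda x: 0.5 * x["is_ar_pct"] - 2.0 * x["est_trading_cost_pct"],
    "eq_2": lambda x: 10.0 * x["is_sortino"] - 1.0 * x["n_buy_rules"],
}

# Made-up pooled rows reported back by participants.
pooled_rows = [
    ({"is_ar_pct": 40.0, "is_sortino": 2.0, "est_trading_cost_pct": 5.0, "n_buy_rules": 8}, 12.0),
    ({"is_ar_pct": 60.0, "is_sortino": 2.5, "est_trading_cost_pct": 20.0, "n_buy_rules": 15}, -6.0),
    ({"is_ar_pct": 25.0, "is_sortino": 1.2, "est_trading_cost_pct": 3.0, "n_buy_rules": 3}, 4.0),
]

results = {name: evaluate(eq, pooled_rows) for name, eq in candidates.items()}
best = min(results, key=lambda name: results[name]["mae"])   # lowest error wins
```

Reporting the hit rate next to the error helps separate an equation that gets the direction right from one that is merely close on average.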
Incentive to participate. Only the people participating get the study results and final equation.
QUESTIONS:
- Can anyone suggest flaws, improvements, or simplifications to the above?
- Does anyone want to volunteer 5-10 hours?
- Does anyone have SPSS or predictive equation generation software they can volunteer to use on the data set?
Best,
Tom
MECHANPREDIC.pdf (1.04 MB)