Working toward a better Market Timing Model

I have added a recent market move requirement to Marc Gerstein’s market timing rule: sma(5,0,#SPEPSCY) > sma(21,0,#SPEPSCY).

My new buy rule is: sma(5,0,#SPEPSCY) > sma(21,0,#SPEPSCY) or BenchClose(0) / BenchClose(10) > 1.05

This rule allows the Sim/Port to also buy if there has been a strong recent move up in the market even though the 5 day average of SPEPSCY has not yet risen above the 21 day average. This lets the Sim buy back into the market much sooner after the start of a recovery.

My new sell rule is: sma(5,0,#SPEPSCY) < sma(21,0,#SPEPSCY) & BenchClose(0) / BenchClose(10) < 0.99

This rule will not sell stocks when the 5 bar average of SPEPSCY drops below the 21 bar average unless the recent market is also down. This allows the Sim to stay in the market until it actually starts to move down. These 2 rule changes have improved many of the older Sims I have tested.
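For anyone who wants to prototype the logic outside of P123, here is a minimal Python/pandas sketch of the two rules (the EPS-estimate and benchmark series are assumed inputs; this is not P123 syntax):

```python
import pandas as pd

def timing_signals(sp_eps_cy: pd.Series, bench_close: pd.Series):
    """Sketch of the buy/sell timing rules above.

    sp_eps_cy   : daily S&P 500 current-year EPS estimate (stand-in for #SPEPSCY)
    bench_close : daily benchmark close (stand-in for BenchClose)
    Both series are assumed to share the same trading-day index.
    """
    sma5 = sp_eps_cy.rolling(5).mean()
    sma21 = sp_eps_cy.rolling(21).mean()
    move_10d = bench_close / bench_close.shift(10)

    # Buy: EPS trend is up, OR the market has jumped more than 5% in 10 bars.
    buy = (sma5 > sma21) | (move_10d > 1.05)
    # Sell: EPS trend is down AND the market is also down more than 1% in 10 bars.
    sell = (sma5 < sma21) & (move_10d < 0.99)
    return buy, sell
```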

An example Sim can be seen here.

This simple 5 stock Sim has a 50% annual return since March 2001 and a 0.195 Gain/Stock/Day. That’s pretty good for a Sim that requires liquidity greater than $2,000,000.
Be sure to note the ranking system’s custom universe pre-filter, ADT > $2Mil & Close > 1; the filter is: AvgDailyTot(60) > 2000000 & Close(0) > 1

Let us know if anyone finds a better combination of factors.

Denny :sunglasses:

Very interesting.

I was curious though… instead of using the S&P 500 or the current available benchmarks is it possible to use a system’s equity curve for market timing ?

An example would be a portfolio that invests in up to 10 or 20 stocks, but uses 100 stocks (selected with the same criteria) to create its own index/equity curve.

Livermore did this with the “Livermore Key” and Wyckoff also used the “Wyckoff Wave” to achieve the same thing. There are other successful traders that use the same approach. Essentially, they created their own index using their own criteria.

The advantage I see is that it would seem to be more accurate at timing turning points, since it is not pegged to an arbitrary index like the S&P 500. The stocks you invest in may have nothing to do with the S&P… in fact, it can often be misleading.

Unfortunately, I’m not sure there is a way to do this in P123.

ideas ? Anyone try something like this ?

Hi,
There has been a discussion on Equity Curve timing:
http://www.portfolio123.com/mvnforum/viewthread?thread=3781#17621

It is definitely possible to do, using a sim or a separate portfolio to time a real portfolio. The advantage it offers over market timing is that it can get you out when the strategy appears to be breaking down, but, like real market timing, results can vary. In my tests I didn’t find a clear advantage of one over the other.

Denny, very good performance for such large stocks. Thanks for posting.

Don

thx. Great info on these forums.

Denny,
Thank you for sharing and all your contributions to this board

Denny,

Thanks for your professional & generous contribution of these excellent market timing rules to the board.

Regards,

Essam

Thanks, Denny. I, too, have been working on some variations on this theme. The nature of my work doesn’t allow me to be as explicit, but I can say that instead of moving averages of EPS, I’ve been working with momentum, e.g. close(0, epsnext) - close(16, epsnext), and the equity risk premium. This seems to work really well in large cap (R1000) space, but Denny’s approach works better in small cap and some mid cap space. I’ve also found, which is important for my work, that different market timing approaches work better depending on the starting capital, obviously due to liquidity. I know the forum is somewhat divided on the need for a longer data history, but I would pay to see how some of these ideas performed in the mid to late 90’s, during the bubble. Thanks again, Denny, for launching this thread.

Ted

Denny thank you for sharing,

I have not been able to produce an all-around better simulation than the one you posted unless I reduce the liquidity a lot. It’s nice to see a ranking system that works on both low and high liquidity. Building models just got harder. I always thought that market timing rules would improve 90% of Sims, but it seems that market timing rules only work with certain ranking systems, which only work with certain buy rules, using certain market timing rules and liquidity. I have tested the market timing rules across many ranking systems and buy/sell rules, and there is no way to make any generalizations about what works. I have found that your new market timing rules do improve long/short models for both stocks and ETFs. Here are some simple examples:

http://www.portfolio123.com/screen_summary.jsp?mt=9&screenid=28043

http://www.portfolio123.com/port_summary.jsp?portid=454895

Mark V.

Denny:

I like to avoid a single day’s value since I fear a one day market move is not indicative of a real market change. On the other hand, I must admit that the single day tests work well. The following rules seem to work a bit better:

Buy - sma(5,0,#SPEPSCY) > sma(21,0,#SPEPSCY) or sma(10,0,#bench) / sma(20,0,#bench) > 1.01

Sell - sma(5,0,#SPEPSCY) < sma(21,0,#SPEPSCY) & sma(10,0,#bench) / sma(20,0,#bench) < 1.0

A copy of your sim with the above rules is here. Gain/Stock/Day is 0.198, or 0.203 if you use 1.005 in the buy rule. Testing some other sims and rankings, the above rules seem to work a little better with all those I tested.

Thanks for sharing the rules, sim and ranking. The strategy and returns are impressive for the high liquidity stocks.

Glenn

Hi All:

I use some market timing rules, but I think a big cautionary note needs to be added to this discussion. We simply do not have enough data to come up with any really good ones. All we can do is make some intelligent guesses – and resist the urge to leverage our investments thinking we are smarter than the guys who ran AIG.

It is very, very easy to be “fooled by randomness” when developing market timing systems with insufficient data - and the data history we have is insufficient by a factor of 2 to 6.

This is in marked contrast to the relative abundance of data we have for testing ranking systems and sims. Testing a ranking system or sim set to weekly rebalance provides about 450 data points (if one forces all stocks to rebalance with a sell rule like rank < 101). Now 450 is considerably more than the minimum of 30 data points usually required for statistically significant results.

However, testing market timing only has a couple dozen data points at best over the past 8 years. For example, if one tries to develop a timing system that only reacts to the deep drops, there are just 5 or 6 such data points from 2001 to today (I’m looking at the R2K chart). Since one should have at least 30 data points for statistically reliable results, we have only 1/5 or 1/6 of the data history needed. If one wants a timing system that reacts to relatively mild drops as well as the deep ones, the total is 15-18, which is about 1/2 of the minimum. Yes, there are additional statistical assumptions one can apply to get results with fewer than 30 data points, but doing so involves increased reliance on the assumption that the data population is “normal”, i.e. “well behaved”, and we all know what happened to AIG when it acted on that belief.

After that big note of caution, I will admit to putting some market timing rules into real money portfolios. However, I expect that my future experience will be much less satisfying than back tests of these timing rules would indicate. I expect the rules will whipsaw me more often in real life than in the back tests. I also expect the rules will often be later at entries and exits than they were in back tests.

The one thing I know for sure is never, ever to use leverage on the assumption that the market timing rules will protect you from financial ruin. That is the one sure tip I can give for those developing timing systems.

Regards,
Brian

Oh, there is another tip I can add. When testing timing ideas, I divide the data history into 2 or 3 sub periods. When I test a market timing rule, I want to see it improve more than one period. If it only improves 1 period, the new rule may be “curve fitting” itself to noise rather than to recurring market patterns.
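In Python terms, the check I do looks roughly like this (the timed and untimed weekly return series are assumed inputs; this is just a sketch, not P123 code):

```python
import pandas as pd

def improves_multiple_periods(ret_untimed: pd.Series, ret_timed: pd.Series,
                              n_periods: int = 3) -> bool:
    """Split the backtest history into n_periods chunks and require the
    timed version to beat the untimed version in more than one of them."""
    wins = 0
    n = len(ret_untimed)
    for i in range(n_periods):
        sl = slice(i * n // n_periods, (i + 1) * n // n_periods)
        # Compare compounded returns over each sub-period.
        if (1 + ret_timed.iloc[sl]).prod() > (1 + ret_untimed.iloc[sl]).prod():
            wins += 1
    return wins > 1
```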

Regards,
Brian

Denny-
Thank you for the timing model.

I ran it and several other timing models against one of my sims from March 31, 2001 to today. Here are the results

| Timing Model    | AR | ATO  | DD | %W | Sharpe |
|-----------------|----|------|----|----|--------|
| Untimed         | 40 | 1335 | 78 | 53 | 0.8    |
| 13/34 crossover | 39 | 1297 | 24 | 55 | 1.2    |
| Your model      | 52 | 1367 | 33 | 61 | 1.5    |
| NewLows < 40    | 53 | 1426 | 23 | 62 | 1.75   |
| Both yours & NL | 51 | 1424 | 19 | 65 | 1.9    |

The crossover is a 13 day EMA and a 34 day EMA of the S&P 500. Buy if the 13 is above the 34.

The New Lows model is an exposure list I set up. If the 5 day moving average of New Lows on the NYSE is >40, sell.

The last test runs your model and the New Lows model together and only invests if both models are on a buy.

The run with both models operating comes up with a higher Sharpe number and looks smoother.
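For clarity, here is a rough Python sketch of the combined condition (denny_buy and the NYSE New Lows series are assumed inputs; in practice this was done with a P123 exposure list):

```python
import pandas as pd

def combined_buy(denny_buy: pd.Series, nyse_new_lows: pd.Series) -> pd.Series:
    """Invested only when Denny's model is on a buy AND the 5-day moving
    average of NYSE New Lows is 40 or below (the exposure-list rule)."""
    new_lows_ok = nyse_new_lows.rolling(5).mean() <= 40
    return denny_buy & new_lows_ok
```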

-Bob

Great stuff, all. One thing I have been paying increasing attention to is the concept of degrees of freedom. There are a couple of different definitions, but the one I think is most useful for our purposes is the number of data points minus the number of adjustable parameters in your model. The number of data points can be thought of as the number of rebalancing periods (e.g. 457 weekly periods currently in P123), and the adjustable parameters are all the variables in your ranking system (each moving avg term is a variable, so some factors add more to the count) plus all the variables in your Buy/Sell rules. Again, it’s not just the number of rules you have, but the adjustable parameters inside the rules that you have to add up.

I learned a long time ago in my physics and numerical computation classes that a good rule of thumb is to have 10 data points for each degree of freedom. You have to be careful, though, because if you attempt to increase the number of data points (daily instead of weekly, etc.) you run a serious risk of fitting your model to noise. Weekly is as frequent as I ever get.

So, with one of my real money portfolios, I went in and added up my degrees of freedom, and came up with 45. The number of weeks total in P123 history is 457, so I am right at the threshold. Adding too much more complexity to my strategy would put me in serious danger of curve fitting. I noted that a previous poster indicated they test their strategies over different time periods, and some of us also do the “random” ticker testing for further robustness checks. These are all good things to do, but don’t place too much confidence in the results, as the number of data points in each instance is significantly reduced, but your model is just as complex.
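A trivial back-of-the-envelope version of that arithmetic (using the numbers above):

```python
# Rule of thumb: roughly 10 data points per adjustable parameter.
data_points = 457        # weekly rebalancing periods currently in P123
adjustable_params = 45   # parameters counted across my ranking system and rules

print(data_points / adjustable_params)   # ~10.2 -> right at the 10:1 threshold
print(data_points - adjustable_params)   # 412 degrees of freedom by the definition above
```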

I really enjoy threads like this, because they allow me to test “substitute” market timing variables without adding more complexity. But I have stopped making more complex systems because of the serious data history limitations of P123. Although I agree that Short Sims are probably the next highest platform priority, we simply need more data. Not only would it alleviate the pressure induced by curve fitting, which we are all guilty of to some degree, but it would also allow one to look at performance during a completely different environment, one of chronically overvalued stocks. I wonder how well most of our value/momentum strategies would fare, and whether most of the timing rules would evolve or change. I know some others disagree, but there really is no argument. More data is absolutely necessary.
Happy Hunting,

Ted

Ted:

I fully agree about the benefit of a longer data history. The beating many of our value portfolios took in 2008 might have been reduced if we had a longer data history.

Regards,
Brian

Ted:

Great post on degrees of freedom! I hope it gets the attention it deserves.

I agree that we have 457 data points for testing ranking systems and sims (assuming they are rebalanced weekly and there is a forced sell of all stocks using a sell rule like rank < 101).

However, I do not think we really have anything close to 457 data points for testing the medium and slow timing systems I see discussed here. My gut is telling me many of these data points are dependent upon each other, but right now I can’t put a precise number on how many “real” or “useful” data points we have.

Let me make the problem concrete. Many of my value/momentum sims do better in 2001-2003 without any market timing (i.e. never exiting to cash), but these same sims really benefit from being in cash from 2008 up to early 2009. Now if I were to keep testing variables until I got a timing rule that keeps my sims in play for all but 2008, I would get super test results. However, such a rule would almost certainly be curve fit and would let me down in the future.

So in addition to having sufficient weekly data points (457 at present), should we also require that our timing rules generate a minimum number of signals?

This brings me to the 30 rule of thumb. I know that is a very rudimentary and somewhat arbitrary rule (25 is almost as good as 30, and 40 is not much better than 30). However, if my data points are closer to 5 than 30, I know I need to be cautious about trusting the results. For example, if I add a new timing rule and it changes 30 or more entry and exit dates, then I assume there is enough data to trust the results. However, if I add a new rule and it only changes 5 entry/exit dates, I do not think one can have any confidence in the results. One could easily be curve fitting to noise with just 5 data points.
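To make the counting concrete, here is a minimal Python sketch (the in-the-market series is an assumed input, not something P123 reports directly):

```python
import pandas as pd

def count_entry_exit_changes(in_market: pd.Series) -> int:
    """Count how many times a True/False in-the-market series flips.

    If a rule change only moves a handful of these entry/exit dates,
    the apparent improvement may just be curve fitting to noise.
    """
    return int(in_market.astype(int).diff().abs().sum())
```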

Since my understanding of statistics is limited, I would appreciate your opinion on the reasoning above, especially as it relates to testing timing systems.

Regards,
Brian

Ted,
I totally agree that we need to keep the ratio of data points to degrees of freedom (adjustable parameters) high. However, the key is INDEPENDENT parameters. Many of the typical parameters used in our ranking systems are not truly independent, especially for weekly rebalance.

For example, factors like Price to Sales, Price to Book, and Price to Cash Flow are not truly independent at weekly data points. The price will change weekly, but the other inputs will only change after the quarterly statements come out. So between quarterly statements they all change at the same ratio, equal to the ratio of the change in the price. Many other parameters change only at the quarterly statement, like ROA, ROI, ROE, etc. If we rebalance quarterly, then all the above parameters become independent. So we can’t just add up the number of adjustable parameters; we have to assess the true independence of each with respect to the others and the rebalance frequency.

Bottom line: the number of independent parameters will be a subset of the number of adjustable parameters for most ranking systems and rebalance periods. So in general the effective ratio of data points to degrees of freedom will be higher than you get by simply adding up the adjustable parameters.
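Here is a toy example of that dependence with made-up numbers: between statement dates the per-share fundamentals are frozen, so every price-to-X ratio moves by exactly the same factor as the price.

```python
# Hypothetical weekly closes between two quarterly reports.
price_week1, price_week2 = 20.0, 22.0
# Per-share fundamentals, unchanged until the next quarterly statement.
sales_ps, book_ps, cashflow_ps = 10.0, 5.0, 2.0

for name, per_share in [("P/S", sales_ps), ("P/B", book_ps), ("P/CF", cashflow_ps)]:
    change = (price_week2 / per_share) / (price_week1 / per_share)
    print(name, change)   # every ratio changes by ~1.10, the same as the price
```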

Denny
:sunglasses:

Denny-
I tried using your new timing model on another port, one you wrote called “Denny’s Mkts 200K 10 stks b2s2”, which uses the ranking system “Higher Value w/SI 59”.

This port has timing rules built in. On the buy side, the rule is:
benchclose(0) > sma(20,0,#bench) or sma(20,0,#bench) > sma(200,0,#bench)

On the sell side, the rule is:
benchclose(0) < sma(20,0,#bench) and sma(20,0,#bench) < sma(200,0,#bench)

Testing it out for all dates from 3/31/01 to today:

| Timing      | AR | ATO  | DD | %W | Sharpe |
|-------------|----|------|----|----|--------|
| Untimed     | 16 | 942  | 69 | 50 | 0.5    |
| Builtin     | 23 | 1118 | 38 | 51 | 1.04   |
| DH new      | 21 | 882  | 34 | 55 | 0.98   |
| NewLow < 40 | 33 | 920  | 16 | 63 | 1.65   |
| DH + NL40   | 27 | 892  | 13 | 63 | 1.69   |

The disappointing thing here is that the new timing system is not as good as the built-in moving average system. So different kinds of ports/sims work better with different timing systems.

-Bob

Degrees of freedom in regression analysis is part of an attempt to estimate the sampling variance of the algorithmic procedure used to pick a “best model” as a function of a data sample. In standard regression modeling with a mean square error criterion of goodness, the estimated error of a fitted model will be equal to the sum of the error from the inherent bias of the chosen model class and the sampling variance of the estimation procedure for fitting model parameters. Models with higher degrees of freedom have higher sampling variance precisely because of the way those parameters are adjusted to choose a “best” model as a function of some sample data. Bias and variance trade off directly against one another in this situation - e.g. an n-dimensional multivariate linear model that is chosen using shrinkage estimation will have more bias but lower sampling variance than the n-dimensional multivariate linear model that is chosen to directly minimize squared error in fitting the data sample.

One can’t say anything at all about the degrees of freedom in a fitted model without being explicit about the procedure used to select the parameter values. Qualitatively speaking, a model with a few parts that is arrived at by a long process of trial and error will have more de facto degrees of freedom than a complicated model that was created with less trial and error/optimization (and/or with regularization methods that trade off data-fit optimization against other criteria).

It’s very hard to say with stock market prediction what one should count as independent data points, and in what sense a historical sample can be thought of as being drawn from the same distribution as the future returns one is trying to predict. It’s clear, for instance, that absolute returns can be very different in different decades, and it’s also clear that as time goes by, different investment analysis strategies are first invented and later become widely adopted (e.g. looking at ROE and P/S), potentially changing the expected performance relative to a benchmark of stock selection strategies based on those criteria. IMO, the most natural approach is to think of a portfolio selection procedure that at each time t0 is calibrated looking only at data prior to t0 and is evaluated for performance relative to a benchmark in the following time interval from t0 to t1. By choosing the intervals small enough, one can then get a lot of samples.
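A bare-bones sketch of that evaluation scheme (fit and score are placeholder callables, and the window lengths are arbitrary):

```python
import pandas as pd

def walk_forward(data: pd.DataFrame, fit, score, train_weeks=156, test_weeks=13):
    """Calibrate only on data prior to t0, evaluate on the following
    [t0, t1) interval, then roll both windows forward."""
    results = []
    t0 = train_weeks
    while t0 + test_weeks <= len(data):
        model = fit(data.iloc[:t0])                       # past data only
        results.append(score(model, data.iloc[t0:t0 + test_weeks]))
        t0 += test_weeks
    return results
```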

Thanks Denny and Grok, good add-ons. Denny, I think I agree with you, but it wouldn’t necessarily prod me to build more complex models. Most quant equity managers I know use some form of principal component analysis to get around the multicollinearity problem. The best we can do is make sub-folders for the factors and try to group the most highly correlated ones together. All price-driven factors will have some level of correlation, so this makes sense. But since this is the case, why put in the redundant factors in the first place? One answer is that different factors’ contributions to return classification vary over time. So in truth, the model is more complex than just the groupings would suggest, but not as complex as adding up all the adjustable variables. I guess I choose to err on the conservative side.
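As a rough illustration of that redundancy (purely synthetic data, not actual P123 factors), the first principal component of a few price-driven factors typically carries nearly all of the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
price = rng.lognormal(mean=3.0, sigma=0.2, size=200)
# Three "different" factors that are all mostly driven by the same price series.
factors = np.column_stack([
    price / 10.0 + rng.normal(0, 0.05, 200),   # stand-in for Price/Sales
    price / 5.0 + rng.normal(0, 0.05, 200),    # stand-in for Price/Book
    price / 2.0 + rng.normal(0, 0.05, 200),    # stand-in for Price/CashFlow
])
z = (factors - factors.mean(axis=0)) / factors.std(axis=0)
eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(z, rowvar=False)))[::-1]
print(eigenvalues / eigenvalues.sum())   # first component explains most of the variance
```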

Grok, it’s been a while since my time series analysis class, but I totally agree with the basic trade-off of bias and variance. Again, I prefer variance over bias, but that’s just me. One can split the data into smaller segments, but noise really is a problem in the financial markets. One backtesting method I would love to see in P123 is walk-forward testing. I know it’s probably computationally tough in a server environment, but couple that approach with a longer data history, and now we’re talking! Happy hunting all.

Ted

Marc’s article in Forbes about this model is what brought me to P123. There’s a world of less sophisticated investors like me who don’t want to scan huge stock lists weekly. For those of us with simpler needs, I ask the following (and hope I’m posing this question in the right forum):

Questions
The rules include a couple of symbols I can’t find in either the Help or Tools.

  1. Where can one find #SPEPSCY and #SPRP?

I understand what they mean; I just can’t find them anywhere on the site, figure out how to bring them into a screen, or download the data associated with them. They also don’t fit with what I’ve read so far about custom formulas.

  2. I’ve found the EPS Estimate factor in the Tools section, but it appears only to graph the distribution of estimates for S&P 500 stocks. How does one calculate an overall estimate with this function? (which it seems is what #SPEPSCY does)

Comment:
The Step by Step section of P123 does a great job of walking users through the normal operation of the screener and simulator, but you (we) may need to add a job aid for newbies like me that explains the mechanics of functions and factors and how one gets from them to symbols like the ones in question 1.

Thanks for the great work and environment here.