Benefits of Diversification

With recent talk about large vs. small ports, I decided I would make the case for diversification. If you disagree please do say; I thrive on considering viewpoints and counter viewpoints.

Here is my argument:

Scientifically, to be confident of a result, it has to be “statistically significant”. Every measured result is subject to errors, which have to be considered: if the error is too large relative to the observation, then few conclusions can be drawn.

This applies to ranking systems, and indeed to real-time performance and simulation. There are two obvious sources of “error” that can cause future real-time performance to fall short of simulated performance:

  1. Benefit of hindsight while constructing ranking system; look-ahead bias
  2. Change of market conditions: factors falling out of favour (RE: Price2Sales).

There are then the less obvious sources of “error”, and these are the effects of the randomness of stocks. However, unlike the above two errors, which are systematic and unknown, the “random error” problem is more tractable. Unfortunately, “random errors” creep in twice:

  1. In the construction of the ranking system
  2. In the real time use of the ranking system.

To illustrate the point of random errors, I have created a “random” ranking system, and ranked using my minimum liquidity prescreen. This is a universe of just under 5,000 stocks.

The “random” ranking system is useful because it allows us to construct “dartboard portfolios”. This is like selecting your portfolio by pinning tickers to a wall and throwing darts at them; it is akin to the academic “random portfolio”.

Now, I did some tests, and an equal-weighted portfolio of all stocks in the minimum liquidity universe has returned about 10.25% per annum since March 2001. Please note that this is greater than the return of the S&P 500, but the latter index is not equally weighted, with large companies dominating the return of the index.

My question is this: Given the dartboard portfolio, how close can you get to the “average” return, and more specifically, how likely are you to avoid a substandard return?

Quite often, one looks at the (admittedly arbitrary) 95% confidence level, so I want to ask: for any given portfolio size, what level of return below the “average” do I have a 95% confidence of achieving? Specifically, in percentage terms, how far below the average return can a portfolio deviate within that 95% confidence?

First of all, look at the random ranking system with 5 buckets. Each “portfolio” represents 1,000 stocks being churned every 4 weeks (as with all the others). There is some variation, but every bucket is pretty close to the average 10% return. To see the 95% confidence level, you would have to look at the bucket at the top of the worst performing 5% of buckets - obviously not possible with 5 buckets, but possible with 20 or more buckets.

Now look at the system with 20 buckets. This time each bucket contains around 250 stocks, which is actually far more than is frequently considered “necessary” to be diversified. Yet despite this, look at the variation in performance. The bottom performing bucket has a return below 8% per annum, a full 2 percentage points below the average. Implication: even with 250 stocks in the portfolio, the 95% confidence level suggests you could underperform by as much as 20% of the average rate of return! Put another way, there is a 5% risk your return is 20% below the average.

Now let’s consider 50 buckets, the next picture down. To look at the 95% confidence level, I consider the 2nd worst performing bucket - a return of under 7%. With a (roughly) 100 stock portfolio, there is a 5% risk of underperforming by as much as 30% below the average.

Now 100 buckets (50 stocks per bucket): I look at the 4th worst performing bucket - a return of 4.5%, fully 55% below the average return!

Now 200 buckets (25 stocks per bucket): Look at the 9th worst performing bucket - a return of just 2%, an incredible 80% below the average return!

Conclusion: 25 stocks is nowhere near enough to be statistically confident of an acceptable return. (let alone 3-5 stocks).
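For anyone who wants to play with the numbers, here is a minimal Monte Carlo sketch of the dartboard experiment. The universe size, the mean return and the dispersion of single-stock returns are made-up stand-ins (not the actual P123 data), so treat the output as illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

UNIVERSE = 5000        # roughly the size of the minimum liquidity universe
MEAN_RETURN = 0.1025   # hypothetical "average" annual return of the universe
STOCK_STDEV = 0.45     # hypothetical dispersion of single-stock annual returns
N_TRIALS = 10_000      # dartboard portfolios drawn per portfolio size

# Hypothetical cross-section of single-stock annual returns.
stock_returns = rng.normal(MEAN_RETURN, STOCK_STDEV, UNIVERSE)

for n_stocks in (1000, 250, 100, 50, 25):
    # Each trial throws n_stocks darts (without replacement) and equal-weights them.
    portfolio_returns = np.array([
        stock_returns[rng.choice(UNIVERSE, n_stocks, replace=False)].mean()
        for _ in range(N_TRIALS)
    ])
    worst_5pct = np.percentile(portfolio_returns, 5)   # 95% of portfolios beat this
    shortfall = (stock_returns.mean() - worst_5pct) / stock_returns.mean()
    print(f"{n_stocks:5d} stocks: 5th percentile return {worst_5pct:7.2%}, "
          f"{shortfall:5.1%} below the average")
```

The shortfall shrinks roughly with the square root of the number of holdings, which is the usual standard-error behaviour and matches the pattern in the bucket charts.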

Now, these were random buckets, but if you imagine their performance in terms of “deviation from an index”, the results are irrefutable. If you have a ranking system that puts you into “value” because you believe “value” will outperform, then effectively your universe is now “all value stocks”. However, even if that index does well, your result may still be very poor due to random fluctuations; you need a large number of stocks to be statistically confident of achieving the good result and outperforming the market.

I believe this sort of effect explains why ranking systems such as TF12 don’t have their top performing bucket as 99-100 in 2007, but if you divide into 20 buckets, the top bucket retains its top performance.

Comments?


random5buckets.png


random20buckets.png


random50buckets.png



Olikea -
One thing that needs to be addressed in the small port versus large port or diversification argument is the “Denny approach”. That is to have only a small number of stocks in several ports, the strategy for each port being: if a stock is under-performing, sell it and buy another stock which may perform better in the future. This kind of makes sense because most people don’t want to buy all the stocks in a top ranking bucket. As you indicate in your post, the stock holding process is hit and miss. There might be several hundred stocks in one ranking bucket but you only want to buy a few. So if the stock you are holding isn’t working out, then dump it and get another one with the chance of the new stock performing better.

The question I pose (and I don’t know the answer) is whether or not the strategy makes sense. The things that need to be considered are:

The Denny approach assumes certain statistics regarding any (random?) trade. But if you sell an under-performing stock then do the statistics still hold? The sale causes a position to be prematurely aborted. What I mean by this is that the stock is still in a particular ranking bucket but it has been dumped. Would the sold stock be likely to continue under-performing into the future or will there be some mean reversion (i.e. will it start overachieving) as per a pullback type system?

How does one evaluate the ranking system for this type of strategy? Obviously some momentum type ranking systems would likely be better than others with regards to future performance versus previous performance.

How does one evaluate multiple systems with a small number of stocks versus one system with a large number of stocks? The multiple system approach likely means some overlap of holdings. How is this addressed in back-testing etc.?

Steve

Hi Olikea,

I understand your analysis comparing random distribution, number of stocks and the average return for the random distribution. I need to make sure that I understand the shift of that logic to a ranking system.

If I understand (what I do understand) correctly, you are equating a ranking system, such as a value ranking system (let’s say VALRANK), to an index (let’s say VALIDX).

If that is the case, then I think I need to clarify/highlight a few points.

  1. VALIDX is equivalent to only the top fractile (bucket) of VALRANK. (agreed)
  2. To hope to achieve the top fractile performance, we have to have enough of the stocks in the top fractile to ensure with a high degree of probability that we will get the top fractile performance. (agreed)
  3. The top fractile performance going forward is not necessarily going to be equivalent to its historical performance in the ranking system, nor is it going to be better than the lower fractiles (assuming it was better in VALRANK historical results). (agreed)

So then the question that I have becomes: how does one “ensure” the relative performance of the top fractile over the lower fractiles? Pick fatter fractiles (5 buckets vs 10 vs 50 buckets)?

I think the approach to answer this question is to examine the performance AND volatility of the top fractile as compared to the other fractiles. And when I realize this, then I see a new feature we need: for each fractile, show the solid bar as its average annual return (compounded average return is what we have today?), show an inside bar (or whatever makes sense for display purposes) as the fractile’s standard deviation (for a requested period - e.g. monthly, annually), and finally, show in a candle bar format (again, whatever makes sense for display purposes) the fractile’s high/low return (same periodicity choice as above) over the total period under study. Then a) we would have a better idea of whether this was a fractile we wanted as an index, and b) we could drop the statistics into a correlation analysis/efficient frontier analysis and pick the right combo and weight of ports to own. BTW, this also points out why we need more historical data…

So, I agree with your thoughts on how we need sufficient numbers of shares to capture the performance of a fractile with statistical confidence, and I agree it can be used as our index. But I don’t think we have enough in the way of tools to assess our index as we do indices of other asset classes, and whether we would even want it as an index.

Finally, with regards to Stitts’ comments, I would say that that is the introduction of “active” investing - stock picking - vs. the above, which is an index based approach, a la O’Shaughnessy’s “strategic index construction.”

Carl

To be honest, I am not sure I entirely understand the idea of “If I bought a loser I sell it and try to buy a winner.” Trend following only seems to work at multi-month time horizons; anything more short term is just noise. In my view, liquidating a position just because it goes a bit negative seems to disallow a “mean reversion” process.

However, it is also possible that trend following, a la the “Turtles”, may be at work… buy on extreme strength with a tight stop, keep losses small, and follow through the trends. However, these types of trading are characterised by a low win rate, typically 35%, which is not what Denny seems to be suggesting.

Of course, we would all like to buy winners, but absent a crystal ball I don’t see how that is possible…

In a certain sense I agree, but in a certain sense I don’t. I think there is enough information to make a “coarse” judgement: that maybe, to be really statistically confident, you want 100 (or so) stocks - which is, nevertheless, quite a bit higher than the conventional wisdom.

Unfortunately, the market doesn’t really bother to obey the laws of statistics (fat tails etc.), so even if we think we have something “statistically confident”, it may not be. However, if we know we have something that is “statistically unconfident”, then it really, really is unconfident!

Oliver,

Well, your first post is very interesting. Your statistics are absolutely correct. But the main problem with statistics is that if you don’t define the real problem correctly, the statistics can’t help you. I find it interesting that if we threw darts over the last 6 years we would have a 95% chance of achieving the 10.25% annual return. That’s not so bad! So then, that’s the baseline. I think that everyone at P123 feels that we can do much better than throwing darts.

So what real assumptions can we make that will narrow down the universe of 5,000 stocks to 20 or 50 for us to buy and still have a high probability of beating this market?

There is a group of companies that are just bad companies. When the market is rising these companies don’t. They have no increase in sales. They never paid a dividend. They are getting smaller and smaller every year. They have negative earnings and earnings are getting worse. They are going bankrupt. They got sued. They have little chance of increasing in value (maybe 5% probability?). You pick the reasons you like best, but it is easy to screen them out.

There are many reasons stocks have increased in price for decades. Sales are increasing. Earnings are increasing. Profits are up. Projections are up. New products enter the market. The economy is booming. Demand is high, and supply is low. They have Momentum. They have Growth. They have Value. Interest rates are falling. You pick the reasons you like the best, but it is easy to screen for them.

So we create Ranking System that finds a subset of stocks that have a higher probability of increasing in value than the baseline above. I’ll use Dan’s Excellent Only Optimize Ranking System that achieves 71% annual return in the top bucket (out of 200 using an AvgDailyTot(60) > 100000 filter). I ran a simple Sim here to test the Ranking System and found that the stocks ranked > 99.5 achieve 71% annual return just like the Ranking System. The Sim also experienced a 28% drawdown last July/August which doesn’t show up in the Ranking System performance. Individually the stocks averaged a 7.61% gain when held for an average of 42 days, and there were 58% winners.

We wonder; what is the probability of actually achieving these returns? Well the standard deviation of this Sim is 27.8%, about twice the S&P 500. We lost 24% during the 2002 recession and a big 28% just last July/August. We would hate to have started a Port based on this Sim in July! But we still haven’t answered the Big Question posed above. Well, we aren’t going to be able to. There is not enough data to be able to declare that we have a significant statistical sample to be able to draw reliable conclusions from the results.

So, because we can’t show a 95% probability that we will achieve a 71% return in the future, we therefore can’t risk putting real money into this approach. BULL!

What else do we know? Examine the 15 factors and functions in the Ranking System. Each and every one of them has been used, referenced, examined, and tested over many decades of market data, and declared to be among the factors purported to be the way to screen for high return stocks in many of your favorite “best investment” books. There is nothing new here. They have just been assembled into a Ranking System that suggests the probability of higher returns when taken in total. Is there excessive optimization here? Probably so. That may be the reason the Sim lost twice as much as the market lost in July/August. However, these 15 factors have proven their mettle over many decades.

Individually these factors are not broken. So to declare the combination in the Ranking System broken is not logical. It was just optimized for different market conditions than we are experiencing now. There is probably a different weighting that would work better on average for all market conditions, but we don’t have the data to determine that. We don’t have decades of data to optimize over. We may have to wait for the next Bull Run for it to show its worth.

What do I think the bottom line in this is? I think that we have to take the total body of experience and knowledge of many investment experts that our Ranking Systems are based on and believe in that evidence. There is much more statistical evidence of the probability of high returns than we can ever get from the P123 data alone.

I stepped out and invested in a Port based on Dan’s Ranking System in January 2006 and I have made A LOT OF MONEY with it. Yes, I lost 18% in July/August (My real Port is better than the one above), and this may not be a good time to even be in the market (Recession anyone?). If I had waited until we have enough P123 data to show a high statistical probability of these high returns before I put real money into P123 Ports I would have missed out on some very big profits.

Denny :sunglasses:

Denny - nice post, thank you for replying in such a detailed way… let me just respond to a few points:

While in principle I agree, let me just play devil’s advocate for a moment; academics have the “efficient market hypothesis”, whereby all known information is already discounted and fully reflected in the share price. The share price can, therefore, only move in response to some previously unknown information becoming known. Since unknown information is (by definition) random, stock prices follow a “random walk”.

Now, I don’t for one minute believe markets are perfectly efficient, but I think the other extreme view, the idea that markets are “perfectly inefficient”, is wrong as well. If you read a lot of investment books, they make out it’s quite easy, just pick “healthy” growing companies etc. etc.

I just don’t think the market really allows you to do that. It may not be perfectly efficient, but it is fairly efficient, and this makes stock prices unpredictable, at least on a case-by-case basis.

It is my view that the “edge” we can get by using tools like p123, is that other investors, collectively, are biased towards not being sufficiently focused on quantitative factors. When I first read “Contrarian investment strategies”, that was my first conclusion:- he showed value stocks had outperformed for a long time. The only reason to buy value stocks is a “quantitative” factor, I think investors (arguably humans in general) have a bias to focus on “qualitative” factors.

Now, I will admit I am on increasingly shaky ground when it comes to my understanding of statistics (it was never my favourite subject!). However, if you consider an argument something like this (and yes it may have more holes than swiss cheese):

In the experiment with the “random” ranking system with 200 buckets, 95% of the buckets had a return greater than 2%. The average return was 10%. Truly, the performance of each bucket was affected by “random” factors, and at the 95% confidence level, this could cause a result up to 80% lower than the average result.

Take the TF12 system with the top bucket of 71%. What I am suggesting, is that you could argue there is a 5% chance that 80% of that performance is due to “random” noise, rather than any systemic effect. In fact, thinking about it, given the bucket was specifically designed to have a high return, it is very hard to know how much of the return is down to a statistical fluke.

If you then run in real time, there is a 5% chance your return will be 80% below the stated return of 71%. However, if you increase the bucket size, that number goes down. Even though the top 100th bucket may have a smaller return than the top 200th bucket, the chances of achieving a result close to the stated result are a lot higher.

I am sorry, but Price to sales really is broken: http://www.portfolio123.com/mvnforum/viewthread?thread=3034

And the factor makes up about 21% of the TF12 ranking system…

Large stock portfolios based on TF12 have not broken down in the same way as the small ports based on it, which have felt the full brunt of the PSR meltdown:

http://www.portfolio123.com/port_summary.jsp?portid=290818

In my mind, these sorts of drawdowns are unacceptable given that we are not actually in a bear market (yet!).

Perhaps I am biased; I have only really seen the downside of small stock ports, frequently performing dismally; larger ports have performed much better. Perhaps when things turn around small ports will perform wonderfully. But I know this - if you have a drawdown of 50% you need a 100% gain to get back; too much volatility really is bad for the long term equity curve, no matter what the expected returns!
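The drawdown arithmetic is worth spelling out, because the gain needed to recover grows much faster than the drawdown itself. A one-liner makes the point:

```python
# Gain required to break even after a drawdown: gain = dd / (1 - dd)
for dd in (0.10, 0.25, 0.50, 0.75):
    print(f"{dd:.0%} drawdown -> {dd / (1 - dd):.0%} gain needed to get back to even")
```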

Oliver,

I got to wondering how a factor like Price to Sales fails after so many decades of working, and why stocks with the best Price to Sales perform worse than lower ranked stocks. So I set up a Ranking System here with Price to Sales as the only factor. I ran the Ranking System performance from 7/14/07 to 1/05/08 using the filter AvgDailyTot(60) > 100000 & Price > 1 (4,925 stocks pass this filter when using the Screener). Using 200 buckets, Price > 1, and a 1 week rebalance frequency, I got a whopping -81% annual return in the top bucket. This will obviously drag down even the best Ranking System.

So I decided to look at the stocks that were in this top ½% bucket. I ran the Ranking System Rank for 7/14/07 to see which stocks the system would have ranked the highest prior to the July/August market decline. Without the above filter, there are 41 stocks in this top bucket. With the filter, this is reduced to 26 Stocks. I then examined the charts of these 26 stocks.

I looked at the price on July 14 and compared it to the high price in the previous 6 months, and rounded the % change to the nearest 5%. There were 2 stocks that were at their high for the 6 months. There were 3 stocks that had lost 10% or less; 8 stocks lost 15% to 20%; 3 lost 30%; 4 lost 40%; 1 each lost 45%, 50%, 55%, 60%; and 70%.

I then compared the price change between 7/14/07 and 1/5/08. There were 2 stocks that had a gain: 1% and 20%. 3 stocks lost 20% or less; 4 lost 30% to 40%; 1 lost 55%; 5 lost 60% to 70%; 3 lost 70% to 90%; and 7 LOST OVER 90%! Among these 7 are 4 stocks in bankruptcy! Losses of over 90% in only 6 months! No wonder the top bracket failed!

So, what is happening here? I did a quick check of the top ranked stocks in October and again in December. They show that the stocks with very large drops in PRICE are rising to the top of the ranks. This is partly because their sales and P123’s sales data are not falling nearly as fast as the price. Normally this would signal “good” value.

I think I know what is happening. When the sales data is as much as 3 months behind the price, and the price is collapsing, this will cause the Price to Sales ratio to be excessively low from a “good” value point of view. When the market is experiencing a downturn, many more companies experience very large drops in price, and therefore, the stocks with the lowest Price to sales are not necessarily the “value” stocks they are thought to be.

Knowing this, how can we filter out these stocks that have excessive price drop? A few approaches come to mind.

One is to add a minimum Price to Sales filter. I added a Boolean filter, Pr2SalesQ > 1, to my Price to Sales Ranking System just to see what would happen here (I have no idea if this is a good value to use). 3,203 stocks pass this and the above filter when using the Screener, so I am removing about 1/3 of the lowest Price to Sales stocks. With this system the top bucket experienced -25.7% compared to the -81% above when run from 07/14/07 to 01/05/08. Much better, but still not something we would like to add to our Ranking Systems for markets in a downturn. I mean, after all, we threw out 1/3 of the stocks.

Another approach is to put a limit on the maximum a stock’s price can fall over a recent time frame. So I tried Close(0) > Close(200) as a Boolean filter here. With this system the top bucket experienced -40%, so that didn’t work either. Anyone else have any good ideas?

Next, I looked at Dan’s TF12-05 Ranking System. There are 5 factors out of 15 that have Price ratios in their calculations. They are: Pr2CashFlQ; Pr2CashFlTTM; Pr2SalesQ; Prc2SalesIncDebt; and Prc2SalesIncDebt Vs Industry. Cash Flow, Sales, and Sales including Debt are important functions for evaluating value in a stock. So we need to keep them. However, it is the ratio to price that is causing our current problem with these stocks that are collapsing.

So I removed the 5 functions and replaced them with similar functions using change in value instead of price ratio:

Pr2CashFlQ became FCFQ / (FCFA/4)
Pr2CashFlTTM became FCFA / FCFPY
Pr2SalesQ became Sales%ChgPQ
Prc2SalesIncDebt became ((MktCap + DbtLTQ) / SalesQ) / ((MktCap + DbtLTA) / (SalesA / 4))
Prc2SalesIncDebt Vs Ind became ((MktCap + DbtLTQ) / SalesQ) / ((MktCap + DbtLTA) / (SalesA / 4)) Vs Ind

In this new Ranking System I used the same weighting that was used in Dan’s original system for these factors. I ran the Ranking System Reverse Engineering and found that some of these factors had a poor effect on the gain of stocks. So I removed them and tried a few different weightings. The best system was this one, and it actually had an increase in annual return of +8.5% in the top bucket from 07/14/07 to 01/05/08. It had an annual return of 57.5% in the top bucket from 3/31/01 to 1/05/08.

I then ran simple 20 stock Sims using Dan’s original system and the new system through 2 time periods: 03/31/01 to 01/05/08 and 07/14/07 to 01/05/08. I used the above filter, AvgDailyTot(60) > 100000 & Price > 1, on the Ranking System (Page 2 of the Sim), no buy rules, and Rank < 101 as the only sell rule with “Allow sold stocks to be re-bought at current rebalance” set to Yes.

For Dan’s original system, the first period Sim had an annual return of 81.7% with a 26.8% max drawdown. It had a return of -23% and a max drawdown of 25% for the second period Sim.

For the first period, the new V5 system Sim had an annual return of 69.4% with a 23.1% max drawdown. It had a return of +4% and a max drawdown of 20.5% for the second period Sim.

Although the new version significantly outperforms the original version since 7/14/07, the reduction of 12% in the overall annual return from 3/31/01 resulted in a reduction of the portfolio value from $5.7 Mil to $3.5 Mil, or a loss of 40% of the potential gain.

Some of you may decide that this new version fits your needs better since it does not “fail” due to the Price to Sales problem. Personally, I don’t think that it is worth it. I would prefer to use the original system, and through the use of better buy and sell rules avoid the problems caused by the large drops in price that caused Price to Sales to fail.

Denny :sunglasses:

Hi,
In my mind, diversification is up there with the power of compound interest as one of the wonders of the world. It can improve returns and decrease risk at the same time. But when diversifying you have to take care that you are not “di-worsifying” your portfolio by expanding it into sub-par stocks. By going too far down into the rankings you are more likely to be selecting stocks that provide lesser performance and so drag down the performance of the overall portfolio. The assumption here is that the ranking is effective. If so, then even with random effects the expectation is that you have an edge selecting the higher ranked stocks over the lower ones. Rather than discuss the statistics, I think the question that matters is: does the ranking give you an edge? If you think it does, then why ignore that edge by selecting lower ranked stocks?

The attached screen shot shows metrics for combinations of 5 stock sims. The first section shows the performance of each sim. The next section shows the performance of combined portfolios using 2, 3, 4, etc. portfolios (this assumes that each portfolio contributes equally to the performance of the combined portfolio each day - this probably skews the advantage of diversification a little, but it’s much easier to calculate). The last section shows the benefit of diversification vs. the average of the portfolios being combined, in terms of improved annual return, reduced drawdown and improved Sharpe ratio.
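Roughly speaking, the equal-contribution combination described above can be sketched as follows. The daily return series here are randomly generated stand-ins for the actual sims, so only the mechanics (not the numbers) carry over:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily returns for eight 5-stock sims over roughly four years.
n_days, n_sims = 1000, 8
sim_returns = rng.normal(0.0012, 0.02, (n_days, n_sims))

def stats(daily):
    equity = np.cumprod(1 + daily)
    annual_return = equity[-1] ** (252 / len(daily)) - 1
    max_drawdown = 1 - (equity / np.maximum.accumulate(equity)).min()
    sharpe = daily.mean() / daily.std() * np.sqrt(252)
    return annual_return, max_drawdown, sharpe

# Combine the first k sims, each contributing equally every day.
for k in (1, 2, 4, 8):
    combined = sim_returns[:, :k].mean(axis=1)
    ar, dd, sharpe = stats(combined)
    print(f"{k} sims combined: AR {ar:6.1%}  maxDD {dd:5.1%}  Sharpe {sharpe:4.2f}")
```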

The benefits of diversification clearly diminish as more and more stocks are added to the combined portfolio. There is very little benefit of the 40 stock portfolio over the 30 stock portfolio. Were this example to be extended to yet more stocks, you may still see some small incremental benefits in risk reduction, but sooner or later you would start to see reduced returns as well. I’m looking for the sweet spot with high returns and minimal risk. Seems to me you are more concerned about risk, and that is really a personal preference.

Don


Diversification.png

I have not finished reading The Black Swan by Taleb yet, but if I understand his assertions correctly, we should not consider the standard deviation as a measure of risk because the standard deviation will not correctly estimate the true risk.

A standard deviation assumes that the distribution of possible outcomes follows a Gaussian distribution. However, financial results vary by magnitudes many times greater than one would find in a Gaussian distribution.
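As a rough illustration of that point, here is a sketch comparing the chance of a large monthly loss under a Gaussian and under a fat-tailed Student-t with the same standard deviation. The 5% monthly volatility and the df=3 choice are arbitrary, purely for illustration:

```python
from scipy import stats

sigma = 0.05                               # assumed monthly return standard deviation
normal = stats.norm(loc=0, scale=sigma)
df = 3                                     # low degrees of freedom -> fat tails
t_scale = sigma * ((df - 2) / df) ** 0.5   # rescale so the t has the same stdev
fat_tailed = stats.t(df, loc=0, scale=t_scale)

for name, dist in (("Gaussian", normal), ("Student-t (df=3)", fat_tailed)):
    # Probability of a monthly loss worse than 3 standard deviations (-15%)
    print(f"{name:16s} P(return < -15%) = {dist.cdf(-0.15):.3%}")
```

The fat-tailed distribution assigns several times more probability to the extreme loss, which is why a standard deviation fitted to calm periods can badly understate the true risk.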

I still see value in the standard deviation in order to compare volatility and risk between two or more different systems. However, if Taleb is correct, and I believe that he is, then we cannot use standard deviation to find a “sweet spot” for sufficient diversification.

In fact, Taleb makes a strong case that we cannot use past data to predict future returns or the volatility of those returns. My favorite is his example of the turkey that looks forward to his dinner. He has been fed at the same time for 1,000 days. He expects that tomorrow he will be served just as he was in the past… He cannot know that tomorrow is Thanksgiving.

This does not mean that back testing is not useful for achieving alpha. However, we will almost certainly not achieve returns or volatility similar to those of our back testing.

When looking at diversification, I would suggest that we should look not only at the number of securities, but also at exposure to sectors and industries. The returns will most likely be lower in both our back testing and in real life, in the short term. However, Taleb asserts that financial markets experience unexpected volatility that can wipe out many years of gains in weeks or days. We cannot capture this change in future volatility by back testing 5, 10 or 20 years.

I am not certain how many stocks are needed to achieve sufficient diversification. However, if the idea that 30 or 40 stocks is sufficient is based on the lowest standard deviation, then I am quite certain that we are being misled by a risk measurement that we may not sufficiently understand.

Essentially - and here lies the key problem - all backtests are conducted with the benefit of hindsight - you already knew what happened and what worked.

Therefore you are introducing “look ahead bias” the moment you start the optimisation process. Information about the future is leaking into the past, and as a result the performance of ranking and sims is almost certainly going to be overstated.

Now, here lies the issue I have - I believe the extent to which they are overstated is inversely proportional to the number of stocks in the sim.

The problem is that ranking systems have been designed using data (not available during the course of the simulation) to give the highest returns to the top bucket, and shifting weightings is what helps to do this.

What we get down to is the granularity of the ranking system. Ranking buckets cannot be subdivided indefinitely; ultimately you are limited to a universe of 5,000 stocks. If you choose 5,000 buckets, are you likely to find that the top 1/5000th stock outperforms the 2/5000th stock, and that in turn outperforms the 3/5000th stock? Not likely!

The issue is: if it doesn’t work on a stock-by-stock basis, but it does work on “buckets” of stocks, at what point is the bucket large enough to “guarantee” you will benefit from the ranking system?

I am making the case that the number of stocks required is larger than we might think. Simply running simulations and showing that a 10 stock portfolio outperforms a 20 stock portfolio, which outperforms a 50 stock portfolio, is not necessarily meaningful: you have designed the system to work that way, and it may be that all the additional performance from going from 50 to 10 stocks is down to look-ahead bias.

I showed in this thread:

http://www.portfolio123.com/mvnforum/viewthread?thread=2998

that while in backtesting a 5 stock version of TF12 outperforms a 50 stock port, in an “out of sample” period the reverse is true.

If you look at TF12 (highly optimised) and my first generation ranking (very simple), then if you take 200 buckets in the out of sample period, you can clearly see the top 1/200th bucket has not been the top performer. However, when you switch to 20 buckets, the top 1/20th bucket is still the top performer in both cases, out of sample. The ranking systems really have worked; they have delivered alpha in real time. However, there really is a limit to the granularity you can take, below which the ranking systems start to break apart and become dominated by randomness.

I think the comment about the Taleb book is very true. The future may contain many surprises and changes that we cannot predict, such as the failure this year of price-to-sales.

A carefully fine tuned system is much more vulnerable to a shift in how markets behave - you can see this very clearly in the thread about 5 stock vs. 50 stock portfolio. Perhaps some will find this an acceptable risk vs. the expected return. (if it really is achievable). Maybe that is true, but with such large drawdowns seen in some of the small stock portfolios, it doesn’t seem prudent to put all your money into such strategies.

Now, in the book “Way of the Turtle”, Curtis Faith makes the argument that optimisation is always worthwhile, but that you should understand that the more you optimise, the more you will overstate the backtested results (I highly recommend the book). The returns from a 10 or 5 stock portfolio may well be significantly overstated, and the “real” performance much closer to that of a 50 or 100 stock portfolio. Perhaps, given the drawdown characteristics of such small ports, the risk/reward starts to look much less favourable.

Ultimately, have a look at the graphs at the start of the page. Look at the variability of the 200 bucket ranking system. Yet these are 25 stock portfolios. I really think this should be cause for alarm. Randomness can deliver a big part of returns. If you imagine the “worst case” scenario, where you optimise a system but your top performing bucket has a lot of performance due to randomness, then you run in real time and randomness then works in opposition, you could seriously underperform expectations.

I feel I am making Taleb’s case. The world is quite random, and it is possibly better to have coarse models rather than finely tuned models. Possibly. The jury is out.

Oliver,

I am still in the camp that 10 different 5-stock ports gives better diversification than one 50-stock port. And most likely, a better return.

Now, I do agree with you that 5-stock simulation tests do rely a lot on randomness, and you need to be careful with this. I think a great test of robustness would be if you could design a simulation that picks at random any stock out of the top 10 available stocks. A 5-stock sim that picks the highest ranked available stock (meeting all buy rules) does depend a lot on the actual stocks it picks. But what if it randomly picked from the top 10 available stocks? If you ran the same sim with this randomness, and all tests finished well, then you have a good case for robustness.

For example - rank values go down to hundredths of a point (e.g. 99.97). What if you could add to your buy rules: rank ends in *.*7? Then you re-run the sim with rank ends in *.*6. And so on. This would be one way to test for randomness. Does anyone know of a way to do something like this?
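Outside of P123, the spirit of this test - repeatedly drawing a 5-stock port at random from the top-ranked candidates and looking at the spread of outcomes - can be sketched like this (the returns of the hypothetical top 10 stocks are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical annual returns of the ten highest-ranked stocks that pass the buy rules.
top10_returns = np.array([0.55, 0.10, -0.20, 0.80, 0.05, 0.35, -0.10, 0.60, 0.25, 0.15])

# Each trial holds an equal-weighted 5-stock port drawn at random from the top 10.
trials = np.array([
    top10_returns[rng.choice(10, 5, replace=False)].mean()
    for _ in range(5000)
])
print(f"mean {trials.mean():.1%}, best {trials.max():.1%}, worst {trials.min():.1%}")
# A wide spread between best and worst flags a system whose backtest depends
# heavily on which of the eligible stocks it happened to pick.
```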

Brian

Interesting idea.

I just added random>0.5 to a stock simulation from the VladsNewGrowth stable.

Nominal run 5 years of data:

AR / DD / Sharpe
64.26% / -23.73% / 1.98

Then the next 4 runs with the new buy rule added (no other changes between runs).

AR / DD / Sharpe
47.37% / -27.49% / 1.42
25.04% / -38.08% / 0.74
36.12% / -23.27% / 1.14
43.16% / -39.26% / 1.30

In all cases the port remained almost 100% invested throughout - mind you, there are only 170-200 trades in the period, so the volatility of the result is not too surprising. Still, it does look like a useful approach to getting a measure of ‘the random effect’, albeit one that forces the port further down the rank scale to stay invested.

How does the RANDOM function work as a buy rule? Does the model go through all of the other buy rules, come up with the list of stocks that meet the buy rules, and then pick one of those stocks at random, instead of picking the highest ranked stock?

Andrew, how far down the list does your sim go in order to stay 100% invested? I would think that if you go too far down, you start defeating the purpose by straying too far away from the top stock. I’m hesitant to use the RANDOM function if I don’t know exactly how it works. But I think you catch the drift of what I’m thinking. Thanks for your reply and tests. If we could add a little bit of randomness without straying too far away, I think that can help us test our models for robustness.

Brian

I think this is an excellent idea…

the “random()” function will generate a pseudo-random (random enough!) number between 0 and 1; having random() > 0.5 is the equivalent of a coin flip. The stock has to pass a “coin flip” test, just like any other buy rule. If it fails, then the system moves on one further down the rankings. This will have the effect of moving down twice as far, but it is like choosing from the top 10 instead of the top 5: for modest stock portfolios the effect of being slightly lower in ranking should be minimal.
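As a sketch of the mechanics just described (not P123’s actual implementation), walking down a ranked list and keeping each candidate only if a coin flip passes means a 5-stock port gets drawn, on average, from roughly the top 10 ranked names:

```python
import random

def coin_flip_selection(ranked_tickers, n_positions, seed=None):
    """Walk down a ranked list; a candidate is bought only if it passes a coin flip."""
    rng = random.Random(seed)
    picks = []
    for ticker in ranked_tickers:
        if len(picks) == n_positions:
            break
        if rng.random() > 0.5:      # the random() > 0.5 buy rule
            picks.append(ticker)
    return picks

ranked = [f"STOCK_{i:02d}" for i in range(1, 51)]   # hypothetical ranked candidates
for seed in range(3):
    print(coin_flip_selection(ranked, n_positions=5, seed=seed))
# Each run ends up with a different 5 names, all drawn from near the top of the list.
```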

If you put random variables into the sim and it all falls to pieces, then you really need to go back to the drawing board. A robust system should stand up to random influences, which, after all, are all over the place in the stock market (poor earnings etc. etc.)

Why didn’t I think of it before? Ha! I got some testing to do, thanks for thinking of it!

Oliver,

Thanks for the explanation.

Brian

Oliver, Brian:

I always check a sim for robustness using a buy list restricted to exclude the stocks selected in the previous run. In this case the sim has to choose a completely different set of stocks. If I have a sim that back tests in the 50% AR range, I usually get something less, say 25% AR.

If the sim was looking at 10 stocks and choosing the best one, the restricted buy list would force the sim to choose the next best one. Isn’t this equivalent to twice the effect of using a random > 0.5 buy rule, which would only select every other next best stock?

Glenn

There is another thing one can do to “de-optimize” a sim.

If I have something promising which I consider trading with real money, I always take out the 2 best performing stocks (in a 10 stock sim) and rerun it. If it is still close to the original AR, then I consider it; otherwise I discard it.

Wern

Oliver - your post explains exactly how I believe the random buy rule works.

It does have the disadvantage that for ports with restrictive buy rules and less than full investment, the results may be misleading due to variations in investment levels. In this case it’s probably better to look at stock gain/day as the variability measure.

It certainly felt like it would be worth some tests of the different ranking systems to see how they fared on ‘randomness’ testing versus in- and out-of-sample performance, for instance.

If a sim using 1 stock is not predictive of future performance and a sim using 50% of the stocks in the universe is most likely too much to be meaningful, what is optimal?

I have never seen an academic paper use buckets smaller than 10% of the universe.

I use a screen for liquidity that leaves about 4000 stocks to test, is 100 stocks too few? I can’t imagine that 400 (10%) would be needed as a minimum.

Does anyone know if there is any research or statistical rule for determining what would be the minimum % of the universe or number of stocks that should be included to consider a sim “robust” or “predictive.”

Jorge

This is a hard question to answer exactly, but there are some clues…

For example, just by a visual inspection of the ranking buckets at the start of the thread, you can clearly see there is more “random” variation when you choose smaller buckets.

On an empirical basis, I have found that for every ranking system created, if you look at 20 buckets, the top 1/20th bucket always appears to be the best performing bucket in real time. This is not necessarily true when looking at 200 buckets. In sample, the top 1/200th bucket is the highest one, quite often because the ranking system has been optimised to make the top bucket the highest performing, so it inherently contains some look-ahead bias. In real time, the top 1/200th bucket may not be the top performing one.

Where is the cutoff?

I don’t know exactly, but my feeling is that a sufficient number is probably closer to 100 than it is to 10.

To illustrate the point, I have shown the ranking performance of “Balanced4”, the P123 system, in 2007. Since the ranking system was created before 2007, the year represents “real time” performance. I show the result with my “minimum liquidity” prescreen from 01/06/07 to today.

I show for 20 buckets, 50 buckets, 100 buckets and 200 buckets.

Notice how for 20 buckets the trend is very clear; it still works great. But with 200 buckets the performance is all over the place, and being in the 99-99.5 bucket would have given negative performance!

The maximum number of buckets before performance starts to become quite unreliable is about 50, and given my minimum liquidity screen is a universe of just under 5,000 stocks, this corresponds to a 100 stock portfolio.


balacned42007_20buckets.png