Finding BLUE

No, I’m not posting about Blue’s Clues but rather about the Gauss-Markov theorem. BLUE stands for Best Linear Unbiased Estimator (as almost all of you know). This is an important theorem for linear regressions. It is my contention that our ranking system is a type of linear regression, or more accurately a not-so-linear regression (with regard to the factors used in the ranking system). The weights used in a ranking system are just coefficients in a linear regression equation.

Isn’t BLUE what we want in a ranking system? Note I did not say PEST (that would be me). I know there is no perfect estimator. But we want the best.

How well can we do with our present ranking system estimator? Pretty darn good. But could a few easy changes make it better? This is a genuine question. But a very easy (possible) answer comes to mind.

Note that the “L” in BLUE stands for linear. Percentile is a linear scale, but it is certainly not linear with regard to the factor or function used in the ranking system. There is no reason to think that a change from the 50th percentile to the 55th percentile corresponds to the same change in a factor (or function) as a change from the 95th percentile to the 100th percentile. In fact, you will almost never be able to convert a factor to its percentile using a linear function. Sometimes we might want to convert to a log scale, but to percentile? Probably never, really. Clearly percentile cannot always lead to BLUE. In fact, it is probably never the best scale to use in order to get the Best Linear Unbiased Estimator, and in many cases it may be far from the best scale.

Is this why some great ideas from finance fail in ranking systems? I’m guessing so.

One commonly used remedy is to use standard deviations from the mean as the X axis or independent variable(s). It is a linear conversion from the factor to the number of standard deviations from the mean. Standard deviations from the mean is a scale that is pretty easy to understand even if you are not very familiar with the factor itself. Could we convert from percentile to standard deviations and then back? Would this be hard? Would it actually be desirable? In the end we would still need to be able to convert to rank so we could pick the top 5 or 10 or 20 highest-ranking stocks.
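Here is a minimal sketch of that conversion, done purely for illustration with scipy’s normal quantile function. The round trip is only exact if we assume the underlying factor is normally distributed, which fundamental data rarely is, so treat this as an illustration rather than a recommendation.

```python
# Percentile <-> "standard deviations from the mean", ASSUMING a normal
# distribution. Illustration only; real factor data is rarely normal.
from scipy.stats import norm

def percentile_to_z(pct):
    """Map a percentile (0 < pct < 100) to the z-score of a standard normal."""
    return norm.ppf(pct / 100.0)

def z_to_percentile(z):
    """Map a z-score back to a percentile (0-100)."""
    return norm.cdf(z) * 100.0

# The mapping is clearly nonlinear: equal percentile steps are not equal z steps.
for p in (50, 55, 95, 99.9):
    print(p, round(percentile_to_z(p), 3))   # 0.0, 0.126, 1.645, 3.09
```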

This is not a feature request, because so many of you know much more about this than I do. You probably already know whether this is a good idea or not. I would love to know too.

Thank you.

Jim

There’s no reason to think that will be so. But equally, there is no reason to doubt it will be so. The kind of data we use, and the reasons why we use it, do not lend themselves to this kind of hierarchical precision.

“In fact, you will almost never be able to convert a factor to its percentile using a linear function. Sometimes we might want to convert to a log scale, but to percentile? Probably never, really. Clearly percentile cannot always lead to BLUE. In fact, it is probably never the best scale to use in order to get the Best Linear Unbiased Estimator, and in many cases it may be far from the best scale.”

I think it’s the opposite. I think ranking systems fail (1) when they do not really represent great ideas but are merely discovered by treasure hunting among things that just so happened to have worked in the past, without consideration of whether their success was based on logic or luck, and/or (2) when they fail to take account of the nature of the data and its often incredibly bad “behavior,” leading, often, to an unintentional but still “mis-specified model.”

As a simple example, consider a ranking factor based on EV/S or P/S, both of which are often pretty successful. But what about a company that just made a big divestiture in an effort to rescue an overall deteriorating business? The TTM figures used in the ratios are much higher than the forward-looking sales. So EV/S or P/S will appear deceptively low and the rank factor deceptively positive. Stuff like this happens pretty much all the time, with all data points, in a countless variety of ways. You can’t cure it through any quantitative techniques.

The only way to deal with it is to understand the data items, how they can flash bad signals, and to control for as many of those as you can. I do it by using screening/buy rules to narrow the universe to situations that minimize the probability of false signals (the probability can never go to zero), and by using multiple factors that address the same basic idea, since mis-specification from false signals is, at least in my opinion, a much bigger threat to $uccess than is the use of multiple factors that in theory would seem highly correlated with one another. If I use one growth factor, the probability of mis-specification is very high. If I use five carefully selected growth factors, the probability of overall mis-specification in a multi-factor system, while certainly not zero, is a heck of a lot lower than it is with one factor.

My suggestion: Worry about mis-specification. (I’m adopting this well-known quantitative research phrase in order to try as best I can to reach those who, for whatever reason, remain resistant to financial theory.)

There will always be hazards we can’t control. But among the things we can control, the two traps most likely to damage live performance are (1) reliance on factors that worked based on luck rather than logic, and (2) model mis-specification. If you get a grip on those deadly obstacles, a single undergrad-level statistics course will be more than ample for all the rest of our needs (assuming, of course, the absence of assault by factors we can’t control, such as Mr. Market, Mr. Economy, etc.).

Marc,

Thank you for your comments.

The above, in quotes, is very basic high school math. We are not talking undergraduate statistics but rather second year high school math regarding linear functions.

Mis-specification is clearly important and I do not intend to ignore this. There are days when I can walk and chew gum at the same time.

I became more interested in this after reading Hull’s paper (the one you referenced). I think you are probably right that we do not need great linear regressions for fundamental data: but would it help? I don’t really know. On the other hand, I’m sure Hull would not have been published using percentile scales in a linear regression for market timing. Would his fund make any money? I think not: I think doing the math right can matter. His fund would certainly be smaller!!!

Regards,

Jim

To be precise, BLUE is a property of the Ordinary Least Squares (OLS) estimator. OLS gives you the optimal coefficients in a linear regression model. But, critically, the BLUE property only holds when the data you’re using satisfies a number of criteria. One of those criteria is that the error term (the part you cannot or do not explain with the variables in the model) is normally distributed. Another is that you have included all the relevant variables in your model. There are several more criteria, but it’s obvious that not all criteria are satisfied, so the BLUE property will not hold anyway. So the theory is nice, but doesn’t apply here.

In practice, OLS doesn’t work well. I’ve tried it in a number of ways. The problem is that stock returns are extremely ‘noisy’ and that the ‘optimal’ coefficients you will find are very unstable. They will change completely from one week to the next. You can fit the past, but it has extremely poor predictive performance.

You’re right that there is some similarity between linear regression and the ranking system. There is one major difference though. In a regression you usually try to explain or predict ALL observations (stock returns). But in a sim or a live portfolio, you mostly care about the performance of the 10 or so stocks you actually invest in.

Transforming the data before fitting a linear model is a common way to make linear models more widely applicable and to fit/describe the actual relationships in the data better. Some of those transformations might work well in ranking too. However, transformations that do not change the ordering of the companies in the ranking will have no effect, because the ranking will stay the same. For example, taking logs of the PE ratio doesn’t help: Rank(Log(PE)) is the same as Rank(PE). Taking standard deviations from the mean also doesn’t help. These transformations would only make sense if P123 didn’t always rank all the values in each (sub)node.
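A quick check of that point on synthetic P/E values (illustration only): any strictly monotonic transformation of a single factor, log or z-score included, leaves the ranks untouched.

```python
# Monotonic transformations of a single factor leave its ranks unchanged.
# Synthetic P/E values, purely for illustration.
import numpy as np
import pandas as pd

pe = pd.Series([5.0, 8.0, 12.5, 20.0, 35.0, 60.0])

rank_raw = pe.rank()                                   # Rank(PE)
rank_log = np.log(pe).rank()                           # Rank(Log(PE))
rank_z   = ((pe - pe.mean()) / pe.std()).rank()        # Rank(ZScore(PE))

assert rank_raw.equals(rank_log) and rank_raw.equals(rank_z)
print(rank_raw.tolist())                               # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```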

Thanks Peter!

Excellent post. Reading it makes me realize that I was not very articulate or perhaps I was not completely clear in my thinking. What I really should have suggested was having the option of doing a true multi-factor linear regression without ranking each factor. The only thing that would be ranked would be the dependent variable at the end so that I can pick the top ranked x number of stocks based on the ranking of the dependent variable only.

Peter makes the point that the output will be similar with single variables. This is clearly true. Also, he correctly points out that doing linear transformations and then ranking each factor will not affect the results: the outputs will be the same. It is important to note that converting a factor or function to a rank is not a linear transformation. If you use multiple factors and do not rank each factor (or the nodes), the outputs can actually be different, because ranking is not a linear transformation. In addition to different outputs, you could find BLUE mathematically rather than by trial-and-error optimization. Remember, until one of us disproves the theorem, BLUE really is supposed to be the best linear unbiased estimator.
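For anyone who wants to see the mechanics, here is a rough sketch of the suggestion on synthetic data (this is not how P123 works today): fit one multi-factor regression on the raw, unranked factors, then rank only the fitted value and keep the top handful of stocks.

```python
# Rough sketch only, with made-up data: regress forward returns on raw,
# unranked factors, then rank only the prediction and pick the top 10.
import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_factors = 500, 4
X = rng.normal(size=(n_stocks, n_factors))              # unranked factor values
fwd_ret = X @ np.array([0.02, -0.01, 0.005, 0.0]) \
          + rng.normal(0.0, 0.05, n_stocks)             # forward returns (synthetic)

X1 = np.column_stack([np.ones(n_stocks), X])            # add an intercept
beta, *_ = np.linalg.lstsq(X1, fwd_ret, rcond=None)     # OLS coefficients

predicted = X1 @ beta                                   # the only thing that gets ranked
top_10 = np.argsort(predicted)[::-1][:10]               # pick the 10 highest predictions
print(beta.round(4))
print(top_10)
```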

I checked my text and tried to confirm on Wikipedia: “The errors do not need to be normal.” My understanding is that while BLUE can still be found, one is limited (severely) in statistical inferences, such as statistical significance, without the normality assumption. Also, the errors do not need to be i.i.d., according to Wikipedia. I may be missing something, however, and there are other pretty severe restrictions, such as the errors being homoscedastic, etc.

Peter makes other good points, many based on experience. He has tried this and thinks the outputs are not any better, for example. Marc also thinks it will not be better, or at least that one’s energy is better spent elsewhere. I would still like to try it for myself, but there is one more thing to consider.

If we really want to sell R2G ports with market timing and compete with the pros, would this be helpful? More specific to my thinking: after reading Hull’s paper, does anyone want to pay much for a simple moving-average crossover? I wouldn’t anymore, but that may just be me. Also, one could be more transparent about what factors are being used. It would be P123’s data and computing power that the sub would keep paying for, even if they know most of the factors being used. They won’t know the exact timing without the linear regression, P123’s data, and computing power.

Implementation might not really be hard. If I can’t do it at P123, I will buy STATA software (eventually) and load it on my computer, but I probably will not be able to download all of the data I want. I am sure it is more involved for P123 to license and integrate this (or similar) software, however. Actual ranking (once the ranking system was developed) would not be any more compute-intensive. I am not sure how demanding the linear regression and statistics themselves would be in terms of computer utilization. The implementation for market timing might be slightly different than described in my first paragraph, as done in Hull’s paper for example.

Probably no one is interested. That is okay. For market-timing, I will see if Hull’s Fund is still open. I think ranking is working fine for fundamentals.

I appreciate everyone’s input and time. I will not push this in the likely event that no one is interested. I have enjoyed the discussion.

Deleted post

Marco,

Just wanted to say thank you!!!

What I was suggesting can already be done using Zscore(" ") for each factor and combining all of the Zscores into one equation. The coefficients before the Zscores are the same as the weights used when I ranked each factor. It does lead to different holdings, as it should. The returns, Sharpe ratios, and drawdowns are similar but clearly not identical. Zscore also has the option of trimming the far outliers. I will be trying this.

P123 is awesome! I was more concerned about getting the best linear scale for the independent variables than finding the line of best fit.

Marc and Peter. Thank you for your comments. I would not have figured this out without having the opportunity to discuss this in the forum.

Warmest regards,

Jim

Yes, what you are talking about is high school math, but you need an elementary grasp of undergraduate statistics to know it doesn’t apply.

As pvdb states: “One of those criteria is that the error term (the part you cannot or do not explain with the variables in the model) is normally distributed.”

I remember that one of the first things my undergraduate statistics textbook introduces, in the first ten pages, is the concept of the bell curve and the normal distribution. Everything else in that textbook revolved around it. What anyone dealing with the stock market must know, however, is that many of the things of interest associated with it are NOT normally distributed. So a lot of what you learned in undergraduate statistics goes out the window: linear regression, standard deviation, z-score, etc. That’s why when I hear so-called experts proclaiming during the last crash that what just happened was a six-sigma event, or earlier, when LTCM was going bust, I have to roll my eyes; they may have gotten PhDs or even a Nobel Prize, but they failed a question from Statistics 101.

Sterling,

So you know that using the Zscore of a function is clearly inferior to using the rank of a function without trying it? You are good.

BTW, I wonder why CapitalIQ bothers to provide factors that are ranked and factors that have the Zscore as data options. Why do they bother? You should call them and tell them they are wasting their time.

I just wanted to add a bit more color on my remarks re: what you can expect in terms of ranking position.

There is only one formulation in which we have a right to expect position 1 to be better than position 2, position 2 to be better than position 3, etc. It’s this:

P / (D / (k - g)), lower is better

Some may recognize this as a reformulation of the Gordon Dividend Discount Model (DDM), the ultimate formulation for calculating a correct stock price. (P is price, D is dividend, k is the required rate of return, g is the expected infinite rate of dividend growth.)

The problem we have is that this is just a theoretical formulation. It cannot be applied in the real world because the inputs are too difficult to even guesstimate, and occasional web tools offered that try to do it tend to produce comical results (including negative stock prices when g > k – believe it or not, I actually saw a web site publish that).
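To make the formulation concrete, here is a toy calculation with purely hypothetical inputs, including the g > k pathology just mentioned:

```python
# Hypothetical inputs only, to make the Gordon formulation concrete.
# The ratio P / (D / (k - g)) is price relative to the model's theoretical value.
def gordon_ratio(price, dividend, k, g):
    theoretical_value = dividend / (k - g)     # Gordon DDM: D / (k - g)
    return price / theoretical_value           # lower is better

print(gordon_ratio(price=25.0, dividend=2.0, k=0.10, g=0.04))  # value 33.33 -> ratio 0.75
print(gordon_ratio(price=25.0, dividend=2.0, k=0.08, g=0.09))  # g > k -> negative "value"
```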

Everything we do in the real world of security analysis involves looking at things that give us clues that the ideal but unspecifiable ratio might be more favorable for one stock than another, etc.

Example: Why might we use ROE as a rank factor? We can do this because ROE is indicative of a company’s ability to generate future earnings growth and the ability to generate future earnings growth is indicative of a company’s ability to generate future dividend growth, so a higher ROE may portend a better D/(k-g). If we pair that with a measure of value, we may have something usable.

But we’re talking here about probabilities and potential, and most important, we rely on the often unstated but always present all-else-being-equal assumption. Suppose Company A has an ROE of 15% while Company B has an ROE of 8%. Whether you use our present rank interface based on absolute ROEs or z-score-adjusted ROEs, or whether you use BLUE, Company A will look better than Company B; the issue presented here is how much better.

But is that really the proper question? Suppose I also tell you that Company A’s business has peaked and that its ROE is likely to trend lower in the future as the business decelerates. Suppose I also tell you Company B is on the rise: its ROE is now 8%, a few years ago it was 4%, a couple of years before that it was in the red, and it’s expected to keep trending upward. Now, it’s absolutely clear that B is the better choice based on the ROE factor. So in fact, whether we use BLUE, z-score, or anything else, we have bad ranks.

We overcome them by (1) having our ROE factor accompanied by others that suggest ROE is on the rise, or at least not on the decline (perhaps trends in turnover, margin, ROE history, analyst sentiment, etc.), and (2) having our ROE factor applied to a subset of the universe that’s been narrowed by screening/buy rules that suggest a better probability that companies are on the rise.

This is why I repeatedly maintain that you cannot “quant your way” into a successful strategy. We’ve seen over and over again that people can quant their way into fabulous simulations, but that’s not the task. A great simulation is simply a great answer to the wrong question. Fundamental reasoning is needed to serve as a necessary bridge from fixed and known dataset A (the simulation sample period) to dataset B (the unknown future). Without that bridge, any good out-of-sample result you see is attributable to nothing more than luck (which, by the way can be considerable given the power and persistence of money flows, but luck is still luck).

Some in the past have, in this forum, expressed disdain for fundamental analysis as being rigid or confining. That is as wrong a sentiment as anyone can possibly have. I can’t imagine anything more challengingly creative than coming up with clues and figuring out how to articulate them in ways that can be handled by a server and a database.

Every successful quant I have encountered in real life is a stickler for “domain knowledge.” In the stock market, domain knowledge means an understanding of how stocks get priced (the rational and the irrational aspects) and understanding how the data can be used to build models that have high probability of success in the task at hand (success in dataset B).

I don’t rain on these quant parades to be mean or ornery. I do it because I know why live-money performance is what it is and I’m trying to help you do what’s needed to get the results you want. For those who are interested, I’m the one who put together the reading list in the Help section. And as some know, I’m always happy to discuss in forum.

Marc,

So true with just one independent variable. Do you use just one variable in your ranks?

Try what I have tried with a simple 4-factor ranking system where each factor/function is ranked in your original system. Convert each factor into its Zscore. Take the weights of each factor in your original ranking system and put them in front of the Zscores as coefficients. Add the weighted Zscores together in one equation. Rank the equation. You will find that the new system selects different stocks, has different returns, etc.
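For anyone who wants to see the mechanics outside of P123, here is a pandas sketch of that recipe on synthetic data (the factor names, weights, and the higher-is-better convention are all invented for illustration):

```python
# Sketch of the recipe: z-score each factor, weight by the old node weights,
# sum into one score, and rank only the combined score. Synthetic data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "f1": rng.lognormal(size=200),    # one skewed factor, to make the point
    "f2": rng.normal(size=200),
    "f3": rng.normal(size=200),
    "f4": rng.normal(size=200),
})
weights = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1}     # the old node weights

z = (df - df.mean()) / df.std()                            # z-score each factor
zscore_combo = sum(w * z[c] for c, w in weights.items())   # weighted sum of z-scores
rank_combo   = sum(w * df[c].rank() for c, w in weights.items())  # weighted sum of ranks

top_z = set(zscore_combo.nlargest(10).index)
top_r = set(rank_combo.nlargest(10).index)
print(top_z ^ top_r)   # typically non-empty: the two systems pick some different stocks
```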

It is not intuitive: we cannot picture 4 or 5 dimensions in our heads. Which is better, rank or Zscore? I do not know. They are different, however.

This is because going from the 99th percentile to the 100th is a change of only 1 percentile point. But going from the 99th-ranked factor to the 100th could go from a Zscore of 4 to 7, say, for an outlier. This is a bigger change in both absolute and percentage terms. You are right that this will not make a difference in the relative positions with one independent variable, but the difference can affect the relative positions with more than one independent variable.

Using the above example, here are possible numbers for the scores of stocks A and B under the two systems (first: rank each factor; second: Zscore each factor), with equal weights.

Using ranks for each factor:

Stock A’s score is related to 99 + 100 + 100 = 299.

Stock B’s score is related to 100 + 99 + 99 = 298.

Stock A will have the higher rank if each factor/function is ranked.

Using possible Zscores for the same factors:

Stock A’s score is related to 4 + 5 + 5.5 = 14.5.

Stock B’s score is related to 7 + 4.1 + 3.9 = 15.

Stock B will have the higher rank if the Zscore is used instead of the rank for each factor.

This divergence in the ranks, with different stock selections, happens pretty frequently within the one test I have run with my 4-factor system. The returns were similar, the drawdowns were noticeably different, and the Sharpe ratios were a little different.

I really do appreciate your comments. Very helpful. At a minimum, the Zscore ranking system finds some good stocks that are not found with the usual ranking system (many of the selections are the same). I would not have found this without this discussion. Also, Zscore allows one to trim the outliers, which I will be looking at. This is what you are doing in your most recent R2G port, I think. All very helpful.

Thank you.

Warmest regards,

Jim

Hi Jim

Could you please share an example of how you have implemented the z-score in your ranking system or simulations? I would be very interested in testing it. Thank you,

whotookmynickname

Andreas,

I am playing with this this morning. I took 5 functions (A, B, C, D, and E in this example) that I use in my live ports, converted them to ZScores, and put them into a single function in the rank. See the example without the actual functions; you will probably want to use custom formulas to make it fit. I optimized the coefficients a little with trial and error. I then ran the eight-stock Sim as shown, with only one sell rule: RankPos > 8.

What I am working on: I changed the trim a little, but mostly I excluded the NAs using the ZScore. When I get enough data I will run a linear regression of the ZScores of the 5 functions (independent variables) against the following one-week percent returns (dependent variable). I will then use the fitted values as the coefficients in the rank (in front of the ZScores). There is some manual work with Excel to do this, so I won’t have it very soon.
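As a sketch of that regression step (synthetic data standing in for the exported z-scores and one-week returns; statsmodels is used here simply because it is handy, not because it is required):

```python
# Sketch of the regression step, with synthetic data standing in for the
# exported z-scores of functions A..E and the following one-week returns.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000                                   # roughly the sample size mentioned above
Z = rng.normal(size=(n, 5))                # z-scores of the five functions
next_week_ret = Z @ np.array([0.3, 0.1, -0.2, 0.05, 0.0]) + rng.normal(0, 2.0, n)

model = sm.OLS(next_week_ret, sm.add_constant(Z)).fit()
print(model.params.round(3))   # fitted coefficients: candidates for the weights
                               # placed in front of each ZScore(...) term
print(model.pvalues.round(3))  # with the caveat from earlier in the thread that
                               # this inference assumes more than BLUE itself does
```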

BTW, this is exactly what can be done with ClariFI. You can get the data as a ZScore and run linear regressions at ClariFI. STATA costs $4,000 and it can handle a lot of data points. When I am done downloading to Excel I may have a sample of 5,000 data points to plug into Excel. I will let everyone know if my small trial suggests that the $4,000 purchase may be a good investment for P123.

Let me know if you have any questions, and please let me know if I could be doing this better.


[Attachment: ZScore Rank.JPG]

[Attachment: ZScore Sim.JPG]

“Fundamental reasoning is needed to serve as a necessary bridge from fixed and known dataset A (the simulation sample period) to dataset B (the unknown future).”

That is an excellent way to phrase the argument, Marc. Thank you.

Jim,

I don’t think anyone has suggested that using several Zscores in a single node would result in the same ordering as the plain functions in separate nodes. Even making small changes to the universe to which a normal ranking system is applied has dramatic effects on the final ordering of stocks.

The problem that people have pointed out is that measuring standard deviations of populations that are not normally distributed will yield less than satisfying results. P/E ratios for a universe of stocks in a given week are not normally distributed. Is it possible to calculate a standard deviation for that population of P/Es? Yes. Do you know something more about a stock’s P/E and its relationship to the distribution of the universe of P/Es when I tell you the P/E’s Zscore? No.

Moving functions from subnodes to a single node, as you did when you wrapped them in Zscores, is a legitimate way of avoiding some of the challenges created by normalizing each subnode. But why blindly wrap everything in Zscore? You note yourself that some data points make more sense on a log scale, others perhaps normalized as percentages, others maybe simply broken into quintiles, some have outliers that should be discarded, and for others the outliers are the only meaningful values. So rather than just wrapping everything in Zscore, it seems like it would make more sense to actually understand each data point and transform it appropriately.
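As a rough illustration of “transform it appropriately” (the column names, cutoffs, and choices below are invented; the point is only that each item gets its own treatment rather than a blanket Zscore):

```python
# Illustrative only: transform each data item on its own terms instead of
# z-scoring everything. Column names, cutoffs, and choices are made up.
import numpy as np
import pandas as pd

def tailor(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    out["log_pe"]   = np.log(df["pe"].clip(lower=1.0))        # log scale, floored at 1
    out["margin_w"] = df["margin"].clip(*df["margin"].quantile([0.01, 0.99]))  # winsorize
    out["growth_q"] = pd.qcut(df["growth"], 5, labels=False)  # quintile buckets
    return out

rng = np.random.default_rng(3)
raw = pd.DataFrame({"pe": rng.lognormal(2.5, 0.6, 300),
                    "margin": rng.normal(0.10, 0.08, 300),
                    "growth": rng.normal(0.05, 0.20, 300)})
print(tailor(raw).describe().round(2))
```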

The issue you already intuit you will have when you then try to combine these disparate values into a simple linear regression is that the values are all on different scales with different distributions, and thus weighting them cannot be done intuitively.

In other words, you started the thread by essentially saying “I think there may be problems with the way p123 normalizes subnodes,” and you proposed Zscore as a different way to normalize subnodes. However, Zscore is a much poorer choice of normalization than the one already in place. It is true that, under the current normalization scheme, a 5% difference in rank for an arbitrary data point does not provide much intuition about just how much better one stock is than another. But given that very few of the data points are normally distributed, under the Zscore method of normalization, knowing that one stock scored a 3 while another scored a 4 offers even less intuition about just how much better the higher-scoring stock is.

SUpirate1081,

At least Zscore is a linear function while rank is not. I have looked at whether to use ln or log. But would you then destroy the linearity by taking the rank rather than keeping it linear with Zscore? It is possible to take the ZScore of a log function.

I’m just saying that if you are doing a linear regression you ought to be working with lines (or linear functions). You can agree on that, can’t you?

Zscore isn’t linear.

Z(x) + Z(y) != Z(x + y)

and

Z(a * x) != a * Z(x)

So Z fails both requirements of a linear transformation. Or, more intuitively, for a normal population, if Z(a) = 2, Z(b) = 3, and Z(c) = 4, the odds of observing a value between b and c are exponentially smaller than the odds of observing a value between a and b.
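For what it’s worth, here is a quick numerical check of those two identities as written, using two synthetic variables (it also shows why scaling a single variable is harmless, which is the within-one-node case discussed earlier):

```python
# Quick numerical check of the two identities as written, on synthetic data.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(10, 2, 1000)
y = rng.lognormal(1, 0.5, 1000)

def z(v):
    return (v - v.mean()) / v.std()

print(np.allclose(z(x) + z(y), z(x + y)))   # False: Z(x) + Z(y) != Z(x + y)
print(np.allclose(2 * z(x), z(2 * x)))      # False: Z(a*x) != a*Z(x)
print(np.allclose(z(2 * x), z(x)))          # True: z-scoring absorbs the scale of a
                                            # single variable, so ranks within one
                                            # node are unaffected
```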

I never said that above.

I said that if y = f(X) is linear, then Zscore(“f(X)”) is still a linear function of X, while Rank[f(X)] is decidedly nonlinear.

And of course you know that this is why, when you run a rank performance test with 200 buckets, the upper buckets never perform in a linear fashion.

Do the upper buckets look linear to you?

Did you ever ask yourself why?


[Attachment: rank performance test.JPG]

Although Z-score may not be the best approach to adjusting functions for optimized ranking, many of the value functions that are frequently used need some form of trimming of outliers. Take, for example, PEExclXorTTM / Aggregate("PEExclXorTTM", #Industry, #Avg). It is assumed that companies that have a lower PE than their industry average would be good companies to consider. However, the companies with the lowest values (rank > 99.5) significantly underperform the companies with somewhat higher values (rank > 99.0 & < 99.5).

You can see this if you set up a single-function ranking system using PEExclXorTTM / Aggregate("PEExclXorTTM", #Industry, #Avg), with lower values better. Then run the performance using 200 buckets and the filter AvgDailyTot(60) > 1000000 & Close(0) > 1 through the max period. You will get an annual return of 6.5% in the top bucket, but 23% in the second bucket. By adding a Boolean rule, PEExclXorTTM > 3.5 (equal weight), you get 17% in the top bucket. That is almost a factor-of-3 improvement. (See link below.) It hasn’t performed well over the last 5 years, though.

I perform similar tests on most of the factors and functions that I combine in my best ranking systems. Each function needs to be tested separately, since the cutoff value for the Boolean rule will vary widely, and a single “trim” value for all functions leaves a lot of performance on the table (some functions need no trim). This has significantly boosted the annual return of the systems compared with leaving the Boolean rules out.

http://www.portfolio123.com/app/ranking-system/276746
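For anyone who wants to experiment outside P123, here is a generic sketch of the trimming idea (synthetic data and an arbitrary cutoff; this is not Denny’s actual implementation, which uses the Boolean rule above):

```python
# Sketch of trimming before ranking: exclude the "too cheap to be true" tail of
# a valuation ratio so it cannot land in the top bucket. Synthetic data; in P123
# this is done with a Boolean rule such as PEExclXorTTM > 3.5.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
pe = pd.Series(rng.lognormal(2.7, 0.7, 1000))          # synthetic P/E ratios
pe.iloc[:20] = rng.uniform(0.1, 2.0, 20)               # a handful of distressed "bargains"

trimmed = pe.where(pe > 3.5)                           # NA out the suspect values

print(pe.rank().nsmallest(5).index.tolist())           # raw rank: top spots typically go
                                                       # to the injected "bargains"
print(trimmed.rank().nsmallest(5).index.tolist())      # after trimming, those names drop out
```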

Thanks, Denny.

Are you saying you use the ZScore for the trim (instead of using the Boolean function)?