Comparison of 5 different ways to include Industry performance in a ranking system

Dear All,

Our recent change in the pre-built Industry factors (the ones that end in “Ind”) generated a lot of discussion. So I decided to see what the difference is between the various methods by which an Industry factor can be calculated.

I decided to create a ranking system that ranks based on:

40% - The stock’s technicals
40% - The stock’s valuation
20% - The Industry performance

The Industry performance is the combined score of the 4Wk, 13Wk, and 26Wk industry returns; higher is better. The idea here is to give better scores to stocks in better-performing industries. You can find this public ranking system here.

As you can see, there are now 5 ways to calculate an Industry return over a period (a short sketch after the list illustrates how the aggregation approaches differ):

  1. Using Industry nodes and Close(0)/Close(x). This formula accesses the Industry’s price series, which we generate using a cap-weighted method

  2. Using Industry nodes with the new Ind factors, which average the stocks’ returns after a 16.5% trim

  3. Using Stock nodes with the Aggregate function and a simple average

  4. Using Stock nodes with the Aggregate function and a cap average

  5. Using Stock nodes with the FMedian function
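
To make the differences concrete, here is a minimal Python sketch of how the aggregation arithmetic behind methods 2-5 treats the 4-week returns of the stocks in one industry. This is not P123's code: the helper name, the returns, and the caps are made up, and whether the 16.5% trim removes that fraction from each tail is an assumption on my part. Note also that, as discussed further down the thread, Aggregate applies the same 16.5% trim by default.

    import numpy as np

    def trim_returns(rets, caps, frac=0.165):
        # Drop the most extreme returns before averaging (assumed: frac per tail)
        order = np.argsort(rets)
        k = int(len(rets) * frac)
        keep = order[k:len(rets) - k] if len(rets) > 2 * k else order
        return rets[keep], caps[keep]

    # Hypothetical 4-week returns and market caps for the stocks in one industry
    rets = np.array([0.40, 0.06, 0.04, 0.03, 0.02, 0.01, -0.01, -0.02, -0.05, -0.30])
    caps = np.array([0.3, 5.0, 12.0, 50.0, 2.0, 8.0, 1.0, 20.0, 0.5, 0.2]) * 1e9

    r, c = trim_returns(rets, caps)
    print("trimmed simple average (roughly methods 2/3):", r.mean())
    print("trimmed cap-weighted average (method 4):", np.average(r, weights=c))
    print("median (method 5, FMedian):", np.median(rets))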

Below is the output from the Ranking System Optimizer tool. As you can see, the overall annual returns of the buckets are similar. If I had to pick a winner, I’d probably pick #5, which uses FMedian, because it has the lowest StdDev between buckets.

CONCLUSION: Not much difference among the methods when viewing the results in large buckets and where the industry factor is a small subset of the overall score.

PS. Using Stock nodes and the Aggregate function creates multiple equal values for stocks in the same industry, so they all get the same rank. The next industry’s rank then depends on how many stocks are in the industries before it. Not a bad thing per se, just a technical detail.
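
For anyone who wants to see that tie behavior in miniature, here is a small hypothetical sketch in Python. It is not P123's actual percentile-rank math; the tickers, industries, and returns are made up, and pandas' method="min" ranking is used only to show how ties share a rank and how the next industry's rank jumps by the number of stocks beneath it.

    import pandas as pd

    df = pd.DataFrame({
        "ticker":   ["A", "B", "C", "D", "E", "F"],
        "industry": ["Steel", "Steel", "Steel", "Banks", "Banks", "Retail"],
    })
    # The industry-level 4-week return is assigned to every member, as Aggregate does
    df["ind_4wk"] = df["industry"].map({"Steel": 0.05, "Banks": 0.02, "Retail": 0.08})

    # Higher return = better; ties share a rank, and the next industry's rank
    # jumps by the number of stocks in the industries below it
    df["rank"] = df["ind_4wk"].rank(method="min")
    print(df.sort_values("rank"))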


Marco - thanks for this info, but I’m not sure how relevant it is. First of all, you are using a technical factor for the study, and second, ranking systems are essentially “fuzzy logic”. My concern is when an industry factor is used in a buy/sell rule or rules. In such cases, absolute values are what matter.

Steve

That was a very interesting way to look at the effect of changing the method of calculating a factor on a ranking system. I was wondering what the effect on performance would be if each of the 5 methods were used as a single-factor ranking system. So I zeroed the weighting of the Tech Rank & Valuation nodes and changed the weighting of the factors of the Industry Leadership node to 100% one at a time. I then ran the performance using all data, weekly rebalance, 200 buckets, AvgDailyTot(60) > 200000, & $1. Below are the annual return results:

Factor - Annual Return
1st - 17.5%
2nd - 16.5%
3rd - 21.0%
4th - 27.5%
5th - 23.5%

Below is the performance of the 4th factor. I am a little surprised at how much difference there is between the top bucket and all the rest!


Thank you both for your posts.

In particular, it’s fascinating to see the 4th factor’s outperformance.

Am I mistaken that Aggregate w/ CapAvg is using a 16.5% trim?

It seems hard to believe this result could be expected to persist.

Substituting Pr4W%Chg for Close(0)/Close(20) (and doing the same for 13W and 26W) brings the highest bin down to ~24%. Does anyone know why? I plan to look at it tonight in the PIT Chart tool.

EDIT: OK, the price series are adjusted for dividends. Using TotalReturn* brings back the Close-based returns.
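
For anyone who wants to see the distinction in miniature, here is a hypothetical sketch of a price-only return versus a dividend-adjusted (total) return. The numbers are made up; this only illustrates the arithmetic behind the EDIT above, not how any particular P123 factor is computed internally.

    # One stock over a 4-week window in which a dividend was paid
    p_start, p_end = 100.0, 102.0   # unadjusted closes 20 bars ago and today
    dividend = 1.0

    price_return = p_end / p_start - 1                 # 2.0%: ignores the dividend
    total_return = (p_end + dividend) / p_start - 1    # 3.0%: dividend-adjusted
    print(price_return, total_return)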

Hugh,

The formulas for the 4th method are of the form: Aggregate("close(0)/close(20)",#industry,#capavg)

So, since the Trim is not specified, it defaults to 16.5%.

Yes! And that’s exactly why this “controversy” is so perplexing. How could you not be happy with the change to the new Ind protocol, which removes a troublesome source of large random fluctuations and enhances the likelihood of the factor performing for you out of sample in a manner consistent with how it performed in sims? While we can never expect perfection in this regard since the future is necessarily unknown, it would seem that anything that removes randomness from the modeling process is a plus. Couple that with the fact that you can go back to the old way if you want via the Aggregate function . . . What is the problem?

Again, to my mind, whether it’s specifically articulated or not, I think this all comes down to an assumption that the past is sacrosanct and could not and should not change. Although superficially that would appear to be the case, in fact, it is a flawed assumption in this field.

There is no past that is available to us or anyone else in the investment community to work with. What is typically thought of as a fixed past is, in fact, a model.

The first set of models is at the company level. No matter how great a set of information systems is used, profit cannot be precisely counted. Some things can be, but many others are determined by modeling assumptions, some of which are highly controversial within the accounting field. That’s why we have two income statements: one, called the cash flow statement, which counts cash in minus cash out, and another, called the income statement, which counts cash in from a particular ordinary business source (revenue) minus costs modeled with the aim of matching revenue to expenses (a brutally difficult task considering that many expenses support revenue over periods that do not match the period in which the spending occurs). And right now, the hot topic at FASB is valuation-based profit, which means that ordinary operating expenses should include an amount to reflect (model-based) estimates of the change in the total “market value” of the company’s assets. I think this change would be a nightmare, and if it comes to pass, I’ll probably have to work up a metric for us that adjusts net income, EPS, etc., to eliminate it. Either way, this Ind change is chickens** compared to the potential impact of this sort of thing.

Next are models of the economy. There’s no exact science to counting GDP, or any one of countless other indicators. Don’t you notice the significance of revisions? Do you know what the final number is? It’s the number we have after the government decides that enough is enough and simply stops revising. But that doesn’t mean it’s firm.

Now, let’s move on to databases. Go back to the white paper I posted when we switched to Compustat. There is no single correct way to take the already-model-based data in the 10-Ks and 10-Qs and convert them into the kinds of numbers we can work with. In that paper, I explained and illustrated substantial differences between what Reuters and Compustat do. And Zacks is still different. And there are other data vendors that are also different.

Ultimately, a quest for data stability is a quest for the outlawing of change for the better, something I do not think we can or should support, and I doubt the majority of our users would support that. Steve, I’m especially perplexed as to why you, who have often spoken out so effectively against excessive attention to simulated results (as opposed to live results), care so much about this topic. Your old sims are what they are. As Denny said in a post, the new items make for the new normal.

All this would be rightfully upsetting if it meant that all your models were anchored to nothing and could fluctuate wildly based on changes/improvements that come down the pike in the future. Whether that fear is valid or not depends on your models. If your models are anchored in legitimate ideas relating to factors that genuinely influence stock prices, changes should be modest. Through all the changes made here, including the change to Compustat, none of my models (not just the ones I use live but the ridiculously large number stored in my account that I created for one work-related reason or another) have changed in a meaningful way. Some improved. Some deteriorated. But all of the changes were minor. None were such as to cause me to suddenly reject a once-thought-to-be-good model or accept a once-thought-to-be-bad model. If a model changes significantly in response to the sorts of changes we’ve been introducing, whether Compustat, Ind, error corrections, data vendor cleanups, etc., that’s a sign that the model is overly subject to the dreaded “R” word, randomness, and needs to be reviewed.

I’ve long argued that the sort of statistical robustness testing done by so many is nonsense because all it does is perfect a prediction of the past. Frankly, this Ind change is probably the best robustness test you’ve gotten since the last data change. Maybe I should come up with a list of other changes users could introduce on a do-it-yourself, factor-definition basis (e.g., use Aggregate to introduce more variations into the Ind comparison; recreate some of the old Reuters factors, etc.) to run a set of REAL robustness checks.


“Yes! And that’s exactly why this “controversy” is so perplexing. How could you not be happy with the change to the new Ind protocol, which removes a troublesome source of large random fluctuations and enhances the likelihood of the factor performing for you out of sample in a manner consistent with how it performed in sims? While we can never expect perfection in this regard since the future is necessarily unknown, it would seem that anything that removes randomness from the modeling process is a plus. Couple that with the fact that you can go back to the old way if you want via the Aggregate function . . . What is the problem?”

As I said, the industry factors were handled poorly by P123 but this is a dead issue.

“Again, to my mind, whether it’s specifically articulated or not, I think this all comes down to an assumption that the past is sacrosanct and could not and should not change. Although superficially that would appear to be the case, in fact, it is a flawed assumption in this field.”

This is what set P123 apart from its competition. In any case, if the data was good enough to use today, then why is it not good enough to use tomorrow? Ultimately it comes down to 2 issues that I can see: (1) inability to track down problems; (2) credibility… no matter how logical your arguments seem, P123 will lose credibility every time there is a discrepancy between one simulation run and the next. This translates into loss of business.

“Now, let’s move on to databases. Go back to the white paper I posted when we switched to Compustat.”

As I recall, the great selling point was “point in time” data. This really confused me because the way P123 handled the Reuters data was point in time. What we have now is nothing of the sort.

"There is no single correct way to take the already-model-based data in the 10-Ks and 10-Qa and convert them into the kinds of numbers we can work with. "

Yes, and that is why suggesting that there is one and only one strategy for investing that works is nonsense. This is also why quantitative analysis is so effective, as it doesn’t require a correct way to convert company factors into numbers.

“I doubt the majority of our users would support that.”

How many users will accept that the stocks bought last week are not the ones recommended to be bought last week? Zero.

“I’m especially perplexed as to why you, who have often spoken out so effectively against excessive attention to simulated results (as opposed to live results), care so much about this topic. Your old sims are what they are. As Denny said in a post, the new items make for the new normal.”

There is a fundamental difference between performance and repeatability.

“I’ve long argued that the sort of statistical robustness testing done by so many is nonsense because all it does is perfect a prediction of the past.”

Agreed - but the new norm for R2G is putting a new face on an old problem. New salesmen will enter the building with the usual chest-thumping and Tarzan calls. It will take another two years for that to dissipate before we realize what we did was not the answer and get on to eliminating in-sample data completely from R2Gs.

"All this would be rightfully upsetting if it meant that all your models were anchored to nothing and could fluctuate wildly based on changes/improvements that come down the pike in the future. "

Marc - tell me again why your low price market newsletter stopped recommending low price stocks? Was it because you couldn’t find any picks? Did the market collapse or did you sit on the sidelines waiting? And for how long?

So I leave you with the following portfolio-versus-simulation comparison done yesterday: GARP $500K. This portfolio was designed by P123 a long time ago; 25 stocks, a popular ranking system (Balanced4). You can see the yearly discrepancies as well as I can. Most users are upset when they see a discrepancy of 2% in a given year. How do you explain such a discrepancy without stable data?

Steve


The variability of the past has nothing to do with good/bad data or point in time. It’s the nature of the beast. This obviously is new to you (and probably many others here). Perhaps if you would switch from arguing mode to learning mode, you might come to understand it.

See what happens when you refuse to learn this discipline, which is not the one in which you were trained and made your career. You’re so committed to trying to wage verbal battle you’ve stepped into gratuitous nastiness and even fell on your face by getting it wrong. Who said anything about the newsletter going away? It still exists and performance is fine.

I think it’s time to put this whole topic to bed.

"The variability of the past has nothing to do with good/bad data or point in time. It’s the nature of the beast. "

Let’s get real, Marc. The past is not “variable”. It’s not the nature of the beast. The past is variable because either you chose it to be so or you were sold swampland by your data vendor and don’t have control over what you are provided. If today’s data is not accurate enough to do a backtest, then why is tomorrow’s? You haven’t answered that question. Please do so.

"This obviously is new to you (and probably many others here). Perhaps if you would switch from arguing mode to learning mode, you might come to understand it."

This is not “new” to me. I’ve worked on trading systems for more than 20 years.

The users of P123 were told previously that variability was only a problem if the portfolio had 5 or fewer stock holdings or was not well designed. So I have presented GARP $500K. Please tell me why the results are not even close to being the same. If the answer is “because the data has changed” I’m going to throw up all over my keyboard. If that is the answer, then we might as well toss out sims completely and simply have a screener. The truth is that you can’t tell me why, because with ongoing data changes you can’t investigate the underlying cause.

“See what happens when you refuse to learn this discipline, which is not the one in which you were trained and made your career. You’re so committed to trying to wage verbal battle you’ve stepped into gratuitous nastiness and even fell on your face by getting it wrong. Who said anything about the newsletter going away? It still exists and performance is fine.”

Unless I’m in a straitjacket right now, I recollect that you said in a previous post that you stopped providing picks under $3, which was the original theme of your newsletter. Is that true or not? A simple “Yes” or “No” will do. There’s no need to be insulting.

Steve

Steve, check back to me when your mind is open. I’m not wasting any more time arguing.

Marc - my mind is open. You have a really good knowledge of company fundamentals. I definitely want to learn your strategies. But there is more at stake here than learning the latest value-based methodology. P123 really needs to be careful about how it proceeds into the future.

The issue is maintaining the integrity of the “system”, s/w platform, or whatever you choose to call it. I honestly don’t see how that is possible with back data constantly in a state of change. The tipping point for me was the rationale that there was no point in keeping the old industry algorithms because there is “no repeatability anyway”. Once P123 starts making such rationalizations, the integrity bar gets lowered a little more each time that rationalization is used.

The best fix is to put a stop to the repeatability problem. Now I understand that this may be difficult or impossible, and your job is to sell people on what you can practically provide. I get that (but I still want to see you squirm :slight_smile:)

I was drawn to P123 because it was head and shoulders above the rest. I’m sure others feel the same. Let’s try to keep it that way.

I have expressed my concerns, and for me it comes down to a need to understand exactly what is going on with my systems. Others have supported me privately, but I think they are afraid of getting trounced on the forum. You are a very intimidating guy.

Take care
Steve

I am not nearly as experienced as other members of this site. However, I feel that any movement away from PIT data is very worrisome.

If workarounds (such as turned out to be possible regarding the new “ind” variables) are always available AND upcoming changes are introduced in a transparent manner such as would be the case if Denny Halwes’ feature suggestion about tracking system changes is implemented, then perhaps everyone can be happy. P123 can continue to develop its tools and users can have a reliable means of repeating their studies and tracking changes.

If a PIT orientation is lost, I think p123 could degrade into another “What do you think?”/“Who do you like?” type website that relies on common sense and intuition, which I believe most of us know are particularly unreliable stock selection and market indicators, instead of the serious research tool it could be. The best ideas often fail miserably. That’s why having a platform that allows for simulation based on data that was truly available on any given day is so great.

Hugh

How is PIT being lost? My understanding is that repeatability has been lost as a calculation methodology has been changed for relative ratios. Am I wrong?

Years ago, I built a ClariFI Modelstation and Compustat PIT/CapitalIQ PIT platform from the ground up and know the challenges that P123 faces being at the mercy of data vendors. There were many issues.

I saw P123’s explanation of the relative ratio change as reasonable… previously, the Industry/Sector relative ratios were frankly a poor kludge with overzealous trimming, and the new calculation methodology moves them to a better standard.

I want repeatability; otherwise we cannot trust the simulation algorithm that drives the platform, which we have neither control over nor access to. Denny’s “Developer Log” idea is worth voting for. The change, as I understand it, is not in the core algorithm but in a relative ratio calculation.

Shaun - P123 used to have a system back in the Reuters days, whereby data was saved each week. No one was retroactively backdating data. For me, this was “Point In Time”.

Now we have a system where data is constantly being back-filled, probably on an idealized basis. They cannot know how long it would have taken to process the data back 10-15 years ago. So are they making a good assessment of the time delay (perhaps a good 4 weeks from when the Q/A report is released)? Or are they using the time when the information was made available to the public (the day the Q/A Report is released)? Or something in between?

So here is a good example of the problem. Back in mid-2013, P123 found out that some data (analyst estimates, I think) were spilling over into Monday. Although the data could not possibly be processed in time for Monday’s open, the data vendor indicated that it was available to the public Monday morning and thus felt justified in using the Monday date. So the P123 solution was to time-stamp the data going forward, resulting in a one-week delay (for most of us). However, that does nothing for the data prior to 2013, which is given a PIT date that is not achievable in the real world.

Now it just so happens that a 4 Wk change in estimate is a really good stock formula. Or is it? You may never really know what kind of results you might achieve in the future because the past isn’t PIT.
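
To illustrate the kind of thing I mean, here is a hypothetical Python sketch of how the same “4-week change in estimate” factor differs depending on whether you use the vendor’s same-Monday availability stamp or a lagged date you could realistically have acted on. The consensus numbers and the one-week lag are made up purely for illustration; this is not actual P123 or vendor data.

    import pandas as pd

    dates = pd.date_range("2013-01-07", periods=8, freq="W-MON")
    consensus = pd.Series([1.00, 1.00, 1.05, 1.05, 1.10, 1.12, 1.15, 1.20], index=dates)

    as_stamped = consensus.pct_change(4)            # uses the vendor's Monday stamp
    achievable = consensus.shift(1).pct_change(4)   # assumes a one-week processing lag
    print(pd.DataFrame({"as_stamped": as_stamped, "achievable": achievable}).dropna())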

Denny’s “Developer Log” idea is worth voting for. But similar feature requests have been around for many years. And it doesn’t solve the underlying issue. Unless you can get back to a point where you can repeat a simulation from the past, you can’t properly investigate or solve certain types of problems.

Now, I made a statement before to the effect that the platform isn’t stable, and people took offense to this. But I use P123 as much as anyone, and I can safely tell you that I discover a bug on average probably once per week. Most are UI-related, but not all. Just because things appear to be running smoothly most of the time doesn’t mean they are. If one can’t get to the root cause, then it is a problem.

Steve

Steve, I respect your background and have subscribed to several of your R2Gs. I understand your frustration.

Revisions are the worst data in some ways because they are collected by one company and then passed to another for dissemination to platforms like P123. We are still ahead by using a disciplined approach that incorporates admittedly imperfect data.

Daily data updates will help with the issue of data timing. But, from my experience in doing so, it will tax P123 to keep that working smoothly.

Did you know that Compustat had to buy back its own data from Charter Oak to create its PIT database early on? That is one reason I tried using CapitalIQ PIT data. I have also tried Bloomberg PIT data, and they have dropped several items that I was utilizing. That is frustrating! All the sources are challenged. FactSet’s AlphaTest is not PIT but is still used by some firms (to my knowledge).

When data is released in an SEC filing, the biggest companies are usually updated within a day. Smaller companies can take days to get loaded into the master database. Hedge funds do this themselves if they have positions and analyst teams. With P123’s weekly updates, we may not see small-cap data updated for over a week in some cases.

To ‘level up’, I would have to go from P123 to ClariFI Express at 20 times the cost, which is too much for me.

All that being said, I am glad that you are as knowledgeable and openly concerned about the quality of the data and the platform as you are. I strongly believe, from years on the site and interactions with the P123 team, that they are doing the best they can for us.

Yes, I understand - I’m inventing a new condition called “keyboard rage”. My understanding is that small companies can take 4 weeks.
Steve


There is no movement away from PIT.

Actually, the absence of common sense is a critical flaw that can doom even the most seemingly statistically valid models to out-of-sample hell. No less a person than James O’Shaughnessy, in “What Works on Wall Street,” makes that point absolutely clear.

Absolutely false. Bad ideas fail miserably. Good ideas succeed. The investment community has experienced this again and again, and there is ample academic research demonstrating it. As to the latter, I’ve experienced impressive out-of-sample performance by adapting good research published 20 or so years ago. To get started in learning how to recognize and develop good ideas, I suggest you check the articles I published in the last few days on TalkMarkets.com; I presented the links in an earlier forum post. I’d be happy to discuss those ideas (as well as others that will be in forthcoming installments) in the forum.

Agreed. The issue, though, is what “data” actually is, what it isn’t, how it comes into being, and its role in models. No matter how good the data is, if you don’t understand the nature of the data and how it can be used, your chances of success will be low. My suggestion to people who are still concerned is to read the white paper I posted in the help section back when we switched to Compustat. Then, after people read it and think about it, we can have forum discussions amplifying the concepts and addressing questions. Importing ideas regarding data, research, etc. from other disciplines without understanding the nature of this specific discipline does nobody any good. And that, by the way, would apply in any field. You can’t go onto a tennis court swinging a racket as if it were a baseball bat and expect to succeed. Understand the nature of the arena, etc., in which you’re operating.

While I’m not going to engage in any arguments, I am eager and anxious to help all who want to learn to develop an understanding of the nature of the data upon which we rely and how we can make profitable use of it. My suggestion is to read the data white paper and raise questions on the forum.