IMPORTANT: Pre-built Industry factors will use a simple average starting tomorrow

Hello,

A while back we announced a new function “Aggregate()” that allows you to calculate your own factor for a GICS group (industry, sector, subindustry, etc.). It lets you specify cap-weighting or simple averaging of the values after an outlier trim is performed. In the announcement we mentioned that we would change the pre-built “Ind” factors to use a simple-average method to avoid volatility caused by large caps falling in and out of the outlier cuts.

The new pre-built “Ind” factors, located under INDUSTRY FACTORS in the reference (about 66 factors), will be live tonight after the reload. Nothing else is affected. If you use factors like PEExclXorTTMInd, your simulation re-runs will change. If you prefer to keep using an industry factor with cap-weighting, please use the Aggregate() function.
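For anyone who wants to see concretely what the two modes do, here is a minimal Python sketch of a trimmed aggregate. The trimming rule below (cutting roughly 16.5% of members from each tail) is my own assumption for illustration, not P123’s documented implementation, and the sample industry is entirely hypothetical:

```python
import numpy as np

def aggregate(values, weights=None, trim_pct=16.5):
    """Trimmed aggregate of an industry factor.

    Sorts the members, drops roughly trim_pct percent from each tail,
    then returns a simple average (weights=None) or a cap-weighted
    average of what remains. Illustrative only; the exact trimming
    rule P123 uses is not documented here.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    n = len(values)
    k = int(n * trim_pct / 100.0)          # members cut from each tail
    keep = order[k:n - k] if n - 2 * k > 0 else order
    if weights is None:
        return float(values[keep].mean())
    w = np.asarray(weights, dtype=float)[keep]
    return float(np.sum(values[keep] * w) / np.sum(w))

# Hypothetical industry: nine ordinary members plus one outlier P/E (90).
pe   = [6, 8, 9, 10, 11, 12, 13, 14, 16, 90]
caps = [1, 1, 1, 1, 1, 1, 1, 1, 40, 1]     # one dominant large cap

simple = aggregate(pe)                # new pre-built "Ind" behavior
capwtd = aggregate(pe, weights=caps)  # old cap-weighted behavior
```

Here the outlier P/E of 90 is trimmed away in both modes, but the dominant large cap pulls the cap-weighted value well above the simple average of the remaining members.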

Let us know of any issues or concerns. Thanks.

PS. Some of the documentation in the HELP section still mentions cap-weighting for industry factors. We will be updating the docs tomorrow.

Marco - correct me if I’m wrong, but this is the second time the industry factors are being changed? Two of my Market Neutral R2G Models use industry factors, and last time they were changed (I believe at the beginning of the year), the results for one of my models were negatively impacted according to my simulations. I was never able to get back to my previous results, and there was no old version of the factor to revert to.

I have to ask why you have deviated from the previous policy of labeling the previous version of a factor “Old” and keeping both. It seems you are about to impact my R2G models again, and I have no recourse except to attempt a revision. I can’t revise one of them for a few months.

Steve

I missed the announcement about the Aggregate function. It is impressive and makes a lot of sense. But I agree with Steve. The existing factors should have been kept as-is for backwards compatibility. New factors could have been added, or users could simply have used the Aggregate function to tap into this new functionality. I also use several industry factors, and my simulations that use them were impacted this morning. It’s not a large movement, but enough to require another look.

Hi Marco,

By coincidence I noticed a negative impact to a model that includes PEExclXorTTMInd. Can you provide a little more color for those of us who are less sophisticated?

For example, can the former “Ind” variables be recreated for PIT study using the Aggregate function? If so, please show exactly how to do so.

Also, presumably p123 made this change because the new variables will be more robust. Have you investigated that from a statistical point of view, or is it just assumed to be so?

It’s very upsetting when a sim/portfolio produces different results overnight. Usually it’s because a user changed something and forgot to change it back. Of course it’s really, really great that p123 keeps working on its data and the tools it provides, but in a case like this, it can be very unsettling. And especially when real money is on the line!

Please advise, and thanks very much.

Hugh

A bit of history on the “Ind” factors …

A long time ago, when P123 started, we downloaded pre-built factors from Reuters since we did not have a “ratio engine”. As we were preparing to switch to Compustat a few years ago, we had to create our own ratio engine and write little programs for each factor. But we could never really duplicate Reuters’ “Ind” factors (even with Reuters data). We asked Reuters for the specs on the Ind factors, but they could not find them: the programmer/team that did them was long gone. We tried to reverse engineer the algorithm, but it was never consistent. In some industries we could come very close with a combination of cap-weighting and outlier trimming, but those same settings would fail in other industries.

So, were Reuters’ Ind factors created with an amazingly good, adaptive, and complex algorithm that gave precise representative values for an industry? Unlikely. First of all, they would not lose the specs of such an amazing program. Second, there’s no such thing as a precise representative value for an industry factor. Marc can chime in on this, but we decided that it was better to give users a way to do their own with the Aggregate() function, so they can decide their own algorithm, trim, etc.

However, pre-built factors are handy. Writing PEExclXorTTMInd is much easier than the equivalent Aggregate(“PEExclXorTTM”,#Industry,#Avg,16.5). Plus we want to add more pre-builts, like PEExclXorTTMSec or PEExclXorTTMSP500, etc.

So we need pre-built Ind factors for simplicity, and we launched them with cap-weighting and a 16.5% trim. When the PIT charts launched and we could visualize factors like PEExclXorTTMInd, it became obvious that cap-weighting with a 16.5% trim creates too much volatility. In fact, anything cap-weighted with PE will cause volatility when a big cap moves in or out of the outlier region. But it’s possible that cap-weighting is right for other factors, like performance. And it’s also possible that the outlier trimming should be a variable depending on the size of the industry. And so on.
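The jump Marco describes can be demonstrated numerically. The sketch below is hypothetical (my own trimming rule and made-up numbers, not P123’s implementation): a small move in a dominant stock’s P/E flips it from “kept” to “trimmed”, and the cap-weighted aggregate lurches while the simple average barely moves.

```python
import numpy as np

def trimmed_agg(values, weights=None, trim_pct=16.5):
    """Trim ~trim_pct percent of members from each tail, then average
    (simple if weights is None, otherwise cap-weighted). Illustrative
    trimming rule; not P123's documented implementation."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    n = len(values)
    k = int(n * trim_pct / 100.0)          # members cut from each tail
    keep = order[k:n - k] if n - 2 * k > 0 else order
    if weights is None:
        return float(values[keep].mean())
    w = np.asarray(weights, dtype=float)[keep]
    return float(np.sum(values[keep] * w) / np.sum(w))

small_pes = [6, 8, 9, 10, 11, 12, 13, 14, 16]   # nine small caps
caps      = [1] * 9 + [50]                      # plus one dominant large cap

# The large cap's P/E drifts from just inside to just outside the trim cut:
inside  = trimmed_agg(small_pes + [15], caps)   # big cap kept
outside = trimmed_agg(small_pes + [17], caps)   # big cap trimmed away

# The cap-weighted aggregate jumps (~14.5 -> ~11.6) on a 2-point P/E move,
# while the simple average barely changes:
simple_inside  = trimmed_agg(small_pes + [15])
simple_outside = trimmed_agg(small_pes + [17])
```

The design lesson is the one Marco draws: with cap-weighting, a single large member crossing the trim boundary changes the aggregate discontinuously, whereas a simple average gives every kept member the same small influence.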

In other words, industry factors are a deep hole, and very likely there’s no ideal solution. Switching algorithms and changing users’ backtest results is never popular, but it does start the discussion. It should not “destroy” a good system, though. If it does, it is more likely because the system was curve-fitted.

All matters relating to cross-sectional comparison (including the Ind factors in past and present incarnations) are addressed in the Help section in the pdf entitled “Peer Comparison in Portfolio 123.” Specifically, pages 10-11 explain the Ind function and show you how you can recreate what we do now and what we did before using the new Aggregate function.

Marc - can you provide a link to this pdf? I’m having trouble finding it. I did find https://www.portfolio123.com/doc/IndustryBenchmarks-Methodology.pdf which you should probably update as soon as possible.

Marco - I have some basic problems with what you are saying. First of all, I believe in repeatability. If I can’t repeat the test results I generated a week or a month ago, then I have a basic problem. Repeatability is step #1 in establishing a solid platform. In the case I mentioned regarding my Market Neutral model, I could not repeat the results from the original backtest from a couple of months previous. This is just one case. The majority of my R2Gs have backtest results that I can’t repeat. I find this quite disturbing. The amount of degradation is not as important as the fact that the results can’t be repeated. The fact that you are changing algorithms means that there could be other reliability issues that are masked and will never be found.

In addition to the lack of repeatability, some industry factors used in buy/sell rules may have very useful absolute values. These factors may be used in an industry valuation rule, for example.

I am not disputing the fact that you found deficiencies in an algorithm and want to change it. I am complaining about how you went about this. In the past you always created an “old” version, which was put in place in existing ports/sims/models. Now you are not doing this.

I am assuming that you will be allowing a revision of R2G models now, ignoring the 6 month wait period? After all, you have changed the Industry algorithm twice in under six months.

Steve

Steve,

The magnitude of the differences is important. Let me know if there are big differences. I can also edit your system to use the Aggregate function instead of the “Ind” factors if you like.

Also, exact repeatability will never be possible. Compustat makes fixes too. We run tests every day to check for differences against a set of snapshots. Every week we find differences, which lately have all been caused by Compustat fixing old data.

Hopefully, with the soon-to-be-launched rolling tests (with optional noise rules) for the R2G presentation re-do (but also to help you design your own models), a better mindset about simulated results will take hold: one where simulations are interpreted as an “envelope” of possible results, not a precise annual return figure.
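The “envelope” idea can be illustrated with a toy rolling test: run the same return stream from many start dates and report the range of annualized results instead of one point estimate. Everything below is hypothetical data, meant only to show the mindset, not P123’s actual rolling-test mechanics:

```python
import numpy as np

rng = np.random.default_rng(42)
monthly = rng.normal(0.01, 0.04, size=120)   # 10 years of made-up monthly returns

window = 36                                  # 3-year windows, stepped monthly
annualized = []
for start in range(len(monthly) - window + 1):
    growth = np.prod(1.0 + monthly[start:start + window])
    annualized.append(growth ** (12.0 / window) - 1.0)

# Instead of quoting a single backtest number, report the envelope:
lo, med, hi = np.min(annualized), np.median(annualized), np.max(annualized)
```

A model whose min/median/max spread is narrow is less dependent on a lucky start date; a single full-period CAGR hides that sensitivity entirely.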

PS. Creating “IndOLD” factors was not an option with 66 of them.

Marco,

Thanks for the continuous improvements to P123. Looking forward to the envelope. I have always been more comfortable with ranges, as they make the risk more transparent compared with point estimates.

Regards,
Mukesh

Thank you for the explanations Marco. I think I will wait until the R2G changes before making any updates. Any idea yet when the changes will be released?

Also, I’m curious to know if the rolling tests are new sims starting on different dates, or the same sim with performance calculated from a different date. If it is a set of sims starting on different dates, there could be an issue. At the start date, all stocks are bought to fill the portfolio. Depending on the model design, the stocks bought could reach deep into the depths of the ranking system, resulting in lower-quality holdings than if the stocks were acquired over time. I view this as a problem, and I usually discount the first year (depending on turnover) of a sim when I’m looking at the results. It is also a problem for model startup. I am really happy with the way models start up now, continuing on from where the sim left off, with the same stock holdings and the same internal variables such as NoDays and PctFromHi. If these are not maintained, then models can have a bad start and may not be a true representation.

Steve

Marco & All:

I totally agree with Stitts’s remarks about repeatability. If results are not repeatable, how can they be reliable? I “get” that there are going to be issues with Compustat. This is unavoidable due to them “messing” with the data. [That is one of the reasons I was not crazy about moving to them.] That being said, I, and I think the vast majority of P123’s members, would appreciate it if you would not add to Compustat’s variability.

We invest millions of man-hours in our simulations and ports, under the general assumption that past performance is fixed history.

I agree with your remarks about optimization. I do run sensitivity tests and analysis. The problem is not that the sims and ports fall apart. That is now rare since the database has settled, but they do drift. It costs time and money. To the extent that it is the result of P123 policy and not Compustat data issues, I resent it.

Overall, I am happy with P123. I use it to make money. In general, it’s a wonderful service. Thank you for creating it. The intent of my comments is not to be negative but to point out opportunities for improvement.

Bill

Yes Marco, don’t let the things you can’t control stop you from doing the things you can control. This bug [url=https://www.portfolio123.com/mvnforum/viewthread_thread,8645]https://www.portfolio123.com/mvnforum/viewthread_thread,8645[/url]
may very well have been seen in the past. In fact, I’m certain that other users have experienced it. But the problem is that it was likely explained away as a repeatability issue, with the additional comment “you won’t have this problem if your port is well-designed”.

Steve

Hi Marco,

Thank you very much for your detailed reply.

Unfortunately, I am having trouble using Aggregate to recreate results I formerly obtained with “incperempttmind” - shouldn’t that now be:

Aggregate(“incperempttm”,#Industry,#Avg,16.5) ? It doesn’t seem to be working well.

I am a huge fan of p123. However, except for the correction of an out-and-out error, I would also like to see a more conservative approach to changing variables once they have been adopted in models. It’s easy enough to create new variables, isn’t it? I’m sure I speak for many p123 users when I say how unsettling it is to rerun a simulation and get different results without knowing why.

Separately, did Marc post the link to the “Peer Comparison” pdf?

Please advise regarding incperempttmind, and thanks!

Hugh

The “Peer Comparison” pdf is in the Help section among the “Tutorials.”

To replicate incperempttmind using our current protocol, do this: Aggregate(“incperempttm”,#industry,#avg,16.5,#exclude,false,true). For the old value, substitute #CapAvg for #Avg

Folks,

Repeatability, or replication as the idea is expressed in other research disciplines, is for us a good thing, but it is not an end in itself. To quote the great Kevin O’Leary, a/k/a Mr. Wonderful from ABC’s “Shark Tank,” our goal here is to help you “MAKE MONEY!” And frankly, you can’t earn a nickel from a simulation, even a great simulation with perfectly repeatable results. You make money by applying your ideas in the real-world stock market with real-money stakes. That means live, out-of-sample performance.

Given that, consider what you are asking for when the demand for repeatability is so vigorous as to criticize a decision we made to replace a questionable algorithm with a better one. Do you REALLY REALLY REALLY want us to refrain from replacing bad numbers with good numbers simply in the name of repeatability? Really?!

You cannot succeed here by blindly applying the scientific method in the abstract. The scientific method works only if it is combined with and properly applied to the domain in question, with due consideration of the latter’s unique characteristics. This domain, investment strategy, depends very much on modeling based on factors that can reasonably be assumed to persist into the future. (That’s the genesis of the SEC’s “past performance is no guarantee” mantra.) Absent that, you have no defensible reason to expect to make money from a strategy, regardless of what the sim shows.

In this context, go back again and review Marco’s explanation for the change in our approach to Ind. The old approach, the one now being described as bad, was actually developed by me in an effort (a reasonably successful one) to come as close as possible to reverse engineering the long-lost algorithm created by a one-time South Carolina based contractor engaged by a company called Market Guide back in the 1990s. (Market Guide was bought by Multex, which was bought by Reuters, which merged with Thomson – now you get a sense of how lost that spec is.) But when the point-in-time charts were being developed, Marco noted some massive and erratic jumps in the Ind numbers over time due, as he explained, to the CapAvg approach. That means you might have a PEGInd figure of, say, 0.85 if you rebalance today, but five days later that same figure might be 2.14, three weeks later 1.44, a month or so down the road 0.50, and then back up to 3.00. If you have a model that worked with that old version of PEG but not with a repaired version, that’s a sign you need to rework your model, because if you continue to use it, you’ll either get immediate bad out-of-sample performance, or a bit of luck for a while and ongoing vulnerability to luck running out.

So would you really prefer to use such a datapoint in a model designed to help you make money in the real world? Really?

Come on. You know PEInd figures should not run that way over time. I have long made heavy use of Ind factors; I invested a ton of man-hours developing the now-discarded reverse-engineering algorithm, and I did my share of whining when Marco showed me the then-in-development point-in-time charts and vehemently stated that the Ind algorithm was $&*!@ and that we’d have to fix it. But I agreed. No matter how many man-hours anybody has invested, we cannot consider leaving it as is once we discovered the problem, unless we change our mission and start telling subscribers that we only care about good sims and couldn’t care less whether you can make money with them in the real world.

We’re not going to change our mission. Making real-world money is still what counts. And toward that end, our priority remains to give you the best information and tools we can that will assist you to accomplish that goal. And that means we cannot and will not allow a problem we discover to persist.

We believe most subscribers are happy with that approach, especially since it’s not standard in the business world at large, or in our area in particular. I still recall, back when I was at Multex, assisting Steve Liesman, the CNBC senior economics reporter, on a major project. Back then, I was working on FactSet (Multex had no platform like p123). And after a huge number of man-hours (Steve was, and for all I know still is, a major-league a** hole) I thought I was done . . . until Steve called back screaming and cursing about all the f***** up data I gave him. On investigation, it turned out that the great FactSet ($60,000/year for the platform, and then you pay separately for a data license) had not been correctly lining up fiscal years for the companies in the Multex database (and who knows how many others); in other words, if PG’s fourth fiscal quarter ended 6/30 and CL’s ended 12/31, FactSet would simply show PG on a calendar-year basis and compare its 6/30 # with CL’s 12/31 # as if they both applied to the 12/31 quarter. How’s that for a royal screw-up! But here’s the point of the story. Obviously, besides having a bunch of our own data guys do a ton of manual crunching so Liesman could meet his deadline, we reported this as a major bug. After a couple of years of checking and nagging, I finally gave up. I have no idea when they finally got around to fixing it; in fact, I’m no longer on FactSet, so I have no idea if they ever did address it.

One more note about precision regarding the past. Don’t assume from what you may know of other disciplines that it applies here. Financial statements are a model of a company’s past performance, and a database is a model of a collection of financial statements. We model the past just as we model the future. And any model means differences of opinion. (That’s true even among the most reputable of accountants, and they’ve got some hot topics on the table now at FASB that could badly upset what all investors do with financials; so badly that I offered an accounting professor I know who attends their confabs that I’d be willing to appear there as a “consumer of financial statements” to make a case against the proposed change.)

When it comes to recreating the past, we go above and beyond what most others do in the major must-have matters relating to survivorship and look-ahead bias. That is important. Preservation of algorithms discovered to be flawed is not in that category.

One thing, by the way, that is replicable is the underlying theory of investing and the approaches you can take to applying it. That’s why I so vigorously argue for learning this stuff, and I’m here to help. I’ve seen and made use of academic ideas based on 1960s-90s data samples that STILL WORK! And even where I’ve encountered some that don’t, I can identify clearly visible changes in the nature of the markets that explain why they don’t. That’s what should be revered.

As far as we know now, all of our data items and algorithms are good. That said, we remain open-minded and aware that there is always a possibility we may learn something somewhere that shows us otherwise. So please, let’s not argue or advocate for preserving anything discovered to be flawed, not now, not ever.

Updating to the best practice is the way to go. I had stopped working on industry factors due to odd behaviour, which I now understand was due to very aggressive trimming.

I would like to see, in the Help section, results for a “reference” rank, screen, and simulation like Balanced4 run on the Russell 3000. This would be hardcoded over a fixed time frame to show the impact of changes. I doubt anyone will use it a lot, but it would show how much things have changed (or can change) when datasets, factors, and algorithms change. You could even run it monthly and see whether there are changes between two points in time that are not due to anything P123 changed, but due to Compustat changes.