Winsorize

Is any data Winsorized or truncated? I am asking about both raw data and data computed by P123.

https://en.wikipedia.org/wiki/Winsorizing

Thanks!

Vineeta,

You might take a look at ZScore and see if “outlier_trim” does what you are looking for, or comes close enough for your purposes.

-Jim

Our industry aggregates are truncated: 16.5% of the population is dropped from each tail. We describe the result as a “fat median”.

If you want to roll your own, the Aggregate function – which pretty much reveals the industry aggregate code for end-user use – lets you Winsorize and/or truncate.

Aggregate("MktCap",#all,#avg,16.5,#Winsor,FALSE,TRUE,TRUE)

So breaking this down, the quoted string is what is being acted upon. In this case, it’s getting market cap. (My examples are rarely imaginative.)

The #all is the population that you’re acting upon. In this case, #all is the entire universe. Other common things to put in there are #sector and #industry, both of which are pretty self-explanatory. The MOST common thing I put in there, though, is #previous, which means that it will act only upon those stocks that have passed the prior rules in a screen. (In the first rule of a screen, it’s equivalent to #all.)

The #avg means to take the simple mean. You can also use #CapAvg, which will market-cap weight the average.

The next parameter, 16.5, is the percentage of the population treated as outliers in each tail.

You can Winsorize with the #Winsor parameter, which clamps the outliers to the cutoff values instead of removing them. The other option is #Exclude, the default, which simply drops everything in either tail.
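To make the difference between the two outlier modes concrete, here is a minimal Python sketch of a trimmed mean versus a Winsorized mean. It is not P123's code: the function names are made up for illustration, and rounding the per-tail count down is an assumption, since the thread does not say how the cutoff is computed.

```python
import math

def tail_count(n, pct):
    # Observations treated as outliers in EACH tail.
    # Assumption: the count is rounded down; P123 may round differently.
    return int(math.floor(n * pct / 100.0))

def trimmed_mean(values, pct=16.5):
    """Drop pct percent of the observations from each tail, then average."""
    xs = sorted(values)
    k = tail_count(len(xs), pct)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, pct=16.5):
    """Clamp pct percent in each tail to the nearest kept value, then average."""
    xs = sorted(values)
    k = tail_count(len(xs), pct)
    low, high = xs[k], xs[len(xs) - k - 1]
    clamped = [min(max(x, low), high) for x in xs]
    return sum(clamped) / len(clamped)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 10, 1000]
print(sum(sample) / len(sample))   # plain mean, dominated by the 1000
print(trimmed_mean(sample))        # outliers removed
print(winsorized_mean(sample))     # outliers pulled in to the cutoffs
```

With ten values and 16.5%, one observation per tail counts as an outlier under this rounding assumption. Trimming averages the eight middle values, while Winsorizing keeps all ten but replaces the 1 with 2 and the 1000 with 10.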

In order, the next three are exclude zeroes, exclude ADRs, and median fallback. The first will drop all zeroes from the calculation – for fields like dividend yield or total debt, where you’d probably like to do that. Exclude ADRs does what it says: it excludes ADRs. And median fallback is for cases where the industry is so small that the elimination is problematic. The one that I always use as an example for that is Air Freight and Logistics (GICS code 203010), which has only 19 companies.
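As a rough illustration of how exclude zeroes and median fallback could interact with the trimming step, here is a hedged Python sketch. The function `aggregate_sketch` and especially the `min_count` threshold are hypothetical: the thread does not say at what population size the fallback kicks in, and ADR exclusion is left out because it needs per-stock metadata.

```python
import statistics

def aggregate_sketch(values, pct=16.5, excl_zero=False,
                     median_fallback=False, min_count=20):
    """Trimmed mean with optional zero exclusion and median fallback.

    min_count is a made-up threshold for illustration only.
    """
    # Optionally drop zeroes before doing anything else.
    xs = sorted(v for v in values if not (excl_zero and v == 0))
    if not xs:
        return None
    # For a small population, fall back to the plain median.
    if median_fallback and len(xs) < min_count:
        return statistics.median(xs)
    # Otherwise drop pct percent from each tail (count rounded down,
    # which is also an assumption) and average what is left.
    k = int(len(xs) * pct / 100.0)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

yields = [0, 0, 0, 1.0, 2.0, 3.0, 4.0, 50.0]
print(aggregate_sketch(yields))                                  # zeroes kept
print(aggregate_sketch(yields, excl_zero=True))                  # zeroes dropped
print(aggregate_sketch(yields, excl_zero=True, median_fallback=True))
```

In the last call only five non-zero values remain, which is below the hypothetical threshold, so the median is returned and the 50.0 no longer dominates the result.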

Everything after the scope (#all) is optional. The defaults are 16.5, #exclude, FALSE, TRUE, and FALSE.

Thanks Jim, Paul!