Winsorize Factor Output

Hey,

Just finished up reading “In Search of Distress Risk” by Campbell, Hilscher, Szilagyi.

I was hoping to test out their multi-factor model for predicting bankruptcy. The trouble is, they propose seven factors, each of which is winsorized at the 5th and 95th percentiles.

Anyone know if/how we can winsorize the output of individual factors? Stocks scoring beyond the 5th or 95th percentile scores for a factor still need to be included, so a screening rule wouldn’t work. Somehow I need to handle the outliers for each factor score, not eliminate them.

Any help on this would be great.

Thanks in advance,
Ryan

You can use the Aggregate function in a rule:

Aggregate("formula", scope [, method, outlier_pct, outlier_handl, ex_zero, ex_adrs, median_fallback ])

The specific parameter that you need is outlier_handl. It determines how the function deals with outliers. It defaults to #Exclude (just exclude them), but it can be changed to #Winsor to set them to the highest/lowest non-outlier value.

So if you wanted to average SalesTTM against the entire universe winsorizing 5% at either tail, it would look like this:

Aggregate("SalesTTM", #All, #Avg, 5, #Winsor)

Note that this function is subject to the normal proviso that you can’t nest quoted strings. You may be able to create custom formulas as a workaround.
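
To make the difference between excluding and winsorizing concrete, here is a rough Python sketch of a winsorized average as described above: the outliers in each tail are set to the nearest non-outlier value and then everything is averaged. This is not Portfolio123's implementation. The function name `winsorized_mean` is made up for illustration, and the way the outlier count is rounded is an assumption.

```python
def winsorized_mean(values, outlier_pct):
    """Average after pulling each tail in to the nearest non-outlier value.

    Assumption: the number of outliers per tail is len(values) * outlier_pct / 100,
    rounded down. The real Aggregate function may round differently.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * outlier_pct / 100)  # outliers in each tail
    if k == 0 or 2 * k >= n:
        # Nothing to winsorize, or the tails would swallow the whole sample.
        return sum(ordered) / n
    low, high = ordered[k], ordered[n - k - 1]  # lowest/highest non-outlier values
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / n


# 19 ordinary values plus one extreme value
sample = list(range(1, 20)) + [1000]
print(sum(sample) / len(sample))   # 59.5 (plain mean, dominated by the outlier)
print(winsorized_mean(sample, 5))  # 10.5 (1 becomes 2, 1000 becomes 19)
```

Note that the result is still a single number for the whole universe, not a value per stock.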

Hi Paul,

Thanks for the reply.

I looked into this prior to posting. The issue I have with the aggregate function is that it will (as I understand it) calculate the average factor score for your universe. What I was looking for is to take every stock in the universe and calculate the factor score, except in situations where the factor score for the stock is an outlier. In this case, I was looking for the individual factor score to be winsorized.
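
For anyone reading along, what is being asked for here can be sketched outside the platform in a few lines of Python: every stock keeps a score, but scores beyond the 5th or 95th percentile of the universe are pulled in to those cutoffs. This is only an illustration. The ticker names are placeholders, the helper names are made up, and the linear-interpolation percentile is an assumption (the paper's exact percentile convention is not stated in this thread).

```python
def percentile(ordered, pct):
    """Percentile of an already sorted list, using linear interpolation (assumed convention)."""
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def winsorize_scores(scores, lower_pct=5.0, upper_pct=95.0):
    """Clip each stock's factor score to the universe's percentile cutoffs.

    No stock is dropped: outliers are replaced by the cutoff value.
    """
    if not scores:
        return {}
    ordered = sorted(scores.values())
    lo = percentile(ordered, lower_pct)
    hi = percentile(ordered, upper_pct)
    return {ticker: min(max(s, lo), hi) for ticker, s in scores.items()}


# Hypothetical universe of 20 stocks; S20 has an extreme factor score.
universe = {f"S{i}": float(i) for i in range(1, 20)}
universe["S20"] = 500.0
clipped = winsorize_scores(universe)
print(len(clipped))    # 20 (every stock is still present)
print(clipped["S10"])  # 10.0 (unchanged, inside the cutoffs)
print(clipped["S20"])  # pulled down to the 95th-percentile cutoff
```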

Unsure that the aggregate function could be used for this as I don’t really care what the average factor score is for the universe. I thought I would post this to the forum just to see if anyone had found a workaround for this.

I appreciate your reply though, thanks so much.

You should be able to approximate this by using the Aggregate function (mean calculation) in conjunction with the ZScore (standard deviations from the mean).

You will want to calculate the mean of the stock universe using the Aggregate function, setting the outlier trim to 0% (Don’t use the Winsor option).

Then calculate the ZScore to determine how many standard deviations from the mean the stock is. If the stock is more than +/-2 standard deviations from the mean, adjust the individual stock value so that it is exactly 2 standard deviations from the mean. For normally distributed data, about 95% of values fall within 2 SD of the mean; cutoffs at exactly the 5th and 95th percentiles would be closer to 1.645 SD. You should be able to do this knowing the raw factor value, the mean, and the number of SDs.
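
The arithmetic for that adjustment can be sketched in Python as follows. This is a rough illustration of the approach, not Portfolio123 syntax: the function name `clamp_to_sd` is made up, the mean and standard deviation are taken from the untrimmed data as suggested above, and using the population standard deviation is an assumption.

```python
import statistics


def clamp_to_sd(values, max_sd=2.0):
    """Pull any value more than max_sd standard deviations from the mean back to that limit."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (assumption)
    if sd == 0:
        return list(values)  # all values identical, nothing to clamp
    clamped = []
    for v in values:
        z = (v - mean) / sd                # standard deviations from the mean
        z = max(-max_sd, min(max_sd, z))   # limit to +/- max_sd
        clamped.append(mean + z * sd)      # convert back to a raw factor value
    return clamped


# Nine stocks at 10 and one at 110: mean is 20, SD is 30, so 110 has z = 3.
raw = [10.0] * 9 + [110.0]
adjusted = clamp_to_sd(raw)
print(adjusted[-1])  # 80.0, i.e. mean + 2 * SD
```

Keep in mind this only approximates percentile-based winsorizing: the fraction of stocks that ends up clamped depends on how far the data is from a normal distribution.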

I don’t use these functions much so I might have this wrong, but it is a possibility for you to try.

Steve

Isn’t that what this does?

Paul, thank you. I learned something about aggregate from your response–even if I am still missing something.

Jim - I believe that he wants the individual factor values clipped at 5% and 95%, not the aggregate.
Steve

Ah. Got it Steve. Thanks. You are right–thought I was probably missing something.

Be thoughtful about this sort of thing. We added winsorizing because it’s a thing and a lot of folks here with knowledge of statistics want to have the capability. Realistically, though, you always need to be sensitive to the peculiarities of the domain in which you’re working.

Distribution of financial factors and data tends to be more than non-normal; these distributions are often downright psychotic and extremes can be so extreme, even aggressive winsorizing accomplishes nothing. * * * Study the data-set before you decide. * * *

This is why I never care about how a rank performance test per se looks. To me, everything starts with delineation of the sub-universe with which you want to work (pre-qualifying) via use of screening/buy rules. If a ranking system is effective in allowing you to sort your way down to a manageable number of final selections, the ranking system is good and should be used. If not, discard it (for now but it may work great with other kinds of sub-universes).

If you study the data enough, you’ll see that none of the statistical refinement techniques are likely to work, and that you need to get bolder through screening/buy rules.

Ryan - I’m interested in the results you get. BTW - you might be able to simply use ZScore if you can work with standard deviations from the mean instead of raw values.

Steve

Steve,

Working on it, will let you know. The ZScores could definitely work…great suggestion.

The model is based on logistic regression (binary 0 or 1 output). So while I agree that a simple ranking system would have issues with cutting off the fat tails in a non-normal distribution, that isn’t an issue with this model. In addition, my interest in the study was simply to see if I could replicate the findings by following their methodology. Better to test it empirically (that is what Portfolio123 is for) rather than assume right off the bat that it won’t work lol.

Thanks for the feedback.
Ryan

A Ranking based on the paper does a decent job as a Quality rank.