FRank factor questions

I understand this splits the factor into percentiles, but is there something I can use to split up a universe and just take the top or bottom half, or the top ~30%?
My understanding is that FRank doesn't split the universe evenly.

So it seems that sometimes posting a question makes me think much harder and figure it out on my own!

MktCap > FMedian("MktCap", #previous)

…does exactly what I want
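As a sanity check, the median split can be sketched in a few lines of numpy; the market caps below are made up purely for illustration:

```python
import numpy as np

# Hypothetical market caps (in $M) for a six-stock toy universe.
mktcap = np.array([120.0, 450.0, 80.0, 2300.0, 15.0, 900.0])

# FMedian-style rule: keep stocks whose MktCap is above the universe median.
median = np.median(mktcap)          # 285.0 for this toy universe
top_half = mktcap[mktcap > median]  # [450., 2300., 900.]
```

With an even-sized universe and no ties at the median, the strict `>` keeps exactly half; ties at the median can make the split uneven.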

I think FRank aggregates NAs in the 0th percentile, which is why you may see that the bottom half is larger than the top half.

Is there any way to break the universe into finer increments? I've just realized that limiting it to a top 50%/bottom 50% split using FMedian is a bit restrictive.

FRank("MktCap", #previous, #desc) > 70, or whatever number you want.

Not quite. See the documentation below. There's no way to override this at the moment.

Special cases:

NAs are always placed at the bottom of the array and all get the same percentile. The percentile assigned is the next percentile below the last meaningful value.

Equal values get assigned the same percentile. The next non-equal value gets a percentile equal to: (percentile of the equal values) minus (number of equal values * rank-delta).
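Those two special cases can be modeled in a few lines of Python. This is a toy sketch of the documented behavior only, not P123's code; the rank-delta of 100/N and the top percentile of 100 are my assumptions:

```python
def frank_desc(values):
    """Toy model of the documented FRank special cases; None stands in
    for NA. Assumes rank-delta = 100 / len(values) and a top rank of 100."""
    n = len(values)
    delta = 100.0 / n
    valid = [v for v in values if v is not None]
    current = 100.0
    pct = {}
    for v in sorted(set(valid), reverse=True):   # descending sort
        pct[v] = current                   # equal values share this percentile
        current -= valid.count(v) * delta  # next value drops by count * delta
    na_pct = current  # NAs: next percentile below the last meaningful value
    return [pct[v] if v is not None else na_pct for v in values]

# Two stocks tied at 50 share the top percentile; the NA lands below 10.
frank_desc([50, 50, 30, None, 10])  # [100.0, 100.0, 60.0, 20.0, 40.0]
```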

Thank you, Marco, for clarifying.

Does it operate the same way regardless of the sort direction (i.e., #desc or #asc)?

There's also the Aggregate function, which is designed for exactly this kind of thing. By default it discards NAs, lops off 16% at either tail of the data, and takes the average of what remains:

MktCap>Aggregate("MktCap",#previous)
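Under those defaults, Aggregate behaves like a trimmed mean. Here is a hedged numpy sketch; the exact tail-rounding P123 uses is an assumption on my part:

```python
import numpy as np

def trimmed_mean(values, trim=0.16):
    """Sketch of Aggregate's default: drop NAs (None), lop off `trim`
    of the observations at each tail, then average the rest."""
    arr = np.sort(np.array([v for v in values if v is not None], dtype=float))
    k = int(len(arr) * trim)  # observations trimmed from each end
    return arr[k: len(arr) - k].mean() if k > 0 else arr.mean()

# The high outlier (100) and the low extreme (1) are both discarded.
trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100, None])  # 5.5
```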

Yes, it does. In fact, I generally advise ALWAYS taking the top of the sort to avoid just these sorts of problems with the NAs.

That is, if I want the lowest 20% of the universe by P/E, where generally the lower the better, the first guess would usually be:

FRank("PEExclXorTTM")<20

But I have no idea how many or how few stocks are NA in that factor. If EPS is zero, then the P/E is NA, so it's a non-trivial case.

Instead, I would advise:

FRank("PEExclXorTTM",#previous,#asc)>80

There will be no NAs involved at all, because the NAs still land at the bottom of the sort, far below the 80th-percentile cutoff.
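The effect is easy to see with a toy percentile ranker. This is a simplified model in which NAs are pinned to percentile 0, an assumption consistent with the documentation quoted above:

```python
# Toy P/E values; None = NA (e.g., zero EPS).
pe = [5.0, 8.0, 12.0, 20.0, None, None]

def frank(values, desc=True):
    """Simplified FRank: best-ranked value gets 100, NAs get 0."""
    order = sorted((v for v in values if v is not None), reverse=desc)
    step = 100.0 / len(values)
    return [100.0 - order.index(v) * step if v is not None else 0.0
            for v in values]

# Naive low-P/E screen: the NAs (percentile 0) slip through "< 20".
low_naive = [v for v, r in zip(pe, frank(pe, desc=True)) if r < 20]   # [None, None]

# Flipped screen: ascending sort plus "> 80", so NAs can never qualify.
low_safe = [v for v, r in zip(pe, frank(pe, desc=False)) if r > 80]   # [5.0, 8.0]
```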

Parenthetically, you can also avoid the problem with EarnYield. :slight_smile:

This has me wondering why NAs just wouldn’t evaluate to NA.

“Errors should never pass silently.
Unless explicitly silenced.”
PEP 20

I realize this could dramatically change how ranks work across the ecosystem, but it’s not really problematic since the average function already treats NAs as null values. Weighted percentile ranks of percentile ranks are numerically equal to the percentile rank of the weighted average.

Because by intentional design, ranking systems never eliminate anything. Instead, ranking systems put everything into order, and then the user explicitly removes what they don’t want through screen, universe or buy/sell rules, depending on the tool. Effectively, we pass elimination of NAs onto the user.

"NAs" consist of two types of results: database nulls, and errors, generally divide-by-zero errors. Database nulls aren't errors. In fact, they're potentially information-rich. The example I usually give is a null in the debt lines, which at the very least indicates that the company doesn't have debt. In fact, knowing how the sausage is made, I would probably interpret a zero as "the company paid off all its debts" and a null as "the company never had any debt in the first place." But I can't know that for sure without looking at the company's statements directly.

From the P123 product perspective, consider what you would want to happen if you're actually looking for low-debt companies. Eliminating all the NAs would mean eliminating the companies that grew organically through equity investment, and those are probably exactly the companies you're looking for, so eliminating them would give an incorrect result.

As for the errors, they are generally an accident of genuine information coming together in a way that's problematic computationally. Not always, though. Beta can return NA if the price history is too short, for example. That has meaning.

And finally, remember that we're talking about this in the context of FRank or FOrder, but when you use many factors together in a ranking system, Beta, debt, or P/E will be one of several factors. You don't want to eliminate an entire company because one factor in ten was null.

I am not talking about eliminating companies from the universe.

Conceptually, a multifactor rank might look like:

FRank("Avg(NA, Rank1, Rank2, NA, Rank3)")
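For what it's worth, here is a sketch of that Avg-skips-NAs behavior; None stands in for NA, and the function name is mine, not P123's:

```python
def avg_skip_na(*ranks):
    """Average the individual factor ranks, ignoring NAs (None)."""
    vals = [r for r in ranks if r is not None]
    return sum(vals) / len(vals) if vals else None

# The two NA factors are simply left out of the average.
avg_skip_na(None, 80.0, 60.0, None, 70.0)  # 70.0
```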