New data provider! Global stock data!

We are pleased to announce that we have entered into an agreement with FactSet to supply us with data for stocks all over the world.

Beginning in the second quarter of 2020, all subscribers will be able to use not only the S&P data we have currently available, but also FactSet’s data for North American stocks and ADRs. In the third quarter, we will make data for stocks on the European, Asian, and Australian exchanges available to any subscriber who wants to access that data (we have not set pricing yet, but it will be reasonable). We will be cutting off access to S&P data at the end of June 2020, by which time the transition to FactSet will be complete.

Here are a few of the advantages subscribers will see:

- Longer backtest periods. Rather than the current fifteen-year limit, ultimate-level subscribers will be able to access twenty years of data, and perhaps more.

- Improved classifications. FactSet uses its own classification system, RBICS, which will replace GICS. This classification system has a number of advantages over GICS. To quote FactSet, RBICS offers “a comprehensive structured taxonomy to classify companies by what they primarily do. RBICS delivers a granular view for investors by classifying companies using a bottom-up approach according to the products and services they provide. By combining this approach with a top-level grouping based on companies’ behavior similarities and stock co-movement, FactSet RBICS delivers unprecedented precision.”

- More data fields. We will be introducing more line items and more estimate items.

- No license required for professionals. Professional users will no longer require a license with a data provider in order to access Portfolio123’s data.

The primary reason for the change was that S&P wanted to change the ground rules of our partnership to such a degree that it would have effectively put us out of business. S&P wanted to shorten our backtest period to ten years of fundamentals and five years of estimates; they wanted to require that all users, not just professionals, register with them for pre-approval; they wanted to require all professionals, no matter how low their AUM, to purchase a license from them (their going rate is around $20,000 a year). Switching database providers became not only desirable, but necessary.

As we transition to a new data provider, there will be bumps in the road. Many stocks will be reclassified into different industries. The dates on which announcements and statements became available will change. There may be a few fields that we will no longer be able to offer. The backtested performance of your models will change, sometimes drastically. But we will have all hands on deck to help our subscribers deal with these issues.

And our subscription prices will not increase. Our European and Asian/Pacific subscriptions will be priced comparably with our North American subscriptions, but subscribers to more than one region will receive substantial discounts.

This could turn out to be a very good thing, especially if API access becomes a priority. I assume the data downloads will be less restricted?

-Jim

Oh, exciting times! So it sounds like the S&P data will still be available to subscribers. Is that right? If not, is there an anticipated end-of-life for the dataset?

PS
I hope your digital marketing guy starts soon and has input on naming things - “ultimate-level subscribers” sounds weird.

Walter

Walter:

I defer to Yuval for any corrections.

-Jim

Oh, I missed that. Thanks! But in all fairness, I just got back from an eye exam and my pupils are still ultimately-dilated and everything appears super-duper fuzzy.

Yuval,

You mentioned estimates.

Do you know whose estimates FactSet provides? Do they come from Thomson Reuters? I assume it is not Capital IQ (and I hope it’s not Zacks).

Maybe FactSet does its own.

Thanks

-Jim

Yes, FactSet provides estimates that they collect themselves.

My initial reaction is sweet and sour. I appreciate there was no real choice considering S&P’s behavior. If anything, it proved that the engine the P123 team has set up is arguably better than ClariFI. In a weird way, well done!

Pros: finally, international data at a reasonable price

Questions / worries:

  1. Is FactSet also PIT?
  2. Is FactSet as reliable as S&P Compustat (particularly regarding PIT)?
  3. Closer to the time, we will need a detailed mapping from old → new functions as well as GICS ↔ RBICS. I have live strategies that use GICS codes, so that will require some serious redoing.
  4. “The dates on which announcements and statements became available will change” → if the data is PIT, how is this possible?
  5. I assume we will still have ETFs → how about the “virtual” ETF time-extension?

Finally thank you to the P123 team for giving us an early heads-up. I now know that I need to plan some time for updating my live strategies in Q2 2020!

Thank you

Jerome

We don’t have definitive answers to some of these questions yet. We do know that FactSet has a very good reputation. But we have to study the data closely in order to determine what drawbacks this data has compared to S&P’s, if any. I can, however, explain a few things.

We will definitely map all old GICS codes to the new RBICS codes. But the mappings will not be entirely exact, and there will be many cases in which the components of an industry will change drastically. For example, let’s say companies ABC and XYZ are both in the entertainment industry according to GICS. Those companies might be in two completely different industries under RBICS.
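To make the point concrete, here is a minimal sketch of what an inexact mapping looks like in practice. The codes and industry names below are invented for illustration; the real mapping tables will only exist once the transition happens. The key point is that one GICS industry can fan out to several RBICS industries, so some remappings will need manual review.

```python
# Hypothetical GICS -> RBICS mapping sketch. Codes and names are invented;
# the actual tables will come from Portfolio123 during the transition.
# One GICS industry may split across several RBICS industries.
gics_to_rbics = {
    "25401010": ["Film Production", "Streaming Services"],  # splits in two
    "25401025": ["Streaming Services"],                     # clean 1-to-1
}

def remap(gics_code):
    """Return candidate RBICS industries for a GICS code, or [] if unmapped."""
    return gics_to_rbics.get(gics_code, [])

print(remap("25401010"))  # two candidates -> a strategy using this code needs review
print(remap("99999999"))  # unmapped code -> []
```

A live strategy that filters on a GICS code with more than one candidate is exactly the "serious redoing" case Jerome raises above.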

What I meant about “the dates on which announcements and statements became available will change” is this: Let’s say company XYZ announces its earnings on November 14, S&P takes five days to process this and changes their data on November 19, and FactSet only takes two days and changes their data on November 16. So our database currently reflects the data changing on November 19 (with November 14 through 18 as “stale statement” days), and down the road the data change will be on record as taking place on November 16 instead.
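The timing difference above can be sketched in a few lines, using the example dates from the paragraph (the year is assumed; the processing lags are the ones given in the example, not vendor specifications):

```python
from datetime import date

# The example from the text: company reports on Nov 14; S&P's processed data
# appears Nov 19, FactSet's Nov 16. Year 2019 is assumed for illustration.
report_date  = date(2019, 11, 14)  # earnings announcement
sp_available = date(2019, 11, 19)  # S&P data change on record
fs_available = date(2019, 11, 16)  # FactSet data change on record

sp_stale_days = (sp_available - report_date).days  # "stale statement" window
fs_stale_days = (fs_available - report_date).days

print(sp_stale_days, fs_stale_days)  # 5 2
```

So a backtest that trades during those stale days will see different fundamentals under the two vendors, which is one reason backtested performance will shift.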

The ETF time-extensions will remain in our database because we created those ourselves.

- Yuval

I suggest launching a beta site with the new data as early as possible, so that P123 can get help from existing users with testing, and so that users can get hands-on experience with the data as well. Thanks for the effort and the communication.

-gs3

I second gs3. Great idea.

Jerome

When was the P123 method developed? 2004? Before?

It remains a good system. But it is also true that there have been advancements in the last decade-and-a-half. At least FactSet thinks so.

FactSet:

“Instantly interact with data from the Marketplace in a fully hosted environment that includes industry standard databases, programming languages (Python and R), and data visualization tools.”

Note that the phrase “industry standard” applies to the programming languages Python and R in this sentence.

And this:

“….integrates advanced automated machine learning technology allowing you to build, deploy, monitor, and manage sophisticated machine learning models quickly and easily.”

I know P123 has worked largely on out-of-sample data since I joined.

If I have to start from scratch I would like to have access to some modern tools (if possible) for validation of a new system.

FactSet finds some of the more modern methods valuable, for advertisements if nothing else. I certainly would not write off any developments since 2004 based on the philosophical beliefs of one or two P123 members alone.

I cannot imagine that they would charge too much for the open-source Python tools. But if they do charge a lot extra, there is the API, perhaps. I would like to be offered these options at a reasonable price.

Anyway, it won’t be 2004 forever. I do think that ranking will always be valuable because it is a HUGELY SUCCESSFUL WAY OF REDUCING THE NOISE IN THE DATA so I do not think it will be hard to catch up to the year 2020.
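Jim's point that ranking reduces noise can be shown in a tiny sketch (values invented; ties are ignored for simplicity): rank-transforming a factor discards outlier magnitudes and keeps only the ordering.

```python
# Minimal illustration of ranking as noise reduction: a wild outlier in the
# raw factor (1000.0) gets the same top percentile rank as any largest value
# would, so its magnitude cannot dominate a rank-based model.
def pct_rank(values):
    """Percentile rank in [0, 100]; higher value -> higher rank. No tie handling."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for pos, i in enumerate(order):
        ranks[i] = 100.0 * pos / (n - 1)
    return ranks

ranks = pct_rank([0.1, 0.3, 1000.0, 0.2])
print(ranks)  # the outlier maps to 100.0, the smallest value to 0.0
```

This is the sense in which a ranking system is robust: whatever fancier methods come along, they have to beat that robustness out of sample.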

This could be an opportunity for all of us to learn a new trick or two.

-Jim

As per Wikipedia: “In 2008, FactSet bought a copy of Thomson’s fundamentals database,[27] securing permanent access to global financial data going back to 1980.”

Looks like pretty serious (seriously good) data.

I do not know if an API would be possible: I do not know how FactSet operates, but I get that S&P, ultimately, will not give you access to any tools that work. Proof that P123 does work in its present form.

Please, just look at some way to have the data interact with a Python program at some point—even if none of the output can be downloaded. Just if you can as you look at any options and finalize any negotiations.

There are a lot of Python programs, and any general opinion about Python will be wrong about some of the specific programs, without getting into a philosophical debate.

Note: Pandas in Python was developed by AQR Capital Management specifically for Quantitative Investing. Something that FactSet seems to recognize when they call the program an “Industry Standard.” Theoretically, there could be something there for someone. If not now, maybe someone could (just theoretically) develop something useful in the future. Maybe even someone at P123.

I do get that I am hoping FactSet is not like S&P (and my banker), who only lend someone an umbrella when the sun is shining, when it cannot really be used effectively.

More about their data:

“The company receives data from providers such as Barra, Dow Jones, Russell, and Lipper. Since contractual relationships with third-party vendors can be terminated with one year’s notice, the company tries to maintain relationships with at least two vendors for each type of data.[25] Recently the company has attempted to maintain or increase its available content by building its own databases or by acquiring content providers.”

And about their size (potential scope):

For 2014, FactSet’s 36th year of operation, the company recorded its 34th consecutive year of revenue growth.[23]

For the fiscal year ended August 31, 2014, FactSet revenues increased to $920 million, growing 7.00% year on year.

-Jim

Hmmmm……

Looks like someone at FactSet already found some of the things that can be documented to work with Python.

Guess I was not the first to discover it, which makes sense because it is pretty obvious, really. It turns out not to be such a big secret, and those reading this will probably never use it anyway. Plus, everyone’s implementation (and control of capital) will be a little different (and limited) should anyone at P123 ever use this.

IN AGGREGATE (including investors outside of P123), THIS MAY BE WHY P123’s TECHNOLOGY IS NOT WORKING AS WELL FOR US IN 2019. I can show some academic articles that suggest this.

From FactSet’s site. I could be more specific, uhh after a letter of intent and an NDA.

Just to summarize: it won’t be 2004 forever, AND WITH CHANGE COMES OPPORTUNITY.


Yuval,

Is Mexico one of the supported countries?

HOLY CRAP THIS IS AWESOME

Python is fully compatible with CSV files, and since all our downloadable results are in CSV format, I’m not sure what the problem is. Perhaps you can put this in a feature request, or at least start a new thread with specific, concrete examples? I don’t think this particular thread is a good place to discuss Python interactivity. Please also specify which limitations of our output prevent it from meshing smoothly with Python-coded programs, what sorts of Python-coded programs you would write that would use this data, and whether your request would need to be modified in order to apply to other programming languages.
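Yuval's point can be illustrated with nothing but the standard library. The column names and values below are invented for illustration, not the actual P123 export schema; a real downloaded file would be opened with `open(...)` instead of the in-memory string used here.

```python
import csv
import io

# Hypothetical sample mimicking a P123 CSV download. Column names are
# invented; an actual export's headers would be used in their place.
sample = """Ticker,Rank,Close
ABC,98.5,12.34
XYZ,76.2,45.10
"""

# DictReader maps each row to {header: value}, which is all most
# Python post-processing needs.
rows = list(csv.DictReader(io.StringIO(sample)))
top = [row["Ticker"] for row in rows if float(row["Rank"]) > 90]
print(top)  # ['ABC']
```

Anything pandas or scikit-learn does with the data starts from exactly this kind of read, which is why the CSV downloads already "mesh" with Python.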

In general, P123 is trying to serve investors, not programmers. But if there are enough programmers like you who find our output frustrating in some way, we could consider use cases.

See feature suggestion as requested.

P123,

when I look at the link that Wwasilev posted on the Python thread (https://www.quantopian.com/docs/data-reference/factset_fundamentals), I have some concerns about the PIT quality of FactSet.

Here is an extract from the web page →

“Point-In-Time
Starting in June 2018, FactSet fundamentals data is collected and surfaced in a point-in-time fashion. This corresponds to when Quantopian started downloading and storing FactSet fundamentals on a nightly basis. Prior to June 2018, FactSet fundamentals data is surfaced in simulations based on the report date provided by the vendor…”

Any thoughts?

Thank you,

Jerome

If I recall correctly, it’s the way FactSet deals with prelim data. I think they process a press release and fill in, for example, half the line items. When the actual filing is processed, the prelim data is overwritten and there’s no way to know which items were not present. My guess is that Quantopian is keeping both versions of the statement, similar to what Compustat does, to be more “precise.”

These incomplete prelim statements are the reason why we have fallback mechanisms in P123. It’s all very complicated and avoids NAs, but is it really more “precise”? Between prelims and final filings it’s quite easy to end up with a mix of line items, some from the current prelim quarter, some from the previous final one. Then you have SEC filings, which everyone can see at the same time, but standardized data is made available from data providers at different times depending on their processes.
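As I read the description above, the fallback works roughly like this sketch: any line item missing (NA) in the current preliminary statement is filled from the previous final statement. The field names and numbers are invented; this is an illustration of the mechanism, not P123's actual implementation.

```python
# Hedged sketch of a prelim/final fallback. A press release fills in only
# some line items; missing ones fall back to the previous final statement,
# producing a statement that mixes two quarters. Values are invented.
NA = None

prelim_q3 = {"sales": 120.0, "cogs": NA, "capex": NA}    # partial press release
final_q2  = {"sales": 110.0, "cogs": 70.0, "capex": 9.0}  # complete prior filing

def with_fallback(current, previous):
    """Fill NA items in the current statement from the previous final one."""
    return {k: (v if v is not NA else previous.get(k, NA))
            for k, v in current.items()}

merged = with_fallback(prelim_q3, final_q2)
print(merged)  # sales from Q3 prelim; cogs and capex from Q2 final
```

The merged statement is exactly the "mix of line items" described above: it avoids NAs, at the cost of combining numbers from two different quarters.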

And then of course you have the stock price reacting while the press release is still being read.

So what’s all this “precision” for?

Rest assured, we’ll think about it hard. We’ll use all our experience to figure out what’s best for our purposes. Avoiding buying or selling a stock while it’s in the prelim period might be the right answer all along, even with Compustat data.