
Posts: 26    Pages: 3    Prev 1 2 3 Next
This topic has been viewed 859 times and has 25 replies
boat
Re: Nice ML integration on FactSet

Marco,

This sounds interesting to me. Thank you for exploring ML options for P123.

Mark

Jan 11, 2021 10:48:44 AM       
InspectorSector
Re: Nice ML integration on FactSet

Just to be clear, the In-Sample/Out-of-Sample example that they give didn't handle leakage, but maybe there is an explanation of why not if I were to dig deeper. In any case, there are advantages to keeping the data on P123. That part I like. But surely there is a cost for DataRobot that has to be passed on to P123 subscribers. And that is probably where you will hit a stumbling block.

I have aspirations that go beyond XGBoost. As I mentioned in another thread, I may want to look at creating an AI-based ranking system designer. Think of the possibility of designing ranking systems based on least-squares error instead of the ranking buckets that we currently have, while also employing some of the anti-optimization techniques from XGBoost, just without decision trees. In other words, I want to keep my options open for how to work with ML/AI and not get locked into a platform that does only some things well.
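To make the least-squares idea concrete, here is a minimal Python sketch. It is only an assumption about what such a designer might do, not anything P123 or XGBoost provides: factor weights are chosen to minimize the squared error between a weighted sum of factor ranks and forward returns. The function names (`solve_linear`, `fit_factor_weights`) and the toy numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; factors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_factor_weights(ranks, returns):
    """Return weights w minimizing sum((ranks @ w - returns) ** 2) via the normal equations."""
    k = len(ranks[0])
    xtx = [[sum(row[i] * row[j] for row in ranks) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(ranks, returns)) for i in range(k)]
    return solve_linear(xtx, xty)


# Toy data (hypothetical): a constant column plus two factor ranks scaled to [0, 1].
ranks = [
    [1.0, 0.1, 0.9],
    [1.0, 0.5, 0.2],
    [1.0, 0.9, 0.4],
    [1.0, 0.3, 0.7],
]
true_weights = [0.01, 0.05, -0.02]
# Noise-free returns, so the fit should recover true_weights up to floating-point error.
returns = [sum(w * x for w, x in zip(true_weights, row)) for row in ranks]

weights = fit_factor_weights(ranks, returns)
print([round(w, 6) for w in weights])
```

With real data the returns are noisy and the factors are correlated, so some form of regularization (for example a ridge penalty added to the diagonal of `xtx`) would probably be needed before the weights mean much.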

Steve

Jan 11, 2021 11:02:34 AM       
Edit 3 times, last edit by InspectorSector at Jan 11, 2021 11:05:06 AM
Jrinne
Re: Nice ML integration on FactSet

"Just to be clear, the In-Sample/Out-of-Sample example that they give didn't handle leakage, but maybe there is an explanation of why not if I were to dig deeper."

Steve,

I think this is a small thing that FactSet can worry about. There was a portion of the video where the training data preceded the validation data (in time).

This is "causal" for sure. Clearly no look-ahead bias. I get that with a time-series they could do more. Exactly as you say.

I am all for you fixing this over at Colab to ensure there is no "data leakage." I understand the issue.

I know you can take care of this over at Colab. It is just a matter of where you slice the data. In fact, I think you have already addressed this.
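As a rough illustration of "where you slice the data", here is a minimal Python sketch. It assumes the leakage in question comes from forward-return labels that overlap the validation period, which the thread does not spell out. The function name `chronological_split`, the `embargo_days` parameter, and the toy rows are all hypothetical.

```python
from datetime import date, timedelta


def chronological_split(rows, split_date, embargo_days=0):
    """Split (as_of_date, features, label) rows into train and validation sets by time.

    Training rows must fall strictly before split_date minus the embargo, so a
    label measured over the next `embargo_days` cannot reach into validation.
    """
    cutoff = split_date - timedelta(days=embargo_days)
    train = [r for r in rows if r[0] < cutoff]
    valid = [r for r in rows if r[0] >= split_date]
    return train, valid


# Toy data (hypothetical): ten weekly observations starting 2020-01-06.
start = date(2020, 1, 6)
rows = [(start + timedelta(weeks=i), {"factor": i}, 0.0) for i in range(10)]

# Assume each label is a 4-week forward return, so embargo 28 days.
train, valid = chronological_split(rows, date(2020, 2, 17), embargo_days=28)
print(len(train), len(valid))
```

Rows that fall inside the embargo window are simply dropped. Setting `embargo_days=0` gives the plain "training precedes validation" split described above.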

Now if you and P123 can work on seamless downloads of data to Google Drive, we can be up and running with one implementation of this by the weekend ;-)

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Jan 11, 2021 11:34:35 AM       
Edit 2 times, last edit by Jrinne at Jan 11, 2021 11:36:06 AM
InspectorSector
Re: Nice ML integration on FactSet

I actually have no problem using DataRobot or other software that Marco digs up. Just be wary that trendiness (FactSet using ML) doesn't imply success. A case in point: Quantopian. It was very popular with almost everybody, but they couldn't make it work after years of trying. The fact that everybody was using it didn't mean they were headed in the right direction. DataRobot may be very popular and very professional for the most part. But all it takes is one tiny aspect not handled correctly and you've got nothing.

Jan 11, 2021 12:07:10 PM       
Jrinne
Re: Nice ML integration on FactSet

"I actually have no problem using DataRobot or other software that Marco digs up. Just be wary that trendiness (FactSet using ML) doesn't imply success. A case in point: Quantopian. It was very popular with almost everybody, but they couldn't make it work after years of trying. The fact that everybody was using it didn't mean they were headed in the right direction. DataRobot may be very popular and very professional for the most part. But all it takes is one tiny aspect not handled correctly and you've got nothing."

Steve and Marco,

If we can afford DataRobot, or P123 can get us access to it, then I am ALL FOR IT.

But Steve, I think Marco is just saying he likes what you are doing with XGBoost over at Colab and is getting different ideas on where to go with this (Colab and/or elsewhere).

Marco can clarify but I cannot imagine DataRobot is an option for us. Actually, I hope I am 100%, 180 degrees wrong on whether we could get access to DataRobot.

But Steve, you might declare a victory on this and keep going over there at Colab if I am right. I do not think we can afford DataRobot.

Jim

Jan 11, 2021 12:15:56 PM       
Edit 1 times, last edit by Jrinne at Jan 11, 2021 12:17:01 PM
marco
Re: Nice ML integration on FactSet

DataRobot is just a cloud service for ML. They help companies use ML, offer a seemingly easier no-code front end, and rent out their ML instances for big data. There are many others, and many are getting unicorn valuations. DataRobot's valuation is $2.7B, or around 25% of FactSet's! This last fact alone is reason enough for P123 to get involved in this.

I doubt DataRobot would even speak to us about a proper integration (JV). We're too small. But you can certainly sign up as a user to rent their ML instances using the data from P123 and take advantage of their slick interface and pre-built model blueprints. They offer a $500 credit when you sign up. I bet you can do a lot with a $500 credit. We don't generate terabytes of data, perhaps 1 gigabyte at most. Training a model on a gigabyte of data can't be that expensive (they have to compete with the many ML cloud services out there).

DataRobot can probably charge a premium because of their front end and their "model blueprints". For example, their blueprint "Light Gradient Boosting on ElasticNet Prediction" has about 30 settings: things like "subsample_for_bin", "min_child_weight", "min_split_gain", "reg_lambda", "max_delta_step", and so on. You need a data scientist to understand these. But judging from their demo, they have reasonable defaults set for FactSet users. So, at the very least, we can use DataRobot to test a lot of these models and default values and just focus on the ones that work well for us. My guess is that each of these model blueprints is based on open-source libraries and is easily reproducible elsewhere.
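For anyone who wants to try a handful of these settings outside DataRobot, here is a minimal Python sketch of enumerating candidate combinations. The setting names are the ones quoted above, which appear to correspond to LightGBM parameter names. The values are illustrative assumptions, not DataRobot's blueprint defaults, and `search_space` and `expand_grid` are hypothetical names.

```python
from itertools import product

# Illustrative candidate values only (assumption); not taken from any DataRobot blueprint.
search_space = {
    "subsample_for_bin": [200_000],
    "min_child_weight": [1e-3, 1e-2],
    "min_split_gain": [0.0, 0.1],
    "reg_lambda": [0.0, 1.0],
    "max_delta_step": [0.0],
}


def expand_grid(space):
    """Return one dict per combination of the candidate values in `space`."""
    names = list(space)
    return [
        dict(zip(names, combo))
        for combo in product(*(space[name] for name in names))
    ]


candidates = expand_grid(search_space)
print(len(candidates))  # 1 * 2 * 2 * 2 * 1 = 8 combinations
```

Each dict could then be handed to whichever gradient-boosting library accepts these names, and scored on out-of-sample data, keeping only the few combinations that hold up.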

Portfolio123 Staff.

Jan 11, 2021 1:22:54 PM       
Jrinne
Re: Nice ML integration on FactSet

Marco,

I was impressed with DataRobot.

I got some data and spent a few years doing much of that myself (e.g., XGBoost, Ridge Regression, etc.).

I am now funding a model using Factor Analysis and going to convert it to XGBoost over the next week or two.

To see several years of work run in real time on multiple processors and completed in minutes at most... I am still finding the words.

Anyway, I encourage you to keep looking into this and finding the best business model for P123. Not sure I can add anything to the business part of this.

Jim

Jan 11, 2021 1:35:53 PM       
InspectorSector
Re: Nice ML integration on FactSet

Jim - I think the business model is for P123 to sell data to users and have them run the models with DataRobot. And maybe the next unicorn will be me (not likely). I could use a few extra billion dollars.

Jan 11, 2021 1:56:46 PM       
Jrinne
Re: Nice ML integration on FactSet

"Jim - I think the business model is for P123 to sell data to users and have them run the models with DataRobot. And maybe the next unicorn will be me (not likely). I could use a few extra billion dollars."

Steve,

We will see what can be worked out with DataRobot. I will say you do not need it to do XGBoost, and adding Ridge Regression or a Random Forest will not help you much.

I like what you are doing a lot.

Jim

Jan 11, 2021 2:01:44 PM       
InspectorSector
Re: Nice ML integration on FactSet

"But judging from their demo they have reasonable defaults set for FactSet users. So, at the very least, we can use DataRobot to test a lot of these models and default values and just focus on the ones that work well for us."

Marco - I just want you to know that my tulip Python library takes care of all of the (major) parameters for XGBoost. It is not a problem, and it saves a tremendous amount of evaluation time. I think the big problem with the code that I gave out was that it was at too high a level, and most people can't appreciate it until they first discover how things work at a low level.

Jan 11, 2021 2:03:36 PM       