An update on our ML features and how we plan to integrate ML on P123. There are several threads on the topic in case you want to read more. If I missed some important posts let me know.
Python code for calling 123 API - https://www.portfolio123.com/mvnforum/viewthread_thread,12580
Nice ML integration on FactSet - https://www.portfolio123.com/mvnforum/viewthread_thread,12693
Machine Learning for Factor Investing - https://www.portfolio123.com/mvnforum/viewthread_thread,12733
We are currently deciding on the best way to offer something that is user-friendly yet flexible enough for power users, that won’t require a huge effort from us, and that won’t cost the end user too much.
Cost
DataRobot was helpful for showing our data scientist an example with real data and for getting a sense of the costs. He uploaded the S&P 500 stocks with about 60 features over 10 years and burned through $500 of credit pretty fast. So, as a back-of-the-envelope estimate, training a model for the Russell 3000 would cost maybe 18x that (6x for 3,000 stocks and 3x for different feature combos). That’s $9K, or roughly $10K, which seems expensive since, as far as I know, we only used CPU-intensive algorithms: no GPUs or TPUs. We’re going to try different clouds to compare; DataRobot is likely the most expensive. And if these algos only require CPUs, running them on our own servers will be the most affordable option. We would just use a cloud for peak usage.
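For anyone who wants to plug in their own numbers, here is a minimal sketch of that back-of-the-envelope estimate. It assumes cost scales linearly with the number of stocks and with the number of feature combinations tried, which is only a rough guess; the function and constant names are made up for illustration.

```python
# Figures quoted in the post above.
SP500_TRIAL_COST_USD = 500     # credit burned on the S&P 500 trial
UNIVERSE_MULTIPLIER = 6        # ~3,000 stocks vs. ~500
FEATURE_COMBO_MULTIPLIER = 3   # trying different feature combinations


def estimate_training_cost(base_cost, universe_mult, feature_mult):
    """Scale a measured trial cost, assuming cost grows linearly
    with universe size and with the number of feature combos."""
    return base_cost * universe_mult * feature_mult


estimate = estimate_training_cost(
    SP500_TRIAL_COST_USD, UNIVERSE_MULTIPLIER, FEATURE_COMBO_MULTIPLIER
)
print(f"Estimated cost: ${estimate:,}")  # prints "Estimated cost: $9,000"
```

Real costs may well scale worse than linearly depending on the algorithm, so treat the result as an order-of-magnitude figure.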
How it will be cohesive and easy to use …
There are too many ways to screw up if you are downloading data from P123, uploading it to train, then downloading predictions to use. So we need this to be seamless. The bare-minimum integration involves these components:
- A front end to create a feature set and target with some simple tools to examine data and transform it
- A front end to kick off the learning: universe selection, periods, models, # of “cores” used
- A front end to examine results
- A way to use the trained model in P123 systems which will be just another function
Notice that the API is not mentioned anywhere in the use case above, because it’s all behind the scenes. The learning part will (initially) run in a cloud service like AWS, in a P123 account. We would simply charge the user the cloud cost plus some profit factor. The great thing about the integration is that you will be able to use the actual values if you want, not just the ranked ones, since there’s no data license issue (the data is never downloaded).
For advanced users who want to use the API, we will add a way to import the results back into P123 so they can run backtests with predictions or run a live model.
That’s the current direction. Let us know your thoughts.