Running full simulations through the API

Hi all!

Are there any plans to extend the API to support running strategy simulations? I’m trying to automate a bit of my model optimization workflow, but I’m finding it hard to completely forego using full simulations.

Ole

So when (if) P123 does this, P123 might consider including the ranks (“include composite”) of each transaction in the sim. This would provide everything needed for…well, for things like what test_user wants to do, and for any machine learning method (i.e., features) too.

Returns for that transaction in the sim could be used for…again, whatever a lot of people including test_user might use it for. But also for a “target” in machine learning.

You can see another recent post for turning the returns of a sim into excess returns using an Excel spreadsheet: HERE. Or maybe that could be part of a single download using the API also.

Definitely a good idea. Maybe just put it as an optional download in the sim itself. Charge for the option or make it available for higher member levels if it is resource intensive.

P123 does make the required download possible which is awesome. For some, like me and perhaps others, it is not a single simple download. And may even involve Excel spreadsheets at times? Excel spreadsheet to munge the data and get it in a usable form?

I know I have used Excel spreadsheets as Yuval and Dan have described in the link provided above. And sims.

Jim

I’d like to collect a few use cases for this. Who is interested? What EXACTLY are you interested in doing?

For example, here’s what I’m interested in doing: running a simulation on different universes, varying ranking node weights with each run, and obtaining the .csv file for each run (or be able to parse the raw data for each run using Python or another program). But that’s just me. What are you interested in doing?

“Extending the API to support running strategy simulations” leaves so many question marks. Do you need to be able to set every single parameter in the simulation–including all the optional parameters (e.g. margin, hedging, stop loss, etc.)? What should the output look like? What parts of the output do you want? Some sample use cases would really help clarify this task for us.

Thanks,

  • Yuval

DataMiner is a great tool not available anywhere else. But it could be developed further.

Someone who knows Python and machine-learning could probably work with people like test_user, Michael7, Steve Auger, Azouz etc to get a single array for most machine learning programs in Python.

Something that you could download and immediately upload into XGBoost, TensorFlow, SciPy etc. with a minimum amount of data wrangling.

Thanks Yuval,

My use case is similar to yours: When I’m optimizing a ranking system there is no single test that will tell me which system is best, so I need to run many different tests for each weighting. At the moment I run a rolling screener + screen backtest (over a few time intervals) in five universes through the API using Python, as a first step. If I like the results, I continue on the web page, running full simulations in the five universes. The screen backtests are useful, but they don’t correspond to how I actually trade, so I need the full simulations.

The first step, on the command line, is efficient and saves me a lot of time (thanks for making the API). But the next step, on the web page (great as it is), is quite slow. Being able to run everything in one Python script would be a significant quality-of-life upgrade.

I’d be very happy with the ability to run an already existing simulation through the API, and in principle I only really need the ability to change the ranking system, the universe and the time period (start and end time of the simulation). So if I can make a new simulation on the web page, set everything up there, and then run this model through the API with the option of varying the three parameters indicated, I’d be happy.

For output, I’d like the CAGR, alpha/beta, annual returns (as a list) & average turnover, in Python. I’ll think about this a bit more, but I think that’s it. I hope others are also interested in such a feature.
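As a point of reference for the output list above, CAGR can be derived from a list of annual returns by compounding them and taking the geometric mean. The sketch below is only an illustration with made-up numbers. It assumes every entry covers one full year, and it is not necessarily how P123 computes its own figure.

```python
import math

def cagr_from_annual_returns(annual_returns):
    """Geometric average growth rate from yearly returns given as fractions (0.10 = 10%)."""
    if not annual_returns:
        raise ValueError("need at least one annual return")
    # Compound the yearly returns into a single growth factor.
    growth = math.prod(1.0 + r for r in annual_returns)
    # Take the n-th root to get the equivalent constant yearly rate.
    return growth ** (1.0 / len(annual_returns)) - 1.0

# Made-up annual returns, for illustration only.
example_returns = [0.12, -0.05, 0.20, 0.08]
example_cagr = cagr_from_annual_returns(example_returns)
print(f"CAGR: {example_cagr:.2%}")
```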

Ole

Very cool.

In addition, this optimization of the weights for the ranking system in a sim could be automated couldn’t it? I am pretty sure it can be.

Maybe P123 could find a download (or a few downloads) that are hands-on for Ole and Yuval and facilitate some automated solutions at the same time. There are not so many resources involved that there would have to be a competition between manual and automated solutions.

Probably, more-automated solutions are already being worked on with the AI solution that P123 is developing. And weighting of the ranks in the sims can be automated with the downloads presently available at P123 with some data wrangling. So, what P123 is doing is pretty cool.

Anyway, I am all for what Ole is suggesting. I do think P123 should work on some additional specific solutions for Ole, Yuval and others. P123 might also work on developing a few general solutions in DataMiner. P123 could develop some formats in DataMiner that generalize to most programs in Python’s large library of solutions.

Something that could be uploaded into Python and run with little (or no) data wrangling at any step.

In fact, maybe just one download format is needed for immediate upload into most programs in the Python library.

Or two, if some people want to use Z-score instead of ranks. If so, P123 might look at that also.

Jrinne, yeah, automating the whole process would be great! Unfortunately I suspect it will be hard. Well, hard for me anyway. The parameter space is large, a brute force approach would not work. I’ve been playing with downloading historical ranks + performance, and using minimization packages in Python and Julia to optimize the weights, but I often get stuck in local minima, and I’m struggling a bit with what to optimize for.
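One common way to reduce the chance of getting stuck in a local minimum is to restart a simple local search from many random starting weights and keep the best result. The sketch below only illustrates that idea. The objective is a made-up bumpy function standing in for whatever is actually being minimized (it has nothing to do with real ranking performance), and all names are hypothetical.

```python
import math
import random

def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def toy_objective(weights):
    """Made-up stand-in for the real objective; the sine term adds several local minima."""
    return sum((w - 0.5) ** 2 + 0.05 * math.sin(25 * w) for w in weights)

def hill_climb(objective, start, rng, steps=500, step_size=0.05):
    """Greedy local search: accept a random nearby candidate only if it scores lower."""
    best, best_score = start, objective(start)
    for _ in range(steps):
        candidate = normalize(
            [max(1e-9, w + rng.uniform(-step_size, step_size)) for w in best]
        )
        score = objective(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score

def random_restarts(objective, n_weights, restarts=20, seed=0):
    """Run hill_climb from several random starts; return the best run and all runs."""
    rng = random.Random(seed)
    runs = []
    for _ in range(restarts):
        start = normalize([rng.random() + 1e-9 for _ in range(n_weights)])
        runs.append(hill_climb(objective, start, rng))
    return min(runs, key=lambda run: run[1]), runs

best, runs = random_restarts(toy_objective, n_weights=3)
print("best weights:", [round(w, 3) for w in best[0]], "score:", round(best[1], 4))
```

Restarts do not guarantee a global minimum. They only make the result less dependent on a single starting point, and the question of what to optimize for remains open.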

I suspect I simply don’t have enough experience with such algorithms yet, I need to spend more time on it. The same goes for ML, I’ve been testing a bit, but I need to learn more. I guess you could say I’m stuck in the local minima of old-fashioned manual rank optimization - I need to “climb over the hill” to get to the lower(?) minima of ML-assisted optimization :slight_smile:

Ole

Ole,

Thank you for your comments.

I think this can be approached in so many different ways that I have to say I am sure you could do it in an automated manner if you choose to do so.

Staying away from any particularly mathematical solutions I have used, I made a lot of money using P123’s rank optimizer and an Excel spreadsheet provided by Steve Auger.

It was a pretty simple Excel spreadsheet algorithm. I would call it ALMOST an evolutionary algorithm today. But whatever one calls it, it certainly addressed the problem of local minima. Steve Auger is sophisticated and I am going to assume that this was his purpose when he developed the spreadsheet.

So I guess I self-identify as someone who has always been doing machine-learning at P123. I think Steve Auger’s spreadsheet counts as machine-learning. I am going to say it does anyway.

But for sure, I think there are multiple ways to get around the problem of local minima. Yours being a fine way to do it. I have nothing against a little discretion (especially on the selection of the factors to use) and/or manual involvement in the optimization.

Best,

Jim

Ole, Jim -

Thanks for the feedback, which is very helpful.

Ole, are you aware that you can currently download the results of a simulation via the API and get the output you need? Look at https://help.portfolio123.com/hc/en-us/articles/360053182312-Portfolio123-API and scroll down to “Strategy.”

What you can’t do via the API is run a simulation or change its parameters. So it’s good to know which parameters are most useful for you.

Jim, which parameters in the simulation tool are you most interested in being able to change? And is the output we’re offering (the data from the Summary, Holdings (current) and Statistics tabs) sufficient for your use case?

  • Yuval

Yuval,

Ultimately what I want (and what I think would be used by many doing machine-learning) is the download of a CSV file with the following column heads:

Date, ticker, returns (or excess returns), FactorRank1, FactorRank2, FactorRank3…, FactorRankn

The only complexity I would add to this is that sometimes I would want to use NodeRank. That is it!

Some people would have a different target than returns but otherwise this should satisfy a large number of machine learners using a variety of techniques in Python, I believe. Especially the ones P123 would want to attract–like those over at Kaggle.

And of course, I have no problem with whatever targets someone else wants to use. And if there are a lot of posts requesting a different target then put that target as the first priority!

I mention that in this post because several years ago I used sim-data downloads to gather this data in a CSV spreadsheet. I had to download the benchmark data, subtract it in the spreadsheet, run multiple sims (to get all the tickers in the universe), etc., but it got the job done. And I know it CAN be done that way.

If you stick with the sim method in the API then I would mostly like to increase the number of tickers in the sim (negating the need to run multiple sims)–which I assume would not be a problem for the API.

Maybe have a way to get excess returns for each ticker too. Although I prefer excess returns relative to the universe (equally weighted) which P123 does not like to do for some reason. I can probably do this data wrangling myself if necessary.
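For reference, excess return relative to an equally weighted universe is just each ticker’s return minus the plain average return of all tickers on the same date. A minimal sketch of that data wrangling, using made-up rows and a hypothetical function name (nothing here comes from the P123 API):

```python
from collections import defaultdict

# Made-up rows for illustration: (date, ticker, return as a fraction).
rows = [
    ("2020-01-06", "AAA", 0.03),
    ("2020-01-06", "BBB", 0.01),
    ("2020-01-06", "CCC", -0.01),
    ("2020-01-13", "AAA", 0.02),
    ("2020-01-13", "BBB", 0.04),
]

def excess_vs_equal_weight(rows):
    """Subtract the equal-weighted universe return for each date from each ticker's return."""
    returns_by_date = defaultdict(list)
    for date, _ticker, ret in rows:
        returns_by_date[date].append(ret)
    # Equal weighting means the universe return is the simple mean across tickers.
    universe_mean = {d: sum(v) / len(v) for d, v in returns_by_date.items()}
    return [(date, ticker, ret - universe_mean[date]) for date, ticker, ret in rows]

excess_rows = excess_vs_equal_weight(rows)
for date, ticker, excess in excess_rows:
    print(date, ticker, f"{excess:+.4f}")
```

By construction the excess returns on any one date sum to zero, which is a handy sanity check.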

Or maybe just start from scratch and just provide the above array for those wanting to use machine learning.

This could be directly uploaded into TensorFlow, XGBoost and/or Scikit-Learn without ever doing any data wrangling whatsoever (other than what is done automatically in Scikit-Learn). But also could be ‘sliced’ by the date/year if desired.
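To make the layout concrete, here is a minimal sketch of reading a file with those column heads into a feature matrix and a target list, then slicing it by year. The sample data is made up, the exact header spellings (Date, Ticker, Return, FactorRank1, ...) are assumptions, and only the Python standard library is used; the resulting lists can be handed to a library such as Scikit-Learn or XGBoost.

```python
import csv
import io

# Made-up sample in the assumed column layout.
sample_csv = """Date,Ticker,Return,FactorRank1,FactorRank2
2019-12-30,AAA,0.012,85.0,40.5
2019-12-30,BBB,-0.004,20.0,77.0
2020-01-06,AAA,0.030,90.0,35.0
2020-01-06,BBB,0.010,15.5,80.0
"""

def load_features_and_target(text, target="Return", id_columns=("Date", "Ticker")):
    """Parse CSV text into row ids, feature names, a feature matrix X and a target list y."""
    reader = csv.DictReader(io.StringIO(text))
    feature_names = [c for c in reader.fieldnames if c != target and c not in id_columns]
    ids, X, y = [], [], []
    for row in reader:
        ids.append(tuple(row[c] for c in id_columns))
        X.append([float(row[name]) for name in feature_names])
        y.append(float(row[target]))
    return ids, feature_names, X, y

def split_by_year(ids, X, y, first_test_year):
    """Rows dated before first_test_year go to train, the rest to test (dates are YYYY-MM-DD)."""
    train_X, train_y, test_X, test_y = [], [], [], []
    for row_id, features, target in zip(ids, X, y):
        if int(row_id[0][:4]) >= first_test_year:
            test_X.append(features)
            test_y.append(target)
        else:
            train_X.append(features)
            train_y.append(target)
    return (train_X, train_y), (test_X, test_y)

ids, feature_names, X, y = load_features_and_target(sample_csv)
(train_X, train_y), (test_X, test_y) = split_by_year(ids, X, y, first_test_year=2020)
print(feature_names, len(train_X), len(test_X))
```

Splitting on the date rather than at random keeps later periods out of the training data.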

I think this would be used by more than a few people. Especially if it is marketed as the full, single-download, easy-to-use machine-learning solution that it is.

Jim

Thanks Yuval,

I wasn’t aware of that - thanks! It’d still be really neat to be able to run the model through the API though.

For example: I have just constructed three small modifications to a ranking system I’m using, that look like small improvements using rolling screens & screen backtest. As a final test (and to help me figure out which of the three is best), I need to run a full simulation in five subuniverses over three time windows, for the original ranking and the three new ones. This sums up to 60 simulations. It’s not a big problem, just make some coffee, turn on a podcast and start clicking buttons. 30-40 minutes later I’m done. The only real problem is the awareness of how easy this could have been if I could do it through the API :slight_smile:

Also, just to be clear, I don’t mean any of this as a complaint - I’m very happy. The problem is perhaps that I’ve been spoiled by the API :slight_smile:

Ole

Hi Ole. Thanks for the example. Use cases like this will help us design the function for running sims in the API.

APIs are great for automating tasks that you rerun often or which have many iterations/combinations. The example you gave would require some coding on your end to create the list of combinations (periods/ranking systems/universes), loop through them and gather the results. If this is something that you are only going to do one time, then using the Optimizer tool on the site would be easier since it can quickly generate and run combinations of sims (or ranking system performance tests). The Optimizer is available to Ultimate level users. It is limited to 50 combinations, so you would need 2 runs to cover your 60 combinations. The output is downloadable and covers all the key stats, but the API does output a lot more details. Let me know if you have any questions on how to use the Optimizer.
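The “create the list of combinations” step can be sketched in a few lines of Python. The universe names, ranking system names and date windows below are hypothetical placeholders. Since running a simulation through the API is not possible at this point, the sketch only builds the 60 combinations from Ole’s example and splits them into batches that fit the Optimizer’s limit of 50.

```python
from itertools import product

# Hypothetical placeholders; substitute your own universes, ranking systems and periods.
universes = ["Universe A", "Universe B", "Universe C", "Universe D", "Universe E"]
ranking_systems = ["Original", "Variant 1", "Variant 2", "Variant 3"]
periods = [
    ("2005-01-01", "2010-01-01"),
    ("2010-01-01", "2015-01-01"),
    ("2015-01-01", "2020-01-01"),
]

def build_combinations(universes, ranking_systems, periods):
    """Return one dict per (universe, ranking system, period) combination."""
    return [
        {"universe": u, "ranking_system": r, "start": start, "end": end}
        for u, r, (start, end) in product(universes, ranking_systems, periods)
    ]

def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

combinations = build_combinations(universes, ranking_systems, periods)
batches = split_into_batches(combinations, 50)
print(len(combinations), [len(b) for b in batches])  # prints: 60 [50, 10]
```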

Hi Jim,
I believe everything that you asked for in your last post is already available using the datauniverse and rank_ranks functions in the API. I can post examples if that would be helpful. But if we are going to do that, let’s create a new thread since this thread is for the function to run sims.

Dan and Yuval thank you very much for your help with this!!!

I apologize for requesting a feature that is already available. I have not used DataMiner enough, obviously. I had the mistaken impression that each individual factor would need to be concatenated (i.e., one feature run at a time).

Dan, let me take you up on your offer of some examples on another thread. But let me buy an inexpensive Windows laptop (for DataMiner downloads) before we do that.

I am impressed and I appreciate this.

Best,

Jim

Thanks Dan

Well, now I feel a bit dumb :slight_smile: I’ve actually tested the optimizer a bit, but I don’t think I realized how powerful it is, and I had some problems with it - the web page would “hang” if I ran more than 15ish combinations. Returning to it now, and trying another browser (Firefox instead of Safari), it now works fine and seems to be able to do 90% of what I need at the moment.

I’d still prefer doing this through the API, it would be easier and faster to work with, and it would open up some new possibilities for automatic rank optimization, but after testing the optimizer a bit more I now see that it solves most of my problems. Thanks!

Ole