superelastic
Re: To Quant Or Not To Quant, That Is The Question

I don't have any answers, but I feel like the API idea could have some value for P123 that is not being tapped now.

Maybe we need to discuss what is meant by 'API' first: I am thinking that you mean exposing a REST/URL-based interface to P123 that would trigger an existing P123 function and return results via JSON or whatever. So a user could write some simple code and kick off a screen, or a portfolio rebalance or (most interesting to me) a simulation run.

I often run simulations like the proverbial 1000 monkeys typing on 1000 keyboards, hoping eventually to get a work of Shakespeare (but mostly not). But if I could define an RS via an API, kick off a sim run, then get the results back via JSON, I might be inclined to connect the API to Excel or Python and implement some type of optimization.
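
To make that concrete, here is roughly what I imagine the user-side Python would look like. To be clear, everything in it is made up: the base URL, the route, the authentication header, and the JSON fields are placeholders, since P123 does not expose such an endpoint today.

    import requests

    BASE = "https://api.portfolio123.example"   # hypothetical base URL
    API_KEY = "my-key"                          # hypothetical auth token

    def run_simulation(sim_id, params):
        """Kick off a sim run and return its results as a Python dict (parsed JSON)."""
        resp = requests.post(
            f"{BASE}/simulations/{sim_id}/run",              # hypothetical route
            json=params,                                     # e.g. ranking-system weights
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()   # e.g. {"annualized_return": ..., "max_drawdown": ...}

    # Sweep a single ranking weight and collect the results for a crude optimization.
    results = [run_simulation("my-sim", {"value_weight": w}) for w in (0.2, 0.4, 0.6)]

From there the results could be pulled into Excel or handed to an optimizer.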

That idea probably only appeals to the data geeks. However, I am also thinking that if I ran a small advisory business (this is getting out of my depth), I could build, or have built for me, investment tools with "P123 Inside" that could be customized for my clients and, if done well, would allow me to differentiate my advisory business from others.

Anyway, you guys will no doubt have better ideas...I'd just like to encourage you to do some thought-experiments with the API idea.

Dec 5, 2019 7:19:51 AM       
marco
Re: To Quant Or Not To Quant, That Is The Question

Jim

Our strength is the simulation infrastructure that walks a strategy forward, computes point-in-time ratios, and manages the portfolio. For ranking it uses "simple" sorts of N factors, although calling them "simple" is not entirely fair: taken as a whole, a collection of simple sorts is quite powerful.

If there's a way to *leverage* what we have to open P123 up to the quants, I'm all for it.

But adding more advanced quant/statistical tools to our framework, with the accompanying user interfaces, is a massive endeavor. Position sizing is the perfect example: it took us months and months to do. We had to add several statistical functions and a complex user interface that went through many revisions. Did it make a material difference to us? No, not yet at least. Position sizing is just one of the things a quant wants. A pretty interface is nice but not enough; without the rest, position sizing is just sitting there.

So I really like the idea of being able to interface with a Python engine. Could the project be made manageable? I think so, if we compartmentalize.

Let's try a practical example to see if we are on the same page. How would a Python-powered position sizing algorithm work?

Python would be a separate engine that is accessible via APIs (a fancy acronym for the communication protocol between two systems). It would only have prices loaded in memory, fully adjusted for splits. Anything else it needs would be supplied on the fly by the sim engine.

On the front end, in the Position Sizing tab, the user would have a choice of Python sizing algorithms, like "CapWeightedVersion1.3".

The CapWeighted algorithm manifesto states that it only needs one point-in-time factor, MktCap. It also needs the existing positions (any new buy/sell recommendations could be "executed" before doing the position sizing).

At every rebalance the sim engine would call the position sizing API, specify the algorithm to use, and supply the MktCap factor and the positions. In return it gets the target size for each position and figures out what trades to make.

That's it.
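
To check we are picturing the same round trip, here is a rough Python sketch. The request/response field names ("algorithm", "positions", "factors", "target_weights") and the tickers are only for illustration, not a real schema.

    def cap_weighted_sizes(positions, mktcap):
        """Target weight for each held ticker, proportional to its market cap.

        positions : tickers currently held (after any new buys/sells are applied)
        mktcap    : dict of ticker -> point-in-time market cap
        """
        total = sum(mktcap[t] for t in positions)
        return {t: mktcap[t] / total for t in positions}

    def handle_pos_sizing_request(request):
        """What the Python engine would do when the sim engine calls the API."""
        if request["algorithm"] == "CapWeightedVersion1.3":
            weights = cap_weighted_sizes(request["positions"],
                                         request["factors"]["MktCap"])
            return {"target_weights": weights}
        raise ValueError("unknown algorithm")

    # At a rebalance the sim engine might send something like this:
    request = {
        "algorithm": "CapWeightedVersion1.3",
        "positions": ["AAA", "BBB", "CCC"],
        "factors": {"MktCap": {"AAA": 1200e9, "BBB": 1100e9, "CCC": 290e9}},
    }
    print(handle_pos_sizing_request(request))
    # -> target weights summing to 1.0; the sim engine turns these into trades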

Similarly, other APIs could be available to *replace* existing functionality such as buy/sell rules, ranking, etc.

Would something like this work?

If so, then the next step is to locate the resources to do this. The key is the Python engine and its libraries; we don't have much experience with Python or the libraries. It's also easy to get lost in the myriad of quant wants/needs/possibilities, so a committee of quants is a must too. Also... would this Python engine be scalable and reliable, or would it crash and get bogged down all the time?

Our focus right now is the FactSet transition, but if new resources (one or two programmers max) can be found for the Python engine, and we reach a consensus with the quants on a focused phase-1 list of APIs, it could be doable.

If it does become a group effort... we could certainly discuss compensation, either free access or $$$, or ???.

Thanks

Portfolio123 Staff.

Dec 5, 2019 10:19:40 AM       
Jrinne
Re: To Quant Or Not To Quant, That Is The Question



Quote from marco:

Let's try a practical example to see if we are on the same page. How would a Python-powered position sizing algorithm work?

Python would be a separate engine that is accessible via APIs (a fancy acronym for the communication protocol between two systems). It would only have prices loaded in memory, fully adjusted for splits. Anything else it needs would be supplied on the fly by the sim engine.

On the front end, in the Position Sizing tab, the user would have a choice of Python sizing algorithms, like "CapWeightedVersion1.3".

The CapWeighted algorithm manifesto states that it only needs one point-in-time factor, MktCap. It also needs the existing positions (any new buy/sell recommendations could be "executed" before doing the position sizing).

At every rebalance the sim engine would call the position sizing API, specify the algorithm to use, and supply the MktCap factor and the positions. In return it gets the target size for each position and figures out what trades to make.

That's it.

Similarly, other APIs could be available to *replace* existing functionality such as buy/sell rules, ranking, etc.

Would something like this work?


Yes, I think this could work. But please keep my computer experience in mind: one course in Fortran and an audited DOS course. I do have a math degree, which may explain why I can do any of this at all. Do not make the mistake of thinking I know much about programming.

At some point I would like to get a little verbal NDA about my data provider. I am an open book after that (with the exception of the best algorithm I have found). A random forest, beginning to end: open book.

So yes, that sounds doable. I was hoping you would want to make a sim. At present I munge the data and get the numbers that way; this would be adequate for me, and actually it is excellent.

Outline of this:

I train the algorithm on some data (e.g., 2000 to 2010). I then have a trained machine learning model.

I use the trained model to make weekly predictions from 2010 to the present date. This is simplified; a walk-forward method is better. This can all be made into another array or DataFrame that can be manipulated.

I then sort the predictions for each week to find the five stocks with the best predicted returns. Usually I delete all the other stocks. One can then just average all those returns; that is enough to know whether the model is good.

But even I can munge this and get an annualized return, each year's returns, etc. You would not need my script, but I think my script would be considered a godsend by most quants. We machine learning geeks also like metrics such as RMSE and MAE; these are often available automatically, or take only a small bit of coding.
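
A stripped-down sketch of that outline, in case it helps. The DataFrame layout (indexed by date and ticker, with a next-week return column), the column names, the dates, and the model settings here are all placeholders, not my actual script:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error, mean_absolute_error

    factors = ["value", "momentum", "quality"]        # placeholder factor names

    dates = df.index.get_level_values("date")
    train = df[(dates >= "2000-01-01") & (dates < "2010-01-01")]
    test  = df[dates >= "2010-01-01"].copy()          # single split; walk-forward is better

    model = RandomForestRegressor(n_estimators=200, min_samples_leaf=300, n_jobs=-1)
    model.fit(train[factors], train["fwd_ret"])
    test["pred"] = model.predict(test[factors])

    # Each week keep the 5 stocks with the best predicted returns, then average
    # their realized returns; that is enough to judge whether the model is good.
    top5 = (test.sort_values("pred", ascending=False)
                .groupby(level="date").head(5))
    weekly = top5.groupby(level="date")["fwd_ret"].mean()
    print("mean weekly return of the top 5:", weekly.mean())
    print("RMSE:", mean_squared_error(test["fwd_ret"], test["pred"]) ** 0.5)
    print("MAE :", mean_absolute_error(test["fwd_ret"], test["pred"]))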

I could do an equity curve with this using Python graphing, but I have not bothered.

If it is computationally intensive to do sims, you might start with what I do. But what you describe sounds much better, and doable.

We have not talked about predictions when the Port is rebalanced.

One must make a small array for that one week (e.g., Monday morning): one row per ticker in the universe. Obviously, a date column is not necessary, since they are all Monday morning. The columns are the factors. The trained model then makes a prediction of returns for each stock, and Python sorts the result by prediction, best stocks first.

I sell the stocks that are not in the first five on the list. I open the broker account and buy, dividing the amount I want to invest in each stock by the broker's price to get the number of shares.
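
The Monday-morning step looks roughly like this, reusing the trained model from the sketch above. Again, monday (one row per ticker, indexed by ticker, with the same factor columns), current_holdings, and broker_price are assumed inputs, and the dollar amount is made up:

    # Predict this week's returns and sort, best first.
    monday = monday.copy()
    monday["pred"] = model.predict(monday[factors])
    picks = monday.sort_values("pred", ascending=False).head(5)

    # Sell anything held that is no longer in the top five, then size each buy by
    # dividing the dollars allocated to a position by the broker's current price.
    dollars_per_position = 20_000                       # illustrative amount
    to_sell = [t for t in current_holdings if t not in picks.index]
    to_buy = {t: int(dollars_per_position // broker_price[t])
              for t in picks.index if t not in current_holdings}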

Obviously you can improve on that, but actually I consider this fantastic the way it is.

Primitive, but in some ways already better than P123. Some models do better, especially with non-linear data.

Thank you. Does this help at all?

Happy to expand. Let me know what you want me to expand upon.

-Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Dec 5, 2019 11:10:49 AM       
marco
Re: To Quant Or Not To Quant, That Is The Question

Jim, we did just that a few months ago. It was a proof-of-concept test with the help of an AI scientist. We chose 50 factors for a small-cap universe of about 2,000 stocks. We picked a training period, fed the data into a Python random forest engine, and then tested the scores from the model out of sample. The periods were similar, economically speaking. The out-of-sample sims using these scores were no better than random.

We were also going to try neural nets (which take weeks to train even with a pretty good server) and support vector machines, try more factors, and maybe tweak the universe, but the disappointing results killed the whole project. Had it been better, we were going to add a front end to the whole thing and bolt it onto P123, much as you would like, I think. Also, our AI expert is now busy with other things and no longer available.

So that was our Python experience. I still think it's worthwhile, since it opens up P123 to a new kind of user. Not sure I have heard enough demand for it, though.

Portfolio123 Staff.

Dec 6, 2019 7:43:40 AM       
Jrinne
Re: To Quant Or Not To Quant, That Is The Question

That is awesome! Thank you so much for sharing this. It saves me the time of checking random forests again.

If you go back through all of my posts you will see that I have consistently said I do not think random forests will beat P123 sims. I did not go so far as to say that there would be no effect. Still, this confirms that I do not want to spend time looking at simple random forests again.

I do have something that should be (and I think is) better than a random forest. I am absolutely sure that any AI scientist would see my point; obviously, he would want to check it out, or refer you to someone who would. Furthermore, the computing requirements are similar to a random forest's, so if you thought random forests would be doable, then this would be too. It is not deep learning, for example.

If, being in the industry, you can come up with a simple letter of intent and NDA, you might consider letting me talk to that AI scientist for five minutes (it should take more like two). In fact, to be honest, he should have had you start on something other than a random forest to begin with. I mentioned random forests only because I do not have an NDA.

No matter what you decide on that, very much appreciated!!!!!


-Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Dec 6, 2019 8:37:22 AM       
Jrinne
Re: To Quant Or Not To Quant, That Is The Question

Here is what I said about Random Forests in a previous thread:

Marco,

I want to correct a couple things I said in this thread.

Random forests are considered one of the easiest machine learning tools. But in fact THEY ARE NOT EASY. It was a mistake to say that.

As an example, stock returns have a lot of noise in the data. A RANDOM FOREST FOR STOCKS IS LIKELY TO FAIL IF YOU USE JUST THE WEEKLY RETURNS. You might reduce this noise by using excess returns (relative to a highly correlated benchmark). Better still would be to subtract the mean return of the universe each week from each stock's return for that week.

I could go on. E.g., the excess log returns will probably work better still, because this helps with outliers.

The other thing is that I could probably prove to you that a random forest can work. But it is also true that the P123 method you use now will probably work better than a simple random forest model. That is my impression, anyway.

Let me say that again. I am not claiming that a Random Forest will work better than what we do now at P123. In fact I think they probably are not better for most situations.

If you want to just try a random forest (and use, say, excess returns), make sure to try a large minimum leaf size (the default will probably be 5); you may end up using a minimum leaf size of 300 to 500. Again, this is necessary because of the noise in the data. Also, be aware that your results may look better than they really are if you do not use a validation or hold-out test sample.

Anyway, it is not easy. And even worse, I should have said that random forests are RELATIVELY EASY, which means some of the other methods are harder still. But this does not imply that the institutions have not hired people with an advanced degree or two for whom this stuff is routine, if not easy.

Thank you again, for your interest and comments.

-Jim


The emphasis was in the original post
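
In code, the advice quoted above (excess log returns as the target, a large minimum leaf size, a hold-out sample) amounts to something like the sketch below; the DataFrame layout and column names are assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    factor_cols = ["value", "momentum"]                 # placeholder factor names

    # Log return of each stock over the week, then subtract that week's universe
    # mean so the target is an excess log return (helps with noise and outliers).
    df["log_ret"] = np.log(df["close_next_week"] / df["close"])
    df["excess_log_ret"] = (df["log_ret"]
                            - df.groupby(level="date")["log_ret"].transform("mean"))

    # Large minimum leaf size, as suggested (the default of 5 is far too small here).
    rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=300, n_jobs=-1)

    dates = df.index.get_level_values("date")
    train = df[dates < "2010-01-01"]                    # fit on the training years only...
    rf.fit(train[factor_cols], train["excess_log_ret"]) # ...and judge it on a hold-out sample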


Marco, thank you again for your consideration and sharing your results.

-Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Dec 6, 2019 9:10:00 AM       
marco
Re: To Quant Or Not To Quant, That Is The Question

Jim, why do you need an NDA to discuss a different, freely available method of training a model? Even if the method were your own, I still would not sign an NDA. We are a tool provider, and if a tool works better it would be made available to all.

Portfolio123 Staff.

Dec 6, 2019 9:15:59 AM       
Jrinne
Re: To Quant Or Not To Quant, That Is The Question

Quote from marco:

Jim, why do you need an NDA to discuss a different, freely available method of training a model? Even if the method were your own, I still would not sign an NDA. We are a tool provider, and if a tool works better it would be made available to all.

You do have a point, in a sense: it is not so special that someone else, in a hedge fund say, is not doing it.

But it is also true that I do not see Jim Simons (Renaissance Technologies) discussing all of their methods on Bloomberg either.

In fact, Jim Simons did (and still does, I think) use NDAs, maybe even for what I am thinking of. That is not to say he does not have a bunch of other (probably better) things protected by NDAs.

Anyway, your AI specialist should not have suggested a random forest as the first method for your use. Offering it as one method, in Python, for P123 members is different. If you had talked to me, I would have said (and did say) it would not beat a sim.

My advice was free. Not all of it is.

-Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Dec 6, 2019 9:28:26 AM       
Jrinne
Re: To Quant Or Not To Quant, That Is The Question

As soon as everyone makes their Designer Models Public, I will reveal all of my secrets too.

If you do want something that is likely to work better than sims on nonlinear data, you should try kernel regression. I have mentioned this before, and it was used by Jim Simons (at RT).

I would be happy to discuss the limitations of this method (if it becomes a topic of interest to you again in the future). There are some limitations as with any method.
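
For anyone who has not seen it, the simplest form of kernel regression (Nadaraya-Watson) is just a locally weighted average. Here is a minimal sketch with placeholder inputs and bandwidth; whether this is exactly the variant I would use in practice is another matter.

    import numpy as np

    def kernel_regression(X_train, y_train, X_new, bandwidth=1.0):
        """Predict y at each row of X_new as a Gaussian-weighted average of y_train."""
        preds = []
        for x in X_new:
            d2 = np.sum((X_train - x) ** 2, axis=1)       # squared distances to training rows
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))      # Gaussian kernel weights
            preds.append(np.dot(w, y_train) / (w.sum() + 1e-12))
        return np.array(preds)

    # The prediction adapts locally instead of fitting one global line, which is
    # why it can pick up nonlinear factor/return relationships. The bandwidth
    # controls the bias/variance trade-off and should be chosen on a validation set.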

I am sure you agree that I am not required to give up all my secrets (like the designers). There is only one method that I do not wish to discuss--as I said earlier in this thread. I would not mind selling it (like the designers). I understand no one has to buy it. I probably will make it available as a Designer Model if you expand the format of Designer Models to include recommendations (discretionary or automated) as you have discussed.

I would also argue that I have given as much as the AI specialist, and at a lower cost. Maybe I have given more, since I suspect you will find that kernel regression works, should it become a topic that interests you again in the future.

I would not ask how much the AI specialist charged.

I would recommend you look at what he supplied in his report. Did he use the excess log returns, and was a large leaf size used, as outlined above? If he didn't, then you do not yet know that random forests do not work at all. But, in any case, random forests probably are not better than sims in most situations (if they work at all).

Just for your information.

-Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Dec 6, 2019 2:17:18 PM       
judgetrade
Re: To Quant Or Not To Quant, That Is The Question

I think we need to differentiate two things:

1) Methods: ranking, statistical techniques (based on the normal distribution or on statistics that incorporate the fat tails), slicing the universe with buy and sell rules, machine learning / AI, etc.

2) Alternative data: the PIT database of P123 is, for me, already alternative data (fundamentals, estimates, prices, etc.).

I am pretty sure that there is not much more juice in bucket 1. (It could improve models, yes, but I do not think by very much; BUT I could be wrong here, maybe because I am just too lazy to learn new stuff in bucket 1.)

I firmly believe that KISS is important in the first bucket.

And I am a strong believer in ranking, because you can put a lot of parameters into it without over-optimizing: ranking encapsulates the complexity (nonlinear cause and effect, cause and effect that changes all the time, big fat tails) of the whole PIT database. I know I cannot prove this, but whenever somebody came up with a complicated model to manage a complex world, I usually saw it fail or underperform (not only in trading, but in business too). The whole AI thing is, in my view, a big hype.

Renaissance made its performance not so much with bucket 1 BUT with alternative point-in-time data with very low lag time into the database (so something happens, and they have it in the database the same day or seconds later, stuff other market participants wait days or months for); at least that is the whisper you hear (there is a podcast on this: http://investorfieldguide.com/baker/). Yes, they hired the best math guys in the world, but those guys also got the best alternative data in the world (they generate this data mostly in-house!). The better your data, the better your bucket-1 methods will work. And I do not think they have 10,000 alternative data points; they have relatively few, but they identified the big performance drivers!

Same thing with optimization (a different thread): I basically do not use it. Some of the best simulations are the ones with equal weight on all ranking factors; those are the models I go deeper into. It's in or it's out (one of the principles of the Market Wizards, stuff that is 30 years old; maybe I am too conservative here). What I do is look at my factors and try to understand them based on the emotions of the market participants (and, in the case of sector momentum, on how the big guys play the game depending on the market regime), and they must have worked since 1870. I want them to be stable, and they are only stable if they hack other market participants' emotions (human emotions have been stable for 5,000 years!).

So, if P123 could get alternative data, for example (at an extra subscription cost per alternative data set), that would make the performance of the models explode. But that gets expensive very fast, because those datasets are hard to get and even more expensive to pay for.

Good discussion, I learned a ton!

Thank you all!

Best Regards
Andreas

Dec 6, 2019 4:11:37 PM       