
Jrinne
Boosting your returns

I'd also want to try it myself.

Marco


So XGBoost is the best Boosting program on the planet. But JASP’s Boosting program is menu-driven and one of the easiest Boosting programs on the planet (and free, I might add). And I believe it is powerful enough to make one want to look into Boosting further.

Download JASP: https://jasp-stats.org

As far as data goes, at one point I looked at a spreadsheet with over 1,000,000 rows. That study had strengths, but there are three problems with the data: 1) an average member cannot duplicate it quickly; 2) it may not be easy even for P123 to duplicate; and 3) to be scientific about it, in retrospect I think some of the factors in that study s*ck as factors.

I think the study supported the effectiveness of XGBoost and of TensorFlow. But let me move on to a method that everyone can replicate, with everyone trying their own factors that may not s*ck as much.

I used a P123 sim to sample 25 highly ranked stocks. For practical reasons I set Sell Rule: 1, Force Positions into Universe: No, and Allow Immediate Buyback: No. This allowed me to use the rank of a stock as the input, always have one week’s return as the label, and have the sim buy exactly 25 stocks each week.

Keep in mind that with this method the stocks selected each week were not always the highest-ranked stocks, due to Allow Immediate Buyback: No. Also, I did not use slippage, in order to avoid adding noise to the predicted returns. (Image 1 of the sim below.)

I then separated out one compound node (header name Factor1) and two other factors (Factor2 and Factor3) and got the ranks of those factors for the same tickers over the same period. Briefly, I kept some factors together in one compound node because they were highly correlated and separating them out would not help and would add noise (I think). In general, separating some of the factors was expected to help boosting because the relationship between the inputs and the returns is not linear.

The spreadsheet also had returns, excess returns (compared to the average return of all 25 stocks for that week), and percent excess returns (xsp). There are also column headers for date and ticker. So I got the data into a spreadsheet (image 2).
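
If you would rather compute the excess-return columns in Python than in a spreadsheet, a minimal sketch is below. The column and file names are just my examples (not anything P123 exports by default), and the exact xsp definition here may differ slightly from my spreadsheet.

import pandas as pd

# One row per stock per week from the sim's transaction export.
# Assumed columns (example names only): date, ticker, rank, ret,
# where ret is the one-week return as a decimal (0.01 = 1%).
df = pd.read_csv("sim_transactions.csv", parse_dates=["date"])

# Average return of the ~25 stocks held that week.
weekly_avg = df.groupby("date")["ret"].transform("mean")

# Excess return versus that week's average, and the same thing in percent (xsp).
df["excess"] = df["ret"] - weekly_avg
df["xsp"] = 100.0 * df["excess"]

df.to_csv("sim_with_excess.csv", index=False)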

I loaded it into JASP and selected Machine Learning -> Boosting (image 3), selected the input and label (image 4), and changed some of the settings (image 5). JASP has train, validation, and test sets. JASP marks the test data and also writes a column of predictions onto the spreadsheet. I exported this data to a spreadsheet, sorted the Test column, and removed anything not from the test set (image 6).
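
For anyone more comfortable with a few lines of Python than with menus, roughly the same steps look like this. This is only a sketch using scikit-learn's GradientBoostingRegressor as a stand-in for JASP's Boosting; the column names are the ones from my spreadsheet.

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("sim_with_excess.csv", parse_dates=["date"])

features = ["Factor1", "Factor2", "Factor3"]   # factor/node ranks (the overall rank could be added too)
label = "xsp"                                  # percent excess return

# Hold out a test set, as JASP does; the rest is used for training/validation.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(
    learning_rate=0.01,   # JASP calls this "Shrinkage"
    n_estimators=1000,
    random_state=0,
)
model.fit(train_df[features], train_df[label])

# Write a prediction column onto the test rows, as JASP does on the spreadsheet.
test_df = test_df.copy()
test_df["prediction"] = model.predict(test_df[features])
test_df.to_csv("test_with_predictions.csv", index=False)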

Results:

The correlation of the predictions with the excess returns (xsp) was 0.039, p-value = 0.012.

Sorting by prediction, the 20% of stocks with the best predictions had an average excess return of 0.39%.

Sorting by rank, the 20% of stocks with the highest rank had an average excess return of 0.25%.

Conclusion:

The predicted returns from Boosting were significantly correlated with the actual excess returns (p-value = 0.012, n = 4,100).

Boosting was a clear winner for the bottom line. Selecting 20% of the stocks (n = 820) from this sample based on the Boosting predictions gave an average weekly excess return of 0.39%, compared to 0.25% for the same number of stocks selected on the basis of rank. That is an over 50% greater excess return.
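
For anyone who wants to check numbers like these on their own data, here is a minimal sketch of the same calculations in Python. The column names match the sketches above; "rank" stands for the overall ranking-system rank.

import pandas as pd
from scipy.stats import pearsonr

test = pd.read_csv("test_with_predictions.csv")

# Correlation of the predictions with percent excess returns, with a p-value.
r, p = pearsonr(test["prediction"], test["xsp"])
print(f"correlation = {r:.3f}, p-value = {p:.3f}, n = {len(test)}")

# Average excess return of the top 20% by prediction vs. the top 20% by rank.
k = int(0.2 * len(test))
print("top 20% by prediction:", test.nlargest(k, "prediction")["xsp"].mean())
print("top 20% by rank:      ", test.nlargest(k, "rank")["xsp"].mean())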

Discussion: I think that last result supports the idea that a closer look at this may be warranted.

I will be interested in what others find.

Jim

Attachment 1 Sim Results.png (343227 bytes) (Download count: 157)


Attachment 2 Upload.png (270862 bytes) (Download count: 156)


Attachment 3 Data in JASP.png (741273 bytes) (Download count: 157)


Attachment 4 Drag and Drop Headers.png (155954 bytes) (Download count: 165)


Attachment 5 Setting.png (280252 bytes) (Download count: 156)


Attachment 6 Sorted Test.png (265454 bytes) (Download count: 157)


From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 20, 2020 7:01:49 PM       
Jrinne
Re: Boosting your returns

I forgot to add that Boosting is not a black box.

One can print out a "Relative Influence Plot," as JASP calls it, or a Feature Importance plot, as it is more commonly called.

As I would expect, Factor1 (which is actually a compound node) is most responsible for the improvement in predictions from boosting (image). Factor2 is much less important.
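
If you do this in Python instead of JASP, the same plot is just a couple of lines once a model is fit. A minimal sketch with scikit-learn, assuming the same file and column names as in my earlier sketches:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("sim_with_excess.csv")
features = ["Factor1", "Factor2", "Factor3"]

model = GradientBoostingRegressor(learning_rate=0.01, n_estimators=1000, random_state=0)
model.fit(df[features], df["xsp"])

# Same idea as JASP's "Relative Influence" plot: each input's share of the
# improvement produced by the boosted trees.
importance = pd.Series(model.feature_importances_, index=features)
importance.sort_values().plot.barh(title="Relative influence (feature importance)")
plt.tight_layout()
plt.show()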

Attachment Relative Influence Plot.png (59026 bytes) (Download count: 162)


From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 20, 2020 7:46:31 PM       
test_user
Re: Boosting your returns

Very cool - thanks for sharing!

A quick question, you say that: "I then separate out one compound node (header name Factor1) and 2 other factors (Factor2 and Factor3) and got the ranks of those Factors for the same tickers over the same period."

Is there an easy way to do this? Do you use the screener?

Nov 21, 2020 4:51:02 AM       
Jrinne
Re: Boosting your returns

Very cool - thanks for sharing!

A quick question, you say that: "I then separate out one compound node (header name Factor1) and 2 other factors (Factor2 and Factor3) and got the ranks of those Factors for the same tickers over the same period."

Is there an easy way to do this? Do you use the screener?

Ole Petter,

Thank you for your interest.

The simple answer is yes, it can be done pretty easily, but it could be better. P123 could add a lot along the lines of what Steve is working on with Marco with regard to DataMiner. So just a hat tip to what Steve and Marco are working on.

But for now the best way to get data is through the sims, I think. You can get about 20,000 buy transactions at a time through a sim. This dwarfs anything DataMiner can do for now.

So let me ask: do you have a level of membership that gives you access to sims?

One of the problems with the screener is that, whether you use Python or a spreadsheet, it is a bit of a nightmare to match up the returns, the returns of the universe (or of the sim each week), the excess returns, the rank of your system, and then the rank of each individual node or factor. You can do it with sims, but just barely.

So, if your membership gives you access to sims: "all" is the ranking system for the sim shown above, Factor1 is the rank of a node in that ranking system, and the other factors (Factor2 and Factor3) are just P123 factors (not functions). But you could do this with functions too.

The ranks for the factors are obtained by simply putting 100% weight on one factor (or node) in the ranking system and 0% on all of the other weights, then repeating this until you have done it for all of the factors, nodes, or functions.

There is one more thing you will need in the sim, and it is not immediately obvious: a buy rule along the lines of portfolio("MLFactor").

This is the only thing that will keep all of the different sims you run synced up so you can easily concatenate them (whether in Python or a spreadsheet). You run four different sims here: one using the optimized ranking system, and three others with 100% weight on a single factor (or node), in other words on Factor1, Factor2, and Factor3, using portfolio("MLFactor") to keep them synced up in this example.
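
Once you have the four transaction exports, stitching them together is just a merge on date and ticker. A minimal sketch in Python (the file and column names are only my examples):

import pandas as pd

# Transaction exports from the four synced sims: one with the full ranking
# system ("all") and one per factor with 100% weight on that factor/node.
def load(path, rank_name, keep_return=False):
    cols = ["date", "ticker", "rank"] + (["ret"] if keep_return else [])
    part = pd.read_csv(path, parse_dates=["date"])[cols]
    return part.rename(columns={"rank": rank_name})

merged = load("sim_all.csv", "all", keep_return=True)
for name, path in [("Factor1", "sim_factor1.csv"),
                   ("Factor2", "sim_factor2.csv"),
                   ("Factor3", "sim_factor3.csv")]:
    merged = merged.merge(load(path, name), on=["date", "ticker"], how="inner")

merged.to_csv("combined_ranks.csv", index=False)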

And as I mentioned above, I think you have to use "Sell Rule: 1, Force Positions into Universe: No, and Allow Immediate Buyback: No." Otherwise the buy/sell difference at each rebalance messes everything up and you have to manually remove each one, and the label can end up covering more than one week, which really screws things up.

Anyway, if your membership allows you to use sims I can get you up and going with this. Please let me know what I can expand upon.

For screens you have to do one week at a time, and P123 will cut you off after 5 weeks even though you are using just ranks. DataMiner might not cut you off at 5 downloads, but you can only download one week at a time and you will have to figure out a way to get excess returns.

This will not work with raw returns, in my experience. The data is too noisy, fluctuating with every change in oil prices, Fed move, or Trump tweet.

Anyway, my advice is not to waste your time if you cannot get excess returns. And I do not think a cap-weighted benchmark will cut it.

Hope this helps some. Sorry for the length of this post, but there are quite a few tricks to getting this working (with my method at least), and I probably did not cover them all, or cover the ones I did as clearly as I could have.

But once you get the tricks you can do machine learning at P123. You do not have to follow Marc over to Chaikin Analytics to do machine learning;-) Isn’t it ironic?

I am trusting that Marco will not block this method after I responded to his request to learn how to do this himself. I do not think P123 wants to block data; they just do not know their potential yet. That is my hope anyway.

Best,

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 21, 2020 6:22:55 AM       
test_user
Re: Boosting your returns

Thanks again Jim, I have access to sims and I was aware of the portfolio() function, but I would never have thought of using it that way - brilliant! When I find the time I will try to replicate your analysis with my own ranking system.

Nov 21, 2020 7:34:09 AM       
Jrinne
Re: Boosting your returns

Thanks again Jim, I have access to sims and I was aware of the portfolio() function, but I would never have thought of using it that way - brilliant! When I find the time I will try to replicate your analysis with my own ranking system.

Ole Petter,

Thank you!

Please contact me on the forum or by email if I can help at all.

Also, if you want P123 to streamline this in any way you might consider contacting Steve Auger by email.

For whatever reason, Steve has been able to capture Marco’s attention on this. And the combined programming skills of Steve and Marco are out of this world.

I think Steve (or I) can share some code for XGBoost and/or TensorFlow if you are interested.

And Steve is seeking to form a group to avoid bothering people who have no interest in this.

There is a lot more that can be done with this: things that are done everywhere but here, like screening the entire universe across a large number of factors using the Feature Importance mentioned above.

One can rationally argue how useful Feature Importance really is. But de Prado is clear about this in his book:

“Backtesting is not a research tool. Feature importance is."

de Prado, Marcos López. Advances in Financial Machine Learning (Kindle Locations 3402-3404). Wiley. Kindle Edition.

Actually, P123 now agrees that backtesting has limitations.

It is not clear what they see as the best alternative or how that will evolve. But again, de Prado is clear on this.

In any case, whether it is about feature importance or anything else make sure to contact Steve or me.

Best,

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 21, 2020 8:03:19 AM       
InspectorSector
Re: Boosting your returns

In a few weeks’ time I will present my method and design for an AI-based indicator of current-quarter surprise for a subset of software stocks. My objective is to generate interest in using P123 in conjunction with AI. The indicator design will be presented here as a series of posts, unless Marco creates a separate platform/forum specific to ML. Ultimately, you should be able to do everything with a few lines of code plus the s/w library that I am polishing up right now. To get maximum benefit from my posts, readers might want to brush up on their Python skills. I found this to be a good reference site: https://www.w3schools.com/python/default.asp

Python is a pretty easy-to-use programming language. If you are already a programmer, it won't take long to pick it up and use.

I will be using Google Colab as the development environment and Google Drive for file storage and retrieval. The advantage of Colab is that you don't have to muck up your PC with all sorts of installations that often have strange effects on how your PC functions. You can use your own development environment instead, but you will have to tailor any code I provide to accommodate storing and retrieving data files and importing libraries.
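
If you have not used Colab before, the boilerplate for reading files off Google Drive is only a couple of lines (the path below is just an example, not part of my library):

# In a Colab notebook: mount Google Drive so data files persist between sessions.
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

# Example path only; point this at wherever you keep your exported P123 data.
df = pd.read_csv('/content/drive/My Drive/p123/sim_data.csv')
df.head()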

Also, xgboost will be the primary AI engine: https://xgboost.readthedocs.io/en/latest/python/python_api.html
I will also provide a tensorflow interface, but training is much slower and the results not as good.
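
As a small preview, the core of an xgboost fit is only a few lines. This is a sketch using the scikit-learn style wrapper from the API linked above; the feature and label names are placeholders, not my final design.

import pandas as pd
import xgboost as xgb
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split

df = pd.read_csv('/content/drive/My Drive/p123/sim_data.csv')
X = df[["Factor1", "Factor2", "Factor3"]]   # placeholder feature columns
y = df["xsp"]                               # placeholder label (excess return)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted trees via xgboost's scikit-learn interface.
model = xgb.XGBRegressor(n_estimators=500, learning_rate=0.01, max_depth=4)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("correlation with held-out labels:", pearsonr(preds, y_test)[0])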

Steve

Nov 21, 2020 11:02:29 AM       
wwasilev
Re: Boosting your returns

I'm just starting to look at this issue, but it seems to me that dumping ranks (top node and sub-node(s)) should be easy and relatively inexpensive for P123. For a sim, those values need to be computed anyway, so dumping them along the way is the only additional step. Disk storage (file size) and sim bottlenecks (disk IO) may be issues, though. I would hate to see data collection get overly complicated.

Nov 21, 2020 12:06:52 PM       
Jrinne
Re: Boosting your returns


Also, xgboost will be the primary AI engine: https://xgboost.readthedocs.io/en/latest/python/python_api.html
I will also provide a tensorflow interface, but training is much slower and the results not as good.

Steve knows what he is talking about here.

The above demonstration with JASP was done in about an hour, with the time mostly spent on writing and screenshots, and a little time in JASP.

Normally one would spend some time adjusting (and validating) the hyperparameters in a Boosting program.

The only hyperparameter I changed was "Shrinkage," which I set to 0.01 (based on previous experience with boosting programs). I also changed the Training and Validation Data setting to K-fold with 5 folds, which is not a hyperparameter. That was all the time I spent, and I did this before I saw how the test sample performed.

I thought my point was already made, and that no one would claim that changing these two settings from their defaults is just too hard for a serious investor.

Anyone wanting to spend more time with JASP should also change "Min. observations in node" and "Interaction depth." The defaults I used here are almost certainly not optimal, and the optimal hyperparameters will be different for different data (including your data).
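
For anyone doing the same tuning outside of JASP, a rough sketch of how I would map those settings to scikit-learn and tune them with 5-fold cross-validation is below. The grid values are only examples; "Shrinkage" corresponds to learning_rate, "Interaction depth" roughly to max_depth, and "Min. observations in node" roughly to min_samples_leaf.

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

df = pd.read_csv("sim_with_excess.csv")
X, y = df[["Factor1", "Factor2", "Factor3"]], df["xsp"]

param_grid = {
    "learning_rate": [0.01, 0.05],   # "Shrinkage" in JASP
    "max_depth": [2, 3, 4],          # roughly JASP's "Interaction depth"
    "min_samples_leaf": [25, 100],   # roughly "Min. observations in node"
}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=500, random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)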

The real time that I have spent with boosting has been with XGBoost, which is the program professionals use, and it does offer some additional capabilities. But is XGBoost better than a neural net, as Steve says?

Steve’s opinion of neural nets is shared by many. Here is a quote from the web; I do not think it is from a famous person, but the same quote can be found everywhere:

"Xgboost is good for tabular data……..whereas neural nets based deep learning is good for images…."

"Tabular data" is just what is found in an Excel spreadsheet.

I actually disagree with this blanket statement. TensorFlow can beat XGBoost SOMETIMES, I think.

But XGBoost is the place to start. And Steve is using TensorFlow too.

If you just want to make money you should see if Steve has something you can use.

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 22, 2020 3:43:08 AM       
InspectorSector
Re: Boosting your returns

But is XGBoost better than a neural net, as Steve says?

I have limited experience, with one model only. But from what I can see, xgboost is far superior for the type of application I am developing. Either that, or I have been fooling myself into believing that what I am doing is correct. One of the two. Anyway, when I get around to presenting what I have, hopefully it will be peer reviewed by the scientific minds here (I think there are many hiding in the shadows). I don't mind getting a little egg on my face if there is something I am doing wrong; it will save me some grief down the road.

Nov 22, 2020 11:43:21 AM       