Boosting your returns

So XGBoost is the best Boosting program on the planet. But JASP’s Boosting program is menu-driven and one of the easiest Boosting programs on the planet (and free, I might add). And I believe it is powerful enough to make one want to look into Boosting further.

Download for JASP: JASP

As far as data, at one point I looked at a spreadsheet with over 1,000,000 rows of data. That study had strengths, but there are 3 problems with that data. 1) An average member cannot duplicate it quickly. 2) It may not be easy even for P123 to duplicate. 3) And let me be scientific about this: scientifically speaking, and in retrospect, I think some of the factors in that study s*ck as factors.

I think the study supported the effectiveness of XGBoost and of TensorFlow. But let me move to a method that can be replicated by everyone, with everyone trying their own factors that may not s*ck as much.

I used a P123 sim to sample 25 highly ranked stocks. For practical reasons I set sell rule: 1, Force Positions into Universe: No, and Allow Immediate Buyback: No. This allowed me to get the rank of a stock as the input, always have one week’s return as the label, and have the sim always buy exactly 25 stocks each week.

Keep in mind that with this method the stocks selected each week were not always the highest-ranked stocks, due to Allow Immediate Buyback: No. Also, I did not use slippage, in order to avoid adding noise to the predicted returns. (Image1 of sim below).

I then separated out one compound node (header name Factor1) and 2 other factors (Factor2 and Factor3) and got the ranks of those factors for the same tickers over the same period. Quickly: I kept some factors in a compound node because they were highly correlated, and separating them out would not have helped and would have added noise (I think). Generally, separating some of the factors was expected to help with boosting because the relationship between the inputs and the returns is not linear.

The spreadsheet also had returns, excess returns (compared to the average return of all 25 stocks for that week) and percent excess returns (xsp). There are also column headers for date and ticker. So I got the data into a spreadsheet (image2).

I loaded it into JASP and selected Machine Learning -> Boosting (image3). Selected the input and label (Image4). Changed some of the settings (image5). JASP has train, validation and test sets. JASP marks the test data and also writes a column of predictions onto the spreadsheet. I exported this data to a spreadsheet, sorted the test column and removed anything not from the test set (image5).
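For anyone who would rather see this as code than menus, here is a rough Python sketch of the same workflow using scikit-learn’s gradient boosting (a stand-in for JASP’s Boosting, not what I actually ran). The file name is a placeholder; the column names match my spreadsheet headers.

```python
import pandas as pd
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Placeholder file name; columns match the spreadsheet described above.
df = pd.read_csv("sim_export.csv")  # date, ticker, Factor1, Factor2, Factor3, xsp, ...

X = df[["Factor1", "Factor2", "Factor3"]]   # factor ranks as inputs
y = df["xsp"]                               # percent excess return as the label

# Hold out a test set, roughly as JASP does with its train/validation/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(learning_rate=0.01, n_estimators=500)
model.fit(X_train, y_train)

pred = model.predict(X_test)
r, p = pearsonr(pred, y_test)
print(f"correlation = {r:.3f}, p-value = {p:.4f}")
```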

Results:

The correlation of the predictions and the excess returns (xsp) was 0.039 (p-value = 0.012).

Sorting by prediction the 20% of the stocks with the best predictions had an average excess return of: 0.39%

Sorting by rank the 20% of the stocks with the highest rank had an average excess return of: 0.25%

Conclusion:

The predicted returns using Boosting were significantly correlated to the actual returns with a p-value = 0.012, n = 4100.

Boosting was a clear winner for the bottom line. Selecting 20% of the stocks (n = 820) from this sample based on the Boosting predictions gave an average weekly excess return of 0.39%, compared to 0.25% for the same number of stocks selected on the basis of rank. That is an over 50% greater excess return.

Discussion: I think that last result supports the idea that a closer look at this may be warranted.

I will be interested in what others find.

Jim







I forgot to add that Boosting is not a black box.

One can print out a “Relative Influence Plot,” as JASP calls it, or a Feature Importance plot as it is more commonly called.

As I would expect, Factor1 (which is actually a compound node) is most responsible for the improvement in predictions from boosting (image). Factor2 is much less important.
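If you are doing this in Python rather than JASP, roughly the same plot can be made from a fitted boosting model. A minimal sketch, assuming the fitted model and inputs from the sketch earlier in this thread:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumes "model" (a fitted GradientBoostingRegressor) and "X" from the earlier sketch.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values()
importances.plot(kind="barh", title="Relative influence (feature importance)")
plt.tight_layout()
plt.show()
```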


Very cool - thanks for sharing!

A quick question. You say: “I then separated out one compound node (header name Factor1) and 2 other factors (Factor2 and Factor3) and got the ranks of those factors for the same tickers over the same period.”

Is there an easy way to do this? Do you use the screener?

Ole Petter,

Thank you for your interest.

So the simple answer to this is yes, it can be done pretty easily, but it could be better. P123 could add A LOT along the lines of what Steve is working on with Marco with regard to DataMiner. So just a hat tip to what Steve and Marco are working on.

But for now the best way to get data is through the sims, I think. You can get about 20,000 buy transactions at a time through a sim. This dwarfs anything DataMiner can do for now.

So let me ask: do you have a level of membership that gives you access to sims?

One of the problems with the screener is that, whether you use Python or a spreadsheet, it is a bit of a nightmare to match up the returns, the returns of the universe (or of the sim each week), the excess returns, the rank of your system and then the rank of each individual node or factor. You can do it with sims, but just barely.

So if you have access to sims with your membership: “all” is the ranking system for the sim shown above. Factor1 is the rank of a node in that ranking system. The other factors (Factor2 and Factor3) are just P123 factors (not functions). But you could do this with functions too.

The ranks for the factors are obtained by simply putting 100% weight on one factor (or node) in the ranking system and 0 for all of the other weights. Repeat until you have done this with all of the factors, nodes or functions.

I think you will need this in a sim and it is not immediately obvious. You need something like this in the buy rules: portfolio(“MLFactor”).

This is the only thing that will keep all of the different sims you run synced up so you can easily concatenate them (whether in Python or a spreadsheet). You run 4 different sims here: one using the optimized ranking system, and 3 others with 100% weight on one factor (or node) at a time. In other words, on Factor1, Factor2 and Factor3, using portfolio(“MLFactor”) to keep them synced up in this example.
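To make that concrete, here is a rough pandas sketch of merging the four sim exports on date and ticker. The file names and the name of the rank column are just placeholders for whatever your transaction export uses.

```python
import pandas as pd

# Hypothetical file names: the optimized ranking system ("all") plus the
# three sims with 100% weight on a single factor or node.
files = {
    "all": "sim_all.csv",
    "Factor1": "sim_factor1.csv",
    "Factor2": "sim_factor2.csv",
    "Factor3": "sim_factor3.csv",
}

merged = None
for name, path in files.items():
    df = pd.read_csv(path)[["date", "ticker", "rank"]].rename(columns={"rank": name})
    merged = df if merged is None else merged.merge(df, on=["date", "ticker"], how="inner")

# Because portfolio("MLFactor") keeps the holdings identical across the sims,
# the inner merge should still have all 25 stocks for every week.
print(merged.head())
```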

And as I mention above, I think you have to use this: “sell rule: 1, Force Positions into Universe: No, and Allow Immediate Buyback: No.” Otherwise, the “buy/sell difference” with each rebalance messes everything up and you have to manually remove each one. And a label can end up covering more than one week, which really screws things up.

Anyway, if your membership allows you to use sims I can get you up and going with this. Please let me know what I can expand upon.

For screens you have to do one week at a time, and P123 will cut you off after 5 weeks even though you are using just ranks. DataMiner might not cut you off at 5 downloads, but you can only download one week at a time and you will have to figure out a way to get excess returns.

This will not work with raw returns in my experience. The data is too noisy—fluctuating with every change in oil price, Fed move or Trump tweet.

Anyway, my advice is not to waste your time if you cannot get excess returns. And I do not think a cap-weighted benchmark will cut it.
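If your export only has raw one-week returns, the excess return against the weekly average of the 25 holdings takes just a line or two. A sketch, with hypothetical file and column names:

```python
import pandas as pd

# Hypothetical file and column names: one row per (date, ticker) with a
# one-week raw return in "ret".
df = pd.read_csv("merged_data.csv")

# Excess return versus the equal-weighted average return of the 25 stocks
# held that week; most of the market noise cancels out of this number.
df["excess"] = df["ret"] - df.groupby("date")["ret"].transform("mean")
```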

Hope this helps some. Sorry for the length of this post, but there are quite a few tricks to getting this working (with my method at least). I probably did not cover them all, and I probably was not very clear on the ones I did cover.

But once you get the tricks you can do machine learning at P123. You do not have to follow Marc over to Chaikin Analytics to do machine learning;-) Isn’t it ironic?

I am trusting that Marco will not block this method after I responded to his request to learn how to do this himself. I do not think P123 wants to block data. They just do not know its potential yet. That is my hope anyway.

Best,

Jim

Thanks again Jim, I have access to sims and I was aware of the portfolio() function, but I would never have thought of using it that way - brilliant! When I find the time I will try to replicate your analysis with my own ranking system.

Ole Petter,

Thank you!

Please contact me on the forum or by email if I can help at all.

Also, if you want P123 to streamline this in any way you might consider contacting Steve Auger by email.

For whatever reason, Steve has been able to capture Marco’s attention on this. And the combined programming skills of Steve and Marco are out of this world.

I think Steve (or I) can share some code for XGBoost and/or TensorFlow if you are interested.
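Just to give a flavor of it, here is a minimal XGBoost sketch on the same kind of data as above. The hyperparameter values are only illustrative, and the file and column names are placeholders:

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

df = pd.read_csv("sim_export.csv")          # same layout as the spreadsheet above
X = df[["Factor1", "Factor2", "Factor3"]]   # factor ranks
y = df["xsp"]                               # percent excess return

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=1000,
    learning_rate=0.01,   # the same idea as "Shrinkage" in JASP
    max_depth=3,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
```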

And Steve is seeking to form a group to avoid bothering people who have no interest in this.

There is a lot more that can be done with this. Stuff that is done everywhere but here: like screening the entire universe for a large number of factors using the Feature Importance mentioned above.

One can rationally argue how useful Feature Importance really is. But de Prado is clear about this in his book:

[i]“Backtesting is not a research tool. Feature importance is.”[/i]

de Prado, Marcos López. Advances in Financial Machine Learning (Kindle Locations 3402-3404). Wiley. Kindle Edition.

Actually, P123 now agrees that backtesting has limitations.

It is not clear what they see as the best alternative or how that will evolve. But again, de Prado is clear on this.

In any case, whether it is about feature importance or anything else make sure to contact Steve or me.

Best,

Jim

In a few weeks’ time I will present my method and design for an AI-based indicator for current-quarter surprise for a subset of software stocks. My objective is to generate interest in the use of P123 in conjunction with AI. The indicator design will be presented here as a series of posts, unless Marco creates a separate platform/forum specific to ML. Ultimately, you should be able to do everything with a few lines of code plus the s/w library that I am polishing up right now. To get maximum benefit from my posts, readers might want to brush up on their Python skills. I found this to be a good site for reference: https://www.w3schools.com/python/default.asp

Python is a pretty easy to use programming language. If you are already a programmer it won’t take long to pick it up and use.

I will be using Google Colab as the development environment and Google Drive for file storage and retrieval. The advantage of Colab is that you don’t have to muck up your PC with all sorts of installations that often result in strange effects on the functioning of your PC. Users can use their own development environment but will have to tailor any code I provide to accommodate storing and retrieving data files and importing libraries.
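For anyone new to Colab, mounting Google Drive for file storage and retrieval looks roughly like this (the folder path is just an example):

```python
# Run inside a Colab notebook; you will be prompted to authorize access once.
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

# Example path only; point this at wherever you keep your exported data on Drive.
df = pd.read_csv('/content/drive/MyDrive/p123/sim_export.csv')
```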

Also, xgboost will be the primary AI engine: https://xgboost.readthedocs.io/en/latest/python/python_api.html
I will also provide a tensorflow interface, but training is much slower and the results not as good.
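As a rough illustration of what the tensorflow side looks like (a minimal sketch only, not the library I am polishing up), a small Keras regressor for the same tabular factor inputs might be:

```python
import tensorflow as tf

# Minimal sketch: a small fully connected regressor for three factor-rank inputs.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# model.fit(X_train, y_train, epochs=50, batch_size=256, validation_split=0.2)
```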

Steve

I’m just starting to look at this issue, but it seems to me that dumping ranks (top and sub-node(s)) should be easy and relatively inexpensive for P123. For a sim, those values need to be computed anyway, so dumping them along the way is the only additional step. Disk storage (file size) and sim bottlenecks (disk IO) may be issues, though. I would hate to see data collection get overly complicated.

Steve knows what he is talking about here.

The above demonstration with JASP was done in about an hour, with the time mostly spent on writing and screenshots. And a little time with JASP.

Normally one would spend some time adjusting (and validating) the hyperparameters in a Boosting program.

The only hyperparameter I changed was “Shrinkage,” to 0.01 (based on previous experience with boosting programs). I also changed the Training and Validation Data setting to K-fold with 5 folds, which is not a hyperparameter. That was all the time that I spent, and I did this before I saw how the test sample performed.

I thought my point was already made, and that no one would claim that changing these 2 things from their default settings was just too hard for a serious investor.

Anyone wanting to spend more time with JASP should also change “Min. observations in node” and “Interaction depth.” The defaults that I used here are almost certainly not optimal. And the optimal hyperparameters will be different for different data (including your data).
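For anyone doing this in Python instead of JASP, tuning roughly the same settings with 5-fold cross-validation might look like the sketch below. The parameter names map only loosely onto JASP’s (learning_rate is about Shrinkage, max_depth is about Interaction depth, min_child_weight is about Min. observations in node) and the grid values are just examples.

```python
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": [0.01, 0.05],     # roughly JASP's "Shrinkage"
    "max_depth": [2, 3, 5],            # roughly "Interaction depth"
    "min_child_weight": [1, 10, 50],   # roughly "Min. observations in node"
}

search = GridSearchCV(
    xgb.XGBRegressor(n_estimators=500),
    param_grid,
    cv=5,                              # 5 folds, like the K-fold setting I used
    scoring="neg_mean_squared_error",
)

# search.fit(X_train, y_train)   # X_train / y_train as in the earlier sketches
# print(search.best_params_)
```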

The real time that I have spent with boosting has been with XGBoost, which is the program professionals use, and it does offer some additional capabilities. But is XGBoost better than a neural net, as Steve says?

Steve’s opinion of neural nets is shared by many. Here is a quote from the web. I do not think it is from a famous person but the same quote can be found everywhere:

“Xgboost is good for tabular data………whereas neural nets based deep learning is good for images….”

“Tabular data” is just what is found in an Excel spreadsheet.

I actually disagree with this blanket statement. TensorFlow can beat XGBoost SOMETIMES, I think.

But XGBoost is the place to start. And Steve is using TensorFlow too.

If you just want to make money you should see if Steve has something you can use.

I have limited experience with one model only. But from what I can see, xgboost is far superior for the type of application that I am developing. Either that or I have been fooling myself into believing that what I am doing is correct. One of the two. Anyway, when I get around to presenting what I have, hopefully it will be peer reviewed by the scientific minds here (I think there are many hiding in the shadows). I don’t mind getting a little egg on my face if there is something I am doing wrong. It will save me some grief down the road.

Interesting twitter thread. [url=https://twitter.com/RobinWigg/status/1331168066177294336]https://twitter.com/RobinWigg/status/1331168066177294336[/url]

1.0039^52 = 1.22… where are you getting a 50% excess return from? (The P123 chart has an annualized return of 49.5%?)

Thanks Philip,

“greater” excess return.

(0.39 / 0.25 - 1) * 100 = 56%

Admittedly this is a medical way to look at it, as in “people on statins have a 0.0001% chance of dying while those on placebo had a 0.00015% chance of dying.” A fifty percent increase in deaths for the placebo group.

Despite the obvious problems with this we keep talking that way.

You ask a great question, one I did not really think about until after I posted this: is this significant in a “clinical” sense?

But it is a good question and an obvious one. So soon after posting I ran these numbers: (1 + (0.0039 - 0.0025)) = 1.0014

1.0014^52 = 1.0755 or 7.55%

Thank you for expanding on this. This is also something that is endlessly discussed in medicine. Should you take a statin? Should I go through the extra work of Boosting?

So I like your way of looking at it. And perhaps 7.5% is the number we want.

Meaningful? I think so. And I think one can do better with just a little more work. Especially with XGBoost.

But I would be interested to see what others find with their ranking systems. And see if they think what they find is meaningful.

This was meant as just a simple first look at Boosting that most people could do on their own (although even this is not exactly easy). For me personally, I have found that much more meaningful numbers are possible.

I think with Marco’s API releases and Steve’s sharing of code, members can see how much more potential XGBoost might be able to provide for their own systems and not have to trust me on any of this.

Thanks.

Jim

@Jrinne
Did you take into account transaction costs + slippage? Because your portfolio turnover of 5000% is a true killer. I wouldn’t be surprised if your returns ended up poor after properly taking transaction costs + slippage into account.

You’re missing the point. Your P123 chart says you achieved a 50% annual return. The benchmark does 50% over the entire time. You’re claiming a 0.39% weekly excess return. That doesn’t equate to a 50% annual return. Add a 0.39% weekly return to any benchmark you want (SPX, SP1500 Value, etc.) and you don’t end up with a 50% annual return. Something’s wrong.

Just to clarify what you’ve achieved, you used JASP’s boosting technique to optimize weighting for three factors (which happen to be composites, but JASP only saw the three factors), which increased your weekly excess return from 0.25% to 0.39%?

My apologies if I am misleading.

So the tickers are every single trade in the P123 sim: exactly 25 trades every week.

For my study the excess returns are excess relative to the sim. I cannot stress enough how one has to get the noise of the market out of the data.

So if one week ticker ABC happened to have a 0.39% excess return as I use it here, that would be in addition to the return of the 25-stock model.

Is that responsive at all?

I have to go for a while. But please ask about anything.

I am pretty sure that Boosting did about 7.5% better annualized with this simple model than picking stocks based on rank. I think I can explain that (or would appreciate the correction if I missed something).

Jim

Ahhhh ok, so let me try again: You had a model that was decent and already did, say, a 40% annualized return. You then used JASP to tweak the weightings for 3 composites. The new tweaked model had an annualized return of 50%.

Is that right?

This is not a sim for trading.

This is a sim for getting data.

Just as the API Marco will provide gives you factor ranks and returns.

If Marco is smart he will not add any noise to that data with slippage.

You/he will have to work out the slippage later.

This is a method to get data and only to get data. Data used to train boosting (or TensorFlow).

Jim

Quantinomics,

The weekly turnover is intentional for collecting data for boosting.

The target (label) should be over the same time period for every row. One ticker where the target (label) is 1 week’s return and another where the target is 6 weeks’ return does not work at all.
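A quick pandas sanity check can catch that before training: verify that every transaction’s holding period is exactly one week. The column names here are placeholders for whatever the sim’s transaction export calls them.

```python
import pandas as pd

# Hypothetical column names for the sim's transaction export.
df = pd.read_csv("sim_transactions.csv", parse_dates=["open_date", "close_date"])

holding_days = (df["close_date"] - df["open_date"]).dt.days
bad = df[holding_days != 7]

# Every label should cover exactly one week; anything else needs to be removed.
print(f"{len(bad)} transactions with a holding period other than 7 days")
```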

Yes. Exactly.

And what I did with JASP was not a serious attempt. One can do better.

Thank you Philip.