
yuvaltaylor
decision trees, bagging, and boosting

I was wondering if anyone could explain how these ML tools can be intuitively understood by a human being when applied to factors, ranking systems, and/or equity curves. I have tried reading a number of explanations aimed at laypeople (non-programmers) but have failed to understand how decision trees can substitute for multiple regression. I know that they CAN, and I know that they produce better results than multiple regression analysis. I just want to understand the mechanism in an intuitive manner. For example, I can understand multiple linear regression by extrapolating linear regression into multiple dimensions. I understand correlation and probability and how regression analysis relies on them. But every explanation I find about decision trees seems Boolean and makes no reference to correlations or probability, and I can't see how a Boolean process could apply to what we do here.

Let's take a very simple problem. We have three factors and we want to weight them so that the combination will be optimally effective when applied to a certain group of stocks over a certain period of time. The multiple regression solution is to look at the results of each of the factors on its own and then weight them accordingly. Another solution (I don't know what it's called) is to try out about twenty different combinations, take the best one, and create variations of it, and so on, until a good fit is found. What's the decision tree/bagging/boosting solution?

Yuval Taylor
Product Manager, Portfolio123
invest(igations)
Any opinions or recommendations in this message are not opinions or recommendations of Portfolio123 Securities LLC.

Jan 11, 2021 4:31:16 PM       
azouz110
Re: decision trees, bagging, and boosting

The problem you have is an optimization problem. It can be solved with genetic algorithms, PBIL (population-based incremental learning), and other methods like the ones you described.
Decision trees and other supervised machine learning algorithms solve a different kind of problem. You need inputs/features (for example, the ranks of different factors, for different stocks and different dates) and one or several outputs/labels (for example, the next month's return).

The model will get the inputs and will try to predict the output.

Note that you don't have a label in your problem.
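
To make that concrete, here is a minimal sketch in Python (my own, with made-up data; scikit-learn's DecisionTreeRegressor stands in for the general idea, and the factor ranks and returns are entirely synthetic):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Inputs/features: 500 stock-date rows, each with 3 factor ranks in [0, 100].
X = rng.uniform(0, 100, size=(500, 3))
# Output/label: next-month return (%), here simulated from the first two factors.
y = 0.02 * X[:, 0] - 0.01 * X[:, 1] + rng.normal(0.0, 1.0, 500)

# A shallow tree, kept small so its rules stay human-readable.
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(X, y)

# The fitted tree is a set of threshold rules ("if factor-1 rank > 62.5 ..."),
# each leaf predicting the mean label of the training rows that land in it.
print(tree.predict([[80.0, 20.0, 50.0]]))

The splits are Boolean, but each leaf holds a numeric prediction (a conditional mean of the training labels), which is how a tree can substitute for a regression.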

Jan 11, 2021 5:09:28 PM       
Jrinne
Re: decision trees, bagging, and boosting

Yuval,

I know you understand this and have used bagging and bootstrapping yourself. You have written on these topics elsewhere, with nice explanations, BTW. As for other ideas on how to explain this, I would start with what these methods are trying to accomplish. The why.

Here is an example from a book showing how boosting can fit non-linear data (see the attached screenshot).

If all you want is to fit a line to the data, don't worry about any of this: just use multiple regression, or its two-dimensional equivalent here, simple linear regression. If you want a somewhat better fit, use boosting. In this example, at 10 iterations the boosting model (blue) is already fitting the data quite well. To my eye, a little better than a line would, but that could be just me.

That is the why. Or, more technically: not every curve is a line.
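
To see the same thing in code rather than in the picture, here is a rough sketch (mine, not the book's, using scikit-learn stand-ins and simulated curved data):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0.0, 0.2, 200)  # non-linear data plus noise

line = LinearRegression().fit(x, y)                           # fit a line
boost = GradientBoostingRegressor(n_estimators=10).fit(x, y)  # 10 boosting rounds

# The boosted model can track the curve; the line cannot, by construction.
print("line R^2:    ", round(line.score(x, y), 3))
print("boosting R^2:", round(boost.score(x, y), 3))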

For the how, I would recommend an easy $2.99 book that truly has all the theory one needs to understand this. It is not brief enough to fit into one post, and it has too many pictures to summarize here: Machine Learning With Boosting: A Beginner's Guide

Best,

Jim

Attachment: Screen Shot 2021-01-11 at 6.07.20 PM.png


From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Jan 11, 2021 5:20:36 PM       
Jrinne
Re: decision trees, bagging, and boosting

Azouz,

Nice to hear from you. I hope things are going well.

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Jan 11, 2021 5:25:31 PM       
yuvaltaylor
Re: decision trees, bagging, and boosting

The problem you have is an optimization problem. It can be solved with genetic algorithms, PBIL (population-based incremental learning), and other methods like the ones you described.
Decision trees and other supervised machine learning algorithms solve a different kind of problem. You need inputs/features (for example, the ranks of different factors, for different stocks and different dates) and one or several outputs/labels (for example, the next month's return).

The model will get the inputs and will try to predict the output.

Note that you don't have a label in your problem.


Ah, now I see. Thank you! Do these three machine learning algorithms use the mathematics of probability in any way? In my reading about them, I haven't found any evidence that they do.

Yuval Taylor
Product Manager, Portfolio123
invest(igations)
Any opinions or recommendations in this message are not opinions or recommendations of Portfolio123 Securities LLC.

Jan 12, 2021 1:53:47 PM       
yuvaltaylor
Re: decision trees, bagging, and boosting

Yuval,

I know you understand this and have used bagging and bootstrapping yourself. You have written on these topics elsewhere, with nice explanations, BTW. As for other ideas on how to explain this, I would start with what these methods are trying to accomplish. The why.

I have definitely used bootstrapping but I don't remember ever understanding what bagging is. On the other hand, my memory is terrible, so if I did, I apologize.

For the how, I would recommend an easy $2.99 book that truly has all the theory one needs to understand this. It is not brief enough to fit into one post, and it has too many pictures to summarize here: Machine Learning With Boosting: A Beginner's Guide

Best,

Jim

This looks like a terrific book. I've purchased it and will read it soon. Thanks a million!

Yuval Taylor
Product Manager, Portfolio123
invest(igations)
Any opinions or recommendations in this message are not opinions or recommendations of Portfolio123 Securities LLC.

Jan 12, 2021 1:59:10 PM       
azouz110
Re: decision trees, bagging, and boosting

Jim,

Thank you. Everything is all right. I hope things are going well for you as well.
I started working with ML again over the last three or four months. I'm still doing a lot of experiments, but it seems promising when using the right features.
Feature engineering is certainly one of the most important aspects of coming up with a good model.

Jan 12, 2021 3:01:44 PM       
azouz110
Re: decision trees, bagging, and boosting

Yuval,

Boosting and bagging are what are called ensemble methods: ways of combining multiple models to get a better prediction.
https://towardsdatascience.com/ensemble-metho...and-stacking-c9214a10a205

Other models, such as random forests and XGBoost/LightGBM (the latter two are boosted decision trees), are built on decision trees.
https://www.hackerearth.com/practice/machine-...l-decision-tree/tutorial/
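
For what it's worth, here is a small sketch (mine, with synthetic data) of the two ensemble flavors, using scikit-learn's RandomForestRegressor as the bagging example and GradientBoostingRegressor as the boosting example; XGBoost and LightGBM follow the same fit/predict pattern:

import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 100, size=(500, 3))   # factor ranks again (synthetic)
y = np.where(X[:, 0] > 50, 1.0, -1.0) + rng.normal(0.0, 0.5, 500)

# Bagging: many deep trees, each grown on a bootstrap sample of the rows,
# with their predictions averaged to cut variance.
bagged = RandomForestRegressor(n_estimators=100).fit(X, y)

# Boosting: shallow trees fit one after another, each to the errors
# left by the ensemble so far, to cut bias.
boosted = GradientBoostingRegressor(n_estimators=100, max_depth=2).fit(X, y)

print(bagged.predict([[80.0, 20.0, 50.0]]))
print(boosted.predict([[80.0, 20.0, 50.0]]))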

Jan 12, 2021 3:15:04 PM       
Jrinne
Re: decision trees, bagging, and boosting

Yuval,

I do not want to hit you with too much at once. But as you look at this you may encounter mean squared error (MSE) or root mean squared error (RMSE).

Just be aware that wherever they use those metrics, you can substitute mean absolute error (MAE) or another metric of your choosing. It is trivial to change the metric in XGBoost (and in other ML libraries).

As I recall, at one time you were not a fan of RMSE and preferred MAE. As do I, especially when there are outliers.

My only suggestion is not to spend too much time worrying about which metric is used in the examples. You can always change it.
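
For example, in XGBoost's scikit-learn interface the metric is a single argument. A sketch, assuming a reasonably recent xgboost version and made-up data:

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(3)
X = rng.uniform(0, 100, size=(500, 3))
# Heavy-tailed noise, so the label has outliers -- the case where MAE shines.
y = 0.02 * X[:, 0] + rng.standard_t(df=3, size=500)

model = xgb.XGBRegressor(
    objective="reg:squarederror",  # what the trees optimize (the default)
    eval_metric="mae",             # what gets reported on the eval set
)
model.fit(X[:400], y[:400], eval_set=[(X[400:], y[400:])], verbose=False)

# Recent versions (1.7+) can also optimize MAE directly:
# model = xgb.XGBRegressor(objective="reg:absoluteerror")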

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Jan 12, 2021 5:44:09 PM       
yuvaltaylor
Re: decision trees, bagging, and boosting

Yuval,

Boosting and bagging are what are called ensemble methods: ways of combining multiple models to get a better prediction.
https://towardsdatascience.com/ensemble-metho...and-stacking-c9214a10a205

Other models, such as random forests and XGBoost/LightGBM (the latter two are boosted decision trees), are built on decision trees.
https://www.hackerearth.com/practice/machine-...l-decision-tree/tutorial/


Thanks. These articles do help me understand to some degree. I'm still puzzled as to how decision trees can be used with P123 data, though. I can definitely see how ensemble methods can be (and I use those myself, clumsily). Maybe I'll get it after I read the Machine Learning with Boosting e-book.

Yuval Taylor
Product Manager, Portfolio123
invest(igations)
Any opinions or recommendations in this message are not opinions or recommendations of Portfolio123 Securities LLC.

Jan 12, 2021 11:24:41 PM       