
yuvaltaylor

I was wondering if anyone could explain how these ML tools can be intuitively understood by a human being when applied to factors, ranking systems, and/or equity curves. I have tried reading a number of explanations aimed at laypeople (non-programmers) but have failed to understand how decision trees can substitute for multiple regression. I know that they CAN, and I know that they produce better results than multiple regression analysis. I just want to understand the mechanism in an intuitive manner.

For example, I can understand multiple linear regression by extrapolating linear regression into multiple dimensions. I understand correlation and probability and how regression analysis relies on them. But every explanation I find about decision trees seems Boolean and makes no reference to correlation or probability, and I can't see how a Boolean process could apply to what we do here.

Let's take a very simple problem. We have three factors, and we want to weight them so that the combination will be optimally effective when applied to a certain group of stocks over a certain period of time. The multiple regression solution is to look at the results of each of the factors on its own and then weight them accordingly. Another solution (I don't know what it's called) is to try out about twenty different combinations, take the best one, create variations of it, and so on, until a good fit is found. What's the decision tree/bagging/boosting solution?

Yuval Taylor
Product Manager, Portfolio123
invest(igations)
Any opinions or recommendations in this message are not opinions or recommendations of Portfolio123 Securities LLC.
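One way to bridge the Boolean splits and the probability Yuval is asking about: a regression tree's leaves predict the *average* outcome of the training rows that satisfy that leaf's chain of yes/no conditions, so averaging hides behind the Booleans. A minimal sketch on made-up factor data (all numbers and the factor/return relationship here are hypothetical; scikit-learn is assumed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 3))      # ranks of 3 hypothetical factors
# Hypothetical returns: two linear effects plus a nonlinear "kink" on factor 3
y = (0.5 * X[:, 0] - 0.2 * X[:, 1]
     + np.where(X[:, 2] > 50, 10.0, 0.0)
     + rng.normal(0, 5, 500))

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

print(f"linear R^2: {lin.score(X, y):.3f}")
print(f"tree   R^2: {tree.score(X, y):.3f}")
# Each leaf's prediction is simply the mean y of the training rows that
# satisfied its chain of Boolean conditions -- a piecewise-constant
# stand-in for the regression plane.
```

The tree never computes a correlation; it greedily picks the split that most reduces the variance of the outcome within each branch, which is the same quantity regression is minimizing.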


Jrinne

Yuval,

I know you understand this and have used bagging and bootstrapping yourself, and have written on these topics elsewhere. Nice explanations, BTW.

As for other ideas on how to explain this, I would start with what these methods are trying to accomplish. The why. Here is an example from a book about how boosting can fit nonlinear data:

[Attachment: Screen Shot 2021-01-11 at 6.07.20 PM.png]

If you want to fit a line to the data, don't worry about any of this: just use multiple regression, or its 2-dimensional equivalent here, linear regression. If you want a little better fit, use boosting. In this example, at 10 iterations the boosting model (blue) is already fitting the data pretty well. To my eye, a little better than a line would. But that could be just me.

That is the why. Or, more technically: not all curves are lines.

For the how, I would recommend an easy book for $2.99. This book truly has all the theory one needs to understand this. Not easy enough to fit into one post, however. Too many pictures in the book for one post: Machine Learning With Boosting: A Beginner's Guide.

Best,
Jim

"From time to time you will encounter Luddites, who are beyond redemption." — Marcos López de Prado, on the topic of machine learning for financial applications
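Jim's line-vs-boosting picture can be reproduced in a few lines (a synthetic sine wave stands in for the book's figure, and scikit-learn's GradientBoostingRegressor stands in for the book's booster — both are assumptions, not the book's actual code):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = np.linspace(0, 6, 300).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, 300)   # nonlinear data + noise

line = LinearRegression().fit(x, y)
# 10 boosting iterations, as in the figure: each new tree fits the
# residuals left over by the previous fit, bending the curve step by step
boost10 = GradientBoostingRegressor(n_estimators=10, random_state=1).fit(x, y)

print(f"line    R^2: {line.score(x, y):.3f}")
print(f"boost10 R^2: {boost10.score(x, y):.3f}")
```

On data like this a line can barely do better than the mean, while even 10 boosting rounds track the curve, which is the "not all curves are a line" point in code.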


Jrinne

Azooz,

Nice to hear from you. I hope things are going well.

Jim


yuvaltaylor

"The problem you have is an optimization problem. It can be solved using genetic algorithms, PBIL, and other methods like the ones you highlighted. Decision trees and other supervised machine learning algorithms try to solve other problems. You need inputs/features (for example: ranks of different factors, for different stocks and for different dates) and one or several outputs/labels (e.g., the next month's return). The model will take the inputs and try to predict the output. Note that you don't have a label in your problem."

Ah, now I see. Thank you!

Do these three machine learning algorithms use the mathematics of probability in any way? In my reading about them, I haven't found any evidence that they do.
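The supervised setup described above (features = factor ranks per stock per date, label = next month's return) can be sketched concretely. Everything here is hypothetical stand-in data; scikit-learn's RandomForestRegressor is one possible model choice:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 1000                                        # (stock, date) observations
features = rng.uniform(0, 100, size=(n, 3))     # ranks of 3 hypothetical factors
# Hypothetical label: next month's return, partly explained by the ranks
next_month_return = (0.03 * features[:, 0]
                     + 0.01 * features[:, 1]
                     + rng.normal(0, 1, n))

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(features, next_month_return)          # learn ranks -> return mapping

# Predict for a new stock with factor ranks (90, 80, 10):
pred = model.predict([[90, 80, 10]])
print(f"predicted next-month return: {pred[0]:.2f}")
```

Note the contrast with the optimization framing: here no factor weights are searched over directly; the model learns the mapping from ranks to the label, and any "weighting" is implicit in the fitted trees.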


yuvaltaylor

"I know you understand this and have used bagging and bootstrapping yourself ... As far as other ideas on how to explain this, I would start with what they are trying to accomplish. The why."

I have definitely used bootstrapping, but I don't remember ever understanding what bagging is. On the other hand, my memory is terrible, so if I did, I apologize.

"For the how, I would recommend an easy book for $2.99 ... Machine Learning With Boosting: A Beginner's Guide."

This looks like a terrific book. I've purchased it and will read it soon. Thanks a million!
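For what it's worth, bagging (bootstrap AGGregatING) is just the bootstrapping already mentioned in this thread plus an averaging step: fit the same model on many bootstrap resamples, then average the predictions. A from-scratch sketch on made-up data (the data and model choices are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)    # noisy parabola

trees = []
for _ in range(25):
    # Bootstrap: resample the rows with replacement
    idx = rng.integers(0, len(X), len(X))
    t = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X[idx], y[idx])
    trees.append(t)

# Aggregate: average the 25 trees' predictions at x = 2 (true value is 4)
bagged_pred = np.mean([t.predict([[2.0]]) for t in trees])
print(f"bagged prediction at x=2: {bagged_pred:.2f}")
```

Each individual deep-ish tree is noisy; averaging across the bootstrap resamples cancels much of that noise, which is the entire trick.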


azouz110

Jim,

Thank you. Everything is alright. Hope things are going well for you as well.

I started working with ML again over the last three or four months. I'm still doing a lot of experiments, but it seems promising when using the right features. Feature engineering is certainly one of the most important aspects of coming up with a good model.


azouz110

Yuval,

Boosting and bagging are what are called ensemble methods. These are methods for combining multiple models to get a better prediction. https://towardsdatascience.com/ensemblemetho...andstackingc9214a10a205

Other models, such as random forests and XGBoost/LightGBM (these two are boosted decision trees), are based on decision trees. https://www.hackerearth.com/practice/machine...ldecisiontree/tutorial/
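The "built from decision trees" point is easy to verify in code: both a random forest and a gradient-boosted model are literally collections of fitted trees. A small sketch (synthetic data; scikit-learn's implementations stand in for XGBoost/LightGBM here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
y = X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 300)   # interaction a line can't see

# Bagging-style ensemble: independent trees on bootstrap samples, averaged
rf = RandomForestRegressor(n_estimators=100, random_state=3).fit(X, y)
# Boosting-style ensemble: trees fitted sequentially on residuals
gb = GradientBoostingRegressor(n_estimators=100, random_state=3).fit(X, y)

# Both ensembles expose their underlying decision trees:
print(type(rf.estimators_[0]).__name__)   # DecisionTreeRegressor
print(len(gb.estimators_))                # 100 boosting stages
```

The difference between the two families is only in *how* the member trees are built and combined: in parallel on resampled data (bagging/forests) versus sequentially on residuals (boosting).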


Jrinne

Yuval,

I do not want to hit you with too much at once. But as you look at this you may encounter mean squared error (MSE) or root mean squared error (RMSE). Just be aware that wherever those metrics are used, you can also use mean absolute error (MAE) or other metrics of your choosing. It is trivial to change the metric with XGBoost (and other ML programs). As I recall, at one time you were not a fan of RMSE and preferred MAE. As do I, especially when there are outliers.

My only suggestion: don't spend too much time worrying about which metric is used in the examples. You can always change it.

Jim


yuvaltaylor

"Boosting and bagging are what are called ensemble methods ... Other models, such as random forests and XGBoost/LightGBM (these two are boosted decision trees), are based on decision trees."

Thanks. These articles do help me understand to some degree. I'm still puzzled as to how decision trees can be used with P123 data, though. I can definitely see how ensemble methods can (and I use those myself, clumsily). Maybe I'll get it after I read the Machine Learning with Boosting ebook.


