
Posts: 13    Pages: 2    Prev 1 2
This topic has been viewed 420 times and has 12 replies
Re: decision trees, bagging, and boosting

I'm still puzzled as to how decision trees can be used with P123 data, though. I can definitely see how ensemble methods can (and I use those myself, clumsily). Maybe I'll get it after I read the Machine Learning with Boosting e-book.


One thing I know for sure is that you should stick with the method you are now using for your investing.

Personally, I think it is good that you are looking at boosting to see whether it does (or does not) have a place in your investing. But you certainly have no pressing need to make any changes that I can see.

The book will NOT tell you whether to add some boosting to your investing, I think.

I think that, in addition to reading the book, listening to what Azooz has to say, and using other resources, you need to focus on whether non-linear is better than linear. That is what will tell you which methods to use, I think.

Use more basic models to understand this. Is linear regression or polynomial regression (with its curved line) better for non-linear financial data?

Make up some contrived financial data that is clearly non-linear. Do you want a curved line fitting that data or a straight line? What 2-dimensional model is best for that? Is the Theil-Sen estimator a better estimator? Is it the best? Is there something better still?
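Here is a minimal sketch of that experiment in Python, assuming numpy is available. The data are made up (a hump-shaped relationship between a hypothetical factor rank x and a "return" y), and the two fits are scored on held-out points, since in-sample a curved fit can never do worse than a straight line.

```python
import numpy as np

# Contrived, hypothetical data: x is a factor rank scaled to [0, 1] and
# y a "return" that peaks at x = 0.6 and falls off at both ends.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = -0.2 * (x - 0.6) ** 2 + rng.normal(0.0, 0.005, x.size)

# Fit on the even-indexed points, score on the odd-indexed (held-out) points.
train, test = slice(0, None, 2), slice(1, None, 2)

def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

results = {}
for degree in (1, 2):                     # 1 = straight line, 2 = curved line
    coefs = np.polyfit(x[train], y[train], deg=degree)
    results[degree] = rmse(y[test], np.polyval(coefs, x[test]))

print(f"straight line, held-out RMSE: {results[1]:.4f}")
print(f"curved line,   held-out RMSE: {results[2]:.4f}")
```

Swapping in your own contrived curve (or real data) and adding higher degrees is the quickest way to see when the extra curvature starts fitting noise instead of signal.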

The Theil-Sen estimator is better for many things because it uses a different metric than standard linear regression, but it is still linear, I think. Could you do better still if there were something that used the Theil-Sen metric but was non-linear?
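To see what the different metric buys you, here is a small hand-rolled Theil-Sen estimator (slope = median of all pairwise slopes, intercept = median of y - slope * x) next to ordinary least squares. The data are hypothetical: a true line y = 2x + 1 with three large outliers added near the right end. scipy also provides scipy.stats.theilslopes; this version just avoids the dependency.

```python
import itertools
import numpy as np

def theil_sen(x, y):
    """Theil-Sen line: slope = median of all pairwise slopes,
    intercept = median of the residuals y - slope * x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]                      # skip vertical pairs
    ]
    slope = float(np.median(slopes))
    intercept = float(np.median(y - slope * x))
    return slope, intercept

# Hypothetical data: y = 2x + 1 plus small noise, then three wild outliers.
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)
y[[22, 25, 28]] += 80.0

ols_slope, ols_intercept = np.polyfit(x, y, deg=1)   # squared-error metric
ts_slope, ts_intercept = theil_sen(x, y)             # median-based metric

print(f"OLS slope:       {ols_slope:.3f}")
print(f"Theil-Sen slope: {ts_slope:.3f}")
```

The outliers drag the least-squares slope well away from 2, while the median of pairwise slopes barely moves. Both are still straight lines, which is exactly the limitation raised above.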

If the answer to this last question is yes then boosting could give a similar non-linear result with a Theil-Sen-like metric.
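One way to try that question is sketched below: gradient boosting with absolute-error loss. Absolute error is robust in a median-based way, much like Theil-Sen, but it is not the Theil-Sen metric itself; that substitution is an assumption made for illustration. The sketch follows Friedman's LAD_TreeBoost recipe with one-split trees (stumps): fit each stump to the sign of the residuals, then set each leaf to the median residual in that leaf. The data (a sine curve plus noise and ten wild outliers) are hypothetical.

```python
import numpy as np

def best_split(x, target):
    """Threshold on x minimising the squared error of a two-leaf (stump)
    fit to `target`; None if x has no two distinct values."""
    order = np.argsort(x)
    xs, ts = x[order], target[order]
    n = len(xs)
    csum = np.cumsum(ts)
    k = np.arange(1, n)                       # size of the left leaf
    # SSE = sum(t^2) - S_left^2/k - S_right^2/(n-k): maximise the last two terms
    gain = csum[:-1] ** 2 / k + (csum[-1] - csum[:-1]) ** 2 / (n - k)
    gain[xs[1:] == xs[:-1]] = -np.inf         # no split between tied x values
    if not np.isfinite(gain).any():
        return None
    b = int(np.argmax(gain))
    return (xs[b] + xs[b + 1]) / 2.0

def lad_boost(x, y, n_rounds=300, lr=0.1):
    """Absolute-error gradient boosting with stumps (LAD_TreeBoost style)."""
    f0 = float(np.median(y))                  # best constant under |error|
    pred = np.full(y.shape, f0)
    stumps = []
    for _ in range(n_rounds):
        resid = y - pred
        thr = best_split(x, np.sign(resid))   # sign = negative gradient of |y - F|
        if thr is None:
            break
        left = x <= thr
        g_left = float(np.median(resid[left]))    # robust leaf values
        g_right = float(np.median(resid[~left]))
        pred = pred + lr * np.where(left, g_left, g_right)
        stumps.append((thr, g_left, g_right))
    return f0, lr, stumps

def lad_predict(model, x):
    f0, lr, stumps = model
    pred = np.full(np.shape(x), f0, dtype=float)
    for thr, g_left, g_right in stumps:
        pred = pred + lr * np.where(x <= thr, g_left, g_right)
    return pred

# Hypothetical non-linear data with a few wild outliers.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(0.0, 0.1, x.size)
y[rng.choice(x.size, size=10, replace=False)] += 5.0

model = lad_boost(x, y)
boost_err = float(np.median(np.abs(lad_predict(model, x) - truth)))

line = np.polyfit(x, y, deg=1)                # straight line, squared error
line_err = float(np.median(np.abs(np.polyval(line, x) - truth)))

print(f"median |error| vs. true curve: boosted {boost_err:.3f}, line {line_err:.3f}")
```

In practice a library implementation (for example scikit-learn's gradient boosting with an absolute-error loss) would replace this hand-rolled loop; the point here is only to show a non-linear fit driven by a median-style metric.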

Anyway, what I would recommend is sticking with 2-dimensional examples to sort this out. Read the book, but focus later on how boosting gets its non-linear solution.

Figure out how P123 is really picking stocks. Is that linear or mostly linear? Are the weights of the factors constant? Does that make it linear? If it is not linear, then what curve is it? Would a non-linear P123 classic solution be better still?
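A hypothetical, much-simplified sketch of a weighted-rank composite may make the linearity question concrete. It is not P123's actual ranking code: it ignores ties, missing values, and nested nodes. With constant weights the composite is exactly linear in the factor ranks (a dot product), but ranking is itself a non-linear transform of the raw factor values.

```python
import numpy as np

def percentile_rank(values):
    """Percentile rank in [0, 100], higher value -> higher rank.
    Ties are broken by position (a simplification)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(len(values))
    return 100.0 * ranks / (len(values) - 1)

def composite_score(factor_matrix, weights):
    """Hypothetical weighted-rank composite: rank each factor column,
    then take a fixed-weight average of those ranks."""
    ranks = np.column_stack([percentile_rank(col) for col in factor_matrix.T])
    w = np.asarray(weights, dtype=float)
    return ranks @ (w / w.sum())          # linear in the ranks: a dot product

# Hypothetical universe: 5 stocks (rows) x 2 factors (columns).
factors = np.array([
    [0.05,  0.30],
    [0.10, -0.10],
    [0.02,  0.15],
    [0.50,  0.05],
    [0.07,  0.00],
])
scores = composite_score(factors, weights=[0.6, 0.4])
print(np.round(scores, 1))   # scores 55, 45, 30, 80, 40
```

Changing the fourth stock's first factor from 0.50 to 5.00 would leave every score unchanged, because only the order matters. So a system like this can be linear in ranks while being quite non-linear in the underlying data.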

Anyway, the only thing I know for sure is that you should continue to use your present system and make any changes gradually.

Maybe focus on why to use non-linear methods (or not) if you want to know which models to use.

And for sure keep talking to Azooz. He knows what he is talking about.

And he can give you, P123 or anyone else a computer solution to any machine-learning problem in any computer language you could imagine. Want it in C++: no problem. Want it this afternoon: no problem. Any other computer solutions you want by this afternoon since you have such an easy request for me?

Also notice what Azooz is doing is not a cookbook solution. If I understand correctly, he is having to look closely at what factors to use and the best methods of selecting them. He can expand on this further if he wants and I might not have understood his post fully. But perhaps even he does not have a cookbook solution.

Anyway, use him as a resource. Email him if you haven’t already.

And he could do some programming for P123 on machine learning if there is ever a need.



From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Jan 13, 2021 8:09:01 AM       
Edit 6 times, last edit by Jrinne at Jan 13, 2021 1:30:14 PM
Re: decision trees, bagging, and boosting


P123 classic is a powerful tool and no one has to use machine-learning to make money here. Furthermore, Marco is working to expand the tools available through the API for whatever methods people want to employ including (but not limited to) machine-learning.

So I am not trying to convince anyone to use machine learning. That having been said, I have benefited tremendously from discussing machine learning with Steve Auger, Azooz, you, and others on this forum.

I also know that you have been a fan of de Prado in the past. He has this to say about multiple regression (and linear regression):

"If the statistical toolbox used to model these observations is linear regression, the researcher will fail to recognize the complexity of the data, and the theories will be awfully simplistic, useless. I have no doubt in my mind, econometrics is a primary reason economics and finance have not experienced meaningful progress over the past decades."

Okay, that is a bit too strong.

From time to time I review linear regression. I forget the equations. I lose sight of how closely correlation is tied to the slope of the regression line, how important Z-scores are to linear regression, and how the line is related to the data. Honestly, if one fully understands that, then I think the rest is just an extrapolation.
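Those relationships are easy to check numerically. A minimal sketch with numpy on simulated (hypothetical) data: the least-squares slope equals r * s_y / s_x, the line passes through the point of means, and if both variables are converted to Z-scores the slope is the correlation itself and the intercept is zero.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10.0, 3.0, 500)                 # hypothetical predictor
y = 0.5 * x + rng.normal(0.0, 2.0, 500)        # hypothetical response

slope, intercept = np.polyfit(x, y, deg=1)     # ordinary least squares
r = np.corrcoef(x, y)[0, 1]                    # Pearson correlation

# 1) Slope = r * s_y / s_x (the ddof convention cancels in the ratio).
slope_from_r = r * y.std() / x.std()

# 2) The fitted line passes through the point of means.
intercept_from_means = y.mean() - slope * x.mean()

# 3) Regress Z-score on Z-score: slope = r, intercept = 0.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
z_slope, z_intercept = np.polyfit(zx, zy, deg=1)

print(f"OLS slope {slope:.4f} | r*sy/sx {slope_from_r:.4f}")
print(f"z-score slope {z_slope:.4f} | r {r:.4f} | z intercept {z_intercept:.1e}")
```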

Boosting is just linear regression with a curved line and an extra dimension or two. Ultimately the equations are equivalent if you use root-mean-squared error as your boosting metric.

Okay, that’s not right. Boosting is linear regression without any of the limiting assumptions, like homoscedasticity, normality and, importantly, linearity. But that is all theory one needs to understand.

Ultimately, however, if one is interested in the equations, they are equivalent to the equations for linear regression and serve the same purpose. I think authors go out of their way to make this difficult so they can look super-smart. That having been said, Steve Auger and Azooz really are super-smart and do not have to go out of their way to make it seem difficult.
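A minimal sketch of that equivalence, under stated assumptions: with squared-error loss, the negative gradient is just the residual y - F, so each boosting round is a least-squares fit to the current residuals, and the objective is the same sum of squared errors that ordinary least squares minimizes. The base learner here is a one-split tree (stump), chosen for brevity, and the data (y = x^2 plus noise) are hypothetical.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares stump for targets r: (threshold, left mean, right mean)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    n = len(xs)
    csum = np.cumsum(rs)
    k = np.arange(1, n)                          # points in the left leaf
    # SSE = sum(r^2) - S_left^2/k - S_right^2/(n-k), so maximise the last two terms
    gain = csum[:-1] ** 2 / k + (csum[-1] - csum[:-1]) ** 2 / (n - k)
    gain[xs[1:] == xs[:-1]] = -np.inf            # cannot split between tied x values
    if not np.isfinite(gain).any():
        raise ValueError("x needs at least two distinct values")
    b = int(np.argmax(gain))
    thr = (xs[b] + xs[b + 1]) / 2.0
    left = x <= thr
    return thr, float(r[left].mean()), float(r[~left].mean())

def l2_boost(x, y, n_rounds=200, lr=0.1):
    """Squared-error gradient boosting: each round fits a stump to the
    residuals, which are the negative gradient of 0.5 * (y - F)^2."""
    pred = np.full(y.shape, float(y.mean()))     # best constant under squared error
    sse = [float(((y - pred) ** 2).sum())]
    for _ in range(n_rounds):
        thr, g_left, g_right = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= thr, g_left, g_right)
        sse.append(float(((y - pred) ** 2).sum()))
    return pred, sse

# Hypothetical curved data.
rng = np.random.default_rng(4)
x = np.linspace(-2.0, 2.0, 200)
y = x ** 2 + rng.normal(0.0, 0.3, x.size)

slope, intercept = np.polyfit(x, y, deg=1)      # straight line, same SSE objective
ols_sse = float(((y - (slope * x + intercept)) ** 2).sum())
boost_pred, boost_sse = l2_boost(x, y)

print(f"straight-line SSE: {ols_sse:.1f}")
print(f"boosted-stump SSE: {boost_sse[-1]:.1f}")
```

The SSE history never increases: for a least-squares stump h fitted to residuals r and a step size lr between 0 and 2, the new SSE is ||r||^2 - (2*lr - lr^2) * ||h||^2. Same metric as linear regression; only the family of curves differs.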

Me, I just try to fully understand linear regression and imagine how those equations can be used on a curved line. Copy a little Python code from the internet. Reuse it for my next project. Anyone can do it if they have an interest.

And maybe--with a little persistence--I will have data to see how close de Prado is to the truth.




Jan 14, 2021 3:49:32 AM       
Edit 22 times, last edit by Jrinne at Jan 14, 2021 5:08:18 AM
Re: decision trees, bagging, and boosting

Thanks, Jim, that makes a lot of sense to me. It's a nice explanation indeed.

I was doing some non-linear regression yesterday on the relationship between portfolio size and buy and sell position ranks (i.e., to get a portfolio of 40 stocks with a buy rule of rankpos <= 10, what does the corresponding sell rule (rankpos > X) have to be?). I found a power equation that came pretty close, but the best approach was just grabbing a bunch of data points and making educated guesses about the in-between values. I suppose that's somewhat comparable to what the decision-tree ML algorithms are doing.
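The two approaches can be sketched side by side. A power equation y = a * x^b can be fitted by ordinary least squares on log y versus log x, and "educated guesses" between observed points resemble linear interpolation (np.interp). The (x, y) points below are made up for illustration and are not from the analysis described above.

```python
import numpy as np

# Hypothetical observations: x = a setting you varied, y = what you measured.
x_obs = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
y_obs = np.array([25.0, 41.0, 70.0, 118.0, 205.0])

# 1) Power equation y = a * x**b as a straight line in log-log space:
#    log y = log a + b * log x
b, log_a = np.polyfit(np.log(x_obs), np.log(y_obs), deg=1)
a = np.exp(log_a)

def power_model(x):
    return a * np.asarray(x, dtype=float) ** b

# 2) Piecewise-linear interpolation between the observed points.
def interp_model(x):
    return np.interp(x, x_obs, y_obs)

for x_new in (15.0, 30.0, 60.0):
    print(f"x={x_new:5.1f}  power={power_model(x_new):7.2f}  "
          f"interp={interp_model(x_new):7.2f}")
```

One design note: the log-log fit minimizes squared error in log space, so it weights relative rather than absolute misses. A regression tree would instead produce a step function (a constant value between split points), which is closer in spirit to a lookup table than to either curve.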

At some point I should learn Python . . .

Yuval Taylor
Product Manager, Portfolio123
Any opinions or recommendations in this message are not opinions or recommendations of Portfolio123 Securities LLC.

Jan 14, 2021 11:32:09 AM       