Posts: 127    Pages: 13    Prev 1 2 3 4 5 6 7 8 9 10 Next
This topic has been viewed 3182 times and has 126 replies
marco
Re: Python code for calling 123 API

So that we don't get bogged down too much, we would just add a new API endpoint specifically for generating ML data: features (ranks of nodes and composites) and labels (based on technical data only). Packaging the data for consumption would be done afterward, either in a DataMiner operation or in a Python program that could be community developed.

Fred's suggestions for labels are great. Can we get more specific examples of types of labels? We can probably just let you write your own formula for the label, but having some examples helps. Is this a good start for labels?

Total Return after X bars
Relative Return after X bars
Future Volatility -- parameters t.b.d
Sector or Industry Return after X bars

Is there anything missing that cannot be derived from this list?
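To make the candidate labels above concrete, here is a minimal sketch in plain Python. It is not the P123 implementation. The function names are hypothetical, "relative return" is assumed to mean stock return minus a benchmark return, and "future volatility" is assumed to be the standard deviation of the next X one-bar returns, since the thread leaves those parameters open.

```python
from statistics import stdev


def total_return(prices, i, x):
    """Return from bar i to bar i + x, or None if bar i + x has not happened yet."""
    if i + x >= len(prices):
        return None
    return prices[i + x] / prices[i] - 1.0


def relative_return(prices, benchmark, i, x):
    """Stock return minus benchmark return over the same x bars (assumed definition)."""
    stock = total_return(prices, i, x)
    bench = total_return(benchmark, i, x)
    if stock is None or bench is None:
        return None
    return stock - bench


def future_volatility(prices, i, x):
    """Sample standard deviation of the x one-bar returns following bar i (assumed definition)."""
    if x < 2 or i + x >= len(prices):
        return None
    one_bar = [prices[k + 1] / prices[k] - 1.0 for k in range(i, i + x)]
    return stdev(one_bar)


prices = [100.0, 110.0, 121.0, 133.1]
benchmark = [100.0, 105.0, 110.25, 115.7625]
print(total_return(prices, 0, 2))                # about 0.21
print(relative_return(prices, benchmark, 0, 2))  # about 0.1075
print(total_return(prices, 2, 2))                # None: bar 4 does not exist yet
```

Returning None when the future bar is missing keeps unknown labels visibly separate from a genuine zero return.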

Also, the question of whether we need other types of ranking was never really addressed. I guess for now we will just use what we have: rank by sorting, place NAs in the middle or at the bottom, and derive the percentile.
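The ranking procedure described here (sort, place the NAs, derive a percentile) might look roughly like the sketch below. This is an illustration only, not how P123 computes ranks. The function name and the `na_position` parameter are hypothetical, higher values are assumed to be better, and ties and multiple NAs simply receive consecutive positions.

```python
def percentile_ranks(values, na_position="bottom"):
    """Percentile rank in [0, 100] per item; a higher value gets a higher rank.

    None marks an NA. Ties and multiple NAs get consecutive positions,
    which is a simplification.
    """
    valid = sorted((v, i) for i, v in enumerate(values) if v is not None)
    ordered = [i for _, i in valid]  # original indices, worst to best
    na_idx = [i for i, v in enumerate(values) if v is None]
    if na_position == "bottom":
        order = na_idx + ordered
    elif na_position == "middle":
        half = len(ordered) // 2
        order = ordered[:half] + na_idx + ordered[half:]
    else:
        raise ValueError("na_position must be 'bottom' or 'middle'")
    n = len(order)
    ranks = [0.0] * n
    for pos, i in enumerate(order):
        ranks[i] = 100.0 * pos / (n - 1) if n > 1 else 100.0
    return ranks


factor = [3.0, None, 1.0, 2.0, 4.0]
print(percentile_ranks(factor, "bottom"))  # [75.0, 0.0, 25.0, 50.0, 100.0]
print(percentile_ranks(factor, "middle"))  # [75.0, 50.0, 0.0, 25.0, 100.0]
```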

Portfolio123 Staff.

Nov 19, 2020 10:24:05 AM       
Jrinne
Re: Python code for calling 123 API

Thanks Marco.

I would like the excess returns of the ticker over the next rebalance period: the next week’s excess returns for a weekly rebalance, or the next month’s for a monthly rebalance.

Preferably in excess of the equal-weighted return of the universe rather than of a separate cap-weighted universe.
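As a rough sketch of this label, assuming the next-period return of every ticker in the universe is already available, the excess return is each ticker's return minus the simple average across the universe. The function name and the tickers are made up for illustration.

```python
def excess_returns(next_period_returns):
    """Subtract the equal-weighted universe return from each ticker's return."""
    universe = sum(next_period_returns.values()) / len(next_period_returns)
    return {t: r - universe for t, r in next_period_returns.items()}


# Hypothetical one-week returns for a three-stock universe.
week = {"AAA": 0.03, "BBB": 0.01, "CCC": -0.01}
labels = excess_returns(week)
print(labels)  # AAA about +0.02, BBB about 0.0, CCC about -0.02
```

By construction these labels sum to roughly zero across the universe for each period, which is what removes the common market move.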

I can show you that this minimizes the noise from the market and just works. At least for the data imaged above.

And that is what I want to know (predict) when I buy a stock.

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 19, 2020 10:43:49 AM       
Edit 1 times, last edit by Jrinne at Nov 19, 2020 10:44:28 AM
piard2
Re: Python code for calling 123 API

Steve, it looks great. Your code seems to do a big part of the job for a generic "P123 to supervised ML" interface (I have not looked into it though). The table you provide as a result is very close to the structure I describe, and it should go through pandas.read_csv() to make a dataframe for platform-agnostic data wrangling. It needs to be generalized and packaged in an API with appropriate parameters (RS used for features, label/target definition, date range, etc.). It would be great to have the opinion of other people working with other ML environments to see if they have specific requirements (personally I am not really involved in ML projects now).
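A minimal sketch of that step, assuming a table with one row per (date, ticker), rank columns as features, and one label column. The column names and values below are invented for illustration and are not Steve's actual output.

```python
import io

import pandas as pd

# Hypothetical layout: the column names and numbers are illustrative only.
csv_text = """date,ticker,rank_value,rank_momentum,label_excess_1w
2020-11-02,AAA,81.5,40.2,0.012
2020-11-02,BBB,12.0,77.9,-0.004
2020-11-09,AAA,79.3,45.0,
2020-11-09,BBB,15.4,70.1,
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    index_col=["date", "ticker"],
)

train = df.dropna(subset=["label_excess_1w"])  # keep rows whose label is known
X = train.drop(columns=["label_excess_1w"])    # features
y = train["label_excess_1w"]                   # label/target
print(X.shape, y.shape)  # (2, 2) (2,)
```

From here `X` and `y` can be handed to most Python ML libraries, which is the platform-agnostic part.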

Nov 19, 2020 10:50:00 AM       
Edit 1 times, last edit by piard2 at Nov 19, 2020 10:54:57 AM
piard2
Re: Python code for calling 123 API

Marco, "Sector or Industry Return after X bars" is not a good label because we must assume it cannot be predicted from a single stock's features. Industry attributes may be features, but not labels, at least in this context. They could be labels if we create datasets where the index is (industry, date) instead of (ticker, date), to train algos to make predictions on industries.
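To illustrate the re-indexing idea, the sketch below collapses stock-level rows into one row per (industry, date). It assumes equal-weighted averaging of both the feature and the forward return, which is only one of several reasonable choices. The row layout and names are hypothetical.

```python
from collections import defaultdict


def industry_rows(stock_rows):
    """Average stock-level values within each (industry, date) group.

    stock_rows: iterable of (ticker, industry, date, feature, fwd_return).
    Returns {(industry, date): (mean feature, mean forward return)}.
    """
    groups = defaultdict(list)
    for _ticker, industry, date, feature, fwd_return in stock_rows:
        groups[(industry, date)].append((feature, fwd_return))
    out = {}
    for key, members in groups.items():
        n = len(members)
        out[key] = (
            sum(f for f, _ in members) / n,  # industry-level feature
            sum(r for _, r in members) / n,  # industry-level label
        )
    return out


rows = [
    ("AAA", "Banks", "2020-11-02", 80.0, 0.02),
    ("BBB", "Banks", "2020-11-02", 60.0, 0.00),
    ("CCC", "Software", "2020-11-02", 50.0, 0.03),
]
print(industry_rows(rows))
```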

Nov 19, 2020 11:33:25 AM       
Edit 1 times, last edit by piard2 at Nov 19, 2020 11:39:02 AM
Jrinne
Re: Python code for calling 123 API

Frederic,

Steve got that format from me when we were doing neural nets and boosting. And I can say it works for all machine learning methods using REGRESSION: regularized regression (including LASSO and ridge regression), kernel regression, polynomial regression, robust regression using Huber, LOESS, CUBE, Random Forests (a regression tree algorithm that can also do classification), Boosting (another regression tree algorithm that can also do classification), Neural Nets (which can be regression or classification), etc.

You have mentioned in previous posts that you would also like to look at classification. There is a lot of literature regarding classification for stocks and my experience is that this tends to work just as well. You have a great point.

This is one example where a csv download onto a local hard drive has helped me. I am sure Steve can do it in Python or Marco can make it available.

But I have just sorted the label in a spreadsheet and started a new "Classification" column next to the label column. Then I put a 1 in the classification column where the returns (next to it) were positive and a 0 where they were not.

Once trained, this will predict the probability that the week will have a positive return. It seems to work about the same as regression, as I said. Your idea was a good one.
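For anyone who would rather do the spreadsheet step in code, here is a small sketch using only the Python standard library. It assumes the csv has a numeric column named `label` (a hypothetical name) with no missing values, and appends a `classification` column that is 1 for a positive return and 0 otherwise.

```python
import csv
import io


def add_classification(csv_text, label_col="label"):
    """Append a 0/1 'classification' column: 1 where the label is positive."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Rows with an empty label would need to be filtered out before this step.
        row["classification"] = "1" if float(row[label_col]) > 0 else "0"
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + ["classification"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = "ticker,label\nAAA,0.012\nBBB,-0.004\nCCC,0.0\n"
print(add_classification(sample))
```

A return of exactly zero is classified as 0 here, matching the "positive, else not" rule in the post.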

Would you or Steve prefer to program this in Python or just do it in a spreadsheet? Whatever your answer is, the spreadsheet lets you see, right there, that you have done it right. And it offers more flexibility for those not as experienced in Python.

So anyway, Steve’s format (preferably with excess returns as an option for the label) works for everything I have tried in Python and R for both regression and classification. Assuming a csv download onto a local drive to get the classification label.

I like your idea of using classification. It works in my experience.

Jim

Nov 19, 2020 11:38:19 AM       
Edit 6 times, last edit by Jrinne at Nov 19, 2020 11:53:22 AM
piard2
Re: Python code for calling 123 API

Jim, I don't have specific programming requirements at this time because I have no ML project. You have visualization packages in Python to draw all kinds of charts. Anyway pandas has functions to make a dataframe from a csv file and the other way (with some requirements on the csv structure: https://pandas.pydata.org/pandas-docs/version..._csv.html#pandas.read_csv ).

Nov 19, 2020 11:46:28 AM       
Edit 1 times, last edit by piard2 at Nov 19, 2020 12:01:37 PM
Jrinne
Re: Python code for calling 123 API

Frederic,

You are absolutely correct on this. Steve likes Colab, and we have worked on getting the data into Colab together (with Steve actually figuring it out). I like Colab too for a lot of reasons.

But my Anaconda still works on my computer and I use the code you mention. I appreciate your pointing this out in case I was not familiar.

For anyone else reading this here is an example of this command that works:

import pandas as pd
alldata = pd.read_csv("~/opt/ReadFile/MLFactorUpload.csv")

Is the "~" a Mac thing? Maybe.

Here is a write command that works:

s.to_csv("desktop/s.csv", index=False)
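On the "~" question: it is shorthand for the user's home directory, and Python's `os.path.expanduser` resolves it on macOS, Linux and Windows. A path like "desktop/s.csv" is relative to the current working directory, so where the file lands depends on where Python was started. The sketch below expands the path explicitly and does the write/read round trip in a temporary directory so it can run anywhere. The DataFrame contents are made up.

```python
import os
import tempfile

import pandas as pd

# Expanding "~" explicitly makes it obvious which file will be opened.
home_path = os.path.expanduser("~/opt/ReadFile/MLFactorUpload.csv")
print(home_path)

# Hypothetical data, written and read back in a temporary directory.
s = pd.DataFrame({"ticker": ["AAA", "BBB"], "label": [0.012, -0.004]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "s.csv")
    s.to_csv(path, index=False)  # index=False leaves out the row-number column
    alldata = pd.read_csv(path)

print(alldata)
```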

Jim

Nov 19, 2020 11:59:15 AM       
Edit 1 times, last edit by Jrinne at Nov 19, 2020 12:20:19 PM
InspectorSector
Re: Python code for calling 123 API

Marco - I don't understand your preoccupation with technical analysis and labels. Labels should be user-specified and every column should be user-specified. Don't make any assumptions and try to force things that people don't want. The return for Python should be a 2D array.

Nov 19, 2020 12:32:47 PM       
marco
Re: Python code for calling 123 API

Here's my proposal for enhancements to the API to retrieve ranks so that it's better suited to generate data for an ML system.

You will be able to define any number of 'Extra Data' items, which will be actual, "raw" values, not ranks. Without a data license you will only be able to write formulas using technical data (prices, dividends, splits for stocks & ETFs). With a data license you can use the full set of factors/functions.

The 'Extra Data' items can be either a feature or a label (using AI speak). If you use a negative (-ve) value you are specifying a label; otherwise it's a feature. It is up to you to keep it straight and not feed label data as feature data when you submit this data for processing. I used "label" and "feature" in the column names just to help identify them.

Attached is a screenshot of what the settings would look like for the API call (using pseudo code) and the output it will generate. I color-coded the reference columns in gray and the 'actual value' columns, which can be either labels or features, in green. The rank data is always feature data.

Notice the n/a for 6/1/2020 for the 6mo label since Dec 2020 has not happened yet.
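Since keeping labels and features apart is left to the user, a consumer of this output might split the table by column name and skip rows whose label is still n/a. The sketch below is only an illustration of that idea. The column names, the values, and the use of None for n/a are assumptions, and the real endpoint's output format may differ.

```python
NA = None  # stands in for the n/a cells in the output


def split_table(columns, rows, reference=("date", "ticker")):
    """Split rows into feature and label matrices using the column names.

    Columns whose name contains 'label' are labels; reference columns are
    dropped; everything else (raw features and ranks) is a feature.
    """
    label_cols = [c for c in columns if "label" in c.lower()]
    feature_cols = [c for c in columns if c not in reference and c not in label_cols]
    X, y = [], []
    for row in rows:
        rec = dict(zip(columns, row))
        if any(rec[c] is NA for c in label_cols):
            continue  # label not known yet, so the row cannot be used for training
        X.append([rec[c] for c in feature_cols])
        y.append([rec[c] for c in label_cols])
    return feature_cols, label_cols, X, y


# Hypothetical output table.
columns = ["date", "ticker", "feature_vol", "label_6mo_ret", "rank_value"]
rows = [
    ("2020-01-01", "AAA", 15.2, 0.08, 81.5),
    ("2020-06-01", "AAA", 17.0, NA, 79.3),  # 6-month label not available yet
]
feature_cols, label_cols, X, y = split_table(columns, rows)
print(feature_cols, label_cols, X, y)
```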

Let me know what you think. Does this make generating ML data a snap?

And this should not take long to do.

Thanks

Attachment Capture.PNG (71060 bytes) (Download count: 47)


Nov 19, 2020 5:15:01 PM       
Edit 3 times, last edit by marco at Nov 19, 2020 5:33:40 PM
InspectorSector
Re: Python code for calling 123 API

Marco - it looks fine except hopefully the Python routine doesn't return the first line of the spreadsheet, just a 2D array with column names. If the ranking nodes are complex, they should still be 'flattened' so we end up with a 2D array.
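One way to read "flattened" is sketched below: nested ranking nodes become one column per leaf, with the path through the composites as the column name, so the result is a plain 2D list plus a list of column names. The nested-dict input format, the "/" separator and the node names are all hypothetical, not what the API actually returns.

```python
def flatten_nodes(tree, prefix=""):
    """Yield (column_name, value) for each leaf of a nested dict of ranking nodes."""
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            yield from flatten_nodes(child, path)
        else:
            yield path, child


def to_table(rows_by_key):
    """Turn {(date, ticker): nested ranks} into (column names, 2D list)."""
    columns, table = None, []
    for (date, ticker), tree in rows_by_key.items():
        flat = dict(flatten_nodes(tree))
        if columns is None:
            columns = list(flat)  # column order is taken from the first row
        table.append([date, ticker] + [flat[c] for c in columns])
    return ["date", "ticker"] + (columns or []), table


# Hypothetical nested ranks for one stock on one date.
data = {("2020-11-02", "AAA"): {"Value": {"PE": 81.5, "PB": 60.0}, "Momentum": 40.2}}
names, table = to_table(data)
print(names)
print(table)
```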

Nov 19, 2020 11:38:34 PM       
Edit 1 times, last edit by InspectorSector at Nov 19, 2020 11:40:22 PM