
Posts: 127    Pages: 13    Prev 1 2 3 4 5 6 7 8 9 10 Next
This topic has been viewed 3180 times and has 126 replies
Jrinne
Re: Python code for calling 123 API

If you want I can give you that (if Jim is OK with that) and it will get you up and running quickly.


Thanks Steve that would be good.

Honestly, my interests aside, I think there is a business opportunity here especially since Quantopian is leaving a vacuum.

And to reiterate, no one is asking P123 to provide any machine learning.

I do not think Colab will go away. Google is essentially training young people in machine learning whom it hopes to hire in the future. Colab is used in all of Google's online courses on Coursera.

And honestly, even JASP is good. Drag and drop. Menu driven. Completely free.

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Nov 18, 2020 2:24:36 PM       
Edit 1 times, last edit by Jrinne at Nov 18, 2020 2:25:23 PM
Jrinne
Re: Python code for calling 123 API



Marco, you should provide an easy download into a csv file that is immediately ready for upload into JASP.

Jim


from where? time series? what data? multiple periods?

example please.

Marco,

So this is all you need. This is what Steve sent me and I immediately uploaded this into Python and had results in 15 seconds.

But you need the date, the ticker, any factors and then the "target." The target can be anything, but for me it is the next week's returns.

This is what is needed for JASP and for Python to be useful at all.

I see no reason to make it hard to get a csv in this format in DataMiner or to make anyone use data wrangling or munging to get here.
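A minimal sketch of the layout described above: one row per date and ticker, some factor columns, and a target. The column names and the two sample rows are made up for illustration; they are not taken from Steve's file or from any P123 export.

```python
import csv
import io

# Hypothetical sample in the layout described above: date, ticker,
# factor columns, then the target (here: next week's return).
SAMPLE_CSV = """date,ticker,factor1,factor2,target
2020-11-06,AAA,87.5,12.0,0.013
2020-11-06,BBB,45.2,66.1,-0.004
"""

def load_rows(text):
    """Parse the CSV text into a list of dicts with numeric fields as floats."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"date": raw["date"], "ticker": raw["ticker"]}
        for key, value in raw.items():
            if key not in ("date", "ticker"):
                row[key] = float(value)
        rows.append(row)
    return rows

rows = load_rows(SAMPLE_CSV)
feature_names = [k for k in rows[0] if k not in ("date", "ticker", "target")]
X = [[r[name] for name in feature_names] for r in rows]  # model inputs
y = [r["target"] for r in rows]                          # value to predict
```

With a file in this shape, splitting into inputs and target takes a couple of lines, which is the point being made: no further wrangling is needed before handing the data to a model.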

Jim

Attachment Example.png (164906 bytes) (Download count: 84)



Nov 18, 2020 3:11:30 PM       
marco
Re: Python code for calling 123 API

Jim, I don't follow. Target is ok for us to include. But what's Input1, Input2, etc. ? Those are not ranks. Sure looks like data / factors and we cannot re-distribute that. They need to be ranks of some sort. That's why I was proposing different ranking methods that transform the data but basically hold the same information for an AI system w/o breaching the license.

Portfolio123 Staff.

Nov 18, 2020 8:57:09 PM       
Edit 1 times, last edit by marco at Nov 18, 2020 8:57:51 PM
InspectorSector
Re: Python code for calling 123 API

Marco - What Jim was showing was a small sample of data that I scraped somewhere. It will all be ranks. I just wrote a routine that pulls ranking system data weekly and dumps it into a file. I haven't tested a neural net with ranked inputs but will be doing that soon. All is good.

Nov 18, 2020 10:51:53 PM       
Jrinne
Re: Python code for calling 123 API

Jim, I don't follow. Target is ok for us to include. But what's Input1, Input2, etc. ? Those are not ranks. Sure looks like data / factors and we cannot re-distribute that. They need to be ranks of some sort. That's why I was proposing different ranking methods that transform the data but basically hold the same information for an AI system w/o breaching the license.


Marco,

Right. I was just focusing on the column headers. Sorry for not being more clear.

The reason for the oversight in the way I presented this to you is XGBoost literally does not care (and therefore I was not even looking at Steve’s numbers). Steve could have sent me raw data, rank positions, multiplied everything by 1,000, given me the Z-Scores and XGBoost will spit out the same answer. Literally exactly the same answer (because the order is preserved with each of these "transformations").
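The order-preservation argument can be illustrated with a single tree split. This is a plain-Python sketch of one squared-error split, not XGBoost itself, and the numbers are made up. It assumes distinct feature values and looks only at which rows end up on each side of the best split.

```python
from statistics import mean, pstdev

def best_split_partition(feature, target):
    """Return the set of row indices sent left by the SSE-minimizing split."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    best_sse, best_left = float("inf"), None
    for cut in range(1, len(order)):
        left, right = order[:cut], order[cut:]
        sse = 0.0
        for side in (left, right):
            m = mean(target[i] for i in side)
            sse += sum((target[i] - m) ** 2 for i in side)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Made-up factor values and targets, for illustration only.
raw = [3.2, 0.5, 7.1, 1.8, 9.4, 4.0]
target = [0.01, -0.02, 0.03, -0.01, 0.05, 0.00]

scaled = [v * 1000 for v in raw]               # multiply everything by 1,000
mu, sigma = mean(raw), pstdev(raw)
zscores = [(v - mu) / sigma for v in raw]      # Z-scores
ranks = [sorted(raw).index(v) for v in raw]    # rank positions

partitions = {
    name: best_split_partition(values, target)
    for name, values in [("raw", raw), ("x1000", scaled),
                         ("zscore", zscores), ("rank", ranks)]
}
same_split = len(set(partitions.values())) == 1
```

Because the split search only uses the sort order of the feature, every one of these transformations sends the same rows left and right. A full boosting library adds details such as histogram binning, so treat this as the intuition rather than a proof about any particular implementation.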

It is true that it is probably best to normalize neural net inputs which ranking already does. If a person who uses neural nets likes to have the inputs normalized between 0 and 1 (as is often recommended) the ranks can be divided by 100.

Steve has more experience with neural nets than is apparent in his posts. Perhaps he is avoiding jargon that he thinks will not be understood by everyone in the forum. We talked about normalizing the data (or standardizing it) for the neural net. But he knows a neural net can generally handle data that is not normalized. Often it just takes longer to run, as was the case with his data.

Consider this. Neural nets are famous for recognizing pictures of cats on the internet. Who thinks the distribution of cats on the internet is a Gaussian distribution? Or that the data for self-driving cars is a bell-shaped curve?

And the code I shared with him uses "BatchNormalization" that normalizes each batch and GENERALLY resolves these issues. Not that he won't address this at some point before he funds a neural net model or makes it available to others: he will I think.
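For anyone wondering what normalizing a batch means in practice, here is a plain-Python sketch of the core arithmetic: subtract the batch mean and divide by the batch standard deviation. It is not the Keras layer, which also learns a scale and shift and tracks running statistics; the `eps` value and the sample ranks are assumptions for illustration.

```python
from statistics import mean, pstdev

def batch_normalize(batch, eps=1e-3):
    """Standardize one feature over a batch: subtract the batch mean and
    divide by the batch standard deviation (eps avoids division by zero)."""
    mu = mean(batch)
    var = pstdev(batch) ** 2
    return [(x - mu) / (var + eps) ** 0.5 for x in batch]

# Ranks on a 0-100 scale, made up for illustration.
batch = [12.0, 55.0, 78.0, 99.0, 31.0]
normalized = batch_normalize(batch)
```

After this step the batch has a mean of roughly zero and a standard deviation of roughly one, whatever scale the inputs arrived in.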

And I reiterate that this whole topic of normalization can be dropped completely for boosting.

So anyway to the point. Here is an example of a spreadsheet with over one million rows of data.

This does use ranks.

I like to use excess returns for the Target. This reduces the noise that comes from random changes in the market.
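As a sketch of what an excess-return target looks like, assuming it is the stock's return minus a benchmark return over the same period (the post does not say which benchmark is used, and all numbers here are hypothetical):

```python
def excess_returns(stock_returns, benchmark_return):
    """Subtract the same-period benchmark return from each stock's return."""
    return {t: r - benchmark_return for t, r in stock_returns.items()}

week = {"AAA": 0.031, "BBB": -0.012, "CCC": 0.008}  # hypothetical returns
benchmark = 0.010  # hypothetical benchmark return for the same week
targets = excess_returns(week, benchmark)
```

A week in which the whole market rises or falls shifts every raw return by a similar amount; subtracting the benchmark removes that common move from the target.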

"Combined Rank" is P123’s rank using all of the factors. Factor1, Factor2,……., Factor7 are ranks of individual factors (each given 100% weight).

So I could get predictions on each ticker and compare (after sorting) to see how "Combined Rank" performed. I compared how P123’s rank performed against the returns predicted by a neural net or by boosting, using the individual factors as inputs for ML. So for the ML predictions, the ranks of the individual factors were used as inputs.
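One way to make that comparison concrete is to sort by each score and look at the average target in the top bucket. This is a sketch of that idea only; the bucket size, the function name and every number below are hypothetical and say nothing about how either approach actually performed.

```python
def top_bucket_mean(scores, targets, fraction=0.1):
    """Mean target of the rows with the highest scores (top `fraction`)."""
    n_top = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:n_top]
    return sum(targets[i] for i in top) / n_top

# Hypothetical data: ten stocks on one date.
combined_rank = [95, 80, 60, 40, 20, 99, 10, 55, 70, 30]
ml_prediction = [0.02, 0.01, 0.00, -0.01, -0.02, 0.01, -0.03, 0.03, 0.00, -0.01]
excess_return = [0.015, 0.010, 0.000, -0.005, -0.020, 0.005, -0.010, 0.030, 0.002, -0.004]

rank_result = top_bucket_mean(combined_rank, excess_return, fraction=0.2)
ml_result = top_bucket_mean(ml_prediction, excess_return, fraction=0.2)
```

For a fair comparison the ML predictions would need to come from data the model did not see during training.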

My opinions on the value of ML were not formed without evidence.

Interesting work Steve. Perhaps this is the ticket for us to distinguish ourselves from the rest.


Yep. Steve gets it. Steve is incredibly intelligent, is a good programmer and has more experience than people realize.

As an aside this seems weird:

Chaikin uses https://www.r2.ai/ and he must have just learned it


Isn’t Marc Gerstein over there at Chaikin? You know the guy who c*sses every time anyone uses a number and calls them a quant?

Marc is now over there producing a product that uses machine learning? Has he gotten Physics Envy all of a sudden?

Maybe h*ll has frozen over.

So P123 is now literally the last place on earth to advocate using stock IDs to download data onto a spreadsheet to simulate bootstrapping (if Yuval is still recommending this use of spreadsheets) when one can do real bootstrapping with a menu-driven, drag-and-drop program (like JASP). Do it correctly and literally in seconds. Or, if so inclined, use some of the most advanced programs on the planet (like TensorFlow) without that much additional work.

Weird. I think even Marc is not willing to die on that hill and is willing to adapt.

Anyway, it would take just a little for P123 to move away from spreadsheets and the DDM as its only business model. And you can keep the ranks, I believe.

Best,

Jim

Attachment Enough Data to Form a Reasoned Opinion.png (303758 bytes) (Download count: 64)



Nov 19, 2020 5:47:42 AM       
Edit 26 times, last edit by Jrinne at Nov 19, 2020 8:50:00 AM
piard2
Re: Python code for calling 123 API

Here are my 2 cents. I am not an ML specialist, just a former software architect. I have tested only a couple of scikit-learn algos in the (dead) Quantopian environment, I don't know Colab and JASP. These are my ideas to start a higher-level interface between P123 and any ML environment
First question (to Marco and P123 staff): Does P123 want to provide a higher-level Python interface to help advanced users feed ML apps or web services?
If the answer is yes, then the first step is to define a GENERIC OUTPUT FORMAT for tables of features/labels, independently of the target app or service (whether it is hosted by Google, SageMaker, Azure ML or another provider). Specialization to fit a specific environment (JASP, Colab) may come in a second step if needed. ML algos eat tables of features and labels, separated into training, validation and test sets. A generic output format for features/labels is especially important because ML is a fast-changing world, but the paradigms of features/labels and training/validation/test are constants in supervised learning (which is the subset of ML we seem to be interested in).
The output format can be determined by a few constant and variable characteristics:
- We are manipulating financial timeseries, so the index of the table's first axis (= key of each row) is a couple (time,ticker). We are handling daily data, so time is a date. The number of rows in the table depends on the combination of (ticker,date) we want in the training set (or validation set, or test set).
- The columns of the table are inputs used for prediction (= features) and outputs to predict (= labels). The number of columns (excluding the double index) is the number of features, plus at least one label column (more if we need multi-labels: predicting return AND volatility for example). If P123 allows us to export ranks and not raw data, one feature is either a rank or another authorized exportable piece of data (sector or industry for example). The label is the outcome to be predicted. It may be the future return of the ticker 1 week, 1 month, 1 quarter after the date of the row's index value (when the ranks are measured). But it should not be limited to that. It may be a discrete classification with 2 or more values (example with 3 values: beats the sector index by 1% or more, lags the sector index by 1% or more, or in between). I think classification algos (predicting a category) are more appropriate for our purpose than regression algos (predicting a return), but this is debatable, so leave the choice to the user.
A key point in the structure: a feature is measured on the date of the index value, a label is measured after a specified elapse time (outcome to be predicted).
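The three-value classification example above can be sketched as a small labeling function. The 1% threshold comes from the example in the post; the function name, the label strings and the sample returns are hypothetical.

```python
def classify_outcome(stock_return, sector_return, threshold=0.01):
    """Three-way label from the forward return relative to the sector."""
    excess = stock_return - sector_return
    if excess >= threshold:
        return "beat"
    if excess <= -threshold:
        return "lag"
    return "neutral"

# Forward returns measured after the feature date, made up for illustration.
labels = [
    classify_outcome(0.035, 0.010),   # excess of +2.5%
    classify_outcome(-0.020, 0.005),  # excess of -2.5%
    classify_outcome(0.012, 0.010),   # excess of +0.2%
]
```

The returns passed in must be measured over the period after the row's index date, so that features and labels respect the timing rule stated above.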
THE OUTPUT STRUCTURE MAY BE DEFINED AS A CSV FILE. HOWEVER, IN A PYTHON API WE MAY DIRECTLY OFFER POWERFUL OBJECTS: PANDAS DATAFRAMES.
In an ML app, the upstream data wrangling (cleaning, reformatting, enrichment of features) is much more important than the algo itself. Pandas is the best of Python for that. An overview of what is embedded in a pandas dataframe:
- Usual array functions optimized for large datasets, with cleaning operations (like dropna).
- multi-indexing (naturally fits our need of a double index)
- a bunch of statistical operations (useful for feature enrichment: creating new features from existing ones)
- SQL-like groupby function calling statistical functions on subsets (for example to create new columns with cross-sectional features on sectors) and multi-table queries (join and merge).
- Json, csv or binary serialization (to feed external apps or web services)
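The capabilities listed above can be sketched in a few lines of pandas. The column names, tickers, sectors and values are all hypothetical; the point is only to show the double index, a cleaning step, a cross-sectional groupby feature and CSV serialization working together.

```python
import pandas as pd

# Hypothetical rows: ranks as features, one forward-return label.
records = [
    ("2020-11-06", "AAA", "Tech", 90.0, 40.0, 0.012),
    ("2020-11-06", "BBB", "Tech", 60.0, 20.0, -0.003),
    ("2020-11-06", "CCC", "Energy", 30.0, 75.0, 0.004),
    ("2020-11-13", "AAA", "Tech", 85.0, 45.0, -0.008),
    ("2020-11-13", "BBB", "Tech", 65.0, float("nan"), 0.001),
    ("2020-11-13", "CCC", "Energy", 35.0, 70.0, 0.010),
]
columns = ["date", "ticker", "sector", "rank_value", "rank_momentum", "label_1w"]
df = pd.DataFrame(records, columns=columns)
df["date"] = pd.to_datetime(df["date"])

# Double index (date, ticker), as proposed above.
df = df.set_index(["date", "ticker"]).sort_index()

# Cleaning: drop rows with a missing feature.
clean = df.dropna().copy()

# Feature enrichment: cross-sectional sector mean of a rank on each date.
clean["sector_mean_value"] = (
    clean.groupby(["date", "sector"])["rank_value"].transform("mean")
)

# Serialization: CSV text that an external tool could read.
csv_text = clean.to_csv()
```

Whether such a frame should be returned directly by a Python API or only written to a file is the design question raised in the post; the sketch works either way.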
Hope this helps

Nov 19, 2020 7:35:39 AM       
Edit 2 times, last edit by piard2 at Nov 19, 2020 8:52:05 AM
Jrinne
Re: Python code for calling 123 API

Frederic is right about machine learning (as he always is).

I did not read through all of his post, as much of it does not apply to me.

But he is clearly correct about cross-sectional data and time-series data both being widely used and being different.

I emphasized cross-sectional data. For the most part that is what P123 does with its ranks and sims.

One reason I did not mention time-series data is that I just get that from Yahoo! when I look at that time-series data. I am not sure that this is where P123 can show its true strength.

And I have tried recurrent neural nets with advanced Long Short-Term Memory for time-series data that has been developed just recently. I have not gotten it to work for me. That is not to suggest that someone else cannot get it to work for them.

Anyway Frederic and Steve both have good insights, IMHO.


Nov 19, 2020 7:54:02 AM       
Edit 5 times, last edit by Jrinne at Nov 19, 2020 8:51:09 AM
InspectorSector
Re: Python code for calling 123 API

Does P123 want to provide a higher-level Python interface to help advanced users feed ML apps or web services?

I have Python code right now that takes a simple ranking system and dumps the outputs to a csv file stored on your Google Drive. You provide the date range and the program generates the weekly RS data. It is written for Colab but you can easily port it to your desktop. A sample file output is attached.
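The general shape of such a routine might look like the sketch below. It is not Steve's code and it makes no real API calls: `fetch_ranks_stub` is a hypothetical placeholder that returns made-up ranks, and the ticker and node names are invented. Only the weekly date loop and the CSV writing are meant to be illustrative.

```python
import csv
import io
from datetime import date, timedelta

def weekly_dates(start, end):
    """Yield dates from start to end (inclusive) in 7-day steps."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=7)

def fetch_ranks_stub(as_of):
    """HYPOTHETICAL placeholder for the real ranking-system download.
    Returns {ticker: {node_label: rank}}; replace with an actual API call."""
    return {"AAA": {"Value": 90.0, "Momentum": 40.0},
            "BBB": {"Value": 60.0, "Momentum": 20.0}}

def dump_weekly_ranks(start, end, fetch, out):
    """Write one CSV row per (date, ticker); node labels become columns."""
    writer = None
    for as_of in weekly_dates(start, end):
        for ticker, ranks in fetch(as_of).items():
            if writer is None:
                writer = csv.DictWriter(out, fieldnames=["date", "ticker", *ranks])
                writer.writeheader()
            writer.writerow({"date": as_of.isoformat(), "ticker": ticker, **ranks})

buffer = io.StringIO()
dump_weekly_ranks(date(2020, 11, 7), date(2020, 11, 21), fetch_ranks_stub, buffer)
csv_output = buffer.getvalue()
```

Swapping the stub for a real download and the in-memory buffer for a file opened on Google Drive would give the behavior described in the post.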

I will be making this public. P123 can take it and make the code more professional if they wish.

Attachment sample.jpg (286881 bytes) (Download count: 58)


Nov 19, 2020 9:03:23 AM       
InspectorSector
Re: Python code for calling 123 API

The Colab code for generating a file of RS data can be found here: https://colab.research.google.com/drive/1V2hH...73n4KOj0IdWML?usp=sharing

You need to put p123api.py onto your Google Drive in a subdirectory called 'Modules'. You can of course modify the code and directory as you wish. The p123api.py file has to be in its native form (text file). If you try to extract it from GitHub and then save it using Colab you will get an error when you try to import it into your code.

The CSV file will be written into the Google Drive directory 'Modules'.
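For anyone new to Colab, making a module on Google Drive importable usually comes down to mounting the drive and adding the folder to `sys.path`. The sketch below assumes the folder ends up at `/content/drive/MyDrive/Modules`; the exact path depends on your Drive layout and Colab version, so adjust it as needed. Outside Colab the mount step is skipped.

```python
import sys

# Assumed mount point and folder; adjust to your own Drive layout.
MODULES_DIR = "/content/drive/MyDrive/Modules"

def add_modules_dir(path=MODULES_DIR):
    """Mount Google Drive when running in Colab, then make `path` importable."""
    try:
        from google.colab import drive  # only available inside Colab
        drive.mount("/content/drive")
    except ImportError:
        pass  # not in Colab: assume `path` already exists locally
    if path not in sys.path:
        sys.path.append(path)
    return path

add_modules_dir()
# After this, `import p123api` should find Modules/p123api.py, provided the
# file was uploaded to that folder as a plain text file.
```

The same folder can then be used as the output location for the CSV file.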

A sample ranking system is here: https://www.portfolio123.com/app/ranking-system/374641

The RS should have a flat structure with no composite or conditional nodes. The node labels become the column labels for the csv file.

Enjoy!

Nov 19, 2020 9:26:34 AM       
Edit 1 times, last edit by InspectorSector at Nov 19, 2020 9:27:25 AM
Jrinne
Re: Python code for calling 123 API

The Colab code for generating a file of RS data can be found here: https://colab.research.google.com/drive/1V2hH...73n4KOj0IdWML?usp=sharing

You need to put p123api.py onto your Google Drive in a subdirectory called 'Modules'. You can of course modify the code and directory as you wish. The p123api.py file has to be in its native form (text file). If you try to extract it from GitHub and then save it using Colab you will get an error when you try to import it into your code.

The CSV file will be written into the Google Drive directory 'Modules'.

A sample ranking system is here: https://www.portfolio123.com/app/ranking-system/374641

The RS should have a flat structure with no composite or conditional nodes. The node labels become the column labels for the csv file.

Enjoy!


Who would have known that I just had to reshape the array into a flat structure? I have done this before in TensorFlow for time-series data, but just barely.

Anyway, excellent. I think I can get what I need thanks to Steve’s help.

Still, if it were me I would make it a little easier.

I know I am the big advocate for machine learning, including the use of TensorFlow. But some days I wouldn’t mind taking ten seconds to run a regularized regression in JASP. AND MARCO SEEMS TO BE MINDFUL OF THIS. Or downloading some data on my Mac without using Parallels or Boot Camp or going back to the office for a Windows machine. And Steve has been doing this all of his life.

I don’t really like doing this any more than Steve would want to take over for me during surgery, I suspect.

And just to be clear for anyone having any doubts about using machine learning: TensorFlow and XGBoost are not the hard parts of this. I think Steve will attest to this in a few weeks, if not already. Without a doubt, our hardest problem has been getting the data over to Colab. And it took Steve to do it.

P123 could make it a little easier, I believe. But it is still very much appreciated all around. Thank you Steve and P123. I could live with it as it is. Probably.

Jim


Nov 19, 2020 10:17:59 AM       
Edit 10 times, last edit by Jrinne at Nov 19, 2020 10:36:31 AM