
Posts: 22    Pages: 3
This topic has been viewed 705 times and has 21 replies
Jrinne
Re: ML integration Update

Marco, do you have a reason to prefer AWS over Azure? I took a 2-week course on Azure ML Studio a few months ago, and it seemed ahead at modeling pipelines, fast prototyping, and managing the full app life cycle. It also has an auto-ML feature with a GUI that takes a set of algos, with ranges of parameters, as inputs and compares them on your datasets when you don't have a clear idea where to start. I don't know how its cost/performance compares with AWS, but cost seems easier to control on Azure. I have read a few bad stories from AWS users (individuals and small businesses) who received unexpectedly large bills because there was no way to set a hard spending limit in AWS (only alerts). Most of the time they were able to negotiate the bill, but it was time-consuming and stressful.
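The auto-ML feature described above (several algorithms, each with a range of parameters, compared on the same dataset) is essentially a grid search across models. The sketch below is a minimal, stdlib-only Python illustration of that idea, not Azure's implementation: the two toy one-feature models (k-nearest neighbours and ridge regression), their parameter grids, and all function names are hypothetical.

```python
from itertools import product
from statistics import mean

# Two toy one-feature "algorithms" (hypothetical stand-ins for a real model library).
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def ridge_predict(train, x, lam):
    """One-feature ridge regression; lam shrinks the slope toward zero."""
    xs = [p[0] for p in train]
    ys = [p[1] for p in train]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in train)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / (sxx + lam)
    return my + slope * (x - mx)

# Each candidate: (predict function, parameter ranges to try).
CANDIDATES = {
    "knn": (knn_predict, {"k": [1, 3, 5]}),
    "ridge": (ridge_predict, {"lam": [0.0, 1.0, 10.0]}),
}

def compare_models(candidates, train, valid):
    """Score every parameter combination of every candidate; lowest validation MSE first."""
    leaderboard = []
    for name, (predict, grid) in candidates.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            mse = mean((predict(train, x, **params) - y) ** 2 for x, y in valid)
            leaderboard.append((mse, name, params))
    leaderboard.sort(key=lambda row: row[0])
    return leaderboard

# Toy data: y = 2x + 1. Train on the first 10 points, validate on the last 5
# (a chronological split, as you would want with time-ordered market data).
data = [(float(x), 2.0 * x + 1.0) for x in range(15)]
train, valid = data[:10], data[10:]

for mse, name, params in compare_models(CANDIDATES, train, valid):
    print(f"{name:6s} {params} MSE={mse:.3f}")
```

On this perfectly linear toy data, unpenalized ridge wins with zero error; the point is the loop structure (every algorithm × every parameter combination, one common validation score), which is what a GUI like the one described automates.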

Fred,

I wonder if you might expand on your knowledge of Azure. I went to sign up. It looks like a free account is possible, but they want a credit card, which I do not think I will provide today.

Are AWS and Azure different products? Does Azure use Python? Does Azure have pre-packaged solutions, like drop-down menus? Or is it pretty much like Colab? For me, Colab was just like Jupyter notebooks, but with different ways to upload files.

Any comparison between Python (e.g., Jupyter Notebook or Colab) and your experience with Azure (and/or AWS) would, I think, be informative for many.

Best,

Jim

From time to time you will encounter Luddites, who are beyond redemption.
--de Prado, Marcos López on the topic of machine learning for financial applications

Feb 22, 2021 8:12:49 AM       
Edited 13 times, last edited by Jrinne at Feb 22, 2021 9:03:53 AM
piard2
Re: ML integration Update

Jim,
Azure ML Studio is a visual layer above the code. It looks like a professional integrated development environment (IDE), where you place and link boxes representing data sources, algos, and data pre-processing. A pipeline is modeled as a chart: you drag and drop components from a big library, copy and paste them, enter parameters in contextual windows that depend on the type of component, and build, train, and deploy a model without writing a line of code (it is possible to write Python code too). MSFT has 3 decades of experience in IDEs and software life cycle management, and they have used it to make a tool for developing and deploying apps fast in a fairly seamless way. The drawback, as with all IDEs, is that a project started in ML Studio will not be easy to port elsewhere (executable trained models can run outside the platform, but maintenance and iterative development would be complicated without it).
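ML Studio builds pipelines visually, but the underlying idea (components linked in sequence, each with its own parameters) can be sketched in a few lines of Python. This is a hypothetical, linear simplification for illustration only: real ML Studio pipelines can branch, and the `Component` class and step names here are not part of any Azure SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Component:
    """One 'box' on the canvas: a step function plus the parameters set for it."""
    name: str
    fn: Callable[..., Any]
    params: dict = field(default_factory=dict)

def run_pipeline(components, data):
    """Run the boxes in order, feeding each one's output into the next."""
    for component in components:
        data = component.fn(data, **component.params)
    return data

def drop_missing(rows):
    # Data-cleaning step: remove missing values.
    return [r for r in rows if r is not None]

def clip(rows, lo, hi):
    # Pre-processing step: cap outliers to the [lo, hi] range.
    return [min(max(r, lo), hi) for r in rows]

def mean_model(rows):
    # Stand-in for a training step: the "model" is just the average.
    return sum(rows) / len(rows)

pipeline = [
    Component("data cleaning", drop_missing),
    Component("clip outliers", clip, {"lo": 0.0, "hi": 10.0}),
    Component("train", mean_model),
]

print(run_pipeline(pipeline, [1.0, None, 20.0, 3.0]))
```

Changing a component's `params` dict plays the role of editing a box's parameters in the GUI; the chain itself stays the same.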

Feb 22, 2021 9:22:17 AM       
Edited 1 time, last edited by piard2 at Feb 22, 2021 9:40:42 AM
InspectorSector
Re: ML integration Update

Several years ago, when Azure was in its early stages, there were complaints that credit cards were billed automatically once the free trial was over, without informing the user. The free trial was consumption-based, not time-based, so you didn't really know when it was up. I think that problem was rectified, but the billing concerns are real, and you want to make sure you are able to set limits on what you can spend. I can see an algorithm unwittingly burning a lot of CPU time, or your account getting hacked. Just a thought.

Feb 22, 2021 9:22:58 AM       
InspectorSector
Re: ML integration Update

He too has run both. He likes boosting for a lot of practical reasons but speed was not one of them.

Jim, speed is one of the reasons. Right now I am running XGBoost against the entire universe of dividend-paying stocks, with monthly history back to 2003 and 11 inputs. I can do about 250 complete training runs in ~20 minutes. It would probably take at least 1 day for 1 training run using TensorFlow.
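Taking those figures at face value (the one-day TensorFlow number is a rough guess, not a measurement, and the two setups may differ in more than the library), the implied per-run gap works out as:

```latex
t_{\text{XGBoost}} \approx \frac{20 \times 60\ \text{s}}{250} = 4.8\ \text{s per run},
\qquad
\frac{t_{\text{TensorFlow}}}{t_{\text{XGBoost}}} \gtrsim \frac{86\,400\ \text{s}}{4.8\ \text{s}} = 18\,000
```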

Feb 22, 2021 9:26:59 AM       
Jrinne
Re: ML integration Update

Jim,
Azure ML Studio is a visual layer above the code. It looks like a professional integrated development environment (IDE): you draw and link boxes to model data sources, algos, and data pre-processing. A pipeline is modeled as a chart where you drag and drop and copy and paste components, enter parameters on components, and build, train, and deploy a model without writing a line of code (it is possible to write Python code too). MSFT has 3 decades of experience in IDEs and software life cycle management, so they have used it to make a tool to develop and deploy apps fast in a fairly seamless way. The drawback, as with all IDEs, is that a project started in ML Studio will not be easy to port elsewhere (trained models can be exported outside the platform, but maintenance and iterative development would be complicated without it).

Thank you Fred,

I am for whatever Marco and users like you and Steve think will work. I am happy to discuss my experience with Python for informational purposes, to help the decision process.

Taking neural nets as an example, it seems like Azure might be an easier solution. Not necessarily better, if P123 has a roadmap for that.

Thank you for the information.

Jim


Feb 22, 2021 9:32:43 AM       
Edited 1 time, last edited by Jrinne at Feb 22, 2021 9:43:38 AM
Jrinne
Re: ML integration Update

He too has run both. He likes boosting for a lot of practical reasons but speed was not one of them.

Jim, speed is one of the reasons. Right now I am running XGBoost against the entire universe of dividend-paying stocks, with monthly history back to 2003 and 11 inputs. I can do about 250 complete training runs in ~20 minutes. It would probably take at least 1 day for 1 training run using TensorFlow.


Thanks for the information.

My experience has been different. I wonder why? As you know, I used a lot of layers in my neural net when I shared my code. You commented on that.

My universe was pretty big: about 1,400 stocks, rebalanced weekly (somewhere over 1,000,000 rows in the neural-net array).
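As a back-of-the-envelope check, assuming one row per stock per weekly rebalance (the date range is not given, so the history length below is inferred, not stated):

```latex
N_{\text{rows}} = 1\,400 \times W > 1\,000\,000
\;\Longrightarrow\;
W > \frac{1\,000\,000}{1\,400} \approx 714 \text{ weeks} \approx \frac{714}{52} \approx 13.7 \text{ years}
```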

Also, as you know, I do not tend to use many factors/nodes (6 or 7), so there are not many columns in my array (DataFrame). I considered that a possible limiting factor when I mentioned that people might need to consolidate factors into nodes.
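Consolidating factors into nodes keeps the input array narrow. A minimal sketch of one way to do it, in the spirit of a ranking-system composite node: convert each factor to a 0 to 100 percentile rank across stocks, then take a weighted average of the ranks. Equal weights, the factor names, and the values are illustrative assumptions, not necessarily how Jim or P123 builds nodes.

```python
def percentile_ranks(values):
    """Rank values on a 0-100 scale (higher value -> higher rank); ties share the average rank."""
    n = len(values)
    if n == 1:
        return [50.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend `end` over any run of tied values.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        position = (start + end) / 2  # average sorted position of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = 100.0 * position / (n - 1)
        start = end + 1
    return ranks

def composite_node(factors, weights=None):
    """Collapse several factor columns into one node column via a weighted average of ranks."""
    names = list(factors)
    weights = weights or {name: 1.0 for name in names}  # equal weights by default (assumption)
    total_weight = sum(weights[name] for name in names)
    ranked = {name: percentile_ranks(factors[name]) for name in names}
    n_stocks = len(ranked[names[0]])
    return [
        sum(weights[name] * ranked[name][i] for name in names) / total_weight
        for i in range(n_stocks)
    ]

# Hypothetical factor values for three stocks; in this toy, higher is better for all three.
# A lower-is-better factor would need its sign flipped before ranking.
factors = {
    "earnings_yield": [0.05, 0.10, 0.02],
    "book_to_price": [0.3, 0.9, 0.1],
    "momentum": [0.2, -0.1, 0.3],
}
combined_node = composite_node(factors)
print(combined_node)
```

Three factor columns become one node column, so the network sees one input instead of three. Ranking first also puts every factor on the same 0 to 100 scale before averaging.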

As you know, I use early stopping.
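Early stopping is one way to keep neural-net training time down. The core rule is simple enough to sketch without a framework: watch validation loss each epoch and stop after `patience` epochs without improvement. This stdlib-only illustration mirrors the idea behind the `patience` and `min_delta` options of Keras's `EarlyStopping` callback; it is not Keras's code and does not reproduce every option.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve on the best value by more
    than `min_delta` for `patience` consecutive epochs. `best_epoch` is the epoch
    whose weights you would keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop rule.
    return best_epoch, len(val_losses) - 1

history = [1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.60]
print(early_stopping_epoch(history, patience=3))  # -> (2, 5): stops at epoch 5, keeps epoch 2
```

A larger patience would have reached the 0.60 at epoch 6, at the cost of extra epochs; that trade-off is one lever on training time.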

Should you want to try TensorFlow again in the future: did you normalize or standardize your data? What batch size did you use?
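Both questions refer to common settings that affect neural-net training. Min-max normalization rescales each input to [0, 1]; standardization (z-scoring) centres it at 0 with unit standard deviation; batch size is how many rows go into each gradient update. A stdlib-only sketch of all three follows; in practice scikit-learn's `MinMaxScaler`/`StandardScaler` and the `batch_size` argument of Keras `Model.fit` cover this. The `fit_on` parameter is a design choice of this sketch: scaling statistics come from training data only, so later data does not leak into them.

```python
from statistics import mean, pstdev

def min_max_normalize(values, fit_on=None):
    """Rescale to [0, 1] using the min/max of `fit_on` (defaults to `values`)."""
    ref = values if fit_on is None else fit_on
    lo, hi = min(ref), max(ref)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def standardize(values, fit_on=None):
    """Z-score using the mean and population standard deviation of `fit_on`."""
    ref = values if fit_on is None else fit_on
    mu, sigma = mean(ref), pstdev(ref)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def mini_batches(rows, batch_size):
    """Yield consecutive chunks of `batch_size` rows; the last chunk may be smaller."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

train_col = [2.0, 4.0, 6.0]
test_col = [8.0]
print(min_max_normalize(train_col))
print(standardize(test_col, fit_on=train_col))  # scaled with training statistics only
print([len(batch) for batch in mini_batches(list(range(10)), 4)])
```

For scale: with over 1,000,000 rows, a batch size of 32 means more than 31,250 gradient updates per epoch, versus roughly 977 at a batch size of 1,024, which is one reason batch size matters for speed.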

Anyway, interesting and great information for P123 to consider.

Best,

Jim


Feb 22, 2021 9:41:09 AM       
Edited 9 times, last edited by Jrinne at Feb 22, 2021 10:36:11 AM
piard2
Re: ML integration Update

Several years ago, when Azure was in its early stages, there were complaints that credit cards were billed automatically once the free trial was over, without informing the user. The free trial was consumption-based, not time-based, so you didn't really know when it was up. I think that problem was rectified, but the billing concerns are real, and you want to make sure you are able to set limits on what you can spend. I can see an algorithm unwittingly burning a lot of CPU time, or your account getting hacked. Just a thought.

As I wrote above, in 2020 I read about more billing issues with AWS, because its expense limit is not a hard limit; it only triggers custom alerts. If I remember correctly, the worst (and unverified) horror story was about a guy who had put his AWS login credentials in a private GitHub team directory, which was hacked by a rogue bitcoin miner. He probably received alerts while sleeping and found a 5-digit bill in the morning. Azure has hard limits. Maybe AWS has implemented some since last year.
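The distinction drawn here is between an alert threshold (you are notified, but spending continues) and a hard limit (further spending is refused). The toy class below is a hypothetical illustration of that difference only; it does not model how AWS or Azure budgets actually work, and a guard like this only helps if charges are reported promptly.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BudgetGuard:
    """Hypothetical spend tracker: an alert threshold plus an optional hard cap."""
    alert_at: float
    hard_limit: Optional[float] = None
    spent: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def charge(self, amount: float) -> bool:
        """Record a charge; return False (and record nothing) if it would break the hard cap."""
        if self.hard_limit is not None and self.spent + amount > self.hard_limit:
            return False  # hard limit: the work is refused, so spending stops here
        self.spent += amount
        if self.spent >= self.alert_at and not self.alerts:
            # Alert only: notify once, but keep accepting charges.
            self.alerts.append(f"spend reached {self.spent:.2f} (alert at {self.alert_at:.2f})")
        return True

# Alert only: a runaway job (or a hijacked account) keeps spending after the alert fires.
soft = BudgetGuard(alert_at=100.0)
for _ in range(1000):
    soft.charge(50.0)

# Alert plus hard cap: charges beyond the cap are refused.
hard = BudgetGuard(alert_at=100.0, hard_limit=200.0)
accepted = [hard.charge(50.0) for _ in range(1000)]

print(soft.spent, soft.alerts)
print(hard.spent, accepted.count(True))
```

Both guards fire the same alert; only the one with a hard cap stops the overnight bill from growing.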

Feb 22, 2021 9:54:42 AM       
marco
Re: ML integration Update

I have no cloud preference, since we have not used them much. Whichever is better suited for our use case.

We want to kick off learning and predictions from P123, and pull the data back in. We would also like to be able to get an estimate before kicking off a learning process. This will be very useful once we use our own company account to run users' workloads (and pass on the cost). Doing it all under a single company account should also benefit from discounts.
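One simple way to produce such an estimate, sketched here as an assumption rather than anything P123 or the cloud providers have described, is to time a few pilot runs and multiply by the number of runs and an hourly instance price. The function names, the $3/hour rate, and the 25% margin are hypothetical placeholders, not real AWS or Azure prices.

```python
import time
from statistics import median

def time_pilot_runs(run_once, n_pilot=3):
    """Time a few pilot runs of a training job; return the median seconds per run."""
    timings = []
    for _ in range(n_pilot):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return median(timings)

def estimate_job_cost(n_runs, seconds_per_run, hourly_rate, margin=0.25):
    """Compute-hours times a (hypothetical) hourly instance price, plus a safety margin."""
    compute_hours = n_runs * seconds_per_run / 3600
    return compute_hours * hourly_rate * (1 + margin)

# Stand-in workload; a real pilot would run a few actual training runs on a sample.
per_run = time_pilot_runs(lambda: sum(i * i for i in range(100_000)))
print(f"~{per_run:.4f} s/run -> ${estimate_job_cost(250, per_run, hourly_rate=3.0):.4f} for 250 runs")
```

Plugging in the XGBoost figures quoted earlier in the thread (250 runs in about 20 minutes, i.e. 4.8 s per run) and the made-up $3/hour rate gives $1.00 of compute, or $1.25 with the 25% margin. The approach assumes run time scales linearly with the number of runs.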

Portfolio123 Staff.

Feb 22, 2021 11:50:36 AM       
sgmd01
Re: ML integration Update

I recently used AWS, and they update the number of credits you have used sometime the next day, so if I exceeded my available credits, I wouldn't know until the next day. I mildly exceeded my credits, and that was permitted.

Feb 22, 2021 12:16:55 PM       
enisbe
Re: ML integration Update


enisbe, having our own ML libraries in our system is still being evaluated and will take a while. There's lots of "wiring" to do, and NNs require specialized hardware, which we do not have right now.


I haven't tried importing these yet. What I had in mind does not actually require anything special. I can provide a trained model, saved in a persistent state, which I would upload. All that is needed is for P123 to "hook" the model and score my universe with it, as a ranking system does. Technically, this just replaces the weights in your current ranking systems with the model's weights. P123 wouldn't do any training of the models, only scoring. The hook I am referring to is just a TensorFlow/scikit-learn package that can read my model.
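This proposal separates training (done by the user) from scoring (done by P123). A minimal sketch of that split, using a plain linear model serialized as JSON as a stand-in; this is the case where the model's weights play the same role as ranking-system weights. The JSON format, function names, tickers, and weights are hypothetical. A real TensorFlow or scikit-learn model would instead be saved and loaded with that library's own tools (for example `tf.keras.models.load_model` for Keras models, or `joblib`/`pickle` for scikit-learn estimators).

```python
import json

def export_model(weights, bias=0.0):
    """Serialize a trained linear model to JSON text (the file a user would upload)."""
    return json.dumps({"weights": weights, "bias": bias})

def load_model(text):
    """Parse the uploaded model; no training happens on this side."""
    return json.loads(text)

def score_universe(model, universe):
    """Score each stock with the model's weights and return (ticker, score) pairs, best first."""
    weights, bias = model["weights"], model["bias"]
    scores = {
        ticker: bias + sum(w * factors[name] for name, w in weights.items())
        for ticker, factors in universe.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical weights and factor values, for illustration only.
uploaded = export_model({"value": 0.6, "momentum": 0.4})
model = load_model(uploaded)
universe = {
    "AAA": {"value": 1.0, "momentum": 0.0},
    "BBB": {"value": 0.0, "momentum": 1.0},
    "CCC": {"value": 0.5, "momentum": 0.5},
}
for ticker, score in score_universe(model, universe):
    print(ticker, round(score, 3))
```

A nonlinear model (a neural net or boosted trees) would replace the weighted sum inside `score_universe` with the library's own predict call, but the split stays the same: load once, score the universe, rank.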

It might be too big a bite to chew at this time, but we'll get there.

Feb 22, 2021 2:19:35 PM       