Is Deep Learning a Deep Problem for Us?

For whatever reason, I have been looking at Deep Learning recently. Do not get me wrong. I do not think I could use this now and I am not recommending that Marco go out and buy a deep learning machine. But the machines they use are incredible!

To start with, if you really want to use this technique, a CPU is not going to cut it. You want a GPU. The amount of data they can process boggles my organic neural network (my brain). There are few commercial machines available; most are specially made for Google, Amazon, Facebook, and perhaps your favorite hedge fund.

You could get an NVIDIA machine for over $100,000. It seems graphics cards handle large arrays particularly well. I am not computer savvy, but I can't help thinking of processing 8K video 60 times a second, petaflop-level stuff, multiplied by hundreds of processors (something like what Google has). I did read about a quant fund that spends $1,000,000 a year on electricity, but I have no insight into their system.

Fortunately, structured data (i.e., spreadsheet stuff) probably does not need an artificial neural network. Whew! My wife said I could not get that NVIDIA machine.

But this method may be useful for time series using LSTMs. LSTM stands for Long Short-Term Memory, a type of recurrent neural network that is useful for time-series pattern recognition. It would be extremely difficult (okay, impossible) for me to program something that works. And let's face it: when I was done it would be overfit.
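For what it's worth, the model definition itself is only a few lines in Keras/TensorFlow; it is everything around it (the data, the features, avoiding overfitting) that is hard. Here is a rough sketch, where the 26-week window, the layer size, and the random stand-in data are all placeholders I made up:

[code]
# Rough sketch of an LSTM for time-series pattern recognition (Keras / TensorFlow).
# The random numbers below stand in for weekly returns; purely illustrative.
import numpy as np
import tensorflow as tf

window = 26          # look-back window of 26 weeks (arbitrary choice)
n_features = 1       # a single series, e.g., weekly returns

# Fake data: 500 samples, each a 26-week window, labeled 1 if the next week was up.
X = np.random.randn(500, window, n_features).astype("float32")
y = np.random.randint(0, 2, size=500).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(window, n_features)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # probability the next week is up
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print(model.predict(X[:3]))   # predicted probabilities for the first three windows
[/code]

Of course, on random data like this it will (rightly) learn nothing, which is also a decent reminder of how easy it is to fool yourself with real data.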

But could you imagine someone with the resources programming everything Georg Vrba does (and more) into one artificial neural net?

Crazy, huh? Just about as crazy as a computer figuring out how to beat the best human player at the game of Go, I think. Do you think Google is building those incredible machines and using some of the best programmers in the world just to play games? Maybe companies are just building self-driving cars, or Amazon's facial recognition system, or Facebook's um... sales/advertising Cambridge Analytica um... stuff. Maybe.

Doesn't Google publish some limited data on its searches (regarding sentiment) for investors? Maybe they were too busy playing Go to see if this data has a pattern that could be recognized using deep learning. BTW, Google released TensorFlow as open source, and it is just one of the (often used) frameworks. One of the public ones, anyway.

You can find some public research by Stanford (using TensorFlow). But what they are doing is LIMITED BY THEIR RESOURCES as they state in their own paper. link: [url=http://cs229.stanford.edu/proj2017/final-reports/5241098.pdf]http://cs229.stanford.edu/proj2017/final-reports/5241098.pdf[/url]

It is arguable how useful the technique in this paper really is. What does not seem arguable is that the conventional technique they used for comparison is not very good.

Speaking of games: you cannot win this one without P123 (and maybe more) on your side.

-Jim

Jim,

If you like this, I'd play with BigML and some of the other cloud-based machine-learning-as-a-service platforms. They are very accessible. The problem is continually getting the data into them; you have to write code, but they have open APIs and are very powerful and pretty easy to use.

I built some systems with random forest ensembles and deep neural nets in the past that seem to work for market timing on a weekly and monthly basis, but they are too hard to run regularly given my life. It could be coded, though.

Best,
Tom

Thanks Tom! That is a good idea and something I might use at some point.

-Jim

Just left NVIDIA. My takeaway was that we've come pretty far, but there are a lot of problems that remain to be solved before ML/DL reach their full potential.

The biggest problem, imho, is the need for warehousing cross-compatible tagged data repositories. Currently, it seems like even different people on the same team follow their own tagging and labeling practices. All the fancy algos in the world are nothing without clean and accessible tagged data. As a result, more than a few junior engineers have been relegated to being overpaid data labelers.

If someone can figure out how to scale this, PM me. I have ideas and will work for stock options.

You mean alternative financial data is hard to label? Something like satellite images of malls, quantifying and standardizing text-message data, labeling search data, etc.? Maybe you mean labeling the stuff that Facebook uses? Or do you mean the usual financial data is not adequately labeled by S&P and the other sources (earnings, etc.)?

Please tell us more!

-Jim

I was really talking about data that's not already structured. My comment doesn't really apply to conventional financial data, except for pattern recognition when it can be applied to conventionally structured data.

For example, people are really good at pattern recognition, but conventional computer logic is not. After studying technical analysis for about 30 minutes, I think most people would be able to recognize most kinds of named patterns. However, programming these patterns is exceptionally hard. I tried to do so years back at a hedge fund using "wavelets" (i.e., local regressions), but found that even that framework was too restrictive to encode a 30-minute introduction to technical analysis.
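To give a flavor of why: even something as "simple" as a double top takes a surprising amount of hand-written logic. Below is a crude sketch of the local-regression idea (smooth the series with rolling polynomial fits, then apply rules to the smoothed peaks). The window size and tolerances are made up for illustration and are nothing like what we actually ran.

[code]
# Crude sketch: smooth a price series with rolling local regressions, then apply
# hand-written rules to flag a "double top". All thresholds are arbitrary.
import numpy as np

def smooth_local(prices, window=11, degree=2):
    """Fit a quadratic in a rolling window and keep the fitted value at the center."""
    prices = np.asarray(prices, dtype=float)
    half = window // 2
    out = prices.copy()
    x = np.arange(window)
    for i in range(half, len(prices) - half):
        coeffs = np.polyfit(x, prices[i - half:i + half + 1], degree)
        out[i] = np.polyval(coeffs, half)
    return out

def looks_like_double_top(prices, tol=0.02):
    """Very rough rule: two local maxima of similar height with a dip in between."""
    s = smooth_local(prices)
    peaks = [i for i in range(1, len(s) - 1) if s[i] > s[i - 1] and s[i] > s[i + 1]]
    if len(peaks) < 2:
        return False
    p1, p2 = peaks[-2], peaks[-1]
    similar_height = abs(s[p1] - s[p2]) / s[p1] < tol
    dip = s[p1 + 1:p2].min()
    return similar_height and dip < min(s[p1], s[p2]) * (1 - tol)
[/code]

And even this toy version misses most of the variations a human chartist would spot instantly after that 30-minute introduction.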

So I guess what I’m saying is that there are probably some novel ways to apply DL by re-tagging conventional data with patterns that are difficult to programmatically detect (and that make sense economically?) and then training neural nets to find similar occurrences.

The next step is tying the pattern detection component into an event-based backtester.

I haven't done any of this, but it would be somewhat easier to hack together if I had access to tagged data.

David,

Thank you for your insight.

I know less than you so I will refer you to the link above again http://cs229.stanford.edu/proj2017/final-reports/5241098.pdf

My short impression (because I am probably wrong and do not want to waste people's time) is that with enough layers (that's why they call it deep) the machine can find the patterns in structured financial data on its own. Being a black box, maybe you will never know what they are. You never have to tell it about "head-and-shoulders" patterns, and it may, or may not, be noticing that pattern.

They listed the structured inputs they used in the paper. They use an LSTM that, I think, can take structured inputs, and I think the machine was finding new patterns.

In the paper they just noticed that the machine did not do as well when the market was trending, but they did not seem to know why. Their answer to this was reinforcement learning. They did not look at programming their own patterns into the neural network.

But there is a lot of research into unstructured data too. Some of the early stuff used YouTube cats (lots of cat videos, already labeled). Amazon has a system that it is trying to sell to the CIA that can recognize 100 faces in a crowd in real time.

P123 uses CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), but there are machines that do better at them than I can (I am not so good; it is not rare for me to take two tries to send an email to a fellow member). Computers are already better than I am at one Turing test.

Alternative financial data probably does need to be labeled (if it is not already labeled on YouTube). I imagine there are machines that use both structured and unstructured data already.

This may not offer any additional insights. And anyway, I will stop here before I make a complete fool of myself.

-Jim

TensorFlow is well suited to image recognition and would therefore be ideal for 2D pattern recognition such as head-and-shoulders, cup-with-handle, etc. The basic problem is feeding the neural net pre-identified examples of such patterns, along with series containing no patterns. It would take a lot of manual labour, but the results could be fruitful.
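For reference (a toy sketch, not something I have run), a small Keras convolutional net for classifying pre-labeled chart images might look like the block below. The 64x64 image size and the three classes are placeholders, and the labeled images themselves are exactly the manual-labour part.

[code]
# Toy convolutional net for classifying chart images into named patterns.
# Assumes you already have labeled images, which is the hard part.
import tensorflow as tf

n_classes = 3   # e.g., head-and-shoulders, cup-with-handle, no pattern

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(images, labels, epochs=10)   # images: (N, 64, 64, 1), labels: integers 0..2
[/code]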

As for using neural nets for stock prediction based on fundamentals and other custom series, DEEP neural nets are not appropriate. Internal nodes should be kept to a minimum so as to avoid memorizing the past. When I worked with NNs 20 years ago, I found that the number of internal nodes should be about half the number of inputs for optimum prediction capability. You want the neural net to make predictions, not regurgitate the past. You also need approximately 4x as much historical data for training/evaluation as OOS data before the NN needs to be retrained.
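In today's Keras syntax that sizing rule would look roughly like the sketch below; the 12 inputs and the random data are placeholders. The only points illustrated are hidden nodes at about half the number of inputs and a roughly 4:1 split between training history and OOS data.

[code]
# Sketch of the sizing heuristic: hidden nodes at about half the number of inputs,
# and roughly 4x as much history for training/evaluation as for OOS testing.
import numpy as np
import tensorflow as tf

n_inputs = 12                 # e.g., 12 fundamental or custom-series factors
n_hidden = n_inputs // 2      # about half the inputs

# Fake data: 1000 weekly observations of 12 factors, target = next-week return.
X = np.random.randn(1000, n_inputs).astype("float32")
y = np.random.randn(1000).astype("float32")

split = int(len(X) * 0.8)     # 4:1 training vs. out-of-sample
X_train, X_oos = X[:split], X[split:]
y_train, y_oos = y[:split], y[split:]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation="tanh", input_shape=(n_inputs,)),
    tf.keras.layers.Dense(1),          # predicted return
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=20, verbose=0)
print("OOS MSE:", model.evaluate(X_oos, y_oos, verbose=0))
[/code]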

Steve

I will have to read the paper, but it's interesting that a hedge fund manager I know has had the opposite problem: the nets worked when the market was trending, but blew up when it went sideways. I guess it depends on how the nets are trained and what data they are exposed to. As Steve mentioned, the net will just memorize the past if it is allowed to have enough layers (and in a very obfuscated way!).

I agree with all.

There may be other interesting use cases in finance for DL outside pattern recognition. I have been exposed to a few papers where researchers trained nets to price options. What's even more interesting is that while the nets should be able to price options as the market does, they will not be able to determine when the market is pricing them incorrectly. Thus I would conclude that neural nets add zero to the market's price discovery process.
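For a sense of how those papers typically set things up (a synthetic sketch, not any particular paper's code): generate prices from a known model such as Black-Scholes and let a small net learn the mapping. The net can get very good at reproducing the prices it is shown, which is exactly why it has no opinion on whether those prices are right.

[code]
# Sketch: train a small net to reproduce Black-Scholes call prices.
# Everything here is synthetic; the net only learns the pricing function it is shown.
import numpy as np
from scipy.stats import norm
import tensorflow as tf

def bs_call(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Random option parameters and their Black-Scholes prices as training targets.
n = 20000
S = np.random.uniform(50, 150, n)
K = np.random.uniform(50, 150, n)
T = np.random.uniform(0.1, 2.0, n)
sigma = np.random.uniform(0.1, 0.5, n)
r = 0.02
X = np.column_stack([S / K, T, sigma]).astype("float32")   # moneyness, maturity, vol
y = (bs_call(S, K, T, r, sigma) / K).astype("float32")     # price scaled by strike

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=256, verbose=0)
[/code]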

The advantage of neural nets, if designed properly, is the ability to consolidate many, often contradictory, indicators into a single signal. And there is of course the chance of squeezing a small amount of alpha out of hidden relationships. But they won't work magic.

It sounds like DL is therefore mostly a tool for doing the heavy lifting on exploratory data analysis. Conventional models, it seems, are still better suited to guide actual decisions.

If so, our preferred use cases for DL dovetail nicely with the concurrent discussion on moderator variables.

What are they? How do they interact with other variables? When do they work? Etc.

However, I don’t think nets are yet able to answer why.

So here is a possible play:

Solution: stick with one of the best machine learning tools for fundamentals there is—P123.

But I would decompose the results as follows:

Expected return = p * (average gain for winners) - (1 - p) * (average loss for losers) [equation 1]

p = probability of a positive return
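To make the arithmetic concrete (the numbers are made up): with p = 0.55, winners averaging a 3% gain, and losers averaging a 2% loss, equation 1 gives 0.55 * 0.03 - 0.45 * 0.02 = 0.0075, or about +0.75% expected.

[code]
# Equation 1 with made-up numbers.
def expected_return(p, avg_win, avg_loss):
    """avg_loss is the average loss magnitude, entered as a positive number."""
    return p * avg_win - (1 - p) * avg_loss

print(expected_return(0.55, 0.03, 0.02))   # roughly 0.0075, i.e., about +0.75% per period
[/code]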

Yes. Indeed, the most common implementation of a neural network with a single layer is equivalent to a logistic regression, which I think confirms Steve's point.

So, "consolidate" any market-timing indicators you use into a logistic regression, or a single-layer neural net, to obtain p(hat):

p(hat) = predicted probability of a positive return for that port

Plug p(hat) into equation 1 above to get your expected return (over, say, the next week).
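A bare-bones sketch of that step with scikit-learn is below. The three "indicators" are random placeholders, not a recommendation, and the win/loss averages are made up; the only point is the mechanics of turning several timing indicators into a single p(hat) and then an expected return.

[code]
# Sketch: consolidate several market-timing indicators into a single p(hat)
# via logistic regression, then feed p(hat) into equation 1.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fake history: 500 weeks of 3 indicators and whether the port was up that week.
indicators = np.random.randn(500, 3)            # e.g., trend, breadth, rate of change
was_up = (np.random.rand(500) > 0.45).astype(int)

clf = LogisticRegression()
clf.fit(indicators, was_up)

latest = indicators[-1:]                        # this week's indicator readings
p_hat = clf.predict_proba(latest)[0, 1]         # predicted probability of a positive week

avg_win, avg_loss = 0.03, 0.02                  # made-up averages from the port's history
expected = p_hat * avg_win - (1 - p_hat) * avg_loss
print(p_hat, expected)
[/code]

Swapping the logistic regression for a small net with one hidden layer would be the "add more layers" step mentioned below.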

I won’t quibble over which kinds of factors are best for this. I would just suggest that you use the ones that you think are good (if any).

Add new positions to the ports with the greatest expected returns. If no ports have positive expected returns, you do not open any new positions (and you begin to move out of the market).

Finally, add more layers to see how much Alpha there is from adding interactions (see Steve’s post).

Easy to do (and I probably will). As usual, it is the factors you choose that are most important, more important than any method I might entertain.

Thanks guys!!!

-Jim