Pure data mining. But true?

So here is an example of pure data mining, truly data mining, and something completely outside of stocks and finance: “Eye Color: A Potential Indicator of Alcohol Dependence Risk in European Americans”

Just the abstract but probably enough unless you have blue eyes and…

An association was found. A story developed afterwards: the gene for alcoholism is perhaps close to the gene for eye color, so the two get inherited together.

Maybe no different than thinking a rising pennant may signal a buying opportunity.

So how strong does the evidence have to be before you would even consider this possible? Should such statistical observations be pursued?

Just a different way to think about this.

Bear in mind that data mining is not inherently bad.

It is in investments because it fails to address the core issue: using associations discovered in Group A to motivate action in Group B, which can differ in significant ways, leading to outcomes that range from excellent to disastrous based on luck rather than research. But when the research is conducted in, and meant to be used in, the context of the same group, it’s a whole different ballgame and can be very much desirable, with the inquiry focusing instead on the effectiveness of the data mining effort (e.g., robustness of results, permissible inferences, etc.).

That paper is an example of proper data mining, a situation where the research and application addressed the same group, and it looked like the authors were aware of what could and could not be concluded from the data. Now if they wanted to try to conclude that eye color in humans is relevant to addictive traits in household pets, I suspect things would get ugly during the peer review process (unless they introduced and demonstrated some sort of behavioral bridge; i.e., traits in the master leading to behavior in the master, and in turn to trained or imitative behavior in the pets).

Ultimately, the only iron-clad rule in any of this is logic. :slight_smile:

Okay. I love this stuff so I have an answer. Not theeeee answer but an answer. One could use the Bayesian belief-updating equation. So it would depend on your prior.

It would depend on your prior belief (odds) that this was true. I would say it is no higher than 1 to 10,000. Maybe less, but let’s be generous. So you multiply 1/10,000 by the chance that you would get the study result if the hypothesis is true: let’s be generous and call it 1.00. You then divide that by the chance that you would get the result if it were false. That is just the p-value, or 0.0005. So the posterior odds are 1/(10,000 × 0.0005), or 1 to 5. That is just a 1/6 probability, about a 17% chance that blue eyes are related to alcohol dependence, and this assumes a good study.

So even though the study had a very small p-value, your prior belief makes it unlikely that blue eyes have anything to do with alcohol dependence.

Here is the equation copied from Philip Tetlock’s new book. Tetlock is famous for showing that experts do worse than chance at predicting in many situations. He also coined the terms Hedgehog and Fox for styles of prediction. His new book shows that some people can predict well and can even be trained to do so. From his book:

“P(H|D) / P(¬H|D) = P(D|H) / P(D|¬H) × P(H) / P(¬H)
Posterior Odds = Likelihood Ratio × Prior Odds
The Bayesian belief-updating equation”

Tetlock, Philip E.; Gardner, Dan (2015-09-29). Superforecasting: The Art and Science of Prediction (p. 170). Crown/Archetype. Kindle Edition.
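The arithmetic in the worked example above can be sketched in a few lines of Python. The prior odds and P(D|H) = 1.0 are the “generous” assumptions from the post, not measured values:

```python
# A sketch of the belief-updating arithmetic from the post above.
# The prior odds and P(D|H) = 1.0 are the "generous" assumptions in the text.
prior_odds = 1 / 10_000            # prior odds the eye-color link is real
p_d_given_h = 1.0                  # chance of the result if the link is real
p_d_given_not_h = 0.0005           # chance of the result if it's false (the p-value)

likelihood_ratio = p_d_given_h / p_d_given_not_h        # 2000
posterior_odds = likelihood_ratio * prior_odds          # 0.2, i.e., 1-to-5 odds
posterior_prob = posterior_odds / (1 + posterior_odds)  # ~0.167, about 17%
```

Note that even a likelihood ratio of 2000 cannot overcome a 1-in-10,000 prior; the posterior is still odds-against.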

Edit: Marc, I did not see that you had posted (we posted at the same time, I think). Please correct me if I am wrong on this. So some factor may, on examination, be found to be related in some way to the DDM, making it a valid (useful) factor. BUT your prior belief that some random factor will be related is small; using this example, maybe less than 1/10,000. So even a great p-value of 0.0005 still puts the odds very much against that factor being relevant.

I know you are making other points about samples too–and I can agree with those points also. I use the DDM and noise factors that are pretty standard fare. That puts me in pretty good shape regarding my priors–possibly better than 1/10,000.

Regards,

Jim

I think of the question in different terms.

The DDM framework (not the literal formula but the logic) is valid with a 100% probability, assuming, as I do, that nobody willfully trades a $5 bill for a $1 bill (collectibles excluded!).

Logic also tells us whether a potential p123 factor or formula can be seen as related to the DDM. Again, there are no probabilities here; either it’s logically rational or it isn’t. Example: IntCovTTM. As IntCovTTM falls, we’re increasingly able to say that interest expense, which does not vary with sales, consumes a bigger portion of operating income. That inevitably makes for a more volatile earnings trend (i.e., earnings will vary by magnitudes that exceed the variations in sales), which is likely to increase future share price movements and hence risk, and that increases the risk component of R in the DDM, etc.
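The leverage effect described above can be illustrated with made-up numbers (these are not p123 data, just a sketch): a fixed interest expense amplifies swings in operating income, and the amplification grows as coverage falls.

```python
# Hypothetical numbers (not from p123) showing how fixed interest expense
# amplifies earnings swings as interest coverage falls.
def pretax_swing(op_income, interest, op_change_pct):
    """Percent change in pre-tax income for a given percent change in operating income."""
    old_pretax = op_income - interest
    new_pretax = op_income * (1 + op_change_pct) - interest
    return (new_pretax - old_pretax) / old_pretax

# The same 10% drop in operating income:
high_cov = pretax_swing(100, 10, -0.10)   # IntCov = 10   -> pre-tax falls ~11%
low_cov = pretax_swing(100, 80, -0.10)    # IntCov = 1.25 -> pre-tax falls 50%
```

The low-coverage firm’s pre-tax income swings five times as hard on the identical sales shock, which is exactly the “more volatile earnings trend” the logic predicts.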

The unknowns we face are in terms of definition and specification. We’re looking for the risk component of R (among other things). We’re not looking for historic risk but future (unknown) risk. Maybe they’ll be the same (in which case Beta is all one needs), or maybe not. We study, test, revise, etc., to develop our specifications.

I suppose one might empirically develop a probability that such-and-such spec will suffice, but I’m not sure how useful that would be. If we were to go that route, we’d need to develop it in terms of conditional probability – and perhaps several layers down a conditions tree so to speak. Example: What is the probability of default on home mortgages? That’s easy. Everybody knew that – even before 2008. But what was the probability of default on home mortgages among borrowers who were spending 65% or more (instead of the stereotypical 35%) of disposable income on debt service? When I was at Reuters back in 2007, I actually posed that question in a discussion among quants and math PhDs and got chewed out pretty soundly for mouthing off on something outside my area of competence. Oh well. :slight_smile:
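The mortgage example above can be made concrete with purely hypothetical rates (these numbers are invented for illustration, not real mortgage data): the headline probability hides the conditional one.

```python
# Purely made-up numbers to illustrate conditioning; not real mortgage data.
p_default = 0.02            # unconditional default rate (hypothetical)
p_default_high_dsr = 0.15   # rate among borrowers paying >= 65% of income to debt (hypothetical)

# The unconditional number hides the conditional one: a pool skewed toward
# high-debt-service borrowers defaults far more often than the average suggests.
lift = p_default_high_dsr / p_default   # 7.5x the headline rate
```

That gap between the marginal and the conditional rate is the whole point: an answer to “what is the probability of default?” is useless until you specify which sub-population you are actually holding.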

We really need to be careful here. It’s too easy to run through the usual battery of statistical tests; we all learn that stuff in school. The challenge is in applying it to correctly formed questions. So back to my IntCovTTM example. A typical statement of probability would not be useful. To be useful, we’d need that statement to address a host of conditions: different ranges of leverage (an IntCov of 4.6 may be functionally identical to 3.2 or 7.5 but wildly different from 2.8), economic trends (the stronger and more stable the economy, the lower the IntCov that is tolerable), differences among industries (WMT, CL, or a decent electric utility could handle much narrower IntCov than could, say, a commodity chip maker), etc., etc.

Do we really want to go there – especially since no matter how many conditions we specify, we’re likely to get sabotaged by the one we didn’t think about? Perhaps in the future, as computational power continues to grow and as databases continue to gain sophistication in terms of what’s collected and how it can be crunched, we’ll get to a position where such conditional probability trees (each condition, of course, interacts with all the others) can be capably done. At present, though, and in the context of the investment problem (going from past to future rather than interpolating within a defined population) I think the risks of complacency that come from crunching a number based on what’s reasonably doable now vastly outweigh potential benefits.

The more I think about and discuss all this, the more I find my mind wandering back to law school, where we got zero points on exams for choosing who won (plaintiff or defendant) and all of our points for the way we identified and articulated the correct questions. I think it’s the same thing here. Anybody with a tolerable computer and a basic education can get the right answers. It’s getting the right questions that makes or breaks us. (It’s like at high school parent-teacher conferences, where I’d explain to my son’s math teachers that he had the right answers; it’s just that the answers didn’t match the questions on the exam. They got the joke and laughed, but failed him anyway.)

Separate philosophical question:

Does randomness exist? Or is it a fancy way we take ourselves off the hook for our inability to identify and measure all the interacting chains of causation that are out there?

Ed Thorp (“Beat the Dealer” author and successful hedge fund manager) was able to predict what range of numbers was likely to come up on a roulette wheel by measuring the speed of the wheel and the ball. He even developed equipment he could conceal and take into a casino. So there is always the potential to get more information for some systems that are not too chaotic, though there are always limits to this approach.

But I don’t want to get too philosophical, or even statistical, really. Let me accept that a successful strategy will use the DDM as its core and could possibly include some noise factors, and that this is necessary for a good strategy. I actually think this is probably true (exceptions to everything, of course).

Is that sufficient? Will every strategy, factor and function based on the DDM work? Personally, I do not think every factor that makes sense is always going to return more than the transaction costs–let alone lead to a comfortable retirement.

So how do you know which (logical) factors to use?

When you get right down to it there are only 3 options: 1) Close your eyes. 2) Open them and look just at the graphs. 3) Or, perhaps, look at some numbers while you have your eyes open.

But really I was making the same argument you have been making. Assuming you are not in the close your eyes camp, you cannot just look at the numbers–like a small p-value–blindly. And there is not just one reason for this.

Aw come on . . . philosophy can be fun. And I’m really curious to know if Thorp made $$$ at roulette!!!

I wish, I wish, I wish. If every DDM factor worked I’d be tossing spare millions into Donald Trump’s tin cup (unless he got there first and renamed it TDM, the Trump Discount Model).

The logic is impeccable. The problems are two-fold. First, there is no iron-clad rule as to which proxies are best for representing the variable, much the way artists are plagued by the reality that there is no single skin-tone color (they argue, they experiment, and they debate over infinite combinations of the three primary colors: red, blue, and yellow. And come to think of it, there’s no green either, so landscape painters do likewise when it comes to representing green). Just because life gives us a perfect model doesn’t mean it gives us a perfect way of pragmatically representing it. (HEY . . . I got some philosophy in here after all.) The second problem is that even if we could identify the perfect factors, we still can’t have perfect inputs because we don’t know what the future will be.

So how do we address this infinite puzzle?

I suppose you can use probability along the lines you suggested; i.e. not blindly. Also, as you assess the usefulness of the statistics, keep in mind the nature of the sample.

Marc,

Here is a link that is pretty good about the roulette: Ed Thorp Roulette

I’m reading the link and will correct this if my memory is not right. Note he did this with Shannon, who, as I understand it, was big in information theory. Perhaps most immediately recognized for error correction and file compression, supposedly essential for our present cellphone calls and hard drives. More importantly, we could not have copied all of those Napster files without him.

The link says it gave him a 44% edge. He tapped his toes in sync with the ball/wheel rotation to get the speed; there were sensors in his shoes. A wire went to a computer on his belt with 12 transistors. It generated a tone that was transmitted by a thin wire (which kept breaking) to an earphone. The tone gave him a range of likely numbers. He could do all of this, interpret the tone, and place the bet before betting was stopped.

Devices like this are now strictly illegal in casinos.

But his story regarding options and investing is just incredible. Have you met him? His hedge fund, Princeton/Newport, was targeted by Rudy Giuliani under the RICO Act.

As far as philosophy, I am in your camp. Until they develop a quantum roulette wheel, the number that comes up is deterministic.

As far as what happens with a quantum roulette wheel, I find all of the theories to be a little sketchy. Multiple universes are quite popular. “There is no reality, just a model of reality” is a good fallback. I could be wrong on this: is there an actual wave associated with the quantum wave equation? If I really understood this, I would have my physics degree. The mother of all curve fitting, Fourier transforms, did me in. So I wish I were better at curve fitting than I am.

I will tell you the truth. All of these ridiculous unproven physics theories are giving me “Finance Envy.” :wink:

I do not think I have any disagreements regarding the usefulness of DDM and I appreciate your teaching it and finance in general.

Regards,

Jim

@#$^!&

That means I have to keep working on those models. :frowning:

Devices are only illegal if you get caught :slight_smile: The trick is to invite a different friend along for every visit to the casino. Your friend leaves with big winnings while you go home a “loser”. The casino will happily invite you back but your friend will be persona non grata. Just make sure your friend is truly a friend and willing to split the dough.

Steve

Steve,

Have you ever counted cards? Sounds like it. I know you have the skills.

I’m not that good at keeping the count. But we used to make regular road trips to Lake Tahoe in Nevada when I was studying physics at U.C. Berkeley. My friend, now a math professor, was passable at counting.

BTW, that is where I first learned that you can’t cut your bets when you are behind. You have to keep increasing your bets when the count is with you, behind or not. To do otherwise is to ultimately give all of your money to the house. For investing: don’t get out at the bottom.
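The “size up when the count is with you” idea is usually formalized with the Kelly criterion, which Thorp himself popularized. A minimal sketch, with illustrative numbers rather than real blackjack edges:

```python
# A sketch of Kelly bet sizing: bet in proportion to your edge,
# regardless of whether you are ahead or behind.
def kelly_fraction(p_win, payoff=1.0):
    """Fraction of bankroll to bet; payoff is the net odds received on a win."""
    q = 1.0 - p_win
    f = (payoff * p_win - q) / payoff
    return max(f, 0.0)   # with a negative edge, the best bet is no bet

even_money_edge = kelly_fraction(0.505)   # a 1% edge at even money -> bet 1% of bankroll
no_edge = kelly_fraction(0.49)            # house has the edge -> bet nothing
```

The fraction depends only on the current edge and payoff, not on past wins or losses, which is exactly the point about not cutting bets just because you are behind.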