Statistical references for analyzing stock data

Folks,

As many of you probably know, white papers on equity strategies often analyze a strategy at a statistical level, using tools that include, but are not limited to, regressions, t-stats, intercepts, time fixed effects, and controls for one variable over another. As far as I can tell, these tools are not included in the P123 suite.

There are an unlimited number of statistics texts out there, but it would be nice to see statistics explained specifically when applied to stock research and historical returns.

If anyone knows of a great statistics resource (book/website/etc) in the context of historical stock data and stock strategies, I would be very happy to know.

Thanks in advance!

Ryan

Hi Ryan,

I actually think this paper covers 95% of what people need to know. It is simple and directly related to stocks: https://www.stat.berkeley.edu/~aldous/157/Papers/harvey.pdf It gives a simple way to convert the Sharpe ratio (along with the length of the period covered) to a t-statistic. One can also use the information ratio in place of the Sharpe ratio.
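A rough sketch of that conversion, assuming an annualized Sharpe ratio and a sample length measured in years (the function names and the sample returns are made up for illustration; they are not from the paper or from P123):

```python
import math
from statistics import mean, stdev

def sharpe_to_t_stat(annual_sharpe, years):
    """t-statistic implied by an annualized Sharpe ratio over `years` of data."""
    return annual_sharpe * math.sqrt(years)

def stats_from_returns(returns, periods_per_year=12):
    """Return (t_stat, annual_sharpe) for a list of periodic excess returns."""
    n = len(returns)
    avg = mean(returns)
    sd = stdev(returns)                      # sample standard deviation
    t_stat = avg / (sd / math.sqrt(n))       # ordinary one-sample t-statistic
    annual_sharpe = avg / sd * math.sqrt(periods_per_year)
    return t_stat, annual_sharpe

# Hypothetical monthly excess returns, for illustration only.
monthly_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04,
                   0.00, 0.01, -0.03, 0.05, 0.02, -0.01]

t_stat, annual_sharpe = stats_from_returns(monthly_returns)
years = len(monthly_returns) / 12
print(t_stat, sharpe_to_t_stat(annual_sharpe, years))  # the two numbers agree
```

The same arithmetic applies to the information ratio if the returns are measured relative to a benchmark rather than the risk-free rate.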

But you seem very sophisticated.

I believe a more comprehensive approach for investing can be found in this book: https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos-ebook/dp/B079KLDW21/ref=sr_1_3?crid=2W4G7AD41DAM5&keywords=advances+in+machine+learning&qid=1562952861&s=digital-text&sprefix=advances+in+machine%2Caps%2C177&sr=1-3

The book focuses on the best methods for cross-validation and backtesting. IMHO, once you do the cross-validation or hold-out period testing right, the particular statistic that you use is usually not so important. Any of the statistics should tell the same story.

The book sells for about $25 and has a lot of additional information.

As you say, there are a lot of papers that use these statistics, and you seem to be familiar with them already. I will not try to point out my favorites.

I am afraid this may not help much. But I will look for (think about) other sources.

-Jim

I haven’t read it yet, but Chapter 6 of Alphanomics by Charles Lee and Eric So is called Research Methodology: Predictability in Asset Returns, and it looks great; Lee is a big fan of Portfolio123. I hope to read it in the next few weeks.

Jim suggests the work of Marcos Lopez de Prado, whom I also admire. A very brief paper of his is worth reading before you get into the field of financial statistics: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2819847 . . . You can find the rest of his work, some of which goes way over my head, here: http://www.quantresearch.org/ . . .

Osam.com has published a number of white papers that I find very interesting. See https://osam.com/Philosophy-and-Process and https://osam.com/Commentary . . . They do use some of the statistical tools that you mention, but it’s not a real focus of theirs.

I haven’t read more than half of the papers here - https://hurricanecapital.wordpress.com/2015/02/01/links-michael-j-mauboussin/ - but what I have read has been extremely interesting. Mauboussin’s books are also very enjoyable reads.

None of these specifically address your question, I’m afraid, but they are all interesting reads, and may well complement any more focused statistical books you find.

A lot of people bash mainstream finance, but they still teach Fama-MacBeth (1973) to B-School students around the world. It’s like the 1000 lb gorilla in the room of finance. No one wants to talk about it, but it’s still there.

At its essence, Fama-MacBeth extends the two-factor CAPM (i.e., multiple linear regression) to account for an arbitrary number of betas.

Nearly every well-cited journal article on pricing anomalies published since Fama-French (1992 and 1993) uses some version of the Fama-MacBeth regression.

To really understand the canon of asset pricing/return models since the early 1990s, I feel one must first understand Fama-MacBeth.

https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.632.511
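For readers who have not seen it, here is a minimal sketch of the second (cross-sectional) pass of a Fama-MacBeth style regression. It is simplified to a single exposure per stock and takes the exposures as given rather than estimating betas in a first pass; the function names and toy numbers are assumptions for illustration, not taken from the paper.

```python
import math
from statistics import mean, stdev

def cross_sectional_slope(exposures, returns):
    """OLS slope of one period's stock returns on one exposure (with intercept)."""
    mx, my = mean(exposures), mean(returns)
    sxx = sum((x - mx) ** 2 for x in exposures)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exposures, returns))
    return sxy / sxx

def fama_macbeth_single(exposures_by_period, returns_by_period):
    """Average the per-period slopes; the t-stat uses their time-series spread."""
    slopes = [cross_sectional_slope(x, y)
              for x, y in zip(exposures_by_period, returns_by_period)]
    premium = mean(slopes)
    std_err = stdev(slopes) / math.sqrt(len(slopes))
    return premium, premium / std_err

# Toy data: 3 periods, 4 stocks, returns built as intercept + slope * exposure.
exposure = [0.5, 1.0, 1.5, 2.0]
true_slopes = [0.01, 0.02, 0.03]
exposures_by_period = [exposure] * 3
returns_by_period = [[0.001 + s * x for x in exposure] for s in true_slopes]

premium, t_stat = fama_macbeth_single(exposures_by_period, returns_by_period)
print(premium, t_stat)
```

With several exposures per stock, each period's simple regression becomes a multiple regression, and the averaging step is applied to each coefficient separately.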

This is separate from David’s post. David’s post is short and I do not fully understand what he does and does not accept about the Fama-French models.

But generally there seems to be a LARGE logical inconsistency and misunderstanding in the forum.

1) We often see people saying they do not believe that statistics works while they praise the use of the Fama-French model. EVEN IN THE SAME PARAGRAPH!!! But Fama and French use statistics.

HAS ANYONE REALLY LOOKED AT THE FAMA-FRENCH MODEL MATH?

2) Their “LINEAR REGRESSIONS” did not assume a continuous linear function.

Fama and French used index variables, which makes their regressions more akin to a t-test (or ANOVA for multiple index variables). Two points do determine a line, so it was linear in that regard. But the x-variable was not a continuous variable in their models (it was an index variable). There was never (to my reading) an assumption that the continuous function was linear, and for good reason.
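The general equivalence between a regression on a 0/1 index variable and a two-sample t-test can be checked numerically. This is only a sketch with made-up return numbers and illustrative function names; it shows the statistical identity, not the exact specification of any Fama-French regression.

```python
import math
from statistics import mean

def ols_slope_and_t(x, y):
    """Slope and its t-statistic from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

def pooled_t_stat(group_a, group_b):
    """Classic equal-variance two-sample t-statistic for mean(a) - mean(b)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = mean(group_a), mean(group_b)
    ss = (sum((v - ma) ** 2 for v in group_a)
          + sum((v - mb) ** 2 for v in group_b))
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical returns for stocks inside and outside some bucket.
in_bucket = [0.04, 0.01, 0.03, 0.06, 0.02]
out_bucket = [0.01, -0.02, 0.02, 0.00, 0.01, -0.01]

index_var = [1] * len(in_bucket) + [0] * len(out_bucket)
all_returns = in_bucket + out_bucket

slope, t_regression = ols_slope_and_t(index_var, all_returns)
t_two_sample = pooled_t_stat(in_bucket, out_bucket)
print(slope, t_regression, t_two_sample)  # the two t-statistics match
```

The slope is simply the difference between the two group means, which is why nothing about the regression requires the underlying relationship to be linear.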

Modern finance statistics skip the index variables and go right to the t-test.

For some reason, P123 members (often) accept the Fama-French papers (with index variables) and not the equivalent (modern) t-test data. There is no (rational) way to resolve this inconsistency.

-Jim

I agree with David that you have to understand Fama-MacBeth to understand the canon of asset pricing/return models since the early 1990s. But the paper is quite difficult and somewhat out of date. I suggest a later paper that may be a better introduction: “The Capital Asset Pricing Model: Theory and Evidence” by Eugene F. Fama and Kenneth R. French. You can find it here:

https://pubs.aeaweb.org/doi/pdfplus/10.1257/0895330042162430

In this lucid and extremely cogent paper, Fama and French point out a number of serious problems with the capital asset pricing model. These problems began to be discovered in the early 1970s, but they weren’t as widely recognized by the time of Fama and MacBeth’s paper as they are now. As they write, “The intercepts in time-series regressions of excess asset returns on the excess market return are positive for assets with low betas and negative for assets with high betas.” In other words, low-beta portfolios tend to outperform high-beta portfolios, which is the opposite of what the CAPM suggests. (Fama and French had a memorable argument with William Sharpe about this in 1973 that got pretty heated.) A number of papers published in the early and mid 1970s empirically showed that this was the case across all asset classes. (I was able to come up with a mathematical proof that shows that it’s inherent in the definitions of alpha and beta that they will have an inverse correlation. I published this proof a year ago on Seeking Alpha as “Why Low Beta Outperforms.” Unfortunately, the idea behind this mathematical proof had not occurred to the economic theorists who were puzzling over this paradox.)

As Fama and French point out, by the 1990s, there were two basic reactions to the failure of CAPM. One was to explain its failure by offering behavioral reasons. The other was to offer a much more complicated model, based on multiple linear regressions and numerous risk premia. The latter approach is the one taken by Fama and French in the 1990s, when they offered a three-factor model for expected returns. (Unfortunately, that approach may be subject to some of the same mathematical problems that I pointed out in my paper last year. Returns can only be positively correlated with risk premia [betas] when the returns are more likely to be negative than positive.)

Fama and French conclude, “despite its seductive simplicity, the CAPM’s empirical problems probably invalidate its use in applications.” Perhaps that “seductive simplicity” is at the root of the problem. The “simplicity” is based on a mathematical simplification. The equation for a linear regression line is as follows: expected return = alpha plus beta times market return. From this it SEEMS that the higher the beta, the higher the expected return. However, if you take TWO assets with different alphas and betas but the same market return, the expected return from the one with HIGHER beta will be more likely to be LOWER than the other, as I showed, if the market return is the average of the two returns and it’s more likely to be positive than negative. One can extrapolate from that to get a nice mathematical explanation for the low-beta paradox, which is a serious problem that any theory dependent on risk-return correlations is going to encounter.

Thanks Jim, David and Yuval for the feedback. I spent some of the weekend reading the Harvey & Liu paper, and another by Lopez de Prado on the “Probability of Backtest Overfitting”. Very heady stuff, particularly the latter. I will strive to get through it though, as overfitting is quite possibly the largest risk in strategy design. They even argue that using a “hold-out period” (what several P123 users do by splitting the test universe into evenid=0 and evenid=1) is not a very robust testing method, as the researcher already knows how the strategy generally behaves. I am still digesting, but these are interesting thoughts (I have a background in engineering, but this math is something else!). I believe Jim has posted about this previously.

On the Harvey/Liu paper, the whole idea that multiple tests (or optimizing) reduce the true value of the Sharpe ratio is an eye opener. Again, I’m still digesting and hope to provide a summary in the future.
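To get a feel for how multiple tests eat into a Sharpe ratio, here is a rough sketch using a plain Bonferroni adjustment with a normal approximation. Both choices are simplifying assumptions made here, the function name is illustrative, and the sketch assumes a positive Sharpe ratio; the paper discusses more refined adjustments.

```python
import math
from statistics import NormalDist

def bonferroni_haircut_sharpe(annual_sharpe, years, n_tests):
    """Sharpe ratio after a Bonferroni multiple-testing adjustment.

    Steps: Sharpe -> t-stat -> two-sided p-value -> p-value * n_tests
    -> adjusted t-stat -> adjusted Sharpe. Normal approximation throughout.
    """
    normal = NormalDist()
    t_stat = annual_sharpe * math.sqrt(years)
    p_single = 2 * normal.cdf(-abs(t_stat))       # two-sided p-value
    p_adjusted = min(1.0, p_single * n_tests)     # Bonferroni correction
    if p_adjusted >= 1.0:
        return 0.0                                # nothing survives the haircut
    if p_adjusted <= 0.0:
        return annual_sharpe                      # p-value underflowed to zero
    t_adjusted = -normal.inv_cdf(p_adjusted / 2)
    return t_adjusted / math.sqrt(years)

# A Sharpe ratio of 1.0 over 10 years, found after trying 1, 100, or 1000 variations.
for trials in (1, 100, 1000):
    print(trials, bonferroni_haircut_sharpe(1.0, 10, trials))
```

With a single test the Sharpe ratio is unchanged; the more variations that were tried before settling on the strategy, the larger the haircut.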

I am familiar with Fama-French, but admittedly I’ve never actually read it. Until now I didn’t know there was a Fama-MacBeth partnership! Thanks David!

Ultimately what I am looking for is a consolidated resource on how to effectively assess a strategy. While P123 offers several metrics for measuring past performance, it does not offer tools to help determine how over-optimized or robust the testing method or design is. Perhaps this comes with experience (I’ve been at it for a couple of years now), but having established statistical rules on this would be an important resource for anyone designing strategies. It is all too easy to backtest a 50% per year strategy, but it is not easy to assess how robust the strategy, or the method used to find it, really is.

Thanks again all.

Thank you!!!

Please let us know what you learn as you research this.

-Jim

The most helpful thing I ever read about this was the following: “The Power of Back Testing Investment Strategies” (What Works on Wall Street).

I think some P123 members will not like the above linked article. But P123’s main ideas are based on James O’Shaughnessy’s work.

Another resource I found very helpful was this thread: https://www.portfolio123.com/mvnforum/viewthread_thread,6939

I’ve written a guide to robust backtesting which I think some people here will disagree strongly with. It reflects my OWN opinion and not that of P123. You can read it here: “How To Backtest For Superior Out-Of-Sample Results” (Seeking Alpha).

The following article I wrote will give you another viewpoint on overfitting: “The 2 Types Of Investing Or Trading Errors” (Seeking Alpha). Once again, I think some P123 users will disagree with it strongly.