SmartAlpha simulated & out of sample debate

Dear All,

I wanted to isolate the hot topic of discussion from the rest: the removal of backtest data from Smart Alpha/R2G. I think the current implementation achieves our main goals: make designers focus on future robustness (not past performance), and avoid giving false expectations to subscribers.

However, I can see a lot of value in comparing performance after launch with performance before it. Maybe something along these lines would be better than what we have now:

  • Show two tabs in the chart: “Since Launch” and “Simulation”
  • The Simulation chart will only be available if the model has been launched for at least 9 months; this way a more meaningful comparison is possible
  • The Simulation chart will look like the old R2G chart, with the little ‘L’ circle indicating the launch date
  • Below the Simulation chart we could even add a table of statistics for the launch and simulated periods, like Sharpe, alpha, etc. (see the sketch after this list)
  • Simulation results cannot be searched, screened, downloaded, etc., just as is the case now
  • A big WARNING label in the simulated section
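
To make the statistics bullet concrete, here is a minimal sketch of what such a launch-vs-simulation table could compute, assuming daily return series for the model and its benchmark split at the launch date; the column names and the simple excess-return alpha are illustrative assumptions, not SmartAlpha’s actual methodology.

```python
# Minimal sketch (assumptions: daily returns, illustrative column names).
# Splits the return history at the launch date and computes a few stats per period.
import numpy as np
import pandas as pd

def period_stats(model_ret: pd.Series, bench_ret: pd.Series) -> dict:
    ann = 252  # trading days per year
    excess = model_ret - bench_ret
    return {
        "Ann. Return": (1 + model_ret).prod() ** (ann / len(model_ret)) - 1,
        "Sharpe": model_ret.mean() / model_ret.std() * np.sqrt(ann),
        "Alpha (ann. excess)": excess.mean() * ann,  # simple excess return, not regression alpha
    }

def launch_vs_simulation(returns: pd.DataFrame, launch_date: str) -> pd.DataFrame:
    sim, live = returns.loc[:launch_date], returns.loc[launch_date:]
    return pd.DataFrame({
        "Simulation": period_stats(sim["model"], sim["benchmark"]),
        "Since Launch": period_stats(live["model"], live["benchmark"]),
    })
```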

I think this is better, but I haven’t asked Marc yet :slight_smile:

Sounds very good!
Werner

Marco, I am interpreting your proposal as follows… please clarify where needed.

Default view will be the ‘since launch’ chart we now see on Smart Alpha along with all the other information.

Secondary tab:

  1. Will contain a chart comprising backtest data and live performance similar to the previous R2G chart with “L” circle designating the launch date.

Request: Add “R” circles designating any revisions
Request: Allow for date range input similar to previous R2G chart

  2. Only available for models launched more than 9 months ago.

  3. Data analytics for Backtest period vs. Since Launch period

Request: Make this a comprehensive set of data that not only includes alpha, Sortino, etc., but also includes things such as the following (see the sketch after this list):
• Win %
• Win Return %
• Loss Return %
• Max DD
• Largest single position loss %
• Largest single position gain %
• Median Position Gain %
• Median Position Loss %

  4. No search, download, or screen capability on Secondary tab information

  5. Warning / Disclaimer labels
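
A rough sketch of how the trade-level statistics requested above could be derived, assuming a series of closed-position returns (in %) and a daily equity curve are available; the names here are hypothetical, not an actual P123 export.

```python
# Sketch (assumptions: position_returns are closed-trade returns in %,
# equity is a daily equity-curve series; names are hypothetical).
import pandas as pd

def trade_level_stats(position_returns: pd.Series, equity: pd.Series) -> dict:
    wins = position_returns[position_returns > 0]
    losses = position_returns[position_returns <= 0]
    drawdown = equity / equity.cummax() - 1          # running drawdown as a fraction
    return {
        "Win %": 100 * len(wins) / len(position_returns),
        "Win Return %": wins.mean(),
        "Loss Return %": losses.mean(),
        "Max DD %": 100 * drawdown.min(),
        "Largest single position gain %": position_returns.max(),
        "Largest single position loss %": position_returns.min(),
        "Median Position Gain %": wins.median(),
        "Median Position Loss %": losses.median(),
    }
```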

I think this Secondary tab comparison will go a long way toward supporting your two main stated goals. It shines a focused light on the robustness of the model. Additionally, as a consumer of the models I can distinguish between the “Backtest” and “Since Launch” periods and determine for myself whether the launch performance is within my tolerance given current market conditions and its backtested history.

While I would like to download all the model secondary tab data into Excel to create historical data snapshots over time, I could live without it given that I at least have the data available.

We have discussed this at length by now. Everybody knows my feelings, so there’s no point in me cluttering up this particular thread. Marco and I continue to discuss offline as this emerges.

Here’s a picture (always helps)

The Simulation Tab shows both performances, since launch and simulated, but with a twist: the Model and Benchmark are both set to 100 on the Launch Date (a small sketch of the rebasing follows the observations below). This chart should immediately tell you something about the model post launch. In this case:

  • The model was outperforming prior to 2008
  • For 2-3 years before launch, the model was in step with the benchmark
  • After launch, outperformance resumed
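
A minimal sketch of the rebasing, assuming daily price (or NAV) series with a date index; the column names and launch-date lookup are illustrative assumptions.

```python
# Sketch (assumptions: a DataFrame of daily prices/NAV with a DatetimeIndex
# and columns such as "model" and "benchmark"; names are illustrative).
import pandas as pd

def rebase_to_launch(prices: pd.DataFrame, launch_date: str) -> pd.DataFrame:
    launch_values = prices.loc[launch_date]   # values on the launch date
    return prices / launch_values * 100       # both series equal 100 at launch
```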

I like it.


Here’s a picture of the simulated & after launch performance of a curve-fitted model.

Benchmark & Model both set to 100 on Launch Date.

If this doesn’t raise flags I don’t know what will.



I like having the focus on OOS performance. You can see simulated performance by adding a model to a book and running it. If you add simulated results to SmartAlpha, I would like it to have very clear warnings (“this performance is simulated and MAY be based on an over-optimized methodology designed to produce high alpha and consequently may not be an indication of future performance”) - not sure exactly how to word this, since some models are not over-optimized. It would be nice if the optimization could somehow be restricted, but I realize that would be near-impossible.

Marco,
I think this is a very reasonable compromise. I agree with shsunbarot that simulated results should be de-emphasized and include a disclaimer.

Also, as I stated in the other post, I believe Smart Alpha was the right move. You’re going to take some heat from the usual suspects. But that doesn’t diminish the fact that this change was made for the right reasons, and that those reasons are good for everyone in the long run… even if they can’t see it yet.

Marco,

Your chart examples are great and better than the previous chart version on R2G. I would still recommend putting in the “R” circle to denote when in time a revision was made, but I could live with it as you’ve shown.

For the record, I agree with the direction P123 is heading by focusing on OOS performance. As a consumer, if all the models had 20+ years of OOS performance across many different types of markets, I wouldn’t care about the backtest data. But all these models are still too young. By removing the IS backtest data this early, it feels like we’re being asked to take a leap of faith on 1 or 2 years of OOS performance without having any idea how the model will or should perform in different market environments.

David

Marco,

The question shouldn’t be whether P123 should limit customers’ exposure to data in order to reduce risk; it should be how P123 can expose system weaknesses by running additional tests on systems in order to help customers understand and quantify risk. Customers have very limited visibility into how each system is constructed, so we really do need additional data to help us understand risk.

Here again I say: create a set of standardized tests that will help expose risk. Hire experienced quant expertise to help you here. Use Monte Carlo sims and other techniques currently used by some system authors to help quantify robustness. Some system authors have been using their own robustness tests, but this is a disorganized, non-standard approach, done at the discretion of the author, and these tests are not double-checked by P123 for accuracy. Here, you can help by creating standardized tests, running them, and reporting this data for each system. This will add new, real value to SmartAlpha.
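
As one example of what a standardized test could look like (purely a sketch, not an official P123 methodology), a simple bootstrap-style Monte Carlo run resamples a model’s return stream and reports the spread of outcomes rather than a single equity curve; the return frequency and percentiles below are illustrative assumptions.

```python
# Sketch: bootstrap Monte Carlo on a model's monthly return stream.
# Resampling the composition of returns gives a distribution of outcomes
# instead of one lucky path. Purely illustrative.
import numpy as np

def bootstrap_cagr(monthly_returns: np.ndarray, n_runs: int = 5000, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n = len(monthly_returns)
    cagrs = np.empty(n_runs)
    for i in range(n_runs):
        sample = rng.choice(monthly_returns, size=n, replace=True)
        cagrs[i] = np.prod(1.0 + sample) ** (12.0 / n) - 1.0
    return cagrs

# Report, e.g., the 5th / 50th / 95th percentile CAGR across resampled histories:
# np.percentile(bootstrap_cagr(returns), [5, 50, 95])
```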

Regards,
Stu

Debbie,

I tried that but it only works on Ports that we are subscribed to. All the others don’t show up in the Port selection window.

Here is an excerpt from a document originating from the Ontario Securities Commission (OSC). Although this organization is Canadian, I can assure everyone that the same policies will apply to the SEC; Canada never does this sort of thing in isolation. This guidance suggests that even model out-of-sample performance requires care in how it is presented, and perhaps requires a client questionnaire to be filled out to ascertain experience before viewing the data. It also suggests that only real performance data (i.e. trade data) should be presented to a general audience.

https://www.osc.gov.on.ca/en/SecuritiesLaw_csa_20110705_31-325_marketing-practices.htm

CSA STAFF NOTICE 31-325 – MARKETING PRACTICES OF PORTFOLIO MANAGERS

PURPOSE

Staff in various provinces from the Canadian Securities Administrators (CSA staff or we) conducted a focused compliance review (the review) of the marketing practices of firms registered as portfolio managers (PMs). This notice summarizes our findings from the review and provides guidance to portfolio managers on suggested practices in the preparation, review and use of marketing materials…

1. Preparation and use of hypothetical performance data

Hypothetical performance data is performance data that is not the performance of actual client portfolios. It is sometimes referred to as “simulated” or “theoretical” performance data and typically consists of either:

• back-tested performance data (i.e. past period), or

• model performance data (i.e. real time or future periods)

Hypothetical performance data also includes statistics such as standard deviation and Sharpe ratios, which are measures of volatility. Some of the PMs we reviewed presented the hypothetical performance data for the primary purpose of attracting new clients.

Concerns

Approximately 20% of the PMs we reviewed had deficiencies with the hypothetical performance data they presented to investors. We identified the following general concerns related to the use of hypothetical performance data:

• many investors may not have sophisticated investment knowledge sufficient to fully understand the inherent risks and limitations of this data

• any outcome may be achieved as the performance data is produced with the benefit of hindsight and is subject to potential manipulation

• the data is often combined or linked with actual client performance data, which may give the appearance of a longer track record and that the information is based entirely on actual client performance

• there is inadequate disclosure regarding the methodology and assumptions used by the PM in calculating the data

• PMs can take increased risks with the creation of hypothetical portfolios as they do not have to manage these portfolios in real market conditions

• it is difficult to verify the calculation of hypothetical performance data

Guidance

We expect PMs to market their actual client performance results. However, if a PM presents hypothetical performance data, considering the factors described above, we typically expect the following practices to be applied:

• ascertaining an investor’s level of investment knowledge sophistication, as part of the PM’s obligation to obtain KYC information and assess suitability, prior to the presentation of hypothetical performance data

• restricting the presentation to investors known to have sophisticated investment knowledge (i.e. not widely disseminating the presentation on a website or in an advertisement)

• labelling the presentation as “hypothetical” in a clear and prominent manner

• not linking the hypothetical performance data with actual performance returns of the PM. We expect hypothetical performance data to be presented separately from actual client performance data

• including clear and meaningful disclosure regarding the methodology and assumptions used to calculate the performance data, and any other relevant factors, and

• disclosing clearly a description of the inherent risks and limitations of the hypothetical performance data

Marco, thank you for this brave step!

While I welcome the proposals you have made, I can see most of the worries were expressed specifically about the simulation chart presentation, and I agree with and accept the reasonable proposals in the thread.

However, as you acknowledged, the most value lies in the comparison, and the chart is of little use here. I would like to emphasize that IS vs. OOS comparison statistics cannot be gamed by designers in any way, because no one knows the future, so I see neither a reason nor a possibility for designers to rush to fit anything here. In the comparison we will mostly look for consistency within a reasonable deviation. Even if these stats could somehow be managed by designers, that would be the most valuable thing they could do with backtesting, namely create a robust model. Moreover, I see such statistics as the best way to address the worries Steve outlined above.

That is why I do not understand the reason for ANY limitation on the comparison statistics:

[quote]
  • Below the Simulation chart we could even add a table of statistics for the launch and simulated periods, like Sharpe, alpha, etc.
  • Simulation results cannot be searched, screened, downloaded, etc., just as is the case now
[/quote]

Can you explain the nature of these limitations?

I might be subjective here, but:

  • the table of statistics is much more important than the chart; can we see graphically what you mean?
  • will there be a comparison, and how will a reasonable deviation be handled? No one expects IS vs. OOS to match 1:1.
  • what is the reason not to allow sorting, filtering, etc. of models by comparison stats? I see this as the most valuable measure of trust in all the other metrics.
  • and why not include them in downloads? What is the point of showing stats and then artificially limiting their usage?

And as for a robustness test and the IS vs. OOS chart, I had thought about something like the attached.

I would probably invest in a model like this if I were deciding solely on a robustness test, keeping in mind that the same market conditions never repeat and all models under-perform at times. At the very least, I would appreciate such a disclosure greatly.


As I see it, in order for Portfolio123 to be squeaky clean, they should only present to the general public equity curves built from data generated through the trade module. These could be made readily available, and it would be an advantage for model designers to use Trade to invest in their own portfolios. I don’t say this with a light heart, because it would eliminate me from providing such models. (I am not making this argument for my own benefit.)

Models available for public viewing / mass marketing cannot have simulated data associated with them. I suggest that P123 make available a tab where designers can attach a PDF “prospectus”. The prospectus is the designer’s opportunity to “market” his model. This allows him/her to present whatever backtest data he/she feels is important, the system rules, etc. The designer has to be in charge of this, not P123, because there is no mathematics that can prove or disprove anything, and P123 cannot be involved in it. In order for designers to be prosperous they need to sell their models to prospective subscribers.

Before general viewers can access the prospectus, they are required to answer a question regarding their level of investment experience. P123 would also provide wording to the reader that detaches the company from any liability incurred by the prospectus. The responsibility for the contents of the prospectus falls on the designer. The PDF file would be attached at the time of launch, which ensures that live performance is not presented as an extension of the backtest, and thus there is no implication of performance from simulated data.

As for internal use (not for public viewing), anyone wanting to see anything other than trade data needs to go through a similar process of answering a questionnaire in order to access hypothetical data. What P123 does from there is P123’s business. But I still suggest keeping hypothetical performance separate from hypothetical backtests.

Take care everyone
Steve

(1) Showing back-test results would eventually run into trouble with US regulators, and we don’t want that to happen to P123.
Here is a recent case involving false backtested results; the $20B+ RIA went bankrupt a few months after the SEC action:
http://www.sec.gov/litigation/admin/2014/ia-3988.pdf

(2) Here is SEC guideline on performance claims:
https://www.sec.gov/divisions/investment/advoverview.htm

SEC staff has indicated that it may view performance data to be misleading if it:

  • does not disclose prominently that the results portrayed relate only to a select group of the adviser’s clients, the basis on which the selection was made, and the effect of this practice on the results portrayed, if material;
  • does not reflect the deduction of advisory fees, brokerage or other commissions, and any other expenses that accounts would have or actually paid;

(3) From subscribers’ point of view, even the new OOS performance on SmartAlpha is not enough to compare models, because (a) it does not deduct subscription fees in the performance calculation, and (b) many models use a price-only benchmark instead of a total-return benchmark. Why bother to confuse subscribers with unrealistic back-tested data?

hengfu, very true; two can play this quasi-legal game. I highly encourage P123 to adjust performance data for subscription fees, trading commissions, slippage, etc., based on subscriber preferences, or at least on a best-effort basis. This can be a win-win practice: for example, I would agree to share with the designer my assumed investable sum for a model I am interested in. I would like to see my personally customized model performance stats, including subscription-fee drag, and the designer can gather statistics to adjust model pricing and volume accordingly.
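
A back-of-the-envelope sketch of the subscription-fee drag being described; the fee, invested sum, and function name are purely hypothetical numbers for illustration.

```python
# Sketch: subscription-fee drag on annual return for an assumed investable sum.
# All numbers and names are illustrative, not actual P123 or model figures.
def net_annual_return(gross_return: float, annual_sub_fee: float, invested_capital: float) -> float:
    fee_drag = annual_sub_fee / invested_capital   # fee as a fraction of capital
    return gross_return - fee_drag

# e.g. a 15%/yr model with a $1,200/yr subscription on $30,000 invested:
# net_annual_return(0.15, 1200, 30000) -> 0.11, i.e. 11%/yr after fee drag
```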

Am I missing something: since when does P123 provide investment services rather than subscription services, such that SEC guidance would apply in any way? If this is a best-effort practice by P123 and the designers to protect investors and comply with the SEC, then let’s start with disclosure (exposing the data to investors in the best possible way) instead of disclaiming (hiding data from investors because it could be misleading). Let’s take a step further: it is not bad data presentation that ruins capital, but bad data producers (models).

Konstantin - you seem to be missing the point.

hengfu: they “sold” backtested data as OOS, and they cheated with the signals; that is what got them into trouble…

As I pointed out, there is no way P123 is going to make any decision without significant displeasure from one side or the other of the user community.

Anyway, the way it is now is fine; model designers can provide in-sample data in their documentation.

What we (the model designers) could also do is create a standard robustness test (as a best practice) and attach it, voluntarily, to every R2G.

E.g., every R2G could be tested with:

  • EvenID = 0 and EvenID = 1
  • with no sell and buy rules
  • with sell and buy rules switched on pair by pair
  • with 5, 10, and 20 stocks
  • with trade on open, trade on close, and between close and open
  • with and without timing, etc.
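
To make the idea concrete, here is a sketch of how such a standardized variant grid could be enumerated; the parameter names and values are illustrative assumptions, not actual P123 simulation settings.

```python
# Sketch: enumerate a standardized grid of robustness-test variants so every
# R2G is re-run under the same perturbations. Names and values are illustrative.
from itertools import product

VARIANTS = {
    "universe_split": ["EvenID = 0", "EvenID = 1"],
    "rules":          ["all rules on", "no buy/sell rules"],
    "holdings":       [5, 10, 20],
    "fill_price":     ["open", "close", "avg(open, close)"],
    "market_timing":  [True, False],
}

test_matrix = [dict(zip(VARIANTS, combo)) for combo in product(*VARIANTS.values())]
print(len(test_matrix), "variant runs per model")  # 2 * 2 * 3 * 3 * 2 = 72
```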

We could then build one standard framework document, and all R2G providers who want to follow this self-created, best-practice approach to robustness testing could produce it for each R2G. It would be a USP for model designers and (some, nothing
is perfect) protection for the subscribers.

Interested?

Regards

Andreas

P.S. Konstantin: there is no way P123 or a model designer can protect anyone.

We tried all sorts of things: rolling tests, odd/even splits, etc. It’s a wild goose chase to try to “prove” robustness, and dangerous. With 16 years of data it’s just not possible. The robustness of a system at this point comes from qualitative measures: designer experience, age, honesty, etc. The only thing backtest data can be used for is comparison with out-of-sample results, and the longer the out-of-sample period, the more meaningful the comparison.

We, P123, either show what the designer simulated before launch, or we don’t. Simple. Showing backtest results 9 months (or more) after launch is still the only improvement we can make, IMHO. I think it does more than anything else to show quality.