How many R2G's have INCORRECT PERFORMANCE Stats?

How do we know which R2G’s have been affected by P123 data “fixes”? From my understanding, performance stats are “locked” and may not reflect accurate information after a data fix has been made. Please refer to this thread: http://www.portfolio123.com/mvnforum/viewthread_thread,8135_offset,0 . I don’t understand how inaccurate stats can be useful. How do we know how many data fixes there have been over time, which models they affect, and to what degree? Per this comment… http://screencast.com/t/Sb4aUJUmd …can we get some feedback from P123 as to how and when this can be addressed?

Rerunning a simulation of a launched R2G when there are subs will likely change the current holdings, creating a lot of confusion and turnover for subs following the model.

Several designers take it upon themselves to rerun a simulation when fixes are announced and report their findings in the model description.

As a customer you can/should ask the designer to run a current sim.

But in the end, out-of-sample is what matters and time will sort it out.

What we can say is that usually the fixes are not material to sims.

There is no easy solution and we’re open to suggestions. We could, for example, periodically run sims for every R2G and report the stats in a separate tab called ‘latest sim’.

Thanks

Developers should be required to re-run simulations of their R2Gs on at least a monthly basis and report the results. I see a lot of R2Gs that have not been looked at for over 30 days. I have a feeling many developers just set them and forget them. We all know things change over time, and it is unnerving to believe the model is not being tested regularly. This should include all the free ones.

@RJJ, what’s the point? A simulation’s stats are useless for a customer. They are only useful in context, as a feedback mechanism for the developer.

I’d like to be able to change the benchmark on an existing R2G, though. Many models, including mine, have the S&P 500 as a benchmark when we should have the Russell 2000. This is an oversight that would be good to correct eventually.

Especially so that out of sample alpha and beta can be standardized throughout R2G.

David,

I think there have been issues in the past with “last viewed” stats. And even if designers have viewed a model, it doesn’t mean they have done so to re-evaluate it. Rather, they might review their respective port (the one that drives the R2G) much more frequently and try to improve it that way. But in general I agree that more interaction between the designer and the community/subs might help to keep models cutting edge. In the end it will be like with most mutual funds: a trade-off between transparency for building trust and secrecy for keeping your “model ingredients” private. OOS data will sort the wheat from the chaff.

Aurélien,
what rules do you propose for checking the benchmark? Thankfully we can roam free to build our own custom universes. I think it would be hard to find the perfect match, and it costs a lot of time to examine each model. Maybe it would be easier to introduce a mandatory default benchmark, like the S&P 500. Fair enough, the choice will often not match the model, but at least it’s as objective as it gets.

Best,
fips

A few comments about re-running R2G Ports.

There are other reasons to pick a benchmark besides the classical comparison of a mutual fund or ETF to its similar index. Some of my Ports use the S&P 500 as the benchmark because it is used by the market timing rules. If I change the benchmark to reflect more closely the range of stocks the Port buys, it makes a BIG difference in the timing dates that I don’t want.

Also, some members want to compare Ports to the S&P 500 since it is viewed as the safer index for buy and hold comparisons.

Currently, an R2G designer can’t make any changes to a Port without “revising” it. That includes simply re-running it with no changes. If we do, it shows up as a revision. How do we allow re-running a Port while assuring subs that the designer didn’t “change” it?

Do we re-run the out-of-sample data also? Over time there will be data changes in that also. I see that opening a can of worms.

And how do we re-run the multiple revisions, starting at the proper revision dates, using the Sim that created each revision, while incorporating them properly with the original pre-launch data and later revisions? Did the designer even keep a copy of all the intermediate revision Sims? I see that as a bigger can of worms.

We must be careful what we ask for and/or require of the designers!

@Denny I used to use close(0,#bench) for volatility models but switched to close(0,getseries("$sp500")) for exactly that reason.

Marco, I think your idea of “periodically run sims for every r2g and report the stats in a separate tab called ‘latest sim’” is a great one. As a subscriber to P123, I would “expect” that to be done if changes are made in the background. I’m still perplexed as to why stats are locked and not updated with correct info to begin with. I don’t know how you address a change in recommended holdings if sims are re-run, but the subs have a right to know the accurate data. If I’m holding XYZ Corp and come to find out it didn’t meet the rules based on “fixed” data, it is not ethical to withhold that info from me.

As for waiting for out-of-sample data to sort things out…??? Folks subscribe to R2G’s trusting that the info they based their decision on is accurate. If it turns out not to be accurate, isn’t it a huge liability not to disclose and update the new findings? Waiting until they lose a lot of money only to find out there was a “fix” in the data that affected the sim is not ethical. We appreciate your forthrightness in fixing things as needed in the background. We just need peace of mind that the data put in front of us is updated and accurate AFTER those fixes.

I do not think that putting the ball in the sub’s court to “ask” the developer to re-run the sims is appropriate. Most subs do not even know there is reason to do so. If P123 makes changes that affect R2G’s, it should be P123’s responsibility to disclose correct performance stats for all models at the same time. This needs to be done ASAP.

Marco, I have trust that you will get new stats disclosed soon. Thank you in advance!

The title of this post is leading and misdirecting. The ports’ performances are NOT INCORRECT. Rather, they are out of sample. While it would be nice to compare the pro-forma simulation versus the out of sample, what was traded in the past needs to be written in stone.

I agree with Primus. From the beginning, R2G Ports were to be written in stone. I see an even bigger problem with rerunning the Sims than the ones I mentioned above. Many of them will inevitably end up holding different stocks. What will happen is that a data item fixed 10 or more years ago will result in buying a different stock, which in a few years will snowball into all-different stocks bought at different times, producing different performance. That happens ALL the time when I rerun Sims a year later.

We still don’t have the actual open prices for stocks from 1999 to 08/27/2004. Hopefully, Marco will update that some time in the future as he promised. What will that do to R2G Sim reruns?

Are those of you wanting to rerun the Sims willing to sell all current holdings, buy new stocks, and explain to your subs why you screwed up their portfolio? I don’t think there is enough reflection in this thread on all the problems that rerunning the Sims will cause. This WILL be a can of worms.

As I said, be careful of what you ask for!

It is very simple and Marco’s idea makes perfect sense. Since we have the launch date and therefore the separation between simulation and out of sample, we could:

- Re-run the Sim up until the launch date. At that point, all positions are sold, and the stats shown are only the out-of-sample ones, which should be written in stone.

This way there are no problems of changing holdings, etc., and the sim is more credible than it was before.
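The split described above can be sketched in Python: re-run the simulated leg up to the launch date against the current (fixed) database, then append the frozen out-of-sample record unchanged. This is only an illustration of the idea; the function name, dates, and rescaling convention are assumptions, not P123’s implementation.

```python
import pandas as pd

def stitch_equity(rerun_sim: pd.Series, frozen_oos: pd.Series,
                  launch_date: str) -> pd.Series:
    """Combine a freshly re-run simulation (pre-launch only) with the
    locked out-of-sample record. The sim leg is truncated at launch;
    the OOS leg is never recomputed, only rescaled for continuity."""
    sim_leg = rerun_sim.loc[:launch_date]    # re-runnable part
    oos_leg = frozen_oos.loc[launch_date:]   # written in stone
    # Rescale the frozen OOS leg so the curve is continuous at launch.
    oos_leg = oos_leg / oos_leg.iloc[0] * sim_leg.iloc[-1]
    # Drop the duplicated launch date before joining the two legs.
    return pd.concat([sim_leg.iloc[:-1], oos_leg])
```

The key property is that any data fix changes only `sim_leg`; the out-of-sample trades and their relative performance are untouched.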

This is exactly why only out-of-sample data should be presented.
Backtest performance is a BEST-case scenario. It significantly underestimates risks. To give a complete picture, we would need to present all the backtests that were ever run while developing a strategy. And even then there would be the risk of regime change, where what worked in the past no longer works in the future, which backtest performance does not represent either.
Out-of-sample performance is the only kind that accounts for ALL risks, period.
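The “best case” point can be demonstrated with a toy experiment (all numbers here are synthetic, not real P123 data): generate many strategies with zero true edge, let the designer pick the one with the best in-sample backtest, and its apparent edge largely disappears out of sample.

```python
import numpy as np

rng = np.random.default_rng(42)

# 500 "strategies" with no real edge: daily returns are pure noise.
n_strategies, n_days = 500, 504
returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

# First year is the "backtest", second year is out of sample.
in_sample, out_sample = returns[:, :252], returns[:, 252:]

# The designer publishes the strategy with the best backtest...
best = np.argmax(in_sample.mean(axis=1))

# ...whose in-sample mean return is inflated purely by selection,
# while its out-of-sample mean is typically back near zero.
best_is = in_sample.mean(axis=1)[best]
best_oos = out_sample.mean(axis=1)[best]
```

This is the selection bias the poster alludes to: the published backtest is the maximum over many trials, not an unbiased estimate.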

Out-of-sample performance can only start at the date after a series is corrected; before the fix date, OOS performance is not correct.

When “fixes” are made to historic data series, performance usually deteriorates, because the model’s algorithm was optimized for the data as it existed when the model was designed. For example, my Best(SPY-SH) model’s trading rules use volatility, risk premium, and earnings estimates together with moving-average crossovers. Earnings estimates for the S&P 500 were recently “fixed,” reducing CAGR for the model, and this may also influence future performance. Here are the risk measurements for the period 1/2/2000 to 8/30/2013 before and after the “fix.” The model’s documentation with performance history to 8/30/2013 can be found here: http://imarketsignals.com/wp-content/uploads/2013/09/spy-sh-r1-tbl3.png
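For anyone reproducing this kind of before/after comparison, the two headline risk measurements can be computed directly from an equity curve. These are the generic textbook formulas, not necessarily P123’s exact implementation, and the sample curve is made up:

```python
import numpy as np

def cagr(equity: np.ndarray, years: float) -> float:
    """Compound annual growth rate from first to last equity value."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline, as a negative fraction."""
    peaks = np.maximum.accumulate(equity)   # running high-water mark
    return (equity / peaks - 1.0).min()

# Toy equity curve over (say) five years, for illustration only.
curve = np.array([100.0, 120.0, 90.0, 150.0, 135.0])
```

Running both functions on the pre-fix and post-fix equity series makes the impact of a data fix directly comparable.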

The benchmark SPY return and max drawdown are the same. Standard Deviation for SPY should also be the same, but it is not. P123 must have recently changed the method by which Standard Deviation is calculated, but I don’t recall an announcement to that effect.
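A discrepancy like this can come from something as mundane as a change of convention. I don’t know what P123 actually changed, but as an illustration, the reported “Standard Deviation” depends on whether sample or population variance is used and on the frequency at which returns are sampled before annualizing (all data below is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
daily = rng.normal(0.0003, 0.012, 2520)   # ~10 years of fake daily returns

# Two common conventions that yield different "Standard Deviation":
sd_sample = np.std(daily, ddof=1) * np.sqrt(252)   # sample variance (n-1)
sd_pop    = np.std(daily, ddof=0) * np.sqrt(252)   # population variance (n)

# A third: compound into monthly returns, then annualize with sqrt(12).
monthly = (1 + daily).reshape(120, 21).prod(axis=1) - 1
sd_monthly = np.std(monthly, ddof=1) * np.sqrt(12)
```

Any of these switches would change the SPY figure without any change to the underlying price series.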


Best(SPY-SH) before fix.png


Best(SPY-SH) after fix.png

Maybe we need more color-coded flags on the equity curve, like the current “designer revision” flag, but for “database revision” and “factor revision,” at least for major revisions that heavily impact the equity curves.

I think it’s time for a redesign.

The current implementation is too restrictive because an R2G is one system: it’s a simulation that is converted to a live port. This makes rebuilding the simulated part impossible.

We’ll change the R2G’s to be two systems: a live port and a simulation. This way we can show separate stats, easily re-run just the simulated portion, present the data differently, etc.

We’ll have more details soon and get started on this now.

Thanks

Thank you very much Marco!

Marco - if you are going to do this, then please don’t allow simulation stats to be integrated with OOS as they are now, and don’t allow backtest data to be sorted as one would sort performance data. Otherwise you will have created a simulation paradise where you just put up a new, improved simulation whenever the system performs poorly. We would have “simulation showboating” in perpetuity. Not what you want.

I see nothing wrong with the way mutual funds and ETFs do it, where they create a prospectus with a theoretical index but the data is not integrated with OOS performance. The OOS has to be standalone.

Steve

While we are on this subject of restructuring rules for R2G models I would like to draw attention to the possibility that having a constant selection process for a universe may not be desirable. I was reading how MSCI constructs their USA Minimum Volatility Index. http://www.msci.com/resources/factsheets/MSCI_USA_Min_Vol_Factsheet.pdf

The MSCI Minimum Volatility Indices are designed to provide the lowest return variance for a given covariance matrix of stock returns. Each Minimum Volatility Index is calculated using the Barra Optimizer to optimize a given MSCI parent index for the lowest absolute volatility under a certain set of constraints. These constraints help maintain index replicability and investability and include index turnover limits, for example, along with minimum and maximum constituent, sector, and/or country weights relative to the parent index. Each Minimum Volatility Index is rebalanced (or re-optimized) semi-annually, in May and November.

That means the parameters for the index are changed twice a year. Hence, backtesting over a 15-year period would imply changing the selection process for the universe 30 times. I have been running a Best12(USMV)-Trader model live on my website iMarketSignals.com since the beginning of July, using the stock holdings of USMV as the universe and updating the universe every 3 months. So far with great success: the model is up about 22%, whereas SPY only gained 6.7% over this period. I would love to offer this model as an R2G, but it would not qualify under the existing rules. Perhaps P123 could provide a section for specialty R2G models such as this one, where the backtest period is short but the methodology is spelled out for subscribers.
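For reference, the core of the optimization the MSCI methodology describes, stripped of the turnover, sector, and weight constraints, is a minimum-variance problem over the covariance matrix. A minimal unconstrained sketch (this is not the Barra model; the closed-form solution below only enforces that weights sum to one):

```python
import numpy as np

def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Unconstrained minimum-variance weights in closed form:
    w = inv(C) @ 1 / (1' @ inv(C) @ 1). Real index construction
    layers turnover, sector, and per-name weight constraints on top."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)   # avoids an explicit matrix inverse
    return w / w.sum()

# Toy annualized covariance for three assets (made-up numbers).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = min_variance_weights(cov)
```

By construction the resulting portfolio variance `w @ cov @ w` can be no higher than that of any other fully invested weighting, such as equal weights.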


Geov brings up a point worth reflecting on.

Does R2G aim to compete with Collective2?

Because there are strategies that we can’t run on P123: some data is not available, or some is too recent. Market timing using options data, for example.
There is a market for these; it might as well happen here. But not as long as R2G is marketed as backtested P123 strategies only.

Could an R2G model ever be made of a simple live portfolio where all trades are entered manually by the author based on his own strategies, manual or automated?

We have “foundation, investor, trader, ETF” sections. Why not an “unregulated” section, made up of live portfolios with manual inputs? Who knows, half the people from Collective2 might flock in. That is, if you stick to the 80-20.

I think an unregulated section would be a good idea. People could then watch the performance for several months before subscribing. Also, P123 could charge a fee upfront for unregulated R2Gs, instead of the 20% cut. Having to pay a launch fee would ensure that only designers who are confident in their models would submit them. Currently one can put anything up in the hope of someone subscribing, because it costs nothing.