There is survivorship bias.
No, there isn’t. A “bias,” here, is the tendency of a given statistic measured on a “random” sample to differ systematically from that statistic measured on the population. Suppose the question were “How well did professional fund managers perform over the last 10 years?” The population is “All of the professional fund managers who managed funds at any time during the last 10 years”. However, the readily available data on manager performance only includes managers/funds that are still around today. The funds/managers that had a losing year sometime in the last decade were fired and replaced. So a sample of this readily available data will have a “survivorship bias”: statistics measured on such samples will not converge on the statistics of the true population, no matter how many samples you take or how large they are. The performance of the other systems you have built and discarded is not part of the population whose average you are estimating, thus no survivorship bias.
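If it helps to see the fund manager example in numbers, here is a rough Python sketch. Everything in it is an assumption made up for illustration (20,000 managers, normally distributed annual returns with a 5% mean and 15% SD, and the rule that a single losing year gets a manager dropped from the data); it is not real fund data.

```python
import random
from statistics import mean

def simulate_survivorship(n_managers=20_000, n_years=10,
                          mu=0.05, sigma=0.15, seed=1):
    """Return (population mean, survivors-only mean, survivor count).

    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    all_returns = []       # every manager-year in the true population
    survivor_returns = []  # manager-years of managers with no losing year
    n_survivors = 0
    for _ in range(n_managers):
        years = [rng.gauss(mu, sigma) for _ in range(n_years)]
        all_returns.extend(years)
        if min(years) >= 0:  # a manager with any losing year is "fired"
            survivor_returns.extend(years)
            n_survivors += 1
    return mean(all_returns), mean(survivor_returns), n_survivors

pop_mean, surv_mean, n_surv = simulate_survivorship()
print(f"whole population, mean annual return: {pop_mean:.3f}")
print(f"survivors only, mean annual return:   {surv_mean:.3f} "
      f"({n_surv} survivors)")
```

The survivors-only average comes out well above the population average, and adding more managers does not close the gap. That is what makes it a bias rather than noise.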
Most statistical estimates have biases. For example, suppose I have 1000 people who are normally distributed for height. Those 1000 people are the population I’m interested in. I can measure the standard deviation of the whole population using the equation SQRT( SUM_FOR_ALL_Y( (y - ybar)^2 ) / NUM_Y). Or I could estimate the population’s SD by taking a random sample of 10 people from the population and measuring the standard deviation of their heights. However, I have to modify my equation for standard deviation, because otherwise it will have a bias: dividing by NUM_Y on a sample will tend to underestimate the population’s standard deviation. So I use SQRT( SUM_FOR_ALL_Y( (y - ybar)^2 ) / (NUM_Y-1)) or SQRT( SUM_FOR_ALL_Y( (y - ybar)^2 ) / (NUM_Y-1.5)), depending upon the application. Excel provides STDEV (divides by NUM_Y-1) for a sample and STDEVP (divides by NUM_Y) for a whole population.
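A quick way to convince yourself of the divisor issue is to simulate it. The sketch below assumes a made-up population of 1000 heights (mean 170 cm, SD 10 cm), draws many random samples of 10, and averages the SD estimate for each of the three divisors. The function names are mine, chosen for illustration.

```python
import math
import random
from statistics import pstdev

def sum_sq_dev(sample):
    """SUM_FOR_ALL_Y( (y - ybar)^2 ) for one sample."""
    ybar = sum(sample) / len(sample)
    return sum((y - ybar) ** 2 for y in sample)

def average_estimates(n_trials=20_000, sample_size=10, seed=7):
    rng = random.Random(seed)
    # Hypothetical population: 1000 heights in cm, roughly normal.
    population = [rng.gauss(170.0, 10.0) for _ in range(1000)]
    true_sd = pstdev(population)  # divisor NUM_Y, like Excel's STDEVP
    # Keys are the correction c in the divisor (NUM_Y - c).
    totals = {0.0: 0.0, 1.0: 0.0, 1.5: 0.0}
    for _ in range(n_trials):
        sample = rng.sample(population, sample_size)
        ss = sum_sq_dev(sample)
        for c in totals:
            totals[c] += math.sqrt(ss / (sample_size - c))
    return true_sd, {c: t / n_trials for c, t in totals.items()}

true_sd, avg = average_estimates()
print(f"population SD:        {true_sd:.3f}")
for c, value in avg.items():
    print(f"divisor NUM_Y - {c}: {value:.3f}")
```

With samples this small, the plain NUM_Y divisor runs clearly low. NUM_Y-1 is unbiased for the variance, but its square root still runs a little low, which is why NUM_Y-1.5 is sometimes used for normally distributed data.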
I continue to think what is meant by “population.” Notice I did not really define it and I have yet to see it well defined in any books or on the net.
Continuing the thought, yes, defining “population” has to be the starting point, and it is where your analysis got off track. The math that you are currently doing assumes that consecutive weeks are independent, but they are not. They are dependent in two ways: 1) the performance of a week correlates to some extent with the performance of the prior week (the relative success of any price-based technical indicator tells you this); 2) the performance of the portfolio over those weeks is the product of its weekly growth factors (1 + weekly return), not the sum of its weekly returns. So asking about the “average” weekly return gives a meaningless answer. You could reasonably ask about the geometric average (which is related to what p123 calls Annualized Return), but that still doesn’t get around the interdependence of consecutive weeks.
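To make the sum-versus-product point concrete, here is a small Python sketch with two made-up weeks (up 50%, then down 50%). The helper names are just for illustration.

```python
import math

def arithmetic_mean(weekly_returns):
    """Plain average of the weekly returns."""
    return sum(weekly_returns) / len(weekly_returns)

def total_return(weekly_returns):
    """Compound the weeks: multiply the growth factors (1 + r)."""
    return math.prod(1.0 + r for r in weekly_returns) - 1.0

def geometric_mean_return(weekly_returns):
    """Constant weekly return that would give the same total return."""
    growth = math.prod(1.0 + r for r in weekly_returns)
    return growth ** (1.0 / len(weekly_returns)) - 1.0

weeks = [0.50, -0.50]  # hypothetical: +50% one week, -50% the next
print(f"arithmetic mean: {arithmetic_mean(weeks):+.4f}")
print(f"total return:    {total_return(weeks):+.4f}")
print(f"geometric mean:  {geometric_mean_return(weeks):+.4f}")
```

The arithmetic average says the portfolio broke even, while the account is actually down 25% (1.5 x 0.5 = 0.75). The geometric average reflects that loss, but it says nothing about the week-to-week correlation.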
Perhaps more helpfully, you could ask about your portfolio’s performance on a given week versus a random portfolio (of equal size) on that week. So the population is “All Possible 5 Stock portfolios on Week N”. The null hypothesis is “My 5 stock portfolio on Week N was chosen at random”. The universe I use is basically the top 3000 or so stocks according to liquidity. On a given week, there are [3000 choose 5] possible 5 stock portfolios (Google says that’s about 2,000,000,000,000,000). This population is far from normally distributed. I know, because I have measured, that for individual stocks in this universe a little over 50% of the stocks lose money on any given week. 60% do worse than the average for that week. 20% do very much worse and 20% do very much better. Taking random samples of 5 stocks from this universe narrows the range of possibilities, but the distribution remains very similar. Asking “How much better than random am I” can’t be easily answered even with all of that data. There are weeks when my 5 stock port is in the top 10% of possible 5 stock portfolios (it did better than 90+% of the possible combinations of 5 stocks that could have been held that week). There are weeks when my 5 stock port is in the bottom 10%. It averages in the top 35%. Does that tell me how much it will make next week? No.
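For anyone who wants to reproduce this kind of weekly ranking, a minimal Python sketch follows. It counts the possible portfolios exactly and then estimates where a given portfolio falls by comparing it against randomly drawn equal-weight portfolios, since enumerating all of them is not practical. The function names and the toy universe of returns are assumptions for illustration; real weekly returns for your universe would have to come from your own data.

```python
import math
import random

def count_portfolios(universe_size=3000, port_size=5):
    """Exact value of [universe_size choose port_size]."""
    return math.comb(universe_size, port_size)

def fraction_beaten(week_returns, my_picks, n_random=2000, seed=3):
    """Fraction of random equal-weight portfolios that my picks beat.

    week_returns: one-week returns, one entry per stock in the universe.
    my_picks: indices into week_returns for the stocks I held.
    """
    rng = random.Random(seed)
    k = len(my_picks)
    mine = sum(week_returns[i] for i in my_picks) / k
    beaten = 0
    for _ in range(n_random):
        picks = rng.sample(range(len(week_returns)), k)
        if sum(week_returns[i] for i in picks) / k < mine:
            beaten += 1
    return beaten / n_random

print(f"{count_portfolios():,} possible 5 stock portfolios")

# Toy universe: 3000 made-up weekly returns, NOT real market data.
toy_rng = random.Random(11)
toy_week = [toy_rng.gauss(0.0, 0.05) for _ in range(3000)]
print(fraction_beaten(toy_week, my_picks=[0, 1, 2, 3, 4]))
```

A result of 0.90 would mean the portfolio did better than about 90% of the random portfolios sampled for that week. Repeating this for each week gives the kind of week-by-week ranking described above.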
There is yet another way to try to answer the question “How much better than random am I”. Instead of looking at a single week or averages of weeks, I can look at performance over the whole time period. So I can say my 5 stock port made X1% from week 1 to week N, and here are the total returns of 200 random (sell rule true, ranking system of “Random”, port size 5) portfolios over that same N week period. You’ll notice two big things about the random returns: 1) they are much closer to normally distributed than the individual weeks (though they are still fat tailed); 2) the average of the total returns is much lower than the universe’s average total return over that time period. In any case, at this point you can start applying some statistical tests to answer the question “How likely is it that a 5 stock portfolio of randomly selected stocks would perform as well as my 5 stock port?” Maybe you calculate the mean and standard deviation of the random ports, measure how many standard deviations out your port’s performance is, and then use a cumulative distribution function to calculate the odds. Maybe you use that same process but with a log-normal distribution, or with formulas that let you customize a distribution’s skew and kurtosis. Say you get a number you’re happy with. Say it’s 9 standard deviations. Under a normal distribution, the odds at just 4.25 SDs are already about 1 in 100,000. What do you know now? Does this tell you what your return will be next week? Does it tell you how “robust” your system is? It is not particularly difficult to beat a portfolio of 5 random stocks over any reasonable period of time. On my universe, a random 500 stock portfolio will beat a random 5 stock portfolio almost 60% of the time in terms of total return over 15 years.
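The “how many standard deviations out” calculation can be sketched in a few lines of Python. This version assumes the random portfolios’ total returns are normally distributed, which, as noted, is only approximately true because of the fat tails. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def sds_from_random(my_total_return, random_total_returns):
    """How many sample SDs my total return sits above the random mean."""
    mu = mean(random_total_returns)
    sd = stdev(random_total_returns)  # divisor n-1: these are a sample
    return (my_total_return - mu) / sd

def one_sided_odds(z):
    """Chance a standard normal value lands at or above z.

    Uses erfc so that very small tail probabilities (e.g. z = 9)
    do not round to zero.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Made-up total returns (in %) of a few random portfolios.
random_ports = [12.0, 35.0, -8.0, 60.0, 21.0, 44.0, 5.0, 30.0]
z = sds_from_random(150.0, random_ports)
print(f"z = {z:.2f}, odds of 1 in {1.0 / one_sided_odds(z):,.0f}")
print(f"at 4.25 SDs: 1 in {1.0 / one_sided_odds(4.25):,.0f}")
```

Swapping in a log-normal fit would mean taking the log of (1 + total return) before computing the mean and SD. Either way, the output is only as trustworthy as the distributional assumption behind it.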
I did not do it in Excel; however, all of the data crunching I did could have been done using p123 and Excel. The tedious work is getting the data for your universe’s performance each week for the period you’re interested in. Create a ranking system with “Random” as the only factor. Create a single stock sim with that as the ranking system, and a sell rule of true. Run the sim a few hundred times and download the weekly performance data from each run. Crunch data until you arrive at a lie you like.