# Back-testing Stock Selection - Data Analysis Help Needed

In my opinion, you can't think clearly about problems involving statistics unless you have a probability model for the phenomenon you are studying. If you do the traditional type of statistics without formulating such a model, the methods you use are actually assuming a particular probability model, so you haven't escaped the requirement. You've only succeeded in pushing it into the background or remaining ignorant of it.

If the primary goal is to do a conceptually clear analysis, you should formulate a probability model for how the data is generated. On the other hand, if the primary goal is to write an acceptable academic document, then focus your attention on the people who are going to approve it. The use of statistics is subjective, and different people may have very strong opinions on statistical methods. Certain methods are established traditions in certain fields of study. The simplest course of action is to ask the people who are going to review the document for suggestions and to follow those suggestions.

Your question of how to extract information from the data is a natural question, but it puts people who answer in the position of doing a mind-reading exercise as to the goals of the analysis. My mind-reading attempt is this: There is some sort of "utility" function that defines how well your valuation method works. I can't follow exactly how you compute it, but I'll imagine it as something like this: We assume that at time t = n, investor A invests, say, $10,000 in stocks that your method recommends, dividing his money equally among those stocks. At the same time, investor B invests $10,000 in the same manner in stocks your method recommends against. We assume that at time t = n + 1, both investors sell their stocks. The utility of your method from time t = n to t = n + 1 is (profit of investor A - profit of investor B).

Let's assume your goal is to use the traditional (non-Bayesian) sort of statistics to "prove" that your method "works".
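To make the imagined utility concrete, here is a minimal Python sketch of it for a single period. It is only my guess at the calculation, not your actual method: the function name `period_utility`, the representation of each stock as a simple return over the period, and the sample numbers are all assumptions made for illustration.

```python
def period_utility(recommended, rejected, stake=10_000.0):
    """Utility for one period: profit of investor A minus profit of investor B.

    `recommended` and `rejected` are lists of simple returns from t = n to
    t = n + 1 (0.05 means +5%), one entry per stock. Each investor splits
    `stake` equally among the stocks in his list. (Hypothetical helper.)
    """
    if not recommended or not rejected:
        raise ValueError("each group needs at least one stock")
    # With equal weights, the profit is the stake times the average return.
    profit_a = stake * sum(recommended) / len(recommended)
    profit_b = stake * sum(rejected) / len(rejected)
    return profit_a - profit_b


# Made-up returns for one period, purely for illustration.
example_utility = period_utility([0.10, 0.00], [0.02, 0.04])
```

With these made-up returns, investor A makes about $500 and investor B about $300, so the utility for the period is about $200. Repeating the calculation for every period in the back-test gives one utility value per step.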
The simplest formulation of such a "proof" is a hypothesis test. For this, we need a "null hypothesis", which will express the general idea "There is no difference in the performance of investor A and investor B". However, the null hypothesis must say more than this. It must say enough to let us compute the probability of observing some statistic (like the t-statistic). For example, if you assume that each "step" in the observed data from time t = n to time t = n + 1 is an independent draw from a random variable representing the distribution of utility, then you can compute the probability of various statistics.

Since you are dealing with time series data, I'm dubious of such a simplified assumption. It seems to me that what happens in successive steps of the process can't be independent events if you believe in "trends" in the stock market. Again, I emphasize that if you know the people who will review your document, then you should try to divine their opinions on such matters. If you can't speak to them directly, then try to look at work that they have written and see what they did.
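As a rough illustration of the simplified test described above, the sketch below computes a one-sample t-statistic for a list of per-period utilities under the null hypothesis that the mean utility is zero. It also computes the lag-1 autocorrelation of the same series, which is one crude way to see whether the independence assumption looks doubtful. The function names and the sample data are my own assumptions, and the t-statistic is only meaningful to the extent that the independent-draws model is believable.

```python
import math
import statistics


def t_statistic(utilities, null_mean=0.0):
    """One-sample t-statistic, treating the utilities as independent draws."""
    n = len(utilities)
    if n < 2:
        raise ValueError("need at least two periods")
    mean = statistics.fmean(utilities)
    sd = statistics.stdev(utilities)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("utilities have zero variance")
    return (mean - null_mean) / (sd / math.sqrt(n))


def lag1_autocorrelation(utilities):
    """Sample correlation between each period's utility and the next one's."""
    mean = statistics.fmean(utilities)
    deviations = [u - mean for u in utilities]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("utilities have zero variance")
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return numerator / denominator


# Made-up per-period utilities in dollars, purely for illustration.
sample = [120.0, -40.0, 75.0, 10.0, -15.0, 60.0]
t_value = t_statistic(sample)
rho_1 = lag1_autocorrelation(sample)
```

Under the independence assumption, `t_value` would be compared with a t distribution with n - 1 degrees of freedom. A lag-1 autocorrelation far from zero suggests that successive steps are not independent, in which case the t-statistic does not have that distribution.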