# Back-testing Stock Selection - Data Analysis Help Needed

In summary, it is important to have a clear understanding of the probability model underlying your data when working with statistics. This helps you choose appropriate methods and interpret the results accurately. However, if the primary goal is to write an academic document, it may be necessary to follow the traditions and guidelines favoured by the reviewers. Ultimately, the goal is to test statistically whether the stock selection method is effective, which can be done by formulating a null hypothesis and conducting a hypothesis test. It is also important to examine the underlying assumptions, such as the independence of successive observations in time series data.
unleash
Hey everyone!

I'm a finance grad and am doing my first big project back-testing some stock selection methods.

I have spent the last few weeks writing a big vba program to run the back-test and I have the following:

10 dates (5 years, semi-annual) and 40 companies where, for a given date, if data is available then I have
a) the stock price on that date
b) a valuation
from which I then calculate % difference to determine whether I value the stock at more or less than it's trading.

On each of these dates, I have a set of companies (fewer - approx 10 for the earlier dates since not all companies had sufficient data for a back-test that far back and a full 40 for the latest few dates) and I have for each company a 'spread' which is used to indicate whether to buy or sell the stock.

I have tried a non-parametric method of testing whether the stock-selection method works by ranking the stocks on each date by spread and creating an equally weighted portfolio of the top quartile and similarly for the lower quartile and then check the return over the next 6 months.
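The ranking step described above can be sketched in Python as follows. This is only an illustration under assumptions: the function name `quartile_returns`, the `(spread, return)` pair layout and the sample numbers are all hypothetical and are not taken from the VBA program.

```python
from statistics import mean

def quartile_returns(stocks):
    """stocks: list of (spread, next_6m_return) pairs for one date.

    Returns the equal-weighted return of the top-spread quartile and of
    the bottom-spread quartile.
    """
    # Highest spread first, so the "buy" candidates come at the front.
    ranked = sorted(stocks, key=lambda s: s[0], reverse=True)
    # Quartile size; at least one stock so small early samples still work.
    k = max(1, len(ranked) // 4)
    top = mean(r for _, r in ranked[:k])
    bottom = mean(r for _, r in ranked[-k:])
    return top, bottom

# Hypothetical data for one date: (spread, return over the following 6 months)
sample = [(0.30, 0.12), (0.10, 0.05), (-0.05, 0.01), (-0.20, -0.04),
          (0.25, 0.08), (0.00, 0.02), (-0.15, -0.03), (0.15, 0.06)]
top, bottom = quartile_returns(sample)
print(f"top quartile: {top:.3f}, bottom quartile: {bottom:.3f}")
```

Running this once per date gives one (top, bottom) pair per sub-period, even though the number of companies differs from date to date.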

The results are as hoped with the quartile with the highest spread (valuation suggests that they're a 'buy') yielding the highest return over the following period and conversely, the lower quartile significantly under-performs relative to the top quartile and relative to an equally weighted holding of all stocks tested for that given sub-period.

I would now like to statistically test this relationship. A t-test comes to mind, but I'm unsure whether I should just take the top quartile versus the lower quartile for each sub-period and do 10 t-tests (similarly for buy vs the equally weighted sample portfolio) ... or whether I should somehow do a test over the entire set of 10 dates (given that the number of companies on each date is different and so each portfolio is different).

Also, any other suggestions of nonparametric or other statistical methods to draw some juiciness out of the data will be much appreciated! :)

Regards
a

In my opinion, you can't think clearly about problems involving statistics unless you have a probability model for the phenomenon you are studying. If you do the traditional type of statistics without formulating such a model, the methods you use are actually assuming a particular probability model, so you haven't escaped the requirement - you've only succeeded in pushing it into the background or remaining ignorant of it.

If the primary goal is to do a conceptually clear analysis, you should formulate a probability model for how the data is generated.

On the other hand, if the primary goal is to write an acceptable academic document, then focus your attention on the people who are going to approve it. The use of statistics is subjective and different people may have very strong opinions on statistical methods. Certain methods are established traditions in certain fields of study. The simplest course of action is to ask the people who are going to review the document for suggestions and to follow those suggestions.

Your question of how to extract information from the data is a natural one, but it puts the people who answer in the position of doing a mind-reading exercise as to the goals of the analysis.

My mind-reading attempt is this: There is some sort of "utility" function that defines how well your valuation method works. I can't follow exactly how you compute it, but I'll imagine it as something like this: We assume that at time t = n, investor A invests, say, $10,000 in stocks that your method recommends, dividing his money equally among those stocks. At the same time, investor B invests $10,000 in the same manner in stocks your method recommends against. We assume that at time t = n + 1, both investors sell their stocks. The utility of your method from time t = n to t = n + 1 is (profit of investor A - profit of investor B).
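A minimal sketch of that utility for a single period, assuming each investor's holdings are given as a list of fractional returns from t = n to t = n + 1. The function name `period_utility` and the sample returns are hypothetical.

```python
def period_utility(buy_returns, sell_returns, capital=10_000.0):
    """Profit of investor A minus profit of investor B over one period.

    buy_returns:  returns of the stocks the method recommends (investor A)
    sell_returns: returns of the stocks it recommends against (investor B)
    Each investor splits `capital` equally among their stocks.
    """
    profit_a = sum(capital / len(buy_returns) * r for r in buy_returns)
    profit_b = sum(capital / len(sell_returns) * r for r in sell_returns)
    return profit_a - profit_b

# Hypothetical returns for one period
u = period_utility([0.12, 0.08], [-0.03, -0.04])
print(f"utility for this period: {u:.2f}")
```

Repeating this for each of the 10 dates would give a series of 10 utilities, one per step.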

Let's assume your goal is to use the traditional (non-Bayesian) sort of statistics to "prove" that your method "works". The simplest formulation of this is a "hypothesis test". For this, we need a "null hypothesis", which will express the general idea "There is no difference in the performance of investor A and investor B". However, the null hypothesis must say more than this. It must say enough to let us compute the probability of observing some statistic (like the t-statistic).

For example, if you assume that each "step" in the observed data from time t = n to time t = n + 1 is an independent draw from a random variable representing the distribution of utility, then you can compute the probability of various statistics.
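As an illustration of what that assumption buys you, here is a sketch that computes a one-sample t-statistic on the per-period utilities and, as a swapped-in nonparametric alternative, an exact sign-flip permutation p-value. Both rest on the independence assumption; the sign-flip test additionally assumes the utility distribution is symmetric about zero under the null hypothesis. The function names and the utility values are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

def t_statistic(utilities):
    """One-sample t-statistic for the null hypothesis 'mean utility is 0'."""
    n = len(utilities)
    return mean(utilities) / (stdev(utilities) / sqrt(n))

def sign_flip_p_value(utilities):
    """One-sided exact p-value from all 2**n sign assignments.

    Under the null, each utility is equally likely to be +u or -u, so we
    count how often a sign-flipped mean is at least the observed mean.
    """
    observed = mean(utilities)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(utilities)):
        total += 1
        flipped = mean(s * u for s, u in zip(signs, utilities))
        if flipped >= observed - 1e-12:  # small tolerance for rounding
            count += 1
    return count / total

# Hypothetical utilities for 10 semi-annual periods
utilities = [1350.0, 420.0, -310.0, 150.0, 980.0,
             300.0, 720.0, -90.0, 240.0, 860.0]
print("t =", t_statistic(utilities))
print("sign-flip p =", sign_flip_p_value(utilities))
```

With only 10 periods there are 1,024 sign assignments, so the enumeration is cheap. The t-statistic would still need to be compared against a t distribution with 9 degrees of freedom to get a p-value.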

Since you are dealing with time series data, I'm dubious of such a simplified assumption. It seems to me that what happens in successive steps of the process aren't independent events if you believe in "trends" in the stock market. Again, I emphasize that if you know the people who will review your document, then you should try to divine their opinions on such matters. If you can't speak to them directly, then try to look at work that they have written and see what they did.
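One rough way to look at the independence question is the lag-1 sample autocorrelation of the utility series, sketched below. The function name and the data are hypothetical, and with only 10 observations the estimate is very noisy, so treat it as a sanity check and not as a test.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation: covariance of neighbouring steps
    divided by the overall sum of squared deviations."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den

# Hypothetical utilities for 10 semi-annual periods
utilities = [1350.0, 420.0, -310.0, 150.0, 980.0,
             300.0, 720.0, -90.0, 240.0, 860.0]
print("lag-1 autocorrelation:", lag1_autocorrelation(utilities))
```

A value far from zero would be a hint that successive steps move together (or against each other), which is the situation where the independent-draw model is most questionable.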

## 1. What is back-testing and why is it important for stock selection?

Back-testing is a process of testing a trading strategy on historical market data to evaluate its performance. It is important for stock selection because it allows investors to assess the effectiveness of their trading strategies and make informed decisions based on past results.

## 2. How do I conduct a back-test for stock selection?

To conduct a back-test for stock selection, you need to first define your trading strategy, gather historical market data, and use a back-testing software or platform to run the test. The software will generate a report of the performance of your strategy on the historical data, which you can use to refine and improve your strategy.

## 3. What are the limitations of back-testing for stock selection?

Back-testing has some limitations. It is based on historical data and may not reflect future market conditions. A simple back-test also often leaves out transaction costs, slippage, and other factors that affect the performance of a trading strategy in live trading, unless they are modelled explicitly.

## 4. How can I use back-testing to optimize my stock selection strategy?

Back-testing can be used to optimize a stock selection strategy by allowing you to test and refine different variations of your strategy on historical data. By analyzing the results, you can identify the most effective strategy and make adjustments to improve its performance.

## 5. Are there any alternative methods to back-testing for stock selection?

Yes, there are alternative methods to back-testing for stock selection, such as forward-testing and paper trading. Forward-testing involves testing a trading strategy on real-time data in a simulated environment, while paper trading involves executing trades on paper without using real money. These methods can provide more realistic results compared to back-testing, but they also have their own limitations.
