# Proving that treatment is statistically effective - paired t, P value, CI?

1. Apr 21, 2012

### MBCT

Hi,

I'm really struggling with a statistical problem:

Five people had a medical treatment. They were tested before and after using a questionnaire that recorded numerical scores. I want to analyse the before and after results to see whether there was any change and whether it is statistically significant, and then write this up correctly.

I don't have SPSS, so I have been using the Data Analysis functions of Microsoft Excel as described here: http://www.stattutorials.com/EXCEL/EXCEL_TTEST2.html

So (before score 1-5, after score 1-5)
Person 1 - 2.75, 3.67
Person 2 - 1.75, 3.25
Person 3 - 3.75, 3.50
Person 4 - 1.75, 2.50
Person 5 - 2.75, 4.00

So first I did a t-Test Paired Two Sample for Means which gave me the following:
Mean 2.55, 3.384
Variance 0.7, 0.31853
Observations 5
Pearson Correlation 0.600384
Hypothesized Mean Difference 0
df 4
t Stat -2.77529
P(T<=t) one-tail 0.02503
t Critical one-tail 2.131847
P(T<=t) two-tail 0.050059
t Critical two-tail 2.776445
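
The Excel figures above can be cross-checked by hand. The following is a minimal sketch using only Python's standard library, not the Excel procedure itself. It treats the paired t-test as a one-sample t-test on the per-person differences. Two assumptions: Person 1's "after" score is taken as 3.67, the value implied by the reported mean of 3.384 and variance of 0.31853, and the critical value is copied from the Excel output rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from the post. Person 1's "after" value is taken as 3.67 (assumption:
# the value implied by the reported mean 3.384 and variance 0.31853).
before = [2.75, 1.75, 3.75, 1.75, 2.75]
after = [3.67, 3.25, 3.50, 2.50, 4.00]

# A paired t-test is a one-sample t-test on the per-person differences.
# Excel subtracts in the order (before - after), hence its negative t Stat.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
df = n - 1                              # degrees of freedom
std_error = stdev(diffs) / sqrt(n)      # standard error of the mean difference
t_stat = mean(diffs) / std_error

# Two-tail critical value for df = 4 at alpha = 0.05, copied from the Excel output.
T_CRIT_TWO_TAIL = 2.776445

print(f"df = {df}, t = {t_stat:.5f}")
print("two-tailed significant at 0.05:", abs(t_stat) > T_CRIT_TWO_TAIL)
```

The size of the t statistic is just below the two-tail critical value, which is why the two-tail P value lands just above 0.05.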

So the important information seems to be that the one-tail P value is 0.025, which is less than 0.05, so I could reject the null hypothesis and say that the change is unlikely to be due to chance and that the treatment effect is statistically significant.

But, I only have a sample of 5 and therefore can't really say that can I? What sample size would I need to reject the null hypothesis and how can I calculate this?

Also, as tempting as it is to use one-tail, the treatment might actually be ineffective or make things worse, so maybe I should use two-tail. In that case, is 0.050059 far enough above 0.05 to make rejecting the null hypothesis impossible, even if the sample size were much higher?

I then worked out the difference between the two sets of data by doing "after minus before" treatment giving me:

Person 1 - 0.92
Person 2 - 1.50
Person 3 - -0.25
Person 4 - 0.75
Person 5 - 1.25

Using Excel's Descriptive Statistics gave me this information:

Mean 0.834
Standard Error 0.30051
Median 0.92
Mode #NA
Standard Deviation 0.67196
Sample Variance 0.45153
Kurtosis 1.856022
Skewness -1.24463
Range 1.75
Minimum -0.25
Maximum 1.5
Sum 4.17
Count 5
Confidence Level (95%) 0.834348
Confidence Interval -0.00035, 1.668348
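
The "Confidence Level (95%)" figure in this output is the half-width of the 95% confidence interval for the mean difference, not a probability. A minimal sketch of the arithmetic, assuming the two-tail critical t value of 2.776445 (df = 4) from the Excel t-test output:

```python
from math import sqrt
from statistics import mean, stdev

diffs = [0.92, 1.50, -0.25, 0.75, 1.25]   # after minus before, from the post
T_CRIT = 2.776445   # two-tail critical t for df = 4, taken from the Excel output

n = len(diffs)
mean_diff = mean(diffs)
std_error = stdev(diffs) / sqrt(n)        # Excel's "Standard Error"
half_width = T_CRIT * std_error           # Excel's "Confidence Level (95%)"

# The interval is the mean difference plus or minus the half-width.
ci_low = mean_diff - half_width
ci_high = mean_diff + half_width

print(f"mean = {mean_diff:.3f}, SE = {std_error:.5f}, half-width = {half_width:.6f}")
print(f"95% CI = ({ci_low:.5f}, {ci_high:.6f})")
```

The lower limit falls a hair below zero, so the interval only just includes "no change".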

Now, as far as I can tell all I really need to know is that the Mean change is +0.834, the Standard Deviation is 0.67 and the Confidence Level is 0.834.

But I don't really understand what this tells me about the treatment. Was it effective? Was its effectiveness only held back by the small sample size?

As an added complication, I also have some data where I would expect the scores to get worse, where the treatment is the same but what I'm looking at is something like tumour size where I expect the treatment to reduce it. When I am looking at the difference between the before and after scores, do I do "after minus before" the same as above? If so then I would be happier with minus differences rather than plus... or is it just a case of how you then describe the data?
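
On the direction of subtraction, a small sketch may help. It reuses the questionnaire differences as stand-in data, since no tumour measurements are given, and `paired_t` is a hypothetical helper name. Reversing the order of subtraction flips the sign of the mean difference and of the t statistic but leaves its size unchanged, so the choice mainly affects how the result is described.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(diffs):
    """t statistic for a paired test, computed from per-person differences."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Stand-in data: the questionnaire differences from the post.
after_minus_before = [0.92, 1.50, -0.25, 0.75, 1.25]
before_minus_after = [-d for d in after_minus_before]

t_up = paired_t(after_minus_before)     # positive: scores went up on average
t_down = paired_t(before_minus_after)   # same size, opposite sign

print(f"after - before: t = {t_up:.5f}")
print(f"before - after: t = {t_down:.5f}")
```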

I hope that this makes some sense, and that someone can help me in a jargon-free way.

Thanks.
MBCT

2. Apr 22, 2012

### SW VandeCarr

Significance levels are just conventions, and refereed journals often set their own rules. The "usual" two-sided $P\leq 0.05$ is often set aside for stricter levels. Even a "significant" t test for only 5 subjects would probably not get published anywhere, except maybe as a letter in Lancet if the subject was of broad interest. Your example falls just short of "significance" for the two-sided test. The 95% confidence interval for the difference in the means includes 0, which corresponds to your p value just over 0.05 for the two-sided test.
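
To make the link between the t statistic and the two P values concrete, here is a rough sketch. It assumes the closed-form CDF of Student's t distribution that holds for exactly 4 degrees of freedom, so the hypothetical helper `t_cdf_df4` is not valid for any other sample size. The t value is the one from the Excel output in the question.

```python
from math import sqrt

def t_cdf_df4(t: float) -> float:
    """CDF of Student's t distribution for exactly 4 degrees of freedom.

    Assumption: this closed form is specific to df = 4; other df need a
    general routine (for example from a statistics library).
    """
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / sqrt(u)) * (1.0 - (t * t / u) / 12.0)

t_stat = 2.77529   # size of the t statistic from the Excel output

one_tail_p = 1.0 - t_cdf_df4(t_stat)   # area in one tail beyond |t|
two_tail_p = 2.0 * one_tail_p          # both tails, by symmetry

print(f"one-tail P = {one_tail_p:.5f}")
print(f"two-tail P = {two_tail_p:.6f}")
print("two-tail P <= 0.05:", two_tail_p <= 0.05)
```

The two-tail value is simply double the one-tail value, and it sits just above 0.05 for the same reason the 95% confidence interval just includes 0.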

Your data also shows that one patient responded minimally in the wrong direction to the intervention. Reviewers would look at this as a negative as much as they would the p value.

If you have specific questions about the outputs, could you list them?

EDIT: You don't "prove" a treatment is effective except in individual cases. For populations, you only meet some artificial criterion that is accepted as a sufficient demonstration of effectiveness.

Last edited: Apr 22, 2012