Sample size needed for power of a study


Discussion Overview

The discussion revolves around determining the sample size needed for a study to achieve a power of 80% when testing the effect of special instructions on 5th graders' achievement test scores. Participants explore statistical concepts related to hypothesis testing, z-scores, and sample size calculations.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a scenario involving a mean score of 200 and hypothesizes that a special instruction will raise the mean to 208, seeking help on calculating the necessary sample size for 80% power.
  • Another participant corrects the formula for z-scores and suggests that the z value corresponding to an 80% probability is approximately 0.85, leading to a calculation that indicates a sample size of at least 26 is needed.
  • A subsequent reply reiterates the z-score formula and proposes a different calculation, suggesting a sample size greater than 36 based on their interpretation of the necessary conditions.
  • Another participant emphasizes the importance of specifying the significance level (alpha) in the calculations, discussing the implications of Type I and Type II errors and providing a detailed mathematical approach to determine the sample size needed for both groups in the study.
  • This participant also introduces a formula involving the standard deviation of the difference between means and provides a calculation that suggests needing over 566 students per group, referencing an external tool for sample size calculations.
  • One participant expresses surprise at the complexity of the calculations and thanks others for their guidance and resources.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the exact sample size required, with differing calculations and interpretations of the necessary parameters. Multiple competing views remain regarding the correct approach to determining sample size and the significance level.

Contextual Notes

Limitations include varying interpretations of the significance level and the assumptions underlying the statistical models used in the calculations. Some participants may not be familiar with all statistical terms and concepts, which could affect their understanding of the discussion.

Math Is Hard
I can't remember how to figure out this type of problem. I swear I figured this out once before, but now I am clueless..

Let's say there's a certain achievement test and you know that 5th graders in general score a mean of 200 on the test. The known standard deviation of the population is 48 on this test.

You hypothesize that giving a group of 5th graders special instructions before the test (to choose the first answer that comes to mind) will cause them to score higher. The predicted mean is 208 for this group.

What I want to find now is how many 5th graders I would need in my sample size for the power of the study to be 80%.

What I have figured out so far is that the z-score I will need on my distribution of means for population 1 (based on the research hypothesis) is -.84.
The standard deviation on that distribution of means will be 48/ sqrt(N).
N being the number of kids in my sample. The mean will be 208.
I know that z = (x-m)sd but I am stuck on how to solve from here.

I would appreciate any help. Thanks! :smile:
 
Actually z = (x-m)/sd and if you are using m=200 you will need to find the z value corresponding to

Pr(observation < z) = 0.80 - I think that this z value is 0.85 but I haven't checked it too much

and then your x (observation) would be

x > m + z*sd

This will give you 80% confidence if x really is big enough -- you say that you observe an x of 208 -- in that case

for z*48/sqrt(N) to be less than or equal to 8, sqrt(N) will have to be > 48*0.85/8 = 5.1

or N > 26.
 
donjennix said:
Actually z = (x-m)/sd
doh! sorry - typo! :redface:

and if you are using m=200 you will need to find the z value corresponding to

Pr(observation < z) = 0.80

and then your x (observation) would be

x > m + z*sd

This will give you 80% confidence if x really is big enough -- for 48/sqrt(N) to be less than or equal to 8 sqrt(N) will have to be > 6, N>36

Thanks for your help! I am not familiar with Pr() and "observation" - but I think I see what you're saying.
 
...
 
Minimum sample size 2 x 567 students for sigma/delta=1/6 alpha=0.05, 1-beta=0.8

None of you state the significance level alpha, which enters quite crucially into the calculation.
Alpha is the probability of falsely rejecting your H0 hypothesis (no difference between the groups) when it is in fact true. This is the "patient's error" because it leads to the patients/students bearing the side effects of an ineffective intervention.

Assuming equal variance and normality of the distributions of scores in both groups, specifying the common alpha-level of 0.025 one-sided (or 0.05 two-sided), you will have a significant result, if the difference d between the group means turns out to be d > 1.96 *SigmaD where SigmaD is the standard deviation of d.

The 1.96 is computed as y=1-alpha/2 (=0.975); x=sqrt(2)*erfinv(2*y-1), (=1.96) where erfinv is the inverse error function.
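
For anyone without an erfinv at hand, here is a minimal sketch of the same quantile lookup in Python. It assumes the standard library's statistics.NormalDist as a stand-in for sqrt(2)*erfinv(2*y-1); the helper name normal_quantile is made up for illustration and is not from the post above.

```python
import math
from statistics import NormalDist

def normal_quantile(y):
    """Return x such that a standard normal variable is below x with probability y."""
    return NormalDist().inv_cdf(y)

alpha = 0.05                   # two-sided significance level
y = 1 - alpha / 2              # 0.975
z_alpha = normal_quantile(y)   # approximately 1.96

# Consistency check against the erfinv form: erf(x / sqrt(2)) should give back 2*y - 1.
roundtrip = math.erf(z_alpha / math.sqrt(2))
print(z_alpha, roundtrip)
```

The round trip through math.erf is only there to show that the two formulations describe the same number.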

This standard deviation is SigmaD=sigma*sqrt(1/Ni+1/Nc) where
o sigma=48 is the standard deviation of the scores within each group
o Nc is the sample size of the "c"ontrol group
o Ni is the sample size of the "i"ntervention group
so we need d > 1.96 * sigma*sqrt(1/Ni+1/Nc) in order to reject the H0-Hypothesis of no difference between the groups.
Up to here, this is independent of the expected group means of 200 and 208.

In addition, you want to avoid the "manufacturer's error" of failing to reject H0 in case it is false. A commonly accepted risk for this to happen is 20%, or beta=0.2. You say you want a power of 1-beta=0.80, which means that 80% of the expected distribution of d (with mean 8) should lie to the right of the above value of 1.96 * sigma*sqrt(1/Ni+1/Nc). Equivalently, the 20th percentile of that distribution must lie above this value.

This 20th percentile is at Delta - 0.8416 * sigma*sqrt(1/Ni+1/Nc)

where Delta is the expectation value for d (which is 208-200=8 in this example)
The number 0.8416 results from y=0.80;x=sqrt(2)*erfinv(2*y-1); which gives x = 0.8416.

So we have
1.96 * sigma*sqrt(1/Ni+1/Nc) < Delta - 0.8416 * sigma*sqrt(1/Ni+1/Nc)

or
2.8016*sqrt(1/Ni+1/Nc) < Delta / sigma

1/Ni+1/Nc < (Delta / sigma / 2.8016)^2

If you choose Ni=Nc=N, you get N > 2*(2.8016*48/8)^2 = 565.1, so you will need at least 566 students in each group.
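
The inequality above can be sketched in a few lines of Python. This assumes equal group sizes, a known sigma, and the same one-tail approximation used in the derivation (the tiny chance of rejecting in the wrong direction is ignored). The function names n_per_group and achieved_power are hypothetical, chosen only for this illustration.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Smallest equal group size satisfying N > 2*((z_alpha + z_beta)*sigma/delta)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # approximately 1.96
    z_beta = z.inv_cdf(power)            # approximately 0.8416
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def achieved_power(n, delta, sigma, alpha=0.05):
    """Probability that d exceeds 1.96*SigmaD when the true difference is delta."""
    z = NormalDist()
    sigma_d = sigma * math.sqrt(2 / n)   # sigma*sqrt(1/Ni+1/Nc) with Ni=Nc=n
    return z.cdf(delta / sigma_d - z.inv_cdf(1 - alpha / 2))

n = n_per_group(delta=8, sigma=48)
print(n, achieved_power(n, 8, 48), achieved_power(n - 1, 8, 48))
```

For sigma=48 and Delta=8 this gives 566 per group. Comparing achieved_power at n and n-1 shows where the power crosses 80%.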

In case you are not into DIY math, you can get this standard computation ready-made at the interactive site:
http://hedwig.mgh.harvard.edu/sample_size/quan_measur/para_quant.html
(they round differently and get 567 per group BTW)

In addition, here is the Maple code to compute the standard deviation of the distribution of d for this example:

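# Normal density with mean m and standard deviation sigma
P:=proc(x,m,sigma) exp(-(x-m)^2/abs(2*sigma^2))/sqrt(abs(2*sigma^2)*Pi) end proc;
# Density of d = (intervention mean) - (control mean), integrating out the control mean x
Pd:=int(P(x,200,48/sqrt(Nc))*P(x+d,208,48/sqrt(Ni)),x=-infinity .. infinity);
# Standard deviation of d about its expected value of 8
sqrt(int((d-8)^2*Pd,d= - infinity .. infinity));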
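
If you do not have Maple, a rough numerical cross-check of the same standard deviation is possible by simulation. The sketch below is an illustration in Python, not a translation of the Maple code. It assumes group sizes of 566 each, draws the two group means directly from their normal distributions, and uses a made-up function name, an arbitrary seed, and an arbitrary number of draws.

```python
import math
import random
import statistics

def simulate_sd_of_d(n_c, n_i, sigma=48.0, mean_c=200.0, mean_i=208.0,
                     draws=100_000, seed=0):
    """Estimate the standard deviation of d = intervention mean - control mean."""
    rng = random.Random(seed)
    sd_c = sigma / math.sqrt(n_c)   # standard deviation of the control group mean
    sd_i = sigma / math.sqrt(n_i)   # standard deviation of the intervention group mean
    diffs = [rng.gauss(mean_i, sd_i) - rng.gauss(mean_c, sd_c) for _ in range(draws)]
    return statistics.pstdev(diffs)

n_c = n_i = 566
closed_form = 48.0 * math.sqrt(1 / n_i + 1 / n_c)   # SigmaD = sigma*sqrt(1/Ni+1/Nc)
simulated = simulate_sd_of_d(n_c, n_i)
print(closed_form, simulated)
```

The two printed values should agree to within sampling noise, which shrinks as draws grows.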
 
Holy cow! I didn't think it would be that much of a pain in the butt to calculate. Thank you for the instructions - and for that link.
 
