Is there an optimal upper limit for power in statistical testing?


Discussion Overview

The discussion revolves around the concept of statistical power in hypothesis testing, specifically questioning whether there is an optimal upper limit for power to avoid rejecting the null hypothesis when there is no true effect. Participants explore the implications of sample size for power and the potential risks associated with high power levels.

Discussion Character

  • Exploratory
  • Debate/contested
  • Technical explanation

Main Points Raised

  • One participant notes that increasing sample size (N) enhances statistical power (1 minus beta), but raises concerns about the risk of incorrectly rejecting the null hypothesis when there is no effect, suggesting a need for an upper limit on power.
  • Another participant cites various sources indicating that a power level of 0.80 is commonly accepted, with some advocating for 0.90 in specific contexts like clinical trials.
  • It is mentioned that while power is often set at 0.80, some argue that higher power levels (e.g., 0.90 or 0.95) could be acceptable, depending on the research context.
  • One participant clarifies that increasing N does not increase the probability of Type I error if alpha is fixed, suggesting that the concern about high power may be misplaced.
  • A participant questions the definition of power, indicating that it is a function rather than a single value, and asks what effect size is assumed when stating power as 0.80.
  • Another participant agrees that power should be represented by a "power curve" and emphasizes the importance of specifying the effect size when discussing power levels.

Areas of Agreement / Disagreement

Participants express differing views on the implications of high power in statistical testing, with some advocating for a specific upper limit while others argue that increasing power does not inherently lead to increased Type I error. The discussion remains unresolved regarding the optimal power level and its implications.

Contextual Notes

The discussion highlights the lack of formal standards for power levels in statistical testing and the dependence on context, such as the type of study being conducted. There is also an acknowledgment of the need for clarity regarding effect sizes when discussing power.

tmp_acnt
I'm new here, so first of all Hi :)

I did some reading and searching but didn't find a direct enough answer to the issue that bothers me: there's something about the power of a statistical test, 1 minus beta, that doesn't add up for me. I'd appreciate any assistance, and if possible please provide reliable references.

1. It is known that one way to achieve greater power is to use a larger sample size, N (a numerical sketch follows this list).
2. However, an N that is too large will result in a higher probability of rejecting the null hypothesis even when there is no effect at all, which is obviously a bad thing.
3. Since a large N also increases the power, one might conclude that a power that is too large is also a bad thing.
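
For concreteness, point 1 can be checked numerically. Below is a minimal sketch (my own illustration, not from a reference), using the standard normal approximation for a two-sided one-sample z-test, power ≈ Phi(d·sqrt(N) − z_crit), where d is an assumed standardized effect size; the value d = 0.3 is purely illustrative.

```python
# Approximate power of a two-sided one-sample z-test as N grows,
# for a fixed significance level alpha and an assumed effect size d.
from scipy.stats import norm

def approx_power(d, n, alpha=0.05):
    """Normal-approximation power at standardized effect size d."""
    z_crit = norm.ppf(1 - alpha / 2)        # e.g. 1.96 for alpha = 0.05
    return norm.cdf(d * n ** 0.5 - z_crit)  # P(reject H0 | true effect d)

for n in (10, 25, 50, 100, 200):
    print(f"N = {n:3d}: power ≈ {approx_power(0.3, n):.3f}")
# Power climbs toward 1 as N grows while alpha stays fixed at 0.05.
```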

So, what I'd like to know is whether there is some "recommended upper bound/limit" for the power, one that you shouldn't exceed, in order to reduce the chances of rejecting the null hypothesis when there is no effect. Something like the conventional 0.05 for alpha (in some fields).

Thanks
 
"Although there are no formal standards for power, most researchers who assess the power of their tests use 0.80 as a standard for adequacy. "
Wikipedia; Wikia, The Psychology Wiki

"It is customary to set power at greater than or equal to 0.80, although some experts advocate 0.90 in clinical trials (Eng, 2003). As this is not a clinical trial, then it is appropriate to set power at 0.80."
Research considerations: power analysis and effect size (MedSurg Nursing, February 2008, by Lynne M. Connelly)

"Power=0.80 is common, but one could also choose Power=.95 or.90." How to Calculate Sample Size & Power Analysis Information for Dissertation Students & Researchers (ResearchConsultation.com in memory of Dr. Jeffrey W. Braunstein)

"Often times, a power greater than 0.80 is deemed acceptable."
EPA Endocrine Disruptor Screening Program (EDSP) statistical documentation

"Cohen (1988) suggests that the conventional Type II error rate should be 0.20, which would set power conventionally at 0.80. A materially smaller power would result in an unacceptable risk of Type II error, while a significantly larger value would probably require a larger sample size than is generally practical (Cohen 1992). Setting [beta] at 0.20 is consistent with the prevailing view that Type I error is more serious. Since [alpha] is conventionally set at 0.05, Cohen suggested setting [beta] at four times that value (Cohen 1988, 1992)."
An Analysis of Statistical Power in Behavioral Accounting Research (Behavioral Research in Accounting, 01-JAN-01, by Borkowski, Susan C. ; Welsh, Mary Jeanne ; Zhang, Qinke Michael)
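
To put Cohen's practicality point in numbers, here is a rough sketch of my own (not from any of the sources above), inverting the usual normal-approximation power formula to get the required N for a two-sided one-sample z-test; the effect sizes 0.2/0.5/0.8 are Cohen's conventional small/medium/large values for a standardized mean difference.

```python
# Required N for a target power: N = ((z_{1-alpha/2} + z_{power}) / d)^2.
import math
from scipy.stats import norm

def n_required(d, alpha=0.05, power=0.80):
    """Smallest N reaching the target power (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_b = norm.ppf(power)          # about 0.84 for power = 0.80
    return math.ceil(((z_a + z_b) / d) ** 2)

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:6s} d = {d}: N = {n_required(d)} at power 0.80, "
          f"N = {n_required(d, power=0.95)} at power 0.95")
# Moving the target from 0.80 to 0.95 multiplies the required N by about
# 1.66 at every effect size, which is Cohen's practicality concern.
```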
 
tmp_acnt said:
2. However, an N that is too large will result in a higher probability of rejecting the null hypothesis even when there is no effect at all, which is obviously a bad thing.
That's a Type I error, and it is controlled by alpha. For a fixed alpha, increasing N increases the power of the test without increasing the chance of a Type I error, so what you mention here is not a problem.
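
A quick simulation (my own sketch, not from a reference) makes this concrete: generate data under a true null and watch the rejection rate of a one-sample t-test stay near alpha no matter how large N gets.

```python
# Under H0 (data really drawn from N(0, 1) while we test mu = 0),
# every rejection is a Type I error, so the rejection rate estimates it.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha, trials = 0.05, 10_000

for n in (20, 200, 2000):
    data = rng.normal(size=(trials, n))  # 'trials' null data sets of size n
    pvals = ttest_1samp(data, popmean=0.0, axis=1).pvalue
    print(f"N = {n:4d}: Type I error rate ≈ {np.mean(pvals < alpha):.3f}")
# The rate hovers around 0.05 at every N: power rises with N,
# the false-positive rate does not.
```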
 
The definition I've seen for the power of a statistical test makes it a function, not a single number. So when the number 0.80 is quoted for the power of a test, what does "power" mean in that context? What calculation produces this number? Is some particular "effect size" assumed?
 
Yes, Stephen Tashi, you are right. Power is best described by a "power curve" rather than by a single number. Those who say the power is 80%, for example, have a particular effect size in mind (and really ought to say what it is). The user might well be interested in different effect sizes.
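
To illustrate the point (my own sketch, using the same two-sided z-test approximation as earlier in the thread; the grid of effect sizes is arbitrary): power is a curve over the assumed effect size d, and it drops to alpha itself at d = 0.

```python
# A 'power curve': power as a function of the assumed effect size d,
# for fixed N and alpha. Both rejection tails are included.
import numpy as np
from scipy.stats import norm

def power_curve(d, n, alpha=0.05):
    """Power of a two-sided one-sample z-test at standardized effect size d."""
    z_crit = norm.ppf(1 - alpha / 2)
    d = np.asarray(d, dtype=float)
    return norm.cdf(d * np.sqrt(n) - z_crit) + norm.cdf(-d * np.sqrt(n) - z_crit)

for d in (0.0, 0.1, 0.2, 0.3, 0.5):
    print(f"d = {d:.1f}: power = {power_curve(d, n=100):.3f}")
# At d = 0 the 'power' equals alpha (0.05); a statement like 'power = 0.80'
# pins down only one point on this curve, at one particular d.
```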
 

Similar threads

  • · Replies 24 ·
Replies
24
Views
6K
  • · Replies 30 ·
2
Replies
30
Views
3K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 11 ·
Replies
11
Views
3K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 9 ·
Replies
9
Views
4K
Replies
1
Views
3K
  • · Replies 5 ·
Replies
5
Views
4K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 6 ·
Replies
6
Views
2K