Type 1 Error Increases with Sample Size?


Discussion Overview

The discussion centers around the relationship between sample size and Type I error rates in hypothesis testing, specifically whether Type I error increases with larger sample sizes while keeping alpha constant. Participants explore examples, clarify definitions, and question the implications of statistical analysis in various contexts.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants suggest that increasing sample size while keeping alpha constant may lead to an increased likelihood of Type I error, citing examples such as coin flips.
  • Others challenge this view, asserting that keeping alpha constant means Type I error remains constant, and that the probability of rejecting the null hypothesis does not inherently increase with sample size.
  • A participant proposes that Type I error might actually decrease with larger sample sizes, although this is not substantiated with detailed reasoning.
  • Clarifications are made regarding the definitions of p-values and their relation to alpha, with some participants expressing confusion over the correct interpretation of these terms in the context of hypothesis testing.
  • There is a discussion about when to use the normal distribution versus Student's t-distribution, with some participants noting that for large sample sizes the t-distribution is essentially the same as the normal.
  • Participants express uncertainty about the implications of hypothesis testing methods and the rationale behind using probabilities of outcomes that did not occur.

Areas of Agreement / Disagreement

Participants do not reach a consensus on whether Type I error increases with sample size. Multiple competing views are presented, and the discussion remains unresolved regarding the implications of sample size on Type I error rates.

Contextual Notes

Participants note limitations in their understanding of statistical concepts, such as the definitions of p-values and Type I error, as well as the relationship between sample size and hypothesis testing methods. There is also mention of the need for more sophisticated concepts like "power" and "power curves" to fully address the questions raised.

beakymango
TL;DR
Can you argue that type 1 error increases with sample size?
My professor is teaching us that type 1 error increases with sample size if you keep alpha constant, and I think I understand what she's getting at, but I can't find anything online that supports the idea. Here's what I'm thinking:

We accept that there is an equal chance that a flipped coin will land on heads or tails. This is one scenario where we know that the null hypothesis cannot be rejected. However, if you flip 10,000 coins and you find that 5,005 coins land on heads and 4,995 coins land on tails, you might be able to show that p<0.05 that coins are more likely to land on heads, so you would falsely reject the null. With a smaller sample size, you would be able to disregard the variation as insignificant.

But I'm pretty sure we don't apply statistical analysis to things like this. And when we're testing the efficacy of a drug compared to placebo, we use statistical analysis instead of testing it on increasing sample sizes to see if the numbers converge. I can't exactly put my finger on why that is (besides practicality), but I think that's why my coin example isn't valid.
 
beakymango said:
We accept that there is an equal chance that a flipped coin will land on heads or tails. This is one scenario where we know that the null hypothesis cannot be rejected. However, if you flip 10,000 coins and you find that 5,005 coins land on heads and 4,995 coins land on tails, you might be able to show that p<0.05 that coins are more likely to land on heads, so you would falsely reject the null. With a smaller sample size, you would be able to disregard the variation as insignificant.
Your probability is off. The null hypothesis, ##H_0## is p = 0.5. The alternate hypothesis, ##H_a##, would be p < 0.5, not p < 0.05.
beakymango said:
But I'm pretty sure we don't apply statistical analysis to things like this.
No, that's not correct. That's exactly how you would determine which hypothesis to accept. In this case, with a large number of sample coin flips, a normal distribution could be used for this binomial probability. It's been quite a few years since I taught any statistics classes, but I believe what I'm saying is true.
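As a sketch of the normal-approximation test described above (standard library only; the counts are the ones from the original post):

```python
from math import sqrt
from statistics import NormalDist

# Two-sided z-test of H0: p = 0.5, using the normal approximation
# to the binomial (reasonable here because n is large).
n, heads = 10_000, 5_005
z = (heads - 0.5 * n) / sqrt(0.25 * n)          # standardized test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value
print(z, round(p_value, 2))                     # z = 0.1, p ≈ 0.92
```

So 5,005 heads in 10,000 tosses is nowhere near significance at ##\alpha = 0.05##; the test does not falsely reject on data this close to 50/50.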
 
Mark44 said:
Your probability is off. The null hypothesis, ##H_0## is p = 0.5. The alternate hypothesis, ##H_a##, would be p < 0.5, not p < 0.05.
Sorry if I was unclear. I didn't mean p in this case to be probability, I meant p as a Student T-test. Like p<alpha. My main question is if type 1 error increases with sample size.
 
beakymango said:
Sorry if I was unclear. I didn't mean p in this case to be probability, I meant p as a Student T-test. Like p<alpha. My main question is if type 1 error increases with sample size.
I'm still not following how p as you describe it relates to ##\alpha## or the Student's t-test. For one thing, the Student's t-test is generally used for relatively small sample sizes. With large sample sizes, like 10,000 in your first post, the t distribution is essentially identical to the normal distribution.
For a binomial distribution, p represents the probability that one of two events occurs.
Also, a Type I error is defined as ##P(\text{Type I error}) = P(\text{we reject H}_0 | \text{H}_0 \text{ is true})##.
##\alpha## is the probability, under ##H_0##, that the test statistic falls in the critical (rejection) region.
Please describe what you are using p to represent.

Just off the top of my head, I'd say that Type I error decreases with larger sample sizes, but I haven't worked out anything to allow me to justify that.
 
beakymango said:
My professor is teaching us that type 1 error increases with sample size if you keep alpha constant,

Are you sure that's what the professor said? Try stating what was said word for word.

By the usual terminology, alpha is the probability of rejecting the null hypothesis when it is true and such an error is called type 1 error. So if you keep alpha constant, you keep type 1 error constant.
We accept that there is an equal chance that a flipped coin will land on heads or tails. This is one scenario where we know that the null hypothesis cannot be rejected.
You mean "should not" be rejected.

However, if you flip 10,000 coins and you find that 5,005 coins land on heads and 4,995 coins land on tails, you might be able to show that p<0.05 that coins are more likely to land on heads,
Taking a guess at what the professor actually said, we can say this:
We usually don't base rejecting the null hypothesis on the probability of the exact outcome of an experiment. The probability of each exact outcome of a fair coin-tossing experiment goes down as we increase the number of tosses. For example, in 2 tosses, the probability of 1 head and 1 tail (in some order) is 1/2. By contrast, the probability of the exact outcome of 5,005 heads and 4,995 tails (in some order) is ##{10000 \choose 5005} (1/2)^{10000}##. Whatever that is, it's smaller than 1/2.

However, the commonly used hypothesis tests are not based on assuming the null hypothesis and computing the probability of the exact outcome of an experiment. Instead, they are so-called "one tailed" and "two tailed tests" where we compute the probability of an event that includes the exact outcome of the experiment plus other outcomes that did not happen.

If you scrutinize hypothesis testing, you can ask the very good question "Why should hypothesis testing involve the probability of outcomes that did not occur?". It is easy to give various intuitive answers to this question. However, a mathematically precise answer requires introducing the concept of the "power" of a statistical test and the concept of "power curves". The usual approach in introductory statistics is to present one tailed and two tailed hypothesis tests as "accepted" procedures, without getting into the sophisticated concepts that somewhat justify using them.
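The contrast between the probability of the exact outcome and the tail events a test actually uses can be computed directly. This sketch uses exact rational arithmetic, since ##(1/2)^{10000}## underflows ordinary floating point:

```python
from fractions import Fraction
from math import comb

# Probability of the *exact* outcome under H0 (fair coin).
# 2 tosses: 1 head and 1 tail, in either order.
p_two = Fraction(comb(2, 1), 2**2)                 # = 1/2
# 10,000 tosses: exactly 5,005 heads.
p_big = Fraction(comb(10_000, 5_005), 2**10_000)

print(float(p_two))   # 0.5
print(float(p_big))   # ≈ 0.0079 — tiny even though H0 is true
```

The second probability is tiny even though the null hypothesis is true, which is exactly why tests are built on tail events (the observed outcome plus more extreme ones) rather than on exact-outcome probabilities.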
 
Stephen Tashi said:
If you scrutinize hypothesis testing, you can ask the very good question "Why should hypothesis testing involve the probability of outcomes that did not occur?". It is easy to give various intuitive answers to this question. However, a mathematically precise answer requires introducing the concept of the "power" of a statistical test and the concept of "power curves". The usual approach in introductory statistics is to present one tailed and two tailed hypothesis tests as "accepted" procedures, without getting into the sophisticated concepts that somewhat justify using them.

Okay -- that makes a lot of sense. I think I can conceptualize why we care about the probability of outcomes that do not occur. Nonetheless, I still don't quite understand what my professor is saying. We defined alpha as the researcher's tolerance for false positives, if that helps? This is copy pasted off our powerpoint: "Larger sample size while keeping same α results in higher chance of type I error"
 
beakymango said:
This is copy pasted off our powerpoint: "Larger sample size while keeping same α results in higher chance of type I error"

We need to look at the definitions of "alpha" and "type I error" that your course is using. Is the powerpoint about a specific hypothesis test? - or is it a general claim?
 
Stephen Tashi said:
We need to look at the definitions of "alpha" and "type I error" that your course is using. Is the powerpoint about a specific hypothesis test? - or is it a general claim?
Just in general, but we are a biostats class if that makes a difference. We define Type 1 Errors as "false positive" and alpha as "the highest risk of making a false positive error." I asked for clarification today, and she said it's because as you increase the sample size, you make it more likely to reject the null hypothesis, so you increase the risk of making a false positive. She explained that if you use a large enough sample size, you can almost always prove a statistically significant difference between two groups of people. This sort of makes sense to me, but I also thought increasing sample sizes made it easier to detect smaller (but true) differences between two groups by shrinking the standard error.
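That last intuition can be illustrated by simulation (my own sketch, not from the thread): with a genuinely unfair coin, say a true heads probability of 0.51, the standard error shrinks as n grows, so the fixed-##\alpha## test rejects more and more often. That is power increasing for a true effect, not Type I error increasing.

```python
import random
from math import sqrt
from statistics import NormalDist

def reject_rate(n, p_true, trials=300, alpha=0.05, seed=1):
    """Fraction of simulated experiments in which H0: p = 0.5 is rejected."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_true for _ in range(n))
        z = (heads - 0.5 * n) / sqrt(0.25 * n)
        rejections += abs(z) > crit
    return rejections / trials

for n in (100, 2_000, 20_000):
    print(n, reject_rate(n, p_true=0.51))   # rejection rate climbs toward 1
```

With a small true bias, the rejection rate is near ##\alpha## at n = 100 but approaches 1 by n = 20,000: a large enough sample will almost always flag a real difference, however small.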
 
beakymango said:
Just in general, but we are a biostats class if that makes anything different. We define Type 1 Errors as "false positive"
That use of "Type 1 Error" is like defining it to be getting a head in a single toss of a coin. By contrast, the event used in defining Type 1 Error for a coin toss experiment will be something like "7000 or more heads out of 10,000 tosses".

Of course, it's true that you're more likely to get, say, at least 1 head in 10,000 tosses of a fair coin than to get at least 1 head in 2 tosses of a fair coin.

and alpha as "the highest risk of making a false positive error."

Perhaps "alpha" is being used to refer to the probability of a false positive on a single trial.

In mathematical statistics, "type one error" does not refer to the occurrence of a single "a false positive" and "alpha" does not refer to the probability of getting a false positive on a single trial. So it isn't surprising that you found no sources to support the professor's claims. They are incorrect vis-a-vis the standard terminology.
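The standard-terminology point, that fixing ##\alpha## fixes the Type I error rate whatever the sample size, can be checked by simulation (a sketch, standard library only):

```python
import random
from math import sqrt
from statistics import NormalDist

def type1_rate(n, trials=500, alpha=0.05, seed=0):
    """Simulate fair-coin experiments (H0 true); return the rejection rate."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.getrandbits(1) for _ in range(n))  # n fair tosses
        z = (heads - 0.5 * n) / sqrt(0.25 * n)
        rejections += abs(z) > crit
    return rejections / trials

for n in (100, 1_000, 5_000):
    print(n, type1_rate(n))   # stays near alpha = 0.05 for every n
```

The empirical rejection rate hovers around 0.05 at every sample size; increasing n does not inflate the Type I error rate when ##\alpha## is held fixed.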
 
I would expect the opposite to be the case, because of the Law of Large Numbers and related results: effect-size estimates approach the true value the more evidence you have.
 
