Sample variances & ANOVA: How different is too different?

  • Thread starter Rasalhague
  • Tags: anova
In summary, the conversation discusses using ANOVA to compare the performance of three groups taught by different methods. The assumptions involved include that the populations are normally distributed and that the teaching method is the only reasonable explanation for differences between groups. ANOVA can be done with the first set of data, but the sample variances must not be too different. The criterion for deciding when variances are too different is not stated; an F test on the ratio of sample variances and a Bonferroni correction are raised as possibilities.
  • #1
Rasalhague
Koosis: Statistics..., 4th ed., p. 177:

A number of students is assigned randomly to three classes with three different teaching methods. The following statistics summarize the performance of the three groups [...] Can you perform an analysis of variance with these data? What assumptions are involved?

Group 1: n = 10, s² = 100.
Group 2: n = 11, s² = 81.
Group 3: n = 8, s² = 64.

The given answer is yes. "You must be prepared to assume the populations from which you are sampling are normally distributed" and that "the teaching method is the only reasonable explanation of the differences between groups."

In the next problem, the situation is the same, but the data are different. Now the sample variances are 144, 81 and 64. Can ANOVA be done on these data? Answer no, the sample variances are too different.

The obvious question: How different is too different?

One thought I had was to do an F test with null hypothesis the population variances are equal, alternative the population with the biggest sample variance has a bigger population variance than the population with the smallest sample variance. But in Excel, with a 5% significance level, I get a critical value of 3.68. The F scores for both ratios of sample variances are less than this: 100/64 = 1.56 and 144/64 = 2.25. So I guess this isn't the criterion. But why not? And what is?
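(For reference, here is a minimal sketch of the same calculation in Python with SciPy. It assumes a one-sided comparison of the largest against the smallest sample variance, with n − 1 degrees of freedom for each sample, i.e. 9 and 7.)

    from scipy import stats

    # Ratios of the largest to the smallest sample variance
    print(100 / 64)  # 1.5625  (first problem)
    print(144 / 64)  # 2.25    (second problem)

    # One-sided 5% critical value of the F distribution with
    # 9 and 7 degrees of freedom (n - 1 for each of the two groups)
    print(stats.f.ppf(0.95, dfn=9, dfd=7))  # approximately 3.68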

Another question: what are the populations in this case: three sets each consisting of a hypothetical continuum of infinitely many identical students? Or three copies of the same finite set of actual students, depending on context? Or some ill-defined three copies of the same large, but finite set of all students in history who might conceivably be taught, or have been taught, by these methods, whose population parameters are only approximated by the normal probability measure? Or is it not advisable to think too hard about what population means in such cases?
 
  • #2
You have a very thorough and rigorous approach to studying mathematics, so I can't resist asking why you are bothering to study statistics. Applying statistics to anything is a largely subjective and non-rigorous activity!

If a test involving sampling assumes "a normal population", then the bottom line is that independently drawn samples from that population have a normal distribution, so the mathematically simplest way to visualize the population of students is as an infinite population with a continuum of values. Of course, taking that idea seriously would rule out applying statistics to many real-world problems, so your thought that "it is not advisable to think too hard" about the population is the one that is usually applied.
 
  • #3
Okay, thought center deactivated : )

But, philosophy aside, I guess there's some rule of thumb though, at least? I read that the central limit theorem applies for a given, fixed sample size of at least 30, and that the binomial distribution is a good approximation for the hypergeometric when the population is at least 20 times larger than the sample size, and that each cell should have a value of at least 5 for the chi square test to give reasonable results. What rule of thumb is Koosis applying in this case? By what criterion - however rough and subjective - would you decide whether sample variances were too different to apply ANOVA?
 
  • #4
I don't know the answer to the question "how different is too different". Tutorials on the web apply the F test, just as you did, to investigate the equality of variances.

I doubt the following is Koosis's reason, but it would be an excuse to reactivate the thought center: the Bonferroni correction: http://en.wikipedia.org/wiki/Bonferroni_correction
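(As a rough sketch of how that idea might be applied here, and not necessarily Koosis's criterion: run all three pairwise one-sided F tests on the sample variances of the second problem, and test each at the Bonferroni-corrected level 0.05/3. The 5% family-wise level and the pairing scheme are assumptions.)

    from itertools import combinations
    from scipy import stats

    # (n, s^2) for the three groups in the second problem
    samples = {"Group 1": (10, 144), "Group 2": (11, 81), "Group 3": (8, 64)}

    alpha = 0.05                # assumed family-wise significance level
    alpha_per_test = alpha / 3  # Bonferroni-corrected level for 3 comparisons

    for (name_a, (n_a, v_a)), (name_b, (n_b, v_b)) in combinations(samples.items(), 2):
        # put the larger sample variance in the numerator (one-sided test)
        if v_a < v_b:
            (name_a, n_a, v_a), (name_b, n_b, v_b) = (name_b, n_b, v_b), (name_a, n_a, v_a)
        f_ratio = v_a / v_b
        p = stats.f.sf(f_ratio, dfn=n_a - 1, dfd=n_b - 1)  # one-sided p-value
        print(f"{name_a} vs {name_b}: F = {f_ratio:.2f}, p = {p:.3f}, "
              f"reject at {alpha_per_test:.3f}? {p < alpha_per_test}")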
 
  • #5


I would first like to clarify that ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups. It does not directly compare variances, although the sample variances are used in the calculation of the F statistic.

In the given scenario, the sample variances are 100, 81, and 64, which are reasonably close to one another. In the second scenario, the variances are 144, 81, and 64, which Koosis treats as too different. The question of "how different is too different?" is a valid one, and the answer depends on factors such as the sample sizes, the number of groups being compared, and the significance level chosen for the test.

One way to assess whether the variances are too different is to conduct an F test, as mentioned in the question. The F test compares the ratio of the largest to the smallest sample variance against a critical value, which depends on the chosen significance level and the degrees of freedom of the two samples. In this case, the one-sided critical value at the 5% significance level with 9 and 7 degrees of freedom (n − 1 for each of the two groups being compared) is about 3.68. If the calculated F ratio exceeds this critical value, it suggests that the variances differ significantly. Note that this is a separate check on the equal-variance assumption; it is not the F statistic computed within the ANOVA itself.

Another approach is to visually inspect the distributions of the data. If the data from the different groups have a similar spread and shape, then the variances are not too different. On the other hand, if the data from one group has a much larger spread or a different shape compared to the others, then the variances may be significantly different.

Regarding the assumption of normality, it is usually recommended to have a larger sample size (typically >30) for ANOVA to be robust to violations of this assumption. However, if the sample size is small, it is important to check the normality of the data before proceeding with the analysis.
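(As an illustration only: with the raw scores in hand, which the textbook problem does not give, a Shapiro–Wilk test is one common way to check normality for a small sample. The numbers below are made up purely for demonstration.)

    from scipy import stats

    # Hypothetical raw scores for one small group (not from the textbook problem)
    scores = [62, 71, 68, 75, 80, 66, 73, 70]

    stat, p = stats.shapiro(scores)
    print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
    # A small p-value (e.g. < 0.05) would suggest the scores are not
    # consistent with a normal distribution.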

To answer the question about the populations, it is important to define the population in the context of the study. In this case, the population could be considered as three separate groups of students, each taught with a different method. The sample represents a subset of that population, and the results can be generalized to the larger population. As noted earlier in the thread, it is usually not productive to push the definition of the population much further; in practice it is treated as a convenient idealization.
 

1. What is a sample variance?

A sample variance is a measure of how spread out a set of data points are from the mean (average) of the data. It is calculated by taking the sum of the squared differences between each data point and the mean, and then dividing by the number of data points minus one.
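In symbols, for data points x₁, …, xₙ with sample mean x̄:

$$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 $$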

2. Why is sample variance important in statistics?

Sample variance is important because it helps us understand the variability of a population based on a sample of data. It is used in many statistical tests, such as ANOVA, to determine if there are significant differences between groups.

3. What is the relationship between sample variance and ANOVA?

ANOVA (Analysis of Variance) is a statistical test that compares the means of two or more groups. Sample variance is used in ANOVA to calculate the F-statistic, which determines if there is a significant difference between the means of the groups being compared.
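(As a minimal illustration with made-up raw scores, since the thread's example only provides summary statistics, a one-way ANOVA can be run like this:)

    from scipy import stats

    # Hypothetical raw scores for three groups (purely illustrative)
    group1 = [72, 80, 69, 75, 78, 70, 74, 77, 71, 79]
    group2 = [65, 70, 68, 72, 66, 74, 69, 71, 67, 73, 70]
    group3 = [60, 66, 63, 68, 61, 64, 67, 65]

    # The F statistic compares between-group to within-group variability
    f_stat, p_value = stats.f_oneway(group1, group2, group3)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")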

4. Is there a certain value of sample variance that is considered "too different"?

There is no specific value of sample variance that is considered "too different." The interpretation of sample variance depends on the context of the data and the research question being asked. Generally, a larger sample variance indicates more variability in the data.

5. How can we determine if the sample variances are significantly different in ANOVA?

The F-statistic computed in ANOVA tests whether the group means differ, not whether the sample variances differ: it compares between-group variability to within-group variability, and a small p-value (typically < 0.05) indicates a statistically significant difference between the means. To judge whether the sample variances themselves are too different, a separate check is needed, for example the F test on the ratio of the largest to the smallest sample variance discussed in the thread above.
