F-test and One-way ANOVA (in R)?

  • Context: Undergrad 
  • Thread starter: x^2
  • Tags: anova
SUMMARY

This discussion focuses on performing a One-way ANOVA in R to compare several groups of measurements. The user first tried an F-test by hand but got an implausibly large F-value, possibly because sums of squares were used where mean squares belong. The recommended approach is to fit a linear model with lm() and pass it to anova() to produce the ANOVA table. The thread also advises against using Excel for statistical analysis because of its limitations and known bugs.

PREREQUISITES
  • Understanding of One-way ANOVA and its application in statistical analysis.
  • Familiarity with R programming, specifically the lm() and anova() functions.
  • Knowledge of statistical concepts such as mean-square and sum-of-squares.
  • Basic data manipulation in R, including handling vectors and factors.
NEXT STEPS
  • Learn how to interpret the ANOVA table output in R.
  • Explore pairwise comparisons after the ANOVA with a multiple-comparison correction (for example pairwise.t.test() or TukeyHSD() in R) rather than repeated t.test() calls.
  • Investigate advanced statistical packages in R for more complex analyses.
  • Review the documentation for R's statistical models, particularly the section on Linear Models.
USEFUL FOR

Statisticians, data analysts, and researchers looking to perform robust statistical analyses using R, particularly those interested in comparing multiple groups of data measurements.

x^2
I have five groupings of data measurements, each group with a different number of measurements. I want to see if one grouping has higher measurements than the rest, but was told that using multiple t-tests was incorrect and that I should use ANOVA.
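The usual reason multiple t-tests are discouraged is that the chance of at least one false positive grows with the number of comparisons. A rough calculation in R for five groups, assuming (as a simplification) that the pairwise tests are independent, which they are not exactly since they share groups:

```r
# Five groups give choose(5, 2) pairwise t-tests.
n_groups <- 5
n_tests  <- choose(n_groups, 2)

# Probability of at least one false positive across all tests at alpha = 0.05,
# under the simplifying assumption that the tests are independent.
alpha      <- 0.05
familywise <- 1 - (1 - alpha)^n_tests

n_tests     # 10
familywise  # about 0.40
```

So roughly a 40% chance of at least one spurious "significant" difference, compared with the nominal 5% for a single test.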

I'm using R for the analysis. Exactly how might I go about this? I've looked at a couple of tutorials, but the examples are too complex for me to understand.

Using the formulas at http://en.wikipedia.org/wiki/F-test#One-way_ANOVA_example, I tried an F-test, but got an F-value on the order of 10^16, because my "within-group" sum of squares is on the order of 10^-13.

Could someone show me how I'd run a simple ANOVA test in R given a few columns of data and how to interpret the results to test whether one grouping has statistically higher values?

Thank you,
x^2
 
I can't help with doing the analysis in R, but I would be able to do it in Excel. I would run the ANOVA calculations just like the example shown in the Wikipedia article.

Did you use the mean-square values, not the sum-of-squares, to compute the F-value? Have you considered the possibility that your F-value really is of order 10^16?
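To show where the mean squares enter, here is a minimal sketch that computes the one-way F statistic by hand (each sum of squares divided by its degrees of freedom) and cross-checks it against R's anova(). The data are made up for illustration, and `values` and `groups` are placeholder names.

```r
# Hypothetical example data: three groups of unequal size.
# Replace these with your own measurements and group labels.
values <- c(4.1, 3.9, 4.3, 4.0,   5.2, 5.0, 5.4,   4.6, 4.4, 4.8, 4.5, 4.7)
groups <- factor(rep(c("A", "B", "C"), times = c(4, 3, 5)))

k <- nlevels(groups)   # number of groups
N <- length(values)    # total number of measurements

grand_mean  <- mean(values)
group_means <- tapply(values, groups, mean)
group_sizes <- tapply(values, groups, length)

# Sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((values - group_means[as.character(groups)])^2)

# Mean squares: each sum of squares divided by its degrees of freedom.
# Using ss_between / ss_within directly would ignore the degrees of freedom.
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)

F_value <- ms_between / ms_within
p_value <- pf(F_value, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Cross-check against R's built-in ANOVA table
fit_table <- anova(lm(values ~ groups))
all.equal(F_value, fit_table[["F value"]][1])
all.equal(p_value, fit_table[["Pr(>F)"]][1])
```

Note that F is a ratio, so rescaling all the data (for example changing units) leaves it unchanged; a tiny within-group sum of squares only produces a huge F if it is tiny relative to the between-group one.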

If you really have doubts, perhaps you could post your data here. How many total measurements do you have?

EDIT:
As a test, you could try a t-test between just the groups with the highest and lowest average. You seem confident in your ability to do a t-test; see if you get a t-value that is outrageously large like your F-value was.
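A sketch of that sanity check in R, using made-up data and placeholder names `values` and `groups`. It is only a plausibility check on the size of the statistic, not a replacement for the ANOVA.

```r
# Hypothetical data; substitute your own measurements and labels.
values <- c(4.1, 3.9, 4.3, 4.0,   5.2, 5.0, 5.4,   4.6, 4.4, 4.8, 4.5, 4.7)
groups <- factor(rep(c("A", "B", "C"), times = c(4, 3, 5)))

# Identify the groups with the highest and lowest average
group_means <- tapply(values, groups, mean)
highest <- names(group_means)[which.max(group_means)]
lowest  <- names(group_means)[which.min(group_means)]

# Two-sample t-test (Welch's version is R's default) between the extreme groups
tt <- t.test(values[groups == highest], values[groups == lowest])
tt$statistic  # the t-value; compare its magnitude with your F-value
```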
 
Don't use Excel, for a variety of reasons.
“Meanwhile, researchers should continue to avoid using the statistical functions in Excel 2007 for any scientific purpose.”
- Yalta (2008), ref 1 below

“... it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.”
- McCullough and Heiser (2008), ref 2 below

"If you need to perform analysis of variance, avoid using Excel, unless you are dealing with extremely simple problems."
- Statistical Services Centre, Univ. of Reading, U.K. (at A, below)

"Excel is of very limited use in the formal statistical analysis of data unless your experimental design is very simple. . . . the "Data Analysis Toolpack" provided with Excel is no easier to use than most statistics packages, has very limited capability, has known bugs and so, on the whole, is not worth bothering with."
- Neil Cox, ref 7 below

"Enterprises should advise their scientists and professional statisticians not to use Microsoft Excel for substantive statistical analysis. Instead, enterprises should look for professional statistical analysis software certified to pass the (NIST) Statistical Reference Datasets tests to their users' required level of accuracy."

Problems have existed in Excel's statistical analysis from the earliest years, and most (if not all) have not been addressed.

In R: I don't know what you've named your variables, so I'll use these names.
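MyNumericalData - this is the vector that contains the measured values
MyCategories - this is the variable that contains the group label for each measurement (note: it has to be the same length as MyNumericalData, and it should be a factor; if your groups are coded as numbers, wrap them in factor() first, otherwise lm() fits a straight-line regression on the codes instead of a one-way ANOVA)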
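If the measurements start out as separate columns of different lengths, one way to build these two variables is utils::stack(), which turns a named list of vectors into a data frame with a `values` column and a factor column `ind`. The group vectors below are hypothetical; if your columns were read into a data frame padded with NA, you could stack() the data frame instead and drop the NA rows with na.omit().

```r
# Hypothetical measurements: one vector per group, of different lengths.
group1 <- c(4.1, 3.9, 4.3, 4.0)
group2 <- c(5.2, 5.0, 5.4)
group3 <- c(4.6, 4.4, 4.8, 4.5, 4.7)

# stack() returns a data frame: 'values' holds the measurements and
# 'ind' holds the name of the list element each value came from (a factor).
long <- stack(list(group1 = group1, group2 = group2, group3 = group3))

MyNumericalData <- long$values
MyCategories    <- long$ind

length(MyNumericalData) == length(MyCategories)  # TRUE
is.factor(MyCategories)                          # TRUE
```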

If you do this:

myfit<-lm(MyNumericalData~MyCategories)

and then do

anova(myfit)

you have your ANOVA table.
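A self-contained version of those two steps with made-up data, showing where the test result sits in the table. In the row for MyCategories, Df is the number of groups minus one, "F value" is the between-group mean square divided by the residual mean square, and "Pr(>F)" is the p-value; a small p-value suggests that at least one group mean differs from the others.

```r
# Hypothetical data in the layout described above.
MyNumericalData <- c(4.1, 3.9, 4.3, 4.0,  5.2, 5.0, 5.4,  4.6, 4.4, 4.8, 4.5, 4.7)
MyCategories    <- factor(rep(c("A", "B", "C"), times = c(4, 3, 5)))

myfit <- lm(MyNumericalData ~ MyCategories)
tab   <- anova(myfit)
print(tab)

# First row: the between-group test; second row: residuals (within groups).
F_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# The same one-way ANOVA via aov(), which prints an equivalent table
summary(aov(MyNumericalData ~ MyCategories))
```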

If you have the PDF manuals for R installed, look at "An Introduction to R", section "Statistical models in R", subsection "Linear models".
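The F-test in the ANOVA table only says whether the group means differ somewhere; it does not say which group is higher. A common follow-up is a post hoc comparison that controls the family-wise error rate, such as Tukey's honest significant difference. A sketch with made-up data:

```r
# Hypothetical data in the same layout as above.
MyNumericalData <- c(4.1, 3.9, 4.3, 4.0,  5.2, 5.0, 5.4,  4.6, 4.4, 4.8, 4.5, 4.7)
MyCategories    <- factor(rep(c("A", "B", "C"), times = c(4, 3, 5)))

# TukeyHSD() works on an aov fit rather than an lm fit.
myaov <- aov(MyNumericalData ~ MyCategories)
tk    <- TukeyHSD(myaov)

# One row per pair of groups: 'diff' is the difference in means,
# 'lwr'/'upr' the family-wise 95% confidence interval, 'p adj' the adjusted p-value.
tk$MyCategories

# Alternative: pairwise t-tests with a multiple-comparison adjustment.
pairwise.t.test(MyNumericalData, MyCategories, p.adjust.method = "holm")
```

A pair whose interval excludes zero (equivalently, small 'p adj') is a pair of groups whose means differ after accounting for the number of comparisons.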
 
