What does the one-way analysis of variance illustrate in this case?

  • Context: Undergrad
  • Thread starter: EngWiPy
  • Tags: Analysis, Variance

Discussion Overview

The discussion revolves around the conceptual understanding of one-way analysis of variance (ANOVA) in the context of measuring the relationship between the categorical variable of drive-wheels (4wd, fwd, rwd) and the continuous variable of price. Participants explore how ANOVA can illustrate differences in price across these categories and the implications of the F-score in this analysis.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that one-way ANOVA is essentially a t-test extended from two groups to three, here comparing mean prices across the three drive-wheel groups.
  • Others argue that correlation is not applicable here since one variable is categorical, suggesting that the term "effect" is more appropriate than "correlation."
  • A participant suggests that a high F-score indicates significant deviation in prices among the groups, implying that the drive-wheels type may affect price.
  • Some participants express uncertainty about interpreting the results, questioning how to articulate the significance of the findings in relation to the dataset's structure and instance counts.
  • There is mention of the need for post hoc tests to make specific comparisons between groups if ANOVA indicates significant differences.
  • One participant notes that ANOVA accounts for unbalanced designs in the dataset.
  • Another participant highlights the importance of reporting the p-value associated with the ANOVA results.

Areas of Agreement / Disagreement

Participants generally agree that one-way ANOVA can indicate significant differences in price based on drive-wheels type, but there is disagreement on the interpretation of these results, particularly regarding the use of the term "correlation." The discussion remains unresolved on how to best articulate the implications of the findings.

Contextual Notes

Participants note that correlation cannot be calculated due to the categorical nature of one variable, and there are discussions about the implications of the dataset's structure, including the number of instances for each drive-wheels type.

Who May Find This Useful

This discussion may be useful for individuals interested in statistical analysis, particularly those exploring the application of ANOVA in comparing categorical and continuous variables in datasets.

EngWiPy
Hello,

I was reading an example where the correlation between two features is to be measured. The two features are drive-wheels and price. The drive-wheels feature has three values: 4wd, fwd, and rwd. The example then calculated the F-score in Python without much detail. Without going into the code's details, conceptually, what does the one-way analysis of variance illustrate in this case? Does it find the correlation between the drive-wheels feature as a whole and the price? If so, how? Why does a high F-score mean a strong correlation?

Thanks
 
S_David said:
Without going into the code's details, conceptually, what does the one-way analysis of variance illustrate in this case?
The one-way ANOVA is basically a t-test between three groups instead of just two. You are just testing whether the mean price for 4wd is the same as the mean price for fwd and the same as the mean price for rwd.
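To make the "t-test between three groups" comparison concrete, here is a minimal sketch assuming NumPy and SciPy are available. The prices are made up for illustration and are not from the dataset discussed in this thread. With only two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic and the two p-values coincide; `scipy.stats.f_oneway` then accepts all three groups in one call.

```python
import numpy as np
from scipy import stats

# Made-up prices for illustration only; not the thread's dataset.
four_wd = np.array([12.0, 9.5, 13.0])
fwd = np.array([7.0, 8.5, 9.0, 10.5, 8.0])
rwd = np.array([15.0, 17.5, 14.0, 20.0, 16.5])

# Two groups: classic Student t-test with pooled variance ...
t_stat, p_t = stats.ttest_ind(fwd, rwd, equal_var=True)
# ... and a one-way ANOVA on the same two groups.
f_two, p_f_two = stats.f_oneway(fwd, rwd)
print(f"t^2 = {t_stat**2:.4f}  F = {f_two:.4f}")
print(f"p (t-test) = {p_t:.3g}  p (ANOVA) = {p_f_two:.3g}")

# Three groups: a single F-test of "all three mean prices are equal".
f_three, p_three = stats.f_oneway(four_wd, fwd, rwd)
print(f"3-group ANOVA: F = {f_three:.4f}, p = {p_three:.3g}")
```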
 
Dale said:
The one-way ANOVA is basically a t-test between three groups instead of just two. You are just testing whether the mean price for 4wd is the same as the mean price for fwd and the same as the mean price for rwd.

And how is this related to the correlation?
 
S_David said:
And how is this related to the correlation?
It's not, or rather, I suspect the word correlation may be a bit abused here.
We can only calculate a correlation if we have two numerical variables.
In this case one of the variables is categorical, meaning we cannot calculate a correlation.

We do an ANOVA F-test to tell whether at least one of the 3 categories has a significantly different mean price than the others.
The higher the F-score, the more the 3 group mean prices deviate from the overall mean, relative to how much prices vary within each group.
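Concretely, F is the ratio of the between-group mean square to the within-group mean square, so it is large when the group means sit far apart compared with the scatter of prices inside each group. A minimal sketch with made-up prices (not the thread's data), checking a by-hand computation against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# Made-up prices for illustration only (not the thread's dataset).
groups = {
    "4wd": np.array([12.0, 9.5, 13.0]),
    "fwd": np.array([7.0, 8.5, 9.0, 10.5, 8.0]),
    "rwd": np.array([15.0, 17.5, 14.0, 20.0, 16.5]),
}

all_prices = np.concatenate(list(groups.values()))
grand_mean = all_prices.mean()
k = len(groups)        # number of groups
n = all_prices.size    # total number of cars

# Between groups: distance of each group mean from the grand mean, weighted by group size.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Within groups: scatter of each price around its own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# F = (SS_between / (k - 1)) / (SS_within / (n - k))
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_value = stats.f_oneway(*groups.values())
print(f"F (by hand) = {f_manual:.4f}, F (scipy) = {f_scipy:.4f}, p = {p_value:.3g}")
```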
 
I like Serena said:
...
The higher the F-score, the more the 3 group mean prices deviate from the overall mean, relative to how much prices vary within each group.

What does this imply? What is the importance of this information?

I think correlation here means if the price changes significantly with the drive-wheels type. I suspect that if the drive-wheels types have very different mean prices, this implies that the car's price is affected by the drive-wheels type, and there is a "correlation". But if all have the same mean price, then the car's price doesn't depend on the drive-wheels type. Is this a valid interpretation?
 
S_David said:
What does this imply? What is the importance of this information?

I think correlation here means if the price changes significantly with the drive-wheels type. I suspect that if the drive-wheels types have very different mean prices, this implies that the car's price is affected by the drive-wheels, and there is a "correlation". But I am not sure!
It doesn't tell which of the prices deviates. Two of them may still even be the same.
Still, if the ANOVA F-score tells us that at least one of the prices is significantly different from the mean, we can follow up with dedicated t-tests for each pair of categories to tell which categories are actually significantly different in price.
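The follow-up pairwise t-tests could look like the sketch below, again on made-up prices. Adjusting the p-values with a Bonferroni correction is an addition of mine, not something stated in the post: with three pairs, running uncorrected tests raises the chance of at least one false positive.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Made-up prices for illustration only (not the thread's dataset).
groups = {
    "4wd": np.array([12.0, 9.5, 13.0]),
    "fwd": np.array([7.0, 8.5, 9.0, 10.5, 8.0]),
    "rwd": np.array([15.0, 17.5, 14.0, 20.0, 16.5]),
}

pairs = list(combinations(groups, 2))  # ("4wd", "fwd"), ("4wd", "rwd"), ("fwd", "rwd")
m = len(pairs)                         # number of comparisons
results = {}
for a, b in pairs:
    t_stat, p_raw = stats.ttest_ind(groups[a], groups[b])  # pooled-variance t-test
    p_bonf = min(1.0, p_raw * m)                            # Bonferroni adjustment
    results[(a, b)] = (t_stat, p_raw, p_bonf)
    print(f"{a} vs {b}: t = {t_stat:.3f}, raw p = {p_raw:.3g}, Bonferroni p = {p_bonf:.3g}")
```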
 
S_David said:
... Is this a valid interpretation?
Essentially, yes. With the small point that you shouldn’t call it a “correlation”. In your case price is a numeric variable and drive-wheels is a categorical variable. A correlation is between two numeric variables. For a categorical variable it is called an “effect” (which unfortunately makes it sound like it is identifying a cause-and-effect relationship, but nevertheless that is the traditional usage).
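If a single correlation-like number is still wanted for a categorical predictor, one standard option is the correlation ratio η (eta), whose square η² = SS_between / SS_total is the share of price variance explained by the drive-wheel group. The sketch below uses made-up prices and shows that η² equals the R² of a least-squares regression of price on one dummy column per category. It is an effect size, not a Pearson correlation, and like the ANOVA it says nothing about cause.

```python
import numpy as np

# Made-up data for illustration only: one drive-wheel label per car, plus its price.
labels = np.array(["4wd"] * 3 + ["fwd"] * 5 + ["rwd"] * 5)
prices = np.array([12.0, 9.5, 13.0,
                   7.0, 8.5, 9.0, 10.5, 8.0,
                   15.0, 17.5, 14.0, 20.0, 16.5])

levels = np.unique(labels)
grand_mean = prices.mean()
ss_total = ((prices - grand_mean) ** 2).sum()
ss_between = sum(
    (labels == lvl).sum() * (prices[labels == lvl].mean() - grand_mean) ** 2
    for lvl in levels
)
eta_squared = ss_between / ss_total  # share of price variance explained by the category

# Same number via regression on one-hot (dummy) columns, one per category, no extra intercept.
X = (labels[:, None] == levels[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(X, prices, rcond=None)
fitted = X @ coef                    # fitted values are the group means
r_squared = 1 - ((prices - fitted) ** 2).sum() / ss_total
print(f"eta^2 = {eta_squared:.4f}, R^2 from dummy regression = {r_squared:.4f}")
```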
 
I like Serena said:
It doesn't tell which of the prices deviates. Two of them may still even be the same.
Still, if the ANOVA F-score tells us that at least one of the prices is significantly different from the mean, we can follow up with dedicated t-tests for each pair of categories to tell which categories are actually significantly different in price.

Right, the fwd and rwd pair has the highest F-score. Again, what does this mean?
 
Dale said:
Essentially, yes. With the small point that you shouldn’t call it a “correlation”. In your case price is a numeric variable and drive-wheels is a categorical variable. A correlation is between two numeric variables. For a categorical variable it is called an “effect” (which unfortunately makes it sound like it is identifying a cause-and-effect relationship, but nevertheless that is the traditional usage).
One may, though, use a polychoric factor analysis to find an actual numerical correlation between underlying latent constructs, as in depression, intelligence, etc.
 
S_David said:
Right, the fwd and rwd pair has the highest F-score. Again, what does this mean?
That is the most significant effect (the effect least likely to be 0)
 
Dale said:
That is the most significant effect (the effect least likely to be 0)

Yes, but how do we translate this into words for the example at hand? Can we say, for instance, that the price changes significantly depending on whether the drive-wheels are front (fwd) or rear (rwd), but doesn't change significantly on average for the other pairs, e.g., the average prices of fwd and 4wd are almost the same? If so, how do we account for the number of instances in the dataset for each type? For example, fwd and rwd may each have many examples, such that the price varies significantly within each type (and thus the average will be different), while 4wd may have only a few instances.
 
S_David said:
Yes, but how do we translate this into words for the example at hand?
Personally, based on what you have described of your statistics, if I were writing a scientific paper on the topic I would only say “drive-wheel had a significant (p=?) effect on price”. That is all that you are statistically justified in saying from an ANOVA. You can report the three group means and the grand mean, but you cannot claim that any specific contrast is significant.

If you wish to make specific comparisons between groups then you would perform a post hoc test (my favorite is Tukey’s Honestly Significant Difference or HSD). That would allow you to make specific comparisons between 4wd and fwd etc.
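A minimal sketch of Tukey's HSD, assuming SciPy 1.8 or later (where `scipy.stats.tukey_hsd` was added) and using made-up prices rather than the thread's data; `pairwise_tukeyhsd` in statsmodels is an alternative. Each entry of `res.pvalue` is adjusted for the whole family of pairwise comparisons, so no separate correction is applied here.

```python
import numpy as np
from scipy import stats  # stats.tukey_hsd needs SciPy >= 1.8

# Made-up prices for illustration only (not the thread's dataset).
names = ["4wd", "fwd", "rwd"]
samples = [
    np.array([12.0, 9.5, 13.0]),
    np.array([7.0, 8.5, 9.0, 10.5, 8.0]),
    np.array([15.0, 17.5, 14.0, 20.0, 16.5]),
]

res = stats.tukey_hsd(*samples)
# res.pvalue[i, j] is the family-wise adjusted p-value for group i vs group j.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        diff = samples[i].mean() - samples[j].mean()
        print(f"{names[i]} vs {names[j]}: mean difference = {diff:.2f}, "
              f"adjusted p = {res.pvalue[i, j]:.3g}")
```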

S_David said:
If so, how do we account for the number of instances in the dataset for each type?
The ANOVA automatically accounts for that, but you should report that your design was “unbalanced”.
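In a one-way ANOVA the unequal group sizes enter through the size weights n_k in the between-group sum of squares, which is how the test accounts for an unbalanced design. A related reporting detail: the grand mean over all cars is the size-weighted average of the group means, not their simple average. A small sketch with made-up prices (the group sizes 3, 5 and 5 are assumptions for illustration):

```python
import numpy as np

# Made-up prices for illustration only (not the thread's dataset).
groups = {
    "4wd": np.array([12.0, 9.5, 13.0]),
    "fwd": np.array([7.0, 8.5, 9.0, 10.5, 8.0]),
    "rwd": np.array([15.0, 17.5, 14.0, 20.0, 16.5]),
}

sizes = {name: g.size for name, g in groups.items()}
group_means = {name: g.mean() for name, g in groups.items()}
balanced = len(set(sizes.values())) == 1

# Grand mean over all cars = size-weighted average of the group means.
grand_mean = np.concatenate(list(groups.values())).mean()
# Simple average of the group means, which ignores the group sizes.
unweighted_mean = np.mean(list(group_means.values()))

print("design:", "balanced" if balanced else "unbalanced", sizes)
for name in groups:
    print(f"{name}: n = {sizes[name]}, mean = {group_means[name]:.2f}")
print(f"grand mean = {grand_mean:.3f}, unweighted mean of group means = {unweighted_mean:.3f}")
```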
 
Dale said:
...
The ANOVA automatically accounts for that, but you should report that your design was “unbalanced”

The p-value was on the order of ##10^{-20}##. Thank you.
 
S_David said:
The p-value was on the order of ##10^{-20}##. Thank you.
So I would report that as “drive-wheel had a significant (p<0.0001) effect on price”.
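The p-value itself is the upper-tail probability of the F distribution with k - 1 and N - k degrees of freedom (k groups, N cars). Below is a hypothetical helper (`report_anova` is my own name, not a library function) that computes p from an F statistic with `scipy.stats.f.sf` and applies the p<0.0001 reporting floor; the inputs in the usage line are made up, not the thread's actual F or sample size.

```python
from scipy import stats


def report_anova(f_stat, k, n, floor=1e-4):
    """Return (p, text) for a one-way ANOVA with k groups and n observations in total."""
    df_between, df_within = k - 1, n - k
    # p-value = upper-tail probability of the F distribution with (k - 1, n - k) df.
    p = stats.f.sf(f_stat, df_between, df_within)
    p_text = f"p<{floor:g}" if p < floor else f"p={p:.4f}"
    return p, f"F({df_between}, {df_within}) = {f_stat:.2f}, {p_text}"


# Hypothetical inputs (not the thread's actual F statistic or sample size).
p, text = report_anova(70.0, k=3, n=200)
print(text)
```

Reporting a floor such as p<0.0001 rather than a value like ##10^{-20}## avoids implying precision the distributional assumptions cannot support.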
 