What does the one-way analysis of variance illustrate in this case?

In summary: the one-way ANOVA tests whether the mean price is the same across the three drive-wheels categories (4wd, fwd, rwd). A high F-score means that at least one group's mean price differs significantly from the others, but not which one; pairwise post hoc tests (e.g. Tukey's HSD) identify the specific pairs that differ. Because price is numeric and drive-wheels is categorical, this is called an "effect" rather than a correlation, and the ANOVA itself accounts for unequal group sizes (an unbalanced design).
  • #1
EngWiPy
Hello,

I was reading an example where the correlation between two features is to be measured. The two features are drive-wheels and price. The drive-wheels feature has three values: 4wd, fwd, and rwd. The example then went on to calculate the F-score using Python, without much detail. Without going into the code's details, conceptually, what does the one-way analysis of variance illustrate in this case? Is it meant to find the correlation between the drive-wheels feature as a whole and the price? And how? Why does a high F-score mean a strong correlation?

Thanks
 
  • #2
S_David said:
Without going into the code's details, conceptually, what does the one-way analysis of variance illustrate in this case?
The one-way ANOVA is basically a t-test between three groups instead of just two. You are just testing if the price for 4wd is the same as the price for fwd and the same as the price for rwd.
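For concreteness, here is a minimal sketch of such a test in Python using SciPy's f_oneway; the price values are hypothetical, not the actual dataset from the example.

Python:
# One-way ANOVA on price across the three drive-wheels groups.
# The numbers are made up for illustration only.
from scipy import stats

price_4wd = [9500, 11200, 10400, 12000]
price_fwd = [8900, 10100, 9800, 11500, 10700]
price_rwd = [18200, 21500, 19800, 23400, 20100]

f_score, p_value = stats.f_oneway(price_4wd, price_fwd, price_rwd)
print(f"F = {f_score:.2f}, p = {p_value:.4g}")
# A small p-value is evidence that at least one group mean differs from the others.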
 
  • #3
Dale said:
The one-way ANOVA is basically a t-test between three groups instead of just two. You are just testing if the price for 4wd is the same as the price for fwd and the same as the price for rwd.

And how is this related to the correlation?
 
  • #4
S_David said:
And how is this related to the correlation?
It's not, or rather, I suspect the word correlation may be a bit abused here.
We can only calculate a correlation if we have two numerical variables.
In this case one of the variables is categorical, meaning we cannot calculate a correlation.

We do an ANOVA F-test to tell if at least one of the 3 categories has a significantly different mean price than the others.
The higher the F-score, the more the 3 group mean prices deviate from the overall mean, relative to the variation within each group.
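A rough illustration of what the F-score measures, using made-up numbers: the spread of the group means around the grand mean, relative to the pooled variation within the groups.

Python:
import numpy as np

# Hypothetical prices per drive-wheels group (illustration only).
groups = [
    np.array([9500.0, 11200.0, 10400.0, 12000.0]),           # 4wd
    np.array([8900.0, 10100.0, 9800.0, 11500.0, 10700.0]),   # fwd
    np.array([18200.0, 21500.0, 19800.0, 23400.0, 20100.0]), # rwd
]

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)    # spread of group means around the grand mean
ms_within = ss_within / (N - k)      # pooled within-group variance
print(f"F = {ms_between / ms_within:.2f}")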
 
  • #5
I like Serena said:
...
The higher the F-score, the more the 3 group mean prices deviate from the overall mean, relative to the variation within each group.

What does this imply? What is the importance of this information?

I think correlation here means whether the price changes significantly with the drive-wheels type. I suspect that if the drive-wheels types have very different mean prices, this implies that the car's price is affected by the drive-wheels type, and there is a "correlation". But if they all have the same mean price, then the car's price doesn't depend on the drive-wheels type. Is this a valid interpretation?
 
  • #6
S_David said:
What does this imply? What is the importance of this information?

I think correlation here means whether the price changes significantly with the drive-wheels type. I suspect that if the drive-wheels types have very different mean prices, this implies that the car's price is affected by the drive-wheels type, and there is a "correlation". But I am not sure!
It doesn't tell us which of the prices deviates. Two of them may even be the same.
Still, if the ANOVA F-score tells us that at least one of the mean prices is significantly different, we can follow up with dedicated t-tests for each pair of categories to tell which categories actually differ significantly in price.
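A sketch of those follow-up pairwise comparisons, again on hypothetical prices; in practice the p-values would be adjusted for multiple comparisons (e.g. Bonferroni), or a dedicated post hoc procedure would be used (see further down the thread).

Python:
from itertools import combinations
from scipy import stats

# Hypothetical prices (illustration only).
prices = {
    "4wd": [9500, 11200, 10400, 12000],
    "fwd": [8900, 10100, 9800, 11500, 10700],
    "rwd": [18200, 21500, 19800, 23400, 20100],
}

# Welch's t-test for every pair of drive-wheels categories.
for (name_a, a), (name_b, b) in combinations(prices.items(), 2):
    t, p = stats.ttest_ind(a, b, equal_var=False)
    print(f"{name_a} vs {name_b}: t = {t:.2f}, p = {p:.4g}")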
 
  • #7
S_David said:
... Is this a valid interpretation?
Essentially, yes. With the small point that you shouldn’t call it a “correlation”. In your case price is a numeric variable and drive-wheels is a categorical variable. A correlation is between two numeric variables. For a categorical variable it is called an “effect” (which unfortunately makes it sound like it is identifying a cause and effect relationship, but nevertheless that is the traditional usage)
 
  • #8
I like Serena said:
It doesn't tell us which of the prices deviates. Two of them may even be the same.
Still, if the ANOVA F-score tells us that at least one of the mean prices is significantly different, we can follow up with dedicated t-tests for each pair of categories to tell which categories actually differ significantly in price.

Right, the fwd and rwd pair has the highest F-score. Again: what does this mean?
 
  • #9
Dale said:
Essentially, yes. With the small point that you shouldn’t call it a “correlation”. In your case price is a numeric variable and drive-wheels is a categorical variable. A correlation is between two numeric variables. For a categorical variable it is called an “effect” (which unfortunately makes it sound like it is identifying a cause and effect relationship, but nevertheless that is the traditional usage)
One may, though, use a polychoric factor analysis to find an actual numerical correlation between underlying latent constructs, as with depression, intelligence, etc.
 
  • #10
S_David said:
Right, the fwd and rwd pair has the highest F-score. Again: what does this mean?
That is the most significant effect (the effect least likely to be 0)
 
  • #11
Dale said:
That is the most significant effect (the effect least likely to be 0)

Yes, but how do we translate this into words for the example at hand? Can we say, for instance, that the price changes significantly depending on whether the drive wheels are front (fwd) or rear (rwd), but doesn't change significantly on average for the other pairs, e.g., the average price of fwd and 4wd is almost the same? If so, how do we account for the number of instances of each type in the dataset? For example, fwd and rwd may each have many instances, such that the price varies considerably within each type (and thus the averages will differ), while 4wd may have only a few instances.
 
  • #12
S_David said:
Yes, but how do we translate this into words for the example at hand?
Personally, based on what you have described of your statistics, if I were writing a scientific paper on the topic I would only say “drive-wheel had a significant (p=?) effect on price”. That is all that you are statistically justified in saying from an ANOVA. You can report the three group means and the grand mean, but you cannot claim that any specific contrast is significant.

If you wish to make specific comparisons between groups then you would perform a post hoc test (my favorite is Tukey’s Honestly Significant Difference or HSD). That would allow you to make specific comparisons between 4wd and fwd etc.

S_David said:
If so, how do we account for the number of instances of each type in the dataset?
The ANOVA automatically accounts for that, but you should report that your design was “unbalanced”
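A sketch of the Tukey HSD post hoc test mentioned above, here via statsmodels; the group sizes are deliberately unequal to mimic an unbalanced design, and the prices are hypothetical rather than the thread's actual dataset.

Python:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical prices; note the unequal group sizes (unbalanced design).
prices = [9500, 11200, 10400,                        # 4wd (few instances)
          8900, 10100, 9800, 11500, 10700, 9300,     # fwd
          18200, 21500, 19800, 23400, 20100, 19200]  # rwd
labels = ["4wd"] * 3 + ["fwd"] * 6 + ["rwd"] * 6

# One row per pair of groups: mean difference, adjusted p-value, reject H0?
result = pairwise_tukeyhsd(endog=prices, groups=labels, alpha=0.05)
print(result)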
 
  • #13
Dale said:
Personally, based on what you have described of your statistics, if I were writing a scientific paper on the topic I would only say “drive-wheel had a significant (p=?) effect on price”. That is all that you are statistically justified in saying from an ANOVA. You can report the three group means and the grand mean, but you cannot claim that any specific contrast is significant.

If you wish to make specific comparisons between groups then you would perform a post hoc test (my favorite is Tukey’s Honestly Significant Difference or HSD). That would allow you to make specific comparisons between 4wd and fwd etc.

The ANOVA automatically accounts for that, but you should report that your design was “unbalanced”

The p-value was on the order of ##10^{-20}##. Thank you.
 
  • #14
S_David said:
The p-value was on the order of ##10^{-20}##. Thank you.
So I would report that as “drive-wheel had a significant (p<0.0001) effect on price”
 

1. What is One-Way Analysis of Variance (ANOVA)?

One-Way ANOVA is a statistical method used to compare the means of two or more groups. It tests whether there are significant differences among the group means; follow-up (post hoc) tests are then used to determine which group(s) are responsible for the differences.

2. When should I use One-Way ANOVA?

One-Way ANOVA is appropriate when you have one independent variable (with two or more levels) and a continuous dependent variable. It tests for differences between the group means and is commonly used in both experimental and observational studies.

3. How do I interpret the results of One-Way ANOVA?

The results of One-Way ANOVA include a p-value, which indicates the probability of obtaining differences between the groups at least as large as those observed if the group means were in fact equal. If the p-value is less than the chosen significance level (usually 0.05), there is evidence that the group means differ. Further post-hoc tests may then be conducted to determine which groups are significantly different from each other.
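As a minimal sketch of that decision rule, assuming SciPy's f_oneway as the ANOVA routine and hypothetical data:

Python:
from scipy import stats

alpha = 0.05  # chosen significance level

# Hypothetical samples for three groups.
group_a = [9500, 11200, 10400, 12000]
group_b = [8900, 10100, 9800, 11500, 10700]
group_c = [18200, 21500, 19800, 23400, 20100]

f_score, p_value = stats.f_oneway(group_a, group_b, group_c)
if p_value < alpha:
    print("At least one group mean differs; follow up with a post hoc test.")
else:
    print("No evidence that the group means differ.")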

4. What is the difference between One-Way ANOVA and Two-Way ANOVA?

The main difference between One-Way ANOVA and Two-Way ANOVA is the number of independent variables. One-Way ANOVA has one independent variable, while Two-Way ANOVA has two. In One-Way ANOVA you compare the means of groups defined by a single factor, whereas in Two-Way ANOVA you examine how two independent variables (and possibly their interaction) affect the dependent variable.
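A sketch contrasting the two designs with statsmodels; the second factor ("aspiration") and all values are hypothetical, chosen only to illustrate the model formulas.

Python:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data with one numeric outcome and two categorical factors.
df = pd.DataFrame({
    "price":      [9500, 11200, 8900, 10100, 18200, 21500, 10400, 9800, 19800, 23400],
    "drive":      ["4wd", "4wd", "fwd", "fwd", "rwd", "rwd", "4wd", "fwd", "rwd", "rwd"],
    "aspiration": ["std", "turbo", "std", "turbo", "std", "turbo", "std", "std", "turbo", "std"],
})

one_way = ols("price ~ C(drive)", data=df).fit()                  # one factor
two_way = ols("price ~ C(drive) + C(aspiration)", data=df).fit()  # two factors

print(sm.stats.anova_lm(one_way, typ=2))
print(sm.stats.anova_lm(two_way, typ=2))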

5. What is the assumption of homogeneity of variance in One-Way ANOVA?

The assumption of homogeneity of variance in One-Way ANOVA is that the variance of the dependent variable is equal across all groups, i.e., the variability within each group is similar. Violating this assumption can lead to inaccurate conclusions and may require an alternative statistical test.
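A sketch of checking that assumption with Levene's test in SciPy, again on hypothetical prices.

Python:
from scipy import stats

# Hypothetical prices per group (illustration only).
price_4wd = [9500, 11200, 10400, 12000]
price_fwd = [8900, 10100, 9800, 11500, 10700]
price_rwd = [18200, 21500, 19800, 23400, 20100]

stat, p = stats.levene(price_4wd, price_fwd, price_rwd)
print(f"Levene W = {stat:.2f}, p = {p:.3f}")
# A small p-value suggests unequal variances; an alternative such as
# Welch's ANOVA or the Kruskal-Wallis test may then be preferable.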
