What does the one-way analysis of variance illustrate in this case?

EngWiPy · Dec 9, 2017

Hello,

I was reading an example that measures the correlation between two features, drive-wheels and price. The drive-wheels feature takes three values: 4wd, fwd, and rwd. The example then calculates the F-score in Python without much detail. Without going into the code's details, conceptually, what does the one-way analysis of variance illustrate in this case? Does it find the correlation between the drive-wheels feature as a whole and the price? If so, how? And why does a high F-score mean a strong correlation?

Thanks

Dale · Dec 9, 2017

S_David said:

Without going into the code's details, conceptually, what does the one-way analysis of variance illustrate in this case?

The one-way ANOVA is basically a t-test between three groups instead of just two. You are just testing whether the mean price for 4wd is the same as the mean price for fwd and the same as the mean price for rwd.

EngWiPy · Dec 9, 2017

Dale said:

The one-way ANOVA is basically a t-test between three groups instead of just two. You are just testing whether the mean price for 4wd is the same as the mean price for fwd and the same as the mean price for rwd.

And how is this related to the correlation?

I like Serena · Dec 9, 2017

S_David said:

And how is this related to the correlation?

It's not, or rather, I suspect the word correlation may be a bit abused here.
We can only calculate a correlation if we have two numerical variables.
In this case one of the variables is categorical, which means we cannot calculate a correlation.

We do an ANOVA F-test to tell if at least one of the 3 categories has a significantly different price than the others.
The higher the F-score, the more the 3 group mean prices deviate from the overall mean, relative to the spread of prices within each group.
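
To make the F-score concrete, here is a minimal sketch that computes it by hand using only the standard library. The prices are made up for illustration and are not the dataset from the example, and `one_way_f` is a hypothetical helper name. SciPy's `scipy.stats.f_oneway` is one library routine that reports this statistic together with a p-value; whether the example used it is an assumption.

```python
from statistics import mean

# Hypothetical prices (NOT the thread's dataset), grouped by drive-wheels.
prices = {
    "4wd": [7600, 9200, 11700],
    "fwd": [6500, 7900, 8200, 9100, 10300],
    "rwd": [13500, 16500, 18900, 22000],
}


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Between-group variation: squared distance of each group mean from
    # the grand mean, weighted by the group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-group variation: squared distance of each observation from
    # its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f_score = one_way_f(list(prices.values()))
print(f"F = {f_score:.2f}")
```

A large F means the group means are far apart compared with the scatter inside each group. If all group means were equal, the between-group term would be zero and so would F.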

EngWiPy · Dec 9, 2017

I like Serena said:

...
The higher the F-score, the more the 3 group mean prices deviate from the overall mean, relative to the spread of prices within each group.

What does this imply? What is the importance of this information?

I think correlation here means if the price changes significantly with the drive-wheels type. I suspect that if the drive-wheels types have very different mean prices, this implies that the car's price is affected by the drive-wheels type, and there is a "correlation". But if all have the same mean price, then the car's price doesn't depend on the drive-wheels type. Is this a valid interpretation?

I like Serena · Dec 9, 2017

S_David said:

What does this imply? What is the importance of this information?

I think correlation here means if the price changes significantly with the drive-wheels type. I suspect that if the drive-wheels types have very different mean prices, this implies that the car's price is affected by the drive-wheels, and there is a "correlation". But I am not sure!

It doesn't tell us which of the prices deviates. Two of them may even still be the same.
Still, if the ANOVA F-score tells us that at least one of the prices is significantly different from the mean, we can follow up with dedicated t-tests for each pair of categories to tell which categories are actually significantly different in price.
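
As a rough sketch of that follow-up step, the snippet below computes a two-sample t statistic for every pair of categories. It assumes the pooled-variance (Student) form of the t-test, uses made-up prices rather than the example's data, and `pooled_t` is a hypothetical helper name. It stops at the t statistic: turning it into a p-value needs the t distribution, and running several pairwise tests calls for a multiple-comparison correction.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Hypothetical prices (NOT the thread's dataset), grouped by drive-wheels.
prices = {
    "4wd": [7600, 9200, 11700],
    "fwd": [6500, 7900, 8200, 9100, 10300],
    "rwd": [13500, 16500, 18900, 22000],
}


def pooled_t(a, b):
    """Student's two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# One t statistic per unordered pair of drive-wheels categories.
pairwise_t = {
    (g1, g2): pooled_t(prices[g1], prices[g2])
    for g1, g2 in combinations(prices, 2)
}

for (g1, g2), t in pairwise_t.items():
    print(f"{g1} vs {g2}: t = {t:.2f}")
```

The sign of each t only reflects which group of the pair has the larger mean. The magnitude is what indicates how clearly the two categories differ in price.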

Dale · Dec 9, 2017

S_David said:

... Is this a valid interpretation?

Essentially, yes, with the small caveat that you shouldn’t call it a “correlation”. In your case price is a numeric variable and drive-wheels is a categorical variable. A correlation is between two numeric variables. For a categorical variable it is called an “effect” (which unfortunately makes it sound like it identifies a cause-and-effect relationship, but that is nevertheless the traditional usage).

EngWiPy · Dec 9, 2017

I like Serena said:

It doesn't tell us which of the prices deviates. Two of them may even still be the same.
Still, if the ANOVA F-score tells us that at least one of the prices is significantly different from the mean, we can follow up with dedicated t-tests for each pair of categories to tell which categories are actually significantly different in price.

Right, the fwd and rwd pair has the highest F-score. Again: what does this mean?

WWGD · Dec 9, 2017

Dale said:

Essentially, yes, with the small caveat that you shouldn’t call it a “correlation”. In your case price is a numeric variable and drive-wheels is a categorical variable. A correlation is between two numeric variables. For a categorical variable it is called an “effect” (which unfortunately makes it sound like it identifies a cause-and-effect relationship, but that is nevertheless the traditional usage).

One may, though, use a polychoric factor analysis to find an actual numerical correlation between underlying latent constructs, as is done for depression, intelligence, etc.

Dale · Dec 9, 2017

S_David said:

Right, the fwd and rwd pair has the highest F-score. Again: what does this mean?

That is the most significant effect (the effect least likely to be 0).

EngWiPy · Dec 10, 2017

Dale said:

That is the most significant effect (the effect least likely to be 0).

Yes, but how do we translate this into words for the example at hand? Can we say, for instance, that the price changes significantly depending on whether the drive-wheels are front (fwd) or rear (rwd), but doesn't change significantly on average for the other pairs, e.g., the average prices of fwd and 4wd are almost the same? If so, how do we account for the number of instances of each type in the dataset? For example, fwd and rwd may each have many instances, so that the price varies significantly within each type (and thus the averages will differ), while 4wd may have few instances.

Dale · Dec 10, 2017

S_David said:

Yes, but how do we translate this into words for the example at hand?

Personally, based on what you have described of your statistics, if I were writing a scientific paper on the topic I would only say “drive-wheel had a significant (p=?) effect on price”. That is all that you are statistically justified in saying from an ANOVA. You can report the three group means and the grand mean, but you cannot claim that any specific contrast is significant.

If you wish to make specific comparisons between groups then you would perform a post hoc test (my favorite is Tukey’s Honestly Significant Difference or HSD). That would allow you to make specific comparisons between 4wd and fwd etc.

S_David said:

If so, how do we account for the number of instances of each type in the dataset?

The ANOVA automatically accounts for that, but you should report that your design was “unbalanced”.
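
One way to see why the group sizes are handled automatically is to write out the F statistic. This is the standard one-way ANOVA formula, not something quoted from the example, and the symbols are notation introduced here: ##k## is the number of groups, ##n_i## the number of cars in group ##i##, ##N## the total number of cars, ##y_{ij}## the price of car ##j## in group ##i##, ##\bar{y}_i## the mean price of group ##i##, and ##\bar{y}## the grand mean price.

$$F = \frac{\sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 / (k-1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 / (N-k)}$$

In the numerator each group's squared deviation from the grand mean is weighted by its size ##n_i##, and in the denominator each group contributes ##n_i - 1## degrees of freedom, so a group with few instances counts for less than a large one.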

EngWiPy · Dec 10, 2017

Dale said:

Personally, based on what you have described of your statistics, if I were writing a scientific paper on the topic I would only say “drive-wheel had a significant (p=?) effect on price”. That is all that you are statistically justified in saying from an ANOVA. You can report the three group means and the grand mean, but you cannot claim that any specific contrast is significant.

If you wish to make specific comparisons between groups then you would perform a post hoc test (my favorite is Tukey’s Honestly Significant Difference or HSD). That would allow you to make specific comparisons between 4wd and fwd etc.

The ANOVA automatically accounts for that, but you should report that your design was “unbalanced”.

The p-value was on the order of ##10^{-20}##. Thank you.

Dale · Dec 10, 2017

S_David said:

The p-value was on the order of ##10^{-20}##. Thank you.

So I would report that as “drive-wheel had a significant (p<0.0001) effect on price”.
