Comparing Proportions: Choosing the Right Statistical Test for Your Data

Chas3down · May 26, 2013

Sorry, no subforum for statistics so I posted it here..

Homework Statement

So, if I have a given list of proportions

n = 64
.11
.14
.16
.14
.13
.16
.16

and I want to compare it to another group of percentages

n = North American Average
.18
.19
.20
.18
.14
.06
.05

What type of test would I use?

Mandelbroth · May 26, 2013

Chas3down said:

So, if I have a given list of proportions and I want to compare it to another group of percentages, what type of test would I use?

[I condensed your question for clarity and to save space]

What kind of comparison are you looking for? I'm thinking it's just a two-sample z-test for proportions.

Chas3down · May 26, 2013

Mandelbroth said:

[I condensed your question for clarity and to save space]

What kind of comparison are you looking for? I'm thinking it's just a two-sample z-test for proportions.

Just to see if there is a significant difference between the proportions or not.

I am comparing car colors from my school against the national average. The first group (n=64) is 64 car colors I got from my school parking lot. The second group of proportions is the North American average.

I like Serena · May 26, 2013

Try a goodness-of-fit chi-square test.

Ray Vickson · May 26, 2013

Chas3down said:

Just to see if there is a significant difference between the proportions or not.

I am comparing car colors from my school against the national average. The first group (n=64) is 64 car colors I got from my school parking lot. The second group of proportions is the North American average.

Validity, etc., depends on the nature of your data.

For instance, how do you condense information about car colors into a proportion? Typically, a proportion like 0.11 could be thought of as the number of 'yes' vs. 'no' answers to some type of question. How does a car's color fit into that type of scheme? The point is that if the numbers you show represent some type of highly 'massaged' figures, their distribution may be an artifact of your data-aggregation method and not reflective of reality. So: show us how you obtained those figures.

Chas3down · May 26, 2013

Well, I collected 64 car color samples (I chose 64 so the sample would be large enough to generalize to my entire student parking lot). Then, for example, I had 10 black cars and did 10/64 to get the proportion of students who drove black cars to school that day.

Mandelbroth · May 26, 2013

Chas3down said:

Well, I collected 64 car color samples (I chose 64 so the sample would be large enough to generalize to my entire student parking lot). Then, for example, I had 10 black cars and did 10/64 to get the proportion of students who drove black cars to school that day.

How formal do you want your test to be? If you really want a formal test, state between what wavelengths of light you consider to be blue, etc. There are other things to consider, but that might be better if you want a boolean ("yes/no") answer. For example, "red" can be very subjective. You could argue that all pink cars are red.

Mandelbroth said:

[I condensed your question for clarity and to save space]

What kind of comparison are you looking for? I'm thinking it's just a two-sample z-test for proportions.

Excuse my haste. You can use a two-sample z-test, but it would be weird since you are working with more than one proportion. Use a chi-squared based test for goodness of fit. That should work better.

Chas3down · May 26, 2013

Mandelbroth said:

How formal do you want your test to be? If you really want a formal test, state between what wavelengths of light you consider to be blue, etc. There are other things to consider, but that might be better if you want a boolean ("yes/no") answer. For example, "red" can be very subjective. You could argue that all pink cars are red. Excuse my haste. You can use a two-sample z-test, but it would be weird since you are working with more than one proportion. Use a chi-squared based test for goodness of fit. That should work better.

It is pretty informal.

Okay, yeah, I was thinking of using a chi-squared test, thanks for your thoughts.

Chas3down · May 26, 2013

hmm.. quick addition..

Would I do a chi squared GOF for proportions the same as I would for non-proportions?

Is it just a summation of ((Observed proportion - Expected proportion)^2 / Expected proportion) to get my chi-squared value? And would my degrees of freedom just be the number of cars I used to get my data - 1?

I like Serena · May 26, 2013

Not quite.

The summation should be of ((Observed frequency - Expected frequency)^2 / Expected frequency).
You converted the frequencies to proportions, but you really need the frequencies.

The degrees of freedom is the number of colors - 1.
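
To make the formula concrete, here is a minimal Python sketch of the statistic computed from counts. The expected proportions are the North American figures from the first post. The observed tallies are hypothetical: the thread lists only rounded proportions, which do not pin down the exact counts, so these are made up to total 64. The function name chi_square_gof is also just illustrative. The sketch also checks that summing over proportions gives the same statistic divided by n, which is why the frequencies are needed.

```python
# Chi-square goodness-of-fit statistic from raw counts (standard library only).

def chi_square_gof(observed_counts, expected_props):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit test."""
    n = sum(observed_counts)
    # Expected frequencies stay fractional: n times each reference proportion.
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    # Degrees of freedom: number of categories (colors) minus 1.
    return stat, len(observed_counts) - 1

# North American proportions quoted in the thread.
expected_props = [0.18, 0.19, 0.20, 0.18, 0.14, 0.06, 0.05]

# HYPOTHETICAL tallies summing to 64; the thread gives only rounded proportions.
observed_counts = [7, 9, 10, 9, 8, 10, 11]

stat, df = chi_square_gof(observed_counts, expected_props)

# Summing over proportions instead of counts gives the statistic divided by n.
n = sum(observed_counts)
prop_sum = sum((o / n - p) ** 2 / p for o, p in zip(observed_counts, expected_props))

CRITICAL_05_DF6 = 12.592  # chi-square critical value for alpha = 0.05, df = 6
print(f"chi-square = {stat:.3f}, df = {df}")
print(f"proportion-based sum x n = {prop_sum * n:.3f}")
print("reject H0 at 5%" if stat > CRITICAL_05_DF6 else "fail to reject H0 at 5%")
```

With real tallies, only observed_counts needs to change; the critical value applies as long as there are seven color categories.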

Chas3down · May 26, 2013

I like Serena said:

Not quite.

The summation should be of ((Observed frequency - Expected frequency)^2 / Expected frequency).
You converted the frequencies to proportions, but you really need the frequencies.

The degrees of freedom is the number of colors - 1.

Oh gotcha, so I should convert the average car color proportions to frequencies out of 64?

I like Serena · May 26, 2013

Yep.

Chas3down · May 26, 2013

I like Serena said:

Yep.

Alright thanks, really a big help.. one final question, I should round all my expected car colors to whole numbers correct? Because you can't have 5.3 cars..

I like Serena · May 26, 2013

No. The expected cars should remain fractional.
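
A quick sketch of why, using the North American proportions from the first post and n = 64 (variable names are just illustrative): the fractional expected counts add up to the sample size, while the rounded ones do not.

```python
n = 64
# North American proportions quoted in the thread.
expected_props = [0.18, 0.19, 0.20, 0.18, 0.14, 0.06, 0.05]

# Expected frequencies: sample size times each reference proportion.
expected = [n * p for p in expected_props]
# What rounding to whole cars would give instead.
rounded = [round(e) for e in expected]

print("fractional:", [round(e, 2) for e in expected])
print("rounded:   ", rounded)
print("totals:    ", round(sum(expected), 2), "vs", sum(rounded))
```

Rounding here pushes the expected total to 65 cars for a sample of 64, so the rounded values no longer describe the same sample.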

Chas3down · May 26, 2013

I like Serena said:

No. The expected cars should remain fractional.

Okay, the only reason I thought it would be the other way was because my TI-84 would only accept whole numbers or else it would error out... time to do it by hand.

Really a big help, thanks a lot man.

I like Serena · May 26, 2013

Huh? I'd expect your TI-84 to require whole numbers for the observed frequencies, which should indeed be whole, but not for the expected frequencies.

Chas3down · May 26, 2013

/facepalm.. thanks put it in wrong lol.

I like Serena · May 26, 2013
