Population proportion from a sample estimate

1. Jul 26, 2015

I would like to know which statistical test is appropriate for drawing conclusions about a population proportion from a sample taken from that population.
For example, in a sample of 20 people, 7 prefer the color red and 13 prefer blue. Which of the following methods should be used to decide whether there is a real color preference in the population?
1) A two-tailed binomial test assuming a success rate of 0.5: calculate the sum of the probabilities for x=0 to 7 and compare it with 0.05.
2) Estimate the population proportion from the sample proportion (7/20) using a z-test for a single population proportion, and check whether 0.5 lies within the calculated range.
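
Option 1 can be sketched in a few lines of Python, using only the standard library. The variable and function names are illustrative and not from the thread. One caveat: for a two-tailed test at the 0.05 level, the one-tail sum for x=0 to 7 would be compared with 0.025, or equivalently doubled and compared with 0.05. The sketch doubles it, which is valid here because the null distribution with a success rate of 0.5 is symmetric.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, x, p0 = 20, 7, 0.5  # sample from the question; p0 is the null success rate

# Lower-tail probability P(X <= 7) if there is no preference
lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))

# The null distribution is symmetric when p0 = 0.5, so doubling the tail
# gives a two-sided p-value (capped at 1).
p_two_sided = min(1.0, 2 * lower_tail)

print(f"P(X <= {x}) = {lower_tail:.4f}")      # 0.1316
print(f"two-sided p  = {p_two_sided:.4f}")    # 0.2632
```

Since 0.2632 is well above 0.05, this test would not reject the hypothesis of no preference.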

2. Jul 26, 2015

FactChecker

The Chi-squared goodness of fit test is easy to apply. In your example, if there were no preference you would expect on average 10 in each of the red and blue groups. Putting this into a test like the one at http://vassarstats.net/csfit.html gives a probability of 0.2636. This means that if there really were no preference, a result at least this lopsided would occur in about one of every 4 trials.

3. Jul 26, 2015

Stephen Tashi

Those words suggest that you want to estimate the population proportion rather than hypothesis test whether the population proportion is different than 0.5.

4. Jul 26, 2015

I think that both statements have the same interpretation. Knowing the population proportion from a sample proportion can tell us whether it differs statistically from 0.5.

5. Jul 26, 2015

So again, what is the optimal test?
I now have probably 4 tests, each with a different result.
1) Binomial distribution with x=7 out of 20, a success rate of 0.5 and a sample size of n=20.
2) z-test of a population proportion from a sample proportion.
3) Comparing two sample proportions (7/20) and (13/20) using a t-test of sample proportions.
4) Chi-Square for goodness of fit.

All of those tests gave results that were not statistically significant, which means the null hypothesis of equal proportions is not rejected.

6. Jul 26, 2015

I used the link to calculate the test statistic and I found p=0.0099, which is statistically significant. However, when I used Excel I got a value of 0.057, which is not statistically significant. Excel did not give me the critical value, so I am not sure whether 0.057 is the test statistic or the p-value.

7. Jul 26, 2015

FactChecker

One of the well-known statistics packages is R. It is open source and well documented, and the Chi-squared goodness of fit test is described in online tutorials. I don't know if it is the best you can get, but no one would criticize you for using R. The R command for your example is:
chisq.test(c(7,13), p=c(0.5,0.5))

which gives this result:
Chi-squared test for given probabilities
data: c(7, 13)
X-squared = 1.8, df = 1, p-value = 0.1797

This p-value (0.1797) is noticeably smaller than the answer from the web site link (0.2636). That may be because there is not enough data for a valid Chi-squared test. Or one of them may apply a correction that the other does not.
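
The gap between the two numbers can be reproduced by hand. The sketch below uses Python's standard library rather than R, with illustrative names, and computes the statistic twice: once as the plain Pearson sum and once with a Yates continuity correction (0.5 subtracted from each absolute deviation). The two p-values come out at about 0.1797 and 0.2636, which is consistent with the web page applying a continuity correction for one degree of freedom. That is an inference from the arithmetic, not something stated in the thread.

```python
from math import erfc, sqrt

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

observed = [7, 13]
expected = [10, 10]  # equal split under the null of no preference

# Plain Pearson statistic: sum of (O - E)^2 / E
plain = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Yates continuity correction: shrink each |O - E| by 0.5 before squaring
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

p_plain = chi2_sf_df1(plain)
p_yates = chi2_sf_df1(yates)

print(f"uncorrected: X-squared = {plain:.2f}, p = {p_plain:.4f}")  # 1.80, 0.1797
print(f"corrected:   X-squared = {yates:.2f}, p = {p_yates:.4f}")  # 1.25, 0.2636
```

Neither version is significant at the 0.05 level, so the conclusion is the same either way.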

8. Jul 27, 2015

Stephen Tashi

No, they don't have the same interpretation in mathematical statistics.

I don't understand your language. You can't "know" the population proportion just from a sample proportion.

Perhaps you mean "Comparing a sample proportion to an assumed population proportion of 0.5 ....".

9. Jul 27, 2015

I am also interested in the reasoning behind the solution. For example, why is Chi-Square more useful for this question than the binomial test or the z-test? I know that many statistical problems have several solutions, but in my case what is the advantage of choosing one test over another?

10. Jul 27, 2015

No, I mean that if the population proportion is unknown, we can still calculate it from a sample proportion using a z-test.

11. Jul 27, 2015

If $z = (p - \pi)/\sqrt{p(1-p)/n}$, then $\pi = p \pm z\sqrt{p(1-p)/n}$, where p is the sample proportion, π is the population proportion, which is unknown, and n is the sample size.
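
As a numerical illustration, the expression for π can be evaluated for the sample in this thread (7 of 20). The sketch assumes a fixed z of 1.96, the usual two-sided 95% value, which the post does not specify. The names are illustrative.

```python
from math import sqrt

x, n = 7, 20
z_crit = 1.96  # assumed: approximate two-sided 95% value of the standard normal

p_hat = x / n                        # sample proportion, 0.35
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error computed from the sample

lower = p_hat - z_crit * se
upper = p_hat + z_crit * se

print(f"p = {p_hat:.2f}, SE = {se:.4f}")
print(f"interval for pi: ({lower:.3f}, {upper:.3f})")  # (0.141, 0.559)
print("contains 0.5:", lower <= 0.5 <= upper)          # True
```

The range runs from about 0.14 to 0.56, so 0.5 is inside it.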

12. Jul 27, 2015

Stephen Tashi

By that definition of z, you can't calculate it without already knowing $\pi$.

I think you are using the terminology "z test" when you mean "z statistic".

Perhaps you are thinking about "confidence intervals" for an estimate, not about hypothesis testing.

13. Jul 27, 2015

Yes, I mean the confidence interval within which the population proportion lies. In this case the denominator will be the standard error, not the standard deviation. So the sample proportion can still be used to derive the population proportion.

So testing the null hypothesis that the sample proportion does not differ from a population proportion of 0.5 uses the standard deviation, while estimating the population proportion from the sample proportion, without knowing it, gives a confidence interval for that estimate. So which one is more appropriate in my case? And can we face a situation where the null hypothesis is not rejected (no difference between the sample and population proportions) while the confidence interval for the estimated population proportion does not include 0.5?
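
The last question can be checked numerically. The sketch below compares the z-test whose standard error is computed from the assumed 0.5 with the interval whose standard error is computed from the sample proportion. It runs both on the thread's sample (7 of 20) and on a made-up sample of 8 out of 10, which is not from the thread and is chosen only because the two approaches disagree there. The function names and the 1.96 cutoff are assumptions for illustration.

```python
from math import sqrt

Z_CRIT = 1.96  # assumed two-sided 5% critical value of the standard normal

def score_test_rejects(x, n, p0=0.5):
    """z-test of H0: pi = p0, with the standard error computed from p0."""
    se0 = sqrt(p0 * (1 - p0) / n)
    return abs((x / n - p0) / se0) > Z_CRIT

def wald_interval(x, n):
    """Interval p +/- z*sqrt(p(1-p)/n), with the standard error computed from p."""
    p = x / n
    half = Z_CRIT * sqrt(p * (1 - p) / n)
    return p - half, p + half

# (8, 10) is a hypothetical sample used only to show a disagreement
for x, n in [(7, 20), (8, 10)]:
    low, high = wald_interval(x, n)
    print(f"x={x}, n={n}: H0 rejected? {score_test_rejects(x, n)}; "
          f"interval=({low:.3f}, {high:.3f}); contains 0.5? {low <= 0.5 <= high}")
```

For 8 out of 10 the test does not reject (z is about 1.90), yet the interval (about 0.552 to 1.048) excludes 0.5. So the answer is yes when the two calculations use different standard errors. The upper limit above 1 also shows how rough this normal approximation is for such a small sample.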

Last edited: Jul 27, 2015
14. Jul 27, 2015

FactChecker

I think you could use either one. I don't know which would be more powerful or if there would be any difference in this case. In general, the Chi-squared test allows you to test whether a sample fits a distribution with several possible categories, not just two. The other tests would not apply in that case.

15. Jul 28, 2015

Stephen Tashi

I think we need to clarify exactly what you want to do. For example, do you want to publish a paper in a scientific journal? Or use your results to invest in the stock market? Or design some machinery?

Statistics is a very technical topic, and the concept of "confidence intervals" has one meaning for the-man-in-the-street and a very different technical meaning in statistics. How far do you need to progress in the technical understanding of statistics to accomplish your goal? You might have to meet a higher standard to publish in a scientific journal than the personal standard you'd use to pick your own stock market investments.

16. Jul 28, 2015

I never thought that it would be this complicated. My question was a very simple one: does 13/20 indicate that the preference for blue is higher than the preference for red?

If you are facing this problem, how will you solve it?

17. Jul 28, 2015

Stephen Tashi

Now you are posing the question as a hypothesis test instead of a problem involving confidence intervals.

First I would try to clarify the exact question that I want to ask. I'll attempt to state your question. You want to do a hypothesis test that the population proportion is different than 0.5. You have several alternative statistics that can be used. You want to know which statistic is "optimal".

One concept of "optimal" is the concept of the "most powerful". In your problem, the "power" of a statistical test at a given true value of the population proportion is the probability that the null hypothesis (which is that the population proportion is 0.5) is rejected. For example, the power of the z-test at an assumed true population proportion of 0.633 would be the probability that the z-test rejects the null hypothesis when 0.633 is the actual proportion. To illustrate this conceptually, you can imagine doing a Monte-Carlo simulation to estimate the power. You would repeatedly simulate drawing 20 individuals from a population where the true proportion (favoring a color) is 0.633. You would apply the z-test to each of these batches of 20 individuals and see what fraction of the time the z-test correctly rejects the null hypothesis.
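
A minimal sketch of that Monte-Carlo simulation follows, using only Python's standard library. The function names, the number of repetitions, the seed and the 1.96 cutoff (a two-sided 0.05 significance level) are assumptions for illustration. The z statistic uses the standard error under the null proportion of 0.5.

```python
import random
from math import sqrt

def z_test_rejects(x, n, p0=0.5, z_crit=1.96):
    """Two-sided z-test of H0: pi = p0, using the null standard error."""
    se0 = sqrt(p0 * (1 - p0) / n)
    return abs((x / n - p0) / se0) > z_crit

def simulate_power(true_p, n=20, reps=20000, seed=0):
    """Estimate the power as the fraction of simulated samples that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Draw n individuals; each favors the color with probability true_p
        x = sum(rng.random() < true_p for _ in range(n))
        if z_test_rejects(x, n):
            rejections += 1
    return rejections / reps

power = simulate_power(0.633)
print(f"estimated power at true proportion 0.633: {power:.3f}")
```

With 20 individuals the estimate should come out near 0.2. In other words, a sample this small detects a true proportion of 0.633 only about one time in five.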

We can probably look up material about the relative power of various tests of population proportions. Does the "power" of tests describe what you want to know about?

18. Jul 28, 2015

If I understand you correctly, this means that the population proportion of 0.633 in this example is known and we would like to know whether a sample with a proportion of 0.5 was drawn from that population by applying the z-test. It will probably be rejected at the 0.05 level, and this means the z-test is powerful in this case because the p-value will be less than 0.05.

My case is the opposite. I don't know the population proportion. I only know a sample proportion, which is 13/20, and I would like to know whether the population proportion is 0.5, as the 13/20 in my sample probably arose by chance only.

Now, if I follow you, I assume that the population proportion is 0.5 and I would like to know whether my sample with 13/20 was drawn from this population by applying the z-test. Then 0.5 is an assumed value, not a true value, but I can still use it to calculate the standard deviation $\sqrt{\pi(1-\pi)}$. In this case I will not include n (the sample size) in my calculation. Am I right?

Last edited: Jul 28, 2015
19. Jul 29, 2015

I need to make one small correction: the sample size, n, should be included in the denominator so that it represents the standard error.
So H0: population proportion is 0.5.
H1: population proportion ≠ 0.5.
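
For concreteness, here is that test worked through for the 13/20 sample, as a sketch using Python's standard library with illustrative names. The standard error is computed from the assumed proportion 0.5 and includes n, and the two-sided p-value comes from the standard normal distribution.

```python
from math import erfc, sqrt

x, n, p0 = 13, 20, 0.5     # sample counts and the proportion assumed under H0

p_hat = x / n                     # sample proportion, 0.65
se0 = sqrt(p0 * (1 - p0) / n)     # standard error under H0 (includes n)
z = (p_hat - p0) / se0

# Two-sided p-value: P(|Z| > |z|) for a standard normal Z
p_value = erfc(abs(z) / sqrt(2))

print(f"z = {z:.4f}")             # 1.3416
print(f"p-value = {p_value:.4f}") # 0.1797
```

Since 0.1797 is greater than 0.05, H0 is not rejected. Note that z squared equals 1.8, so this test is equivalent to an uncorrected Chi-squared goodness of fit test on the counts 7 and 13.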

20. Jul 29, 2015

Stephen Tashi

Yes
No
We don't want to know the probability of drawing a sample where the proportion is exactly 0.5.

We want to know the probability that a specific statistical test correctly rejects the null hypothesis that the population proportion is 0.5.

For example, "What is the probability that a z-test with a significance level of 0.05 correctly rejects the null hypothesis that the population proportion is 0.5".

To completely describe the power of the z-test, we plot a curve of its power for the full range of possible true population proportions. 0.633 is just one x-coordinate on this curve.
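
For a sample of 20, such a curve can be computed exactly rather than simulated, because the number of successes follows a binomial distribution. The sketch below uses Python's standard library. The names, the grid of true proportions and the 1.96 cutoff are assumptions for illustration. It sums the binomial probabilities of every outcome that the z-test would reject.

```python
from math import comb, sqrt

def z_test_rejects(x, n, p0=0.5, z_crit=1.96):
    """Two-sided z-test of H0: pi = p0, using the null standard error."""
    se0 = sqrt(p0 * (1 - p0) / n)
    return abs((x / n - p0) / se0) > z_crit

def exact_power(true_p, n=20):
    """Probability that the z-test rejects H0 when the true proportion is true_p."""
    return sum(
        comb(n, k) * true_p**k * (1 - true_p) ** (n - k)
        for k in range(n + 1)
        if z_test_rejects(k, n)
    )

# Tabulate the power curve on a coarse grid of true proportions
for i in range(1, 10):
    true_p = i / 10
    print(f"true proportion {true_p:.1f}: power {exact_power(true_p):.3f}")
```

At a true proportion of 0.5 the value is the actual rejection rate under the null, about 0.041 for n=20. The curve rises symmetrically as the true proportion moves away from 0.5.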

To compare the power of two tests, we plot their power curves on the same graph and see which curve is higher. We hope that one curve is always higher than the other, in which case the statistic with the higher curve is always better at rejecting the null hypothesis correctly, no matter which true value of the population proportion we use.

It may be that the curves cross, in which case you face a subjective choice.