Population proportion from a sample estimate

Adel Makram · Jul 26, 2015

I am interested to know which proper statistical test to use to know the population proportion from a sample taken from the population.
For example, take a sample of 20 people in which 7 prefer the red color and 13 prefer the blue color. Which of the following methods should be used to decide whether there is a real color preference in the population?
1) A binomial distribution with a two-tailed test, assuming a success rate of 0.5 and calculating the sum of the probabilities for x=0 to 7, then comparing this with 0.05.
2) Calculating the population proportion from the sample proportion (7/20) using a z-test of a single population proportion and checking whether 0.5 is included within the calculated range.
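
The two calculations listed above can be sketched in Python as follows. This is only an illustration using the standard library: the helper names `binom_cdf` and `normal_cdf` are made up for this example, and doubling the lower tail to get a two-tailed p-value is an assumption that leans on the null distribution being symmetric when the success rate is 0.5.

```python
from math import comb, erf, sqrt

n, x, p0 = 20, 7, 0.5  # sample size, count preferring red, null proportion

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Method 1: exact binomial, sum of P(X = 0..7), doubled for two tails
p_binom = 2 * binom_cdf(x, n, p0)

# Method 2: one-sample z-test, standard error computed under the null
z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_z = 2 * normal_cdf(-abs(z))

print(f"exact binomial two-tailed p = {p_binom:.4f}")
print(f"z = {z:.3f}, two-tailed p = {p_z:.4f}")
```

Neither p-value falls below 0.05 for this sample.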

FactChecker · Jul 26, 2015

The Chi-squared goodness of fit test is easy to apply. In your example, if there was no preference you would expect on average 10 in each red and blue group. Putting this in the test like at http://vassarstats.net/csfit.html gives a probability 0.2636. This means that if there really was no preference, the odds of that result or one more biased occurring is about one in every 4 trials.

Stephen Tashi · Jul 26, 2015

Adel Makram said:

to know the population proportion from a sample taken from the population.

Those words suggest that you want to estimate the population proportion rather than hypothesis test whether the population proportion is different than 0.5.

Adel Makram · Jul 26, 2015

Stephen Tashi said:

Those words suggest that you want to estimate the population proportion rather than hypothesis test whether the population proportion is different than 0.5.

I think that both statements have the same interpretation. Knowing the population proportion from a sample proportion can tell whether it statistically differs from 0.5.

Adel Makram · Jul 26, 2015

So again, what is the optimal test?
I probably have 4 tests now, each with a different result:
1) Binomial distribution with x=7 out of 20, a success rate of 0.5 and a sample size of n=20.
2) z-test of a population proportion from a sample proportion.
3) Comparing two sample proportions (7/20) and (13/20) using a t-test of sample proportions.
4) Chi-Square for goodness of fit.

All of those tests gave results that were not statistically significant, which means the null hypothesis of equal proportions is not rejected.

Adel Makram · Jul 26, 2015

FactChecker said:

The Chi-squared goodness of fit test is easy to apply. In your example, if there was no preference you would expect on average 10 in each red and blue group. Putting this in the test like at http://vassarstats.net/csfit.html gives a probability 0.2636. This means that if there really was no preference, the odds of that result or one more biased occurring is about one in every 4 trials.

I used the link to calculate the test statistic and I found p=0.0099, which is statistically significant. However, when I used Excel I got a test statistic of 0.057, which is not statistically significant. Excel did not give me the critical value, so I am not sure whether 0.057 is the test statistic or the p-value.

FactChecker · Jul 26, 2015

One of the well known statistics packages is R. It is open source and well documented. The Chi-squared goodness of fit test is described in online tutorials. I don't know if R is the best you can get, but no one would criticize you for using it. The R command for your example is:
chisq.test(c(7,13), p=c(0.5,0.5))

which gives this result:
Chi-squared test for given probabilities
data: c(7, 13)
X-squared = 1.8, df = 1, p-value = 0.1797

This is noticeably smaller than the answer from the website link. That may be because there is not enough data for a valid Chi-squared test. Or one of them may do some corrections that the other does not.
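
One way to explore the second possibility is to compute the statistic by hand, with and without a continuity (Yates) correction. This is only a sketch in Python using the standard library: `chi2_p_df1` is a helper made up for this example, and whether the website actually applies such a correction is an assumption, not something confirmed here.

```python
from math import erfc, sqrt

observed = [7, 13]   # red, blue
expected = [10, 10]  # counts expected under no preference

def chi2_p_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

# Plain Pearson statistic
plain = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Same statistic with 0.5 subtracted from each absolute deviation (Yates)
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

print(f"uncorrected: X-squared = {plain:.2f}, p = {chi2_p_df1(plain):.4f}")
print(f"corrected:   X-squared = {yates:.2f}, p = {chi2_p_df1(yates):.4f}")
```

Under these assumptions the uncorrected statistic is 1.8 with p about 0.180, which agrees with the R output, and the corrected statistic is 1.25 with p about 0.264, which agrees with the website figure.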

Stephen Tashi · Jul 27, 2015

Adel Makram said:

I think that both statements have the same interpretation.

No, they don't have the same interpretation in mathematical statistics.

Knowing the population proportion from a sample proportion can tell whether it statistically differs from 0.5.

I don't understand your language. You can't "know" the population proportion just from a sample proportion.

Perhaps you mean "Comparing a sample proportion to an assumed population proportion of 0.5 ...".

Adel Makram · Jul 27, 2015

I am also interested in the reasoning behind the solution. For example, why is Chi-square more useful for this question than the binomial test or the z-test? I know that many statistical problems have several solutions, but in my case what is the advantage of choosing one test over another?

Adel Makram · Jul 27, 2015

Stephen Tashi said:

No, they don't have the same interpretation in mathematical statistics.

I don't understand your language. You can't "know" the population proportion just from a sample proportion.

Perhaps you mean "Comparing a sample proportion to an assumed population proportion of 0.5 ...".

No, I mean that if the population proportion is unknown, we can still calculate it from a sample proportion using a z-test.

Adel Makram · Jul 27, 2015

If [itex] z = (p - \pi)/\sqrt{p(1-p)/n} [/itex], then [itex] \pi = p \pm z\sqrt{p(1-p)/n} [/itex], where p is the sample proportion, π is the population proportion (which is unknown), and n is the sample size.

Stephen Tashi · Jul 27, 2015

Adel Makram said:

If [itex] z = (p - \pi)/\sqrt{p(1-p)/n} [/itex], then [itex] \pi = p \pm z\sqrt{p(1-p)/n} [/itex], where p is the sample proportion, π is the population proportion (which is unknown), and n is the sample size.

By that definition of z, you can't calculate it without already knowing [itex] \pi [/itex].

I think you are using the terminology "z test" when you mean "z statistic".

Perhaps you are thinking about "confidence intervals" for an estimate, not about hypothesis testing.

Adel Makram · Jul 27, 2015

Stephen Tashi said:

By that definition of z, you can't calculate it without already knowing [itex] \pi [/itex].

I think you are using the terminology "z test" when you mean "z statistic".

Perhaps you are thinking about "confidence intervals" for an estimate, not about hypothesis testing.

Yes I mean the confidence interval where the population proportion lies. In this case the denominator will be the standard error, not the standard deviation. So the sample proportion can still be used to derive the population proportion.

So testing the null hypothesis that the sample proportion is not different from a population proportion of 0.5 will use the concept of standard deviation, while estimating the population proportion from a sample proportion, without knowing the population value, would give a confidence interval for that estimate. So which one is more appropriate in my case? And can we face a situation where the null hypothesis is not rejected (no difference between the sample and population proportions) while the estimate of the population proportion has a confidence interval that does not include 0.5?
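
As a sketch of the interval being described, the code below computes a normal-approximation (Wald) 95% interval for the 7-out-of-20 sample and prints it next to the standard error that a test of 0.5 would use. The Wald form and the 1.96 multiplier are assumptions made for illustration; other interval methods exist and may behave better for a sample of 20.

```python
from math import sqrt

n, x = 20, 7
p_hat = x / n    # sample proportion
z_crit = 1.96    # approximate two-sided 95% normal quantile

# Standard error estimated from the sample proportion (used by the interval)
se_hat = sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z_crit * se_hat
upper = p_hat + z_crit * se_hat

# Standard error under the null value 0.5 (used by the hypothesis test)
se_null = sqrt(0.5 * 0.5 / n)

print(f"95% interval for the proportion: ({lower:.3f}, {upper:.3f})")
print(f"standard error from the sample: {se_hat:.4f}")
print(f"standard error under the null:  {se_null:.4f}")
```

For this sample the interval is roughly (0.14, 0.56), so it contains 0.5. The two standard errors differ slightly because one is built from the sample proportion and the other from the hypothesized value.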

FactChecker · Jul 27, 2015

I think you could use either one. I don't know which would be more powerful or if there would be any difference in this case. In general, the Chi-squared test allows you to test whether a sample fits a distribution with several possible categories, not just two. The other tests would not apply in that case.

Stephen Tashi · Jul 28, 2015

Adel Makram said:

Yes I mean the confidence interval where the population proportion lies.

I think we need to clarify exactly what you want to do. For example, do you want to publish a paper in a scientific journal? Or use your results to invest in the stock market? Or design some machinery?

Statistics is a very technical topic and the concept of "confidence intervals" has one meaning for the-man-in-the-street and a very different technical meaning in statistics. How far do you need to progress in the technical understanding of statistics to accomplish your goal? You might have to meet a higher standard to publish in a scientific journal than a personal standard you'd use to pick your own stock market investments.

Adel Makram · Jul 28, 2015

Stephen Tashi said:

I think we need to clarify exactly what you want to do. For example, do you want to publish a paper in a scientific journal? Or use your results to invest in the stock market? Or design some machinery?

Statistics is a very technical topic and the concept of "confidence intervals" has one meaning for the-man-in-the-street and a very different technical meaning in statistics. How far do you need to progress in the technical understanding of statistics to accomplish your goal? You might have to meet a higher standard to publish in a scientific journal than a personal standard you'd use to pick your own stock market investments.

I never thought that it would be that complicated. My question was a very simple one: does 13/20 indicate that the preference for the blue color is higher than for the red color?

If you are facing this problem, how will you solve it?

Stephen Tashi · Jul 28, 2015

Adel Makram said:

My question was a very simple one: does 13/20 indicate that the preference for the blue color is higher than for the red color?

Now you are posing the question as a hypothesis test instead of a problem involving confidence intervals.

If you are facing this problem, how will you solve it?

First I would try to clarify the exact question that I want to ask. I'll attempt to state your question. You want to do a hypothesis test that the population proportion is different than 0.5. You have several alternative statistics that can be used. You want to know which statistic is "optimal".

One concept of "optimal" is the concept of the "most powerful". In your problem, the "power" of a statistical test at a given true value of the population proportion is the probability that the null hypothesis ( which is that the population proportion is 0.5) is rejected. For example, the power of the z-test at an assumed true population proportion of 0.633 would be the probability that the z-test rejects the null hypothesis when 0.633 is the actual proportion. To illustrate this conceptually, you can imagine doing a Monte-Carlo simulation to estimate the power. You would repeatedly simulate drawing 20 individuals from a population where the true proportion (favoring a color) is 0.633. You would apply the z-test to each of these batches of 20 individuals and see what fraction of times the z-test correctly rejects the null hypothesis.
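
The Monte-Carlo idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive procedure: the function names, the 1.96 cutoff for a two-tailed 5% z-test, the number of trials and the fixed seed are all choices made for this example.

```python
import random
from math import sqrt

def z_test_rejects(x, n, p0=0.5, z_crit=1.96):
    """True if a two-tailed z-test of H0: proportion = p0 rejects for x successes."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    return abs(z) > z_crit

def simulated_power(true_p, n=20, trials=20_000, seed=0):
    """Fraction of simulated samples of size n in which the z-test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw n individuals; each favors the color with probability true_p
        x = sum(rng.random() < true_p for _ in range(n))
        rejections += z_test_rejects(x, n)
    return rejections / trials

power = simulated_power(0.633)
print(f"estimated power at true proportion 0.633: {power:.3f}")
```

With only 20 individuals per batch the estimate comes out at around 0.2, so a true proportion of 0.633 would be detected in only about one batch out of five.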

We can probably look up material about the relative power of various tests of population proportions. Does the "power" of tests describe what you want to know about?

Adel Makram · Jul 28, 2015

Stephen Tashi said:

One concept of "optimal" is the concept of the "most powerful". In your problem, the "power" of a statistical test at a given true value of the population proportion is the probability that the null hypothesis ( which is that the population proportion is 0.5) is rejected. For example, the power of the z-test at an assumed true population proportion of 0.633 would be the probability that the z-test rejects the null hypothesis when 0.633 is the actual proportion.

If I understand you correctly, this means that the population proportion of 0.633 in this example is known and we would like to know whether a sample proportion of 0.5 is drawn from that population by applying z-test. Probably it will be rejected at the 0.05 significance level, and this means the z-test is powerful in this case because the p-value will be less than 0.05.

My case is the opposite: I don't know the population proportion, I only know a sample proportion, which is 13/20, and I would like to know whether the population proportion is 0.5, since the 13/20 in my sample may have come about by chance only.

Now if I follow you, then I assume that the population proportion is 0.5 and I would like to know whether the 13/20 of my sample is drawn from this population by applying a z-test. Then 0.5 is an assumed value, not a true value, but I can still use it to calculate the standard deviation √(π(1-π)). In this case I will not include n (the sample size) in my calculation. Am I right?

Adel Makram · Jul 29, 2015

I would like to make one small correction: the sample size, n, should be included in the denominator so that it represents the standard error.
So H₀: population proportion is 0.5.
H₁: population proportion ≠ 0.5.

Stephen Tashi · Jul 29, 2015

Adel Makram said:

If I understand you correctly, this means that the population proportion of 0.633 in this example is known

Yes

and we would like to know whether a sample proportion of 0.5 is drawn from that population by applying z-test.

No
We don't want to know the probability of drawing a sample where the proportion is exactly 0.5.

We want to know the probability that a specific statistical test correctly rejects the null hypothesis that the population proportion is 0.5.

For example, "What is the probability that a z-test with a significance level of 0.05 correctly rejects the null hypothesis that the population proportion is 0.5".

To completely describe the power of the z-test, we plot a curve of its power for the full range of possible true population proportions. 0.633 is just one x-coordinate on this curve. To compare the power of two tests, we plot their power curves on the same graph and see which curve is higher. We hope that one curve is always higher than the other, in which case the statistic with the higher curve is always better at rejecting the null hypothesis correctly, no matter which true value of the population proportion we use.
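
For a sample of 20 the power curve can also be computed exactly by summing binomial probabilities over the counts for which a test rejects. The sketch below does this for a z-test and for an exact binomial test; the function names, the 5% level, and the doubled-tail definition of the two-tailed binomial p-value are assumptions made for illustration.

```python
from math import comb, sqrt

N = 20
ALPHA = 0.05

def pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def z_rejects(x, n=N):
    """Two-tailed z-test of H0: proportion = 0.5 at the 5% level."""
    return abs(x / n - 0.5) / sqrt(0.25 / n) > 1.96

def binom_rejects(x, n=N):
    """Exact binomial test of H0: proportion = 0.5, smaller tail doubled."""
    tail = sum(pmf(i, n, 0.5) for i in range(min(x, n - x) + 1))
    return min(1.0, 2 * tail) < ALPHA

def power(rejects, true_p, n=N):
    """Probability that the test rejects H0 when the true proportion is true_p."""
    return sum(pmf(x, n, true_p) for x in range(n + 1) if rejects(x))

# A few points on each power curve
for true_p in (0.5, 0.6, 0.633, 0.7, 0.8, 0.9):
    print(f"{true_p:.3f}  z-test: {power(z_rejects, true_p):.3f}  "
          f"binomial: {power(binom_rejects, true_p):.3f}")
```

With these particular definitions and n = 20, both rules reject for the same counts (5 or fewer, or 15 or more), so the two curves coincide. Other sample sizes or other variants of the tests need not agree in this way.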

It may be that the curves cross, in which case you face a subjective choice.

Adel Makram · Jul 29, 2015

What I observed is that the sample and population proportions can be used interchangeably. The reason is that the z-test in our case is a 2-tailed test, so interchanging π (population proportion) with p (sample proportion) will not affect the result. For example, having π=0.5, p=13/20 is equivalent to π=13/20 and p=0.5.

Having said that, the probability of rejecting H₀ of π=0.5 at the 0.05 significance level is equivalent to the probability of having π lie outside the range of 2 SD (standard deviations) from the point estimate of p=13/20.

RUber · Jul 29, 2015

Adel Makram said:

What I observed is that the sample and population proportions can be used interchangeably. The reason is that the z-test in our case is a 2-tailed test, so interchanging π (population proportion) with p (sample proportion) will not affect the result. For example, having π=0.5, p=13/20 is equivalent to π=13/20 and p=0.5.

Having said that, the probability of rejecting H₀ of π=0.5 at the 0.05 significance level is equivalent to the probability of having π lie outside the range of 2 SD (standard deviations) from the point estimate of p=13/20.

The two are not exactly interchangeable. Note that the standard deviation is based on your null hypothesis. If you are testing the null hypothesis that your population proportion is .5, then you apply the hypothesized variance to your test...i.e. .25.
If you are hypothesizing that your population proportion is 13/20, and seeing how likely an outcome of 10/20 would be, then you would apply the hypothesized variance of 13/20 * 7/20 = 91/400. Although these variances are close, they are noticeably different and that has everything to do with your null hypothesis.

Assuming the null hypothesis of even odds, then you have the larger standard deviation, and the range of plus/minus 2 (or 1.96) standard deviations would be the standard normal approximation.
Additionally, you might want to add in the discrete data considerations. If you only have 20 trials, then your possible options are discrete, e.g. 12/20, 13/20, 14/20.
So, if you want to exclude 13/20 entirely, then 12.5/20 should be outside of your tolerance, since on the continuous curve, anything in [12.5, 13.5) gets rounded to 13.
If you compare these results to the binomial method, you should have similar outcomes, since n = 20 is getting close to sufficiently large to make the normal approximation.
Without having additional information about special applications, I would say that the standard normal approximation to the binomial distribution should be acceptable.
One way I get a (2 tailed) p value of about .264, another I get about .263.
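
The two routes are presumably the normal approximation with a continuity correction and the exact binomial sum. That reading is an assumption, though the Python sketch below (standard library only, with an illustrative helper `normal_cdf`) does produce figures close to the ones quoted.

```python
from math import comb, erf, sqrt

n, x = 20, 7
mean = n * 0.5         # expected count under the null
sd = sqrt(n * 0.25)    # standard deviation of the count under the null

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Normal approximation: treat the count 7 as the interval up to 7.5
p_normal = 2 * normal_cdf((x + 0.5 - mean) / sd)

# Exact binomial: P(X <= 7) with success rate 0.5, doubled for two tails
p_exact = 2 * sum(comb(n, i) for i in range(x + 1)) / 2**n

print(f"normal approximation with continuity correction: {p_normal:.4f}")
print(f"exact binomial:                                  {p_exact:.4f}")
```

The corrected normal approximation gives about 0.264 and the exact binomial about 0.263.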

RUber · Jul 29, 2015

Adel Makram said:

I used the link to calculate the test statistic and I found p=0.0099, which is statistically significant. However, when I used Excel I got a test statistic of 0.057, which is not statistically significant. Excel did not give me the critical value, so I am not sure whether 0.057 is the test statistic or the p-value.

I am not sure you used the calculator correctly. When I put your data into the same tool, I got the same results as FactChecker in post #2 - which was entirely in line with both the normal approximation and the discrete binomial p values in my last post.

So for this data set, I am not sure you could say that one test is better than another, since they are principally based upon the same assumptions.

Stephen Tashi · Jul 29, 2015

I took a brief glance at statistical opinions on the web about this problem. The summary:

The binomial distribution is preferred to the normal distribution for sample sizes as small as 20, just because the normal distribution isn't an accurate approximation.

One dimensional chi-square tests have a low power.
