Probability question: testing whether a population is 50/50

In summary, the conversation discusses an experiment on a population of red and blue cards to determine whether the population has an even 1:1 split. The experiment involves randomly pulling out 125 cards and observing 50 red and 75 blue. The conversation then delves into the concept of "confidence" and how it can be used to estimate the true ratio of red to blue cards in the population. However, without making assumptions about the population, it is not possible to draw any specific conclusions about the ratio.
Curl
Say I have a "large" population (larger than 1000) of red cards and blue cards, and I want to find out whether it is evenly split (1:1 red:blue ratio).
So I do an experiment and pull out 125 cards at random, and find that I have 50 red and 75 blue. Based on this experiment, what can I say about the red:blue ratio in the population? What is the "confidence" that the ratio is NOT 1:1?
I'm unsure what "confidence" means here. I can find, for example, the probability of getting more than 75 blue cards out of 125 by using the normal approximation to the binomial distribution, but I'm not sure what that says about the population.

Can anyone give me some help on how to analyze this kind of data? How much information can we extract from this one experiment?

Hey Curl.

The confidence refers to the probability level you choose when estimating the parameter.

The parameter (the true proportion) can lie anywhere between 0 and 1 (the endpoints themselves occur only in special cases), and a finite sample won't give you enough information to pin down its exact value.

A higher confidence level takes more possible values of the parameter into account, so the resulting interval is wider.

Basically you should apply the Normal approximation and test the hypothesis by checking whether 0.5 is in your confidence interval: if it is, you retain the hypothesis, and if not, you reject it.

This is the same kind of thing that comes up in election polling. Often, instead of assuming the split is even (which can also be done), we can find the confidence interval of the observed split.

$$\text{C.I.}=p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
where p is the observed sample proportion (here 75/125 for blue),
n = 125 is the sample size, and
z is the normal quantile for the desired confidence level (1.96 for 95%).
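To put numbers on the formula above, here is a minimal sketch in Python. The function name `wald_interval` is just an illustrative choice, and the calculation assumes the normal approximation is adequate for n = 125.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n                      # observed sample proportion
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# 75 blue cards out of 125 drawn
low, high = wald_interval(75, 125)
print(f"95% interval for the blue fraction: ({low:.3f}, {high:.3f})")
print("0.5 inside interval:", low <= 0.5 <= high)
```

For these data the interval runs from roughly 0.514 to 0.686, so 0.5 lies just outside it.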

Assuming red and blue are equally likely, the probability of drawing exactly 75 blue cards out of 125 is
$$\left(\frac{1}{2}\right)^{125} \binom{125}{75}\approx 0.00587$$
More importantly, the chance of drawing fewer than 75 blue cards is 98.4%.
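These two figures can be checked exactly rather than through the normal approximation. The sketch below assumes each draw is an independent 50/50 event (reasonable when the population is much larger than the sample); the variable names are illustrative only.

```python
from math import comb

n = 125
total = 2 ** n  # number of equally likely colour sequences under a 1:1 split

# probability of exactly 75 blue cards
p_exactly_75 = comb(n, 75) / total

# probability of fewer than 75 blue cards (0 through 74)
p_fewer_than_75 = sum(comb(n, k) for k in range(75)) / total

print(f"P(exactly 75 blue)    = {p_exactly_75:.5f}")
print(f"P(fewer than 75 blue) = {p_fewer_than_75:.3f}")
```

Working with integers until the final division avoids rounding problems with the very small factor (1/2)^125.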

Curl said:
Based on this experiment, what can I say about the red:blue ratio in the population?
Without some assumptions about the population, you can't say anything except for the trivial fact that the population contains at least 50 red and 75 blue cards and so that establishes some bounds for the ratio.

Other posters have given you methods to produce some numbers. The methods work by assuming information about the population. The numbers they produce are often misinterpreted by laymen. I'll focus my post on the conceptual aspects.

Two divisions of statistics are "hypothesis testing" and "estimation".

Hypothesis Testing

Typical statistical "hypothesis" testing involves making a specific enough assumption about the population to compute the probability distribution for some statistic of the sample. For example, if the statistic is the ratio of red to blue cards in the sample, the assumption that the population has the same number of red cards as blue cards is specific enough to let you compute the probability distribution of this ratio in samples. The assumption that the ratio is *not* 1:1 in the population is not specific enough to let you compute the distribution of that statistic.

Hypothesis testing is a procedure. You make a sufficiently specific assumption (a "null hypothesis") to know the probability distribution of some statistic. You define an "acceptance region" for the statistic. If the statistic computed from the observed data falls within the "acceptance region" you "accept" the hypothesis. Otherwise you "reject" it. The quantitative behavior of the procedure is specified by the probability that the statistic would fall outside of the acceptance region if the null hypothesis were true (i.e. the probability that the test would make the wrong decision if the null hypothesis were true).
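As a concrete sketch of the procedure just described, the code below builds an acceptance region for the number of blue cards under the null hypothesis of a 1:1 split. The choice of a symmetric region, the 5% level, and the function name `acceptance_region` are assumptions made for illustration; other regions with the same error probability are possible.

```python
from math import comb

def acceptance_region(n, alpha=0.05):
    """Narrowest symmetric region [lo, n - lo] for the count under a 1:1 null
    such that the chance of falling outside it is at most alpha."""
    total = 2 ** n
    tail = 0   # number of sequences with fewer than lo blue cards
    lo = 0
    # move lo upward while both excluded tails together stay within alpha
    while 2 * (tail + comb(n, lo)) <= alpha * total:
        tail += comb(n, lo)
        lo += 1
    return lo, n - lo, 2 * tail / total

lo, hi, size = acceptance_region(125)
observed_blue = 75
decision = "accept" if lo <= observed_blue <= hi else "reject"
print(f"acceptance region: {lo}..{hi}, P(outside | null) = {size:.4f}")
print(f"observed {observed_blue} blue -> {decision} the null hypothesis")
```

The printed probability is the chance of wrongly rejecting when the null hypothesis is true. It says nothing about the probability that the null hypothesis itself is true.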

Hypothesis testing isn't a proof of something and it does not find the probability that the null hypothesis is true or the probability that it is false. Hypothesis testing is just a procedure that has been found to be empirically useful in many real life situations.

Estimation

Estimation refers to using some function of the sample data to estimate a parameter of the distribution of the population.

The technical definition of "confidence" refers to the scenario of "estimation", not to "hypothesis testing". The numerical calculations in computing confidence are often the same as those used in hypothesis testing, but the interpretation of the numbers is different.

An empirical version of "confidence" in estimation is illustrated by the following:

Imagine there is a lab, to which you send samples. The lab reports an estimate of some property of the sample (e.g. its mass). The report is given as an interval (e.g. 9.75 to 10.25 milligrams). If you have a way of doing more precise measurements on the same sample that determine its "true" mass, you can note whether the true mass is within the interval reported by the lab. By accumulating data on how often the lab was correct, you can quantify your "confidence" in the lab. If the interval reported by the lab contains the true mass of the sample 95% of the time, you can say that the lab gives a "95% confidence" interval for the true mass.

It's important to note that you cannot apply this "confidence" number to one particular lab report. For example, if the lab reports an interval of "100.5 to 102.0 grams", you cannot assert that there is a 0.95 probability that the true mass of that sample is in the interval 100.5 to 102.0 grams. For example, suppose the lab uses different measuring instruments on small samples and large samples. One of their instruments might be more reliable than the other. The 0.95 probability is not based on analyzing the behavior of the lab in enough detail to account for such a situation. It is only based on data about how often the lab was correct or incorrect.

A typical statistical version of "confidence" is analogous to the above example. You assume the population comes from a specific family of distributions (e.g. a binomial, or a gaussian). You pick a particular algorithm that computes an estimate of one of the parameters of the distribution in the form of an interval. You compute the probability that the algorithm produces an interval containing the true value of the parameter. This probability is the "confidence" associated with the estimate. (It is often possible to compute this probability only knowing the family of distributions that are involved. You don't need to assume specific numbers for the true values of the population's distribution parameters.)
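The idea that "confidence" is a property of the algorithm can be illustrated by simulation. The sketch below assumes a binomial population, uses the normal-approximation interval as the algorithm, and counts how often the interval contains a true proportion that we pick ourselves. The function names, the trial count, and the seed are illustrative choices, not part of any standard recipe.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def coverage(true_p, n=125, trials=5000, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # draw n cards, each blue with probability true_p
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= true_p <= high
    return hits / trials

for p in (0.5, 0.6, 0.8):
    print(f"true p = {p}: interval covered it in {coverage(p):.1%} of samples")
```

The coverage rates come out near 95% for each true proportion tried, which is what the "95%" describes: the long-run behavior of the algorithm, not any single interval it produces.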

Just as in the empirical example, if you are using an algorithm that produces 95% confidence intervals, then you cannot claim that there is a 95% probability that the true value of the parameter is in one particular interval. For example, if you are using an algorithm that works with 95% confidence to estimate the ratio of red to blue cards and the algorithm produces the interval (0.47, 0.49) from your sample data, you can't claim that there is a 0.95 probability that the population ratio is in that interval.

Math problems involving the same formulas can be posed in various ways, by changing which values are "given" and which are solved for. The common way to pose a "confidence interval" problem is to state the estimation algorithm and the desired confidence as givens (e.g. 95%) and to solve for the number of samples needed to produce intervals that will give the estimate that level of confidence. That's the approach other posters have suggested.

Bayesian Statistics

The above methods are those of "frequentist" statistics, the type of statistics taught in most introductory courses. Essentially, frequentist statistics only tells you numbers that characterize the probability of the data given some assumption about the population distribution. It doesn't tell you the probability that some fact about the population is true given the observed data. (There is a difference in meaning between Pr(A|B) and Pr(B|A) and the two need not be numerically equal.)

If you want to solve for something like "The probability that the ratio of red to blue cards in the population is in the interval (0.47, 0.49) given the observed data" then you have to assume a scenario where there is something probabilistic about how the population ratio came into being. If you don't assume such a scenario, there isn't enough given information to solve for such a probability.

Bayesian Statistics involves making assumptions about how the population parameters were selected from some distribution, called the "prior distribution". If you want to compute the answer to the question in the previous paragraph, you'll have to use Bayesian statistics.
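As a rough sketch of what such a calculation looks like, the code below assumes a uniform prior on the fraction of blue cards, i.e. Beta(1, 1). That prior is purely an assumption for illustration, and a different prior gives different answers. With a Beta prior and binomial data the posterior is again a Beta distribution, and the probability of any interval is found here by simple numerical integration. The function names are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def posterior_probability(lo, hi, blue=75, red=50,
                          prior_a=1.0, prior_b=1.0, steps=20_000):
    """P(lo < blue fraction < hi | data) under a Beta(prior_a, prior_b) prior."""
    a, b = prior_a + blue, prior_b + red     # posterior is Beta(a, b)
    width = (hi - lo) / steps
    # midpoint rule, so the endpoints 0 and 1 are never evaluated
    return width * sum(beta_pdf(lo + (i + 0.5) * width, a, b)
                       for i in range(steps))

print("P(blue fraction > 0.5 | data)     =", round(posterior_probability(0.5, 1.0), 3))
print("P(0.45 < blue fraction < 0.55 | data) =", round(posterior_probability(0.45, 0.55), 3))
```

Note that under a continuous prior the probability that the fraction is exactly 0.5 is zero, so the question has to be asked about an interval around 0.5.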

Nice explanation, Stephen Tashi, you've been accepted into my 'Favorites' links.

Yes, very good Stephen. Just some elaboration on Bayesian stats...
The Bayesian approach requires you to say what you would have guessed about the ratio before you did the experiment. How likely was it to be 50:50? 60:40? And so on, i.e. an entire probability distribution for the ratio. The result of the experiment revises that curve to give you a new distribution. The more data you collect, the less your original assumptions matter, within reason.
Soapbox alert. Classical hypothesis testing hides the need to do this by making you pick an acceptance region. In my view, hiding it this way too often leads to poor choices of that region.

Still, Stephen Tashi, the Chi-Squared test does not seem to fall in either of your categories, since the test, as I understand it, makes no assumption about the distribution of any sample statistic.

Why all this talk about defective scales and unknown distributions? We know this is a binomial distribution. The only trouble is conditional probability, but that cannot be eliminated. No amount of trials can confirm a model, but we can become increasingly sure that if the model is wrong we have observed an unlikely event.

Bacle2 said:
Still, Stephen Tashi, the Chi-Squared test does not seem to fall in either of your categories, since the test, as I understand it, makes no assumption about the distribution of any sample statistic.

A common form of the chi-square test involves computing a statistic that depends on the difference between the observed frequency and the "theoretical" frequency. So when you assume a certain "theoretical" frequency, you make an assumption about the distribution of the population.
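For the card data, that statistic can be worked out directly. The sketch below assumes the "theoretical" frequencies are an even split of the 125 cards, and it uses the fact that for one degree of freedom the chi-square tail probability can be written with the complementary error function. The function name is illustrative.

```python
import math

def chi_square_even_split(red, blue):
    """Chi-square goodness-of-fit statistic and p-value for a 1:1 null."""
    n = red + blue
    expected = n / 2                      # theoretical count for each colour
    statistic = ((red - expected) ** 2 / expected
                 + (blue - expected) ** 2 / expected)
    # for one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

stat, p = chi_square_even_split(50, 75)
print(f"chi-square = {stat:.2f}, p-value = {p:.4f}")
```

Here the statistic is 5.0 with a p-value of about 0.025. That p-value is computed under the assumed 1:1 population, which is exactly the distributional assumption described above.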

lurflurf said:
Why all this talk about defective scales and unknown distributions? We know this is a binomial distribution.
It is "a binomial distribution" in the sense of being from that family of distributions. We don't know which particular binomial distribution it is.

The only trouble is conditional probability, but that cannot be eliminated. No amount of trials can confirm a model, but we can become increasingly sure that if the model is wrong we have observed an unlikely event.

I don't know which defective scales and conditional probability you are talking about, or which model.

It's interesting to me that answers to statistical questions, on this and other math forums, are often granted a kind of exception to the standards of mathematics that are applied to other questions. Incorrect or imprecise statements in questions (and answers) about linear algebra, topology, and real analysis are usually set straight by some interested party. With statistics, we often see the original poster's imprecise question quickly interpreted as a specific kind of textbook statistics problem and answered that way.

I prefer to get the original question clarified. I don't claim this approach has any bottom line superiority. I suspect that most people who ask imprecise statistical questions would eventually settle for some textbook way of posing their problem. It takes a very mathematically sophisticated mind to translate practical problems into questions involving probability. Approaching real world problems also requires a tolerance for detail and complication. Most questioners aren't likely to exert that much effort.

A person who posted in the physics section and asked "What is the best way to build a perpetual motion machine to provide electricity to lower my electric bills?" would get a predictable reception. He would be set straight (and perhaps not so gently) about the conservation of energy and the difference between perpetual motion and the motion that produces usable work. A person in the math section asking a question like "How many random samples from an urn would I need to be sure that an urn contains exactly as many black balls as white balls?" is asking a question that is just as intellectually outrageous as the question about perpetual motion. However, it's common to hear this and similar outrageous questions in the field of statistics, so I suppose the questioners should be given more slack.

Sorry for my previous nonsensical comment. I was thinking of Mann-Whitney's U-test, which is non-parametric, and I mistakenly wrote about the Chi-squared. It is too late to edit, so I am just writing a correction.

True, the Mann-Whitney U-test doesn't assume that two populations have any specific distribution. The null hypothesis is that the two populations have the same (unspecified) distribution. However, this does amount to assuming a specific distribution for the statistic (the rank sum).

1. How do you test if a population is 50/50?

There are a few different statistical tests that can be used to determine if a population is 50/50. One common approach is to use a chi-square goodness-of-fit test, which compares the observed frequencies in the sample to the expected frequencies under a 50/50 distribution. Another option is an exact binomial test (or its normal approximation, the one-sample z-test for a proportion), which compares the observed count of one colour with what a 50/50 population would produce.

2. What is the significance level for testing a 50/50 population?

The significance level, also known as alpha, is the probability of incorrectly rejecting the null hypothesis (i.e. incorrectly concluding that the population is not 50/50). The most commonly used significance level is 0.05, which means that there is a 5% chance of making a type I error (incorrectly rejecting the null hypothesis) when the population is actually 50/50.

3. Can you use a one-tailed or two-tailed test for a 50/50 population?

A one-tailed test is appropriate when the research hypothesis specifies the direction of the departure from 50/50 in advance. For example, if the hypothesis is that one colour makes up more than half of the population, a one-tailed test can be used. However, if the hypothesis is simply that the split is not 50/50, a two-tailed test should be used. In the case of testing a 50/50 population, a two-tailed test is typically used.
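To see what the choice of tails does to the numbers, here is a small sketch using the thread's data (75 blue out of 125) and an exact binomial calculation under a 50/50 null. Doubling the one-sided tail to get the two-sided value relies on the symmetry of the null distribution at 0.5; the function name `upper_tail` is illustrative.

```python
from math import comb

def upper_tail(k, n):
    """P(X >= k) when X is the count in n fair 50/50 draws."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

# one-tailed: "blue is more than half", decided before seeing the data
one_sided = upper_tail(75, 125)

# two-tailed: "the split is not 50/50", counting both directions
two_sided = min(1.0, 2 * one_sided)

print(f"one-sided p-value: {one_sided:.4f}")
print(f"two-sided p-value: {two_sided:.4f}")
```

The one-sided value is half the two-sided one, which is why the direction has to be chosen before looking at the sample rather than after.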

4. How do you interpret the results of a 50/50 population test?

The results of a 50/50 population test will typically include a p-value, which represents the probability of obtaining the observed results or more extreme results if the null hypothesis (i.e. the population is 50/50) is true. If the p-value is less than the significance level (usually 0.05), the null hypothesis is rejected, which is taken as evidence that the population is not 50/50. If the p-value is greater than the significance level, the null hypothesis cannot be rejected. That does not show that the population is 50/50; it only means the sample is consistent with a 50/50 split.

5. Are there any assumptions or limitations when testing a 50/50 population?

Yes. The usual tests assume that the sample is drawn at random and that the observations are independent (drawing without replacement is a reasonable approximation when the population is much larger than the sample). The normal and chi-square approximations also need a sample large enough that the expected count in each category is not small; a common rule of thumb is at least 5. It is important to choose an appropriate test and check these assumptions in order to obtain reliable results.
