Probability question: testing whether a population is 50/50

Curl · Jan 5, 2013

Say I have a "large" population (larger than 1000) of red cards and blue cards, and I want to find out whether it is evenly split (1:1 red:blue ratio).
So I do an experiment and pull out 125 cards at random, and find that I have 50 red and 75 blue. Based on this experiment, what can I say about the red:blue ratio in the population? What is the "confidence" that the ratio is NOT 1:1?
I'm unsure what "confidence" means here. I can find, for example, the probability of getting more than 75 blue cards out of 125 by using the normal approximation to the binomial distribution, but I'm not sure what that says about the population.

Can anyone give me some help on how to analyze this kind of data? How much information can we extract from this one experiment?

chiro · Jan 5, 2013

Hey Curl.

The confidence refers to a way to describe the probability you wish to use for estimating the parameter.

The parameter could be anywhere between 0 and 1 (the endpoints themselves only in special cases), and a finite sample won't give you enough information to pin down its exact value.

A higher confidence level takes more possible values of the parameter into account, in a statistical sense, so it gives a wider interval.

Basically you should apply the Normal approximation and check whether 0.5 is in your confidence interval: if it is, you retain the hypothesis that the split is even, and if not, you fail to retain it.

lurflurf · Jan 5, 2013

This is the same kind of thing that comes up in election polling. Often, instead of assuming the split is even (which can also be done), we can find the confidence interval of the observed split.

[tex]\text{C.I.}=p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}[/tex]
where p (the observed proportion) is 75/125,
n = 125,
and z is the critical value for the desired confidence (1.96 for 95%).

Assuming the split is even, we could consider the probability of drawing exactly 75 blue out of 125:
[tex]\left(\frac{1}{2}\right)^{125} \binom{125}{50}\approx 0.00587[/tex]
More importantly, the chance of drawing fewer than 75 is 98.4%.
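
The numbers in this post can be reproduced with a short script. This is only a minimal sketch using the Python standard library: the function names are illustrative choices, p is taken to be the observed fraction of blue cards, and the draws are treated as binomial (reasonable when the population is much larger than the sample).

```python
from math import comb, sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def fair_pmf(k, n):
    """P(exactly k blue in n draws) if the population is split 50/50."""
    return comb(n, k) / 2**n

n, blue = 125, 75
low, high = wald_interval(blue, n)
p_exact_75 = fair_pmf(blue, n)
# Sum over 0..74 blue cards, i.e. "fewer than 75".
p_fewer_than_75 = sum(fair_pmf(k, n) for k in range(blue))

print(f"95% interval for the blue fraction: ({low:.3f}, {high:.3f})")
print(f"P(exactly 75 blue | even split) = {p_exact_75:.5f}")
print(f"P(fewer than 75 blue | even split) = {p_fewer_than_75:.3f}")
```

With these inputs the interval runs from roughly 0.51 to 0.69, so it does not contain 0.5.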

Bacle2 · Jan 6, 2013

You can also do a quick Chi-squared:

http://www2.lv.psu.edu/jxm57/irp/chisquar.html
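
For the card counts in this thread, a chi-squared goodness-of-fit test against an even split can be sketched as follows. This is an illustration under the assumption of a 1:1 null hypothesis with one degree of freedom; the function name is made up for the example, and the p-value uses the identity that a chi-squared variable with one degree of freedom is the square of a standard normal.

```python
from math import erfc, sqrt

def chi_square_even_split(red, blue):
    """Chi-squared statistic and p-value for a 1:1 null hypothesis (1 d.o.f.)."""
    expected = (red + blue) / 2
    stat = (red - expected) ** 2 / expected + (blue - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_even_split(50, 75)
print(f"chi-squared = {stat:.2f}, p-value = {p_value:.4f}")
```

The statistic comes out at 5.0, which is above the usual 5% critical value of about 3.84 for one degree of freedom.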

Stephen Tashi · Jan 6, 2013

Curl said:

Based on this experiment, what can I say about the red:blue ratio in the population?

Without some assumptions about the population, you can't say anything except for the trivial fact that the population contains at least 50 red and 75 blue cards and so that establishes some bounds for the ratio.

Other posters have given you methods to produce some numbers. The methods work by assuming information about the population. The numbers they produce are often misinterpreted by laymen. I'll focus my post on the conceptual aspects.

Two divisions of statistics are "hypothesis testing" and "estimation".

Hypothesis Testing

Typical statistical "hypothesis" testing involves making a specific enough assumption about the population to compute the probability distribution for some statistic of the sample. For example, if the statistic is the ratio of red to blue cards in the sample, the assumption that the population has the same number of red cards as blue cards is specific enough to let you compute the probability distribution of this ratio in samples. The assumption that the ratio is *not* 1:1 in the population is not specific enough to let you compute the distribution of that statistic.

Hypothesis testing is a procedure. You make a sufficiently specific assumption (a "null hypothesis") to know the probability distribution of some statistic. You define an "acceptance region" for the statistic. If the statistic computed from the observed data falls within the "acceptance region" you "accept" the hypothesis. Otherwise you "reject it". The quantitative behavior of the procedure is specified by the probability that the statistic would fall outside of the acceptance region if the null hypothesis were true.
(i.e. that the hypothesis testing would make the wrong decision if the null hypothesis were true.)
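
As a rough illustration of the procedure just described, here is a sketch for the card example. The assumptions are mine, not part of the post: the null hypothesis is a 50/50 split, the statistic is the number of blue cards in 125 binomial draws, the acceptance region is symmetric about the middle, and the error probability is capped at 5%. The function names are hypothetical.

```python
from math import comb

def fair_pmf(k, n):
    """P(exactly k blue in n draws) under the 50/50 null hypothesis."""
    return comb(n, k) / 2**n

def acceptance_region(n, alpha=0.05):
    """Narrowest symmetric region [lo, hi] for the blue count such that the
    null probability of falling outside it is at most alpha."""
    lo, hi = 0, n
    while lo + 1 < hi - 1:
        # Null probability of landing outside the candidate region [lo + 1, hi - 1].
        outside = (sum(fair_pmf(k, n) for k in range(lo + 1))
                   + sum(fair_pmf(k, n) for k in range(hi, n + 1)))
        if outside > alpha:
            break
        lo, hi = lo + 1, hi - 1
    return lo, hi

def type_one_error(n, lo, hi):
    """P(statistic falls outside [lo, hi]) when the null hypothesis is true."""
    return 1.0 - sum(fair_pmf(k, n) for k in range(lo, hi + 1))

n, observed_blue = 125, 75
lo, hi = acceptance_region(n)
alpha_actual = type_one_error(n, lo, hi)
decision = "accept" if lo <= observed_blue <= hi else "reject"
print(f"acceptance region: {lo}..{hi}, error probability under the null: {alpha_actual:.4f}")
print(f"observed {observed_blue} blue -> {decision} the null hypothesis")
```

The error probability printed here is exactly the quantity the paragraph above calls the quantitative behavior of the procedure; it says nothing about the probability that the null hypothesis is true.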

Hypothesis testing isn't a proof of something and it does not find the probability that the null hypothesis is true or the probability that it is false. Hypothesis testing is just a procedure that has been found to be empirically useful in many real life situations.

Estimation

Estimation refers to using some function of the sample data to estimate a parameter of the distribution of the population.

The technical definition of "confidence" refers to the scenario of "estimation", not to "hypothesis testing". The numerical calculations in computing confidence are often the same as those used in hypothesis testing, but the interpretation of the numbers is different.

An empirical version of "confidence" in estimation is illustrated by the following:

Imagine there is a lab, to which you send samples. The lab reports an estimate of some property of the sample (e.g. its mass). The report is given as an interval (e.g. 9.75 to 10.25 milligrams). If you have a way of doing more precise measurements on the same sample that determine its "true" mass, you can note whether the true mass is within the interval reported by the lab. By accumulating data on how often the lab was correct, you can quantify your "confidence" in the lab. If the interval reported by the lab contains the true mass of the sample 95% of the time, you can say that the lab gives a "95% confidence" interval for the true mass.

It's important to note that you cannot apply this "confidence" number to one particular lab report. For example, if the lab reports an interval of "100.5 to 102.0 grams", you cannot assert that there is a 0.95 probability that the true mass of that sample is in the interval 100.5 to 102.0 grams. For example, suppose the lab uses different measuring instruments on small samples and large samples. One of their instruments might be more reliable than the other. The 0.95 probability is not based on analyzing the behavior of the lab in enough detail to account for such a situation. It is only based on data about how often the lab was correct or incorrect.

A typical statistical version of "confidence" is analogous to the above example. You assume the population comes from a specific family of distributions (e.g. a binomial, or a gaussian). You pick a particular algorithm that computes an estimate of one of the parameters of the distribution in the form of an interval. You compute the probability that the algorithm produces an interval containing the true value of the parameter. This probability is the "confidence" associated with the estimate. (It is often possible to compute this probability only knowing the family of distributions that are involved. You don't need to assume specific numbers for the true values of the population's distribution parameters.)
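
To make "the probability that the algorithm produces an interval containing the true value" concrete, here is a sketch that computes it exactly for one particular algorithm. The choice of algorithm is an assumption for illustration: the normal-approximation interval with z = 1.96 applied to a binomial sample of 125, with a few hypothetical true values of the parameter.

```python
from math import comb, sqrt

def wald_interval(k, n, z=1.96):
    """The interval the algorithm reports when k of n draws are blue."""
    p_hat = k / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_p, n, z=1.96):
    """Probability that the reported interval contains true_p, taken over
    all possible samples of size n from a binomial with parameter true_p."""
    total = 0.0
    for k in range(n + 1):
        low, high = wald_interval(k, n, z)
        if low <= true_p <= high:
            total += comb(n, k) * true_p**k * (1 - true_p)**(n - k)
    return total

for true_p in (0.3, 0.5, 0.6):
    print(f"true p = {true_p}: coverage = {coverage(true_p, 125):.3f}")
```

For this particular algorithm the coverage is only approximately 95% and shifts a little with the true parameter, because the interval relies on a normal approximation. The probability is a property of the algorithm over repeated samples, not of any single reported interval.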

Just as in the empirical example, if you are using an algorithm that produces 95% confidence intervals, then you cannot claim that there is a 95% probability that the true value of the parameter is in one particular interval. For example, if you are using an algorithm that works with 95% confidence to estimate the ratio of red to blue cards and the algorithm produces the interval (0.47, 0.49) from your sample data, you can't claim that there is a 0.95 probability that the population ratio is in that interval.

Math problems involving the same formulas can be posed in various ways, by changing which values are "given" and which are solved for. The common way to pose a "confidence interval" problem is to state the estimation algorithm and the desired confidence as givens (e.g. 95%) and to solve for the number of samples needed to produce intervals that will give the estimate that level of confidence. That's the approach other posters have suggested.
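
One common textbook way of "solving for the number of samples" can be sketched like this. It assumes the normal-approximation interval, and it needs an extra given that the paragraph above does not name: a target half-width for the interval (called `margin` here, a hypothetical parameter). Using p = 0.5 as the guess gives the most conservative answer.

```python
from math import ceil

def required_sample_size(margin, z=1.96, p_guess=0.5):
    """Smallest n with z * sqrt(p_guess * (1 - p_guess) / n) <= margin."""
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

n_needed = required_sample_size(0.05)
print(n_needed)  # 385
```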

Bayesian Statistics

The above methods are those of "frequentist" statistics, the type of statistics taught in most introductory courses. Essentially, frequentist statistics only tells you numbers that characterize the probability of the data given some assumption about the population distribution. It doesn't tell you the probability that some fact about the population is true given the observed data. (There is a difference in meaning between Pr(A|B) and Pr(B|A) and the two need not be numerically equal.)

If you want to solve for something like "The probability that the ratio of red to blue cards in the population is in the interval (0.47, 0.49) given the observed data" then you have to assume a scenario where there is something probabilistic about how the population ratio came into being. If you don't assume such a scenario, there isn't enough given information to solve for such a probability.

Bayesian Statistics involves making assumptions about how the population parameters were selected from some distribution, called the "prior distribution". If you want to compute the answer to the question in the previous paragraph, you'll have to use Bayesian statistics.
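
A minimal sketch of such a Bayesian calculation for the card data follows. Everything about the prior is an assumption made for illustration: the fraction of blue cards p is given a Beta prior (uniform by default), which yields a Beta posterior after observing 75 blue and 50 red. The posterior probability of an interval is obtained by simple numerical integration, and the function name is hypothetical.

```python
from math import exp, lgamma, log

def posterior_prob(lo, hi, blue, red, prior_a=1.0, prior_b=1.0, steps=20000):
    """P(lo < p < hi | data) for a Beta(prior_a, prior_b) prior on the blue fraction p."""
    a, b = prior_a + blue, prior_b + red  # parameters of the Beta posterior
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoints, so x is never exactly 0 or 1
        total += exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))
    return total * width

whole = posterior_prob(0.0, 1.0, blue=75, red=50)
above_half = posterior_prob(0.5, 1.0, blue=75, red=50)
print(f"check, P(0 < p < 1 | data) = {whole:.6f}")
print(f"P(p > 0.5 | data) = {above_half:.3f}")
```

A different prior would give a different answer, which is exactly the extra assumption that the frequentist methods above do not make.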

Bacle2 · Jan 6, 2013

Nice explanation, Stephen Tashi, you've been accepted into my 'Favorites' links.

haruspex · Jan 6, 2013

Yes, very good Stephen. Just some elaboration on Bayesian stats...
The Bayesian approach requires you to say what you would have guessed about the ratio before you did the experiment. How likely was it to be 50:50?, 60:40?, and so on. I.e. an entire probability distribution for the ratio. The result of the experiment revises that curve to give you a new distribution. The more data you collect, the less your original assumptions matter - within reason.
Soapbox alert. Classical hypothesis testing hides the need to do this by making you pick an acceptance region. In my view, hiding it this way too often leads to poor choices of that region.

Bacle2 · Jan 9, 2013

Still, Stephen Tashi, the Chi-Squared test does not seem to fall in either of your categories, since the test, as I understand it, makes no assumption about the distribution of any sample statistic.

lurflurf · Jan 9, 2013

Why all this talk about defective scales and unknown distributions? We know this is a binomial distribution. The only trouble is conditional probability, but that cannot be eliminated. No amount of trials can confirm a model, but we can become increasingly sure that if the model is wrong we have observed an unlikely event.

Stephen Tashi · Jan 9, 2013

Bacle2 said:

Still, Stephen Tashi, the Chi-Squared test does not seem to fall in either of your categories, since the test, as I understand it, makes no assumption about the distribution of any sample statistic.

A common form of the chi-square test involves computing a statistic that depends on the difference between the observed frequency and the "theoretical" frequency. So when you assume a certain "theoretical" frequency, you make an assumption about the distribution of the population.

Stephen Tashi · Jan 9, 2013

lurflurf said:

Why all this talk about defective scales and unknown distributions? We know this is a binomial distribution.

It is "a binomial distribution" in the sense of being from that family of distributions. We don't know which particular binomial distribution it is.

The only trouble is conditional probability, but that cannot be eliminated. No amount of trials can confirm a model, but we can become increasingly sure that if the model is wrong we have observed an unlikely event.

I don't know which defective scales and conditional probability you are talking about - and which model.

It's interesting to me that answers to statistical questions, on this and other math forums, are often granted a kind of exception to the standards of mathematics that are applied to other questions. Incorrect or imprecise statements in questions (and answers) about linear algebra, topology, and real analysis are usually set straight by some interested party. With statistics, we often see the original poster's imprecise question quickly interpreted as a specific kind of textbook statistics problem and answered that way.

I prefer to get the original question clarified. I don't claim this approach has any bottom line superiority. I suspect that most people who ask imprecise statistical questions would eventually settle for some textbook way of posing their problem. It takes a very mathematically sophisticated mind to translate practical problems into questions involving probability. Approaching real world problems also requires a tolerance for detail and complication. Most questioners aren't likely to exert that much effort.

A person who posted in the physics section and asked "What is the best way to build a perpetual motion machine to provide electricity to lower my electric bills?" would get a predictable reception. He would be set straight (and perhaps not so gently) about the conservation of energy and the difference between perpetual motion and the motion that produces usable work. A person in the math section asking a question like "How many random samples from an urn would I need to be sure that an urn contains exactly as many black balls as white balls?" is asking a question that is just as intellectually outrageous as the question about perpetual motion. However, it's common to hear this and similar outrageous questions in the field of statistics, so I suppose the questioners should be given more slack.

Bacle2 · Jan 12, 2013

Sorry for my previous nonsensical comment. I was thinking of Mann-Whitney's U-test, which is non-parametric, and I mistakenly wrote about the Chi-squared. It is too late to edit, so I am just writing a correction.

Stephen Tashi · Jan 12, 2013

True, the Mann-Whitney U-test doesn't assume that two populations have any specific distribution. The null hypothesis is that the two populations have the same (unspecified) distribution. However, this does amount to assuming a specific distribution for the statistic (the rank sum).
