Detecting whether a bag contains fair coins

1. Jun 28, 2012

Sunnymilk

This is actually a problem from some biological data, but can be more generally described as follows:

I have a bag containing n coins. Each coin has a heads and a tails. After the coin has been flipped some number of times, it vanishes. The number of times the coin may be flipped varies from coin to coin.

I flip all the coins in the bag until they've each vanished, and carefully record each outcome. I know that the bag is either filled with fair coins, or is filled with coins that have a bias toward heads or tails. Each of the coins may have a different bias.

My question is this: if it is the latter case, how might I reject the possibility that it is in fact a bag of fair coins?

2. Jun 29, 2012

haruspex

It is not in general possible to answer the question of whether a coin is fair. You must first choose some threshold and accept a degree of misjudgment. E.g. if you decide to regard a coin producing 10 consecutive results the same as unfair then you will wrongly convict one coin in 500. More problematic is the question of how unfair is unfair? Is a coin that (if you could somehow know this) produces an average 501 heads for every 1000 throws unfair?
Worst of all, the probability of an outcome given a hypothesis is not the same as the probability of the hypothesis given the outcome. If, by experience, you know that only one coin in a million exceeds your fairness test threshold then you should need a lot of convincing that a given coin is unfair.
So you need:
- a threshold bias for considering a coin unfair
- an a priori belief of what proportion of coins exceed that threshold
- a threshold for how sure you want to be about your decisions

3. Jun 29, 2012

Simon Bridge

Welcome to PF.
I think the math you need comes under "Bayesian analysis".
Haruspex is right - especially since you don't know how much any bias may be.

4. Jun 29, 2012

viraltux

Hi Sunnymilk!

I myself would recommend a goodness-of-fit test for this problem. Since you say you have two scenarios, fair coins vs. non-fair coins (all biased toward tails or all biased toward heads), you can simply use a Pearson χ2 test.

Do the following:

1. Count all the times the coins have been flipped; we call this number N.
2. Count all the heads that you have obtained; we call this number H.
3. Count all the tails that you have obtained; we call this number T.
4. Calculate $X^2_H = \frac{(H - N/2)^2}{N/2}$
5. Calculate $X^2_T = \frac{(T - N/2)^2}{N/2}$
6. We have that $S = X^2_H + X^2_T \to \chi^2_1$ (one degree of freedom, since knowing H fixes T)
7. Set a critical $\chi^2_1$ value C for the confidence level at which you want to reject $H_0$ (at a 95% confidence level, C = 3.841459)
8. If S>C you say you reject $H_0$ at your given confidence level.
9. If S≤C you say you cannot reject $H_0$ at your given confidence level.

And that would be it
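The recipe above can be sketched in a few lines of Python. The function name and the simulated bag are my own illustration; the 3.841459 cutoff is the 95% critical value for $\chi^2_1$ quoted in step 7.

```python
import random

def pooled_chi_square(outcomes):
    """Pearson chi-square statistic pooling all flips in the bag.

    `outcomes` is a list of per-coin result strings over {"H", "T"}.
    Compare the returned S against the chi-square critical value with
    one degree of freedom (3.841459 at the 95% confidence level).
    """
    heads = sum(s.count("H") for s in outcomes)
    tails = sum(s.count("T") for s in outcomes)
    expected = (heads + tails) / 2  # N/2 under the fair-coin null
    return ((heads - expected) ** 2 / expected
            + (tails - expected) ** 2 / expected)

# A bag of simulated fair coins usually stays under the cutoff.
random.seed(0)
bag = ["".join(random.choice("HT") for _ in range(random.randint(1, 7)))
       for _ in range(500)]
print(pooled_chi_square(bag) < 3.841459)
```

Note this pools everything into a single head/tail count, which is exactly why it can miss a bag whose biases cancel out, as discussed below in the thread.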

5. Jun 29, 2012

haruspex

As I read the OP, the coins can have different biases, and I take that to mean they may even have opposite biases.

6. Jun 29, 2012

D H

Staff Emeritus
I'm not sure that this is the case. The OP said "Each of the coins may have a different bias," which might be construed as some coins are biased toward heads, others toward tails, yet others more or less fair. This could change the shape of the distribution but not the mean.

Then there's the added twist of flipping the coins until they vanish. What if biased coins live longer? Shorter? What if the bag contains a preponderance of coins biased toward heads, but the coins biased toward tails tend to have a longer life than those biased toward heads?

If these interpretations are valid, a simple chi square test just isn't going to cut it. You'll have to look at a rejection test based on the shape of the distribution as well (and I suspect even that can be masked by a sufficiently devious distribution of biases and vanishing characteristics). The experiment as described in the OP won't give the distribution. It just yields a single point, the fraction of tosses that were heads.

One option, if it's possible, is to re-create the bag multiple times so as to develop an experiment-based reconstruction of the distribution. Then do a combination of rejection tests on the mean, variance, and perhaps higher moments (but that gets dicey). Another option is to arbitrarily partition the contents of the bag into multiple groups and use the statistics for those groups as a means to estimate the distribution. Yet another is to use something like bootstrap or jackknife.

7. Jun 29, 2012

haruspex

The question there is whether the experimenter knows that we have moved on to the next coin. If so, we will get some indication of bias per coin, for any that survive the first toss. Even if each only lasts two tosses, after a sufficiently large number of coins it might be possible to form an opinion. E.g. if you hardly ever see HT or TH you might conclude it's a biased collection.

8. Jun 29, 2012

D H

Staff Emeritus
And even that might not help. Extreme example: Suppose each coin vanishes after the very first flip (no info from multiple flips) and suppose a newly filled bag always contains N coins with heads on both sides and N coins with tails on both sides. A single run of the experiment might make you think the bag is fair. Getting exactly 50% heads on multiple runs would give a clue that the coins are not fair.

9. Jun 29, 2012

viraltux

I summon Sunnymilk to clarify if when coins are not fair they are all not fair towards heads or tails!!

(Will this be one of these questions where we will end up learning biology to be able to answer? Don't miss the next chapter...)

Last edited: Jun 29, 2012
10. Jun 29, 2012

Sunnymilk

Hi, and thanks for the replies!

They are not all biased in the same way, otherwise I could just pool all the results and see how likely the outcome is under a simple binomial distribution, I believe. This is what really makes the problem tricky.

Here's a rather inelegant solution I had thought of, though this is definitely not my field of expertise:
Let's say each coin i has f_i flips. We'd expect that the number of heads for each coin, if fair, would be f_i / 2. One thing you could do is to define some metric for each outcome's deviation from its expected value, add those up to get a total deviation, and then run a Monte Carlo simulation with fair coins to determine whether the observed deviation is unlikely given the simulated deviations.

To be specific, let's say the total heads count for coin i is h_i, and let's use a naive deviation metric, d_i = abs(h_i - f_i/2). We call the total deviation D = sum(d_i). Now, if the coins are biased, we would expect D to be greater than it would be if they were fair, so we run a few thousand simulations, keeping the same values for f_i but simulating h_i as if the coins were all fair. Now we have a distribution of Ds from the simulations, and if the observed D is sufficiently unlikely given that distribution, we reject the possibility that the original bag contains fair coins.

I think this should work in principle for any reasonable measure of the deviation of each coin, but there are probably smarter ones that would result in a distribution that would better detect the biased bag.
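A minimal sketch of the Monte Carlo scheme described above, assuming the data arrive as per-coin head counts h_i and flip counts f_i (the helper names are mine):

```python
import random

def total_deviation(heads, flips):
    """D = sum_i |h_i - f_i/2|, the naive deviation metric."""
    return sum(abs(h - f / 2) for h, f in zip(heads, flips))

def monte_carlo_p_value(heads, flips, n_sims=2000, seed=0):
    """Estimate P(D_sim >= D_obs) under the all-fair null hypothesis,
    keeping each coin's flip count f_i fixed across simulations."""
    rng = random.Random(seed)
    d_obs = total_deviation(heads, flips)
    exceed = 0
    for _ in range(n_sims):
        # Simulate every coin as fair, with the same number of flips.
        sim = [sum(rng.random() < 0.5 for _ in range(f)) for f in flips]
        if total_deviation(sim, flips) >= d_obs:
            exceed += 1
    return exceed / n_sims
```

A small estimated p-value (below the chosen significance level) rejects the fair-bag hypothesis.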

Last edited: Jun 29, 2012
11. Jun 29, 2012

viraltux

All right, I see... Then I think the Kruskal-Wallis one-way analysis of variance looks promising for your problem (the Monte Carlo thingy you mention, not so much)

Maybe you could give us some details about the background problem; I am just always curious.

Last edited: Jun 29, 2012
12. Jun 29, 2012

chiro

Hey Sunnymilk and welcome to the forums.

As pointed out, you will not be able to say with any certainty whether the coin is fair or not.

With the frequentist approach, what you can do is get an interval for the parameter for your binomial and decide whether something is fair or not by supplying your definition of what is 'biased' and what is 'unbiased'.

But as pointed out, you say that each coin may have a different bias which makes things complicated.

As a result, you will want to consider the distribution of the net bias (the sum of the biases of all the coins), since every coin contributes equally to the final result.

If the net bias is large enough in magnitude, then you can use this in hypothesis testing as evidence for whether the coins are biased or unbiased.

If you specify a bias for every coin, then you can factor this directly into the hypothesis testing procedure, where you consider Bernoulli trials, each with its own individual probability.

If you want to do the above, then you can create a probability density function through what is known as a Probability Generating Function, which is a way to generate a PDF for a discrete process much like the one you have in your situation.

You can then build on these results to generate an appropriate hypothesis test for accepting or failing to accept a hypothesis and take it from there.

As a simplification, you might want to consider cases where each coin has the same magnitude in bias (biased towards heads or tails), because the test becomes a lot easier.

13. Jun 29, 2012

Sunnymilk

Ok, I have a couple of questions:

Regarding the Bayesian approach (about which I know relatively little). My understanding is that, given some P(fairness), I can compute P(fairness | #heads) for each coin; or perhaps I'd have to find P(fairness | #heads,#flips) for each coin. How would I combine these probabilities to make a decision about the entire bag?

For the PGF, it seems very straightforward to produce an appropriate PGF from the multivariate equation in the wikipedia article, but I'm not sure where to go from there. Is there a resource you could point me to which would explain how to go from a PGF to a hypothesis test?

Since asked, here is the actual problem in a nutshell:

I have some cells in an environment containing two elements, let's say A and B, with which they like to interact. These elements are readily available to every cell. I can observe them over a period of time and watch as each cell makes up to 7 interactions with these elements (though as few as 1 for many of the cells).

The conventional hypothesis is that the cells are indiscriminate about which element they interact with. That is, any cell at any moment is equally likely to interact with A as B, and that this is independent of any prior interactions it has made.

We believe that we've observed strong tendencies for many of the cells to prefer A or to prefer B, but lack a good way to quantify that observation.

Last edited: Jun 29, 2012
14. Jun 29, 2012

viraltux

I like the real problems soooo much more than those that people translate into urns or coins...

Seems to me you only need to estimate a proportion in this problem, that is, what proportion of cells interact with A compared to B, which makes your problem waaaay easier.... Unless the nutshell was way too small. Is this all that there is to the problem?

15. Jun 29, 2012

Sunnymilk

But the bias goes both ways: there should be roughly the same number that prefer B as prefer A, and the preference is not exclusive; a cell can mostly like A but dabble in B.

16. Jun 29, 2012

haruspex

That's why I wrote "if each lasts two tosses".

17. Jun 29, 2012

viraltux

This would be the null hypothesis:
What you can measure, and therefore compare, is the number of interactions of each cell, which goes from one to around seven, and which element it interacts with (e.g. cell1 = AAABAAA, cell2 = ABAB, cell3 = AB, ... so on)

So for every interaction length from one to seven, the cells should follow a Binomial with p=0.5 and n=1..7. To check that this hypothesis holds, you can use a Pearson χ2 goodness of fit test.

You mention you have reason to believe some cells heavily favor either A or B beyond what we would expect from pure chance; the χ2 test will detect that situation immediately and will fail if you have too many cells showing unexpected amounts of As or Bs.

The conventional hypothesis you mention allows that sometimes some cells might favor all As or all Bs by pure chance, so there is no such thing here as a bias from the cell but just the outcome of a Binomial random variable.

Having "roughly the same number that prefer B as prefer A" is a consequence of the cells following a Binomial distribution whose p is 0.5.
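As a sketch of the per-length test described above: group the cells by interaction length n and compare the observed A-counts against Binomial(n, 0.5) expectations. The function name is mine, and note that the plain χ2 statistic becomes unreliable when expected cell counts are small, so this is only illustrative:

```python
from collections import Counter
from math import comb

def per_length_chi_square(sequences, n):
    """Chi-square statistic for the A-counts of all length-n sequences
    against the Binomial(n, 0.5) null.

    `sequences` are strings over {"A", "B"}; only those of length n are
    used. Compare against the chi-square distribution with n degrees of
    freedom (n + 1 cells, minus one for the fixed total).
    """
    counts = Counter(s.count("A") for s in sequences if len(s) == n)
    total = sum(counts.values())
    stat = 0.0
    for k in range(n + 1):
        # Expected number of cells with exactly k As under p = 0.5.
        expected = total * comb(n, k) / 2 ** n
        stat += (counts.get(k, 0) - expected) ** 2 / expected
    return stat
```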

18. Jun 29, 2012

haruspex

Hi Sunnymilk,
After the clarifications, it looks to me like your coin model is a fair representation, so I'll stick with that.
There is a trade-off between how much bias you want to claim and how confidently you want to be able to claim it. That's why I suggested you pick a threshold bias, but I guess it would be better to try to derive that from the data.
In principle, the (Bayesian) process goes something like this:
- Pick an a priori distribution for the bias. E.g. you might consider the bias as equally likely anything from 0 to 1 (0 says no bias, 1 says each coin is 100% biased). But since you have received wisdom to overturn, a more conservative distribution would pass review better.
- For 'each' possible bias (analytically this would be integration, but I think numerical methods will be needed here, so quantise), compute the likelihood of the observations.
- Multiply the a priori likelihood of the bias by the likelihood of the observation to get its new relative likelihood.
This gives you a new probability distribution for the bias. Pick some threshold and find the area (total probability) beyond it.
The above approach could instead be used as a way of finding the threshold bias you want to claim. Then textbook tests (others on the thread will know these better than I do) should allow you to compute confidence levels.
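A rough numerical sketch of that Bayesian recipe, under one concrete assumed model (my own illustration, not haruspex's exact formulation): every coin shares a bias magnitude b, independently leans heads or tails with probability 1/2, and then lands heads with probability (1 ± b)/2; the prior on b is uniform and quantised on a grid.

```python
from math import comb

def posterior_bias(data, grid_size=101):
    """Posterior over a common bias magnitude b in [0, 1].

    Assumed model: each coin independently leans heads or tails with
    probability 1/2, then lands heads with probability (1 + b)/2 or
    (1 - b)/2. `data` is a list of (heads, flips) pairs; the prior on
    b is uniform over the grid. Returns (grid, normalised weights).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = []
    for b in grid:
        p_hi, p_lo = (1 + b) / 2, (1 - b) / 2
        like = 1.0
        for h, f in data:
            c = comb(f, h)
            # Mixture over the coin's unknown lean direction.
            like *= 0.5 * (c * p_hi**h * p_lo**(f - h)
                           + c * p_lo**h * p_hi**(f - h))
        weights.append(like)
    z = sum(weights)
    return grid, [w / z for w in weights]

def prob_bias_above(data, threshold):
    """Posterior probability that the bias magnitude exceeds threshold."""
    grid, weights = posterior_bias(data)
    return sum(w for b, w in zip(grid, weights) if b > threshold)
```

The last function implements "pick some threshold and find the total probability beyond it".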

19. Jun 30, 2012

Sunnymilk

Hello and thanks again for the advice,

This certainly sounds like the simplest solution. I tried it out with simulated data where each coin/cell had a bias drawn from a uniform distribution and a similarly chosen number of flips per item. Unfortunately, even though many of the cells in the simulation have heavy biases, the test did not detect them. Meanwhile, the silly Monte Carlo scheme I described does. Maybe there are just too few observations per coin (for most of them) for this to work?

I tried this too, once again using simulated data as described above and using a uniform a priori bias distribution. This seems like it will work, but I fear my advisor's distrust of all things Bayesian may make it a hard sell.

20. Jun 30, 2012

viraltux

No I don't think so, you must be doing something [strike]wrong[/strike].... mmmmm

Edit: Actually, I think I know where the problem is; the length of the interaction totally affects the test. In fact, a test for cells interacting only once makes no sense since, in that case, the two scenarios (all cells unbiased vs. all cells uniformly biased) are indistinguishable: both yield 50% As and Bs, so no test can discriminate one situation from the other for cells with just one interaction.

So you are right, the test will not detect anything for very short lengths, and rightly so. To measure this phenomenon you should stick to the longest sequences and ignore the short ones, to allow the characteristic bell shape of the Binomial to form and thus increase the sensitivity of the test.

The standard theory says:

So let's focus on just one particular interaction length n, and take A=1 and B=0. If the standard hypothesis is true, then 0 and 1 will be equally likely and we have Cells~Binomial(n,0.5), OK? Now, in this scenario the Cells histogram will look like a Gaussian distribution (mountain-like) with its mean at n*0.5.

Yet, in your alternate hypothesis for this scenario, you say that the cells are all uniformly biased (as you do in your simulation). Overall, it is true that you will get roughly the same number of 0s and 1s (or As and Bs), but the distribution for a particular length is going to look nothing like a Gaussian; it will be quite flat due to the uniformity of the probability.

So any goodness of fit test should detect this right away. (Edit: for big enough sequences to allow the bell shape to form.)

Note: Your Monte Carlo method works because, in your experiment, the variance of a Uniform-like distribution is bigger than the variance of a Binomial-like distribution.
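The variance claim in that note is easy to check by simulation (the helper names are mine). For n = 7 interactions, a fair cell's head count has variance 7/4 = 1.75, while drawing each cell's P(A) from Uniform(0,1) gives 7·E[p(1−p)] + 49·Var(p) = 7/6 + 49/12 = 5.25:

```python
import random

def head_count_samples(n_flips, n_cells, uniform_bias, rng):
    """Simulated head counts for n_cells cells, n_flips flips each.
    If uniform_bias, each cell's P(heads) is drawn from Uniform(0, 1);
    otherwise every cell uses P(heads) = 0.5."""
    counts = []
    for _ in range(n_cells):
        p = rng.random() if uniform_bias else 0.5
        counts.append(sum(rng.random() < p for _ in range(n_flips)))
    return counts

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(1)
fair = head_count_samples(7, 5000, False, rng)
biased = head_count_samples(7, 5000, True, rng)
# Theory predicts roughly 1.75 for the fair bag, 5.25 for the uniform one.
print(variance(fair), variance(biased))
```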

Last edited: Jun 30, 2012
21. Jun 30, 2012

haruspex

Viraltux, the problem with the method you suggest is that (if I've understood it) it won't work as is with a large number of cells each only surviving a few tosses. As I mentioned earlier, even if each only survives two, with a sufficiently large number of cells you should still be able to draw a firm conclusion. (In the coin-tossing analogy, suppose you hardly ever saw a head and a tail from the same coin.) You suggest only using data from cells that last long enough, but that might be throwing away a lot of perfectly good data.
I expect there's a way to adapt your procedure to handle this.

22. Jun 30, 2012

Stephen Tashi

For a Bayesian analysis, you need more than an assumption about P(fairness), since "non-fair" doesn't provide enough information to compute the probability of observing the data. An example of an adequate hypothesis is: assume each cell $j$ has a probability $p_j$ of interacting with elements of type A (when it interacts) and a probability of $1 - p_j$ of interacting with elements of type B, and assume that $p_j$ is drawn independently from a uniform distribution on [0,1] for each cell.

Based on the observed data, you would compute the probability of events such as:
$1/2 - \epsilon \le p_j \le 1/2 + \epsilon$ for each $p_j$ for the $\epsilon$ of your choice. That would be a representation of the idea that all the $p_j$ are approximately 1/2.
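Under the uniform prior just described, the posterior for each $p_j$ given $h_j$ heads in $f_j$ flips is Beta$(h_j+1, f_j-h_j+1)$, so the probability of that event can be computed by simple numerical integration (the function name is mine):

```python
from math import comb

def prob_near_half(h, f, eps, steps=10_000):
    """P(1/2 - eps <= p <= 1/2 + eps) under the Beta(h+1, f-h+1)
    posterior implied by a uniform prior and h heads in f flips.

    The posterior density is (f+1) * C(f, h) * p**h * (1-p)**(f-h);
    it is integrated here with the midpoint rule.
    """
    norm = (f + 1) * comb(f, h)
    lo, hi = max(0.0, 0.5 - eps), min(1.0, 0.5 + eps)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * width  # midpoint of the i-th slice
        total += norm * p**h * (1 - p)**(f - h) * width
    return total
```

Since the cells' posteriors are independent, multiplying these probabilities across cells gives the posterior probability that every $p_j$ is within $\epsilon$ of 1/2.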

The above scenario may not be biologically plausible. Bayesian analysis encourages experts to use their expert knowledge in formulating the assumptions instead of being lobotomized by mathematical models that happen to be lying around. You may be able to formulate a better model of how cells with preferences for one element or another are generated.

Bayesian statistics is often done by using Monte Carlo simulations. (I appreciate that there is cultural resistance to both of these methods in some academic circles.)

It's good that you are doing Monte Carlo simulations to test the effectiveness of statistical tests. You'll notice that in doing such simulations of the "non-fair" situation, you face the problem of picking a model for generating the population of non-fair coins, just as you would when using Bayesian statistics.

----

It isn't clear to me whether you have a few cells and a vast number of elements A and B, and whether a unit of element A is "destroyed" when it interacts with a cell. Apparently the conditions are such that $p_j$ can be assumed constant.

How do the cells "vanish"?

23. Jul 1, 2012

viraltux

Well, I was being pragmatic in the sense that long sequences are the ones that will show the bias most easily, and if the test detects it then the OP can present results. Besides, in this way we also discriminate cells with longer interactions from cells with shorter interactions, in case there might be qualitative differences between them.

But if it turns out that long sequences are scarce, then you are right; we need to use all the data we have. The shorter the interactions, the harder it is to detect the uniform bias, though (and in fact for interactions of length one it is not possible), and yes, there are ways to use all the data in this procedure:

Let's imagine we have these cells:

C0: B
C1: AB
C2: AAB
C3: BBAABABA
....

And let's assume we don't have much data (especially long-interaction cell data), so we want to use all we have to detect any bias.

Assuming that cells dying soon are not qualitatively different from those having long interactions, we could treat the problem as a missing-data one, that is:

C0: BXXXXXXX
C1: ABXXXXXX
C2: AABXXXXX
C3: BBAABABA
....

What we fill the Xs with is test-dependent, since it has to be test-neutral; that is, we substitute the X values in ways that will not favor any outcome.

For this problem and the Pearson χ2 test it would be fairly simple, since the test only deals with expected values vs. observed values, so we don't even need to substitute anything, only keep the proportions.

24. Jul 1, 2012

Sunnymilk

The elements are essentially infinitely available and not measurably consumed by the interactions. Observing the interactions, however, is extremely costly and labor intensive, and they occur rarely and spontaneously on top of that. As such, we can't really control how many observations we get from each cell, and it's important to us that we try to avoid throwing out data if at all possible.