# Test whether an n-sided coin is fair?

Hi all,

I've been looking for a general formula to check whether a random number generator is random. Basically, I need something similar to

http://en.wikipedia.org/wiki/Checkin...a_coin_is_fair

but for an n-sided coin.

Suppose we have a random generator of integers between 1 and "n". I need a formula in terms of "n" that tells me how many samples I need before I can say, with a certain confidence and within an acceptable error, that the generator is "fair" - that is, whether the generator is behaving randomly or not.

Does anyone know anything that could help?

Thanks in advance


## Answers and Replies

CRGreathouse is correct: the way to test that your random number generator is "fair", in the sense that all n outcomes are equally likely, is to use a chi-square test.
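As a rough illustration of that test, here is a minimal Python sketch using only the standard library. The names `chi2_sf` and `fairness_test` are made up for this example, and it assumes the generator returns integers from 1 to n. The tail probability uses the closed-form sums that exist for integer degrees of freedom; a statistics package would normally supply this.

```python
import math
import random
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for a central chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2:
        # Odd df: start from Q(1/2, z) = erfc(sqrt(z)).
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        # Even df: start from Q(1, z) = exp(-z).
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def fairness_test(samples, n_sides):
    """Pearson chi-square statistic and p-value against a uniform n-sided die.

    Assumes every sample is an integer in 1..n_sides.
    """
    counts = Counter(samples)
    expected = len(samples) / n_sides
    stat = sum((counts.get(face, 0) - expected) ** 2 / expected
               for face in range(1, n_sides + 1))
    return stat, chi2_sf(stat, n_sides - 1)


# Example: 6000 rolls of Python's own generator, treated as a 6-sided die.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(6000)]
stat, p_value = fairness_test(rolls, 6)
print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

A small p-value (say below 0.05) is evidence against equal probabilities. A large one only means this test found no such evidence.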

But if you want to check that the generator "behaves randomly", that gets into the huge subject of testing random number generators. Google "testing random number generators" and you will get many hits.

Many thanks to both of you.

The chi-square test seems to be what I am looking for. However, after reading all day, I can't seem to figure out the answer to the following two questions:

1. How many observations "N" do I need to be able to confirm or reject the null hypothesis with a certain confidence (e.g. 95%) for a given number of possible outcomes "k"? Isn't it true that for lower "k" (e.g. k=2 for a coin) I need fewer observations than for higher "k" (e.g. k=6 for a die)? I can't seem to get the exact formula for "N" as a function of "k" and the required confidence.

2. The second formula I need to derive gives the confidence level as a function of "k" and "N". That is, given the same number of observations, I need to be able to determine the level of confidence for rejecting or accepting the null hypothesis for different "k". E.g., for k=2 I should be more confident with the same number of observations than with k=6.

Thanks again :)

The question of needed sample size is addressed by the "power" of the test. The following document shows how to compute the power of the chi-squared test:

www.stat.psu.edu/~dhunter/asymp/lectures/ANGELchpt07.pdf

Note that you must have an alternative proportion of outcomes to propose before you can compute the power. For example, suppose you have a 6-sided die. You propose that instead of proportions of (1/6, 1/6, 1/6, 1/6, 1/6, 1/6), the actual proportions are (.2, .2, .2, .2, .1, .1). You propose 100 trials. You can then compute the power of the test-- the probability of rejecting the null hypothesis. If the power isn't as high as you want, you will need more trials.

Note also that the power computation requires computation of the value of a non-central chi-square distribution, so you will probably need some statistical software for the computation. As far as I know, you can't, for example, find a non-central chi-square value using Microsoft Excel.
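For readers without statistical software, the computation can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the large-sample approximation in which the test statistic is non-central chi-square with non-centrality n * sum((p_alt - p_null)^2 / p_null), and it evaluates that distribution as a Poisson mixture of central chi-squares. The names `ncx2_sf`, `chi2_quantile`, `effect_size`, `power` and `trials_needed` are hypothetical helpers, not part of any library.

```python
import math


def _step(a, z):
    """Increment Q(a + 1, z) - Q(a, z) of the regularized upper incomplete gamma."""
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))


def ncx2_sf(x, df, ncp=0.0):
    """P(X > x) for a chi-square with integer df and non-centrality ncp."""
    if x <= 0:
        return 1.0
    z, half = x / 2.0, ncp / 2.0
    # Central tail Q(df/2, z), built up from Q(1/2, z) or Q(1, z).
    a, q = (0.5, math.erfc(math.sqrt(z))) if df % 2 else (1.0, math.exp(-z))
    while a < df / 2.0:
        q += _step(a, z)
        a += 1.0
    if half == 0:
        return min(q, 1.0)
    # Poisson(ncp/2) mixture of central chi-squares with df + 2j degrees of freedom.
    total, j = 0.0, 0
    while True:
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        total += weight * q
        if j > half and weight < 1e-17:
            break
        q += _step(a, z)
        a += 1.0
        j += 1
    return min(total, 1.0)


def chi2_quantile(prob, df):
    """Central chi-square quantile, found by bisection on the upper tail."""
    lo, hi = 0.0, 1.0
    while ncx2_sf(hi, df) > 1.0 - prob:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if ncx2_sf(mid, df) > 1.0 - prob:
            lo = mid
        else:
            hi = mid
    return hi


def effect_size(p_null, p_alt):
    """Per-trial non-centrality: sum of (p_alt - p_null)^2 / p_null."""
    return sum((a - b) ** 2 / b for a, b in zip(p_alt, p_null))


def power(p_null, p_alt, n, alpha=0.05):
    """Approximate probability of rejecting the null with n trials."""
    df = len(p_null) - 1
    critical = chi2_quantile(1.0 - alpha, df)
    return ncx2_sf(critical, df, n * effect_size(p_null, p_alt))


def trials_needed(p_null, p_alt, target=0.8, alpha=0.05):
    """Smallest n whose approximate power reaches the target."""
    if effect_size(p_null, p_alt) == 0:
        raise ValueError("alternative equals the null; power never exceeds alpha")
    n = 1
    while power(p_null, p_alt, n, alpha) < target:
        n += 1
    return n


# The 6-sided die example: fair proportions against the proposed alternative.
fair = [1 / 6] * 6
loaded = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]
print(f"power with 100 trials: {power(fair, loaded, 100):.3f}")
print(f"trials for 80% power: {trials_needed(fair, loaded)}")
```

The 80% target and the 5% significance level are arbitrary defaults chosen for the sketch. The approximation is only meaningful when n is large enough for the chi-square test itself to be valid.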

thanks awkward, I'll check it out :)

awkward,

I've been trying to figure it out for a while, but I am still a bit confused. I am by no means a mathematician, so I got lost reading different sources using different terms, notations, etc.

I was looking at the lecture notes example you provided. Finding the non-centrality parameter (ncp) seems straightforward. I am not exactly sure what happens next.

If I get it right, to obtain the power of a test against an alternative, we calculate the non-central cdf according to the formula as written here http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution using the following parameters:

- "x" is the chi-square quantile obtained using the pre-specified confidence level (0.01 in the example) and degrees of freedom (2)
- "k" is degrees of freedom (2)
- \lambda is the ncp as calculated before

Is that correct?

Finally, now that we found the cdf value (0.965), what does it tell us? How do we interpret it? I mean, the conclusion should be something like:

If we take 200 samples, the probability that we accept the null hypothesis when in reality the alternative is true is 96.5%?

A bit confused here :D

Thanks for your help

If the value of the non-central chi-square distribution is 0.965, that means that the probability of rejecting the null hypothesis under those conditions (also known as the "power of the test") is 0.965.

Thanks! How about the formula and parameters used to calculate the power? I will just copy and paste the portion of my previous post related to this question:

If I get it right, to obtain the power of a test against an alternative, we calculate the non-central cdf according to the formula as written here http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution using the following parameters:

- "x" is the chi-square quantile obtained using the pre-specified confidence level (0.01 in the example) and degrees of freedom (2)
- "k" is degrees of freedom (2)
- \lambda is the ncp as calculated before

Is that correct?

Yes, given the example in the paper, that is correct.
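Written out as a formula, this reads as follows. It is a sketch in the thread's notation: the symbols $\alpha$ for the significance level and $F_{k,\lambda}$ for the non-central chi-square cdf are labels introduced here for illustration.

$$x = \chi^2_{k,\,1-\alpha}, \qquad \text{power} = P\left(\chi^2_k(\lambda) > x\right) = 1 - F_{k,\lambda}(x)$$

So the power is the upper-tail probability beyond x. If the software reports the lower-tail cdf $F_{k,\lambda}(x)$, the power is one minus that value.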

awkward,

Sorry to bring this from the dead, but I've been working this out and got some weird results.

Here's an example. Let's say we test a 100-sided die. The null hypothesis is that each side comes out with the uniform (1/100) probability - that is, the die is fair.

Now, I define an alternative hypothesis - the first 50 values come out with probability 1/100 + 9/1000 and the last fifty come out with probability 1/100 - 9/1000. Then using the notation in the paper, we derive:

$$\delta = \sqrt{n} \left( \tfrac{90}{1000}, \ldots, \tfrac{90}{1000}, -\tfrac{90}{1000}, \ldots, -\tfrac{90}{1000} \right)$$

Then, the ncp becomes:

$$\lambda = 81 n$$

Now, let's say one needs a 95% level of significance, and we check what the power of the test is with ONLY ONE sample, that is n=1:

> qchisq(.95,99)
[1] 123.2252
> 1-pchisq(.Last.value, 99, ncp=81)
[1] 0.9968736

This does not make any sense! We get power of 99.7% with a single toss of a 100-sided coin ???????

Something's fishy (possibly my brain :ddd). What is wrong here?

thanks )

I think you need to re-read the formula for the non-centrality parameter given in the paper:

$$\delta^T \operatorname{diag}(p^0)^{-1} \delta$$

It might help to do a smaller example first, where you can actually write out the vectors and matrix-- a four-sided coin, say.

Edit: Oops, I take it back; you need to check your calculation of \delta. It still might help to work a smaller example first, though.
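A worked version of that smaller example, as a sketch. It assumes the paper's convention that the alternative is written $p = p^0 + \delta/\sqrt{n}$, so that $\delta = \sqrt{n}\,(p - p^0)$. The shift $\epsilon$ is a hypothetical value chosen for illustration. For a four-sided coin:

$$p^0 = \left(\tfrac14, \tfrac14, \tfrac14, \tfrac14\right), \qquad p = \left(\tfrac14 + \epsilon, \tfrac14 + \epsilon, \tfrac14 - \epsilon, \tfrac14 - \epsilon\right), \qquad \delta = \sqrt{n}\,(\epsilon, \epsilon, -\epsilon, -\epsilon)$$

$$\lambda = \delta^T \operatorname{diag}(p^0)^{-1} \delta = \sum_{i=1}^{4} \frac{\delta_i^2}{p^0_i} = 4 \cdot \frac{n \epsilon^2}{1/4} = 16\, n\, \epsilon^2$$

Under the same assumption, the 100-sided example with a shift of 9/1000 has $\delta_i = \pm\sqrt{n} \cdot \tfrac{9}{1000}$, which gives

$$\lambda = 100 \cdot \frac{n\,(9/1000)^2}{1/100} = 0.81\, n$$

That is 0.81 rather than 81 for a single sample.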