Chronos said:
3 sigma corresponds to a 99.73% probability that a process output will fall within a certain range of values. An analog output [measured numerically] follows a Gaussian distribution; a discrete output [pass/fail] follows a binomial distribution. Sigma is the term used to express the probability that an output will fall within certain limiting values. 3 sigma is generally considered a reasonable probability that a process output is, or is not, random. In some applications, 3 sigma is considered pretty sloppy and higher confidence levels are demanded [e.g. particle physics]. You can characterize the probability that any particular outcome, or series of outcomes, is or is not random, but never with certainty.
I'm sorry, but I still don't think you've defined your terms very well, and I still have a hard time understanding what you're trying to say. It
sounds like you're saying a coin showing heads 1100 times in a row has only a 99.73% chance of being non-random. If so, you're right that that's surprising and counter-intuitive -- but I think intuition is actually correct here!
Let me explain myself more precisely. (After another couple iterations, I'll probably come to understand what you're saying.)
In my view, we've got two models.
- \text{fair} means that the coin is fair: when flipped, it gives heads (H) with probability 0.5 and tails (T) with probability 0.5.
- \text{biased} means that the coin has some other probability, p, to yield H. Since we don't know what that probability is a priori, we'll assign p a uniform distribution from 0 to 1.
We go and observe some data, and then we compute the probability for each model. That's going to depend on our
prior probabilities for each model, P(\text{fair}) and P(\text{biased}). Different people can have different priors, so we'll factor them out for now and focus on the
likelihood -- i.e., the probability each model assigns to the data you actually saw. In our case, this data was simply \text{H}^N: we saw heads (H) N times in a row.
For the \text{fair} model, the likelihood is simple:
P(\text{H}^N | \text{fair}) = \frac{1}{2^N}.
For the \text{biased} model, we consider the likelihood for a
given parameter value p; then we integrate over all
possible values of p:
\begin{align}
P(\text{H}^N | \text{biased}) &= \int\limits_0^1 p^N \,dp \\
&= \frac{1}{N + 1}
\end{align}
Note that P(\text{H}^N | \text{biased}) drops very slowly compared to P(\text{H}^N|\text{fair}), which sinks like a stone. This means that our belief shifts towards the \text{biased} model very rapidly, as we keep observing heads-and-only-heads.
Now let's set N=1100. The odds that the coin is fair are given by
\begin{align}
\text{Odds}(\text{fair} | \text{H}^N) &= \frac{P(\text{fair})}{P(\text{biased})} \frac{P(\text{H}^N | \text{fair})}{P(\text{H}^N | \text{biased})} \\
&= \frac{1 - P(\text{biased})}{P(\text{biased})}\frac{N + 1}{2^N}
\end{align}
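For what it's worth, these odds can be computed exactly with Python's `fractions` module -- the numbers are far too small for ordinary floats, so we report the base-10 logarithm instead (the function name and its default prior are my own choices):

```python
import math
from fractions import Fraction

def fair_odds(N, prior_biased=Fraction(1, 2)):
    """Posterior odds of the fair model after observing N heads in a row:
    (prior odds) * (likelihood ratio) = [(1 - P(biased)) / P(biased)] * (N + 1) / 2^N.
    """
    prior_odds = (1 - prior_biased) / prior_biased
    return prior_odds * Fraction(N + 1, 2**N)

odds = fair_odds(1100)
# float(odds) would underflow to 0.0, so look at the order of magnitude instead
log10_odds = math.log10(odds.numerator) - math.log10(odds.denominator)
print(log10_odds)  # about -328.1, i.e. odds of roughly 10^-328
```

With the same function you can plug in an absurdly fair-leaning prior, e.g. `fair_odds(1100, prior_biased=Fraction(1, 10**100))`: the odds climb by a factor of about 10^{100} and remain vanishingly small.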
When the odds are tiny compared to 1, the probability basically equals the odds. I'm going to go out on a limb and assume that \text{Odds}(\text{fair} | \text{H}^N) is indeed tiny.
To be concrete, let's take equal priors, so that P(\text{biased}) = 0.5. Then we have P(\text{fair} | \text{H}^{1100}) \approx \frac{1101}{2^{1100}} \approx 8.1 \times 10^{-329}: this is basically zero. Even if you think the coin has only 1:10^{100} prior odds of being biased -- a prior which strains credibility beyond the breaking point -- you will still have only P(\text{fair} | \text{H}^{1100}) \approx 8.1 \times 10^{-229} -- again, basically zero.
This is
way beyond 3 sigma.
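To put that next to the 3-sigma benchmark: using the standard library's `NormalDist`, 3 sigma corresponds to a two-sided probability of about 99.73%, i.e. a tail probability on the order of 10^{-3} -- nowhere near 10^{-328} (the helper name here is mine):

```python
from statistics import NormalDist

def within_sigma(k):
    """Two-sided probability that a standard normal falls within k sigma."""
    d = NormalDist()
    return d.cdf(k) - d.cdf(-k)

print(f"{within_sigma(3):.4%}")      # about 99.73%
print(f"{1 - within_sigma(3):.1e}")  # tail probability, about 2.7e-03
```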
Have I missed something?