# Just what the heck is probability anyway?

1. Apr 18, 2014

### csullens

Why do I feel like I am about to be fed to a pack of hungry wolves? I take it as a foregone conclusion that anyone who bothers to answer this question is on an entirely different level of math knowledge than me... however, even lay people should be able, and encouraged, to ask questions here, right? I have a fairly competent math background (I completed about 75% of my physics degree before switching majors), but to this day I still manage to baffle myself regarding the essential nature of probabilities, especially when I try to explain it to someone else. So please, someone try to explain to me exactly what a probability is.

So just what exactly is meant by the statement "the probability of heads is 0.5"? How can one verify such a statement empirically? Or how can that statement lead to predictions about what a set of coin flips will look like? With any given set, no matter how large, how can you know that you are not observing an unlikely series? Aren't all series of coin flips equally likely? H-H-H is no more or less likely than H-T-T or H-T-H, right? So how can H-H-H... 100 times in a row be considered unlikely? Isn't it just as likely as any other set of outcomes?

I know this isn't a well formulated question, but I hope you can see the difficulty I have and I hope a bit of discussion will help enlighten me. Thanks for taking the time...

2. Apr 18, 2014

### micromass

This is a very good question. But I fear that nobody really knows an answer to this. Probability is one of those intuitive notions that we can't really explain.

So indeed, if you throw a coin $100$ times, then our intuition expects around $50$ tails. Of course, this isn't quite correct. We might have $0$ tails or $100$! The probability (that word again) that this happens is very small though.

I think probability can only really be understood with infinities, that is, with limits. So let's say you throw a coin $n$ times. Let $f(n)$ be the number of tails. For example, if we throw 100 times and 43 come up tails, then $n = 100$ and $f(100) = 43$. The idea is that if $n$ is large enough, then the fraction $f(n)/n$ is very close to $1/2$. In mathematical language, we write this as

$$\lim_{n\rightarrow +\infty} \frac{f(n)}{n} = \frac{1}{2}$$
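A quick simulation can make this limit concrete. The sketch below flips a pseudo-random "fair coin" (the seed is arbitrary, chosen only for reproducibility) and reports both the fraction $f(n)/n$ and the raw deviation $f(n) - n/2$. The fraction settles toward $1/2$, while the raw count typically wanders by an amount of order $\sqrt{n}$, which is why the limit is stated for the fraction rather than for $f(n)$ itself.

```python
import random

def tail_fractions(checkpoints, seed=2014):
    """Flip a simulated fair coin and record f(n)/n and f(n) - n/2 at each checkpoint.

    f(n) is the number of tails among the first n flips. The seed is arbitrary.
    """
    rng = random.Random(seed)
    tails = 0
    flips = 0
    results = []
    for n in sorted(checkpoints):
        # Keep flipping until we reach the next checkpoint.
        while flips < n:
            tails += rng.random() < 0.5  # True counts as 1: a tail
            flips += 1
        results.append((n, tails / n, tails - n / 2))
    return results

for n, frac, dev in tail_fractions([10, 100, 10_000, 1_000_000]):
    print(f"n={n:>9}  f(n)/n={frac:.4f}  f(n)-n/2={dev:+.1f}")
```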

Probability doesn't necessarily say anything about "finite" occurrences. Even if we throw 1000000 times, we might still have 0 tails. But rest assured, if we keep throwing, then eventually we will get close to 50%. The problem is that we don't know when this will happen.

Nevertheless, "finite" occurrences can still be well approximated by probability. So if we throw 1000000 times, we can't be absolutely sure that we have around 50% tails, but that will almost always be the case.
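How small is the exception in "almost always"? One rigorous way to put a number on it is Chebyshev's inequality: for $n$ independent fair flips, $P\left(\left|\frac{f(n)}{n} - \frac{1}{2}\right| \ge \varepsilon\right) \le \frac{1}{4n\varepsilon^2}$. The sketch below uses the illustrative tolerance $\varepsilon = 0.01$. The bound is loose (the true probability is far smaller), but it needs no approximation. It also computes the probability of zero tails, $2^{-n}$, as a base-10 logarithm, because the number itself underflows a float.

```python
import math

def chebyshev_bound(n, eps):
    """Upper bound on P(|f(n)/n - 1/2| >= eps) for n fair, independent flips.

    Var(f(n)/n) = p(1-p)/n = 1/(4n), and Chebyshev gives Var / eps**2.
    """
    return 1.0 / (4 * n * eps**2)

def log10_prob_zero_tails(n):
    """log10 of P(no tails in n flips) = log10(2**-n), computed in log space to avoid underflow."""
    return -n * math.log10(2)

n = 1_000_000
print(chebyshev_bound(n, 0.01))    # at most about 0.0025
print(log10_prob_zero_tails(n))    # about -301030, i.e. P is about 10**-301030
```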

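So we can't give a precise definition of probability. In math we solve this by introducing probability axiomatically, for example through the Kolmogorov axioms. It turns out that this axiomatic approach is a good model for our intuition: for example, the limit above can be proven from the axioms (it is the strong law of large numbers, which says the limit holds with probability 1).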

3. Apr 18, 2014

### gopher_p

For emphasis ...

I think the best answer for what it means to say that the probability of a coin flip landing heads is 50% is something along the lines of "the real-world phenomenon of flipping a fair coin is best modeled (we think) as a binomial distribution with p=1/2". That's not to say that the real-world coin has to follow the "rules" of the model. It's that we think that the model is the best mathematical way to understand that system.
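To make that model concrete: under a binomial model with $n$ flips and $p = 1/2$, the probability of exactly $k$ heads is $\binom{n}{k} p^k (1-p)^{n-k}$. A minimal sketch using only Python's standard library (`math.comb`, Python 3.8+):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """P(exactly k heads in n independent flips, each heads with probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 100
print(binom_pmf(50, n))    # the single most likely count, yet only about 0.0796
print(binom_pmf(100, n))   # all heads: 2**-100, about 7.9e-31
print(sum(binom_pmf(k, n) for k in range(n + 1)))  # the model's probabilities sum to 1 (up to rounding)
```

The model says nothing about any particular coin; it is a description we choose because it seems to fit flipping a fair coin well.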

4. Apr 18, 2014

### jbriggs444

You can find some food for thought at: http://en.wikipedia.org/wiki/Probability_interpretations

Sometimes I think of probability as a particular sort of lack of knowledge about a physical system. We have only a general idea about the initial conditions that go into a particular flip of a coin. But we know that the result will be "heads" or "tails" and we know that, in some sense, the set of initial conditions that will lead to an outcome of "heads" and the set of initial conditions that will lead to an outcome of "tails" are equally large.

(Disregarding the possibility of coins landing on edge, two-headed coins, loaded coins, etc.)

In mathematical treatments, the notion of largeness of sets gets formally fleshed out with measure theory -- the probability of an outcome is given by the "measure" of the [possibly infinite] set of conditions that correspond to that outcome.
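This picture can be illustrated with a deliberately crude toy model (an assumption for illustration, not real coin physics): the coin starts heads-up, spins at angular velocity $\omega$ for a flight time $t$, and lands heads if it completes an even number of half-turns, i.e. if $\lfloor \omega t / \pi \rfloor$ is even. Each flip is completely deterministic; the only randomness is our vague knowledge of $(\omega, t)$, modeled here as uniform over made-up ranges. The share of that set of initial conditions leading to heads comes out close to $1/2$.

```python
import math
import random

def lands_heads(omega, t):
    """Toy deterministic coin: heads-up at launch, heads if an even number of half-turns."""
    half_turns = math.floor(omega * t / math.pi)
    return half_turns % 2 == 0

def heads_fraction(samples=100_000, seed=0):
    """Estimate the share of initial conditions that lead to heads.

    The ranges are hypothetical: 15-25 revolutions per second, 0.4-0.6 s in the air.
    """
    rng = random.Random(seed)
    heads = 0
    for _ in range(samples):
        omega = rng.uniform(30 * math.pi, 50 * math.pi)  # angular velocity, rad/s
        t = rng.uniform(0.4, 0.6)                         # flight time, seconds
        heads += lands_heads(omega, t)
    return heads / samples

print(heads_fraction())  # close to 0.5
```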

5. Apr 18, 2014

### jasonRF

This is a hard question; unfortunately, most probability books seem to sweep this all under the rug. In general I tend to think along the lines of gopher_p above (EDIT: and jbriggs444; the comment about initial conditions is spot on, of course).

One book that discusses some of your questions head-on is "Probability Theory: The Logic of Science" by Jaynes. He was a physicist, so you will likely enjoy some of his writing. A free, legal copy of a draft of the book can be found at
http://omega.albany.edu:8008/JaynesBook.html

You may enjoy chapter 10, which analyzes the physics of the coin-flipping experiment (and other "random" experiments).

jason

Last edited: Apr 18, 2014
6. Apr 18, 2014

### 22990atinesh

When I was in high school I also used to wonder why we study courses like calculus, probability, etc., and what the heck they actually mean. But as I started digging into things after college, what I figured out is that if you haven't been told the practical use of a course, it's really hard to build interest in it. But once you understand how to actually use it, believe me, you'll start learning things at a much greater pace. Enough of this lecture now; let's talk about your question about probability.

We can relate probability to chances or expectations. It doesn't mean something "exact"; it just means "expectations". Suppose I flip a coin 100 times and say that $P(H) = 0.5$.

Then it doesn't mean heads will turn up exactly 50 times out of 100 flips. It just means we expect the total number of heads to be somewhere close to 50. When you try it in practice, you'll find this is right. Just try flipping a coin a hundred times. If you are unlucky, it is even possible that you don't get heads in any of the 100 flips. But as you repeat this experiment more and more times, you'll see that the observed fraction of heads gets close to 50%.
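"Somewhere close to 50" can be quantified exactly. Assuming independent fair flips, the sketch below sums binomial counts to get the chance that 100 flips give between 40 and 60 heads, and the chance of no heads at all. The window 40 to 60 is an arbitrary illustrative choice.

```python
from math import comb

def prob_heads_between(lo, hi, n):
    """P(lo <= number of heads <= hi) for n independent fair flips (exact, via counting)."""
    favourable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favourable / 2**n

print(prob_heads_between(40, 60, 100))  # roughly 0.96
print(prob_heads_between(0, 0, 100))    # no heads at all: 2**-100, about 7.9e-31
```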

In engineering and scientific studies we come across this kind of situation most of the time, and probability is the mathematical way of dealing with it.

Last edited by a moderator: Apr 18, 2014
7. Apr 18, 2014

### csullens

Thanks for all the replies. Definitely gives me something to think about. Thanks for not just pasting the definition of probability as well. I am a health professional and the way that probability comes up in my profession is generally less in the "axiomatic" way and more in terms of how reliable empiric evidence is. A study on a drug may come out showing that the drug lowers blood pressure and reduces death by some amount. And that study will have a p-value of something like 0.988. When it comes to assigning probability based on evidence, I think I get more confused and it taints the whole concept of probability in my mind.

Suppose that we don't know the probability of heads, but instead do an experiment by flipping a coin and recording the results. On the one hand I understand that the experiment is likely (there's that word - it's hard not to be circular) to show a frequency near 0.5 for heads. And the more times I flip the coin, the closer the frequency should generally be to 0.5. But after any arbitrary number of flips, do I really have any information about the probability of future flips? After all, any particular series of flips is equally likely to occur. And the series H-H-H... a million times in a row can't be said to be any less likely to occur than any other series of flips. And there is no number of observations that I can make which will save me from this doubt.

If after a large number of flips I find the frequency of heads to be 0.487, then by statistical method I can make a statement like "the measured probability of heads is 0.487 +/- something, and the real probability of heads is 99% likely to fall into that range." But that seems circular to me. Because now I am back to the original question, what does it mean to have a 99% confidence interval? And I'm sort of back where I started. Thanks for the conversation guys.
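For reference, here is a minimal sketch of the standard normal-approximation (Wald) 99% confidence interval for the heads probability. The number of flips isn't given above, so 4,870 heads in 10,000 flips is a hypothetical example matching the 0.487 frequency. In the frequentist reading, "99%" describes the procedure: if the whole experiment were repeated many times, about 99% of the intervals built this way would contain the true $p$. Strictly speaking, it is not a probability statement about $p$ itself.

```python
import math
from statistics import NormalDist

def wald_interval(heads, n, confidence=0.99):
    """Normal-approximation confidence interval for p from `heads` successes in n flips."""
    p_hat = heads / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 2.576 for 99%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lo, hi = wald_interval(4870, 10_000)  # hypothetical data: frequency 0.487
print(f"{lo:.4f} to {hi:.4f}")        # about 0.4741 to 0.4999
```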

8. Apr 18, 2014

### micromass

No, you don't have any information about the future flips. So even if you had 100 heads in a row, the next flip still has an equal chance of being heads or tails. This is the gambler's fallacy: http://en.wikipedia.org/wiki/Gambler's_fallacy
This is why I brought up infinities in my post. Probability only makes a statement about infinite occurrences, not about a mere finite number of trials.
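The gambler's-fallacy point can be checked with a simulated fair coin (pseudo-random, arbitrary seed): look at every flip that immediately follows a run of at least five heads and count how often it is heads. For an independent fair coin the answer should hover near 1/2, not below it. The streak length of 5 is an illustrative choice.

```python
import random

def heads_after_streak(streak=5, flips=1_000_000, seed=42):
    """Fraction of heads among flips that immediately follow >= `streak` heads in a row."""
    rng = random.Random(seed)
    run = 0          # current number of consecutive heads
    followers = 0    # flips that came right after a long enough streak
    heads_next = 0   # how many of those were heads
    for _ in range(flips):
        is_heads = rng.random() < 0.5
        if run >= streak:
            followers += 1
            heads_next += is_heads
        run = run + 1 if is_heads else 0
    return heads_next / followers, followers

frac, count = heads_after_streak()
print(f"{count} flips after a streak; fraction heads = {frac:.3f}")  # close to 0.5
```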

Still, it forms a very reliable approximation to reality, even with very few trials. It's just that you can't claim absolute certainty with a finite number of trials. The more trials you do, the more certainty you have, but never absolute certainty!

Well yeah, you'll have to rely on your intuition sooner or later. Probability, or "likely" is not something you can define.

What I always like to do is compare probabilities to real-life situations. For example, let's say I have a friend in NY. Then my chance of winning the lottery is roughly the chance that I knock on a random door in NY and my friend opens it. That puts things in perspective.

Even though people have an intuitive idea of probability, their intuition is always a little off: http://www.psychologytoday.com/articles/200712/10-ways-we-get-the-odds-wrong It's a tricky little thing.

9. Apr 18, 2014

### homeomorphic

It's hard to say what is meant exactly, but there are two possible outcomes, and one of them is heads, so you could say it's the number of successes over the number of outcomes. When you are dealing with discrete probabilities like this, it's a little easier because it's sort of a matter of counting things up and finding a ratio, but sooner or later you still have to assign probabilities by saying each outcome is "equally likely" or something like that, and that's where the philosophical trouble comes in.

Repeated experiment, hypothesis testing, central limit theorem, blah, blah, blah. There's a weird thing about repeated experiments, though: you could think of a repeated experiment as a single experiment. Therefore, the idea of using repeated trials to define probability is logically shaky. Say I flipped the coin 1000 times. Consider that experiment number 1. Then, say experiment number 2 is that I flip it another 1000 times. Who's to say that the results of experiment number 2 will have any relation to those in experiment number 1? A priori, there's no reason to expect that there is any consistent pattern or that some well-defined probability distribution will arise. However, physically, we sort of know something about the process of flipping a coin, and that's why we can get away with this sort of thing in practice. Only based on physical intuition can we conclude that our mathematical assumptions of independence and identical distribution are okay. If those assumptions are okay, we have our central limit theorem and everything should work. Of course, there are still problems: if you use 5% as your threshold for statistical significance, then in the cases where the result really is just due to chance, you will still wrongly declare it significant about 5% of the time.
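The point about a 5% significance level can be illustrated by simulation. The sketch below assumes a simple two-sided test of "the coin is fair" (a normal approximation with cutoff 1.96, our choice for illustration), applies it to many experiments of 100 flips of a coin that really is fair, and counts how often the test wrongly declares the result significant. Because the head count is discrete, the realized rate for this particular test is about 5.7% rather than exactly 5%.

```python
import random

def false_positive_rate(experiments=20_000, n=100, z_crit=1.96, seed=7):
    """Share of fair-coin experiments that a two-sided z-test wrongly flags at the ~5% level."""
    rng = random.Random(seed)
    sd = (n * 0.25) ** 0.5  # standard deviation of the head count under fairness
    rejections = 0
    for _ in range(experiments):
        heads = bin(rng.getrandbits(n)).count("1")  # n fair, independent flips
        z = (heads - n / 2) / sd
        if abs(z) > z_crit:
            rejections += 1
    return rejections / experiments

print(false_positive_rate())  # roughly 0.05 to 0.06
```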

That's a good observation, but you seem to be missing something here. If you just look at outcomes, they are all equally likely. But if you look at other questions like "what is the probability of getting 100 heads out of 1000 flips?", there are many different outcomes that give you 100 heads. Getting 500 heads is more likely. The sequences making up these two events are all equally likely, but in the case of the 500, there are more of them.
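This counting argument is easy to check with exact integer arithmetic. Every individual sequence of 1000 fair flips has probability $2^{-1000}$, so the probability of an event is just (number of sequences in it) times $2^{-1000}$. The sketch below compares how many sequences give exactly 100 heads versus exactly 500 heads.

```python
from math import comb, log10

n = 1000
ways_100 = comb(n, 100)   # sequences with exactly 100 heads
ways_500 = comb(n, 500)   # sequences with exactly 500 heads

print(f"sequences with 100 heads: about 10^{log10(ways_100):.1f}")
print(f"sequences with 500 heads: about 10^{log10(ways_500):.1f}")
print(f"500 heads is about 10^{log10(ways_500) - log10(ways_100):.0f} times more likely")
print(f"P(exactly 500 heads) = {ways_500 / 2**n:.4f}")
```

Even the single most likely count, 500, has a fairly small probability on its own; it is just far more likely than 100.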

10. Apr 18, 2014

### chogg

For a clear, thorough, opinionated, Bayesian answer to your question, you can peruse the following free textbook:
http://uncertainty.stat.cmu.edu/

I find it helpful to think of probability as a measure of (un)certainty. Others insist it can only be defined in terms of limiting frequencies. The uncertainty view accommodates all these cases, but it also handles perfectly sensible questions which the frequency-only view cannot. (For example: "What is the probability that it will rain tomorrow?".)

Actually, your best bet would be to read "Understanding Uncertainty", by the late Dennis Lindley.
https://www.amazon.com/Understanding-Uncertainty-Dennis-V-Lindley/dp/0470043830

Last edited by a moderator: May 6, 2017
11. Apr 18, 2014

### bahamagreen

Uncertainty seems to be a primary part of it.

Before you flip the coin, you assign heads p=.5

After you flip the coin, you have a result... say, heads.

How do you characterize the probability figure after the fact?

Now that you know, do you think it was actually p=1 all along (but you just didn't know yet)... because that is what happened? And so p changed from .5 to 1?

Or do you continue to think that p=.5 describes heads for that flip, even after the fact?

Or do you say that p is only meaningful for things that have not yet happened? If so, does this make probability different from other fundamental physics because it is not time reversible?

12. Apr 19, 2014

### FactChecker

There are a few points to make:
1) You say all exact sequences are equally likely. As @homeomorphic points out, there is only one exact sequence that gives 100 heads, whereas there are many sequences that give 50 heads. So 50 heads is much more likely. The supposed independence of coin flips also makes all heads much less likely (see point 3).

2) If you saw a series of coin flips that gave 100 heads in a row, even though that is just as likely as any other sequence, you should doubt that it is a fair coin. A chi-squared goodness-of-fit test will show that it is very unlikely that this sequence came from a 50/50 fair coin. Another sequence that is more typical for a fair coin will give a much higher p-value in the chi-squared test.
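This chi-squared point can be made concrete. For heads/tails counts with an expected 50/50 split under fairness, Pearson's statistic is $\sum (O - E)^2/E$ with one degree of freedom, and for one degree of freedom the p-value has the closed form $\operatorname{erfc}(\sqrt{\chi^2/2})$, so the sketch below needs only Python's `math` module. The 52-heads case is a hypothetical, typical-looking fair-coin result.

```python
import math

def chi2_fair_coin(heads, n):
    """Pearson chi-squared goodness-of-fit test of `heads` out of n flips against p = 1/2.

    Returns (statistic, p_value); with 1 degree of freedom the p-value is erfc(sqrt(stat / 2)).
    """
    expected = n / 2
    tails = n - heads
    stat = (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

print(chi2_fair_coin(100, 100))  # statistic 100.0, p-value roughly 1.5e-23
print(chi2_fair_coin(52, 100))   # statistic 0.16, p-value roughly 0.69
```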

3) To say that there were 100 independent flips of a fair coin says a lot more than 50/50 probability. Independence implies, in particular, zero correlation between flips. So there are many statistical tests where the series of all heads is much less likely than other random sequences. In those tests, the sequence of alternating heads and tails would also score low. Any of the tests can be used to show that either the all-heads coin was not fair, or that you have witnessed something so rare that no one else has ever seen it.
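One example of a test that looks at the order of the flips rather than just the head count is a runs check. The version sketched below is a simplified illustration (related to, but not identical to, the Wald–Wolfowitz runs test). For $n$ independent fair flips, each of the $n-1$ boundaries between consecutive flips is a change with probability 1/2, independently, so the number of runs is $1 + \text{Binomial}(n-1, 1/2)$, with mean $(n+1)/2$ and variance $(n-1)/4$. Both the all-heads sequence and the strictly alternating sequence land about ten standard deviations from that mean when $n = 100$.

```python
import math
import random

def runs_z_score(flips):
    """z-score of the number of runs in a 0/1 sequence, assuming independent fair flips.

    Runs = 1 + number of changes between neighbours; under fairness the changes are
    independent Bernoulli(1/2), so runs has mean (n+1)/2 and variance (n-1)/4.
    """
    n = len(flips)
    runs = 1 + sum(a != b for a, b in zip(flips, flips[1:]))
    mean = (n + 1) / 2
    sd = math.sqrt((n - 1) / 4)
    return (runs - mean) / sd

n = 100
all_heads = [1] * n
alternating = [i % 2 for i in range(n)]
rng = random.Random(1)
typical = [rng.randint(0, 1) for _ in range(n)]

print(f"all heads:   {runs_z_score(all_heads):+.2f}")    # about -9.95: far too few runs
print(f"alternating: {runs_z_score(alternating):+.2f}")  # about +9.95: far too many runs
print(f"random:      {runs_z_score(typical):+.2f}")      # usually within about +/-2
```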

Last edited: Apr 19, 2014
13. Apr 22, 2014

### csullens

I understand the point you are making. But to clarify, if you observe a repeating event you do in fact acquire information about future events. You may not know with certainty what the next coin flip will be, but by observing event frequencies you can estimate a probability of future event frequencies. It requires assuming that future events will be similar to past events, an assumption which I'm not sure is legitimate, but empirically it seems to work. I understand the gambler's fallacy, but the interesting thing to me is the sort of dichotomy of, on the one hand, not knowing what the next coin flip will be, but on the other hand, being able to make predictions of what the frequency distribution is likely to look like in the future based on past observations. So in that sense you do have some information about future flips.

14. Apr 22, 2014

### chogg

That's a very good point. We can think of this in terms of starting with two different models. Let's assume the probability of heads is the same for every flip(*); call it $p$. micromass has assumed we know a priori that $p=0.5$; in many scenarios, this is very reasonable. Your model has some uncertainty about $p$. Maybe it's a uniform distribution from 0 to 1; absent any other information, this seems like a good choice if you think the coin might be biased.

I actually explored precisely this scenario in a recent post. micromass's model is the one I called $\text{fair}$; yours (if you choose a uniform distribution) is the one I called $\text{biased}$.

You can actually combine them using a mixture model. Then, for any sequence $S$ of coin flips, you will have some posterior probability for each model. In other words, just as you say: the coin flips will influence your beliefs about future coin flips.

To be perfectly thorough, you want to compute $P(\text{H}|S)$, the probability to see "heads" next, given the sequence $S$ of flips so far. This is given as
$$P(\text{H}|S) = P(\text{fair}|S) \frac{1}{2} + P(\text{biased}|S) \int\limits_0^1 p P(p|\text{biased},S)\,\,dp$$
(Obviously, $P(\text{fair}|S) + P(\text{biased}|S) = 1$.)
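A minimal sketch of that formula, under two assumptions the post doesn't fix: the prior is 50/50 between $\text{fair}$ and $\text{biased}$, and the $\text{biased}$ model uses the uniform prior on $p$ mentioned above. With a uniform prior, the likelihood of a particular sequence with $h$ heads and $t$ tails is $\int_0^1 p^h(1-p)^t\,dp = \frac{h!\,t!}{(h+t+1)!}$, and the posterior mean of $p$ (the integral in the formula) is $\frac{h+1}{h+t+2}$, Laplace's rule of succession. Working in log space avoids underflow for long sequences.

```python
import math

def prob_next_heads(h, t, prior_fair=0.5):
    """P(H | S) for a mixture of a fair coin and a coin with a uniform prior on p.

    h, t: numbers of heads and tails seen so far (only the counts matter).
    prior_fair: assumed prior probability of the 'fair' model (strictly between 0 and 1).
    """
    n = h + t
    # log P(S | fair) = n * log(1/2)
    log_like_fair = -n * math.log(2)
    # log P(S | biased) = log(h! t! / (n+1)!), using lgamma(k + 1) = log(k!)
    log_like_biased = math.lgamma(h + 1) + math.lgamma(t + 1) - math.lgamma(n + 2)

    # Posterior model probabilities via Bayes' rule, normalised in log space.
    log_fair = math.log(prior_fair) + log_like_fair
    log_biased = math.log(1 - prior_fair) + log_like_biased
    top = max(log_fair, log_biased)
    w_fair = math.exp(log_fair - top)
    w_biased = math.exp(log_biased - top)
    post_fair = w_fair / (w_fair + w_biased)

    # Under 'biased', the posterior mean of p is (h + 1) / (n + 2).
    mean_p_biased = (h + 1) / (n + 2)
    return post_fair * 0.5 + (1 - post_fair) * mean_p_biased

print(prob_next_heads(5, 5))    # balanced counts: 0.5 by symmetry
print(prob_next_heads(2, 8))    # mostly tails so far: noticeably below 0.5
print(prob_next_heads(0, 20))   # 20 tails in a row: well below 0.5
```

A run of tails pushes the prediction toward tails rather than making heads "due", because it shifts weight toward the biased model.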

Interestingly, the upshot is the opposite of what we'd see if the Gambler's Fallacy were true. If you've seen a lot of tails, you shouldn't expect you're "due" for heads; you should expect tails slightly more than heads! (Assuming you assign nonzero probability to the coin being biased.)

(*) This assumption would fail if you had a skilled, mischievous coin flipper, who could choose the outcome every time.

15. Apr 22, 2014

### csullens

That's a very interesting point you make about the gambler's fallacy! It's true: the more heads you see, the more you ought to expect heads, because it becomes more and more likely that the coin is biased. I'm not sure I understand the need for the fair vs. biased calculation, though. Could you not simply look at the frequency of heads, say 0.45, and then say that the p for H is 0.45 +/- something? Or construct a confidence interval centered about 0.45? Of course, I suppose if you have reason to believe that the coin really is fair, then that ought to modify your confidence in the observed P. Thanks for the replies...