# Concept of probability

1. Dec 1, 2004

### Cheman

I have always thought of the concept of probability as a bit weird, so I thought I could post a few questions about its basic concepts and see how they can be mathematically proved. Here goes:

1) The method of measuring probability we use, with 0 ≤ P ≤ 1: is this just the method mathematicians have chosen to use? I.e., could we say that 0 and 1 are defined as such, but we could have come up with a different kind of probability measuring system?

2) Is there a way to mathematically prove the formula P(x) = (number of ways x can occur) / (number of all possible outcomes)?

3) Why do these probabilities relate to real life? I.e., if we conduct an experiment with a die, once we have completed a large number of throws we would expect 1/6 to be 1s, 1/6 to be 2s, etc. Is there a way to prove mathematically why this would be the case?

4) Is there a way to mathematically prove the formula P(x or y) = P(x) + P(y)?

That will do for now. ;-) I know it may at first glance seem basic, but when we learn probability we are always just told that the equations work and that this is therefore what happens in real life; it is never proved or explained. Please could someone do that for me?

2. Dec 1, 2004

### Galileo

From a mathematical point of view, it's convenient to start with a measure theory interpretation of probability. Here P(A+B)=P(A)+P(B) (if A and B are disjoint sets) is taken as an axiom, just like $P(\Omega)=1$ where $\Omega$ is the set of all possible outcomes (the sample space).

From a more practical point of view, probability is defined by a frequency interpretation:
There has to exist some procedure to construct a sampling sequence (a list of outcomes) of arbitrary length, $x_1,x_2,\ldots$, where $x_i \in \Omega$ (for example, repeatedly rolling a die and counting the pips).
Then:
$$P(x)=\lim_{N \to \infty}\left(\frac{\mbox{The number of times } x_n=x \mbox{ for } n=1,...,N}{N}\right)$$
provided the limit exists for all $x\in \Omega$.
By extending this definition to subsets of the sample space you can easily deduce why P(X+Y)=P(X)+P(Y) for disjoint X and Y.
Also, from this latter definition (which was probably developed earlier) it's clear why $0 \le P(x) \le 1$.
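
The frequency definition above can be tried out numerically. The following is a minimal sketch, assuming a simulated fair die from Python's pseudo-random generator stands in for the sampling procedure; the function name `relative_frequencies` and the seed are illustrative choices, not part of the original post. It can only show the fractions settling near 1/6 for finite N; it does not prove that the limit exists.

```python
import random
from collections import Counter

def relative_frequencies(num_rolls, seed=0):
    """Roll a simulated fair six-sided die num_rolls times and return
    the fraction of rolls that landed on each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}

if __name__ == "__main__":
    for n in (60, 6_000, 600_000):
        freqs = relative_frequencies(n)
        # Largest gap between an observed fraction and the ideal 1/6.
        worst = max(abs(f - 1 / 6) for f in freqs.values())
        print(f"N={n:>7}: largest deviation from 1/6 = {worst:.4f}")
```

Each fraction is a count divided by N, so it necessarily lies between 0 and 1, and the fraction for a set such as {1, 2} is the sum of the fractions for 1 and for 2 because the counts simply add.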

Last edited: Dec 1, 2004
3. Dec 1, 2004

### mathman

2) No. In physical situations, the events may have unequal probabilities. For example: crooked roulette wheel - not all outcomes equally probable.
3) The mathematical proof (law of large numbers) assumes that the dice are fair, not crooked.
4) As Galileo noted, only if x and y are disjoint events.
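
A small sketch of points 2) and 4), using exact fractions. The loaded-die weights below are made up for illustration, as are the names `prob`, `fair` and `loaded`. The counting formula agrees with the weights only when every outcome is equally likely, while additivity for disjoint events holds for any set of weights that sums to 1.

```python
from fractions import Fraction

def prob(event, weights):
    """Probability of an event (a set of outcomes), given outcome
    weights that sum to 1."""
    return sum(weights[o] for o in event)

fair = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical loaded die: face 6 is favoured; the weights still sum to 1.
loaded = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
          4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}

low, high, even = {1, 2}, {5, 6}, {2, 4, 6}

for name, w in (("fair", fair), ("loaded", loaded)):
    # Disjoint events: the probabilities add.
    assert prob(low | high, w) == prob(low, w) + prob(high, w)
    # Overlapping events: the shared outcomes must be subtracted once.
    assert prob(low | even, w) == prob(low, w) + prob(even, w) - prob(low & even, w)
    # Compare the true probability with the naive "ways / outcomes" count.
    print(name, prob(high, w), Fraction(len(high), 6))
```

For the fair die the two printed values coincide; for the loaded die they do not, which is why the counting formula cannot be proved in general.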

4. Dec 2, 2004

### mathwonk

Nothing in mathematics ever proves anything about real life, since real life stubbornly declines to stick only to cases satisfying our hypotheses.

5. Dec 4, 2004

### tongos

All I can say is that it just makes sense.

6. Dec 7, 2004

### Cheman

Yeh... but why in terms of maths does it?!?!?

7. Dec 10, 2004

### Bartholomew

I just covered this in my probability class a couple of weeks ago. The (weak) law of large numbers can be proved from the binomial distribution and Chebyshev's theorem. If each trial has a probability of success p and the number of trials is n, the variance of a random variable X which has the binomial distribution is n * p * (1-p) and the mean is n * p. So the variance of the proportion of successes is VAR(X / n) = VAR(X) / n^2 = n * p * (1-p) / n^2 = p * (1-p) / n, so that the standard deviation is (p * (1-p) / n) ^ (1/2). The mean of X / n is n * p / n, or p. Let c be some constant, arbitrarily small. By Chebyshev's theorem, the probability that the proportion of successes X / n lies between p - c and p + c is at least 1 - 1/(c/st.dev(X / n))^2. As n goes to infinity, st.dev(X / n) goes to 0, c/st.dev(X / n) goes to infinity, 1/(c/st.dev(X / n))^2 goes to 0, and the whole formula goes to 1. This can be interpreted as saying that as you run an increasing number of trials, the probability that the proportion of successes to total trials differs from the expected mean p by more than any number c can be made arbitrarily low.
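
The bound in this argument can be evaluated directly. Below is a rough sketch, assuming X is binomial with parameters n and p; the function names and the example values p = 1/6 and c = 0.05 are illustrative choices. It computes Chebyshev's lower bound 1 - p(1-p)/(n c^2) alongside the exact binomial probability that X/n falls within c of p, so the two can be compared as n grows.

```python
import math

def chebyshev_lower_bound(p, n, c):
    """Chebyshev's lower bound on P(|X/n - p| < c) for X ~ Binomial(n, p)."""
    variance_of_proportion = p * (1 - p) / n
    # The raw bound can go negative for small n, where it says nothing useful.
    return max(0.0, 1 - variance_of_proportion / c**2)

def exact_probability(p, n, c):
    """Exact P(|X/n - p| < c), summing binomial probabilities in log space
    so that large n does not overflow. Assumes 0 < p < 1."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) < c:
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total

if __name__ == "__main__":
    p, c = 1 / 6, 0.05
    for n in (100, 400, 1600):
        print(n, round(chebyshev_lower_bound(p, n, c), 4),
              round(exact_probability(p, n, c), 4))
```

Chebyshev's bound is usually far from tight, so the exact probability approaches 1 faster than the bound does; the proof only needs the bound itself to approach 1.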

8. Dec 19, 2004

### the number 42

I find probability difficult to understand, but I sometimes get concerned when a small p value is taken as proof-positive. I know that the .05 level used in psychology is arbitrary (though it probably makes sense) and Cohen points out that research in psychology has traditionally lacked power, making nonsense of the .05 level when used in such studies. When a researcher says that 'compared to chance' their results are significant, I'm starting to wonder what they mean. You hear things like '50 people score above chance on a card-guess study, which is evidence for psi'; what exactly are we comparing their results to?

9. Dec 25, 2004

### Cheman

So is the way in which we measure probability simply the best way we feel it can be estimated, or is its proof based in complex statistics that explains the law of large numbers and how it relates to the P = number of wanted outcomes / number of possible outcomes equation? I find probability quite a complex concept since it appears to be quite an iffy subject, although I wish to study quantum mechanics and chemistry further, so I need to overcome my problems with it. :rofl: I can use all the equations, etc.; I just don't see how they and their concepts relate to real life.

Thanks.

10. Dec 25, 2004

### robert Ihnot

I think it was Laplace who formalized this study with the observation that events are separated into equally probable outcomes. This is a simple axiomatic kind of thing that requires no deep study or complicated results.

Thus if it is equally probable that a six sided die can come up showing from 1 to 6 pips, then the probability of a given outcome is 1/6. Where is the Law of Large Numbers, where the Gaussian curve? WE DON'T NEED ANY OF THAT YET.

So you have to look at that yourself, remember THERE IS AN ASSUMPTION about equally probable events. THIS STEP YOU MUST TAKE YOURSELF.

Now as we go further, it was de Moivre, author of "The Doctrine of Chances" in 1718, who looked at the chances of events such as, "At least one six be thrown in 6 throws of a die."

While it is easy to see that the probability of a six being thrown is 1/6, this advanced problem was baffling for some time because it was not always clearly recognized that the total probability of all outcomes is 1. In the above problem, since there are 5 chances out of 6 of not throwing a six in a given trial, the answer is:

1 - (5/6)^6 ≈ 0.665, approximately 2/3.

Can this be tested? Of course it can be tested, and today by computers, but it still rests on the fundamental assumption that we can recognize equally probable events, as well as the realization that the total of the various probabilities must be 1.
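
As a sketch of such a computer test for the "at least one six in 6 throws" problem: the code below compares the formula with a brute-force count over all 6^6 = 46656 equally likely sequences and with a simulation. The names `exact`, `by_counting` and `simulate`, the trial count and the seed are illustrative, and the simulated die is assumed fair.

```python
from fractions import Fraction
from itertools import product
import random

# Formula: 1 minus the probability of no six in any of the 6 throws.
exact = 1 - Fraction(5, 6) ** 6

# Brute force: count the equally likely sequences of six throws
# that contain at least one six.
sequences = list(product(range(1, 7), repeat=6))
with_six = sum(1 for seq in sequences if 6 in seq)
by_counting = Fraction(with_six, len(sequences))

def simulate(trials, seed=0):
    """Fraction of simulated 6-throw experiments showing at least one six."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randint(1, 6) == 6 for _ in range(6))
        for _ in range(trials)
    )
    return hits / trials

if __name__ == "__main__":
    print(exact, float(exact))
    print(by_counting == exact)
    print(simulate(100_000))
```

The exact value is 31031/46656, and the brute-force count agrees with it. The simulation should land close to that value, though it depends on the same assumption of equally probable faces.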

Last edited: Dec 25, 2004
11. Dec 26, 2004

### matt grime

The concepts relate to real life because "they work". Why do the equations of (quantum, whatever) mechanics relate to real life? It's not because of some divine power (don't take this as a religious point, hear me out) that is moving everything according to some laws we happen to have found they're using. We have observed and modelled accordingly. Strictly speaking, the strong law of large numbers is a consequence of the axioms of probability, and has nothing to do with "real life" behaving like that. Probability theory doesn't cause anything to occur in real life, nor does real life cause anything to occur in probability theory.

It's a set of rules we've agreed on that model real life reasonably well.

12. Dec 27, 2004

### Dr.ThinkDeep

First you have to understand that you cannot prove a hypothesis.
What you can do is use a hypothesis to make a prediction about an experiment, then do the experiment and see whether the experiment matches the prediction.
If they match, this proves nothing, because there are so many possible alternative hypotheses that would make a sufficiently close prediction.

Only if the experiment does not match the prediction does it make sense to measure the probability of a discrepancy as large as, or larger than, the one you have found. If this probability is small, i.e. small p, you have reason to reject the hypothesis.

The hypothesis that you try to reject is commonly called H0, and in most cases it is chosen to predict the absence of an effect or a difference, because based on this absence it is easiest to compute the probability of finding differences. This probability density distribution for some difference parameter becomes more narrowly peaked as the sample size N grows (its width scales as ~N^(-1/2)).
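
To make "small p" concrete, here is a minimal sketch of a one-sided p-value for a card-guessing run. The setup is hypothetical: 25 guesses with 5 possible symbols, so H0 ("pure chance") predicts a hit rate of 1/5, and the hit counts tried are made up for illustration. The function name `binomial_tail` is also an illustrative choice.

```python
from math import comb

def binomial_tail(n, p, k_observed):
    """P(X >= k_observed) for X ~ Binomial(n, p): the one-sided p-value
    under the null hypothesis that each trial succeeds with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_observed, n + 1)
    )

if __name__ == "__main__":
    # Hypothetical run: 25 guesses, 5 symbols, chance hit rate 1/5.
    n_guesses, chance = 25, 1 / 5
    for hits in (5, 9, 12):
        print(hits, binomial_tail(n_guesses, chance, hits))
```

A score equal to the chance expectation of 5 hits gives a large p-value, so there is no reason to reject H0, while a much higher score gives a small one. A small p-value only counts against H0; it does not by itself establish any particular alternative explanation.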