# Understanding probability, is probability defined?

1. Aug 4, 2013

### bobby2k

I have taken a course in probability and statistics, and did well, but still I feel that I do not grasp the core of what holds the theory together. It is a little weird that I should use a lot of theory when I do not get the simple building block of the theory.

I am basically wondering if probability is defined in some way?

In the statistics books I have looked in, probability is not defined. At the beginning of the book they give a description of how we can look at probability, and this is usually the relative frequency model, but they never define it to be this.

These steps are what I seem to see in statistics books; do they seem fair?

1. Probability is described in terms of events, outcomes and relative frequency, but never defined.
2. A lot of theory is then built regarding probability.
3. Then, with the help of Chebyshev's inequality, we are able to show that the relative frequency model is correct. That is, if the probability of an event is p, and X is a Bernoulli random variable for that event, then the sample mean of repeated observations of X will converge to p.

Do you see my problem? If we say that the probability of an event is p, then we can show that the relative frequency of the event in the long run is p. In order to show this, we used all the theory of linear combinations, variance, etc. But this means that the relative frequency model is a consequence of our theory, correct?

I mean, we can not say that the probability is the relative frequency, then develop a lot of theory, and then prove that p equals the relative frequency; then we are going in a circle?
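For concreteness, here is a quick simulation of the convergence described in step 3 (my own sketch; the value p = 0.3 is an arbitrary choice):

```python
import random

random.seed(0)
p = 0.3  # arbitrary "true" probability of the event, assumed for this demo
freqs = {}
for n in (100, 10_000, 1_000_000):
    # Count successes in n independent Bernoulli(p) trials.
    successes = sum(random.random() < p for _ in range(n))
    freqs[n] = successes / n
    print(f"n = {n:>9}: relative frequency = {freqs[n]:.4f}")
```

The relative frequency drifts toward p as n grows, although, as the replies below discuss, the theory only says this happens with probability approaching 1.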

2. Aug 4, 2013

### Stephen Tashi

If you look closely at what theorems in probability say about relative frequency, they only talk about the probability of the relative frequency taking on a certain value. They may have wording such as "the probability approaches 1 as the number of trials approaches infinity", but this is still not a guarantee that the relative frequency will behave in a certain manner; it just probably will.

Probability theory is not circular in the way that you describe, but it is circular in the sense that results of probability theory are results about probabilities of things, not guarantees of actual outcomes.

In the axiomatic statement of probability theory, probability is not defined in terms of relative frequency. It is defined abstractly as a "measure". If you look closely at the axiomatic development of probability theory (the high-class approach, not the approach taken in elementary texts) you will find that there isn't any discussion of whether an event actually happens or not. There isn't even any assumption that you can take a random sample - there are only statements that random variables of various kinds (which people think of as representing random samples) have certain distributions.

The mathematical theory of probability does not describe any way to measure "probability" in the same way that physical theories describe how to measure a quantity like "mass" or "force". It is not clear whether probability has any physical reality. If it does then it is rather mysterious. Consider how the probability of an event changes. If a prize is placed "at random" behind one of 3 doors, the probability of it being behind the second door is 1/3. If we open the first door and the prize is not there then the probability of it being behind the second door changes to 1/2. Does this involve a physical change in the doors? Does the probability change from 1/3 to 1/2 instantaneously or does it go from 1/3 up to 1/2 in a finite amount of time? The mathematical theory of probability does not deal with such questions. A person who applies probability theory may tackle them, but mathematically he is on his own.
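The door example can be made concrete with a small simulation (a sketch of my own, not part of the original post): the change from 1/3 to 1/2 is just conditioning on the information that door 1 is empty, not a physical change in the doors.

```python
import random

random.seed(1)
trials = 100_000
not_behind_first = 0   # trials where opening door 1 reveals no prize
behind_second = 0      # of those, trials where the prize is behind door 2

for _ in range(trials):
    prize = random.randint(1, 3)   # prize placed uniformly at random
    if prize != 1:                 # door 1 opened and found empty
        not_behind_first += 1
        if prize == 2:
            behind_second += 1

unconditional = 1 / 3
conditional = behind_second / not_behind_first
print(f"P(door 2) before opening door 1: {unconditional:.3f}")
print(f"P(door 2 | door 1 empty)       : {conditional:.3f}")
```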

3. Aug 4, 2013

### economicsnerd

^ This.
The most popular formalism for probability consists of (i) states of the world, (ii) events (where an event is just a collection of states), and (iii) a number attached to each event, which is just called the "probability" of said event.

The formalism exists even without any interpretation.

Indeed, one popular interpretation of probability is the frequentist one. The (strong) law of large numbers---the theorem to which you alluded here---suggests that the formalism somehow agrees with the frequentist interpretation. It suggests that, if somebody really wants to think of probability in terms of long-run frequencies, then it usually won't lead them astray in doing rigorous study of probability theory.

4. Aug 5, 2013

### bobby2k

So, probability at its core is just a measure of likelihood? With 0 meaning not happening, 1 certain to happen, and if P(A) > P(B), then event A is more likely to happen than B? Can we say no more about probability, as it is defined at the bottom of all the theory?

It may sound stupid, but I still feel that there is a gap between saying that probability is a measure of something, and us being able to calculate probabilities, make confidence intervals and all that stuff.

From what I see, what we can do is this:

1. define probability as a measure of likelihood, as you said
2. define events, outcomes, etc.
3. define random variables, both continuous and discrete
4. define probability distribution functions for the random variables
5. define expected values and variance
6. calculate expected values of linear combinations, and show that the law of large numbers etc. holds (Chebyshev)

If I do the things in this list, I run into a problem at step 5. If I do not already have the relative frequency model in the back of my mind, step 5 does not make any sense. I mean, when I learned to understand the expected value, I thought of the probability as a relative frequency in order for the expected value to make sense (it was the average in the long run; for this to work, we have to look at probability as frequency). But I can not really do this, because that only comes in step 6, after the expected value has been defined. How is this explained?
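To separate the two steps concretely, here is a sketch (my own, using a fair die as the example): step 5's definition of expectation uses only the measure, and the long-run average only appears afterwards as a theorem.

```python
import random

random.seed(2)
outcomes = [1, 2, 3, 4, 5, 6]

# Step 5: expectation is defined purely as a measure-weighted sum over
# outcomes; no long-run frequencies are involved in the definition.
expected = sum(x * (1 / 6) for x in outcomes)   # 3.5

# Step 6: only now does the long-run average enter, as a theorem about
# repeated independent trials, not as part of the definition.
n = 100_000
sample_mean = sum(random.choice(outcomes) for _ in range(n)) / n
print(f"E[X] by definition   : {expected}")
print(f"average of {n} rolls: {sample_mean:.3f}")
```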

Thanks for your time guys, this is really important for me to understand.

Last edited: Aug 5, 2013
5. Aug 5, 2013

### Stephen Tashi

I did not say probability is defined as a "measure of likelihood". I just said it was defined as a "measure". A "measure" in mathematics is an abstraction of ideas connected with the physical ideas of length, area, volume etc. When we apply probability theory, we think of probability as a tendency for a thing to happen - but that thought is not expressed in the axioms of probability theory.

An attempt to define probability as a "tendency for something to happen" or a "likelihood" merely offers undefined words such as "tendency" or "likelihood" in place of the undefined word "probability". Such a definition has no mathematical content. (As a matter of fact, the word "likelihood" has a technical definition in probability and statistics that is different from the man-in-the-street's idea of what likelihood means.)

You apparently are seeking a formulation of probability theory that somehow guarantees some connection between the mathematics of probability and applications to real world problems. There is no such mathematical theory. Applications of any sort of math to the real world involve assuming certain math is a correct model. There is no mathematical proof that mathematics applies to the real world. There is no mathematical proof or definition that says probability is a frequency of occurrence. The only connection between probability theory and observed frequency is that probability theory tells you about the probability of various frequencies.

The expectation of a random variable can be thought of as the average of taking infinitely many independent samples of the random variable, but such a thought is a way of thinking about how to apply probability theory. It isn't part of the mathematical theory of probability.

6. Aug 5, 2013

### bobby2k

Thanks, I still have some follow-ups, I hope that's OK. I am getting closer to the end, though.

Does this step by step seem fair then:

1. Probability is a "measure" but undefined. However, we say that it is a measure of how likely something is to happen.
2. We define the basic probability axioms; these are mathematical (P(S)=1 etc.).
3. We define dependent and independent events. We define the measure that two independent events both happen to be the product of the two individual measures.
4. We define expected value and variance mathematically; we don't give them any other meaning.
5. Since we have now defined the measure for independent events, if X is a Bernoulli random variable, we get that the measure of mean(X) being close to p approaches one as the number of events goes to infinity. All this is still only mathematical, and all it means is that the measure goes to 1.

Then we start assuming things:
6. Let's say there is a prize behind one of three doors. Since we assume that it is equally likely that each door has the prize, P(door 1 has prize) = 1/3. Still, this is just a measure of how likely it is that the prize is there.
7. Then we choose a door many times, and count how many times we are correct. In our real physical world, we assume that it is equally likely to get the prize each time, no matter what happened the previous times. Now we adopt the mathematical model: since the trials are physically independent, we assume that their probabilities can be multiplied. Then we get that the probability that we guess correctly about 1/3 of the time approaches 1 as the number of trials goes to infinity.

What more do I need to do/assume to be able to say that the relative frequency of correct guesses will approach 1/3? Is it OK to say that since one axiom defines probability to be at most 1, we can say that it is extremely likely that the relative frequency will approach 1/3?
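As a numerical sketch of what Chebyshev's inequality (step 5) gives for this door-guessing experiment: for the sample frequency of n Bernoulli(1/3) trials, P(|freq - 1/3| >= eps) <= p(1-p)/(n eps^2). The tolerance eps = 0.01 below is my own arbitrary choice.

```python
# Chebyshev's bound on the chance that the observed frequency of correct
# guesses strays from 1/3 by more than eps.
p = 1 / 3
eps = 0.01
for n in (1_000, 100_000, 10_000_000):
    bound = p * (1 - p) / (n * eps ** 2)
    print(f"n = {n:>10}: P(|freq - 1/3| >= {eps}) <= {min(bound, 1.0):.6f}")
```

The bound is vacuous for small n but shrinks toward 0, which is exactly the "measure goes to 1" statement from step 5, and no more than that.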

7. Aug 5, 2013

### jostpuur

One of the most important lessons in philosophy is that nearly nothing can be defined properly. Every time you define something, you use some other concepts in the definition, and the definitions of the concepts you used become new problems.

The concept of probability is one of those eternal philosophical problems. It seems intuitive, but cannot be defined.

Mathematicians have rigorous definitions for measures and random variables, but these definitions don't answer what probability is. In the mathematical approach, the intuitive idea of probability is assumed accepted at the beginning, and the theory is then developed with rigorous mathematical definitions onto which some intuitive interpretations are attached.

Science is not only about knowing as much as possible, but also about knowing what you don't know.

8. Aug 5, 2013

### jostpuur

Keep in mind that likelihood has its own meaning in statistical inference. Likelihood and probability are different things, and in fact probability is needed in the definition of likelihood.

http://en.wikipedia.org/wiki/Likelihood_function

So likelihood should not be used rhetorically when attempting to define probability.

9. Aug 5, 2013

### Stephen Tashi

Can you explain what the goal of these steps is supposed to be?

You aren't paying attention to the previous posts. It doesn't do any good, mathematically, to say that "probability" is a measure of "how likely" something is to happen. The idea of "how likely" contains no more information than the word "probability".

You are correct that the basics of probability theory are implemented as definitions.

I think you want to phrase that in terms of N independent realizations of X and in terms of the mean of those realizations, not in terms of the single random variable X.

Limits of things involving probabilities are complicated to state exactly. They are more complicated than the limits used in ordinary calculus. To make your statement precise, you'll have to study the various kinds of limits involved in probability theory.

Again, limits of probabilities are complicated. If the number of trials is not a multiple of 3, the fraction that are Bernoulli "successes" can't be exactly 1/3. So the probability of getting exactly 1/3 successes doesn't approach 1 as the number of trials approaches infinity. To express the general idea that you have in mind takes more complicated language.

You can't say that it "will" by any standard assumptions of probability theory. If you express your idea precisely, you can say "it probably will".

Realize that when you say "extremely likely", you aren't saying anything that has mathematical consequences. You are just using words that make you feel psychologically more comfortable. There is no mathematical definition for "extremely likely" except in terms of "probability".

Look at the formal statement of the weak and strong laws of large numbers and look at the sophisticated concepts of limits that are used ("convergence in probability" and "almost sure convergence").
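For reference, the two laws in their standard textbook forms (my own summary, not a quotation), for i.i.d. random variables $X_1, X_2, \dots$ with mean $\mu$ and sample mean $\bar{X}_n$:

```latex
% Weak law: convergence in probability
\forall \varepsilon > 0:\quad
\lim_{n\to\infty} P\!\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0

% Strong law: almost sure convergence
P\!\left( \lim_{n\to\infty} \bar{X}_n = \mu \right) = 1
```

Note that in both statements the limit claim sits inside (or is governed by) a probability; neither one asserts that the observed average must converge.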

You aren't going to get around the fact that probability theory provides no guarantees about the observed frequency of events, or about the limits of observed frequencies except for those theorems that say something about the probability of those frequencies. You are presenting your series of steps as if the goal is to say something non-probabilistic about observed frequencies or to prove that "probability" amounts to some kind of observed frequency. This is not the goal of probability theory.

10. Aug 5, 2013

### bobby2k

Thanks for still being in the thread, really appreciate it!
My goal in the steps is to have a pathway from the basic building blocks to the more complex usage and results. For instance, I really liked that the theory of integration and differentiation can be built from the 10 basic axioms plus the axiom of the least upper bound. It is very interesting to see the complex theorems being built starting from these axioms, proving the extreme value theorem, intermediate value theorem, etc., and then going on. I want to see something similar in probability theory, but it is difficult.

Ok, I get that we can say that the probability of those frequencies goes to 1. But what does this mean then? That it is "probable" that the frequencies will behave like this?

11. Aug 5, 2013

### Stephen Tashi

As I said, if you want to know what it means, you have to deal with the various ways that limits involving probabilities are defined. To say "the probability of those frequencies goes to 1" is not a precise statement. (In fact, the probability of observing a frequency of successes exactly equal to the probability of success, for a Bernoulli random variable that is the subject of a large number of independent trials, goes to zero as the number of trials increases.) If you want to understand what probability theory says about the limiting probability of observed frequencies, you have to be willing to deal with the details of how the various limits are defined.
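The parenthetical point can be checked exactly with the binomial pmf (a sketch of my own, using p = 1/3 and n a multiple of 3 so that "exactly 1/3 successes" is possible):

```python
import math

def exact_freq_prob(n: int, p: float) -> float:
    """P(observing exactly round(n*p) successes in n Bernoulli(p) trials),
    computed via log-factorials to avoid overflow for large n."""
    k = round(n * p)
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

for n in (30, 300, 3000, 30000):
    print(f"n = {n:>6}: P(freq is exactly 1/3) = {exact_freq_prob(n, 1/3):.6f}")
```

The probability of hitting the frequency exactly shrinks (roughly like 1/sqrt(n)) even though the distribution of the frequency concentrates around 1/3.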

12. Aug 5, 2013

### bobby2k

Can you recommend a good book so that I will be able to learn what I need to understand what I want?
I'd like it to be not too long, and easy to read if possible. I have not taken real analysis yet (or measure theory), but I have read about logic and set theory on my own, so I can deal with that if the book contains it.

13. Aug 5, 2013

### Stephen Tashi

I didn't encounter the various types of limits used in probability theory until I took graduate courses, so I can't recommend a book. I'll keep my eyes out for something online that explains the various types of limits ( usually referred to as types of "convergence of sequences of functions").

Perhaps some other forum member knows a good book.

14. Aug 6, 2013

### jostpuur

bobby2k, it looks like you are looking for something that cannot be found (at our time, at least). IMO you have already understood the essential. You only need to calm down, take a step back, and try to see the big picture.

Yes, you are seeing a real problem with circular thinking. If you define (or attempt to define) probability with frequencies, and then use the probability concept to prove some basic frequency results, you are going in a circle.

Some of the basic results related to frequencies are important, so I wouldn't speak badly of them. But if we are discussing attempts to define probability (which is philosophy, IMO), the circular thinking should be recognized.

(The choice of words is confusing, but you can tell what the point of the quote is.)

It means that we have assumed the probability concept defined and accepted, and then we have proven something technical / mathematical about probabilities of some sequences.

15. Aug 6, 2013

### Stephen Tashi

Probability theory is more of a tangle than single variable calculus. (In fact, I've read that developing probability theory is one of the main reasons that calculus was extended to include ideas like Stieltjes integration and the more general idea of "measures".)

In my opinion, mathematical topics have a "flat" and simple character when they involve the interaction between one kind of thing and a distinct kind of thing. For example, in introductory calculus, you study the limit of a function in the situation where the limit is a number. When a mathematical subject begins to study the interaction of a thing with the same kind of thing, it takes on the complexities of the "snake eating its tail" sort. For example, in real analysis you study the situation where the limit of a sequence of functions is another function. It turns out that this type of limit can be defined in several different non-equivalent ways, so even the definition of limits becomes complicated.

In probability theory, if you think of the object of study as a single "random variable" then the situation appears "flat". However, as soon as you begin to study anything involving several samples from that random variable, you introduce other random variables. Typically you have one random variable (with its associated mean, variance etc.) and you have some sampling procedure for it. The sample value is itself a random variable. (Technically you aren't supposed to say things like "the mean of the sample 'is' 2.35" since "2.35" is only a "realization" of the sample mean. Of course both non-statisticians and statisticians say such things!) Since the sample mean is a random variable, it has its own mean and variance. The variance of the sample is also a random variable and has its own mean and variance. There is even ambiguity about how the quantity "the sample variance" is defined.
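The "sample mean is itself a random variable" point can be seen numerically. A sketch (my own, with a fair die as the underlying variable): repeat the whole sampling experiment many times and look at the spread of the resulting means, which theory says should be sigma^2/n.

```python
import random
import statistics

random.seed(3)
n = 100            # size of each sample
repeats = 20_000   # independent repetitions of the whole experiment

# Each entry of `means` is one realization of the random variable
# "sample mean of n die rolls".
means = [statistics.fmean(random.randint(1, 6) for _ in range(n))
         for _ in range(repeats)]

pop_var = statistics.pvariance([1, 2, 3, 4, 5, 6])   # 35/12 for a fair die
print(f"variance of the sample mean: {statistics.pvariance(means):.4f}")
print(f"theory predicts sigma^2/n  : {pop_var / n:.4f}")
```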

jostpuur says to be calm. I'll put it this way. As you study math, you will find calm, quiet areas where complex things are developed from simple things. However, there are also many turbulent places where things are developed from the same kind of thing. Don't get upset when this happens. Don't get upset because theorems in probability theory only tell you about probabilities.

16. Aug 6, 2013

### jostpuur

For example, in calculus, beginner students often feel that there is something about infinitesimals that they have not understood, but which could be understood (which is true). They also observe that they are unable to solve some technical calculation problems. From the point of view of the beginner student, it can then seem reasonable to contemplate the infinitesimals, because a better understanding of infinitesimals might lead to an improved ability to solve technical calculation problems.

Then it takes some time for the student to learn that a better understanding of infinitesimals will actually not improve the ability to solve technical calculation problems, but anyway, it seemed a reasonable idea from the point of view of a beginner.

In this thread bobby2k began with a quite philosophical touch (IMO), asking about circular definitions with sequences and so on. But then...

Ok, so now bobby2k only wants to learn the definitions for the sake of practical applications?

These are elements of the thread:

There are some philosophical problems related the definition of probability.

There are complicated probability problems, whose mathematical treatment isn't obvious (at least not to everyone, beginners, us...)

Perhaps better understanding of the definitions would lead to better capability to solve technical problems?

Well, there's no way to know in advance what turns out to be useful and what useless. You have to keep an open mind, and remember not to get stuck in ideas that don't seem to lead anywhere.

17. Aug 6, 2013

### atyy

The idea that probability is relative frequency is not part of the mathematical structure of probability theory. The mathematical theory just defines abstract mathematical concepts with names like measure and expectation. When we say that probability is relative frequency, we are interpreting the mathematics and giving the abstract concepts operational meaning, so that the mathematics has the possibility of being used to describe and predict the results of experiments.

It is the same with geometry. Points and lines are abstract concepts. When you think of a point as the mark you make with a pencil on paper, then you are interpreting the mathematics.

In both cases, the mathematics exists without the science. Probability theory exists without relative frequency, and geometry exists without pencil and paper. The idea of probability as relative frequency, or that a point is something you draw with a pencil on paper are additional things you add so that you can pass from mathematics to science.

In Bayesian interpretations of probability, probability is not necessarily relative frequency. Some Bayesian interpretations, like de Finetti's are beautiful, if impractical to carry out exactly. Others are very practical and powerful, for example in providing evidence for a positive cosmological constant like in http://arxiv.org/abs/astro-ph/9812133 and http://arxiv.org/abs/astro-ph/9805201.

18. Aug 8, 2013

### bobby2k

Thanks for your patience, guys. I was maybe not clear enough in my question. Maybe a better formulation would have been: "why does probability theory work, when the intuitive (relative frequency) part of probability is not defined in the axioms?"

You may say that it was not meant as a deep question. But it isn't really about doing better in practical applications, because just by having the intuitive explanation in the back of my mind, I can solve all the problems. It is more rewarding to understand why we can use this to solve the problems.

I think my main problem is/was that I struggle to see where we go from the math to making assumptions and making a model. It seems a lot easier in physics: there you may model a car with friction, and it is clear what is the mathematical model and what is the real thing. I mean, I thought that flipping coins hypothetically could be part of the mathematics, but it seems as though we have moved out of the probability world, even though it is just hypothetical.

I think I have finally wrapped my head around what many of you are saying: that the CLT only says something about what the probability of an event is, nothing more about what probability really is.

I made a picture to try and communicate how I view it now. Have I understood it?:

Last edited: Aug 8, 2013
19. Aug 8, 2013

### bahamagreen

A strange thing about probability is that it is not like other fundamental theories - it is not time reversible even at the smallest scale.

An event (to happen at a particular time) is said to have a probability before that time, but afterwards, what happens to that probability? It disappears or changes into a certainty?

That is, one can only use probabilities to describe things in the unknown future, not the certain past. One does not continue to say that the probability of a past event is 1/3 or 1/5; after the event the historical probability would have to be 1 or 0... but what if you don't know yet?

It gets more mysterious when a past event would seem to have either occurred or not, but we have not checked it yet to know which way it came out... can there be a probability for the person who has not checked, and a certainty for one who has?

20. Aug 10, 2013

### bobby2k

That is some very interesting points bahamagreen. :)

But do you guys think that at the basic level, my picture above describes the interaction with probability and using probability in an adequate way? I am eager to get closure. :)

21. Aug 10, 2013

### Stephen Tashi

No. People who apply probability theory correctly know that probabilities do not represent relative frequencies. And you haven't yet dealt with the meaning of "mean(x) -> p".

To apply math, you need "understanding", not "closure".

22. Aug 10, 2013

### bobby2k

But is the text I have marked in red in my statistics book here wrong, then? It deals with interpreting a confidence interval, but it may as well be interpreting the probability of a confidence interval.

23. Aug 10, 2013

### Stephen Tashi

Yes, it's wrong. The problem is the statement that "A will occur 95% of the time". If you want to say something like "I'd be willing to bet that A occurs about 95% of the time" or "We will do calculations assuming A occurs 95% of the time", those statements could be called an "interpretation".

As I pointed out before, if an event has probability .95 of occurring and you conduct a large number of independent trials, the probability that the event happens with a relative frequency of exactly .95 in the trials approaches zero as the number of trials approaches infinity.

By the way, from that page, it looks like your textbook is about to make an important point regarding confidence intervals. Do you understand how to interpret confidence intervals correctly?

24. Aug 10, 2013

### atyy

I'm not sure whether it is "exactly" ok, but it looks fine to me. My feeling is that one shouldn't lose sleep over this.

I don't think the problem is related only to probability theory. What is an electron? It is a particle that is deflected by an electric field. What is an electric field? It is a thing that deflects electrons. There is no problem if we consider electron and electric field as mathematical objects, but what happens if I give you an unidentified particle and ask you to show me that it is an electron?

So to connect mathematics with physics, it seems we always need some circularity. We accept as useful the mathematics and the interpretation as long as their predictions are consistent with observation.

From David O. Siegmund's Britannica article:
"Insofar as an event which has probability very close to 1 is practically certain to happen, this result justifies the relative frequency interpretation of probability. Strictly speaking, however, the justification is circular because the probability in the above equation, which is very close to but not equal to 1, requires its own relative frequency interpretation. Perhaps it is better to say that the weak law of large numbers is consistent with the relative frequency interpretation of probability."

Last edited: Aug 10, 2013
25. Aug 16, 2013

### bobby2k

My main problem when I started the thread was that I thought that the law of large numbers somehow validated that we could look at probabilities as relative frequencies (since both contained relative frequencies). But as we can read in the link you gave, the law of large numbers is consistent with the relative frequency interpretation. So this means that if we choose to use the relative frequency interpretation, then the mathematical theory seems "fair" to use?

But how would you connect the relative frequency interpretation to the axiomatic mathematical theory? I do not mean a specific connection in the sense of a theorem that guarantees something, but some connection there must surely be? The only connection I see is that when people assume probabilities represent relative frequencies, the axiomatic theory seems fair. Do you agree that the connection comes only when you choose how to view a probability?

I have to admit that I interpret them as the book writes: if the confidence level is 0.95, then about 95 percent of the confidence intervals in the long run will contain the parameter, and because of this we can be "confident" that the parameter is in the interval we made, even if we just make one.

I have done some research about the subjects you have talked about, and the thing is that they are part of the upper courses in the bachelor's, and of a master's in mathematics in, for example, stochastic analysis. At my school at least, even people taking a master's in statistics do not learn about these subjects (measure theory etc.). And think about how many people learn statistics (economists, social scientists etc.); surely not all of these will have learned the advanced mathematical theory. This must mean that there is an adequate way to understand probability without a master's in mathematics?
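The long-run reading of "confidence 0.95" can itself be watched in a simulation (my own sketch; mu, sigma, and n are arbitrary, and sigma is treated as known so the z-interval is exact). Of course, per the whole thread, this only shows that the observed coverage probably lands near 0.95.

```python
import random
import statistics

random.seed(4)
mu, sigma, n = 10.0, 2.0, 25   # assumed population and sample size
z = 1.96                       # two-sided 95% normal quantile
repeats = 50_000
covered = 0

for _ in range(repeats):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    half = z * sigma / n ** 0.5          # half-width of the 95% z-interval
    if m - half <= mu <= m + half:       # does this interval contain mu?
        covered += 1

print(f"fraction of intervals containing mu: {covered / repeats:.4f}")
```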
I have done some research about the subjects you have talked about and the thing is that they are part of the upper courses in the bachelor, and a master in mathematics in for example stochastic analysis. At my school atleast people taking even a master in statistics does not learn about these subjects(measure theory etc.) And think about how many people learn statistics(echonomists, social sciences etc.), surely not all of these will have learn about the advanced mathematical theory. This must mean that there is an adequate way to understand probability, without a master in mathematics?