# How to Think About Probability?

1. Aug 15, 2012

I'm not sure how to think about probability when it's applied to something like statistical mechanics. My thinking might be off since I haven't taken a formal thermodynamics class yet.

I guess the example I will use is a room filled with two gases. The gases will disperse randomly. The probability of the gases randomly separating to different parts of the room is very small. I read before that you would have to wait until the end of the universe essentially for this to happen. Does that mean it will happen? Will the system the probability describes breakdown before that happens? I guess I'm asking if there is a non-zero probability does that mean it must happen somewhere or is it just that it is possibility and there has to be a sufficient amount of time for it to be seen? Sorry if this is a naive question.

2. Aug 15, 2012

### Stephen Tashi

You must be thinking of an event with non-zero probability that is given repeated independent chances of happening.

There is no guarantee that an event with a non-zero probability (less than 1) will happen even if given such repeated chances.

The important theorems in probability that deal with probabilities of 1 or 0 have to do with taking limits of probabilities. What the value of such limits means in physical reality is not a matter of mathematics, it is a matter of how you choose to interpret letting something "approach infinity" (or whatever is involved in trying to interpret the limit) as a real world situation.

Probability theory does use terms like "almost surely", and "converges with probability 1", but these terms have technical mathematical meanings that are different from ordinary speech.

There is a circular quality to mathematical theorems about probability. They don't say that a thing will or won't happen. They don't even say how frequently events will happen. Instead they tell you things about the probability that an event will happen, or the probability that an event will happen with a certain observed frequency. (If you haven't thought about the difference between the observed frequency of an event and the probability of an event, you should devote some time to that subject.)

It is clear that probability is a useful tool in analyzing some situations, but it is unknown whether probability has any physical reality. I suppose Quantum physicists are convinced it does. But consider a macroscopic type of textbook problem:

A contestant is to pick one of 3 doors to open. A prize has been put behind a "randomly selected" one of the doors. What is the probability that the prize is behind the first door? It's 1/3. Suppose the third door is opened and the prize is not behind it. What is the probability that the prize is behind the first door? It's 1/2.
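A minimal enumeration (a sketch, not part of the original problem) makes the conditional update concrete; it assumes, as the problem above implies, that conditioning simply restricts the sample space to the outcomes consistent with the observation:

```python
from fractions import Fraction

# Sample space: the prize is equally likely behind door 1, 2, or 3.
doors = [1, 2, 3]

# Unconditional probability that the prize is behind door 1.
p_door1 = Fraction(sum(1 for d in doors if d == 1), len(doors))

# Condition on "door 3 was opened and the prize was not there":
# restrict the sample space to the consistent outcomes.
remaining = [d for d in doors if d != 3]
p_door1_given_not3 = Fraction(sum(1 for d in remaining if d == 1), len(remaining))

print(p_door1)             # 1/3
print(p_door1_given_not3)  # 1/2
```

Nothing about the door itself changed; only the set of outcomes consistent with what was observed changed, which is the crux of the metaphysical puzzle below.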

How can the change in that probability be explained as a change in physical reality? Is the probability a physical property of the first door? Something with the same standing as the mass or size of the door? Should we deny that the conclusions about the probability in this problem are valid? Are the probabilities something that exist in the mind of the contestant? All these are complicated metaphysical issues, and there are diverse opinions about them. Mathematics sidesteps those issues. To do mathematical probability theory you must assume the probabilities exist and that they follow certain axioms.

The best way to think about probability is that it is a mystery, but it's a mystery that mathematics treats in an orderly fashion.

3. Aug 15, 2012

### chiro

Hey radrmd216 and welcome to the forums.

Your question is not naive at all.

The important thing to be aware of is that if a probability is related to some process, then one needs to ask what that process corresponds to. If the probability represents all outcomes of the process unconditionally, then a non-zero probability means that, if the process is repeated infinitely many times, that outcome occurs with probability 1.

If you have conditional probabilities (for example, one event happening changes the probability of the next event happening, and so on), then you have a distribution that differs depending on what has happened elsewhere. In these cases you may end up in a situation where a value that was previously possible becomes impossible to obtain: the simplest model of this kind of process is known as a Markov chain or Markov process.

Distributions that represent a population can be interpreted in a couple of ways.

One interpretation is that it's the limit you would get if you performed a process or experiment infinitely many times and calculated the proportion of each outcome relative to the number of times you did the process (which tends to infinity).
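This frequency interpretation can be sketched with a quick simulation; the probability p = 0.3 here is an arbitrary illustrative value:

```python
import random

random.seed(0)
p = 0.3  # hypothetical true probability of one outcome

# Repeat the "process" n times and compute the observed proportion of the
# outcome; as n grows, the proportion settles near p.
for n in (100, 10_000, 1_000_000):
    hits = sum(random.random() < p for _ in range(n))
    print(n, hits / n)
```

For any finite n the observed proportion is itself random; only in the limit does it pin down p, which is exactly the circularity Stephen describes above.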

Another interpretation is a "belief": the distribution is not empirical and reflects some kind of "assumed knowledge" or other properties of a non-empirical nature.

You also have to be aware that distributions don't always correspond to processes per se: they can be entirely mathematical for entirely mathematical reasons. For example we have a lot of distributions that are used in statistics for the sake of statistics and not to describe anything that corresponds to a real phenomenon. So these distributions are the result of theoretical mechanisms and not something that relates necessarily to a real process.

So the bottom line is that "it depends". But ultimately, if you are dealing with a distribution that represents the entirety of the outcomes of a particular process, and that process does not change at all with the realizations that occur, then any outcome with non-zero probability will occur with probability 1 as the running time of the process goes to infinity.

4. Aug 15, 2012

Does the probability that an event occurs with a certain observed frequency mean that when modeling the probability of the event it is assumed the event will occur at some point because it is an observed frequency?

chiro, thanks for the welcome. From what you and Stephen said it seems like you have to think about probability in a mathematical context and then see how well it translates to real world phenomena. Is that the best way to work with probabilities for science related things?

This question is for anybody that wants to answer. Could I estimate how long it would take for two gases to completely separate? I will probably have a few more questions.

5. Aug 16, 2012

### chiro

For this problem you should tell us whether over time, the chance to disperse changes (i.e. increases) or remains the same. I'm guessing that it increases but you need to say whether this is the case.

Firstly, if it doesn't change, then the answer is "I don't know". What you could do, though, is look at the probability of the separation event and then calculate the inverse of that probability. This gives you a rough estimate of how long you would "expect" to wait for at least one event, provided all the events form a proper random sample. It doesn't "guarantee" anything, but it's a good indicator of the lower limit that you would expect to wait, given that the process really does produce results independent of all the others.

Mathematically you calculate this as P(Getting separation on nth trial | Not getting a separation before), which is defined with conditional probabilities. This gives you the probability of finally getting a separation. In fact you can use what is called the geometric distribution for this, but the above is from first principles.
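A sketch of this idea, with an arbitrary illustrative per-trial separation probability p = 0.01 (the real value for two gases would be astronomically smaller):

```python
import random

random.seed(1)
p = 0.01  # hypothetical per-trial probability of the separation event

# Expected number of trials until the first occurrence (geometric distribution):
# the inverse of the per-trial probability.
expected_wait = 1 / p
print(expected_wait)  # 100.0

def first_hit():
    # From first principles: no occurrence for n-1 trials, then an occurrence,
    # i.e. P(first hit on trial n) = (1 - p)**(n - 1) * p.
    n = 1
    while random.random() >= p:
        n += 1
    return n

waits = [first_hit() for _ in range(100_000)]
print(sum(waits) / len(waits))  # the average wait lands near 1/p = 100
```

Note how widely the individual waits vary; 1/p is only an expectation, not a guarantee, which is chiro's point.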

If the chances of getting separation increase in some way you need to put this in your distribution at say the nth unit of time. Once you have this distribution you can use the same kind of idea above.

6. Aug 16, 2012

### Stephen Tashi

It isn't clear to me what you are asking. If we think of independently tossing 10 fair coins, there is a certain probability for each possible observed frequency of heads, such as 0/10, 1/10, 2/10, ..., 10/10. If you are doing a simulation of such an experiment, you don't put any deterministic statements in the computer code that force a particular frequency such as 0/10 or 5/10 to occur a certain number of times when the experiment is repeated over and over again.
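To make this concrete, here is a sketch of such a simulation: nothing in the code forces any particular frequency of heads, yet the observed frequencies settle near the binomial probabilities:

```python
import random
from math import comb

random.seed(2)
n_flips, n_repeats = 10, 100_000

# Tally how often each number of heads (0..10) occurs across many experiments.
counts = [0] * (n_flips + 1)
for _ in range(n_repeats):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    counts[heads] += 1

for k in range(n_flips + 1):
    theory = comb(n_flips, k) / 2**n_flips  # binomial probability of k heads
    observed = counts[k] / n_repeats        # simulated frequency of k heads
    print(k, round(theory, 4), round(observed, 4))
```

The agreement emerges without any deterministic scheduling of outcomes; each experiment is an independent random draw.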

If you are using a probabilistic theory, it won't tell you how long. The most it will tell you is the probability that it will take a given length of time for each given length of time that you ask it about. As I said, probability theory tells you about probabilities. You could ask about the "average" or "expected" length of time. That's a specific number, but it isn't a certain event.

(If you want to know the specifics of gases, you should post in the section of the forum where statistical physicists hang out. )

7. Aug 16, 2012

I think I understand what you are saying. Probability does not tell me anything other than what the math defines as the probability. So if I observe an experiment for a given amount of time, all I can say is that I will see a certain event with a certain probability. So if I want to relate probability to the frequency I observe something, I have to factor in time and make other assumptions outside of probability theory. Is that correct? Sorry if I am missing something obvious.

I'm studying polymer science so I'm not really looking at specifics for gases. I would like to learn more about mathematics because I could bring a similar approach to different situations. It would also let me know why I'm doing something instead of just knowing that in a certain situation this is what I do. I'm in a materials science and engineering program, so mainly it's just learn enough of the math to be able to calculate something. There is so much information to possibly learn, but only a finite amount of time and a lot of deadlines.

8. Aug 16, 2012

### Stephen Tashi

I can't think of a reasonable scenario where a person would make assumptions outside of probability theory in order to claim certainty about how frequently a probabilistic event will occur. If a process is giving the event an independent chance to occur every second, then increasing time would increase the probability of the event happening at least once since it is given more chances.
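The "more chances" point is just the arithmetic P(at least once in n tries) = 1 − (1 − p)ⁿ; here is a sketch with an arbitrary illustrative per-second probability:

```python
# Probability of at least one occurrence in n independent chances,
# each with a hypothetical per-chance probability p.
p = 1e-6

for n in (10**3, 10**6, 10**9):
    print(n, 1 - (1 - p) ** n)  # increases toward 1 as n grows
```

The probability climbs toward 1 but, for any finite n, never reaches it, which is why "infinite time" statements are really limits.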

Some people talk about what will happen in "infinite time". However, when you apply this to reality, this is a case of taking a limit of a probability as time approaches infinity.

On the other hand, people do use observed frequencies to estimate probabilities - the key word is "estimate". They use the frequencies as probabilities when that's the only information they have about the probabilities.

Unfortunately there are many ways to misinterpret the concepts of probability theory. I think people who only take an introductory statistics course are usually doomed to a lifetime of misunderstanding probability theory due to the terminology employed in statistics.

9. Aug 17, 2012

### NegativeDept

This question contains (at least) two huge and ambitious questions which have been provoking huge arguments for centuries!

"How should we interpret probabilities in the real world?" is probably the single best way to start an argument among probabilists and/or statisticians. If you're in a room with at least one Bayesian and one frequentist, there's a high probability they'll disagree.

"Why don't oxygen molecules spontaneously move to the other side of the room and suffocate me?" is another legendary old paradox. The simple answer is: "Nothing prevents that from happening, but it is a low-probability event." The detailed answer requires an interpretation of probability and entropy, which leads us back to the first argument.
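To put a rough number on "low-probability" (a back-of-the-envelope sketch, not a statistical-mechanics calculation): if each of N molecules independently sits in either half of the room with probability 1/2, the chance that all N happen to be in one particular half is (1/2)^N:

```python
from math import log10

# Order of magnitude of (1/2)**N: the probability that all N molecules
# are simultaneously in one chosen half of the room.
for n_molecules in (10, 100, 1000):
    log_p = n_molecules * log10(0.5)
    print(n_molecules, f"about 10^{log_p:.0f}")
```

A real room contains on the order of 10^26 or more molecules, so the exponent is correspondingly enormous, which is why the event is never observed even though nothing forbids it.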

10. Aug 17, 2012

### chiro

For the first question, the first thing that needs to be asked is whether the probability corresponds to some physical process, or to a mathematical expression that has been derived without reference to a real process.

This has to do with the Platonic ideas that are discussed especially in areas like physics, where there is a debate over whether mathematics corresponds to any kind of reality.

If the process corresponds to something physical then you can start to ask whether the probability refers to something that has been measured or deduced from things that have a physical equivalence.

If this is the case, then the interpretation is very clear: the data, and hence the distribution, correspond to a form of measurement that has a direct interpretation in the context of the process, and this definition should be unambiguous.

If this is not based on a derivation from explicit data, then you are getting into the subjective nature of probability, where a distribution may reflect a "belief" about something, whether or not that belief is based on things that are partially or wholly correct.

When it comes to things like statistical and sampling distributions, these are mathematical ones that deal with results that do not correspond to any kind of explicit data: they are mathematical creations that have their own purpose and have nothing to do with the description of any process in any kind of physical context.

Now for my opinion on your first question: How do you interpret probabilities in the real world? Well the first thing you need to make distinct is whether the distribution corresponds to a physical, measurable, tangible process or whether it corresponds to a mathematical abstraction for mathematical and statistical purposes.

If it corresponds to a physical process, then the distribution will correspond to a specific attribute of that process, describing the chance of a particular realization of that process actually occurring in the context of what the distribution represents.

The above situation should always relate to the appropriate physical characteristics of the process in the right way, with respect to some realizable attribute that is unambiguous.

If it doesn't relate to a process that is tangible, measurable (and quantifiable), and physically describable, then we are talking about a completely different scenario, even if it shares some of the qualities of the above case. Subjective priors and other such distributions may have assumptions that are logical and grounded in observation and expert knowledge, but they are completely different from something that is purely measurable, tangible, and unambiguously physically describable.

When it comes to things that don't relate in any way to a real process (like a lot of the statistical distributions), then one needs to make the connection to the underlying physical process that is involved within the estimation if such a connection exists, or to the underlying context of the problem at hand.

With regards to your second question, the key thing again is to note the three criteria: measurability (with appropriate quantification), tangibility, and an unambiguous, crystal-clear description of the process.

An important thing to also make note of is how such a condition is derived. Derivations come from assumptions, and one thing about science is that a lot of science is inductive.

Inductive reasoning is basically an attempt to extrapolate the properties of systems that are larger than the domain of observations being used and analyzed to make such an extrapolation.

The thing with extrapolation is that, because you are going beyond the scope of your data, you have to be ready to admit that no matter how simple, beautiful, carefully reasoned, or well supported your extrapolation is, there is a chance that it is wrong and ultimately not realizable in the form that corresponds to the realizations of the process derived through inductive reasoning.

There is actually a simple example that explains the idea and it deals with a common problem which is the estimation of a common population parameter: the mean. Let's for the moment make the problem very basic: we have samples from a known distribution (normal) with a known variance but unknown mean.

Now if you want to estimate the distribution of the population mean, you take the sample mean as your point estimate, and the variance of that estimate becomes the population variance divided by the number of observations in your sample. We assume that we have a random sample (i.e. each observation is independent of every other).

Now the distribution of our population mean under these assumptions covers the entire real line. As long as we have a finite number of samples, it will always cover the entire real line.

Here now are some situations to think about:

Let's say the true mean is 0.

Situation 1: We have a billion samples that are completely skewed to the left where the sample mean of these samples is -1000.

Situation 2: We have another billion samples that are completely skewed to the right where the sample mean of these new samples is +2000.

Situation 3: We now take a googolplex of samples and finally obtain a sample mean of this bunch that is pretty close to 0.

The thing about the above is that in the first two situations, our normal inference using, say, 95% or even 99% intervals would have rejected a population mean of 0, and it would have been wrong.

We made an assumption, and in these admittedly highly unlikely cases, we made an inference that was completely off the mark in the first two situations. The final situation was right and made sense with respect to the distribution.
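The three situations can be sketched numerically. The sample sizes and sample means are the hypothetical values from the example above; the known variance is set to 1 for concreteness, and 1.96 standard errors gives the usual 95% interval:

```python
from math import sqrt

sigma = 1.0                # known population standard deviation (assumed)
n = 1_000_000_000          # "a billion samples"
se = sigma / sqrt(n)       # standard error of the sample mean

def interval_95(sample_mean):
    # sample mean plus/minus 1.96 standard errors
    return (sample_mean - 1.96 * se, sample_mean + 1.96 * se)

# Situation 1: a wildly skewed sample with sample mean -1000.
lo, hi = interval_95(-1000.0)
print(lo <= 0 <= hi)   # False: the interval excludes the true mean 0

# Situation 3: a sample mean very close to 0.
lo, hi = interval_95(0.00001)
print(lo <= 0 <= hi)   # True: the interval covers the true mean
```

The interval in situation 1 is extremely tight around -1000, so the (correct) value 0 is confidently rejected: the inference machinery worked exactly as designed on an unrepresentative sample.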

This kind of example illustrates the issue with certainty in the context of both realizations/probability and of inference: Both of these are also under uncertainty.

With regards to inductive reasoning, we may have taken situations 1 and 2 and extrapolated something that was completely skewed and wrong by a long shot; with inductive inference, this is ultimately the price of doing so. There's absolutely nothing wrong with this, but it's important to be aware of the implications that can arise when using such techniques, as they are often used in science to formulate theories and hypotheses (and to an extent "laws").

Ultimately, apart from this, the even more critical point is to define all of these circumstances unambiguously: in the gas example, it would be better to make an attempt to describe it mathematically without ambiguity and then think about how the definition relates to the kind of example mentioned above.