NegativeDept said:
This question contains (at least) two big, ambitious questions which have been provoking fierce arguments for centuries!
"How should we interpret probabilities in the real world?" is probably the single best way to start an argument among probabilists and/or statisticians. If you're in a room with at least one
Bayesian and one
frequentist, there's a high probability they'll disagree.
"Why don't oxygen molecules spontaneously move to the other side of the room and suffocate me?" is another legendary old paradox. The simple answer is: "Nothing prevents that from happening, but it is a low-probability event." The detailed answer requires an interpretation of probability and entropy, which leads us back to the first argument.
For the first question, the first thing to ask is whether the probability corresponds to some physical process, or to a mathematical expression that has been derived without reference to any real process.
This ties into the Platonic ideas discussed especially in areas like physics, where there is a debate over whether mathematics corresponds to any kind of reality.
If it corresponds to something physical, then you can start to ask whether the probability refers to something that has been measured, or deduced from things that have a physical equivalent.
If that is the case, then the interpretation is very clear: the data, and hence the distribution, corresponds to a form of measurement that has a direct interpretation in the context of the process, and this definition should be unambiguous.
If it is not based on explicit data, then you are getting into the subjective nature of probability, where a distribution may reflect a "belief" about something, whether that belief is correct, incorrect, or only partially right.
When it comes to things like statistical and sampling distributions, these are mathematical constructs that deal with results which do not correspond to any explicit data: they are mathematical creations with their own purpose, and they do not describe any process in any physical context.
Now for my opinion on your first question: how do you interpret probabilities in the real world? The first distinction you need to make is whether the distribution corresponds to a physical, measurable, tangible process, or whether it is a mathematical abstraction serving mathematical and statistical purposes.
If it corresponds to a physical process, then the distribution describes a specific attribute of that process: the chance that a particular realization of the process, in terms of whatever the distribution represents, actually occurs.
That situation should always relate to the appropriate physical characteristics of the process in the right way, via some realizable attribute that is unambiguous.
If it doesn't relate to a process that is tangible, measurable (and quantifiable), and physically describable, then we are talking about a completely different scenario, even if it shares some of the qualities of the case above. Subjective priors and other such distributions may rest on assumptions that are logical and grounded in observation and expert knowledge, but they are completely different from something that is purely measurable, tangible, and unambiguously physically describable.
When it comes to things that don't relate directly to a real process (like many of the statistical distributions), one needs to connect them to the underlying physical process involved in the estimation, if such a connection exists, or else to the underlying context of the problem at hand.
With regard to your second question, the key thing again is to note the three criteria: measurability (with appropriate quantification), tangibility, and an unambiguous, crystal-clear description of the process.
It is also important to note how such a condition is derived. Derivations come from assumptions, and a lot of science is inductive.
Inductive reasoning is basically an attempt to extrapolate: to infer properties of systems that lie beyond the domain of the observations being used and analyzed to make the extrapolation.
The thing with extrapolation is that because you are going beyond the scope of your data, you have to be ready to admit that your extrapolation, no matter how simple, beautiful, carefully reasoned, or well supported it is, has a chance of being wrong and ultimately not being realized in the form suggested by the inductive reasoning.
There is actually a simple example that illustrates the idea, and it deals with a common problem: estimating a basic population parameter, the mean. Let's keep the problem very simple for the moment: we have samples from a known distribution (normal) with known variance but unknown mean.
Now if you want to estimate the distribution of the population mean, you take the sample mean as your point estimate, and your variance becomes the population variance divided by the number of observations in your sample. We assume that we have a random sample (i.e. each observation is independent of every other).
Under these assumptions, the distribution for our estimate of the population mean covers the entire real line, and it will do so for any finite number of samples.
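Concretely, a minimal sketch of the standard setup (writing $\sigma^2$ for the known variance and $n$ for the sample size):

```latex
% Sampling distribution of the sample mean for i.i.d. draws
% X_1, ..., X_n ~ N(mu, sigma^2) with sigma^2 known:
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad \bar{X} \sim \mathcal{N}\!\left(\mu,\; \frac{\sigma^2}{n}\right)
% A normal distribution has support on all of R for every finite n,
% which is why this distribution always covers the entire real line.
```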
Here now are some situations to think about:
Let's say the true mean is 0.
Situation 1: We have a billion samples that happen to be skewed far to the left, with a sample mean of -1000.
Situation 2: We have another billion samples that happen to be skewed far to the right, with a sample mean of +2000.
Situation 3: We now take a googolplex of samples and finally obtain a sample mean from this bunch that is pretty close to 0.
Now the thing about the above is that in the first two situations, our normal inference using, say, 95% or even 99% intervals would have rejected a population mean of 0, and it would have been wrong.
We made an assumption, and in these admittedly highly unlikely cases, our inference was completely off the mark in the first two situations. The final situation got it right and made sense with respect to the distribution.
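To make this concrete, here is a minimal Python sketch of the interval calculation. The sample means come from the situations above; the known standard deviation of 1 is an assumption for illustration, and a large but computable sample size stands in for the googolplex:

```python
import math

SIGMA = 1.0      # known population standard deviation (assumed = 1)
TRUE_MEAN = 0.0  # the true mean in the thought experiment

def ci_95(sample_mean, n, sigma=SIGMA):
    """95% confidence interval for the mean with known variance."""
    half_width = 1.96 * sigma / math.sqrt(n)
    return (sample_mean - half_width, sample_mean + half_width)

situations = [
    ("Situation 1", -1000.0, 10**9),  # a billion samples, skewed left
    ("Situation 2", 2000.0, 10**9),   # a billion samples, skewed right
    ("Situation 3", 1e-12, 10**18),   # "pretty close to 0" (stand-in n)
]

for label, xbar, n in situations:
    lo, hi = ci_95(xbar, n)
    covers = lo <= TRUE_MEAN <= hi
    print(f"{label}: 95% CI = ({lo:.6g}, {hi:.6g}); covers 0? {covers}")
```

In the first two cases the interval is extremely narrow and nowhere near 0, so the true value is rejected with great confidence; that is exactly the failure mode described above.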
This kind of example illustrates the issue with certainty in the context of both realizations/probability and inference: both of these operate under uncertainty.
With regard to inductive reasoning, in situations 1 and 2 we would have extrapolated something that was completely skewed and wrong by a long shot; with inductive inference, this is ultimately the price you pay. There's absolutely nothing wrong with that, but it's important to be aware of the implications of such techniques, as they are often used in science to formulate theories and hypotheses (and, to an extent, "laws").
Ultimately, apart from this, the even more critical point is to define all of these circumstances unambiguously: in the Maxwell example, it would be better to attempt to describe the situation mathematically without ambiguity, and then think about how that definition relates to the kind of example mentioned above.
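For instance, one minimal, unambiguous formalization of the oxygen question (assuming N non-interacting molecules, each independently and uniformly distributed over the room, with "the other side" meaning exactly half the volume):

```latex
% Probability that all N molecules are simultaneously in one half
% of the room, under the independence and uniformity assumptions:
P(\text{all } N \text{ molecules in one half}) = \left(\frac{1}{2}\right)^{N}
% For a room-sized volume of air, N is on the order of 10^{27},
% so this probability is roughly 10^{-3 \times 10^{26}}:
% strictly positive, but astronomically small.
```

Once the event is pinned down this precisely, the "paradox" reduces to a statement about a perfectly well-defined, extremely small probability, which is exactly the kind of unambiguous description argued for above.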