Getting the probabilities right

AI Thread Summary
The discussion revolves around understanding the accuracy of probability estimates based on experimental outcomes. It highlights that measuring a small number of trials, such as 10, does not provide sufficient evidence to confirm or refute the assumed probabilities, which can only be reliably assessed through an infinite number of measurements. Statistical tools like Pearson's Chi-squared test and Bayesian methods, such as the Beta distribution, can help analyze the fit of experimental data to theoretical probabilities. Even with larger sample sizes, the credible intervals for probability estimates remain broad, indicating uncertainty in exact values. Ultimately, the reliability of probability estimates is contingent on understanding the underlying processes and potential biases in data collection.
entropy1
TL;DR Summary
When are we sure we have the right probabilities?
If we have a jar with 3 blue balls and 7 white balls, we say that the probability of blindly getting a blue ball out of that jar is 30%. If we have a jar with 2 blue balls and 8 white balls, we say that the probability of blindly getting a blue ball out of it is 20%.

Now if we carry out 10 measurements of a physical process that has a 30% probability of yielding a blue outcome and a 70% probability of yielding a white outcome, and we measure 2 blue outcomes and 8 white outcomes, can we then say that the probability of getting a blue outcome is 20% instead of 30%? In other words, when are we sure we have the right probabilities?
 
entropy1 said:
In other words, when are we sure we have the right probabilities?
After you have done an infinite number of measurements.
 
This is a basic question in statistical theory. Google "statistical theory" for references.
 
entropy1 said:
[...]
In other words, when are we sure we have the right probabilities?
Dale said:
After you have done an infinite number of measurements.
Your probability estimates, blue 3/(3+7) = 0.3 and white 7/(3+7) = 0.7, appear correct given sampling with replacement, meaning each independent draw is made from the same ten balls. Each trial has two possible outcomes, since the balls are identical except for color: pick blue with probability Pb = 0.3 or pick white with probability Pw = 0.7. Check the sum: Pb + Pw = 0.3 + 0.7 = 1.

Ten trials, an arbitrary number, neither affirm nor contradict the probability estimates.
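To see that concretely, here is a minimal simulation sketch in Python (the 30% figure is taken from the example above): even when the true blue probability is exactly 0.3, only about a quarter of 10-trial runs produce exactly 3 blue outcomes.

Python:
import random

random.seed(1)          # fixed seed so the run is reproducible
p_blue = 0.3            # true probability of a blue outcome
experiments = 10_000    # number of independent 10-trial experiments

exactly_three = 0
for _ in range(experiments):
    blues = sum(random.random() < p_blue for _ in range(10))
    exactly_three += (blues == 3)

# Roughly 0.27: most 10-trial runs do NOT show exactly 3 blues even though p = 0.3.
print(exactly_three / experiments)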
 
There is plenty of experimental evidence that the theory of probability is correct. That includes hundreds of millions of gambling results, coin flips, and drawing colored balls out of a jar. The main question is whether the conditions of the theory are satisfied. There are many ways that bias can occur and some are very subtle.

Your example raises the question of whether and when the experimental results start to disprove the assumed probabilities. One way to answer that is with Pearson's Chi-squared goodness-of-fit test. It lets you measure how well an experimental sample fits the proposed distribution. (See also Fisher's exact test.)
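As a sketch of what that test looks like for the numbers in this thread (using scipy, and keeping in mind that the expected counts here are small for the chi-squared approximation, which is one reason Fisher's exact test is also worth considering):

Python:
from scipy.stats import chisquare

observed = [2, 8]   # 2 blue, 8 white in 10 trials
expected = [3, 7]   # expected counts if p(blue) = 0.3

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
# chi-squared ~ 0.48, p ~ 0.49: the data give no evidence against p(blue) = 0.3
print(stat, p_value)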
 
entropy1 said:
Summary: When are we sure we have the right probabilities?

If we have a jar with 3 blue balls and 7 white balls, we say that the probability of blindly getting a blue ball out of that jar is 30%. If we have a jar with 2 blue balls and 8 white balls, we say that the probability of blindly getting a blue ball out of it is 20%.

Now if we carry out 10 measurements of a physical process that has a 30% probability of yielding a blue outcome and a 70% probability of yielding a white outcome, and we measure 2 blue outcomes and 8 white outcomes, can we then say that the probability of getting a blue outcome is 20% instead of 30%? In other words, when are we sure we have the right probabilities?
Short answer: you are treating a 30% (or 70%) probability as though it says the results must match those values exactly. You are ignoring the natural variation in results that comes along with this type of experiment.
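A quick calculation with scipy's binomial distribution (a sketch, using the numbers from the question) shows how large that variation is: 2 blues in 10 trials is nearly as likely when p = 0.3 as when p = 0.2, so the observation cannot distinguish the two.

Python:
from scipy.stats import binom

# Probability of observing exactly 2 blue outcomes in 10 trials
print(binom.pmf(2, 10, 0.3))   # ~0.23 if the true probability is 30%
print(binom.pmf(2, 10, 0.2))   # ~0.30 if the true probability is 20%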
 
FactChecker said:
Your example raises the question of whether and when the experimental results start to disprove the assumed probabilities. One way to answer that is with Pearson's Chi-squared goodness-of-fit test. It lets you measure how well an experimental sample fits the proposed distribution. (See also Fisher's exact test.)
There is also the Bayesian approach. The conjugate prior for this type of scenario is the Beta distribution. Prior data can be incorporated as Beta(a,b) where a is the number of prior observed successes (white) and b is the number of prior observed failures (blue).
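A minimal sketch of that update in scipy (the prior counts below are made-up numbers, purely for illustration):

Python:
from scipy.stats import beta

a_prior, b_prior = 7, 3     # hypothetical prior data: 7 white, 3 blue
whites, blues = 8, 2        # new data: 8 white, 2 blue in 10 trials

# Conjugacy: the posterior for p(white) is simply Beta(a + whites, b + blues).
posterior = beta(a_prior + whites, b_prior + blues)
print(posterior.mean())          # posterior mean estimate of p(white)
print(posterior.interval(0.95))  # 95% credible interval for p(white)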
 
So there must be a right way to establish the probability of an outcome just based on the outcomes already given, right? However, we can't guarantee that the underlying physical process doesn't change if we don't have knowledge about it. Is that what, in principle, causes the uncertainty in establishing a probability from outcomes alone? That we, for instance, don't have knowledge of the underlying process?
 
entropy1 said:
So there must be a right way to establish the probability of an outcome just based on the outcomes already given, right? However, we can't guarantee that the underlying physical process doesn't change if we don't have knowledge about it. Is that what, in principle, causes the uncertainty in establishing a probability from outcomes alone? That we, for instance, don't have knowledge of the underlying process?
That does, but the issue is more fundamental than that. Suppose that we know for certain in advance that the underlying physical process does not change, but we don't know the probability in advance. All we can do is to observe the process and measure the frequency of "success" or "failure".

Let's say that we do 10 trials and observe 7 successes and 3 failures. We would want to take that and say "the probability is 0.7", but is it? What if the probability were 0.6? It would still be easily possible to get 7 successes in 10 trials. Or what if the probability were 0.7000000001? Then the most likely outcome of 10 trials would still be 7 successes.

If we actually work it out, we find that for 10 trials with 7 successes there is a very wide range of probabilities that are compatible with that data. In fact, the 95% credible interval ranges from about p=0.40 to about p=0.92. So while p=0.70 is quite likely, merely having 7 successes in 10 trials is not very strong evidence that p=0.70.

[trials_10.gif: plot of the probabilities compatible with 7 successes in 10 trials]


So let's collect 10 times the data, and suppose that after collecting 100 results we find exactly 70 successes. Surely we can be confident now! We are indeed substantially more sure of our estimate at this point, but the 95% credible interval still goes from about 0.61 to about 0.79, which is fairly broad even after 100 trials.

[trials_100.gif: plot of the probabilities compatible with 70 successes in 100 trials]


We can continue to collect data, and we will continue to narrow that 95% credible range, but it will never go to 0 and we will never be certain of the exact probability for any finite amount of data.
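These intervals can be reproduced approximately with a few lines of scipy, taking the Beta parameters equal to the observed success and failure counts as in the conjugate-prior comment above (that choice of prior is an assumption here):

Python:
from scipy.stats import beta

# Beta parameters set to the observed counts (successes, failures),
# as in the conjugate-prior description earlier in the thread.
for successes, failures in [(7, 3), (70, 30)]:
    lo, hi = beta(successes, failures).interval(0.95)
    print(successes + failures, "trials:", round(lo, 2), "to", round(hi, 2))
# Prints intervals close to the 0.40-0.92 and 0.61-0.79 ranges quoted above.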
 
If you don't know the underlying process, you may be able to collect enough data to get a good estimate of the averages, variation, probabilities, and trends. That is a complicated subject. But then you must be careful about the conclusions that you draw. The estimates you get from the data are for that exact situation. It is difficult to draw valid theoretical conclusions because you will not know what biases have led to your estimates.
One example of the problem is this:
Suppose you want to measure the velocity of radioactive particles being emitted from some material. Suppose you have a container of the material and put a measurement device at the top to measure the velocity of every particle coming out of the container. That will be a biased measurement because you are only measuring the particles with enough speed to make it out of the container. So your measurement is a correct measurement of the particles coming out of the container, but not a correct measurement of all particles being emitted by the material.
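A toy simulation makes that selection effect visible (the exponential speed distribution and the escape cutoff below are made up purely for illustration):

Python:
import random

random.seed(0)
# Hypothetical model: true emission speeds are exponential with mean 1.0
# (arbitrary units); only particles faster than 1.5 escape the container.
speeds = [random.expovariate(1.0) for _ in range(100_000)]
escaped = [v for v in speeds if v > 1.5]

print(sum(speeds) / len(speeds))     # true mean emission speed, ~1.0
print(sum(escaped) / len(escaped))   # mean detected speed, ~2.5: biased upward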
 