# Getting the probabilities right

TL;DR Summary
When are we sure we have the right probabilities?
If we have a jar with 3 blue balls and 7 white balls, we say that the probability of blindly getting a blue ball out of that jar is 30%. If we have a jar with 2 blue balls and 8 white balls, we say that the probability of blindly getting a blue ball out of it is 20%.

Now suppose we carry out 10 measurements on a physical process that has a 30% probability of yielding a blue outcome and a 70% probability of yielding a white outcome, and we measure 2 blue outcomes and 8 white outcomes. Can we then say that the probability of getting a blue outcome is 20% instead of 30%? In other words, when are we sure we have the right probabilities?

entropy1 said:
In other words, when are we sure we have the right probabilities?
After you have done an infinite number of measurements.

The question is a basic question in statistical theory. Google "statistical theory" to get references.

entropy1 said:
[...]
In other words, when are we sure we have the right probabilities?
Dale said:
After you have done an infinite number of measurements.
Your probability estimates, blue 3/(3+7) = 0.3 and white 7/(3+7) = 0.7, are correct given sampling with replacement: each draw is an independent trial from the same ten balls. Each trial has two possible outcomes, since the balls are identical except for color: pick blue with probability Pb = 0.3 or pick white with probability Pw = 0.7. Check the sum: Pb + Pw = 0.3 + 0.7 = 1.

Ten trials, an arbitrarily chosen and rather small number, neither affirms nor contradicts the probability estimates.

There is plenty of experimental evidence that the theory of probability is correct. That includes hundreds of millions of gambling results, coin flips, and drawing colored balls out of a jar. The main question is whether the conditions of the theory are satisfied. There are many ways that bias can occur and some are very subtle.

Your example raises the question of whether and when experimental results start to disprove the assumed probabilities. One way to answer that is with Pearson's chi-squared goodness-of-fit test, which measures how well an experimental sample fits a proposed distribution. (See also Fisher's exact test.)
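Applied to the jar example above, the test works out to a very small statistic. A minimal stdlib-only sketch (the 5% critical value for one degree of freedom, 3.841, is hard-coded rather than computed):

```python
# Pearson's chi-squared goodness-of-fit test, applied to the thread's
# example: 10 draws yield 2 blue and 8 white, while the assumed model
# (p_blue = 0.3) predicts 3 blue and 7 white on average.

def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [2, 8]        # blue, white counts actually measured
expected = [3.0, 7.0]    # counts predicted by p = 0.3 over 10 trials

stat = chi_squared_statistic(observed, expected)

# With 2 categories there is 1 degree of freedom; the 5% critical
# value of the chi-squared distribution with df = 1 is about 3.841.
CRITICAL_5PCT_DF1 = 3.841

print(f"chi-squared statistic: {stat:.3f}")  # about 0.476
print("reject p = 0.3 at the 5% level:", stat > CRITICAL_5PCT_DF1)
```

Since 0.476 is far below 3.841, observing 2 blue in 10 draws gives no reason to reject the 30% model.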

entropy1 said:
Summary: When are we sure we have the right probabilities?

If we have a jar with 3 blue balls and 7 white balls, we say that the probability of blindly getting a blue ball out of that jar is 30%. If we have a jar with 2 blue balls and 8 white balls, we say that the probability of blindly getting a blue ball out of it is 20%.

Now suppose we carry out 10 measurements on a physical process that has a 30% probability of yielding a blue outcome and a 70% probability of yielding a white outcome, and we measure 2 blue outcomes and 8 white outcomes. Can we then say that the probability of getting a blue outcome is 20% instead of 30%? In other words, when are we sure we have the right probabilities?
Short answer: you are treating a 30% (or 70%) probability as though results must match those values exactly, and ignoring the natural variation that comes with this type of experiment.
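That natural variation is easy to quantify with the binomial distribution. Even when the true probability of blue is exactly 0.3, seeing only 2 blues in 10 draws is quite common:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# If the true probability of blue is 0.3, how surprising is seeing
# only 2 blues in 10 draws?
p_two_blue = binomial_pmf(2, 10, 0.3)    # about 0.233
p_three_blue = binomial_pmf(3, 10, 0.3)  # about 0.267, the most likely count

print(f"P(2 blue | p=0.3) = {p_two_blue:.3f}")
print(f"P(3 blue | p=0.3) = {p_three_blue:.3f}")
```

So the "expected" count of 3 blues is only slightly more probable than the observed count of 2; the data are entirely consistent with p = 0.3.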

FactChecker said:
Your example begs the question of whether and when the experimental results start to disprove the assumed probabilities. One way to answer that is by using Pearson's Chi-squared goodness of fit test. It allows you to measure how well an experimental sample fits the proposed distribution. (see also Fisher's exact test)
There is also the Bayesian approach. The conjugate prior for this type of scenario is the Beta distribution. Prior data can be incorporated as Beta(a,b) where a is the number of prior observed successes (white) and b is the number of prior observed failures (blue).

So there must be a right way to establish the probability of an outcome based only on the outcomes already observed, right? However, we can't guarantee that the underlying physical process doesn't change if we have no knowledge of it. Is that lack of knowledge about the underlying process what makes it impossible, in principle, to establish a probability from outcomes alone?

entropy1 said:
So there must be a right way to establish the probability of an outcome based only on the outcomes already observed, right? However, we can't guarantee that the underlying physical process doesn't change if we have no knowledge of it. Is that lack of knowledge about the underlying process what makes it impossible, in principle, to establish a probability from outcomes alone?
That is part of it, but the issue is more fundamental. Suppose we know for certain in advance that the underlying physical process does not change, but we don't know the probability in advance. All we can do is observe the process and measure the frequency of "success" or "failure".

Let's say we do 10 trials and observe 7 successes and 3 failures. We would like to say "the probability is 0.7", but is it? If the probability were 0.6, it would still be entirely possible to get 7 successes in 10 trials. And if the probability were 0.7000000001, the most likely outcome of 10 trials would still be 7 successes.

If we actually work it out, we find that for 10 trials with 7 successes there is a very wide range of probabilities that are compatible with that data. In fact, the 95% credible interval ranges from about p=0.40 to about p=0.92. So while p=0.70 is quite likely, merely having 7 successes in 10 trials is not very strong evidence that p=0.70.

So let's collect 10 times the data, and suppose that after 100 results we find exactly 70 successes. Surely we can be confident now! We are indeed substantially more sure of our estimate at this point, but even with 100 trials the 95% credible interval runs from about 0.61 to about 0.79, which is still pretty broad.

We can continue to collect data, and we will continue to narrow that 95% credible range, but it will never go to 0 and we will never be certain of the exact probability for any finite amount of data.
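These intervals can be reproduced approximately by Monte Carlo sampling from the Beta posterior using only the standard library. This sketch assumes a flat Beta(1, 1) prior (the post does not say which prior it used, so the numbers come out close to, but not identical with, those quoted above):

```python
import random

def beta_credible_interval(successes, failures, level=0.95,
                           n_samples=200_000, seed=0):
    """Approximate an equal-tailed credible interval for p by drawing
    samples from the Beta posterior under a flat Beta(1, 1) prior."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(1 + successes, 1 + failures)
                   for _ in range(n_samples))
    tail = (1 - level) / 2
    lo = draws[int(tail * n_samples)]
    hi = draws[int((1 - tail) * n_samples) - 1]
    return lo, hi

# 7 successes in 10 trials: the interval is wide...
print(beta_credible_interval(7, 3))    # roughly (0.39, 0.89)
# ...and 70 successes in 100 trials narrows it, but not to a point.
print(beta_credible_interval(70, 30))  # roughly (0.61, 0.78)
```

Increasing the trial count shrinks the interval roughly like one over the square root of the number of trials, which is why it narrows but never collapses to a single value.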

If you don't know the underlying process, you may be able to collect enough data to get a good estimate of the averages, variation, probabilities, and trends. That is a complicated subject. But then you must be careful about the conclusions that you draw. The estimates you get from the data are for that exact situation. It is difficult to draw valid theoretical conclusions because you will not know what biases have led to your estimates.
One example of the problem is this:
Suppose you want to measure the velocity of radioactive particles being emitted from some material. Suppose you have a container of the material and put a measurement device at the top to measure the velocity of every particle coming out of the container. That will be a biased measurement because you are only measuring the particles with enough speed to make it out of the container. So your measurement is a correct measurement of the particles coming out of the container, but not a correct measurement of all particles being emitted by the material.
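A toy simulation with made-up numbers (a Gaussian speed distribution and an arbitrary escape threshold, both assumptions for illustration) shows how large this kind of selection bias can be:

```python
import random

# Toy model of the container bias described above: emitted speeds are
# drawn from one distribution, but only particles faster than some
# escape threshold ever reach the detector.
rng = random.Random(42)

true_speeds = [rng.gauss(100.0, 30.0) for _ in range(100_000)]  # all emitted
ESCAPE_THRESHOLD = 110.0                                        # assumed cutoff
detected = [v for v in true_speeds if v > ESCAPE_THRESHOLD]     # only measured

mean_true = sum(true_speeds) / len(true_speeds)
mean_detected = sum(detected) / len(detected)

print(f"mean speed of all emitted particles:  {mean_true:.1f}")     # about 100
print(f"mean speed of detected particles:     {mean_detected:.1f}") # much higher
```

The detected sample is a perfectly accurate measurement of the escaping particles, yet a badly biased estimate of the emitted population, exactly as in the example above.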


## 1. What is the importance of getting probabilities right in scientific research?

Getting probabilities right is crucial in scientific research because it ensures the accuracy and validity of the results. Inaccurate or incorrect probabilities can lead to flawed conclusions and potentially misleading findings.

## 2. How do scientists calculate probabilities?

Scientists use various mathematical and statistical methods to calculate probabilities. This can include techniques such as regression analysis, Bayesian inference, and hypothesis testing.

## 3. What factors can affect the accuracy of probabilities in scientific research?

There are several factors that can impact the accuracy of probabilities in scientific research, such as sample size, data quality, and the chosen statistical method. It is important for scientists to carefully consider these factors when calculating and interpreting probabilities.

## 4. Can probabilities be used to predict future events?

Yes, probabilities can be used to make predictions about future events. However, it is important to note that probabilities can never guarantee an exact outcome and should be interpreted with caution.

## 5. How can scientists ensure the reliability of their probability calculations?

To ensure the reliability of their probability calculations, scientists can use multiple methods to cross-check their results, conduct sensitivity analyses, and carefully evaluate the assumptions and limitations of their chosen statistical approach.
