The probability of a probability?

  • Thread starter alan_longor
  • Tags
    Probability
In summary, the conversation discusses an event with two outcomes, A and B, and the probabilities of these outcomes, p and p'. Additional information is provided about the event occurring a billion times with outcome A always happening. Two questions are posed, one about assigning a value to p' without the additional information, and another about the effect of the additional information on the value of p'. The problem is noted as being complex and possibly involving integration.
  • #36
alan_longor said:
OK, then in this case we cannot set a probability for the appearance of the light on the next night.
Sure, solipsism is a sound philosophy from a logical perspective, but it doesn't get you very far. In the real world, we function quite effectively by subconsciously assigning reasonable a priori probabilities.
Your rising sun model is insufficiently divorced from real experience to think about in a detached manner. How about, you notice that on three consecutive occasions the winning lottery number ends in a five. What is the probability that there is a flaw in the randomisation? What probability would you have assigned to that beforehand... one in 100? Too high. One in 10,000? Let's say one in a million. How many consecutive occasions of a final digit 5 will push that estimate to greater than half?
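To make that concrete, here is a rough sketch of the arithmetic in Python, under the simplest (hypothetical) flaw model: a flawed randomiser always produces a final digit of 5, while a fair one does so with probability 1/10.

```python
# A rough sketch of the lottery example, assuming (hypothetically)
# that a flawed randomiser always produces a final digit of 5,
# while a fair one does so with probability 1/10.
prior_flaw = 1e-6  # the one-in-a-million prior from above

for n in range(1, 9):
    # Likelihood ratio after n consecutive 5s:
    # P(data | flaw) / P(data | fair) = 1 / (1/10)**n = 10**n
    odds = prior_flaw / (1 - prior_flaw) * 10**n
    posterior = odds / (1 + odds)
    print(f"after {n} consecutive 5s: P(flaw | data) = {posterior:.6f}")
```

Under those assumptions the posterior first passes one half at the sixth consecutive 5.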
 
  • #37
haruspex said:
Sure, solipsism is a sound philosophy from a logical perspective, but it doesn't get you very far. In the real world, we function quite effectively by subconsciously assigning reasonable a priori probabilities.
Your rising sun model is insufficiently divorced from real experience to think about in a detached manner. How about, you notice that on three consecutive occasions the winning lottery number ends in a five. What is the probability that there is a flaw in the randomisation? What probability would you have assigned to that beforehand... one in 100? Too high. One in 10,000? Let's say one in a million. How many consecutive occasions of a final digit 5 will push that estimate to greater than half?
I am sorry for being unable to understand that exact case, and please allow me to ask you a question even though I was unable to answer yours. What if someone says he has a real number generator that produces a random number between 1 and 100, with no pattern and no way to predict what the machine gives? Is it correct to assume that the probability of the number it produces being between 90 and 100 is 1/10, since all the numbers seem to have equal probability here? Thank you very much.
 
  • #38
alan_longor said:
I am sorry for being unable to understand that exact case, and please allow me to ask you a question even though I was unable to answer yours. What if someone says he has a real number generator that produces a random number between 1 and 100, with no pattern and no way to predict what the machine gives? Is it correct to assume that the probability of the number it produces being between 90 and 100 is 1/10, since all the numbers seem to have equal probability here? Thank you very much.
If you completely trust that information, yes. But complete trust also constitutes a prior distribution.
 
  • #39
haruspex said:
If you completely trust that information, yes. But complete trust also constitutes a prior distribution.
But complete trust also constitutes a prior distribution.
That's the phrase I have been looking for... thank you.
 
  • #40
If the word 'assign' implies that it is just an initial value that will be corrected by some (Bayesian) method, then of course you can assign any valid initial probability guess.
If the word 'assign' implies that you can defend that value as being the correct probability parameter, then you cannot do that. Here is why:

Whether Bayesian methods are to be used or not, the answer to 1) is that no value can be assigned to p'. The most that Bayesian methods can do is to assign an initial standard distribution (uniform) with no justification. There is no real assumption that the initial standard distribution is correct. In fact, Bayesian methods assume that the initial distribution is incorrect, and they tell you how to change it as real information is obtained.
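For illustration only (not a defence of the uniform prior), here is a sketch of how that correction plays out for the problem in the original post, where outcome A is observed on every trial:

```python
# A sketch of Bayesian correction starting from the (unjustified)
# uniform prior Beta(1, 1) on p = P(A). After n trials that all
# produced outcome A, the posterior is Beta(n + 1, 1).
from scipy.stats import beta

for n in (0, 10, 1000, 10**9):
    post = beta(n + 1, 1)
    print(f"n = {n}: posterior mean {post.mean():.10f}, "
          f"P(p > 0.999 | data) = {post.sf(0.999):.4f}")
```

The data eventually dominate whatever the starting distribution was; the point stands that the starting distribution itself is not defended.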

You should say that the answer to 1) is that no value can be assigned to p'. If you attempt to assign any particular value a to p', one can easily construct an example where that value is wrong.
 
  • #41
FactChecker said:
The most that Bayesian methods can do is to assign an initial standard distribution (uniform) with no justification.
Maybe it's a matter of philosophy, but I take a different view.
Bayesian methods require you to supply a prior distribution, but that does not provide an excuse to set it to uniform without justification. In normal practice, it will be some gut feel based on experience. The fundamental point is that the choice should not be that crucial provided it is reasonable.

Consider the classic 'fair coin' problem. What would be a sensible a priori estimate of whether a coin has a 0.01% or more bias towards heads? 0.9 would probably be too much; one in a billion almost surely too low. With those bounds, we can run a trial for long enough to bring our posterior estimate within some preset range.
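Here is a sketch of that idea with made-up numbers, reducing "biased" to the point hypothesis p = 0.5001 purely for illustration:

```python
# A hypothetical sketch of the coin trial. "Biased" is reduced to
# the point hypothesis p = 0.5001, and the simulated coin really is
# biased. Two very different priors are updated on the same flips.
import numpy as np

rng = np.random.default_rng(1)
p0, p1 = 0.5, 0.5001              # fair vs (assumed) biased

for n in (10**6, 10**8, 10**10):
    k = rng.binomial(n, p1)       # heads in n flips of the true coin
    log_lr = k * np.log(p1 / p0) + (n - k) * np.log((1 - p1) / (1 - p0))
    for prior in (0.9, 1e-9):     # a priori P(biased)
        log_odds = np.log(prior / (1 - prior)) + log_lr
        post = 1 / (1 + np.exp(-log_odds))
        print(f"n = {n:.0e}, prior = {prior:g}: P(biased | data) = {post:.4f}")
```

With enough flips the data swamp either prior; the two estimates disagree only in the intermediate regime.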

Question 1 is problematic because we are given no information regarding the nature of the event. The only experience we can call on is of probabilities of events in general, a basis so nebulous that to assign any shape to the prior distribution is highly questionable. Still, I would argue that we come across a lot of probabilities close to 1, so the range "> 0.9" should have a prior probability of at least, say, 1 in 1000. This is why question 2 can be answered, as sketched below.
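Here is a crude sketch of why, with hypothetical numbers: put prior mass 0.001 uniformly on p > 0.9 and the remaining 0.999 uniformly on p < 0.9, where p is the probability of the recurring outcome, then condition on n straight occurrences.

```python
# A sketch of question 2 with a hypothetical two-piece prior for the
# probability p of the recurring outcome: mass 0.001 uniform on
# (0.9, 1), mass 0.999 uniform on (0, 0.9). The likelihood of n
# straight occurrences is p**n, and the integral of p**n over (a, b)
# is (b**(n+1) - a**(n+1)) / (n + 1).
w = 0.001                                   # prior mass on p > 0.9

def chunk(a, b, n):
    """Integral of p**n over the interval (a, b)."""
    return (b**(n + 1) - a**(n + 1)) / (n + 1)

for n in (0, 10, 100, 10**9):
    hi = (w / 0.1) * chunk(0.9, 1.0, n)      # prior density w/0.1 above 0.9
    lo = ((1 - w) / 0.9) * chunk(0.0, 0.9, n)
    print(f"n = {n}: P(p > 0.9 | data) = {hi / (hi + lo):.4f}")
```

Even from a prior that puts only one chance in a thousand on the high range, a modest run of occurrences pushes the posterior towards certainty, and a billion settles it.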
 
  • Like
Likes FactChecker
  • #42
haruspex said:
Maybe it's a matter of philosophy, but I take a different view.
Bayesian methods require you to supply a prior distribution, but that does not provide an excuse to set it to uniform without justification. In normal practice, it will be some gut feel based on experience. The fundamental point is that the choice should not be that crucial provided it is reasonable.
That is a good point. I was influenced by this example, which doesn't give any clue about the initial distribution.
That brings up another question (I don't want to hijack this thread, though):
If you use an initial distribution that is not uniform, that might make it harder to correct. I have no experience with this, but the question has come up before at work, where Bayesian updating was being applied iteratively.
 
  • #43
FactChecker said:
If you use an initial distribution that is not uniform, that might make it harder to correct
That assumes the uniform distribution is closer to the true distribution than is the chosen prior. As I posted, a good approach is to consider some range of priors that you feel encompass the answer and run the trials until sufficiently confident.
In the case of fairness of a coin, one that at least looks fair, you would be well justified in taking a prior distribution that sets the probability that the frequency of heads is between 0.4 and 0.6 as being at least 0.8, say.
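For instance, restricting attention (hypothetically) to symmetric Beta priors, a short search finds the least concentrated one meeting that constraint:

```python
# A sketch of one way to realise the constraint above, assuming
# (hypothetically) a symmetric Beta(a, a) prior for the heads
# probability: find the smallest integer a putting mass >= 0.8
# on the interval [0.4, 0.6].
from scipy.stats import beta

for a in range(1, 60):
    mass = beta.cdf(0.6, a, a) - beta.cdf(0.4, a, a)
    if mass >= 0.8:
        print(f"Beta({a}, {a}): P(0.4 <= p <= 0.6) = {mass:.3f}")
        break
```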
 
  • Like
Likes FactChecker
  • #44
haruspex said:
That assumes the uniform distribution is closer to the true distribution than is the chosen prior.
"closer to" is tricky to define. In an iterative process, if stage 1 resulted in a distribution with a small standard deviation, it might be difficult to move the mean in stage 2. So it might be better to increase the standard deviation for the stage 2 prior distribution -- perhaps even to a uniform distribution. But we never investigated it while I was there, so I don't know.
 
  • #45
haruspex said:
Maybe it's a matter of philosophy, but I take a different view.
Bayesian methods require you to supply a prior distribution, but that does not provide an excuse to set it to uniform without justification. In normal practice, it will be some gut feel based on experience. The fundamental point is that the choice should not be that crucial provided it is reasonable.

There is a way to use Bayesian methods in the same spirit that we use frequentist methods.

Frequentist methods don't answer the question "What is the probability of the result that I'm interested in?". Instead they answer questions like "If I assume the result I'm interested in (or its negation), what is the probability of the data?". From the answer to that question, people get a subjective feeling about the probability of the result that interests them, but not an actual number for it.

In a similar manner, a person can assume a uniform (or "maximum entropy") prior and compute the posterior distribution just to get a subjective feeling about how strongly the data suggests a certain range of values for the result of interest. This is different from "taking the prior seriously".
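For instance, with made-up data (60 heads in 100 flips) and the question of interest "is the coin biased towards heads, i.e. p > 0.5?", the two computations look like this:

```python
# A sketch of the contrast, with made-up data: 60 heads in 100 flips.
from scipy.stats import beta, binom

k, n = 60, 100

# Frequentist: probability of data at least this extreme, assuming
# the negation of the result of interest (p = 0.5).
p_value = binom.sf(k - 1, n, 0.5)            # P(X >= k | p = 0.5)

# Bayesian with a uniform prior "not taken seriously": posterior
# probability of the result of interest, for a subjective feel.
posterior = beta(1 + k, 1 + n - k).sf(0.5)   # P(p > 0.5 | data)

print(f"P(X >= {k} | fair coin) = {p_value:.4f}")
print(f"P(p > 0.5 | data), uniform prior = {posterior:.4f}")
```

Neither number should be read as "the probability the coin is biased"; the Bayesian one is conditional on a prior chosen for convenience.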
 
  • Like
Likes FactChecker
  • #46
FactChecker said:
That is a good point. I was influenced by this example, which doesn't give any clue about the initial distribution.
That brings up another question (I don't want to hijack this thread, though):
If you use an initial distribution that is not uniform, that might make it harder to correct. I have no experience with this, but the question has come up before at work, where Bayesian updating was being applied iteratively.

There are "natural" types of priors for various type of probability models. In this case the model is that of independent, repeated trials with some unknown success probability probability ##p##---Binomial(n,p) in other words. The natural prior that people tend to use in such a case is the Beta distribution, with density function
$$f_0(p) = \frac{1}{B(a,b)} p^{a-1} (1-p)^{b-1}, \; 0 < p < 1 $$
Here, ##a,b > 0## are parameters and ##B(a,b)## is the so-called Beta function; see, e.g.,
https://en.wikipedia.org/wiki/Beta_distribution for more formulas and details.
The uniform prior is a special case in which ##a = b = 1##.

Typically one might start by assigning (or estimating, or ...) something like the most probable value of ##p## or the prior mean of ##p##, perhaps also with some typical or probable range estimates. That gives one the purely mathematical problem of determining the parameters ##a,b##. Once one has ##a## and ##b## the Bayesian updating is easy; after observing ##k## successes in ##n## trials the posterior density of ##p## is Beta with parameters ##a+k## and ##b + n-k## in place of ##a## and ##b##. This updating simplicity is the reason the Beta is used as a prior for the Binomial case.

Nothing prevents you from using a different type of prior, but then the updating scheme becomes more difficult and less intuitive. By tuning the parameters, the Beta is capable of representing most types of general prior information, but of course there will always be exceptions.
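A minimal sketch of that workflow, with hypothetical numbers: take the prior mean of ##p## to be 0.2 with an "equivalent sample size" of ##a + b = 10##, so ##a = 2, b = 8##, then update on 7 successes in 10 trials.

```python
# A minimal sketch of the Beta-Binomial workflow described above;
# the numbers are hypothetical. Prior mean a/(a+b) = 0.2 with
# equivalent sample size a + b = 10 gives a = 2, b = 8.
from scipy.stats import beta

a, b = 2.0, 8.0                  # elicited prior
k, n = 7, 10                     # observed: 7 successes in 10 trials

post = beta(a + k, b + n - k)    # conjugate update: Beta(a+k, b+n-k)
lo, hi = post.ppf([0.025, 0.975])
print(f"posterior mean = {post.mean():.3f}, "
      f"95% credible interval = ({lo:.3f}, {hi:.3f})")
```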
 
  • Like
Likes FactChecker
