Understanding the Importance of Reference Priors for a Signal Search

ChrisVer
Hi, a very basic question: what is a good intuitive way to understand the importance of a reference prior, in the context of a signal search?
Below I also try to give, roughly, the way I understand the approach in a Bayesian analysis:
1. You have your likelihood model ##L = p(x_{obs} \mid \lambda, b + \mu s)##, with the expected events (##b## background, ##s## signal) and several priors ##\pi## (for your parameter of interest ##\mu## and other nuisance parameters ##\lambda##, such as the uncertainties).
2. The posterior pdf is what you need in order to study the parameter of interest ##\mu##. By Bayes' theorem, the joint posterior is:
$$p(\mu, \lambda \mid x_{obs}) = \frac{L \, \pi(\mu, \lambda)}{p(x_{obs})}$$
The denominator can often be left out, since it only ensures the posterior is properly normalized. I also take ##\pi(\mu,\lambda) = \pi(\mu)\,\pi(\lambda)##, i.e. the two parameters are a priori independent.
3. You run several experiments, and with the outcome of each experiment you "update" your knowledge of the parameter of interest. That is, in the end you build up the posterior for ##\mu## alone, once you integrate the nuisance parameters out:
$$p(\mu \mid x_{obs}) \;\propto\; \pi(\mu) \int d\lambda \; L(x_{obs} \mid \mu, \lambda)\, \pi(\lambda)$$
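As a sanity check of my own understanding, here is a minimal numerical sketch of steps 1-3 for a single-bin counting experiment (all numbers, and the Gaussian prior on the nuisance parameter, are just made up for illustration):

```python
import numpy as np
from scipy.stats import poisson, norm

# Hypothetical single-bin counting experiment: n_obs observed events,
# expectation lam*b + mu*s, Gaussian prior on the background scale lam,
# flat prior on the signal strength mu.
n_obs, b, s = 12, 10.0, 3.0

mu_grid  = np.linspace(0.0, 5.0, 501)    # parameter of interest
lam_grid = np.linspace(0.5, 1.5, 201)    # nuisance parameter (background scale)
d_mu     = mu_grid[1] - mu_grid[0]
d_lam    = lam_grid[1] - lam_grid[0]

pi_lam = norm.pdf(lam_grid, loc=1.0, scale=0.1)   # pi(lambda)
pi_mu  = np.ones_like(mu_grid)                    # flat pi(mu), the choice in question

# Likelihood L(n_obs | mu, lam) evaluated on the (mu, lam) grid
L = poisson.pmf(n_obs, mu_grid[:, None] * s + lam_grid[None, :] * b)

# Step 3: p(mu | n_obs)  proportional to  pi(mu) * integral of L * pi(lambda) dlambda
posterior = pi_mu * (L * pi_lam[None, :]).sum(axis=1) * d_lam
posterior /= posterior.sum() * d_mu               # the p(x_obs) denominator

print("posterior mode of mu:", mu_grid[np.argmax(posterior)])
```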

So far I think I understand everything, perhaps with some misconceptions that could be pointed out. When one starts speaking about reference priors, though, I am somewhat lost. Based on a few searches, I think the main goal of the reference prior is to minimize the influence of the prior on the posterior (equivalently, to maximize the expected information gained from the data). However, I don't quite understand why that matters, since:
"I can put in any prior I like (within reasonable limitations), and it's up to the observation/experiment to tell me how it evolves with the extra information. By reasonable I mean, for example, that it can't be 0 over ranges where the posterior is non-zero (as Bayes' theorem would then give 0 for the posterior)."
How could different choices of prior distribution end up being updated to different posterior distributions?
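To make the question concrete, here is a rough sketch (again with made-up numbers, and ignoring nuisance parameters) comparing a flat prior with a Jeffreys-type prior ##\pi(\mu) \propto 1/\sqrt{b + \mu s}## - which, if I understand correctly, is what the reference prior reduces to in this simple one-parameter Poisson case. With low counts the two posteriors differ visibly, so I'd like to understand why one choice is preferred:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical low-count experiment, no nuisance parameters: n_obs ~ Poisson(b + mu*s)
n_obs, b, s = 2, 1.0, 1.0
mu_grid = np.linspace(0.0, 10.0, 1001)
d_mu = mu_grid[1] - mu_grid[0]

L = poisson.pmf(n_obs, b + mu_grid * s)           # likelihood on the mu grid

for name, prior in [("flat", np.ones_like(mu_grid)),
                    ("Jeffreys-type", 1.0 / np.sqrt(b + mu_grid * s))]:
    post = L * prior
    post /= post.sum() * d_mu                     # normalize the posterior
    # 95% credible upper limit on mu as a simple summary of the posterior
    up = mu_grid[np.searchsorted(np.cumsum(post) * d_mu, 0.95)]
    print(f"{name:14s} prior -> 95% upper limit on mu ~ {up:.2f}")
```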
 
ChrisVer said:
Hi, a very basic question: what is a good intuitive way to understand the importance of a reference prior, in the context of a signal search?
Below I also try to give, roughly, the way I understand the approach in a Bayesian analysis:
1. You have your likelihood model ##L = p(x_{obs} \mid \lambda, b + \mu s)##, with the expected events (##b## background, ##s## signal) and several priors ##\pi## (for your parameter of interest ##\mu## and other nuisance parameters ##\lambda##, such as the uncertainties).
2. The posterior pdf is what you need in order to study the parameter of interest ##\mu##. By Bayes' theorem, the joint posterior is:
$$p(\mu, \lambda \mid x_{obs}) = \frac{L \, \pi(\mu, \lambda)}{p(x_{obs})}$$
The denominator can often be left out, since it only ensures the posterior is properly normalized. I also take ##\pi(\mu,\lambda) = \pi(\mu)\,\pi(\lambda)##, i.e. the two parameters are a priori independent.

Your number 2 is wrong by any standard I'm aware of. If you are using a probability density function, then it should be something like ##f_{\mu\vert X}(\cdot \vert x_{obs}) ##. Crucially this is a probability density, not a probability. (Look to the CDF for the probability.)
ChrisVer said:
...
How could different distributions of a prior end up in updating it to different distributions for the posterior?

There are a lot of different issues here. In some sense, this is the whole point of Bayesian Inference.

Have you tried working out some very simple finite cases? I would almost always start with finite, then consider countably infinite, and after all that maybe consider the continuous / uncountable case.

E.g. suppose you have a coin whose heads:tails odds are either 50:50, 70:30, or 90:10. Now suppose you run 5 trials and the results are ___.
Now suppose instead you run 50000 trials and the results are ___. Try working this through with a uniform prior vs. something heavily skewed toward the 50:50 case. You should clearly see a big impact from the choice of prior in the former case.
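A rough sketch of that finite case in Python, if it helps (the head counts below are just placeholders; plug in whatever outcomes you like):

```python
import numpy as np
from scipy.stats import binom

p_heads = np.array([0.5, 0.7, 0.9])          # the three allowed hypotheses

def posterior(prior, n_trials, n_heads):
    """Discrete Bayes update: posterior is proportional to binomial likelihood * prior."""
    post = binom.pmf(n_heads, n_trials, p_heads) * prior
    return post / post.sum()

uniform = np.array([1/3, 1/3, 1/3])
skewed  = np.array([0.98, 0.01, 0.01])       # heavily favours the fair coin

# Few trials: the two priors give clearly different posteriors
print(posterior(uniform, 5, 4))
print(posterior(skewed,  5, 4))

# Many trials: the data overwhelm both priors and the posteriors agree
print(posterior(uniform, 50000, 35000))
print(posterior(skewed,  50000, 35000))
```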

Loosely speaking: the prior has a big impact if you don't have many observations / much data. If you have a lot of data, then you can 'overwhelm your prior' with it, and the choice of prior has minimal impact on the posterior; the exception is that if you zero things out as impossible in the prior, there is no opportunity to overwhelm. (There are a lot of subtleties in the continuous case, though. A lot of people equate zero probability with impossibility, and that is in general wrong.)
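And a tiny illustration of that last point, with the same three-coin setup and made-up counts: a hypothesis given exactly zero prior mass stays at zero no matter how strongly the data favour it.

```python
import numpy as np
from scipy.stats import binom

p_heads = np.array([0.5, 0.7, 0.9])
prior   = np.array([0.5, 0.5, 0.0])          # P(heads)=0.9 declared impossible a priori
like    = binom.pmf(450, 500, p_heads)       # illustrative data that look like a 0.9 coin
post    = like * prior
post   /= post.sum()
print(post)                                  # the 0.9 entry is exactly 0, and stays 0
```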
 
StoneTemplePython said:
Your number 2 is wrong by any standard I'm aware of.
How so? I mean it's Bayes' theorem and I called them pdfs not probabilities (?)
 
ChrisVer said:
How so? I mean it's Bayes' theorem and I called them pdfs not probabilities (?)

This may be a superficial issue. ##P## seems to always refer to a cumulative probability (read: from a CDF) and ##p## to the probability at a point (read: from a PMF, a transition matrix, and such).

Put differently: by any standard I'm aware of, ##p## (and pr) is reserved for probabilities, not densities. This may just be a convention, but I also see a lot of people confuse densities with probabilities early on, so I'm fairly convinced the convention is useful.
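A quick way to see the distinction (illustrative numbers only): a density value can exceed 1 and is not itself a probability; probabilities come from integrating the density, i.e. from the CDF.

```python
from scipy.stats import norm

narrow = norm(loc=0.0, scale=0.1)            # a sharply peaked density
print(narrow.pdf(0.0))                       # ~3.99: a density value, not a probability
print(narrow.cdf(0.05) - narrow.cdf(-0.05))  # ~0.38: an actual probability, from the CDF
```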
 