Understanding the Importance of Reference Priors for a Signal Search

ChrisVer
Hi, a very basic question: what is a good intuitive way to understand the importance of a reference prior, in the context of a signal search?
Below I also try to give, roughly, the way I understand the approach in a Bayesian analysis:
1. You have your likelihood model ##L = p(x_{obs} \mid \lambda, b + \mu s)##, with the expected events (##b## background, ##s## signal) and several priors ##\pi## (for your parameter of interest ##\mu## and other nuisance parameters ##\lambda##, such as the uncertainties).
2. The posterior pdf is what you need in order to study the parameter of interest ##\mu##. By Bayes' theorem, the joint posterior is:
$$p(\mu, \lambda \mid x_{obs}) = \frac{L \, \pi(\mu, \lambda)}{p(x_{obs})}$$
The denominator can often be left out, since it only ensures the posterior is properly normalized. I also take ##\pi(\mu,\lambda) = \pi(\mu)\,\pi(\lambda)##, i.e. the two parameters are a priori independent.
3. You run several experiments, and with the outcome of each experiment you "update" your knowledge of the parameter of interest. That is, in the end you build up the posterior for ##\mu## alone, once you integrate the nuisance parameters out:
$$p(\mu \mid x_{obs}) \;\propto\; \pi(\mu) \int d\lambda \; L(x_{obs} \mid \mu, \lambda)\, \pi(\lambda)$$
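As a sanity check of my own understanding, here is a minimal numerical sketch of steps 1-3 for a single-bin counting experiment (all numbers, and the Gaussian prior on the nuisance parameter, are just made up for illustration):

```python
import numpy as np
from scipy.stats import poisson, norm

# Hypothetical single-bin counting experiment: n_obs observed events,
# expectation lam*b + mu*s, Gaussian prior on the background scale lam,
# flat prior on the signal strength mu.
n_obs, b, s = 12, 10.0, 3.0

mu_grid  = np.linspace(0.0, 5.0, 501)    # parameter of interest
lam_grid = np.linspace(0.5, 1.5, 201)    # nuisance parameter (background scale)
d_mu     = mu_grid[1] - mu_grid[0]
d_lam    = lam_grid[1] - lam_grid[0]

pi_lam = norm.pdf(lam_grid, loc=1.0, scale=0.1)   # pi(lambda)
pi_mu  = np.ones_like(mu_grid)                    # flat pi(mu), the choice in question

# Likelihood L(n_obs | mu, lam) evaluated on the (mu, lam) grid
L = poisson.pmf(n_obs, mu_grid[:, None] * s + lam_grid[None, :] * b)

# Step 3: p(mu | n_obs)  proportional to  pi(mu) * integral of L * pi(lambda) dlambda
posterior = pi_mu * (L * pi_lam[None, :]).sum(axis=1) * d_lam
posterior /= posterior.sum() * d_mu               # the p(x_obs) denominator

print("posterior mode of mu:", mu_grid[np.argmax(posterior)])
```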

So far I think I understand everything, perhaps with some misconceptions that could be pointed out. When one starts speaking about reference priors, though, I am somewhat lost. Based on a few searches, I think the main goal of the reference prior is to minimize the influence of the prior on the posterior (equivalently, to maximize the expected information gained from the data). However, I don't quite understand why that matters, since:
"I can put in any prior I like (within reasonable limitations), and it's up to the observation/experiment to tell me how it evolves with the extra information. By reasonable I mean, for example, that it can't be 0 over ranges where the posterior is non-zero (as Bayes' theorem would then give 0 for the posterior)."
How could different choices of prior distribution end up being updated to different posterior distributions?
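To make the question concrete, here is a rough sketch (again with made-up numbers, and ignoring nuisance parameters) comparing a flat prior with a Jeffreys-type prior ##\pi(\mu) \propto 1/\sqrt{b + \mu s}## - which, if I understand correctly, is what the reference prior reduces to in this simple one-parameter Poisson case. With low counts the two posteriors differ visibly, so I'd like to understand why one choice is preferred:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical low-count experiment, no nuisance parameters: n_obs ~ Poisson(b + mu*s)
n_obs, b, s = 2, 1.0, 1.0
mu_grid = np.linspace(0.0, 10.0, 1001)
d_mu = mu_grid[1] - mu_grid[0]

L = poisson.pmf(n_obs, b + mu_grid * s)           # likelihood on the mu grid

for name, prior in [("flat", np.ones_like(mu_grid)),
                    ("Jeffreys-type", 1.0 / np.sqrt(b + mu_grid * s))]:
    post = L * prior
    post /= post.sum() * d_mu                     # normalize the posterior
    # 95% credible upper limit on mu as a simple summary of the posterior
    up = mu_grid[np.searchsorted(np.cumsum(post) * d_mu, 0.95)]
    print(f"{name:14s} prior -> 95% upper limit on mu ~ {up:.2f}")
```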
 
ChrisVer said:
Hi, a very basic question: what is a good intuitive way to understand the importance of a reference prior, in the context of a signal search?
Below I also try to give, roughly, the way I understand the approach in a Bayesian analysis:
1. You have your likelihood model ##L = p(x_{obs} \mid \lambda, b + \mu s)##, with the expected events (##b## background, ##s## signal) and several priors ##\pi## (for your parameter of interest ##\mu## and other nuisance parameters ##\lambda##, such as the uncertainties).
2. The posterior pdf is what you need in order to study the parameter of interest ##\mu##. By Bayes' theorem, the joint posterior is:
$$p(\mu, \lambda \mid x_{obs}) = \frac{L \, \pi(\mu, \lambda)}{p(x_{obs})}$$
The denominator can often be left out, since it only ensures the posterior is properly normalized. I also take ##\pi(\mu,\lambda) = \pi(\mu)\,\pi(\lambda)##, i.e. the two parameters are a priori independent.

Your number 2 is wrong by any standard I'm aware of. If you are using a probability density function, then it should be something like ##f_{\mu\vert X}(\cdot \vert x_{obs}) ##. Crucially this is a probability density, not a probability. (Look to the CDF for the probability.)
ChrisVer said:
...
How could different distributions of a prior end up in updating it to different distributions for the posterior?

There are a lot of different issues here. In some sense, this is the whole point of Bayesian Inference.

Have you tried working out some very simple finite cases? I would almost always start with finite, then consider countably infinite, and after all that maybe consider the continuous / uncountable case.

E.g. suppose you have a coin whose heads:tails odds are either 50:50, 70:30, or 90:10. Now suppose you run 5 trials and the results are ___.
Now suppose instead you run 50000 trials and the results are ___. Try working this through with a uniform prior vs. something heavily skewed toward the 50:50 case. You should clearly see a big impact from the choice of prior in the former case.
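A rough sketch of that finite case in Python, if it helps (the head counts below are just placeholders; plug in whatever outcomes you like):

```python
import numpy as np
from scipy.stats import binom

p_heads = np.array([0.5, 0.7, 0.9])          # the three allowed hypotheses

def posterior(prior, n_trials, n_heads):
    """Discrete Bayes update: posterior is proportional to binomial likelihood * prior."""
    post = binom.pmf(n_heads, n_trials, p_heads) * prior
    return post / post.sum()

uniform = np.array([1/3, 1/3, 1/3])
skewed  = np.array([0.98, 0.01, 0.01])       # heavily favours the fair coin

# Few trials: the two priors give clearly different posteriors
print(posterior(uniform, 5, 4))
print(posterior(skewed,  5, 4))

# Many trials: the data overwhelm both priors and the posteriors agree
print(posterior(uniform, 50000, 35000))
print(posterior(skewed,  50000, 35000))
```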

Loosely speaking: the prior has a big impact if you don't have many observations / much data. If you have a lot of data, then you can 'overwhelm your prior' with it, and the choice of prior has minimal impact on the posterior; the exception is that if you zero things out as impossible in the prior, there is no opportunity to overwhelm. (There are a lot of subtleties in the continuous case, though. A lot of people equate zero probability with impossibility, and that is in general wrong.)
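And a tiny illustration of that last point, with the same three-coin setup and made-up counts: a hypothesis given exactly zero prior mass stays at zero no matter how strongly the data favour it.

```python
import numpy as np
from scipy.stats import binom

p_heads = np.array([0.5, 0.7, 0.9])
prior   = np.array([0.5, 0.5, 0.0])          # P(heads)=0.9 declared impossible a priori
like    = binom.pmf(450, 500, p_heads)       # illustrative data that look like a 0.9 coin
post    = like * prior
post   /= post.sum()
print(post)                                  # the 0.9 entry is exactly 0, and stays 0
```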
 
StoneTemplePython said:
Your number 2 is wrong by any standard I'm aware of.
How so? I mean it's Bayes' theorem and I called them pdfs not probabilities (?)
 
ChrisVer said:
How so? I mean it's Bayes' theorem and I called them pdfs not probabilities (?)

This may be a superficial issue. ##P## seems to always refer to a cumulative probability (read: from a CDF) and ##p## to the probability at a point (read: from a PMF, a transition matrix, and such).

Put differently: by any standard I'm aware of, ##p## (and pr) is reserved for probabilities, not densities. This may just be a convention, but I also see a lot of people confuse densities with probabilities early on, so I'm fairly convinced the convention is useful.
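A quick way to see the distinction (illustrative numbers only): a density value can exceed 1 and is not itself a probability; probabilities come from integrating the density, i.e. from the CDF.

```python
from scipy.stats import norm

narrow = norm(loc=0.0, scale=0.1)            # a sharply peaked density
print(narrow.pdf(0.0))                       # ~3.99: a density value, not a probability
print(narrow.cdf(0.05) - narrow.cdf(-0.05))  # ~0.38: an actual probability, from the CDF
```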
 