Correct Form of Likelihood Function for Data w/ Upper/Lower Bound

In summary, the conversation discussed the problem of running a Bayesian scan of a multi-dimensional model in which most likelihood functions are normal distributions, but some experimental data provide only upper or lower bounds. Two possibilities for the likelihood function were mentioned - a compound function combining a normal and a uniform distribution, and an error function centred on the bound - but no theoretical justification was found for either. The speaker is looking for a probability-theory justification for which type of likelihood function is appropriate in this scenario. They also mentioned the example of calculating the relic density of dark matter, where the observed astrophysical value can only provide an upper bound on the relic density of a partial dark matter candidate. The speaker is unsure whether both forms of likelihood function are valid for different cases and is looking for guidance on when each applies.
  • #1
kurros
So, I have this problem I am tackling where I am doing a Bayesian scan of a multi-dimensional model. Most of the quantities predicted by the model have likelihood functions which are normal distributions (as functions of the possible data values), however there are some pieces of experimental data I have for which only upper or lower bounds exist, for example experiments have been done which show the observable must be above a certain value with 95% confidence limit or something, with no upper bound.

What is the correct form of the likelihood function to use for such a quantity? In the literature I have seen two possibilities, one being a compound function which is a normal distribution below the limit (if it is a lower bound) and a uniform distribution above this limit, while the other is an error function centred on the bound. I have seen no theoretical justification for either of these distributions and was wondering if anybody knew of any.
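
To make the two candidates concrete, here is a minimal Python sketch of both for an upper-bound case (the function names and the normalisation to a maximum of 1 are my own illustrative choices, not taken from any particular paper):

```python
import math

def compound_likelihood(mu_theta, bound, sigma):
    """Candidate 1: flat (maximal) below the upper bound,
    Gaussian fall-off above it."""
    if mu_theta <= bound:
        return 1.0
    return math.exp(-(mu_theta - bound) ** 2 / (2.0 * sigma ** 2))

def erf_likelihood(mu_theta, bound, sigma):
    """Candidate 2: a smooth step centred on the bound, built from
    the complementary error function (-> 1 far below, -> 0 far above)."""
    return 0.5 * math.erfc((mu_theta - bound) / (math.sqrt(2.0) * sigma))
```

The two agree far from the bound; they differ mainly in a band of width of order sigma around it, where the compound form still assigns maximum likelihood while the erf form has already dropped to 1/2 at the bound itself.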

The half-gaussian at first seems sensible because the maximum likelihood is obtained at the value of the limit and everywhere above it, with the gaussian falling off as determined by the confidence level. On thinking about it some more, however, I now think that the error function is probably more justified, although I can't prove why or even formulate why I think this very well. I guess it seems to me that even if the theory gives a perfect match with whatever value was observed in the experiment, it shouldn't actually be the maximum-likelihood value, since it would be quite the fluke that this should happen.

Perhaps I should explain my scenario some more to make this make sense. One particular observable I am concerned with is the relic density of dark matter. If dark matter is only made of one type of particle, then the relic density as calculated by the astrophysics guys can be used to constrain the relic density of a given dark matter candidate particle for one's favourite model with a gaussian likelihood function. However, if dark matter is assumed to be composed of this candidate particle plus other stuff, then the relic density calculated by the astrophysicists can only provide an upper bound on the relic density of the candidate particle, since one can't disfavour the model if it fails to reach the required relic density (because we have allowed for the possibility that other stuff is lurking out there beyond what is taken care of by the model). So in this latter case what likelihood function is appropriate for the relic density of the partial dark matter candidate? We need to kill the model if the relic density gets too large, but we don't want to penalise it if the relic density is lower than the observed astrophysical value.

Sorry if this is a bit incoherent, I could probably have made that all more clear. If you need me to clarify anything or indeed write an equation down let me know. Mostly I am looking for some nice probability theory justification for which kind of likelihood function makes the most sense here. I have searched around quite a lot and been unable to find anything. Papers I have read where similar things are done seem to skip over the justification part... and I suspect that many authors actually just guess...
Maybe both forms of likelihood function are valid for different cases. I'm a little lost on this one.
In practice it doesn't make a huge difference which of the two possibilities I mentioned is used, but I'd like to know which one is really right!
 
  • #2


kurros said:
however there are some pieces of experimental data I have for which only upper or lower bounds exist, for example experiments have been done which show the observable must be above a certain value with 95% confidence limit or something, with no upper bound.

Are you saying that you have experiments which find no observables below (or above) a certain value? For example (not your particular experiment) you find no evidence of some particle at the required density and conclude that the required density must be higher (or lower) than the experimental range. What data would serve as a basis (prior) for your likelihood function?

EDIT: I'm not a physicist but speaking in general terms, you seem to be saying there's a model which motivated some experiments to observe something within some range of values. However nothing was found, so you want to know if you can still use these results to estimate the probability of a useful observation above the upper limit of the existing experimental range. Is this more or less correct?

If so, you need some basis for writing a likelihood function. The prior would be the probability of a vector of observations [tex]P(X)[/tex]. [tex]P(X|\theta)[/tex] is the posterior probability and the likelihood function that would be needed for the calculation of a posterior probability is [tex] L(\theta|X)[/tex] where [tex]\theta[/tex] is a probability distribution parameter.
 
  • #3


SW VandeCarr said:
Are you saying that you have experiments which find no observables below (or above) a certain value? For example (not your particular experiment) you find no evidence of some particle at the required density and conclude that the required density must be higher (or lower) than the experimental range. What data would serve as a basis (prior) for your likelihood function?

EDIT: I'm not a physicist but speaking in general terms, you seem to be saying there's a model which motivated some experiments to observe something within some range of values. However nothing was found, so you want to know if you can still use these results to estimate the probability of a useful observation above the upper limit of the existing experimental range. Is this more or less correct?

Yes this is more or less correct. Actually there are two types of observations, possibly requiring separate treatment, which I am concerned with. The first can be identified with your description I believe:
Say some new particle is to be created in an accelerator, and we should see some direct evidence of it if its mass is small enough that the accelerator can create this particle, but then we see nothing. We then want to use the absence of an observation to create a likelihood function describing the probability of seeing this evidence at some future, more powerful collider (or not seeing it at less powerful colliders). Generally people would find the 95% confidence limit on the lower bound for the mass (something reported by the experimentalists) and then just use this as a hard cut (i.e. a step function) in future analyses (i.e. to rule out parts of the model parameter space), but I feel that there should be some well-motivated error-function-shaped likelihood function which incorporates the uncertainty in the bound and doesn't unduly kill off parts of the parameter space near the boundary.
The second possibility is the one I described in my first post, but I will attempt to clarify it. There is strong astrophysical evidence from a number of sources that a certain extra energy density exists in the vicinity of the Earth, and we assume this to arise from some density of dark matter particles flying around. So we have this so-called relic density number measured, with a mean value known with some gaussian uncertainty. If we have a model which provides a candidate particle for this dark matter, and we require it to account for 100% of the dark matter, then we can create a gaussian likelihood function

[tex] P(\mu_d | \theta, M) = \frac{1}{\sqrt{2\pi \sigma^2}}\, e^{-\frac{(\mu(\theta) - \mu_d)^2}{2\sigma^2}} [/tex]

where [tex]\theta[/tex] is a parameter of the model [tex]M[/tex] we are assuming, [tex]\mu_d[/tex] is the mean experimental value of the "extra energy" density, [tex]\mu(\theta)[/tex] is the mean "extra energy" density predicted by the model [tex]M[/tex] at the point [tex]\theta[/tex] in the model parameter space, and [tex]\sigma[/tex] is the standard deviation of the experimental "energy density", although we could include theoretical uncertainties in here as well.
Here we are assuming this is the only observation we are using to constrain the model parameters and that the errors are gaussian (or the distribution of measurements made during the various experiments used to find the reported mean value is gaussian).
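
For concreteness, the Gaussian likelihood above can be coded directly, viewed as a function of the model prediction with the observed mean and error held fixed (a sketch; the variable names are mine):

```python
import math

def gaussian_likelihood(mu_theta, mu_d, sigma):
    """P(mu_d | theta, M): Gaussian in the model prediction mu(theta),
    with observed mean mu_d and standard deviation sigma held fixed."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return norm * math.exp(-(mu_theta - mu_d) ** 2 / (2.0 * sigma ** 2))
```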

However, if we do not require our model to account for 100% of the observed extra energy density then we cannot penalise model points which fail to produce enough extra density. Thus the above gaussian likelihood function cannot be used. So what does it make sense to use instead?

SW VandeCarr said:
If so, you need some basis for writing a likelihood function. The prior would be a vector of observations [tex]X[/tex]. [tex]P(X|\theta)[/tex] is the posterior probability and the likelihood function that would be needed for the calculation of a posterior probability is [tex] L(\theta|X)[/tex] where [tex]\theta[/tex] is a probability distribution.

I'm not quite sure I know what you mean. Do you mean we need to know something about the distribution of measurements, or rather the uncertainties in the bounds we get from the experiments in which we see nothing? Also I'm a little confused by your calling the prior a vector of observations. I understand it to be a joint probability distribution over a set of outcomes [tex]P(X|I)[/tex] based on some initial information [tex]I[/tex]. In my case it is a joint probability distribution over the set of parameters of the model [tex]P(\theta |I)[/tex], but we shouldn't focus on that stuff, I feel this issue of how to construct a sensible likelihood function is quite separate. In fact I don't think we care about the Bayesian-ness of what I'm doing at all, frequentist analyses of the model [tex]M[/tex] need the same likelihood function.
 
  • #4


kurros said:
some new particle is to be created in an accelerator, and we should see some direct evidence of it if its mass is small enough that the accelerator can create this particle, but then we see nothing. We then want to use the absence of an observation to create a likelihood function describing the probability of seeing this evidence at some future, more powerful collider (or not seeing it at less powerful colliders).

Well this is where you lose me. A likelihood function takes the data as a given and calculates a likelihood. Now it's true dummy data can be simulated based on a theoretical model and this is done in a lot of research but I don't know if this is something you want to do.

kurros said:
However, if we do not require our model to account for 100% of the observed extra energy density then we cannot penalise model points which fail to produce enough extra density. Thus the above gaussian likelihood function cannot be used. So what does it make sense to use instead?

The function you referenced is the Gaussian probability density function (pdf), not the likelihood function. Not being a physicist, I can't say if a simulation makes sense here, but in general a simulation based on model parameters may allow you to calculate some uncertainty estimates of the experimentally determined bounds.

kurros said:
Also I'm a little confused by your calling the prior a vector of observations. I understand it to be a joint probability distribution over a set of outcomes [tex]P(X|I)[/tex] based on some initial information [tex]I[/tex].

I should have said P(X) rather than X. P(X) is the unconditional prior. P(X|I) is a conditional probability which could serve as a prior for further analysis, but it has the form of a posterior probability. When you calculate the likelihood, you multiply together the probabilities of the individual observations (assigned by the pdf) to obtain the likelihood. The likelihood function enters into a Bayesian analysis as a term in Bayes' theorem.
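
In code, that product over observations is usually accumulated as a sum of logs to avoid numerical underflow (a generic i.i.d. Gaussian sketch, not tied to the physics example in this thread):

```python
import math

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. Gaussian observations: the log of the
    product of per-observation densities, accumulated as a sum of logs."""
    ll = 0.0
    for x in data:
        ll += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        ll += -(x - mu) ** 2 / (2.0 * sigma ** 2)
    return ll
```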

kurros said:
In my case it is a joint probability distribution over the set of parameters of the model [tex]P(\theta |I)[/tex], but we shouldn't focus on that stuff, I feel this issue of how to construct a sensible likelihood function is quite separate. In fact I don't think we care about the Bayesian-ness of what I'm doing at all, frequentist analyses of the model [tex]M[/tex] need the same likelihood function.

Without data, I can only suggest a simulation. Perhaps someone else can be of more help. There's at least one person who posts in this forum who has a physics background. I don't think the rules allow me to give his username, but it begins with Fr.

BTW the conditional probability you referenced is a likelihood function corresponding to [tex] f(P(I|\theta))[/tex] and is written [tex]L(\theta|I)[/tex] where [tex]\theta[/tex] is taken as a probability distribution parameter.
 
  • #5


SW VandeCarr said:
Well this is where you lose me. A likelihood function takes the data as a given and calculates a likelihood value conditioned on the data. Now it's true dummy data can be simulated based on a theoretical model and this is done in a lot of research but I don't know if this is something you want to do.

Well that's not really true, or at least I think it is a little misleading to use the word conditioned. A likelihood function is simply a conditional probability statement taken as a function of the thing on which it is conditional, which in my case was the model parameter vector [tex]\theta[/tex], certainly not the data. I suppose the way I wrote it didn't make this dependence explicit, sorry. The notation for these things isn't very consistent I find so I try and stick to writing everything as a probability.

SW VandeCarr said:
The function you referenced is the Gaussian probability distribution function (pdf), not the likelihood function. Not being a physicist, I can't say if a simulation makes sense here, but in general a simulation based on model parameters may allow to calculate the uncertainty of the experimentally determined bounds.

Sure, it is a gaussian pdf when taken as a function of the possible observed values of the data, but as a likelihood function this observed value is fixed and we vary the parameter [tex]\theta[/tex] to compare this observed value to different model predictions.

SW VandeCarr said:
Without data, I can only suggest a simulation. Perhaps some else can be of more help. There's at least one person who posts in this forum who has a physics background. I don't think the rules allow me to give his username, but it begins with Fr.

But we do have data, we have a lot of data saying that nothing was found within the coverage of the experiment (in scenario 1 above). This information must be used to rule out those parts of the model parameter space for which something should have been seen at that experiment. This shouldn't be a hard cut though, like is commonly done, the uncertainty in exactly what can and cannot be ruled out by a given experiment must be accommodated.

Thanks for the replies btw, eventually we may distil my question into something more mathematical in nature and then maybe the answer will become obvious :).
 
  • #6


Ok, it has come to my attention that I may be looking at some kind of "censored data" problem. Does this sound reasonable to anyone? I have gone and found a few books that discuss censored data, but I have yet to decide if this is actually the kind of scenario I am looking at. The term "truncated data" has also turned up, but I don't think that is my case; it seems to be more to do with having some data but with samples excluded outside a certain region.

Any thoughts? Or good books on the analysis of censored data?
 
  • #7


kurros said:
Ok, it has come to my attention that I may be looking at some kind of "censored data" problem. Does this sound reasonable to anyone? I have gone and found a few books that discuss censored data, but I have yet to decide if this is actually the kind of scenario I am looking at. The term "truncated data" has also turned up, but I don't think that is my case; it seems to be more to do with having some data but with samples excluded outside a certain region.

Any thoughts? Or good books on the analysis of censored data?

No one else has responded, and I think it's because there are no obvious solutions by the usual statistical methods without observations. I have a fair amount of experience with censoring in terms of survival analysis. You can read my response to artbio 6/24 #2763321. I don't see how it would work here, since you still need some observations to analyse censored data. In particular, you can deal with a lot of left censoring, but not a lot of right censoring of unobserved (and unknown) outcomes.

Just to be sure I understand the situation: You want the probability Y of a particle density X1 for some energy level X2. So the initial question is P(X1|X2). However the range of X2 available is limited so you cannot see the entire probability distribution Y for X1.

We can dispose of this conditional probability by looking only at the maximum available energy level. Your actual observation is zero particle density at this level. Note that your half-Gaussian model has no non-zero lower limit, so the model should predict some non-zero particle density at this level. You have no observations to work with, but your theoretical model should provide a non-zero value for what you should see at this energy level.

This is where simulation comes in. You cannot actually calculate a confidence interval for your zero observations without other data points. There are random generators which can generate values for specified Gaussian model parameters. All this can tell you, however, is the probability of the data (zero observations) given the model.
 
  • #8


kurros said:
...the observable must be above a certain value with 95% confidence limit or something, with no upper bound.

What is the correct form of the likelihood function to use for such a quantity?

In order to calculate the 95% confidence, the experimenters (or whoever wrote the article with the result you refer to) must already have assumed a likelihood function (PDF) of the observable in question. I suggest you try to tease out what PDF they were using and then use the same one yourself.
 
  • #9


Hmm, ok cheers, that has given me some things to think about. I'll go see what I can find out...
 

1. What is a likelihood function?

A likelihood function is a statistical tool used to measure the probability of observing a specific set of data given a certain model or hypothesis. It helps to quantify how well a model fits the observed data and is often used in maximum likelihood estimation.

2. What is the correct form of a likelihood function for data with upper and lower bounds?

The correct form of a likelihood function for data with upper and lower bounds is a truncated likelihood function. This means that the likelihood is only calculated for values within the specified range, and any values outside of the range have a likelihood of 0.

3. How is a truncated likelihood function different from a regular likelihood function?

A truncated likelihood function is different from a regular likelihood function in that it only considers a subset of the data. This is because the data is constrained by the upper and lower bounds, so any values outside of this range are not possible and therefore have a likelihood of 0.
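
A minimal sketch of such a truncated density for the Gaussian case (renormalised so it still integrates to 1 over [a, b]; the function names are my own):

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_pdf(x, mu, sigma, a, b):
    """Normal density restricted to [a, b] and renormalised;
    values outside the bounds get zero likelihood."""
    if x < a or x > b:
        return 0.0
    mass = normal_cdf((b - mu) / sigma) - normal_cdf((a - mu) / sigma)
    return normal_pdf((x - mu) / sigma) / (sigma * mass)
```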

4. How is the likelihood function affected by upper and lower bounds on the data?

The presence of upper and lower bounds on the data can affect the likelihood function in a few ways. Firstly, it may limit the range of possible values for the parameters in the model, which can impact the overall shape of the likelihood function. Additionally, the truncated likelihood function may be more complex to calculate compared to a regular likelihood function, as it involves additional steps to account for the bounds.

5. How do you determine the correct form of a likelihood function for data with upper and lower bounds?

The correct form of a likelihood function for data with upper and lower bounds can be determined by carefully considering the constraints on the data and how they impact the likelihood calculation. It may also involve some trial and error to find the most appropriate form for a specific dataset and model. Consulting with other experts or conducting further research on similar studies may also be helpful in determining the correct form of the likelihood function.
