Combining independent probabilities in a meaningful way

SUMMARY

The discussion centers on combining independent probabilities from multiple experiments into a meaningful overall measure. Participants note that simply multiplying the probabilities yields extremely small values that are hard to interpret on their own, and suggest working with average log likelihoods instead. The average log likelihood provides a normalized measure that allows meaningful comparisons across different numbers of experiments. Additionally, it is noted that the negative average log likelihood can converge to the cross-entropy between the model distribution and the true distribution, offering valuable insights into model performance.

PREREQUISITES
  • Understanding of probability theory and likelihood functions
  • Familiarity with log likelihood and its applications in statistical modeling
  • Knowledge of cross-entropy and its significance in model evaluation
  • Basic concepts of normalization in statistical analysis
NEXT STEPS
  • Research the concept of average log likelihood and its calculation methods
  • Explore cross-entropy and its relationship with likelihood in model evaluation
  • Study normalization techniques in probability distributions
  • Investigate the implications of overfitting in statistical models with varying data points
USEFUL FOR

Data scientists, statisticians, and researchers involved in probabilistic modeling and analysis, particularly those working with independent experiments and likelihood functions.

daviddoria
I have 1000 experiments on the same data, each of which is trying to decide the probability that the data agrees with a model.

The problem is, even if each experiment REALLY agrees with the model (i.e. p(d1) = .99, p(d2) = .99, etc.), when I multiply these together to get P(d1 & d2 & d3 ... & dn), the result is some really, really small number (near 0). If I do 2 sets of such experiments, I can tell which set of data had a higher probability of matching the model (simply by seeing which of the two combined numbers is larger), but the actual numbers at that point are meaningless - they are only useful in a relative sense.

Is there a way to combine these to get a "valid" probability without just multiplying them? A reasonable idea seems to be to just "average" them, i.e. \sum_n p(d_n) / n, but everyone tells me this is a really terrible idea and that the result is not a real probability any more.

Thoughts?

Thanks,

Dave
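A quick numerical illustration of the issue Dave describes (a minimal Python sketch; the per-experiment probabilities are invented for demonstration):

```python
import math

# Invented per-experiment probabilities: every experiment "agrees" strongly.
probs = [0.99] * 1000

# The naive product is a meaninglessly small number, and with enough
# experiments it underflows to exactly 0.0 in double precision.
product = 1.0
for p in probs:
    product *= p
print(product)             # ~4.3e-05 for 1000 factors of 0.99
print(0.99 ** 100_000)     # 0.0 -- underflow

# Summing logs carries the same information on a manageable scale.
log_sum = sum(math.log(p) for p in probs)
print(log_sum)             # about -10.05
```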
 
Use the log likelihood instead.
 
But if you sum log likelihoods instead of multiplying regular likelihoods, you still get the same problem: the numbers don't mean anything unless you compare them to other such numbers (they're just on a different scale than if you hadn't taken logs). Know what I mean?

David
 
daviddoria said:
But if you sum log likelihoods instead of multiplying regular likelihoods, you still get the same problem: the numbers don't mean anything unless you compare them to other such numbers (they're just on a different scale than if you hadn't taken logs). Know what I mean?

I'm afraid not. Why do you care about what the numbers look like in an absolute sense? You just want the model that gives you the highest likelihood, so it's not clear to me what meaning the likelihoods could have absent some comparison to one another.

If you just want to get nicely-scaled numbers, in addition to using the log-likelihood you should divide it by the number of data points (i.e., the average log likelihood).
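For concreteness, a minimal sketch of that normalization (the probability lists are hypothetical, chosen to match the cases discussed further down in the thread):

```python
import math

def average_log_likelihood(probs):
    """Sum of the per-experiment log likelihoods, divided by the number of experiments."""
    return sum(math.log(p) for p in probs) / len(probs)

# 5 experiments that each agree with likelihood 0.8 vs. 1000 at 0.99:
print(average_log_likelihood([0.8] * 5))      # about -0.2231
print(average_log_likelihood([0.99] * 1000))  # about -0.0101
```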
 
Here is a separate problem:

As the number of experiments increases, the likelihood can only go down. It seems like it should go the other way... i.e. if you have 5 experiments that each "agree" with the model with 0.8 likelihood, intuitively you are less sure about the agreement than if you have 1000 experiments that each agree with 0.99 likelihood, but the sum of the log-likelihoods ranks them the other way around... see what I mean?

Is there a way to overcome this?

Dave
 
daviddoria said:
Here is a separate problem:

As the number of experiments increases, the likelihood can only go down. It seems like it should go the other way... i.e. if you have 5 experiments that each "agree" with the model with 0.8 likelihood, intuitively you are less sure about the agreement than if you have 1000 experiments that each agree with 0.99 likelihood, but the sum of the log-likelihoods ranks them the other way around... see what I mean?

Is there a way to overcome this?

Yes, use the average log likelihood. In the first case (5 data points with 0.8 likelihood), the average log likelihood is -0.2231. In the second case (1000 data points with 0.99 likelihood), the average log likelihood is -0.0101. Both results are insensitive to the number of data points, and so can be meaningfully compared.

Also, after thinking about it, there are useful interpretations of the absolute value of the average log likelihood. Specifically, under certain assumptions of stationarity/ergodicity on the data, the negative average log likelihood will converge to the cross-entropy between the model distribution and the true distribution. The cross-entropy has a number of interesting properties and interpretations, one of which is that it is lower-bounded by the entropy of the true distribution. You can think of the cross entropy as telling you the average number of bits you'd need to send to specify a given piece of data, under the estimated model. This, in turn, is equal to the true entropy plus the KL divergence between the true distribution and the estimated one, so we see that maximum likelihood estimation is equivalent to minimization of the KL divergence between the model distribution and the true distribution.
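A rough numerical check of that convergence claim (a sketch assuming i.i.d. draws from a known discrete "true" distribution; both distributions below are invented for the example):

```python
import math
import random

random.seed(0)

# Invented discrete distributions over three outcomes.
true_dist  = {"a": 0.5, "b": 0.3, "c": 0.2}   # p: the data-generating distribution
model_dist = {"a": 0.4, "b": 0.4, "c": 0.2}   # q: the estimated model

# Draw i.i.d. samples from the true distribution.
outcomes, weights = zip(*true_dist.items())
samples = random.choices(outcomes, weights=weights, k=100_000)

# Negative average log likelihood of the samples under the model...
neg_avg_ll = -sum(math.log(model_dist[x]) for x in samples) / len(samples)

# ...versus the analytic cross-entropy H(p, q) = H(p) + KL(p || q).
entropy = -sum(p * math.log(p) for p in true_dist.values())
kl = sum(p * math.log(p / model_dist[k]) for k, p in true_dist.items())

print(neg_avg_ll)       # roughly 1.05 nats
print(entropy + kl)     # about 1.0549 -- the two should nearly agree
```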
 
quadraphonics is right, multiplying all likelihood functions p(d1|m)p(d2|m)... (or taking the sum of their logarithms) is the right approach.

This product will decrease as you increase the number of experiments (at least if you are dealing with discrete variables). This is because it is a probability in data space (the space spanned by all of d1, d2, d3, etc.), and the probability distribution gets ever more "spread out" as the size of that space increases.

If you are interested in the probability of your model parameter m, then to get the actual probability you need to include a normalization constant: K*p(d1|m)p(d2|m)..., with K chosen so that the probabilities of all possible values of m sum to 1.

However, if you just want to find the m with the highest probability rather than the actual value of the probability itself, you can skip computing the constant and just look for the m that maximizes the product p(d1|m)p(d2|m)...

-Emanuel
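To illustrate Emanuel's point, here is a rough sketch for a discrete grid of candidate parameter values m (the Bernoulli likelihood, the data, and the flat prior over the grid are all assumptions made for the example):

```python
import math

# Hypothetical setup: binary data and a Bernoulli model where m = P(success).
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]          # invented observations
m_grid = [i / 100 for i in range(1, 100)]      # candidate values of m

def log_likelihood(m, data):
    """log p(d1|m) + log p(d2|m) + ... for the Bernoulli model."""
    return sum(math.log(m) if d else math.log(1.0 - m) for d in data)

log_ls = [log_likelihood(m, data) for m in m_grid]

# Normalization constant K so the probabilities over the grid sum to 1.
# Subtracting the max log likelihood first avoids underflow when exponentiating.
max_ll = max(log_ls)
unnorm = [math.exp(ll - max_ll) for ll in log_ls]
K = 1.0 / sum(unnorm)
posterior = [K * u for u in unnorm]

best_index = max(range(len(m_grid)), key=lambda i: posterior[i])
print(m_grid[best_index])       # the m with the highest probability (0.8 here)
print(posterior[best_index])    # its actual, normalized probability on this grid
```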
 
Yeah, the problem with just maximizing the sum of log likelihoods is that if I then run a second experiment, the two resulting numbers are not comparable. I guess that's where finding that normalization constant comes in... I can compare the normalized values. That constant could be far from trivial to find, though!

Thanks for the discussion, all.

Dave
 
daviddoria said:
Yeah, the problem with just maximizing the sum of log likelihoods is that if I then run a second experiment, the two resulting numbers are not comparable. I guess that's where finding that normalization constant comes in... I can compare the normalized values. That constant could be far from trivial to find, though!

That's only a problem if the experiments have different numbers of data points. But it's trivial to compensate for this: use the *average* log likelihood (i.e., normalized by the number of data points). This will not be sensitive to the number of data points (to first order; if the number of data points ever becomes small relative to the degrees of freedom of the model, you will then get over-fitting effects).
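A small sketch of that comparison, with hypothetical experiment sets of very different sizes:

```python
import math

def avg_log_likelihood(probs):
    return sum(math.log(p) for p in probs) / len(probs)

# Hypothetical experiment sets with different numbers of data points:
set_a = [0.9] * 50
set_b = [0.95] * 2000

# Raw log-likelihood sums are dominated by the number of data points...
print(sum(math.log(p) for p in set_a))   # about -5.3
print(sum(math.log(p) for p in set_b))   # about -102.6

# ...while the averages are directly comparable.
print(avg_log_likelihood(set_a))         # about -0.105
print(avg_log_likelihood(set_b))         # about -0.051
```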
 
