Asymmetry between probability distributions

In summary, the author has made an interesting observation that they cannot explain. Their intuition was that the asymmetry works as follows: as you go from the centre of the equilateral triangle to its periphery (i.e. as the entropy of the probability distribution decreases), less information is needed than for going from the periphery to the centre. That intuition turned out to be wrong. Fixing a prior P, the simplex can be partitioned into the points Q for which KLD(Q,P)>KLD(P,Q) (coloured red) and those for which KLD(Q,P)<KLD(P,Q) (coloured blue). The resulting partitions are pretty and far from trivial.
  • #1
noowutah
I have made an interesting observation that I can't explain to myself. Think about a prior probability P and a posterior probability Q. They are defined on an event space W with only three elements: w1, w2, and w3 (the number of elements won't matter as long as it's finite). The Kullback-Leibler divergence measures how far these probability distributions are apart, i.e. how much information it takes to get from P to Q. If P(w1)=p1 etc. then

KLD(Q,P)=q1*log(q1/p1)+q2*log(q2/p2)+q3*log(q3/p3)

The KLD is not symmetric, so if P and Q switch roles (Q is now the prior and P the posterior), the divergence will in general be different. If you think of P and Q as points on a simplex (all points (r1,r2,r3) in R3 with r1+r2+r3=1 and rj>0; the simplex in R3 looks like an equilateral triangle), the KLD does NOT define a metric on this simplex, because KLD(Q,P) is in general not equal to KLD(P,Q).
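A quick numerical illustration of this asymmetry (a minimal sketch; the two distributions are arbitrary examples, not taken from the plots below):

Python:
import math

def kld(q, p):
    # KLD(Q,P) = q1*log(q1/p1) + q2*log(q2/p2) + q3*log(q3/p3)
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

P = (0.7, 0.2, 0.1)  # prior (arbitrary example)
Q = (0.2, 0.5, 0.3)  # posterior (arbitrary example)

print(kld(Q, P))  # information needed to get from P to Q
print(kld(P, Q))  # information needed to get from Q to P -- a different number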

My original intuition about how this asymmetry works was that as you go from the centre of the equilateral triangle to its periphery (i.e. as the entropy of the probability distribution decreases), less information is needed than for going the other way, from the periphery to the centre, so

H(P)>H(Q) implies that KLD(Q,P)<KLD(P,Q)

Note that the prior is the second argument of KLD -- that's a bit counterintuitive. H here is the Shannon entropy H(P)=-p1*log(p1)-p2*log(p2)-p3*log(p3).

In any case, my intuition is wrong. Let P (the prior) be fixed. Then you can partition the simplex into those points Q for which KLD(Q,P)>KLD(P,Q) (colour them red) and those for which KLD(Q,P)<KLD(P,Q) (colour them blue). The partitions are pretty and far from trivial. How could you defend this in terms of intuitions about probability distributions? Is there any way to explain, without recourse to information theory, why going from P to Q1 is harder than going from Q1 to P, while going from P to Q2 is easier than going from Q2 to P? (Here Q1 is an arbitrary red point and Q2 an arbitrary blue point.)

Here is the partition for P=(1/3,1/3,1/3):

http://streetgreek.com/lpublic/various/asym-eq.png

And here for P=(0.4,0.4,0.2):

http://streetgreek.com/lpublic/various/asym422.png

And here for P=(0.242,0.604,0.154):

http://streetgreek.com/lpublic/various/asym262.png

And here for P=(0.741,0.087,0.172):

http://streetgreek.com/lpublic/various/asym712.png
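In case anyone wants to reproduce these pictures, here is a rough sketch of the kind of script that can generate them (assuming numpy and matplotlib; this is not the exact code behind the images above):

Python:
import numpy as np
import matplotlib.pyplot as plt

def kld(q, p):
    # KLD(Q,P) = sum_i q_i * log(q_i / p_i)
    return np.sum(q * np.log(q / p))

P = np.array([1/3, 1/3, 1/3])  # the fixed prior

# Walk over a grid of points Q in the interior of the simplex.
n = 200
xs, ys, colors = [], [], []
for i in range(1, n):
    for j in range(1, n - i):
        q = np.array([i, j, n - i - j], dtype=float) / n
        # red: KLD(Q,P) > KLD(P,Q); blue: the other way round
        colors.append('red' if kld(q, P) > kld(P, q) else 'blue')
        # barycentric -> Cartesian coordinates of the equilateral triangle
        xs.append(q[1] + 0.5 * q[2])
        ys.append(np.sqrt(3) / 2 * q[2])

plt.scatter(xs, ys, c=colors, s=1)
plt.gca().set_aspect('equal')
plt.show()

Swapping in the other priors listed above should reproduce the remaining pictures, at least qualitatively.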
 
  • Like
Likes mfb
  • #2
An interesting problem!

Let's define ">" on points of the simplex as P>Q iff KLD(Q,P)<KLD(P,Q), i.e. going from Q to P needs more information than the opposite.
Is this transitive? If P>Q and Q>R, is P>R?
If yes, there should be an extremal point, one for which the whole plane is colored blue.

Your plots don't seem to suggest this. If the relation is not transitive, it has a weird consequence: you can find a triple P, Q, R where going P->Q->R->P needs more information than going P->R->Q->P.

For an event space with just two events, the solution should be:
P>Q for (p>q and p+q<1) or (p<q and p+q>1), where p=p1 and q=q1.
In other words, P>Q if |p-1/2| < |q-1/2|.
Going closer to the middle needs more information than going outwards. This is transitive.
(The sign conventions are easy to get wrong here, so the numerical check below is worth running.)

Example (prior p1 -> posterior q1, base-10 logarithms):
0.3 -> 0.1: KLD(Q,P)=0.0505
0.1 -> 0.3: KLD(Q,P)=0.0667
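A quick numerical check of this (not a proof; the kld helper below assumes base-10 logarithms so that the two numbers above come out as 0.0505 and 0.0667):

Python:
import math
import random

def kld(q, p, base=10.0):
    return sum(qi * math.log(qi / pi, base) for qi, pi in zip(q, p))

# The two numbers above: prior p1=0.3 -> posterior q1=0.1, and the reverse.
print(kld((0.1, 0.9), (0.3, 0.7)))  # ~0.0505
print(kld((0.3, 0.7), (0.1, 0.9)))  # ~0.0667

# Spot-check "P>Q iff |p-1/2| < |q-1/2|" on random two-event distributions.
mismatches = 0
for _ in range(100000):
    p, q = random.uniform(0.001, 0.999), random.uniform(0.001, 0.999)
    P, Q = (p, 1 - p), (q, 1 - q)
    if (kld(Q, P) < kld(P, Q)) != (abs(p - 0.5) < abs(q - 0.5)):
        mismatches += 1
print(mismatches)  # should come out as 0 if the rule above is right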
 
  • Like
Likes noowutah
  • #3
Notice that in the definition of the KLD an expectation value is taken with respect to Q: KLD(Q,P)=q1*log(q1/p1)+q2*log(q2/p2)+q3*log(q3/p3). An expectation of what? Of the logarithm of the ratio of the probabilities under Q to the probabilities under P.

KLD(Q,P) tells us how fast you learn, when the true distribution is Q, that it isn't P. This quantity is asymmetric, which is pretty obvious when you think about examples in which one of the p's or one of the q's is zero.
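A concrete instance of the zero-probability case (the specific numbers are just for illustration):

Python:
import math

def kld(q, p):
    # Convention: q_i = 0 contributes nothing; q_i > 0 with p_i = 0 gives infinity.
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0:
            continue
        if pi == 0:
            return math.inf
        total += qi * math.log(qi / pi)
    return total

P = (0.5, 0.5, 0.0)  # the prior rules out w3
Q = (0.4, 0.4, 0.2)  # the true distribution gives w3 positive probability

# Infinite: a single observation of w3, impossible under P, rules P out at once.
print(kld(Q, P))
# Finite: data drawn from P never shows w3, so Q is only ruled out at a finite rate.
print(kld(P, Q))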
 
  • Like
Likes noowutah
  • #4
gill1109 -- yes, you are absolutely right. This is precisely what I am trying to show: that the asymmetry is also justified when one of the p's or q's is NOT zero. This is not as obvious as it appears. Continuity between the extreme probability case and the non-extreme probability case is, of course, one argument for asymmetry. But some people have invested a lot of time into a geometric model of non-extreme probabilities where all the distances are symmetric. I am trying to show that they are wrong. There is a sense in which my argument isn't doing so well -- the asymmetries, as the diagrams show, are all over the place and intuitively unpredictable.

mfb -- excellent point. I've been trying since I saw your post to prove that in the two-event case H(P)>H(Q) implies the kind of asymmetry you suggest. It's turning out to be a more difficult proof than I envisioned, but you must be right about this. Be that as it may, it's not true for the three-event case; there are lots of counter-examples, as my diagrams show. Transitivity should not be the issue here so much as the triangle inequality: it should definitely be harder to get from P->Q->R than to go from P->R directly, but that holds for both symmetric measures and the KLD.
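Here is a small random-search sketch for such counter-examples in the three-event case (the sampling scheme is arbitrary; it just has to stay inside the simplex):

Python:
import math
import random

def kld(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p)

def random_dist():
    # crude draw from the interior of the simplex
    a, b, c = (random.uniform(0.01, 1.0) for _ in range(3))
    s = a + b + c
    return (a / s, b / s, c / s)

# Count pairs with H(P) > H(Q) but KLD(Q,P) > KLD(P,Q),
# i.e. counter-examples to the original intuition.
found = 0
for _ in range(100000):
    P, Q = random_dist(), random_dist()
    if entropy(P) > entropy(Q) and kld(Q, P) > kld(P, Q):
        found += 1
print(found)  # a nonzero count means the intuition fails for three events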

I will keep working on this. If anybody has ideas please let me know.
 
  • #5
What does your plot look like for P=(0.5, 0.25, 0.25)? That seems to be the center of one of the three lobes in the first plot.
 
  • #6
http://www.streetgreek.com/lpublic/various/asym533.png
P=(0.5,0.25,0.25)
 
  • #7
Interesting. No transitivity then.

P = (1/3, 1/3, 1/3)
Q = (1/2, 1/4, 1/4)
R = (0.4,0.4,0.2)

KLD(Q,P)>KLD(P,Q)
KLD(R,Q)>KLD(Q,R)
KLD(P,R)>KLD(R,P)
KLD(Q,P)+KLD(R,Q)+KLD(P,R) > KLD(P,Q)+KLD(Q,R)+KLD(R,P)
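A quick script to verify these inequalities (the base of the logarithm does not matter for the comparisons):

Python:
import math

def kld(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

P = (1/3, 1/3, 1/3)
Q = (1/2, 1/4, 1/4)
R = (0.4, 0.4, 0.2)

print(kld(Q, P) > kld(P, Q))  # True
print(kld(R, Q) > kld(Q, R))  # True
print(kld(P, R) > kld(R, P))  # True
# The cycle P->Q->R->P costs more than the reverse cycle P->R->Q->P:
print(kld(Q, P) + kld(R, Q) + kld(P, R) >
      kld(P, Q) + kld(Q, R) + kld(R, P))  # True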
 
  • Like
Likes noowutah
  • #8
Fascinating, mfb! That should be another problem for the Kullback-Leibler divergence as a measure of dissimilarity between probability distributions. A violation of this kind of transitivity is even harder to square with our epistemic intuitions about probabilities and how to update them than the non-trivial asymmetry patterns I pointed out.
 

