Isosceles triangle in information theory


Discussion Overview

The discussion revolves around the properties of the Kullback-Leibler divergence in relation to isosceles triangles and Euclidean geometry. Participants explore distance measures in both geometric and information-theoretic contexts, focusing in particular on the concavity of the logarithm and the convexity of the Kullback-Leibler divergence in its second argument.

Discussion Character

  • Exploratory, Technical explanation, Debate/contested, Mathematical reasoning

Main Points Raised

  • One participant proposes that the distance from a point C to a point F (a convex combination of points A and B) is less than the common distance from C to A and from C to B, first for the Euclidean distance and then, conjecturally, for the Kullback-Leibler divergence.
  • Another participant replies that the logarithm is concave rather than convex and uses this to derive an inequality showing that the Kullback-Leibler divergence is convex in its second argument.
  • A later reply acknowledges the concavity of the logarithm and attempts the derivation, asking whether the Kullback-Leibler divergence is convex when the point from which the divergence is measured is held fixed.
  • Participants discuss the implications of the concavity of the logarithm for the Kullback-Leibler divergence and its convexity properties.
  • One participant shares links to formal proofs regarding the convexity of Kullback-Leibler divergence, indicating a resolution to their query.

Areas of Agreement / Disagreement

Participants initially disagree over the direction of the relevant inequality, stemming from whether the logarithm is treated as convex or concave. They come to agree that the logarithm is concave; what remains in question is whether, and how, this yields the desired inequality for the Kullback-Leibler divergence when the point from which the divergence is measured is held fixed.

Contextual Notes

Participants reference specific mathematical properties and inequalities without settling all assumptions or definitional dependencies, in particular whether the points are strictly positive (or normalized) vectors, on which the definition of the Kullback-Leibler divergence depends.

noowutah
In Euclidean geometry (presumably also in non-Euclidean geometry), the segment joining the apex of an isosceles triangle to any interior point of the base (in particular, along the bisector of the apex angle) is shorter than the legs of the triangle. Let ABC be an isosceles triangle with AB as the base. Then, for ##0<\lambda<1##,

$$d(C,\lambda{}A+(1-\lambda)B)<d(C,A)=d(C,B)$$

##d## is the Euclidean distance measure (taking ##a_{i}## to be the coordinates of ##A## in ##\mathbb{R}^{n}##)

$$d(A,B)=\sqrt{\sum_{i=1}^{n}(a_{i}-b_{i})^{2}}$$

I want to show that this is also true if our notion of distance is the Kullback-Leibler divergence from information theory. So, let A, B, C be points in n-dimensional space with

$$D_{KL}(C,A)=D_{KL}(C,B)$$

where

$$D_{KL}(X,Y)=\sum_{i=1}^{n}x_{i}\ln\frac{x_{i}}{y_{i}}$$

Let F be a point between A and B in the sense that

$$F=\lambda{}A+(1-\lambda)B,\qquad 0<\lambda<1$$

Then I want to prove that

$$D_{KL}(C,F)<D_{KL}(C,A)=D_{KL}(C,B)$$

Two points that may be helpful are (1) the Gibbs inequality (##\sum_{i}p_{i}\ln p_{i}\geq\sum_{i}p_{i}\ln q_{i}##); and (2) the convexity of the logarithm (##\ln(\lambda{}x+(1-\lambda)y)<\lambda\ln{}x+(1-\lambda)\ln{}y##), but I haven't been able to get anywhere. I'd love some help.
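(A quick numerical sanity check of the claim, not a proof: the sketch below, in plain NumPy, builds an "isosceles" configuration in which ##D_{KL}(C,A)=D_{KL}(C,B)## holds exactly, by taking ##C## uniform and ##B## a permutation of ##A##, and then checks ##D_{KL}(C,F)<D_{KL}(C,A)## for several values of ##\lambda##. All vectors are assumed strictly positive; the names kl, A, B, C, F are just illustrative.)

```python
import numpy as np

def kl(p, q):
    """D_KL(p, q) = sum_i p_i * ln(p_i / q_i), as defined in the post."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
n = 5

# "Isosceles" configuration: with C uniform and B a permutation of A,
# D_KL(C, A) and D_KL(C, B) are the same sum taken in a different order.
C = np.full(n, 1.0 / n)
A = rng.dirichlet(np.ones(n))
B = A[::-1].copy()
assert abs(kl(C, A) - kl(C, B)) < 1e-12

for lam in np.linspace(0.1, 0.9, 9):
    F = lam * A + (1 - lam) * B            # point on the "base" between A and B
    print(f"lambda={lam:.1f}  D(C,F)={kl(C, F):.6f}  D(C,A)=D(C,B)={kl(C, A):.6f}")
    assert kl(C, F) < kl(C, A)             # the claimed strict inequality
```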
 
Actually I think the opposite is true, i.e. by the concavity of the logarithm $$ \ln( \lambda y_i +(1-\lambda)z_i) > \lambda \ln y_i +(1-\lambda)\ln z_i $$ and using $$ D_{KL}(X,Y)=\sum_{i=1}^{n}x_{i}\ln\frac{x_{i}}{y_{i}}=\sum_{i=1}^{n}\left(x_{i}\ln x_{i}-x_{i}\ln y_{i}\right) $$ and similarly for ##D_{KL}(X,Z) ## and ## D_{KL}(X,\lambda Y+(1-\lambda)Z) ## you get $$ D_{KL}(X,\lambda Y+(1-\lambda)Z)<\lambda D_{KL}(X,Y)+(1-\lambda)D_{KL}(X,Z) $$

Edit : corrected, thanks @stlukits, indeed the log is concave, not convex - don't know what I was thinking.
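As a rough numerical check of the last inequality (a sketch only, assuming strictly positive vectors; kl and the variable names are illustrative), one can sample arbitrary positive vectors and verify ##D_{KL}(X,\lambda Y+(1-\lambda)Z)\leq\lambda D_{KL}(X,Y)+(1-\lambda)D_{KL}(X,Z)## up to floating-point tolerance:

```python
import numpy as np

def kl(p, q):
    """D_KL(p, q) = sum_i p_i * ln(p_i / q_i) for strictly positive vectors."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
# Three arbitrary strictly positive vectors (not necessarily normalized).
X, Y, Z = (rng.uniform(0.1, 1.0, size=7) for _ in range(3))

for lam in rng.uniform(0.0, 1.0, size=20):
    lhs = kl(X, lam * Y + (1 - lam) * Z)
    rhs = lam * kl(X, Y) + (1 - lam) * kl(X, Z)
    assert lhs <= rhs + 1e-12, (lam, lhs, rhs)
print("convexity in the second argument holds for all sampled lambdas")
```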
 
Yes, good point. The natural logarithm is actually concave -- my bad -- so

$$\ln(\lambda{}x+(1-\lambda)y)\geq\lambda\ln{}x+(1-\lambda)\ln{}y$$

which, if wabbit were right, would give us the result I need. Following wabbit, however, I only get

$$D_{KL}(Z,\lambda{}X+(1-\lambda)Y)=\sum_{i=1}^{n}z_{i}\left(\ln{}z_{i}-\ln(\lambda{}x_{i}+(1-\lambda)y_{i})\right)\leq\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{x_{i}^{\lambda}y_{i}^{1-\lambda}}$$

but that is not smaller than or equal to

$$\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{\lambda{}x_{i}+(1-\lambda)y_{i}}=\lambda{}D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)$$

So we are close, but not quite there. Thank you, wabbit, for framing the question nicely -- is the Kullback-Leibler divergence convex if you hold the point from which you measure the divergence fixed, i.e.

$$D_{KL}(Z,\lambda{}X+(1-\lambda)Y)\stackrel{?}{\leq}\lambda{}D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)$$
 
Thanks for the correction about concavity - other than that, I don't see what the problem is: the inequality follows directly from the concavity, as mentioned above.
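To spell out that step (assuming, as the logarithms require, that all components are strictly positive): the upper bound obtained above from the concavity of the logarithm is itself exactly the convex combination of the two divergences, since

$$\lambda D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)=\sum_{i=1}^{n}z_{i}\left(\lambda\ln\frac{z_{i}}{x_{i}}+(1-\lambda)\ln\frac{z_{i}}{y_{i}}\right)=\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{x_{i}^{\lambda}y_{i}^{1-\lambda}},$$

so

$$D_{KL}(Z,\lambda{}X+(1-\lambda)Y)\leq\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{x_{i}^{\lambda}y_{i}^{1-\lambda}}=\lambda D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y).$$

In the isosceles case ##D_{KL}(Z,X)=D_{KL}(Z,Y)##, the right-hand side is just ##D_{KL}(Z,X)##, which is the inequality asked for in the original post; it is strict whenever ##0<\lambda<1## and ##x_{i}\neq y_{i}## for some ##i## with ##z_{i}>0##, because ##\ln## is strictly concave.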
 
