Isosceles triangle in information theory

In summary, the conversation compares, in Euclidean geometry, the legs of an isosceles triangle with a segment drawn from the apex to a point on the base, and asks whether the same relation holds when distance is replaced by the Kullback-Leibler divergence from information theory. The question reduces to whether the divergence is convex in its second argument when the point from which it is measured is held fixed. The answer is yes, and it follows from the concavity of the natural logarithm.
  • #1
noowutah
In Euclidean geometry (presumably also in non-Euclidean geometry), any segment from the apex of an isosceles triangle to an interior point of its base, such as the part of the vertex-angle bisector that lies inside the triangle, is shorter than the legs of the triangle. Let ABC be an isosceles triangle with AB being the base. Then, for [itex]0<\lambda<1[/itex],

[tex]d(C,\lambda{}A+(1-\lambda)B)<d(C,A)=d(C,B)[/tex]

[itex]d[/itex] is the Euclidean distance measure (taking [itex]a_{i}[/itex] and [itex]b_{i}[/itex] to be the coordinates of A and B in [itex]\mathbb{R}^{n}[/itex])

[tex]d(A,B)=\sqrt{\sum_{i=1}^{n}(a_{i}-b_{i})^{2}}[/tex]
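
As a quick sanity check of the Euclidean claim, here is a minimal Python sketch. The triangle coordinates and the helper name `point_on_base` are made up for illustration and are not from the original post; the check only samples a few values of λ, so it illustrates the claim rather than proving it.

```python
import math

def point_on_base(a, b, lam):
    """Return lam*a + (1 - lam)*b, coordinate by coordinate."""
    return tuple(lam * ai + (1 - lam) * bi for ai, bi in zip(a, b))

# Hypothetical isosceles triangle: base AB, apex C (values chosen for illustration).
A = (-1.0, 0.0)
B = (1.0, 0.0)
C = (0.0, 2.0)

# Both legs have the same Euclidean length.
leg = math.dist(C, A)
assert math.isclose(leg, math.dist(C, B))

# Sample interior points of the base and compare their distance to C with the leg.
lams = [k / 10 for k in range(1, 10)]
shorter = all(math.dist(C, point_on_base(A, B, lam)) < leg for lam in lams)
print(shorter)
```

`math.dist` needs Python 3.8 or later; on older versions the square root of the summed squared differences can be written out by hand.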

I want to show that this is also true if our notion of distance is the Kullback-Leibler divergence from information theory. So, let A, B, C be points in n-dimensional space with

[tex]D_{KL}(C,A)=D_{KL}(C,B)[/tex]

where

[tex]D_{KL}(X,Y)=\sum_{i=1}^{n}x_{i}\ln\frac{x_{i}}{y_{i}}[/tex]

Let F be a point between A and B in the sense that

[tex]F=\lambda{}A+(1-\lambda)B,0<\lambda<1[/tex]

Then I want to prove that

[tex]D_{KL}(C,F)<D_{KL}(C,A)=D_{KL}(C,B)[/tex]
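
To make the conjecture concrete, here is a small Python sketch of one "isosceles" configuration under the Kullback-Leibler divergence. The vectors are invented for illustration: C is taken to be uniform and B is a permutation of A, which is just one convenient way to get D_KL(C,A) = D_KL(C,B). All entries are assumed strictly positive so that the logarithms are defined.

```python
import math

def kl(x, y):
    """D_KL(X, Y) = sum_i x_i * ln(x_i / y_i); assumes all entries are positive."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

def mix(a, b, lam):
    """Return F = lam*A + (1 - lam)*B, coordinate by coordinate."""
    return [lam * ai + (1 - lam) * bi for ai, bi in zip(a, b)]

# Illustrative points (not from the original post).
C = [1 / 3, 1 / 3, 1 / 3]
A = [0.5, 0.3, 0.2]
B = [0.2, 0.3, 0.5]   # a permutation of A, so both "legs" agree for uniform C

leg = kl(C, A)
assert math.isclose(leg, kl(C, B))

# Compare D_KL(C, F) with the common leg value for a few interior points F.
lams = (0.1, 0.25, 0.5, 0.75, 0.9)
results = [kl(C, mix(A, B, lam)) < leg for lam in lams]
print(results)
```

This only checks a handful of mixing weights for one configuration, so it is evidence for the conjecture and not a proof.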

Two points that may be helpful are (1) the Gibbs inequality ([itex]\sum_{i}p_{i}\ln{}p_{i}\geq\sum_{i}p_{i}\ln{}q_{i}[/itex] for probability distributions p and q); and (2) the convexity of the logarithm ([itex]\ln(\lambda{}x+(1-\lambda)y)<\lambda\ln{}x+(1-\lambda)\ln{}y[/itex]), but I haven't been able to get anywhere. I'd love some help.
 
  • #2
Actually I think the opposite is true, i.e. by the concavity of the logarithm $$ \ln( \lambda y_i +(1-\lambda)z_i) > \lambda \ln y_i +(1-\lambda)\ln z_i $$ (for ##y_i \neq z_i##) and using $$ D_{KL}(X,Y)=\sum_{i=1}^{n}x_{i}\ln\frac{x_{i}}{y_{i}}=\sum_{i=1}^{n}\left(x_{i}\ln x_{i}-x_{i}\ln y_{i}\right) $$ and similarly for ##D_{KL}(X,Z) ## and ## D_{KL}(X,\lambda Y+(1-\lambda)Z) ## you get $$ D_{KL}(X,\lambda Y+(1-\lambda)Z)<\lambda D_{KL}(X,Y)+(1-\lambda)D_{KL}(X,Z) $$

Edit: corrected, thanks @stlukits, indeed the log is concave, not convex. I don't know what I was thinking.
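
The convexity inequality in this post can also be spot-checked numerically. The following Python sketch is only an illustration: the helper names `random_distribution` and `convexity_gap`, the dimension, the seed and the tolerance are all arbitrary choices, and the vectors are random probability distributions with strictly positive entries.

```python
import math
import random

def kl(x, y):
    """D_KL(X, Y) = sum_i x_i * ln(x_i / y_i); assumes all entries are positive."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

def random_distribution(n, rng):
    """Random probability vector of length n with strictly positive entries."""
    weights = [rng.random() + 0.01 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def convexity_gap(x, y, z, lam):
    """lam*D(X,Y) + (1-lam)*D(X,Z) - D(X, lam*Y + (1-lam)*Z); should be >= 0."""
    mixed = [lam * yi + (1 - lam) * zi for yi, zi in zip(y, z)]
    return lam * kl(x, y) + (1 - lam) * kl(x, z) - kl(x, mixed)

rng = random.Random(0)  # fixed seed so the run is reproducible
gaps = []
for _ in range(1000):
    x, y, z = (random_distribution(5, rng) for _ in range(3))
    lam = rng.uniform(0.01, 0.99)
    gaps.append(convexity_gap(x, y, z, lam))

# Allow a tiny negative tolerance for floating-point rounding.
print(min(gaps) >= -1e-12)
```

A passing check over random samples is consistent with the inequality but does not replace the argument from concavity.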
 
  • #3
Yes, good point. The natural logarithm is actually concave -- my bad -- so

[tex]\ln(\lambda{}x+(1-\lambda)y)\geq\lambda\ln{}x+(1-\lambda)\ln{}y[/tex]

which, if wabbit were right, would give us the result I need. Following wabbit, however, I only get

[tex]D_{KL}(Z,\lambda{}X+(1-\lambda)Y)=\sum_{i=1}^{n}z_{i}(\ln{}z_{i}-\ln(\lambda{}x_{i}+(1-\lambda)y_{i}))\leq\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{x_{i}^{\lambda}y_{i}^{1-\lambda}}[/tex]

but that's not less than or equal to

[tex]\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{\lambda{}x_{i}+(1-\lambda)y_{i}}=\lambda{}D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)[/tex]

So we are close, but not quite there. Thank you, wabbit, for framing the question nicely -- is the Kullback-Leibler divergence convex if you hold the point from which you measure the divergence fixed, i.e.

[tex]D_{KL}(Z,\lambda{}X+(1-\lambda)Y)\stackrel{\mbox{?}}{\leq}\lambda{}D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)[/tex]
 
  • #4
Thanks for the correction about concavity. Other than that, I don't see what the problem is: the inequality follows directly from the concavity, as mentioned above.
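
Spelling that step out, as a sketch that assumes all coordinates are strictly positive so the logarithms are defined: writing [itex]\ln{}z_{i}=\lambda\ln{}z_{i}+(1-\lambda)\ln{}z_{i}[/itex], the upper bound obtained in post #3 is exactly the convex combination of the two divergences,

[tex]\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{x_{i}^{\lambda}y_{i}^{1-\lambda}}=\lambda\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{x_{i}}+(1-\lambda)\sum_{i=1}^{n}z_{i}\ln\frac{z_{i}}{y_{i}}=\lambda{}D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)[/tex]

so that

[tex]D_{KL}(Z,\lambda{}X+(1-\lambda)Y)\leq\lambda{}D_{KL}(Z,X)+(1-\lambda)D_{KL}(Z,Y)[/tex]

with strict inequality when [itex]x_{i}\neq{}y_{i}[/itex] for some i with [itex]z_{i}>0[/itex]. If in addition [itex]D_{KL}(Z,X)=D_{KL}(Z,Y)[/itex], the right-hand side equals [itex]D_{KL}(Z,X)[/itex], which is the isosceles statement from post #1.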
 

