Information Geometry: Is there any there there?

In summary, John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?
  • #1
Frabjous
Gold Member
1,600
1,927
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271
 
Last edited:
  • Like
Likes Jarvis323
  • #2
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271
It looks legit to me. The author claims to be with Sony (Sony Computer Science Laboratoties[sic]). It does appear to be a blend of differential geometry and statistics, as you said. Later in the paper the author mentions applications of information geometry such as machine learning, signal processing, and game theory.
 
  • Like
Likes Jarvis323, Frabjous and fresh_42
  • #3
Mark44 said:
It looks legit to me.
I have read the Wikipedia entry on it and its link to statistical manifolds. It sounded interesting from a theoretical point of view. I am curious whether there are some actual applications, or at least shortcuts that pure probability theory couldn't provide.
 
  • Like
Likes Jarvis323 and Frabjous
  • #4
fresh_42 said:
I have read the Wikipedia entry on it and its link to statistical manifolds. It sounded interesting from a theoretical point of view. I am curious whether there are some actual applications, or at least shortcuts that pure probability theory couldn't provide.

The intent of my question was more along these lines.
 
  • #5
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?
 
  • Like
Likes Jarvis323 and Frabjous
  • #6
Auto-Didact said:
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?

Thanks, I’ll check him out.
 
  • #7
This seems to be related to a series of lectures given by Leonard Susskind at the Institute for Advanced Study.

The following is the first of the 3 lectures.

The subject of these lectures is quantum complexity and gravity. Susskind prepared them after reading several research papers on complexity theory. In them, he describes an approach to unifying quantum mechanics with gravity: complexity and gravity are deeply related.

The following are several ideas from the first of 3 lectures that I feel are interesting:

Quantifying complexity

Start with K qubits; the space of states is [itex]2^K[/itex]-dimensional.

What is the space of states of the K-qubit system? The manifold of normalized states, after modding out by the U(1) phase, is some CP(N) with [itex]N = 2^K[/itex].

CP(N) is (roughly) a sphere, regularized by epsilon balls, i.e. balls of radius ε. Each ε-ball represents a discrete state. The number of ε-balls is the number of states, #S, where #S grows rapidly with K and depends only weakly on 1/ε.
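As a rough back-of-the-envelope version of that last claim (my own estimate, not a formula from the lecture): CP(N) with [itex]N \approx 2^K[/itex] has real dimension of order [itex]2\cdot 2^K[/itex], and covering a compact space of that dimension with ε-balls takes on the order of [itex](1/\varepsilon)^{2\cdot 2^K}[/itex] of them, so

[tex]\log \#S \;\sim\; 2\cdot 2^{K}\,\log\frac{1}{\varepsilon},[/tex]

i.e. for fixed ε the count explodes doubly exponentially in K, whereas for fixed K it grows only polynomially (albeit with a huge exponent) in 1/ε — which is the sense in which #S is dominated by K rather than by ε.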


Distance between states


The distance between states A and B, [itex]d(A,B) = \arccos|\langle A|B\rangle|[/itex], is the standard inner-product metric.

The maximum is d = π/2, the largest possible distance between any two states. This metric does not capture the qualitative difference between states.
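To see the inner-product distance in action, here is a tiny numerical sketch (my own illustration, not part of the lecture): it draws two random K-qubit states and evaluates [itex]d(A,B)=\arccos|\langle A|B\rangle|[/itex]; for random states in a large Hilbert space the result comes out very close to the maximal value π/2.

[CODE=python]
# Inner-product distance d(A,B) = arccos|<A|B>| between two random K-qubit states.
import numpy as np

rng = np.random.default_rng(0)
K = 10
dim = 2**K  # Hilbert-space dimension

def random_state(d):
    """Haar-random pure state: normalized complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

A, B = random_state(dim), random_state(dim)
d_AB = np.arccos(np.abs(np.vdot(A, B)))  # np.vdot conjugates the first argument
print(d_AB, np.pi / 2)                   # typically very close to pi/2
[/CODE]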


Relative complexity


A metric that quantifies the number of small steps needed to get from one state to another is the relative complexity.

We use a quantum circuit for this description. The gates of the circuit are unitary operators, and there is a universal collection of gates that can take any state to any other. The minimum number n of such gates needed to go from A to B is C(A,B), the relative complexity.

Relative complexity is a metrical notion of distance on CP(N): C(A,B) is the length of a geodesic, the shortest path between |A⟩ and |B⟩.

The conclusion is that even maximally entangled states are at an inner-product distance of at most π/2. The unitary space is small in the inner-product metric, but the relative complexity can be large.

The properties of relative complexity are analogous to the classical notion of distance:

1. C(u,v) ≥ 0 (the number of gates is ≥ 0).

2. C(u,v) = 0 iff u = v.

3. C(u,v) = C(v,u); C is a symmetric function of u and v.

4. C(u,v) ≤ C(u,w) + C(w,v) (triangle inequality).

C is right invariant, but not left invariant: complexity defines a geometry on the space SU(2^K) that is right invariant.
 
  • Like
Likes Frabjous and Jarvis323
  • #8
Auto-Didact said:
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?

I would point out that most people (at least in computational neuroscience and machine learning) view information geometry as completely useless in practice. It's a very beautiful formalisation of information theory and Fisher information etc. in terms of differential geometry, but one that has produced no practical results that weren't already known. I'd be very happy to be proven wrong on that though.

As a side note, information geometry was initially developed by a number of Japanese scientists, especially this guy https://en.wikipedia.org/wiki/Shun'ichi_Amari. For a long time it was only published in Japanese-language books and articles. And by the way, Amari was a computational neuroscientist, so it kinda developed out of that field in a way.
 
  • Like
Likes jbergman and Frabjous
  • #9
This paper may not be all that related, but you might find it interesting.

OBSERVED UNIVERSALITY OF PHASE TRANSITIONS IN HIGH-DIMENSIONAL GEOMETRY, WITH IMPLICATIONS FOR MODERN DATA ANALYSIS AND SIGNAL PROCESSING

We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers increases beyond a threshold. In combinatorial geometry these transitions appear as abrupt changes in the properties of face counts of convex polytopes when the dimensions are varied. The thresholds in these very different problems appear in the same critical locations after appropriate calibration of variables.
https://people.maths.ox.ac.uk/tanner/papers/DoTa_Universality.pdf
 
Last edited:
  • Like
  • Informative
Likes atyy, madness and Frabjous
  • #10
Jarvis323 said:
This paper may not be all that related, but you might find it interesting. https://people.maths.ox.ac.uk/tanner/papers/DoTa_Universality.pdf

This reminds me of another result that I found interesting, buried in the supplemental material of this paper (https://www.nature.com/articles/s41586-020-2130-2) (equation 23 in supplemental). What they show is that the eigenvectors of the sample covariance matrix undergo a phase transition as the number of samples and number of dimensions are varied.
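For what it's worth, here is a rough numerical illustration of that kind of transition (my own toy simulation of a spiked covariance model, not the calculation from the paper): the overlap between the top sample-covariance eigenvector and the true "spike" direction rises sharply once the number of samples is large enough relative to the dimension.

[CODE=python]
# Spiked covariance toy model: identity noise plus one planted direction.
import numpy as np

rng = np.random.default_rng(1)
d = 200                                   # dimension
spike = np.zeros(d)
spike[0] = 1.0                            # planted "signal" direction
cov = np.eye(d) + np.outer(spike, spike)  # top eigenvalue 2, noise bulk at 1

for n in (50, 200, 800, 3200):            # number of samples
    x = rng.multivariate_normal(np.zeros(d), cov, size=n)
    sample_cov = x.T @ x / n
    evals, evecs = np.linalg.eigh(sample_cov)
    top = evecs[:, -1]                    # eigenvector of the largest eigenvalue
    print(f"n={n:5d}  overlap with spike: {abs(top @ spike):.3f}")
[/CODE]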
 
  • Informative
  • Like
Likes atyy and Jarvis323
  • #11
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

The book 'Information Geometry' by Ay et al. expands on some applications (see image attached).

There is also a fascinating quantum version of information geometry. As a teaser, consider this. The state space of a quantum system with finitely many states is [itex]\mathbb{C}P^n = \mathbb{S}^{2n+1}/\mathbb{S}^1[/itex]: the vectors of unit length (total probability = 1) in the Hilbert space modulo [itex]\mathbb{S}^1[/itex] (states that differ only by a phase factor are physically indistinguishable). On [itex]\mathbb{C}P^n[/itex] there is a natural metric that comes from the round metric on [itex]\mathbb{S}^{2n+1}[/itex], the so-called Fubini-Study metric.

Geometrically, [itex]\mathbb{C}P^n[/itex] is relatively simple to describe; it is the product of the standard [itex]n[/itex]-dimensional probability simplex [itex]\Delta_n[/itex] with an [itex]n[/itex]-torus, with the edges identified in some complicated manner. But the fact remains that on an open dense subset, [itex]\mathbb{C}P^n[/itex] is just the product [itex]\mathrm{int}(\Delta_n)\times \mathbb{T}^n[/itex], and the Fubini-Study metric with respect to this factorization is the product of the Fisher metric (!) on [itex]\Delta_n[/itex] times a metric on [itex]\mathbb{T}^n[/itex] that varies with where we are on [itex]\Delta_n[/itex].
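To spell out that last sentence explicitly (my own transcription of the standard formula, so worth checking against the book): writing a state as [itex]|\psi\rangle = \sum_j \sqrt{p_j}\, e^{i\phi_j} |j\rangle[/itex], the Fubini-Study line element becomes

[tex]ds^2_{FS} \;=\; \frac{1}{4}\sum_j \frac{dp_j^2}{p_j} \;+\; \sum_j p_j\, d\phi_j^2 \;-\; \Big(\sum_j p_j\, d\phi_j\Big)^{\!2},[/tex]

where the first term is exactly the Fisher-Rao metric on the probability simplex and the remaining terms are the [itex]p[/itex]-dependent metric on the torus of phases.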

Source: 'Geometry of quantum states: An Introduction to Quantum Entanglement'. Bengtsson et al. (2017). Also Gromov's meditations on Entropy: https://www.ihes.fr/~gromov/expository/579/
 

Attachments

  • ch6.png (76.3 KB)
  • Like
Likes WWGD and Frabjous
  • #12
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

In machine learning they try to minimize functions on parameter space. They do this by the method of gradient descent. Information geometry comes in and says

"You guys are doing gradient descent but you're wrongly assuming that the parameter space is flat. It is not, it carries a natural nonplanar shape given by (the pullback of) the Fisher metric and when you take this into account your gradient descent method works better."

Unfortunately computing the Fisher metric is computationally too expensive for the large parameter spaces involved in Neural networks so they largely ignore it lol.

Source:
https://towardsdatascience.com/natural-gradient-ce454b3dcdfa
https://www.mitpressjournals.org/doi/10.1162/089976698300017746?mobileUi=0&
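To make the natural-gradient idea concrete, here is a minimal toy sketch (my own illustration, not taken from the sources above): fitting the mean and standard deviation of a Gaussian, where the Fisher matrix is known in closed form. Starting from a badly scaled initial guess, the plain gradient barely moves, while the natural gradient, which rescales the step by the inverse Fisher matrix, converges comfortably.

[CODE=python]
# Plain vs. natural gradient ascent on the log-likelihood of N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)   # true parameters: mu=2, sigma=0.5

def grad_loglik(mu, sigma, x):
    """Average gradient of the Gaussian log-likelihood w.r.t. (mu, sigma)."""
    d_mu = np.mean((x - mu) / sigma**2)
    d_sigma = np.mean((x - mu) ** 2 / sigma**3 - 1.0 / sigma)
    return np.array([d_mu, d_sigma])

def fisher(mu, sigma):
    """Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return np.array([[1.0 / sigma**2, 0.0],
                     [0.0, 2.0 / sigma**2]])

theta_plain = np.array([0.0, 20.0])   # deliberately badly scaled starting point
theta_nat = np.array([0.0, 20.0])
lr = 0.1

for _ in range(200):
    theta_plain = theta_plain + lr * grad_loglik(*theta_plain, data)
    g = grad_loglik(*theta_nat, data)
    theta_nat = theta_nat + lr * np.linalg.solve(fisher(*theta_nat), g)

print("plain gradient:  ", theta_plain)   # still far from (2.0, 0.5)
print("natural gradient:", theta_nat)     # close to (2.0, 0.5)
[/CODE]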
 
  • Like
Likes WWGD and Frabjous
  • #13
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

The book 'Information Geometry and Its Applications' by Amari (the main pioneer of the subject) has a 130-page section on applications. There is a freely downloadable table of contents here:
https://www.springer.com/gp/book/9784431559771
 
  • Like
Likes WWGD and Frabjous
  • #14
madness said:
I would point out that most people (at least in computational neuroscience and machine learning) view information geometry as completely useless in practice. It's a very beautiful formalisation of information theory and Fisher information etc. in terms of differential geometry, but one that has produced no practical results that weren't already known. I'd be very happy to be proven wrong on that though.

As a side note, information geometry was initially developed by a number of Japanese scientists, especially this guy https://en.wikipedia.org/wiki/Shun'ichi_Amari. For a long time it was only published in Japanese-language books and articles. And by the way, Amari was a computational neuroscientist, so it kinda developed out of that field in a way.
I am no expert on the literature here but that is what I've also heard from Differential Geometers. In other words, it mostly involves "dressing up" things with Differential Geometry without really providing any new results or insights.
 
  • Like
Likes Frabjous and madness

1. What is information geometry?

Information geometry is a field of mathematics that studies the geometric structures and properties of probability distributions. It combines concepts from differential geometry, information theory, and statistics to provide a framework for understanding and analyzing complex data sets.

2. How is information geometry used?

Information geometry has applications in various fields such as machine learning, signal processing, and statistical physics. It can be used to analyze and model data, as well as to design efficient algorithms for data processing and inference.

3. What are the key concepts in information geometry?

The key concepts in information geometry include manifolds, metrics, connections, and divergences. Manifolds are geometric spaces that represent the set of possible probability distributions. Metrics measure the distance between distributions, while connections describe how distributions change as parameters are varied. Divergences quantify the difference between two distributions.
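As a concrete illustration of two of those concepts (standard textbook formulas, added for reference rather than taken from this thread): for a parametric family [itex]p(x\mid\theta)[/itex] the Fisher metric and the Kullback-Leibler divergence are

[tex]g_{ij}(\theta) = \mathbb{E}_{p(x\mid\theta)}\!\left[\frac{\partial \log p(x\mid\theta)}{\partial \theta^i}\,\frac{\partial \log p(x\mid\theta)}{\partial \theta^j}\right], \qquad D_{\mathrm{KL}}(p\,\|\,q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx,[/tex]

and the two are linked by the local expansion [itex]D_{\mathrm{KL}}(p_\theta\,\|\,p_{\theta+d\theta}) \approx \tfrac{1}{2}\, g_{ij}(\theta)\, d\theta^i d\theta^j[/itex].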

4. What are the benefits of using information geometry?

Information geometry provides a powerful framework for understanding and analyzing complex data sets. It allows for a more intuitive and geometric interpretation of statistical concepts, and can lead to more efficient and accurate algorithms for data analysis. It also provides a bridge between different fields of mathematics, allowing for interdisciplinary research and applications.

5. Are there any limitations to information geometry?

Like any mathematical framework, information geometry has its limitations. It may not be suitable for all types of data and may not always provide the most accurate or efficient solutions. Additionally, it requires a strong understanding of mathematics and statistics to apply effectively. However, it continues to be a valuable tool for analyzing and modeling complex data sets.
