Graduate Information Geometry: Is there any there there?

SUMMARY

The discussion centers on the concept of Information Geometry, which integrates differential geometry and statistics, with applications in machine learning, signal processing, and game theory. Key contributors include John Baez and Shun'ichi Amari, who have explored its implications in theoretical biophysics and computational neuroscience. Despite its theoretical appeal, many practitioners in machine learning view Information Geometry as lacking practical utility, primarily due to the computational challenges of applying the Fisher metric in large parameter spaces. The conversation highlights both the potential and skepticism surrounding the field.

PREREQUISITES
  • Understanding of differential geometry principles
  • Familiarity with statistical manifolds
  • Knowledge of machine learning techniques, particularly gradient descent
  • Basic concepts of quantum mechanics and state spaces
NEXT STEPS
  • Research "Fisher metric" and its applications in machine learning
  • Explore John Baez's contributions to Information Geometry
  • Study Shun'ichi Amari's work on Information Geometry and its applications
  • Investigate the relationship between Information Geometry and quantum mechanics
USEFUL FOR

Researchers, mathematicians, and practitioners in machine learning, theoretical biophysics, and computational neuroscience who are interested in the intersection of geometry and statistical analysis.

Frabjous
Gold Member
Recently came across this concept. It looks like a combination of differential geometry and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271
 
  • Like
Likes Jarvis323
caz said:
Recently came across this concept. It looks like a combination of differential geometry and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271
It looks legit to me. The author claims to be with Sony (Sony Computer Science Laboratoties [sic]). It does appear to be a blend of differential geometry and statistics, as you said. Later in the paper the author mentions machine learning, signal processing, and game theory as applications of information geometry.
 
  • Like
Likes Jarvis323, Frabjous and fresh_42
Mark44 said:
It looks legit to me.
I have read the Wikipedia entry on it and its link to statistical manifolds. It sounded interesting from a theoretical point of view. I am curious whether there are actual applications, or at least shortcuts, that pure stochastics couldn't provide.
 
  • Like
Likes Jarvis323 and Frabjous
fresh_42 said:
I have read the Wikipedia entry on it and its link to statistical manifolds. It sounded interesting from a theoretical point of view. I am curious whether there are actual applications, or at least shortcuts, that pure stochastics couldn't provide.

The intent of my question was more along these lines.
 
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?
 
  • Like
Likes Jarvis323 and Frabjous
Auto-Didact said:
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?

Thanks, I’ll check him out.
 
This seems to be related to a series of lectures given by Leonard Susskind at the Institute for Advanced Study.

The following is the first of 3 lectures.

The subject of these lectures is quantum complexity and gravity. Susskind prepared these lectures after reading several research papers on complexity theory. In these lectures, he describes an approach for unifying quantum mechanics with gravity: complexity and gravity are deeply related.

The following are several ideas from the first of the three lectures that I found interesting:

Quantifying complexity

Start with K qubits; the space of states is 2^K-dimensional.

What is the space of states of the K-qubit system? The manifold of normalized states, after modding out the U(1) phase, is some CP(N) where N = 2^K.

CP(N) is regularized by epsilon balls, balls of radius ε; each ε ball represents one discrete state. The number of epsilon balls is the number of states #S, where #S grows rapidly with K and depends only weakly on 1/ε.


Distance between states


The distance between states A and B is d_AB = arccos |⟨A|B⟩|, the standard inner-product metric.

The maximum distance between any two states is d = π/2. This metric does not capture the qualitative difference between states.
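
To make the definition concrete, here is a minimal numpy sketch (my own illustration, not from the lectures) of this inner-product distance for random states:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(dim):
    """Draw a random normalized state vector of the given dimension."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def inner_product_distance(a, b):
    """d(A, B) = arccos |<A|B>|, the standard metric on the space of states."""
    overlap = abs(np.vdot(a, b))          # vdot conjugates its first argument
    return np.arccos(np.clip(overlap, 0.0, 1.0))

K = 4                                     # number of qubits
dim = 2**K                                # Hilbert-space dimension
a, b = random_state(dim), random_state(dim)
print(inner_product_distance(a, b))       # never exceeds pi/2 ~ 1.5708
```

For large K a typical pair of random states has overlap near 0, so the distance saturates near π/2, which is exactly why this metric cannot distinguish "nearby but hard to reach" from "far".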


Relative complexity


A metric that quantifies the number of small steps is the relative complexity.

We use a quantum circuit for this description. The gates are unitary operators, and a circuit built from them takes state A to state B. There is a universal collection of gates that can take any one state to any other. The minimum number n of such gates is C(A,B), the relative complexity.

The relative complexity is a metrical notion of distance in CP(N): C(A,B) is the length of a geodesic, the shortest path between |A⟩ and |B⟩.

The conclusion is that maximally entangled states will be at an inner-product distance of at most π/2. The unitary space is small in the inner-product metric, but the relative complexity can be large.

The properties of relative complexity are analogous to the classical notion of distance:

1. C ≥ 0 (the number of gates is ≥ 0).

2. C(u,v) = 0 iff u = v.

3. C(u,v) = C(v,u): C is a symmetric function of u and v.

4. C(u,v) ≤ C(u,w) + C(w,v): the triangle inequality.

C is right invariant, but not left invariant. Complexity has a geometry that is right invariant on the space SU(2^K).
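
Here is a toy sketch of relative complexity (my own construction, not from the lectures): a breadth-first search over circuits built from the universal single-qubit gate set {H, T}, with ε balls implemented by rounding. For K qubits this brute force is hopeless, which is the point: complexity can be huge while the inner-product distance stays pinned at π/2.

```python
import numpy as np
from collections import deque

# Universal single-qubit gate set {H, T}: a toy stand-in for a K-qubit circuit.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])
GATES = [H, T]

def ball(state, decimals=3):
    """Coarse-grain a state into an 'epsilon ball' by fixing the global phase and rounding."""
    phase = state[np.argmax(np.abs(state))]
    state = state * np.conj(phase) / abs(phase)
    return tuple(np.round(state, decimals))

def relative_complexity(a, b, eps=0.05, max_depth=20):
    """Minimum number of gates taking |a> to within distance eps of |b> (BFS)."""
    queue, seen = deque([(a, 0)]), {ball(a)}
    while queue:
        state, n = queue.popleft()
        if np.arccos(min(abs(np.vdot(state, b)), 1.0)) < eps:
            return n
        if n < max_depth:
            for g in GATES:
                nxt = g @ state
                k = ball(nxt)
                if k not in seen:
                    seen.add(k)
                    queue.append((nxt, n + 1))
    return None   # not reachable within max_depth at this resolution

zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
print(relative_complexity(zero, plus))   # 1: a single Hadamard gate suffices
```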
 
  • Like
Likes Frabjous and Jarvis323
Auto-Didact said:
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?

I would point out that most people (at least in computational neuroscience and machine learning) view information geometry as completely useless in practice. It's a very beautiful formalisation of information theory, Fisher information, etc. in terms of differential geometry, but one that has produced no practical results that weren't already known. I'd be very happy to be proven wrong on that, though.

As a side note, information geometry was initially developed by a number of Japanese scientists, especially this guy: https://en.wikipedia.org/wiki/Shun'ichi_Amari. For a long time it was only published in Japanese-language books and articles. And by the way, Amari was a computational neuroscientist, so it kinda developed out of that field in a way.
 
  • Like
Likes jbergman and Frabjous
This paper may not be all that related, but you might find it interesting.

Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing

We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers increases beyond a threshold. In combinatorial geometry these transitions appear as abrupt changes in the properties of face counts of convex polytopes when the dimensions are varied. The thresholds in these very different problems appear in the same critical locations after appropriate calibration of variables.
https://people.maths.ox.ac.uk/tanner/papers/DoTa_Universality.pdf
 
  • Like
  • Informative
Likes atyy, madness and Frabjous
  • #10
Jarvis323 said:
This paper may not be all that related, but you might find it interesting.
https://people.maths.ox.ac.uk/tanner/papers/DoTa_Universality.pdf

This reminds me of another result that I found interesting, buried in the supplemental material of this paper (https://www.nature.com/articles/s41586-020-2130-2) (equation 23 in the supplemental material). What they show is that the eigenvectors of the sample covariance matrix undergo a phase transition as the number of samples and the number of dimensions are varied.
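
For anyone who wants to see the effect without digging into the supplement, here is a hedged numpy sketch of that kind of transition using a spiked-covariance model (the model choice and parameter names are mine, not the paper's). The overlap between the top sample eigenvector and the planted direction jumps from near 0 to order 1 as the sample-to-dimension ratio crosses a threshold:

```python
import numpy as np

rng = np.random.default_rng(1)

def top_eigvec_overlap(dim, n_samples, spike=2.0):
    """Overlap of the leading sample-covariance eigenvector with a planted spike."""
    v = np.zeros(dim)
    v[0] = 1.0                                   # true spike direction
    cov = np.eye(dim) + spike * np.outer(v, v)   # spiked population covariance
    X = rng.multivariate_normal(np.zeros(dim), cov, size=n_samples)
    sample_cov = X.T @ X / n_samples
    _, eigvecs = np.linalg.eigh(sample_cov)      # eigenvalues in ascending order
    return abs(eigvecs[:, -1] @ v)               # top eigenvector vs. spike

dim = 200
for ratio in [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]:
    print(ratio, top_eigvec_overlap(dim, int(ratio * dim)))
```

With spike = 2 the transition should appear around n/dim ≈ 1/spike² = 0.25; below that, the top eigenvector carries essentially no information about the true direction.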
 
  • Informative
  • Like
Likes atyy and Jarvis323
  • #11
caz said:
Recently came across this concept. It looks like a combination of differential geometry and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

The book 'Information Geometry' by Ay et al. expands on some applications (see image attached).

There is also a fascinating quantum version of information geometry. As a teaser, consider this. The state space of a quantum system with finitely many states is $\mathbb{C}P^n = \mathbb{S}^{2n+1}/\mathbb{S}^1$: the vectors of unit length (total probability = 1) in the Hilbert space modulo $\mathbb{S}^1$ (states that differ only by a phase factor are physically indistinguishable). On $\mathbb{C}P^n$ there is a natural metric that comes from the round metric on $\mathbb{S}^{2n+1}$, the so-called Fubini-Study metric. Geometrically, $\mathbb{C}P^n$ is relatively simple to describe: it is the product of the standard $n$-dimensional probability simplex $\Delta_n$ with an $n$-torus, with the edges identified in some complicated manner. But the fact remains that on an open dense subset, $\mathbb{C}P^n$ is just the product $\mathrm{int}(\Delta_n)\times \mathbb{T}^n$, and the Fubini-Study metric with respect to this factorization is the product of the Fisher metric (!) on $\Delta_n$ with a metric on $\mathbb{T}^n$ that varies with where we are on $\Delta_n$.

Source: 'Geometry of quantum states: An Introduction to Quantum Entanglement'. Bengtsson et al. (2017). Also Gromov's meditations on Entropy: https://www.ihes.fr/~gromov/expository/579/
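
A quick numerical illustration of the factorization's key ingredient (my own sketch, using the standard closed form): under $p \mapsto 2\sqrt{p}$ the simplex with the Fisher metric embeds isometrically in the radius-2 sphere, so Fisher-Rao distances are just great-circle arc lengths.

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao (Fisher metric geodesic) distance on the probability simplex."""
    bc = np.sum(np.sqrt(p * q))                  # Bhattacharyya coefficient
    return 2 * np.arccos(np.clip(bc, 0.0, 1.0))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.6, 0.3])

# The same number, computed as an arc length on the sphere of radius 2:
x, y = 2 * np.sqrt(p), 2 * np.sqrt(q)            # |x| = |y| = 2 since p, q sum to 1
arc = 2 * np.arccos(np.dot(x, y) / 4)            # radius * angle between x and y
print(fisher_rao_distance(p, q), arc)            # identical
```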
 

Attachments

  • ch6.png
  • Like
Likes WWGD and Frabjous
  • #12
caz said:
Recently came across this concept. It looks like a combination of differential geometry and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

In machine learning one tries to minimize functions on a parameter space, usually by the method of gradient descent. Information geometry comes in and says:

"You guys are doing gradient descent, but you're wrongly assuming that the parameter space is flat. It is not; it carries a natural non-flat geometry given by (the pullback of) the Fisher metric, and when you take this into account your gradient descent method works better."

Unfortunately, computing the Fisher metric is computationally too expensive for the large parameter spaces involved in neural networks, so they largely ignore it lol.
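
To make that concrete, here is a minimal sketch of a natural-gradient step on a toy problem where the Fisher matrix is cheap to write down (fitting a Gaussian's mean and log-standard-deviation by maximum likelihood; nothing here is from the linked posts):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=0.1, size=1000)    # target: mu = 3, sigma = 0.1

def grad_nll(mu, log_sigma):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    sigma2 = np.exp(2 * log_sigma)
    d_mu = (mu - data.mean()) / sigma2
    d_ls = 1.0 - ((data - mu) ** 2).mean() / sigma2
    return np.array([d_mu, d_ls])

def fisher(mu, log_sigma):
    """Fisher information of N(mu, sigma^2) in (mu, log_sigma) coordinates."""
    return np.diag([np.exp(-2 * log_sigma), 2.0])

theta = np.array([0.0, 0.0])                        # start at mu = 0, sigma = 1
for _ in range(200):
    g = grad_nll(*theta)
    theta -= 0.1 * np.linalg.solve(fisher(*theta), g)   # natural-gradient step
print(theta[0], np.exp(theta[1]))                   # -> approx 3.0 and 0.1
```

The step −ηF⁻¹∇L is invariant under reparametrization of the model, which is the selling point; the catch mentioned above is that for a network with millions of parameters, F is a millions-by-millions matrix.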

Source:
https://towardsdatascience.com/natural-gradient-ce454b3dcdfa
https://www.mitpressjournals.org/doi/10.1162/089976698300017746?mobileUi=0&
 
  • Like
Likes WWGD and Frabjous
  • #13
caz said:
Recently came across this concept. It looks like a combination of differential geometry and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

The book 'Information Geometry and Its Applications' by Amari (the main pioneer of the subject) has a 130-page section on applications. There is a freely downloadable table of contents here:
https://www.springer.com/gp/book/9784431559771
 
  • Like
Likes WWGD and Frabjous
  • #14
madness said:
I would point out that most people (at least in computational neuroscience and machine learning) view information geometry as completely useless in practice. It's a very beautiful formalisation of information theory, Fisher information, etc. in terms of differential geometry, but one that has produced no practical results that weren't already known. I'd be very happy to be proven wrong on that, though.

As a side note, information geometry was initially developed by a number of Japanese scientists, especially this guy: https://en.wikipedia.org/wiki/Shun'ichi_Amari. For a long time it was only published in Japanese-language books and articles. And by the way, Amari was a computational neuroscientist, so it kinda developed out of that field in a way.
I am no expert on the literature here, but that is what I've also heard from differential geometers. In other words, it mostly involves "dressing up" things with differential geometry without really providing any new results or insights.
 
  • Like
Likes Frabjous and madness
