Information Geometry: Is there any there there?

  • Context: Graduate
  • Thread starter: Frabjous
  • Tags: Geometry, Information

Discussion Overview

The discussion revolves around the concept of information geometry, exploring its theoretical foundations, potential applications, and connections to other fields such as machine learning, biophysics, and quantum mechanics. Participants express varying levels of familiarity and competence regarding the topic, leading to a range of viewpoints on its significance and utility.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • Some participants find information geometry to be an interesting blend of differential geometry and statistics, although they express uncertainty about their competence to evaluate its significance.
  • Others argue that while the theoretical aspects of information geometry are appealing, practical applications remain unclear, with some suggesting it has not produced novel results in computational neuroscience or machine learning.
  • A participant mentions John Baez's work, suggesting that information geometry could be crucial for addressing open questions in theoretical biophysics, including the nature of life.
  • Another participant references Leonard Susskind's lectures on quantum complexity and gravity, proposing that there are interesting connections between these topics and information geometry.
  • Some participants highlight the historical development of information geometry, noting its origins in the work of Japanese scientists, particularly Shun'ichi Amari.
  • A few participants share links to related papers, discussing phase transitions in high-dimensional geometry and their implications for data analysis and signal processing.

Areas of Agreement / Disagreement

Participants exhibit a mix of curiosity and skepticism regarding the practical applications of information geometry. While some express interest in its theoretical implications, others question its utility in real-world scenarios. No consensus is reached on its overall significance or effectiveness.

Contextual Notes

Participants note limitations in their understanding and the potential dependence on specific definitions and contexts. The discussion reflects a range of assumptions about the applicability of information geometry across different fields.

Frabjous
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271
 
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271
It looks legit to me. The author claims to be with Sony (Sony Computer Science Laboratoties [sic]). It does appear to be a blend of differential geometry and statistics, as you said. Later in the paper the author mentions machine learning, signal processing, and game theory as applications of information geometry.
 
Mark44 said:
It looks legit to me.
I have read the Wikipedia entry on it and its link to statistical manifolds. It sounded interesting from a theoretical point of view. I am curious whether there are actual applications, or at least shortcuts that pure stochastics could not provide.
 
fresh_42 said:
I have read the Wikipedia entry on it and its link to statistical manifolds. It sounded interesting from a theoretical point of view. I am curious whether there are actual applications, or at least shortcuts that pure stochastics could not provide.

The intent of my question was more along these lines.
 
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?
 
Auto-Didact said:
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?

Thanks, I’ll check him out.
 
This seems to be related to a series of lectures given by Leonard Susskind at the Institute for Advanced Study.

The following is the first of 3 lectures.

The subject of these lectures is quantum complexity and gravity. Susskind prepared these lectures after reading several research papers on complexity theory. In them, he describes an approach to unifying quantum mechanics with gravity; complexity and gravity are deeply related.

The following are several ideas from the first lecture that I found interesting:

Quantifying complexity

Start with K qubits; the space of states is 2^K-dimensional.

What is the space of states of the K-qubit system? The manifold of normalized states, modding out by the U(1) phase, is CP(N) with N = 2^K.

CP(N) is compact; regularize it by ε-balls, i.e. balls of radius ε. Each ε-ball represents a discrete state. The number of ε-balls is the number of states #S, where #S grows rapidly with K and depends only weakly on 1/ε.


Distance between states


The distance between states A and B, d_AB = arccos |⟨A|B⟩|, is the standard inner-product metric.

The maximum distance between any two states is d = π/2. This metric does not capture the qualitative difference between states.
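This inner-product metric is easy to sketch numerically (a minimal numpy illustration of the distance formula, not code from the lectures):

```python
import numpy as np

def state_distance(a, b):
    """Inner-product distance d_AB = arccos |<A|B>| between normalized states."""
    overlap = abs(np.vdot(a, b))              # np.vdot conjugates the first argument
    # Clip guards against floating-point values slightly above 1 before arccos.
    return np.arccos(np.clip(overlap, 0.0, 1.0))

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

# Orthogonal states sit at the maximum distance pi/2;
# states that differ only by a phase are at distance 0.
```

Orthogonal states come out at π/2 and phase-shifted copies of the same state at 0, matching the statement above that π/2 is the maximum distance.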


Relative complexity


A metric that counts the number of small steps from one state to another is the relative complexity.

We use a quantum circuit for this description. The gates are unitary operators. There is a universal collection of gates that can take any one state to any other. The minimum number n of such gates needed to go from A to B is the relative complexity C(A,B).

The relative complexity is a metrical notion of distance on CP(N). C(A,B) is realized by a geodesic, the shortest path between |A⟩ and |B⟩.

The conclusion is that maximally entangled states sit at a distance of at most π/2: the unitary space is small in the inner-product metric, but the relative complexity can be large.

The properties of relative complexity are analogous to the classical notion of distance:

1. C ≥ 0 (the number of gates is ≥ 0)

2. C(u,v) = 0 iff u = v

3. C(u,v) = C(v,u): C is a symmetric function of u and v

4. C(u,v) ≤ C(u,w) + C(w,v): the triangle inequality

C is right invariant but not left invariant: complexity gives a geometry on SU(2^K) that is right invariant.
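The definition of relative complexity as a minimal gate count can be made executable in a toy setting (my own sketch: one qubit, the fixed gate set {H, T}, and exact breadth-first search, whereas the lectures deal with K qubits and a universal gate set):

```python
import numpy as np
from collections import deque

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate
T = np.diag([1, np.exp(1j * np.pi / 4)])                     # T (pi/8) gate
GATES = [H, T]

def fs_distance(a, b):
    """Inner-product distance arccos |<A|B>| between normalized states."""
    return np.arccos(np.clip(abs(np.vdot(a, b)), 0.0, 1.0))

def relative_complexity(start, target, eps=1e-6, max_depth=8):
    """Breadth-first search for the minimum number of gates from GATES
    needed to carry |start> to within eps of |target>."""
    queue = deque([(np.asarray(start, dtype=complex), 0)])
    while queue:
        state, depth = queue.popleft()
        if fs_distance(state, target) < eps:
            return depth        # minimal gate count = relative complexity
        if depth < max_depth:
            for g in GATES:
                queue.append((g @ state, depth + 1))
    return None                 # not reached within max_depth gates
```

For example, |0⟩ has relative complexity 0 to itself and 1 to |+⟩ = H|0⟩. The brute-force search blows up quickly, which is exactly why complexity is studied geometrically rather than computed directly.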
 
Auto-Didact said:
John Baez has written extensively on information geometry. This branch of mathematics seems to be an essential tool for solving many central open issues in theoretical biophysics, including answering the central question: what is life?

I would point out that most people (at least in computational neuroscience and machine learning) view information geometry as completely useless in practice. It's a very beautiful formalisation of information theory and Fisher information etc. in terms of differential geometry, but one that has produced no practical results that weren't already known. I'd be very happy to be proven wrong on that though.

As a side note, information geometry was initially developed by a number of Japanese scientists, especially this guy https://en.wikipedia.org/wiki/Shun'ichi_Amari. For a long time it was only published in Japanese-language books and articles. And by the way, Amari was a computational neuroscientist, so it kind of developed out of that field in a way.
 
This paper may not be all that related, but you might find it interesting.

Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing

We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers increases beyond a threshold. In combinatorial geometry these transitions appear as abrupt changes in the properties of face counts of convex polytopes when the dimensions are varied. The thresholds in these very different problems appear in the same critical locations after appropriate calibration of variables.
https://people.maths.ox.ac.uk/tanner/papers/DoTa_Universality.pdf
 
  • #10
Jarvis323 said:
This paper may not be all that related, but you might find it interesting. https://people.maths.ox.ac.uk/tanner/papers/DoTa_Universality.pdf

This reminds me of another result that I found interesting, buried in the supplemental material of this paper (https://www.nature.com/articles/s41586-020-2130-2) (equation 23 in the supplement). What they show is that the eigenvectors of the sample covariance matrix undergo a phase transition as the number of samples and the number of dimensions are varied.
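That eigenvector transition can be seen in a few lines of numpy (my own sketch, using an assumed rank-one "spiked" covariance model; the paper's setting is more general):

```python
import numpy as np

def top_eig_alignment(p, n, spike=2.0, seed=0):
    """Alignment |<v_hat, v>| between the leading sample-covariance
    eigenvector and the true spike direction v, for data drawn from a
    spiked model with covariance I + spike * v v^T (p dims, n samples)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(p)
    v[0] = 1.0                                 # true spike direction
    X = rng.standard_normal((n, p))            # isotropic noise
    X += np.sqrt(spike) * rng.standard_normal((n, 1)) * v  # rank-one spike
    S = X.T @ X / n                            # sample covariance matrix
    _, U = np.linalg.eigh(S)                   # eigenvalues ascending
    return abs(U[:, -1] @ v)                   # leading eigenvector vs. v
```

With many more samples than dimensions the top eigenvector locks onto the spike (alignment near 1); with far fewer samples than dimensions the alignment collapses toward zero, which is the phase-transition behavior the post describes.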
 
  • #11
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

The book 'Information Geometry' by Ay et al. expands on some applications (see the image attached).

There is also a fascinating quantum version of information geometry. As a teaser, consider this. The state space of a quantum system with finitely many states is \mathbb{C}P^n = \mathbb{S}^{2n+1}/\mathbb{S}^1: the vectors of unit length (total probability = 1) in the Hilbert space, modulo \mathbb{S}^1 (states that differ only by a phase factor are physically indistinguishable). On \mathbb{C}P^n there is a natural metric that comes from the round metric on \mathbb{S}^{2n+1}, the so-called Fubini-Study metric. Geometrically, \mathbb{C}P^n is relatively simple to describe: it is the product of the standard n-dimensional probability simplex \Delta_n with an n-torus, with the edges identified in some complicated manner. But the fact remains that on an open dense subset, \mathbb{C}P^n is just the product \mathrm{int}(\Delta_n)\times \mathbb{T}^n, and the Fubini-Study metric with respect to this factorization is the product of the Fisher metric (!) on \Delta_n times a metric on \mathbb{T}^n that varies with where we are on \Delta_n.

Source: 'Geometry of quantum states: An Introduction to Quantum Entanglement'. Bengtsson et al. (2017). Also Gromov's meditations on Entropy: https://www.ihes.fr/~gromov/expository/579/
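The appearance of the Fisher metric can be checked numerically. A small sketch of my own, using the standard fact that p ↦ 2√p maps the simplex isometrically onto a piece of the round sphere of radius 2, with the Fisher metric pulling back from the Euclidean one:

```python
import numpy as np

def fisher_inner(p, u, w):
    """Fisher information metric on the probability simplex:
    g_p(u, w) = sum_i u_i w_i / p_i, for tangent u, w (entries sum to 0)."""
    return float(np.sum(u * w / p))

def sqrt_map_pushforward(p, u):
    """Differential of phi(p) = 2*sqrt(p), which sends the simplex
    onto part of the radius-2 sphere in Euclidean space."""
    return u / np.sqrt(p)

p = np.array([0.5, 0.3, 0.2])          # a point on the simplex
u = np.array([0.10, -0.05, -0.05])     # tangent vectors (entries sum to 0)
w = np.array([0.00, 0.02, -0.02])
```

The point 2√p lands on the sphere of radius 2 (since the entries of p sum to 1), and the Euclidean inner product of the pushed-forward tangent vectors reproduces the Fisher inner product exactly.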
 

Attachments

  • ch6.png (76.3 KB)
  • #12
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

In machine learning they try to minimize functions on parameter space. They do this by the method of gradient descent. Information geometry comes in and says

"You guys are doing gradient descent but you're wrongly assuming that the parameter space is flat. It is not, it carries a natural nonplanar shape given by (the pullback of) the Fisher metric and when you take this into account your gradient descent method works better."

Unfortunately, computing the Fisher metric is computationally too expensive for the large parameter spaces involved in neural networks, so they largely ignore it lol.

Source:
https://towardsdatascience.com/natural-gradient-ce454b3dcdfa
https://www.mitpressjournals.org/doi/10.1162/089976698300017746?mobileUi=0&
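As an illustration of the idea (a toy one-parameter sketch of my own, fitting a Bernoulli model; nothing like the neural-network setting the post describes), the natural gradient preconditions the ordinary gradient by the inverse Fisher information:

```python
import numpy as np

def grad_loglik(theta, data):
    """Gradient of the average Bernoulli log-likelihood at theta."""
    return np.mean(data / theta - (1 - data) / (1 - theta))

def fisher_info(theta):
    """Fisher information of the Bernoulli(theta) family."""
    return 1.0 / (theta * (1.0 - theta))

def fit(data, theta0=0.1, lr=0.1, steps=200, natural=False):
    """Gradient ascent on the log-likelihood; with natural=True the
    gradient is preconditioned by the inverse Fisher information."""
    theta = theta0
    for _ in range(steps):
        g = grad_loglik(theta, data)
        if natural:
            g /= fisher_info(theta)   # natural-gradient step
        theta = float(np.clip(theta + lr * g, 1e-6, 1 - 1e-6))
    return theta

# Both variants converge to the sample mean (the MLE); the natural
# gradient step works out to lr * (mean - theta), i.e. uniform progress
# in distribution space rather than in raw parameter space.
```

In this one-dimensional example both versions reach the same answer; the point of the natural gradient is that its step sizes are measured by the Fisher metric, so it behaves the same under reparametrization, which plain gradient ascent does not.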
 
  • #13
caz said:
Recently came across this concept. It looks like a combination of dg and statistics. It sounds interesting, but I do not feel competent enough to make an informed decision.

For example see
https://arxiv.org/abs/1808.08271

The book 'Information Geometry and Its Applications' by Amari (the main pioneer of the subject) has a 130-page section on applications. There is a freely downloadable table of contents here:
https://www.springer.com/gp/book/9784431559771
 
  • #14
madness said:
I would point out that most people (at least in computational neuroscience and machine learning) view information geometry as completely useless in practice. It's a very beautiful formalisation of information theory and Fisher information etc. in terms of differential geometry, but one that has produced no practical results that weren't already known. I'd be very happy to be proven wrong on that though.

As a side note, information geometry was initially developed by a number of Japanese scientists, especially this guy https://en.wikipedia.org/wiki/Shun'ichi_Amari. For a long time it was only published in Japanese-language books and articles. And by the way, Amari was a computational neuroscientist, so it kind of developed out of that field in a way.
I am no expert on the literature here, but that is what I've also heard from differential geometers. In other words, it mostly involves "dressing up" things with differential geometry without really providing any new results or insights.
 
