Riemannian Fisher-Rao metric and orthogonal parameter space

Vini · Sep 24, 2020

Let ## \mathcal{S} ## be a family of probability distributions ## \mathcal{P}_{\theta} ## of a random variable ## \beta ## that is smoothly parametrized by a finite number of real parameters, i.e.,
## \mathcal{S}=\left\{\mathcal{P}_{\theta}=w(\beta;\theta);\ \theta \in \mathbb{R}^{n},\ \theta=(\theta^{i})\right\} ## . The statistical model ## \mathcal{S} ## carries the structure of a smooth Riemannian manifold ## \mathcal{M} ## , with respect to which ## \theta=(\theta^{i}) ## play the role of coordinates of a point ## \mathcal{P}_{\theta}\in \mathcal{S} ## , and whose metric is defined by Fisher's information matrix ## \mbox{H}=(g_{ij}(\theta)) ## . The coefficients of this matrix, which yields a positive definite metric, are calculated as expectations involving partial derivatives of the logarithm of the probability density function (PDF):

## g_{ij}(\theta)=-\int^{+\infty}_{-\infty} \displaystyle \frac{\partial^{2}\ln \left( w(\beta;\theta)\right)}{\partial \theta^{i} \partial \theta^{j}}\,w(\beta;\theta)\,d\beta ## .

How can we neglect the off-diagonal terms ## g_{12}= g_{21} ## ?

In other words, is there a mathematical argument that justifies setting ## g_{12}=g_{21}=0 ## ?

quasar987 · Oct 11, 2020

Hello Vini,

First let me reformulate the general setup that you explained in the way that I understand it.

I will focus on the case where the sample space [itex]\Omega[/itex] is finite because, although the idea is the same in the general case, the technical details are far more subtle. A statistical model for some random variable on [itex]\Omega[/itex] is a map [itex]p[/itex] from some open subset [itex]U[/itex] of [itex]\mathbb{R}^n[/itex] (or more generally, some manifold!) into the set [itex]\mathcal{P}=\mathcal{P}(\Omega)[/itex] of all probability measures on [itex]\Omega[/itex]. The set [itex]\mathcal{P}[/itex] itself sits inside the set [itex]\mathcal{S}[/itex] of all signed measures on [itex]\Omega[/itex], which (unlike [itex]\mathcal{P}[/itex]) is a vector space. Thus, as a manifold, its tangent space at any point is naturally identified with [itex]\mathcal{S}[/itex] itself. Now, let us restrict our attention to the open submanifold [itex]\mathcal{S}^{\circ}[/itex] of all the nowhere vanishing signed measures. Here, there is a canonical covariant [itex]k[/itex]-tensor field for every integer [itex]k[/itex], given by integration (of the product of the Radon-Nikodym derivatives):
$$
T_{\mu}(\mathcal{S}^{\circ})\times \cdots \times T_{\mu}(\mathcal{S}^{\circ})\rightarrow \mathbb{R}: (\sigma_1,\ldots, \sigma_k)\mapsto \int_{\Omega}\frac{d\sigma_1}{d\mu}\cdots \frac{d\sigma_k}{d\mu}\, d\mu.
$$
In particular, for [itex]k=2[/itex], this is a Riemannian metric called the Fisher metric, and when you pull it back through [itex]p[/itex] you get, up to an integration by parts, the Fisher metric on [itex]U[/itex] that you wrote down. The only source I know of to properly learn about this in the [itex]|\Omega|=\infty[/itex] case is the book Information Geometry (2017) by Ay, Jost, Lê, Schwachhöfer.
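
As a sketch of why the "product of first derivatives" form and the "second derivative" form of the Fisher metric agree, here is the computation in Vini's notation. It assumes that differentiation with respect to [itex]\theta[/itex] can be exchanged with integration over [itex]\beta[/itex]; the shorthand [itex]\partial_i=\partial/\partial\theta^i[/itex] is introduced here for brevity.
$$
\begin{aligned}
\partial_i\partial_j \ln w &= \frac{\partial_i\partial_j w}{w}-\frac{\partial_i w\,\partial_j w}{w^{2}}
= \frac{\partial_i\partial_j w}{w}-\partial_i \ln w\;\partial_j \ln w,\\
\int \partial_i\partial_j \ln w\; w\,d\beta &= \partial_i\partial_j\!\int w\,d\beta-\int \partial_i \ln w\;\partial_j \ln w\; w\,d\beta
= -\int \partial_i \ln w\;\partial_j \ln w\; w\,d\beta .
\end{aligned}
$$
The term [itex]\partial_i\partial_j\int w\,d\beta[/itex] vanishes because [itex]\int w\,d\beta=1[/itex] for every [itex]\theta[/itex]. So the expected product of the first derivatives of [itex]\ln w[/itex] equals minus the expected second derivative, and the product form makes it clear that the resulting matrix is positive semi-definite.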

So back to your question, which actually has nothing to do with the specifics of how the Fisher metric arises: around every point [itex]x[/itex] of a Riemannian manifold [itex](U,g)[/itex] there is a coordinate system for which the metric is diagonal at [itex]x[/itex]. This is easy to see once you know that the exponential map is a local diffeomorphism: just pick an orthogonal basis [itex](V_1,\ldots, V_n)[/itex] of [itex]T_xU[/itex] and define coordinates [itex](u^1,\ldots,u^n)[/itex] by the map [itex](u^1,\ldots,u^n)\mapsto \mathrm{exp}_x\left(\sum_i u^iV_i\right)[/itex].
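
To make the "diagonal at a point" statement concrete, here is a minimal numerical sketch. It does not use the exponential map; it swaps in the simpler linear change of coordinates [itex]\theta=\theta_0+Qu[/itex], which is enough to diagonalize the metric at the single point [itex]\theta_0[/itex] (and only there). The matrix `G0` is a made-up symmetric positive definite example, not the Fisher matrix of any particular model.

```python
import numpy as np

# Hypothetical value of the metric (g_ij) at one point theta_0.
# Any symmetric positive definite matrix would do for this illustration.
G0 = np.array([[2.0, 0.6],
               [0.6, 1.0]])

# Columns of Q form a basis of the tangent space that is orthogonal
# with respect to G0 (eigenvectors of a symmetric matrix).
eigenvalues, Q = np.linalg.eigh(G0)

# Under theta = theta_0 + Q u the components of a metric transform as
# G' = J^T G J with Jacobian J = d(theta)/d(u) = Q.
G0_new = Q.T @ G0 @ Q

print(np.round(G0_new, 12))  # off-diagonal entries vanish up to rounding
```

Away from [itex]\theta_0[/itex] the off-diagonal components in the new coordinates are in general nonzero again, which is exactly the distinction between "diagonal at a point" and "diagonal on a neighborhood".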

It is not true in general that there exist coordinates in which the metric is diagonal on a whole neighborhood, although in dimension 2 such coordinates (isothermal coordinates) always exist locally. A metric that, in suitable coordinates, is diagonal with all diagonal entries equal to a single positive function is called conformally flat (or simply flat if the entries are constant). As far as information geometry is concerned, one of the more endearing results is that, for the most important probability distribution in statistics (the normal distribution), the Fisher metric is, up to a scaling factor, the hyperbolic metric on the upper half-plane.
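
The normal family can be checked numerically. The sketch below assumes the parametrization [itex]\theta=(\mu,\sigma)[/itex] and approximates [itex]g_{ij}=\int \partial_i \ln w\,\partial_j \ln w\; w\,d\beta[/itex] with finite differences and a Riemann sum; the helper names `log_normal_pdf` and `fisher_matrix` are made up for this illustration. In these coordinates the off-diagonal terms vanish and the matrix should come out close to [itex]\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)[/itex].

```python
import numpy as np

def log_normal_pdf(beta, theta):
    """Log-density of N(mu, sigma^2), with theta = (mu, sigma)."""
    mu, sigma = theta
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((beta - mu) / sigma) ** 2

def fisher_matrix(log_pdf, theta, beta, h=1e-5):
    """Approximate g_ij = E[d_i log w * d_j log w] on the uniform grid `beta`."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    scores = []
    for i in range(n):
        step = np.zeros(n)
        step[i] = h
        # Central finite difference of log w with respect to theta^i.
        scores.append((log_pdf(beta, theta + step) - log_pdf(beta, theta - step)) / (2.0 * h))
    w = np.exp(log_pdf(beta, theta))
    dbeta = beta[1] - beta[0]
    g = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Riemann sum; the integrand is negligible at the ends of the grid.
            g[i, j] = np.sum(scores[i] * scores[j] * w) * dbeta
    return g

mu, sigma = 0.3, 1.7
beta = np.linspace(mu - 12.0 * sigma, mu + 12.0 * sigma, 200001)
g = fisher_matrix(log_normal_pdf, (mu, sigma), beta)

print(g)
print(np.diag([1.0 / sigma**2, 2.0 / sigma**2]))  # closed form for comparison
```

In other words [itex]ds^2=(d\mu^2+2\,d\sigma^2)/\sigma^2[/itex], and the substitution [itex]\mu=\sqrt{2}\,x[/itex] turns this into [itex]2\,(dx^2+d\sigma^2)/\sigma^2[/itex], i.e. twice the hyperbolic metric on the upper half-plane.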
