# Riemannian Fisher-Rao metric and orthogonal parameter space

The upper half plane can be identified with a set of positive-definite matrices; via this identification, the family of normal distributions can be viewed as a Riemannian submanifold of the manifold of positive-definite matrices, and this picture has far-reaching generalizations to other distributions. In summary, a statistical model is a map from an open subset of a manifold to the set of probability measures on a (for simplicity, finite) sample space. The Fisher metric, which is defined using the Fisher information matrix, is a Riemannian metric on the set of nowhere vanishing signed measures. This metric can be diagonalized at any point by choosing an appropriate coordinate system; not every metric, however, can be diagonalized on a whole neighborhood.

#### Vini

TL;DR Summary
A silly question on off-diagonal elements of the Fisher-Rao metric
Let ## \mathcal{S} ## be a family of probability distributions ## \mathcal{P} ## of a random variable ## \beta ##, smoothly parametrized by a finite number of real parameters, i.e.,
## \mathcal{S}=\left\{\mathcal{P}_{\theta}=w(\beta;\theta);\ \theta \in \mathbb{R}^{n},\ \theta=(\theta^{i})\right\} ##. The statistical model ## \mathcal{S} ## carries the structure of a smooth Riemannian manifold ## \mathcal{M} ##, with respect to which ## \theta=(\theta^{i}) ## plays the role of coordinates of a point ## \mathcal{P}_{\theta}\in \mathcal{S} ##, and whose metric is defined by the Fisher information matrix ## \mbox{H}=(g_{ij}(\theta)) ##. The coefficients of this matrix, which yield a positive-definite metric, are computed as the expectation of a product of partial derivatives of the logarithm of the probability density function (PDF):

## g_{ij}(\theta)=\int^{+\infty}_{-\infty} \displaystyle \frac{\partial \ln \left( w(\beta;\theta)\right)}{\partial \theta^{i}}\, \frac{\partial \ln \left( w(\beta;\theta)\right)}{\partial \theta^{j}}\, w(\beta;\theta)\, d\beta ##.

How can we neglect the off-diagonal terms ## g_{12}=g_{21} ##?

In other words, is there a mathematical argument by which it is possible to set ## g_{12}=g_{21}=0 ##?
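For a concrete instance (a sketch of my own, not part of the original question): for the normal family ## w(\beta;\mu,\sigma) ##, the Fisher matrix can be computed symbolically, and in the ## (\mu,\sigma) ## coordinates the off-diagonal entries turn out to vanish identically:

```python
import sympy as sp

# Fisher metric of the normal family w(beta; mu, sigma), computed symbolically
# as the expectation of products of partial derivatives of ln(w).
beta, mu = sp.symbols('beta mu', real=True)
sigma = sp.symbols('sigma', positive=True)

w = sp.exp(-(beta - mu)**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))
logw = sp.log(w)
params = (mu, sigma)

g = sp.Matrix(2, 2, lambda i, j: sp.integrate(
    sp.diff(logw, params[i]) * sp.diff(logw, params[j]) * w,
    (beta, -sp.oo, sp.oo)))

# The off-diagonal entries vanish: g = diag(1/sigma**2, 2/sigma**2).
print(sp.simplify(g))
```

So for this particular family (and this particular choice of coordinates) the question answers itself; whether such coordinates exist in general is the real issue.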

Hello Vini,

First let me reformulate the general setup that you explained in the way that I understand it.

I will focus on the case where $\Omega$ is finite because for the general case, although the idea is the same, the technical details are infinitely more subtle. A statistical model for some random variable on $\Omega$ is a map $p$ from some open subset $U$ of $\mathbb{R}^n$ (or more generally, some manifold!) into the set $\mathcal{P}=\mathcal{P}(\Omega)$ of all probability measures on $\Omega$. The set $\mathcal{P}$ itself sits inside the set $\mathcal{S}$ of all signed measures on $\Omega$, which (contrary to $\mathcal{P}$) is a vector space. Thus, as a manifold, its tangent space at any point is naturally identified with $\mathcal{S}$ itself. Now, let us restrict our attention to the open submanifold $\mathcal{S}^{\circ}$ of all the nowhere vanishing signed measures. Here, there is a canonical family of covariant $k$-tensor fields for every positive integer $k$, given by integration (of the product of the Radon-Nikodym derivatives):
$$T_{\mu}(\mathcal{S}^{\circ})\times \ldots \times T_{\mu}(\mathcal{S}^{\circ})\rightarrow \mathbb{R}: (\sigma_1,\ldots, \sigma_k)\mapsto \int_{\Omega}\frac{d\sigma_1}{d\mu}\cdots \frac{d\sigma_k}{d\mu}\, d\mu.$$
In particular, for $k=2$, this is a Riemannian metric called the Fisher metric, and when you pull it back through $p$ you get, up to an integration by parts, the Fisher metric on $U$ that you wrote down. The only source I know of that properly develops this in the $|\Omega|=\infty$ case is the book Information Geometry (2017) by Ay, Jost, Lê, and Schwachhöfer.
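To make the finite case concrete, here is a small numerical sketch (the function names and the Bernoulli example are my own choices): for a model $p:U\rightarrow\mathcal{P}(\Omega)$ with $\Omega$ finite, the pulled-back metric is $g_{ij}(\theta)=\sum_{\omega}\partial_i p_{\omega}\,\partial_j p_{\omega}/p_{\omega}$.

```python
import numpy as np

# Pullback of the canonical 2-tensor (the Fisher metric) through a statistical
# model p: U -> P(Omega) with Omega finite:
#     g_ij(theta) = sum over omega of (d_i p_omega)(d_j p_omega) / p_omega

def fisher_metric(probs, jac):
    """Fisher metric from the probabilities (shape (|Omega|,)) and their
    Jacobian with respect to the parameters (shape (|Omega|, n))."""
    return jac.T @ (jac / probs[:, None])

# Illustrative Bernoulli model: p(theta) = (theta, 1 - theta), one parameter.
theta = 0.3
probs = np.array([theta, 1.0 - theta])
jac = np.array([[1.0], [-1.0]])   # d p_omega / d theta

g = fisher_metric(probs, jac)
print(g[0, 0])   # matches the known closed form 1/(theta*(1-theta))
```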

So back to your question, which actually has nothing to do with the specifics of how the Fisher metric arises: around every point $x$ of a Riemannian manifold $(U,g)$ there is a coordinate system in which the metric is diagonal at $x$. This is easy to see once you know that the exponential map is a local diffeomorphism: just pick a $g$-orthogonal basis $(V_1,\ldots, V_m)$ of $T_xU$ and define coordinates $(u^1,\ldots,u^m)$ by $(u^1,\ldots,u^m)\mapsto \mathrm{exp}_x\big(\textstyle\sum_i u^iV_i\big)$.
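At the single point $x$ itself, the diagonalization is plain linear algebra; a minimal numerical sketch (the metric below is an arbitrary positive-definite example of mine):

```python
import numpy as np

# Diagonalizing a metric at one point x: the columns of the orthogonal matrix A
# form a g-orthogonal basis (V_1, ..., V_m) of T_x U, and in the linear
# coordinates they define, the metric at x becomes A^T g A, which is diagonal.

g = np.array([[2.0, 0.5],
              [0.5, 1.0]])

eigvals, A = np.linalg.eigh(g)   # columns of A: an orthogonal eigenbasis of g
g_new = A.T @ g @ A              # metric at x in the new coordinates

print(np.round(g_new, 12))       # diagonal, with eigvals on the diagonal
```

Away from $x$ the metric in these coordinates will in general pick up off-diagonal terms again, which is exactly the point of the next paragraph.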

It is certainly not always true that there exist coordinates in which the metric is diagonal on a whole neighborhood. A metric that can be written in some coordinates as a positive function times the identity is called conformally flat (and simply flat if that function is constant). As far as information geometry is concerned, one of the more endearing results is that, for the most important probability distribution in statistics (the normal distribution), the Fisher metric is, up to a scaling factor, the hyperbolic metric on the upper half plane.
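For concreteness, this identification can be checked directly from the Fisher matrix of the normal family (a sketch; conventions for the scaling factor vary between sources). In coordinates $(\mu,\sigma)$ the Fisher metric is
$$g=\frac{d\mu^2+2\,d\sigma^2}{\sigma^2},$$
and the substitution $\mu=\sqrt{2}\,x$, $\sigma=y$ turns it into
$$g=2\,\frac{dx^2+dy^2}{y^2},$$
i.e. twice the hyperbolic metric on the upper half plane $\{(x,y):y>0\}$.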
