Riemannian Fisher-Rao metric and orthogonal parameter space

SUMMARY

The discussion centers on the Riemannian Fisher-Rao metric and its application in statistical models parametrized by real parameters. The Fisher information matrix, defined as the expectation of the product of partial derivatives of the logarithm of the probability density function, yields a positive definite metric. The conversation explores the conditions under which off-diagonal terms of the metric can be neglected, specifically addressing the case where these terms are equal to zero. The book "Information Geometry" (2017) by Ay, Jost, Lê, and Schwachhöfer is recommended for further understanding of these concepts.

PREREQUISITES
  • Understanding of Riemannian manifolds and their properties
  • Familiarity with Fisher information matrix and its applications
  • Knowledge of probability distributions and their parametrization
  • Basic concepts of differential geometry and tensor fields
NEXT STEPS
  • Study the derivation and properties of the Fisher information matrix in detail
  • Explore the concept of conformally flat metrics and their implications
  • Learn about the exponential map and its role in Riemannian geometry
  • Read "Information Geometry" (2017) by Ay, Jost, Lê, and Schwachhöfer for advanced insights
USEFUL FOR

Mathematicians, statisticians, and data scientists interested in advanced statistical modeling, Riemannian geometry, and the application of Fisher metrics in probability theory.

Vini
TL;DR
A silly question on off-diagonal elements of the Fisher-Rao metric
Let ## \mathcal{S} ## be a family of probability distributions ## \mathcal{P} ## of a random variable ## \beta ##, smoothly parametrized by a finite number of real parameters, i.e.,
## \mathcal{S}=\left\{\mathcal{P}_{\theta}=w(\beta;\theta);\theta \in \mathbb{R}^{n}, \theta=(\theta^{i})\right\} ##. The statistical model ## \mathcal{S} ## carries the structure of a smooth Riemannian manifold ## \mathcal{M} ##, with respect to which ## \theta=(\theta^{i}) ## play the role of coordinates of a point ## \mathcal{P}_{\theta}\in \mathcal{S} ##, and whose metric is given by the Fisher information matrix ## \mathrm{H}=(g_{ij}(\theta)) ##. The coefficients of this matrix, which yield a positive definite metric, are computed as the expectation of a product of partial derivatives of the logarithm of the probability density function (PDF):

$$
g_{ij}(\theta)=\int_{-\infty}^{+\infty} \frac{\partial \ln w(\beta;\theta)}{\partial \theta^{i}}\,\frac{\partial \ln w(\beta;\theta)}{\partial \theta^{j}}\,w(\beta;\theta)\,d\beta = -\int_{-\infty}^{+\infty} \frac{\partial^{2} \ln w(\beta;\theta)}{\partial \theta^{i}\,\partial \theta^{j}}\,w(\beta;\theta)\,d\beta,
$$
where the two expressions agree under the usual regularity assumptions. How do we neglect the off-diagonal terms ## g_{12}=g_{21} ##?

In other words, is there a mathematical argument under which it is legitimate to set ## g_{12}=g_{21}=0 ##?
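As a concrete instance of the question (my own illustration, not from the thread, assuming SymPy is available): for the Gaussian family ## \mathcal{P}_{\theta}=N(\mu,\sigma^{2}) ## with ## \theta=(\mu,\sigma) ##, a symbolic computation shows the off-diagonal entries really are zero in these coordinates.

```python
import sympy as sp

# Fisher matrix of the Gaussian family N(mu, sigma^2), theta = (mu, sigma).
beta, mu = sp.symbols('beta mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# log of the Gaussian density w(beta; theta), written out explicitly
logw = -sp.log(sigma) - sp.Rational(1, 2)*sp.log(2*sp.pi) - (beta - mu)**2/(2*sigma**2)
w = sp.exp(logw)
theta = [mu, sigma]

# g_ij = E[ d_i(ln w) * d_j(ln w) ], the product form of the definition
g = sp.Matrix(2, 2, lambda i, j: sp.integrate(
    sp.diff(logw, theta[i]) * sp.diff(logw, theta[j]) * w,
    (beta, -sp.oo, sp.oo)))
g = sp.simplify(g)
print(g)  # expected: Matrix([[1/sigma**2, 0], [0, 2/sigma**2]])
```

Here the vanishing of ## g_{12} ## is a feature of this particular parametrization, not of the family itself; a different choice of coordinates on the same family would generally produce nonzero off-diagonal entries.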
 
Hello Vini,

First let me reformulate the general setup that you explained in the way that I understand it.

I will focus on the case where ## \Omega ## is finite because for the general case, although the idea is the same, the technical details are infinitely more subtle. A statistical model for some random variable on ## \Omega ## is a map ## p ## from some open subset ## U ## of ## \mathbb{R}^n ## (or more generally, some manifold!) into the set ## \mathcal{P}=\mathcal{P}(\Omega) ## of all probability measures on ## \Omega ##. The set ## \mathcal{P} ## itself sits inside the set ## \mathcal{S} ## of all signed measures on ## \Omega ##, which (contrary to ## \mathcal{P} ##) is a vector space. Thus, as a manifold, its tangent space at any point is naturally identified with ## \mathcal{S} ## itself. Now, let us restrict our attention to the open submanifold ## \mathcal{S}^{\circ} ## of all the nowhere-vanishing signed measures. Here, there is a canonical family of covariant ## k ##-tensor fields for every integer ## k ##, given by integration of the product of the Radon-Nikodym derivatives:
$$
T_{\mu}(\mathcal{S}^{\circ})\times \ldots \times T_{\mu}(\mathcal{S}^{\circ})\rightarrow \mathbb{R}: (\sigma_1,\ldots, \sigma_k)\mapsto \int_{\Omega}\frac{d\sigma_1}{d\mu}\cdots \frac{d\sigma_k}{d\mu}\, d\mu.
$$
In particular, for ## k=2 ## this is a Riemannian metric called the Fisher metric, and when you pull it back through ## p ## you get, up to an integration by parts, the Fisher metric on ## U ## that you wrote down. The only source I know of to properly learn about this in the ## |\Omega|=\infty ## case is the book Information Geometry (2017) by Ay, Jost, Lê, and Schwachhöfer.
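For the finite ## \Omega ## case, the ## k=2 ## tensor above reduces to a simple sum: on a finite set, the Radon-Nikodym derivative ## d\sigma/d\mu ## is just the componentwise ratio of the two measures. A minimal numerical sketch (my own illustration; the function name `fisher_inner` is hypothetical):

```python
import numpy as np

def fisher_inner(s1, s2, mu):
    """The canonical k=2 tensor at a nowhere-vanishing measure mu on a
    finite Omega: <s1, s2>_mu = sum_i (s1_i/mu_i) * (s2_i/mu_i) * mu_i,
    where the ratios are the (discrete) Radon-Nikodym derivatives."""
    s1, s2, mu = map(np.asarray, (s1, s2, mu))
    return float(np.sum((s1 / mu) * (s2 / mu) * mu))

mu = np.array([0.5, 0.3, 0.2])       # a probability measure, all entries nonzero
s1 = np.array([0.1, -0.05, -0.05])   # tangent vectors are signed measures;
s2 = np.array([-0.02, 0.01, 0.01])   # entries summing to 0 keep us tangent to P
print(fisher_inner(s1, s2, mu))
```

The expression is manifestly symmetric and bilinear in ## \sigma_1, \sigma_2 ##, and positive definite at each ## \mu ## with all entries nonzero, which is what makes it a Riemannian metric on ## \mathcal{S}^{\circ} ##.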

So back to your question, which actually has nothing to do with the specifics of how the Fisher metric arises: around every point ## x ## of a Riemannian manifold ## (U,g) ## there is a coordinate system in which the metric is diagonal at ## x ##. This is easy to see once you know that the exponential map is a local diffeomorphism: just pick a ## g ##-orthogonal basis ## (V_1,\ldots, V_m) ## of ## T_xU ## and define coordinates ## u^i ## by setting ## (u^1,\ldots,u^m)\mapsto \mathrm{exp}_x\left(\textstyle\sum_i u^iV_i\right) ##.
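In matrix terms, such a ## g ##-orthogonal (in fact orthonormal) basis at the single point ## x ## can be produced from a Cholesky factorization of the metric's matrix there. A sketch, assuming NumPy and an arbitrary sample metric:

```python
import numpy as np

# The metric at one point x, in arbitrary coordinates: any symmetric
# positive definite matrix will do (hypothetical sample values).
g = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Factor g = L L^T; the columns of V = (L^{-1})^T then satisfy
# V^T g V = I, i.e. they form a g-orthonormal basis of T_x U.
L = np.linalg.cholesky(g)
V = np.linalg.inv(L).T
print(V.T @ g @ V)  # identity matrix, up to rounding
```

Feeding these columns as the ## V_i ## into the exponential-map construction above gives coordinates in which the metric is the identity at ## x ## itself, though generally not on any neighborhood of ## x ##.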

It is certainly not always true that there exist coordinates in which the metric is diagonal on a whole neighbourhood. A metric that can be written in suitable coordinates as a positive function times the flat (Euclidean) metric is called conformally flat (and simply flat if that function is constant). As far as information geometry is concerned, one of the more endearing results is that, for the most important probability distribution in statistics (the normal distribution), the Fisher metric is, up to a scaling factor, the hyperbolic metric on the upper half plane.
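That last claim can be checked directly (my own sketch, assuming NumPy): the Fisher metric of ## N(\mu,\sigma^{2}) ## in ## (\mu,\sigma) ## coordinates is ## \mathrm{diag}(1/\sigma^{2},\, 2/\sigma^{2}) ##, and after the rescaling ## x=\mu/\sqrt{2},\ y=\sigma ## it becomes twice the hyperbolic metric ## (dx^{2}+dy^{2})/y^{2} ## on the upper half plane.

```python
import numpy as np

def fisher_normal(sigma):
    """Fisher metric of N(mu, sigma^2) in (mu, sigma) coordinates
    (the standard result diag(1/sigma^2, 2/sigma^2))."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

def pullback_hyperbolic(sigma):
    """Hyperbolic metric (dx^2 + dy^2)/y^2 on the upper half plane,
    pulled back through (mu, sigma) -> (x, y) = (mu/sqrt(2), sigma)."""
    J = np.diag([1.0 / np.sqrt(2), 1.0])  # Jacobian of the substitution
    H = np.eye(2) / sigma**2              # hyperbolic metric at (x, y)
    return J.T @ H @ J

for s in (0.5, 1.0, 3.0):
    print(np.allclose(fisher_normal(s), 2.0 * pullback_hyperbolic(s)))
```

Since neither metric depends on ## \mu ##, checking along a few values of ## \sigma ## already exercises the whole identity.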
 
