How can I convert chi-square into a matrix and graph likelihood contours?

BrunoMota · May 13, 2018

Hello, I'm new to the world of "numerical" astronomy/cosmology and I've been studying maximum likelihood estimation for model tests. The (analytical) theory is somewhat easy to understand, but I've been struggling with the numerical aspect of the computations.

Suppose you have some model with a given number of free parameters (say, 3 for simplicity) and you're given an array of (N) observed values (along with the respective deviations).

What I'm trying to do is to graph the 2D/1D likelihood contours. I've been recommended to first calculate de chi-square sum:

$$ \chi^2 = \sum_{i}^{N} \left(\frac{y_i - y_i(a_1,a_2,a_3) }{\sigma _i }\right)^2 $$

where $$ a_1,a_2,a_3 $$ are my free parameters.

The the likelihood is given by:
$$ L = e^{-\chi^2/2} $$

My problem is the numerical aspect of this. How do I convert the chi-square into a matrix form and after I get the likelihood matrix how do I graph the countours?

Also because want the 2D/1D likelihood contours I'll have to marginalize but obviously I can not integrate since this is a not an analytical problem. How should I do this.

Thanks in advance!

Orodruin · May 13, 2018

This is really a numerical problem and depends on what kind of software you are using. It is not a priori a cosmology problem.

BrunoMota said:

Also because want the 2D/1D likelihood contours I'll have to marginalize but obviously I can not integrate since this is a not an analytical problem. How should I do this.

What does integration have to do with being analytical? There are ways to do numerical integration.

BrunoMota · May 13, 2018

Orodruin said:

This is really a numerical problem and depends on what kind of software you are using. It is not a priori a cosmology problem.What does integration have to do with being analytical? There are ways to do numerical integration.

Before anything, thanks for your reply.

Yes, this is mainly a numerical problem since the physics here comes from the model in case (which I omitted). Nevertheless I would think this is standard in some areas of cosmology (as it is in many other areas of physics) therefore the question.

About the integration, I may have used the wrong words. Sure if you have a given domain and an integrable function there are a lot of methods for a numerical integration but in this case this is not so linear. I presume I’ll have to performance some kind of sum in my likelihood matrix (which is indeed the integration) but I don’t think a specific method will be needed here.

By the way I’m using python for this, although any other programming language would be good (matlab software for instance is the most standard as far as I know).

Orodruin · May 13, 2018

If you have a rectangular grid in your parameters, numerical integration just amounts to summing over the parameters you want to marginalise (and multiplying by the grid spacing) as this is just the Riemann sum. You can refine this somewhat by using the trapezoid method or similar higher order approximations to the integral.

BrunoMota · May 13, 2018

Orodruin said:

If you have a rectangular grid in your parameters, numerical integration just amounts to summing over the parameters you want to marginalise (and multiplying by the grid spacing) as this is just the Riemann sum. You can refine this somewhat by using the trapezoid method or similar higher order approximations to the integral.

Still (and although easy to implement ) I don’t believe I’ll have to go to such lengths to marginalize since I’m working with likelihood here but it might worth a shot.

Orodruin · May 13, 2018

If likelihood marginalisation (and Bayesian interpretation) is what you are doing, this is what you have to do. It is completely standard and really not anything out of the ordinary in terms of complexity. If you are doing frequentist profiling it is different.

kimbyd · May 14, 2018

BrunoMota said:

Hello, I'm new to the world of "numerical" astronomy/cosmology and I've been studying maximum likelihood estimation for model tests. The (analytical) theory is somewhat easy to understand, but I've been struggling with the numerical aspect of the computations.

Suppose you have some model with a given number of free parameters (say, 3 for simplicity) and you're given an array of (N) observed values (along with the respective deviations).

What I'm trying to do is to graph the 2D/1D likelihood contours. I've been recommended to first calculate de chi-square sum:

$$ \chi^2 = \sum_{i}^{N} \left(\frac{y_i - y_i(a_1,a_2,a_3) }{\sigma _i }\right)^2 $$

where $$ a_1,a_2,a_3 $$ are my free parameters.

The the likelihood is given by:
$$ L = e^{-\chi^2/2} $$

My problem is the numerical aspect of this. How do I convert the chi-square into a matrix form and after I get the likelihood matrix how do I graph the countours?

Also because want the 2D/1D likelihood contours I'll have to marginalize but obviously I can not integrate since this is a not an analytical problem. How should I do this.

Thanks in advance!

So, first of all, the ##\chi^2## is not a matrix. It's a number.

What you're looking for is probably the Fisher Information matrix, which is drawn from the partial derivatives of the ##\chi^2## with respect to the model parameters.

kurros · May 14, 2018

BrunoMota said:

Still (and although easy to implement ) I don’t believe I’ll have to go to such lengths to marginalize since I’m working with likelihood here but it might worth a shot.

Actually marginalising likelihoods is in general not easy, because you are trying to do a numerical integration in a 3D (in your case, but often higher D) space. But there are a lot of tools out there for doing it, for example you could check out PyMC since you said you are using Python. But there are a couple things I would mention about this whole approach first:

1. By making this particular chi-squared assumption you are assuming that your data is drawn from a normal distribution. Is this what you want? You need to make sure you have the correct statistical description of your data before you go further.

2. You shouldn't just go marginalising this quantity without thinking about it. Marginalising means you are treating the problem in a Bayesian way, which means you have to consider prior probability distributions for your parameters. Your choice of these will affect your inferences. Since you are basing things on a chi-squared sum it seems likely that you actually want profile likelihood contours. See here for how those work: https://en.wikipedia.org/wiki/Likelihood-ratio_test

As for a likelihood matrix, I'm not sure what you are asking there. If you are talking about a Fisher matrix as kimbyd suggests, well it's not clear why you want that from your question. You don't need it to construct either marginal posteriors or profile likelihoods.

Orodruin · May 14, 2018

kimbyd said:

So, first of all, the ##\chi^2## is not a matrix. It's a number.

He never claimed this as far as I can see. As I understand it he has a matrix with the value of the ##\chi^2## at different points in parameter space, i.e., he has discretised the ##\chi^2## function.

kimbyd · May 14, 2018

Orodruin said:

He never claimed this as far as I can see. As I understand it he has a matrix with the value of the ##\chi^2## at different points in parameter space, i.e., he has discretised the ##\chi^2## function.

The formal way of doing that is via the Fisher Information matrix.

The ##\chi^2## function is, after all, just a function. To use a matrix to describe that function you have to make certain assumptions. The assumption that's made for the Fisher Information matrix is that the probability distribution is approximately Gaussian. That is,

$$L(\vec{p}) \propto e^{-\vec{p}^\dagger F \vec{p}}$$

Here, ##\vec{p}## is the parameter vector, ##F## is the Fisher information matrix (which is a function of the data), and ##L## is the likelihood function.

Orodruin · May 14, 2018

kimbyd said:

The formal way of doing that is via the Fisher Information matrix.

The ##\chi^2## function is, after all, just a function. To use a matrix to describe that function you have to make certain assumptions. The assumption that's made for the Fisher Information matrix is that the probability distribution is approximately Gaussian. That is,

$$L(\vec{p}) \propto e^{-\vec{p}^\dagger F \vec{p}}$$

Here, ##\vec{p}## is the parameter vector, ##F## is the Fisher information matrix (which is a function of the data), and ##L## is the likelihood function.

I believe you are still not understanding what he is doing. It has nothing to do with the Fisher matrix. It is just a grid discretisation of the function itself. Every entry in his matrix (which is not really a matrix but a three-dimensional grid) represents the value of the chi square function at a point in his parameter space.

Stephen Tashi · May 14, 2018

BrunoMota said:

Suppose you have some model with a given number of free parameters (say, 3 for simplicity) and you're given an array of (N) observed values (along with the respective deviations).

By "array" do mean a 1-dimensional array? Or does each entry of the array consist of more than one number? What does an element of the array look like? What are "the respective deviations"?

What I'm trying to do is to graph the 2D/1D likelihood contours.

What is the definition of a "likelihood contour"? One guess is that a "2D likelihood contour" is a line representing where some probability density function has a constant value. What probability density function are you talking about? What is its domain?

I've been recommended to first calculate de chi-square sum:

$$ \chi^2 = \sum_{i}^{N} \left(\frac{y_i - y_i(a_1,a_2,a_3) }{\sigma _i }\right)^2 $$

where $$ a_1,a_2,a_3 $$ are my free parameters.

For what purpose was this sum recommended?

It would be better notation to distinguish between a ##y_i## that is data and a variable ##\hat{y_i}(a_1,a_2,a_3)## that is a prediction of the model, if that's what the notation is supposed to represent. Besides the parameters, what other variables are used by the model to make predictions? Are both the data and the model a function of spatial coordinates?

The the likelihood is given by:
$$ L = e^{-\chi^2/2} $$

My problem is the numerical aspect of this. How do I convert the chi-square into a matrix form

If you only have N data points and the sum for ##L## involves all of them, then why would there be any matrix associated with ##L##?

kimbyd · May 14, 2018

Orodruin said:

I believe you are still not understanding what he is doing. It has nothing to do with the Fisher matrix. It is just a grid discretisation of the function itself. Every entry in his matrix (which is not really a matrix but a three-dimensional grid) represents the value of the chi square function at a point in his parameter space.

I really don't understand what you're saying.

If you want to get the likelihood contours from a ##\chi^2## function using a Gaussian approximation, then the Fisher Information matrix is the way to do that.

Orodruin · May 15, 2018

kimbyd said:

I really don't understand what you're saying.

If you want to get the likelihood contours from a ##\chi^2## function using a Gaussian approximation, then the Fisher Information matrix is the way to do that.

For a Gaussian approximation yes. As you noted yourself, this only gives an approximation that is valid close to the best fit. Let's take an example from one of my papers

Using the Fisher matrix method here would have been completely disastrous. So how was this plot constructed then? By discretising the ##\chi^2## function and evaluating it on a grid of points, like all contour plots in Matlab. The matrix he is talking about is the matrix of values of the ##\chi^2## function on that grid. Calling the parameters ##x## and ##y## and their grid point values ##x_i## and ##y_i##, respectively, the matrix elements are
$$
\chi^2_{ij} = \chi^2(x_i,y_i).
$$

kimbyd · May 15, 2018

Orodruin said:

For a Gaussian approximation yes. As you noted yourself, this only gives an approximation that is valid close to the best fit. Let's take an example from one of my papers
View attachment 225749
Using the Fisher matrix method here would have been completely disastrous. So how was this plot constructed then? By discretising the ##\chi^2## function and evaluating it on a grid of points, like all contour plots in Matlab. The matrix he is talking about is the matrix of values of the ##\chi^2## function on that grid. Calling the parameters ##x## and ##y## and their grid point values ##x_i## and ##y_i##, respectively, the matrix elements are
$$
\chi^2_{ij} = \chi^2(x_i,y_i).
$$

If you want to do the full Bayesian analysis without assuming a normal distribution, yes, you have to do something like this. It involves marginalizing the likelihood function over all of the other parameters, which can be a computationally-intensive task. My impression of the OP was that they were looking for something far more basic.

If you have ##\chi^2(a_1,a_2,a_3)## and want to graph the likelihood contours in the ##a_1##, ##a_2## plane, then you have to marginalize over the remaining parameter via integration:

$$e^{-\chi^2(a_1,a_2) \over 2} \propto \int P(a_3)e^{-\chi^2(a_1,a_2,a_3) \over 2} da$$

Where ##P(a_3)## is the assumed prior probability for ##a_3##. Often this is taken to be uniform, though the choice of the parameter defines what "uniform" means.

For more parameters, it's generally necessary to use Markov Chain Monte Carlo methods, which are vastly more efficient in those cases. There are a great many analysis packages, such as CosmoMC, that provide nice utilities to do this sort of analysis, plus packages to plot contour graphs for the results.

How can I convert chi-square into a matrix and graph likelihood contours?

Attachments

1. What is likelihood?

2. What is the chi-square test?

3. How is the likelihood ratio calculated?

4. What is the relationship between likelihood and chi-square?

5. When is the chi-square test used?

Similar threads

Hot Threads

Recent Insights