Correlations vs. neglect of correlations in a covariance matrix

  • #1
CAF123
Suppose I have a model with two parameters ##(a,b)## that I want to use to describe a set of data points. In CASE A, I fit the model taking into consideration the correlations between the data points (that is, in the chi-square formulation I use the full covariance matrix for the data), and in CASE B I use only the diagonal of the covariance matrix (that is, the off-diagonal elements are set to zero, so I neglect the correlations).

Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

Thank you.
 
  • #2
CAF123 said:
Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

You didn't say what function you are fitting to the data or whether a and b are the only quantities involved in the model. Are you fitting a linear model of the form a = Bc + D where B and D are constants?
 
  • #3
Hi @Stephen Tashi
Stephen Tashi said:
You didn't say what function you are fitting to the data or whether a and b are the only quantities involved in the model. Are you fitting a linear model of the form a = Bc + D where B and D are constants?
It’s a power-like form, ##f(x,a,b) = a x^b##, and the only two parameters to be fitted are a and b.
 
  • #4
CAF123 said:
It’s a power-like form, ##f(x,a,b) = a x^b##, and the only two parameters to be fitted are a and b.

Do you mean that your data has the format ##(x_i, y_i)## and the model is ##y = a x^b## ? What statistic are you using that has a chi-square distribution?
 
  • #5
Stephen Tashi said:
Do you mean that your data has the format ##(x_i, y_i)## and the model is ##y = a x^b## ?
Yes
What statistic are you using that has a chi-square distribution?
I use ##\chi^2 = \sum_{k,l=1}^N (y_k - ax_k^b) C^{-1}_{kl} (y_l - ax_l^b)##, where N is the number of data points I’m fitting to and C is the data covariance matrix. In one case I populate C including the correlations of the data points, while in the other case I use only the diagonal elements.
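A minimal numerical sketch of the difference between the two cases (the data points and covariance matrix below are invented purely for illustration): the same residuals ##y_k - a x_k^b## give a different ##\chi^2## once the off-diagonal elements of C are kept.

```python
import numpy as np

# Invented data: N = 4 points (x_k, y_k) and a covariance matrix C
# with positive off-diagonal (correlated) entries.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 4.2, 8.7, 16.5])
C = np.array([[0.04, 0.01, 0.00, 0.00],
              [0.01, 0.09, 0.02, 0.00],
              [0.00, 0.02, 0.16, 0.03],
              [0.00, 0.00, 0.03, 0.25]])

def chi2(a, b, cov):
    """chi^2 = r^T cov^{-1} r with residuals r_k = y_k - a x_k^b."""
    r = y - a * x**b
    return float(r @ np.linalg.solve(cov, r))

a, b = 1.0, 2.0
full = chi2(a, b, C)                     # CASE A: full covariance matrix
diag = chi2(a, b, np.diag(np.diag(C)))   # CASE B: correlations neglected

print(full, diag)  # the two statistics differ for the same (a, b)
```

In CASE B the quadratic form collapses to the familiar ##\sum_k (y_k - a x_k^b)^2/\sigma_k^2##, which is what the code's diagonal branch evaluates.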
 
  • #6
I don't understand what you mean by the "data covariance matrix". Is ##C## a 2x2 matrix?

When I think of computing a covariance between two random variables X, Y, I think of having many pairs of observations of the form ##(x_i, y_i)##, where ##x_i## is a realization of ##X## and ##y_i## is a realization of ##Y##. In such a situation, we usually assume ##(X,Y)## has some joint probability density ##f(x,y)## and each datum ##(x_i,y_i)## is an independently realized sample from that joint density.

By contrast, when we discuss models of the form ##y = g(x)## we often consider the ##x## values to be chosen by an experimenter or by some circumstances that we don't wish to model by a probability distribution. The complete model is something like ##y_i = g(x_i) + e_i##, where the term ##e_i## represents the realization of a random variable. In this model, ##x_i## is not a realization of a random variable.
 
  • #7
Stephen Tashi said:
I don't understand what you mean by the "data covariance matrix". Is ##C## a 2x2 matrix?
No, here C is an NxN matrix, where N is the number of data points I am fitting the model ansatz ##y = ax^b## to. In my case, C is provided by the experimenters. I was asking whether the best-fit parameters obtained depend on whether a) I use C or b) I use C with the off-diagonal elements set to zero (neglecting correlations).
 
  • #8
CAF123 said:
No, here C is an NxN matrix, where N is the number of data points I am fitting the model ansatz ##y = ax^b## to. In my case, C is provided by the experimenters.

I don't understand the data or the model.

If the data has the format ##(x_i,y_i)## , ##i = 1,2,...N## then, for example, what does the entry ##C_{2,3}## represent? What two random variables are involved?
 
  • #9
Stephen Tashi said:
I don't understand the data or the model.

If the data has the format ##(x_i,y_i)## , ##i = 1,2,...N## then, for example, what does the entry ##C_{2,3}## represent? What two random variables are involved?
It would give the covariance between data points in bins 2 and 3. The data points have errors assumed to follow a Gaussian distribution. It is in this sense they are to be regarded as random variables.
 
  • #10
CAF123 said:
It would give the covariance between data points in bins 2 and 3.

Bins? What bins? What does a datum like ##(x_2,y_2)## represent? Is ##x_2## one measurement? Or is it a mean value of many measurements?
The data points have errors assumed to follow a Gaussian distribution.

Are you saying that both the ##x## and ##y## values have associated errors?

Give a coherent statement of the probability model and the random variables involved. Explain how bins are defined.
 
  • #11
Is there a link to an example of the situation? - perhaps a textbook problem or example.
 
  • #12
Stephen Tashi said:
Bins? What bins? What does a datum like ##(x_2,y_2)## represent? Is ##x_2## one measurement? Or is it a mean value of many measurements?
Yes, the ##x_i## are really ##\langle x_i \rangle## - some mean value of the bin range.

Are you saying that both the ##x## and ##y## values have associated errors?
Only the y values. Actually, I don’t know much about the experimental input - I’m mainly taking the measurements and errors provided at face value.

The simple case is where one takes the covariance as being diagonal, neglecting correlations so that ##C_{kl} = \delta_{kl}\sigma_k^2##. Then the chi square is the usual ##\chi^2 = \sum_{k=1}^N (y_k - ax_k^b)^2/\sigma_k^2##.
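As a quick numerical sketch of this diagonal case (the data and the log-linearization trick below are my own illustration, not from the discussion): taking logs of ##y = a x^b## turns the fit into a straight line, which at least gives starting values for a proper minimization of the diagonal ##\chi^2##.

```python
import numpy as np

# Sketch (made-up data): log-linearize y = a x^b to estimate (a, b).
# Caveat: taking logs distorts the Gaussian error model, so this is
# only an approximation, useful for seeding a chi-square minimization.
x = np.array([1.0, 2.0, 4.0, 8.0])
a_true, b_true = 2.0, 1.5
y = a_true * x**b_true            # noiseless data for illustration

# Straight-line fit: log y = b * log x + log a
b_fit, log_a_fit = np.polyfit(np.log(x), np.log(y), 1)
a_fit = np.exp(log_a_fit)
print(a_fit, b_fit)  # recovers a = 2.0, b = 1.5 on noiseless data
```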
 
  • #13
To answer the question in the original post:

CAF123 said:
Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

Yes, since your results come from finding the values of ##(a,b)## that minimize two different functions.

Whether the functions you are minimizing have ##\chi^2## distributions isn't clear to me.

For example:

CAF123 said:
The simple case is where one takes the covariance as being diagonal, neglecting correlations so that ##C_{kl} = \delta_{kl}\sigma_k^2##. Then the chi square is the usual ##\chi^2 = \sum_{k=1}^N (y_k - ax_k^b)^2/\sigma_k^2##.

If ##y_i## is the sample mean of ##K## measurements of the random variable ##Y_i##, then the variance of ##y_i## is less than the variance of ##Y_i## (i.e. of a single realization of ##Y_i##). So what does ##\sigma_i^2## represent? The variance of ##Y_i## or the variance of the sample mean ##y_i##?

Are the ##y_i## computed from the same number of measurements in each bin?
 
  • #14
Yes, since your results come from finding the values of ##(a,b)## that minimize two different functions.
Indeed, I guess I should have asked 'to what extent are the values different'. E.g. typically when one plots data points on a graph, the error bars shown are the standard deviations (i.e. the square roots of the diagonal elements, the variances). These error bars are therefore the same regardless of whether one sets the off-diagonal elements of the covariance matrix to zero or not. In this sense, it seems to me that the best-fit parameter estimates should not vary significantly between the covariance matrix with correlations and the one with only diagonal elements (otherwise one would obtain two sets of estimates for (a,b), and therefore two different curves, and by eyeball one could perhaps tell which is better).

If ##y_i## is the sample mean of ##K## measurements of the random variable ##Y_i##, then the variance of ##y_i## is less than the variance of ##Y_i## (i.e. of a single realization of ##Y_i##). So what does ##\sigma_i^2## represent? The variance of ##Y_i## or the variance of the sample mean ##y_i##?

Are the ##y_i## computed from the same number of measurements in each bin?
Actually, I am not terribly sure about the experimental input. Given the experimental-physics background (data obtained by an experimenter from a limited sample of measurements), I would say each ##\sigma_i^2## corresponds to the variance of the sample mean.

I have a table of ##\langle x_i \rangle## values and corresponding mean values for the ##y_i##. In addition, I am provided with the covariance matrix which tells me how the error assignment on a certain ##y_i## will affect the corresponding error on ##y_j##. This matrix together with the ##y_i## and ##\langle x_i \rangle## are used in the chi square formulation to determine best fit estimates for model parameters a and b.

Sorry for the lack of precision, but the background/input is in any case provided by experimental physics.
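To make the original question concrete, here is a toy comparison (data, covariance matrix, and grid ranges all invented): the same data fit with the full C (CASE A) and with its diagonal only (CASE B), via a crude grid search over ##(a,b)##. The two minima need not coincide.

```python
import numpy as np

# Toy illustration: fit y = a x^b with the full covariance matrix
# (CASE A) and with its diagonal only (CASE B), by brute-force grid search.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 4.1, 9.3, 15.6])
C = np.array([[0.04, 0.02, 0.00, 0.00],
              [0.02, 0.09, 0.04, 0.00],
              [0.00, 0.04, 0.16, 0.06],
              [0.00, 0.00, 0.06, 0.25]])

def best_fit(cov):
    """Grid-search minimizer of chi^2 = r^T cov^{-1} r, r_k = y_k - a x_k^b."""
    cinv = np.linalg.inv(cov)
    best = (np.inf, None, None)
    for a in np.linspace(0.5, 2.0, 151):
        for b in np.linspace(1.0, 3.0, 201):
            r = y - a * x**b
            c2 = r @ cinv @ r
            if c2 < best[0]:
                best = (c2, a, b)
    return best

chi2_full, a_full, b_full = best_fit(C)                    # CASE A
chi2_diag, a_diag, b_diag = best_fit(np.diag(np.diag(C)))  # CASE B
print((a_full, b_full), (a_diag, b_diag))
```

A grid search is of course crude; in practice one would hand the full covariance matrix to a proper minimizer, but the sketch makes the point that the two objective functions can prefer different ##(a,b)##.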
 
  • #15
CAF123 said:
Indeed, I guess I should have asked 'to what extent are the values different'.
I don't know. I don't know any theoretical results that say they will or won't be similar.

E.g. typically when one plots data points on a graph, the error bars shown are the standard deviations (i.e. the square roots of the diagonal elements, the variances). These error bars are therefore the same regardless of whether one sets the off-diagonal elements of the covariance matrix to zero or not.
You could also represent the predicted value for ##COV(x_1, x_2)## on a graph. If https://stats.stackexchange.com/que...e-of-a-sample-covariance-for-normal-variables is correct, you could draw an error bar around it.

The usual way of presenting a fit graphically would not reveal whether the model predicted covariances well. This is a limitation of the usual way of presenting things, not a proof that the covariances don't matter. Do the people who furnished the data care about how well covariances are predicted?
 

1. What is a covariance matrix?

A covariance matrix is a mathematical tool used to measure the relationship between multiple variables. It consists of a square matrix with the variances of each variable along the diagonal and the covariances between each pair of variables in the off-diagonal elements.
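To make the definition concrete, here is a small NumPy illustration (the numbers are made up): `np.cov` builds such a matrix from paired observations, and `np.corrcoef` gives its normalized (correlation) counterpart.

```python
import numpy as np

# Five paired observations of two variables (made-up numbers).
data = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],    # variable X
                 [1.1, 1.9, 3.2, 3.9, 5.1]])   # variable Y

C = np.cov(data)       # 2x2: variances on the diagonal, Cov(X, Y) off it
R = np.corrcoef(data)  # normalized version: the correlation matrix

print(C)
print(R)  # off-diagonal entries lie between -1 and +1
```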

2. What is the purpose of a covariance matrix?

The purpose of a covariance matrix is to provide insight into the linear relationship between variables. It can help identify patterns and trends in data and can be used to make predictions and decisions.

3. What is the difference between correlation and neglect of correlations in a covariance matrix?

Correlation measures the strength and direction of the linear relationship between two variables, while neglecting correlations means ignoring those relationships, i.e. treating the off-diagonal elements of the covariance matrix as zero. Neglecting correlations can lead to inaccurate conclusions and predictions.

4. How are correlations and neglect of correlations represented in a covariance matrix?

Correlations enter through the off-diagonal elements of a covariance matrix, and neglecting them amounts to setting those elements to zero. Note that the off-diagonal entries are covariances, not correlation coefficients: dividing ##C_{ij}## by ##\sigma_i \sigma_j## gives the correlation coefficient, which ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation).

5. How can correlations and neglect of correlations impact data analysis?

Accounting for correlations can provide valuable insight and aid in making accurate predictions, while neglecting them can lead to biased or incorrect conclusions. It is important to consider the relationships between variables in a covariance matrix carefully to ensure accurate data analysis.
