Correlations vs. negligence of correlations in a covariance matrix


Discussion Overview

The discussion revolves around the impact of using different covariance matrices on the best fit parameters in a model fitting context. Participants explore the implications of considering correlations between data points versus neglecting them, specifically in the context of fitting a power law model to data.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that using a full covariance matrix that includes correlations may yield different best fit parameters (a, b) compared to using a diagonal covariance matrix that neglects correlations.
  • Others question the specific function being fitted and the nature of the data, seeking clarification on whether the model is linear or follows a power law form.
  • A participant describes the chi-square statistic used for fitting, which incorporates the covariance matrix, and notes the distinction between using the full matrix versus a diagonal approximation.
  • There is uncertainty regarding the interpretation of the covariance matrix, with some participants seeking clarification on what the entries represent and how the data is structured.
  • Participants discuss the implications of using sample means and the associated variances in the context of the covariance matrix, raising questions about the representation of errors in the data.
  • One participant suggests that the differences in best fit parameters may not be significant if the error bars remain consistent across analyses.

Areas of Agreement / Disagreement

Participants express differing views on whether the best fit parameters will significantly differ based on the covariance matrix used. Some assert that differences are expected, while others suggest that the differences may not be substantial.

Contextual Notes

There is a lack of clarity regarding the experimental setup and the definitions of the random variables involved, particularly concerning the nature of the data points and the errors associated with them. The discussion also highlights unresolved questions about the statistical properties of the covariance matrix and the implications for model fitting.

CAF123
Suppose I have a model composed of two parameters ##(a,b)## that I want to describe a set of data points with. In CASE A, I fit the model taking into consideration the correlations between the data points (that is, in the chi square formulation I use the full covariance matrix for the data) and in CASE B I only use the diagonal covariance matrix (that is, off diagonal elements are set to zero, so I neglect the correlations).

Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

Thank you.
 
CAF123 said:
Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

You didn't say what function you are fitting to the data or whether a and b are the only quantities involved in the model. Are you fitting a linear model of the form a = Bc + D where B and D are constants?
 
Hi @Stephen Tashi
Stephen Tashi said:
You didn't say what function you are fitting to the data or whether a and b are the only quantities involved in the model. Are you fitting a linear model of the form a = Bc + D where B and D are constants?
It’s a power-law form, ##f(x,a,b) = a x^b##, and the only two parameters to be fitted are a and b.
 
CAF123 said:
It’s a power-law form, ##f(x,a,b) = a x^b##, and the only two parameters to be fitted are a and b.

Do you mean that your data has the format ##(x_i, y_i)## and the model is ##y = a x^b## ? What statistic are you using that has a chi-square distribution?
 
Stephen Tashi said:
Do you mean that your data has the format ##(x_i, y_i)## and the model is ##y = a x^b## ?
Yes
What statistic are you using that has a chi-square distribution?
I use ##\chi^2 = \sum_{k,l=1}^N (y_k - ax_k^b) C^{-1}_{kl} (y_l - ax_l^b)##, where N is the number of data points I’m fitting to and C is the data covariance matrix. In one case I populate C including the correlations of the data points, while in the other case I keep only the diagonal elements.
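As a concrete sketch of this minimization (not from the thread; the data, errors, and the choice of `scipy.optimize.minimize` are my own toy assumptions), the chi-square above can be written as a quadratic form in the residuals and minimized over (a, b):

```python
import numpy as np
from scipy.optimize import minimize

def chi2(params, x, y, C_inv):
    # chi^2 = sum_{k,l} (y_k - a x_k^b) C^{-1}_{kl} (y_l - a x_l^b)
    a, b = params
    r = y - a * x**b
    return r @ C_inv @ r

# Hypothetical data roughly following y = 2x (not from the thread)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.0, 8.4, 10.1])
C = np.diag([0.04, 0.04, 0.09, 0.09, 0.16])  # CASE B: diagonal covariance
C_inv = np.linalg.inv(C)

fit = minimize(chi2, x0=[1.0, 1.0], args=(x, y, C_inv), method="nelder-mead")
a_best, b_best = fit.x
```

For CASE A one would simply fill in the off-diagonal elements of `C` before inverting; the minimization itself is unchanged.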
 
I don't understand what you mean by the "data covariance matrix". Is ##C## a 2x2 matrix?

When I think of computing a covariance between two random variables X, Y, I think of having many pairs of observations of the form ##(x_i, y_i)## where ##x_i## is a realization of ##X## and ##y_i## is a realization of ##Y##. In such a situation, we usually assume ##(X,Y)## has some joint probability density ##f(x,y)## and that each datum ##(x_i,y_i)## is an independently realized sample from that joint density.

By contrast, when we discuss models of the form ##y = g(x)## we often consider the ##x## values to be chosen by an experimenter or by some circumstances that we don't wish to model by a probability distribution. The complete model is something like ##y_i = g(x_i) + e_i## where the term ##e_i## represents the realization of a random variable. In this model, ##x_i## is not a realization of a random variable.
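A minimal sketch of this second viewpoint (toy numbers of my own invention): the ##x_i## are fixed design points, and the only random ingredient is the error term ##e_i##:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # chosen by the experimenter, not random
a_true, b_true = 2.0, 1.5                # hypothetical "true" parameters

e = rng.normal(0.0, 0.3, size=x.size)    # e_i: the only realization of a random variable
y = a_true * x**b_true + e               # y_i = g(x_i) + e_i
```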
 
Stephen Tashi said:
I don't understand what you mean by the "data covariance matrix". Is ##C## a 2x2 matrix?
No, here C is an N×N matrix, where N is the number of data points I am fitting the model ansatz y = ax^b to. In my case, C is provided by the experimenters. I was just asking whether the best fit parameters obtained depend on whether a) I use C or b) I use C with the off-diagonal elements set to zero (neglecting correlations)
 
CAF123 said:
No, here C is an N×N matrix, where N is the number of data points I am fitting the model ansatz y = ax^b to. In my case, C is provided by the experimenters.

I don't understand the data or the model.

If the data has the format ##(x_i,y_i)## , ##i = 1,2,...N## then, for example, what does the entry ##C_{2,3}## represent? What two random variables are involved?
 
Stephen Tashi said:
I don't understand the data or the model.

If the data has the format ##(x_i,y_i)## , ##i = 1,2,...N## then, for example, what does the entry ##C_{2,3}## represent? What two random variables are involved?
It would give the covariance between data points in bins 2 and 3. The data points have errors assumed to follow a Gaussian distribution. It is in this sense they are to be regarded as random variables.
 
CAF123 said:
It would give the covariance between data points in bins 2 and 3.

Bins? What bins? What does a datum like ##(x_2,y_2)## represent? Is ##x_2## one measurement? Or is it a mean value of many measurements?
The data points have errors assumed to follow a Gaussian distribution.

Are you saying that both the ##x## and ##y## values have associated errors?

Give a coherent statement of the probability model and the random variables involved. Explain how bins are defined.
 
Is there a link to an example of the situation? - perhaps a textbook problem or example.
 
Stephen Tashi said:
Bins? What bins? What does a datum like ##(x_2,y_2)## represent? Is ##x_2## one measurement? Or is it a mean value of many measurements?
Yes, the ##x_i## are really ##\langle x_i \rangle## - some mean value of the bin range.

Are you saying that both the ##x## and ##y## values have associated errors?
Only the y values. Actually, I don’t know much about the experimental input - I’m mainly taking the measurements and errors provided at face value.

The simple case is where one takes the covariance as being diagonal, neglecting correlations so that ##C_{kl} = \delta_{kl}\sigma_k^2##. Then the chi square is the usual ##\chi^2 = \sum_{k=1}^N (y_k - ax_k^b)^2/\sigma_k^2##.
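As a quick sanity check (with toy numbers of my own), the general quadratic form evaluated with a diagonal ##C_{kl} = \delta_{kl}\sigma_k^2## does reduce to the familiar weighted sum:

```python
import numpy as np

# Hypothetical data points and errors (not from the thread)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.9, 4.2, 6.1, 7.8])
sigma = np.array([0.2, 0.2, 0.3, 0.3])
a, b = 2.0, 1.0                      # trial parameter values

r = y - a * x**b                     # residuals y_k - a x_k^b
# General quadratic form with C_kl = delta_kl sigma_k^2:
chi2_general = r @ np.diag(1.0 / sigma**2) @ r
# The usual weighted least-squares form:
chi2_diag = np.sum(r**2 / sigma**2)
```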
 
To answer the question in the original post:

CAF123 said:
Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

Yes, since your results come from finding the values of ##(a,b)## that minimize two different functions.

Whether the functions you are minimizing have ##\chi^2## distributions isn't clear to me.

For example:

CAF123 said:
The simple case is where one takes the covariance as being diagonal, neglecting correlations so that ##C_{kl} = \delta_{kl}\sigma_k^2##. Then the chi square is the usual ##\chi^2 = \sum_{k=1}^N (y_k - ax_k^b)^2/\sigma_k^2##.

If ##y_i## is the sample mean of ##K## measurements of the random variable ##Y_i##, then the variance of ##y_i## is less than the variance of ##Y_i## (i.e. of a single realization of ##Y_i##). So what does ##\sigma_i^2## represent? The variance of ##Y_i##, or the variance of the sample mean ##y_i##?

Are the ##y_i## computed from the same number of measurements in each bin?
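The distinction between the variance of a single measurement and the variance of a sample mean can be checked with a quick simulation (all numbers hypothetical): averaging ##K## measurements shrinks the variance by a factor of ##K##:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 50           # measurements averaged per bin (hypothetical)
trials = 20000   # repeated experiments, to estimate variances empirically
sigma_Y = 2.0    # standard deviation of a single measurement Y_i

samples = rng.normal(0.0, sigma_Y, size=(trials, K))
var_single = samples[:, 0].var()       # ~ sigma_Y^2      (one realization)
var_mean = samples.mean(axis=1).var()  # ~ sigma_Y^2 / K  (sample mean)
```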
 
Stephen Tashi said:
Yes, since your results come from finding the values of ##(a,b)## that minimize two different functions.
Indeed, I guess I should have asked 'to what extent are the values different'. E.g. typically when one plots data points on a graph, the error bars shown are the standard deviations (i.e. the square roots of the diagonal elements, the variances). These errors are the same regardless of whether one sets the off-diagonal elements of the covariance matrix to zero or not. In this sense, it seems to me that the best fit parameter estimates should not vary significantly between the covariance matrix with correlations and the one with only diagonal elements (otherwise one would obtain two sets of estimates for (a,b), hence two different curves, and by eye one could perhaps tell which is better).
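How far apart the two sets of estimates actually land can be probed with a small numerical experiment (the data, the noise level, and the correlation structure below are all my own invention): generate correlated Gaussian errors, then fit once with the full ##C## and once with only its diagonal:

```python
import numpy as np
from scipy.optimize import minimize

def best_fit(x, y, C_inv):
    # Minimize chi^2 = r^T C^{-1} r over (a, b) for the model y = a x^b
    chi2 = lambda p: (y - p[0] * x**p[1]) @ C_inv @ (y - p[0] * x**p[1])
    return minimize(chi2, x0=[1.0, 1.0], method="nelder-mead").x

rng = np.random.default_rng(2)
N = 8
x = np.linspace(1.0, 5.0, N)
# Hypothetical covariance: sigma_k = 0.3 with correlation 0.8^|k-l|
idx = np.arange(N)
C = 0.09 * 0.8 ** np.abs(np.subtract.outer(idx, idx))
y = 2.0 * x**1.5 + np.linalg.cholesky(C) @ rng.normal(size=N)

p_full = best_fit(x, y, np.linalg.inv(C))           # CASE A: full C
p_diag = best_fit(x, y, np.diag(1.0 / np.diag(C)))  # CASE B: diagonal only
```

In runs like this both fits typically land near the true (a, b), but the two minimizers are not identical; how much they differ depends on the strength of the correlations relative to the variances.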

Stephen Tashi said:
If ##y_i## is the sample mean of ##K## measurements of the random variable ##Y_i##, then the variance of ##y_i## is less than the variance of ##Y_i## (i.e. of a single realization of ##Y_i##). So what does ##\sigma_i^2## represent? The variance of ##Y_i##, or the variance of the sample mean ##y_i##?

Are the ##y_i## computed from the same number of measurements in each bin?
Actually, I am not terribly sure about the experimental input. Given the experimental-physics background (data obtained by an experimenter from a limited sample of measurements), I would say ##\sigma_i^2## corresponds to the variance of the sample mean.

I have a table of ##\langle x_i \rangle## values and corresponding mean values for the ##y_i##. In addition, I am provided with the covariance matrix which tells me how the error assignment on a certain ##y_i## will affect the corresponding error on ##y_j##. This matrix together with the ##y_i## and ##\langle x_i \rangle## are used in the chi square formulation to determine best fit estimates for model parameters a and b.

Sorry for the lack of precision, but the background/input is in any case provided by experimental physics.
 
CAF123 said:
Indeed, I guess I should have asked 'to what extent are the values different'.
I don't know. I don't know any theoretical results that say they will or won't be similar.

CAF123 said:
E.g. typically when one plots data points on a graph, the error bars shown are the standard deviations (i.e. the square roots of the diagonal elements, the variances). These errors are the same regardless of whether one sets the off-diagonal elements of the covariance matrix to zero or not.
You could also represent the predicted value for ##COV(x_1, x_2)## on a graph. If https://stats.stackexchange.com/que...e-of-a-sample-covariance-for-normal-variables is correct, you could draw an error bar around it.

The usual way of presenting a fit graphically would not reveal whether the model predicted covariances well. This is a limitation of the usual way of presenting things, not a proof that the covariances don't matter. Do the people who furnished the data care about how well covariances are predicted?
 
