Correlations vs. negligence of correlations in a covariance matrix

SUMMARY

This discussion centers on the impact of using a full covariance matrix versus a diagonal covariance matrix in fitting a power law model, specifically the function ##f(x,a,b) = a x^b##. The participants confirm that different best fit parameters ##(a,b)## are expected when using these two approaches due to the minimization of distinct chi-square functions. The chi-square statistic employed is ##\chi^2 = \sum_{k,l=1}^N (y_k - ax_k^b) C^{-1}_{kl} (y_l - ax_l^b)##, where ##C## is the covariance matrix provided by experimenters. The discussion highlights the importance of understanding how correlations affect parameter estimation in statistical modeling.

PREREQUISITES
  • Understanding of chi-square statistics and its application in model fitting.
  • Familiarity with covariance matrices, particularly in the context of statistical data analysis.
  • Knowledge of power law functions and their fitting procedures.
  • Basic concepts of error propagation in statistical measurements.
NEXT STEPS
  • Study the implications of using full covariance matrices in model fitting.
  • Learn about the differences between diagonal and full covariance matrices in statistical analysis.
  • Explore advanced techniques for fitting power law models, including error analysis.
  • Investigate the role of correlations in parameter estimation and their effects on model accuracy.
USEFUL FOR

Statisticians, data analysts, and researchers involved in modeling data with power law distributions, particularly those interested in the effects of correlation on parameter estimation.

CAF123
Suppose I have a model with two parameters ##(a,b)## that I want to use to describe a set of data points. In CASE A, I fit the model taking into account the correlations between the data points (that is, in the chi-square formulation I use the full covariance matrix for the data), and in CASE B I use only the diagonal covariance matrix (that is, the off-diagonal elements are set to zero, so I neglect the correlations).

Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

Thank you.
 
CAF123 said:
Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

You didn't say what function you are fitting to the data or whether a and b are the only quantities involved in the model. Are you fitting a linear model of the form a = Bc + D where B and D are constants?
 
Hi @Stephen Tashi
Stephen Tashi said:
You didn't say what function you are fitting to the data or whether a and b are the only quantities involved in the model. Are you fitting a linear model of the form a = Bc + D where B and D are constants?
It’s a power-law form, ##f(x,a,b) = a x^b##, and the only two parameters to be fitted are ##a## and ##b##.
 
CAF123 said:
It’s a power-law form, ##f(x,a,b) = a x^b##, and the only two parameters to be fitted are ##a## and ##b##.

Do you mean that your data has the format ##(x_i, y_i)## and the model is ##y = a x^b## ? What statistic are you using that has a chi-square distribution?
 
Stephen Tashi said:
Do you mean that your data has the format ##(x_i, y_i)## and the model is ##y = a x^b## ?
Yes
Stephen Tashi said:
What statistic are you using that has a chi-square distribution?
I use ##\chi^2 = \sum_{k,l=1}^N (y_k - ax_k^b)\, C^{-1}_{kl}\, (y_l - ax_l^b)##, where ##N## is the number of data points I’m fitting to and ##C## is the data covariance matrix. In one case I populate ##C## including the correlations of the data points, while in the other case I use only the diagonal elements.
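
As a minimal sketch of how this minimization could be set up numerically (the arrays x, y and the matrix C below are invented placeholders, not the actual measurements), assuming numpy and scipy are available:

```python
import numpy as np
from scipy.optimize import minimize

# Invented placeholder data: N = 4 points (x_k, y_k) and an N x N
# covariance matrix C (symmetric, positive definite).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.3, 6.2, 8.9])
C = np.array([[0.10, 0.02, 0.00, 0.00],
              [0.02, 0.12, 0.03, 0.00],
              [0.00, 0.03, 0.15, 0.04],
              [0.00, 0.00, 0.04, 0.20]])
Cinv = np.linalg.inv(C)

def chi2(params):
    a, b = params
    r = y - a * x**b        # residual vector r_k = y_k - a * x_k**b
    return r @ Cinv @ r     # sum over k, l of r_k * Cinv[k, l] * r_l

res = minimize(chi2, x0=[1.0, 1.0], method="Nelder-Mead")
print("best-fit (a, b):", res.x)
```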
 
I don't understand what you mean by the "data covariance matrix". Is ##C## a 2x2 matrix?

When I think of computing a covariance between two random variables ##X## and ##Y##, I think of having many pairs of observations of the form ##(x_i, y_i)##, where ##x_i## is a realization of ##X## and ##y_i## is a realization of ##Y##. In such a situation, we usually assume ##(X,Y)## has some joint probability density ##f(x,y)## and that each datum ##(x_i,y_i)## is an independently realized sample from that joint density.

By contrast, when we discuss models of the form ##y = g(x)##, we often consider the ##x## values to be chosen by an experimenter, or by some circumstances that we don't wish to model by a probability distribution. The complete model is something like ##y_i = g(x_i) + e_i##, where the term ##e_i## represents the realization of a random variable. In this model, ##x_i## is not a realization of a random variable.
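
To make the distinction concrete, something like the following sketch (all numbers invented) generates data from the second kind of model, fixed ##x_i## plus a random error term, with correlated errors drawn from a multivariate normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed x values chosen by the "experimenter", not random draws.
x = np.array([1.0, 2.0, 3.0, 4.0])
a_true, b_true = 2.0, 1.1

# Correlated Gaussian errors e ~ N(0, C): this is what a non-diagonal
# data covariance matrix describes.
C = np.array([[0.10, 0.02, 0.00, 0.00],
              [0.02, 0.12, 0.03, 0.00],
              [0.00, 0.03, 0.15, 0.04],
              [0.00, 0.00, 0.04, 0.20]])
e = rng.multivariate_normal(np.zeros(4), C)

y = a_true * x**b_true + e   # y_i = g(x_i) + e_i
```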
 
Stephen Tashi said:
I don't understand what you mean by the "data covariance matrix". Is ##C## a 2x2 matrix?
No, here ##C## is an ##N \times N## matrix, where ##N## is the number of data points to which I am fitting the model ansatz ##y = ax^b##. In my case, ##C## is provided by the experimenters. I was just asking whether the best-fit parameters obtained depend on whether (a) I use ##C## or (b) I use ##C## with the off-diagonal elements set to zero (neglecting correlations).
 
CAF123 said:
No, here ##C## is an ##N \times N## matrix, where ##N## is the number of data points to which I am fitting the model ansatz ##y = ax^b##. In my case, ##C## is provided by the experimenters.

I don't understand the data or the model.

If the data has the format ##(x_i, y_i)##, ##i = 1,2,\dots,N##, then, for example, what does the entry ##C_{2,3}## represent? What two random variables are involved?
 
Stephen Tashi said:
I don't understand the data or the model.

If the data has the format ##(x_i, y_i)##, ##i = 1,2,\dots,N##, then, for example, what does the entry ##C_{2,3}## represent? What two random variables are involved?
It would give the covariance between data points in bins 2 and 3. The data points have errors assumed to follow a Gaussian distribution; it is in this sense that they are to be regarded as random variables.
 
CAF123 said:
It would give the covariance between data points in bins 2 and 3.

Bins? What bins? What does a datum like ##(x_2,y_2)## represent? Is ##x_2## one measurement? Or is it a mean value of many measurements?
CAF123 said:
The data points have errors assumed to follow a Gaussian distribution.

Are you saying that both the ##x## and ##y## values have associated errors?

Give a coherent statement of the probability model and the random variables involved. Explain how bins are defined.
 
Is there a link to an example of the situation? Perhaps a textbook problem or example.
 
Stephen Tashi said:
Bins? What bins? What does a datum like ##(x_2,y_2)## represent? Is ##x_2## one measurement? Or is it a mean value of many measurements?
Yes, the ##x_i## are really ##\langle x_i \rangle##, some mean value over the bin range.

Stephen Tashi said:
Are you saying that both the ##x## and ##y## values have associated errors?
Only the y values. Actually, I don’t know much about the experimental input - I’m mainly taking the measurements and errors provided at face value.

The simple case is where one takes the covariance matrix to be diagonal, neglecting correlations, so that ##C_{kl} = \delta_{kl}\sigma_k^2##. Then the chi-square is the usual ##\chi^2 = \sum_{k=1}^N (y_k - ax_k^b)^2/\sigma_k^2##.
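
A minimal sketch of this diagonal case, again with invented placeholder numbers for ##x_k##, ##y_k## and ##\sigma_k^2##; it is just a weighted least-squares fit:

```python
import numpy as np
from scipy.optimize import minimize

# Invented placeholder data with per-point variances sigma_k^2
# (the diagonal of the covariance matrix).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.3, 6.2, 8.9])
sigma2 = np.array([0.10, 0.12, 0.15, 0.20])

def chi2_diag(params):
    a, b = params
    r = y - a * x**b
    return np.sum(r**2 / sigma2)  # chi^2 = sum_k (y_k - a x_k^b)^2 / sigma_k^2

res = minimize(chi2_diag, x0=[1.0, 1.0], method="Nelder-Mead")
print("best-fit (a, b), correlations neglected:", res.x)
```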
 
To answer the question in the original post:

CAF123 said:
Is it a general feature that I may expect different results for the best fit parameters ##(a,b)## in each analysis?

Yes, since your results come from finding the values of ##(a,b)## that minimize two different functions.

Whether the functions you are minimizing have ##\chi^2## distributions isn't clear to me.

For example:

CAF123 said:
The simple case is where one takes the covariance matrix to be diagonal, neglecting correlations, so that ##C_{kl} = \delta_{kl}\sigma_k^2##. Then the chi-square is the usual ##\chi^2 = \sum_{k=1}^N (y_k - ax_k^b)^2/\sigma_k^2##.

If ##y_i## is the sample mean of ##K## measurements of the random variable ##Y_i##, then the variance of ##y_i## is less than the variance of ##Y_i## (i.e. of a single realization of ##Y_i##). So what does ##\sigma_i^2## represent? The variance of ##Y_i##, or the variance of the sample mean ##y_i##?
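(Recall that for the mean of ##K## independent, identically distributed measurements, ##\operatorname{Var}(y_i) = \operatorname{Var}(Y_i)/K##, so the two interpretations of ##\sigma_i^2## differ by a factor of ##K##.)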

Are the ##y_i## computed from the same number of measurements in each bin?
 
Stephen Tashi said:
Yes, since your results come from finding the values of ##(a,b)## that minimize two different functions.
Indeed, I guess I should have asked 'to what extent are the values different'. E.g., typically when one plots data points on a graph, the error bars shown are the standard deviations (i.e. the square roots of the diagonal elements, the variances). These error bars are the same regardless of whether one sets the off-diagonal elements of the covariance matrix to zero. In this sense, it seems to me that the best-fit parameter estimates should not vary significantly between using the covariance matrix with correlations and the one with only diagonal elements (otherwise one would obtain two sets of estimates for ##(a,b)##, and therefore two different curves, and by eyeball one could perhaps tell which is better).
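
As a quick sanity check of this intuition, one could run a toy comparison along the following lines (all numbers invented, not the experimenters' data) and see how far apart the two sets of estimates land:

```python
import numpy as np
from scipy.optimize import minimize

# Toy comparison with invented numbers: fit y = a x^b once with the
# full covariance matrix and once with its diagonal only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.3, 6.2, 8.9])
C = np.array([[0.10, 0.02, 0.00, 0.00],
              [0.02, 0.12, 0.03, 0.00],
              [0.00, 0.03, 0.15, 0.04],
              [0.00, 0.00, 0.04, 0.20]])

def chi2(params, Cinv):
    a, b = params
    r = y - a * x**b
    return r @ Cinv @ r

full = minimize(chi2, [1.0, 1.0], args=(np.linalg.inv(C),), method="Nelder-Mead")
diag = minimize(chi2, [1.0, 1.0], args=(np.diag(1.0 / np.diag(C)),), method="Nelder-Mead")
print("full C   :", full.x)   # best-fit (a, b) with correlations
print("diag only:", diag.x)   # best-fit (a, b) without correlations
```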

Stephen Tashi said:
If ##y_i## is the sample mean of ##K## measurements of the random variable ##Y_i##, then the variance of ##y_i## is less than the variance of ##Y_i## (i.e. of a single realization of ##Y_i##). So what does ##\sigma_i^2## represent? The variance of ##Y_i##, or the variance of the sample mean ##y_i##?

Are the ##y_i## computed from the same number of measurements in each bin?
Actually, I am not terribly sure about the experimental input. I would say the ##\sigma_i^2## correspond to the variance of the sample mean, given the experimental physics background (data obtained by the experimenters from a limited sample of measurements).

I have a table of ##\langle x_i \rangle## values and the corresponding mean values of the ##y_i##. In addition, I am provided with the covariance matrix, which tells me how the error assigned to a given ##y_i## relates to the error on ##y_j##. This matrix, together with the ##y_i## and ##\langle x_i \rangle##, is used in the chi-square formulation to determine best-fit estimates for the model parameters ##a## and ##b##.

Sorry for the lack of precision, but the background/input is in any case provided by experimental physics.
 
CAF123 said:
Indeed, I guess I should have asked 'to what extent are the values different'.
I don't know; I don't know of any theoretical results that say they will or won't be similar.

CAF123 said:
E.g., typically when one plots data points on a graph, the error bars shown are the standard deviations (i.e. the square roots of the diagonal elements, the variances). These error bars are the same regardless of whether one sets the off-diagonal elements of the covariance matrix to zero.
You could also represent the predicted value for ##COV(x_1, x_2)## on a graph. If https://stats.stackexchange.com/que...e-of-a-sample-covariance-for-normal-variables is correct, you could draw an error bar around it.

The usual way of presenting a fit graphically would not reveal whether the model predicted covariances well. This is a limitation of the usual way of presenting things, not a proof that the covariances don't matter. Do the people who furnished the data care about how well covariances are predicted?
 
