CAF123 said:
The original data is Gaussian distributed.
I don't understand the probability model for the data - or whatever the fitted function represents. If the data were (scalar) Gaussian data, you could fit a two-parameter Gaussian to it and one parameter wouldn't be a function of the other. I also don't understand how minimizing chi-square enters the picture. Are you using the concept of minimizing chi-square to get a different fit than a "least squares" fit? What is the format of the data? ##(x_i)##? ##(x_i, y_i)##? ##(x_i, y_i, z_i)##?
CAF123 said:
Also, I guess more automated fitting programs would compute the covariance matrix by computing the curvature matrix ( = second derivative of the chi-square with respect to the parameters, evaluated at the best fit values). I know Mathematica has this as a usable function but I suppose this is what it is doing in the background.
I'm not sure whether "best fit values" refers to values of the data or values of the parameters. If it refers only to values of the parameters, I don't see how statistical properties of the data are included in Mathematica's calculation.
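To make that curvature-matrix recipe concrete, here is a minimal numerical sketch of its mechanics (not Mathematica's actual internals): the model, the made-up data, the per-point sigmas and the use of numpy/scipy are all assumptions for illustration. The recipe is: minimize ##\chi^2(\overrightarrow{\theta})##, form the curvature matrix ##\alpha_{jk} = \tfrac{1}{2}\,\partial^2 \chi^2 / \partial\theta_j \partial\theta_k## at the best-fit parameter values, and take its inverse as the estimated parameter covariance.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data and model y = a*exp(-b*x) with known per-point sigmas.
# All of this (model, x, y, sigma) is made up for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 30)
true = np.array([2.0, 0.7])
sigma = 0.1 * np.ones_like(x)
y = true[0] * np.exp(-true[1] * x) + rng.normal(0, sigma)

def model(theta, x):
    return theta[0] * np.exp(-theta[1] * x)

def chi2(theta):
    return np.sum(((y - model(theta, x)) / sigma) ** 2)

# Best-fit parameter values from minimizing chi-square.
fit = minimize(chi2, x0=[1.0, 1.0])
theta_hat = fit.x

def hessian(f, p, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at point p."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += h; pp[j] += h
            pm = p.copy(); pm[i] += h; pm[j] -= h
            mp = p.copy(); mp[i] -= h; mp[j] += h
            mm = p.copy(); mm[i] -= h; mm[j] -= h
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * h * h)
    return H

# Curvature matrix = (1/2) * Hessian of chi-square at the best-fit point;
# its inverse is taken as the estimated covariance of the fitted parameters.
alpha = 0.5 * hessian(chi2, theta_hat)
cov_theta = np.linalg.inv(alpha)
print(theta_hat, np.sqrt(np.diag(cov_theta)))
```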
Things that can be "computed" from a specific set of data are not population parameters. They can be estimators of population parameters. So you should distinguish between "sample mean" and "mean", between "covariance matrix" and "sample covariance matrix", etc.
For a given set of data, we have only one value for the best-fitting parameters. So we don't have a "sample covariance matrix" for the parameters. We can use the sample statistics from the data to estimate the population covariance matrix for the parameters.
Technically, any method of doing this qualifies as an estimator - it may not be unbiased, minimum variance, maximum likelihood, etc., but it can still be called an estimator.
Guessing the formula for a good estimator is usually done by expressing (or imagining) the vector of best-fitting parameters to be a function of the sample statistics of the data: ##\overrightarrow{\theta} = \overrightarrow{A}(\overrightarrow{S})##, where ##\overrightarrow{S}## is a vector of sample statistics, such as the sample means, sample covariances, etc. of one realization of the data ##\overrightarrow{x}##. (For example, in least-squares fitting of a line to (x,y) data, the slope and intercept of the line are each a function of the sample means, sample variances and sample covariance of the data.)
The sample statistics ##\overrightarrow{S}## are themselves random variables, since they depend on the random variable ##\overrightarrow{x}##. When one random variable ##\overrightarrow{\theta}## is a function of another ##\overrightarrow{S}##, we can, in principle, compute the parameters and (population) statistics of ##\overrightarrow{\theta}## if we know the distribution of ##\overrightarrow{S}##. The distribution of ##\overrightarrow{S}## is, in principle, computable from the distribution of ##\overrightarrow{x}##. In many cases it can be computed by knowing only some (population) parameters of the distribution for ##\overrightarrow{x}##.
However, having only data, we don't know the distribution of ##\overrightarrow{x}##. In particular, we don't know the population parameters of that distribution. So the above process of deduction can't be carried out. That line of deduction does suggest a procedure for estimation. This would be:
1) Use the sample statistics from the particular data we have as estimators of the population parameters for the distribution of ##\overrightarrow{x}##.
2) Use the estimated distribution of ##\overrightarrow{x}## to estimate the distribution of ##\overrightarrow{S}##.
3) Use the estimated distribution of ##\overrightarrow{S}## to compute an estimated distribution of ##\overrightarrow{\theta}##.

Linear approximations are often used in the above calculations. In a linear approximation we expand a function ##G(\overrightarrow{y})## about some point ##\overrightarrow{y_0}## using coefficients that depend on the partial derivatives of ##G##. In the above process, what points ##\overrightarrow{y_0}## are used?
In step 2), the only point we know is the vector of sample statistics ##\overrightarrow{S_0}## computed from the particular data we have.
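As a sketch of how the linearized propagation in step 3) works in practice, one expands ##\overrightarrow{A}## about the observed sample statistics ##\overrightarrow{S_0}## and uses ##Cov(\overrightarrow{\theta}) \approx J\, Cov(\overrightarrow{S})\, J^T##, where ##J## is the Jacobian of ##\overrightarrow{A}## evaluated at ##\overrightarrow{S_0}## (this is sometimes called the delta method). Everything in the code below is hypothetical: the function ##\overrightarrow{A}## is the line-fit example from above, and the numbers in S0 and cov_S stand in for what steps 1) and 2) would produce.

```python
import numpy as np

def A(S):
    """Map from sample statistics S = (x_bar, y_bar, s_xx, s_xy)
    to the fitted parameters theta = (slope, intercept) of a least-squares line."""
    x_bar, y_bar, s_xx, s_xy = S
    slope = s_xy / s_xx
    return np.array([slope, y_bar - slope * x_bar])

def jacobian(f, S0, h=1e-6):
    """Numerical Jacobian of a vector-valued function f at S0 by central differences."""
    S0 = np.asarray(S0, dtype=float)
    m = len(f(S0))
    J = np.zeros((m, len(S0)))
    for j in range(len(S0)):
        up, dn = S0.copy(), S0.copy()
        up[j] += h
        dn[j] -= h
        J[:, j] = (f(up) - f(dn)) / (2 * h)
    return J

# S0: the sample statistics observed from the one data set we actually have.
# cov_S: an assumed estimate of the covariance of those statistics (from steps 1 and 2).
# Both are made-up numbers, purely to show the mechanics.
S0 = np.array([3.0, 4.02, 2.0, 2.0])
cov_S = np.diag([0.1, 0.1, 0.05, 0.05])

J = jacobian(A, S0)
cov_theta = J @ cov_S @ J.T   # step 3: linearized ("delta method") propagation
theta_0 = A(S0)
print(theta_0, np.sqrt(np.diag(cov_theta)))
```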
The above procedure is plausible, but it is not a proof that such an estimation process is good.
The outcome will be results like ##\mu_{\theta_1} = 25.2##, ##\sigma_{\theta_1} = 4.3##. People tend to interpret such results as giving the probability that a parameter lies in a specific numerical interval. This interpretation is unjustified. It is a Bayesian interpretation of calculations that were done without the proper Bayesian technique. (An actual Bayesian approach requires assuming a prior distribution for ##\overrightarrow{\theta}##.)
Furthermore, the above estimation process computes things like ##\sigma_{\theta_1}## by assuming that the particular data we have is representative enough that the "uncertainties" in the probability distribution for the data are accurately estimated from the uncertainties (e.g. sample standard deviations, sample covariances) in our particular sample. So trying to portray the above process as a logical deduction gets into a circular argument: if we assume the data is representative (with probability 1), then we compute uncertainties in our estimates for the parameters of the distribution of the data, and those uncertainties imply that the probability that our data is representative is less than 1.
However, human nature makes performing the above estimation procedure irresistible. In a particular situation, you can perform simulations to get a practical idea of how well it works.
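For instance, a simulation along these lines (a hypothetical straight-line model with Gaussian noise; all numbers are assumed) generates many data sets from known "true" parameters, fits each one, and compares the actual scatter of the fitted parameters across realizations to the uncertainties the single-data-set procedure reports.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 25)
true_slope, true_intercept, noise_sd = 1.5, 0.5, 1.0   # assumed "truth"

def fit_line(x, y):
    """Least-squares slope and intercept from sample statistics."""
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))
    s_xx = np.mean((x - x.mean()) ** 2)
    slope = s_xy / s_xx
    return slope, y.mean() - slope * x.mean()

# Generate many realizations of the data and fit each one.
fits = []
for _ in range(2000):
    y = true_intercept + true_slope * x + rng.normal(0, noise_sd, size=x.size)
    fits.append(fit_line(x, y))
fits = np.array(fits)

# Actual scatter of the estimators across many realizations of the data;
# compare these to the uncertainties the estimation procedure reports
# from a single data set.
print("sd of fitted slope:    ", fits[:, 0].std())
print("sd of fitted intercept:", fits[:, 1].std())
```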