Confidence intervals for fitted parameters: divide by √n?


Discussion Overview

The discussion revolves around the calculation of confidence intervals for parameters fitted to a model, specifically questioning whether the standard errors of these parameters should be divided by the square root of the sample size (n). Participants explore the implications of this division in the context of statistical modeling and parameter estimation.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests that when calculating confidence intervals for fitted parameters, one should divide the standard errors by √n, similar to how it is done for means, to account for the increase in data size.
  • Another participant argues that since the values provided are already standard errors, dividing them by √n would not be appropriate.
  • A different participant explains that the variance of a parameter estimate already decreases with increasing n, so no further division of the standard errors by √n is needed.
  • Concerns are raised about the validity of using the diagonal elements of the covariance matrix as standard errors for arbitrary nonlinear models, questioning the generalizability of this approach.
  • One participant expresses interest in comparing statistical errors from software with experimental values obtained through bootstrapping or jackknife methods.
  • A later reply discusses the dependency of parameter estimates on specific values and how fitting a model with fixed parameters can yield different standard errors that do not depend on other parameters.
  • Another participant emphasizes the need for careful consideration of the distribution of parameter estimates when determining confidence intervals, noting that biases and non-normal distributions can affect the validity of confidence intervals.

Areas of Agreement / Disagreement

Participants converge on not dividing the standard errors by √n, since the covariance-matrix entries already shrink with increasing n. Open questions remain about the interpretation of standard errors for arbitrary nonlinear models.

Contextual Notes

Limitations include the potential dependence of statistical errors on specific parameter values and the unresolved nature of how confidence intervals should be calculated for different types of models.

Jonas Hall
If you fit a parametrized model (e.g. ##y = a \log(x + b) + c##) to some data points, the output is typically the optimized parameters (i.e. ##a, b, c##) and the covariance matrix. The squares of the diagonal elements of this matrix are the standard errors of the optimized parameters (i.e. ##se_a##, ##se_b##, ##se_c##). Now to get a 95% confidence interval for a parameter you typically multiply this error by 1.96 (assuming a normal distribution), i.e. ##a \pm 1.96\, se_a##. At least this is what I have found so far. But I wonder if this is the whole truth. Shouldn't you also divide by √n the way you do when you create a confidence interval for a mean? It just seems to me that the more data you have, the better the estimates of the parameters should become. Also, I find that if I don't divide by √n, the values seem rather large, sometimes falling on the wrong side of 0.

Or... do the values in the covariance matrix grow smaller with increasing n, and is that the reason you don't divide by √n and the values are supposed to be quite large?

Grateful if someone could make this clear to me. I have never studied statistics "properly" but dabble in mathematical models and teach math in upper secondary school.
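For concreteness, here is a minimal sketch of the setup being described. The model, the "true" parameter values 2, 3, 4, the noise level, and the seed are all invented for illustration; it uses scipy's curve_fit, which returns the optimized parameters and the covariance matrix. Note that, per the correction below, the standard errors are the square roots of the diagonal of the covariance matrix:

```python
# Hypothetical example data and model, invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.log(x + b) + c

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = model(x, 2.0, 3.0, 4.0) + rng.normal(scale=0.1, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])

# Standard errors: square ROOTS of the diagonal of pcov,
# with no further division by sqrt(n).
se = np.sqrt(np.diag(pcov))
for name, p, s in zip("abc", popt, se):
    print(f"{name} = {p:.3f} +/- {1.96 * s:.3f}  (95% CI)")
```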
 
Jonas Hall said:
Shouldn't you also divide by √n the way you do when you create a confidence interval for a mean?
When you divide the standard deviation by the square root of n you obtain the standard error of the mean. You said that the values you have are already standard errors, so it wouldn’t make sense to divide them again.
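A small numerical illustration of that distinction (the data here are made up): dividing by √n is the step that turns a standard deviation into a standard error, and it happens only once.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=100)

sd = data.std(ddof=1)            # standard deviation of the data
sem = sd / np.sqrt(data.size)    # standard error of the mean: divided once
print(sd, sem)
# The entries sqrt(diag(pcov)) are already standard errors of the
# parameters, so dividing them by sqrt(n) again would double-count n.
```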
 
The √n factor does come into it. For example, for the equation ##y = mx + c##, if the error variance of the points is ##\sigma^2##, the variance of ##m## is
$$\frac{\sigma^2}{n\left(\langle x^2\rangle - \langle x\rangle^2\right)}$$
If your measured error variance is ##s^2##, the estimated variance of ##m## is ##s^2/[n(\langle x^2\rangle - \langle x\rangle^2)]##. This is the diagonal element of the covariance matrix. (I think you meant to say "The diagonal elements of this matrix are the squares of the standard errors of the optimized parameters, i.e. ##se_a##, ##se_b##, ##se_c##.") As you suggest, the value decreases with increasing n.
If you did p separate experiments, with independent data sets, and determined a value of m for each, and calculated the mean value of m, the standard error of this mean value would be the standard error of m divided by √p.
Another point is that though the standard errors of m and c might be large, the values of m and c are usually strongly correlated, so you can't have any value of m in its confidence interval with any value of c in its confidence interval. This limits the variability of the regression line more than might at first appear.
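This scaling is easy to check numerically. A sketch (the straight-line model, noise level, and sample sizes are all chosen arbitrarily) using numpy's polyfit, which can return the covariance matrix:

```python
import numpy as np

def fit_se(n, seed=0):
    """Fit y = m*x + c to n noisy points and return the SEs of (m, c)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=n)
    _, cov = np.polyfit(x, y, 1, cov=True)
    return np.sqrt(np.diag(cov))

print(fit_se(50))    # SEs of m and c at n = 50
print(fit_se(200))   # roughly half as large at n = 200
```

Quadrupling n roughly halves the standard errors, consistent with the 1/√n dependence already built into the covariance matrix.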
 
Jonas Hall said:
teach math in upper secondary school
mjc123 said:
though the standard errors of m and c might be large, the values of m and c are usually strongly correlated, so you can't have any value of m in its confidence interval with any value of c in its confidence interval. This limits the variability of the regression line more than might at first appear.
e.g. for a straight-line fit it's easy to check that the best line goes 'through the center of mass of the measured points' and can wiggle its slope due to the (hopefully) random errors. The intercept error is a combination of this wiggling and of shifting up and down a bit. The correlation disappears if the origin is at the 'center of mass'.
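This too can be verified numerically; a sketch (same arbitrary straight-line setup as above) comparing the slope–intercept correlation before and after shifting the origin to the mean of x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

_, cov_raw = np.polyfit(x, y, 1, cov=True)
_, cov_ctr = np.polyfit(x - x.mean(), y, 1, cov=True)

def corr(c):
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

print(corr(cov_raw))  # strongly negative: slope and intercept trade off
print(corr(cov_ctr))  # essentially zero once the origin is at the mean
```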
 
Jonas Hall said:
The squares of the diagonal elements of this matrix are the standard errors of the optimized parameters (i.e. ##se_a##, ##se_b##, ##se_c##).

It would be interesting to know whether that's actually true for an arbitrary nonlinear model. In fact, it would be interesting to know how one can speak of statistical errors in the optimized parameters at all, since fitting the model to the data gives one value of each parameter, not a sample of several values for each parameter.

I think most computer programs estimate the statistical distributions of the model parameters by using a linear approximation to the model and assuming the measured data values tell us the correct location about which to make the linear approximation. However, a linear approximation to ##\log(ax + b) + c## will have different coefficients than a linear approximation to ##\sin(ax + b) + c##. So how general is the claim that the squares of the diagonal elements of the covariance matrix are (good estimators of) the standard errors of the parameters?
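For reference, the linearization being described is usually the standard Gauss–Newton approximation (a textbook result, not something stated in the thread). Writing the model as ##y = G(x; \theta_1, \dots, \theta_p)## with residuals ##r_i## at the fitted parameters,

$$\operatorname{Cov}(\hat\theta) \approx s^2 \left(J^\top J\right)^{-1}, \qquad J_{ij} = \left.\frac{\partial G(x_i;\theta)}{\partial\theta_j}\right|_{\theta=\hat\theta}, \qquad s^2 = \frac{1}{n-p}\sum_{i=1}^n r_i^2.$$

The quality of the reported standard errors therefore depends on how well this linearization represents the model near ##\hat\theta##, which is exactly the concern raised above.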
 
Thank you all! You have convinced me that it is not appropriate to divide by √n. I find Stephen Tashi's comment interesting. I also wonder how the statistical errors given by e.g. scipy compare to experimental values found by bootstrapping/jackknifing your data. I guess I will have to run some experiments... I still find the standard errors quite large though, but I appreciate the comments on this by mjc123 and BvU. I envisage the following scenario: you take data and fit parameters according to your model. In reality, though, you are often only interested in one single parameter (such as b in ##y = a \cdot b^x + c##). So after you obtain your parameters (say a = 2, b = 3 and c = 4) you do a new fit according to the model ##y = 2 \cdot b^x + 4##. You will now presumably get b = 3 again, but with a standard error that does not depend on any other parameters.

Would this work?
 
Yes, it would. But the result depends on a being 2 and c being 4. The standard error you get is not a function of a and c as variables, but a value that is only valid for those particular values of a and c.
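A sketch of that two-stage scenario (model, values, and noise again invented for illustration); the second fit freezes a and c at the first fit's estimates and refits b alone:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * b**x + c

rng = np.random.default_rng(3)
x = np.linspace(0.0, 3.0, 40)
y = model(x, 2.0, 3.0, 4.0) + rng.normal(scale=0.5, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 2.0, 1.0])
a_hat, b_hat, c_hat = popt

# Refit b only, with a and c frozen at the first fit's values.
(b2,), pcov_b = curve_fit(lambda x, b: model(x, a_hat, b, c_hat),
                          x, y, p0=[b_hat])

print(np.sqrt(pcov[1, 1]))    # SE of b from the full three-parameter fit
print(np.sqrt(pcov_b[0, 0]))  # smaller: uncertainty in a, c is ignored
```

As the reply notes, the smaller error is conditional: it treats a and c as exactly their first-fit values rather than as uncertain quantities.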
 
Jonas Hall said:
I also wonder how the statistical errors given by e.g. scipy compare to experimental values found by bootstrapping/jackknifing your data.

The best experiments would be to compare all those methods to the correct answer.

Assume the correct model for the data has the form ##Y = G(X,a,b,...)##, where ##Y## and ##X## are random variables and ##a,b,...## are specific values of parameters. A particular fitting algorithm produces estimates for ##a,b,...## that are functions of the sample data.

I'll denote this by:
##\hat{a} = f_1(X_s)##
##\hat{b} = f_2(X_s)##
...

##\hat{a},\hat{b},.. ## are random variables since they depend on the random values in a sample.

If we simulate a lot of samples ##X_s##, we get samples of ##\hat{a},\hat{b},...##. We can estimate the distributions of those random variables.

From that, we can estimate confidence intervals. However, we may have to do this by looking at the distribution of ##\hat{a}## in detail, not merely by looking at the parameters of that distribution. For example, ##\hat{a}## may not be an unbiased estimator of ##a##. In that case, knowing the standard deviation of ##\hat{a}## doesn't let us compute "confidence" by assuming ##a## is at the center of the interval. (It's also possible that ##\hat{a}## is an unbiased estimator of ##a## but not normally distributed.)

A limitation of such experiments is that the answer depends on particular choices of ##a,b,...## so the size of a confidence interval may vary with a big variation in the magnitudes of ##a,b,...##.
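Tying this back to the bootstrap question above, here is one way such an experiment might look (a sketch only; the model, sample size, and number of resamples are arbitrary). It resamples (x, y) pairs with replacement, refits each resample, and compares the spread of the refitted parameters with the covariance-matrix errors:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.log(x + b) + c

rng = np.random.default_rng(4)
x = np.linspace(1.0, 10.0, 80)
y = model(x, 2.0, 3.0, 4.0) + rng.normal(scale=0.1, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])

boots = []
for _ in range(500):
    idx = rng.integers(0, x.size, x.size)  # resample pairs with replacement
    try:
        p, _ = curve_fit(model, x[idx], y[idx], p0=popt)
        boots.append(p)
    except RuntimeError:                   # skip fits that fail to converge
        pass

print(np.sqrt(np.diag(pcov)))          # linearized SEs from curve_fit
print(np.std(boots, axis=0, ddof=1))   # bootstrap SEs, for comparison
```

Inspecting the histogram of each bootstrapped parameter also shows whether the bias and non-normality issues raised above matter in a given case.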
 
