Confidence intervals for fitted parameters: divide by √n?


Discussion Overview

The discussion revolves around the calculation of confidence intervals for parameters fitted to a model, specifically questioning whether the standard errors of these parameters should be divided by the square root of the sample size (n). Participants explore the implications of this division in the context of statistical modeling and parameter estimation.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests that when calculating confidence intervals for fitted parameters, one should divide the standard errors by √n, similar to how it is done for means, to account for the increase in data size.
  • Another participant argues that since the values provided are already standard errors, dividing them by √n would not be appropriate.
  • A different participant explains that the variance of a parameter estimate already decreases with increasing n, so no further division of the standard errors by √n is needed.
  • Concerns are raised about the validity of using the diagonal elements of the covariance matrix as standard errors for arbitrary nonlinear models, questioning the generalizability of this approach.
  • One participant expresses interest in comparing statistical errors from software with experimental values obtained through bootstrapping or jackknife methods.
  • A later reply discusses the dependency of parameter estimates on specific values and how fitting a model with fixed parameters can yield different standard errors that do not depend on other parameters.
  • Another participant emphasizes the need for careful consideration of the distribution of parameter estimates when determining confidence intervals, noting that biases and non-normal distributions can affect the validity of confidence intervals.

Areas of Agreement / Disagreement

Participants converge on not dividing the standard errors by √n, since the covariance-matrix entries already shrink with increasing n. Open questions remain about the interpretation of standard errors for arbitrary nonlinear models.

Contextual Notes

Limitations include the potential dependence of statistical errors on specific parameter values and the unresolved nature of how confidence intervals should be calculated for different types of models.

Jonas Hall
If you fit a parametrized model (e.g. ##y = a \log(x + b) + c##) to some data points, the output is typically the optimized parameters (i.e. ##a, b, c##) and the covariance matrix. The squares of the diagonal elements of this matrix are the standard errors of the optimized parameters (i.e. ##se_a##, ##se_b##, ##se_c##). Now to get a 95% confidence interval for a parameter you typically multiply this error by 1.96 (assuming a normal distribution), i.e. ##a \pm 1.96\, se_a##. At least this is what I have found so far. But I wonder if this is the whole truth. Shouldn't you also divide by √n the way you do when you create a confidence interval for a mean? It just seems to me that the more data you have, the better the estimates of the parameters should become. Also, I find that if I don't divide by √n, the values seem rather large, sometimes falling on the wrong side of 0.

Or... do the values in the covariance matrix grow smaller with increasing n, and is that the reason you don't divide by √n and the values are supposed to be quite large?

Grateful if someone could make this clear to me. I have never studied statistics "properly" but dabble in mathematical models and teach math in upper secondary school.
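For concreteness, here is a minimal sketch of the setup being described. The model, the "true" parameter values 2, 3, 4, the noise level, and the seed are all invented for illustration; it uses scipy's curve_fit, which returns the optimized parameters and the covariance matrix. Note that, per the correction below, the standard errors are the square roots of the diagonal of the covariance matrix:

```python
# Hypothetical example data and model, invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.log(x + b) + c

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = model(x, 2.0, 3.0, 4.0) + rng.normal(scale=0.1, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])

# Standard errors: square ROOTS of the diagonal of pcov,
# with no further division by sqrt(n).
se = np.sqrt(np.diag(pcov))
for name, p, s in zip("abc", popt, se):
    print(f"{name} = {p:.3f} +/- {1.96 * s:.3f}  (95% CI)")
```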
 
Jonas Hall said:
Shouldn't you also divide by √n the way you do when you create a confidence interval for a mean?
When you divide the standard deviation by the square root of n you obtain the standard error of the mean. You said that the values you have are already standard errors, so it wouldn’t make sense to divide them again.
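A small numerical illustration of that distinction (the data here are made up): dividing by √n is the step that turns a standard deviation into a standard error, and it happens only once.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=100)

sd = data.std(ddof=1)            # standard deviation of the data
sem = sd / np.sqrt(data.size)    # standard error of the mean: divided once
print(sd, sem)
# The entries sqrt(diag(pcov)) are already standard errors of the
# parameters, so dividing them by sqrt(n) again would double-count n.
```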
 
The √n factor does come into it. For example, for the equation ##y = mx + c##, if the error variance of the points is ##\sigma^2##, the variance of ##m## is
$$\frac{\sigma^2}{n\left(\langle x^2\rangle - \langle x\rangle^2\right)}$$
If your measured error variance is ##s^2##, the estimated variance of ##m## is ##s^2/[n(\langle x^2\rangle - \langle x\rangle^2)]##. This is the diagonal element of the covariance matrix. (I think you meant to say "The diagonal elements of this matrix are the squares of the standard errors of the optimized parameters, i.e. ##se_a##, ##se_b##, ##se_c##.") As you suggest, the value decreases with increasing n.
If you did p separate experiments, with independent data sets, and determined a value of m for each, and calculated the mean value of m, the standard error of this mean value would be the standard error of m divided by √p.
Another point is that though the standard errors of m and c might be large, the values of m and c are usually strongly correlated, so you can't have any value of m in its confidence interval with any value of c in its confidence interval. This limits the variability of the regression line more than might at first appear.
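This scaling is easy to check numerically. A sketch (the straight-line model, noise level, and sample sizes are all chosen arbitrarily) using numpy's polyfit, which can return the covariance matrix:

```python
import numpy as np

def fit_se(n, seed=0):
    """Fit y = m*x + c to n noisy points and return the SEs of (m, c)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, n)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=n)
    _, cov = np.polyfit(x, y, 1, cov=True)
    return np.sqrt(np.diag(cov))

print(fit_se(50))    # SEs of m and c at n = 50
print(fit_se(200))   # roughly half as large at n = 200
```

Quadrupling n roughly halves the standard errors, consistent with the 1/√n dependence already built into the covariance matrix.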
 
Jonas Hall said:
teach math in upper secondary school
mjc123 said:
though the standard errors of m and c might be large, the values of m and c are usually strongly correlated, so you can't have any value of m in its confidence interval with any value of c in its confidence interval. This limits the variability of the regression line more than might at first appear.
e.g. for a straight-line fit it's easy to check that the best line goes 'through the center of mass of the measured points' and can wiggle its slope due to the (hopefully) random errors. The intercept error is a combination of this wiggling and of shifting up and down a bit. The correlation disappears if the origin is at the 'center of mass'.
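This too can be verified numerically; a sketch (same arbitrary straight-line setup as above) comparing the slope–intercept correlation before and after shifting the origin to the mean of x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

_, cov_raw = np.polyfit(x, y, 1, cov=True)
_, cov_ctr = np.polyfit(x - x.mean(), y, 1, cov=True)

def corr(c):
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

print(corr(cov_raw))  # strongly negative: slope and intercept trade off
print(corr(cov_ctr))  # essentially zero once the origin is at the mean
```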
 
Jonas Hall said:
The squares of the diagonal elements of this matrix are the standard errors of the optimized parameters (i.e. ##se_a##, ##se_b##, ##se_c##).

It would be interesting to know whether that's actually true for an arbitrary nonlinear model. In fact, it would be interesting to know how one can speak of statistical errors in the optimized parameters at all, since fitting the model to the data gives one value of each parameter, not a sample of several values for each parameter.

I think most computer programs estimate the statistical distributions of the model parameters by using a linear approximation to the model and assuming the measured data values tell us the correct location about which to make the linear approximation. However, a linear approximation to ##\log(ax + b) + c## will have different coefficients than a linear approximation to ##\sin(ax + b) + c##. So how general is the claim that the squares of the diagonal elements of the covariance matrix are (good estimators of) the standard errors of the parameters?
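For reference, the linearization being described is usually the standard Gauss–Newton approximation (a textbook result, not something stated in the thread). Writing the model as ##y = G(x; \theta_1, \dots, \theta_p)## with residuals ##r_i## at the fitted parameters,

$$\operatorname{Cov}(\hat\theta) \approx s^2 \left(J^\top J\right)^{-1}, \qquad J_{ij} = \left.\frac{\partial G(x_i;\theta)}{\partial\theta_j}\right|_{\theta=\hat\theta}, \qquad s^2 = \frac{1}{n-p}\sum_{i=1}^n r_i^2.$$

The quality of the reported standard errors therefore depends on how well this linearization represents the model near ##\hat\theta##, which is exactly the concern raised above.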
 
Thank you all! You have convinced me that it is not appropriate to divide by √n. I find Stephen Tashi's comment interesting. I also wonder how the statistical errors given by e.g. scipy compare to experimental values found by bootstrapping/jackknifing your data. I guess I will have to run some experiments... I still find the standard errors quite large though, but I appreciate the comments on this by mjc123 and BvU. I envisage the following scenario: you take data and fit parameters according to your model. In reality, though, you are often only interested in one single parameter (such as b in ##y = a \cdot b^x + c##). So after you obtain your parameters (say a = 2, b = 3 and c = 4) you do a new fit according to the model ##y = 2 \cdot b^x + 4##. You will now presumably get b = 3 again, but with a standard error that does not depend on any other parameters.

Would this work?
 
Yes, it would. But the result depends on a being 2 and c being 4. The standard error you get is not a function of a and c as variables, but a value that is only valid for those particular values of a and c.
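A sketch of that two-stage scenario (model, values, and noise again invented for illustration); the second fit freezes a and c at the first fit's estimates and refits b alone:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * b**x + c

rng = np.random.default_rng(3)
x = np.linspace(0.0, 3.0, 40)
y = model(x, 2.0, 3.0, 4.0) + rng.normal(scale=0.5, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 2.0, 1.0])
a_hat, b_hat, c_hat = popt

# Refit b only, with a and c frozen at the first fit's values.
(b2,), pcov_b = curve_fit(lambda x, b: model(x, a_hat, b, c_hat),
                          x, y, p0=[b_hat])

print(np.sqrt(pcov[1, 1]))    # SE of b from the full three-parameter fit
print(np.sqrt(pcov_b[0, 0]))  # smaller: uncertainty in a, c is ignored
```

As the reply notes, the smaller error is conditional: it treats a and c as exactly their first-fit values rather than as uncertain quantities.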
 
Jonas Hall said:
I also wonder how the statistical errors given by e.g. scipy compare to experimental values found by bootstrapping/jackknifing your data.

The best experiments would be to compare all those methods to the correct answer.

Assume the correct model for the data has the form ##Y = G(X,a,b,...)##, where ##Y## and ##X## are random variables and ##a,b,...## are specific values of parameters. A particular fitting algorithm produces estimates for ##a,b,...## that are functions of the sample data.

I'll denote this by:
##\hat{a} = f_1(X_s)##
##\hat{b} = f_2(X_s)##
...

##\hat{a},\hat{b},.. ## are random variables since they depend on the random values in a sample.

If we simulate a lot of samples ##X_s##, we get samples of ##\hat{a},\hat{b},...##. We can estimate the distributions of those random variables.

From that, we can estimate confidence intervals. However, we may have to do this by looking at the distribution of ##\hat{a}## in detail, not merely by looking at the parameters of that distribution. For example, ##\hat{a}## may not be an unbiased estimator of ##a##. In that case, knowing the standard deviation of ##\hat{a}## doesn't let us compute "confidence" by assuming ##a## is at the center of the interval. (It's also possible that ##\hat{a}## is an unbiased estimator of ##a## but not normally distributed.)

A limitation of such experiments is that the answer depends on particular choices of ##a,b,...## so the size of a confidence interval may vary with a big variation in the magnitudes of ##a,b,...##.
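Tying this back to the bootstrap question above, here is one way such an experiment might look (a sketch only; the model, sample size, and number of resamples are arbitrary). It resamples (x, y) pairs with replacement, refits each resample, and compares the spread of the refitted parameters with the covariance-matrix errors:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.log(x + b) + c

rng = np.random.default_rng(4)
x = np.linspace(1.0, 10.0, 80)
y = model(x, 2.0, 3.0, 4.0) + rng.normal(scale=0.1, size=x.size)

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0])

boots = []
for _ in range(500):
    idx = rng.integers(0, x.size, x.size)  # resample pairs with replacement
    try:
        p, _ = curve_fit(model, x[idx], y[idx], p0=popt)
        boots.append(p)
    except RuntimeError:                   # skip fits that fail to converge
        pass

print(np.sqrt(np.diag(pcov)))          # linearized SEs from curve_fit
print(np.std(boots, axis=0, ddof=1))   # bootstrap SEs, for comparison
```

Inspecting the histogram of each bootstrapped parameter also shows whether the bias and non-normality issues raised above matter in a given case.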
 
