Estimating measurement error using error from linear regression

In summary, the thread discusses using the standard deviation of the slope from a linear regression as the uncertainty of a measurement. One view is that such a standard deviation only makes sense if there is some uncertainty in the data; the original poster wonders whether it is instead just a measure of the goodness of fit. The role of weighting in the fitting algorithm is also discussed. The program IGOR is used as an example: its manual states that the standard deviations of the coefficients are estimated from the residuals of the fit, though it does not give the mathematical details.
  • #1
mesogen
Sorry if I'm in the wrong subforum.

This is a rather simple and straightforward question, I hope.

I'm doing a measurement that requires me to do a linear regression on data points to get a value of the slope. The slope is the value of the actual property that I am measuring.

Assuming no uncertainty in the data points that are being fit, can I simply use the standard deviation of the slope (output by fitting software) as the uncertainty in that measurement? Is this standard practice?

I ask because the standard deviation in the slope is quite small and results in an uncertainty that, to me, seems unreasonably small.
 
  • #2
mesogen said:
Assuming no uncertainty in the data points that are being fit

For the standard deviation of the slope (or other parameter of a curve fit) to make sense, you have to assume there is some uncertainty somewhere. If there were no uncertainty in the process, there wouldn't be any uncertainty in the slope! If you don't assume uncertainty in the points measured, you'll have to assume uncertainty in the selection of the points that were measured - or something like that.

How computer programs get a standard deviation for a parameter of a fit is an interesting question. Since a curve fit produces one specific curve, and thus one value for the parameter, it isn't obvious how one can assign a standard deviation to a result that consists of just a single value.

I've given my own speculation about how curve fitting programs produce a standard deviation for parameters several times on the forum and no one volunteered to confirm or correct what I said. We can discuss that interpretation if you wish, but otherwise I'll spare us the effort of repeating it.

I think you'll get the best advice if you describe the particular problem that you're trying to solve in more detail. In particular, why do you wish to assume there is no uncertainty in the data?
 
  • #3
I think there is a standard deviation in the coefficients (slope and intercept) because there is a scatter to the data and there may be several values of the slope, very close to each other, that minimize the residuals. But I'm just guessing here.

I do have uncertainty in the data, but I wanted to address that separately.
 
  • #4
mesogen said:
I think there is a standard deviation in the coefficients (slope and intercept) because there is a scatter to the data and there may be several values of the slope, very close to each other, that minimize the residuals. But I'm just guessing here.

The slope of the least squares fit is unique, so that doesn't explain why the program produces a result for the uncertainty of the slope. If the "uncertainty" were there to account for roundoff error in computer arithmetic, that would make sense; however, most curve fitting programs don't provide such a result. What does the program's documentation say about what the output means?

I do have uncertainty in the data, but I wanted to address that separately.

What program is being used? My bet is that you can't interpret the program's output for the uncertainty in the slope unless you address the "errors" in the data. The typical curve fitting program assumes that the "error" in the slope is approximated by some linear function of the "errors" in the curve fit. Since there is more than one point available for estimating the standard deviation of the errors in the fit, the program can solve (algebraically) for the standard deviation of the slope in terms of that estimated standard deviation. This type of reasoning typically assumes that the "errors" (i.e. "residuals") are independently distributed. So if you were fitting a straight line to a relationship that Mother Nature actually determined to be a curve, this type of reasoning does not apply, because the "errors" between the line and the curve are not random and independent.
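
For a plain unweighted straight-line fit, the textbook version of that calculation is short enough to write out. Here is a minimal sketch in Python with made-up example data (my own illustration of the standard formula, not the internals of any particular program):

Code:
import numpy as np

# Hypothetical data: x assumed exact, y carrying the random error
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 2.2, 3.9, 6.1, 8.0, 9.8])

n = len(x)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

# Unique least-squares slope and intercept
b = np.sum((x - xbar) * (y - y.mean())) / Sxx
a = y.mean() - b * xbar

# Residuals of the fit
r = y - (a + b * x)

# Error variance estimated from the residuals
# (n - 2 degrees of freedom, since two parameters were fitted)
s2 = np.sum(r ** 2) / (n - 2)

# Standard error of the slope
sigma_b = np.sqrt(s2 / Sxx)
print(b, sigma_b)

Note that sigma_b is driven entirely by the residuals and by the spread of the x values, which is why this reasoning collapses when the residuals are not independent random errors.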
 
  • #5
I'm using IGOR. (Also using LabVIEW. Actually trying to determine which is best for this.)

The IGOR Manual

It says:
Estimates of Error

Igor automatically calculates the estimated error (standard deviation) for each of the coefficients in a curve fit. When you perform a curve fit, it creates a wave called W_sigma. Each point of W_sigma is set to the estimated error of the corresponding coefficient in the fit. The estimated errors are also indicated in the history area, along with the other results from the fit. If you don’t provide a weighting wave, the sigma values are estimated from the residuals. This implicitly assumes that the errors are normally distributed with zero mean and constant variance and that the fit function is a good description of the data.

The coefficients and their sigma values are estimates (usually remarkably good estimates) of what you would get if you performed the same fit an infinite number of times on the same underlying data (but with different noise each time) and then calculated the mean and standard deviation for each coefficient.

I think it just assumes some random noise in the data based on the residuals.
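
One way to check that reading is a quick Monte Carlo experiment: generate many synthetic data sets from the same line with fresh noise each time, refit, and compare the scatter of the fitted slopes with the closed-form residual-based sigma. A rough Python sketch with made-up numbers (my own illustration, not IGOR's actual algorithm):

Code:
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 6)
true_a, true_b, noise_sd = 0.0, 2.0, 0.2

slopes = []
for _ in range(10000):
    # Same underlying line, different noise each repetition,
    # exactly the thought experiment the manual describes
    y = true_a + true_b * x + rng.normal(0.0, noise_sd, x.size)
    b, a = np.polyfit(x, y, 1)  # returns [slope, intercept]
    slopes.append(b)

# Spread of the slope over repeated fits...
print(np.std(slopes))

# ...versus the closed-form prediction noise_sd / sqrt(Sxx)
Sxx = np.sum((x - x.mean()) ** 2)
print(noise_sd / np.sqrt(Sxx))

The two printed numbers agree closely, which is what the manual means when it says the sigma estimates what you would get by repeating the fit many times with different noise.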

Ok, with that said, let's say I fed the weights into the fitting algorithm. Can I then use this sigma in the slope as the error in the measurement?

To me it seems more like a measure of 'goodness of fit' and not necessarily an uncertainty in measurement.
 
  • #6
Of course, the manual doesn't give the mathematical details. (It's interesting that IGOR has decided to call an array a "wave". That must be a cute idea from the marketing department.)

mesogen said:
I think it just assumes some random noise in the data based on the residuals.

It's possible that IGOR uses the Monte-Carlo method, but my guess is that it makes a deterministic calculation. The documentation makes it sound like it uses the method I've proposed in other threads.

mesogen said:
Ok, with that said, let's say I fed the weights into the fitting algorithm. Can I then use this sigma in the slope as the error in the measurement?

To me it seems more like a measure of 'goodness of fit' and not necessarily an uncertainty in measurement.

If you assume the data has no uncertainty in it, then there isn't any uncertainty about the slope of the least squares fit. To repeat: the solution for the slope of a linear regression to a given set of data isn't a set of equally good answers. It is just one single number.

To interpret the sigma of the slope when you use "weights", you must first figure out what the IGOR manual means by "weights". If a software product is going to call an array a "wave", who knows what it means by "weights".
 
  • #7
Stephen Tashi said:
Of course, the manual doesn't give the mathematical details. (It's interesting that IGOR has decided to call an array a "wave". That must be a cute idea from the marketing department.)

I think that goes back to the days when IGOR was primarily used for signal processing. It later expanded into a full-blown data analysis program.

To interpret the sigma of the slope when you use "weights", you must first figure out what the IGOR manual means by "weights". If a software product is going to call an array a "wave", who knows what it means by "weights".

From what I understand, "weight" commonly means 1/variance (1/σ²). But IGOR has an input in the fitting dialog that asks for the wave of weights as standard deviations.
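
Assuming the usual statistical convention, converting per-point standard deviations into weights is just an inversion and squaring, though packages differ in what they expect. A minimal Python sketch with hypothetical numbers (numpy is used here only to illustrate that the conventions vary; presumably IGOR does the equivalent conversion internally):

Code:
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 2.1, 3.8, 6.2])   # hypothetical measurements
sd = np.array([0.1, 0.2, 0.1, 0.3])  # per-point standard deviations

# Statistical convention: weight = 1 / variance
w_stat = 1.0 / sd**2

# numpy.polyfit instead expects w = 1/sigma, because it multiplies
# the residuals themselves (not their squares) by w before minimizing
slope, intercept = np.polyfit(x, y, 1, w=1.0 / sd)
print(slope, intercept)

So depending on the package, "weights" can mean 1/σ², 1/σ, or, as in IGOR's dialog, σ itself, which is exactly why the documentation has to be read carefully.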
 
  • #8
So I'm still curious whether people use, as standard practice, the "error" in a fit as the "error" in a measurement, when to my eyes the fitting error is just a measure of the goodness of fit and not necessarily the uncertainty in the measurement.
 
  • #9
mesogen said:
So I'm still curious whether people use, as standard practice, the "error" in a fit as the "error" in a measurement, when to my eyes the fitting error is just a measure of the goodness of fit and not necessarily the uncertainty in the measurement.

You have to be clear about whether you are assuming the real life situation matches the assumptions behind the linear regression model. As the documentation indicates, the estimated standard deviation of the slope only makes sense if the cause of variation in it is due to independent identically distributed random variation in the data - and in ordinary linear regression this means variation in the y measurements of the (x,y) data, not variation in the x measurements.

If the cause of "errors" in the Y data is due to the fact that the phenomenon is not actually modeled correctly by a straight line then this violates the assumptions of the program.
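
A quick way to see that violation is to fit a straight line to data that actually lie on a curve: even with no random noise at all, the residuals show a systematic pattern instead of independent scatter. A minimal Python sketch:

Code:
import numpy as np

x = np.linspace(0.0, 5.0, 11)
y = x**2  # a deterministic curve, no random noise at all

b, a = np.polyfit(x, y, 1)
r = y - (a + b * x)

# The residual signs come out in a systematic block pattern
# (positive at the ends, negative in the middle), not at random
print(np.sign(r))

Any sigma the program reports for the slope in a case like this describes the lack of fit, not a random measurement error.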

As to what people do, that's a sociological question. If you are preparing a report for publication somewhere, you should look at other articles that were accepted and see what the editors of the publication approved.
 

1. What is the purpose of estimating measurement error using error from linear regression?

The purpose of estimating measurement error using error from linear regression is to assess the accuracy and precision of the measurements taken in a study. This allows researchers to understand the potential impact of measurement error on their results and make necessary adjustments to improve the validity of their findings.

2. How is measurement error estimated using error from linear regression?

Measurement error can be estimated from a linear regression by comparing the observed values of a variable to the values predicted by the fitted model. The differences between the two (the residuals) estimate the measurement error, and their standard deviation can be used to characterize the precision of the measurements.

3. What are the limitations of estimating measurement error using error from linear regression?

One limitation is that linear regression assumes a linear relationship between the variables, which may not always be the case. Additionally, this method only measures the error in the predicted values, not the actual error in the measurements themselves. It also assumes that all sources of error are normally distributed, which may not always be true.

4. Can estimating measurement error using error from linear regression be used for all types of data?

No, this method is most suitable for continuous data. It may not be appropriate for categorical or ordinal data, as the assumptions of linear regression may not hold for these types of variables.

5. How can the results of estimating measurement error using error from linear regression be used in a study?

The results of estimating measurement error can be used to determine the reliability and validity of the measurements in a study. This information can then be used to make adjustments or corrections to the data analysis, or to interpret the results with caution. It can also inform future research and improve the overall quality of the study.
