Estimating measurement error using error from linear regression

In summary, the thread discusses using the standard deviation of the slope from a linear regression as the uncertainty of a measurement. One view is that such a standard deviation only makes sense if there is some uncertainty in the data; the original poster wonders whether it is instead just a measure of the goodness of fit. The role of weighting in the fitting algorithm is also discussed. The program IGOR is used as an example: its manual states that the standard deviations of the coefficients are estimated from the residuals of the fit, though it does not give the mathematical details.
  • #1
mesogen
Sorry if I'm in the wrong subforum.

This is a rather simple and straightforward question, I hope.

I'm doing a measurement that requires me to do a linear regression on data points to get a value of the slope. The slope is the value of the actual property that I am measuring.

Assuming no uncertainty in the data points that are being fit, can I simply use the standard deviation of the slope (output by fitting software) as the uncertainty in that measurement? Is this standard practice?

I ask because the standard deviation in the slope is quite small and results in an uncertainty that, to me, seems unreasonably small.
 
  • #2
mesogen said:
Assuming no uncertainty in the data points that are being fit

For the standard deviation of the slope (or other parameter of a curve fit) to make sense, you have to assume there is some uncertainty somewhere. If there were no uncertainty in the process, there wouldn't be any uncertainty in the slope! If you don't assume uncertainty in the points measured, you'll have to assume uncertainty in the selection of the points that were measured - or something like that.

How computer programs get a standard deviation for a parameter of a fit is an interesting question. Since a curve fit produces one specific curve, and thus one value for the parameter, it isn't obvious how one can assign a standard deviation to a result that consists of just a single value.

I've given my own speculation about how curve fitting programs produce a standard deviation for parameters several times on the forum and no one volunteered to confirm or correct what I said. We can discuss that interpretation if you wish, but otherwise I'll spare us the effort of repeating it.

I think you'll get the best advice if you describe the particular problem that you're trying to solve in more detail. In particular, why do you wish to assume there is no uncertainty in the data?
 
  • #3
I think there is a standard deviation in the coefficients (slope and intercept) because there is a scatter to the data and there may be several values of the slope, very close to each other, that minimize the residuals. But I'm just guessing here.

I do have uncertainty in the data, but I wanted to address that separately.
 
  • #4
mesogen said:
I think there is a standard deviation in the coefficients (slope and intercept) because there is a scatter to the data and there may be several values of the slope, very close to each other, that minimize the residuals. But I'm just guessing here.

The slope of the least squares fit is unique, so that doesn't explain why the program produces a result for the uncertainty of the slope. If the "uncertainty" were there to account for roundoff error in computer arithmetic, that would make sense; however, most curve fitting programs don't provide such a result. What does the program's documentation say about what the output means?

I do have uncertainty in the data, but I wanted to address that separately.

What program is being used? My bet is that you can't interpret the program's output for the uncertainty in the slope unless you address the "errors" in the data. The typical curve fitting program assumes that the "error" in the slope is approximated by some linear function of the "errors" in the curve fit. Since there is more than one point available for estimating the standard deviation of the errors in the fit, the program can solve (algebraically) for the standard deviation of the slope in terms of that estimated standard deviation. This type of reasoning typically assumes that the "errors" (i.e. "residuals") are independently distributed. So if you were fitting a straight line to a relationship that Mother Nature actually determined to be a curve, this type of reasoning does not apply, because the "errors" between the line and the curve are not random and independent.
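
For a plain unweighted straight-line fit, the textbook version of that calculation is short enough to write out. Here is a minimal sketch in Python with made-up example data (my own illustration of the standard formula, not the internals of any particular program):

Code:
import numpy as np

# Hypothetical data: x assumed exact, y carrying the random error
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 2.2, 3.9, 6.1, 8.0, 9.8])

n = len(x)
xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)

# Unique least-squares slope and intercept
b = np.sum((x - xbar) * (y - y.mean())) / Sxx
a = y.mean() - b * xbar

# Residuals of the fit
r = y - (a + b * x)

# Error variance estimated from the residuals
# (n - 2 degrees of freedom, since two parameters were fitted)
s2 = np.sum(r ** 2) / (n - 2)

# Standard error of the slope
sigma_b = np.sqrt(s2 / Sxx)
print(b, sigma_b)

Note that sigma_b is driven entirely by the residuals and by the spread of the x values, which is why this reasoning collapses when the residuals are not independent random errors.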
 
  • #5
I'm using IGOR. (Also using LabVIEW. Actually trying to determine which is best for this.)

The IGOR Manual

It says:
Estimates of Error

Igor automatically calculates the estimated error (standard deviation) for each of the coefficients in a curve fit. When you perform a curve fit, it creates a wave called W_sigma. Each point of W_sigma is set to the estimated error of the corresponding coefficient in the fit. The estimated errors are also indicated in the history area, along with the other results from the fit. If you don’t provide a weighting wave, the sigma values are estimated from the residuals. This implicitly assumes that the errors are normally distributed with zero mean and constant variance and that the fit function is a good description of the data.

The coefficients and their sigma values are estimates (usually remarkably good estimates) of what you would get if you performed the same fit an infinite number of times on the same underlying data (but with different noise each time) and then calculated the mean and standard deviation for each coefficient.

I think it just assumes some random noise in the data based on the residuals.
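
One way to check that reading is a quick Monte Carlo experiment: generate many synthetic data sets from the same line with fresh noise each time, refit, and compare the scatter of the fitted slopes with the closed-form residual-based sigma. A rough Python sketch with made-up numbers (my own illustration, not IGOR's actual algorithm):

Code:
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 6)
true_a, true_b, noise_sd = 0.0, 2.0, 0.2

slopes = []
for _ in range(10000):
    # Same underlying line, different noise each repetition,
    # exactly the thought experiment the manual describes
    y = true_a + true_b * x + rng.normal(0.0, noise_sd, x.size)
    b, a = np.polyfit(x, y, 1)  # returns [slope, intercept]
    slopes.append(b)

# Spread of the slope over repeated fits...
print(np.std(slopes))

# ...versus the closed-form prediction noise_sd / sqrt(Sxx)
Sxx = np.sum((x - x.mean()) ** 2)
print(noise_sd / np.sqrt(Sxx))

The two printed numbers agree closely, which is what the manual means when it says the sigma estimates what you would get by repeating the fit many times with different noise.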

Ok, with that said, let's say I fed the weights into the fitting algorithm. Can I then use this sigma in the slope as the error in the measurement?

To me it seems more like a measure of 'goodness of fit' and not necessarily an uncertainty in measurement.
 
  • #6
Of course, the manual doesn't give the mathematical details. (It's interesting that IGOR has decided to call an array a "wave". That must be a cute idea from the marketing department.)

mesogen said:
I think it just assumes some random noise in the data based on the residuals.

It's possible that IGOR uses the Monte-Carlo method, but my guess is that it makes a deterministic calculation. The documentation makes it sound like it uses the method I've proposed in other threads.

mesogen said:
Ok, with that said, let's say I fed the weights into the fitting algorithm. Can I then use this sigma in the slope as the error in the measurement?

To me it seems more like a measure of 'goodness of fit' and not necessarily an uncertainty in measurement.

If you assume the data has no uncertainty in it, then there isn't any uncertainty about the slope of the least squares fit. To repeat: the solution for the slope of a linear regression to a given set of data isn't a set of equally good answers. It is just one single number.

To interpret the sigma of the slope when you use "weights", you must first figure out what the IGOR manual means by "weights". If a software product is going to call an array a "wave", who knows what it means by "weights".
 
  • #7
Stephen Tashi said:
Of course, the manual doesn't give the mathematical details. (It's interesting that IGOR has decided to call an array a "wave". That must be a cute idea from the marketing department.)

I think that goes back to the days when IGOR was primarily used for signal processing. It later expanded into a full-blown data analysis program.

To interpret the sigma of the slope when you use "weights", you must first figure out what the IGOR manual means by "weights". If a software product is going to call an array a "wave", who knows what it means by "weights".

From what I understand, "weight" commonly means 1/variance (1/σ²). But IGOR has an input in the fitting dialog that asks for the wave of weights as standard deviations.
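
Assuming the usual statistical convention, converting per-point standard deviations into weights is just an inversion and squaring, though packages differ in what they expect. A minimal Python sketch with hypothetical numbers (numpy is used here only to illustrate that the conventions vary; presumably IGOR does the equivalent conversion internally):

Code:
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 2.1, 3.8, 6.2])   # hypothetical measurements
sd = np.array([0.1, 0.2, 0.1, 0.3])  # per-point standard deviations

# Statistical convention: weight = 1 / variance
w_stat = 1.0 / sd**2

# numpy.polyfit instead expects w = 1/sigma, because it multiplies
# the residuals themselves (not their squares) by w before minimizing
slope, intercept = np.polyfit(x, y, 1, w=1.0 / sd)
print(slope, intercept)

So depending on the package, "weights" can mean 1/σ², 1/σ, or, as in IGOR's dialog, σ itself, which is exactly why the documentation has to be read carefully.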
 
  • #8
So I'm still curious whether people use, as standard practice, the "error" in a fit as the "error" in a measurement, when to my eyes the fitting error is just a measure of the goodness of fit and not necessarily the uncertainty in the measurement.
 
  • #9
mesogen said:
So I'm still curious whether people use, as standard practice, the "error" in a fit as the "error" in a measurement, when to my eyes the fitting error is just a measure of the goodness of fit and not necessarily the uncertainty in the measurement.

You have to be clear about whether you are assuming the real life situation matches the assumptions behind the linear regression model. As the documentation indicates, the estimated standard deviation of the slope only makes sense if the cause of variation in it is due to independent identically distributed random variation in the data - and in ordinary linear regression this means variation in the y measurements of the (x,y) data, not variation in the x measurements.

If the cause of "errors" in the Y data is due to the fact that the phenomenon is not actually modeled correctly by a straight line then this violates the assumptions of the program.
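
A quick way to see that violation is to fit a straight line to data that actually lie on a curve: even with no random noise at all, the residuals show a systematic pattern instead of independent scatter. A minimal Python sketch:

Code:
import numpy as np

x = np.linspace(0.0, 5.0, 11)
y = x**2  # a deterministic curve, no random noise at all

b, a = np.polyfit(x, y, 1)
r = y - (a + b * x)

# The residual signs come out in a systematic block pattern
# (positive at the ends, negative in the middle), not at random
print(np.sign(r))

Any sigma the program reports for the slope in a case like this describes the lack of fit, not a random measurement error.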

As to what people do, that's a sociological question. If you are preparing a report for publication somewhere, you should look at other articles that were accepted and see what the editors of the publication approved.
 

1. What is the purpose of estimating measurement error using error from linear regression?

The purpose of estimating measurement error using error from linear regression is to assess the accuracy and precision of the measurements taken in a study. This allows researchers to understand the potential impact of measurement error on their results and make necessary adjustments to improve the validity of their findings.

2. How is measurement error estimated using error from linear regression?

Measurement error can be estimated from a linear regression by comparing the observed values of a variable to the values predicted by the fitted model. The differences between the two (the residuals) estimate the measurement error, and their standard deviation can be used to characterize the precision of the measurements.

3. What are the limitations of estimating measurement error using error from linear regression?

One limitation is that linear regression assumes a linear relationship between the variables, which may not always be the case. Additionally, this method only measures the error in the predicted values, not the actual error in the measurements themselves. It also assumes that all sources of error are normally distributed, which may not always be true.

4. Can estimating measurement error using error from linear regression be used for all types of data?

No, this method is most suitable for continuous data. It may not be appropriate for categorical or ordinal data, as the assumptions of linear regression may not hold for these types of variables.

5. How can the results of estimating measurement error using error from linear regression be used in a study?

The results of estimating measurement error can be used to determine the reliability and validity of the measurements in a study. This information can then be used to make adjustments or corrections to the data analysis, or to interpret the results with caution. It can also inform future research and improve the overall quality of the study.
