Estimating measurement error using error from linear regression

  1. Jul 24, 2013 #1
    Sorry if I'm in the wrong subforum.

    This is a rather simple and straightforward question, I hope.

    I'm doing a measurement that requires me to do a linear regression on data points to get a value of the slope. The slope is the value of the actual property that I am measuring.

    Assuming no uncertainty in the data points that are being fit, can I simply use the standard deviation of the slope (output by fitting software) as the uncertainty in that measurement? Is this standard practice?

    I ask because the standard deviation in the slope is quite small and results in an uncertainty that, to me, seems unreasonably small.
  3. Jul 24, 2013 #2

    Stephen Tashi

    Science Advisor

    For the standard deviation of the slope (or other parameter of a curve fit) to make sense, you have to assume there is some uncertainty somewhere. If there were no uncertainty in the process, there wouldn't be any uncertainty in the slope! If you don't assume uncertainty in the points measured, you'll have to assume uncertainty in the selection of the points that were measured - or something like that.

    How computer programs get a standard deviation for a parameter of a fit is an interesting question. Since a curve fit produces one specific curve, and thus one value for the parameter, it isn't obvious how one can assign a standard deviation to a result that consists of just a single value.

    I've given my own speculation about how curve fitting programs produce a standard deviation for parameters several times on the forum and no one volunteered to confirm or correct what I said. We can discuss that interpretation if you wish, but otherwise I'll spare us the effort of repeating it.

    I think you'll get the best advice if you describe the particular problem that you're trying to solve in more detail. In particular, why do you wish to assume there is no uncertainty in the data?
  4. Jul 24, 2013 #3
    I think there is a standard deviation in the coefficients (slope and intercept) because there is a scatter to the data and there may be several values of the slope, very close to each other, that minimize the residuals. But I'm just guessing here.

    I do have uncertainty in the data, but I wanted to address that separately.
  5. Jul 24, 2013 #4

    Stephen Tashi

    Science Advisor

    The slope of the least squares fit is unique, so that doesn't explain why the program produces a result for the uncertainty of the slope. If the "uncertainty" is to account for roundoff error in computer arithmetic, that would make sense. However, most curve fitting programs don't provide such a result. What does the program's documentation say about what the output means?

    What program is being used? My bet is that you can't interpret the program's output for the uncertainty in the slope unless you address the "errors" in the data. The typical curve fitting program assumes that the "error" in the slope is approximated by some linear function of the "errors" in the curve fit. There is more than one data point, so the program can use the residuals to estimate the standard deviation of the errors in the fit. This allows the program to solve (algebraically) for the standard deviation of the slope using standard deviations that have been estimated. This type of reasoning typically assumes that the "errors" (i.e. "residuals") are independently distributed. So if you were fitting a straight line to a relationship that was actually determined by Mother Nature to be a curve, this type of reasoning does not apply because the "errors" between the line and the curve are not random and independent.
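
    The reasoning above can be sketched in a few lines. This is a minimal illustration of the textbook ordinary least squares result, not a description of what any particular program does internally: the scatter of the y values is estimated from the residuals and then propagated to the slope. The function name and the sample data are made up for illustration.

```python
import math

def fit_line_with_slope_error(x, y):
    """Ordinary least squares for y = a + b*x. Returns (a, b, sigma_b)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual scatter")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx                      # the unique least squares slope
    a = y_mean - b * x_mean            # the matching intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Estimated variance of the y "errors"; two parameters were fitted,
    # so n - 2 degrees of freedom remain.
    s2 = sum(r * r for r in residuals) / (n - 2)
    # Assumes independent errors of equal variance in y and exact x values.
    sigma_b = math.sqrt(s2 / sxx)
    return a, b, sigma_b

# Hypothetical data, roughly y = 2x with some scatter.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]
a, b, sigma_b = fit_line_with_slope_error(x, y)
print(f"slope = {b:.3f} +/- {sigma_b:.3f}")
```

    If the points lie exactly on a line, the residuals vanish and the estimated slope uncertainty is zero, which is the point made earlier in the thread: no scatter anywhere means no uncertainty in the slope.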
  6. Jul 24, 2013 #5
    I'm using IGOR. (Also using Labview. Actually trying to determine which is best for this.)

    The IGOR Manual

    It says:
    I think it just makes up some random noise in the data just based on the residuals.

    OK, with that said, suppose I fed the weights into the fitting algorithm. Can I then use this sigma in the slope as the error in the measurement?

    To me it seems more like a measure of 'goodness of fit' and not necessarily an uncertainty in measurement.
  7. Jul 24, 2013 #6

    Stephen Tashi

    Science Advisor

    Of course, the manual doesn't give the mathematical details. (It's interesting that IGOR has decided to call an array a "wave". That must be a cute idea from the marketing department.)

    It's possible that IGOR uses the Monte-Carlo method, but my guess is that it makes a deterministic calculation. The documentation makes it sound like it uses the method I've proposed in other threads.
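
    To illustrate the difference between the two approaches, here is a rough sketch that compares a Monte Carlo estimate of the slope's standard deviation with the deterministic formula for ordinary least squares. It says nothing about which method IGOR actually uses. The true line, the noise level, the number of trials, and all function names are assumptions chosen for the example.

```python
import math
import random
import statistics

def ols_slope(x, y):
    """Least squares slope of y against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx

def monte_carlo_slope_sd(x, intercept, slope, noise_sd, trials, rng):
    """Simulate many noisy data sets and return the spread of the fitted slopes."""
    slopes = []
    for _ in range(trials):
        y = [intercept + slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.stdev(slopes)

def analytic_slope_sd(x, noise_sd):
    """Deterministic result for independent y errors of known, equal variance."""
    x_mean = sum(x) / len(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return noise_sd / math.sqrt(sxx)

rng = random.Random(12345)           # fixed seed so the run is repeatable
x = [float(i) for i in range(10)]
mc_sd = monte_carlo_slope_sd(x, 1.0, 2.0, 0.5, 2000, rng)
exact_sd = analytic_slope_sd(x, 0.5)
print(f"Monte Carlo: {mc_sd:.4f}   formula: {exact_sd:.4f}")
```

    Under these assumptions the two numbers should agree to within a few percent, so a program has no need to simulate anything when the regression assumptions hold.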

    If you assume the data has no uncertainty in it then there isn't any uncertainty about the slope of the least squares fit. To repeat, the solution for the slope of a linear regression to a given set of data isn't a set of equally good answers. It is just one single number.

    To interpret the sigma of the slope when you use "weights", you must first figure out what the IGOR manual means by "weights". If a software product is going to call an array a "wave", who knows what it means by "weights".
  8. Jul 25, 2013 #7
    I think that goes back to the days when IGOR was primarily used for signal processing. It later expanded into a full-blown data analysis program.

    From what I understand, commonly "weight" means 1/variance (1/σ²). But IGOR has an input in the fitting dialog that asks for the wave of weights as standard deviations.
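
    The relation between the two conventions can be sketched as follows. This is a generic weighted least squares fit that accepts per-point standard deviations and converts them to weights of 1/σ² internally. It is not IGOR's implementation, and the function name and data are hypothetical. It also assumes the supplied standard deviations are the true y uncertainties; some programs additionally rescale the result by the residual scatter, so check the documentation.

```python
import math

def weighted_line_fit(x, y, sd):
    """Weighted least squares for y = a + b*x, given per-point standard deviations."""
    w = [1.0 / (s * s) for s in sd]            # weight = 1 / variance
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx * Sx
    b = (S * Sxy - Sx * Sy) / delta
    a = (Sxx * Sy - Sx * Sxy) / delta
    # Slope uncertainty propagated from the supplied standard deviations only.
    sigma_b = math.sqrt(S / delta)
    return a, b, sigma_b

# Hypothetical data with two different claimed measurement errors.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1]
a1, b1, sb1 = weighted_line_fit(x, y, [0.2] * 5)
a2, b2, sb2 = weighted_line_fit(x, y, [0.4] * 5)
print(f"sd = 0.2: slope = {b1:.3f} +/- {sb1:.3f}")
print(f"sd = 0.4: slope = {b2:.3f} +/- {sb2:.3f}")
```

    Doubling every supplied standard deviation leaves the slope unchanged but doubles its reported uncertainty. In this form the slope error reflects the stated measurement errors rather than how well the points happen to line up.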
  9. Jul 25, 2013 #8
    So, I'm still curious if people use, as standard practice, the "error" in a fit as the "error" in a measurement when in my eyes the fitting error is just a measure of the goodness of fit and not necessarily the uncertainty in the measurement.
  10. Jul 25, 2013 #9

    Stephen Tashi

    Science Advisor

    You have to be clear about whether you are assuming the real life situation matches the assumptions behind the linear regression model. As the documentation indicates, the estimated standard deviation of the slope only makes sense if the variation in it is due to independent, identically distributed random variation in the data - and in ordinary linear regression this means variation in the y measurements of the (x,y) data, not variation in the x measurements.

    If the "errors" in the y data arise because the phenomenon is not actually modelled correctly by a straight line, then this violates the assumptions of the program.
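
    A quick way to see this failure mode is to fit a straight line to noise-free data that actually follow a curve. The sketch below uses y = x² as a stand-in for such a relationship; the choice of curve, the number of points, and the sign-change count as a diagnostic are all assumptions made for the example.

```python
import math

def line_fit(x, y):
    """Ordinary least squares for y = a + b*x. Returns (a, b, sigma_b, residuals)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    return a, b, math.sqrt(s2 / sxx), residuals

def count_sign_changes(values):
    """Number of adjacent pairs whose signs differ."""
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)

# Noise-free data on a parabola: every "error" comes from the wrong model.
x = [float(i) for i in range(20)]
y = [xi * xi for xi in x]
a, b, sigma_b, residuals = line_fit(x, y)
sign_changes = count_sign_changes(residuals)

print(f"slope = {b:.3f} +/- {sigma_b:.3f}")
print(f"sign changes in residuals: {sign_changes} of {len(x) - 1} possible")
```

    The fit still reports a nonzero sigma for the slope even though the data contain no random error at all. The residuals are positive at both ends and negative in the middle, changing sign only twice, whereas independent random residuals would change sign at roughly half of the adjacent pairs. A pattern like this suggests that the reported sigma should not be read as a measurement uncertainty.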

    As to what people do, that's a sociological question. If you are preparing a report for publication somewhere, you should look at other articles that were accepted and see what the editors of the publication approved.