# Linear regression, including Uncertainties

by dreamspy
Tags: including, linear, regression, uncertainties
 P: 37 My problem in short: I have a set of data, and I want to calculate the linear regression and the uncertainty of the slope of the regression line, based on the uncertainties of the variables.

My problem in detail: My data is from an experiment, and the uncertainties (errors) come from experimental imprecision. In my case I am comparing these two variables: x = a reading on a pressure meter, y = a number on a counter. Every time the pressure meter passed a multiple of 100 (100, 200, 300, etc.), I noted down the values of x and y (pressure meter and counter). I estimate the error of my reading on the pressure meter to be 10, and the error of my reading on the counter to be 1. So some points from my data could look like this:

x1 = 100 ± 10, y1 = 4 ± 1

x2 = 200 ± 10, y2 = 7 ± 1

x3 = 300 ± 10, y3 = 13 ± 1

So the error for every x value is ± 10 and the error for every y value is ± 1.

My goal is to find the slope (or the formula) of the linear regression line through these data points and through the point (0,0) (intercept = 0). This is the easy part, though. Most of all I'd like to find the uncertainty of the slope of the line, based on the uncertainties of the x and y values.

I have tried various programs, including Excel, Graphical Analysis, Prism and pro Fit, without luck. Does anyone know of a program to do this, or the mathematical method I could use?

regards
Frímann Kjerúlf
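For reference, the "easy part" of the question can be sketched in a few lines. This is a minimal illustration, assuming an ordinary (unweighted) least-squares fit of y = a * x, for which the slope is a = Σ(x·y) / Σ(x²). The function name `slope_through_origin` is made up for this sketch, and the fit ignores the measurement uncertainties entirely.

```python
# Unweighted least-squares slope for a line forced through the origin, y = a * x.
# The three data points are the ones quoted in the post above.
xs = [100.0, 200.0, 300.0]
ys = [4.0, 7.0, 13.0]


def slope_through_origin(xs, ys):
    """Return a = sum(x*y) / sum(x*x), the least-squares slope with zero intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


a = slope_through_origin(xs, ys)
print(f"slope = {a:.4f}")
```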
 P: 37 Hi

Thanks for your detailed answer :)

I took a look at LINEST in Excel, and it seems to me that this method only calculates the error from the scatter of the points, and does not take into account any uncertainty of the points themselves.

I also looked at the book you pointed me to. It seems like this is exactly the info I need, though the math seems a little hard and would take me some time to figure out. From my first look, it seemed that these formulas only work for uncertainties on y, given that x is always exact. I might be wrong though. In my case I need to calculate the slope uncertainty from both the x and y uncertainties.

I have an idea though. What if I use the first and last x value in my dataset and, based on the uncertainty of x and y, calculate the slope of the "worst line" through these two points? Then I subtract the slope of the regression line through the dataset from that slope, and use the difference as my uncertainty.

Something like this: Using Excel I get a formula for the regression line, which might be y = 10 * x. From that I know that the slope of the best line (regression line) is 10. I estimate the uncertainty of x to be ± 2 and the uncertainty of y to be ± 10. So now I have:

Δx = ± 2 (uncertainty of x)

Δy = ± 10 (uncertainty of y)

x1 = 33 (first x in the data set)

x2 = 113 (last x in the data set)

a1 = 10 (slope of the linear regression line y = a1 * x)

y1 = 330 and y2 = 1130 (values at the endpoints of the regression line, calculated from y = a1 * x)

Now I define the worst line through these two points as the line that has the largest slope but is still within the uncertainties of the two points. See picture for better explanation:

Now, using the end points of the worst line, (X1, Y1) and (X2, Y2), I calculate the slope of the worst line:

X1 = x1 + Δx = 35

Y1 = y1 - Δy = 320

X2 = x2 - Δx = 111

Y2 = y2 + Δy = 1140

So the slope of the "worst line" would be:

a2 = (Y2 - Y1) / (X2 - X1) = 10.8

Now subtracting a1 from a2 to get the difference of the slopes:

a2 - a1 = 0.8

Could I use that difference as the uncertainty of the slope, based on the uncertainty of the data set? Then the slope of the regression line would be 10 ± 0.8.

Would this work? Really hope I got this right :)

regards
Frímann Kjerúlf
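The "worst line" arithmetic above can be checked with a short sketch. It only reproduces the calculation described in the post and makes no claim that this is a statistically rigorous uncertainty. The function name `extreme_slopes` is hypothetical, and the shallowest line is an addition that is not in the post; it is included only to show that the two extremes are not symmetric about the fitted slope.

```python
def extreme_slopes(x1, x2, a1, dx, dy):
    """Steepest and shallowest lines through the error boxes of the two end points.

    The end points are taken on the regression line y = a1 * x, as in the post.
    """
    y1, y2 = a1 * x1, a1 * x2
    # Steepest line: first point moved right and down, last point moved left and up.
    steepest = ((y2 + dy) - (y1 - dy)) / ((x2 - dx) - (x1 + dx))
    # Shallowest line (not in the post): the opposite corners of the error boxes.
    shallowest = ((y2 - dy) - (y1 + dy)) / ((x2 + dx) - (x1 - dx))
    return steepest, shallowest


a1 = 10.0
a2, a3 = extreme_slopes(x1=33.0, x2=113.0, a1=a1, dx=2.0, dy=10.0)
print(f"steepest   = {a2:.2f}, difference = {a2 - a1:+.2f}")
print(f"shallowest = {a3:.2f}, difference = {a3 - a1:+.2f}")
```

With the numbers from the post, the steepest slope is 820/76 and the shallowest is 780/84, so the upward and downward differences from a1 are slightly different in size.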
P: 37


I forgot to add that the correlation coefficient for the dataset is 0.999, and I would say that this method only works when the correlation coefficient is very close to 1.
 P: 1,235 dreamspy,

It is clear that with the small number of points in the data set (4 points, including (0,0)), looking at the various lines that can be drawn gives you a few possible slopes. Therefore, you can easily give a range for the estimated slope.

In addition, I now understand that your point (x,y) = (0,0) has no error on it. Therefore you are looking for a regression without a constant term: y = a*x (and not y = a*x + b). In this case, you only need to calculate the slope based on each of your three data points, as well as the uncertainty on each of these slopes:

s1 = y1/x1, standard deviation d1

s2 = y2/x2, standard deviation d2

s3 = y3/x3, standard deviation d3

Above, d1 is given by the relation d1² = (dy1²*x1² + dx1²*y1²)/x1⁴, assuming uncorrelated Gaussian distributions for x1 and y1. Similar formulas hold for d2 and d3.

You can then calculate the most probable slope and the uncertainty on this most probable slope. In this most probable slope, each of the slopes calculated from a given point will have a weight. This weight will be greater for the most precise evaluations. Therefore, point P3 = (300,13) will be the most important. The information provided by the points P1 and P2 will probably play a smaller role. You need to look in a statistics book for how s1, d1, s2, d2, s3, and d3 can be combined to get the most probable estimate and its uncertainty: s and d.

There could be a little bit more to look at in statistics. Indeed, it may be possible that s1 and d1 are in contradiction with s2 and d2, for example. This should not be the case with your data, but it can happen sometimes. Generally it is important to check whether different data are compatible. Look in the "variance analysis" chapter of a statistics book.

Michel

Postscriptum: I got these slopes and uncertainties from the three data points:

| point | slope | uncertainty |
|-------|-------|-------------|
| 1     | 0.040 | 0.0108      |
| 2     | 0.035 | 0.0053      |
| 3     | 0.043 | 0.0036      |

You can see that point 3 indeed provides the best data. You can also see that point 2 is nearly inconsistent with the other data, depending on the probability tolerance. Indeed, random errors have little chance of explaining such a large difference from point 3. To be checked.
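The per-point slopes and uncertainties in the postscript can be reproduced with the error-propagation formula quoted above. The sketch below also combines them with an inverse-variance weighted mean. That combination rule is an assumption made here, chosen as one standard way to weight the more precise estimates more heavily; the post itself only says to look the method up in a statistics book. The names `slope_and_sigma`, `s_comb`, and `d_comb` are illustrative.

```python
import math


def slope_and_sigma(x, y, dx, dy):
    """Slope s = y/x and its standard deviation d from the quoted relation
    d^2 = (dy^2 * x^2 + dx^2 * y^2) / x^4 (uncorrelated Gaussian errors)."""
    s = y / x
    var = (dy**2 * x**2 + dx**2 * y**2) / x**4
    return s, math.sqrt(var)


# Data points and uncertainties from the original question.
points = [(100.0, 4.0), (200.0, 7.0), (300.0, 13.0)]
DX, DY = 10.0, 1.0

estimates = [slope_and_sigma(x, y, DX, DY) for x, y in points]
for i, (s, d) in enumerate(estimates, start=1):
    print(f"point {i}: slope = {s:.3f}, uncertainty = {d:.4f}")

# Assumed combination rule: inverse-variance weighted mean of the three slopes.
weights = [1.0 / d**2 for _, d in estimates]
s_comb = sum(w * s for w, (s, _) in zip(weights, estimates)) / sum(weights)
d_comb = 1.0 / math.sqrt(sum(weights))
print(f"combined: slope = {s_comb:.4f}, uncertainty = {d_comb:.4f}")
```

Because the weights are 1/d², point 3 dominates the combined slope, which matches the remark that it provides the best data. The combined uncertainty is only meaningful if the three estimates are mutually compatible, which is the consistency check raised in the post.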
 P: 2 This paper about the uncertainty in the slope after a regression analysis has been performed might be of interest to you.

Michael J. Ruiz
UNC-Asheville

Jack Higbie, "Uncertainty in the linear regression slope," American Journal of Physics, Volume 59, Issue 2, pp. 184-185 (February 1991). Department of Physics, University of Queensland, Brisbane 4072, Australia. Received 12 December 1989; accepted 28 January 1990. ©1991 American Association of Physics Teachers. doi:10.1119/1.16607. PACS: 06.50.Mk, 02.60.Ed
 P: 37 Thanks for your answer. Is this paper available online? I did a quick library search here in Iceland and didn't find a copy. regards frímann
 P: 2 Hi,

I would first try to see if your school library has a hard copy of the journal, American Journal of Physics. Then check if your library has a subscription to it; many schools do. If that does not work, go to the journal web site, but I believe you will have to pay a nominal fee to download the paper.

It is a very short paper. The key formula is this: the uncertainty

sigma(slope) = |slope| * tan[arccos(R)] / sqrt(N - 2)

where R is the correlation coefficient, R = cov(x,y) / sqrt[var(x) * var(y)], and N - 2 is the number of degrees of freedom in the data, where 2 have been lost to fit the slope and intercept.

I am now studying this area of statistics; I am not an expert. I am still searching the internet for an equivalent discussion and might find one. By the way, the paper refers to Mathews and Walker, Mathematical Physics, second edition, for some related analysis.

I hope this helps.

Mike
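The key formula quoted above is easy to evaluate once the slope, R, and N are known. The sketch below is a direct transcription of that formula. The function name `slope_sigma_from_r` is made up, and the value N = 10 in the example is purely hypothetical, since the thread never states the size of the data set that gave slope 10 and R = 0.999. Note that, as described in the post, the formula is for a fit with both slope and intercept, and it is computed from the scatter of the points (through R), not from the stated ± uncertainties on x and y.

```python
import math


def slope_sigma_from_r(slope, r, n):
    """Uncertainty of a fitted slope from the correlation coefficient:
    sigma = |slope| * tan(arccos(r)) / sqrt(n - 2), for n data points."""
    if n <= 2:
        raise ValueError("need more than 2 points (n - 2 degrees of freedom)")
    return abs(slope) * math.tan(math.acos(r)) / math.sqrt(n - 2)


# Slope and R as quoted earlier in the thread; N = 10 is a hypothetical value.
sigma = slope_sigma_from_r(slope=10.0, r=0.999, n=10)
print(f"sigma(slope) = {sigma:.3f}")
```

Since tan(arccos(R)) equals sqrt(1 - R²)/R, the uncertainty shrinks toward zero as R approaches 1, which fits the earlier observation that the data set is very nearly linear.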
