
Error on regression line slope

  • #1
I'm currently trying to determine the error on the slope of a regression line and the y-intercept.

y value        y error       x value
27.44535013    0.03928063    136
29.78207524    0.07836946     44
27.4482858     0.0385213     143
27.27481069    0.02117426    153


I'd like to code the solution and have attempted to do so in Python. So far I have generated different data sets by adding or subtracting the y errors, fitted a regression line to every combination, and taken the maximum and minimum slope and y-intercept over all these fits as the errors. I'm not sure this is the correct method, though, and when I apply it to a larger data set the number of regression lines I have to calculate is so large that the code breaks. Is there a simpler solution, or an equation I'm missing that propagates the y errors into the error on the slope?
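Here is a stripped-down sketch of what I tried (variable names are made up):

```python
import itertools
import numpy as np

# Brute force: shift each y by +/- its error and fit every combination.
# The number of fits grows as 2**n, which is why this breaks down for
# larger data sets.
y = np.array([27.44535013, 29.78207524, 27.4482858, 27.27481069])
dy = np.array([0.03928063, 0.07836946, 0.0385213, 0.02117426])
x = np.array([136.0, 44.0, 143.0, 153.0])

slopes, intercepts = [], []
for signs in itertools.product([-1.0, 1.0], repeat=len(y)):
    b, a = np.polyfit(x, y + np.array(signs) * dy, 1)  # slope, intercept
    slopes.append(b)
    intercepts.append(a)

print(max(slopes) - min(slopes))         # spread taken as the slope error
print(max(intercepts) - min(intercepts))
```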
 

Answers and Replies

  • #2
BvU
Hi h, welcome to PF :)

Kirchner (Berkeley) gives a derivation and the expressions here

[edit] His eqn (16) looks terrible in my PDF reader (Adobe XI), so I render what I can deduce, since the expressions are needed for ##s_a##:
$$e_i = Y_i - \hat Y_i$$
$$SSE = \sum e_i^2$$
$$MSE = s_{Y {\bf\cdot} X}^2 = {SSE\over n-2} = Var(Y)\;(1-r^2)\;{n-1\over n-2}$$
$$RMSE = s_{Y {\bf\cdot} X} = \sqrt{SSE\over n-2} = S_Y\;\sqrt{1-r^2}\;\sqrt{n-1\over n-2}$$
And here ##S_Y## is not the square root of his ##SS_Y##, but the square root of his ##SS_Y/(n-1)## (with your four points, ##SS_Y/3##). Very tricky.

As you can guess, I did some work here. You do yours too and we'll compare if you want. Friday at the earliest, I'm afraid.

PS Ten digits is a bit much for this kind of scatter. They must be calculation results? Of what?

[edit2] From the last expression above you can see that in fact you don't need the individual ##e_i##, SSE or MSE, since ##SS_Y## and ##r^2## are enough!
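In Python the unweighted expressions transcribe directly; a minimal sketch (my own naming; the ##s_a## line is the standard textbook expression):

```python
import numpy as np

def ols_with_errors(x, y):
    """Unweighted least squares y = a + b*x, with the standard errors
    of b and a via the SSE/MSE expressions above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    ss_x = np.sum((x - x.mean()) ** 2)
    ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
    b = ss_xy / ss_x                       # slope
    a = y.mean() - b * x.mean()            # intercept
    sse = np.sum((y - (a + b * x)) ** 2)   # sum of squared residuals
    mse = sse / (n - 2)                    # s_{Y.X}^2
    s_b = np.sqrt(mse / ss_x)
    s_a = np.sqrt(mse * (1.0 / n + x.mean() ** 2 / ss_x))
    return b, a, s_b, s_a
```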
 
  • #3
Hey thanks for the reply.

Does this solution take into account the error on the y values? Should I be fitting a line of best fit that is weighted by the errors?
 
  • #4
Quantum Defect
Look in either:

Bevington: "Data Reduction and Error Analysis in the Physical Sciences" (most good colege libraries have this)
Press, et al. "Numerical Recipes in C"

You need to look up "weighted least squares" in these sources. The result is similar to what BvU has above.

You can also see:

http://en.wikipedia.org/wiki/Least_squares (Section 6 talks about weighted least squares)
http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd143.htm
http://elsa.berkeley.edu/eml/ra_reader/14-wls.pdf [Broken]
http://www.stat.ncsu.edu/people/bloomfield/courses/st430-514/slides/MandS-ch09-sec04-04.pdf [Broken]
 
  • #5
BvU
You can give weights to the measurements: instead of e.g. ##\sum y_i## you use ##\sum {y_i\over \sigma_i^2}## etc., and instead of dividing by ##N## you divide by ##\sum {1\over \sigma_i^2}##.
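In numpy terms (a sketch, using your data):

```python
import numpy as np

x = np.array([136.0, 44.0, 143.0, 153.0])
y = np.array([27.44535013, 29.78207524, 27.4482858, 27.27481069])
sigma = np.array([0.03928063, 0.07836946, 0.0385213, 0.02117426])

w = 1.0 / sigma**2                  # weights from the y errors
y_bar = np.sum(w * y) / np.sum(w)   # weighted mean instead of y.sum()/N
x_bar = np.sum(w * x) / np.sum(w)
```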

In your case it doesn't make much difference (*): the point with the deviating weight lies so far from the other points that its leverage makes the line pass through it anyway.


PS how accurate are your ##x_i## ? The whole analysis is based on ##\sigma_x \ll \sigma_y##

[edit] (*) Have to withdraw that: uncertainties come out twice as high
 
  • #6
Ray Vickson
You are doing it the hard way. If the errors are normally distributed (with mean 0 and common, but unknown, variance), there are standard formulas for confidence intervals on the slope, intercept and predicted y(x) value. See, e.g., http://reliawiki.org/index.php/Simple_Linear_Regression_Analysis , especially the later section called "Confidence Intervals in Simple Linear Regression". This has all the needed formulas and works through the details on some examples.
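On the Python side, scipy's stats.linregress already returns these standard errors for the unweighted case (the intercept_stderr attribute needs scipy 1.6 or later):

```python
import numpy as np
from scipy import stats

x = np.array([136.0, 44.0, 143.0, 153.0])
y = np.array([27.44535013, 29.78207524, 27.4482858, 27.27481069])

res = stats.linregress(x, y)
print(res.slope, res.stderr)                # slope and its standard error
print(res.intercept, res.intercept_stderr)  # intercept and its standard error
```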

If your errors are not normally distributed you may need to resort to "resampling methods", such as bootstrapping, jackknifing, etc. See, e.g.,
http://wise.cgu.edu/downloads/Introduction%20to%20Resampling%20Techniques%20110901.pdf [Broken]
for an introduction to the concepts.
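A bare-bones bootstrap of the slope might look like this (a sketch: resample the points with replacement, refit, and read the interval off the spread of slopes; with only four points this is of limited value, but the idea carries over to larger data sets):

```python
import numpy as np

x = np.array([136.0, 44.0, 143.0, 153.0])
y = np.array([27.44535013, 29.78207524, 27.4482858, 27.27481069])

rng = np.random.default_rng(0)
slopes = []
while len(slopes) < 10_000:
    idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
    if np.unique(x[idx]).size < 2:              # degenerate draw, no line fits
        continue
    slopes.append(np.polyfit(x[idx], y[idx], 1)[0])

print(np.percentile(slopes, [2.5, 97.5]))       # 95% confidence interval
```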
 
  • #7
BvU
What the poster wants are the simple expressions for the case where the errors are normally distributed but not equal for all points. They exist and are much like Kirchner's.
(Don't have time now - perhaps tomorrow.)
Resampling and such is overkill here.
 
  • #8
BvU
Hope we haven't lost h. But I promised something, so here goes:

Dear h,

Let
$$
\quad \overline Y = \sum {y_i\over \sigma_i^2} \Big / \sum {1\over \sigma_i^2}, \\
\quad \overline X = \sum {x_i\over \sigma_i^2} \Big / \sum {1\over \sigma_i^2},\\
\quad \overline {XY} = \sum {x_i y_i\over \sigma_i^2} \Big/ \sum {1\over \sigma_i^2}, \\
\quad \overline {Y^2} = \sum {y_i^2\over \sigma_i^2} \Big/ \sum {1\over \sigma_i^2} \\
\quad \overline {X^2} = \sum {x_i^2\over \sigma_i^2} \Big/ \sum {1\over \sigma_i^2}
$$
Then let
$$
\quad SS_X = \overline {X^2} - \overline {X}^2,\qquad SS_Y = \overline {Y^2} - \overline {Y}^2,\qquad SS_{XY} = \overline {XY} - \overline {X}\;\overline {Y}
$$
These are Kirchner's (11)-(13) but divided by ##n##, so from here on we can use his expressions as long as numerator and denominator carry the same power of ##n##.

For the record:
$$
\quad r^2 = {SS_{XY}^2 \over SS_X\;SS_Y} \\
\quad b = {SS_{XY} \over SS_X} \qquad \left ( \; = r\,\sqrt{SS_Y \over SS_X}\; \right )\\
\quad a = \overline Y - b \overline X
$$
And now come the all-important ##\sigma##'s (remember the ##SS## here are already divided by ##n##):
$$
\qquad\sigma_b^2 = {SS_Y/SS_X - b^2\over n-2}\\ \ \\
\qquad\sigma_a^2 = \sigma_b^2 \left ( SS_X + \overline {X}^2 \right ) = \sigma_b^2\;\overline{X^2}
$$
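Since Python was the goal, here is the whole recipe in code (a sketch of my reading of the above, names are mine; check it against your own implementation):

```python
import numpy as np

def weighted_fit(x, y, sigma):
    """Weighted least-squares fit y = a + b*x with standard errors,
    following the recipe above (all SS are the weighted, divided-by-n
    versions)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    w = 1.0 / np.asarray(sigma, float) ** 2   # weights 1/sigma_i^2
    W = np.sum(w)
    n = len(x)

    xbar = np.sum(w * x) / W                  # weighted means
    ybar = np.sum(w * y) / W
    ss_x = np.sum(w * x * x) / W - xbar ** 2
    ss_y = np.sum(w * y * y) / W - ybar ** 2
    ss_xy = np.sum(w * x * y) / W - xbar * ybar

    b = ss_xy / ss_x                          # slope
    a = ybar - b * xbar                       # intercept
    r2 = ss_xy ** 2 / (ss_x * ss_y)

    sb2 = (ss_y / ss_x - b ** 2) / (n - 2)    # sigma_b^2
    sa2 = sb2 * (ss_x + xbar ** 2)            # sigma_a^2 (= sb2 * weighted mean of x^2)
    return a, b, np.sqrt(sa2), np.sqrt(sb2), r2
```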
The Bevington reference from QD (mine is from the seventies :) ) is really excellent: it has everything, clearer and more extensive (at various levels), plus a Fortran (!) listing that shouldn't be too difficult to re-engineer into Python.

Again: if possible, use physics sense: your y values look like calculation results, and systematic errors don't average out, so be sure to keep common factors separate. You can even analyze the results for this: if the weighted and unweighted fits give quite different results, there might be something wrong with the error estimates.

Your results don't really need ten-digit accuracy. And you have to ask whether your sigmas are really distinct: the relative accuracy of a sigma based on averaging ##k## measurements is around ##1/\sqrt k##. The 0.02 differs considerably from the 0.08 -- there might be an experimental reason for that.

To top it all off, I attach a few pictures of what I got using your data.
The red dot is the unweighted centre of gravity, the green one the weighted. The unweighted result is identical to Excel's linear trend. The dashed lines are the fit (middle one) and the same +/- the uncertainty in the predicted ##y_i## (Kirchner (20)).

I'll be glad to share the numerical results (in the sense of sharing: you show your working and then I'll do the same :) ). I am also interested in the context: what are x and y, and how did the y values and ##\sigma_i## come about?

Oh, and: anyone with comments/corrections: very welcome!
 

Attachments: plots of the weighted and unweighted fits described above.
