How can I calculate the error in a slope with variance in x and y values?

To calculate the error in a slope when dealing with averages and standard deviations of x and y values, one can use Ordinary Least Squares (OLS) for unbiased slope estimates, but this may not account for standard errors effectively. If group sizes for each observation are known, weighted least squares can provide a more accurate standard error. Measurement error in x-values can introduce bias, and simulating individual observations based on averages and deviations can help in addressing this issue. A method involving plotting maximum and minimum y-values to estimate standard error is suggested, although its validity is debated. Understanding how Excel's LINEST function calculates slope and standard deviation, especially when forcing the slope through zero, is crucial for accurate analysis.
Salish99
I am looking for methods to calculate the error in a slope.
The caveat is that my values are themselves averages with a STDEV.
E.g.
x: 1 ± 1%, 2 ± 1%, 3 ± 1%
y: 0.14 ± 0.01, 0.27 ± 0.02, 0.42 ± 0.02
(using http://www.cartage.org.lb/en/themes/sciences/chemistry/Miscellenous/Helpfile/Erroranalysis/MultiplicationDivision/MultiplicationDivision.htm as the calculation method for these errors)
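(For reference, the standard quadrature rule for relative errors in products and quotients, which is presumably what that page describes, is \left(\frac{\Delta z}{z}\right)^2 = \left(\frac{\Delta x}{x}\right)^2 + \left(\frac{\Delta y}{y}\right)^2 for z = xy or z = x/y.)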

This could be simplified by assuming the x-values have no deviation.

Now I can simply fit the average slope of those three values: I can run a simple linear regression and obtain the least-squares values as shown here
https://www.physicsforums.com/showthread.php?t=194616 and, somewhat related, here
https://www.physicsforums.com/showthread.php?t=173827

or use Excel's LINEST function
(http://www.trentu.ca/academic/physics/linestdemo.html).
Either way, I obtain a value for the slope of my averaged data, and a STDEV based on the least-squares algorithm. But this does not take my initial STDEVs into account at all.
Is there a more general algorithm that can take the STDEVs of (at least) the initial y values, or of both the x and y values, into account, and how would I calculate it?
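For concreteness, a minimal sketch of one such algorithm: a weighted least-squares fit that uses the y STDEVs as weights (here via SciPy's curve_fit, with the example values above; ignoring the x errors follows the simplification already made):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.14, 0.27, 0.42])
sigma_y = np.array([0.01, 0.02, 0.02])  # one STDEV per y value

def line(x, m, b):
    return m * x + b

# sigma weights the fit by the y uncertainties; absolute_sigma=True
# treats them as true standard deviations rather than relative weights,
# so the parameter errors derived from pcov reflect them directly.
popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)
m, b = popt
m_err, b_err = np.sqrt(np.diag(pcov))
print(f"slope = {m:.4f} +/- {m_err:.4f}")
print(f"intercept = {b:.4f} +/- {b_err:.4f}")
```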

Thank you.
 
You seem to have a problem that involves a combination of:

1. grouped data, whereby you observe the group statistics (means and variances) but not individual observations for the dependent variable (y). In this case OLS is typically unbiased but inefficient. Since OLS is unbiased, if all you care about is the sign & the size of the coefficient and not its standard error, you can go with your coefficient estimates from OLS.

OTOH, if you care about the coefficient's standard error and you happen to know the group sizes in each "bin" (i.e. for each of your "average" observations), you can use those as weights and compute a weighted least squares. If you don't have the group sizes, you can assume a uniform group size for each observation (say, g = 100) and simulate three sets of 100 individual observations with means equal to the y's in your data and ranges equal to the +/- factors around each y (a sketch of this simulation follows below point 2).

2. measurement error: each of your x's is measured with error, and that can be a source of bias. If you know the "cause" of the variation in the x variable, you can include it as an additional variable; if you don't know or don't have the "cause", you can at least include the variation around each x as a second variable. You have too few observations, and that's a problem, especially for including additional independent variables alongside x. But if you decide to run the simulation in step 1 above, which will generate many "fictional" observations, this will no longer be a problem.
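A minimal sketch of the simulation described in point 1, assuming a uniform group size g = 100 and normally distributed errors (both are assumptions, not given in the data):

```python
import numpy as np

rng = np.random.default_rng(0)
g = 100  # assumed uniform group size per "bin"

x_means = np.array([1.0, 2.0, 3.0])
y_means = np.array([0.14, 0.27, 0.42])
y_sds = np.array([0.01, 0.02, 0.02])

xs, ys = [], []
for xm, ym, sd in zip(x_means, y_means, y_sds):
    xs.append(np.full(g, xm))          # each bin keeps its average x
    ys.append(rng.normal(ym, sd, g))   # g simulated individual y's
x_sim = np.concatenate(xs)
y_sim = np.concatenate(ys)

# OLS on the simulated individual observations; cov=True also returns
# the parameter covariance matrix, whose diagonal holds the variances.
(m, b), cov = np.polyfit(x_sim, y_sim, 1, cov=True)
m_err, b_err = np.sqrt(np.diag(cov))
print(f"slope = {m:.4f} +/- {m_err:.4f}, intercept = {b:.4f} +/- {b_err:.4f}")
```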

EnumaElish
 
Thanks enuma.

I will disregard the x-axis uncertainty, which derives from the measurement setup.
Just for future reference, how can I convert deviations in the x-values into independent y-(or z-)values?

I am very interested in the standard error of the resulting function.
One suggestion I got was to simply draw two lines through the graph, one using all max values (all y-values + the error) and one using all min values (all y-values - the error), then take the average of those two and use a simple STDEV as the standard error. I somewhat disagree with that (and would be happy for comments).
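For comparison, a quick sketch of that max/min suggestion using the example values from the first post (reading "a simple STDEV" as half the spread between the two slopes is an assumption; whether this yields a valid standard error is exactly what is being questioned):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.14, 0.27, 0.42])
err = np.array([0.01, 0.02, 0.02])

m_max, b_max = np.polyfit(x, y + err, 1)  # line through all max y-values
m_min, b_min = np.polyfit(x, y - err, 1)  # line through all min y-values

m_avg = 0.5 * (m_max + m_min)
m_spread = 0.5 * abs(m_max - m_min)       # half the slope spread
print(f"slope ~ {m_avg:.4f} +/- {m_spread:.4f}")
```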


Anyways, the linear OLS function in Excel, LINEST, gives the slope m as m = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2} and the y-axis intercept b as b = \bar{y} - m\bar{x}.
so far, so good.
So for a simple sample set of three points (x;y) = (1;1), (2;2.1), (3;2.9),
the slope is m = 0.95 and b = 0.1.
Using LINEST(y, x, , TRUE), I can create an array that gives me exactly those values, with a STDEV of 0.086603 for m and 0.187 for b.
1) How are those values calculated?
2) How come the standard deviation for b is 80% larger than the value of b itself?!?
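For question 1, the textbook OLS standard-error formulas reproduce exactly the LINEST numbers quoted above. With the residual variance s^2 = \frac{\sum(y-\hat{y})^2}{n-2}, they are

SE(m) = \sqrt{\frac{s^2}{\sum(x-\bar{x})^2}} \qquad SE(b) = \sqrt{s^2\left(\frac{1}{n}+\frac{\bar{x}^2}{\sum(x-\bar{x})^2}\right)}

For the three points above, \sum(y-\hat{y})^2 = 0.015 and n-2 = 1, so SE(m) = \sqrt{0.015/2} \approx 0.0866 and SE(b) = \sqrt{0.015\,(1/3+2^2/2)} \approx 0.187. As for question 2, the large SE(b) relative to b = 0.1 mostly reflects the tiny sample (n = 3) and the extrapolation from \bar{x} = 2 back to x = 0.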

Furthermore, if I now force the fit through zero using LINEST(y, x, 0, TRUE), my slope becomes 0.9928 and b = 0 (obviously), with STDEV(m) = 0.026245.
But if I use the above calculation and just add a fourth point (0;0), I get a slope of 0.98 and a b of 0.03, which is NOT what Excel does by forcing it through zero.

Any help with how both the slope forced through zero and all the STDEVs are actually calculated?
Thanks.
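Similarly, the textbook regression-through-the-origin formulas reproduce the quoted through-zero output. With no intercept only one parameter is estimated, so the residual variance uses n-1 degrees of freedom:

m = \frac{\sum xy}{\sum x^2} \qquad SE(m) = \sqrt{\frac{\sum(y-mx)^2}{(n-1)\sum x^2}}

For the three points, m = 13.9/14 \approx 0.9928 and SE(m) = \sqrt{0.019286/(2 \cdot 14)} \approx 0.0262, matching LINEST. Adding a fourth point (0;0) to the ordinary two-parameter fit is a different model: the intercept remains a free parameter (hence b = 0.03), so it cannot reproduce the constrained fit.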
 