How can I calculate the error in a slope with variance in x and y values?

  • Context: Graduate 
  • Thread starter: Salish99
  • Tags: Error, Slope, Variance

Discussion Overview

The discussion revolves around methods for calculating the error in a slope derived from averaged x and y values that include standard deviations. Participants explore the implications of measurement errors, the use of ordinary least squares (OLS) regression, and alternative approaches to account for uncertainties in both x and y values.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant seeks methods to calculate the error in a slope when using averaged values with standard deviations, questioning the adequacy of standard OLS regression.
  • Another participant suggests that OLS is unbiased but may be inefficient for grouped data, proposing the use of weighted least squares if group sizes are known.
  • Concerns are raised about measurement error in x values, with a suggestion to simulate additional observations to mitigate issues arising from limited data.
  • A participant expresses interest in converting deviations in x-values into independent y-values and questions the validity of a proposed method for estimating standard error using extreme values.
  • Further inquiries are made about the calculations of slope and intercept standard deviations in Excel's LINEST function, including confusion over the relationship between the standard deviation of the intercept and its value.
  • Discussion includes a comparison of results from forcing the slope through zero versus adding a point at the origin, highlighting discrepancies in the slope and intercept values obtained.

Areas of Agreement / Disagreement

Participants express differing views on how to handle uncertainties in x and y values, with no consensus reached on the best approach for calculating the error in the slope. The discussion remains unresolved regarding the most appropriate methods and calculations.

Contextual Notes

Limitations include the assumption of uniform group sizes for simulations, the potential bias introduced by measurement errors in x values, and the lack of individual observations for y values, which complicates the analysis.

Salish99
I am looking for methods to calculate the error in a slope.
The caveat is that my values themselves are averages, each with a STDEV.
E.g.
x: 1 ± 1%, 2 ± 1%, 3 ± 1%
y: 0.14 ± 0.01, 0.27 ± 0.02, 0.42 ± 0.02
(using http://www.cartage.org.lb/en/themes/sciences/chemistry/Miscellenous/Helpfile/Erroranalysis/MultiplicationDivision/MultiplicationDivision.htm as the calculation method for these errors)

This could be simplified by assuming the x-values have no deviation.

Now I can just plot the average slope of those three values, run a simple linear regression, and obtain the least-squares values as shown here
https://www.physicsforums.com/showthread.php?t=194616 and somewhat related here
https://www.physicsforums.com/showthread.php?t=173827

or use Excel's LINEST statistics
(http://www.trentu.ca/academic/physics/linestdemo.html)
Either way, I obtain a value for the slope and a STDEV for it, based on the least-squares algorithm. But this does not take my initial STDEVs into account at all.
Is there a more general algorithm that can take into account the STDEVs of at least the y values, or of both the x and y values, and how would I calculate that?

Thank you.
 
You seem to have a problem that involves a combination of:

1. grouped data, whereby you observe the group statistics (means and variances) but not individual observations for the dependent variable (y). In this case OLS is typically unbiased but inefficient. Since OLS is unbiased, if all you care about is the sign & the size of the coefficient and not its standard error, you can go with your coefficient estimates from OLS.

OTOH, if you care about the coefficient's standard error and you happen to know the group sizes in each "bin" (i.e. for each of your "average" observations) you can use those as weights and compute a weighted least squares. If you don't have the group sizes, you can assume a uniform group size for each observation (say, g = 100) and simulate three sets of 100 individual observations with means = the y's in your data and the ranges = the +/- factors around each y.

2. measurement error: each of your x's is measured with error, and that can be a source of bias. If you know the "cause" of the variation in the x variable, you can include that as an additional variable; if you don't know or have the "cause" you can at least include the variation around each x as a second variable. You have too few observations and that's a problem, especially for including additional independent variables alongside of x. But if you decide to run the simulation in step 1 above, which will generate many "fictional" observations, this will no longer be a problem.
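
A minimal sketch of the weighted least-squares idea, applied to the three averaged points from the first post. The assumptions are mine, not the thread's: the weights are inverse variances, 1/σ², built from the quoted ± values (each treated as one standard deviation of an independent Gaussian error), rather than the group sizes mentioned above, since no group sizes are given. The optional x-error handling uses the "effective variance" approximation σ_eff² = σ_y² + (m·σ_x)², which is only one of several ways to treat errors in both variables. The function name `weighted_line_fit` is illustrative.

```python
import math

def weighted_line_fit(x, y, sy, sx=None, n_iter=50, tol=1e-12):
    """Fit y = m*x + b with inverse-variance weights.

    sy: one-sigma uncertainties of y (assumed independent, Gaussian).
    sx: optional one-sigma uncertainties of x, folded into the weights via
        the effective-variance approximation sy_eff^2 = sy^2 + (m*sx)^2.
    Returns (m, b, sigma_m, sigma_b).
    """
    if sx is None:
        sx = [0.0] * len(x)
    m = 0.0  # starting slope; only matters when sx is non-zero
    for _ in range(n_iter):
        w = [1.0 / (syi ** 2 + (m * sxi) ** 2) for syi, sxi in zip(sy, sx)]
        S = sum(w)
        Sx = sum(wi * xi for wi, xi in zip(w, x))
        Sy = sum(wi * yi for wi, yi in zip(w, y))
        Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        delta = S * Sxx - Sx ** 2
        m_new = (S * Sxy - Sx * Sy) / delta
        b = (Sxx * Sy - Sx * Sxy) / delta
        converged = abs(m_new - m) < tol
        m = m_new
        if converged:
            break
    # Parameter uncertainties implied by the input sigmas alone
    return m, b, math.sqrt(S / delta), math.sqrt(Sxx / delta)

# Data from the first post; the 1% on x is read as a relative one-sigma error
x = [1.0, 2.0, 3.0]
y = [0.14, 0.27, 0.42]
sy = [0.01, 0.02, 0.02]
sx = [0.01 * xi for xi in x]

m, b, sm, sb = weighted_line_fit(x, y, sy)
m_xy, b_xy, sm_xy, sb_xy = weighted_line_fit(x, y, sy, sx)
print(f"y errors only:  m = {m:.4f} +/- {sm:.4f}, b = {b:.4f} +/- {sb:.4f}")
print(f"x and y errors: m = {m_xy:.4f} +/- {sm_xy:.4f}, b = {b_xy:.4f} +/- {sb_xy:.4f}")
```

The uncertainties returned here come only from the input σ values, not from the scatter of the points about the line, which is the opposite of what an unweighted fit reports. With only three points, both kinds of estimate should be read with caution.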

EnumaElish
 
Thanks enuma.

I will disregard the x-axis uncertainty, which derives from the measurement setup.
Just for future reference, how can I convert deviations in the x-values into independent y-(or z-)values?

I am very interested in the standard error of the resulting function.
One suggestion I got was to simply draw 2 lines through the graph, one using all max values (all y-values + the error) and one using all min values (all y-values - the error), then take the average of those two and use a simple STDEV as the standard error. I somewhat disagree with that (and would be happy for comments).
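
One hedged way to examine the max/min-line suggestion is to compare it with a small simulation on the three points from the first post. The sketch below assumes independent Gaussian errors on y with the quoted ± values as standard deviations and no error on x; the function names and the number of draws are arbitrary choices, not anything proposed in the thread.

```python
import random
import statistics

def ols_slope(x, y):
    """Unweighted least-squares slope of y against x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

def monte_carlo_slope(x, y, sy, n_draws=20000, seed=0):
    """Redraw each y from N(y_i, sy_i) and refit; return mean and SD of the slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_draws):
        y_sim = [rng.gauss(yi, si) for yi, si in zip(y, sy)]
        slopes.append(ols_slope(x, y_sim))
    return statistics.mean(slopes), statistics.stdev(slopes)

x = [1.0, 2.0, 3.0]
y = [0.14, 0.27, 0.42]
sy = [0.01, 0.02, 0.02]

# The two "extreme" lines: every y shifted up, or every y shifted down
slope_hi = ols_slope(x, [yi + si for yi, si in zip(y, sy)])
slope_lo = ols_slope(x, [yi - si for yi, si in zip(y, sy)])

mc_mean, mc_sd = monte_carlo_slope(x, y, sy)
print(f"max line slope = {slope_hi:.4f}, min line slope = {slope_lo:.4f}")
print(f"simulated slope = {mc_mean:.4f} +/- {mc_sd:.4f}")
```

Shifting every point in the same direction mostly moves the intercept, so the two extreme slopes stay close together. In this example their half-difference is noticeably smaller than the simulated spread, which is consistent with the reservation expressed above.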


Anyway, the linear OLS function in Excel, LINEST, gives the slope m as [tex]m=\frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2}[/tex] and the intercept with the y-axis b as [tex]b=\bar{y}-m\bar{x}[/tex].
so far, so good.
So for a simple sample set of three points (x;y) = (1;1), (2;2.1), (3;2.9), the slope is m = 0.95 and b = 0.1.
Using LINEST(y, x, , TRUE), I can create an array that gives me exactly those values, with a STDEV of 0.086603 for m and 0.187 for b.
1) How are those values calculated?
2) How come the standard deviation for b is more than 80% larger than b itself?!?

Furthermore, if I now force the line to go through 0, using LINEST(y, x, 0, TRUE), my slope becomes 0.9928 and b=0 (obviously), with STDEV(m)=0.026245.
But if I use the above-mentioned calculation and just add a fourth point (0;0), I get a slope of 0.98 and a b of 0.03, NOT what Excel gives when forcing the line through zero.

Any help with how both the slope forced through zero and all the STDEVs are actually calculated?
Thanks.
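
For reference, a sketch using the standard textbook least-squares formulas, which reproduce the numbers quoted in this post. That LINEST computes them in exactly this way internally is an assumption based on the match, not something confirmed in the thread. With an intercept, the residual variance is s² = SSR/(n-2), and the standard errors are s/√Σ(x-x̄)² for m and s·√(1/n + x̄²/Σ(x-x̄)²) for b. Forced through the origin, m = Σxy/Σx² with s² = SSR/(n-1) and standard error s/√Σx². The function names are illustrative.

```python
import math

def ols_with_intercept(x, y):
    """Fit y = m*x + b; return (m, b, se_m, se_b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    ssr = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)  # two fitted parameters
    se_m = math.sqrt(s2 / sxx)
    se_b = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return m, b, se_m, se_b

def ols_through_origin(x, y):
    """Fit y = m*x with b fixed at 0; return (m, se_m)."""
    n = len(x)
    sxx0 = sum(xi * xi for xi in x)
    m = sum(xi * yi for xi, yi in zip(x, y)) / sxx0
    ssr = sum((yi - m * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 1)  # one fitted parameter
    return m, math.sqrt(s2 / sxx0)

x = [1.0, 2.0, 3.0]
y = [1.0, 2.1, 2.9]

m, b, se_m, se_b = ols_with_intercept(x, y)
m0, se_m0 = ols_through_origin(x, y)
# Adding (0;0) as a fourth data point is a different fit: b is still free
m4, b4, se_m4, se_b4 = ols_with_intercept([0.0] + x, [0.0] + y)

print(f"with intercept: m = {m:.4f} +/- {se_m:.6f}, b = {b:.4f} +/- {se_b:.6f}")
print(f"through origin: m = {m0:.4f} +/- {se_m0:.6f}")
print(f"extra (0;0) point: m = {m4:.4f}, b = {b4:.4f}")
```

Two observations follow from the formulas. The standard error of b is an absolute uncertainty, so a value larger than b itself simply means the intercept is statistically compatible with zero for these three points. And forcing the line through zero removes b from the model entirely, whereas an added (0;0) point is just one more observation that the fitted line is free to miss, so the two results need not agree.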
 
