How can I calculate the error in a slope with variance in x and y values?

To calculate the error in a slope when dealing with averages and standard deviations of x and y values, one can use Ordinary Least Squares (OLS) for unbiased slope estimates, but this may not account for standard errors effectively. If group sizes for each observation are known, weighted least squares can provide a more accurate standard error. Measurement error in x-values can introduce bias, and simulating individual observations based on averages and deviations can help in addressing this issue. A method involving plotting maximum and minimum y-values to estimate standard error is suggested, although its validity is debated. Understanding how Excel's LINEST function calculates slope and standard deviation, especially when forcing the slope through zero, is crucial for accurate analysis.
Salish99
I am looking for methods to calculate the error in a slope.
The caveat is that my values are themselves averages with a STDEV.
E.g.
x: 1 ± 1%, 2 ± 1%, 3 ± 1%
y: 0.14 ± 0.01, 0.27 ± 0.02, 0.42 ± 0.02
(using http://www.cartage.org.lb/en/themes/sciences/chemistry/Miscellenous/Helpfile/Erroranalysis/MultiplicationDivision/MultiplicationDivision.htm as the calculation method for these errors)
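(For reference, the standard quadrature rule for relative errors in products and quotients, which is presumably what that page describes, is \left(\frac{\Delta z}{z}\right)^2 = \left(\frac{\Delta x}{x}\right)^2 + \left(\frac{\Delta y}{y}\right)^2 for z = xy or z = x/y.)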

This could be simplified by assuming the x-values have no deviation.

Now I can simply fit the average slope of those three values: I can run a simple linear regression and obtain the least-squares values as shown here
https://www.physicsforums.com/showthread.php?t=194616 and, somewhat related, here
https://www.physicsforums.com/showthread.php?t=173827

or use Excel's LINEST function
(http://www.trentu.ca/academic/physics/linestdemo.html).
Either way, I obtain a value for the slope of my averaged data, and a STDEV based on the least-squares algorithm. But this does not take my initial STDEVs into account at all.
Is there a more general algorithm that can take the STDEVs of (at least) the initial y values, or of both the x and y values, into account, and how would I calculate it?
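For concreteness, a minimal sketch of one such algorithm: a weighted least-squares fit that uses the y STDEVs as weights (here via SciPy's curve_fit, with the example values above; ignoring the x errors follows the simplification already made):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.14, 0.27, 0.42])
sigma_y = np.array([0.01, 0.02, 0.02])  # one STDEV per y value

def line(x, m, b):
    return m * x + b

# sigma weights the fit by the y uncertainties; absolute_sigma=True
# treats them as true standard deviations rather than relative weights,
# so the parameter errors derived from pcov reflect them directly.
popt, pcov = curve_fit(line, x, y, sigma=sigma_y, absolute_sigma=True)
m, b = popt
m_err, b_err = np.sqrt(np.diag(pcov))
print(f"slope = {m:.4f} +/- {m_err:.4f}")
print(f"intercept = {b:.4f} +/- {b_err:.4f}")
```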

Thank you.
 
You seem to have a problem that involves a combination of:

1. grouped data, whereby you observe the group statistics (means and variances) but not individual observations for the dependent variable (y). In this case OLS is typically unbiased but inefficient. Since OLS is unbiased, if all you care about is the sign & the size of the coefficient and not its standard error, you can go with your coefficient estimates from OLS.

OTOH, if you care about the coefficient's standard error and you happen to know the group sizes in each "bin" (i.e. for each of your "average" observations), you can use those as weights and compute a weighted least squares. If you don't have the group sizes, you can assume a uniform group size for each observation (say, g = 100) and simulate three sets of 100 individual observations with means equal to the y's in your data and ranges equal to the +/- factors around each y (a sketch of this simulation follows below point 2).

2. measurement error: each of your x's is measured with error, and that can be a source of bias. If you know the "cause" of the variation in the x variable, you can include it as an additional variable; if you don't know or don't have the "cause", you can at least include the variation around each x as a second variable. You have too few observations, and that's a problem, especially for including additional independent variables alongside x. But if you decide to run the simulation in step 1 above, which will generate many "fictional" observations, this will no longer be a problem.
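A minimal sketch of the simulation described in point 1, assuming a uniform group size g = 100 and normally distributed errors (both are assumptions, not given in the data):

```python
import numpy as np

rng = np.random.default_rng(0)
g = 100  # assumed uniform group size per "bin"

x_means = np.array([1.0, 2.0, 3.0])
y_means = np.array([0.14, 0.27, 0.42])
y_sds = np.array([0.01, 0.02, 0.02])

xs, ys = [], []
for xm, ym, sd in zip(x_means, y_means, y_sds):
    xs.append(np.full(g, xm))          # each bin keeps its average x
    ys.append(rng.normal(ym, sd, g))   # g simulated individual y's
x_sim = np.concatenate(xs)
y_sim = np.concatenate(ys)

# OLS on the simulated individual observations; cov=True also returns
# the parameter covariance matrix, whose diagonal holds the variances.
(m, b), cov = np.polyfit(x_sim, y_sim, 1, cov=True)
m_err, b_err = np.sqrt(np.diag(cov))
print(f"slope = {m:.4f} +/- {m_err:.4f}, intercept = {b:.4f} +/- {b_err:.4f}")
```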

EnumaElish
 
Thanks enuma.

I will disregard the x-axis uncertainty, which derives from the measurement setup.
Just for future reference, how can I convert deviations in the x-values into independent y-(or z-)values?

I am very interested in the standard error of the resulting function.
One suggestion I got was to simply draw two lines through the graph, one using all max values (all y-values + the error) and one using all min values (all y-values - the error), then take the average of those two and use a simple STDEV as the standard error. I somewhat disagree with that (and would be happy for comments).
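For comparison, a quick sketch of that max/min suggestion using the example values from the first post (reading "a simple STDEV" as half the spread between the two slopes is an assumption; whether this yields a valid standard error is exactly what is being questioned):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.14, 0.27, 0.42])
err = np.array([0.01, 0.02, 0.02])

m_max, b_max = np.polyfit(x, y + err, 1)  # line through all max y-values
m_min, b_min = np.polyfit(x, y - err, 1)  # line through all min y-values

m_avg = 0.5 * (m_max + m_min)
m_spread = 0.5 * abs(m_max - m_min)       # half the slope spread
print(f"slope ~ {m_avg:.4f} +/- {m_spread:.4f}")
```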


Anyways, the linear OLS function in Excel, LINEST, gives the slope m as m = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^2} and the y-axis intercept b as b = \bar{y} - m\bar{x}.
so far, so good.
So for a simple sample set of three points (x;y) = (1;1), (2;2.1), (3;2.9),
the slope is m = 0.95 and b = 0.1.
Using LINEST(y, x, , TRUE), I can create an array that gives me exactly those values, with a STDEV of 0.086603 for m and 0.187 for b.
1) How are those values calculated?
2) How come the standard deviation for b is 80% larger than the value of b itself?!?
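For question 1, the textbook OLS standard-error formulas reproduce exactly the LINEST numbers quoted above. With the residual variance s^2 = \frac{\sum(y-\hat{y})^2}{n-2}, they are

SE(m) = \sqrt{\frac{s^2}{\sum(x-\bar{x})^2}} \qquad SE(b) = \sqrt{s^2\left(\frac{1}{n}+\frac{\bar{x}^2}{\sum(x-\bar{x})^2}\right)}

For the three points above, \sum(y-\hat{y})^2 = 0.015 and n-2 = 1, so SE(m) = \sqrt{0.015/2} \approx 0.0866 and SE(b) = \sqrt{0.015\,(1/3+2^2/2)} \approx 0.187. As for question 2, the large SE(b) relative to b = 0.1 mostly reflects the tiny sample (n = 3) and the extrapolation from \bar{x} = 2 back to x = 0.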

Furthermore, if I now force the fit through zero using LINEST(y, x, 0, TRUE), my slope becomes 0.9928 and b = 0 (obviously), with STDEV(m) = 0.026245.
But if I use the above calculation and just add a fourth point (0;0), I get a slope of 0.98 and a b of 0.03, which is NOT what Excel does by forcing it through zero.

Any help with how both the slope forced through zero and all the STDEVs are actually calculated?
Thanks.
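Similarly, the textbook regression-through-the-origin formulas reproduce the quoted through-zero output. With no intercept only one parameter is estimated, so the residual variance uses n-1 degrees of freedom:

m = \frac{\sum xy}{\sum x^2} \qquad SE(m) = \sqrt{\frac{\sum(y-mx)^2}{(n-1)\sum x^2}}

For the three points, m = 13.9/14 \approx 0.9928 and SE(m) = \sqrt{0.019286/(2 \cdot 14)} \approx 0.0262, matching LINEST. Adding a fourth point (0;0) to the ordinary two-parameter fit is a different model: the intercept remains a free parameter (hence b = 0.03), so it cannot reproduce the constrained fit.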
 