Linear regression, error in both variables

  1. Apr 10, 2011 #1
    Hi y'all, I'm wondering if you could help me with this. I have a data set with a linear relationship between the independent and dependent variables. Both the dependent and independent variables have measurement errors, and these errors are not constant: each point has its own uncertainty.

    For example,

    {x1, x2, x3, x4, x5}
    {y1, y2, y3, y4, y5}

    {dx1, dx2, dx3, dx4, dx5}
    {dy1, dy2, dy3, dy4, dy5}

    where one data point would be (x1 ± dx1, y1 ± dy1), and so on.

    Assuming the relationship is of the form,

    y = ax + b, I need both the best value for a and its uncertainty (a ± da).

    I've been scouring the internet for more information on total least squares methods, the generalized method of moments, etc., but I can't find anything that works for my case, where the errors in x and y are arbitrary and differ from point to point.

    helpful hints?
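    A minimal sketch of one standard approach, written in Python with NumPy (nobody in this thread posted it): the iterative straight-line fit published by York and co-workers (2004), which allows a separate error in x and in y for every point. It assumes the dx and dy values are one-standard-deviation errors, that none of them is zero, and that the x and y errors of a point are uncorrelated. The function name york_fit and the example numbers are made up for illustration. The returned da comes from the stated error bars only; it is not rescaled by how well the line actually fits the data.

```python
import numpy as np


def york_fit(x, y, dx, dy, tol=1e-12, max_iter=100):
    """Fit y = a*x + b when every x and y value has its own standard error.

    Sketch of the iterative York et al. (2004) procedure, assuming the x and
    y errors are uncorrelated and nonzero. Returns slope a, intercept b and
    their standard errors da, db.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    wx = 1.0 / np.asarray(dx, dtype=float) ** 2   # weights from the x errors
    wy = 1.0 / np.asarray(dy, dtype=float) ** 2   # weights from the y errors

    def weighted_terms(a):
        # Combined weight of each point for slope a: 1 / (dy^2 + a^2 dx^2)
        W = wx * wy / (wx + a**2 * wy)
        xbar = np.sum(W * x) / np.sum(W)
        ybar = np.sum(W * y) / np.sum(W)
        U, V = x - xbar, y - ybar
        beta = W * (U / wy + a * V / wx)
        return W, xbar, ybar, U, V, beta

    a = np.polyfit(x, y, 1)[0]                    # start from the ordinary LS slope
    for _ in range(max_iter):
        W, xbar, ybar, U, V, beta = weighted_terms(a)
        a_new = np.sum(W * beta * V) / np.sum(W * beta * U)
        done = abs(a_new - a) <= tol * max(1.0, abs(a_new))
        a = a_new
        if done:
            break

    W, xbar, ybar, U, V, beta = weighted_terms(a)
    b = ybar - a * xbar
    # Standard errors from the "adjusted" x values xbar + beta
    xadj = xbar + beta
    xadj_mean = np.sum(W * xadj) / np.sum(W)
    u = xadj - xadj_mean
    da = np.sqrt(1.0 / np.sum(W * u**2))
    db = np.sqrt(1.0 / np.sum(W) + xadj_mean**2 * da**2)
    return a, b, da, db


if __name__ == "__main__":
    # Illustrative numbers only, laid out like {x1..x5}, {dx1..dx5} above
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    dx = [0.1, 0.2, 0.1, 0.3, 0.2]
    dy = [0.2, 0.2, 0.4, 0.3, 0.5]
    a, b, da, db = york_fit(x, y, dx, dy)
    print(f"a = {a:.3f} ± {da:.3f}, b = {b:.3f} ± {db:.3f}")
```

    As a sanity check, when every dx is tiny the weights reduce to 1/dy², and the fit becomes ordinary weighted least squares in y.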
  3. Apr 11, 2011 #2

    Stephen Tashi

    Science Advisor

    I think what you are trying to say is that the variance of the distribution of the errors is not constant with respect to X and Y.

    You must define what you mean by "best". I'll try to put some words in your mouth.
    We want the line y = ax + b that minimizes the expected error between data points and the line, when we average these errors over the whole line between X = (some minimum value of interest) and X = (some maximum value of interest), giving all those parts of the line equal weight in this averaging. The error between a data point (x_i, y_i) and the line will be measured by the perpendicular distance between (x_i, y_i) and the line.

    Let's try to define what you mean by "something that works". Do you mean a computer program that could (by trial and error if necessary) estimate the line? Or do you require some symbolic formula that you can use in a math paper?
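    One way to turn the perpendicular-distance definition of "best" into a trial-and-error program, sketched in Python with NumPy. It is a simplification: it sums the squared perpendicular distances over the data points, giving every point equal weight, rather than averaging over a range of X, and it ignores per-point error bars. The function names are hypothetical.

```python
import numpy as np


def perpendicular_sse(a, b, x, y):
    """Sum of squared perpendicular distances from the points (x, y) to y = a*x + b."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum((y - a * x - b) ** 2) / (1.0 + a**2)


def fit_perpendicular_by_search(x, y, n_angles=20001):
    """Trial-and-error fit: try many slopes and keep the one with the smallest
    perpendicular SSE. For a fixed slope the best intercept puts the line
    through the centroid (mean x, mean y)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    # Candidate slopes from angles strictly between -90 and +90 degrees
    theta = np.linspace(-np.pi / 2, np.pi / 2, n_angles)[1:-1]
    slopes = np.tan(theta)
    intercepts = ym - slopes * xm
    costs = [perpendicular_sse(a, b, x, y) for a, b in zip(slopes, intercepts)]
    i = int(np.argmin(costs))
    return slopes[i], intercepts[i]
```

    The answer is only as precise as the angle grid. The brute-force search is slow compared with a closed-form solution, but the same loop works for any other error measure you decide is "best".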

    I assume you are talking about the variances of the errors at various values of (x,y).
    What exactly do you know about this? For example, if we have a data point (10.0, 50.2), do you have a lot of data with similar values, so that we can estimate the variance in X and Y around the value (10.0, 50.2)? Or do you only have data with widely separated X and Y values, and are you basing your assertion that the error variances change with X and Y on the overall scattered appearance of the data?
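    If there are repeated measurements near the same point, the spread within each group gives a direct estimate of the errors there. A minimal Python sketch, assuming that points whose x values round to the same number are repeats of one measurement. That grouping rule and the function name replicate_spread are hypothetical choices, not something proposed in this thread.

```python
import numpy as np
from collections import defaultdict


def replicate_spread(x, y, decimals=0):
    """Estimate measurement scatter from repeated measurements.

    Points whose x value rounds to the same number (to `decimals` places)
    are treated as repeats of one measurement. Returns a dict mapping that
    rounded x to the sample standard deviations (sx, sy) of the group;
    groups with a single point are skipped.
    """
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[round(float(xi), decimals)].append((float(xi), float(yi)))
    spread = {}
    for key, pts in groups.items():
        if len(pts) < 2:
            continue  # need at least two repeats for a sample standard deviation
        arr = np.array(pts)
        spread[key] = (arr[:, 0].std(ddof=1), arr[:, 1].std(ddof=1))
    return spread
```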
  4. Apr 11, 2011 #3


    Homework Helper

    Last edited: Apr 11, 2011
  5. Jun 2, 2011 #4
    I was hoping to find help on the same topic! Any ideas?

    yes and yes.

    thanks in advance
  6. Jun 3, 2011 #5

    Stephen Tashi

    Science Advisor

    Look at the Wikipedia article on Total Least Squares http://en.wikipedia.org/wiki/Total_least_squares. I've only scanned the article myself, but it looks like what you want. It has an example written in Octave, which is a free Matlab work-alike.
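    The SVD idea behind total least squares, sketched in Python with NumPy for the two-variable case with an intercept (this is not a translation of the Wikipedia Octave code). Plain total least squares gives every point equal weight and assumes the x and y errors have the same variance, so on its own it does not use per-point dx and dy. The function name tls_line is hypothetical.

```python
import numpy as np


def tls_line(x, y):
    """Total least squares (orthogonal) fit of y = a*x + b via the SVD.

    Every point gets equal weight, i.e. the x and y errors are assumed to
    have the same variance everywhere. Returns slope a and intercept b.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    # Centre the data so the fitted line passes through the centroid
    M = np.column_stack([x - xm, y - ym])
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    # Last right singular vector (smallest singular value) is normal to the line
    nx, ny = vt[-1]
    if ny == 0:
        raise ValueError("best-fit line is vertical; slope is undefined")
    a = -nx / ny
    b = ym - a * xm
    return a, b
```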