Linear regression, error in both variables

  1. Hi y'all, wondering if you could help me with this. I have a data set with a linear relationship between the independent and dependent variables. Both the dependent and independent variables have error due to measurement, and this error is not constant from point to point.

    For example,

    {x1, x2, x3, x4, x5}
    {y1, y2, y3, y4, y5}

    {dx1, dx2, dx3, dx4, dx5}
    {dy1, dy2, dy3, dy4, dy5}

    where one data point would be (x1±dx1, y1±dy1), and so on.

    Assuming the relationship is of the form,

    y = ax + b, I need both the best value for a, and its uncertainty, (a ± da).

    I've been scouring the internet for more information on total least squares methods, the generalized method of moments, etc., but I can't find anything that works for the case where the errors in x and y are arbitrary per-point values, as in my case.

    helpful hints?
     
  3. Stephen Tashi


    I think what you are trying to say is that the variance of the distribution of the errors is not constant with respect to X and Y.

    You must define what you mean by "best". I'll try to put some words in your mouth.
    We want the line y = ax + b that minimizes the expected error between the data points and the line, when we average these errors over the whole line between X = (some minimum value of interest) and X = (some maximum value of interest), giving all those parts of the line equal weight in this averaging. The error between a data point (x_i, y_i) and the line will be measured by the perpendicular distance between (x_i, y_i) and the line.
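
    For reference, the perpendicular distance mentioned above can be written out explicitly. The first expression is the standard point-to-line distance. The second is an assumption that is not stated in the thread: a commonly used "effective variance" objective that brings in the per-point errors dx_i and dy_i, assuming the errors in x and y are independent.

```latex
d_i = \frac{|y_i - a x_i - b|}{\sqrt{1 + a^2}},
\qquad
\chi^2(a, b) = \sum_i \frac{(y_i - a x_i - b)^2}{dy_i^2 + a^2\, dx_i^2}
```

    When every dx_i and dy_i equals the same value sigma, the second expression reduces to the sum of d_i^2 / sigma^2, so the two views agree in that special case.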

    Let's try to define what you mean by "something that works". Do you mean a computer program that could (by trial and error if necessary) estimate the line? Or do you require some symbolic formula that you can use in a math paper?

    I assume you are talking about the variances of the errors at various values of (x,y).
    What exactly do you know about this? For example, if we have a data point (10.0, 50.2), do you have a lot of data with similar values, so that we can estimate the variance in X and Y around the value (10.0, 50.2)? Or do you only have data with widely separated X and Y values, and are you basing your assertion that the variances of the errors change with X and Y on the overall scattered appearance of the data?
     
  4. hotvette


    Last edited: Apr 11, 2011
  5. I was hoping to find help on the same topic! Any ideas?

    yes and yes.

    thanks in advance
     
  6. Stephen Tashi


    Look at the Wikipedia article on Total Least Squares (http://en.wikipedia.org/wiki/Total_least_squares). I've only skimmed the article myself, but it looks like what you want. It has an example written in Octave, which is a free Matlab work-alike.
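
    As a rough illustration of the kind of "computer program" discussed above, here is a minimal Python sketch. It is not the Octave example from the Wikipedia article and not a full total least squares solver. It swaps in a simpler technique: iteratively reweighted least squares with effective-variance weights 1 / (dy_i^2 + a^2 dx_i^2). The function name fit_line_xy_errors, the tolerance, and the sample numbers are all made up for illustration. It assumes independent errors in x and y and every dy_i > 0, and the reported da and db are the usual weighted least squares uncertainties with the final weights held fixed, so treat them as approximate.

```python
import math


def fit_line_xy_errors(x, y, dx, dy, tol=1e-12, max_iter=200):
    """Fit y = a*x + b to points with per-point errors dx, dy.

    Hypothetical helper (not from the thread). Returns (a, b, da, db).
    Assumes independent errors in x and y, and dy[i] > 0 for all i.
    """
    a = 0.0  # with a = 0 the first pass is plain weighted least squares in y
    for _ in range(max_iter):
        # Effective variance of each residual y_i - a*x_i - b.
        w = [1.0 / (sy ** 2 + (a * sx) ** 2) for sx, sy in zip(dx, dy)]
        S = sum(w)
        Sx = sum(wi * xi for wi, xi in zip(w, x))
        Sy = sum(wi * yi for wi, yi in zip(w, y))
        Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        delta = S * Sxx - Sx * Sx
        # Closed-form weighted least squares solution for these weights.
        a_new = (S * Sxy - Sx * Sy) / delta
        b = (Sxx * Sy - Sx * Sxy) / delta
        converged = abs(a_new - a) <= tol * max(1.0, abs(a_new))
        a = a_new
        if converged:
            break
    # Weighted least squares uncertainties, treating the weights as fixed.
    da = math.sqrt(S / delta)
    db = math.sqrt(Sxx / delta)
    return a, b, da, db


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
dx = [0.1, 0.2, 0.1, 0.3, 0.2]
dy = [0.2, 0.3, 0.2, 0.4, 0.3]
a, b, da, db = fit_line_xy_errors(x, y, dx, dy)
print(f"a = {a:.3f} +/- {da:.3f}, b = {b:.3f} +/- {db:.3f}")
```

    If all dx_i are zero, this reduces to ordinary weighted least squares in y, which is a quick sanity check. For serious work, a dedicated orthogonal distance regression routine is probably a better choice than this sketch.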
     