Linear regression, error in both variables

In summary, the conversation is about finding the best-fit slope of a linear relationship between two variables when both variables carry measurement error. The original poster is looking for a method that can handle point-by-point errors in both x and y, and asks for either a computer program or a formula that gives the slope and its uncertainty. Weighted and total least squares are suggested as potential solutions.
  • #1
fhqwgads2005
Hi y'all, wondering if you could help me with this. I have a data set with a linear relationship between the independent and dependent variables. Both the dependent and independent variables have error due to measurement, and this error is not constant.

For example,

{x1, x2, x3, x4, x5}
{y1, y2, y3, y4, y5}

{dx1, dx2, dx3, dx4, dx5}
{dy1, dy2, dy3, dy4, dy5}

where one data point would be (x1±dx1, y1±dy1), and so on.

Assuming the relationship is of the form,

y = ax + b, I need both the best value for a, and its uncertainty, (a ± da).

I've been scouring the internet for more information on total least squares methods, and generalized method of moments, etc. but I can't find something that works for the case where the error in x and y is just some arbitrary value, like in my case.

helpful hints?
 
  • #2
fhqwgads2005 said:
this error is not constant.

I think what you are trying to say is that the variance of the distribution of the errors is not constant with respect to X and Y.

y = ax + b, I need both the best value for a, and its uncertainty, (a ± da).

You must define what you mean by "best". I'll try to put some words in your mouth.
We want the line y = ax + b that minimizes the expected error between data points and the line, when we average these errors over the whole line between X = (some minimum value of interest) and X = (some maximum value of interest), giving all those parts of the line equal weight in this averaging. The error between a data point (x_i, y_i) and the line will be measured by the perpendicular distance between (x_i, y_i) and the line.

but I can't find something that works

Let's try to define what you mean by "something that works". Do you mean a computer program that could (by trial and error if necessary) estimate the line? Or do you require some symbolic formula that you can use in a math paper?

for the case where the error in x and y is just some arbitrary value, like in my case.

I assume you are talking about the variances of the errors at various values of (x,y).
What exactly do you know about this? For example, if we have a data point (10.0, 50.2), do you have a lot of data with similar values, so that we can estimate the variance in X and Y around the value (10.0, 50.2)? Or do you only have data with widely separated X and Y values, and are you basing your assertion that the variances of the errors change with X and Y on the overall scattered appearance of the data?
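If that perpendicular-distance criterion is acceptable, the unweighted version has a closed-form answer: the best line passes through the centroid of the data and points along the first principal direction of the centered points. A minimal sketch in Python (the data arrays are placeholders of my own, and the per-point dx, dy are ignored for the moment):

```python
import numpy as np

# Placeholder data; substitute your measured values.
x = np.array([1.0, 2.1, 2.9, 4.2, 5.1])
y = np.array([2.0, 4.2, 5.9, 8.1, 10.2])

# Centre the data: the perpendicular-distance-minimizing line
# always passes through the centroid (x_bar, y_bar).
xm, ym = x.mean(), y.mean()
M = np.column_stack((x - xm, y - ym))

# The line's direction is the right singular vector belonging to the
# largest singular value (equivalently, the first principal component).
_, _, Vt = np.linalg.svd(M)
vx, vy = Vt[0]

a = vy / vx          # slope (undefined if the best line is vertical)
b = ym - a * xm      # intercept
print(a, b)
```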
 
  • #3
  • #4
I was hoping to find help on the same topic! Any ideas?

Stephen Tashi said:
We want the line y = ax + b that minimizes the expected error between data points and the line, when we average these errors over the whole line between X = (some minimum value of interest) and X = (some maximum value of interest), giving all those parts of the line equal weight in this averaging. The error between a data point (x_i, y_i) and the line will be measured by the perpendicular distance between (x_i, y_i) and the line.



Do you mean a computer program that could (by trial and error if necessary) estimate the line?

yes and yes.

thanks in advance
 
  • #5
Look at the Wikipedia article on Total Least Squares http://en.wikipedia.org/wiki/Total_least_squares. I've only scanned the article myself, but it looks like what you want. It has an example written in Octave, which is a free Matlab work-alike.
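If you end up in Python instead of Octave, scipy.odr wraps ODRPACK and does weighted orthogonal distance regression with per-point standard deviations in both variables, which sounds like exactly your setup. A rough sketch (the data values are placeholders; I'm assuming the dx, dy arrays are one standard deviation each, as in post #1):

```python
import numpy as np
from scipy import odr

# Placeholder data and per-point measurement uncertainties.
x  = np.array([1.0, 2.1, 2.9, 4.2, 5.1])
y  = np.array([2.0, 4.2, 5.9, 8.1, 10.2])
dx = np.array([0.1, 0.1, 0.2, 0.1, 0.3])
dy = np.array([0.2, 0.3, 0.2, 0.4, 0.3])

def linear(beta, x):
    """Straight line y = a*x + b with beta = [a, b]."""
    return beta[0] * x + beta[1]

model = odr.Model(linear)
# sx, sy are treated as standard deviations and turned into weights.
data = odr.RealData(x, y, sx=dx, sy=dy)

# beta0 is the initial guess; an ordinary least-squares fit is a fine start.
fit = odr.ODR(data, model, beta0=np.polyfit(x, y, 1)).run()

a, b = fit.beta          # best-fit slope and intercept
da, db = fit.sd_beta     # their estimated standard errors
print(f"a = {a:.4f} +/- {da:.4f}, b = {b:.4f} +/- {db:.4f}")
```

As far as I can tell, sd_beta is scaled by the fit's residual variance (fit.res_var), so if you want uncertainties taken strictly from the supplied dx and dy you may want to inspect fit.cov_beta yourself.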
 

1. What is linear regression?

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables and uses that to make predictions about the dependent variable.
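For illustration (example numbers only), a minimal ordinary least-squares fit in Python, which assumes all the error is in the dependent variable:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Ordinary least squares: minimizes vertical distances only.
a, b = np.polyfit(x, y, 1)
print(f"y = {a:.3f}*x + {b:.3f}")
```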

2. What is the error in both variables in linear regression?

The error in both variables, also known as the errors-in-variables problem, refers to the presence of measurement error in both the dependent and independent variables in a regression model. This can result in biased estimates and affect the accuracy of the predictions.
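A small simulation (illustrative numbers only, not from the thread) shows the typical effect: classical measurement error in the independent variable biases the ordinary least-squares slope toward zero (attenuation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_a, true_b = 10_000, 2.0, 1.0

x_true = rng.normal(0.0, 1.0, n)
y_true = true_a * x_true + true_b

# Add measurement noise to both variables.
x_obs = x_true + rng.normal(0.0, 0.5, n)   # error in the independent variable
y_obs = y_true + rng.normal(0.0, 0.5, n)   # error in the dependent variable

a_ols, _ = np.polyfit(x_obs, y_obs, 1)
# Expected attenuation factor: var(x_true) / (var(x_true) + var(x error))
print(a_ols, true_a * 1.0 / (1.0 + 0.25))
```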

3. How is the error in both variables addressed in linear regression?

The error in both variables can be addressed with methods such as instrumental variables, errors-in-variables regression (for example Deming regression or total least squares), or more accurate measurement techniques. These methods aim to reduce the bias caused by measurement error in the variables.
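As one concrete example (a sketch, not from the thread, with the error-variance ratio assumed known), Deming regression gives the errors-in-variables slope in closed form:

```python
import numpy as np

def deming_slope(x, y, delta=1.0):
    """Deming regression slope, with delta = var(y errors) / var(x errors)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x)                                   # sample variance of x
    syy = np.var(y)                                   # sample variance of y
    sxy = np.mean((x - x.mean()) * (y - y.mean()))    # sample covariance
    return (syy - delta * sxx
            + np.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)

x = [1.0, 2.1, 2.9, 4.2, 5.1]
y = [2.0, 4.2, 5.9, 8.1, 10.2]
a = deming_slope(x, y)          # delta = 1 reduces to orthogonal regression
b = np.mean(y) - a * np.mean(x)
print(a, b)
```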

4. What are some limitations of linear regression with error in both variables?

Some limitations of linear regression with error in both variables include the assumption of a linear relationship between the variables, the difficulty in accurately measuring the error in the variables, and the potential for biased estimates if the error is not properly addressed.

5. When is linear regression with error in both variables most useful?

Linear regression with error in both variables is most useful when there is a need to study the relationship between two or more variables and the presence of measurement error cannot be avoided. It is commonly used in fields such as economics, social sciences, and engineering.
