Linear regression with asymmetric error bars

In summary, the conversation revolves around performing a linear regression on data with asymmetric x and y error bars, where the error bars differ in magnitude for each data point. The original poster wants to determine the correlation between the x and y variables and has been unable to find a satisfactory solution. One respondent suggests modeling the errors with a piecewise Gaussian variable and estimating the fit by maximum likelihood; another mentions a simpler approach of converting to log space and performing a non-linear regression there. That method only works if the error is symmetric in log space, which is often the case for errors that are treated as symmetrical in regular space.
  • #1
tmj143
I've been trying to figure out how to do a linear regression on data with asymmetric x and y error bars (different for each data point). Any help would be much appreciated.
 
  • #2
That is the mean for x and y.
 
  • #3
xiaoB said:
That is the mean for x and y.

? I don't think you understand. There's some probability distribution which says that each data point can lie somewhere in between some x1 and x2 and some y1 and y2; these uncertainties are of different magnitudes for each data point, and the fact that they are different means that the data points need to be weighted differently in the regression calculation. But, because the error bars are asymmetric, I can't just do a straight weighted fit...
 
  • #4
It's probably best if you give more details and also explain why you are determined to do a linear regression instead of another type of estimation.
 
  • #5
What other details do you want? I want to see if there's a linear correlation between the x and y variables, and have data points that look like:
      l
      l
      l
------x-------------------
      l

Or something like that where x is the data point and the l's/-'s represent the error (but of different magnitudes for each point, like I said). I don't know how I can better describe this...
 
  • #6
tmj143 said:
I don't know how I can better describe this...

If you find a way, please post it. I'm too busy to conduct a detailed interrogation. If you really know what you're doing, your question will have an answer. If you don't know what you're doing (for example, if you just think regression and correlation are the "right" thing to do, but you don't understand what you're trying to optimize by using them) then you are beyond help.
 
  • #7
Stephen Tashi said:
If you find a way, please post it. I'm too busy to conduct a detailed interrogation. If you really know what you're doing, your question will have an answer. If you don't know what you're doing (for example, if you just think regression and correlation are the "right" thing to do, but you don't understand what you're trying to optimize by using them) then you are beyond help.

Look, this is purely a statistical question; if you want me to go into science details, I could, but they're absolutely irrelevant. I have data. I need to figure out if there is a correlation between the x and y variables, and the slope of the line in the case of a linear regression. I'm sure that I could do some sort of complicated simulation to randomly sample imaginary data points from within my error bars and calculate fits for all of them to see if I get anything significant, but that is far more complicated than I want to deal with.

I know that if the error bars were symmetric, I could do a weighted least squares fit. But they're not. So all I'm asking is whether anyone knows how to deal with the asymmetric error bars in such instances... I'm sure people do such fits all the time, but my attempts to google any sort of explanation haven't been successful.
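For what it's worth, the brute-force simulation described above is only a few lines in practice. Here is a minimal Python sketch, under the assumption that each error bar gives the one-sided widths of a split-normal distribution around the measured point (the data arrays below are placeholders):

[code]
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: x, y with (lower, upper) error bar widths per point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
x_err = np.array([[0.1, 0.2, 0.1, 0.3, 0.2],   # lower widths
                  [0.3, 0.1, 0.2, 0.1, 0.4]])  # upper widths
y_err = np.array([[0.2, 0.3, 0.2, 0.4, 0.3],
                  [0.4, 0.2, 0.5, 0.2, 0.6]])

def sample_split_normal(center, lo, hi, rng):
    # Pick a side with probability proportional to its width, then draw
    # from that side's half-Gaussian.
    draw = np.abs(rng.standard_normal(center.shape))
    upper = rng.random(center.shape) < hi / (lo + hi)
    return np.where(upper, center + draw * hi, center - draw * lo)

slopes = []
for _ in range(5000):
    xs = sample_split_normal(x, x_err[0], x_err[1], rng)
    ys = sample_split_normal(y, y_err[0], y_err[1], rng)
    slopes.append(np.polyfit(xs, ys, 1)[0])  # OLS slope on each replica

lo, mid, hi = np.percentile(slopes, [16, 50, 84])
print(f"slope = {mid:.3f} (+{hi - mid:.3f} / -{mid - lo:.3f})")
[/code]

The split-normal interpretation of the bars is an assumption; whether it is the right sampling distribution depends on where the error bars come from (see post #12 below).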
 
  • #8
If it's correlation you're after, you don't have to deal with regression lines. The correlation is defined in terms of the covariance and the standard deviations of the two variables.
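For reference, the sample version of that definition is

[tex]
r = \frac{\sum_{i}{(x_{i} - \bar{x})(y_{i} - \bar{y})}}{\sqrt{\sum_{i}{(x_{i} - \bar{x})^{2}}} \, \sqrt{\sum_{i}{(y_{i} - \bar{y})^{2}}}}
[/tex]

though this plain estimate ignores the per-point uncertainties, which is exactly what the question is about.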
 
  • #10
Perhaps you can use a model of a piecewise Gaussian variable. Suppose the variable has a mean [itex]a[/itex] and different standard deviations for [itex]x > a[/itex] and [itex]x < a[/itex], i.e. its distribution is:

[tex]
\varphi(x) = \left\{\begin{array}{ll}
A_{1} \exp\left(-\frac{(x - a)^{2}}{2 \sigma^{2}_{1}}\right), & x > a \\
A_{2} \exp\left(-\frac{(x - a)^{2}}{2 \sigma^{2}_{2}}\right), & x < a
\end{array}\right.
[/tex]

You have to adjust [itex]A_{1}[/itex] and [itex]A_{2}[/itex] so that:

[tex]
E(X) - a = \int_{-\infty}^{\infty}{(x - a) \varphi(x) \, dx} = 0 \Rightarrow A_{1} \int_{0}^{\infty}{t e^{-\frac{t^{2}}{2 \sigma^{2}_{1}}} \, dt} = A_{2} \int_{0}^{\infty}{t e^{-\frac{t^{2}}{2 \sigma^{2}_{2}}} \, dt} \Rightarrow A_{1} \, \sigma^{2}_{1} = A_{2} \, \sigma^{2}_{2}
[/tex]

Of course, the probability density must be normalized:

[tex]
\int_{-\infty}^{\infty}{\varphi(x) \, dx} = 1 \Rightarrow A_{1} \int_{0}^{\infty}{e^{-\frac{t^{2}}{2 \sigma^{2}_{1}}} \, dt} + A_{2} \int_{0}^{\infty}{e^{-\frac{t^{2}}{2 \sigma^{2}_{2}}} \, dt} = 1 \Rightarrow \sqrt{\frac{\pi}{2}} \left(A_{1} \, \sigma_{1} + A_{2} \, \sigma_{2} \right) = 1
[/tex]

These two equations allow you to express [itex]A_{1/2}[/itex] in terms of [itex]\sigma_{1/2}[/itex]. Try to find the variance of the variable.

Next, consider the variable:

[tex]
\varepsilon_{i} = a \, X_{i} + b \, Y_{i} + c, \; a^{2} + b^{2} = 1, \; i = 1, \ldots, N
[/tex]

If [itex]X_{i}[/itex] and [itex]Y_{i}[/itex] have the above distribution, what is the expectation value and variance for [itex]\varepsilon_{i}[/itex]?

Approximate these variables as Normally distributed with the above expectation values and variances, and use the maximum likelihood method, which reduces to a least-squares estimate of the parameters of the general linear dependence:

[tex]
a \, x + b \, y + c = 0, \; a^{2} + b^{2} = 1
[/tex]
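
For what it's worth, solving those two equations gives [itex]A_{1} = C \sigma^{2}_{2}[/itex] and [itex]A_{2} = C \sigma^{2}_{1}[/itex] with [itex]C = \left( \sqrt{\pi/2} \, \sigma_{1} \sigma_{2} (\sigma_{1} + \sigma_{2}) \right)^{-1}[/itex], and the suggested exercise works out to [itex]\mathrm{Var}(X) = \sigma_{1} \sigma_{2}[/itex]. A quick numerical check in Python (scipy assumed available; this only verifies the construction, not the full maximum-likelihood fit):

[code]
import numpy as np
from scipy import integrate

def split_gauss_coeffs(s1, s2):
    # A1, A2 chosen so that A1*s1**2 == A2*s2**2 (zero mean about a)
    # and sqrt(pi/2)*(A1*s1 + A2*s2) == 1 (unit mass).
    c = 1.0 / (np.sqrt(np.pi / 2) * s1 * s2 * (s1 + s2))
    return c * s2**2, c * s1**2

def split_gauss_pdf(x, a, s1, s2):
    # The piecewise Gaussian above: scale s1 above a, s2 below a.
    A1, A2 = split_gauss_coeffs(s1, s2)
    return np.where(x > a,
                    A1 * np.exp(-(x - a)**2 / (2 * s1**2)),
                    A2 * np.exp(-(x - a)**2 / (2 * s2**2)))

a, s1, s2 = 0.0, 2.0, 1.0
mass, _ = integrate.quad(split_gauss_pdf, -np.inf, np.inf, args=(a, s1, s2))
mean, _ = integrate.quad(lambda x: (x - a) * split_gauss_pdf(x, a, s1, s2),
                         -np.inf, np.inf)
var, _ = integrate.quad(lambda x: (x - a)**2 * split_gauss_pdf(x, a, s1, s2),
                        -np.inf, np.inf)
print(mass, mean, var, s1 * s2)  # expect ~1, ~0, and var close to s1*s2
[/code]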
 
  • #11
I'm dealing with a similar problem. My case is a bit simpler because I only have error in y and because my error becomes symmetric when converted into log space. So to do the linear regression, I just convert into log space, do a non-linear regression with log(mx+b) as my model curve, and convert back out of log space. There are a couple of ways to do the nonlinear regression; I used the NonlinearModelFit command in Mathematica, which allows you to assign weights to your points.

This method only works when your error is symmetric in log space, but this is the main kind of asymmetric error I usually run across. In fact, a lot of the error that we call symmetrical is really symmetrical only in log space, which makes it close to symmetrical for small errors but quite asymmetrical for larger ones. Very often when we say +/-25%, we really mean */÷1.25, which actually works out to +25%/-20%, of course.
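
If Mathematica isn't at hand, the same log-space fit can be sketched in Python with scipy's curve_fit; the data below and the */÷1.25 error figure are placeholders:

[code]
import numpy as np
from scipy.optimize import curve_fit

# Placeholder data with multiplicative (log-symmetric) y errors.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 5.8, 8.3, 9.9])
log_sigma = np.full_like(y, np.log(1.25))  # a */÷ 1.25 error on each point

def log_line(x, m, b):
    # The model curve log(m*x + b): fit in log space, where the error is
    # symmetric; m and b then apply directly in linear space.
    return np.log(m * x + b)

popt, pcov = curve_fit(log_line, x, np.log(y), p0=(1.0, 1.0),
                       sigma=log_sigma, absolute_sigma=True)
m, b = popt
m_err, b_err = np.sqrt(np.diag(pcov))
print(f"m = {m:.3f} +/- {m_err:.3f}, b = {b:.3f} +/- {b_err:.3f}")
[/code]

The sigma argument plays the role of the per-point weights mentioned above.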
 
  • #12
"Look, this is purely a statistical question..."

Ha. That's amusing.

The problem is that there's no right way to do this without knowing where those error bars come from. Error bars by themselves have no definite meaning. But when people use symmetric error bars, we know by convention that they probably represent something like root-mean-squared error, or some multiple of it. There's no such universal meaning of asymmetric error bars, so without more information about the error distribution they're intended to summarize, it's hard to say how to handle them correctly.
 

What is linear regression with asymmetric error bars?

Linear regression with asymmetric error bars is a statistical method used to analyze the relationship between two variables. It involves fitting a line of best fit through a set of data points whose uncertainties differ in the upward and downward (or leftward and rightward) directions, and propagating those asymmetric uncertainties into the slope and intercept of the line.

Why are asymmetric error bars used in linear regression?

Asymmetric error bars are used in linear regression because they allow for a more accurate representation of the uncertainty in the data. Unlike symmetric error bars, which assume equal uncertainty in both directions, asymmetric error bars account for the asymmetry in the data and give a more faithful description of the measurement uncertainty.

How are asymmetric error bars calculated in linear regression?

Asymmetric error bars in linear regression are typically calculated using the bootstrap method or from the standard errors of an ordinary least-squares fit. The bootstrap method involves repeatedly resampling the data set and recalculating the regression coefficients. For an ordinary least-squares fit, the standard error of the fitted line at a point x is SE = sqrt(MSE * (1/n + (x-x̅)^2/∑(xᵢ-x̅)^2)), where MSE is the mean squared error of the residuals; the slope's standard error is sqrt(MSE/∑(xᵢ-x̅)^2).
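
A minimal sketch of the bootstrap approach (pairs bootstrap, in Python; the data arrays are placeholders):

[code]
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # placeholder data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pairs bootstrap: resample (x, y) pairs with replacement and refit.
n = len(x)
boot_slopes = np.empty(2000)
for i in range(2000):
    idx = rng.integers(0, n, size=n)
    boot_slopes[i] = np.polyfit(x[idx], y[idx], 1)[0]

# Asymmetric error bars on the slope from a percentile interval.
m_hat = np.polyfit(x, y, 1)[0]
lo, hi = np.percentile(boot_slopes, [16, 84])
print(f"slope = {m_hat:.3f} (+{hi - m_hat:.3f} / -{m_hat - lo:.3f})")
[/code]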

What are the advantages of using linear regression with asymmetric error bars?

Linear regression with asymmetric error bars allows for a more accurate and precise analysis of the relationship between variables. It also takes into account the asymmetry in the data, which can lead to more realistic and reliable results. Additionally, it provides a visual representation of the uncertainty in the data, making it easier to interpret and communicate the results.

What are the limitations of linear regression with asymmetric error bars?

One limitation of using linear regression with asymmetric error bars is that it assumes a linear relationship between the variables, which may not always be the case. It also requires a large enough data set to accurately estimate the uncertainty in the data. Additionally, the interpretation of the results may be affected by outliers or influential data points.
