# Linear regression with asymmetric error bars

I've been trying to figure out how to do a linear regression on data with asymmetric x and y error bars (different for each data point). Any help would be much appreciated.

That is the mean for x and y.

? I don't think you understand. There's some probability distribution which says that each data point can lie somewhere in between some x1 and x2 and some y1 and y2; these uncertainties are of different magnitudes for each data point, and the fact that they are different means that the data points need to be weighted differently in the regression calculation. But, because the error bars are asymmetric, I can't just do a straight weighted fit...

Stephen Tashi
It's probably best if you give more details and also explain why you are determined to do a linear regression instead of another type of estimation.

What other details do you want? I want to see if there's a linear correlation between the x and y variables, and have data points that look like:

          l
          l
          l
    ------x-------------------
          l

Or something like that where x is the data point and the l's/-'s represent the error (but of different magnitudes for each point, like I said). I don't know how I can better describe this...

Stephen Tashi
If you find a way, please post it. I'm too busy to conduct a detailed interrogation. If you really know what you're doing, your question will have an answer. If you don't know what you're doing (for example, if you just think regression and correlation are the "right" thing to do, but you don't understand what you're trying to optimize by using them) then you are beyond help.

Look, this is purely a statistical question; if you want me to go into science details, I could, but they're absolutely irrelevant. I have data. I need to figure out whether there is a correlation between the x and y variables and, in the case of a linear regression, the slope of the fitted line. I'm sure that I could do some sort of complicated simulation that randomly samples imaginary data points from within my error bars and calculates a fit for each sample to see if I get anything significant, but that is far more complicated than something I want to deal with.

I know that if the error bars were symmetric, I could do a weighted least squares fit. But they're not. So all I'm asking is whether anyone knows how to deal with asymmetric error bars in such cases... I'm sure people do such fits all the time, but my attempts to google any sort of explanation haven't been successful.
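For reference, the resampling idea mentioned above is shorter than it sounds. The following is only a sketch under stated assumptions: each error bar is treated as a 1-sigma half-width of a two-piece normal whose peak sits at the quoted value (the thread does not say what the bars actually represent), and the function names and arguments are made up for illustration. Note also that ordinary least squares on points perturbed in x tends to bias the slope toward zero, so the spread of slopes is more informative than their center.

```python
import numpy as np

def sample_split_normal(rng, center, s_lo, s_hi, size):
    """Draw `size` samples per point from a two-piece normal.

    Assumption: the density is continuous at `center`, so the upper side
    is chosen with probability s_hi / (s_lo + s_hi).
    """
    center, s_lo, s_hi = np.broadcast_arrays(center, s_lo, s_hi)
    shape = (size,) + center.shape
    above = rng.random(shape) < s_hi / (s_lo + s_hi)
    half = np.abs(rng.standard_normal(shape))  # half-normal magnitudes
    return center + np.where(above, half * s_hi, -half * s_lo)

def resampled_slopes(x, y, x_lo, x_hi, y_lo, y_hi, n_draws=2000, seed=0):
    """Fit a straight line to each resampled data set and return the slopes."""
    rng = np.random.default_rng(seed)
    xs = sample_split_normal(rng, x, x_lo, x_hi, n_draws)
    ys = sample_split_normal(rng, y, y_lo, y_hi, n_draws)
    return np.array([np.polyfit(xi, yi, 1)[0] for xi, yi in zip(xs, ys)])

# Made-up example data, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.3, 9.8])
slopes = resampled_slopes(x, y,
                          x_lo=np.full(5, 0.1), x_hi=np.full(5, 0.3),
                          y_lo=np.full(5, 0.2), y_hi=np.full(5, 0.5))
lo, hi = np.percentile(slopes, [16, 84])
print(f"slope interval (16th to 84th percentile): {lo:.2f} to {hi:.2f}")
```

If the interval of slopes comfortably excludes zero, that is at least suggestive of a trend under these assumptions.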

Stephen Tashi
If it's correlation you're after, you don't have to deal with regression lines. The correlation is defined in terms of the covariance and the standard deviations of the two variables.
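To make that definition concrete, here is a minimal sketch of the sample correlation coefficient computed directly from the covariance and the two standard deviations. It ignores the error bars entirely, and the function name and data are invented for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation: cov(x, y) / (std(x) * std(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    # x.std() and y.std() use the same 1/N normalisation as cov_xy.
    return cov_xy / (x.std() * y.std())

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
r = pearson_r(x, y)
print(r, np.corrcoef(x, y)[0, 1])  # the two values should agree
```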

Perhaps you can use a model of a piecewise Gaussian variable. Suppose the variable has mean $a$ and a different standard deviation on each side of it, $\sigma_{1}$ for $x > a$ and $\sigma_{2}$ for $x < a$, i.e. its distribution is:

$$\varphi(x) = \left\{\begin{array}{ll} A_{1} \exp\left(-\frac{(x - a)^{2}}{2 \sigma^{2}_{1}}\right)&, x > a \\ A_{2} \exp\left(-\frac{(x - a)^{2}}{2 \sigma^{2}_{2}}\right)&, x < a \end{array}\right.$$

You have to adjust $A_{1}$ and $A_{2}$ so that:

$$E(X) - a = \int_{-\infty}^{\infty}{(x - a) \varphi(x) \, dx} = 0 \Rightarrow A_{1} \int_{0}^{\infty}{t e^{-\frac{t^{2}}{2 \sigma^{2}_{1}}} \, dt} = A_{2} \int_{0}^{\infty}{t e^{-\frac{t^{2}}{2 \sigma^{2}_{2}}} \, dt} \Rightarrow A_{1} \, \sigma^{2}_{1} = A_{2} \, \sigma^{2}_{2}$$

Of course, the probability density must be normalized:

$$\int_{-\infty}^{\infty}{\varphi(x) \, dx} = 1 \Rightarrow A_{1} \, \int^{\infty}_{0}{e^{-\frac{t^{2}}{2\sigma^{2}_{1}}} \, dt} + A_{2} \, \int^{\infty}_{0}{e^{-\frac{t^{2}}{2\sigma^{2}_{2}}} \, dt} = 1 \Rightarrow \sqrt{\frac{\pi}{2}} \left(A_{1} \, \sigma_{1} + A_{2} \, \sigma_{2} \right) = 1$$

These two equations allow you to express $A_{1}$ and $A_{2}$ in terms of $\sigma_{1}$ and $\sigma_{2}$. Try to find the variance of the variable.
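As a worked sketch of that step (derived here from the two conditions above, not stated in the original post), substituting $A_{2} = A_{1} \, \sigma^{2}_{1} / \sigma^{2}_{2}$ into the normalization condition gives:

$$A_{1} = \sqrt{\frac{2}{\pi}} \, \frac{\sigma_{2}}{\sigma_{1} \left(\sigma_{1} + \sigma_{2}\right)}, \; A_{2} = \sqrt{\frac{2}{\pi}} \, \frac{\sigma_{1}}{\sigma_{2} \left(\sigma_{1} + \sigma_{2}\right)}$$

Using $\int_{0}^{\infty}{t^{2} e^{-\frac{t^{2}}{2 \sigma^{2}}} \, dt} = \sqrt{\frac{\pi}{2}} \, \sigma^{3}$, the variance then follows as:

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty}{(x - a)^{2} \varphi(x) \, dx} = \sqrt{\frac{\pi}{2}} \left(A_{1} \, \sigma^{3}_{1} + A_{2} \, \sigma^{3}_{2}\right) = \frac{\sigma^{2}_{1} \, \sigma_{2} + \sigma_{1} \, \sigma^{2}_{2}}{\sigma_{1} + \sigma_{2}} = \sigma_{1} \, \sigma_{2}$$

As a sanity check, setting $\sigma_{1} = \sigma_{2} = \sigma$ recovers the usual Gaussian amplitude $1 / (\sigma \sqrt{2 \pi})$ and variance $\sigma^{2}$.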

Next, consider the variable (here $a$, $b$ and $c$ are the coefficients of the line, not the mean $a$ used above):

$$\varepsilon_{i} = a \, X_{i} + b \, Y_{i} + c, \; a^{2} + b^{2} = 1, \; i = 1, \ldots, N$$

If $X_{i}$ and $Y_{i}$ have the above distribution, what is the expectation value and variance for $\varepsilon_{i}$?
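One way to answer this, as a sketch that assumes $X_{i}$ and $Y_{i}$ are independent and writes $x_{i}$, $y_{i}$ for the quoted values (taken to be the means) and $\sigma_{x1,i}$, $\sigma_{x2,i}$, $\sigma_{y1,i}$, $\sigma_{y2,i}$ for the upper and lower error bars (labels introduced here for illustration). For the piecewise Gaussian above, evaluating $\int{(x - a)^{2} \varphi(x) \, dx}$ under the two constraints gives a variance equal to the product of the two standard deviations, so:

$$E(\varepsilon_{i}) = a \, x_{i} + b \, y_{i} + c, \; \mathrm{Var}(\varepsilon_{i}) = a^{2} \, \sigma_{x1,i} \, \sigma_{x2,i} + b^{2} \, \sigma_{y1,i} \, \sigma_{y2,i}$$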

Approximate these variables as Normal with the above expectation values and variances and use the maximum likelihood method, which then reduces to a least-squares method for estimating the parameters of the general linear dependence:

$$a \, x + b \, y + c = 0, \; a^{2} + b^{2} = 1$$
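The recipe above can be sketched numerically as follows. This is one possible reading, not a definitive implementation: it assumes independent errors in x and y, takes each point's variance to be the product of its upper and lower standard deviations (which is what the piecewise Gaussian with the two constraints yields), and minimises only the weighted sum of squares, dropping the log-variance term a full likelihood would carry. The function name and arguments are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def fit_line(x, y, sx_lo, sx_hi, sy_lo, sy_hi):
    """Fit a*x + b*y + c = 0 with a^2 + b^2 = 1 by weighted least squares."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Assumed per-point variances: product of the two one-sided sigmas.
    vx = np.asarray(sx_lo, float) * np.asarray(sx_hi, float)
    vy = np.asarray(sy_lo, float) * np.asarray(sy_hi, float)

    def chi2(p):
        theta, c = p
        a, b = np.cos(theta), np.sin(theta)  # enforces a^2 + b^2 = 1
        resid = a * x + b * y + c
        var = a**2 * vx + b**2 * vy          # variance of each residual
        return np.sum(resid**2 / var)

    # Start from an ordinary least-squares line y = slope*x + intercept,
    # rewritten as slope*x - y + intercept = 0 and normalised.
    slope, intercept = np.polyfit(x, y, 1)
    norm = np.hypot(slope, 1.0)
    start = [np.arctan2(-1.0, slope), intercept / norm]

    res = minimize(chi2, start, method="Nelder-Mead")
    theta, c = res.x
    return np.cos(theta), np.sin(theta), c

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.3, 9.8])
a, b, c = fit_line(x, y,
                   sx_lo=np.full(5, 0.1), sx_hi=np.full(5, 0.3),
                   sy_lo=np.full(5, 0.2), sy_hi=np.full(5, 0.5))
print("slope:", -a / b, "intercept:", -c / b)
```

Parametrising $a = \cos\theta$, $b = \sin\theta$ is just a convenient way to keep the constraint satisfied without a constrained optimiser. Under this approximation the asymmetry enters only through the per-point variances, so the fit behaves like a symmetric weighted fit with adjusted weights.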

I'm dealing with a similar problem. My case is a bit simpler because I only have error in y and because my error becomes symmetric when converted into log space. So to do the linear regression, I just convert into log-space, do a non-linear regression with log(mx+b) as my model curve, and convert back out of log space. There are a couple ways to do the nonlinear regression; I used the NonlinearModelFit command in Mathematica, which allows you to set weights to your points.
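For anyone without Mathematica, roughly the same procedure can be sketched in Python with `scipy.optimize.curve_fit` standing in for NonlinearModelFit. This is an illustrative substitute, not the code used above; the function names and the `err_factor` argument (the multiplicative error, e.g. 1.25 for ×/÷1.25) are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_line(x, m, b):
    """Model curve in log space: log(m*x + b)."""
    return np.log(m * x + b)

def fit_line_in_log_space(x, y, err_factor):
    """Fit y = m*x + b where the error on y is symmetric in log(y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    log_y = np.log(y)
    # A multiplicative error of err_factor is a +/- log(err_factor) bar in log space.
    sigma_log = np.log(err_factor) * np.ones_like(log_y)
    p0 = np.polyfit(x, y, 1)  # plain straight-line fit as a starting guess
    popt, pcov = curve_fit(log_line, x, log_y, p0=p0,
                           sigma=sigma_log, absolute_sigma=True)
    return popt, pcov

# Made-up data for illustration: every point known to within a factor of 1.25.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.3, 7.6, 11.4, 13.5, 17.8])
(m, b), cov = fit_line_in_log_space(x, y, err_factor=1.25)
print("m =", m, "b =", b)
```

The fitted `m` and `b` are already the parameters of the line in linear space, so no back-conversion of the parameters is needed; only the error bars live in log space. The model requires `m*x + b > 0` at every data point.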

This method only works when your error is symmetric in log space, but this is the main kind of asymmetric error I usually run across. In fact, a lot of the error that we call symmetric is actually symmetric only in log space, which makes it close to symmetric for small errors but quite asymmetric for larger ones. Very often when we say +/-25%, we really mean ×/÷1.25, which works out to +25%/-20%.
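The arithmetic behind that last figure, as a small sketch (the function name is made up for illustration): multiplying by 1.25 raises the value by 25%, while dividing by 1.25 multiplies it by 0.8, a 20% drop.

```python
def factor_to_percent(factor):
    """Convert a multiplicative error factor into (upper %, lower %) bars."""
    upper = (factor - 1.0) * 100.0        # effect of multiplying by the factor
    lower = (1.0 - 1.0 / factor) * 100.0  # effect of dividing by the factor
    return upper, lower

upper, lower = factor_to_percent(1.25)
print(f"+{upper:.0f}% / -{lower:.0f}%")
```

For a small factor such as 1.02 the two sides are nearly equal (about +2.0% and -1.96%), which is why the asymmetry is usually ignored for small errors.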

"Look, this is purely a statistical question..."

Ha. That's amusing.

The problem is that there's no right way to do this without knowing where those error bars come from. Error bars by themselves have no definite meaning. But when people use symmetric error bars, we know by convention that they probably represent something like root-mean-squared error, or some multiple of it. There's no such universal meaning of asymmetric error bars, so without more information about the error distribution they're intended to summarize, it's hard to say how to handle them correctly.