Error on non-linearity in a linear fit

Malamala · Jun 12, 2021

Hello! I have some data points with error bars on both x and y, and I would like to fit them with a function ##f(x)##. How can I write the chi-squared in this case? For errors only on y, I would have ##\chi^2 = \sum_i \left(\frac{f(x_i)-y_i}{\sigma_{y_i}}\right)^2##, but I am not sure how to include ##\sigma_x##. Thank you!

BvU · Jun 12, 2021

Hello again,

I see you haven't replied to help given earlier, so after a year of fruitless waiting, I'm not sure I want to spend any serious time on this.
The same questions apply now as well. Funny, isn't it ?

BvU said:

Did you make a plot ?
Tell us how the data were obtained and what they represent. Especially the ##y_{err}##.
And how can you obtain such unbelievably accurate estimates of ##y_{err}## ? Billions of observations, or just mindlessly copying calculation results ?
Are you aware of the role of systematic errors ?

I'd like to know if you understood the help given in earlier threads, or just lost interest and never reacted any more. I don't mind repeating things, but it should have a reasonable purpose.

Show your data.

There is no simple mechanism for what you want. You could fold the ##\sigma_x## into the ##\sigma_y## using ##f'(x)##.

Malamala · Jun 13, 2021

I apologize for the previous post. Honestly, I got a bit confused by the answers, but I realized that the problem I was trying to solve was easier than I thought, so I didn't need all that. Now I am pretty sure I do. Here is a paper similar to what I need. Figure 2 is what I want to fit to (for some reason they show the error bars diagonal, but you can assume they are in both the x and y directions). What I want to do is fit this with something of the form ##y=ax+b+f(x)##, where for ##f(x)## I can try different things depending on the physics model I test. In the end I am interested in the error on this ##f(x)##.

BvU · Jun 13, 2021

Malamala said:

I apologize for the previous post.

Thank you. I'm sure it wasn't intentional, and it often happens that difficult questions get a difficult answer before we discover that in fact a much simpler question was meant all along.

Malamala said:

Figure 2 is what I want to fit to

That's not your data, that's their data. They have the orthogonal distance available in something that looks like a parity plot. Are you convinced that situation is the same in your data ? Can you show ?

I don't have access to the paper at https://epubs.siam.org/doi/pdf/10.1137/0908085, but it seems a bit more general.

Before embarking on an expedition, I would convince myself that the ordinary least squares (OLS) approach, where all errors are attributed to the dependent variable, is absolutely unusable:

systematic errors do not belong in the error bars -- all errors must be uncorrelated

compare $${\sum (y_i - \langle y\rangle)^2 \over \sum {\sigma_{y_i}}^2 }\qquad \text {and} \qquad {\sum (x_i - \langle x\rangle)^2 \over \sum {\sigma_{x_i}}^2}$$ are they really approximately the same ?

outliers and/or observations with really small errors quickly ruin results

Does the OLS result really look nonsensical ?

If so, does it help to fold in the errors in the independent variable as I mentioned in #2 ? I.e. use ##{\sigma'_{y_i}}^2 = {\sigma_{y_i}}^2 + \Bigl (f'(x_i) \,\sigma_{x_i}\Bigr ) ^2##

[edit] depending on the magnitude of ##f'## relative to the magnitude of ##a##, use ##a##, ##f'## or even ##a+f'## as the slope in this expression
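A minimal Python sketch of the folded-error chi-squared described above. The helper name `chi2_effective`, the trial non-linearity ##f(x) = c x^2## and all numbers are illustrative assumptions, not data from this thread. The slope used for the folding is the full derivative of the model, ##a + f'(x)##.

```python
def chi2_effective(model, dmodel, x, y, sx, sy):
    """Chi-squared with sigma_x folded into sigma_y via the local slope."""
    total = 0.0
    for xi, yi, sxi, syi in zip(x, y, sx, sy):
        var = syi**2 + (dmodel(xi) * sxi) ** 2  # effective variance
        total += (yi - model(xi)) ** 2 / var
    return total


# Hypothetical model y = a*x + b + c*x**2 with made-up parameters.
a, b, c = 0.5, 1.0, 1e-3
model = lambda x: a * x + b + c * x**2
dmodel = lambda x: a + 2 * c * x  # full derivative a + f'(x)

# Made-up points and error bars, for illustration only.
x = [0.0, 10.0, 20.0, 30.0]
y = [1.2, 5.9, 11.6, 16.7]
sy = [0.2, 0.2, 0.2, 0.2]
sx = [0.3, 0.3, 0.3, 0.3]

chi2_y_only = chi2_effective(model, dmodel, x, y, [0.0] * len(x), sy)
chi2_folded = chi2_effective(model, dmodel, x, y, sx, sy)
print(chi2_y_only, chi2_folded)
```

Setting all ##\sigma_x## to zero recovers the ordinary chi-squared, so the same function covers both cases. In a real fit the parameters would be varied by a minimizer, and the effective variance would change along with them.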

Malamala said:

What I want to do is fit this with something of the form ##y=ax+b+f(x)##, where for ##f(x)## I can try different things depending on the physics model I test. In the end I am interested in the error on this ##f(x)##.

In simple LSQ your ##f(x)## in ##y=ax+b+f(x)## is a Gaussian with average zero and a variance related to the estimated errors, so what you are basically trying to do is extract higher orders of ##f## from the noise -- correct me if I am wrong. (The 0th and 1st terms of a Taylor series are in a and b)

Unless of course your data is completely different (and y is far from linear), as when we try to subtract background (linear or quadratic) from an observed peak in a spectrum. Then the signal/noise ratio determines the accuracy of the background estimate. Different game.

If ##f## has a few parameters too, you will need a whole lot of accurate data to do sensible statistics ...

If your data aren't really normally distributed the error estimates aren't worth much, nor is the least-squares method ...

If this is serious, I recommend running Monte Carlo simulations on simulated data to establish the effects of the various analysis methods.
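A minimal sketch of such a Monte Carlo study in Python, using only the standard library. The true line, the x positions and the error sizes are assumptions chosen for illustration. It generates many noisy data sets with errors on both axes, fits each with unweighted OLS, and compares the observed scatter of the slope with two analytic predictions: one that ignores ##\sigma_x## and one with ##\sigma_x## folded in.

```python
import random
import statistics


def ols_slope(x, y):
    """Unweighted least-squares slope of y against x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical truth and error sizes (assumptions for illustration only).
a_true, b_true = 0.52, 0.0
x_true = [-1000.0, 1000.0, 2300.0, 3700.0]
sigma_x, sigma_y = 0.2, 0.1

rng = random.Random(12345)  # fixed seed so the run is repeatable
slopes = []
for _ in range(5000):
    x_obs = [xt + rng.gauss(0.0, sigma_x) for xt in x_true]
    y_obs = [a_true * xt + b_true + rng.gauss(0.0, sigma_y) for xt in x_true]
    slopes.append(ols_slope(x_obs, y_obs))

mc_mean = statistics.fmean(slopes)
mc_std = statistics.stdev(slopes)

# Analytic expectations for the slope scatter:
# y-errors only, and with sigma_x folded in through the slope.
mx = statistics.fmean(x_true)
sxx = sum((xt - mx) ** 2 for xt in x_true)
std_y_only = sigma_y / sxx**0.5
std_folded = (sigma_y**2 + (a_true * sigma_x) ** 2) ** 0.5 / sxx**0.5
print(mc_mean, mc_std, std_y_only, std_folded)
```

The same loop can be reused with any other fitting routine in place of `ols_slope`, or with a non-zero ##f(x)## added to the simulated truth, to see how well a given analysis recovers it.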

Malamala · Jun 13, 2021

Here are some of my data points: y = [-508.89, 531.11, 1190.36, 1888.80], error_y = [0.09, 0.09, 0.49, 0.11], x = [-954.76, 1000.28, 2286.75, 3655.38], error_x = [0.11, 0.12, 0.39, 0.20]. The errors are only statistical. As I said, I want to fit this with something of the form ##y=ax+b+f(x)##. Physically, ##f(x)## contains new physics. Given the big errors on the values I have, ##f(x)## will (most probably) be consistent with zero. What I need to do is find a 95% exclusion interval for ##f(x)##, something like ##f(x)<10^{-10}## at the 95% confidence level. This is what they do in Figure 3 of that paper, and it is basically what is done in the literature (I can send you several other papers if that is useful). We measure these two values on the x and y axes and try to set limits on ##f(x)##. The hope is that by reducing the errors on x and y at a point we would be able to actually see a deviation from linearity and set an actual value, not just a bound on ##f(x)##. Here is a paper which actually claims a 3σ deviation from linearity in exactly the same type of plot.
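The comparison suggested earlier in the thread (spread of the values against the size of the error bars, for y and for x) can be evaluated directly on these four points. A minimal Python sketch; the helper name `spread_to_error` is an assumption made for illustration.

```python
x = [-954.76, 1000.28, 2286.75, 3655.38]
y = [-508.89, 531.11, 1190.36, 1888.80]
error_x = [0.11, 0.12, 0.39, 0.20]
error_y = [0.09, 0.09, 0.49, 0.11]


def spread_to_error(values, errors):
    """Sum of squared deviations from the mean over the sum of squared errors."""
    mean = sum(values) / len(values)
    spread = sum((v - mean) ** 2 for v in values)
    return spread / sum(e**2 for e in errors)


ratio_y = spread_to_error(y, error_y)
ratio_x = spread_to_error(x, error_x)
print(f"y: {ratio_y:.3g}, x: {ratio_x:.3g}")
```

If the two ratios are of similar size, neither axis can be treated as error-free relative to the other.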

Twigg · Jun 13, 2021

BvU said:

That's not your data, that's their data. They have the orthogonal distance available in something that looks like a parity plot. Are you convinced that situation is the same in your data ? Can you show ?

Malamala said:

Here are some of my data points: y = [-508.89,531.11,1190.36,1888.80], error_y = [0.09,0.09,0.49,0.11], x = [-954.76, 1000.28, 2286.75, 3655.38], error_x = [0.11,0.12,0.39,0.20].

OP, we appreciate your transparency and this will certainly help us help you, but I just want to say you always have the right to say "no" when sharing data. If this data is unpublished, I encourage you to be a little more protective as it represents years of work for your entire group. No judgement here, just friendly advice! If this data was published and you are just doing a re-analysis, then please disregard this comment. Just trying to look out for you!

There was actually a thread recently about this very subject. The OP was looking for an expression for King non-linearity (and the associated propagated error) on a King plot with 4 or more points. The same paper by Solaro et al was cited. Not exactly the same thing, but I encourage you to skim through starting on page 2 (the first page was a bunch of misunderstandings about what was being asked).

I'm not an expert on linear regression with measurement error (that's the fancy name for error on the x-axis), but it sounds like @BvU can help you. I also suspect @Dale is someone who could help you. Once you have a method for doing linear regression with measurement error, you can apply this to a non-linear method like Levenberg-Marquardt regression, which uses linear regression in its algorithm.

Of course, the fool-proof method is to contact Ian Counts or Cyrille Solaro and ask them directly how they handled the error propagation. The paper writing process for precision measurement can be grueling, and they have probably spent ~100 hours thinking about this. If you are doing research in this field, getting to know them won't hurt!

Malamala · Jun 13, 2021

Thank you for pointing me towards that thread (and all the other info), I will take a look at it. About the data, well, they kept insisting on the ACTUAL data that I use, even if it is exactly the same as in the paper I referenced, for the purpose of my question. But it's fine, it is out of context (and not all the measured points) so I assume it's not usable in this form. But thank you for the advice!

BvU · Jun 14, 2021

My apologies for insisting... but for me it did help. Certainly in combination with the context of the links in #3 and later ones in #6 and the thread. @Twigg is too kind in

Twigg said:

I'm not an expert on linear regression with measurement error (that's the fancy name for error on the x-axis), but it sounds like @BvU can help you. I also suspect @Dale is someone who could help you. Once you have a method for doing linear regression with measurement error, you can apply this to a non-linear method like Levenberg-Marquardt regression, which uses linear regression in its algorithm.

but I've gradually come to the impression that this has little to do with the usual LSQ and LSQ error handling: you have a large number of observations and group the results to end up with four points, each with a ##\sigma_x## and a ##\sigma_y##. And here's me in #4, blabbing on about OLS and ODR and folding in independent variable errors etcetera.

BvU said:

In simple LSQ your ##f(x)## in ##y=ax+b+f(x)## is a Gaussian with average zero and a variance related to the estimated errors, so what you are basically trying to do is extract higher orders of ##f## from the noise -- correct me if I am wrong. (The 0th and 1st terms of a Taylor series are in a and b)

I can put in a plea that the drive was to help and answer the questions as well and as quickly as possible. Something of the golden hammer phenomenon is seeping through...

Where in fact the best advice may well have been in the loose comments near the end:

BvU said:

Unless of course your data is completely different (and y is far from linear), as when we try to subtract background (linear or quadratic) from an observed peak in a spectrum. Then the signal/noise ratio determines the accuracy of the background estimate. Different game.

If ##f## has a few parameters too, you will need a whole lot of accurate data to do sensible statistics ...

If your data aren't really normally distributed the error estimates aren't worth much, nor is the least-squares method ...

If this is serious, I recommend running Monte Carlo simulations on simulated data to establish the effects of the various analysis methods.

So, after a good night's sleep and fresh coffee I propose to consider myself completely unqualified to help with error analysis in the context of King plots. I never even knew they existed or what they represent. @Twigg , @Dale and others are light-years ahead of me.

But as a curious physicist I can't stop myself from commenting and asking further questions.

You have a large set of N observations and a model ##y_i=ax_i+b+f(x_i)## with ##N \gg 10##. You use two degrees of freedom to extract and subtract ##a## and ##b## (Kirchner (3) and (9), with ##a## and ##b## switched) and are left with something that looks like a plot of four points (image not preserved), but with a small cloud of points at each of the four locations (instead of four single points with puny vertical and horizontal error bars, the latter of which I can't even draw with my old excel -- and if I could they would be invisibly small anyway).

I did an unweighted regression on the four points ##y-\langle y\rangle## vs ##x-\langle x\rangle## and got ##0.520 \pm 0.004## as slope and ##0 \pm 7## as intercept -- you can do slightly better with N points. Thanks to the subtraction of the means, the errors on slope and intercept are now uncorrelated.
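The unweighted regression on the centered points can be reproduced with a few lines of Python. This is a sketch using the four points posted earlier in the thread; the variable names are illustrative, and the error estimates are the textbook OLS ones from the residual scatter with ##N-2## degrees of freedom, which is an assumption about how the quoted uncertainties were obtained.

```python
import math

x = [-954.76, 1000.28, 2286.75, 3655.38]
y = [-508.89, 531.11, 1190.36, 1888.80]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
dx = [xi - mean_x for xi in x]  # centered x
dy = [yi - mean_y for yi in y]  # centered y

sxx = sum(d * d for d in dx)
sxy = sum(u * v for u, v in zip(dx, dy))
slope = sxy / sxx
intercept = sum(dy) / n - slope * sum(dx) / n  # zero up to rounding

# Residual variance with n - 2 degrees of freedom.
residuals = [v - (slope * u + intercept) for u, v in zip(dx, dy)]
s2 = sum(r * r for r in residuals) / (n - 2)

slope_err = math.sqrt(s2 / sxx)
intercept_err = math.sqrt(s2 / n)  # valid because the x values are centered
print(f"slope = {slope:.3f} +/- {slope_err:.3f}")
print(f"intercept = {intercept:.1f} +/- {intercept_err:.1f}")
```

With these inputs the slope, its error and the intercept error should come out close to the 0.520, 0.004 and 7 quoted above. The simple form of `intercept_err` only holds for centered x, which is also why the two errors decouple.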

You have a model ##f(x)## which I suspect is discrete in both ##x## and ##y##. (I can barely read the isotope shift papers, let alone understand what's going on.) And you have N-2 degrees of freedom to fit features of ##f##.

At this point I'm stuck for I don't know if you can simulate the measurements by calculating a ##y## for every ##x_i## or not. If you can, the thing to minimize is ##\sum (y_i-y_{i,\text{calc}})^2## (if necessary weighted). But Murphy makes it likely you can't -- and then I don't know what to do with ## x_i-x_{i,\text{calc}} ##.

Monte Carlo is what comes to mind -- but I'd better shut up until I know what I'm talking about !

Impressed by your progress. Best of luck, and let us know !

Twigg · Jun 14, 2021

I was thinking about this more overnight, and it dawned on me what the folks in the paper at https://epubs.siam.org/doi/pdf/10.1137/0908085 are doing (I think!). I also don't have access, so this is a guess based on the abstract.

In the Solaro paper, there's a quote that stuck out at me (and they say this very emphatically):

We emphasize that ##\delta \nu_{732}^{A,40}## is deduced from measurements of ##\delta \nu_{729}^{A,40}## and ##\delta \nu_{DSIS}^{A,40}##, and that ##\delta \nu_{729}^{A,40} \gg \delta \nu_{DSIS}^{A,40}##. Consequently, the measurement uncertainties on ##\delta \nu_{729}^{A,40}## and ##\delta \nu_{DSIS}^{A,40}## translate into errors bars essentially parallel and perpendicular to the fitted line, illustrating that the analysis is limited nearly exclusively by the achieved accuracy on ##\delta \nu_{DSIS}^{A,40}##.

This makes their error analysis unusual. Not only do you have measurement errors on both axes, but the error on the y-axis is partially correlated with the error on the x-axis. This means you can take all the cookbook rules for linear regression and chuck them right out the window.

What I believe Solaro et al. do (and this is a guess) is they have a non-linear fitting algorithm. I couldn't tell you exactly what the algorithm is, but I can tell you one algorithm that will do the job (perhaps not efficiently, but it gets the job done). This algorithm would be identical to Levenberg-Marquardt but with the linear regression replaced by Deming regression with ##\delta = 1## (aka orthogonal distance regression or ODR). They probably entered only the uncertainty ##\delta \nu_{DSIS}^{A,40}## into the algorithm, leaving out the uncertainty on ##\delta \nu_{729}^{A,40}## entirely. This is because, as discussed above, ##\delta \nu_{DSIS}^{A,40}## represents the error orthogonal to the line. I'm guessing the "weighting" that Solaro et al. mentioned was the inverse variance of ##\delta \nu_{DSIS}^{A,40}##.
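For a straight line, Deming regression has a closed-form solution, so the ##\delta = 1## (orthogonal) case is easy to try on the four points posted earlier. The sketch below is in Python; the function name `deming_fit` is hypothetical, it ignores the individual error bars, and it is not the non-linear algorithm guessed at above, only the linear building block.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression.

    delta is the ratio var(y errors) / var(x errors);
    delta = 1 gives orthogonal regression.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)
    return slope, my - slope * mx


x = [-954.76, 1000.28, 2286.75, 3655.38]
y = [-508.89, 531.11, 1190.36, 1888.80]

slope, intercept = deming_fit(x, y)
print(slope, intercept)
```

The orthogonal slope always lies between the ordinary y-on-x slope and the inverted x-on-y slope, so for data this close to a straight line the three agree to several digits.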

The problem is, looking at your data's error bars, sometimes ##\sigma_x > \sigma_y## and sometimes ##\sigma_x < \sigma_y##, so clearly you measured these isotope shifts independently. If you had measured the shift between excited states like in the Solaro paper, we would expect to see something of the form ##\sigma_y = \sqrt{\sigma_x^2 + \sigma_{exc}^2}## where ##\sigma_{exc}## is the uncertainty on the isotope shift between excited states (##\delta \nu_{DSIS}^{A,40}## in Solaro et al). Since we see ##\sigma_y < \sigma_x## for some of your data, this cannot be the case.
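The consistency argument can be checked point by point. A small Python sketch, assuming the relation ##\sigma_y^2 = \sigma_x^2 + \sigma_{exc}^2## above; `implied_sigma_exc` is a hypothetical helper that returns `None` where no real ##\sigma_{exc}## exists.

```python
import math

error_x = [0.11, 0.12, 0.39, 0.20]
error_y = [0.09, 0.09, 0.49, 0.11]


def implied_sigma_exc(sx, sy):
    """Return sigma_exc from sy**2 = sx**2 + sigma_exc**2, or None if sy < sx."""
    if sy < sx:
        return None
    return math.sqrt(sy**2 - sx**2)


implied = [implied_sigma_exc(sx, sy) for sx, sy in zip(error_x, error_y)]
print(implied)
```

Any `None` entry marks a point where ##\sigma_y < \sigma_x##, which the relation cannot produce.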

However, I notice that for your data, ##\sigma_x \approx \sigma_y## at least within a factor of 2 or 3. So I believe you can still use nonlinear ODR but you will need to calculate the error perpendicular to the fitting line differently. I'm still pondering the right way to do that.
