Thank you. I'm sure it wasn't intentional; it often happens that a difficult question gets a difficult answer before we discover that a much simpler question was meant all along.
That's not your data, that's their data. They have the orthogonal distance available in something that looks like a parity plot. Are you convinced that situation is the same in your data ? Can you show ?
I don't have access to https://epubs.siam.org/doi/pdf/10.1137/0908085, but it seems a bit more general.
Before embarking on an expedition, I would convince myself that the ordinary least squares (OLS) approach, where all errors are attributed to the dependent variable, is absolutely unusable:
- systematic errors do not belong in the error bars -- all errors must be uncorrelated
- compare $${\sum (y_i - \langle y \rangle)^2 \over \sum {\sigma_{y_i}}^2 }\qquad \text{and} \qquad {\sum (x_i - \langle x \rangle)^2 \over \sum {\sigma_{x_i}}^2}$$ are they really approximately the same ?
- outliers and/or observations with really small errors quickly ruin results
- Does the OLS result really look nonsensical ?
- If so, does it help to fold in the errors in the independent variable as I mentioned in #2 ? I.e. use ##{\sigma'_{y_i}}^2 = {\sigma_{y_i}}^2 + \Bigl (f'(x_i) \,\sigma_{x_i}\Bigr ) ^2 \ ##
[edit] depending on the magnitude of ##f'## relative to that of ##a##: use ##a##, ##f'## or even ##a+f'## as the slope
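The two checks in the list above (comparing spread to error in each variable, and folding the x errors into the y errors) can be sketched for the straight-line case. This is a minimal sketch and not a definitive implementation: it assumes a pure line ##y = ax + b##, so the slope used in ##\sigma'_{y_i}## is just ##a##, and it assumes uncorrelated Gaussian errors. The function names `weighted_line_fit`, `effective_variance_fit` and `spread_to_error_ratio` are made up for illustration.

```python
import numpy as np

def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = a*x + b with weights 1/sigma^2."""
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (S * Sxy - Sx * Sy) / delta
    b = (Sxx * Sy - Sx * Sxy) / delta
    return a, b

def effective_variance_fit(x, y, sigma_x, sigma_y, n_iter=20):
    """Iterate sigma'^2 = sigma_y^2 + (a*sigma_x)^2 (assumption: slope = a)."""
    a, b = weighted_line_fit(x, y, sigma_y)  # start from the ordinary fit
    for _ in range(n_iter):
        sigma_eff = np.sqrt(sigma_y**2 + (a * sigma_x)**2)
        a, b = weighted_line_fit(x, y, sigma_eff)
    return a, b

def spread_to_error_ratio(v, sigma_v):
    """Sum of squared deviations from the mean over the sum of variances."""
    return np.sum((v - v.mean())**2) / np.sum(sigma_v**2)

if __name__ == "__main__":
    # Made-up data, only to show the calls.
    rng = np.random.default_rng(1)
    sigma_x = np.full(15, 0.3)
    sigma_y = np.full(15, 0.5)
    x = np.linspace(0.0, 10.0, 15) + rng.normal(0.0, sigma_x)
    y = 2.0 * np.linspace(0.0, 10.0, 15) + 1.0 + rng.normal(0.0, sigma_y)
    print("ratio in y:", spread_to_error_ratio(y, sigma_y))
    print("ratio in x:", spread_to_error_ratio(x, sigma_x))
    print("a, b:", effective_variance_fit(x, y, sigma_x, sigma_y))
```

If the two ratios differ by orders of magnitude, the variable with the much larger ratio has comparatively negligible errors. Note that when all ##\sigma_{x_i}## are equal and all ##\sigma_{y_i}## are equal, the effective weights are uniform and the result is the same as the unweighted fit.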
In simple LSQ your ##f(x)## in ##y=ax+b+f(x)## is a Gaussian with average zero and a variance related to the estimated errors, so what you are basically trying to do is extract higher orders of ##f## from the noise -- correct me if I am wrong. (The 0th and 1st terms of a Taylor series are in a and b.)
Unless of course your data is completely different (and y is far from linear), as when we try to subtract background (linear or quadratic) from an observed peak in a spectrum. Then the signal/noise ratio determines the accuracy of the background estimate. Different game.
If ##f## has a few parameters too, you will need a whole lot of accurate data to do sensible statistics ...
If your data aren't really normally distributed the error estimates aren't worth much, nor is the least-squares method ...
If this is serious, I recommend running Monte Carlo simulations on simulated data to establish the effects of the various analysis methods.
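A minimal sketch of such a simulation, under assumptions that are mine and not taken from your data: a true straight line, independent Gaussian errors in both variables, x errors that grow along the axis, and two candidate analyses (a fit weighted with ##\sigma_{y_i}## only, and one that folds in ##\sigma_{x_i}## via the effective variance). The function name `simulate_fits` and all default values are hypothetical.

```python
import numpy as np

def simulate_fits(a_true=2.0, b_true=1.0, n_points=20, n_trials=2000, seed=0):
    """Return {method: (mean slope, std of slope)} over many simulated data sets."""
    rng = np.random.default_rng(seed)
    x_true = np.linspace(0.0, 10.0, n_points)
    sigma_x = np.linspace(0.1, 1.0, n_points)  # assumed: x errors grow with x
    sigma_y = np.full(n_points, 0.5)           # assumed: constant y errors
    slopes = {"ols": [], "effective_variance": []}
    for _ in range(n_trials):
        # Smear the true points in both directions.
        x_obs = x_true + rng.normal(0.0, sigma_x)
        y_obs = a_true * x_true + b_true + rng.normal(0.0, sigma_y)
        # np.polyfit expects w = 1/sigma (not 1/sigma^2) for Gaussian errors.
        a_ols = np.polyfit(x_obs, y_obs, 1, w=1.0 / sigma_y)[0]
        # Effective-variance fit: iterate the weights with the current slope.
        a_ev = a_ols
        for _step in range(10):
            sigma_eff = np.sqrt(sigma_y**2 + (a_ev * sigma_x)**2)
            a_ev = np.polyfit(x_obs, y_obs, 1, w=1.0 / sigma_eff)[0]
        slopes["ols"].append(a_ols)
        slopes["effective_variance"].append(a_ev)
    return {name: (float(np.mean(v)), float(np.std(v))) for name, v in slopes.items()}

if __name__ == "__main__":
    for name, (mean, std) in simulate_fits().items():
        print(f"{name}: slope = {mean:.3f} +/- {std:.3f} (true value 2.0)")
```

Comparing the mean slope with the true value shows the bias of each method, and the standard deviation shows its scatter. Replace the assumed error model with whatever best describes your own measurements before drawing conclusions.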