Right way to do a linear fit

Thread starter BillKet

#1
Hello! I have some data of the form (x, y, z) that I know is described by a function of the form ##z=y(a+bx)##, where a and b are parameters to be fitted. z and y have errors associated with them, while x doesn't (x is actually an integer going from 0 to 3 for each value of y).

I tried to do the fit in two different ways. First, for each value of x, I made a linear fit of the form ##z=yA## for A, using this package, which accounts for the errors on both z and y: https://docs.scipy.org/doc/scipy/reference/odr.html. Then I made a fit of the form ##A=a+bx## for a and b, using the errors on A obtained from the first fit. In the end I get a value and an error for a and b.

The second method was to fit ##z=y(a+bx)## directly to the whole data set at once. This is no longer a straight-line fit in a single variable (although the model is still linear in a and b), but it can easily be done in Python with the same package. This gives a new set of values and errors for a and b.

The values obtained with the two methods are consistent with each other (within the errors on a and b), but the first method gives smaller errors than the second. Is there anything I am missing? Shouldn't I get exactly the same result both ways? And if the answer is no, which method should I use and why? Thank you!
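A minimal sketch of both procedures with `scipy.odr`, for anyone who wants to reproduce the comparison. It assumes NumPy arrays where row i of `z_rows`/`z_err_rows` holds the measurements for the i-th value of x (the layout of the data posted in #9). The function names, the least-squares starting values and the error convention are illustrative choices, not necessarily what was done in the original analysis. In particular, the uncertainties are taken from `Output.cov_beta`, which `scipy.odr` does not rescale by the residual variance, whereas `Output.sd_beta` is rescaled; mixing the two conventions between steps is one easy way to end up with error bars that are not comparable.

```python
import numpy as np
from scipy import odr


def fit_slope_through_origin(y, z, y_err, z_err):
    """Step 1 of method 1: ODR fit of z = A*y with errors on both y and z."""
    model = odr.Model(lambda beta, t: beta[0] * t)
    data = odr.RealData(y, z, sx=y_err, sy=z_err)
    guess = np.dot(y, z) / np.dot(y, y)  # ordinary least-squares starting value
    out = odr.ODR(data, model, beta0=[guess]).run()
    # cov_beta is not rescaled by the residual variance (sd_beta is)
    return out.beta[0], np.sqrt(out.cov_beta[0, 0])


def stepwise_fit(x_values, y, z_rows, y_err, z_err_rows):
    """Method 1: fit A for each x, then fit A = a + b*x weighted by 1/sigma_A."""
    results = [fit_slope_through_origin(y, z, y_err, ze)
               for z, ze in zip(z_rows, z_err_rows)]
    A = np.array([r[0] for r in results])
    sigma_A = np.array([r[1] for r in results])
    # x has no error, so step 2 is an ordinary weighted straight-line fit
    coef, cov = np.polyfit(x_values, A, 1, w=1.0 / sigma_A, cov="unscaled")
    slope, intercept = coef  # polyfit lists the highest power first
    return (intercept, slope), np.sqrt(np.diag(cov))[::-1]


def joint_fit(x_values, y, z_rows, y_err, z_err_rows):
    """Method 2: a single ODR fit of z = y*(a + b*x), with x held fixed."""
    n_x, n_y = len(x_values), len(y)
    y_all = np.tile(y, n_x)                            # y repeated once per x
    x_all = np.repeat(x_values, n_y).astype(float)     # matches row order of z
    inputs = np.vstack([y_all, x_all])
    sx = np.vstack([np.tile(y_err, n_x), np.ones(n_x * n_y)])  # x row: placeholder
    fixed = np.vstack([np.ones(n_x * n_y, int), np.zeros(n_x * n_y, int)])
    z = np.ravel(z_rows)
    # ordinary least squares on z = a*y + b*(x*y) as a starting value
    guess = np.linalg.lstsq(np.column_stack([y_all, x_all * y_all]), z,
                            rcond=None)[0]
    model = odr.Model(lambda beta, t: t[0] * (beta[0] + beta[1] * t[1]))
    data = odr.RealData(inputs, z, sx=sx, sy=np.ravel(z_err_rows))
    out = odr.ODR(data, model, beta0=guess, ifixx=fixed).run()
    return tuple(out.beta), np.sqrt(np.diag(out.cov_beta))
```

Both functions take the same arguments, so `stepwise_fit(x, y, z, y_err, z_err)` and `joint_fit(x, y, z, y_err, z_err)` can be compared side by side on identical inputs.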
 

Answers and Replies

#2
Can you provide some context here? What is the data you’re fitting? Where does it come from?

Knowing that, we might find that certain fields of research prefer certain methods to be used over other methods.
 
#3
> Shouldn't I get the exactly same result both ways?
No, definitely not. If different methods always gave exactly the same result then there would be no point in having different methods at all.

> And in case the answer is no, which method should I use and why?
The errors in y, are they large or can they be neglected?
 
#4
> Can you provide some context here? What is the data you’re fitting? Where does it come from?
>
> Knowing that, we might find that certain fields of research prefer certain methods to be used over other methods.
The data is from a molecular spectroscopy experiment. For people working in the field, this is similar to a King plot fit, but for molecular terms (when the field shift is important). z corresponds to the frequency shift between different molecules, y to the change in radius of one of the atoms between the different molecules, and x to the frequency level being tested.
 
#5
> No, definitely not. If different methods always gave exactly the same result then there would be no point in having different methods at all.
>
> The errors in y, are they large or can they be neglected?
Thank you for your reply! To be honest, I wasn't even sure whether they count as different methods: I assumed they were the same method, except that in one case I do it in two steps and in the other in a single step.

The errors on y are a lot smaller than the errors on z. From what I've seen, ignoring them doesn't make a big difference. The errors on z also include systematic uncertainties, and the statistics behind them are much lower, so those errors are quite big.
 
#6
> The errors on y are a lot smaller than the errors on z.
Then doing a standard least-squares fit should be fine. Stepwise fits are always a little sketchy, so I would avoid them. The smaller error most likely comes at the cost of a larger bias.

I would probably fit the following model with standard linear regression: ##z = ay + bx + cxy + d##. In R this model would be written as
Code:
z~x*y
where the inclusion of the other terms (the intercept and the x and y main effects) is so standard that they are simply assumed. Leaving out intercept terms and lower-order terms can introduce bias. Under the usual least-squares assumptions, this model gives you the best linear unbiased estimator.
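For readers working in Python rather than R, a rough counterpart of the suggested model can be sketched with plain NumPy: weighted least squares for ##z = ay + bx + cxy + d##, with each point weighted by its z uncertainty and the (small) errors on y ignored. The function name is hypothetical, and the covariance it returns is unscaled, i.e. it reflects only the stated z errors.

```python
import numpy as np


def weighted_interaction_fit(x, y, z, z_err):
    """Weighted least squares for z = a*y + b*x + c*x*y + d.

    Mirrors the R formula z ~ x*y (intercept, both main effects and the
    interaction). Only the errors on z are used; errors on y are ignored.
    """
    x, y, z, z_err = map(np.asarray, (x, y, z, z_err))
    design = np.column_stack([y, x, x * y, np.ones_like(y)])  # columns: a, b, c, d
    w = 1.0 / z_err                          # scale each row by 1/sigma_z
    weighted = design * w[:, None]
    coef, *_ = np.linalg.lstsq(weighted, z * w, rcond=None)
    cov = np.linalg.inv(weighted.T @ weighted)  # unscaled parameter covariance
    return coef, np.sqrt(np.diag(cov))
```

The inputs are flat arrays with one entry per (x, y) pair. If ##z=y(a+bx)## is the right model, the fitted b and d should come out consistent with zero, and c plays the role of the original b.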
 
#7
Oh I see! So if the fit is good, b and d should be consistent with zero, right? Thanks a lot! Could you please explain a bit more why doing it in two steps gives me a different error (it is actually about 3 times smaller)?
 
#8
I am surprised that it is that much different. Without the data I can’t really tell. There might be some substantial covariance or multicollinearity that is constrained away in the stepwise approach.
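One way to look for the covariance mentioned above is to convert the parameter covariance matrix of the joint fit (for example `cov_beta` from `scipy.odr`) into a correlation matrix; off-diagonal entries close to ±1 mean the parameters are strongly correlated, which the quoted uncertainties alone do not show. A small helper, with a hypothetical name:

```python
import numpy as np


def covariance_to_correlation(cov):
    """Convert a parameter covariance matrix into a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(cov))           # per-parameter standard deviations
    return cov / np.outer(sigma, sigma)     # divide each entry by sigma_i * sigma_j
```

Because an overall scale factor cancels, the result is the same whether the covariance has been rescaled by the residual variance or not.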
 
#9
Please find the data I am using below. The errors are combined statistical and systematic uncertainties. They come from different experiments (hence the different ranges of errors). To give a bit more detail, the function I actually need to fit is ##z=y(a+b(x+0.5)/4.186)## (just a redefinition of a and b, included for completeness). Each sub-array of z corresponds to a value of x. For example, the second entry of the first sub-array of z (x = 0) should be written as ##0.176=-0.216(a+b(0+0.5)/4.186)##. Please let me know if I can provide further details.

$$y = [-0.312, -0.216, -0.080, 0, 0.210]$$
$$y_{err} = [0.015, 0.010, 0.004, 0.00001, 0.01]$$
$$x = [0, 1, 2, 3]$$
$$z = [[0.268, 0.176, 0.117, 0, -0.184],
[0.277, 0.177, 0.100, 0, -0.179],
[0.274, 0.178, 0.121, 0, -0.250],
[0.298, 0.063, 0.001, 0, -0.374]]$$
$$z_{err} = [[0.008, 0.015, 0.028, 0.008, 0.021],
[0.005, 0.013, 0.018, 0.004, 0.012],
[0.014, 0.016, 0.053, 0.016, 0.042],
[0.059, 0.088, 0.163, 0.055, 0.151]]$$
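A sketch of the one-step fit of ##z=y(a+b(x+0.5)/4.186)## to the data above with `scipy.odr`, assuming the layout described in this post (sub-array i of z and z_err belongs to x = i, entry j to y[j]). The x values carry no uncertainty, so the transformed x column is held fixed through the `ifixx` argument and its `sx` entries are only placeholders; the least-squares starting values are an illustrative choice. Both the unscaled (`cov_beta`) and the residual-variance-scaled (`sd_beta`) uncertainties are printed, since which of the two is being compared matters when judging the two methods.

```python
import numpy as np
from scipy import odr

# Data as posted above (row i of z and z_err corresponds to x = i)
y = np.array([-0.312, -0.216, -0.080, 0.0, 0.210])
y_err = np.array([0.015, 0.010, 0.004, 0.00001, 0.01])
x = np.array([0, 1, 2, 3])
z = np.array([[0.268, 0.176, 0.117, 0.0, -0.184],
              [0.277, 0.177, 0.100, 0.0, -0.179],
              [0.274, 0.178, 0.121, 0.0, -0.250],
              [0.298, 0.063, 0.001, 0.0, -0.374]])
z_err = np.array([[0.008, 0.015, 0.028, 0.008, 0.021],
                  [0.005, 0.013, 0.018, 0.004, 0.012],
                  [0.014, 0.016, 0.053, 0.016, 0.042],
                  [0.059, 0.088, 0.163, 0.055, 0.151]])

# Flatten so that entry k pairs z[i, j] with y[j] and x[i]
y_all = np.tile(y, len(x))
x_all = np.repeat(x, len(y)).astype(float)
u_all = (x_all + 0.5) / 4.186            # redefined regressor
N = y_all.size


def model(beta, t):
    # t[0] = y (has errors), t[1] = (x + 0.5)/4.186 (exact)
    return t[0] * (beta[0] + beta[1] * t[1])


data = odr.RealData(np.vstack([y_all, u_all]), z.ravel(),
                    sx=np.vstack([np.tile(y_err, len(x)), np.ones(N)]),
                    sy=z_err.ravel())
fixed = np.vstack([np.ones(N, int), np.zeros(N, int)])  # hold the x row fixed
guess = np.linalg.lstsq(np.column_stack([y_all, y_all * u_all]), z.ravel(),
                        rcond=None)[0]
out = odr.ODR(data, odr.Model(model), beta0=guess, ifixx=fixed).run()

a, b = out.beta
sigma_unscaled = np.sqrt(np.diag(out.cov_beta))  # from the stated errors only
sigma_scaled = out.sd_beta                       # additionally scaled by res_var
print(f"a = {a:.4f}, b = {b:.4f}")
print("errors (unscaled):", sigma_unscaled, "(scaled by res_var):", sigma_scaled)
```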
 
