# Error in least squares fit: how to include error of points?

ORF
Hello

I have a question about least squares fitting (linear fitting).

Introductory statistics textbooks only take into account the statistical error of the fit, not the errors of the fitted points.

How are the errors of the fitted points taken into account and included in the total error of the fitted parameters?

My English is not very good, so if something is unclear, I will try to explain it better.

Thank you for your time :)

Greetings!

rumborak
Usually, what you call the fitting error is *taken* to be the measurement error.
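A minimal sketch of what this means in practice, assuming NumPy: in an ordinary (unweighted) straight-line fit, the parameter uncertainties are computed from the residual scatter, $s^2 = \sum_i r_i^2/(n-2)$, so the scatter of the points stands in for the unknown measurement error. The function name `ols_line` is hypothetical.

```python
import numpy as np


def ols_line(x, y):
    """Fit y = a + b*x by ordinary least squares (hypothetical helper).

    The parameter uncertainties are estimated from the residual scatter,
    i.e. the scatter itself is taken as the measurement error.
    Returns (params [a, b], standard errors [sigma_a, sigma_b]).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    A = np.column_stack([np.ones(n), x])            # design matrix: columns [1, x]
    params, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares solution
    residuals = y - A @ params
    s2 = residuals @ residuals / (n - 2)            # estimated variance of one point
    cov = s2 * np.linalg.inv(A.T @ A)               # parameter covariance matrix
    return params, np.sqrt(np.diag(cov))


# Usage with made-up example data
params, errors = ols_line([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8, 9.1])
print("a, b =", params, "+/-", errors)
```

Note that nothing about the individual point errors enters here; if the points happen to lie exactly on a line, the parameter errors come out as zero.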

ORF
Hello

Thank you for answering so quickly.

Let's look at an example: I have n points, (x1,y1), ... (xn,yn), each with an error, i.e., (err-x1,err-y1), ... (err-xn,err-yn).

A least squares fit only uses the values (x1,y1), ... (xn,yn). The error of the fitted parameters comes from the statistical scatter of the points, but it does not take into account the errors associated with the points, i.e., (err-x1,err-y1), ... (err-xn,err-yn).

So, how can I include the errors associated with the points (i.e., (err-x1,err-y1), ... (err-xn,err-yn))?
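When each point's uncertainty is known, a standard approach is weighted least squares (the "weighted least squares fits" mentioned later in this thread): minimise $\chi^2 = \sum_i (y_i - a - b x_i)^2/\sigma_{y_i}^2$ and take the parameter covariance as $(A^T W A)^{-1}$, with $A$ the design matrix (columns $1$ and $x_i$) and $W = \mathrm{diag}(1/\sigma_{y_i}^2)$. The point errors then feed directly into the parameter errors. For errors in $x$, this sketch swaps in the "effective variance" approximation $s_i^2 = \sigma_{y_i}^2 + (b\,\sigma_{x_i})^2$, re-solved a few times because it depends on the slope; that technique was not proposed in the thread, and orthogonal distance regression is another common option. The function name `weighted_line_fit` and the example data are hypothetical; NumPy is assumed.

```python
import numpy as np


def weighted_line_fit(x, y, sigma_y, sigma_x=None, n_iter=20):
    """Fit y = a + b*x by weighted least squares using known point errors.

    Minimises chi^2 = sum(((y_i - a - b*x_i) / s_i)**2) with s_i = sigma_y_i.
    If sigma_x is given, uses the (assumed) effective variance
    s_i**2 = sigma_y_i**2 + (b * sigma_x_i)**2, re-solved n_iter times
    because s_i depends on the slope b.
    Returns (params [a, b], standard errors [sigma_a, sigma_b], chi^2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sy = np.asarray(sigma_y, dtype=float) * np.ones_like(x)  # allows a scalar sigma
    sx = None if sigma_x is None else np.asarray(sigma_x, dtype=float) * np.ones_like(x)
    A = np.column_stack([np.ones_like(x), x])  # design matrix: columns [1, x]

    b = 0.0  # the first pass ignores the x errors
    for _ in range(1 if sx is None else n_iter):
        s2 = sy**2 if sx is None else sy**2 + (b * sx) ** 2
        W = np.diag(1.0 / s2)                # weights w_i = 1 / s_i^2
        cov = np.linalg.inv(A.T @ W @ A)     # parameter covariance matrix
        params = cov @ (A.T @ W @ y)         # solution of the weighted normal equations
        b = params[1]

    chi2 = np.sum((y - A @ params) ** 2 / s2)
    return params, np.sqrt(np.diag(cov)), chi2


# Usage with made-up example data and errors
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
params, errors, chi2 = weighted_line_fit(x, y, sigma_y=0.2, sigma_x=0.05)
print("a, b =", params, "+/-", errors, " chi2/dof =", chi2 / (len(x) - 2))
```

Here the parameter errors come from the stated $\sigma$ values, not from the scatter. If $\chi^2/(n-2)$ comes out far from 1, the stated errors do not match the actual scatter; some people then rescale the covariance by that factor, which is a judgement call.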

Thank you for your time :)

Greetings!

rumborak
Let me ask you first: how do you know the error of the measurements? I mean, if you knew the error of each individual measurement, you could obviously just use the *true* value (measured minus error). If you only have an idea of the error distribution and want it considered separately from the fitting error, you need a more complex model, e.g. a Kalman filter.

Staff Emeritus
Going one step beyond that, you can use the residuals as a test of whether your least squares fit makes sense. Least squares in and of itself says nothing about whether the fit is a good fit.

Suppose (for example) you are looking at modeling $y=c_0$ vs. $y=c_0+c_1x$ vs. $y=c_0+c_1x+c_2x^2$ vs. $y=c_0+c_1x+c_2x^2+c_3x^3$ vs. $y=c_0+c_1x+c_2x^2+c_3x^3+c_4x^4$. Suppose that weighted least squares fits respectively explain 10%, 30%, 99%, 99.1%, and 99.2% of the observed uncertainties. Which model should you use? The answer in this case is the quadratic model. The constant and linear models are garbage; both are lousy predictors. Those huge jumps from the constant model to the linear model to the quadratic model should tell you that you are capturing some essential characteristics of the underlying process with those progressively higher-order terms. The tiny jumps after the quadratic model should tell you that those even higher-order models are most likely just overfitting noise.
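The model comparison above can be sketched with NumPy's `np.polyfit` (passing `w = 1/sigma`, which weights the unsquared residuals) on synthetic data drawn from a quadratic with known noise. Here "fraction explained" is assumed to mean a weighted $R^2 = 1 - \chi^2_{\text{model}}/\chi^2_{\text{const}}$, measured against the weighted-mean (constant) fit, so the constant model scores 0% by construction rather than the 10% in the example; the reduced $\chi^2$ is printed alongside as a second check, since a value near 1 means the model fits within the stated errors. The helper name `explained_fraction` and the synthetic data are hypothetical.

```python
import numpy as np


def explained_fraction(x, y, sigma, degree):
    """Weighted R^2 and reduced chi^2 of a weighted polynomial fit (hypothetical helper)."""
    coeffs = np.polyfit(x, y, degree, w=1.0 / sigma)  # w multiplies the unsquared residuals
    chi2 = np.sum(((y - np.polyval(coeffs, x)) / sigma) ** 2)
    ybar = np.sum(y / sigma**2) / np.sum(1.0 / sigma**2)  # weighted mean = constant-model fit
    chi2_const = np.sum(((y - ybar) / sigma) ** 2)
    dof = len(x) - (degree + 1)  # data points minus fitted parameters
    return 1.0 - chi2 / chi2_const, chi2 / dof


# Synthetic data: a quadratic plus Gaussian noise with known sigma
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
sigma = np.full_like(x, 0.5)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0, sigma)

for degree in range(5):
    r2, red_chi2 = explained_fraction(x, y, sigma, degree)
    print(f"degree {degree}: explained {100 * r2:6.2f}%, reduced chi^2 {red_chi2:8.2f}")
```

Because the models are nested, the explained fraction can only grow with the degree; what matters is where the growth stops being large, which is the same reasoning as in the example.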