# Error in least squares fit: how to include error of points?

In summary: Least squares fitting accounts for measurement errors under certain assumptions, and weighted least squares can incorporate per-point uncertainties; other models are available if you have more information about the errors. The residuals can be used to evaluate the quality of the fit and to decide whether higher-order terms should be included.

#### ORF

Hello

I have a doubt with the least squares fitting (linear fitting).

Introductory statistics textbooks only take the statistical error of the fit into account, not the errors of the fitted points themselves.

How is the error of the fitted points taken into account, and included in the total error of the fitting parameters?

My English is not very good, so if something is unclear, I will try to explain it better.

Thank you for your time :)

Greetings!

Usually, what you call the fitting error is *taken* to be the measurement error.

Hello

Thank you for answering so quickly.

Let's see an example: I have n points, (x1,y1), ..., (xn,yn), each with an associated error, i.e., (err-x1,err-y1), ..., (err-xn,err-yn).

A least squares fit only uses the values (x1,y1), ..., (xn,yn). The error in the fitting parameters comes from the statistical scatter of the points, but it does not take into account the errors associated with the points themselves, i.e., (err-x1,err-y1), ..., (err-xn,err-yn).

So, how can I include the errors associated with the points, i.e., (err-x1,err-y1), ..., (err-xn,err-yn)?

Thank you for your time :)

Greetings!

Let me ask you first: how do you know the error of the measurements? If you knew the error of each individual measurement, then you could obviously just use the *true* value (measured value minus error). If you only have an idea of the error distribution, and you want it considered separately from the fitting error, you need a more complex model, e.g. a Kalman filter.

That's the very basic least squares algorithm.

There are various weighted least squares algorithms (google that term) that account for the fact that different measurements have different uncertainties. You certainly don't want measurements whose uncertainty is (for example) a centimeter to carry the same weight as measurements whose uncertainty is less than a millimeter. The high-precision measurements should dominate over the low-precision ones.
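As a minimal sketch of weighted least squares with numpy (all data values here are made up for illustration), `np.polyfit` accepts per-point weights; note numpy's convention that the weights multiply the residuals, so the appropriate weight is 1/sigma rather than 1/sigma²:

```python
import numpy as np

# Hypothetical data: y measured at x, with per-point 1-sigma uncertainties sigma_y
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma_y = np.array([0.1, 0.1, 1.0, 0.1, 0.1])  # third point is much less precise

# Weighted linear fit; cov=True also returns the covariance of the parameters,
# from which the parameter uncertainties are the square roots of the diagonal
coeffs, cov = np.polyfit(x, y, deg=1, w=1.0 / sigma_y, cov=True)
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))

print(f"slope = {slope:.3f} +/- {slope_err:.3f}")
print(f"intercept = {intercept:.3f} +/- {intercept_err:.3f}")
```

With these weights, the imprecise third point has little influence on the fitted line, which is exactly the behavior described above.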

Going one step beyond that, you can use the residuals as a test of whether your least squares fit makes sense. Least squares in and of itself says nothing about whether the fit is a good fit. Suppose (for example) you are comparing the models $y=c_0$ vs. $y=c_0+c_1x$ vs. $y=c_0+c_1x+c_2x^2$ vs. $y=c_0+c_1x+c_2x^2+c_3x^3$ vs. $y=c_0+c_1x+c_2x^2+c_3x^3+c_4x^4$. Suppose that weighted least squares fits respectively explain 10%, 30%, 99%, 99.1%, and 99.2% of the observed variance. Which model should you use? The answer in this case is the quadratic model. The constant and linear models are garbage; both are lousy predictors. The huge jumps from the constant model to the linear model to the quadratic model tell you that those progressively higher order terms are capturing essential characteristics of the underlying process. The tiny jumps after the quadratic model tell you that those even higher order models are most likely just overfitting noise.
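That jump-then-plateau pattern is easy to see numerically. A minimal sketch with synthetic data generated from a truly quadratic process (all values here are made up for illustration), computing the fraction of variance explained ($R^2$) for polynomial fits of increasing degree:

```python
import numpy as np

# Synthetic data from a quadratic process plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0.0, 0.5, x.size)

# R^2 = 1 - SSE/SST for each polynomial degree
ss_tot = np.sum((y - y.mean()) ** 2)
r2s = []
for deg in range(5):
    coeffs = np.polyfit(x, y, deg)
    residuals = y - np.polyval(coeffs, x)
    r2 = 1.0 - np.sum(residuals**2) / ss_tot
    r2s.append(r2)
    print(f"degree {deg}: R^2 = {r2:.4f}")
```

The $R^2$ values jump sharply at degree 2 and then barely move for degrees 3 and 4, which is the signature of a genuinely quadratic process described in the paragraph above.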

Least squares fitting does take into account the errors of the individual points under the assumption that they are independent and normally distributed with zero mean and constant variance. If you have more information about the errors, then you should use a different model.

## 1. How is the error in a least squares fit calculated?

The error in a least squares fit is calculated by finding the difference between the actual data points and the predicted values from the least squares regression line. This difference, also known as the residual, is then squared and summed to obtain the total error.
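A minimal sketch of that calculation (the data values here are made up for illustration): fit a line, form the residuals, and sum their squares. For an ordinary least squares fit with an intercept, the residuals also sum to approximately zero, which is a quick sanity check.

```python
import numpy as np

# Hypothetical data points
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.9, 3.1, 4.8, 7.2])

# Ordinary (unweighted) linear least squares fit
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals: actual values minus predictions from the regression line
predicted = slope * x + intercept
residuals = y - predicted

# Total error: sum of squared residuals
sse = np.sum(residuals**2)
print(f"residuals: {residuals}")
print(f"sum of squared errors: {sse:.4f}")
```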

## 2. Can the error of individual data points be included in a least squares fit?

Yes, the error of individual data points can be included in a least squares fit by using weighted least squares regression. This takes into account the uncertainty or variability in each data point and adjusts the regression line accordingly.
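Weighted least squares handles uncertainties in y; when there are also uncertainties in x, as in the original question, one standard option is orthogonal distance regression. A minimal sketch using `scipy.odr` (the data and error values here are made up for illustration):

```python
import numpy as np
from scipy import odr

# Hypothetical data with uncertainties in both x and y
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.8, 5.1, 7.0, 8.9])
err_x = np.full(x.size, 0.1)  # 1-sigma uncertainty on each x
err_y = np.full(y.size, 0.2)  # 1-sigma uncertainty on each y

# Linear model: beta[0] is the slope, beta[1] the intercept
def linear(beta, x):
    return beta[0] * x + beta[1]

model = odr.Model(linear)
data = odr.RealData(x, y, sx=err_x, sy=err_y)  # sx/sy carry the point errors
fit = odr.ODR(data, model, beta0=[1.0, 0.0]).run()

slope, intercept = fit.beta
slope_err, intercept_err = fit.sd_beta  # parameter uncertainties
print(f"slope = {slope:.3f} +/- {slope_err:.3f}")
print(f"intercept = {intercept:.3f} +/- {intercept_err:.3f}")
```

Unlike ordinary least squares, which minimizes vertical distances only, ODR minimizes distances weighted by both sets of errors, so the point uncertainties in x and y both propagate into the fitted parameter uncertainties.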

## 3. How does including the error of points affect the accuracy of the least squares fit?

Including the error of points in a least squares fit can improve the accuracy of the fit as it takes into account the variability in the data. However, this also depends on the quality and reliability of the error estimates for each data point.

## 4. Can the error of points be included in all types of least squares fits?

Yes, the error of points can be included in all types of least squares fits, including linear, polynomial, and exponential regression. However, the method of including the error may differ depending on the type of regression being performed.

## 5. Is it necessary to include the error of points in a least squares fit?

Including the error of points in a least squares fit is not always necessary, but it can provide more accurate and reliable results, especially when dealing with large datasets or data with high variability. It ultimately depends on the specific needs and goals of the analysis.
