Errors in fitting to data, relationship to residue

In summary, fitting errors are discrepancies between actual data and the values predicted by a mathematical model. They can arise from measurement uncertainties, model assumptions, and data outliers, and can affect data analysis by distorting results and making it difficult to interpret relationships between variables. While fitting errors cannot be completely eliminated, they can be reduced through the use of robust statistical methods, outlier removal, and careful model selection. The residue (more commonly called the residual), which is the difference between observed data and model predictions, is another measure of fitting errors. To assess their significance, fitting errors can be compared to the expected error or uncertainty in the measurement, and statistical tests can be used to determine whether they are significantly different from zero. Common sources of fitting errors include measurement errors, incorrect model assumptions, and data outliers.
  • #1
mikelee8a
Hi,

I'd like to fit a straight line to some data that is noisy, with Gaussian noise of some standard deviation.

Using least squares, I can estimate the slope and intercept. I'd like to know the uncertainty in these numbers. I can find the residue, which I believe is a measure of the variance of the noise.

Using a simulation with N points, each with noise of standard deviation σ, I find that the variation in the estimated slope is proportional to σ/√(N^3), which I can't explain. I'd have expected σ/√N, as for the standard error.

Any help would be fantastic. I just want to know what error to quote with my fitted gradient.

Mike
 
  • #2


Dear Mike,

Thank you for your post. You have already made good progress by fitting a straight line to your noisy data using least squares. To quote an uncertainty on the slope and intercept, you need the standard error of each estimate. First estimate the noise variance from the residual sum of squares (RSS) and the degrees of freedom (N-2): s^2 = RSS/(N-2). The standard errors then follow from s and the spread of the x values, Sxx = ∑(x - x̄)^2:

Standard error of slope = √(RSS/((N-2)*Sxx))
Standard error of intercept = √(RSS*(∑x^2)/(N*(N-2)*Sxx))
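
As a concrete illustration, here is a minimal sketch of that calculation in plain Python. It assumes an ordinary (unweighted) least-squares fit with the same noise level on every point, and it uses the standard results SE(slope) = s/√Sxx and SE(intercept) = s*√(∑x^2/(N*Sxx)), with s^2 = RSS/(N-2) and Sxx = ∑(x - x̄)^2. The function name `fit_line_with_errors` is made up for this example.

```python
import math


def fit_line_with_errors(x, y):
    """Unweighted least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, se_slope, se_intercept, s), where s is the
    estimated standard deviation of the noise, sqrt(RSS / (N - 2)).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the noise level")

    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Spread of the x values and the x-y cross term.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual sum of squares and the noise estimate with N - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))

    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return slope, intercept, se_slope, se_intercept, s
```

The value to quote with the fitted gradient would then be `slope ± se_slope`, under the assumptions stated above.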

The σ/√(N^3) scaling that you observed in your simulation follows from the standard error of the slope, σ/√Sxx, where Sxx = ∑(x - x̄)^2 is the spread of the x values. If your simulated points sit at unit spacing (x = 1, 2, ..., N), which I am assuming here, then Sxx = N(N^2 - 1)/12 ≈ N^3/12, so the standard error is approximately σ√12/√(N^3). One factor of 1/√N comes from averaging over more points, as for the standard error of a mean; the additional factor of 1/N comes from the x range growing in proportion to N, which gives the fit a longer lever arm. If you instead hold the x range fixed and pack more points into it, Sxx grows only in proportion to N and you recover the σ/√N behaviour you expected.
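
The scaling can also be checked numerically. The sketch below is an illustration rather than a reconstruction of the original simulation: it assumes a true line y = 2x + 1, Gaussian noise with σ = 1, and two hypothetical layouts of the x values (unit spacing, so the range grows with N, and a fixed range from 0 to 1). For each layout it compares the empirical scatter of the fitted slope with the standard result σ/√Sxx, where Sxx = ∑(x - x̄)^2.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_scatter(x, sigma, trials, rng):
    """Empirical standard deviation of the fitted slope over many noisy data sets."""
    slopes = []
    for _ in range(trials):
        # Assumed true line y = 2x + 1 plus Gaussian noise of standard deviation sigma.
        y = [2.0 * xi + 1.0 + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.stdev(slopes)


def predicted_se(x, sigma):
    """Standard error of the slope for known noise level: sigma / sqrt(Sxx)."""
    x_mean = sum(x) / len(x)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sigma / math.sqrt(sxx)


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 1.0
for n in (10, 40, 160):
    unit_spacing = [float(i) for i in range(n)]   # x range grows with N
    fixed_range = [i / (n - 1) for i in range(n)]  # x always spans 0 to 1
    for label, x in (("unit spacing", unit_spacing), ("fixed range", fixed_range)):
        empirical = slope_scatter(x, sigma, 500, rng)
        print(f"N={n:4d} {label:13s} empirical={empirical:.4g} "
              f"predicted={predicted_se(x, sigma):.4g}")
```

With unit spacing, each fourfold increase in N should shrink the scatter by roughly a factor of 8 (4^1.5); with a fixed range, by roughly a factor of 2 (4^0.5).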

I hope this helps to explain the unexpected behavior of the standard error in your simulation. Remember to always report the standard error along with your estimated slope and intercept to accurately convey the uncertainty in your fitted line. Good luck with your research!


 

1. What are fitting errors and how do they affect data analysis?

Fitting errors refer to the difference between the actual data points and the predicted values from a mathematical model. These errors can arise due to various factors such as measurement uncertainties, model assumptions, and data outliers. They can affect data analysis by skewing the results and making it difficult to accurately interpret the relationship between variables.

2. Can fitting errors be eliminated completely?

No, fitting errors cannot be completely eliminated. However, they can be reduced by using robust statistical methods, removing outliers, and carefully selecting appropriate models for the data. It is important to keep in mind that a small amount of error is expected in any data analysis and does not necessarily invalidate the results.

3. What is the relationship between fitting errors and residue?

The residue (more commonly called the residual) is the difference between an observed data point and the value predicted by the model. It is essentially the same as the fitting error for that point. For a straight-line fit to N points, the sum of the squared residuals divided by N - 2 gives an estimate of the variance of the noise.

4. How can we assess the significance of fitting errors?

The significance of fitting errors can be assessed by comparing them to the expected error or uncertainty in the measurement. This can be done by calculating standard errors or confidence intervals for the fitted parameters. Additionally, a statistical test such as chi-square can be used to check whether the scatter of the residuals is consistent with the stated measurement uncertainty: a chi-square per degree of freedom close to 1 indicates that the model describes the data to within the noise.
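
As a rough illustration of the chi-square comparison, the sketch below computes the chi-square per degree of freedom (often called the reduced chi-square). It assumes every point has the same known measurement uncertainty `sigma`; the function name and the sample numbers are invented for the example.

```python
def reduced_chi_square(y_obs, y_fit, sigma, n_params):
    """Chi-square per degree of freedom for a fit with n_params fitted parameters.

    Assumes the same known measurement uncertainty sigma on every point.
    """
    dof = len(y_obs) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    # Sum of squared residuals, each scaled by the measurement uncertainty.
    chi2 = sum(((yo - yf) / sigma) ** 2 for yo, yf in zip(y_obs, y_fit))
    return chi2 / dof


# Invented example: four measurements compared with a straight-line fit
# (two fitted parameters: slope and intercept).
observed = [1.1, 1.9, 3.2, 3.8]
fitted = [1.0, 2.0, 3.0, 4.0]
print(reduced_chi_square(observed, fitted, sigma=0.2, n_params=2))
```

A value near 1 suggests the residuals are consistent with the stated uncertainty. A value much larger than 1 points to an inadequate model or an underestimated uncertainty, and a value much smaller than 1 suggests the uncertainty was overestimated.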

5. What are some common sources of fitting errors?

Some common sources of fitting errors include measurement errors, model assumptions, and data outliers. Measurement errors can arise from equipment limitations, human error, or natural variability in the data. Model assumptions, such as linearity or normality, may not always hold true for real-world data and can contribute to fitting errors. Outliers, which are data points that are significantly different from the rest of the data, can also greatly impact the accuracy of the model and result in fitting errors.
