Individual Measurement Uncertainty vs. Standard Error of Regression

In summary, the student conducts a simple experiment with 10 trials at each x value and collects data over 30 x values, for 300 trials in total. She plots the average y values in Excel, using the standard deviations as error bars. After adding a trendline and having Excel calculate the slope and its standard error, she notices that the standard error does not use the individual y uncertainties in its calculation. This is because Excel fits the line by ordinary least squares; the error-bar input is used only to draw error bars on the chart. The standard error in the slope may not fully represent the uncertainty in the slope, but there is no underlying concept of uncertainty in the slope that is independent of the statistical method used to estimate it.
  • #1
highschoolphys
Let's say a student does a simple experiment where she conducts 10 trials at each x value (at each value of the independent variable). She collects data over 30 x values, giving her 300 total trials. For each of the 30 x values, she averages the 10 y values and she calculates the standard deviation in those 10 y values. She makes a plot of average y vs. x in Excel, and uses the standard deviations for y error bars. Assume the plot is linear. Assume there's no error/uncertainty in individual x values. Perhaps also assume the individual y errors are all uniformly distributed and equal. I want this to be the simplest possible case.

Next, the student (a) adds a linear trendline in Excel, (b) has Excel calculate the slope of her line of best fit, and (c) has Excel calculate the standard error in the slope.

I have three questions:

  1. Why doesn't the standard error in the slope use the individual y uncertainties as inputs to its calculation? What is the theoretical basis for using a standard error calculation that ignores the individual y measurement errors/uncertainties?
  2. Does the standard error in the slope fail to represent something crucial about the uncertainty in the slope, due to the fact that the standard error calculation ignores the individual y measurement uncertainties?
  3. Assume the student uses the equation from the top line (below) to calculate manually the slope of her best fit line. She then uses the rules of uncertainty propagation to propagate the individual y measurement uncertainties through this equation. In so doing, she obtains a value of uncertainty in the slope. How would this value compare to the Excel-calculated standard error in the slope?

[Image: equations for the slope of the best-fit line]
 
  • #2
highschoolphys said:
Why doesn't the standard error in the slope use the individual y uncertainties as inputs to its calculation?

There is a method that uses the uncertainties in the individual data points: a weighted least-squares fit. See here, for example, and scroll down to page 8 ("Weighted Least Squares Straight Line Fitting"):

https://www.che.udel.edu/pdf/FittingData.pdf
 
  • #3
Thanks for the reply! As I understand it, a weighted least-squares fit is used only if the y errors (and x errors, if using an orthogonal regression) differ among different data points. Under the simplest circumstances, the individual y errors are all the same, and a weighted least-squares fit simplifies into a regular unweighted regression.
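
The claim that equal uncertainties reduce a weighted fit to an unweighted one can be checked numerically. The following is a minimal sketch, not the method from the linked PDF: the data, the true line y = 2x + 1, the noise level 0.5, and the helper `weighted_line_fit` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 x values on an assumed true line y = 2x + 1,
# with an assumed noise standard deviation of 0.5 on every point.
x = np.linspace(1.0, 30.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)


def weighted_line_fit(x, y, sigma):
    """Weighted least-squares straight line with weights 1/sigma**2."""
    w = 1.0 / sigma**2
    x_bar = np.sum(w * x) / np.sum(w)
    y_bar = np.sum(w * y) / np.sum(w)
    slope = np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Same uncertainty on every point: the common weight cancels out of
# both the numerator and the denominator of the slope formula.
equal_sigma = np.full(x.size, 0.5)
wls_slope, wls_intercept = weighted_line_fit(x, y, equal_sigma)

# Ordinary (unweighted) least squares for comparison.
ols_slope, ols_intercept = np.polyfit(x, y, 1)

print("weighted slope:  ", wls_slope)
print("unweighted slope:", ols_slope)
```

With equal uncertainties the two slopes should agree to within floating-point rounding; they separate only when the uncertainties differ from point to point.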

What I'm interested in is the simplest case, where weighting is unnecessary. Even in this simplest case, I think there is a theoretical reason for seemingly ignoring the individual y measurement uncertainty when calculating the standard error of the slope.

I think the correct reason is this: if the individual y error/uncertainty is the same for all data points, then--across the entire data set--the y values will fluctuate within that fixed y uncertainty. The standard error of the regression, also called the standard error of the estimate, uses the residuals to estimate the average y error in the data set. Hence, the individual y errors are not ignored in the standard error of the regression. Rather, the standard error of the regression represents the average amount of y error in each measurement in either direction (i.e., not distinguishing between positive error wherein the y measurement is above the true value and negative error wherein the y measurement is below the true value). If this is correct, then it would answer question #1 from my original post, since I believe the standard error of the slope is calculated from the standard error of the regression.
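
The idea that the residuals carry the y scatter can be illustrated with simulated data. This is a sketch under assumed values (true line y = 2x + 1, single-measurement standard deviation 0.5, Gaussian noise), none of which come from the thread. Because the fit is made to the 30 averages, the quantity the residuals estimate here is the standard deviation of an average of 10 trials, not of a single trial.

```python
import numpy as np

rng = np.random.default_rng(1)

n_x, n_trials = 30, 10                  # 30 x values, 10 trials each
true_slope, true_intercept = 2.0, 1.0   # assumed for illustration
sigma_y = 0.5                           # assumed SD of a single y measurement

x = np.linspace(1.0, 30.0, n_x)
noise = rng.normal(0.0, sigma_y, size=(n_x, n_trials))
trials = true_intercept + true_slope * x[:, None] + noise
y_mean = trials.mean(axis=1)            # the 30 averaged y values

# Ordinary least-squares fit to the averages.
slope, intercept = np.polyfit(x, y_mean, 1)
residuals = y_mean - (intercept + slope * x)

# Standard error of the regression: residual scatter with n - 2
# degrees of freedom (two fitted parameters).
s_reg = np.sqrt(np.sum(residuals**2) / (n_x - 2))

# Standard error of the slope, built from s_reg.
s_xx = np.sum((x - x.mean()) ** 2)
se_slope = s_reg / np.sqrt(s_xx)

# Known-in-simulation SD of each averaged y value.
sd_of_mean = sigma_y / np.sqrt(n_trials)

print("standard error of regression:", s_reg)
print("sigma_y / sqrt(10):          ", sd_of_mean)
print("slope and its standard error:", slope, se_slope)
```

The first two printed values should be of similar size, which is the sense in which the regression estimates the y uncertainty from the data instead of ignoring it.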

I believe questions #2 and #3 in my original post are still unanswered. Any guidance is much appreciated!
 
  • #4
Please also correct any improper conflation of the terms "residual" and "error" in my posts, along with any other incorrect usage of statistics terms.
 
  • #5
I think the point is that in most situations you don't know the standard error of y but you have to estimate it from your data. Linear regression does exactly this.
 
  • #6
Thanks! Many physicists use the standard error in the slope (which I believe is calculated from the standard error of the regression or the SEM) as the uncertainty in the slope. This practice is what I'm interested in, particularly since in simple manipulations of data the uncertainty is propagated through the operation. In regression the approach of propagating uncertainty appears to be abandoned despite the fact that an operation is being performed to calculate the slope from other values which have uncertainty. I'm trying to better understand the justification for abandoning the uncertainty propagation rules in favor of the standard deviation value, which is calculated from the residuals.

NOTE: If I stated previously that the individual y measurement errors are KNOWN, I misspoke. In my original example, I intend for the individual y standard deviations (uncertainties) to be known and for the individual y errors to be unknown prior to the regression.
 
  • #7
highschoolphys said:
Why doesn't the standard error in the slope use the individual y uncertainties as inputs to its calculation? What is the theoretical basis for using a standard error calculation that ignores the individual y measurement errors/uncertainties?

Because Excel fits the slope to the inputs ##\{(x, \bar{y})\}## using a simple least squares method. The 'error bar' input is only used for drawing error bars on the chart; it is not used for any statistical analysis.

highschoolphys said:
Does the standard error in the slope fail to represent something crucial about the uncertainty in the slope, due to the fact that the standard error calculation ignores the individual y measurement uncertainties?

You are assuming that there is some quantifiable underlying concept of "uncertainty in the slope" independent of the statistical method that is used to estimate the slope - there isn't.

highschoolphys said:
Assume the student uses the equation from the top line (below) to calculate manually the slope of her best fit line.

That's one form of the equation for OLS (Ordinary Least Squares) regression, which is what Excel uses.

highschoolphys said:
She then uses the rules of uncertainty propagation to propagate the individual y measurement uncertainties through this equation. In so doing, she obtains a value of uncertainty in the slope. How would this value compare to the Excel-calculated standard error in the slope?

It would differ by the difference between the method she has used and OLS, which is what Excel uses. Is there any statistical basis for her calculation?

If you want a more sophisticated analysis of your data, you need to use a more sophisticated tool than OLS - either weighted least squares or some better method.
 
  • #8
Thanks so much MrAnchovy!

MrAnchovy said:
You are assuming that there is some quantifiable underlying concept of "uncertainty in the slope" independent of the statistical method that is used to estimate the slope - there isn't.

This is very helpful to me.

MrAnchovy said:
It would differ by the difference between the method she has used and OLS, which is what Excel uses. Is there any statistical basis for her calculation?

I am unaware of the statistical basis for the rules of uncertainty propagation. Here are the rules I'm thinking of (pasted in from the Wikipedia article):

[Image: uncertainty propagation formula from the Wikipedia article]


If I understand you correctly, the equation from my original post for calculating the slope is the version of OLS that Excel uses. According to that equation, the estimated slope of the best fit line is a function of the individual x and y measurements. Consider that the student used Excel to calculate the slope and the standard error of the slope. The student then used the uncertainty propagation rules pictured above to calculate the uncertainty in the slope based upon the uncertainty in the individual x and y measurements.

How would the standard error of the slope (as reported by Excel) compare to the uncertainty in the slope (as calculated by the student using the error propagation rules)?

If the two values are different, then my follow-up questions are: (a) why is there a discrepancy, and (b) which of the two values is better--the standard error of the slope or the uncertainty generated by the propagation rules cited above in this post?
 
  • #9
I don't think the independence conditions of that rule apply to the OLS calculation - think about it, the more points you add the more confident you should become in the fit (unless they are outliers) but in that equation the more points you add the greater ## s_f ## becomes.
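
For the OLS slope specifically, the propagation can be carried through by hand. The following is a sketch under the simplest-case assumptions of the original post: no uncertainty in x, and averaged values ##\bar{y}_i## that are independent and share a common standard deviation ##s_{\bar{y}}##. The symbols ##b##, ##S_{xx}##, ##s_{\bar{y}}##, ##r_i## and ##n## are notation introduced here, not taken from the thread. Since ##\sum_i (x_i - \bar{x}) = 0##, the slope can be written as a linear combination of the ##\bar{y}_i##:

$$ b = \frac{\sum_i (x_i - \bar{x})\,\bar{y}_i}{S_{xx}}, \qquad S_{xx} = \sum_i (x_i - \bar{x})^2, \qquad \frac{\partial b}{\partial \bar{y}_i} = \frac{x_i - \bar{x}}{S_{xx}} $$

Applying the propagation rule with these partial derivatives gives

$$ s_b^2 = \sum_i \left( \frac{\partial b}{\partial \bar{y}_i} \right)^2 s_{\bar{y}}^2 = \frac{s_{\bar{y}}^2 \sum_i (x_i - \bar{x})^2}{S_{xx}^2} = \frac{s_{\bar{y}}^2}{S_{xx}} $$

The standard error of the slope from an OLS fit to ##n## points has the same form, with the scatter estimated from the residuals ##r_i## about the fitted line:

$$ \mathrm{SE}(b) = \frac{s_{\mathrm{reg}}}{\sqrt{S_{xx}}}, \qquad s_{\mathrm{reg}}^2 = \frac{\sum_i r_i^2}{n - 2} $$

Under these assumptions the two results differ only in where the estimate of the y scatter comes from: the repeated trials at each x in one case, the residuals in the other. Both shrink as x values are added, because ##S_{xx}## grows. If the straight line is the right model, the two should agree on average; a large disagreement would suggest either that the line does not describe the data or that the error bars are misjudged.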

If you want to measure the goodness of fit to all the data, search for "linear regression goodness of fit"; I think an f-test is probably a better place to start than the regression coefficient.
 
  • #10
How do you estimate the individual uncertainties? Probably as ##\sum_j (y_{ij}-\bar{y}_i)^2/(n_i-1)##. However, the ##\bar{y}_i## are not independent of each other, as they are bound to lie on the regression line. So you need to solve the regression equation first. Also, by assumption, the variances are equal, so you can get a better estimate by using the combined estimate from the linear regression.
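
The gain from a combined estimate can be seen in a simulation. This is a sketch with assumed values (true line y = 2x + 1, single-measurement standard deviation 0.5, Gaussian noise) that are not part of the thread. It compares the 30 separate standard deviations, each based on 10 trials, with one estimate taken from the residuals of a line fitted to all 300 points.

```python
import numpy as np

rng = np.random.default_rng(2)

n_x, n_trials = 30, 10
sigma_y = 0.5                            # assumed SD of a single measurement
x = np.linspace(1.0, 30.0, n_x)
noise = rng.normal(0.0, sigma_y, size=(n_x, n_trials))
trials = 1.0 + 2.0 * x[:, None] + noise  # assumed true line y = 2x + 1

# Separate estimate at each x value: 9 degrees of freedom apiece.
s_each = trials.std(axis=1, ddof=1)

# Combined estimate: fit one line to all 300 points and use the
# residual scatter, which has 300 - 2 degrees of freedom.
x_all = np.repeat(x, n_trials)           # each x repeated 10 times
y_all = trials.ravel()                   # same ordering as x_all
slope, intercept = np.polyfit(x_all, y_all, 1)
resid = y_all - (intercept + slope * x_all)
s_combined = np.sqrt(np.sum(resid**2) / (y_all.size - 2))

print("per-x estimates range:", s_each.min(), "to", s_each.max())
print("combined estimate:    ", s_combined)
```

The per-x estimates scatter widely around the assumed 0.5, while the combined estimate should land much closer to it, provided the equal-variance assumption holds.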
 
  • #11
THANKS to all! This is very helpful to me.
 

1. What is the difference between individual measurement uncertainty and standard error of regression?

Individual measurement uncertainty is a measure of the spread expected in repeated measurements at a single x value, while the standard error of the regression is a measure of the typical scatter of the data points about the fitted line.

2. How are individual measurement uncertainty and standard error of regression calculated?

Individual measurement uncertainty is typically calculated by taking the standard deviation of a set of repeated measurements, while the standard error of the regression is calculated from the residuals of the least-squares fit: for a straight line, it is the square root of the sum of squared residuals divided by the number of points minus two.

3. Which one is more useful for assessing the reliability of a measurement?

Individual measurement uncertainty is more useful for assessing the reliability of a single measurement, while standard error of regression is more useful for assessing the overall accuracy of a regression model.

4. Can individual measurement uncertainty and standard error of regression be used interchangeably?

No, they are two different measures that serve different purposes and cannot be used interchangeably.

5. How can the results of individual measurement uncertainty and standard error of regression be used in practical applications?

The results of individual measurement uncertainty can be used to determine the precision of a single measurement and to identify sources of error, while the results of standard error of regression can be used to evaluate the accuracy of a regression model and make predictions based on the model.
