Root Mean Square Error, a straight line fit and a gradient issue

Homework Help Overview

The discussion revolves around the application of Root Mean Square Error (RMSE) in the context of a straight line fit for data obtained from a physics lab experiment. The original poster is exploring the theoretical aspects of RMSE and its implications when working with a small sample size of three data points.

Discussion Character

  • Conceptual clarification, Assumption checking

Approaches and Questions Raised

  • The original poster attempts to understand how to apply a correction factor to RMSE when dealing with a small number of data points. Some participants question the interpretation of the accuracy of the measurements and the implications of having only three readings.

Discussion Status

Participants are exploring the theoretical underpinnings of RMSE and its application to small sample sizes. There is a recognition of the challenges posed by limited data points, and some guidance is offered regarding the use of correction factors, though no consensus has been reached on the specific application of these factors.

Contextual Notes

The original poster notes that the RMSE is large due to the small sample size, and there is uncertainty regarding the derivation of the RMSE equation for a straight line fit. Additionally, there are constraints related to the number of readings taken, which may affect the reliability of the results.

K29
I have some measurements from a physics lab experiment and I am coding a fit for the data in Matlab. [Note: this is not a problem with Matlab; my problem here is the theory.]

In normal regression of statistics the RMSE is given by:

$$s=\frac{\sigma}{\sqrt{n}} =\sqrt{\frac{\Sigma (\epsilon _i)^2}{n(n-1)}}$$
where ##\sigma## is the standard deviation or Root Mean Square Deviation.

Now, according to my physics lab manual:

"For large n the standard error of the mean implies 68% confidence interval. For small n this is not reliable and it is necessary to multiply ##\sigma## by a certain factor t, to obtain the appropriate confidence interval."

They then give a table with t = 12.7 for n = 2 and t = 4.3 for n = 3 (t is reduced by a factor of 1/3.6 for each n).
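The two quoted factors are consistent with two-sided 95% Student's t critical values for n - 1 degrees of freedom. This is an assumption, since the manual's table is not reproduced in the thread. For 1 and 2 degrees of freedom the t quantile has a closed form, so a short Python sketch (the function names are illustrative only) can check the numbers with the standard library alone:

```python
import math

def t_crit_df1(conf):
    """Two-sided critical value of Student's t with 1 degree of freedom.

    With 1 degree of freedom the t distribution is the Cauchy distribution,
    whose quantile function is tan(pi * (p - 1/2)).
    """
    p = 0.5 + conf / 2.0  # cumulative probability of the upper bound, e.g. 0.975
    return math.tan(math.pi * (p - 0.5))

def t_crit_df2(conf):
    """Two-sided critical value of Student's t with 2 degrees of freedom.

    Inverts the CDF F(t) = 1/2 + t / (2 * sqrt(2 + t^2)).
    """
    u = conf  # equals 2p - 1 for a two-sided interval
    return u * math.sqrt(2.0 / (1.0 - u * u))

print(round(t_crit_df1(0.95), 1))  # n = 2 readings, 1 degree of freedom
print(round(t_crit_df2(0.95), 1))  # n = 3 readings, 2 degrees of freedom
```

Under that assumption the two values come out as 12.7 and 4.3, a ratio of roughly 2.95, and later entries in a t table do not shrink by a constant factor (the next one, for n = 4, is about 3.18).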

Onwards...

The root mean square error for the straight line fit is given by:
$$S_{y}=\sqrt{\frac{\Sigma(\delta y_{i}^{2})}{n-2}}$$

The error in the gradient of the straight line fit is:
$$S_{m}=S_{y}\sqrt{\frac{\Sigma x_{i}^{2}}{n \Sigma (x_{i}^{2})-(\Sigma x_{i})^2 }}$$
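A minimal Python sketch of these quantities for an unweighted least-squares line may help when checking the numbers by hand. It assumes the textbook results, in which the divisor n - 2 reflects the two fitted parameters, the gradient's standard error carries n in the numerator under the square root, and the expression with the sum of x squared in the numerator belongs to the intercept. The formula quoted above has the second form, so it is worth comparing against the lab manual. The function name and the sample data are made up for illustration.

```python
import math

def line_fit_errors(x, y):
    """Unweighted least-squares fit of y = m*x + c with standard errors.

    Returns (m, c, s_y, s_m, s_c), where s_y is the residual standard
    deviation with n - 2 degrees of freedom.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 > 0")
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx * sx                 # n*Sum(x^2) - (Sum x)^2
    m = (n * sxy - sx * sy) / delta           # gradient
    c = (sxx * sy - sx * sxy) / delta         # intercept
    ss_res = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    s_y = math.sqrt(ss_res / (n - 2))         # scatter of the points about the line
    s_m = s_y * math.sqrt(n / delta)          # standard error of the gradient
    s_c = s_y * math.sqrt(sxx / delta)        # standard error of the intercept
    return m, c, s_y, s_m, s_c

# Illustrative data only, not the measurements from the experiment.
m, c, s_y, s_m, s_c = line_fit_errors([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
print(m, c, s_y, s_m, s_c)
```

With only three points there is a single degree of freedom left for estimating the scatter, so `s_y` itself is poorly determined regardless of which expression is used.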

Now for my plot I have only 3 data points. They are, however, very accurate. The R-squared is about 0.98 (the fit explains 98% of the total variation in the data about the average).

But the RMSE is quite large because there are only 3 data points, so my error for the gradient is ridiculously large. I cannot find anywhere how the RMSE equation for the graph is actually derived, so I am having difficulty working out how, if, or where I should multiply the factor t into the RMSE equation for a straight line.

Can anyone please help? Thanks in advance
 
The statistics of your experiment come from the fact that you took 3 readings. Think about it like this: out of the "universe of possible readings" you picked 3 of them. If you take enough readings, you expect some randomness to show up.

Your statement that they are "very accurate" may be true. It sounds like it may be based on your domain knowledge. Perhaps you can only take 3 readings due to time or cost constraints. You should document your procedure so that anybody else who wants to repeat it can get comparable results.
 
A technical point: to my knowledge, ##\sigma## usually denotes the standard deviation as a population parameter, while the standard error (S.E.) is the standard deviation of a random variable, estimated from sample data.
 
K29 said:
Now for my plot I have only 3 data points. They are, however, very accurate. The R-squared is about 0.98 (the fit explains 98% of the total variation in the data about the average).
By "very accurate", do you mean that you measured them very accurately, that you have some subject-matter reason to think there is very little random variation in the results, or that the R-squared of the linear regression is large? Those three ways to interpret "very accurate" have very different implications.
K29 said:
But the RMSE is quite large because there are only 3 data points, so my error for the gradient is ridiculously large. I cannot find anywhere how the RMSE equation for the graph is actually derived, so I am having difficulty working out how, if, or where I should multiply the factor t into the RMSE equation for a straight line.
I am not familiar with those multipliers for such small samples, but it sounds like you should just multiply ##S_y## by them.
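Because ##S_m## is proportional to ##S_y##, scaling ##S_y## by t scales the gradient error by the same factor. The sketch below illustrates this in Python under two assumptions that the thread does not confirm: that the factor is the two-sided 95% Student's t critical value, and that a straight-line fit leaves n - 2 degrees of freedom (the usual convention, since two parameters are fitted). Under that convention three points give one degree of freedom, so the factor would be 12.7 rather than 4.3. The lookup table, the function name, and the data are illustrative only.

```python
import math

# Two-sided 95% Student's t critical values, keyed by degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def gradient_interval(x, y):
    """Return (m, half_width) for a 95% interval on the fitted gradient.

    Assumes an unweighted least-squares line and n - 2 degrees of freedom.
    """
    n = len(x)
    dof = n - 2
    if dof not in T95:
        raise ValueError("this sketch only tabulates 3 to 7 data points")
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    delta = n * sxx - sx * sx
    m = (n * sxy - sx * sy) / delta
    c = (sxx * sy - sx * sxy) / delta
    ss_res = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    s_y = math.sqrt(ss_res / dof)       # residual standard deviation
    s_m = s_y * math.sqrt(n / delta)    # standard error of the gradient
    return m, T95[dof] * s_m            # t * s_m is the interval half-width

# Illustrative data only, not the measurements from the experiment.
m, half_width = gradient_interval([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
print(f"gradient = {m:.2f} +/- {half_width:.2f} (95%)")
```

A wide interval from three points is therefore expected even when the points lie close to the line; it reflects how little information one degree of freedom gives about the scatter.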
 