Justhanging said:
I have a couple of questions about this, and I was hoping someone with some stats knowledge could clarify.
First, when people report numbers such as 10 plus or minus 5, what does the 5 mean? Is it the standard deviation or the confidence interval or the variance? What is the relationship between all these terms?
Secondly, when a linear regression is done in Excel (or some other software) and the standard errors of the slope and intercept are calculated, how do I get from those values to the plus-or-minus value used above? Basically, what I'm asking is: how is standard error related to the standard deviation, confidence interval, or plus-or-minus values?
Also, how do I use the propagation of error equations? What do I use for the uncertainty in each variable?
There is a lot of jargon here that I don't really understand; can someone clarify?
In a linear regression computed by software, you are generally dealing with the "sample" deviation and not the true standard deviation. You *CAN* get an unbiased estimate of the standard deviation from a sample deviation. The formula is listed in some software I wrote, here:
https://www.physicsforums.com/showthread.php?t=561799
I made small mistakes in posts #1 and #4 (I was trying to work and test the problem out as I went along); please see post #5 for the correct formula.
I do discuss the ± nomenclature as specified by the National Institute of Standards and Technology (NIST), which is a published specification in common American (USA) usage. So if you come across a number like 506(1), that indicates 506 with a *standard deviation* of 1 in the last digit. That means (in this example) that the true value would fall within one unit of the last digit — between 505 and 507 — about 68.2% of the time.
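If it helps, here is a tiny Python sketch of how that concise "value(uncertainty)" notation can be decoded. The function name and the regex are mine, not anything from NIST:

```python
import re

def parse_concise(s):
    """Parse concise 'value(uncertainty)' notation, e.g. '506(1)' or
    '1.23(4)'.  The digits in parentheses are the standard deviation
    expressed in units of the value's last digit."""
    m = re.fullmatch(r"(-?\d+(?:\.(\d+))?)\((\d+)\)", s)
    if m is None:
        raise ValueError("not concise notation: %r" % s)
    value = float(m.group(1))
    decimals = len(m.group(2)) if m.group(2) else 0
    sigma = int(m.group(3)) / 10 ** decimals
    return value, sigma

print(parse_concise("506(1)"))   # (506.0, 1.0)
print(parse_concise("1.23(4)"))  # (1.23, 0.04)
```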
The typical propagation-of-error equations use the standard deviation.
For example, 32(3) + 11(2) + 5(1) would equal (32 + 11 + 5) with a standard deviation of √(3² + 2² + 1²) = √14 ≈ 3.7.
The errors (standard deviations) add as if they were orthogonal axes (Pythagorean addition, i.e. "in quadrature").
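In Python, that 32(3) + 11(2) + 5(1) example works out like this (the function name is just mine):

```python
from math import sqrt

def add_in_quadrature(*sigmas):
    """Standard deviation of a sum of uncorrelated terms: the
    individual deviations combine like perpendicular vector legs."""
    return sqrt(sum(s * s for s in sigmas))

# 32(3) + 11(2) + 5(1):
value = 32 + 11 + 5
sigma = add_in_quadrature(3, 2, 1)
print(value, sigma)  # 48 with sigma = sqrt(14), about 3.74
```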
If your error isn't reported in NIST format (for repeated measurements), then the other poster's comments apply -- this basic formula only works when the data are uncorrelated.
To check whether your data are uncorrelated, you need to look at the residuals (i.e., the difference between each data point and the fitted line). Correlation shows up visually as clusters of data, or as the data curving away from the line in a predictable way.
The ideal residual pattern for *uncorrelated* data is white random noise: when plotted, the residuals fill a roughly rectangular band and the individual points "stipple" out that band evenly across the entire line fit, with no *other* rhyme or reason to their locations.
Typical linear regression formulas base the slope of the line on the correlation coefficient times the ratio of the sample deviation on the y axis to that on the x axis (equivalently, the covariance of x and y divided by the variance of x). In software, however, other techniques such as iterative random variation are often used -- there are *many* variations on that theme. I don't even know what Excel does, myself!
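For the textbook least-squares route, here is a minimal by-hand sketch (the data are made up for illustration) showing the slope as covariance over variance, plus the residuals you would inspect:

```python
# Made-up data, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.8]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Least-squares slope: covariance of x and y over variance of x
# (equivalently, r times sy/sx).
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
slope = sxy / sxx
intercept = ybar - slope * xbar

# Residuals: data minus the fitted line -- these are what you plot
# to eyeball whether the errors look like white noise.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept)  # slope ≈ 1.93, intercept ≈ 0.21
```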
For correlated data, the error propagation formulas become more complex. In the simpler form (the Pearson statistic), the correlation turns the simple sum of squares of sample deviations into a quadratic form: it adds a cross-product term between each pair of variables, in proportion to the Pearson "correlation" value (these cross terms are the covariance matrix entries).
Error propagation (AKA numbers with uncertainties) can also take the path of choosing error bounds, which is what I think you are asking about when you say "confidence interval". In that case, the error bounds are often added directly, and no assumptions are made concerning the correlation of the data; the sum-of-squares formula is discarded.
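A minimal sketch of that bound-adding approach (function name is mine):

```python
def add_bounds(a, da, b, db):
    """Worst-case ("error bound") addition: the bounds add directly,
    with no assumption about correlation between the inputs."""
    return a + b, da + db

value, bound = add_bounds(32, 3, 11, 2)
print(value, bound)  # 43 ± 5
```

Note the contrast with the quadrature rule: the same inputs combined as standard deviations would give √(3² + 2²) ≈ 3.6, not 5.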
Caution: when multiplying two numbers with uncertainty, where each is assumed to have a standard or sample deviation, the result is *not* normally distributed, and the deviations are in fact mildly correlated.
This is a problem I am still trying to solve and understand well myself. I have discovered that the typical error propagation formulas for multiplication can be *quite* inaccurate, depending on the magnitude of the data and that of the variation (error).
I haven't solved that problem yet myself ... so if you learn anything useful, please pass it on...
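For what it's worth, one way to probe this is a quick Monte Carlo sketch (my own code, made-up numbers) comparing the usual first-order product rule against simulation:

```python
import random
from math import sqrt

random.seed(1)  # reproducible run

def empirical_product_sd(mx, sx, my, sy, n=200_000):
    """Sample standard deviation of X*Y, with X ~ N(mx, sx) and
    Y ~ N(my, sy) independent, estimated by simulation."""
    samples = [random.gauss(mx, sx) * random.gauss(my, sy)
               for _ in range(n)]
    mean = sum(samples) / n
    return sqrt(sum((v - mean) ** 2 for v in samples) / (n - 1))

mx, sx, my, sy = 10.0, 0.1, 20.0, 0.2   # 1% relative errors
# First-order product rule: relative errors add in quadrature.
formula = mx * my * sqrt((sx / mx) ** 2 + (sy / my) ** 2)
emp = empirical_product_sd(mx, sx, my, sy)
print(formula, emp)
# With small relative errors the two agree closely; try sx = 5.0 and
# sy = 10.0 and the first-order formula visibly breaks down.
```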
