# Errors and consistency

1. Apr 10, 2015

### Astudious

The inspiration for this thread is the following question:

"Without further calculation, say whether the observations are consistent with
the set of values [1.215; 1.216; 1.209; 1.212] independently reported for c."

I had previously found that c = 1.150259067 ≈ 1.15 (3 s.f.), and from error analysis that STDEV(c) = 0.0631654689 ≈ 0.06 (1 s.f.).

Now, I am wondering what, fundamentally, we would do to find out whether data are in agreement with a value we have estimated by some other means. As a general way of handling such queries, I am tempted to take the mean of the data set (here 1.213) and compare it to c +/- 1*STDEV(c) (the upper limit here is 1.213424536, which is larger than 1.213, so on this basis we would say "agreement"). But something about this approach does not ring right.

The data set has two values outside 1 standard deviation of the predicted mean. Not only that, but the error of the data set is $3.5\times10^{-3}$, and the mean value found is nowhere near that close to the value of c found earlier. Just from inspection, we can see how odd these measurements would be if our value of c were correct. I suspect this would barely pass a 2% right-tail hypothesis test.
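A minimal sketch of the comparison I am describing, using the numbers from this post (mean of the reported values against c + 1*STDEV(c)):

```python
# Reported values and my previously estimated c with its standard deviation
# (all numbers taken from this post).
data = [1.215, 1.216, 1.209, 1.212]
c = 1.150259067
sigma_c = 0.0631654689

mean = sum(data) / len(data)   # 1.213
upper = c + sigma_c            # 1.2134...

# The mean sits just inside c + 1 sigma, hence "agreement" on this basis.
print(mean, upper, mean <= upper)
```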

So how do we judge whether the calculated value is consistent with the newly measured data?

2. Apr 11, 2015

### Simon Bridge

Note, a value is consistent or not within a confidence interval ... it cannot be consistent all by itself.
A common confidence interval to call something "consistent" is 95% or 2 standard deviations from the mean.
If you are comparing two measurements, each with their own uncertainty, then you may want to look at the distribution of the difference... i.e. is the difference within 2sd of 0?
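One common way to formalise this difference test: treat the difference of two independent measurements as having uncertainty $\sqrt{s_1^2+s_2^2}$ and ask whether the difference is within $k$ standard deviations of zero. The helper below is illustrative, not from the thread:

```python
import math

def consistent(x1, s1, x2, s2, k=2.0):
    """Do two measurements agree? Checks whether |x1 - x2| lies within
    k standard deviations of zero, where the uncertainty of the
    difference is sqrt(s1**2 + s2**2) (independent errors)."""
    return abs(x1 - x2) <= k * math.sqrt(s1**2 + s2**2)

# e.g. 1.15 +/- 0.06 against the mean of the reported values,
# whose quoted uncertainty (+/- 0.0005) is comparatively negligible:
print(consistent(1.150, 0.063, 1.213, 0.0005))  # difference is about 1 sigma
```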

"the error on the data set" does not make sense in this context ... you have 4 independently quoted values ... maybe from 4 different experimenters using the same method as you but probably using different methods. You are not told what the uncertainty on each measurement was - you are not told that they all come from the same distribution (method). Implicitly, the uncertainty on each value in the data set is $\pm0.0005$ since they are quoted to 3dp.

By rule of thumb:

You have an experimental value $1.20\pm0.06$ units, so any value between 1.08 and 1.32 units will be consistent within 95% confidence limits.
(The uncertainty is usually quoted to 1 significant figure - or 2 when its leading digit is a 1 or a 2 - and the value is then rounded to the same decimal place as the uncertainty.)

The tighter the interval that still contains the reported figures, the better the agreement, so falling within the 68% confidence limits is better: that would be values between 1.14 and 1.26, and all the reported values fall within those.
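The rule-of-thumb check above can be sketched as follows, using the figures quoted in this post (1.20 +/- 0.06; a later post corrects the central value to 1.15):

```python
value, sigma = 1.20, 0.06          # experimental value and uncertainty quoted here
reported = [1.215, 1.216, 1.209, 1.212]

# Count how many reported values fall inside the 2-sigma (95%) and
# 1-sigma (68%) confidence intervals around the experimental value.
for k, label in [(2, "95%"), (1, "68%")]:
    lo, hi = value - k * sigma, value + k * sigma
    inside = [x for x in reported if lo <= x <= hi]
    print(f"{label}: [{lo:.2f}, {hi:.2f}] contains {len(inside)}/{len(reported)} values")
```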

3. Apr 11, 2015

### Astudious

I thought that the error from a set of data (for variable x) was to be estimated as $\frac{1}{2}(x_{\max}-x_{\min})$, where $x_{\max}$ is the largest value of x and $x_{\min}$ the smallest value of x in the set?
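For what it's worth, applying that half-range rule to the four reported values reproduces the $3.5\times10^{-3}$ figure mentioned in the OP:

```python
data = [1.215, 1.216, 1.209, 1.212]

# Half-range estimate of the spread: (max - min) / 2
half_range = 0.5 * (max(data) - min(data))
print(half_range)  # approximately 3.5e-3
```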

The experimental value here was actually $1.15\pm0.06$ units, which is why I put in the full unrounded values ($1.150259067 \pm 0.0631654689$).

Some of the individual values are consistent within the 68% confidence limit (1 standard deviation), but some are not. That's why the question arises of how to deal with this problem in general. As I noted in the OP, I tried looking at the mean of the data set, which is just about within 1 standard deviation of the experimental value, but I'm not convinced. From the perspective of the data set, the experimentally determined value is nowhere near within 1 standard deviation of the data's mean (if you take the uncertainty of the data set as $\frac{1}{2}(c_{\max}-c_{\min})$).

4. Apr 11, 2015

### Simon Bridge

Do not be fooled by the large number of decimal places, they are meaningless. These values are not exact and by quoting them you are not being exact.
You may keep 2dp if it makes you feel more comfortable.

This is not correct. Well, I don't know what you were told to do but this would be a poor approach to use. (It is a common method taught to beginners to propagate errors through calculations, not how you estimate errors from data.)

Note: all measurements and their errors are estimates, and there are many ways to arrive at such an estimate.

To get your value you probably took a data set of repeated measurements of the same thing by the same method. From that data you estimate the actual value by the mean of all the measurements. You estimate the uncertainty in that value - how good the estimate is - by the standard error of the mean: the standard deviation of the data set divided by the square root of the number of data points, $\sigma_{\bar x} = s/\sqrt{N}$.
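That recipe, sketched in code (the measurement values below are made up for illustration):

```python
import math

def estimate(measurements):
    """Estimate a quantity from repeated measurements: the mean gives
    the value, and the sample standard deviation divided by sqrt(N)
    gives the standard error of the mean (the uncertainty)."""
    n = len(measurements)
    mean = sum(measurements) / n
    var = sum((x - mean) ** 2 for x in measurements) / (n - 1)  # sample variance
    sem = math.sqrt(var) / math.sqrt(n)
    return mean, sem

# Hypothetical repeated measurements of the same quantity:
mean, sem = estimate([1.14, 1.17, 1.12, 1.16, 1.15])
print(f"{mean:.3f} +/- {sem:.3f}")
```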

If you want to know whether four more data points are consistent with this, you are basically asking whether they could all have come from the same distribution that gave rise to the data set you used. There are a number of different ways to go about this; the fastest is to compare each point individually. Which way is appropriate depends on what you think the values in the data set represent.
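The point-by-point comparison can be done by expressing each reported value as a number of standard deviations from the estimate (a simple z-score; this snippet is illustrative, not prescribed in the thread):

```python
c, sigma_c = 1.150, 0.063      # estimated value and its uncertainty
reported = [1.215, 1.216, 1.209, 1.212]

# How many standard deviations each reported value sits from the estimate;
# values within about 2 sigma would usually be called consistent.
for x in reported:
    z = (x - c) / sigma_c
    print(f"{x}: {z:+.2f} sigma")
```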

I already told you how to deal with the problem in general in my last post ... but don't take my word for it: