Moppy said:
Please correct me where I am wrong, but it seems to me that you could generate a very high value of sigma (e.g. 6 sigma accuracy) from a very small sample size. How then is sigma on its own reliable?
You are not being precise; that is where you are wrong. "Standard deviation" has many different meanings. Among these are:
1. The standard deviation of a probability distribution (also called the "population standard deviation").
2. The function that defines how to compute the standard deviation of a sample. This is called the "sample standard deviation". (There are at least two different definitions for which formula to use, depending on which books you consult.)
3. A specific value of the sample standard deviation function, as in the phrase "the sample standard deviation is 26.52".
4. A function that defines a formula for estimating the standard deviation of the population using as its inputs the values in a sample. This is "an estimator for the standard deviation".
5. A specific value of an estimator for the standard deviation, as in the phrase "the standard deviation is 26.52". (It's better to say "An estimate for the standard deviation is 26.52".)
You haven't said what you mean by 6 sigma "accuracy". Presumably you are alluding to the fact that the probability is very high that one random sample from a normal distribution will be within plus or minus six standard deviations of the mean, where "standard deviation" means the standard deviation of the distribution and "mean" means the mean of the distribution. I suspect that you think that a similar statement also applies to the "sample standard deviation" and "sample mean". It does not apply.
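For concreteness, a minimal sketch, assuming Python with SciPy, of the probability in question for the population mean and population standard deviation of a normal distribution:

```python
from scipy.stats import norm

# Probability that one draw from a normal distribution lies within plus or
# minus six *population* standard deviations of the *population* mean.
p_within = norm.cdf(6) - norm.cdf(-6)
print(p_within)        # roughly 0.999999998
print(1 - p_within)    # roughly 2e-9
```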
Let me see if I understand sigma.
To determine the standard deviation, I first compute the mean. For each data point, I take the difference from the mean, square it, determine the average of the squares, and then take the square root. Is this one sigma?
That is one way to define the "sample standard deviation", and it is one estimator for the population standard deviation. This estimator, on average, underestimates the population standard deviation.
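A minimal sketch, assuming Python, of the formula just described alongside the other common definition, and of the downward bias mentioned above (the population mean of 20 and standard deviation of 2 below are assumed values for illustration):

```python
import math
import random

def sd_divide_by_n(xs):
    """The formula described above: average of squared deviations, then square root."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sd_divide_by_n_minus_1(xs):
    """The other common definition (Bessel's correction), dividing by n - 1."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Rough illustration of the bias: draw many small samples from a normal
# distribution whose population standard deviation is 2, then average the estimates.
random.seed(0)
n, trials = 5, 100_000
total_n = total_n1 = 0.0
for _ in range(trials):
    xs = [random.gauss(20, 2) for _ in range(n)]
    total_n += sd_divide_by_n(xs)
    total_n1 += sd_divide_by_n_minus_1(xs)
print("average divide-by-n estimate:    ", total_n / trials)   # noticeably below 2
print("average divide-by-(n-1) estimate:", total_n1 / trials)  # closer to 2, still a bit below
```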
If it is one sigma, then with a mean of 20 and a standard deviation of 2, do values of 18-22 represent one sigma accuracy? Do values of <8 or >32 represent 6 sigma accuracy?
Are you talking about the sample mean and sample standard deviation, or the population mean and population standard deviation? And we again have the question of what you mean by "six sigma accuracy".
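If 20 and 2 are taken to be the population mean and population standard deviation, a minimal sketch, assuming Python with SciPy, of the corresponding intervals and coverage probabilities:

```python
from scipy.stats import norm

# Assuming 20 and 2 are the *population* mean and *population* standard
# deviation: the k-sigma intervals around the mean, and the probability that
# one draw from the normal distribution lands inside each of them.
mu, sigma = 20.0, 2.0
for k in (1, 6):
    lo, hi = mu - k * sigma, mu + k * sigma
    p_inside = norm.cdf(k) - norm.cdf(-k)
    print(f"{k}-sigma interval: [{lo}, {hi}], P(inside) = {p_inside:.10f}")

# [18.0, 22.0] is the one-sigma interval and [8.0, 32.0] is the six-sigma
# interval; values below 8 or above 32 lie *outside* six sigma.
```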
What additional checks or sample size must accompany the sigma value to make it reliable?
If you are talking about specific numerical values of the sample standard deviation and the sample mean, it is not reliable to think that a random sample from the distribution has a probability of nearly 1 of being within six of those particular sample standard deviations of that particular sample mean. This kind of thinking can't be made reliable.
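A rough simulation sketch, assuming Python and the divide-by-n formula discussed above, of why this is unreliable for small samples: form the interval "sample mean plus or minus six sample standard deviations" and check how often a fresh draw from the same distribution falls outside it.

```python
import random

def sample_mean_and_sd(xs):
    # Sample standard deviation using the divide-by-n formula discussed above.
    m = sum(xs) / len(xs)
    return m, (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Draw a small sample from a normal distribution with mean 20 and standard
# deviation 2 (assumed values for illustration), form the interval
# sample_mean +/- 6 * sample_sd, and see how often a *fresh* draw from the
# same distribution falls outside that interval.
random.seed(1)
trials = 200_000
for n in (2, 5, 10):
    misses = 0
    for _ in range(trials):
        xs = [random.gauss(20, 2) for _ in range(n)]
        m, s = sample_mean_and_sd(xs)
        x_new = random.gauss(20, 2)
        if abs(x_new - m) > 6 * s:
            misses += 1
    print(f"n = {n:2d}: fresh draw outside the sample 6-sigma interval "
          f"in {misses / trials:.4%} of trials")

# For the population mean and population standard deviation the corresponding
# rate would be about 2e-9; with small samples the observed rate is enormously larger.
```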