Confused about statistics terms

Discussion Overview

The discussion revolves around the definitions and calculations of standard deviation in statistics, specifically the differences between population and sample standard deviation. Participants explore the formulas used for each type and the implications of using N versus N-1 in calculations.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant expresses confusion about the definition of standard deviation, noting discrepancies between a physics text and other sources regarding the formulas used.
  • Another participant clarifies that the division by N corresponds to the population standard deviation, while division by N-1 corresponds to the sample standard deviation.
  • It is proposed that calculators typically assume a list of data represents a sample, thus using the sample standard deviation formula.
  • A participant emphasizes that the N-1 version is an unbiased estimator of the population standard deviation, not the population standard deviation itself.
  • Another participant discusses the implications of using N versus N-1, stating that the sample standard deviation is an approximation of the population standard deviation.
  • One participant requests further clarification on the distinction between the two types of standard deviation, indicating a lack of understanding from their introductory statistics course.
  • A later reply suggests that while the sample mean is an unbiased estimator of the population mean, the sample standard deviation does not directly yield the population standard deviation.
  • Another participant asserts that the true population variance can only be known by sampling every member of the population, and that repeated sampling improves the estimate.

Areas of Agreement / Disagreement

Participants express differing views on the definitions and implications of standard deviation calculations. There is no consensus on the interpretations of the formulas or the terminology used, indicating ongoing debate and uncertainty.

Contextual Notes

Participants highlight the importance of understanding the context in which standard deviation is calculated, noting that the choice between N and N-1 affects the bias of the estimates. There are also references to the limitations of calculators in determining which standard deviation formula to use.

Will
I have always thought that the operation:
(((x1 - x_ave)^2 + ... + (xN - x_ave)^2)/(N-1))^.5
was known as the standard deviation. But now my physics text says that it is:
(((x1 - x_ave)^2 + ... + (xN - x_ave)^2)/N)^.5
and another website I went to says that yes, the second equation is the correct terminology, that the first equation is the square root of the bias-corrected variance, and that the two are often confused. So what does the "standard deviation" operation on my TI-89 do, and how do I do the other one?
 
I've only had an introductory course in Statistics, but I was taught that the division by N corresponded to the population's standard deviation whereas the division by n-1 corresponded to the sample's standard deviation.

i.e.,

\sigma = \sqrt{\frac{\sum{(x-\mu)^2}}{N}} will be used to find the standard deviation of a population,

and

S = \sqrt{\frac{\sum{(x-\overline{x})^2}}{n-1}} will be used to find the standard deviation of a sample of the population.

I would assume that your calculator would assume that a list is a sample, not a population, and so would use the second equation. However, I know that the TI-83+ will give you both if you perform 1-Var Stats on the list.

Perhaps someone more educated can help you more.

cookiemonster
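
The two formulas above can be put side by side in a short Python sketch. The function names `population_sd` and `sample_sd` and the `heights` list are made up for illustration; nothing here is taken from the TI-83+ or TI-89. The standard library's `statistics.pstdev` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics

def population_sd(data):
    """Divide by N: treats `data` as the entire population."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)

def sample_sd(data):
    """Divide by n - 1: treats `data` as a sample from a larger population."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
heights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_sd(heights))  # sqrt(32 / 8) = 2.0
print(sample_sd(heights))      # sqrt(32 / 7), about 2.138

# Cross-check against the standard library versions of the same formulas.
assert math.isclose(population_sd(heights), statistics.pstdev(heights))
assert math.isclose(sample_sd(heights), statistics.stdev(heights))
```

For the same list, the n - 1 version is always the larger of the two, and the gap shrinks as the list gets longer.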
 
cookiemonster is totally correct.
Imagine you wanted to calculate the standard deviation of people's heights. If you had every single person's height, you would calculate the population standard deviation, whose formula is
(((x1 - x_ave)^2 + ... + (xN - x_ave)^2)/N)^.5

More realistically, you would estimate the standard deviation using a sample of 100 people's heights. In this case use the sample standard deviation formula
(((x1 - x_ave)^2 + ... + (xn - x_ave)^2)/(n-1))^.5
Clearly, the sample standard deviation is the default definition of standard deviation.

The sample standard deviation is usually denoted by "s" or "sigma (n-1)", and the population standard deviation by "sigma (n)". Your calculator certainly would have a "sigma (n-1)" button.
 
The difference between dividing by N vs. dividing by N-1 results from the fact that for the entire population, you can calculate the exact average, while for a sample you have an approximate average. When you calculate the mathematical expectation of the sample deviation (with N-1) it will equal the population deviation.
 
People appear to be making claims here that aren't true. The N-1 version does not give you the population s.d. It is the unbiased estimator of the population deviation, i.e. the best we can do under certain constraints. The unbiased estimator of the population mean is the sample mean, fortunately.

Just imagine we draw two samples from the same population. From what was written above one would think there are two S.D. of the population, one coming from each sample.
 
matt, would you explain the difference a bit more? I don't quite understand what you're getting at and my intro class never really got into enough detail to warrant a thorough treatment.

cookiemonster
 
Ok, so we have a population with unknown mean and standard deviation. We take a sample from it and we want to work out some statistics. We've got the mean and the ordinary standard deviation lying around (the one dividing by n). The question asked is 'do these form an unbiased estimate of the population mean and s.d.?'

Firstly, X is an unbiased estimator of a parameter Y if E(X)=Y. The sample mean is an unbiased estimator of the population mean, but if you actually work out the mean of the standard deviation, it is not the population s.d.

So we use the n-1 quantity instead, which in our calculation above we found coming into it as a measure of the bias of our estimate.

My nit-picking was that you gave the impression that it IS the standard deviation of the population. It isn't; that is unknown. It is an unbiased estimator of it, and is often called, by abuse of notation, the pop. s.d., which is subtly different, as there is an implication there that we mean more.

CORRECTION

The square of the statistic we are referring to as n-1 is the unbiased estimator of the pop. variance; it is not in general itself an unbiased estimator of the pop. s.d., see e.g. the Wolfram entry. My memory is getting terrible these days. At least I think it might be, I don't recall clearly.
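
For anyone who wants to see where the n-1 comes from, here is a sketch of the standard calculation. It assumes the sample values x_1, ..., x_n are independent draws from a population with mean \mu and variance \sigma^2. The symbols s_n and s_{n-1} are shorthand introduced here for the divide-by-n and divide-by-(n-1) statistics. The last line is Jensen's inequality applied to the square root, which is why the n-1 statistic is unbiased for the variance but in general not for the s.d.

```latex
% Assumption: x_1, ..., x_n are independent draws from a population
% with mean \mu and variance \sigma^2; \bar{x} is the sample mean.
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \\
E\left[s_{n-1}^2\right] &= \sigma^2, \qquad
E\left[s_{n}^2\right] = \frac{n-1}{n}\,\sigma^2 \\
E\left[s_{n-1}\right] &\le \sqrt{E\left[s_{n-1}^2\right]} = \sigma
\end{align*}
```

The second line uses E[(\bar{x} - \mu)^2] = \sigma^2 / n, which is where the independence assumption enters.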
 
So, if I'm reading Mathworld correctly and remembering the class correctly, if we repeatedly sampled a population and repeatedly calculated s_{N-1}^2 of these samples, the average of these values would yield the true variance?

cookiemonster
 
Not exactly. The only way to *know* the *true* population variance is to sample every member of the population. The more samples you take from a population the better the estimate will be of this.
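
The repeated-sampling question can be explored with a small simulation. This is only a sketch under assumptions that are not in the thread: a normal population with mean 0 and s.d. 2 (so the true variance is 4), samples of size 5, and a fixed random seed. The function name `average_estimates` is made up for illustration.

```python
import math
import random

def average_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average three estimators over many independent samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sum_var_n = sum_var_n1 = sum_sd_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        sum_var_n += ss / n                    # variance, dividing by n
        sum_var_n1 += ss / (n - 1)             # variance, dividing by n - 1
        sum_sd_n1 += math.sqrt(ss / (n - 1))   # s.d., dividing by n - 1
    return sum_var_n / trials, sum_var_n1 / trials, sum_sd_n1 / trials

var_n, var_n1, sd_n1 = average_estimates()
print("average variance, divide by n:    ", var_n)
print("average variance, divide by n - 1:", var_n1)
print("average s.d., divide by n - 1:    ", sd_n1)
```

Under these assumptions the averaged n - 1 variance should land near 4 and the averaged divide-by-n variance near 3.2, while the averaged n - 1 s.d. should come out somewhat below 2. Any finite number of samples still gives only an estimate; the average gets closer to the true variance as the number of samples grows.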
 
