# Standard deviation

I know that the standard deviation of the sample = standard deviation of the population divided by sqrt(n)...

However, in the following question, I don't know how to identify whether the standard deviation given is the standard deviation of the sample or of the population. Can anyone help me identify it?

At a large university, the mean age of students is 22.3 years and the standard deviation is 4 years. A random sample of 64 students is drawn. What is the probability that the average age of these students is greater than 23 years?

According to the author, the 4 given is the standard deviation of the population.
The standard deviation of the mean is 4/sqrt(64).

Why is it so?
I think it's wrong because we only picked 64 students out of the population, so the standard deviation we get is the standard deviation of the sample, not the standard deviation of the population.
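As a sketch of the author's calculation, assuming (as the problem seems to) that 4 is the known population standard deviation and that the mean of 64 ages is approximately normal by the central limit theorem: the standard error is 4/sqrt(64) = 0.5, so 23 years lies (23 - 22.3)/0.5 = 1.4 standard errors above the mean. The function name below is illustrative; only Python's standard library is used.

```python
import math

def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold), assuming a normal sampling distribution for the mean."""
    se = sigma / math.sqrt(n)                  # standard error: SD of the sample mean
    z = (threshold - mu) / se                  # how many standard errors above mu
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper tail of the standard normal

se = 4 / math.sqrt(64)
p = prob_mean_exceeds(23, 22.3, 4, 64)
print(f"standard error = {se}")   # 0.5
print(f"P(mean > 23)  = {p:.4f}")
```

Under these assumptions the probability comes out at about 0.081.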

Staff Emeritus
Homework Helper
Gold Member
I know that the standard deviation of the sample = standard deviation of the population divided by sqrt(n)...
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. It also depends on how student ages are distributed. The author may be implicitly assuming that the distribution is Gaussian, but I see no reason why this would be the case.
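A quick simulation can illustrate the difference between the spread of ages within one sample and the spread of the sample mean. This is a sketch under an assumed, deliberately non-Gaussian age distribution (18.3 plus an exponential with mean 4, chosen only because it has mean 22.3 and standard deviation 4); it is not data from the problem.

```python
import random
import statistics

random.seed(0)

MU, SIGMA, N, REPS = 22.3, 4.0, 64, 5000

def draw_age():
    # Hypothetical skewed age distribution: 18.3 + Exponential(mean 4)
    # has mean 22.3 and standard deviation 4, but is clearly not Gaussian.
    return 18.3 + random.expovariate(1 / SIGMA)

sample_sds = []
sample_means = []
for _ in range(REPS):
    sample = [draw_age() for _ in range(N)]
    sample_sds.append(statistics.stdev(sample))     # spread of ages inside one sample
    sample_means.append(statistics.fmean(sample))   # one sample mean per sample

avg_sample_sd = statistics.fmean(sample_sds)
sd_of_means = statistics.pstdev(sample_means)
print(f"average SD within a sample: {avg_sample_sd:.3f}")  # close to 4
print(f"SD of the sample means:     {sd_of_means:.3f}")    # close to 4 / sqrt(64) = 0.5
```

The SD of the sample means lands near 0.5 even though the ages are skewed, because sigma/sqrt(n) holds for any distribution with finite variance. What does depend on the distribution is how well a normal curve approximates the tail probability the question asks for.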

tzx9633
Why? Can you explain further?

Isn't this a case where we need to estimate the population standard deviation from the sample standard deviation?

Staff Emeritus
Homework Helper
Gold Member
No. This thread should really go in the homework section, where you need to show what you have done and explain your own thoughts about the problem in detail.

tzx9633
What I think is:

We only take a random sample from the population; it's practically impossible to take the whole population.
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.

Staff Emeritus
Homework Helper
Gold Member
As I said, the standard deviation of the sample is typically going to be the same as that of the population. Do not confuse it with the standard deviation in the estimate of the mean.

tzx9633
Why is the standard deviation of the population going to be the same as the standard deviation of the sample?

Staff Emeritus
Homework Helper
Gold Member
Because the standard deviation of a sample is defined in such a way that it is a good estimator for the standard deviation of the stochastic variable being sampled. This should be in any basic textbook on statistics.

Mentor
2022 Award
The standard deviation of a sample is, in general, an estimator of the standard deviation of the random variable. This is where we need additional information about its distribution, since the estimator is itself a random variable. In the best case it is identically distributed. There are relations between the required sample size, the desired confidence interval, and the overall standard deviation. I haven't checked your figures and didn't see how the students' ages are distributed. Maybe this information is sufficient to get the required equation.
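One common textbook form of the relation between sample size, confidence interval, and standard deviation is the formula for estimating a mean with known sigma: n = ceil((z * sigma / E)^2), where E is the desired margin of error. A minimal sketch, assuming a normal sampling distribution and a roughly 95% level (z = 1.96); the function name is illustrative only.

```python
import math

def required_sample_size(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin (known sigma, normal approximation)."""
    return math.ceil((z * sigma / margin) ** 2)

# sigma = 4 years; estimate the mean age to within +/- 1 year at about 95% confidence
print(required_sample_size(4, 1))    # 62
# Halving the margin roughly quadruples the required sample size
print(required_sample_size(4, 0.5))  # 246
```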

Gold Member
I think it is assumed in the problem (I don't know how realistically) that the population s.d. is somehow known to be 4.

Gold Member
The sample itself will (on average) have the same standard deviation as the population.

It's a technical point, but this is subtly wrong. On average, your variance estimate (the usual one, dividing by n - 1) should line up with the population variance. (Why? Because you have linearity with respect to independent samples/trials when estimating ##X^2## and ##X##, and can then apply the weak law of large numbers.)

Setting aside the zero variance case (and assuming a finite second moment):

The square root function ##g(v) = \sqrt{v}## is strictly concave over the positive numbers (i.e., its negative is strictly convex), which by Jensen's inequality means ##E\big[g(v)\big] \lt g\big(E[v]\big)##. Hence if the variance is right on average, the standard deviation cannot be.

As a note to OP: this is one of many reasons to generally work with variance, not standard deviation.
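The argument above can be checked numerically. A minimal sketch, assuming normally distributed data with sigma = 4 and a deliberately small sample size n = 5: the n - 1 variance averages to sigma^2, the divide-by-n variance averages low, and the square root of the n - 1 variance averages below sigma, as Jensen's inequality predicts. The helper `sample_variance` is hypothetical, written here only to make both denominators explicit.

```python
import math
import random

random.seed(1)

SIGMA, N, REPS = 4.0, 5, 100_000  # small n makes the bias easy to see

def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the sample mean, divided by len(xs) - ddof."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

total_var_n1 = total_var_n = total_sd_n1 = 0.0
for _ in range(REPS):
    xs = [random.gauss(0.0, SIGMA) for _ in range(N)]  # assumed normal data
    v = sample_variance(xs)                            # divides by n - 1
    total_var_n1 += v
    total_var_n += sample_variance(xs, ddof=0)         # divides by n
    total_sd_n1 += math.sqrt(v)                        # SD = square root of the n - 1 variance

mean_var_n1 = total_var_n1 / REPS
mean_var_n = total_var_n / REPS
mean_sd_n1 = total_sd_n1 / REPS

print(f"average n-1 variance: {mean_var_n1:.3f}  (sigma^2 = {SIGMA ** 2})")
print(f"average n variance:   {mean_var_n:.3f}  ((n-1)/n * sigma^2 = {(N - 1) / N * SIGMA ** 2})")
print(f"average n-1 SD:       {mean_sd_n1:.3f}  (sigma = {SIGMA})")
```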

Staff Emeritus
Homework Helper
Gold Member
It's a technical point, but this is subtly wrong. On average your variance estimate should line up with that of the population. (Why? Because you have linearity with respect to independent samples/trials and estimating ##X^2## and ##X##, and can then apply weak law of large numbers.)
I disagree; there is nothing subtle about it. I would agree that it is just "wrong". I glossed over this point given the OP's level, but errors should be corrected.

fresh_42
Gold Member
The square root function ##g(v) = \sqrt{v}## is strictly concave over the positive numbers (i.e., its negative is strictly convex), which by Jensen's inequality means ##E\big[g(v)\big] \lt g\big(E[v]\big)##. Hence if the variance is right on average, the standard deviation cannot be.
Hmm, why do you use < instead of <=? In that case I don't quite agree with "cannot be" (a definite statement), and would instead say "may not be".

Gold Member

Look up strict convexity and its implications for the use of Jensen's inequality.
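For reference, here is the standard statement of Jensen's inequality in the concave case, including the strict version at issue. Assume ##v## takes values in an interval where ##g## is defined and ##E[v]## is finite:

$$E\big[g(v)\big] \le g\big(E[v]\big) \quad \text{for concave } g,$$
$$E\big[g(v)\big] \lt g\big(E[v]\big) \quad \text{for strictly concave } g, \text{ unless } v \text{ is almost surely constant.}$$

With ##g(v) = \sqrt{v}## and ##v## the variance estimate, ##v## is not almost surely constant once the zero-variance case is set aside, so the strict inequality applies.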

ChrisVer
Gold Member
OK makes sense now, thanks.

Relating statistical problems to the theory of probability is often difficult because of ambiguities in terminology. (These ambiguities are traditional in the field of statistics and not the fault of students.)

One difficulty is that a statistic computed from a "sample" of a given size can also be considered to be a "population".

If we consider the means of samples of size n to be a "population", that population has a certain distribution. This distribution has its own standard deviation. With that interpretation, the statement:

I know that the standard deviation of sample = standard deviation of population divided by sqrt(n) ...

is correct if "standard deviation of sample" signifies "standard deviation of the population of sample means".

However, keep in mind that the phrases "sample standard deviation" and "standard deviation of the sample" can have other interpretations. They might refer to one of the various formulae for estimating the standard deviation of the population from the data in a sample. (Have you studied "estimators" yet?) They might also refer to a single number, such as in the statement "the standard deviation of the sample was 13.97".
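To illustrate that "the standard deviation of the sample" can mean more than one formula, here is a small sketch using Python's `statistics` module, which exposes both conventions: `pstdev` divides the sum of squared deviations by n, while `stdev` divides by n - 1. The data values are made up for illustration and are not from the problem.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

pop_style = statistics.pstdev(data)    # divides by n: treats the data as the whole population
sample_style = statistics.stdev(data)  # divides by n - 1: estimates a population SD from a sample

print(pop_style)     # 2.0
print(sample_style)  # about 2.138, i.e. sqrt(32 / 7)
```

Both numbers could honestly be reported as "the standard deviation of the sample", which is exactly the ambiguity described above.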

(As an example of the tangles caused by ambiguous terminology, see the old thread: https://www.physicsforums.com/threads/standard-deviation-in-excel.371424/ )

The standard deviation of the mean is 4/sqrt(64).

Why is it so?
I think it's wrong because we only picked 64 students out of the population, so the standard deviation we get is the standard deviation of the sample, not the standard deviation of the population.

By itself, the term "mean" is ambiguous. It might signify the mean of the distribution of the ages of students, or it might refer to the mean of the population of means of samples of size n. You are correct that 4/sqrt(64) is not the standard deviation of the distribution of individual students' ages. The author is correct that 4/sqrt(64) is the standard deviation of the population of means of samples of size 64.

WWGD
Number Nine
The sample itself will (on average) have the same standard deviation as the population.

The "sample standard deviation" (the square root of the sum of squared deviations from the mean, divided by n) is a biased estimator of the population SD. In fact, there is no simple unbiased estimator of the standard deviation that works for every distribution; even for a normal distribution it takes a distribution-specific correction factor.
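For the normal case specifically, a classical correction is available: if s is the n - 1 sample standard deviation, then E[s] = c4(n) * sigma with c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), so s / c4(n) is unbiased for sigma under normality (and only under that assumption). A minimal sketch; `c4` follows the conventional name of the constant, `math.lgamma` is used to avoid overflow for large n, and the Monte Carlo check assumes normal data with sigma = 4 and n = 5.

```python
import math
import random
import statistics

def c4(n):
    """Normal-theory bias factor: E[s] = c4(n) * sigma for the n - 1 sample SD s."""
    # lgamma avoids overflow of Gamma(n/2) for large n
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

print(c4(2))    # about 0.7979, which equals sqrt(2 / pi)
print(c4(64))   # close to 1: the bias shrinks as n grows

# Monte Carlo check, assuming normal data with sigma = 4 and n = 5
random.seed(2)
sigma, n, reps = 4.0, 5, 20_000
mean_s = statistics.fmean(
    statistics.stdev([random.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)
)
corrected = mean_s / c4(n)
print(mean_s, corrected)   # mean_s falls below sigma; corrected should be close to 4
```

The factor is tied to the normal model; for other distributions the bias of s depends on the shape of the distribution, so this correction does not carry over.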

Staff Emeritus