The Standard Deviation of Sample vs. Population in Probability Calculations

In summary, the 4 years given in the problem is the standard deviation of the population (the ages of individual students), which the problem treats as known. The standard deviation of the mean of a random sample of 64 students is 4 / sqrt(64) = 0.5 years; this is the standard deviation of the sample mean, not of the sample itself. The standard deviation computed from a sample is an estimator of the standard deviation of the stochastic variable being sampled, but it is biased: even when the variance estimate is right on average, the standard deviation estimate is not. This is one reason to work with variance rather than standard deviation.
  • #1
tzx9633
I know that the standard deviation of sample = standard deviation of population divided by sqrt(n) ...

However, in the following question, I don't know how to tell whether the standard deviation given is the standard deviation of the sample or the standard deviation of the population ... Can anyone help me identify it?

At a large university, the mean age of students is 22.3 years and the standard deviation is 4 years. A random sample of 64 students is drawn. What is the probability that the average age of these students is greater than 23 years?

According to the author, the 4 given is the standard deviation of the population.
The standard deviation of the mean is 4 / sqrt(64) ...

Why is that so?
I think it's wrong because we only picked 64 students out of the population, so the standard deviation we get is the standard deviation of the sample, not the standard deviation of the population.
 
  • #2
tzx9633 said:
I know that the standard deviation of sample = standard deviation of population divided by sqrt(n) ...
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. It also depends on how student ages are distributed. The author may implicitly assume that the distribution is Gaussian, but I see no reason why this would be the case.
 
  • #3
Orodruin said:
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. It also depends on how student ages are distributed. The author may implicitly assume that the distribution is Gaussian, but I see no reason why this would be the case.
Why? Can you explain further?
 
  • #4
Orodruin said:
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. It also depends on how student ages are distributed. The author may implicitly assume that the distribution is Gaussian, but I see no reason why this would be the case.
Isn't this a case where we need to estimate the population standard deviation from the sample standard deviation?
 
  • #5
tzx9633 said:
Isn't this a case where we need to estimate the population standard deviation from the sample standard deviation?
No. This thread should really go in the homework section, where you need to show what you have done and explain your own thoughts about the problem in detail.
 
  • #6
Orodruin said:
No. This thread should really go in the homework section, where you need to show what you have done and explain your own thoughts about the problem in detail.
What I think is:

We only take a random sample from the population; it's practically impossible to measure the whole population ...
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.
 
  • #7
tzx9633 said:
What I think is:

We only take a random sample from the population; it's practically impossible to measure the whole population ...
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.
As I said, the standard deviation of the sample is typically going to be the same as that of the population. Do not confuse it with the standard deviation in the estimate of the mean.
 
  • #8
Orodruin said:
As I said, the standard deviation of the sample is typically going to be the same as that of the population. Do not confuse it with the standard deviation in the estimate of the mean.
Why is the standard deviation of the population going to be the same as the standard deviation of the sample?
 
  • #9
Because the standard deviation of a sample is defined in such a way that it is a good estimator for the standard deviation of the stochastic variable being sampled. This should be in any basic textbook on statistics.
 
  • #10
The standard deviation of a sample is in general an estimator of the standard deviation of the random variable. Here is where we need additional information about its distribution, since the estimator is itself a random variable. In the best case the observations are identically distributed. There are relations between the required sample size, the desired confidence interval, and the overall standard deviation. I haven't checked your figures and didn't see how the students' ages are distributed. Maybe this information is sufficient to get the required equation.
 
  • #11
tzx9633 said:
What I think is:

We only take a random sample from the population; it's practically impossible to measure the whole population ...
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.
I think it is assumed in the problem -- I don't know how realistically -- that the population s.d. is somehow known to be 4.
 
  • #12
Orodruin said:
The sample itself will (on average) have the same standard deviation as the population.

It's a technical point, but this is subtly wrong. On average your variance estimate should line up with that of the population. (Why? Because you have linearity with respect to independent samples/trials and estimating ##X^2## and ##X##, and can then apply weak law of large numbers.)

Setting aside the zero variance case (and assuming a finite second moment):

The square root function ##g(v) = \sqrt{v}## is strictly negative convex (i.e. strictly concave) over positive numbers, which means ##E\big[g(v)\big] \lt g\big(E[v]\big)##. Hence if variance is right on average, standard deviation cannot be.

As a note to OP: this is one of many reasons to generally work with variance, not standard deviation.
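
A quick simulation can make this concrete. The following is only a rough sketch under assumptions that are not part of the problem: it draws ages from a normal distribution with standard deviation 4, and it uses a small sample size of 5 (a hypothetical choice, picked because the bias is easier to see with small samples). It averages the sample variance and the sample standard deviation over many samples.

```python
import math
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
sigma = 4.0         # population SD (value from the problem)
n = 5               # hypothetical small sample size, chosen to make the bias visible
trials = 20000

variances = []
sds = []
for _ in range(trials):
    # Assumption: normally distributed ages; the problem does not state this
    sample = [random.gauss(22.3, sigma) for _ in range(n)]
    v = statistics.variance(sample)   # sum of squared deviations / (n - 1)
    variances.append(v)
    sds.append(math.sqrt(v))

mean_var = statistics.fmean(variances)
mean_sd = statistics.fmean(sds)

print(f"average sample variance: {mean_var:.3f} (population variance {sigma**2})")
print(f"average sample SD:       {mean_sd:.3f} (population SD {sigma})")
```

Under these assumptions the average variance should land close to 16, while the average standard deviation should come out noticeably below 4, which is the Jensen's inequality effect described above.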
 
  • #13
StoneTemplePython said:
It's a technical point, but this is subtly wrong. On average your variance estimate should line up with that of the population. (Why? Because you have linearity with respect to independent samples/trials and estimating ##X^2## and ##X##, and can then apply weak law of large numbers.)
I disagree, there is nothing subtle about it. I would agree that it is just "wrong". I glossed over this point based on the OP level, but errors should be corrected.
 
  • #14
StoneTemplePython said:
The square root function ##g(v) = \sqrt{v}## is strictly negative convex over positive numbers and which means ##E\big[g(v)\big] \lt g\big(E[v]\big)##. Hence if variance is right on average, standard deviation cannot be.
Hmm, why do you use < instead of <= ? In which case I don't quite agree with the "cannot be" (definite statement), and instead would say "may not be".
 
  • #15
ChrisVer said:
Hmm, why do you use < instead of <= ? In which case I don't quite agree with the "cannot be" (definite statement), and instead would say "may not be".

Look up strict convexity and its implications for the use of Jensen's inequality.
 
  • #16
StoneTemplePython said:
Look up strict convexity and its implications for the use of Jensen's inequality.
OK makes sense now, thanks.
 
  • #17
Relating statistical problems to the theory of probability is often difficult because of ambiguities in terminology. (These ambiguities are traditional in the field of statistics and not the fault of students.)

One difficulty is that a statistic computed from a "sample" of a given size can also be considered to be a "population".

If we consider the means of samples of size n to be a "population" , that population has a certain distribution. This distribution has its own standard deviation. With that interpretation, the statement:

I know that the standard deviation of sample = standard deviation of population divided by sqrt(n) ...

is correct if "standard deviation of sample" signifies "standard deviation of the population of sample means".
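
To illustrate the "population of sample means" idea, here is a small simulation sketch. The age model in it is purely hypothetical (a shifted gamma distribution chosen only so that the mean is 22.3 and the standard deviation is 4); the problem itself says nothing about how ages are distributed. The point is that the spread of the sample means is governed by sigma / sqrt(n) even when the individual ages are skewed.

```python
import math
import random
import statistics

random.seed(1)      # fixed seed so the run is repeatable
n = 64              # sample size from the problem
trials = 4000       # number of samples of size n to draw

def draw_age():
    # Hypothetical skewed age model (an assumption, not from the problem):
    # shifted gamma with mean 14.3 + 4 * 2 = 22.3 and SD sqrt(4) * 2 = 4.
    return 14.3 + random.gammavariate(4.0, 2.0)

# Build the "population of sample means": one mean per sample of size n
sample_means = [
    statistics.fmean([draw_age() for _ in range(n)])
    for _ in range(trials)
]

sd_of_means = statistics.pstdev(sample_means)
print(f"SD of the sample means: {sd_of_means:.3f}")
print(f"sigma / sqrt(n):        {4.0 / math.sqrt(n):.3f}")
```

The two printed numbers should be close to each other, whereas the standard deviation inside any single sample stays near 4.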

However, keep in mind that the phrases "sample standard deviation" and "standard deviation of the sample" can have other interpretations. They might refer to one of the various formulae for estimating the standard deviation of the population from the data in a sample. (Have you studied "estimators" yet?) They might also refer to a single number, such as in the statement "the standard deviation of the sample was 13.97".

(As an example of the tangles caused by ambiguous terminology, see the old thread: https://www.physicsforums.com/threads/standard-deviation-in-excel.371424/ )

The standard deviation of the mean is 4 / sqrt(64) ...

Why is that so?
I think it's wrong because we only picked 64 students out of the population, so the standard deviation we get is the standard deviation of the sample, not the standard deviation of the population.

By itself, the term "mean" is ambiguous. It might signify the mean of the distribution of the ages of students or it might refer to the mean of the population of means of samples of size n. You are correct that 4/sqrt(64) is not the standard deviation of the distribution of individual students' ages. The author is correct that 4/sqrt(64) is the standard deviation of the population of means of samples of size 64.
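
For completeness, here is a minimal sketch of how the textbook answer is presumably obtained. It assumes that the mean of 64 ages is approximately normally distributed (the usual central limit theorem argument), which the problem does not state explicitly. The variable names are only illustrative.

```python
import math

mu = 22.3          # population mean age (from the problem)
sigma = 4.0        # population standard deviation (from the problem)
n = 64             # sample size (from the problem)
threshold = 23.0   # we want P(sample mean > 23)

# Standard deviation of the population of sample means
se = sigma / math.sqrt(n)

# z-score of the threshold, assuming the sample mean is roughly normal
z = (threshold - mu) / se

# Upper-tail probability of a standard normal via the complementary error function
p = 0.5 * math.erfc(z / math.sqrt(2))

print(f"SD of the mean = {se}, z = {z:.2f}, P(mean > {threshold}) is about {p:.4f}")
```

With these numbers the standard deviation of the mean is 0.5 and z is 1.4, giving a probability of roughly 0.08. If the normal approximation is not trusted, this figure is only indicative.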
 
  • #18
Orodruin said:
The sample itself will (on average) have the same standard deviation as the population.

The "sample standard deviation" (the square root of the sum of squared deviations from the mean divided by n) is a biased estimator of the population SD. It's actually very difficult to construct an unbiased estimator of the standard deviation, even for a normal distribution.
 
  • #19
Number Nine said:
The "sample standard deviation" (the square root of the sum of squared deviations from the mean divided by n) is a biased estimator of the population SD. It's actually very difficult to construct an unbiased estimator of the standard deviation, even for a normal distribution.
This was already discussed and sorted out. See posts #12 and #13.
 

What is the Standard Deviation of a Sample?

The Standard Deviation of a Sample is a measure of how spread out the data points in a sample are from the sample mean. It is usually calculated by finding the difference between each data point and the sample mean, squaring those differences, summing them, dividing by n - 1 (one less than the sample size), and then taking the square root. Some texts divide by n instead.

What is the Standard Deviation of a Population?

The Standard Deviation of a Population is a measure of how spread out the data points in a population are from the population mean. It is calculated in a similar way to the Standard Deviation of a Sample, but it uses all of the data points in the population, not just a subset of them, and divides the sum of squared differences by the population size N.

Why do we use Standard Deviation in Probability Calculations?

Standard Deviation is used in Probability Calculations because it helps us understand the spread of data and how likely it is for a data point to fall within a certain range. It is particularly useful for normal distributions, where about 68%, 95%, and 99.7% of data points fall within one, two, and three standard deviations of the mean, respectively.

What is the difference between the Standard Deviation of a Sample and a Population?

The main difference between the Standard Deviation of a Sample and a Population is that the Sample Standard Deviation is an estimate of the Population Standard Deviation. This is because we are using a smaller subset of data points in the sample, rather than the entire population. As the sample size increases, the Sample Standard Deviation tends to get closer to the Population Standard Deviation. Neither should be confused with the standard deviation of the sample mean, which is the Population Standard Deviation divided by sqrt(n).

How do we calculate the Standard Deviation of a Sample and a Population?

The formulas for the Standard Deviation of a Sample and a Population are similar but not identical. In both cases we find the difference between each data point and the mean, square those differences, and sum them. For a population, we use all of the data points and divide the sum by N before taking the square root. For a sample, we use a subset of the data points and conventionally divide by n - 1 instead of n before taking the square root, which makes the variance estimate unbiased.
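
As a small illustration of the two divisors, the sketch below uses Python's standard `statistics` module, where `pstdev` divides by the number of data points and `stdev` divides by one less. The list of ages is made up for the example and is not taken from the problem.

```python
import statistics

# Hypothetical ages, invented for illustration only
ages = [19, 21, 22, 24, 29]

# Treating the list as a whole population: divide by N
pop_sd = statistics.pstdev(ages)

# Treating the list as a sample from a larger population: divide by n - 1
sample_sd = statistics.stdev(ages)

print(f"population-style SD (divide by N):   {pop_sd:.4f}")
print(f"sample-style SD (divide by n - 1):   {sample_sd:.4f}")
```

For the same data the n - 1 version is always the larger of the two, and the gap shrinks as the number of data points grows.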
