The Standard Deviation of Sample vs. Population in Probability Calculations

Discussion Overview

The discussion revolves around the distinction between the standard deviation of a sample and that of a population in the context of probability calculations. Participants explore how to identify which standard deviation is being referred to in a specific problem involving the ages of university students. The conversation includes theoretical considerations, interpretations of statistical terms, and the implications of sample size on standard deviation.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants assert that the standard deviation of a sample is equal to the standard deviation of the population divided by the square root of the sample size, while others challenge this interpretation.
  • One participant argues that the standard deviation given in the problem (4 years) should be considered as the standard deviation of the sample, not the population.
  • Another participant suggests that the sample's standard deviation will, on average, match that of the population, but this is contested by others who emphasize the need for clarity on the distribution of ages.
  • There is a discussion about the implications of estimating population parameters from sample statistics, with some participants noting that additional information about the distribution is necessary for accurate calculations.
  • Technical points are raised regarding the relationship between variance and standard deviation, with some participants discussing the properties of estimators and the implications of Jensen's Inequality.
  • One participant highlights the ambiguity in terminology within statistics, noting that a statistic computed from a sample can also be viewed as a population, which complicates the discussion.

Areas of Agreement / Disagreement

Participants express differing views on whether the standard deviation in the problem is that of the sample or the population. There is no consensus on the correct interpretation, and multiple competing perspectives remain throughout the discussion.

Contextual Notes

Participants note that the lack of information regarding the distribution of student ages adds complexity to the problem. Additionally, the discussion reveals potential ambiguities in the terminology used in statistics, which may lead to misunderstandings about standard deviation and its estimation.

tzx9633
I know that the standard deviation of a sample = standard deviation of the population divided by sqrt(n) ...

However, in the following question, I don't know how to identify whether the standard deviation given is that of the sample or that of the population. Can anyone help me identify it?

At a large university, the mean age of students is 22.3 years and the standard deviation is 4 years. A random sample of 64 students is drawn. What is the probability that the average age of these students is greater than 23 years?

According to the author, the 4 given is the standard deviation of the population.
The standard deviation of the mean is 4 / sqrt(64).

Why is it so?
I think it's wrong because we only picked 64 students out of the population, so the standard deviation we get is the standard deviation of the sample, not the standard deviation of the population.
 
tzx9633 said:
I know that the standard deviation of a sample = standard deviation of the population divided by sqrt(n) ...
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. The answer also depends on how student ages are distributed. The author may implicitly assume that the distribution is Gaussian, but I see no reason why this would be the case.
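
The distinction between the standard deviation within a sample and the standard deviation of the estimate of the mean can be illustrated with a small simulation. This is a hedged sketch rather than part of the original reply: it assumes ages are Gaussian with mean 22.3 and standard deviation 4 (the question does not state the distribution), and the function name `simulate` and its parameters are illustrative.

```python
import random
import statistics

def simulate(n=64, trials=2000, mu=22.3, sigma=4.0, seed=0):
    """Draw many samples of size n and compare two different spreads.

    Assumption (not given in the question): ages are Gaussian(mu, sigma).
    """
    rng = random.Random(seed)
    sample_sds = []    # spread of ages inside each sample
    sample_means = []  # one mean per sample
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_sds.append(statistics.stdev(sample))
        sample_means.append(statistics.fmean(sample))
    # Average within-sample SD, and SD of the collection of sample means
    return statistics.fmean(sample_sds), statistics.stdev(sample_means)

avg_sd_within, sd_of_means = simulate()
print("average SD within a sample:", round(avg_sd_within, 2))
print("SD of the sample means:    ", round(sd_of_means, 2))
print("sigma / sqrt(n):           ", 4.0 / 64 ** 0.5)
```

Under these assumptions the first number should come out close to 4 and the second close to 4/sqrt(64) = 0.5, which is the quantity the textbook solution uses.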
 
Orodruin said:
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. The answer also depends on how student ages are distributed. The author may implicitly assume that the distribution is Gaussian, but I see no reason why this would be the case.
Why? Can you explain further?
 
Orodruin said:
I think you had better unlearn that. You may be thinking of the standard deviation in the estimate of the population mean.

The sample itself will (on average) have the same standard deviation as the population.

In addition, the question gives too little information to be solved. The answer also depends on how student ages are distributed. The author may implicitly assume that the distribution is Gaussian, but I see no reason why this would be the case.
Isn't this a case where we need to estimate the population standard deviation from the sample standard deviation?
 
tzx9633 said:
Isn't this a case where we need to estimate the population standard deviation from the sample standard deviation?
No. This thread should really go in the homework section, where you need to show what you have done and explain your own thought about the problem in detail.
 
Orodruin said:
No. This thread should really go in the homework section, where you need to show what you have done and explain your own thought about the problem in detail.
What I think is:

We only take a random sample from the population; it's practically impossible to measure the whole population.
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.
 
tzx9633 said:
What I think is:

We only take a random sample from the population; it's practically impossible to measure the whole population.
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.
As I said, the standard deviation of the sample is typically going to be the same as that of the population. Do not confuse it with the standard deviation in the estimate of the mean.
 
Orodruin said:
As I said, the standard deviation of the sample is typically going to be the same as that of the population. Do not confuse it with the standard deviation in the estimate of the mean.
Why is the standard deviation of the population going to be the same as the standard deviation of the sample?
 
Because the standard deviation of a sample is defined in such a way that it is a good estimator for the standard deviation of the stochastic variable being sampled. This should be in any basic textbook on statistics.
 
  • #10
The standard deviation of a sample is in general an estimator of the standard deviation of the random variable. This is where we need additional information about its distribution, since the estimator is itself a random variable. In the best case it is identically distributed. There are relations between the sample size required, the desired confidence interval, and the overall standard deviation. I haven't checked your figures and didn't see how the students' ages are distributed. Maybe this information is sufficient to get the required equation.
 
  • #11
tzx9633 said:
What I think is:

We only take a random sample from the population; it's practically impossible to measure the whole population.
So the standard deviation of 4 means the standard deviation of the sample, not the standard deviation of the population.
I think it is assumed in the problem -- don't know how realistically -- that the population s.d is somehow known to be 4.
 
  • #12
Orodruin said:
The sample itself will (on average) have the same standard deviation as the population.

It's a technical point, but this is subtly wrong. On average your variance estimate should line up with that of the population. (Why? Because you have linearity with respect to independent samples/trials and estimating ##X^2## and ##X##, and can then apply weak law of large numbers.)

Setting aside the zero variance case (and assuming a finite second moment):

The square root function ##g(v) = \sqrt{v}## is strictly negative convex (i.e. strictly concave) over the positive numbers, which means ##E\big[g(v)\big] \lt g\big(E[v]\big)##. Hence if variance is right on average, standard deviation cannot be.

As a note to OP: this is one of many reasons to generally work with variance, not standard deviation.
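
The Jensen's Inequality argument can be checked numerically. The following is a minimal sketch, not part of the original post: it assumes normally distributed data with population standard deviation 4 and small samples of size 5 (both illustrative choices), and it uses the n - 1 denominator for the variance estimate. The function name `average_estimates` is hypothetical.

```python
import math
import random
import statistics

def average_estimates(n=5, trials=20000, sigma=4.0, seed=1):
    """Average the variance estimate and its square root over many samples.

    Assumption: data are Gaussian with standard deviation sigma.
    """
    rng = random.Random(seed)
    variances = []
    sds = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        v = statistics.variance(sample)  # divides by n - 1 (unbiased for variance)
        variances.append(v)
        sds.append(math.sqrt(v))         # concave transform of the estimate
    return statistics.fmean(variances), statistics.fmean(sds)

mean_var, mean_sd = average_estimates()
print("average variance estimate:", round(mean_var, 2), "(population value 16)")
print("average SD estimate:      ", round(mean_sd, 2), "(population value 4)")
```

Under these assumptions the averaged variance should land near 16, while the averaged standard deviation should fall visibly below 4. The gap shrinks as the sample size grows.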
 
  • #13
StoneTemplePython said:
It's a technical point, but this is subtly wrong. On average your variance estimate should line up with that of the population. (Why? Because you have linearity with respect to independent samples/trials and estimating ##X^2## and ##X##, and can then apply weak law of large numbers.)
I disagree, there is nothing subtle about it. I would agree that it is just "wrong". I glossed over this point based on the OP level, but errors should be corrected.
 
  • #14
StoneTemplePython said:
The square root function ##g(v) = \sqrt{v}## is strictly negative convex (i.e. strictly concave) over the positive numbers, which means ##E\big[g(v)\big] \lt g\big(E[v]\big)##. Hence if variance is right on average, standard deviation cannot be.
Hmm, why do you use < instead of <= ? In which case I don't quite agree with the "cannot be" (definite statement), and instead would say "may not be".
 
  • #15
ChrisVer said:
Hmm, why do you use < instead of <= ? In which case I don't quite agree with the "cannot be" (definite statement), and instead would say "may not be".

Look up strict convexity and its implications for the use of Jensen's Inequality.
 
  • #16
StoneTemplePython said:
Look up strict convexity and its implications for the use of Jensen's Inequality.
OK makes sense now, thanks.
 
  • #17
Relating statistical problems to the theory of probability is often difficult because of ambiguities in terminology. (These ambiguities are traditional in the field of statistics and not the fault of students.)

One difficulty is that a statistic computed from a "sample" of a given size can also be considered to be a "population".

If we consider the means of samples of size n to be a "population" , that population has a certain distribution. This distribution has its own standard deviation. With that interpretation, the statement:

I know that the standard deviation of sample = standard deviation of population divided by sqrt(n) ...

is correct if "standard deviation of sample" signifies "standard deviation of the population of sample means".

However, keep in mind that the phrases "sample standard deviation" and "standard deviation of the sample" can have other interpretations. They might refer to one of the various formulae for estimating the standard deviation of the population from the data in a sample. (Have you studied "estimators" yet?) They might also refer to a single number, such as in the statement "the standard deviation of the sample was 13.97".

(As an example of the tangles caused by ambiguous terminology, see the old thread: https://www.physicsforums.com/threads/standard-deviation-in-excel.371424/ )

The standard deviation of the mean is 4 / sqrt(64).

Why is it so?
I think it's wrong because we only picked 64 students out of the population, so the standard deviation we get is the standard deviation of the sample, not the standard deviation of the population.

By itself, the term "mean" is ambiguous. It might signify the mean of the distribution of the ages of students or it might refer to the mean of the population of means of samples of size n. You are correct that 4/sqrt(64) is not the standard deviation of the distribution of individual students' ages. The author is correct that 4/sqrt(64) is the standard deviation of the population of means of samples of size 64.
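
With that reading, the textbook calculation can be sketched as follows. This is an illustrative sketch under two assumptions that the thread itself flags: that 4 is the known population standard deviation, and that the mean of 64 ages is approximately normal (central limit theorem), since the age distribution is not given. The function name `prob_sample_mean_exceeds` is hypothetical.

```python
import math

def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) under a normal approximation.

    Assumptions: sigma is the population SD, and the sample mean is
    approximately normal with SD sigma / sqrt(n).
    """
    se = sigma / math.sqrt(n)   # SD of the population of sample means
    z = (threshold - mu) / se   # standardized distance from the mean
    # Upper-tail probability of a standard normal, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

p = prob_sample_mean_exceeds(23.0, 22.3, 4.0, 64)
print(round(p, 4))  # z = 1.4, giving about 0.0808
```

Here the standard deviation of the sample means is 4/8 = 0.5, so 23 lies 1.4 of those units above 22.3, and the upper-tail probability is roughly 8%.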
 
  • #18
Orodruin said:
The sample itself will (on average) have the same standard deviation as the population.

The "sample standard deviation" (the square root of the sum of squared deviations from the mean divided by n) is a biased estimator of the population SD. It's actually very difficult to construct an unbiased estimator of the standard deviation, even for a normal distribution.
 
  • #19
Number Nine said:
The "sample standard deviation" (the square root of the sum of squared deviations from the mean divided by n) is a biased estimator of the population SD. It's actually very difficult to construct an unbiased estimator of the standard deviation, even for a normal distribution.
This was already discussed and sorted out. See posts #12 and #13.
 
