# Standard Deviation as Function of Sample Size

In high school, I was taught that the standard deviation drops as you increase the sample size. For this reason, larger sample sizes produce less fluctuation. At the time, I didn't question this because it made sense.

Then, I was taught that the standard deviation does not drop as you increase sample size. Rather, it was the standard error that dropped. Sounded fine to me. And most of the resources I find agree -- the standard deviation might fluctuate slightly, but it does not drop with increasing sample size.

But today I came across this ISOBudgets page: http://www.isobudgets.com/introduction-statistics-uncertainty-analysis/#sample-size. It states, "Have you ever wanted to reduce the magnitude of your standard deviation? Well, if you know how small you want the standard deviation to be, you can use this function to tell you how many samples you will need to collect to achieve your goal." It goes on to provide such a function for finding this minimum n, namely √n = (desired confidence level) × (current standard deviation) / (margin of error).

Am I misreading this? I used a random number generator to produce about 500 random numbers from a normal distribution, and the standard deviation does not drop. What am I missing?
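
For reference, the formula quoted from ISOBudgets above can be rearranged to n = (z × s / E)². The sketch below assumes that "desired confidence level" means a z-score such as 1.96 for 95%; the function name and the example numbers are hypothetical and do not come from the linked page. Note that the current standard deviation `s` goes in as a fixed input, and the quantity being made small is `z * s / sqrt(n)`.

```python
import math

def required_sample_size(z, s, margin):
    """Smallest integer n with z * s / sqrt(n) <= margin.

    z      : z-score for the chosen confidence level (assumed meaning)
    s      : current standard deviation, treated as a fixed input
    margin : target margin of error
    """
    return math.ceil((z * s / margin) ** 2)

# Hypothetical numbers: z = 1.96, s = 2.0, margin of error = 0.5
n = required_sample_size(1.96, 2.0, 0.5)
print(n)  # 62
```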

StoneTemplePython
Gold Member
This is one of those things that probably needs to be written out mathematically --

standard deviation of what exactly?

I'd also suggest for now: focus on variance, not standard deviation, as variance plays better with linearity. (You can always take a square root at the end in the privacy of your own home.)

For example you could be talking about the variance of a sum of random variables. Or the variance of a rescaled (say divide by n) sum of random variables. Or not the variance in the underlying random variables but in your estimate of some attribute of them. Or ...

FactChecker
Gold Member
CORRECTION: This post is wrong. Please ignore it.

The population standard deviation, ##\sigma##, of the probability distribution does not change. However, the sample standard deviation that estimates ##\sigma## using the formula
$$S = \sqrt{ \frac {{\sum_{i=1}^N (x_i - \bar x)^2}}{N-1}}$$
does decrease as N increases. It becomes a better estimator of ##\sigma##.

But isn't the only difference between the two the fact that you divide by √N in one case and by √(N-1) in the other? That means for large N they pretty much behave the same. Or am I misunderstanding your point?

As for your first question, I am referring to the standard deviation of N measurements of (say) the mass of an object, with an underlying normal probability distribution understood.

As for the variance, why would it behave any different than the standard deviation? I mean, if the variance stays constant for increasing N, wouldn't the standard deviation as well?

FactChecker
Gold Member
I'm sorry, I got sloppy. My post was wrong. I would like to delete it.

That's cool. You should be able to, or at least edit out the text.

StoneTemplePython
Gold Member
As for your first question, I am referring to the standard deviation of N measurements of (say) the mass of an object, with an underlying normal probability distribution understood.
Can you write it out mathematically? I think I know what you mean but it is not what you just said here.

As for the variance, why would it behave any different than the standard deviation? I mean, if the variance stays constant for increasing N, wouldn't the standard deviation as well?

In this realm, people trip up on Jensen's Inequality and Triangle Inequality, a lot. Trying to preserve linearity is worth it. (Then just take square root at the end).

caveat: I may decide your question is different than what I thought and decide std deviation is nice to work with directly -- rare but it happens.

Sure, here is the standard deviation:

$$\sigma = \sqrt{ \frac{\sum_{i=1}^N (x_i-\langle x\rangle)^2}{N-1} }$$

where ##N## is the sample size, ##x_i## is an individual measurement, and ##\langle x\rangle## is the mean of all measurements.

As for Jensen's inequality and such, I'm not saying that the variance and standard deviations would rise by the same amount as N increases, only that if one increases the other must do as well and that if one stays constant, the other does as well.

StoneTemplePython
Gold Member
This is getting closer but still doesn't mathematically state your problem. So you have a random variable ##X## with a variance called ##\sigma_X^2##. I think you are talking about sampling -- specifically ##n## iid trials, and you want to estimate ##E\big[X\big]## and ##\sigma_X^2 = E\big[X^2\big] - E\big[X\big]^2##. Is that what your goal is?

The idea here is you need to estimate the mean ##E\big[X\big]## and the second moment ##E\big[X^2\big]## or ##E\big[Y\big]## where ##Y = X^2## if you like. I assume both of these exist.

As you get larger and larger samples, your estimates will concentrate about the mean. Pick a favorite limit law. As estimates concentrate about the mean, the 'distance' between the estimates and the correct value goes down. Hence the variance of your estimate (read: squared 2 norm of difference between estimates and true value) goes down.

I wouldn't worry about the divide by ##n## vs divide by ##n-1## issue here.
- - - -
It could be instructive to work through a fully fleshed problem, both the math and a simulation, involving coin tossing and estimating mean and variance. Since coin tosses have bounded (specifically 0 or 1 only in this case) results, you can get very sharp estimates on concentration about the mean via Chernoff bounds.
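
To make the coin-tossing case concrete, here is a short worked version, assuming ##n## iid tosses ##X_1, \dots, X_n## that each equal 1 with probability ##p## and 0 otherwise (the symbols ##\bar X_n## and ##t## are introduced here for illustration only). The spread of a single toss does not depend on ##n##, while the spread of the sample mean does:
$$\operatorname{Var}(X_i) = p(1-p), \qquad \operatorname{Var}\big(\bar X_n\big) = \operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{p(1-p)}{n}.$$
One Chernoff-type bound that applies to results bounded in ##[0,1]## is Hoeffding's inequality, which gives, for any ##t > 0##,
$$P\big(|\bar X_n - p| \ge t\big) \le 2e^{-2nt^2}.$$
So the chance that the estimate of the mean misses ##p## by more than a fixed ##t## shrinks exponentially in ##n##, even though ##\operatorname{Var}(X_i)## itself never changes.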

I used the NORMINV[] function in Excel to generate random numbers that are distributed normally about a mean. The distribution is continuous, unlike coin flips. As more and more numbers are generated, the standard deviation doesn't drop. Most say that as I increase the sample size, that the estimates will concentrate closer to the mean. But it doesn't appear that is happening.

StoneTemplePython
Gold Member
I think you're making a mistake by (a) using Excel (hard for others to replicate, and not known for being good at simulations -- a free Excel add-on called PopTools is decent though) and (b) starting with a normal distribution instead of something simpler, in particular coin tossing (yes, that can be approximated by a normal, but that's a different topic).

Assuming the moments exist, these limit laws are ironclad, which tells me you're doing something wrong here, but I can't read minds. The point is that the variance in your estimates comes down for bigger sample sizes. You might try a simulation with ##n=10## data points instead of ##n=500##, and compare the mean and variance estimates, as well as the variance in those estimates, after running many trials. My gut tells me that you're not calculating things the way I would, but that is the difference between pasting in a few lines of code for others to look at and working in Excel.

- - - -
It also occurs to me that the normal distribution may be too 'nice' and hence you can miss subtle differences. Again with coin tossing in mind, consider a very biased coin that has a value of 1 with probability ##p = 10^{-2}## and a value of 0 (aka tails) with probability ##1 - p##. Consider the variation in your estimates of the mean and variance of said coin when you run 10,000 trials with each trial having, say, ##100## tosses, vs running 10,000 trials with each trial having ##10,000## tosses.
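
A minimal sketch of the biased-coin experiment described above, using only the Python standard library. It is one possible reading of the suggestion, not the poster's own code: the function name is hypothetical, and the number of trials is scaled down from 10,000 to 200 so that it runs quickly in pure Python. It measures how much the estimates of the mean and of the variance scatter from trial to trial.

```python
import random
import statistics

def estimate_spread(p, tosses, trials, rng):
    """Run `trials` experiments of `tosses` coin tosses each.

    Returns the standard deviation, across trials, of
    (the estimated mean, the estimated variance).
    """
    mean_estimates = []
    var_estimates = []
    for _ in range(trials):
        heads = sum(1 for _ in range(tosses) if rng.random() < p)
        m = heads / tosses
        # For 0/1 data the (N-1)-denominator sample variance
        # simplifies to n/(n-1) * m * (1 - m).
        v = tosses / (tosses - 1) * m * (1 - m)
        mean_estimates.append(m)
        var_estimates.append(v)
    return statistics.stdev(mean_estimates), statistics.stdev(var_estimates)

rng = random.Random(0)          # fixed seed so the run is repeatable
P = 0.01                        # biased coin from the post
small = estimate_spread(P, 100, 200, rng)      # 100 tosses per trial
large = estimate_spread(P, 10_000, 200, rng)   # 10,000 tosses per trial

print("spread of mean estimate:    ", small[0], "vs", large[0])
print("spread of variance estimate:", small[1], "vs", large[1])
```

The coin's own variance, ##p(1-p)##, is the same in both runs; what should come out smaller with more tosses per trial is the scatter of the estimates.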

Thanks, I'll look into it.

Stephen Tashi
As for your first question, I am referring to the standard deviation of N measurements of (say) the mass of an object, with an underlying normal probability distribution understood.
You're failing to cope with the complicated vocabulary of statistics.

Sure, here is the standard deviation:

$$\sigma = \sqrt{ \frac{\sum_{i=1}^N (x_i-\langle x\rangle)^2}{N-1} }$$

That is an estimator for the standard deviation of the mean of N measurements. Some also call it the "sample standard deviation". Others reserve the term "sample standard deviation" for the similar expression with N instead of N-1 in the denominator.

The mean of a sample of N measurements is a random variable. It has a standard deviation (namely, the standard deviation of its probability distribution). That standard deviation is not a function of the values ##X_i## obtained in one particular sample.

Then, I was taught that the standard deviation does not drop as you increase sample size.
Which standard deviation are you talking about?

If ##X## is a random variable with standard deviation ##\sigma_X## then taking 100 samples of ##X## does not change the standard deviation of ##X##, but the random variable ##Y## defined by the mean of 100 samples of ##X## has a smaller standard deviation than ##X##. Neither of these standard deviations is the same as an estimator of a standard deviation. An estimator of a standard deviation is itself a random variable. It isn't a constant parameter associated with a probability distribution.
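
The distinction drawn above can be checked numerically. The sketch below assumes a normal ##X## with mean 10 and ##\sigma_X = 2## (values chosen only for illustration) and compares an estimate of ##\sigma_X## from one sample of 100 with the standard deviation of ##Y##, the mean of 100 samples, which is itself estimated by repeating the experiment many times.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
SIGMA_X = 2.0            # population standard deviation (assumed value)
N = 100                  # samples per experiment

def draw_sample():
    """One experiment: N independent draws of X."""
    return [rng.gauss(10.0, SIGMA_X) for _ in range(N)]

# An estimate of sigma_X from a single sample. It is a random value
# that lands near SIGMA_X; taking 100 samples does not shrink it.
one_sample_sd = statistics.stdev(draw_sample())

# Y = mean of N draws. Repeat the experiment to see how much Y varies.
means = [statistics.fmean(draw_sample()) for _ in range(2000)]
sd_of_mean = statistics.stdev(means)

print("estimate of sigma_X from one sample:", one_sample_sd)
print("standard deviation of Y:", sd_of_mean)
print("SIGMA_X / sqrt(N):", SIGMA_X / N ** 0.5)
```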

Interesting. I appreciate the feedback. So let me ask, how would YOU define the standard deviation?

Stephen Tashi
That's like asking "How do you find the place?". It isn't a specific question.

The use of "the" in the phrase "the standard deviation" suggests that it can only refer to a single thing. That is not the case. The phrase "the standard deviation" is unspecific. As @StoneTemplePython said in post #2, standard deviation of what exactly?

Terms like "standard deviation", "average", "mean" etc. have at least 3 possible meanings:

1. They may refer to a specific number obtained in a specific sample, e.g. "The mean weight in the sample of 5 apples was 0.2 kg."
2. They may refer to a specific number that gives the value of a parameter associated with a probability distribution, e.g. "We assume the population of apples has a normal distribution with mean 0.2."
3. They may refer to a random variable. For example, "The distribution of the mean of a sample of 5 apples taken from a population of apples with mean 0.2 also has a mean of 0.2." (So we can speak of "the mean of a mean", "the standard deviation of a mean", "the standard deviation of a standard deviation" etc.)

When you wrote "That is an estimator for the standard deviation of the mean of N measurements," what definition were you using?

Stephen Tashi
An estimator is a function of the values obtained in a sample. For example,
$$\sigma = \sqrt{ \frac{\sum_{i=1}^N (x_i-\langle x\rangle)^2}{N-1} }$$
is an estimator because it depends on the values ##x_i## obtained in a sample.

Since the practical use of estimators is to estimate the parameters of probability distributions, we can speak of an "estimator of the standard deviation" or "an estimator of the variance" etc.

As you know, the mean of a sample of N things chosen from a population need not be exactly equal to the mean of the population. The mean of the sample is used as an estimator of the mean of the population. Likewise we can define estimators for the standard deviation and variance.

A complicated question in statistics is to decide what formula is the "best" estimator for a population parameter. More vocabulary is needed in order to be specific about what is meant by "best". There are "unbiased estimators", "minimum variance estimators", "consistent estimators", "maximum likelihood estimators".

Furthermore, an estimator is itself a random variable because it depends on the random values that occur in a sample. So an estimator has a mean, variance, standard deviation etc. just like other random variables.

Okay, let's go back to the equation:

$$\sigma = \sqrt{ \frac{\sum_{i=1}^N (x_i-\langle x\rangle)^2}{N-1} }$$

Let's not even use the term "standard deviation." Here, ##N## is the number of samples selected from a population. The average in the equation refers to the sample mean. The ##x_i## are generated from a process modeled by a normal distribution.

What likely happens to ##\sigma## in the above equation as we increase ##N##, that is, we select more and more samples from the population? Yes, the sample mean changes in value. Understood. If it cannot be said one way or the other (that is, it could go up or down), I'm good with that.

Stephen Tashi
What likely happens to ##\sigma## in the above equation as we increase ##N##, that is, we select more and more samples from the population?

( It's traditional to put a "hat" on random variables representing estimators. So ##\hat{\sigma}## would be a better notation. However, let's use your notation.)

The graph of the probability density of the estimator ##\sigma## (as you define ##\sigma##) has a peak near the value equal to the standard deviation ##\sigma_p## of the normal distribution from which the ##x_i## are chosen. As ##N## gets larger, this peak gets taller and narrower. Hence, as ##N## gets larger, it is more probable that ##\sigma## will have a value near ##\sigma_p##.

We can contrast this with the estimator ##\mu = \frac{ \sum_{i=1}^N x_i}{N}##. The probability density of ##\mu## has a peak at ##\mu_p## = the mean of the normal distribution from which the ##x_i## are chosen. As ##N## increases this peak becomes taller and narrower. The narrowness of the peak is indicated by the standard deviation of ##\mu##, which is ##\sigma_{\mu} = \frac{ \sigma_p}{\sqrt{N}}##. The standard deviation of the distribution of ##\mu## gets smaller as ##N## becomes larger. As ##N## becomes larger it becomes more probable that the value of ##\mu## will be close to ##\mu_p##.
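
The narrowing described above can be seen in a small simulation. This is a sketch under stated assumptions: ##\sigma_p = 1##, population mean 0, and 400 repeated samples for each ##N##, none of which come from the thread. For each ##N## it collects many values of the estimator ##\sigma## (with the ##N-1## denominator) and reports where they center and how widely they scatter.

```python
import math
import random
import statistics

rng = random.Random(2)   # fixed seed so the run is repeatable
SIGMA_P = 1.0            # population standard deviation (assumed value)
REPEATS = 400            # how many samples of size N to draw

def sample_sd(xs):
    """The estimator from the thread: sqrt(sum((x - mean)^2) / (N - 1))."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

results = {}
for n in (10, 100, 1000):
    sds = [sample_sd([rng.gauss(0.0, SIGMA_P) for _ in range(n)])
           for _ in range(REPEATS)]
    # center of the estimator's distribution, and width of its peak
    results[n] = (statistics.fmean(sds), statistics.stdev(sds))

for n, (center, width) in results.items():
    print(f"N={n:5d}  typical value of estimator={center:.3f}  scatter={width:.3f}")
```

In runs like this the typical value should stay close to ##\sigma_p## for every ##N##, while the scatter of the estimator is what shrinks.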

Okay, so you state that "As ##N## gets larger, this peak gets taller and narrower." That makes sense and what I always thought. Would this not mean that ##\sigma## drops as N increases? I would normally think that a tall, narrow probability distribution would correspond to a small ##\sigma##.

Stephen Tashi
Would this not mean that ##\sigma## drops as N increases?

No. ##\sigma## is not a single number. ##\sigma## is a random variable. The graph of the probability distribution of ##\sigma## has a peak in probability (the ##y## value) near the ##x## value ##\sigma_p##.

You can see graphs of probability densities for ##s = \sqrt{ \frac{ \sum_{i=1}^N (x_i - \langle x\rangle)^2}{N}}## at http://mathworld.wolfram.com/StandardDeviationDistribution.html. That page makes the tacit assumption that ##\sigma_p = 1##. Those graphs are similar to the probability densities for ##\sigma##.

So is the mean a random variable and not a number as well?

Stephen Tashi