
Sample Size and Standard Deviation of the Sampling Distribution of the Mean

  1. Apr 6, 2012 #1

    I am doing an undergraduate introductory statistics course and I'm trying to understand some basic concepts.

    I'm trying to understand why the sample size (n) affects the standard deviation of the sampling distribution of the mean ([itex]\sigma_M[/itex]).

    I understand how the sample size affects the sampling distribution of the mean. I've been shown that with larger sample sizes the standard deviation decreases. This can be seen graphically: the normal distribution curve of the sample means becomes narrower as the sample size increases.

    [itex]\sigma_M = \sigma / \sqrt{n}[/itex]

    What I don't understand is why this is happening.

    I have this intuitive feeling that if you take an infinite number of sample means they should have a fixed mean and standard deviation, and that this shouldn't be different whether you take samples of n=10 or n=100. I've been shown that this is wrong but I don't understand why.
  2. jcsd
  3. Apr 6, 2012 #2


    Science Advisor

    Hey nraic and welcome to the forums.

    Let's assume we have a consistent, unbiased estimator. If you don't know what these are, I suggest you look them up on Wikipedia or in a textbook. These are the estimators that are used in practice because they are actually useful for statistics.

    The simple reason the variance of the estimator shrinks is that more data gives us a better estimate: with more data, the range in which most of the probable values of the estimator lie gets narrower. This is reflected in the standard deviation being divided by the square root of n (equivalently, the variance being divided by n).

    The idea is based on the law of large numbers.

    What this says intuitively is that the more data we collect and average, the closer that average gets to the true mean of the distribution, and this holds not just for one specific distribution but for any distribution with a finite mean. Basically it's a convergence result: as you collect more and more data from the distribution, the average of all the numbers becomes a better approximation of the mean.
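The convergence described by the law of large numbers can be watched directly with a small simulation. The sketch below is only an illustration under assumptions that are not in the thread: it draws from an exponential distribution with true mean 2.0 (chosen because it is clearly not normal), and the function name `running_mean_error` and the checkpoints are made up for the example.

```python
import random


def running_mean_error(checkpoints=(10, 1_000, 100_000), true_mean=2.0, seed=1):
    """Return {n: |average of first n draws - true_mean|} for each checkpoint.

    Assumption for illustration: draws come from an exponential
    distribution whose mean is `true_mean`.
    """
    rng = random.Random(seed)
    wanted = set(checkpoints)
    total = 0.0
    errors = {}
    for i in range(1, max(checkpoints) + 1):
        # expovariate takes the rate, which is 1 / mean
        total += rng.expovariate(1.0 / true_mean)
        if i in wanted:
            errors[i] = abs(total / i - true_mean)
    return errors


if __name__ == "__main__":
    for n, err in sorted(running_mean_error().items()):
        print(f"n = {n:>7}: |running average - true mean| = {err:.4f}")
```

Any single run can wobble, so the error is not guaranteed to fall at every step, but for large n it is very likely to be small, which is the point of the argument above.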

    Now think about this in terms of the variance: this means that if the estimator is consistent and unbiased then as we get more information the variance shrinks which means we should get more certainty in our information for trying to estimate the mean.

    Since variance is one measure of uncertainty, it is no surprise intuitively that the variance gets lower as we get more information in the form of more data points.

    If the variance went up then that would mean that our guess for the mean would become more uncertain which means that we would be better off getting less information! This doesn't make sense intuitively.

    Also if the variance stayed the same then it means that no extra information would give us a more accurate guess for the mean given some fixed level of confidence which means it would be pointless getting more data.

    So think about it in terms of the fact that getting more data should tell us more about what we are trying to measure and reduce uncertainty about trying to measure it and this translates into lower variance as one measure of reduced uncertainty.

    I want to say something though that I think you should hear because you may get the wrong idea if I don't say it.

    The thing about the estimator is that no matter how many data points we get from an unknown process (what the data is talking about), you will never be able to 100% say what the mean is at a point or even in an interval. Your 95% interval will shrink a lot when you have collected 100,000 data points, but the 100% interval will always be every point on the real line.

    To understand this, suppose you have a process whose real mean is 0. The first hundred thousand data points could all be positive and give an estimate that is also positive. This could happen for a million, a billion, even a googolplex number of times!

    But then after this you might get a whole lot of negative values for the same number of times and if this is taken into account then your estimate shifts from positive towards zero.

    This is why for a truly unknown process you will never be able to have a fixed interval for 100% confidence.

    So in conclusion: more information about a process (a larger sample) should give us more certainty (though not 100% certainty, of course) about where we would most likely expect the mean to lie, because the variance shrinks.
  4. Apr 6, 2012 #3
    Hi Chiro,

    Thanks for the quick reply. First I have to say I don't understand your explanation but I guess I shouldn't expect to since I have only read it once.

    While I try to get my head around your answer, if anyone else wants to try to explain it from a different angle, please do so.

    Chiro, I will post questions about your answer when I understand it more clearly.

    One thing I should add is I haven't covered estimations of σ yet. So please try and explain assuming that the standard deviation of the population is known and as used with Z values and not estimated as used with T values.
  5. Apr 6, 2012 #4


    Science Advisor

    No worries.

    I can explain it mathematically later on, but I think it's better that you read about estimators first, before I give the mathematical reason why this is the case.

    Also this is important: we are not estimating σ, we are estimating μ and our σ is actually a parameter (constant) and not a random variable. If this isn't cleared up you'll get even more confused later on.
  6. Apr 6, 2012 #5

    Stephen Tashi

    Science Advisor

    A primitive intuition is that the more samples you have, the more "extremes" or "errors" tend to cancel each other out. There are not many real phenomena in our experience where independent samples "add" in the sense of actual arithmetic addition, but you can think of things where something like that occurs. For example, the molecules in gases have a distribution of velocities, but you don't feel much variation in air pressure over a short period of time.
  7. Apr 6, 2012 #6
    OK, how about a real-life example? Let's say I roll a 6-sided die 200 times. I can be about 95% confident the sample average will be between 3.2633 and 3.7367. Now let's say I roll the same die 100,000 times. I can be about 95% confident that the average will fall between 3.4894 and 3.5106.

    It might help you understand if you simulated dice rolls, or any random variable that has an average. Build a simulator, say on your graphing calculator (I can help you on a TI-84 Plus) or statistics program (I can help you with Excel or iWork Numbers).
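For anyone without a calculator or spreadsheet to hand, here is a minimal Python sketch of such a dice simulator. It assumes a fair die and the normal approximation, so the 95% interval is 3.5 ± 1.96·σ/√n with σ = √(35/12) ≈ 1.708 (the standard deviation of one roll, not the variance). The names `interval_95` and `coverage` are invented for this example.

```python
import math
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]
MU = statistics.mean(FACES)       # mean of one fair roll: 3.5
SIGMA = statistics.pstdev(FACES)  # SD of one fair roll: sqrt(35/12), about 1.708


def interval_95(n):
    """Approximate 95% interval for the average of n fair rolls."""
    half_width = 1.96 * SIGMA / math.sqrt(n)
    return MU - half_width, MU + half_width


def coverage(n, trials=2000, seed=0):
    """Fraction of simulated experiments whose average lands in interval_95(n)."""
    rng = random.Random(seed)
    low, high = interval_95(n)
    hits = 0
    for _ in range(trials):
        average = sum(rng.randint(1, 6) for _ in range(n)) / n
        if low <= average <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (200, 100_000):
        low, high = interval_95(n)
        print(f"n = {n}: {low:.4f} to {high:.4f}")
    print("share of 200-roll averages inside the interval:", coverage(200))
```

The simulated share should come out close to 0.95, and the interval for 100,000 rolls is much narrower than the one for 200 rolls because the half-width scales with 1/√n.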
  8. Apr 7, 2012 #7
    Moonman, Stephen and Chiro, thank you for taking the time to post your explanations. I thought about what each said and I think it helped me come up with my own explanation.

    I think that in my original post I should have said that I was more interested in a purely theoretical explanation. So maybe I should have phrased the question more like this:

    Why does the standard deviation of the sampling distribution of the mean decrease as the sample size (n) increases, as described by this formula?

    [itex]\sigma_M = \sigma / \sqrt{n}[/itex]

    where [itex]\sigma_M[/itex] is the standard deviation of the sampling distribution of the mean, and σ is the standard deviation of the population from which we take samples and calculate means.

    Ok, so the answer I have come up with to my own question starts like this:

    If you have a population with, say, μ=100 and σ=2, and you take an infinite number of samples of size n=1, calculate the mean of each sample, and graph this sampling distribution of the mean, it will look exactly like the population. It will also have μ=100 and σ=2.

    Why? Because when you take the mean of a sample with n=1, it will be the same as the one and only number in that sample.

    Say sample 1 = {98}; the mean of this will be 98.
    And sample 2 = {102}; the mean of this will be 102.

    Population = { ..., 98, 102, ...}
    Mean of samples = { ..., 98, 102, ...}

    If you keep doing this you will just end up with the same infinite set that you had for the population's normal distribution.

    Now, let's increase the sample size to n=2.

    Say sample 1 = {98, 101}; the mean of this will now be 99.5.
    And sample 2 = {95, 100}; the mean of this will now be 97.5.

    Population = { ..., 95, 98, 100, 101, ...}
    Mean of samples = { ..., 97.5, 99.5, ...}

    If you keep doing this you will end up with a different infinite set of numbers to that of the population and your new set of numbers will have a different σ. Why will the standard deviation be different?

    The process of taking a mean of each sample has created a set of values that are closer together than the values of the population and thus the sampling distribution of the mean will have a smaller standard deviation than the population if n > 1.

    Now why this is equal to exactly [itex]\sigma_M = \sigma / \sqrt{n}[/itex] I can't explain, but I'm happy to accept it at this point.

    My explanation is pretty rough and will probably only make sense to me but if anyone does follow what I'm saying please feel free to point out any mistakes or improvements.
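The n=1 versus n=2 argument in this post can be checked numerically. The sketch below assumes a normal population with μ=100 and σ=2, as in the example, and replaces "an infinite number of samples" with 20,000 samples per sample size. The function name `sample_mean_sd` is made up for the illustration.

```python
import random
import statistics


def sample_mean_sd(n, num_samples=20_000, mu=100.0, sigma=2.0, seed=0):
    """SD of the means of `num_samples` samples, each of size n.

    Assumption for illustration: the population is normal(mu, sigma).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for n in (1, 2, 4):
        simulated = sample_mean_sd(n)
        predicted = 2.0 / n ** 0.5
        print(f"n = {n}: simulated SD = {simulated:.3f}, sigma/sqrt(n) = {predicted:.3f}")
```

With n=1 the simulated value should sit near the population σ of 2, and for larger n it should track σ/√n, which matches the observation that averaging pulls the values closer together.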
  9. Apr 7, 2012 #8


    Science Advisor

    Mathematically you need to consider the variance of your estimator.

    Your estimator is going to be Y = (X1 + X2 + ... + XN)/N, where all the X's come from the same underlying distribution. Your estimator for the mean is going to be Y, and what you have to find is E[Y] for the 'point estimate' (the estimator is still a random variable, but one of the properties of a good estimator is that no matter how many data points you have, E[Y] never changes) as well as Var[Y].

    Now remember that all the X's are independent and identically distributed. We know that σ^2 is a known parameter of the underlying distribution, which means Var[X1] = Var[X2] = ... = Var[XN] = σ^2 for this example.

    Now given that Var[aU + bV] = a^2 Var[U] + b^2 Var[V] if U and V are independent, what is the variance of the estimator Y: in other words, what is Var[Y] for Y = (X1 + X2 + ... + XN)/N, and subsequently what is the standard deviation of the estimator Y?
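For readers who want to check their answer, one way to carry out the calculation is sketched here. It uses only what is stated above: the X's are independent, each has variance σ^2, and the variance rule for a linear combination of independent variables is applied with every coefficient equal to 1/N. The symbol \sigma_M for the standard deviation of Y follows the notation used earlier in the thread.

```latex
\begin{align*}
\operatorname{Var}[Y]
  &= \operatorname{Var}\!\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right]
   = \frac{1}{N^{2}}\sum_{i=1}^{N}\operatorname{Var}[X_i]
   = \frac{1}{N^{2}}\cdot N\sigma^{2}
   = \frac{\sigma^{2}}{N}, \\
\sigma_M
  &= \sqrt{\operatorname{Var}[Y]}
   = \frac{\sigma}{\sqrt{N}}.
\end{align*}
```

The factor 1/N^2 comes from squaring the coefficient 1/N, while the sum only contributes N copies of σ^2, so one factor of N survives in the denominator of the variance and a square root of N in the standard deviation.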
  10. Sep 13, 2013 #9
    If you want an example of the law of large numbers, check out this simulator

    http://www.btwaters.com/probab/dice/dicemain3D.html [Broken]

    Set the number of dice to 1, the number of rolls to 1, and the results to session.

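    Notice that the expected value shown for each face is one sixth (this is the expected proportion of rolls showing that face, not the expected value of a single roll, which is 3.5).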

    Click auto roll once, and observe the changes in the table. Repeat as many times as you can and watch the results start to normalize. Refresh the page and then follow the same procedure, except leave the number of rolls per click at its default of 1000. How does this change your results?
    Last edited by a moderator: May 6, 2017
  11. Oct 15, 2014 #10
    I had a question regarding this as well, but I'm new and have no idea how to start a thread, so I'm just going to go ahead and post the question here. I have an assignment and I don't understand how to go about it: with a sample of 1000 Ontario high school students, you find that the mean reading score is 530 with a standard deviation of 90.
    The question is "what if, instead of a sample, you obtained the same mean and standard deviation when testing every Ontario high school student? How would the standard error of estimate and the 95 and 99 percent confidence intervals for the estimate of the mean change?" Thank you in advance.

    - Yumna
  12. Oct 18, 2014 #11


    Science Advisor

    A standard deviation needs a sample (i.e. one with more than one data point).

    The intervals rely on the mean and standard error if you are using a normal distribution (or as an approximation) and if you have the exact same things across each sample, then the interval would be the same.

    You have to consider how the mean, the standard error of the mean, and the number of data points in your sample affect the interval itself.
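As a starting point for the sample case in the question (n = 1000, mean 530, standard deviation 90), the standard error and the normal-approximation intervals can be computed as below. This is a sketch under assumptions: the sample standard deviation is treated as if it were the known σ, and the usual z multipliers 1.96 and 2.576 are used for 95% and 99%. The function name `mean_confidence_interval` is invented for the example.

```python
import math


def mean_confidence_interval(mean, sd, n, z):
    """Return (standard error, (lower, upper)) for a z-based interval.

    Assumption for illustration: `sd` is treated as the known sigma.
    """
    standard_error = sd / math.sqrt(n)
    return standard_error, (mean - z * standard_error, mean + z * standard_error)


if __name__ == "__main__":
    for label, z in (("95%", 1.96), ("99%", 2.576)):
        se, (low, high) = mean_confidence_interval(530, 90, 1000, z)
        print(f"{label}: standard error = {se:.3f}, interval = {low:.2f} to {high:.2f}")
```

Changing `n` while keeping the mean and standard deviation fixed shows how the standard error, and with it both intervals, depends on the number of data points.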
  13. Nov 7, 2014 #12
    I wonder if the original poster (nraic) confused sample size of individual data points with sample size of a distribution of means (which I have done, trying to remember stats from many years ago). For individual data (let's say heights of college-aged men), my understanding is that, once there is enough data for the mean to stabilize, collecting more data will not change the shape of the distribution (or the standard deviation of this distribution). I think this is what the original poster's intuition is, and I think that his/her intuition is correct. However, the standard deviation of the distribution representing the probability of the mean decreases as the sample size (e.g., number of heights) increases. So, in other words, your estimate of the mean height becomes more accurate (specifically, the range of your X% confidence level narrows) as you collect more data (which I think is also intuitive).