Sample Size and Standard Deviation of the Sampling Distribution of the Mean

nraic · Apr 6, 2012

Hi,

I am doing an undergraduate introductory statistics course and I'm trying to understand some basic concepts.

I'm trying to understand why the sample size (n) affects the standard deviation of the sampling distribution of the mean (σ[itex]_{M}[/itex]).

I understand how sample size affects the sampling distribution of the mean. I've been shown that with larger sample sizes the standard deviation decreases. This can be seen graphically: the normal distribution curve of the sample means becomes narrower as the sample size increases.

σ[itex]_{M}[/itex] = σ[itex]/\sqrt{n}[/itex]

What I don't understand is why this is happening.

I have this intuitive feeling that if you take an infinite number of sample means they should have a fixed mean and standard deviation, and that this shouldn't be different whether you take samples of n=10 or n=100. I've been shown that this is wrong but I don't understand why.

chiro · Apr 6, 2012

Hey nraic and welcome to the forums.

Let's assume we have a consistent, unbiased estimator. If you don't know what these are, I suggest you try Wikipedia and maybe some books. These are the estimators that are used in practice because they are actually useful for statistics.

The simple idea behind the variance of the estimator shrinking is that more data gives us a better estimate: with more data, the range in which most of the probable values of the estimator lie shrinks, and this is reflected in the standard deviation shrinking as a result of dividing by the square root of n (equivalently, the variance is divided by n).

The idea is based on the law of large numbers.

What this says intuitively is that the more data we collect and average, the closer that average gets to the true mean of the distribution, and not just for one specific distribution but for any distribution with a finite mean. Basically, it's a convergence result: as you collect more and more data from the distribution, the average of all the numbers becomes a better approximation of the mean.
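A small simulation can make the law of large numbers concrete. The sketch below is only an illustration: it assumes a fair six-sided die (true mean 3.5) as the underlying distribution, and the function name `running_means` and the seed are arbitrary choices.

```python
import random

def running_means(n_rolls, checkpoints, seed=0):
    """Roll a fair six-sided die n_rolls times; record the average so far at each checkpoint."""
    rng = random.Random(seed)
    total = 0
    means = {}
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        if i in checkpoints:
            means[i] = total / i
    return means

checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n, m in sorted(running_means(100_000, checkpoints).items()):
    print(f"n = {n:>7}: average so far = {m:.4f}  (true mean 3.5)")
```

The averages at the later checkpoints should tend to sit closer to 3.5 than the early ones, although any single run can wobble.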

Now think about this in terms of the variance: this means that if the estimator is consistent and unbiased then as we get more information the variance shrinks which means we should get more certainty in our information for trying to estimate the mean.

Since variance is one measure of uncertainty, it is intuitively no surprise that the variance gets lower as we get more information in the form of more data points.

If the variance went up then that would mean that our guess for the mean would become more uncertain which means that we would be better off getting less information! This doesn't make sense intuitively.

Also if the variance stayed the same then it means that no extra information would give us a more accurate guess for the mean given some fixed level of confidence which means it would be pointless getting more data.

So think about it in terms of the fact that getting more data should tell us more about what we are trying to measure and reduce uncertainty about trying to measure it and this translates into lower variance as one measure of reduced uncertainty.

I want to say something though that I think you should hear because you may get the wrong idea if I don't say it.

The thing about the estimator is that no matter how many data points we get from an unknown process (what the data is talking about), you will never be able to say with 100% certainty what the mean is, either as a point or as lying in a finite interval. Your 95% interval will shrink a lot when you have collected 100,000 data points, but the 100% interval will always be every point on the real line.

To understand this, suppose you have a process whose real mean is 0. The first hundred thousand data points could all be positive and give an estimate that is also positive. This could happen for a million, a billion, even a googolplex data points!

But then after this you might get a whole lot of negative values for the same number of times and if this is taken into account then your estimate shifts from positive towards zero.

This is why for a truly unknown process you will never be able to have a fixed interval for 100% confidence.
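To put rough numbers on how the interval behaves, here is a minimal sketch. It assumes a normal model with a known population standard deviation (`sigma = 1.0` is an arbitrary value) and uses the usual two-sided critical value from Python's `statistics.NormalDist`; the helper name `half_width` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sigma, n, confidence):
    """Half-width of a two-sided normal interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * sigma / sqrt(n)

sigma = 1.0  # assumed known population standard deviation
for n in (100, 10_000, 100_000):
    widths = ", ".join(
        f"{c:.2%}: +/-{half_width(sigma, n, c):.4f}" for c in (0.95, 0.99, 0.9999)
    )
    print(f"n = {n:>7}  {widths}")
```

For a fixed confidence level the half-width falls like 1/sqrt(n), while for a fixed n it keeps growing as the confidence level is pushed toward 100%, which is why no finite interval reaches 100% under this model.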

So in conclusion: more information about a process (a larger sample) should give us more certainty (though not 100% certainty, of course) about where we would most likely expect the mean to lie (because the variance shrinks).

nraic · Apr 6, 2012

Hi Chiro,

Thanks for the quick reply. First I have to say I don't understand your explanation but I guess I shouldn't expect to since I have only read it once.

While I try and get my head around your answer, if anyone else wants to try explain it from a different angle please do so.

Chiro, I will post questions about your answer when I understand it more clearly.

One thing I should add is I haven't covered estimations of σ yet. So please try and explain assuming that the standard deviation of the population is known and as used with Z values and not estimated as used with T values.

chiro · Apr 6, 2012

No worries.

I can explain it mathematically later on, but I think it's better that you read about estimators first before I give a mathematical idea of why this is the case.

Also this is important: we are not estimating σ, we are estimating μ and our σ is actually a parameter (constant) and not a random variable. If this isn't cleared up you'll get even more confused later on.

Stephen Tashi · Apr 6, 2012

nraic said:

I'm trying to understand why the sample size (n) affects the standard deviation of the sampling distribution of the mean (σ[itex]_{M}[/itex]).

A primitive intuition is that the more samples you have, the more "extremes" or "errors" tend to cancel each other out. There are not many real phenomena in our experience where independent samples "add" in the sense of actual arithmetic addition, but you can think of things where something like that occurs. For example, the molecules in gases have a distribution of velocities, but you don't feel much variation in air pressure over a short period of time.

moonman239 · Apr 6, 2012

OK, how about a real-life example? Let's say I roll a 6-sided die 200 times. A single roll has mean 3.5 and standard deviation about 1.708, so I can be about 95% confident the sample average will be between 3.2633 and 3.7367. Now let's say I roll the same die 100,000 times. I can be about 95% confident that the average will fall between 3.4894 and 3.5106.

It might help you understand if you simulated dice rolls, or any random variable that has an average. Build a simulator, say on your graphing calculator (I can help you on a TI-84 Plus) or statistics program (I can help you with Excel or iWork Numbers).
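If a calculator or spreadsheet isn't handy, a dice simulator along these lines can also be sketched in Python. This is only one possible setup: it assumes a fair die, uses the normal approximation 3.5 ± 1.96·σ/√n with σ = √(35/12) for a single roll, and the names `interval_95` and `coverage` are made up for the example.

```python
import random
from math import sqrt

DIE_MEAN = 3.5
DIE_SD = sqrt(35 / 12)  # standard deviation of one fair die roll, about 1.708

def interval_95(n):
    """Approximate 95% range for the average of n fair die rolls."""
    half = 1.96 * DIE_SD / sqrt(n)
    return DIE_MEAN - half, DIE_MEAN + half

def coverage(n, trials, seed=0):
    """Fraction of simulated sample averages that land inside interval_95(n)."""
    rng = random.Random(seed)
    low, high = interval_95(n)
    hits = 0
    for _ in range(trials):
        avg = sum(rng.randint(1, 6) for _ in range(n)) / n
        hits += low <= avg <= high
    return hits / trials

for n in (20, 200, 1000):
    low, high = interval_95(n)
    print(f"n = {n:>4}: range {low:.4f} to {high:.4f}, "
          f"fraction inside = {coverage(n, trials=500):.3f}")
```

The range narrows as n grows, while the fraction of simulated averages falling inside it should stay near 95% in each case.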

nraic · Apr 7, 2012

Moonman, Stephen and Chiro, thank you for taking the time to post your explanations. I thought about what each said and I think it helped me come up with my own explanation.

I think that in my original post I should have said that I was more interested in a purely theoretical explanation. So maybe I should have phrased the question more like this:

Why does the standard deviation of the sampling distribution of the mean decrease as sample size (n) increases as described by this formula?

σ[itex]_{M}[/itex] = σ / [itex]\sqrt{n}[/itex]

Where σ[itex]_{M}[/itex] is the standard deviation of the sampling distribution of the mean

and

σ is the population standard deviation from which we take samples and calculate means

Ok, so the answer I have come up with to my own question starts like this:

If you have a population with, say, μ=100 and σ=2, and you take an infinite number of samples of size n=1, calculate the mean of each sample, and graph this sampling distribution of the mean, it will look exactly like the population. It will also have μ=100 and σ=2.

Why? Because when you take the mean of each sample of size n=1, it will be the same as the one and only number in that sample.

Say sample 1 = {98}; the mean of this will be 98.
and sample 2 = {102}; the mean of this will be 102.

Population = { ..., 98, 102, ...}
Mean of samples = { ..., 98, 102, ...}

If you keep doing this you will just end up with the same infinite set that you had for the population's normal distribution.

Now, let's increase the sample size to n=2

Say sample 1 = {98, 101}; the mean of this will now be 99.5
and sample 2 = {95, 100}; the mean of this will now be 97.5

Population = { ..., 95, 98, 100, 101, ...}
Mean of samples = { ...,97.5, 99.5, ...}

If you keep doing this you will end up with a different infinite set of numbers to that of the population and your new set of numbers will have a different σ. Why will the standard deviation be different?

The process of taking a mean of each sample has created a set of values that are closer together than the values of the population and thus the sampling distribution of the mean will have a smaller standard deviation than the population if n > 1.
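This reasoning can be checked with a quick simulation. The sketch below assumes a normal population with μ=100 and σ=2, as in the example, and replaces the "infinite number of samples" with 20,000 samples per sample size; the function name and the extra sample sizes 4 and 25 are arbitrary additions.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def sd_of_sample_means(n, mu=100.0, sigma=2.0, n_samples=20_000, seed=0):
    """Draw many samples of size n and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(n_samples)]
    return pstdev(means)

for n in (1, 2, 4, 25):
    print(f"n = {n:>2}: simulated SD of means = {sd_of_sample_means(n):.3f}, "
          f"sigma / sqrt(n) = {2.0 / sqrt(n):.3f}")
```

For n=1 the simulated value should come out close to the population's σ=2, and for larger n it should be noticeably smaller, in line with the averaging argument above.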

Now why this is equal to exactly σ[itex]_{M}[/itex] = σ / [itex]\sqrt{n}[/itex] I can't explain but I'm happy to accept it at this point.

My explanation is pretty rough and will probably only make sense to me but if anyone does follow what I'm saying please feel free to point out any mistakes or improvements.

chiro · Apr 7, 2012

nraic said:

Now why this is equal to exactly σ[itex]_{M}[/itex] = σ / [itex]\sqrt{n}[/itex] I can't explain but I'm happy to accept it at this point.

My explanation is pretty rough and will probably only make sense to me but if anyone does follow what I'm saying please feel free to point out any mistakes or improvements.

Mathematically you need to consider the variance of your estimator.

Your estimator is going to be Y = (X1 + X2 + ... + XN)/N, where all the X's come from the same underlying distribution. Y is your estimator for the mean, and what you have to find is E[Y] for the 'point estimate' (the estimator is still a random variable, but one of the properties of a good estimator is that no matter how many data points you have, E[Y] never changes) as well as Var[Y].

Now remember that all the X's are independent and identically distributed. We know that σ² is a known parameter of the underlying distribution, which means Var[X1] = Var[X2] = ... = Var[XN] = σ² for this example.

Now, given that Var[aU + bV] = a^2 Var[U] + b^2 Var[V] if U and V are independent, what is the variance of the estimator Y above? In other words, what is Var[Y] for Y = (X1 + X2 + ... + XN)/N, and subsequently what is the standard deviation of the estimator Y?
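For anyone who wants to check their answer, the calculation can be written out as follows. It is a sketch under the stated assumptions only: the X's are independent, each with mean μ and variance σ², and the variance rule above is applied with every coefficient equal to 1/N.

```latex
\begin{align*}
E[Y] &= \frac{1}{N}\sum_{i=1}^{N} E[X_i] = \frac{1}{N}\cdot N\mu = \mu, \\
\operatorname{Var}[Y] &= \operatorname{Var}\!\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right]
  = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}[X_i]
  = \frac{1}{N^2}\cdot N\sigma^2 = \frac{\sigma^2}{N}, \\
\sigma_M &= \sqrt{\operatorname{Var}[Y]} = \frac{\sigma}{\sqrt{N}}.
\end{align*}
```

The square root appears because the variance falls like 1/N, and the standard deviation is the square root of the variance.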

woopyoudead · Sep 13, 2013

If you want an example of the law of large numbers, check out this simulator

http://www.btwaters.com/probab/dice/dicemain3D.html

Set the number of dice to 1, the number of rolls to 1, and the results to session.

Notice that the expected probability of each face is one sixth.

Click auto roll once and observe the changes in the table. Repeat as many times as you can and watch the results start to normalize. Refresh the page and then follow the same procedure, except leave the number of rolls per click at its default of 1000. How does this change your results?

yumna · Oct 15, 2014

Hi
I had a question regarding this as well, but I have no idea how to post here because I'm new, so I'm just going to go ahead and post the question here. I have an assignment and I don't understand how to go about it: with a sample of 1000 Ontario high school students, you find that the mean reading score is 530 with a standard deviation of 90.
The question is "what if, instead of a sample, you obtained the same mean and standard deviation when testing every Ontario high school student? How would the standard error of estimate and the 95 and 99 percent confidence intervals for the estimate of the mean change?" Thank you in advance.

- Yumna

chiro · Oct 18, 2014

A standard deviation needs a sample (i.e. one with more than one data point).

The intervals rely on the mean and the standard error if you are using a normal distribution (or using it as an approximation), and if you have exactly the same values for these in each case, then the interval would be the same.

You have to consider how the mean, the standard error of the mean, and the standard error's dependence on the number of data points in your sample affect the interval itself.
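As a starting point for that comparison, the sample-based quantities can be computed directly. This is a minimal sketch for the n=1000 sample case only, assuming normal critical values and treating the sample standard deviation of 90 as if it were σ; the function name `mean_interval` is made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(mean, sd, n, confidence):
    """Standard error and normal-based confidence interval for a sample mean."""
    se = sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return se, (mean - z * se, mean + z * se)

for confidence in (0.95, 0.99):
    se, (low, high) = mean_interval(mean=530, sd=90, n=1000, confidence=confidence)
    print(f"{confidence:.0%}: standard error = {se:.3f}, interval = ({low:.2f}, {high:.2f})")
```

Changing `n` in the call shows how the standard error, and with it both intervals, respond to the number of data points.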

steph91 · Nov 7, 2014

I wonder if the original poster (nraic) confused sample size of individual data points with sample size of a distribution of means (which I have done, trying to remember stats from many years ago). For individual data (let's say heights of college-aged men), my understanding is that, once there is enough data for the mean to stabilize, collecting more data will not change the shape of the distribution (or the standard deviation of this distribution). I think this is what the original poster's intuition is, and I think that his/her intuition is correct. However, the standard deviation of the distribution representing the probability of the mean decreases as the sample size (e.g., number of heights) increases. So, in other words, your estimate of the mean height becomes more accurate (specifically, the range of your X% confidence level narrows) as you collect more data (which I think is also intuitive).
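The distinction can be seen side by side in a short simulation. The sketch below uses made-up numbers (normally distributed heights with mean 175 cm and standard deviation 7 cm are assumptions, not data) and a hypothetical helper `sample_sd_and_se`.

```python
import random
from math import sqrt
from statistics import stdev

def sample_sd_and_se(n, mu=175.0, sigma=7.0, seed=0):
    """Draw n simulated heights; return (sample SD, estimated standard error of the mean)."""
    rng = random.Random(seed)
    heights = [rng.gauss(mu, sigma) for _ in range(n)]
    sd = stdev(heights)
    return sd, sd / sqrt(n)

for n in (10, 100, 1_000, 10_000):
    sd, se = sample_sd_and_se(n)
    print(f"n = {n:>6}: SD of heights = {sd:.2f}, standard error of mean = {se:.3f}")
```

The standard deviation of the individual heights should settle near the assumed 7 cm as n grows, while the standard error of the mean keeps shrinking.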
