Does the Central Limit Theorem Mean Sample Means Form a Normal Distribution?

In summary, the central limit theorem states that the mean of a large independent random sample from a population has an approximately normal distribution. One consequence is that the probability of the sample mean being close to the population mean increases as the sample size increases. The theorem does not guarantee that actual outcomes will be close to the population mean; it is a statement about probabilities and about approximations to probability distributions. It is important to understand the precise mathematical statement of the theorem rather than rely on popularized summaries.
  • #1
ampakine
I keep reading explanations that say things like "the mean is normally approximated" but I don't know what that means. Are they saying that if you take a load of samples then plot the means of every one of those samples on a graph that the mean of that graph will be approximately the population mean of the population that you took the samples from? For example, let's say I want to know the probability that I can catapult a gypsy 15 metres or more, so once a year I go around the world randomly picking out gypsies and seeing how far I can catapult them. The population is how far I can catapult every gypsy in the world, but on my yearly expedition I only catapult about 20 gypsies. Does the central limit theorem say that if I take the average catapult distances obtained from each of these yearly gypsy catapulting expeditions and plot them on a graph that this graph will keep becoming more and more normal every year as I add a new mean to it? Is that the idea or have I got it wrong?
 
  • #2
ampakine said:
I keep reading explanations that say things like "the mean is normally approximated" but I don't know what that means.

You should read and ask questions about the precise mathematical statement of the theorem, not about popularized summaries of it. Then people will offer you popularized summaries of it as explanations and you can demand to know what they are talking about.

Are they saying that if you take a load of samples then plot the means of every one of those samples on a graph that the mean of that graph will be approximately the population mean of the population that you took the samples from?

Theorems about probability rarely say anything definite about actual outcomes, approximate or otherwise. Instead they talk about the probabilities of outcomes and approximations to probability distributions. The mean of an independent random sample of N things from a population has an approximately normal distribution. The standard deviation of this approximately normal distribution gets smaller as N increases. So it would be a correct conclusion that the mean of a large sample is "probably" close to the mean of the population. The Central Limit Theorem implies this, but it wouldn't be correct to say that the Central Limit Theorem "is" that statement.
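
For reference, one common precise form of the theorem (the classical Lindeberg–Lévy version) can be written as below. The symbols $X_i$, $\mu$, $\sigma$, $\bar{X}_N$ and $\Phi$ are notation introduced here for illustration; they do not appear in the thread. The finite-variance assumption is part of this version of the theorem.

```latex
Let $X_1, \dots, X_N$ be independent, identically distributed random variables
with mean $\mu$ and finite variance $\sigma^2 > 0$, and let
$\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$ be the sample mean.
Then for every real number $z$,
\[
  \lim_{N \to \infty}
  P\!\left( \frac{\bar{X}_N - \mu}{\sigma / \sqrt{N}} \le z \right) = \Phi(z),
\]
where $\Phi$ is the cumulative distribution function of the standard normal
distribution.
```

Read informally, for large N the sample mean behaves approximately like a normal random variable with mean $\mu$ and standard deviation $\sigma/\sqrt{N}$, which is why the spread gets smaller as N increases.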

For example, let's say I want to know the probability that I can catapult a gypsy 15 metres or more, so once a year I go around the world randomly picking out gypsies and seeing how far I can catapult them. The population is how far I can catapult every gypsy in the world, but on my yearly expedition I only catapult about 20 gypsies. Does the central limit theorem say that if I take the average catapult distances obtained from each of these yearly gypsy catapulting expeditions and plot them on a graph that this graph will keep becoming more and more normal every year as I add a new mean to it? Is that the idea or have I got it wrong?

Suppose you have a random variable whose distribution is not normal. Imagine one whose density is shaped like an isosceles triangle. If you take samples of size 1 and plot them as a histogram, then it is probable that your plot will begin to look like an isosceles triangle as you take more and more samples.

Suppose you take samples of size 10 and histogram the mean of those samples (not the value of each of the 10 individually, only the mean of all 10). You do this for many samples of size 10. Then it is probable that your plot won't look like an isosceles triangle. It will be smoother.

If you take samples of size 1000 and histogram their means, the graph will probably be even smoother and more "normal" looking.
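
The experiment described above can be tried with a short simulation. This is only a sketch under assumptions that are not in the thread: the triangle is taken to be the symmetric one on [0, 1] with its peak at 0.5, each sample size gets 1000 repeated samples, and the seed is arbitrary. The helper names `sample_means` and `text_histogram` are made up for this example.

```python
import random
import statistics


def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n and return the mean of each one."""
    return [
        statistics.fmean([rng.triangular(0.0, 1.0, 0.5) for _ in range(n)])
        for _ in range(num_samples)
    ]


def text_histogram(values, bins=15, width=40):
    """Return one row of '#' per bin; the bins span min(values)..max(values)."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return ["#" * round(width * c / peak) for c in counts]


rng = random.Random(0)  # fixed seed (an arbitrary choice) so runs are repeatable
spread = {}
for n in (1, 10, 1000):
    means = sample_means(n, 1000, rng)
    spread[n] = statistics.stdev(means)
    print(f"sample size {n}: std of the sample means = {spread[n]:.4f}")
    print("\n".join(text_histogram(means)))
```

Each histogram is rescaled to its own range, so the picture shows only the shape. The narrowing shows up in the printed standard deviations, which should sit near 0.204 divided by the square root of the sample size for this particular triangle (its variance is 1/24).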
 
  • #3
Stephen Tashi said:
The mean of an independent random sample of N things from a population has an approximately normal distribution.

I think it's the terminology that's confusing me. You say that the mean of a sample has an approximately normal distribution. When I think of a sample I think of 1 sample which can have only 1 mean, so the idea of a distribution doesn't make sense. Do you mean that if I was to take multiple samples from a population, then get the mean of each of these samples, then plot them on a graph, that I'd have a distribution for the mean?
 
  • #4
If you only look at the mean of, say, 100 independently chosen values of a random variable, you increase the probability that extreme values in the sample will "cancel out". If you only plot the means of such "batched" samples, they have a smaller probability of taking on extreme values than if you plot the values of single samples. (For example, if X is a 0-or-1 random variable, the mean of a sample of 100 X's might be 100/100 = 1, but that is less likely than observing a single example of X = 1. Furthermore, a histogram of single observations can't look very normal since it has only two bars on it, for X = 0 and X = 1, while the histogram of a sample mean has bars for values like 85/100, 10/100, etc.)
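
The 0-or-1 example can be checked exactly, without simulation, because the number of 1s in a sample follows a binomial distribution. In the sketch below the value `p = 0.5` is an assumption (the post does not give a probability for X = 1), and `mean_pmf` is a name invented for this example.

```python
from math import comb


def mean_pmf(n, p):
    """Exact distribution of the mean of n independent 0-or-1 draws.

    Returns a dict mapping each possible mean k/n to its probability,
    where P(X = 1) = p for every draw.
    """
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


p = 0.5  # assumed value, chosen only for illustration
single = mean_pmf(1, p)   # a "sample" of one observation
batch = mean_pmf(100, p)  # the mean of 100 observations

print("bars in the histogram of single observations:", len(single))  # 2
print("bars in the histogram of means of 100:", len(batch))          # 101
print("P(single observation = 1):", single[1.0])
print("P(mean of 100 = 1):", batch[1.0])
```

With these numbers the mean of 100 can take 101 distinct values instead of 2, and the extreme value 1 requires all 100 draws to come up 1, so its probability is p raised to the 100th power.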


If you had 10,000 independent random samples, you could group them into mutually exclusive batches of 10 or 100 or 1000 etc. The Central Limit Theorem doesn't say that you can "cheat the Devil" by doing this and gain greater and greater certainty about the mean value. (The number of samples N is decreasing as you group observations into larger and larger batches.) If you take the mean of the 10,000 observations and plot it as a single point then, yes, the Central Limit Theorem says you have plotted 1 observation from an approximately normal distribution and it tells you about the standard deviation of that distribution.
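
The batching point can be illustrated with a small sketch. The 10,000 observations here are drawn from an assumed triangular distribution on [0, 1] (any distribution would do), and `batch_means` is a helper name invented for the example.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary fixed seed
observations = [rng.triangular(0.0, 1.0, 0.5) for _ in range(10_000)]


def batch_means(data, batch_size):
    """Split data into consecutive, non-overlapping batches; return each batch's mean."""
    return [
        statistics.fmean(data[i:i + batch_size])
        for i in range(0, len(data), batch_size)
    ]


summary = {}
for size in (10, 100, 1000, 10_000):
    means = batch_means(observations, size)
    # A standard deviation needs at least two batch means.
    batch_spread = statistics.stdev(means) if len(means) > 1 else None
    summary[size] = (len(means), statistics.fmean(means), batch_spread)
    print(f"batch size {size}: {len(means)} means, "
          f"average of the means = {summary[size][1]:.6f}, spread = {batch_spread}")
```

The spread of the batch means shrinks as the batches grow, but there are fewer means to plot, and the average of the batch means stays equal to the mean of all 10,000 observations (up to rounding) however they are grouped. Batching rearranges the same information; it does not add any.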
 
  • #5

The Central Limit Theorem states that the mean of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the underlying distribution of the variables themselves. This means that if you take many samples of the same large size from a population and calculate the mean of each sample, the distribution of those means will be approximately normal.

In your example, if you were to go on many expeditions and calculate the mean catapult distance for each one, the distribution of those means would be approximately normal, and the approximation would improve as the number of launches per expedition grows. This is what lets us attach probabilities to how far a sample mean is likely to fall from the population mean. It does not guarantee that any particular sample mean is close to the population mean.

This concept is particularly important in STEM fields because it helps us draw conclusions from limited data. For example, in scientific experiments, we often take a sample of individuals and make conclusions about the entire population based on that sample. The Central Limit Theorem does not by itself make these conclusions valid. It justifies normal-based approximations for the sample mean when the observations are independent draws from the population and the sample is large enough.

In summary, the Central Limit Theorem is a powerful tool in statistics that allows us to make inferences about a population based on a sample. It is a fundamental concept in STEM fields and is crucial for understanding and interpreting data.
 

FAQ: Does the Central Limit Theorem Mean Sample Means Form a Normal Distribution?

1. What is the Central Limit Theorem?

The Central Limit Theorem states that as the sample size increases, the distribution of the sample mean (suitably centered and scaled) approaches a normal distribution, regardless of the shape of the original population distribution, provided that the observations are independent and the population has a finite variance. Because the spread of that distribution shrinks as the sample grows, the mean of a large enough sample is probably close to the mean of the entire population.

2. How is the Central Limit Theorem used in STEM?

In STEM (Science, Technology, Engineering, and Mathematics), the Central Limit Theorem is used to make statistical inferences and predictions based on sample data. It allows scientists to estimate the parameters of a population and test hypotheses using the mean of a sample.

3. Can the Central Limit Theorem be applied to any sample size?

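The theorem is a statement about the limit as the sample size grows, so it does not name a minimum sample size. A sample size of about 30 is a common rule of thumb, but how well the normal approximation works depends on the population: it can be adequate for much smaller samples from a roughly symmetric population and poor for larger samples from a strongly skewed or heavy-tailed one. As the sample size decreases, the normal approximation becomes less accurate and the sample mean becomes a more variable estimate of the population mean.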
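
One way to see why no single cutoff fits every population is to measure how lopsided the distribution of the sample mean still is at different sample sizes. The sketch below uses an exponential population with rate 1 as an assumed example of a strongly skewed distribution; for that population the skewness of the sample mean is 2 divided by the square root of the sample size, so it fades slowly. The function names, the seed and the 4000 repetitions are choices made for this example.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def skew_of_sample_means(n, num_samples, rng):
    """Skewness of num_samples sample means, each from n exponential(1) draws."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)


rng = random.Random(2)  # arbitrary fixed seed
skew = {n: skew_of_sample_means(n, 4000, rng) for n in (2, 30, 300)}
for n, s in skew.items():
    print(f"sample size {n}: skewness of the sample means = {s:.3f} "
          f"(theory for this population: {2 / n ** 0.5:.3f})")
```

A normal distribution has skewness 0. For this population the sample mean is still visibly skewed at a sample size of 30, which is the sense in which 30 is a rule of thumb and not a guarantee.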

4. How does the Central Limit Theorem relate to the Law of Large Numbers?

The Law of Large Numbers states that as the sample size increases, the sample mean converges to the population mean. The Central Limit Theorem adds detail about how it gets there: the deviations of the sample mean from the population mean are approximately normally distributed, with a standard deviation that shrinks in proportion to one over the square root of the sample size.

5. Why is the Central Limit Theorem important in science and research?

The Central Limit Theorem is important in science and research because it justifies normal-based approximations, such as confidence intervals and hypothesis tests for a mean, even when the population itself is not normal. It does not reduce random variation; averaging more observations does that, and the theorem describes the variation that remains.
