# Central Limit Theorem applied to a Bernoulli distribution

• TheOldHag
In summary: The central limit theorem says that the distribution of sample means converges to a normal distribution as the size of the sample increases.
TheOldHag
As I understand it, one result of the central limit theorem is that the sampling distribution of means drawn from any population will be approximately normal. Assume the population consists of Bernoulli trials with a given probability p and we want to estimate p. Then our population consists of zeros and ones, and our sample means will be in the interval [0,1]. So I'm a bit confused about how such a distribution of sample means can be approximated by a normal distribution, which is never 0 in the interval (-infinity, infinity).

Is it because the probability outside of [0,1] is very small, so the normal distribution is still a good approximation? Or am I misunderstanding the central limit theorem? I have googled a bit, and most of what I have found applies the central limit theorem to the binomial distribution. But that seems to be the easier case, and my guess is that having to estimate a proportion is very common. Any insight is appreciated.

It is true that the sample mean ##s_n = \sum_{k=1}^{n} x_k## will be in the interval ##[0,1]##. The central limit theorem says that ##\sqrt{n}(s_n - \mu) \rightarrow N(0, \sigma)## (convergence in distribution). Equivalently, the distribution of ##s_n## is approximated by ##N(\mu,\sigma/\sqrt{n})## for large ##n##. As ##n## increases, ##N(\mu, \sigma/\sqrt{n})## becomes narrowly centered around ##\mu##, and the tails (i.e. the portion of the pdf outside a narrow interval around ##\mu##) become very small. This is reasonable because ##s_n## converges with probability 1 to ##\mu## by the law of large numbers, so we would expect that for large ##n##, its distribution should be largely confined to a narrow interval around ##\mu##.
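To put numbers on "the tails become very small", here is a minimal sketch that computes how much probability ##N(\mu, \sigma/\sqrt{n})## places outside ##[0,1]##, treating ##s_n## as the average of ##n## Bernoulli(p) draws so that ##\mu = p## and ##\sigma = \sqrt{p(1-p)}##. The function name is made up for this illustration.

```python
import math

def normal_mass_outside_unit_interval(p, n):
    """Probability that N(mu, sigma/sqrt(n)) assigns to values outside [0, 1],
    with mu = p and sigma = sqrt(p(1-p)) for a Bernoulli(p) population."""
    mu = p
    sd = math.sqrt(p * (1.0 - p) / n)
    # P(Z < -a) = 0.5 * erfc(a / sqrt(2)); erfc keeps accuracy far out in the tails
    below = 0.5 * math.erfc(mu / (sd * math.sqrt(2.0)))
    above = 0.5 * math.erfc((1.0 - mu) / (sd * math.sqrt(2.0)))
    return below + above

for n in (1, 10, 100, 1000):
    print(n, normal_mass_outside_unit_interval(0.5, n))
```

For p = 1/2 the mass outside ##[0,1]## is about 0.32 at n = 1 but already negligible by n = 100, which is the sense in which the approximation is harmless.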

TheOldHag said:
will be approximately normal.

To add to what Jbunniii said, the technical definition of "will be approximately" involves the concept of convergence of a sequence of functions to a function and the further complication of defining a convergence of a sequence of random variables to another random variable. These concepts are more complicated than the concept of the convergence of a sequence of numbers to another number (i.e. the ordinary idea of limit studied in first semester calculus).

Of course, you can find an article about the convergence of sequences of random variables on Wikipedia. If you want to pursue this, you should begin with an exact statement of what the Central Limit Theorem says.

One more thing I would like to mention: observe that as long as the ##x_n## are independent and share a common mean ##\mu## and standard deviation ##\sigma##, then ##E[s_n] = \mu## and ##\text{var}(s_n) = \sigma^2/n##, and so
$$E[\sqrt{n}(s_n - \mu)] = \sqrt{n}(E[s_n] - \mu) = \sqrt{n}(\mu - \mu) = 0$$
and
\begin{align} \text{var}(\sqrt{n}(s_n - \mu)) &= E[n(s_n - \mu)^2] \\ &= n(E[s_n^2] - 2\mu E[s_n] + \mu^2) \\ &= n(E[s_n^2] - \mu^2) \\ &= n(\text{var}(s_n)) \\ &= n(\sigma^2/n) \\ &= \sigma^2\end{align}
So even without taking any limit, we know that ##\sqrt{n}(s_n - \mu)## must have mean ##0## and standard deviation ##\sigma## for every ##n##. Thus it's no surprise that the limit distribution (if it exists) must also have this mean and standard deviation. So the real content of the CLT is that if we add the hypothesis that the ##x_n## are identically distributed, then (1) the sequence does converge in distribution and (2) the limit distribution is normal.
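The claim that ##\sqrt{n}(s_n - \mu)## has mean ##0## and variance ##\sigma^2## for every ##n## can be checked exactly in the Bernoulli case. This is a small sketch assuming ##s_n## is the average of ##n## independent Bernoulli(p) draws, so the number of ones is Binomial(n, p); the function name is hypothetical.

```python
import math

def scaled_mean_moments(p, n):
    """Exact mean and variance of sqrt(n) * (s_n - p), where s_n is the average
    of n independent Bernoulli(p) draws (number of ones k is Binomial(n, p))."""
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        value = math.sqrt(n) * (k / n - p)
        mean += prob * value
        second_moment += prob * value**2
    return mean, second_moment - mean**2

p = 0.3
for n in (1, 5, 50):
    print(n, scaled_mean_moments(p, n), "sigma^2 =", p * (1 - p))
```

Up to floating-point rounding, the mean is 0 and the variance is ##p(1-p)## regardless of ##n##, so only the shape of the distribution changes as ##n## grows.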

One last thing to note is that convergence in distribution means that the sequence of cdf's (cumulative distribution functions) converges pointwise. Thus the fact that a Bernoulli random variable, or any finite sum of them, is discrete whereas a normal distribution is continuous, does not cause any difficulty. We do NOT claim that the sequence of probability mass functions of the Bernoulli sums somehow converges to
$$\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$
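The distinction between cdf convergence and pmf convergence can be seen numerically. The sketch below, again for Bernoulli(p) draws, computes the largest gap between the cdf of ##\sqrt{n}(s_n - \mu)## and the ##N(0,\sigma)## cdf, together with the largest single probability mass. The helper names are invented for this example, and the pmf is evaluated through logarithms only to avoid overflow for large ##n##.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k ones among n independent Bernoulli(p) draws), for 0 < p < 1."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_prob)

def normal_cdf(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cdf_gap_and_max_pmf(p, n):
    """Return (largest cdf gap to N(0, sigma), largest point mass) for sqrt(n)(s_n - p)."""
    sigma = math.sqrt(p * (1.0 - p))
    cum, gap, max_pmf = 0.0, 0.0, 0.0
    for k in range(n + 1):
        prob = binomial_pmf(k, n, p)
        x = math.sqrt(n) * (k / n - p)
        phi = normal_cdf(x / sigma)
        # the discrete cdf jumps at x, so compare just before and just after the jump
        gap = max(gap, abs(cum - phi), abs(cum + prob - phi))
        cum += prob
        max_pmf = max(max_pmf, prob)
    return gap, max_pmf

for n in (10, 100, 1000):
    print(n, cdf_gap_and_max_pmf(0.5, n))
```

Both numbers shrink as ##n## grows: the cdfs approach each other, while the individual point masses tend to 0 rather than to the values of the normal density.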

TheOldHag said:
As I understand it, one result of the central limit theorem is that the sampling distribution of means drawn from any population will be approximately normal.
This isn't true in full generality. There is a condition, but it is easily met, so don't worry about it: the population must have a finite mean and variance. A distribution that does not meet the condition is that of X/Y, where X and Y are independent standard normal variables. When Y is close to zero you will get freakish results. This distribution (a Cauchy distribution) doesn't even have a mean. But this sort of thing never seems to show up in real life.

If you meet that condition, then the distribution of the sample means, suitably centered and scaled, converges to a normal distribution. It may take a long time to converge, but most people are interested in bounded distributions without extreme skew, so in practice things usually converge quite quickly.
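The X/Y counterexample is easy to try out. The sketch below draws ratios of independent standard normals and measures the spread of their sample means with the interquartile range, since the variance does not exist. The function names, the seed, and the repetition count are arbitrary choices for this illustration.

```python
import random
import statistics

def ratio_of_normals(rng):
    """One draw of X/Y with X, Y independent standard normals."""
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)

def iqr_of_sample_means(n, reps, rng):
    """Interquartile range of `reps` sample means, each taken over n draws."""
    means = [
        statistics.fmean(ratio_of_normals(rng) for _ in range(n))
        for _ in range(reps)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 10, 100):
    print(n, round(iqr_of_sample_means(n, 2000, rng), 2))
```

The spread stays roughly the same no matter how many draws go into each mean, in contrast to the ##1/\sqrt{n}## shrinkage seen for Bernoulli averages.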

TheOldHag said:
Assume the population consists of Bernoulli trials with a given probability p and we want to estimate p. Then our population consists of zeros and ones, and our sample means will be in the interval [0,1]. So I'm a bit confused about how such a distribution of sample means can be approximated by a normal distribution, which is never 0 in the interval (-infinity, infinity).

The normal distribution is an abstract mathematical model. Real life NEVER matches it exactly. This is applied mathematics, so when people say "the distribution is normal" they really mean approximately normal. So don't worry about it, unless you have a Bernoulli distribution with p close to 0 or 1. Then you have a skewed distribution and will need a large sample size. If p is close to zero, you need a sample size of roughly 15/p. If ##p = 10^{-9}##, you will have extreme skew and will need a very large sample.
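One way to read the 15/p rule of thumb is through skewness. The average of ##n## independent Bernoulli(p) draws has skewness ##(1-2p)/\sqrt{np(1-p)}## (the same as the binomial count), and a normal distribution has skewness 0. The sketch below evaluates that formula at ##n = 15/p##; the function name is hypothetical, and the 15/p figure is the rule of thumb quoted above, not a derived threshold.

```python
import math

def sample_mean_skewness(p, n):
    """Skewness of the average of n independent Bernoulli(p) draws."""
    return (1.0 - 2.0 * p) / math.sqrt(n * p * (1.0 - p))

for p in (0.1, 0.01, 1e-9):
    n = math.ceil(15 / p)  # sample size suggested by the rule of thumb
    print(p, n, sample_mean_skewness(p, n))
```

For small p the skewness at ##n = 15/p## is close to ##1/\sqrt{15} \approx 0.26##, so the rule amounts to holding the residual skew at a fixed, moderate level.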

The first thing I did was to simulate a basic Bernoulli(1/2) variable. So I generated 10000 data points that were Bernoulli(1/2) distributed. This is the plot of the frequencies of the data points:

As we expect, about half of the data points are at 0, and half are at 1.

Then I took 2 data points that were Bernoulli(1/2) distributed and took the average of the two. I did this 10000 times. This is the plot of the frequencies:

I then took 5 data points that were Bernoulli(1/2) distributed and took the average of the 5. I did this 10000 times. This is the plot of the frequencies:

I then took 10 data points that were Bernoulli(1/2) distributed and took the average of the 10. I did this 10000 times. This is the plot of the frequencies:

I then took 100 data points that were Bernoulli(1/2) distributed and took their average. I did this 10000 times. This is the plot of the frequencies:

I then took 1000 data points that were Bernoulli(1/2) distributed and took their average. I did this 10000 times. This is the plot of the frequencies:

As you can see, the mean of the distribution moves to the right as I take more data points in each average. You can also see that I get very close to a normal distribution. Even if I only take averages of 10 data points, I'm already quite close! The variance also gets smaller each time (relative to the range of possible values).

So while you are right that I can never generate negative values, we do see that the distribution moves to the right while its variance gets (relatively) small. Thus the theoretical contribution from the negative values in the normal distribution gets closer and closer to 0.
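For anyone who wants to repeat the experiment, here is a minimal sketch that tabulates the frequencies of averages of fair Bernoulli draws and prints a crude text histogram. It is not the code used for the plots in this thread; the function name, the seed, and the bit-counting shortcut (which only works for p = 1/2) are choices made for this example.

```python
import random
from collections import Counter

def average_frequencies(n, reps, rng):
    """Frequencies of the average of n Bernoulli(1/2) draws over `reps` repetitions."""
    counts = Counter()
    for _ in range(reps):
        # getrandbits(n) yields n independent fair bits; counting the ones
        # gives the sum of n Bernoulli(1/2) draws
        ones = bin(rng.getrandbits(n)).count("1")
        counts[ones / n] += 1
    return counts

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 2, 5, 10):
    freq = average_frequencies(n, 10000, rng)
    print(f"n = {n}")
    for value in sorted(freq):
        print(f"  {value:5.2f} {'#' * (freq[value] // 100)}")
```

Each `#` stands for 100 occurrences, so the bell shape is already visible in the n = 10 table.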

#### Attachments

• Data1.png
• Data2.png
• Data5.png
• Data10.png
• Data100.png
• Data1000.png

@micro - Based on the x-axis in your plots, I think your simulation is summing the Bernoulli trials, not averaging them, correct?

I also noticed that I omitted a ##1/n## scale factor in my definition of ##s_n## in post #2. It should have been
$$s_n = \frac{1}{n}\sum_{k=1}^{n} x_k$$
The rest of my remarks assumed we were talking about averages, not sums.

If we sum instead of averaging, we get similar results, except that the mean and standard deviation of ##\sum_{k=1}^n x_k## are ##n \mu## and ##\sqrt{n}\sigma## instead of ##\mu## and ##\sigma/\sqrt{n}##. However, the sums themselves do NOT converge in distribution, because their spread grows without bound; they have to be centered and scaled first.
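To see concretely why the raw sums cannot converge in distribution, one can evaluate their cdf at a fixed point for growing ##n##. This sketch uses the exact binomial cdf for Bernoulli(1/2) draws; the function name is made up for the illustration.

```python
import math

def binomial_cdf(x, n, p):
    """P(sum of n independent Bernoulli(p) draws <= x), computed exactly for integer x."""
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(min(x, n) + 1)
    )

for n in (20, 100, 1000):
    print(n, binomial_cdf(10, n, 0.5))
```

For any fixed x the value tends to 0 as ##n## grows, so the pointwise limit of the cdfs is the zero function, which is not the cdf of any random variable.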

jbunniii said:
@micro - Based on the x-axis in your plots, I think your simulation is summing the Bernoulli trials, not averaging them, correct?

Yes, that is correct. Here are the plots for the averages:

The means of the distributions in these plots now stay the same (at 1/2), but the variances decrease each time. So the corresponding normal distributions have decreasing variances, and the probability they assign to negative values decreases to 0.
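The same behaviour can be checked without plots by comparing the empirical mean and variance of the simulated averages with the theoretical values ##p## and ##p(1-p)/n##. This is only a sketch; the function name, the seed, and the repetition count are arbitrary.

```python
import random
import statistics

def simulate_averages(p, n, reps, rng):
    """Return `reps` averages, each over n independent Bernoulli(p) draws."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.5
for n in (1, 10, 100):
    avgs = simulate_averages(p, n, 5000, rng)
    # columns: n, empirical mean, empirical variance, theoretical variance
    print(n, statistics.fmean(avgs), statistics.pvariance(avgs), p * (1 - p) / n)
```

The empirical mean stays near 1/2 while the variance falls roughly by a factor of 10 each time ##n## is multiplied by 10.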

#### Attachments

• Data2.png
• Data5.png
• Data10.png
• Data100.png
• Data1000.png
• Data1.png

## What is the Central Limit Theorem?

The Central Limit Theorem (CLT) is a theorem in probability stating that when independent, identically distributed random variables with finite variance are added together, their sum, once centered and scaled, tends towards a normal distribution, even if the original variables themselves are not normally distributed.

## How does the Central Limit Theorem apply to a Bernoulli distribution?

The Central Limit Theorem can be applied to a Bernoulli distribution when the sample size is sufficiently large. This means that if we repeatedly draw samples of n independent Bernoulli trials and calculate the mean of each sample, the distribution of these means will be approximately normal when n is large.

## What is the significance of the Central Limit Theorem for a Bernoulli distribution?

The Central Limit Theorem is significant for a Bernoulli distribution because it allows us to use the normal distribution to make inferences about the population parameter (e.g. the probability of success) based on sample means. This is useful as the normal distribution is easier to work with and has well-established properties.

## What are the assumptions of the Central Limit Theorem?

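The assumptions of the classical Central Limit Theorem are: 1) the observations are independent, 2) they are identically distributed, and 3) the population has a finite variance. In addition, the normal approximation is only accurate when the sample size is sufficiently large; n > 30 is a common rule of thumb, but skewed populations (for example a Bernoulli distribution with p close to 0 or 1) need more. Violating these assumptions can lead to inaccurate results.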

## How can the Central Limit Theorem be used in practice for a Bernoulli distribution?

In practice, the Central Limit Theorem can be used to estimate the population parameter (e.g. the probability of success) by calculating the mean of a sample of independent draws from a Bernoulli distribution. For a large sample this mean is approximately normally distributed, which allows confidence intervals to be calculated and hypotheses to be tested.
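As an illustration of the confidence-interval use, here is a minimal sketch of the normal-approximation (Wald) interval for a Bernoulli p. The function name and the example counts are made up; z = 1.96 is the usual value for roughly 95% coverage, and the interval is known to be unreliable when p is near 0 or 1 or the sample is small.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a Bernoulli p."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # clip to [0, 1], since p cannot lie outside that range
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# hypothetical data: 40 successes in 100 trials
print(wald_interval(40, 100))
```

The clipping step reflects the point raised at the start of the thread: the normal model puts some mass outside ##[0,1]##, and for small samples or extreme p that mass is not negligible.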
