Central Limit Theorem applied to a Bernoulli distribution

SUMMARY

The Central Limit Theorem (CLT) states that the sampling distribution of sample means from a population, such as Bernoulli trials with probability p, will approximate a normal distribution as the sample size increases. Specifically, for large n, the distribution of the sample mean, denoted s_n, is approximately N(μ, σ/√n), where μ is the population mean and σ is the population standard deviation. This approximation holds even though the sample means are confined to the interval [0,1], because the normal approximation places vanishingly little probability outside that interval as the sample size grows. The discussion emphasizes that although the sample means are discrete and the normal distribution is continuous, the convergence is of cumulative distribution functions (CDFs), so no contradiction arises.

PREREQUISITES
  • Understanding of the Central Limit Theorem (CLT)
  • Familiarity with Bernoulli distributions and their properties
  • Knowledge of convergence in distribution and its implications
  • Basic statistics, including mean and variance calculations
NEXT STEPS
  • Study the mathematical formulation of the Central Limit Theorem in detail
  • Explore the properties of Bernoulli distributions and their applications
  • Learn about convergence of random variables and its significance in statistics
  • Investigate simulation techniques for visualizing the CLT using statistical software
USEFUL FOR

Statisticians, data scientists, and students of probability theory who are interested in understanding the application of the Central Limit Theorem to Bernoulli distributions and its implications for statistical analysis.

TheOldHag
As I understand it, one result of the central limit theorem is that the sampling distribution of means drawn from any population will be approximately normal. Assume the population consists of Bernoulli trials with a given probability p and we want to estimate p. Then our population consists of zeros and ones, and our sample means will be in the interval [0,1]. So I'm a bit confused how such a distribution of sample means can be approximated by a normal distribution, whose density is never 0 anywhere on ##(-\infty, \infty)##.

Is it because the probability is very small outside of [0,1], and thereby still a good approximation? Or am I not understanding the central limit theorem? I have googled a bit, and most of what I have found applies the central limit theorem to the binomial distribution. But this seems to be the easier case, and my guess is that having to estimate a proportion is very common. Any insight is appreciated.
 
It is true that the sample mean ##s_n = \sum_{k=1}^{n} x_k## will be in the interval ##[0,1]##. The central limit theorem says that ##\sqrt{n}(s_n - \mu) \rightarrow N(0, \sigma)## (convergence in distribution). Equivalently, the distribution of ##s_n## is approximated by ##N(\mu,\sigma/\sqrt{n})## for large ##n##. As ##n## increases, ##N(\mu, \sigma/\sqrt{n})## becomes narrowly centered around ##\mu##, and the tails (e.g. the portion of the pdf outside a narrow interval around ##\mu##) become very small. This is reasonable because ##s_n## converges with probability 1 to ##\mu## by the law of large numbers, so we would expect that for large ##n##, its pdf should be largely confined to a narrow interval around ##\mu##.
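
For readers who want to check this numerically, here is a minimal sketch (assuming NumPy is available; the values of p, n, and the seed are arbitrary illustrative choices) comparing the simulated mean and standard deviation of ##s_n## against the ##N(\mu, \sigma/\sqrt{n})## approximation:

```python
# Minimal sketch (NumPy assumed; p, n, and the seed are arbitrary choices).
import numpy as np

p = 0.3                                  # Bernoulli success probability
mu, sigma = p, np.sqrt(p * (1 - p))      # population mean and standard deviation
n, reps = 1000, 100_000                  # sample size, number of simulated sample means

rng = np.random.default_rng(0)
means = rng.binomial(n, p, size=reps) / n    # each entry is one sample mean s_n

print(f"empirical mean of s_n: {means.mean():.5f}  (theory: {mu})")
print(f"empirical std  of s_n: {means.std():.5f}  (theory: {sigma / np.sqrt(n):.5f})")
```

Both printed values should agree with ##\mu## and ##\sigma/\sqrt{n}## to within simulation noise.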
 
TheOldHag said:
will be approximately normal.

To add to what Jbunniii said, the technical definition of "will be approximately" involves the concept of convergence of a sequence of functions to a function and the further complication of defining a convergence of a sequence of random variables to another random variable. These concepts are more complicated than the concept of the convergence of a sequence of numbers to another number (i.e. the ordinary idea of limit studied in first semester calculus).

Of course, you can find an article about the convergence of sequences of random variables on Wikipedia. If you want to pursue this, you should begin with an exact statement of what the Central Limit Theorem says.
 
One more thing I would like to mention: observe that as long as the ##x_n## are independent and share a common mean ##\mu## and standard deviation ##\sigma##, then ##E[s_n] = \mu## and ##\text{var}(s_n) = \sigma^2/n##, and so
$$E[\sqrt{n}(s_n - \mu)] = \sqrt{n}(E[s_n] - \mu) = \sqrt{n}(\mu - \mu) = 0$$
and
$$\begin{align}
\text{var}(\sqrt{n}(s_n - \mu)) &= E[n(s_n - \mu)^2] \\
&= n(E[s_n^2] - 2\mu E[s_n] + \mu^2) \\
&= n(E[s_n^2] - \mu^2) \\
&= n(\text{var}(s_n)) \\
&= n(\sigma^2/n) \\
&= \sigma^2\end{align}$$
So even without taking any limit, we know that ##\sqrt{n}(s_n - \mu)## must have mean ##0## and standard deviation ##\sigma## for every ##n##. Thus it's no surprise that the limit distribution (if it exists) must also have this mean and standard deviation. So the real content of the CLT is that if we add the hypothesis that the ##x_n## are identically distributed, then (1) the sequence does converge in distribution and (2) the limit distribution is normal.
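
A minimal numerical check of this identity (NumPy assumed; the parameter choices are arbitrary) confirms that the mean and standard deviation of ##\sqrt{n}(s_n - \mu)## stay at ##0## and ##\sigma## for every ##n##, before any limit is taken:

```python
# Minimal sketch (NumPy assumed; p and the seed are arbitrary) checking that
# sqrt(n)*(s_n - mu) has mean 0 and standard deviation sigma for every n.
import numpy as np

p = 0.3
mu, sigma = p, np.sqrt(p * (1 - p))
rng = np.random.default_rng(1)

for n in (2, 10, 100, 1000):
    s_n = rng.binomial(n, p, size=200_000) / n   # simulated sample means
    z = np.sqrt(n) * (s_n - mu)                  # centered and scaled
    print(f"n={n:5d}  mean={z.mean():+.4f}  std={z.std():.4f}  (theory: 0, {sigma:.4f})")
```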

One last thing to note is that convergence in distribution means that the sequence of cdf's (cumulative distribution functions) converges pointwise. Thus the fact that a Bernoulli random variable, or any finite sum of them, is discrete whereas a normal distribution is continuous does not cause any difficulty. We do NOT claim that the sequence of probability mass functions of the Bernoulli sums somehow converges to
$$\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$
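
To see what IS claimed, here is a minimal sketch (NumPy assumed; the evaluation points x are arbitrary) comparing the empirical CDF of the standardized sample mean with the standard normal CDF at a few points:

```python
# Minimal sketch (NumPy assumed) comparing the empirical CDF of the
# standardized sample mean with the standard normal CDF.
import numpy as np
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

p = 0.5
mu, sigma = p, np.sqrt(p * (1 - p))
rng = np.random.default_rng(2)

for n in (10, 100, 1000):
    z = (rng.binomial(n, p, size=100_000) / n - mu) * np.sqrt(n) / sigma
    for x in (-1.0, 0.5):
        emp = np.mean(z <= x)        # empirical CDF evaluated at x
        print(f"n={n:5d}  x={x:+.1f}  empirical={emp:.4f}  normal={normal_cdf(x):.4f}")
```

The pmf of the standardized sum stays discrete for every ##n##, but its CDF values approach the normal CDF values as ##n## grows.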
 
TheOldHag said:
As I understand it, one result of the central limit theorem is that the sampling distribution of means drawn from any population will be approximately normal.
This isn't true in full generality. There is a condition, which is easily met, so don't worry about it: roughly, the population must have a finite variance. A distribution that does not meet the condition is the ratio X/Y of two independent standard normal variables, which is the Cauchy distribution. When Y is close to zero you will get freakish results. This distribution doesn't even have a mean. But this sort of thing never seems to show up in real life.

If you meet that condition, then the (suitably standardized) sample means converge to a normal distribution. It may take a long time to converge, but most people are interested in bounded distributions without extreme skew, so in practice things usually converge quite quickly.
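
A minimal sketch of that counterexample (NumPy assumed; the sizes and seed are arbitrary): the ratio of two independent standard normals is Cauchy distributed, so its sample means never settle down no matter how large n gets:

```python
# Minimal sketch (NumPy assumed): Cauchy samples as ratios of independent
# standard normals; the sample mean is erratic because no mean exists.
import numpy as np

rng = np.random.default_rng(3)
for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n) / rng.standard_normal(n)   # Cauchy samples
    print(f"n={n:9,d}  sample mean = {x.mean():+.3f}")     # does not converge
```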

TheOldHag said:
Assume the population consists of Bernoulli trials with a given probability p and we want to estimate p. Then our population consists of zeros and ones, and our sample means will be in the interval [0,1]. So I'm a bit confused how such a distribution of sample means can be approximated by a normal distribution, whose density is never 0 anywhere on ##(-\infty, \infty)##.

The normal distribution is an abstract mathematical model. Real life NEVER matches it exactly. This is applied mathematics, so when they say "the distribution is normal" they really mean approximately normal. So don't worry about it, unless you have a Bernoulli distribution with p close to 0 or 1. Then you have a skewed distribution and will need a large sample size. If p is close to zero, then you need a sample size of roughly 15/p. If ##p = 10^{-9}## you will have extreme skew and will need a very large sample.
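
A minimal sketch of the skew problem (NumPy assumed; the choice ##p = 0.001## is illustrative): with a small sample the distribution of the sample mean remains strongly skewed, while ##n \approx 15/p## brings the skewness much closer to the 0 of a normal distribution:

```python
# Minimal sketch (NumPy assumed; p = 0.001 is an illustrative choice) showing
# how skewed the sample-mean distribution stays when n is too small for a rare p.
import numpy as np

p = 0.001
rng = np.random.default_rng(4)
for n in (100, int(15 / p)):                        # 15/p = 15000 here
    means = rng.binomial(n, p, size=100_000) / n
    skew = np.mean(((means - means.mean()) / means.std()) ** 3)
    print(f"n={n:6d}  skewness of sample means = {skew:.3f}  (0 for a normal)")
```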
 
In order to illustrate the answers in this thread, I have made some simulations.

The first thing I did was to simulate a basic Bernoulli(1/2) variable. So I generated 10000 data points which were distributed Bernoulli(1/2). This is the plot of the frequencies of the data points:

[Plot: frequency histogram of 10000 Bernoulli(1/2) data points]


As we expect, about half of the data points are at 0, and half are at 1.

Then I took 2 data points that were Bernoulli(1/2) distributed and took the average of the two. I did this 10000 times. This is the plot of the frequencies:

[Plot: frequency histogram for averages of 2 data points]


I then took 5 data points that were Bernoulli(1/2) distributed and took the average of the 5. I did this 10000 times. This is the plot of the frequencies:

[Plot: frequency histogram for averages of 5 data points]


I then took 10 data points that were Bernoulli(1/2) distributed and took the average of the 10. I did this 10000 times. This is the plot of the frequencies:

[Plot: frequency histogram for averages of 10 data points]


I then took 100 data points that were Bernoulli(1/2) distributed and took their average. I did this 10000 times. This is the plot of the frequencies:

[Plot: frequency histogram for averages of 100 data points]


I then took 1000 data points that were Bernoulli(1/2) distributed and took their average. I did this 10000 times. This is the plot of the frequencies:

[Plot: frequency histogram for averages of 1000 data points]


As you see, the mean of the distribution translates to the right if I take enough data points in the average. You also see that I get very close to a normal distribution. Even if I only take averages of 10 data points, I'm already quite close! The variance also gets smaller each time (relative to the overall spread of the values).

So while you are right that I can never generate negative values, we do see that the distribution translates to the right and the variance gets (relatively) small. Thus the theoretical contribution from the negative values in the normal distribution gets closer and closer to 0.
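
For anyone who wants to reproduce these plots, here is a minimal sketch (assuming NumPy and matplotlib; bin counts and figure layout are arbitrary choices). As the next posts clarify, the quantities plotted above are actually sums of the trials rather than averages, so this sketch generates 10000 sums of ##n## Bernoulli(1/2) trials for each ##n##:

```python
# Minimal sketch (NumPy and matplotlib assumed) reproducing the simulations:
# 10000 sums of n Bernoulli(1/2) trials for each n, plotted as histograms.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for ax, n in zip(axes.flat, [1, 2, 5, 10, 100, 1000]):
    sums = rng.binomial(n, 0.5, size=10_000)   # sum of n Bernoulli(1/2) trials
    ax.hist(sums, bins=min(n + 1, 50))
    ax.set_title(f"n = {n}")
plt.tight_layout()
plt.show()
```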
 

@micro - Based on the x-axis in your plots, I think your simulation is summing the Bernoulli trials, not averaging them, correct?

I also noticed that I omitted a ##1/n## scale factor in my definition of ##s_n## in post #2. It should have been
$$s_n = \frac{1}{n}\sum_{k=1}^{n} x_k$$
The rest of my remarks assumed we were talking about averages, not sums.

If we sum instead of averaging, we get similar results, except the mean and standard deviation of ##\sum_{k=1}^n x_k## are ##n \mu## and ##\sqrt{n}\sigma## instead of ##\mu## and ##\sigma/\sqrt{n}##. However, the sums themselves do NOT converge in distribution: their standard deviation ##\sqrt{n}\sigma## grows without bound, so only the centered and scaled version converges.
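
A minimal numerical sketch of this scaling (NumPy assumed; parameters are arbitrary):

```python
# Minimal sketch (NumPy assumed): sums have standard deviation sqrt(n)*sigma,
# which grows with n, while averages have standard deviation sigma/sqrt(n).
import numpy as np

p = 0.5
mu, sigma = p, np.sqrt(p * (1 - p))
rng = np.random.default_rng(6)

for n in (10, 100, 1000):
    sums = rng.binomial(n, p, size=100_000)
    avgs = sums / n
    print(f"n={n:5d}  sum std={sums.std():7.3f} (theory {np.sqrt(n) * sigma:7.3f})  "
          f"avg std={avgs.std():.5f} (theory {sigma / np.sqrt(n):.5f})")
```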
 
jbunniii said:
@micro - Based on the x-axis in your plots, I think your simulation is summing the Bernoulli trials, not averaging them, correct?

Yes, that is correct. Here are the plots for the averages:

The means of the plots now remain the same (at 1/2), but the variances decrease each time. So the corresponding normal approximations have decreasing variances, and the probability they assign to negative values decreases to 0.

[Plots: frequency histograms of the averages of 1, 2, 5, 10, 100, and 1000 Bernoulli(1/2) data points, each repeated 10000 times]
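
A minimal sketch to reproduce these averaged plots (NumPy and matplotlib assumed): identical to the earlier simulation, except each sum is divided by ##n##:

```python
# Minimal sketch (NumPy and matplotlib assumed) for the averaged version:
# dividing each sum by n keeps every histogram centered at 1/2 while the
# spread shrinks as n grows.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for ax, n in zip(axes.flat, [1, 2, 5, 10, 100, 1000]):
    avgs = rng.binomial(n, 0.5, size=10_000) / n   # average of n trials
    ax.hist(avgs, bins=min(n + 1, 50), range=(0, 1))
    ax.set_title(f"averages of n = {n} trials")
plt.tight_layout()
plt.show()
```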
 

