Central Limit Theorem applied to a Bernoulli distribution

Discussion Overview

The discussion revolves around the application of the central limit theorem (CLT) to a Bernoulli distribution, particularly focusing on how the sampling distribution of means from Bernoulli trials can be approximated by a normal distribution. Participants explore theoretical implications, mathematical definitions, and practical considerations related to this approximation.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • Some participants assert that the CLT indicates the sampling distribution of means will be approximately normal, even when the underlying distribution is bounded, such as in the case of Bernoulli trials.
  • Others clarify that the convergence of the sample mean to a normal distribution is contingent upon certain conditions being met, and that extreme cases can lead to distributions that do not conform to normality.
  • A participant emphasizes that the definition of convergence in distribution is complex and involves sequences of random variables, which may not align with intuitive notions of limits.
  • Some contributions highlight that while the sample mean is confined to the interval [0,1], the normal approximation becomes valid as sample size increases, with the distribution of sample means becoming narrowly centered around the population mean.
  • One participant presents simulations demonstrating that as the sample size increases, the distribution of sample means approaches normality, despite the original Bernoulli distribution being discrete.
  • Another participant points out a potential misunderstanding regarding whether the simulations were averaging or summing the Bernoulli trials, which affects the interpretation of results.
  • Concerns are raised about the implications of skewness in the distribution when the probability parameter p is close to 0 or 1, suggesting that larger sample sizes may be necessary in such cases for the CLT to hold effectively.

Areas of Agreement / Disagreement

Participants express differing views on the applicability of the CLT to Bernoulli distributions, with some agreeing on the general principle of approximation while others highlight specific conditions and exceptions. The discussion remains unresolved regarding the nuances of these conditions and their implications.

Contextual Notes

Some participants note that the convergence of distributions involves complex definitions and that the normal approximation may not hold in all scenarios, particularly when dealing with extreme values or skewed distributions.

TheOldHag
As I understand it, one result of the central limit theorem is that the sampling distribution of means drawn from any population will be approximately normal. Assume the population consists of Bernoulli trials with a given probability p and we want to estimate p. Then our population consists of zeros and ones, and our sample means will lie in the interval [0,1]. So I'm a bit confused how such a distribution of sample means can be approximated by a normal distribution, whose density is never 0 anywhere on (-infinity, infinity).

Is it because the probability is very small outside of [0,1], so that it is still a good approximation? Or am I not understanding the central limit theorem? I have googled a bit, and most of what I have found applies the central limit theorem to the binomial distribution. But this seems to be the easier case, and my guess is that having to estimate a proportion is very common. Any insight is appreciated.
 
It is true that the sample mean ##s_n = \sum_{k=1}^{n} x_k## will be in the interval ##[0,1]##. The central limit theorem says that ##\sqrt{n}(s_n - \mu) \rightarrow N(0, \sigma)## (convergence in distribution). Equivalently, the distribution of ##s_n## is approximated by ##N(\mu,\sigma/\sqrt{n})## for large ##n##. As ##n## increases, ##N(\mu, \sigma/\sqrt{n})## becomes narrowly centered around ##\mu##, and the tails (e.g. the portion of the pdf outside a narrow interval around ##\mu##) become very small. This is reasonable because ##s_n## converges with probability 1 to ##\mu## by the law of large numbers, so we would expect that for large ##n##, its pdf should be largely confined to a narrow interval around ##\mu##.
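The approximation ##N(\mu, \sigma/\sqrt{n})## is easy to check numerically. Here is a minimal sketch in Python with numpy; the values p = 0.3 and n = 1000 are arbitrary choices for illustration, not anything from this thread:

```python
import numpy as np

# Bernoulli(p) population: mu = p, sigma = sqrt(p*(1-p)).
p = 0.3                       # illustrative value
n = 1000                      # trials per sample mean
mu = p
sigma = np.sqrt(p * (1 - p))

rng = np.random.default_rng(0)
# 10000 sample means, each the average of n Bernoulli(p) trials.
# A Binomial(n, p) draw divided by n is exactly such an average.
sample_means = rng.binomial(n, p, size=10000) / n

# CLT: these should look like N(mu, sigma/sqrt(n)).
print(sample_means.mean())    # close to mu = 0.3
print(sample_means.std())     # close to sigma/sqrt(n), about 0.0145
```

Every one of the 10000 sample means is confined to [0,1], yet their histogram is well described by the normal curve above.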
 
TheOldHag said:
will be approximately normal.

To add to what Jbunniii said, the technical definition of "will be approximately" involves the concept of convergence of a sequence of functions to a function and the further complication of defining a convergence of a sequence of random variables to another random variable. These concepts are more complicated than the concept of the convergence of a sequence of numbers to another number (i.e. the ordinary idea of limit studied in first semester calculus).

Of course, you can find an article about the convergence of sequences of random variables on Wikipedia. If you want to pursue this, you should begin with an exact statement of what the Central Limit Theorem says.
 
One more thing I would like to mention: observe that as long as the ##x_n## are independent and share a common mean ##\mu## and standard deviation ##\sigma##, then ##E[s_n] = \mu## and ##\text{var}(s_n) = \sigma^2/n##, and so
$$E[\sqrt{n}(s_n - \mu)] = \sqrt{n}(E[s_n] - \mu) = \sqrt{n}(\mu - \mu) = 0$$
and
$$\begin{align}
\text{var}(\sqrt{n}(s_n - \mu)) &= E[n(s_n - \mu)^2] \\
&= n(E[s_n^2] - 2\mu E[s_n] + \mu^2) \\
&= n(E[s_n^2] - \mu^2) \\
&= n(\text{var}(s_n)) \\
&= n(\sigma^2/n) \\
&= \sigma^2\end{align}$$
So even without taking any limit, we know that ##\sqrt{n}(s_n - \mu)## must have mean ##0## and standard deviation ##\sigma## for every ##n##. Thus it's no surprise that the limit distribution (if it exists) must also have this mean and standard deviation. So the real content of the CLT is that if we add the hypothesis that the ##x_n## are identically distributed, then (1) the sequence does converge in distribution and (2) the limit distribution is normal.
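This exact-for-every-n property is easy to confirm by simulation. A sketch (numpy; p = 0.5 and the particular sample sizes are arbitrary choices): for each n, ##\sqrt{n}(s_n - \mu)## should show mean near 0 and standard deviation near ##\sigma = 0.5##, with no limit involved:

```python
import numpy as np

# sqrt(n)*(s_n - mu) has mean 0 and standard deviation sigma for EVERY n,
# not just in the limit. Check with Bernoulli(1/2): mu = 0.5, sigma = 0.5.
p = 0.5
mu, sigma = p, np.sqrt(p * (1 - p))
rng = np.random.default_rng(1)

for n in (5, 50, 500):
    s_n = rng.binomial(n, p, size=200_000) / n   # 200000 sample means
    z = np.sqrt(n) * (s_n - mu)
    print(n, round(z.mean(), 4), round(z.std(), 4))  # mean ~0, std ~0.5
```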

One last thing to note is that convergence in distribution means that the sequence of cdf's (cumulative distribution functions) converges pointwise. Thus the fact that a Bernoulli random variable, or any finite sum of them, is discrete whereas a normal distribution is continuous does not cause any difficulty. We do NOT claim that the sequence of probability mass functions of the Bernoulli sums somehow converges to
$$\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$
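The cdf convergence itself can be observed numerically: the gap between the cdf of the standardized Bernoulli(1/2) sum (a Binomial cdf) and the standard normal cdf shrinks as n grows. A standard-library sketch, with the gap measured at the jump points of the discrete cdf:

```python
from math import comb, erfc, sqrt

# Standard normal cdf via erfc.
def normal_cdf(x):
    return 0.5 * erfc(-x / sqrt(2))

p = 0.5
gaps = []
for n in (10, 100, 1000):
    # Accumulate the Binomial(n, 1/2) cdf at its jump points k and compare
    # with the normal cdf at the standardized points.
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        z = (k - n * p) / sqrt(n * p * (1 - p))
        gap = max(gap, abs(cdf - normal_cdf(z)))
    gaps.append(gap)
    print(n, gap)   # shrinks roughly like 1/sqrt(n)
```

The pmf never converges to the normal density (each pmf value actually goes to 0), but the cdfs line up ever more closely.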
 
TheOldHag said:
As I understand it, one result of the central limit theorem is that the sampling distribution of means drawn from any population will be approximately normal.
This isn't true. There is a condition (finite variance) that is easily met, so don't worry about it. A distribution that does not meet the condition is X/Y, where X and Y are independent standard normal variables; this ratio is the Cauchy distribution. When Y is close to zero you will get freakish results. This distribution doesn't even have a mean. But this sort of thing never seems to show up in real life.

If you meet that condition, then the sample means converge to a normal. It may take a long time for it to converge, but most people are interested in bounded distributions without extreme skew, so in practice things usually converge quite quickly.
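The X/Y counterexample is worth seeing in a simulation. The standard Cauchy distribution is stable: the mean of n i.i.d. standard Cauchy variables is again standard Cauchy, so the sample means never tighten up, no matter how large n gets. A sketch contrasting it with Bernoulli (sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Sample means of n Cauchy variables (each generated as a ratio of two
# independent standard normals): their spread does NOT shrink with n,
# because the mean of n i.i.d. standard Cauchy is again standard Cauchy.
cauchy_means = np.array([
    (rng.standard_normal(n) / rng.standard_normal(n)).mean()
    for _ in range(2000)
])
# Sample means of n Bernoulli(1/2) trials: spread shrinks like 1/sqrt(n).
bern_means = rng.binomial(n, 0.5, size=2000) / n

iqr_c = np.subtract(*np.percentile(cauchy_means, [75, 25]))
iqr_b = np.subtract(*np.percentile(bern_means, [75, 25]))
print(iqr_c)   # around 2, regardless of n
print(iqr_b)   # tiny, and shrinking with n
```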

TheOldHag said:
Assume the population consist of Bernoulli trials with a given probability p and we want to estimate p. Then our population consist of zeros and ones and our sample means will be in the interval [0,1]. So I'm a bit confused how such a distribution of sample means can be approximated by a normal distribution which is never 0 in the interval (-infinity, infinity).

The normal distribution is an abstract mathematical model. Real life NEVER matches it exactly. This is applied mathematics, so when they say "the distribution is normal" they really mean approximately normal. So don't worry about it, unless you have a Bernoulli distribution with p close to 0 or 1. Then you have a skewed distribution and will need a large sample size. If p is close to zero, you need a sample size on the order of 15/p. If p = 10^-9 you will have extreme skew and will need a very large sample.
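The skew argument can be made quantitative: the skewness of the mean of n Bernoulli(p) trials is ##(1-2p)/\sqrt{np(1-p)}##, so as p shrinks, n must grow like 1/p to keep the skewness fixed. A sketch showing that the n = 15/p rule of thumb roughly caps the skewness near ##1/\sqrt{15} \approx 0.26##:

```python
import numpy as np

# Skewness of the sample mean of n Bernoulli(p) trials:
#   (1 - 2p) / sqrt(n * p * (1 - p))
# Choosing n = 15/p (the rule of thumb above) roughly caps it near 0.26.
def mean_skewness(p, n):
    return (1 - 2 * p) / np.sqrt(n * p * (1 - p))

skews = []
for p in (0.1, 0.01, 0.001):
    n = int(np.ceil(15 / p))
    skews.append(mean_skewness(p, n))
    print(p, n, skews[-1])   # stays a bit below 0.26
```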
 
In order to illustrate the answers in this thread, I have made some simulations.

The first thing I did was to simulate a basic Bernoulli(1/2) variable. So I generated 10000 data points which were distributed Bernoulli(1/2). This is the plot of the frequencies of the data points:

[Attached frequency plot]


As we expect, about half of the data points are at 0, and half are at 1.

Then I took 2 data points which were Bernoulli(1/2) distributed and took the average of the two. I did this 10000 times. This is the plot of the frequencies:

[Attached frequency plot]


I then took 5 data points which were Bernoulli(1/2) distributed and took the average of the 5. I did this 10000 times. This is the plot of the frequencies:

[Attached frequency plot]


I then took 10 data points which were Bernoulli(1/2) distributed and took the average of the 10. I did this 10000 times. This is the plot of the frequencies:

[Attached frequency plot]


I then took 100 data points which were Bernoulli(1/2) distributed and took their average. I did this 10000 times. This is the plot of the frequencies:

[Attached frequency plot]


I then took 1000 data points which were Bernoulli(1/2) distributed and took their average. I did this 10000 times. This is the plot of the frequencies:

[Attached frequency plot]


As you see, the mean of the distribution translates to the right as I take more data points. You also see that I get very close to a normal distribution. Even with only 10 data points per average, I'm already quite close! The variance also gets smaller each time (relative to the total population).

So while you are right that I can never generate negative values, we do see that the population translates to the right and that the variance gets (relatively) small. Thus the theoretical contribution from the negative values in the normal distribution gets closer and closer to 0.
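A minimal numpy version of this kind of experiment (a sketch, with a fixed seed; a Binomial(n, 1/2) draw tallies n Bernoulli(1/2) trials at once). Consistent with the rightward shift of the plotted distributions, the totals grow with n while their relative spread shrinks like ##1/\sqrt{n}##:

```python
import numpy as np

# For each n, tally 10000 totals of n Bernoulli(1/2) trials and look at
# how the center and relative spread behave.
rng = np.random.default_rng(2)
for n in (1, 2, 5, 10, 100, 1000):
    totals = rng.binomial(n, 0.5, size=10000)
    rel_spread = totals.std() / max(totals.mean(), 1e-9)
    print(n, totals.mean(), rel_spread)   # center ~n/2, relative spread shrinks
```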
 

@micro - Based on the x-axis in your plots, I think your simulation is summing the Bernoulli trials, not averaging them, correct?

I also noticed that I omitted a ##1/n## scale factor in my definition of ##s_n## in post #2. It should have been
$$s_n = \frac{1}{n}\sum_{k=1}^{n} x_k$$
The rest of my remarks assumed we were talking about averages, not sums.

If we sum instead of averaging, we get similar results, except the mean and standard deviation of ##\sum_{k=1}^n x_k## are ##n \mu## and ##\sqrt{n}\sigma## instead of ##\mu## and ##\sigma/\sqrt{n}##. However, the sums do NOT converge in distribution.
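The sum-versus-average scaling can be verified directly. A sketch (p = 0.5 and n = 400 are arbitrary illustrative values):

```python
import numpy as np

# For Bernoulli(p): sums of n trials have mean n*mu and sd sqrt(n)*sigma;
# averages have mean mu and sd sigma/sqrt(n).
p = 0.5
mu, sigma = p, np.sqrt(p * (1 - p))   # mu = 0.5, sigma = 0.5
n = 400
rng = np.random.default_rng(3)
sums = rng.binomial(n, p, size=100_000).astype(float)
means = sums / n

print(sums.mean(), n * mu)                # both near 200
print(sums.std(), np.sqrt(n) * sigma)     # both near 10
print(means.std(), sigma / np.sqrt(n))    # both near 0.025
```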
 
jbunniii said:
@micro - Based on the x-axis in your plots, I think your simulation is summing the Bernoulli trials, not averaging them, correct?

Yes, that is correct. Here are the plots for the averages:

The means of these plots now all remain the same (at 1/2), but the variances decrease each time. So the corresponding normal distributions have variances which decrease, and the negative points have probabilities which decrease to 0.
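That last point can be made precise using the approximating normal itself: for averages of n Bernoulli(1/2) trials it is ##N(1/2,\ 1/(2\sqrt{n}))##, whose mass below zero is ##\Phi(-\sqrt{n})##. A standard-library sketch:

```python
from math import sqrt, erfc

# Standard normal cdf via erfc (numerically stable far in the tail).
def normal_cdf(x):
    return 0.5 * erfc(-x / sqrt(2))

# P(average < 0) under the approximating normal N(1/2, 1/(2*sqrt(n)))
# is Phi(-sqrt(n)): it dies off extremely fast.
probs = []
for n in (1, 2, 5, 10, 100):
    mu, sd = 0.5, 0.5 / sqrt(n)
    probs.append(normal_cdf((0 - mu) / sd))
    print(n, probs[-1])
```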

[Attached plots: frequency histograms of the sample averages]
 

