About central limit theorem

  1. Jun 28, 2010 #1

    KFC

    Hi there,
    I am trying to understand the central limit theorem with a simple example. As written in some texts, the central limit theorem can be stated as (I rephrase): for a population P, randomly pick n independent samples X1, X2, ..., Xn. The distribution of the average of (X1, X2, ..., Xn) approaches a normal distribution as n becomes large.
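
For reference, one standard textbook form of the theorem (the Lindeberg-Lévy version) can be written as below. It assumes the X_i are independent and identically distributed with finite mean and finite, nonzero variance; the symbols \bar{X}_n, \mu and \sigma are notation introduced here for illustration and do not come from the text being rephrased.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

Read informally: for large n, the average \bar{X}_n is approximately normal with mean \mu and standard deviation \sigma / \sqrt{n}.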

    Here are my questions

    1) The central limit theorem says that it is the averages of all possible sample sets that approach the normal distribution, not the samples themselves, right?

    2) To obtain the normal distribution, we should take many sets of samples (say k sets) from the population. Each set has n samples X1, X2, ..., Xn, which gives one average. So k sets give k averages, and these averages will show a normal distribution if n is large enough?

    3) Let's consider one die with 6 faces. The population is {1, 2, 3, 4, 5, 6}. Toss the die once, which gives X1, and the average is also X1 (only one die). Repeat this k times and we get a set of k averages, but this set of data will definitely be uniform. So small n won't reveal the bell-shaped (normal) distribution. Do I understand this correctly?

    4) Let's consider the same problem with 3 dice. If we look at the sum of the dice, the possible values are {3, 4, 5, ..., 18}. This time each sample set has three elements X1, X2, X3, which give one average. Repeating this process k times gives k averages, which show more or less a normal distribution, but not exactly. However, the more dice you use, the closer to the normal distribution it will be.

    5) OK. Now look at a different example, about a warped roulette wheel, found in a text. The wheel used in the experiment is not an ordinary one: 17 appears with probability 2/38 instead of 1/38. To tell whether the wheel is really warped, people spin it many times and compute the related standard deviation. They find that the distribution for the warped wheel does not overlap with that for an ordinary one; hence, it is easy to tell whether the wheel in question is warped. My question is this: if the author uses the normal distribution and standard deviation to decide, he silently takes the central limit theorem into account. However, if my understanding in question 3) is correct, one die is not enough to reveal the property of the normal distribution. In this case, each spin gives only one number, so it is no different from one die. In other words, no matter how many times you spin the wheel, you will not get a normal distribution for the averages, and it is meaningless to talk about the central limit theorem. Hence, the method above for telling whether the wheel is warped is not correct. Right?
     
  3. Jun 28, 2010 #2

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    Note that rolling one die three times is no different than rolling three (distinguishable) dice at once.


    Consider flipping a coin 10000 times. You can view this as 10000 samples from a Bernoulli distribution. Or, you can count the number of heads and view it as 1 sample from a binomial distribution with 10000 trials. (Or as 100 picks from a binomial distribution with 100 trials)

    The binomial distribution closely approximates normal with enough trials.
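
A small sketch of that last statement, for the coin-flip case: it compares the exact binomial probabilities with the normal density that has the same mean and standard deviation. The helper names (`binomial_pmf`, `normal_pdf`, `max_gap`) and the choice of n values are made up for this illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def max_gap(n, p=0.5):
    """Largest pointwise gap between the binomial pmf and the matching normal density."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, sd))
               for k in range(n + 1))

# The gap shrinks as the number of trials grows.
for n in (10, 100, 1000):
    print(n, max_gap(n))
```

The printed gap should get smaller with each larger n, which is one concrete way to read "closely approximates normal with enough trials".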
     
  4. Jun 28, 2010 #3

    KFC

    So you mean, for the roulette case, as long as I spin it many times (n is large enough), the distribution will closely approximate a normal one? And the same for the one-die case?

    I tried this on a computer. I take 10 dice, toss them all at random and calculate the sum, and repeat this in a loop until I have 100000 sums. When I plot these sums I see an approximately normal distribution. But if I do it with only one die or two, no matter how many times I toss, it won't be normal.
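
The experiment described here can be sketched roughly as follows. This is only one possible version: the function names, the seed, the repeat count of 20000 (smaller than the 100000 above, to keep it quick) and the text histogram are all choices made for this illustration.

```python
import random
from collections import Counter

def sum_of_dice(n_dice, rng):
    """Toss n_dice fair dice once and return the sum of the faces."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

def simulate(n_dice, n_repeats=20000, seed=0):
    """Repeat the toss n_repeats times and count how often each sum occurs."""
    rng = random.Random(seed)
    return Counter(sum_of_dice(n_dice, rng) for _ in range(n_repeats))

def print_histogram(counts, width=50):
    """Print one bar per observed sum, scaled so the tallest bar has `width` marks."""
    peak = max(counts.values())
    for value in sorted(counts):
        bar = "#" * round(width * counts[value] / peak)
        print(f"{value:3d} {bar}")

for n_dice in (1, 2, 10):
    print(f"--- {n_dice} dice ---")
    print_histogram(simulate(n_dice))
```

With one die the bars come out flat, with two dice they form a triangle, and with ten dice the bell shape shows up. What matters is the number of dice added together in each sum, not the number of repeats.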
     
  5. Jun 28, 2010 #4

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    I think you have the idea, although it's still not clear exactly what you're thinking.



    (Making usual assumptions -- e.g. each spin is independent and identically distributed)

    "How often does 17 come up in a million spins" has an (approximately) normal distribution. If you repeated this million-spin test a thousand times and plotted a histogram of the results, you would see the approximate shape of the normal distribution.

    However, the point of the statistical test is that we already know it's going to be (approximately) normally distributed, and if 17 on the wheel is fair, we know the mean and standard deviation of that distribution too, and we can construct a confidence interval. So to test for unfairness, we only need to do the million spin experiment once, and check if the result lies outside the confidence interval.
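
A rough sketch of such a test, under the i.i.d. assumption stated above. Several things here are assumptions for illustration only: the interval is taken as mean plus or minus 3 standard deviations (the thread does not fix a confidence level), the run uses 100000 spins instead of a million to keep it fast, and the function names are invented. The 1/38 and 2/38 probabilities are the ones from the first post.

```python
import math
import random

def fair_interval(n_spins, p_fair=1 / 38, z=3.0):
    """Range where the count of 17s should fall if the wheel is fair.

    Uses the normal approximation: mean n*p, sd sqrt(n*p*(1-p)).
    The width z = 3 is an illustrative choice, not a value from the thread.
    """
    mean = n_spins * p_fair
    sd = math.sqrt(n_spins * p_fair * (1 - p_fair))
    return mean - z * sd, mean + z * sd

def count_hits(n_spins, p, rng):
    """Spin n_spins times and count how often the watched number comes up."""
    return sum(rng.random() < p for _ in range(n_spins))

n_spins = 100_000
low, high = fair_interval(n_spins)
rng = random.Random(1)

fair_count = count_hits(n_spins, 1 / 38, rng)    # ordinary wheel
warped_count = count_hits(n_spins, 2 / 38, rng)  # warped wheel from the first post

print(f"fair interval: [{low:.0f}, {high:.0f}]")
print("fair wheel count:  ", fair_count)
print("warped wheel count:", warped_count)
print("warped wheel flagged:", not (low <= warped_count <= high))
```

Only one long run per wheel is needed: the interval comes from the known mean and standard deviation under the fair-wheel hypothesis, not from repeating the experiment.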
     
  6. Jun 28, 2010 #5

    KFC

    Thanks for your tips. I am reading your reply carefully. So you mean that in the statement of the central limit theorem, the sample set {X1, X2, X3, ... Xn} is the "million spins" you mentioned, and if we repeat this "million spins" many times, say k times, we get k sets of {X1, X2, X3, ... Xn}, and the averages of these sets will be normally distributed?

    Well, what I thought was that each time I spin the wheel once, which gets me X1, and I repeat this again and again to get many X1's. I think this is why I find it confusing.

    OK. Now, following your example, I want to redesign an example with one die. Say "I toss one die a million times", and I repeat this "million tosses" k times to find k averages. Will these averages be normally distributed?

    Or let me ask it this way. Suppose I want to find out if a die is weighted. As in your example, "How often does 3 come up in a million tosses" has a normal distribution. If I repeat this million-toss test a thousand times and plot a histogram of the results, will I see the approximate shape of the normal distribution? I tried that on a computer but the results are still quite uniform. Why? What's the difference between the one-die problem and the roulette?


     
    Last edited: Jun 28, 2010
  7. Jun 28, 2010 #6

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    Maybe it's a bug.

    Or maybe it's how you're interpreting the results -- if you take 1000 samples from a continuous distribution, you will almost surely get each particular number exactly once. It doesn't matter which continuous distribution -- they all have this property. So if all you look at is the raw frequencies, you won't be able to tell anything.

    Try plotting the cumulative distribution function instead. Or increasing the size of the bins in your histogram.
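
Both suggestions can be sketched in a few lines. The sample data below (1000 draws from a standard normal), the bin width of 0.5 and the helper names are assumptions picked for this illustration.

```python
import bisect
import math
import random
from collections import Counter

def binned_counts(values, bin_width):
    """Count values in half-open bins [left, left + bin_width), keyed by left edge."""
    return Counter(math.floor(v / bin_width) * bin_width for v in values)

def empirical_cdf(values):
    """Return a function x -> fraction of the samples that are <= x."""
    ordered = sorted(values)
    def cdf(x):
        return bisect.bisect_right(ordered, x) / len(ordered)
    return cdf

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Raw frequencies say nothing: the largest raw count is typically 1.
print("largest raw count:", max(Counter(samples).values()))

# Wider bins reveal the hump.
bins = binned_counts(samples, 0.5)
for left in sorted(bins):
    print(f"{left:5.1f} to {left + 0.5:5.1f} : {bins[left]}")

# The empirical CDF needs no bins at all.
cdf = empirical_cdf(samples)
for x in (-2, -1, 0, 1, 2):
    print(x, cdf(x))
```

The raw tally looks flat because almost every value is unique, while the binned counts and the S-shaped CDF show the underlying distribution.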
     
  8. Jun 28, 2010 #7

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    I just did your experiment (but with 100 samples instead of 1000 samples), and my histogram is:

    165500 - 165750 : 1
    165750 - 166000 : 2
    166000 - 166250 : 7
    166250 - 166500 : 20
    166500 - 166750 : 28
    166750 - 167000 : 27
    167000 - 167250 : 9
    167250 - 167500 : 6

    Of course, 100 samples isn't all that much, nor do I think I'm using a strong random number generator.
     
  9. Jun 28, 2010 #8

    KFC

    What is this experiment about? What do the numbers mean (165500 - 165750, for example)?
     
  10. Jun 28, 2010 #9

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    This was your "How often does 3 come up in a million tosses" experiment. Of my 100 samples, only one of them was in the closed interval [165500, 165749].

    (The stated intervals do not include their right endpoint)
     
  11. Jun 28, 2010 #10

    KFC

    Thanks, but I am still confused. Here is what I did. I toss a die 100 times and count how many tosses give a 3. Then I repeat this process ("toss a die 100 times, count the 3's") 10000 times and plot the histogram. But it does not show the shape I expect. I think I must have gotten something wrong. Can you tell me how you got the histogram shown above? Thanks.
     
  12. Jun 28, 2010 #11

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    What shape did you get?

    I just did your experiment a few times. Each time my results all lie between 7 and 28, with a few stragglers outside that interval. Those endpoints only get a dozen or two results each, whereas the bins 15, 16, and 17 get around a thousand each. The whole thing basically looks like a hump centered on 16, as I would expect. It doesn't look perfect, but that would be an incredibly unlikely result.
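
As a cross-check on those numbers, the expected histogram can be computed directly from the binomial distribution, with no random numbers involved. This is a sketch assuming a fair die (probability 1/6 of a 3), 100 tosses per trial and 10000 trials, as in the experiment described; the variable names are made up here.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n_tosses = 100      # tosses per trial
p_three = 1 / 6     # chance of a 3 on a fair die (assumed)
n_trials = 10_000   # number of repeated trials

# Expected number of trials that land on each possible count of 3s.
expected = {k: n_trials * binomial_pmf(k, n_tosses, p_three)
            for k in range(n_tosses + 1)}

mean = n_tosses * p_three
sd = math.sqrt(n_tosses * p_three * (1 - p_three))
print(f"mean = {mean:.2f}, sd = {sd:.2f}")

for k in range(5, 31):
    print(k, round(expected[k]))
```

A simulated histogram should scatter around these expected values: close to a thousand trials each for counts of 15, 16 and 17, and only a few dozen at most near 7 and 28.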
     
  13. Jun 28, 2010 #12

    KFC

    What I got is quite uniform. I think something must be wrong. Can you please show me your pseudocode?

    Thanks.

     
  14. Jun 28, 2010 #13

    Hurkyl

    Staff Emeritus
    Science Advisor
    Gold Member

    Actual python code:
    Code (Text):

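    import random

    def die():
        # Roll one fair six-sided die.
        return random.randint(1, 6)

    def trial():
        # Toss the die 100 times and count how many tosses show a 3.
        return sum(die() == 3 for _ in range(100))

    # Repeat the 100-toss trial 10000 times.
    experiment = [trial() for _ in range(10000)]

    def histogram(li):
        # Map each observed count to the number of trials that produced it.
        x = {}
        for y in li:
            x[y] = x.get(y, 0) + 1
        return x

    # Python 3 syntax: range replaces xrange, print is a function.
    print(histogram(experiment))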
     
     
  15. Jun 29, 2010 #14

    KFC

    I see what's going on now. So you make one set of samples by tossing a die 100 times, repeat that process 10000 times to obtain 10000 sets, and then counting the frequencies gives you the normal distribution, right?

    Statistically, do people call the 10000 sets of samples an ensemble? So the actual meaning of the central limit theorem is: if the size of each element in the ensemble is large enough, the resulting distribution will be approximately normal, right?

     