Why does my random experiment have a lognormal distribution?

  1. Oct 25, 2011 #1
    Hi,

    I am confused by the results of a seemingly simple simulation that generates a lognormally distributed output. Please see the attached results file.

    Simulation: I have built a Scratch program that randomly picks all six letters from a group of six letters (A, B, C, D, E and E), one at a time without replacement, and displays the order in which they were picked. I am interested in finding out how many iterations the program takes to produce a specific order of letters (say, EEDCBA).
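
    A minimal Python sketch of the experiment described above, assuming each iteration is an independent uniform shuffle of the six letters. The original is a Scratch program, so this only approximates its logic; the names `LETTERS`, `TARGET` and `iterations_until` are hypothetical.

```python
import random

# Hypothetical constants mirroring the post: two indistinguishable E's
LETTERS = ["A", "B", "C", "D", "E", "E"]
TARGET = "EEDCBA"

def iterations_until(target: str, rng: random.Random) -> int:
    """Count shuffles (trials) until the letters come out in the target order."""
    count = 0
    while True:
        count += 1
        order = LETTERS[:]      # copy, then draw all six without replacement
        rng.shuffle(order)
        if "".join(order) == target:
            return count

if __name__ == "__main__":
    rng = random.Random(2011)   # fixed seed so runs are repeatable
    results = [iterations_until(TARGET, rng) for _ in range(100)]
    # Sample mean of 100 runs; typically lands in the general vicinity of 360
    print(sum(results) / len(results))
```

    Plotting a histogram of `results` is one way to compare the simulated shape with the one reported in the attached file.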

    I repeated this experiment 100 times and was surprised to see lognormally distributed results. I was hoping to see a normal distribution with an average of 360.

    Can someone please explain what is going on?

    Thanks,
     


  3. Oct 25, 2011 #2
    The actual distribution should be geometric with p = 1/360, the probability that a single iteration produces EEDCBA (i.e. counting the number of failures before a success).

    I guess a smallish sample (size 100) from this right-skewed distribution would superficially resemble a lognormal one.
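
    One way to see why a sample of 100 looks skewed rather than bell-shaped: for a geometric distribution the median sits well below the mean. The sketch below counts trials up to and including the first success (so the mean is 1/p), takes p = 1/360 as computed later in the thread, and uses the closed-form CDF 1 - (1-p)^k; `geometric_median` is a hypothetical helper.

```python
import math

def geometric_median(p: float) -> int:
    """Smallest k with P(X <= k) = 1 - (1 - p)**k >= 1/2, where X counts trials."""
    # (1 - p)**k <= 1/2  <=>  k >= log(1/2) / log(1 - p)  (both logs negative)
    return math.ceil(math.log(0.5) / math.log1p(-p))

p = 1 / 360
mean = 1 / p                    # expected number of trials
sd = math.sqrt(1 - p) / p       # standard deviation, nearly as large as the mean
median = geometric_median(p)    # half of all runs finish by this trial
print(mean, round(sd, 1), median)
```

    With the mean at 360 but the median at 250 and a standard deviation of about 359.5, a histogram of 100 runs will typically show a tall left side and a long right tail, which is the shape that is easy to mistake for a lognormal.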
     
  4. Oct 25, 2011 #3
    I assume you understand that you cannot actually get either a lognormal or a normal distribution, as they are continuous and your r.v. is discrete.

    If I understood your description correctly, your r.v. is just "the number of failures before the first success", where success is getting "EEDCBA" and the trials are independent, right? In that case what you should get is the geometric distribution.

    P.S. I can't open your Excel file, so I can't give you details of what it's doing wrong.
     
  5. Oct 25, 2011 #4
    Thanks folks.

    For those who are not able to open my Excel file, I have attached a text file with my results.

    That is correct.

    Also, I got the 360 as follows: the probability of getting an E in the first place is 2/6, the probability of getting the second E in the second place is 1/5, the probability of getting D in the third place is 1/4, and so on.


    [tex]P(\text{EEDCBA}) = \frac{2}{6}\cdot\frac{1}{5}\cdot\frac{1}{4}\cdot\frac{1}{3}\cdot\frac{1}{2}\cdot 1 = \frac{1}{360}[/tex]
    How is this number related to the distribution? Is it the mean of the distribution?
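
    The product above can be double-checked by brute force: with two indistinguishable E's, the six letters have 6!/2! = 360 distinct orderings, and EEDCBA is exactly one of them. A short check using only the Python standard library (`fractions`, `itertools`, `math`):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Sequential draw probabilities from the post: 2/6, 1/5, 1/4, 1/3, 1/2, 1
steps = [Fraction(2, 6), Fraction(1, 5), Fraction(1, 4),
         Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]
p_product = Fraction(1)
for s in steps:
    p_product *= s

# All orderings of A, B, C, D, E, E; the set collapses the swapped E's
distinct = set(permutations("ABCDEE"))
print(p_product, len(distinct), factorial(6) // factorial(2))
```

    Since every ordering is equally likely, the success probability is one over the number of distinct orderings, which is why both routes give 1/360.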

    Also, can you please point me to a source where I can read more about this? I am not sure why I should get a geometric distribution.
     

    Attached Files:

    • Data.txt
    Last edited: Oct 25, 2011
  6. Oct 26, 2011 #5
    p = 1/360 is the probability of success. Look at the Wikipedia article on the geometric distribution: it says "geometric distribution [...] is the probability distribution of the number X of Bernoulli trials needed to get one success". A Bernoulli trial is a trial that has only two outcomes: 1 or 0 (or true/false, success/failure, etc.).

    Also, if p is the probability of success, then 1-p is the probability of failure. To get the (first) success on the k-th trial (iteration), you need to fail k-1 times in a row and then have a success; because the trials are independent, these probabilities multiply, so [tex]P(X=k) = (1-p)^{k-1}p,[/tex] which is exactly the pmf of the geometric distribution.
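
    As a sanity check on this pmf, summing P(X = j) for j = 1..k should give the CDF 1 - (1-p)^k, i.e. the probability of not failing all of the first k trials. The sketch below verifies this with exact rational arithmetic for p = 1/360; the function names `geom_pmf` and `geom_cdf` are illustrative only.

```python
from fractions import Fraction

def geom_pmf(k: int, p: Fraction) -> Fraction:
    """P(X = k): k-1 failures in a row, then one success."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k: int, p: Fraction) -> Fraction:
    """P(X <= k) = 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

p = Fraction(1, 360)
running = Fraction(0)
for k in range(1, 1001):
    running += geom_pmf(k, p)
    assert running == geom_cdf(k, p)   # exact equality, no rounding error
# Probability of success within 1000 trials, roughly 0.94
print(float(geom_cdf(1000, p)))
```

    Exact `Fraction` arithmetic is used so the equality check cannot fail because of floating-point rounding.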

    Also, a geometrically distributed r.v. with parameter p has mean 1/p. So the average number of iterations should indeed be 360.
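
    The mean quoted above follows from the pmf by differentiating the geometric series. The same argument shows that if one instead counts failures before the first success, Y = X - 1 (the convention used in post #2), the mean drops by exactly one:

```latex
E[X] = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p
     = p \cdot \frac{1}{\bigl(1-(1-p)\bigr)^{2}}
     = \frac{1}{p},
\qquad \text{using } \sum_{k=1}^{\infty} k\,q^{k-1} = \frac{1}{(1-q)^{2}} \text{ for } |q| < 1,

E[Y] = E[X] - 1 = \frac{1}{p} - 1 = \frac{1-p}{p}.
```

    For p = 1/360 these give 360 and 359 respectively, so which count the program reports matters slightly when comparing against 360.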
     
    Last edited: Oct 26, 2011