Why does my random experiment have a log-normal distribution?

Discussion Overview

The discussion centers around a simulation experiment that generates a log-normally distributed output when randomly selecting letters. Participants explore the nature of the distribution resulting from the experiment, questioning why the output does not align with expected normal distribution characteristics.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes a simulation that randomly selects six letters and notes that the results appear log-normally distributed, despite expecting a normal distribution with an average of 360.
  • Another participant suggests that the actual distribution should be geometric, with a probability of success calculated as p=1/6^6, and notes that a small sample size could resemble a lognormal distribution superficially.
  • A different participant asserts that neither lognormal nor normal distributions can be obtained from the discrete random variable defined in the experiment, emphasizing that it should yield a geometric distribution instead.
  • One participant confirms the understanding of the random variable as the number of failures before the first success, which is defined as obtaining the specific order of letters.
  • Further calculations are provided to explain the probability of obtaining the specific order, leading to a discussion about the relationship between this probability and the distribution's mean.
  • Another participant references the geometric distribution's properties, including the mean being the inverse of the probability of success, suggesting that the average number of iterations should indeed be 360.

Areas of Agreement / Disagreement

Participants express disagreement regarding the nature of the distribution. While some argue for a geometric distribution based on the setup of the experiment, others question the initial observation of a log-normal distribution, leading to an unresolved discussion about the correct interpretation of the results.

Contextual Notes

There are limitations in the discussion regarding the assumptions made about the distribution types and the implications of using a discrete random variable in a context typically associated with continuous distributions.

musicgold
Hi,

I am confused by the results of a seemingly simple simulation that is generating log-normally distributed output. Please see the attached results file.

Simulation: I have built a Scratch program that randomly picks six letters from a group of six letters (A, B, C, D, E & E). The program displays the order in which the letters have been picked. I am interested in finding out how many iterations the program takes to get a specific order of letters (say, EEDCBA).

I repeated this experiment 100 times and was surprised to see log-normally distributed results. I was expecting a normal distribution with an average of 360.

Can someone please explain what is going on?

Thanks,
 

The actual distribution should be geometric with p=1/6^6 (i.e. counting the number of failures before a success).

I guess a smallish sample (size 100) would superficially resemble lognormal.
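
To get a feel for what 100 draws from a geometric distribution look like, here is a minimal Python sketch. It is not the original Scratch program: the value p = 1/360 is an assumption taken from the calculation the original poster gives later in the thread, and the function name `draw_geometric`, the seed, and the bin width are illustrative choices.

```python
import math
import random

def draw_geometric(p, rng):
    """Number of trials up to and including the first success (inverse-transform sampling)."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(1.0 - p)) + 1

p = 1 / 360                         # assumed success probability per iteration
rng = random.Random(0)              # fixed seed so the run is repeatable
sample = [draw_geometric(p, rng) for _ in range(100)]

BIN_WIDTH = 100
counts = {}
for x in sample:
    b = (x - 1) // BIN_WIDTH        # bin 0 holds 1..100, bin 1 holds 101..200, ...
    counts[b] = counts.get(b, 0) + 1

# Crude text histogram: one '#' per observation in the bin.
for b in range(max(counts) + 1):
    lo, hi = b * BIN_WIDTH + 1, (b + 1) * BIN_WIDTH
    print(f"{lo:5d}-{hi:5d} | {'#' * counts.get(b, 0)}")
```

With only 100 values the bars are noisy, but the long right tail is usually visible: most runs finish early and a handful take far longer than the average.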
 
I assume you understand that you cannot actually get either a lognormal or a normal distribution, as both are continuous and your r.v. is discrete.

If I understood your description, then your r.v. is just "the number of failures, before first success", where success is getting "EEDCBA" and trials are independent, right? In this case what you should get is the geometric distribution.

P.S. I can't open your Excel file, so I can't give you details of what it's doing wrong.
 
Thanks folks.

For those who are not able to open my excel file, I have attached a text file with my results.

If I understood your description, then your r.v. is just "the number of failures, before first success", where success is getting "EEDCBA" and trials are independent, right?
That is correct.

Also, I got the 360 as follows: Prob of getting E in the first place = 2/6, prob of getting the second E in the second place = 1/5, prob of getting D in the third place = 1/4...and so on.


Probability of getting EEDCBA = 2/6 * 1/5 * 1/4 * 1/3 * 1/2 * 1 = 1/360
How is this number related to the distribution? Is it the mean of the distribution?
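
The 1/360 figure above can be double-checked with exact arithmetic. The sketch below is Python rather than Scratch, and it assumes, as the step-by-step probabilities in the post imply, that the six letters are drawn without replacement so that every ordering of the six tiles is equally likely. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

LETTERS = "ABCDEE"   # the six tiles, with E appearing twice
TARGET = "EEDCBA"

# Step-by-step product from the post: 2/6 * 1/5 * 1/4 * 1/3 * 1/2 * 1
p_product = (Fraction(2, 6) * Fraction(1, 5) * Fraction(1, 4)
             * Fraction(1, 3) * Fraction(1, 2) * 1)

# Brute force: enumerate all 6! = 720 orderings of the tiles.
# The two E tiles are treated as distinct, so the target shows up twice.
orderings = ["".join(perm) for perm in permutations(LETTERS)]
hits = sum(1 for s in orderings if s == TARGET)
p_enumerated = Fraction(hits, len(orderings))

print(p_product, p_enumerated)   # both are 1/360
```

Both routes agree: 2 of the 720 tile orderings spell EEDCBA, which reduces to 1/360.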

Also, can you please point to me a source where I can read more about this? I am not sure why I should get a geometric distribution.
 

musicgold said:
Thanks folks.
How is this number related to the distribution? Is it the mean of the distribution?

Also, can you please point to me a source where I can read more about this? I am not sure why I should get a geometric distribution.

p = 1/360 is the probability of success. Look at the Wikipedia article on the geometric distribution: it says "geometric distribution [...] is the probability distribution of the number X of Bernoulli trials needed to get one success". A Bernoulli trial is a trial that can have only two outcomes: 1 or 0 (or true/false, success/failure, etc.).

Also, if p is the probability of success, then 1-p is the probability of failure. In order to get the (first) success on the k-th trial (iteration), you need to fail k-1 times in a row and then have a success, thus [tex]P(X=k) = (1-p)^{k-1}p,[/tex] which is exactly the pmf of the geometric distribution.
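
As a quick numerical sanity check of that pmf, here is a small Python sketch. The helper name `geometric_pmf` and the cutoff `K_MAX` are illustrative choices, not anything from the thread; the sum is truncated at `K_MAX` because the remaining tail is negligibly small for p = 1/360.

```python
p = 1 / 360

def geometric_pmf(k, p):
    """P(X = k): k - 1 failures in a row, then the first success on trial k."""
    return (1 - p) ** (k - 1) * p

K_MAX = 20_000   # truncation point for the infinite sum

# The probabilities over k = 1, 2, ... should add up to (essentially) 1.
total = sum(geometric_pmf(k, p) for k in range(1, K_MAX + 1))

# Chance that the target order shows up within the first 360 iterations.
within_360 = sum(geometric_pmf(k, p) for k in range(1, 361))

print(total, within_360)
```

Because each term is the previous one multiplied by (1-p), the pmf falls steadily from its largest value at k = 1. That is one way to see that the shape is not a bell curve: the single most likely outcome is success on the very first iteration, and well over half of all runs finish within 360 iterations.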

Also, a geometrically distributed r.v. with parameter p has mean 1/p. So the average number of iterations should indeed be 360.
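
The experiment itself is easy to re-run in code. The following is a rough Python stand-in for the Scratch program, not a copy of it: it assumes each iteration draws all six tiles without replacement, and the function name, seed, and number of repetitions are arbitrary illustrative choices.

```python
import random
import statistics

LETTERS = list("ABCDEE")
TARGET = list("EEDCBA")

def iterations_until_target(rng):
    """Count iterations (including the successful one) until TARGET is drawn."""
    count = 0
    while True:
        count += 1
        draw = rng.sample(LETTERS, k=len(LETTERS))  # all six tiles, without replacement
        if draw == TARGET:
            return count

rng = random.Random(12345)   # fixed seed so the run is repeatable
runs = [iterations_until_target(rng) for _ in range(2000)]

print("mean:  ", statistics.mean(runs))
print("median:", statistics.median(runs))
```

With a couple of thousand repetitions the sample mean should settle near 360, while the median comes out noticeably lower. That gap between mean and median is the right skew that can make a small sample look lognormal at first glance.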
 