
Statistics - data follows x distribution

  1. Mar 11, 2014 #1
    1. The problem statement, all variables and given/known data

    The data contains 2500 integers, each of which is 0, 1 or 2:

    zeroes: 1240
    ones: 1014
    twos: 246

    Does the data follow a Poisson, geometric, binomial or negative-binomial distribution?

    2. Relevant equations



    3. The attempt at a solution

    The mean of the data is 0.6024 and the variance is 0.436314
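
The sample moments quoted above can be reproduced directly from the three counts. This is a minimal sketch; the variable names are illustrative, and it assumes the divide-by-N (population) form of the variance, which is the form that matches the quoted 0.436314.

```python
# Observed frequencies from the problem statement: value -> count
counts = {0: 1240, 1: 1014, 2: 246}

n_obs = sum(counts.values())  # 2500 observations in total
mean = sum(k * c for k, c in counts.items()) / n_obs

# Population (divide-by-N) variance; an assumption that matches the quoted value
variance = sum(c * (k - mean) ** 2 for k, c in counts.items()) / n_obs

print(f"mean = {mean:.4f}, variance = {variance:.6f}")
```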

    A negative-binomial distribution has a variance greater than its mean, so I only consider the Poisson, binomial and geometric distributions.

    A Poisson distribution has its mean equal to its variance. I don't know if I should reject Poisson, though: after using the method of moments and setting λ=0.6024, I get these theoretical frequencies:

    zeroes:1369
    ones:825
    twos:248

    It's not really that far off. However, a chi-squared test gives χ²≈55, which is very large and tells me that the hypothesis that my data follows a Poisson distribution should be rejected.
    I tried generating random Poisson values with λ=0.6024 and got

    zeroes:1348
    ones:880
    twos:214

    which gives χ²≈33. Closer, but still too large.
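
The Poisson frequencies and the χ²≈55 figure for the observed data can be checked with a short sketch. It assumes the Pearson statistic is summed over the three observed categories only, as the post appears to do; the expected count in the k ≥ 3 tail is printed separately, and including that category would raise the statistic further.

```python
import math

observed = [1240, 1014, 246]  # counts of 0, 1, 2 from the problem statement
n_obs = sum(observed)
lam = 0.6024                  # method-of-moments estimate: the sample mean

# Poisson pmf: P(X = k) = exp(-lam) * lam**k / k!
expected = [n_obs * math.exp(-lam) * lam**k / math.factorial(k) for k in range(3)]

# Pearson chi-squared over the three observed categories only (assumption)
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Expected number of observations with k >= 3 under the fitted Poisson
tail_expected = n_obs - sum(expected)

print([round(e) for e in expected], round(chi2, 1), round(tail_expected, 1))
```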

    As for the binomial distribution, using the method of moments I get 2500*p=0.6024, so p=0.000241.
    With this estimator, using the theoretical formulas for binomial probabilities, I end up with these values:

    zeroes:1369
    ones:825
    twos:248

    These are identical to the theoretical Poisson values. However, when I try to generate 2500 binomial random values with p=0.000241, I get very different results, something like:

    zeroes:1850
    ones:531
    twos:97

    I don't really know why it differs.

    Finally, the geometric distribution. I really did not know which estimator I should use for this one. I tried using (1-p)/p=0.6024, which gives theoretical values of

    zeroes:1506
    ones:599
    twos:238

    The randomly generated values were very close to these, but it's quite far off from my data.
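
As a hedged check on the geometric fit, the sketch below assumes the geometric distribution on {0, 1, 2, ...} with P(X = k) = p(1-p)^k and mean (1-p)/p. Solving (1-p)/p = 0.6024 gives p ≈ 0.624, whereas the frequencies quoted above appear to correspond to using p = 0.6024 directly. The helper name `geometric_expected` is illustrative.

```python
observed_mean = 0.6024
n_obs = 2500

# Solve (1 - p) / p = observed_mean for p
p_mom = 1 / (1 + observed_mean)

def geometric_expected(p, n, kmax=2):
    """Expected counts n * p * (1 - p)**k for k = 0..kmax."""
    return [n * p * (1 - p) ** k for k in range(kmax + 1)]

# Expected counts with the method-of-moments estimate of p
print(round(p_mom, 4), [round(e) for e in geometric_expected(p_mom, n_obs)])

# Expected counts with p = 0.6024, which reproduces the figures in the post
print([round(e) for e in geometric_expected(0.6024, n_obs)])
```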

    So, after all that, I still have no clue which distribution my data follows. Could you help me with that?
     
  3. Mar 11, 2014 #2

    Ray Vickson

    Science Advisor
    Homework Helper

    For the binomial, why do you take n = 2500? Presumably, the data is a random sample of 2500 independent draws from a binomial distribution B(n,p), with both n and p unknown and to be estimated.
     
  4. Mar 11, 2014 #3
    I don't think I could solve it with both n and p being unknown. I just took n=2500. Then it's easy to see that p=0.000241

    Probabilities of B(2500,0.000241):

    probability of a 0: 0.547
    probability of a 1: 0.33
    probability of a 2: 0.099

    Then I multiply each probability by n=2500 to get these theoretical frequencies:

    zeroes: 1369
    ones: 825
    twos: 248

    So the theoretical frequencies are somewhat close.

    I don't think I'm supposed to consider both n and p unknown, because I have only been introduced to an exercise of finding p when n is known. So instead of saying that my sample comes from data that follows a binomial distribution, I declare that the sample itself follows a binomial distribution and take n equal to 2500. So I guess what I'm doing here is wrong? How should I estimate both n and p? I'd appreciate your input :)
     
  5. Mar 11, 2014 #4

    Ray Vickson

    Science Advisor
    Homework Helper

    Yes, I would say it is wrong. You have a sample (numerical) mean and variance, and for bin(n,p) you have formulas for those in terms of n and p.
     
  6. Mar 17, 2014 #5
    Hello,

    If I estimate n and p using formulas
    mean=n*p
    variance=n*p*(1-p)
    it yields p=0.275707172 and n=2.18492684. I don't see a way to use this: the sample n is 2500, so this estimate of n would be nonsense. I've read more about estimating binomial parameters when both n and p are unknown, and using other (more complex) formulas I still get p≈0.29 and n≈2.2. What's wrong?
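
The two moment equations can be solved in closed form: dividing the variance by the mean gives 1 - p, and n then follows from the mean. A minimal sketch, using the sample moments from the first post (names are illustrative):

```python
mean, variance = 0.6024, 0.43631424  # sample moments of the 2500 observations

# For B(n, p): mean = n*p and variance = n*p*(1 - p), so variance / mean = 1 - p
p_hat = 1 - variance / mean
n_hat = mean / p_hat

print(f"p = {p_hat:.6f}, n = {n_hat:.6f}")
```

Here n is the number of trials behind each single observation in B(n, p), which is a different quantity from the sample size of 2500.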

    Another question:
    With n=2500 and p=0.00028, probabilities of x=0,1,2 would be
    0: 0.4965366317
    1: 0.347673
    2: 0.1216709
    Multiplying each probability by n=2500, I get approximately
    0: 1241
    1: 869
    2: 304
    However, when I generate random binomial numbers with the SAME parameters, I get results that are VERY different. For example
    0: 1857
    1: 495
    2: 113
    Why?
     
  7. Mar 17, 2014 #6

    Ray Vickson

    Science Advisor
    Homework Helper

    No, what you are doing is nonsense.

    The best estimate of n from the data is n = 2; anyway, this matches the fact that each experiment yields one of only three outcomes (k = 0,1,2 in the binomial with n = 2). If you used a binomial with n = 2500, you would absolutely have to find some outcomes > 2.
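
The remark about outcomes greater than 2 can be checked numerically. The sketch below assumes B(2500, p) with p = 0.6024/2500 (the estimate used earlier in the thread) and computes how many of 2500 independent draws would be expected to exceed 2; the observed data contains none. The helper `binom_pmf` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math

n_trials, n_obs = 2500, 2500
p = 0.6024 / n_trials  # about 0.000241, the estimate used earlier in the thread

def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of an outcome in {0, 1, 2}, and expected count of outcomes > 2
p_le_2 = sum(binom_pmf(k, n_trials, p) for k in range(3))
expected_gt_2 = n_obs * (1 - p_le_2)

print(round(expected_gt_2, 1))
```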

    I will repeat once more: you seem to have a binomial with n = 2, and with some p that you can give a best estimate for. You are looking at a random sample of 2500 independent draws from that binomial.
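
A sketch of the suggested fit, assuming n = 2 and the moment estimate p = mean/2 = 0.3012. The goodness-of-fit statistic is compared against 3.841, the standard 5% critical value of χ² with 1 degree of freedom (3 categories, minus 1, minus 1 estimated parameter).

```python
import math

observed = [1240, 1014, 246]  # counts of 0, 1, 2 from the problem statement
n_obs = sum(observed)
n_trials = 2                  # number of trials per observation (assumed)
p_hat = 0.6024 / n_trials     # from mean = n * p

# Expected counts under B(2, p_hat) for k = 0, 1, 2
expected = [
    n_obs * math.comb(n_trials, k) * p_hat**k * (1 - p_hat) ** (n_trials - k)
    for k in range(n_trials + 1)
]

# Pearson chi-squared statistic over all three categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 1) for e in expected], round(chi2, 2), chi2 < 3.841)
```

With these inputs the statistic comes out below the critical value, so the binomial with n = 2 is not rejected at the 5% level, unlike the Poisson fit.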

    I will not repeat myself any more; at this point I am out of here.
     
  8. Mar 17, 2014 #7
    Thanks for your help. I now realise my understanding of the binomial distribution was a bit off. This made me understand.
     