Statistics - data follows x distribution

Deimantas · Mar 11, 2014

Homework Statement

The data consists of 2500 integers, each of which is 0, 1 or 2:

zeroes: 1240
ones: 1014
twos: 246

Does the data follow a Poisson, geometric, binomial or negative binomial distribution?

Homework Equations

The Attempt at a Solution

The mean of the data is 0.6024 and the variance is 0.436314

The negative binomial distribution is supposed to have a variance greater than its mean, so I only consider the Poisson, binomial and geometric distributions.
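The summary statistics above can be reproduced from the three counts. A minimal sketch in Python, assuming the variance is the population (divide-by-2500) variance, since that is the convention that gives 0.436314. It also prints the dispersion ratio variance/mean: this ratio is exactly 1 for a Poisson distribution, 1 - p for a binomial and 1/p for a negative binomial (the geometric being its r = 1 special case), so among the four candidate families only the binomial can produce a ratio below 1.

```python
# Observed frequencies from the homework statement: value -> count
counts = {0: 1240, 1: 1014, 2: 246}

def moments(counts):
    """Return (mean, population variance) of a frequency table."""
    n = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / n
    # Population variance: divide by n, not n - 1
    var = sum(c * (k - mean) ** 2 for k, c in counts.items()) / n
    return mean, var

mean, var = moments(counts)
print(f"mean = {mean:.4f}, variance = {var:.6f}, variance/mean = {var / mean:.3f}")
```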

A Poisson distribution is supposed to have its mean equal to its variance. I don't know whether I should reject the Poisson, though: using the method of moments and setting λ=0.6024, I get these theoretical frequencies:

zeroes:1369
ones:825
twos:248

It's not really that far off. However, the chi-squared test gives χ²≈55, which is very large and tells me that the hypothesis that my data follows a Poisson distribution should be rejected.
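A minimal sketch of the calculation described above: Poisson(λ) expected frequencies for 0, 1 and 2 in 2500 draws, and Pearson's statistic, the sum of (O - E)²/E over those three cells. As in the post, the probability mass above 2 (about 58 expected draws) is left out of the sum; including it as a fourth cell with observed count 0 would make χ² larger still.

```python
import math

observed = [1240, 1014, 246]   # counts of 0, 1, 2
N = 2500                       # sample size
lam = 0.6024                   # method-of-moments estimate: lambda = sample mean

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [N * poisson_pmf(k, lam) for k in range(3)]
print([round(e) for e in expected])              # expected counts of 0, 1, 2
print(round(chi_square(observed, expected), 1))  # Pearson chi-square over 3 cells
```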
I tried generating random Poisson distribution values with λ=0.6024 and got

zeroes:1348
ones:880
twos:214

which gives χ²≈33. Closer, but still too large.

As for the binomial distribution, using the method of moments I get 2500*p=0.6024, so p=0.000241.
With this estimate, using the binomial probability formula, I end up with these values:

zeroes:1369
ones:825
twos:248
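The near-coincidence with the Poisson figures is expected: for large n and small p, B(n, p) is closely approximated by Poisson(np), and here np = 0.6024. A minimal sketch comparing the two pmfs at k = 0, 1, 2, using `math.comb` (Python 3.8+); p is taken as 0.6024/2500, which the post rounds to 0.000241.

```python
import math

n, p = 2500, 0.6024 / 2500   # p chosen so that n * p equals the sample mean
lam = n * p

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

for k in range(3):
    b, q = binom_pmf(k, n, p), poisson_pmf(k, lam)
    print(f"k={k}: binomial {b:.5f}  Poisson {q:.5f}  difference {b - q:+.1e}")
```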

These are identical to the theoretical Poisson values. However, when I generate 2500 random binomial values with p=0.000241, I get very different results, something like:

zeroes:1850
ones:531
twos:97

I don't really know why it differs.

Finally, the geometric distribution. I really didn't know which estimator to use for this one. I tried (1-p)/p=0.6024, which gives theoretical values of

zeroes:1506
ones:599
twos:238

The randomly generated values were very close to these, but they are quite far off from my data.
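For the geometric fit, solving (1 - p)/p = 0.6024 gives p = 1/1.6024 ≈ 0.624. A minimal sketch of the expected counts, assuming the "failures before the first success" form P(X = k) = p(1 - p)^k, k = 0, 1, 2, ..., which is the form whose mean is (1 - p)/p. With p ≈ 0.624 this gives about 1560, 587 and 220; the counts quoted above (1506, 599, 238) are the ones obtained by setting p = 0.6024 directly.

```python
# Geometric pmf in the "number of failures before the first success" form:
#   P(X = k) = p * (1 - p)**k, k = 0, 1, 2, ...,  with mean (1 - p) / p
def geometric_expected(p, N=2500, kmax=2):
    """Expected counts of 0..kmax in N draws from this geometric distribution."""
    return [N * p * (1 - p) ** k for k in range(kmax + 1)]

mean = 0.6024
p_mom = 1 / (1 + mean)    # solves (1 - p) / p = mean, p ≈ 0.624
print([round(e) for e in geometric_expected(p_mom)])
print([round(e) for e in geometric_expected(0.6024)])  # p set equal to the mean itself
```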

So, after all that, I still have no idea which distribution my data follows. Could you help me with that?

Ray Vickson · Mar 11, 2014

For the binomial, why do you take n = 2500? Presumably, the data is a random sample of 2500 independent draws from a binomial distribution B(n,p), with both n and p unknown and to be estimated.

Deimantas · Mar 11, 2014

I don't think I could solve it with both n and p unknown. I just took n=2500; then it's easy to see that p=0.000241.

Probabilities of B(2500,0.000241):

probability of a 0: 0.547
probability of a 1: 0.33
probability of a 2: 0.099

Then I multiply each probability by n=2500 to get these theoretical frequencies:

zeroes: 1369
ones: 825
twos: 248

So the theoretical frequencies are somewhat close.

I don't think I'm supposed to treat both n and p as unknown, because I have only been introduced to exercises where p is found with n known... So instead of saying that my sample comes from some population that follows a binomial distribution, I declare that the sample itself follows a binomial distribution and take n equal to 2500. So I guess what I'm doing here is wrong? How should I estimate both n and p? I'd appreciate your input :)

Ray Vickson · Mar 11, 2014

Yes, I would say it is wrong. You have a sample (numerical) mean and variance, and for bin(n,p) you have formulas for those in terms of n and p.

Deimantas · Mar 17, 2014

Hello,

If I estimate n and p using the formulas
mean=n*p
variance=n*p*(1-p)
I get p=0.275707172 and n=2.18492684. I don't see a way to use this: the sample n is 2500, so this estimate of n would be nonsense. I've read more about estimating binomial parameters when both n and p are unknown, and using other (more complex) formulas I still get p≈0.29 and n≈2.2. What's wrong?
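The two moment equations can be solved in closed form: dividing the variance by the mean gives 1 - p, and then n = mean/p. A minimal sketch that reproduces the figures above. Note that in B(n, p) the parameter n is the number of trials behind each individual observation, which is distinct from the number of observations (2500).

```python
def binomial_mom(mean, var):
    """Method-of-moments estimates for B(n, p) with both n and p unknown.

    From mean = n*p and var = n*p*(1 - p): var / mean = 1 - p,
    so p = 1 - var/mean and n = mean / p.
    Only meaningful when 0 < var < mean (under-dispersed data).
    """
    if not 0 < var < mean:
        raise ValueError("binomial moments need 0 < variance < mean")
    p = 1 - var / mean
    n = mean / p
    return n, p

n_hat, p_hat = binomial_mom(0.6024, 0.436314)
print(f"n ≈ {n_hat:.4f}, p ≈ {p_hat:.4f}")
```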

Another question:
With n=2500 and p=0.00028, probabilities of x=0,1,2 would be
0: 0.4965366317
1: 0.347673
2: 0.1216709
Multiplying each probability by n=2500, I get approximately
0: 1241
1: 869
2: 304
However, when I generate random binomial numbers with the SAME parameters, I get results that are VERY different. For example:
0: 1857
1: 495
2: 113
Why?
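The expected counts above can be checked against a simulation. A minimal sketch, assuming Python's standard `random` module and inverse-transform sampling (a uniform draw u is mapped to the smallest k whose cumulative probability reaches u). It draws 2500 independent values from B(2500, 0.00028) with a fixed seed and tallies them; the counts of 0, 1 and 2 should land near the expected 1241, 869 and 304, and values above 2 (about 3.4% of draws, roughly 85) also show up, because those three outcomes do not exhaust the distribution.

```python
import math
import random

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def sample_binomial(n, p, size, rng):
    """Draw `size` independent B(n, p) values by inverse-transform sampling."""
    draws = []
    for _ in range(size):
        u = rng.random()
        k, cdf = 0, binom_pmf(0, n, p)
        # Walk up the CDF until it reaches u
        while u > cdf and k < n:
            k += 1
            cdf += binom_pmf(k, n, p)
        draws.append(k)
    return draws

rng = random.Random(2014)            # fixed seed so the run is reproducible
draws = sample_binomial(2500, 0.00028, 2500, rng)
observed = [draws.count(k) for k in range(3)]
expected = [round(2500 * binom_pmf(k, 2500, 0.00028)) for k in range(3)]
above_two = sum(d > 2 for d in draws)
print("observed 0,1,2:", observed, "| expected:", expected, "| values > 2:", above_two)
```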

Ray Vickson · Mar 17, 2014

No, what you are doing is nonsense.

The best estimate of n from the data is n = 2; anyway, this matches the fact that each experiment yields one of only three outcomes (k = 0,1,2 in the binomial with n = 2). If you used a binomial with n = 2500, you would absolutely have to find some outcomes > 2.
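To put a number on the last point: under B(2500, 0.000241), the fit from earlier in the thread, the probability of an outcome above 2 is about 2.3%, so roughly 58 of 2500 independent draws would be expected to exceed 2, whereas the data contain none. A minimal sketch of that calculation:

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p, draws = 2500, 0.000241, 2500
p_at_most_2 = sum(binom_pmf(k, n, p) for k in range(3))   # P(X <= 2)
print(f"P(X > 2) = {1 - p_at_most_2:.4f}, "
      f"expected draws above 2: {draws * (1 - p_at_most_2):.0f}")
```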

I will repeat once more: you seem to have a binomial with n = 2, and with some p that you can give a best estimate for. You are looking at a random sample of 2500 independent draws from that binomial.
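With n = 2 fixed, the maximum-likelihood (and method-of-moments) estimate of p is the sample mean divided by 2, p̂ = 0.6024/2 = 0.3012. A minimal sketch of the resulting B(2, p̂) expected counts and the same Pearson chi-square used earlier in the thread. The statistic comes out near 3.3, below the 5% critical value of about 3.84 for 1 degree of freedom (3 cells, minus 1, minus 1 estimated parameter), so this fit is not rejected at that level.

```python
import math

observed = [1240, 1014, 246]     # counts of 0, 1, 2
N = sum(observed)                # 2500 draws
n = 2                            # trials per draw, as suggested above

mean = sum(k * c for k, c in enumerate(observed)) / N
p_hat = mean / n                 # estimate of p: sample mean / n

expected = [N * math.comb(n, k) * p_hat ** k * (1 - p_hat) ** (n - k)
            for k in range(n + 1)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: 3 cells - 1 - 1 estimated parameter = 1
print(f"p_hat = {p_hat:.4f}")
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {chi2:.2f} on 1 degree of freedom")
```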

I will not repeat myself any more; at this point I am out of here.

Deimantas · Mar 17, 2014

Ray Vickson said:

I will repeat once more: you seem to have a binomial with n = 2, and with some p that you can give a best estimate for. You are looking at a random sample of 2500 independent draws from that binomial.

Thanks for your help; I now realize my understanding of the binomial distribution was a bit off. This made it clear.
