Statistics - data follows x distribution

Homework Help Overview

The discussion revolves around determining the probability distribution that best fits a dataset consisting of 2500 integers, each being either 0, 1, or 2. The original poster presents the counts of each integer and explores various distributions including Poisson, binomial, and geometric, while questioning the appropriateness of the negative-binomial distribution.

Discussion Character

  • Exploratory, Assumption checking, Problem interpretation

Approaches and Questions Raised

  • The original poster calculates the mean and variance of the data and discusses the implications for different distributions. They express uncertainty about rejecting the Poisson distribution based on Chi-squared test results and explore the binomial distribution with a fixed sample size. Participants question the assumptions regarding the parameters n and p in the binomial context and discuss the implications of estimating these parameters.

Discussion Status

Participants are actively engaging with the problem, raising questions about the validity of their assumptions and the methods used for estimating parameters. Some guidance has been offered regarding the estimation of n and p, but there is no explicit consensus on the correct approach or distribution fit at this time.

Contextual Notes

There is a noted complexity in estimating parameters when both n and p are considered unknown, and participants are grappling with the implications of their chosen methods on the theoretical distributions derived from the data.

Deimantas

Homework Statement



The data contains 2500 integers, each is either a 0, 1 or 2:

zeroes: 1240
ones: 1014
twos: 246

Does the data follow Poisson, geometric, binomial or negative-binomial distribution?

Homework Equations





The Attempt at a Solution



The mean of the data is 0.6024 and the variance is 0.436314

Negative-binomial distribution is supposed to have greater variance than mean, so I only consider Poisson, binomial and geometric distributions.
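As a quick check of the figures above, the sample mean and variance can be recomputed from the three counts. This is a minimal sketch in Python; the variable names are illustrative, and it assumes the posted variance is the population form (dividing by N rather than N - 1), since that is what reproduces 0.436314.

```python
# Counts from the problem statement: value -> frequency
counts = {0: 1240, 1: 1014, 2: 246}

total = sum(counts.values())  # 2500 observations
mean = sum(k * c for k, c in counts.items()) / total

# Population variance (divide by N); assumed form, it matches the posted value
variance = sum(c * (k - mean) ** 2 for k, c in counts.items()) / total

# Variance-to-mean ratio: below 1 means the data are under-dispersed
dispersion = variance / mean

print(f"mean={mean:.4f} variance={variance:.6f} ratio={dispersion:.4f}")
```

The ratio comes out below 1. A binomial has variance n*p*(1-p), which is smaller than its mean n*p; a Poisson has the two equal; and a geometric on 0, 1, 2, ... has mean (1-p)/p and variance (1-p)/p^2, so its variance exceeds its mean just as the negative binomial's does.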

The Poisson distribution is supposed to have its mean equal to its variance. I don't know whether I should reject Poisson, though: using the method of moments and setting λ=0.6024, I get these theoretical values for the distribution:

zeroes:1369
ones:825
twos:248

That's not really far off. However, a chi-squared test gives χ²≈55, which is very large and tells me the hypothesis that my data follows a Poisson distribution should be rejected.
I tried generating random Poisson distribution values with λ=0.6024 and got

zeroes:1348
ones:880
twos:214

which gives χ²≈33. Closer, but still too large.
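The Poisson calculation described above can be sketched as follows. This is only an illustration under the post's own assumptions (λ set to the sample mean, Pearson statistic taken over the three cells 0, 1, 2); the variable names are made up for the example.

```python
import math

observed = [1240, 1014, 246]  # counts of 0, 1, 2 from the problem statement
total = sum(observed)
lam = 0.6024                  # method-of-moments estimate: lambda = sample mean

# Poisson probabilities P(X = k) for k = 0, 1, 2
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
expected = [total * p for p in probs]

# Pearson statistic over the three cells, as in the post
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Probability the three cells leave out (outcomes of 3 or more)
tail = 1 - sum(probs)

print([round(e) for e in expected], round(chi2, 1), round(tail, 4))
```

The expected counts add up to about 2442 rather than 2500 because a Poisson with this λ puts roughly 2.3% of its mass on values of 3 or more, none of which occur in the data. With three cells and one estimated parameter the statistic has 1 degree of freedom, for which the 5% critical value is 3.841.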

As for the binomial distribution, using the method of moments I get 2500*p=0.6024; p=0.000241
With this estimator, using the theoretical formulas for binomial probabilities, I end up with these values:

zeroes:1369
ones:825
twos:248

These are identical to the theoretical Poisson distribution values. However, when I try to generate 2500 binomial distribution random values with p=0.000241 I get very different results, something like:

zeroes:1850
ones:531
twos:97

I don't really know why it differs.

Finally, geometric distribution. I really did not know what estimators I should use for this one. I tried using (1-p)/p=0.6024 which gives theoretical values of

zeroes:1506
ones:599
twos:238

The randomly generated values were very close to these, but it's quite far off from my data.
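For comparison, here is a hedged sketch of the geometric fit. It assumes the geometric distribution on 0, 1, 2, ... with P(X = k) = p(1-p)^k, whose mean is (1-p)/p. Solving (1-p)/p = 0.6024 gives p ≈ 0.624, whereas the theoretical counts quoted above are reproduced by plugging in p = 0.6024 directly. That is an observation from the arithmetic, not something stated in the thread, and the helper name `geometric_counts` is invented for this example.

```python
observed = [1240, 1014, 246]
total = sum(observed)
mean = 0.6024

# Geometric on {0, 1, 2, ...}: P(X = k) = p * (1 - p)**k, mean = (1 - p) / p
p_mom = 1 / (1 + mean)  # solves (1 - p) / p = mean
p_post = 0.6024         # value that reproduces the counts quoted in the post

def geometric_counts(p, n_obs, kmax=2):
    """Expected frequencies of 0..kmax among n_obs geometric draws."""
    return [n_obs * p * (1 - p) ** k for k in range(kmax + 1)]

print(round(p_mom, 4), [round(c) for c in geometric_counts(p_mom, total)])
print(p_post, [round(c) for c in geometric_counts(p_post, total)])
```

Either way the fitted count of zeroes is well above the observed 1240, so the conclusion that the geometric fits poorly does not depend on which value of p is used.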

So, after all that, I still have no clue which distribution my data follows. Could you help me with that?
 

For the binomial, why do you take n = 2500? Presumably, the data is a random sample of 2500 independent draws from a binomial distribution B(n,p), with both n and p unknown and to be estimated.
 
I don't think I could solve it with both n and p being unknown. I just took n=2500. Then it's easy to see that p=0.000241

Probabilities of B(2500,0.000241):

probability of a 0: 0.547
probability of a 1: 0.33
probability of a 2: 0.099

then I multiply each probability by n=2500, to get these theoretical frequencies

zeroes: 1369
ones: 825
twos: 248

So the theoretical frequencies are somewhat close.

I don't think I'm supposed to consider both n and p unknown, because I have been only introduced to an exercise of finding p when n is known... So instead of saying that my sample is from some data that follows Binomial distribution, I declare that the sample itself is following Binomial distribution and take n as equal to 2500. So I guess what I'm doing here is wrong? How should I evaluate both n and p? I'd appreciate your input :)
 

Yes, I would say it is wrong. You have a sample (numerical) mean and variance, and for bin(n,p) you have formulas for those in terms of n and p.
 
Hello,

If I estimate n and p using formulas
mean=n*p
variance=n*p*(1-p)
it yields p=0.275707172 and n=2.18492684. I don't see a way to use this, sample n is 2500, so this estimate of n would be nonsense. I've read more about estimating binomial parameters when both n and p are unknown, and using other (more complex) formulas I still get p≈0.29 and n≈2.2. What's wrong?
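The two moment equations above can be solved in a couple of lines. This sketch takes the mean and variance from the first post; the rounding step at the end is an added assumption (n is a number of trials per observation, so it should be a whole number), not something proposed at this point in the thread.

```python
mean = 0.6024
variance = 0.436314

# Binomial moments: mean = n*p and variance = n*p*(1 - p),
# so variance / mean = 1 - p
p_hat = 1 - variance / mean
n_hat = mean / p_hat

# Assumption: round n to a whole number of trials, then refit p from the mean
n_int = round(n_hat)
p_int = mean / n_int

print(p_hat, n_hat, n_int, p_int)
```

Here n is the number of trials behind each single observation, which is a different quantity from the sample size of 2500.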

Another question:
With n=2500 and p=0.00028, probabilities of x=0,1,2 would be
0: 0.4965366317
1: 0.347673
2: 0.1216709
each probability multiplied by n=2500, I get approximately
0: 1241
1: 869
2: 304
However, when I generate random binomial numbers with the SAME parameters, I get results that are VERY different. For example
0: 1857
1: 495
2: 113
Why?
 

No, what you are doing is nonsense.

The best estimate of n from the data is n = 2; anyway, this matches the fact that each experiment yields one of only three outcomes (k = 0,1,2 in the binomial with n = 2). If you used a binomial with n = 2500, you would absolutely have to find some outcomes > 2.
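To put a rough number on that last remark, the sketch below computes the probability that a single draw from B(2500, 0.000241), the parameters tried earlier in the thread, exceeds 2, and the expected number of such draws among 2500 observations. The helper `binom_pmf` is written out here for illustration only.

```python
import math

n, p = 2500, 0.000241  # parameters tried earlier in the thread
n_obs = 2500           # number of observations in the sample

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability that one draw is 3 or more
p_above_2 = 1 - sum(binom_pmf(k, n, p) for k in range(3))
expected_above_2 = n_obs * p_above_2

print(round(p_above_2, 4), round(expected_above_2, 1))
```

Under those parameters a few dozen of the 2500 observations would be expected to be 3 or larger, while the data contain none.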

I will repeat once more: you seem to have a binomial with n = 2, and with some p that you can give a best estimate for. You are looking at a random sample of 2500 independent draws from that binomial.

I will not repeat myself any more; at this point I am out of here.
 

Thanks for your help, I now realize my perception of binomial distribution was a bit off. This made me understand.
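Following the suggestion of a binomial with n = 2, a goodness-of-fit check might look like the sketch below. It assumes p is estimated by the method of moments (mean = n*p, so p = 0.6024 / 2) and uses the same Pearson statistic as the earlier Poisson attempt; none of these numbers were posted in the thread.

```python
observed = [1240, 1014, 246]  # counts of 0, 1, 2
total = sum(observed)
n = 2
p = 0.6024 / n                # method of moments: mean = n * p

# B(2, p) probabilities for k = 0, 1, 2
probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
expected = [total * q for q in probs]

# Pearson chi-squared statistic over the three cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e) for e in expected], round(chi2, 2))
```

The statistic comes out a little above 3. With three cells and one estimated parameter there is 1 degree of freedom, and the 5% critical value for that is 3.841, so on this check the B(2, p) hypothesis would not be rejected at the 5% level.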
 
