Let's say the number 1 is picked 5% of the time, 2 is picked 4% of the time, etc to 50. That's 50 unknowns x1, x2, ..., x50. How do we get 50 equations to solve for these unknowns?
I'm not sure what you are learning from this. The question isn't directly how popular each individual number is but how popular different six-number combinations are. I've tended to use "numbers" above as shorthand for "combination of six numbers".
The dependence between numbers played in a ticket must be very weak. I got some data for the frequencies of individual winning numbers and am about to post a histogram. Unfortunately it's not the numbers played, only the winning numbers, which indirectly tell us what people tend to pick.
It's too little data. It's only a few hundred winning combinations as a sample of 15 million possibilities.
So we had 10,457,692,468 tickets sold (that is a lot!) and 465 winners. At odds of 1 in 24,435,180 we would expect 428 winners. What is the statistical significance of 465 winners when 428 are expected over 1707 draws? I want a figure like those "p<0.0021" expressions in drug research.
I suppose it depends a lot on to what extent you can trust the data! The numbers do look high, obviously. Are there any other restrictions that we don't know about?
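As a quick check of the expected count, here is a minimal sketch that reproduces the "428 expected winners" figure from the ticket total and the odds quoted above. The variable names are my own, and it assumes every ticket wins independently with the same probability.

```python
# Figures quoted in the thread.
tickets_sold = 10_457_692_468
odds_per_ticket = 24_435_180      # one jackpot win per this many tickets
observed_winners = 465

# Expected number of winning tickets, assuming each ticket wins
# independently with probability 1 / odds_per_ticket.
expected_winners = tickets_sold / odds_per_ticket
excess = observed_winners - expected_winners

print(f"expected winners: {expected_winners:.1f}")
print(f"excess over expectation: {excess:.1f} ({excess / expected_winners:.1%})")
```

Whether that excess is significant depends on the spread of the winner count, which is what the distribution questions further down are about.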
Here are the winning numbers at that draw, played in 8 different tickets.
34 27 13 17 6 13
Surprise, it can't be birthday numbers. It's as if someone knew what would happen and bought the same combination 8 times to ensure he wouldn't have to share too much of the prize.
If someone knew all the numbers in advance then it wouldn't make sense to buy multiple tickets for the same drawing. It would be too suspicious if the winners had some connection, and with just 5 million tickets you are likely to be the only winner anyway.
Take into account that people favor some numbers
That's exactly what those histograms disprove: whatever effect there is, it is very weak. You could give it a value if you want to: the sum of frequencies for numbers 1 to 30 is 66.48%, while it should be 30 / 45 = 66.67%. They are a tiny bit LESS popular than higher numbers!
Isn't the Poisson distribution the distribution of the time between wins?
No.
I thought it's a binomial distribution we've got here instead, approximated as Gaussian.
That is true as well. A Poisson distribution with a large expectation value is approximately a Gaussian distribution.
Is it ok to NOT mention the Poisson distribution at all and instead say that the number of tickets winning in the 16 years should follow a binomial distribution, which we approximate with a normal distribution like we did in my other question below?
Sure. In that case you need the additional information that the variance of the normal distribution is equal to the mean.
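To illustrate how close the two descriptions are at this expectation value, here is a rough sketch comparing the Poisson tail probability with its normal approximation. It uses the rounded mean of 428 from the thread; the helper `poisson_upper_tail` is a hypothetical name of my own, and the 464.5 cutoff is an assumed continuity correction.

```python
import math
from statistics import NormalDist


def poisson_upper_tail(k_min: int, mean: float) -> float:
    """P(X >= k_min) for X ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)          # P(X = 0)
    cdf = 0.0
    for k in range(k_min):          # accumulate P(X = 0) .. P(X = k_min - 1)
        cdf += term
        term *= mean / (k + 1)      # P(X = k + 1) from P(X = k)
    return 1.0 - cdf


mean = 428.0        # expected winners, rounded as in the thread
observed = 465

p_poisson = poisson_upper_tail(observed, mean)

# Normal approximation: the variance equals the mean, so sigma = sqrt(mean).
# The cutoff 464.5 is a continuity correction for "465 or more" (an assumption).
normal = NormalDist(mu=mean, sigma=math.sqrt(mean))
p_normal = 1.0 - normal.cdf(observed - 0.5)

print(f"Poisson tail: {p_poisson:.4f}")
print(f"normal tail:  {p_normal:.4f}")
```

The only input the normal version needs beyond the mean is the standard deviation, which is where "variance equal to the mean" comes in.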
Can't I just ignore that information and instead give the fact that the binomial distribution in
= 1 - BINOMDIST( M - 1 , N , 0.5, 1 )
is approximated by the normal distribution in
= 1 - NORMDIST( M - 1, N * 0.5, SQRT( N * 0.5 * (1-0.5) ), 1 )
where we'd replace 0.5 by 1/24,435,180 and use N = 10,457,692,468 and M = 465?
There it is (bold added by me).
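For readers without a spreadsheet at hand, the two formulas can be mirrored in Python roughly as follows. This is a sketch rather than the posters' actual calculation: `binomial_upper_tail` is a hypothetical helper that sums the binomial terms directly, and the values of N, P and M are the ones proposed in the question.

```python
import math
from statistics import NormalDist


def binomial_upper_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p); mirrors 1 - BINOMDIST(m - 1, n, p, 1)."""
    term = math.exp(n * math.log1p(-p))      # P(X = 0) = (1 - p)^n
    cdf = 0.0
    for k in range(m):                       # accumulate P(X = 0) .. P(X = m - 1)
        cdf += term
        term *= (n - k) / (k + 1) * p / (1.0 - p)   # P(X = k + 1) from P(X = k)
    return 1.0 - cdf


N = 10_457_692_468      # tickets sold
P = 1 / 24_435_180      # chance that a single ticket wins
M = 465                 # observed winners

p_binomial = binomial_upper_tail(M, N, P)

# Mirrors 1 - NORMDIST(M - 1, N * P, SQRT(N * P * (1 - P)), 1).
normal = NormalDist(mu=N * P, sigma=math.sqrt(N * P * (1 - P)))
p_normal = 1.0 - normal.cdf(M - 1)

print(f"binomial tail: {p_binomial:.4f}")
print(f"normal tail:   {p_normal:.4f}")
```

Because P is tiny, the factor (1 - P) under the square root is almost exactly 1, so the standard deviation is effectively the square root of the mean.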
EDIT: just found the error. You're looking at the "|z| >" value but you should be looking at "z >".
Why? Wouldn't a deviation in the other direction be equally suspicious?
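To make the difference between the "z >" and "|z| >" readings concrete, here is a small sketch under the normal approximation with the rounded mean of 428 and the cutoff of 464 discussed in the thread. The variable names are my own.

```python
import math
from statistics import NormalDist

mean = 428.0
sigma = math.sqrt(mean)             # variance equals the mean
z = (464 - mean) / sigma            # cutoff 464, as discussed in the thread

std_normal = NormalDist()           # standard normal: mu = 0, sigma = 1
p_one_sided = 1.0 - std_normal.cdf(z)    # "z >": too many winners only
p_two_sided = 2.0 * p_one_sided          # "|z| >": a deviation in either direction

print(f"z = {z:.3f}")
print(f"one-sided p = {p_one_sided:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
```

Which of the two is appropriate depends on whether too few winners would count as suspicious as well, which is exactly the point being argued here.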
And because we want 465 or more, i.e. > 464, you should have calculated how many standard deviations 464 is from 428, not 465 from 428.
Within the approximation by the Poisson or normal distribution this doesn't matter. 464.5 should be slightly better.
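A short sketch of how much the choice of cutoff moves the result, again assuming the normal approximation with mean 428 and standard deviation sqrt(428); the dictionary name is my own.

```python
import math
from statistics import NormalDist

mean = 428.0
normal = NormalDist(mu=mean, sigma=math.sqrt(mean))

# Upper-tail probability for each candidate cutoff: 464, 464.5 and 465.
tail_by_cutoff = {
    cutoff: 1.0 - normal.cdf(cutoff) for cutoff in (464.0, 464.5, 465.0)
}

for cutoff, p in tail_by_cutoff.items():
    print(f"cutoff {cutoff}: p = {p:.4f}")
```

The three values differ by a few parts in a thousand, which is small next to the uncertainty introduced by the approximation itself.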
Why did you add the bold? To say it is incorrect?
It is not incorrect. Check how you started the post (it is in the quote). You asked "can I ignore that, and just use [...]", but this "[...]" included the information you asked about.
A deviation in the opposite direction, i.e. too few winning tickets, would not line the pockets of the organizers as easily, because there are accountants auditing where the money goes when there is no win: it goes to the next draw.
A larger jackpot tends to attract more players, which means a larger profit for the organizers.
Another question is how many digits of this p = 0.040816379 result we should trust.
Certainly don't use more than two significant figures. p=0.041 looks good, p=0.04 is not bad either. It is not small enough to claim fraud, especially as we know there are factors that make us underestimate the p-value.