Mean time between lottery wins and probability of fraud by organizers

Discussion Overview

The discussion revolves around the statistical analysis of lottery wins, particularly focusing on the mean time between wins and the potential for fraud by lottery organizers. Participants explore mathematical models to investigate the observed distribution of time between wins, the implications of varying ticket sales, and the probability of fraud based on these observations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants suggest that the regularity of wins every 2 or 3 weeks raises suspicions of fraud, particularly if the prize accumulates without winners.
  • Others argue that the expected mean and standard deviation of time between wins cannot be determined without knowing the number of tickets sold (K), which may fluctuate.
  • One participant proposes deriving a formula based on a fixed K to analyze the observed mean time of 2.5 weeks and its standard deviation.
  • There is a discussion about the number of combinations for lottery numbers, specifically (50 choose 5), and how this affects the expected frequency of winners.
  • Some participants mention using the binomial distribution to calculate the probability of at least one winner each week, emphasizing the need for a constant K and independent number selection.
  • Concerns are raised about the sample size needed to accurately calculate the probability of fraud, with suggestions that a year's worth of data would be more informative than just a few weeks.
  • Discussions also touch on the concept of statistical significance and how it relates to the probability of observed outcomes occurring by chance.
  • Participants debate the interpretation of p-values in the context of fraud detection versus medical testing, highlighting the difference between P(D|H) and P(H|D).
  • There is mention of Bayesian methods as a potential approach to analyze the probability of fraud in relation to observed outcomes.
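
The fixed-K model raised in the points above can be sketched in a few lines. This is only a minimal sketch under stated assumptions: one draw per week, a constant number of tickets K, numbers picked independently and uniformly, and the per-ticket probability 1/24,435,180 quoted later in the thread. The function names are illustrative and do not come from the discussion. Under these assumptions the wait between wins is geometric, with mean 1/q and standard deviation sqrt(1-q)/q, where q is the weekly probability of at least one winner.

```python
import math

def weekly_win_probability(tickets_sold: int, p_ticket: float) -> float:
    """P(at least one winning ticket in a draw) = 1 - (1 - p)^K."""
    # log1p/expm1 keep the result accurate when p is tiny
    return -math.expm1(tickets_sold * math.log1p(-p_ticket))

def waiting_time_stats(q: float) -> tuple[float, float]:
    """Mean and standard deviation (in draws) of a geometric waiting time."""
    return 1.0 / q, math.sqrt(1.0 - q) / q

def tickets_for_mean_wait(mean_draws: float, p_ticket: float) -> float:
    """Invert the model: the constant K that would give the stated mean wait."""
    q = 1.0 / mean_draws
    return math.log1p(-q) / math.log1p(-p_ticket)

P_TICKET = 1 / 24_435_180      # per-ticket probability quoted in the thread
OBSERVED_MEAN_WEEKS = 2.5      # observed mean time between wins

q = 1.0 / OBSERVED_MEAN_WEEKS
mean_wait, sd_wait = waiting_time_stats(q)
implied_k = tickets_for_mean_wait(OBSERVED_MEAN_WEEKS, P_TICKET)

print(f"weekly win probability q = {q:.3f}")
print(f"mean wait = {mean_wait:.2f} weeks, sd = {sd_wait:.2f} weeks")
print(f"implied constant K = {implied_k:,.0f} tickets per draw")
```

If K fluctuates from week to week, as several participants point out, q changes per draw and the geometric formulas no longer apply directly.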

Areas of Agreement / Disagreement

Participants do not reach a consensus on the probability of fraud or the appropriate methods to analyze the lottery data. Multiple competing views remain regarding the assumptions about ticket sales and the implications of observed win patterns.

Contextual Notes

Limitations include the dependence on the fluctuating values of K, the assumptions about player behavior, and the need for a larger sample size to draw meaningful conclusions about the probability of fraud.

  • #61
The histograms show no preference for specific numbers, but they cannot reveal preferences for specific combinations.

The variance of a Poisson distribution is the same as its mean, and the standard deviation is the square root of the variance.
 
  • #62
I can't reproduce that 465 winners number. How did you calculate it?
 
  • #63
I just summed the entries in the second column in the long table. I get the same result if I multiply the second and third row in the first table and then sum the products.

Here is an xls file, which is more convenient than the text file.
 


  • #64
The variance of a Poisson distribution is the same as its mean

Isn't Poisson distribution the distribution of the time between wins? I thought it's a binomial distribution we've got here instead, approximated as gaussian.
 
  • #65
Jonathan212 said:
Isn't Poisson distribution the distribution of the time between wins?
No.
I thought it's a binomial distribution we've got here instead, approximated as gaussian.
That is true as well. A Poisson distribution with a large expectation value is approximately a Gaussian distribution.
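
To illustrate the statement above, here is a small sketch comparing the Poisson probability mass function with a normal density that has the same mean and a variance equal to that mean. The expectation value 428 is the rounded figure used in this thread; the helper names are mine and purely illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events, evaluated in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

LAM = 428.0  # rounded expected number of winners

for k in (390, 428, 465):
    poisson = poisson_pmf(k, LAM)
    gauss = normal_pdf(k, LAM, LAM)   # variance set equal to the mean
    print(f"k={k}: Poisson={poisson:.6f}  normal={gauss:.6f}  ratio={poisson / gauss:.4f}")
```

The two columns agree closely near the mean; the small differences in the tails come from the slight skew of the Poisson distribution.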
 
  • #66
Greetings. I'm intending to write this up for a non-expert high-school-level audience. Complete with links for explanations like the origin of "(45 choose 5)", why we look at a normal distribution, etc. But there is one point I haven't yet understood myself. Is it ok to NOT mention Poisson distribution at all and instead say that the number of tickets winning in the 16 years should follow a binomial distribution, which we approximate with a normal distribution like we did in my other question below?

https://www.physicsforums.com/threads/probability-that-1000-coin-flips-results-in-600-tails.965579/
 
  • #67
Jonathan212 said:
Is it ok to NOT mention Poisson distribution at all and instead say that the number of tickets winning in the 16 years should follow a binomial distribution, which we approximate with a normal distribution like we did in my other question below?
Sure. In that case you need the additional information that the variance of the normal distribution is equal to the mean.
 
  • #68
Can't I just ignore that information and instead give the fact that the binomial distribution in

= 1 - BINOMDIST( M - 1 , N , 0.5, 1 )

is approximated by the normal distribution in

= 1 - NORMDIST( M - 1, N * 0.5, SQRT( N * 0.5 * (1-0.5) ), 1 )

where we'd replace 0.5 by 1/24,435,180 and use N = 10,457,692,468 and M = 465 ?

Then the statistical significance of the M = 465 wins (ie the probability of 465 wins or more) is

p = 1 - NORMDIST( 465 - 1, 427.9768951, SQRT( 427.9768776 ), 1 )

p = 0.040816379

That's not the same as your p=0.073 result in #59. Am I doing something wrong?

EDIT: just found the error. You're looking at the "|z| >" value but you should be looking at "z >". And because we want 465 or more, ie > 464, you should have calculated how many standard deviations 464 is from 428, not 465 from 428. That's 1.74129038 standard deviations and we get the same result at z > 1.74129.
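
For anyone who wants to reproduce the spreadsheet numbers outside Excel, here is a minimal Python sketch of the same normal approximation. It uses only N, the per-ticket probability and M as given above; the helper name normal_upper_tail is illustrative.

```python
import math

def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X > x) for a normal distribution, i.e. 1 - NORMDIST(x, mean, sd, 1)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

N = 10_457_692_468        # tickets sold over the 16 years
P = 1 / 24_435_180        # probability that a single ticket wins
M = 465                   # observed number of winning tickets

mean = N * P
sd = math.sqrt(N * P * (1 - P))

# "465 or more" is treated as "more than 464", as in the spreadsheet formula
z = (M - 1 - mean) / sd
p_value = normal_upper_tail(M - 1, mean, sd)

print(f"mean = {mean:.7f}, sd = {sd:.5f}")
print(f"z = {z:.5f}, one-sided p = {p_value:.6f}")
```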
 
  • #69
In drug research the results are stated like this: p<0.01. How can we do the same in this problem? Ie how can we establish an upper bound for p given that the normal we're looking at is only an approximation to the binomial?
 
  • #70
Is there any site where you can calculate extreme binomial integrals like this one without the normal approximation?

= 1 - BINOMDIST( 465 - 1, 10457692468, 1/24435180, 1 )
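
One way to evaluate this tail without the normal approximation is to sum the binomial terms directly. The sketch below is an assumption about how one might do it, not something taken from the thread: it starts from P(0 wins) = (1-p)^N and builds each further term with the recurrence P(k+1) = P(k) * (N-k)/(k+1) * p/(1-p), which avoids huge factorials. The function name is illustrative.

```python
import math

def binomial_upper_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p), i.e. 1 - BINOMDIST(m - 1, n, p, 1)."""
    # P(X = 0) = (1 - p)^n; about 1e-186 here, still representable as a float
    term = math.exp(n * math.log1p(-p))
    odds = p / (1.0 - p)
    cdf = 0.0
    for k in range(m):                    # accumulate P(X = 0) .. P(X = m - 1)
        cdf += term
        term *= (n - k) / (k + 1) * odds  # step from P(X = k) to P(X = k + 1)
    return 1.0 - cdf

N = 10_457_692_468
P = 1 / 24_435_180

exact_p = binomial_upper_tail(465, N, P)
print(f"exact binomial P(X >= 465) = {exact_p:.6f}")
```

The loop works in plain double precision because the smallest term is far above the underflow limit; for a much larger N * p one would have to sum in log space instead.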
 
  • #71
Jonathan212 said:
Can't I just ignore that information and instead give the fact that the binomial distribution in

= 1 - BINOMDIST( M - 1 , N , 0.5, 1 )

is approximated by the normal distribution in

= 1 - NORMDIST( M - 1, N * 0.5, SQRT( N * 0.5 * (1-0.5) ), 1 )

where we'd replace 0.5 by 1/24,435,180 and use N = 10,457,692,468 and M = 465 ?
There it is (bold added by me).
Jonathan212 said:
EDIT: just found the error. You're looking at the "|z| >" value but you should be looking at "z >".
Why? Wouldn't a deviation in the other direction be equally suspicious?
Jonathan212 said:
And because we want 465 or more, ie > 464, you should have calculated how many standard deviations 464 is from 428, not 465 from 428.
Within the accuracy of the Poisson or normal approximation this doesn't matter. Using 464.5 (a continuity correction) should be slightly better.

WolframAlpha can calculate some extreme values. Check individual parts - you'll see the approximation is a *really* good one here.
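
The conventions discussed in this post can be compared side by side. The sketch below is illustrative only: it assumes the mean 427.9768951 from post #68 and a variance equal to that mean, and the variable names are mine. The two-sided value at 465 appears to be how a figure close to the p=0.073 of #59 arises.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

MEAN = 427.9768951        # N * p, taken from post #68
SD = math.sqrt(MEAN)      # Poisson-style: variance equal to the mean

def z_score(x: float) -> float:
    """Number of standard deviations between x and the mean."""
    return (x - MEAN) / SD

p_two_sided_465 = 2 * upper_tail(z_score(465))    # "|z| >" measured from 465
p_one_sided_464 = upper_tail(z_score(464))        # "z >" measured from 464
p_one_sided_4645 = upper_tail(z_score(464.5))     # with continuity correction

print(f"two-sided, 465:   {p_two_sided_465:.4f}")
print(f"one-sided, 464:   {p_one_sided_464:.4f}")
print(f"one-sided, 464.5: {p_one_sided_4645:.4f}")
```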
 
  • #72
Why did you add the bold? To say it is incorrect? This is the formula we derived in the other thread for an identical problem with different N, M and probability. EDIT: it matches WolframAlpha perfectly too, if you type it in Excel.

A deviation in the opposite direction, i.e. too few winning tickets, would not line the pockets of the organizers as easily, because accountants audit where the money goes when there is no win: it goes to the next draw.
 
  • #73
Another question is how many digits of this p = 0.040816379 result should we trust. Should the statistical significance be shown as "p < 0.05"?
 
  • #74
Jonathan212 said:
Why did you add the bold? To say it is incorrect?
It is not incorrect. Check how you started the post (it is in the quote). You asked "can I ignore that, and just use [...]", but this "[...]" included the information you asked about.
Jonathan212 said:
A deviation in the opposite direction, i.e. too few winning tickets, would not line the pockets of the organizers as easily, because accountants audit where the money goes when there is no win: it goes to the next draw.
A larger jackpot tends to attract more players, which means a larger profit for the organizers.
Jonathan212 said:
Another question is how many digits of this p = 0.040816379 result should we trust.
Certainly don't use more than two significant figures. p=0.041 looks good, p=0.04 is not bad either. It is not small enough to claim fraud, especially as we know there are factors that make us underestimate the p-value.
 
  • #75
Does a question like "what is the probability that the organizers have never cheated by adding a winner after a draw?" make sense mathematically?
 
  • #76
Lottery wins cannot be analyzed on the assumption that the game is fair.

Winnings are not allowed to happen randomly, because the innumerate general public would misinterpret that as fraud. The lottery commissions use secret internal algorithms to make the distribution of locations and dates of wins match what the general public assumes randomness looks like. They do this by suppressing variance and fluctuations, giving a more balanced spread of winning locations and times and avoiding unfair-looking distributions in which some locations win too much or too little.

Methodically thinking about the optimum algorithm and process by which the lottery commission might ensure this controlled pseudorandom distribution of wins, Joan Ginther*, former math professor with a PhD from Stanford University specializing in statistics, won four Texas lotteries (total over $20 million).

* "The Luckiest Woman on Earth", Harper's Magazine, August 2011
 
