Mean time between lottery wins and probability of fraud by organizers

In summary, it seems that a lottery where you pick 5 numbers out of the set (1,2, ..., 50) is being rigged in a way where people are winning too often. It is not clear how to investigate this, but it is probably a fraud.
  • #1
Jonathan212
Looked at some lottery wins and something was fishy. This is a lottery where you pick 5 numbers out of the set (1, 2, ..., 50). When no one wins, the money goes to the next iteration of the game, so the prize gets bigger and bigger. It seemed that wins were too regular, around every 2 or 3 weeks, and never occurred in consecutive draws. As if people were waiting for the money to accumulate, which is probably true, but there is another possibility, and that is fraud by the organizers: too few people play, no wins occur at all for ages, and because this demotivates players and could reduce sales to a possible collapse, the organizers cheat and win the prize themselves periodically. How would you investigate this mathematically based on the observed distribution of time between wins? What is the expected mean and standard deviation of the time between (full-match) wins, given the number of 5-number sets played in each iteration, K = 100,000, and 2 iterations per week?

The observed time between wins seems to have a sharp distribution with a mean around 2.5 weeks and this is fishy. Based on an observation like this, what is the probability that the organizers cheat?

What should K be to match the observed mean time between wins?
 
  • #2
Jonathan212 said:
What is the expected mean and standard deviation of the time between (full-match) wins?
I don’t think the answer can be determined without knowing the number of tickets sold. If K fluctuates then it would be complicated.
 
  • #3
Sure, just pretending we know K and it is fixed, just to derive the formula. Then stick in some real numbers. Or start from the 2.5 weeks and derive a fixed K, then work out standard deviation around the 2.5 weeks and compare this with observed deviation.
 
  • #4
There are (50 choose 5) options to pick lottery numbers, so on average you expect a jackpot winner every (50 choose 5)/K draws (shared jackpots count as multiple winners). Plug in numbers and see if they look realistic. The assumption of a constant number of players is very unrealistic, however. Larger jackpots attract more players.
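
As a quick check of the arithmetic, here is a minimal Python sketch that evaluates (50 choose 5) and the expected spacing between jackpot winners. It assumes the figures from the opening post (K = 100,000 tickets per draw, held constant, and 2 draws per week) and that every ticket is an independent, uniformly random pick; the variable names are illustrative.

```python
from math import comb

# Number of distinct ways to pick 5 numbers out of 50.
n_combinations = comb(50, 5)  # 2,118,760

# Assumptions taken from the opening post, not real sales data.
K = 100_000          # tickets per draw, treated as constant
draws_per_week = 2

# Each ticket wins with probability 1 / n_combinations.
expected_winners_per_draw = K / n_combinations

# Average spacing between jackpot winners (shared jackpots count as several winners).
mean_draws_per_winner = n_combinations / K
mean_weeks_per_winner = mean_draws_per_winner / draws_per_week

print(f"(50 choose 5) = {n_combinations:,}")
print(f"expected winners per draw = {expected_winners_per_draw:.4f}")
print(f"about one winner every {mean_draws_per_winner:.1f} draws "
      f"({mean_weeks_per_winner:.1f} weeks)")
```

With these inputs the model expects roughly one winner every 21 draws, or about 10.6 weeks, so a winner every 2.5 weeks would need several times more tickets per draw.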

Sometimes so "many players" that they change the lottery (long article, skip to the first mention of "Winfall" for the lottery part).
 
  • #5
What's (50 choose 5) short for?
 
  • #7
You can use the binomial distribution to calculate the number of likely winners each week. For that you need the number of players (K) and the probability of each player winning (1/[50 choose 5]). Then the probability of at least 1 person winning is 1 - the probability of 0 people winning. That latter probability is exactly what the binomial distribution gives.

That gives you the probability of a winner each draw, call that P. The probability that the next win comes in the Nth draw is then ##(1-P)^{N-1}P##

However, if you detect a significant discrepancy from this model, that is not an indication of fraud. This model assumes that K is constant and that the players' number selections are independent. A failure of either assumption is not an indication of fraud.
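
Under this constant-K model the waiting time until the next winning draw is geometric, so its mean is 1/P draws and its standard deviation is sqrt(1-P)/P draws. A minimal Python sketch, assuming K = 100,000 and 2 draws per week as in the opening post. The function names are hypothetical, and the last function inverts the formula to find the constant K that would give a chosen mean wait.

```python
from math import comb, log, sqrt

N_COMBINATIONS = comb(50, 5)  # 2,118,760 possible tickets

def win_probability(tickets, n_combinations=N_COMBINATIONS):
    """P(at least one winning ticket in a draw), for independent random picks."""
    return 1 - (1 - 1 / n_combinations) ** tickets

def waiting_time_stats(p_win):
    """Mean and standard deviation (in draws) of a geometric waiting time."""
    return 1 / p_win, sqrt(1 - p_win) / p_win

def tickets_for_mean_wait(mean_draws, n_combinations=N_COMBINATIONS):
    """Constant tickets-per-draw that gives the requested mean wait in draws."""
    p_win = 1 / mean_draws
    return log(1 - p_win) / log(1 - 1 / n_combinations)

P = win_probability(100_000)
mean_draws, std_draws = waiting_time_stats(P)
print(f"P = {P:.4f}, mean = {mean_draws:.1f} draws, std = {std_draws:.1f} draws")

# Observed mean of 2.5 weeks = 5 draws at 2 draws per week.
K_needed = tickets_for_mean_wait(5)
print(f"K for a 5-draw mean wait: {K_needed:,.0f}")
```

A geometric waiting time has a standard deviation of sqrt(1-P) times its mean, so for small P the spread is almost as large as the mean itself. A sharply peaked distribution of gaps therefore does not fit this model for any constant K, which points at the model assumptions rather than at any particular cause.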
 
  • #8
How many values of K and results do you need to calculate probability of fraud with decent accuracy?
 
  • #9
Do you know how many people play each time?

Your question is too broad to answer.
 
  • #10
Jonathan212 said:
How many values of K and results do you need to calculate probability of fraud with decent accuracy?
I don’t think that is possible. What you can calculate is the probability of the observed periods between wins according to the model that K is constant and all lottery picks are random. That will undoubtedly be some very low probability. But as we mentioned above there are many ways that model could be wrong besides fraud.
 
  • #11
K is not constant we said. It is given for each week. 100,123 the first week, 192,321 the second week, 255,233 the third week etc. Just 3 weeks is way too small a sample. But a year's worth, maybe. Presented with 50 values of K and 50 values of results (=number of winners this week), what is the probability that this happened by chance? If all K's were of the order of 100-1000, the probability that a win occurs every week is close to 0. A win once a year is probably close to 0 too. But with higher K's we need a mathematician.

Reminds me of drug testing against placebos. Someone must have heard of "statistical significance". It's figures like "< 0.001". Looks like a probability.
 
  • #12
Jonathan212 said:
Presented with 50 values of K and 50 values of results (=number of winners this week), what is the probability that this happened by chance?
No matter what K is it will be tiny because it is not the right question.
An equivalent question: Given a sequence of 20 coin tosses (HHTHTHTTHHHTTHTHHTHH), what is the probability that this happened? ##1/2^{20}## or about 1 in a million. Should we be surprised by this particular result? No. All ~1 million possible sequences have this probability and the one I selected is nothing special.

What you need is the probability "this result or more extreme" where "more extreme" is to be defined. In the coin toss example you could ask "I got 15 times heads, how likely is it that I get 15 or more times the same result?" For the lottery you could consider the total number of wins: How many do you expect given the number of players, how many times did someone win? How likely is it to get so few wins or even fewer? So many or even more?
If you want to look for a pattern of "there is a winner if the jackpot is high" then it gets more complicated to define what we are looking for. This has to be done before analyzing the actual results, otherwise you might bias yourself by selecting a question specifically to find something unusual.
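
To make the coin example concrete, here is a small Python sketch comparing the probability of one exact sequence with the probability of a result "at least this extreme". The helper name is hypothetical, and a fair coin with independent tosses is assumed.

```python
from math import comb

def prob_at_least(heads, tosses):
    """P(at least `heads` heads in `tosses` fair, independent coin tosses)."""
    return sum(comb(tosses, k) for k in range(heads, tosses + 1)) / 2 ** tosses

# Any one specific sequence of 20 tosses, such as HHTHTHTTHHHTTHTHHTHH.
p_exact_sequence = 1 / 2 ** 20

# One-sided tail: 15 or more heads out of 20.
p_15_or_more_heads = prob_at_least(15, 20)

# "15 or more times the same result" covers heads or tails, so double the tail.
p_15_or_more_same = 2 * p_15_or_more_heads

print(f"exact sequence: {p_exact_sequence:.2e}")
print(f"15+ heads:      {p_15_or_more_heads:.4f}")
print(f"15+ same face:  {p_15_or_more_same:.4f}")
```

The tail probabilities come out near 2% and 4%, many orders of magnitude larger than the probability of the exact sequence, which is why the tail is the quantity worth reporting.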
 
  • #13
Jonathan212 said:
Looks like a probability.
It is a probability. It is the probability that data this “extreme” occurred by chance given your data generating model. This is not the same as the probability that fraud was committed.
 
  • #14
This is exactly what they do with drugs in preliminary studies, they repeat the experiment 50 times or so and look at the results, just like we look at results of the lottery after the event. An extreme drug outcome would be all lab rats are cured. An extreme lottery outcome would be a single win every single week while K varies from 100 to 1000, which would be fishy as hell. Less extreme lab outcome, 60% of rats cured. Less extreme lottery outcome, you name it. Probability of fraud is a very realistic target, just like probability that the drug is NOT useless.
 
  • #15
Let's say a coin is a magnet and you throw it on a table with a huge but weak magnet underneath, with the north pole facing upwards. We expect more of one face. 100 tosses with 60 heads is a better result than 10 tosses with 6 heads if you were to bet your money on where the coin's north pole is after the event. Or you don't know if there's any magnet involved, you don't know if there is a fraud. What is the probability of fraud with 6 heads and what is it with 60 heads?
 
  • #16
Jonathan212 said:
Probability of fraud is a very realistic target, just like probability that the drug is NOT useless.
That is not the probability that is measured. In medical testing the p value you are talking about is not the probability that the drug has no effect. It is the probability that the data would be that extreme given that the drug has no effect.

If you are familiar with probability notation, a p value gives you P(D|H) which in words is the probability of the data given your hypothesis.

What you are asking about is the opposite. P(H|D) is the probability of the hypothesis given the data. You would need Bayesian methods for that.
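
The difference between the two can be sketched with the magnetic coin from post #15. The numbers below are assumptions chosen purely for illustration: two competing hypotheses (a fair coin, and a coin that lands heads 60% of the time) and a 50/50 prior between them. Bayes' theorem then converts P(D|H) into P(H|D).

```python
def posterior_biased(heads, tosses, p_biased=0.6, prior_biased=0.5):
    """P(coin is biased | data) when the only alternatives are fair vs. p_biased.

    p_biased and prior_biased are illustrative assumptions, not estimates.
    """
    tails = tosses - heads
    # Likelihoods P(D|H); the binomial coefficient is common to both and cancels.
    like_fair = 0.5 ** tosses
    like_biased = p_biased ** heads * (1 - p_biased) ** tails
    numerator = like_biased * prior_biased
    return numerator / (numerator + like_fair * (1 - prior_biased))

print(f"6 heads in 10:   P(biased | data) = {posterior_biased(6, 10):.3f}")
print(f"60 heads in 100: P(biased | data) = {posterior_biased(60, 100):.3f}")
```

The same 60% share of heads moves the posterior much further with 100 tosses than with 10, and the answer changes whenever the prior or the alternative hypothesis changes.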
 
  • #17
Oopsa, I wrote "NOT useless" but I meant useless for the purposes tested. What's the Bayesian approach to the magnetic coin?
 
  • #18
Jonathan212 said:
What's the Bayesian approach to the magnetic coin?
Here is a good tutorial on the topic.

https://www.quantstart.com/articles...a-Binomial-Proportion-The-Analytical-Approach
One important concept in Bayesian statistics is the idea of a prior probability. It is a mathematical expression of your beliefs before looking at the data. So, in this case, do we go in assuming that this coin is probably like most coins or do we come in suspicious that this coin may not be typical?

Whatever our prior beliefs are, we express it as a beta distributed random variable, ##Beta(\alpha,\beta)##. Then, after we do the experiment we update our posterior beliefs as ##Beta(\alpha+n,\beta+m)## where n is the number of heads and m is the number of tails.
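
A minimal Python sketch of that update, using only the closed-form mean and variance of a beta distribution. The flat prior Beta(1, 1) and the two data sets (6 heads in 10 tosses, 60 heads in 100) are assumptions borrowed from the magnetic coin example; the function names are hypothetical.

```python
def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing heads and tails."""
    return alpha + heads, beta + tails

def beta_mean_std(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance ** 0.5

prior = (1, 1)  # assumed flat prior
for heads, tails in [(6, 4), (60, 40)]:
    a, b = update_beta(*prior, heads, tails)
    mean, std = beta_mean_std(a, b)
    print(f"{heads} heads, {tails} tails -> Beta({a}, {b}), "
          f"mean {mean:.3f}, std {std:.3f}")
```

Both posteriors are centred near 0.6, but the one based on 100 tosses is much narrower, which is how the extra data shows up in this approach.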
 
  • #19
If we have no beliefs, no assumptions like "the coin is magnetic" or "it is biased to produce heads 65% of the time", is the bayesian approach hopeless?
 
  • #20
Jonathan212 said:
If we have no beliefs, no assumptions like "the coin is magnetic" or "it is biased to produce heads 65% of the time", is the bayesian approach hopeless?

You could start with the belief that the lottery is a fraud and test that hypothesis. Then you would have to describe the data that would support your hypothesis and test for that. What data would indicate a fraud?

1) Your first claim is that the organisers are paying the winnings to themselves.

There would be no data as such to support this. Instead, you would need to investigate the list of winners and "follow the money" as they say. I suggest you pass any information you have on this to the police in your country.

2) Your second claim is that the organisers are controlling the weeks on which a win takes place. a) that wins are rare on the first week of a cycle; b) that wins are too frequent on the second week of a cycle; c) that wins are too frequent on the third week of a cycle.

This should be easy to test if you have access to the number of tickets bought every week. All you really need to look at is how often a win takes place each week of a cycle and whether this is consistent with the number of tickets bought on those weeks.

Note, however, that as others have said: if you study a set of data looking for any statistical anomalies and then test for those, then that is a meaningless approach. Instead, you should have a good idea of what you want to test before you look at the data.
 
  • #21
"Your first claim is that the organisers are paying the winnings to themselves. There would be no data as such to support this."

Of course there would be data to support the accusation if they paid the winnings to themselves TOO frequently or too anomalously, it probably wouldn't be evidence good enough for court but it would be evidence good enough for us mathematically oriented guys and anyone who would care to check our calculations. Btw, it is extremely easy for the organizers to pay the winnings to themselves if they wanted to, except they wouldn't put it in their... tax return, the money would be won by a thug of theirs and laundered and spent through offshore accounts. The state could easily prevent any fraud simply by forcing them to give the police a complete list of numbers played each week, so a cop could search for the winning numbers after each draw and the organizers would not be able to subsequently add tickets which is what we are accusing them of here.

"Your second claim is that the organizers are controlling the weeks on which a win takes place."

Yes but by paying the winnings to their thugs on those weeks, I don't know why you consider this a separate claim. By the way, wins are too frequent near the 5th draw after a win (2 draws per week as I said initially but let's keep it simple and pretend it's one draw per week, a win every 5 weeks). Such a peak is expected as the amount to be won accumulates and more and more tickets are sold. But it may occur too early if not enough people play and the organizers cheat to prevent demotivation and a collapse of sales.
 
  • #22
PeroK said:
Instead, you should have a good idea of what you want to test before you look at the data.
That won't be perfect as the suspicion for fraud comes from that data already. Using future data only would be perfect but that would take a long time.
50 draws on record still leave a lot of room to detect fraud if it is too obvious.

I suggest the following two tests:
- Sum all K over the drawings that are the 1st to 3rd drawing after a win. The number of winners in those drawings should follow a Poisson distribution whose mean is that sum divided by (50 choose 5). Calculate the probability that there are as many or more winners as observed. The suspected fraud shouldn't influence this number.
- Sum all K over the drawings that are the 4th or later drawing after a win. The number of winners should again follow a Poisson distribution. Calculate the probability that there are as many or more winners as observed. The suspected fraud adds winners here.
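
Both tests reduce to a one-sided Poisson tail probability. A minimal Python sketch follows; the ticket counts and the winner count are made-up placeholders, not data from this thread, and the function name is hypothetical.

```python
from math import comb, exp

def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = exp(-expected)      # P(X = 0)
    below = 0.0
    for k in range(observed):  # accumulate P(X = 0) ... P(X = observed - 1)
        below += term
        term *= expected / (k + 1)
    return 1 - below

n_combinations = comb(50, 5)

# Placeholder inputs: tickets sold in the draws of one group, winners seen there.
tickets_in_group = [250_000, 310_000, 280_000, 420_000]
observed_winners = 2

expected_winners = sum(tickets_in_group) / n_combinations
p_value = poisson_tail(observed_winners, expected_winners)
print(f"expected {expected_winners:.3f} winners, observed {observed_winners}, "
      f"P(X >= observed) = {p_value:.3f}")
```

Running it once per group (early drawings, late drawings) gives the two tail probabilities described above.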

And one "exploratory" approach: For each time between wins, calculate how many tickets have been sold before someone won. Consider half the tickets for the drawing where someone won. Make a plot of "number of rounds surviving" as function of the number of tickets sold. It should be roughly an exponential distribution. If it deviates too much from that it is suspicious (but not quantified).
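
The bookkeeping for this exploratory check can be sketched in Python as below. The ticket and winner lists are placeholders rather than data from the thread, the function names are hypothetical, and plotting is left out so the sketch stays dependency-free.

```python
from math import comb, exp

def tickets_until_win(tickets_per_draw, winners_per_draw):
    """Tickets sold between consecutive wins, counting half of each winning draw."""
    totals, running = [], 0
    for tickets, winners in zip(tickets_per_draw, winners_per_draw):
        if winners > 0:
            totals.append(running + tickets / 2)
            running = 0
        else:
            running += tickets
    return totals

def empirical_survival(totals, t):
    """Fraction of win-to-win intervals that lasted for more than t tickets."""
    return sum(x > t for x in totals) / len(totals)

def model_survival(t, n_combinations):
    """P(no winner among t independent random tickets), approximately exp(-t/N)."""
    return exp(-t / n_combinations)

# Placeholder data, not taken from the thread's attachments.
tickets = [100_000, 200_000, 300_000, 100_000, 200_000]
winners = [0, 0, 1, 0, 1]

totals = tickets_until_win(tickets, winners)
N = comb(50, 5)
for t in (100_000, 300_000):
    print(t, empirical_survival(totals, t), round(model_survival(t, N), 3))
```

With real data, plotting `empirical_survival` against `model_survival` over a range of t gives the comparison described above; two intervals, as in this placeholder, are far too few to say anything.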
 
  • #23
Jonathan212 said:
If we have no beliefs, no assumptions like "the coin is magnetic" or "it is biased to produce heads 65% of the time", is the bayesian approach hopeless?
Typically you use what is called an uninformative prior. You say something like: the coin's probability of heads is anywhere between 0% and 100%, with every value in that range equally likely.
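
With that uniform prior, Beta(1, 1), the posterior after n heads and m tails is Beta(1+n, 1+m), and the posterior probability that the coin favors heads can be computed exactly. The sketch below assumes integer counts and uses the standard identity between the beta CDF and a binomial tail; the function name is hypothetical.

```python
from math import comb

def prob_heads_favored(heads, tails):
    """P(theta > 0.5 | data) under a uniform prior on theta = P(heads).

    The posterior is Beta(a, b) with a = heads + 1, b = tails + 1. For integer
    a and b, P(theta > x) equals P(Binomial(a + b - 1, x) <= a - 1).
    """
    a, b = heads + 1, tails + 1
    n = a + b - 1
    return sum(comb(n, j) for j in range(a)) / 2 ** n  # evaluated at x = 0.5

print(f"6 heads, 4 tails:   {prob_heads_favored(6, 4):.4f}")
print(f"60 heads, 40 tails: {prob_heads_favored(60, 40):.4f}")
```

This is a statement about the coin's bias given the data and the flat prior, which is the P(H|D) kind of quantity rather than a p value.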
 
  • #24
Got myself a year's worth of K's. Surprise, they do not increase as the accumulated money increases, they decrease!

[Attachment: Image2.jpg, the K values for 2018]
 
  • #25
Could the data be sorted backwards from what you expect?
 
  • #26
Jonathan212 said:
"Your first claim is that the organisers are paying the winnings to themselves. There would be no data as such to support this."

Of course there would be data to support the accusation if they paid the winnings to themselves TOO frequently or too anomalously, it probably wouldn't be evidence good enough for court but it would be evidence good enough for us mathematically oriented guys and anyone who would care to check our calculations. Btw, it is extremely easy for the organizers to pay the winnings to themselves if they wanted to, except they wouldn't put it in their... tax return, the money would be won by a thug of theirs and laundered and spent through offshore accounts. The state could easily prevent any fraud simply by forcing them to give the police a complete list of numbers played each week, so a cop could search for the winning numbers after each draw and the organizers would not be able to subsequently add tickets which is what we are accusing them of here.

"Your second claim is that the organizers are controlling the weeks on which a win takes place."

Yes but by paying the winnings to their thugs on those weeks, I don't know why you consider this a separate claim. By the way, wins are too frequent near the 5th draw after a win (2 draws per week as I said initially but let's keep it simple and pretend it's one draw per week, a win every 5 weeks). Such a peak is expected as the amount to be won accumulates and more and more tickets are sold. But it may occur too early if not enough people play and the organizers cheat to prevent demotivation and a collapse of sales.
I think that these claims are off topic here. We can talk about the statistics. But evidence for these claims would not be statistical, it would come through forensic accounting and police investigation, neither of which we do here.

Let’s just stick with the statistical modeling here and not discuss fraud. Any further posts regarding fraud will be deleted.
 
  • #27
Dale said:
I think that these claims are off topic here. We can talk about the statistics. But evidence for these claims would not be statistical, it would come through forensic accounting and police investigation, neither of which we do here.

Let’s just stick with the statistical modeling here and not discuss fraud. Any further posts regarding fraud will be deleted.

I'm not sure I'm following all of this, but it's my understanding that many people tend to choose the same sort of numbers in the lottery. Numbers related to birthdays etc. So, you might expect that as more tickets get sold, you get more duplicates and not so many "new" numbers.

You also have to take into account: for small numbers of ticket sales the chance of there being a winning ticket increases approximately in proportion to the number of tickets sold; but, as the number of tickets sold increases the chance of there being a winning ticket increases more slowly. Even without any bias towards certain numbers.

In short, you would need some analysis of the numbers people tend to pick in addition to the total number of ticket sales to calculate the probability of there being a winning ticket on a given week.
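
The second point (slower than proportional growth, even with unbiased picks) can be checked numerically. A minimal Python sketch, assuming every ticket is an independent, uniformly random pick among the (50 choose 5) combinations; the ticket counts are arbitrary illustration values.

```python
from math import comb

N = comb(50, 5)  # 2,118,760 combinations

def p_at_least_one_winner(tickets, n_combinations=N):
    """P(at least one winning ticket) for independent uniformly random picks."""
    return 1 - (1 - 1 / n_combinations) ** tickets

for tickets in (1_000, 100_000, 1_000_000, N, 5 * N):
    linear = tickets / N                 # naive "proportional" estimate
    exact = p_at_least_one_winner(tickets)
    print(f"{tickets:>10,} tickets: proportional {linear:6.3f}, actual {exact:6.3f}")
```

For small ticket counts the two columns agree closely; once sales approach the number of combinations the actual probability flattens out below 1 while the proportional estimate keeps growing. Popular number choices would lower the actual probability further, which this sketch does not model.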
 
  • #28
PeroK said:
you would need some analysis of the numbers people tend to pick in addition to the total number of ticket sales to calculate the probability of there being a winning ticket on a given week.
I agree. I was trying to convey that point earlier also.
 
  • #29
It's definitely in the right order. It is the whole of year 2018. You shouldn't try to isolate statistics from real life considerations when faced with this table of K values that scream for a psychological explanation: lottery customers are getting de-motivated (K decreases) by default as time passes, even with an increasing prize! And it is only through massive advertising that the game is resuscitated periodically and you get the spikes. Additionally the organizers may legally buy lots of tickets themselves when sales go too low (it costs them nothing) in order to produce winners to show to the media and push the narrative that the high prize made everyone rush to buy tickets.
 
  • #30
Jonathan212 said:
It's definitely in the right order. It is the whole of year 2018. You shouldn't try to isolate statistics from real life considerations when faced with this table of K values that scream for a psychological explanation: lottery customers are getting de-motivated by default as time passes, even with an increasing prize, and it is only through massive advertizing that the game is resuscitated periodically and you get the spikes, plus the organizers may also legally buy lots of tickets themselves (it costs them nothing) to produce winners for the media and push the narrative that the high prize made everyone rush to buy tickets.

What on Earth are you talking about? There is not an iota of mathematics in that post.
 
  • #31
Let's take it step by step. Prize goes up, number of tickets K goes down. Then K suddenly jumps up 300%. Over and over and over. We want to establish from the graph whether this is anomalous statistically and fit mathematical models to theories about its cause.
 
  • #32
The correlation between the numbers people pick will increase the variance of the number of winners a bit, but probably not too much (unless we are really unlucky, but outliers can be removed).
Jonathan212 said:
We want to establish from the graph whether it is anomalous statistically and fit mathematical models to it.
Wait... we can't do that for K. It will depend on the prize money, advertising and many more things that we can't control. We can only see if the number of winners is realistic given the values of K.
 
  • #33
Shall I graph the number of winners too? It's 1 wherever you see a peak and rarely 2 or more.
 
  • #34
Well, without the list of winners we can't determine if there are more or fewer winners than expected, obviously.
A table or other format that is easy to parse would be useful, too.
 
  • #35
"We can only see if the number of winners is realistic given the values of K."

That's very much the gist of it in the end. Let's see. Got 16 years worth of data now, except it's from another lottery where you choose 6 numbers out of 50. How do we use the attached table to detect the specific fraud where the organizers add a winning ticket after the draw?

[Attachment: lottery statistics.txt]
 

1. What is the mean time between lottery wins?

The mean time between lottery wins is the average time between draws in which at least one ticket wins the jackpot. It depends on the odds of the specific game and on the number of tickets sold per draw.

2. How is the probability of fraud by lottery organizers calculated?

The probability of fraud by lottery organizers is calculated by considering various factors such as the security measures in place, the reputation of the lottery organization, and any past incidents of fraud. It is important to note that this probability is not a guarantee and can change over time.

3. Is there a correlation between the mean time between lottery wins and the probability of fraud by organizers?

There is no direct correlation between the mean time between lottery wins and the probability of fraud by organizers. However, a longer mean time between wins may indicate a lower likelihood of fraudulent activity as it may be more difficult for organizers to manipulate the outcome.

4. Are there any measures in place to prevent fraud by lottery organizers?

Yes, lottery organizations have various security measures in place to prevent fraud. These can include random number generators, independent audits, and strict regulations and oversight by government agencies. However, it is important for players to also be cautious and report any suspicious activity.

5. Can lottery players protect themselves from fraud by organizers?

While lottery players cannot completely protect themselves from fraud by organizers, there are some precautions they can take. This includes only playing with reputable and regulated lottery organizations, being aware of common scams, and reporting any suspicious activity. It is also important to remember that winning a lottery is based on luck and there is no guaranteed way to win.
