# Mean time between lottery wins and probability of fraud by organizers

#### Jonathan212

Looked at some lottery wins and something was fishy. This is a lottery where you pick 5 numbers out of the set (1, 2, ..., 50). When no one wins, the money goes to the next iteration of the game, so the prize gets bigger and bigger. It seemed that wins were too regular, around every 2 or 3 weeks, and never occurred in consecutive draws. As if people were waiting for the money to accumulate, which is probably true, but there is another possibility, and that is fraud by the organizers: too few people play, no wins occur at all for ages, and because this demotivates players and could reduce sales to the point of collapse, the organizers cheat and win the prize themselves periodically. How would you investigate this mathematically based on the observed distribution of time between wins? What is the expected mean and standard deviation of the time between (full-match) wins, given the number of 5-number sets played in each iteration, K = 100,000, and 2 iterations per week?

The observed time between wins seems to have a sharp distribution with a mean around 2.5 weeks and this is fishy. Based on an observation like this, what is the probability that the organizers cheat?

What should K be to match the observed mean time between wins?


#### Dale

Mentor
"What is the expected mean and standard deviation of the time between (full-match) wins?"
I don’t think the answer can be determined without knowing the number of tickets sold. If K fluctuates then it would be complicated.

#### Jonathan212

Sure, just pretending we know K and it is fixed, just to derive the formula. Then stick in some real numbers. Or start from the 2.5 weeks and derive a fixed K, then work out standard deviation around the 2.5 weeks and compare this with observed deviation.

#### mfb

Mentor
There are (50 choose 5) options to pick lottery numbers, so on average you expect a jackpot winner every (50 choose 5)/K draws (shared jackpots count as multiple winners). Plug in numbers and see if they look realistic. The assumption of a constant number of players is very unrealistic, however: larger jackpots attract more players.
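
To plug in the numbers from the opening post, a minimal Python sketch could look like the following. "(50 choose 5)" is the binomial coefficient, the number of ways to pick 5 numbers out of 50 when order does not matter. The function name and constants are only illustrative, and a constant K is assumed, which as noted is unrealistic.

```python
from math import comb

# Number of distinct 5-number tickets that can be picked from 1..50.
N_COMBINATIONS = comb(50, 5)

def expected_draws_per_winner(tickets_per_draw: int) -> float:
    """Average number of draws per jackpot winner, assuming a constant
    number of independently and randomly chosen tickets per draw.
    Shared jackpots count as multiple winners."""
    return N_COMBINATIONS / tickets_per_draw

K = 100_000          # tickets per draw, from the opening post
DRAWS_PER_WEEK = 2   # from the opening post

draws = expected_draws_per_winner(K)
print(f"(50 choose 5) = {N_COMBINATIONS}")
print(f"about {draws:.1f} draws, or {draws / DRAWS_PER_WEEK:.1f} weeks, per winner")
```

With these inputs the expected gap comes out far longer than the observed 2.5 weeks, so K = 100,000 does not match the observation under this simple model.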

Sometimes so "many players" that they change the lottery (long article, skip to the first mention of "Winfall" for the lottery part).

#### Jonathan212

What's (50 choose 5) short for?

#### Dale

Mentor
You can use the binomial distribution to calculate the likely number of winners in each draw. For that you need the number of players (K) and the probability of each player winning (1/[50 choose 5]). Then the probability of at least 1 person winning is 1 minus the probability of 0 people winning. That latter probability is exactly what the binomial distribution gives.

That gives you the probability of a winner in each draw; call that P. The probability that the first win occurs in the Nth draw is then $(1-P)^{N-1}P$
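
As a sketch of where this leads: the waiting time described above is geometrically distributed, so its mean is 1/P and its standard deviation is sqrt(1-P)/P, both measured in draws. The Python below computes P for a given K and also inverts the relation to find the K that would give a chosen mean gap. It assumes constant K and independent random picks, exactly as the model does; the function names are made up for illustration.

```python
from math import comb, expm1, log, log1p, sqrt

N_COMBINATIONS = comb(50, 5)

def p_win_per_draw(k: float) -> float:
    """P(at least one winner) = 1 - (1 - 1/N)^k, written with
    expm1/log1p to stay accurate for tiny per-ticket probabilities."""
    return -expm1(k * log1p(-1.0 / N_COMBINATIONS))

def waiting_time_stats(p: float) -> tuple[float, float]:
    """Mean and standard deviation (in draws) of the geometric
    distribution with P(first win on draw N) = (1-p)^(N-1) * p."""
    return 1.0 / p, sqrt(1.0 - p) / p

def tickets_for_mean(mean_draws: float) -> float:
    """Constant K per draw that gives the requested mean gap in draws."""
    p = 1.0 / mean_draws
    return log(1.0 - p) / log1p(-1.0 / N_COMBINATIONS)

p = p_win_per_draw(100_000)
mean_draws, sd_draws = waiting_time_stats(p)
print(f"K = 100,000: P = {p:.4f}, mean = {mean_draws:.1f} draws, sd = {sd_draws:.1f} draws")

# 2.5 weeks at 2 draws per week is a mean gap of 5 draws.
k_needed = tickets_for_mean(5.0)
mean5, sd5 = waiting_time_stats(1.0 / 5.0)
print(f"mean gap of 5 draws needs K of about {k_needed:,.0f}; sd would be {sd5:.2f} draws")
```

Note that a geometric distribution always has a standard deviation close to its mean, so under this model a sharply peaked distribution of gaps would itself be a discrepancy, whatever its cause.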

However, if you detect a significant discrepancy from this model, that is not by itself an indication of fraud. This model assumes that K is constant and that the players' number selections are independent. A violation of either assumption is not an indication of fraud.

#### Jonathan212

How many values of K and results do you need to calculate probability of fraud with decent accuracy?


#### mfb

Mentor
Do you know how many people play each time?

#### Dale

Mentor
"How many values of K and results do you need to calculate probability of fraud with decent accuracy?"
I don’t think that is possible. What you can calculate is the probability of the observed periods between wins according to the model that K is constant and all lottery picks are random. That will undoubtedly be some very low probability. But as we mentioned above there are many ways that model could be wrong besides fraud.

#### Jonathan212

K is not constant we said. It is given for each week. 100,123 the first week, 192,321 the second week, 255,233 the third week etc. Just 3 weeks is way too small a sample. But a year's worth, maybe. Presented with 50 values of K and 50 values of results (=number of winners this week), what is the probability that this happened by chance? If all K's were of the order of 100-1000, the probability that a win occurs every week is close to 0. A win once a year is probably close to 0 too. But with higher K's we need a mathematician.

Reminds me of drug testing against placebos. Someone must have heard of "statistical significance". It's figures like "< 0.001". Looks like a probability.

#### mfb

Mentor
"Presented with 50 values of K and 50 values of results (=number of winners this week), what is the probability that this happened by chance?"
No matter what K is it will be tiny because it is not the right question.
An equivalent question: Given a sequence of 20 coin tosses (HHTHTHTTHHHTTHTHHTHH), what is the probability that this happened? $1/2^{20}$, or about 1 in a million. Should we be surprised by this particular result? No. All ~1 million possible sequences have this probability and the one I selected is nothing special.

What you need is the probability "this result or more extreme" where "more extreme" is to be defined. In the coin toss example you could ask "I got 15 times heads, how likely is it that I get 15 or more times the same result?" For the lottery you could consider the total number of wins: How many do you expect given the number of players, how many times did someone win? How likely is it to get so few wins or even fewer? So many or even more?
If you want to look for a pattern of "there is a winner if the jackpot is high" then it gets more complicated to define what we are looking for. This has to be done before analyzing the actual results, otherwise you might bias yourself by selecting a question specifically to find something unusual.
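
The coin example can be worked through in a few lines of Python. This is only a sketch with made-up function names: it contrasts the probability of one specific sequence with the probability of "15 or more times the same result" in 20 fair tosses.

```python
from math import comb

def prob_specific_sequence(n_tosses: int) -> float:
    """Probability of any one particular sequence of fair coin tosses."""
    return 1 / 2 ** n_tosses

def prob_at_least_same_face(n_tosses: int, k: int) -> float:
    """P(at least k heads OR at least k tails) for a fair coin.
    Requires k > n_tosses / 2 so that the two tails cannot overlap."""
    if 2 * k <= n_tosses:
        raise ValueError("k must be more than half the number of tosses")
    upper_tail = sum(comb(n_tosses, j) for j in range(k, n_tosses + 1))
    return 2 * upper_tail / 2 ** n_tosses

print(f"one specific sequence of 20: {prob_specific_sequence(20):.2e}")
print(f"15 or more of the same face: {prob_at_least_same_face(20, 15):.4f}")
```

The first number is tiny for every possible sequence, which is why it says nothing; the second one is the kind of "this extreme or more" probability that can actually be compared against a threshold.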

#### Dale

Mentor
"Looks like a probability."
It is a probability. It is the probability that data this “extreme” occurred by chance given your data generating model. This is not the same as the probability that fraud was committed.

#### Jonathan212

This is exactly what they do with drugs in preliminary studies, they repeat the experiment 50 times or so and look at the results, just like we look at results of the lottery after the event. An extreme drug outcome would be all lab rats are cured. An extreme lottery outcome would be a single win every single week while K varies from 100 to 1000, which would be fishy as hell. Less extreme lab outcome, 60% of rats cured. Less extreme lottery outcome, you name it. Probability of fraud is a very realistic target, just like probability that the drug is NOT useless.


#### Jonathan212

Let's say a coin is a magnet and you throw it on a table with a huge but weak magnet underneath, with its north pole facing upwards. We expect more of one face. 100 tosses with 60 heads is a better result than 10 tosses with 6 heads if you were to bet your money on where the coin's north pole is after the event. Or you don't know if there's any magnet involved, you don't know if there is a fraud. What is the probability of fraud with 6 heads and what is it with 60 heads?

#### Dale

Mentor
"Probability of fraud is a very realistic target, just like probability that the drug is NOT useless."
That is not the probability that is measured. In medical testing the p value you are talking about is not the probability that the drug has no effect. It is the probability that the data would be that extreme given that the drug has no effect.

If you are familiar with probability notation, a p value gives you P(D|H) which in words is the probability of the data given your hypothesis.

What you are asking about is the opposite. P(H|D) is the probability of the hypothesis given the data. You would need Bayesian methods for that.

#### Jonathan212

Oopsa, I wrote "NOT useless" but I meant useless for the purposes tested. What's the Bayesian approach to the magnetic coin?

#### Dale

Mentor
"What's the Bayesian approach to the magnetic coin?"
Here is a good tutorial on the topic.

One important concept in Bayesian statistics is the idea of a prior probability. It is a mathematical expression of your beliefs before looking at the data. So, in this case, do we go in assuming that this coin is probably like most coins or do we come in suspicious that this coin may not be typical?

Whatever our prior beliefs are, we express them as a beta-distributed random variable, $\mathrm{Beta}(\alpha,\beta)$. Then, after we do the experiment, our posterior beliefs are $\mathrm{Beta}(\alpha+n,\beta+m)$, where n is the number of heads and m is the number of tails.
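
A minimal Python sketch of this update rule, applied to the 6-of-10 and 60-of-100 examples from earlier in the thread. The two priors used here, Beta(1, 1) and Beta(50, 50), are purely illustrative assumptions standing in for "no strong opinion" and "this coin is probably like most coins"; they are not taken from the thread.

```python
def update_beta(alpha: float, beta: float, heads: int, tails: int) -> tuple[float, float]:
    """Posterior Beta parameters after observing the given heads and tails."""
    return alpha + heads, beta + tails

def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: the expected heads probability."""
    return alpha / (alpha + beta)

# Illustrative priors (assumptions, not from the thread).
priors = {"flat Beta(1, 1)": (1, 1), "skeptical Beta(50, 50)": (50, 50)}
experiments = {"6 heads in 10": (6, 4), "60 heads in 100": (60, 40)}

for prior_name, (a0, b0) in priors.items():
    for exp_name, (heads, tails) in experiments.items():
        a, b = update_beta(a0, b0, heads, tails)
        print(f"{prior_name}, {exp_name}: posterior Beta({a}, {b}), mean {beta_mean(a, b):.3f}")
```

The skeptical prior moves much less after 10 tosses than after 100, which is one way of seeing why the larger experiment is the more convincing one.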

#### Jonathan212

If we have no beliefs, no assumptions like "the coin is magnetic" or "it is biased to produce heads 65% of the time", is the bayesian approach hopeless?


#### PeroK

Homework Helper
Gold Member
2018 Award
"If we have no beliefs, no assumptions like 'the coin is magnetic' or 'it is biased to produce heads 65% of the time', is the bayesian approach hopeless?"
You could start with the belief that the lottery is a fraud and test that hypothesis. Then you would have to describe the data that would support your hypothesis and test for that. What data would indicate a fraud?

1) Your first claim is that the organisers are paying the winnings to themselves.

There would be no data as such to support this. Instead, you would need to investigate the list of winners and "follow the money" as they say. I suggest you pass any information you have on this to the police in your country.

2) Your second claim is that the organisers are controlling the weeks on which a win takes place. a) that wins are rare on the first week of a cycle; b) that wins are too frequent on the second week of a cycle; c) that wins are too frequent on the third week of a cycle.

This should be easy to test if you have access to the number of tickets bought every week. All you really need to look at is how often a win takes place each week of a cycle and whether this is consistent with the number of tickets bought on those weeks.

Note, however, that as others have said: if you study a set of data looking for any statistical anomalies and then test for those, then that is a meaningless approach. Instead, you should have a good idea of what you want to test before you look at the data.

#### Jonathan212

"Your first claim is that the organisers are paying the winnings to themselves. There would be no data as such to support this."

Of course there would be data to support the accusation if they paid the winnings to themselves TOO frequently or too anomalously. It probably wouldn't be evidence good enough for court, but it would be good enough for us mathematically oriented guys and anyone who cared to check our calculations. Btw, it is extremely easy for the organizers to pay the winnings to themselves if they wanted to, except they wouldn't put it in their... tax return; the money would be won by a thug of theirs, then laundered and spent through offshore accounts. The state could easily prevent any fraud simply by forcing them to give the police a complete list of numbers played each week, so a cop could search for the winning numbers after each draw and the organizers would not be able to subsequently add tickets, which is what we are accusing them of here.

"Your second claim is that the organizers are controlling the weeks on which a win takes place."

Yes but by paying the winnings to their thugs on those weeks, I don't know why you consider this a separate claim. By the way, wins are too frequent near the 5th draw after a win (2 draws per week as I said initially but let's keep it simple and pretend it's one draw per week, a win every 5 weeks). Such a peak is expected as the amount to be won accumulates and more and more tickets are sold. But it may occur too early if not enough people play and the organizers cheat to prevent demotivation and a collapse of sales.


#### mfb

Mentor
"Instead, you should have a good idea of what you want to test before you look at the data."
That won't be perfect as the suspicion for fraud comes from that data already. Using future data only would be perfect but that would take a long time.
50 draws on record still leave a lot of room to detect fraud if it is too obvious.

I suggest the following two tests:
- Sum all K over drawings 1 to 3 after a win. The number of winners should follow a Poisson distribution. Calculate the probability that there are as many or more winners as observed. The suspected fraud shouldn't influence this number.
- Sum all K over drawings 4 and higher after a win. The number of winners should follow a Poisson distribution. Calculate the probability that there are as many or more winners as observed. The suspected fraud adds winners here.
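
A rough Python sketch of either test is below. The expected number of winners in a group of drawings is taken as the summed K divided by (50 choose 5), and the p-value is the Poisson probability of seeing at least the observed number of winners. The function names are hypothetical, and the ticket counts in the usage lines are placeholders invented for the example, not real lottery data.

```python
from math import comb, exp

N_COMBINATIONS = comb(50, 5)

def poisson_upper_tail(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam)."""
    term = exp(-lam)      # P(X = 0)
    cdf_below = 0.0       # accumulates P(X < observed)
    for k in range(observed):
        cdf_below += term
        term *= lam / (k + 1)   # step from P(X = k) to P(X = k + 1)
    return 1.0 - cdf_below

def p_value_for_group(tickets: list[int], winners: list[int]) -> float:
    """Probability of at least the observed total number of winners
    in a group of drawings, given the tickets sold in each of them."""
    lam = sum(tickets) / N_COMBINATIONS
    return poisson_upper_tail(sum(winners), lam)

# Placeholder numbers for illustration only (NOT real data).
late_tickets = [250_000, 260_000, 240_000, 255_000]
late_winners = [1, 0, 1, 1]
print(f"drawings 4+: p = {p_value_for_group(late_tickets, late_winners):.4f}")
```

Running the same function on the drawings 1 to 3 group gives the control value that the suspected fraud should not affect.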

And one "exploratory" approach: For each time between wins, calculate how many tickets have been sold before someone won. Consider half the tickets for the drawing where someone won. Make a plot of "number of rounds surviving" as function of the number of tickets sold. It should be roughly an exponential distribution. If it deviates too much from that it is suspicious (but not quantified).
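
The bookkeeping for this exploratory plot could be sketched as follows in Python. The input format, a chronological list of (tickets sold, whether someone won) pairs, is an assumption, as are the function names. The reference curve is the exponential exp(-t / (50 choose 5)) scaled by the number of completed cycles, and an unfinished cycle at the end of the record is simply dropped, which is a simplification.

```python
from math import comb, exp

N_COMBINATIONS = comb(50, 5)

def tickets_until_win(draws: list[tuple[int, bool]]) -> list[float]:
    """Tickets sold in each cycle before the win, counting half of the
    tickets of the winning draw. An unfinished last cycle is dropped."""
    totals, running = [], 0.0
    for tickets, had_winner in draws:
        if had_winner:
            totals.append(running + tickets / 2)
            running = 0.0
        else:
            running += tickets
    return totals

def survival_table(totals: list[float]) -> list[tuple[float, int, float]]:
    """Rows of (tickets t, cycles still without a win after t tickets,
    number expected from the exponential model)."""
    rows = []
    for t in sorted(totals):
        surviving = sum(1 for x in totals if x > t)
        expected = len(totals) * exp(-t / N_COMBINATIONS)
        rows.append((t, surviving, expected))
    return rows

# Placeholder record for illustration only (NOT real data).
record = [(100_000, False), (150_000, False), (200_000, True),
          (90_000, False), (120_000, True)]
for t, surviving, expected in survival_table(tickets_until_win(record)):
    print(f"{t:>10.0f} tickets: {surviving} surviving, {expected:.2f} expected")
```

Plotting the second column against the first, next to the third, gives the comparison described above; as noted, a visual deviation is suggestive but not quantified.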

#### Dale

Mentor
"If we have no beliefs, no assumptions like 'the coin is magnetic' or 'it is biased to produce heads 65% of the time', is the bayesian approach hopeless?"
Typically you use what is called an uninformative prior. You say something like: the coin is biased to produce heads between 0% and 100% of the time, with uniform probability for any value in that range.
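
As a sketch of what that buys you for the magnetic coin: a uniform prior is Beta(1, 1), so after n heads and m tails the posterior is Beta(1+n, 1+m), and one can ask for the posterior probability that the coin favors heads (heads probability above 0.5). For whole-number parameters the Beta distribution function can be written as a binomial sum, which keeps the Python below free of external libraries. The function name is made up, and the question "does the coin favor heads" is only a stand-in for "is something unusual going on".

```python
from math import comb

def prob_favors_heads(heads: int, tails: int) -> float:
    """Posterior P(heads probability > 0.5) under a uniform Beta(1, 1) prior.

    Uses the identity, valid for integer a and b, that the Beta(a, b)
    distribution function at x equals P(Binomial(a + b - 1, x) >= a)."""
    a, b = heads + 1, tails + 1
    n = a + b - 1
    cdf_at_half = sum(comb(n, j) for j in range(a, n + 1)) / 2 ** n
    return 1.0 - cdf_at_half

print(f"6 heads in 10:   {prob_favors_heads(6, 4):.3f}")
print(f"60 heads in 100: {prob_favors_heads(60, 40):.3f}")
```

The same 60% heads rate gives a noticeably higher posterior probability with 100 tosses than with 10, which matches the intuition in the magnetic coin question.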

#### Jonathan212

Got myself a year's worth of K's. Surprise, they do not increase as the accumulated money increases, they decrease!

#### Dale

Mentor
Could the data be sorted backwards from what you expect?
