Mean time between lottery wins and probability of fraud by organizers

"Your first claim is that the organisers are paying the winnings to themselves. There would be no data as such to support this."

Of course there would be data to support the accusation if they paid the winnings to themselves too frequently or too anomalously. It probably wouldn't be good enough evidence for a court, but it would be good enough for us mathematically oriented people and for anyone who cared to check our calculations. By the way, it would be extremely easy for the organizers to pay the winnings to themselves if they wanted to. They just wouldn't put it in their tax return: the money would be won by a thug of theirs, then laundered and spent through offshore accounts. The state could easily prevent any such fraud simply by forcing them to give the police a complete list of the numbers played each week, so that a cop could search it for the winning numbers after each draw and the organizers would not be able to add tickets afterwards, which is what we are accusing them of here.

"Your second claim is that the organizers are controlling the weeks on which a win takes place."

Yes, but by paying the winnings to their thugs on those weeks; I don't know why you consider this a separate claim. By the way, wins are too frequent near the 5th draw after a win (there are 2 draws per week, as I said initially, but let's keep it simple and pretend it's one draw per week, so a win every 5 weeks). Such a peak is expected as the amount to be won accumulates and more and more tickets are sold. But it may occur too early if not enough people play and the organizers cheat to prevent demotivation and a collapse of sales.
I think that these claims are off topic here. We can talk about the statistics. But evidence for these claims would not be statistical, it would come through forensic accounting and police investigation, neither of which we do here.

Let’s just stick with the statistical modeling here and not discuss fraud. Any further posts regarding fraud will be deleted.
 

PeroK

Science Advisor
Homework Helper
Insights Author
Gold Member
2018 Award
I'm not sure I'm following all of this, but it's my understanding that many people tend to choose the same sort of numbers in the lottery. Numbers related to birthdays etc. So, you might expect that as more tickets get sold, you get more duplicates and not so many "new" numbers.

You also have to take into account: for small numbers of ticket sales the chance of there being a winning ticket increases approximately in proportion to the number of tickets sold; but, as the number of tickets sold increases the chance of there being a winning ticket increases more slowly. Even without any bias towards certain numbers.

In short, you would need some analysis of the numbers people tend to pick in addition to the total number of ticket sales to calculate the probability of there being a winning ticket on a given week.
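
The saturation effect described above can be sketched numerically. This is a minimal illustration, assuming every ticket is an independent, uniformly random combination (which the thread argues is not true of real players) and borrowing the 6-out-of-50 format that comes up later in the thread; the function name is hypothetical.

```python
import math

# Assumption: a 6-out-of-50 game, used here only for illustration.
N = math.comb(50, 6)  # 15,890,700 possible combinations


def p_at_least_one_winner(k, n=N):
    """Chance that at least one of k independent, uniformly random
    tickets matches the single winning combination: 1 - (1 - 1/n)**k."""
    return -math.expm1(k * math.log1p(-1.0 / n))


for k in (100_000, 1_000_000, 5_000_000, N, 3 * N):
    linear = k / N                      # naive "proportional" estimate
    actual = p_at_least_one_winner(k)   # saturates below 1
    print(f"K={k:>11,}  K/N={linear:6.3f}  P(win)={actual:6.3f}")
```

For small K the two columns agree closely; as K approaches N the linear estimate overshoots, because random tickets increasingly duplicate each other. Any bias towards popular numbers would flatten the curve further.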
 
"you would need some analysis of the numbers people tend to pick in addition to the total number of ticket sales to calculate the probability of there being a winning ticket on a given week."

I agree. I was trying to convey that point earlier also.
 
It's definitely in the right order. It is the whole of year 2018. You shouldn't try to isolate statistics from real life considerations when faced with this table of K values that scream for a psychological explanation: lottery customers are getting de-motivated (K decreases) by default as time passes, even with an increasing prize! And it is only through massive advertising that the game is resuscitated periodically and you get the spikes. Additionally the organizers may legally buy lots of tickets themselves when sales go too low (it costs them nothing) in order to produce winners to show to the media and push the narrative that the high prize made everyone rush to buy tickets.
 

PeroK

What on earth are you talking about? There is not an iota of mathematics in that post.
 
Let's take it step by step. Prize goes up, number of tickets K goes down. Then K suddenly jumps up 300%. Over and over and over. We want to establish from the graph whether this is anomalous statistically and fit mathematical models to theories about its cause.
 
The correlation between the numbers people pick will increase the variance of the number of winners a bit, but probably not too much (unless we are really unlucky, but outliers can be removed).
"We want to establish from the graph whether it is anomalous statistically and fit mathematical models to it."

Wait... we can't do that for K. It will depend on the prize money, advertising and many other things that we can't control. We can only see if the number of winners is realistic given the values of K.
 
Shall I graph the number of winners too? It's 1 wherever you see a peak and rarely 2 or more.
 
Well, without the list of winners we can't determine if there are more or fewer winners than expected, obviously.
A table or other format that is easy to parse would be useful, too.
 
"We can only see if the number of winners is realistic given the values of K."

That's very much the gist of it in the end. Let's see. I've got 16 years' worth of data now, except it's from another lottery where you choose 6 numbers out of 50. How do we use the attached table to detect the specific fraud where the organizers add a winning ticket after the draw?

lottery statistics.gif
 

It may be simple: out of 1707 draws, 1356 draws produced no winner. That's no winner 79.4% of the time. Was a higher percentage expected given the average K of 6,126,358 and given (50 choose 6) = 15,890,700?
 

PeroK

Science Advisor
Homework Helper
Insights Author
Gold Member
2018 Award
9,402
3,431
It may be simple: out of 1707 draws, 1356 draws produced no winner. That's no winner 79.4% of the time. Was a higher percentage expected given the average K of 6,126,358 and given (50 choose 6) = 15,890,700?
If there is a winner about 20% of the time, then that implies that on average about 20% of the possible sets of numbers are covered. That's about 3.2 million different combinations.

Your figures suggest, therefore, that although 6.1 million tickets are sold, they represent only about 3.2 million combinations. I read yesterday that about 10,000 people play 1, 2, 3, 4, 5, 6 every week, for example. In any case, that would be the likely explanation. With 6 million random tickets I would expect about 5 million different combinations (rough guess). So, these figures are consistent with the hypothesis that players do not choose at random but typically favour certain types of combination.

The only way to verify this, of course, is to obtain figures for the number of combinations typically chosen on a weekly basis.
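
The rough figures above can be checked with a few lines of arithmetic. This is a sketch under the assumption that every ticket is an independent, uniformly random combination, and it uses the average K for every draw, which is a simplification.

```python
import math

N = math.comb(50, 6)            # 15,890,700 combinations
K_AVG = 6_126_358               # average tickets per draw, as quoted in the thread
OBSERVED_NO_WIN = 1356 / 1707   # fraction of draws without a winner

# Assumption: tickets are independent and uniformly random.
p_no_winner = math.exp(K_AVG * math.log1p(-1 / N))   # (1 - 1/N)**K
expected_distinct = N * (1 - p_no_winner)            # distinct combinations covered

# Coverage implied by the observed share of draws with a winner.
implied_distinct = N * (1 - OBSERVED_NO_WIN)

print(f"expected share of draws with no winner: {p_no_winner:.1%}")
print(f"observed share of draws with no winner: {OBSERVED_NO_WIN:.1%}")
print(f"distinct combinations, random tickets:  {expected_distinct:,.0f}")
print(f"distinct combinations, implied by data: {implied_distinct:,.0f}")
```

Under these assumptions random tickets would leave roughly 68% of draws without a winner and cover about 5.1 million combinations, against the observed 79.4% and an implied coverage of about 3.3 million. Because K varies from draw to draw, plugging in the average K somewhat understates the expected no-winner share, so the per-draw K values would be needed for a proper comparison.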

Note that with these figures, you will have to change your accusation to one where the operators suppress wins - there is no evidence here of excessive wins. It's how few wins there are given the ticket sales that needs to be explained.

PS: the above data is consistent with there being an average of about 2 winners each time the lottery is won. That is, as there are 351 draws with a winner, there should be about 700 winners in total. Is that data available?
 
If you import the above text file to Excel and do the average of W when W > 0, it's 1.32. Not sure why you want that. The number of 1-winner draws is 280, the number of 2-winner draws is 55 etc. It's all in the summary at the beginning and the raw data is further down.
 

PeroK

Science Advisor
Homework Helper
Insights Author
Gold Member
2018 Award
9,402
3,431
If you import the above text file to Excel and do the average of W when W > 0, it's 1.32. Not sure why you want that. The number of 1-winner draws is 280, the number of 2-winner draws is 55 etc. It's all in the summary at the beginning and the raw data is further down.
There are fewer than 500 winners. That suggests that there may be certain combinations - possibly a relatively small number - with a lot of tickets. And that none of these tickets has won yet. At some time, however, one of these tickets will win and create a large number of winners that week. This would bring the average back towards 2 per win.

There may be another explanation. But, if there really are 10,000 people playing 1, 2, 3, 4, 5, 6 every week, then this is a possible explanation.
 
If we want to assess a single draw, how extreme a single draw is, given K for this draw, what's the proper way to do it?

(50 choose 6)/K must be ok as a factor for small K's, but it can't be right for K=(50 choose 6) even if people choose with a random number generator because even the random number generator will produce duplicates.
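
One way to make the single-draw question concrete, if we are willing to assume uniformly random tickets, is to treat the number of winners W as approximately Poisson with mean K/(50 choose 6). The sketch below uses hypothetical function names and is only as good as that assumption.

```python
import math

N = math.comb(50, 6)  # 15,890,700 combinations


def winners_pmf(w, k, n=N):
    """P(exactly w winning tickets) under the Poisson approximation
    with mean k/n. Assumes independent, uniformly random tickets."""
    lam = k / n
    return math.exp(-lam + w * math.log(lam) - math.lgamma(w + 1))


def p_at_least(w, k, n=N):
    """Upper tail P(W >= w), summed directly to avoid cancellation.
    The sum is truncated after 200 terms, which is ample when k/n
    is of order 1 or smaller."""
    return sum(winners_pmf(j, k, n) for j in range(w, w + 200))


# Even with as many random tickets as there are combinations,
# a sizeable share of draws would still have no winner.
print(f"K = N: P(no winner)   = {winners_pmf(0, N):.3f}")
print(f"K = N: P(2+ winners)  = {p_at_least(2, N):.3f}")
```

With K equal to (50 choose 6), the chance of no winner is about 1/e, or roughly 37%, so the mean number of draws between wins is about 1.6 rather than 1. A tail probability such as `p_at_least(w, k)` gives a number for how extreme w winners would be in one draw, but only relative to the uniform-choice model.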
 

PeroK

If you only know how many tickets have been sold, but not how widely the tickets are distributed, then there is no way to predict the frequency of a lottery win. But the total number of winners, over a potentially long time, should be more predictable.

Take an example of a lottery with 100 tickets and 50 players. If, for whatever reason, they all have different numbers, then you'll get one win every two weeks on average; and, only ever one winner.

At the other extreme, if they all have the same number, then you will only get one win every 100 weeks, but 50 winners every time.

And, if there is something between the two, with perhaps 40 different numbers, then you will get a win less than once every two weeks but sometimes more than one winner.

The common factor is the total number of winners, which relates only to the total number of tickets sold.
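
The three cases of the simplified lottery can be worked out exactly. In this sketch the 30-plus-10 split used for the "40 different numbers" case is a hypothetical choice; any split of 50 players over 40 numbers gives the same totals.

```python
from fractions import Fraction

NUMBERS = 100  # possible ticket numbers; one is drawn uniformly each week


def weekly_stats(holders):
    """holders[i] is how many players hold the i-th distinct number in play.
    Returns (probability of a win, expected number of winners) per week."""
    p_win = Fraction(len(holders), NUMBERS)             # a held number is drawn
    expected_winners = Fraction(sum(holders), NUMBERS)  # each ticket wins w.p. 1/100
    return p_win, expected_winners


scenarios = {
    "all different": [1] * 50,
    "all the same": [50],
    "40 distinct numbers": [1] * 30 + [2] * 10,  # hypothetical split of 50 players
}

for name, holders in scenarios.items():
    p_win, exp_w = weekly_stats(holders)
    print(f"{name:>20}: win in {float(p_win):.0%} of weeks, "
          f"{float(exp_w):.2f} winners per week on average")
```

The win frequency ranges from 1% to 50% of weeks, while the expected number of winners per week is 0.5 in every case, since it depends only on the 50 tickets sold.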

In the real lottery, out of 6.1 million tickets sold, you might have only 3.2 million different numbers. Most of these would be held by only a few players: perhaps 1-5. But, some special "lucky" numbers might be held by thousands of different players. This could result in the pattern from your data. Most weeks there is a small number of winners, but if the lottery is played long enough, eventually one of the commonly held numbers will turn up and you'll get hundreds or thousands of winners.

In this case, it may take a long time for the number of winners to average out to match the ticket sales.

In the meantime, there is no definite, immediate way to know for sure why there are so few winners - given the number of ticket sales.
 
"there is no definite, immediate way to know for sure why there are so few winners - given the number of ticket sales."

Alright, I'm with you on this one. Going back to your simplified lottery, the extremes are

1. a win every 2 weeks
2. a win every 100 weeks (with 50 winners each time)

So if we observe a win every single week, that's outside the above range and an anomaly, right? An extreme like the drug extremes previously mentioned. Can't we assign it a number like "p<0.001"?
 

PeroK

If you had a win every week, then over time your confidence that the lottery was properly administered would reduce.

You're confusing probabilities with confidences.
 
Could go in the opposite direction. Assume numbers 1-30 are f times more popular than the rest and calculate f from the observations of W versus K, starting with W = 0.
 
If I interpret the txt right we had 10,457,692,468 tickets sold (that is a lot!) and 465 winners. At 1 in 15,890,700 we would expect 658 winners. To explain this difference with random chance we need a significant share of tickets going to a very small share of combinations. The 8 winners with the very small number of tickets sold (5.8 million) points in this direction, although I would (without calculating) expect more outliers.
 
"The 8 winners with the very small number of tickets sold (5.8 million)"

Here are the winning numbers at that draw, played on 8 different tickets.

34 27 13 17 6 13

Surprise, it can't be birthday numbers. It's as if someone knew what would happen and bought the same combination 8 times to ensure he wouldn't have to share too much of the prize.
 
What is the statistical significance of 465 instead of 658 winners? I think that is:

P( number of winners <= 465 | all numbers are equally popular )
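
As a rough sketch of how that probability could be computed: if all combinations were equally popular and tickets independent, the total number of winners would be approximately Poisson with mean (tickets sold)/(50 choose 6). The ticket and winner totals below are the ones quoted earlier in the thread; the Poisson model is an assumption.

```python
import math

N = math.comb(50, 6)          # 15,890,700 combinations
TICKETS = 10_457_692_468      # total tickets sold, as quoted in the thread
OBSERVED = 465                # total winners, as quoted in the thread


def poisson_cdf(x, mean):
    """P(X <= x) for X ~ Poisson(mean), summed in log space for stability."""
    logs = [-mean + j * math.log(mean) - math.lgamma(j + 1) for j in range(x + 1)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)


expected = TICKETS / N                              # about 658 winners
p_value = poisson_cdf(OBSERVED, expected)           # lower-tail probability
z = (OBSERVED - expected) / math.sqrt(expected)     # rough normal-scale distance

print(f"expected winners: {expected:.1f}")
print(f"P(winners <= {OBSERVED}) = {p_value:.1e}  (z is about {z:.1f})")
```

The observed count sits about 7.5 standard deviations below the expectation, so the tail probability is far below one in a billion. That rejects the equal-popularity model; by itself it says nothing about which alternative explanation is correct.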
 

PeroK

If the hypothesis is that ticket numbers were chosen at random (or equally popular), then that hypothesis would be false with almost 100% confidence. The calculated probability above would be close to zero.

But, we already know that numbers are chosen by people with certain biases. The data, from that point of view, tells us nothing. We would need many more weeks (millions perhaps) to see the full picture.

If you knew the distribution of numbers chosen each week, then you could test the hypothesis that the lottery is fair. Or, you could wait a few hundred million weeks or so.
 
"Or, you could wait a few hundred million weeks"

But whence that figure of a few hundred million?
 

PeroK

There are 15 million possible numbers. If a small number are very popular, let's say 10, then one of these most popular numbers comes up only once every 1.5 million weeks.

If, for example, about 10,000 people choose 1, 2, 3, 4, 5, 6 every week, then either you look at the numbers chosen to see this; or, you run the lottery millions of times until this combination comes up and you get the data via the 10,000 winners that week.
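
The waiting-time arithmetic above can be sketched as follows. The count of 10 heavily played combinations is the hypothetical figure from the post, and one draw per week is assumed for simplicity.

```python
import math

N = math.comb(50, 6)     # 15,890,700; rounded to 15 million in the post
POPULAR = 10             # hypothetical number of heavily played combinations
DRAWS_SO_FAR = 1707      # draws in the data set discussed in the thread

# Each draw hits one of the popular combinations with probability POPULAR / N,
# so the waiting time is geometric with mean N / POPULAR draws.
mean_wait = N / POPULAR

# Chance that none of them has come up yet in the observed draws.
p_none_yet = (1 - POPULAR / N) ** DRAWS_SO_FAR

print(f"mean wait: {mean_wait:,.0f} draws")
print(f"P(no popular combination drawn in {DRAWS_SO_FAR} draws) = {p_none_yet:.4f}")
```

With these numbers the mean wait is about 1.6 million draws, and the chance that none of the popular combinations has appeared in 1707 draws is above 99.8%, which is why the existing data cannot reveal them.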
 
