Probability for the most frequent number in lottery?

In summary, the conversation discusses the frequency of the most frequent number in lottery draws and whether that frequency converges to a certain estimate. It also raises the possibility of estimating the standard deviation and of finding a general form for the second most frequent number. The conversation then turns to order statistics and how they relate to the lottery draws, with one person suggesting expectations rather than complicated probability distributions to analyze the problem. Finally, the conversation covers the mean and variance of a uniform distribution over a range of consecutive integers.
  • #1
Gerenuk
I was wondering, what is the estimated frequency for the most frequent number in lottery draws? Of course, I don't know which number it will be, but will the probability for that number converge to a certain estimate?

What would be the equation for possible N numbers (e.g. N=49) for the probability P of the most frequent number?
Can I even estimate the standard deviation on that estimate with an equation?

Is it even possible to give a general form for the second most frequent number and so on (i.e. P(1), P(2),...)?
 
  • #2
If it's a fair lottery all numbers would have the same probability, that is 1/N.
 
  • #3
In advance they have 1/N. But after 1000 draws there is a very high probability that one of the numbers will appear more often.

For example, the same is true for the 1D random walk, where a drunk sailor walks either left or right at each step. After N steps the expected distance from the starting point is of order sqrt(N), so an imbalance is expected.
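
The sqrt(N) scaling can be checked exactly for a symmetric walk with steps of +1 or -1. The following is a minimal sketch; the function name `walk_moments` is made up for illustration. It sums over the binomial distribution of right-steps and returns the root-mean-square displacement and the mean absolute displacement.

```python
from math import comb, sqrt

def walk_moments(steps):
    """Exact RMS and mean absolute displacement of a symmetric +/-1 walk."""
    total = 2 ** steps            # number of equally likely paths
    second = 0                    # accumulates ways * position^2
    first = 0                     # accumulates ways * |position|
    for rights in range(steps + 1):
        pos = 2 * rights - steps  # final position after `rights` right-steps
        ways = comb(steps, rights)
        second += ways * pos * pos
        first += ways * abs(pos)
    return sqrt(second / total), first / total

rms, mean_abs = walk_moments(100)
print(rms, mean_abs, sqrt(100))
```

The RMS displacement equals sqrt(N), while the mean absolute displacement is somewhat smaller but grows at the same rate.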

I searched on the internet and this topic seems to be called "order statistics". I'm just not sure how to do the maths and whether correlations matter... :(

Experimentally I find for drawing 6 out of 49 numbers (10000 times) about 12.33(1)% for the most likely number and 12.17(1)% for the least likely number.
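
An experiment of this kind could be reproduced with a short simulation. This is only a sketch: the function name `simulate_lottery`, the seed, and the use of Python's `random.sample` are assumptions for illustration, not the code that produced the figures quoted above, so the numbers it prints need not match them.

```python
import random
from collections import Counter

def simulate_lottery(n=49, k=6, draws=10_000, seed=0):
    """Return (max_freq, min_freq): per-draw relative frequencies of the
    most and least frequent numbers after `draws` draws of k out of n."""
    rng = random.Random(seed)                # fixed seed, assumed for repeatability
    counts = Counter()
    for _ in range(draws):
        # one draw: k distinct numbers out of 1..n, without replacement
        counts.update(rng.sample(range(1, n + 1), k))
    per_number = [counts.get(i, 0) for i in range(1, n + 1)]
    return max(per_number) / draws, min(per_number) / draws

hi, lo = simulate_lottery()
print(f"most frequent: {hi:.4%}  least frequent: {lo:.4%}  k/n: {6/49:.4%}")
```

Because the 49 frequencies always average to exactly k/n, the largest one can never fall below k/n and the smallest can never exceed it.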
 
  • #4
Gerenuk said:
In advance they have 1/N. But after 1000 draws there is a very high probability that one of the numbers will appear more often.

For example, the same is true for the 1D random walk, where a drunk sailor walks either left or right at each step. After N steps the expected distance from the starting point is of order sqrt(N), so an imbalance is expected.

I searched on the internet and this topic seems to be called "order statistics". I'm just not sure how to do the maths and whether correlations matter... :(

Experimentally I find for 49 numbers about 12.33(1)% for the most likely number and 12.17(1)% for the least likely number.

This is an important example of how distinct, diverse patterns can arise out of a uniformly random process. If k small random samples of size n are isolated from a large, uniformly randomly generated set of size N such that N/n is large, then the distribution of the means of the k samples would have greater variance than if N/n were small. Each sample is then allowed to grow randomly according to its distribution parameters to a large size N', and the process is repeated. One gets increasingly different distributions as the process is repeated. This will occur without any non-random selection process. It can occur by isolation alone.
 
  • #5
Considering expectations may shed some light on the solution of this problem.

The probability that a specified number will occur exactly j times in r drawings follows the binomial distribution:

p(j,r)=b(j;r,1/n)

(j is number of successes, r is number of drawings and 1/n is probability for success)

Thus expected number of numbers that will occur exactly j times in r drawings is simply

E=n*p(j,r)

So take n=49 and say r=188

Expected number of numbers that will not occur in 188 drawings is close to 1.
Expected number of numbers that will occur exactly 3 times in 188 drawings is close to 10.

Expected number of numbers that will occur exactly 8 times in 188 drawings is again close to 1.
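
The three figures above can be checked numerically with a few lines. This is a minimal sketch of E = n*b(j; r, 1/n); the function name `expected_count` is an assumption for illustration.

```python
from math import comb

def expected_count(j, r, n):
    """Expected number of numbers seen exactly j times: n * b(j; r, 1/n)."""
    p = 1 / n
    return n * comb(r, j) * p**j * (1 - p)**(r - j)

for j in (0, 3, 8):
    print(j, round(expected_count(j, r=188, n=49), 3))
```

Summed over all j from 0 to r, the expectations add up to n, since every number falls into exactly one of the count classes. The direct use of `comb` is fine for small r; for very large r the integer binomial coefficient becomes too large to convert to a float, and a log-based formula is safer.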
 
  • #6
I don't need to know the number of numbers occurring j times.

I only want to find the number of occurrences of the most frequent number.

Basically that's just the "order statistics" problem, but I don't know how to apply the equations, and I'm also not sure whether correlation between the counts of all numbers plays a role.
 
  • #7
In the given case, n=49 and r=188, the most frequent number will occur 8 times (on average).
Do you want to know the probability of this happening?
 
  • #8
Your number seems correct experimentally, though I haven't quite understood where it came from. Also, I cannot imagine that one can dismiss order statistics, or is your method equivalent in this case?

I'd be interested in the best analytical expression (normal approximation) to estimate the frequency of the most frequently appearing number.

And how does it make a difference that I'm actually drawing 6 numbers from 49 in one go?
 
  • #9
Ok, I will try to analyze your input (drawing 6 out of 49 numbers, 10000 times) with my expectation approach. That means we set n=49 and r=6*10000=60000 in the expectation formula.

Below is an excerpt of the formula's output:

j    E
1215 0.5456
1216 0.5495
1217 0.5530
1218 0.5560
1219 0.5586
1220 0.5607
1221 0.5623
1222 0.5635
1223 0.5642
1224 0.5644
1225 0.5642
1226 0.5635
1227 0.5623
1228 0.5607
1229 0.5586
1230 0.5561
1231 0.5531

From this we get that the maximum expectation, 0.5644, falls on j=1224, and this means that the most frequent number will occur 1224 times on average.
But note that the differences from the neighboring values are negligible, and in practice there is no reason to assume that one of the numbers will appear more often.

But let's now try with r=1000.

Below is an excerpt of the formula's output:

j    E
20 4.3787
21 4.2571
22 3.9467
23 3.4962
24 2.9651
25 2.4116
26 1.8841
27 1.4160
28 1.0251
29 0.7158
30 0.4827
31 0.3146
32 0.1985
33 0.1213
34 0.0719
35 0.0413
36 0.0231

Now we see that a number with frequency 28 can be expected to appear, because the expectation is close to 1.
The same applies to a number with frequency 27.
And with high probability two numbers will occur with frequency 26, because the expectation is close to 2,
and so on.

A conclusion:

The higher the number of drawings, the lower the probability that one of the numbers will appear more often.

I think, very often it is easier to analyze things via expectations rather than via complicated probability distributions.
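
The two tables in this post can be regenerated with a log-gamma form of the binomial, which avoids overflow at r=60000. This is a hedged sketch: the function names are invented here, and `expected_at_least` is an addition that does not appear in the post. It sums the same expectations over all counts from j upward, giving the expected number of numbers that occur at least j times, which may help locate the largest count.

```python
from math import exp, lgamma, log

def binom_pmf(j, r, p):
    """Binomial pmf b(j; r, p), evaluated through logarithms."""
    log_comb = lgamma(r + 1) - lgamma(j + 1) - lgamma(r - j + 1)
    return exp(log_comb + j * log(p) + (r - j) * log(1 - p))

def expected_exactly(j, r, n):
    """E = n * b(j; r, 1/n): expected number of numbers seen exactly j times."""
    return n * binom_pmf(j, r, 1 / n)

def expected_at_least(j, r, n):
    """Expected number of numbers seen j or more times (tail sum, an addition)."""
    return sum(expected_exactly(i, r, n) for i in range(j, r + 1))

print(round(expected_exactly(1224, 60000, 49), 4))
for j in range(26, 36):
    print(j, round(expected_exactly(j, 1000, 49), 4),
          round(expected_at_least(j, 1000, 49), 4))
```

Where the tail sum drops below 1, fewer than one number is expected to reach that count, which gives a rough heuristic for where the maximum sits; it is not an exact distribution of the maximum.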
 
  • #10
Gerenuk said:
I was wondering, what is the estimated frequency for the most frequent number in lottery draws? Of course, I don't know which number it will be, but will the probability for that number converge to a certain estimate?

What would be the equation for possible N numbers (e.g. N=49) for the probability P of the most frequent number?
Can I even estimate the standard deviation on that estimate with an equation?

Is it even possible to give a general form for the second most frequent number and so on (i.e. P(1), P(2),...)?

For a complete sequence of n consecutive integers from a to b (so n = b-a+1) under a uniform distribution, and some integer k such that:

[tex]a\leq k\leq b[/tex] and the probability mass function is 1/n

The mean is [tex]\frac{a+b}{2}[/tex] and the variance is [tex]\frac{(b-a+1)^2-1}{12}[/tex].
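
For equally likely integers from a to b, the standard discrete-uniform results are a mean of (a+b)/2 and a variance of ((b-a+1)^2 - 1)/12. A brute-force check with exact fractions, as a minimal sketch (the function name `uniform_moments` is made up for illustration):

```python
from fractions import Fraction

def uniform_moments(a, b):
    """Exact mean and variance of a uniform distribution on a, a+1, ..., b."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    var = Fraction(sum(v * v for v in values), n) - mean**2
    return mean, var

a, b = 1, 49
mean, var = uniform_moments(a, b)
print(mean, var, float(var) ** 0.5)
# compare with the closed forms
print(Fraction(a + b, 2), Fraction((b - a + 1) ** 2 - 1, 12))
```

Note that this describes the spread of a single drawn number, not the spread of how often a number is drawn.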

From this you can calculate a standard deviation (SD). However I don't think the SD is really defined for the uniform distribution so I don't believe your question can be answered analytically. The SD is based on the normal distribution.

EDIT: The probability that a given number k from n=49 is drawn at least once in r trials is 1-((n-1)/n)^r. The probability of k being drawn exactly q times in r trials is the binomial term C(r,q) (1/n)^q ((n-1)/n)^(r-q). To my knowledge there's no way to predict the maximal value of q in any given experiment.

I think the proper question is the one I alluded to in post 4. Given r random samples of size n (n>2) from a uniform distribution of size N (N>n), what is the probability of a sample mean equal to or exceeding some value k; [tex] a\leq k\leq b[/tex] as r grows large. This can be obtained from normal theory based on the Central Limit Theorem. It's understood that with the normal distribution, certain sample means will be more probable than others.
 
  • #11
Sounds like you're interested in showing whether or not some observed frequencies are statistically significant.

To formulate the problem more precisely, the lotto consists of k samples without replacement from a population of size n, repeated r times. Let the total counts of each lotto number be [tex](N_1,...,N_n)[/tex]. (I reversed the capitalizations to make it more obvious what are the random variables.) Let the observed frequency of each lotto number be [tex]X_i=N_i/r[/tex]. The fundamental questions are:
  1. What is the joint distribution of [tex](X_1,...,X_n)[/tex]?
  2. What is the distribution of [tex]\max(X_1,...,X_n)[/tex]?
  3. What is the joint distribution of the order statistics of [tex](X_1,...,X_n)[/tex]?
  4. What are the asymptotics of these?
This would be difficult if not intractable except for a few small cases (e.g. using multivariate generating polynomials).

With Eero's insight that the marginal distribution of each N is binomial, and adapting SW VandeCarr's CLT idea, each X would be [tex]k/n\pm O(1/\sqrt{r})[/tex]. This tells us that whatever their dependence structure they are all clustered around [tex]k/n[/tex], which agrees with your simulation.
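
The size of that clustering can be put into numbers. Each draw contains a given number with probability k/n, independently from draw to draw, so each count is binomial with parameters r and k/n, and each frequency has standard deviation sqrt(p(1-p)/r) with p = k/n. The sketch below computes this. It also gives a very rough guess for the largest frequency using Blom's approximation for the expected maximum of n independent normal variables; that step is an assumption brought in for illustration, not something from this thread, and it ignores the dependence between the counts.

```python
from math import sqrt
from statistics import NormalDist

def marginal_frequency(n, k, r):
    """Mean and standard deviation of one number's observed frequency X_i."""
    p = k / n
    return p, sqrt(p * (1 - p) / r)

def rough_max_frequency(n, k, r):
    """Rough guess for max(X_1..X_n), ASSUMING independent normal X_i.

    Uses Blom's plotting position (n - 0.375) / (n + 0.25) for the largest
    of n values; the true X_i are dependent, so this is only a guide.
    """
    p, sd = marginal_frequency(n, k, r)
    z = NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    return p + z * sd

p, sd = marginal_frequency(49, 6, 10_000)
print(p, sd, rough_max_frequency(49, 6, 10_000))
```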

Eero's next step defines [tex]Y_j=\#\{i:N_i=j\}[/tex] which can be written as a sum of indicator functions so his formula for [tex]E\left[Y_j\right][/tex] holds by linearity - but I don't yet understand how the distribution of the maximum frequency can be inferred this way. Wouldn't the distribution of the maximum vary with the dependence structure?
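
That linearity holds despite the dependence between the counts can be confirmed by exhaustive enumeration on a tiny lotto. The sketch below (function names are invented for illustration) lists every equally likely outcome of r draws of k out of n, averages Y_j exactly, and compares the result with n*b(j; r, k/n). It says nothing about the distribution of the maximum, which is the open question.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def exact_expected_Y(n, k, r):
    """Enumerate all outcomes of r draws of k-out-of-n and return
    {j: E[Y_j]}, where Y_j = #{i : N_i = j}."""
    tickets = list(combinations(range(n), k))
    totals = {j: 0 for j in range(r + 1)}
    outcomes = 0
    for draws in product(tickets, repeat=r):
        counts = [0] * n
        for draw in draws:
            for i in draw:
                counts[i] += 1
        for c in counts:          # each number contributes to one Y_j
            totals[c] += 1
        outcomes += 1
    return {j: Fraction(t, outcomes) for j, t in totals.items()}

def formula_expected_Y(n, k, r):
    """n * b(j; r, k/n) for j = 0..r, in exact arithmetic."""
    p = Fraction(k, n)
    return {j: n * comb(r, j) * p**j * (1 - p)**(r - j) for j in range(r + 1)}

print(exact_expected_Y(4, 2, 3))
print(formula_expected_Y(4, 2, 3))
```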
 

1. What is probability in the context of the most frequent number in lottery?

Probability refers to the likelihood or chance of a particular event occurring. In the context of the most frequent number in lottery, it refers to the chances of a specific number being drawn as the winning number more often than other numbers.

2. How is probability calculated for the most frequent number in lottery?

The observed frequency of a number is the number of times it has been drawn divided by the total number of draws. This gives a decimal value between 0 and 1, which can be multiplied by 100 to get a percentage. In a fair lottery the underlying probability is the same for every number, and the observed frequency only estimates it.

3. Can probability be used to predict the most frequent number in lottery?

No, probability cannot be used to predict the most frequent number in lottery. It only gives an indication of the likelihood of a specific number being drawn as the winning number based on past data. The lottery is a random and unpredictable game, so past results do not guarantee future outcomes.

4. Is the most frequent number in lottery more likely to be drawn again in the future?

No, the most frequent number in lottery is not more likely to be drawn again in the future. Each draw is independent and the probability of a number being drawn remains the same regardless of past results. Therefore, the most frequent number in lottery has the same chances of being drawn as any other number.

5. How can probability help in choosing numbers for the lottery?

Probability offers no advantage in choosing numbers. In a fair lottery every number is equally likely in each draw, so picking numbers that have been drawn more frequently in the past does not increase your chances of winning. The lottery is a game of chance and there is no guaranteed way to win.
