# Need help with statistics / distributions

• muffi
In summary, the poster wants to calculate the distribution of lottery winnings from the odds of matching 2, 3, 4, 5, and 6 numbers and their respective payouts. They can obtain the distribution for each individual match count but are unsure how to combine these into a distribution of total winnings. They are open to using a computer program for simulations. They clarify that they want the distribution for a single drawing in which they buy multiple tickets, not for multiple drawings. A reply also suggests a tool called "Crystal Ball" for Monte Carlo simulations.

#### muffi

Hi, I am new here, so I apologize if my post is not appropriate for this forum. I have a background in chemical engineering and used to be really good at math, but after many weeks of trying to solve my problem, I am about ready to admit defeat. I hope someone here can help me out.

My goal is to plot a distribution of winnings that can be expected from a lottery. I can easily calculate the average winnings, but I would like to see the distribution, as well. If I know the individual odds of hitting 2, 3, 4, 5 and 6 numbers, as well as their respective payouts, how do I go about calculating a distribution of the total winnings? I have read a lot about distributions. I can get distributions for the number of winners that match, say 4 numbers, for example. But how do I go about adding distributions for 2, 3, 4, 5 and 6 number winners such that my resulting distribution gives me the probability of total winnings in dollar values? Any help is greatly appreciated.

Are you plotting the total winnings distribution after playing once, or playing n times? For a single play, you can use the hypergeometric distribution, except you convert the number of successes into the winnings.

http://mathworld.wolfram.com/HypergeometricDistribution.html
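As a minimal sketch of the single-ticket case: the thread gives neither the lottery format nor the payouts, so the pick-6-of-49 game and the fixed dollar amounts in `PAYOUTS` below are assumptions for illustration only. The code computes the hypergeometric probability of each match count and relabels it with the corresponding payout.

```python
from math import comb

# Hypothetical fixed payouts (dollars) by number of matches; not from the thread.
PAYOUTS = {2: 2, 3: 10, 4: 100, 5: 5_000, 6: 1_000_000}

def match_probabilities(pool=49, picks=6):
    """Hypergeometric P(k matches) for one ticket in an assumed pick-6-of-49 game."""
    total = comb(pool, picks)
    return {
        k: comb(picks, k) * comb(pool - picks, picks - k) / total
        for k in range(picks + 1)
    }

def winnings_pmf(payouts=PAYOUTS, pool=49, picks=6):
    """Map each payout amount to its probability for a single ticket."""
    pmf = {}
    for k, p in match_probabilities(pool, picks).items():
        amount = payouts.get(k, 0)  # 0 or 1 matches pay nothing
        pmf[amount] = pmf.get(amount, 0.0) + p
    return pmf

pmf = winnings_pmf()
for amount in sorted(pmf):
    print(f"${amount:>9,}: {pmf[amount]:.3e}")
print(f"expected winnings per ticket: ${sum(a * p for a, p in pmf.items()):.4f}")
```

Real lotteries often split the top prizes among winners, so fixed payouts are a simplification.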

If it's n times, the fastest and easiest thing for you to do would be to write a computer program ("Monte Carlo") that simulates n plays of the lottery, adds up the winnings, and enters that total into a histogram. Repeat this, say, 10000 times, adding one entry to the histogram each time. At the end, you will have a histogram with 10000 entries that approximates your winnings distribution fairly well. To get an even closer match, increase the number of repetitions beyond 10000.
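The procedure described above could look roughly like this. The pick-6-of-49 format, the `PAYOUTS` table, and the choice of 52 plays are all assumptions made for the sketch, not details from the thread, and the trial count is kept smaller than 10000 so it runs quickly.

```python
import random
from collections import Counter

# Hypothetical fixed payouts (dollars) by number of matches; not from the thread.
PAYOUTS = {2: 2, 3: 10, 4: 100, 5: 5_000, 6: 1_000_000}
NUMBERS = range(1, 50)  # assumed pick-6-of-49 game

def play_once(rng, picks=6):
    """Simulate one drawing and one random ticket; return that ticket's payout."""
    drawn = set(rng.sample(NUMBERS, picks))
    ticket = rng.sample(NUMBERS, picks)
    matches = sum(n in drawn for n in ticket)
    return PAYOUTS.get(matches, 0)

def simulate_totals(n_plays, n_trials=10_000, seed=0):
    """Each trial plays n_plays independent drawings and records the total winnings."""
    rng = random.Random(seed)
    return [sum(play_once(rng) for _ in range(n_plays)) for _ in range(n_trials)]

totals = simulate_totals(n_plays=52, n_trials=2_000)
histogram = Counter(totals)  # total winnings -> number of trials with that total
for amount, count in histogram.most_common(5):
    print(f"total ${amount:,}: {count} of {len(totals)} trials")
```

Rare large prizes will show up in the histogram only if the number of trials is large compared with their odds, so the tail of the distribution converges much more slowly than the bulk.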

I only need distributions for playing one time. Thanks for the info, I am going to study the webpage you referred me to.

Correction: I want distributions for a single drawing where I buy many tickets. So I guess I need to take the programming approach?

I think a computer program would be the best way to go. However, buying n tickets for a single drawing is not quite the same as buying one ticket in each of n drawings: in the first case, if you buy enough different tickets, you are guaranteed to win, while in the second case you could, in principle, never win.

There's probably some mega number or something similar that you want to simulate, right? If so, you can't use the hypergeometric distribution by itself anyhow.

Basically, I want to see what the winnings distribution is for a single drawing if I buy a large number of tickets, but certainly not a large enough number to guarantee a 5 or 6 number match.

You could get a "brute force" answer by using Monte Carlo simulation to combine the distributions of winnings for each number of correct matches, as suggested previously. "Crystal Ball" is a useful Excel-based tool for Monte Carlo.
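For the single-drawing, many-tickets case, a brute-force simulation might be sketched as follows. This is not Crystal Ball's method, just a plain Python illustration. It assumes a pick-6-of-49 game, the hypothetical fixed `PAYOUTS` shown, and tickets picked independently at random (so duplicates are possible); none of these details come from the thread.

```python
import random
from collections import Counter

# Hypothetical fixed payouts (dollars) by number of matches; not from the thread.
PAYOUTS = {2: 2, 3: 10, 4: 100, 5: 5_000, 6: 1_000_000}
NUMBERS = range(1, 50)  # assumed pick-6-of-49 game

def single_drawing_total(rng, n_tickets, picks=6):
    """One drawing shared by all tickets; return the combined winnings."""
    drawn = set(rng.sample(NUMBERS, picks))
    # Count how many tickets land in each match tier (0..picks matches).
    tier_counts = Counter(
        len(drawn.intersection(rng.sample(NUMBERS, picks)))
        for _ in range(n_tickets)
    )
    # Combine the tiers: payout for each tier times the number of tickets in it.
    return sum(PAYOUTS.get(k, 0) * count for k, count in tier_counts.items())

def simulate_single_drawing(n_tickets, n_trials=1_000, seed=0):
    """Repeat the single drawing n_trials times; return the sorted totals."""
    rng = random.Random(seed)
    return sorted(single_drawing_total(rng, n_tickets) for _ in range(n_trials))

totals = simulate_single_drawing(n_tickets=500, n_trials=300)
for q in (0.05, 0.50, 0.95):
    print(f"{q:.0%} quantile: ${totals[int(q * (len(totals) - 1))]:,}")
```

If the tickets are instead chosen to be distinct or to cover particular combinations, they are no longer independent and the ticket-generation step would need to change accordingly.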

## 1. What is the difference between descriptive and inferential statistics?

Descriptive statistics summarize or describe the characteristics of a sample or population, using measures such as the mean, median, mode, and standard deviation. Inferential statistics, on the other hand, draws conclusions or makes predictions about a larger population based on data from a sample.

## 2. What is a normal distribution and why is it important?

A normal distribution is a symmetric, bell-shaped probability distribution that is fully described by its mean and standard deviation. It is important because many natural phenomena, such as human height and IQ scores, approximately follow this distribution. Many statistical tests and methods also assume normality, so it underpins much of how predictions and conclusions about a population are made.

## 3. How do I determine which statistical distribution to use for my data?

The distribution to use for your data depends on the nature of your data and the type of analysis you are conducting. For example, if your data is continuous and approximately normally distributed, you can use parametric tests such as t-tests and ANOVA. If your data is continuous but clearly non-normal, a non-parametric test such as the Mann-Whitney U test may be more appropriate, and if it consists of counts or proportions, you may need a test such as the chi-square test.

## 4. What is the Central Limit Theorem and why is it important?

The Central Limit Theorem states that if you take repeated, independent samples of sufficiently large size from a population with finite variance, the sample means will be approximately normally distributed, regardless of the underlying distribution of the population. The approximation improves as the sample size grows. This is important because it allows us to use the normal distribution and related statistical tests for the means of a wide range of data, even if the data itself is not normally distributed.
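A small simulation can illustrate this. The exponential population, the sample size of 50, and the number of samples below are arbitrary choices for the sketch. The population is strongly skewed, yet the sample means cluster around the population mean with a spread close to the theoretical value of σ/√n.

```python
import random
import statistics

def sample_means(sample_size, n_samples, seed=0):
    """Means of repeated samples from an exponential population (mean 1, std 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(sample_size=50, n_samples=5_000)
print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean 1.0)")
print(f"std of sample means:  {statistics.stdev(means):.3f} (theory {1 / 50 ** 0.5:.3f})")
```

Plotting a histogram of `means` for several values of `sample_size` would show the shape moving from skewed toward bell-shaped as the sample size increases.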

## 5. How can I check if my data is normally distributed?

There are several methods for checking the normality of data, such as visual inspection using histograms, box plots, or Q-Q plots, and statistical tests such as the Shapiro-Wilk test or Kolmogorov-Smirnov test. These tests compare your data to a normal distribution and provide a p-value. A p-value below 0.05 is evidence against normality. A p-value greater than 0.05 means only that the test found no significant departure from normality; it does not prove that your data is normally distributed, especially with small samples.