# Derivation of the binomial distribution from Bernoulli trials

## Homework Statement

Derive the binomial distribution for n independent Bernoulli trials.

## The Attempt at a Solution

Each Bernoulli trial has only two possible outcomes.
Let’s name one outcome success and the other failure.
Let’s denote the probabilities of success and failure in each Bernoulli trial by p and q respectively.

Clearly, q = 1-p

For n independent Bernoulli trials, let’s denote the probability of getting k successes by P(k, n).

The probability of getting a success in each trial is p.
So, the probability of getting k successes in k trials is ## p^k##.

The probability of getting k successes in n independent Bernoulli trials equals the probability of getting k successes and n-k failures in those n trials.

The probabilities of getting k successes in k independent Bernoulli trials and of getting n-k failures in n-k independent Bernoulli trials are ## p^k## and ##(1-p)^{n-k}## respectively.

Let’s consider the following events.

Event A: getting k successes in the first k trials of the n independent Bernoulli trials

Event B: getting n-k failures in the remaining n-k trials of the n independent Bernoulli trials

Now, the probability that both events A and B occur is ## p^k (1-p)^{n-k} ##

But, according to the problem the k successes could be in any k trials of the n independent Bernoulli trials. It is not necessary that these k trials should be the 1st k trials of the n independent Bernoulli trials.

So, the events corresponding to the problem are:

Event C: getting k successes in any k of the n independent Bernoulli trials

Event D: getting n-k failures in the remaining n-k trials of the n independent Bernoulli trials

Now, the probability of getting both events C and D is what the question is asking.

For event C,

In how many ways can I choose k trials out of the n independent trials?

This is ##\binom n k## i.e. choosing k boxes out of n boxes.

Having chosen which k of the n trials are to be the successes, the probability of getting k successes in these chosen trials is ## p^k##.

Once the k trials are chosen, the remaining n-k trials are determined; there is only one way to place the failures.

So, for each choice of the k trials, the probability of getting k successes in those trials and n-k failures in the remaining n-k trials is ## p^k (1-p)^{n-k} ##.

Since there are ##\binom n k## ways of choosing the k trials, and for each choice the probability of getting k successes in those trials and n-k failures in the rest is ## p^k (1-p)^{n-k} ##,
the probability of getting k successes in any k of the n independent Bernoulli trials and n-k failures in the rest is the sum of ## p^k (1-p)^{n-k} ## over all ##\binom n k## choices.

Hence, the probability of getting k successes out of n independent Bernoulli trials is ##P(k,n) = \binom n k p^k (1-p)^{n-k}##
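This formula can be checked numerically. Below is a minimal Python sketch (the helper name `binom_pmf` is mine, purely illustrative): by the binomial theorem with p + q = 1, the probabilities over all k must sum to 1.

```python
from math import comb

def binom_pmf(k, n, p):
    # P(k, n) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Summing the pmf over k = 0..n must give 1.
n, p = 10, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
```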

Is this correct?

Ray Vickson
Science Advisor
Homework Helper
Dearly Missed

Is this correct?
Yes, it is correct.

Thanks.

Orodruin
Staff Emeritus
Science Advisor
Homework Helper
Gold Member
Essentially, albeit a bit convoluted.

Essentially, albeit a bit convoluted.
I didn't condense the steps because I used to forget the derivation. This time I worked it out as my intuition led me.

I do not understand <k>.
The book gives <k> = np.
Here k is the number of successes which we want in n independent Bernoulli trials.

So, does <k> mean the average number of successes we can get in the n independent Bernoulli trials?

Is it that, say, in the 1st trial we get 0 successes, in the 2nd trial we get 1 success, in the 3rd trial we get 0 or 1 success, and so on; i.e. we take the outcomes of the n trials, sum them, and divide by n? This would give the average number of successes in n trials.

<k> = ## \sum_{k=0}^{n} k\, P(k,n) ##, where P(k,n) = ##\binom n k p^k (1-p)^{n-k}##

But, then how does <k> = np?
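One way to convince yourself is simply to evaluate the sum numerically; a minimal sketch (`binom_pmf` is an illustrative helper, not from the book):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.4
# <k> = sum over k of k * P(k, n)
mean_k = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
# mean_k comes out equal to n * p = 8 (up to floating-point rounding)
```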


I thought of the Binomial distribution as the probability of getting k successes out of n independent Bernoulli trials.

I do not understand how to get the following statement from the picture above:
The Binomial distribution is the sum of n independent Bernoulli trials.

Let’s associate a random variable with each trial. We have n random variables ## X_1, X_2, …, X_n ##.
Now, these variables are outcome of each corresponding trial.
When the outcome is success, the variable has the value 1 and when the outcome is failure , the variable has the value 0.
Then, k could be written as the sum of these variables.

So, <k> = ## < X_1 + X_2+…+X_n> ##
By linearity of expectation, ## < X_1 + X_2+…+X_n> = < X_1> +<X_2> +… + <X_n>## (independence is not even needed for this step)

Now, ## <X_i> = 1\cdot p + 0\cdot q = p## for each i.

So, <k> = np.

I derived this with the help of the following idea:
When the outcome is success, the variable has the value 1 and when the outcome is failure , the variable has the value 0.
But I do not know why I should invoke this idea.

Why is it necessary to know that <k> = np?
Does it have any physical significance? I mean is there anything here to understand or is this just a mathematical calculation?
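The picture of k as a sum of 0/1 random variables can be checked by simulation; a minimal sketch (all names are illustrative), where the observed average of k should land near np:

```python
import random

random.seed(0)
n, p, experiments = 50, 0.3, 20000

def one_experiment():
    # n Bernoulli variables X_i in {0, 1}; k is their sum.
    return sum(1 if random.random() < p else 0 for _ in range(n))

avg_k = sum(one_experiment() for _ in range(experiments)) / experiments
# avg_k should come out close to n * p = 15
```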

Summary of derivation of Binomial distribution
If p is the probability of a win, then p^k is the probability of winning k times in a row.
If you have n trials and win only k times, then you lose the rest (n-k) of the trials. So the probability of winning the first k and then losing the rest would be ##p^k(1-p)^{n-k}##.
Now comes the combination part.
As I said, the probability of winning the first k and losing the rest is that piece of the formula. But the probability would be the same if you lost the first n-k and won the last k.
In all, you have to add up all the possible ways to win k times out of n. Each way has the same probability, so the total probability of winning k times is the probability of one of the ways times the number of combinations of k wins and n-k losses.

Is there any way to visualize the standard deviation also? Or is it, too, just a mathematical calculation?
How does it matter whether the standard deviation for a given data is more or less?

I have not understood the following part. Will you please shed some light on it:

Why is it said that the standard deviation tells us the width of a distribution?

Orodruin
Staff Emeritus
Science Advisor
Homework Helper
Gold Member
So, does <k> mean the average number of successes we can get in the n independent Bernoulli trials?
It is the expected number of successful trials.

But, then how does <k> = np?
If you compute the sum, you will find ##np##. The easier way of seeing it is by linearity of expectation value. Each trial gives an expectation value of ##p## and you have ##n## trials.

Why is it necessary to know that <k> = np?
It is a property of the distribution, just like its variance. Whether it is necessary to know it or not depends on what information you want to extract from the distribution.

Is there any way to visualize the standard deviation also?
The standard deviation (or more conveniently, its square - the variance) is a measure of the spread in the distribution, i.e., how much the result will typically differ from the expectation value.
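As a concrete check, one can compute the variance straight from its definition and compare with the known binomial result ##np(1-p)## (a sketch; helper names are illustrative):

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 30, 0.5
mean = n * p
# Variance from the definition: (k - <k>)^2 weighted by P(k, n).
var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))
sigma = sqrt(var)
# For the binomial distribution this equals n * p * (1 - p) = 7.5.
```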

Ray Vickson
Science Advisor
Homework Helper
Dearly Missed
Is there any way to visualize the standard deviation also?

Why is it said that the standard deviation tells us the width of a distribution?

Because that is exactly what it does.

You can have two different distributions with the same mean but different standard deviations, and it is handy to have some (at least crude) way to speak about some of their differences in numerical terms. If one distribution is more "spread out" than another it has a higher standard deviation. The simplest example would be when comparing the two distributions Unif(-1,1) and Unif(-5,5). The first distribution describes outcomes that are uniformly distributed between -1 and +1, while the second between -5 and +5. If you draw a random sample from the first distribution your values would always lie sprinkled between -1 and +1, but a sample from the second distribution will have some outcomes near +5 or -5.
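The Unif(-1,1) versus Unif(-5,5) comparison can be made concrete with a small simulation (a sketch; names are illustrative). A uniform distribution on (-a, a) has standard deviation ##a/\sqrt{3}##, so the wider sample should show roughly five times the spread:

```python
import random

random.seed(1)
N = 100_000
sample_a = [random.uniform(-1, 1) for _ in range(N)]
sample_b = [random.uniform(-5, 5) for _ in range(N)]

def std(xs):
    # Population standard deviation of a sample.
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Unif(-a, a) has standard deviation a / sqrt(3): about 0.577 for a=1, 2.887 for a=5.
std_a, std_b = std(sample_a), std(sample_b)
```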

The fact that the mean increases linearly in ##n## but the standard deviation increases like ##\sqrt{n}## is what makes the world possible! Looking at averages ##\bar{X} =(1/n) \sum_1^n X_i##, if the individual random terms have mean ##\mu## and standard deviation ##\sigma##, the average has mean ##\mu## but standard deviation ##\sigma/\sqrt{n}##. For very, very large ##n## the average ##\bar{X}## is "almost non-random", very much like a deterministic quantity. That is why Physics works, and is why life is possible in the universe: the huge number of atomic particles undergoing their random motions look "organized" on the macro scale of everyday life. We don't see all the underlying randomness, and that is good because it allows events to happen predictably; it makes the cells in our bodies behave as they should and it makes it possible for the world to exist.
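The ##\sigma/\sqrt{n}## behaviour of averages can be seen in a small simulation (a sketch with illustrative names): the spread of the average of n fair-coin flips shrinks tenfold when n grows a hundredfold.

```python
import random

random.seed(2)

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def spread_of_average(n, experiments=2000):
    # Standard deviation of the average of n fair-coin flips,
    # estimated from many repeated experiments.
    avgs = [mean([random.random() < 0.5 for _ in range(n)]) for _ in range(experiments)]
    return std(avgs)

s10, s1000 = spread_of_average(10), spread_of_average(1000)
# Theory: sigma / sqrt(n) with sigma = 0.5, so about 0.158 vs 0.0158.
```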

Ray Vickson
Science Advisor
Homework Helper
Dearly Missed
The Binomial distribution is the sum of n independent Bernoulli trials.

Why is it necessary to know that <k> = np? Does it have any physical significance?

When you ask for the number of heads in 20 coin tosses, you are asking for the number of heads on toss 1 + the number of heads on toss 2 + ... + the number of heads on toss 20. For each toss that number is either 0 (if you get a tail) or 1 (if you get a head).

As to why it is necessary to know ##\langle k \rangle = np##: well, that just tells you the expected number of "successes". If I toss a pair of fair dice, the probability of getting a '7' on any toss is p = 1/6. If I toss a pair of dice 600 times I would expect to get a number of '7's near 600/6 = 100. In any actual experiment (consisting of tossing dice 600 times) the number of '7's will likely be different from 100 most of the time, but not all that different: sometimes higher, sometimes lower but hovering around 100. If you were asked to bet on the number of '7's in 600 tosses, the number 100 would be your best guess. Over the long run you would win the bet more often by picking 100 than by picking any other number. (Admittedly, you would not win very often but you would win even less often if you picked a number other than 100.)
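The dice example can be simulated directly (a minimal sketch; names are illustrative): individual runs of 600 tosses scatter around 100 sevens, and the long-run average hovers near 100.

```python
import random

random.seed(3)

def sevens_in_600_tosses():
    # Toss a pair of dice 600 times and count how often the total is 7.
    count = 0
    for _ in range(600):
        if random.randint(1, 6) + random.randint(1, 6) == 7:
            count += 1
    return count

results = [sevens_in_600_tosses() for _ in range(200)]
avg = sum(results) / len(results)
```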

Orodruin
Staff Emeritus
Science Advisor
Homework Helper
Gold Member
If you were asked to bet on the number of '7's in 600 tosses, the number 100 would be your best guess.
Just to add that the expectation value is not always the best guess. If I flip 101 coins, 50.5 is a (very) bad guess for the number of heads.
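This is easy to verify by computing the pmf for n = 101 (a sketch; `binom_pmf` is an illustrative helper): the expectation 50.5 is not an attainable outcome at all, and the most likely values are its integer neighbours 50 and 51.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 101, 0.5
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
# The distribution peaks at the two integers adjacent to n * p = 50.5.
modes = [k for k, v in enumerate(pmf) if v == max(pmf)]
```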

Just to add that the expectation value is not always the best guess. If I flip 101 coins, 50.5 is a (very) bad guess for the number of heads.

Do you mean that one has to use common sense, too, before declaring one's guess?
But, then the best guess should be close to the mean and physically possible. Right?
So, the best guess, here, would be 50 or 51. Right?

How good our guess is, is determined by the fractional width of the distribution, i.e. ##\frac{\sigma_k}{<k>}##. Right? The smaller the fractional width, the better the guess.
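For the binomial this fractional width can be written out explicitly: with <k> = np and ##\sigma_k = \sqrt{np(1-p)}##, the ratio is ##\sqrt{(1-p)/(np)}##, which shrinks like ##1/\sqrt{n}##. A minimal numeric illustration (names are mine):

```python
from math import sqrt

def fractional_width(n, p):
    # sigma_k / <k> = sqrt(n*p*(1-p)) / (n*p) = sqrt((1-p) / (n*p))
    return sqrt(n * p * (1 - p)) / (n * p)

w100 = fractional_width(100, 0.5)      # 0.1
w10000 = fractional_width(10000, 0.5)  # 0.01: 100x more trials, 10x sharper peak
```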

Orodruin
Staff Emeritus
Science Advisor
Homework Helper
Gold Member
Do you mean that one has to use common sense, too, before declaring one's guess?
But, then the best guess should be close to the mean and physically possible. Right?
So, the best guess, here, would be 50 or 51. Right?
You need to know the distribution. If the distribution is a sum of several independent variables (as in this case) the clt tells you it will be approximately Gaussian and therefore peaked near the expected value. However, for a general distribution this may not be the case.
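The Gaussian approximation can be checked numerically for a symmetric binomial (a sketch; helper names are illustrative): pointwise, the binomial pmf stays very close to the Gaussian density with the same mean and standard deviation.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def gauss(x, mu, sigma):
    # Gaussian density with mean mu and standard deviation sigma.
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

n, p = 200, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
# Largest pointwise gap between the pmf and its Gaussian approximation.
max_err = max(abs(binom_pmf(k, n, p) - gauss(k, mu, sigma)) for k in range(n + 1))
```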

You need to know the distribution. If the distribution is a sum of several independent variables (as in this case) the clt tells you it will be approximately Gaussian and therefore peaked near the expected value. However, for a general distribution this may not be the case.
Do you mean that a distribution is a variable as you are saying the distribution is a sum of several independent variables?
What I meant by the distribution function is the probability of getting k, where k is the sum of several independent variables, and the distribution is the graph of the probability of getting k with respect to k. Is this correct?

What is clt here?
For any value of n, the probability that k equals the expected value is maximum. But, if the expected value is physically impossible as in the example you gave, then the best guess will be the value which is physically possible and closest to the mean value. Is this correct?

When you ask for the number of heads in 20 coin tosses, you are asking for the number of heads on toss 1 + the number of heads on toss 2 + ... + the number of heads on toss 20. For each toss that number is either 0 (if you get a tail) or 1 (if you get a head).
So, the variable ## X_i ## does not represent the outcome of the ith trial. It represents the number of successes in the ith trial.
In the following:
Let’s associate a random variable with each trial. We have n random variables ##X_1, X_2, …, X_n ##.
Now, these variables are outcome of each corresponding trial.
When the outcome is success, the variable has the value 1 and when the outcome is failure , the variable has the value 0.
Then, k could be written as the sum of these variables.

So, <k> = ## < X_1 + X_2+…+X_n>##
this statement, "Now, these variables are outcome of each corresponding trial," is wrong.