Math: discrete probability distribution

masterchiefo · Feb 15, 2016

Ray Vickson said:

This one is subtle and not entirely straightforward. It uses several results/methods in probability.

Actually, the final result is surprisingly easy (almost unbelievable, in fact), but getting to the solution is not simple.

When I first saw the form of the solution I could not believe it; I had to check several numerical examples before finally hitting on a simple proof and convincing myself that the result really is true.

This is the answer:
Variance = p*n*q
= 8*0.5*0.5
=2

I can't believe it was that simple, and it took me 3 days...
I complicated my life for nothing by trying to solve a complex probability problem with multiple formulas.

geoffrey159 · Feb 16, 2016

@Ray Vickson, would you mind sharing what you have for ##P(Y = k)##?

Because I have tried it, and even though I still have errors in my expression (it sums to 98.02 % on ##[[0,8]]##), I find a complicated expression. What do you have?

EDIT: it sums to 100.000073 %

Ray Vickson · Feb 16, 2016

There are two ways to do it, and both are instructive in their own ways. First, let X = number of "butter" answers, and Y = number of "butter" answers that are truly "butter" = number of correctly identified "butter" cookies. X has distribution Binomial(18,1/2), while (Y | X=n) has distribution Hypergeom(8,10,n). Below, denote the binomial coefficient ##{a \choose b}## as ##C(a,b)##.

Method (1) conditioning:
$$\begin{array}{rcl} P(Y = k) &=& \displaystyle \sum_{n=0}^{18} P(X = n)\, P(Y = k \mid X = n) \\ &=& \displaystyle \sum_{n=0}^{18} C(18,n)\, 2^{-18}\, \frac{C(8,k)\, C(10,n-k)}{C(18,n)} \\ &=& \displaystyle C(8,k)\, 2^{-8} \sum_{n=0}^{18} 2^{-10}\, C(10,n-k) \end{array}$$
In the last summation, ##n## actually runs from ##n = k## to ##n = 10+k##, so changing ##n## to ## n = k+j, j=0,1, \ldots, 10##, the last line above becomes
$$C(8,k)\, 2^{-8} \left( 2^{-10} \sum_{j=0}^{10} C(10,j) \right) = C(8,k)\, 2^{-8} \left( 2^{-10} (1+1)^{10} \right) = C(8,k)\, 2^{-8}.$$
In other words, ##Y \sim \text{binomial}(8,1/2)##.
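The conditioning argument can be checked numerically with exact rationals. This is only a sketch: the function names `p_x`, `p_y_given_x` and `p_y` are made up for illustration, and it assumes Python 3.8+ for `math.comb`.

```python
from fractions import Fraction
from math import comb

N_BUTTER, N_MARG = 8, 10           # true butter / true margarine cookies
N_TOTAL = N_BUTTER + N_MARG        # 18 cookies in all

def p_x(n):
    """P(X = n), where X ~ Binomial(18, 1/2) counts the "butter" answers."""
    return Fraction(comb(N_TOTAL, n), 2 ** N_TOTAL)

def p_y_given_x(k, n):
    """P(Y = k | X = n): hypergeometric, k of the n "butter" answers are correct."""
    if k < 0 or k > N_BUTTER or n - k < 0 or n - k > N_MARG:
        return Fraction(0)
    return Fraction(comb(N_BUTTER, k) * comb(N_MARG, n - k), comb(N_TOTAL, n))

def p_y(k):
    """P(Y = k) by conditioning on X (law of total probability)."""
    return sum(p_x(n) * p_y_given_x(k, n) for n in range(N_TOTAL + 1))

pmf = [p_y(k) for k in range(N_BUTTER + 1)]
for k, p in enumerate(pmf):
    # Compare the conditioned sum with the closed form C(8,k) / 2^8.
    print(k, p, p == Fraction(comb(N_BUTTER, k), 2 ** N_BUTTER))
```

Because everything is a `Fraction`, the comparison with ##C(8,k)\,2^{-8}## is exact rather than approximate.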

Method (2) directly:

Each of the 8 "butter" cookies gets labelled as "butter" or "margarine" independently at random, with probabilities of 1/2 for each. Thus, the number of correctly-labelled butter cookies is binomial with parameters 8 and 1/2.
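As a short worked consequence of this direct argument (a sketch; the indicator variables ##I_1, \ldots, I_8## are notation introduced here for illustration): let ##I_i = 1## if the ##i##-th butter cookie is labelled "butter" and ##I_i = 0## otherwise, so that ##Y = I_1 + \cdots + I_8## with the ##I_i## independent and ##P(I_i = 1) = 1/2##. Then

$$E[Y] = \sum_{i=1}^{8} E[I_i] = 8 \cdot \tfrac{1}{2} = 4, \qquad \operatorname{Var}(Y) = \sum_{i=1}^{8} \operatorname{Var}(I_i) = 8 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 2,$$

which agrees with the value ##npq = 2## quoted at the start of the thread.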

This is a good illustration of how changing a point of view can alter the analysis. In the first way, we initially look at the "butter" labels and then ask how many of them are correct; in the second way we look at the "butter" cookies and ask how many of them are labelled correctly. The final event is the same in both views, but the methods of analysis are very different.

geoffrey159 · Feb 16, 2016

I'm impressed: you have solved this so easily and convincingly by conditioning. I have a lot to learn :-)
Thank you for the demo!

Ray Vickson · Feb 16, 2016

geoffrey159 said:

@Ray Vickson, would you mind sharing what you have for ##P(Y = k)##?

Because I have tried it, and even though I still have errors in my expression (it sums to 98.02 % on ##[[0,8]]##), I find a complicated expression. What do you have?

EDIT: it sums to 100.000073 %

Often, that type of discrepancy is just a result of using floating-point numbers instead of exact rationals, so you get inevitable roundoff errors. Perhaps that is what you are seeing?
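To illustrate the point about floating-point versus exact rationals, here is a minimal sketch that adds up the joint probabilities ##P(X = n)\,P(Y = k \mid X = n)## both ways. The helper names `joint` and `total` are hypothetical, and the size of any floating-point deviation depends on the platform and on the order of summation, so none is quoted here.

```python
from fractions import Fraction
from math import comb

def joint(k, n, exact):
    """P(X = n) * P(Y = k | X = n), as a Fraction if exact, else as a float."""
    if n - k < 0 or n - k > 10:
        return Fraction(0) if exact else 0.0
    if exact:
        return (Fraction(comb(18, n), 2 ** 18)
                * Fraction(comb(8, k) * comb(10, n - k), comb(18, n)))
    return (comb(18, n) / 2 ** 18) * (comb(8, k) * comb(10, n - k) / comb(18, n))

def total(exact):
    """Sum of the joint probabilities over k = 0..8 and n = 0..18."""
    return sum(joint(k, n, exact) for k in range(9) for n in range(19))

exact_total = total(exact=True)
float_total = total(exact=False)

print(exact_total)           # exact rational total: prints 1
print(float_total - 1.0)     # floating-point deviation from 1, if any
```

With `Fraction` the total is exactly 1, so any remaining gap in a hand-built expression would point to an error in the formula rather than to roundoff.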

masterchiefo · Feb 16, 2016

I do not really appreciate that I proposed using the binomial with n, p and q and was told that this is not how we do this problem.
I also said multiple times that n = 8, and you told me this was incorrect and made me feel stupid, because now, at the end, you are saying it is a binomial with n = 8 and p = 0.5, which was my early thinking on the problem.

This is the true reason why it took me 3 days: all the things you said confused me and made me look for a complex solution involving multiple probability formulas.

Ray Vickson · Feb 16, 2016

I still do not regard your solution as entirely satisfactory, because it ought to include a statement such as "The random variable has distribution Binomial(8,0.5) because ______________ fill in your reasons here ____________________________________". You never gave any reasons, and you did not convince me.

masterchiefo · Feb 16, 2016

Because the probability is always 0.5, and in the binomial the probability does not change from trial to trial; it stays the same.
p is constant in the binomial, while in the hypergeometric it is not constant.

geoffrey159 · Feb 17, 2016

Ray Vickson said:

Often, that type of discrepancy is just a result of using floating-point numbers instead of exact rationals, so you get inevitable roundoff errors. Perhaps that is what you are seeing?

The way I have done it is really overcomplicated compared to your solution.

At each try, the man has to choose between
(0,B) for butter incorrect,
(1,B) for butter correct,
(0,M) for margarine incorrect,
(1,M) for margarine correct.

Therefore the random experiment is a uniform choice among the ## (m_1,...,m_{18}) ## such that,
if ## k_1 ## (resp. ##k_2##, ##k_3##, ##k_4##) denotes the number of (1,B) (resp. (0,M), (1,M), (0,B)),
we have ## k_1 + k_2 = 8 ## and ## k_3 + k_4 = 10 ##.
We call this set ##\Omega##.

Then, writing ## I = \{ (k_1,k_2,k_3,k_4) :\ k_1 + k_2 = 8, \ k_3+k_4 = 10 \} ##, the cardinality of ##\Omega## is:

$$|\Omega| = \sum_I { 8 \choose k_1 }{ 8-k_1 \choose k_2}{ \max ( 18 - k_1 -k_2 , 10 ) \choose k_3 }{ 10 - k_3 \choose k_4} = \sum_I { 8 \choose k_1 }{ 10 \choose k_3 } = \sum_{0 \le k_1 \le 8} { 8 \choose k_1} \sum_{0 \le k_3 \le 10} { 10 \choose k_3} = 2^8 \cdot 2^{10} = 2^{18}$$

For the event ## \{ Y = k_1 \} ##, writing ##J_{k_1} = \{ (k_2,k_3,k_4) : \ k_2 = 8 -k_1, \ k_3+k_4 = 10 \} ##, we have

$$ |\{Y = k_1\}| = { 8 \choose k_1 } \sum_{J_{k_1}} { 8-k_1 \choose k_2}{ \max ( 18 - k_1 -k_2 , 10 ) \choose k_3 }{ 10 - k_3 \choose k_4} = 2^{10} \cdot { 8 \choose k_1 } $$

giving ##P(Y= k_1 ) = \frac{2^{10} \cdot { 8 \choose k_1 }}{2^{18}} = 2^{-8} { 8 \choose k_1 } ##,

and ##Y## is ##B(8,1/2)##.
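The counting argument can also be checked by brute force, enumerating all ##2^{18}## answer patterns. This is a sketch only: encoding the answers as bits of an integer, and taking cookies 0 to 7 as the true butter ones, are assumptions made here for convenience.

```python
from collections import Counter
from math import comb

N_BUTTER, N_MARG = 8, 10
N_TOTAL = N_BUTTER + N_MARG
# Assumption: cookies 0..7 are the true butter cookies, 8..17 the margarine ones.
BUTTER_MASK = (1 << N_BUTTER) - 1

counts = Counter()
for answers in range(1 << N_TOTAL):
    # Bit i of `answers` is 1 when cookie i is called "butter".
    y = bin(answers & BUTTER_MASK).count("1")   # butter cookies called "butter"
    counts[y] += 1

total = sum(counts.values())
for k in range(N_BUTTER + 1):
    # Each count should equal 2^10 * C(8, k), out of 2^18 outcomes.
    print(k, counts[k], 2 ** N_MARG * comb(N_BUTTER, k))
```

The loop visits 262144 patterns, so it runs quickly, and the two printed columns of counts should agree for every ##k##.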

Ray Vickson · Feb 17, 2016

I like this solution, too, although I must admit I have not verified all the details.
