Probability for a binomially distributed variable X

TheSodesa

Homework Statement


Let ##X \sim Bin(n, p)## where ##n=20## and ##p=0.1##. Calculate ##P(|X-\mu| \leq \sigma)##.

Give your answer up to three decimal places.

Homework Equations


For a binomially distributed random variable, using moment generating functions we have:
\begin{equation}
\mu= E(X) = np
\end{equation}

\begin{equation}
\sigma^{2} = Var(X) = np(1-p)
\end{equation}
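For reference, here is a sketch of the moment generating function route to these two formulas. It is a standard textbook derivation, not taken from the course handout, and it writes ##q = 1 - p## as shorthand:

$$M_X(t) = E\left(e^{tX}\right) = \sum_{x=0}^{n} {n \choose x} (pe^t)^x q^{n-x} = (q + pe^t)^n$$
$$M_X'(t) = npe^t(q + pe^t)^{n-1} \implies M_X'(0) = np, \qquad M_X''(0) = n(n-1)p^2 + np$$
$$\sigma^2 = M_X''(0) - \big(M_X'(0)\big)^2 = n(n-1)p^2 + np - n^2p^2 = np(1-p)$$

With ##n = 20## and ##p = 0.1## this gives ##\mu = 2## and ##\sigma^2 = 1.8##.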

The probability density function is
\begin{equation}
f(x) = {n \choose x} p^{x}(1-p)^{n-x}
\end{equation}
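Here is a quick numerical check of the three formulas above, written as a minimal Python sketch. It assumes Python 3.8+ for `math.comb`, and `binom_pmf` is a hypothetical helper name. It sums over the support ##\{0, 1, \ldots, n\}## to confirm that the probabilities add up to 1 and that the mean and variance come out as ##np## and ##np(1-p)##:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """b(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.1
support = range(n + 1)  # X takes the values 0, 1, ..., n

total = sum(binom_pmf(k, n, p) for k in support)
mean = sum(k * binom_pmf(k, n, p) for k in support)
var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in support)

print(f"total = {total:.6f}")
print(f"mean  = {mean:.6f}  vs np      = {n * p:.6f}")
print(f"var   = {var:.6f}  vs np(1-p) = {n * p * (1 - p):.6f}")
```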

The Attempt at a Solution


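Now, the requested probability looked a lot like Chebyshev's inequality, but with ##k = 1## that only gives the trivial lower bound ##P(|X-\mu| \leq \sigma) \geq 1 - 1/1^2 = 0##, and when I entered that zero, the electronic submission system complained about it. It also gave me a hint: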

First solve ##|X - \mu| \leq \sigma## and then calculate
$$P(|X - \mu| \leq \sigma) = P(X=x_1 \text{ OR } X = x_2 \text{ OR } \ldots)$$

I started out by solving for ##X##:

\begin{align*}
|X-\mu| \leq \sigma
&\iff -\sigma \leq X-\mu \leq \sigma\\
&\iff \mu-\sigma \leq X \leq \mu + \sigma\\
&\iff np - \sqrt{np(1-p)} \leq X \leq np + \sqrt{np(1-p)}\\
&\iff \underbrace{2 - \sqrt{2(0.9)}}_{\approx 0.658} \leq X \leq \underbrace{2 + \sqrt{2(0.9)}}_{\approx 3.342}
\end{align*}
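The same bounds can be evaluated numerically. Since ##X## only takes the integer values ##0, 1, \ldots, n##, the event ##|X-\mu| \leq \sigma## corresponds to the integers lying between them. Below is a minimal Python sketch; the variable names are illustrative and not from the course material:

```python
import math

n, p = 20, 0.1
mu = n * p                            # mean np = 2
sigma = math.sqrt(n * p * (1 - p))    # standard deviation sqrt(1.8)

lower, upper = mu - sigma, mu + sigma
# X is integer-valued on 0..n, so keep only the integers inside [lower, upper]
values = [k for k in range(n + 1) if lower <= k <= upper]

print(f"{lower:.3f} <= X <= {upper:.3f}")
print("possible values of X:", values)
```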

Alright. Now I have some numerical values. But now what? I don't know what ##x_1##, ##x_2## etc. are in the hint. Are they referring to ##X=1##, ##X=2## and so on?
 
TheSodesa said:

Alright. Now I have some numerical values. But now what? I don't know what ##x_1##, ##x_2## etc. are in the hint. Are they referring to ##X=1##, ##X=2## and so on?

If you are asking what I think you are, the answer is yes. What values can X take that satisfy that last inequality? What is their probability?
 
LCKurtz said:
If you are asking what I think you are, the answer is yes. What values can X take that satisfy that last inequality? What is their probability?

Well, for a discrete variable ##X##, the point probabilities are given by the distribution function ##f(x)##. If I wanted to find the probability of a certain interval, I would have to calculate the cumulative function over the interval ##0 < X \leq 4##. Or should it be ##0 < X \leq 3##? According to my course handout, for a binomially distributed variable the cumulative function is
\begin{equation}
F(x) = P(X \leq x) = \sum_{t=0}^{\lfloor x \rfloor}b(t; n,p)
\end{equation}
I do not know what the small ##b## is in the definition, and the handout doesn't say how to calculate these values. It simply says to look them up in a table or a computer program.

The floor function ##{\lfloor x \rfloor}## is supposedly the largest whole number ##\leq x##, so I guess my cumulative function would be
$$F(X) = P(1 \leq X \leq 3) = \sum_{t=0}^{\lfloor x \rfloor}b(t; n,p)$$
 
LCKurtz said:
If you are asking what I think you are, the answer is yes. What values can X take that satisfy that last inequality? What is their probability?

Alright, so I just summed the probabilities ##f(1)##, ##f(2)## and ##f(3)## together, which apparently gave me the right answer. I'm still a bit baffled (might have something to do with the fact that it's almost 4am here), but I'm going to have to let this one go for now.
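The summation described above can be reproduced in a few lines of Python. This is a sketch using only the standard library (Python 3.8+), and `binom_pmf` is a hypothetical helper implementing ##b(k;n,p)##:

```python
from math import comb, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    # b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.1
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Integers k with |k - mu| <= sigma, i.e. the values 1, 2, 3
ks = [k for k in range(n + 1) if abs(k - mu) <= sigma]

# P(|X - mu| <= sigma) = P(X = 1) + P(X = 2) + P(X = 3)
prob = sum(binom_pmf(k, n, p) for k in ks)
print(ks, round(prob, 3))
```

With ##n = 20## and ##p = 0.1## this prints the values ##[1, 2, 3]## and a probability of about ##0.745##.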

Thanks for the assistance.
 
TheSodesa said:

I do not know what the small ##b## is in the definition, and the handout doesn't say how to calculate these values.

The floor function ##{\lfloor x \rfloor}## is supposedly the largest whole number ##\leq x##, so I guess my cumulative function would be
$$F(X) = P(1 \leq X \leq 3) = \sum_{t=0}^{\lfloor x \rfloor}b(t; n,p)$$

The notation ##b(k)## or ##b(k;n,p)## is used for the probability mass function (NOT density function!):
$$b(k;n,p) = {n \choose k} p^k (1-p)^{n-k}.$$
So the formula in the handout is just ##\sum_k P(X=k) ##, where the sum is over all non-negative integers ##0 \leq k \leq x##.

Your sum ##F(X) = P(1 \leq X \leq 3)## is incorrect (if you meant to write ##F(3)## instead of ##F(X)##). Can you see why?
 
Ray Vickson said:

Your sum ##F(X) = P(1 \leq X \leq 3)## is incorrect (if you meant to write ##F(3)## instead of ##F(X)##). Can you see why?

I guess I should have written ##F(1 \leq X \leq 3)## instead? ##F(3)## would imply ##P(X\leq 3)##, if I've understood the notation correctly. I have to say it is kind of annoying they switch symbols for the mass functions between distributions. Can't they just stick to ##f##? :confused:
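In standard notation, the interval probability is written as a difference of two values of the distribution function, ##P(1 \leq X \leq 3) = F(3) - F(0)##, because ##F(3) = P(X \leq 3)## also includes ##X = 0##. Below is a minimal Python sketch of the handout's formula ##F(x) = \sum_{t=0}^{\lfloor x \rfloor} b(t;n,p)##, assuming Python 3.8+. The names `binom_pmf` and `binom_cdf` are hypothetical helpers, not functions from the handout or any library:

```python
from math import comb, floor

def binom_pmf(k: int, n: int, p: float) -> float:
    # b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(x: float, n: int, p: float) -> float:
    # F(x) = P(X <= x) = sum of b(t; n, p) for t = 0, ..., floor(x)
    if x < 0:
        return 0.0
    top = min(floor(x), n)
    return sum(binom_pmf(t, n, p) for t in range(top + 1))

n, p = 20, 0.1
f3 = binom_cdf(3, n, p)               # P(X <= 3), which also counts X = 0
interval = f3 - binom_cdf(0, n, p)    # P(1 <= X <= 3) = F(3) - F(0)
print(round(f3, 4), round(interval, 4))
```

Here ##F(3) \approx 0.867## while ##P(1 \leq X \leq 3) \approx 0.745##. The difference is exactly ##P(X=0) = 0.9^{20} \approx 0.122##.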
 
TheSodesa said:
I guess I should have written ##F(1 \leq X \leq 3)## instead? ##F(3)## would imply ##P(X\leq 3)##, if I've understood the notation correctly. I have to say it is kind of annoying they switch symbols for the mass functions between distributions. Can't they just stick to ##f##? :confused:

They should NOT use ##f## routinely for the probability mass function of a discrete random variable; it should be used primarily for the probability density function of a continuous random variable. However, ##F## is often used for the distribution function for both types. (Older sources routinely called this the cumulative distribution, but nowadays the adjective "cumulative" is being dropped more and more often.)

It would be much more annoying if they used the same symbol ##f## for all the different mass/density functions, perhaps distinguished only by subscripts and other modifiers. It is often more convenient to use something like ##b(k)## for the binomial probability ##P(X_{\text{binomial}}=k)##, something like ##po(k)## for ##P(X_{\text{poisson}}= k)##, etc. This adds to communication clarity, as long as the source (books, notes, or whatever) makes sure to define the terms before using them. If they do use the notation without definition or explanation, then that, indeed, would be annoying.

Back when I was teaching this material, I tried to use symbols like ##f(x)##, etc., for density functions of continuous random variables (except when using ##\phi(z)## for the density function of the standard normal variable ##Z \sim N(0,1)##), and things like ##p(k)## for the probability mass function of a discrete random variable. Unfortunately, that left something like ##F(x)## or ##F(k)## for both.

One final remark about notation: do not use ##X## as an argument of ##F## or ##f##, because it will hardly ever mean what you think it does. It is important to distinguish between the random variable ##X## (upper case) and its possible values ##x## (lower case). If ##F## is the (cumulative) distribution function of ##X##, then ##F(X)## is itself a random variable. When ##X## is continuous, ##F(X)## is uniformly distributed on ##(0,1)##. When ##X## is discrete, as in this problem, ##F(X)## takes only the values ##F(0), F(1), \ldots, F(n)##. Unless you mean that random variable, you should not write it.
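Here is a short derivation of the uniformity claim. It assumes ##X## is continuous with a strictly increasing distribution function ##F##, so that the inverse ##F^{-1}## exists on ##(0,1)##:

$$P\big(F(X) \leq u\big) = P\big(X \leq F^{-1}(u)\big) = F\big(F^{-1}(u)\big) = u, \qquad 0 < u < 1.$$

This is exactly the distribution function of ##U(0,1)##. The argument does not carry over to a discrete variable such as ##X \sim Bin(20, 0.1)##. There ##F(X)## can only take the ##21## values ##F(0), \ldots, F(20)##, so it cannot be uniform on ##(0,1)##.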
 
Ray Vickson said:

One final remark about notation: do not use ##X## as an argument of ##F## or ##f##, because it will hardly ever mean what you think it does.

Got it.
 