# Variance Question

• Gridvvk
In summary, the conversation discussed how to find the variance of Y^p for any p > 0, where Y equals X (a standard uniform random variable) if a coin toss is heads and 1 if it is tails. Suggestions included finding the conditional distribution of Y given the coin outcome C, using the joint distribution over all possibilities, and determining the probability distribution of Y directly. It was also noted that the approach may involve mixing integrals and sums.

#### Gridvvk

X ~ standard uniform random variable
We toss a coin randomly and define
Y := { X if the coin toss is heads
...{ 1 if the coin toss is tails

Question wants the Var(Y^p) for any p > 0.

My work:
Var(Y^p) = E(Y^(2p)) - E(Y^p)^2

I'm not sure how to go about finding E(Y^p) and E(Y^(2p)). I thought of using moment generating functions, but the preferred method is supposed to utilize conditional probabilities. Any hint on how to compute E(Y^p) would help.

Thanks

Hey Gridvvk.

If your coin is a Bernoulli random variable C (for coin), then your Y random variable is defined by:

Y = C + (1-C)X where C = 0 corresponds to heads and C = 1 corresponds to tails. You could also switch the values around to give you:

Y = (1-C) + CX.

I would suggest that you find the conditional distribution of Y given C first. To do this you have:

P(Y=1|C=1) = 1, P(Y=y|C=0) = 1, P(C=0) = P(C=1) = 0.5. From this you obtain the joint distribution and get:

P(A=a|B=b) = P(A=a,B=b)/P(B=b) which implies P(A=a,B=b) = P(A=a|B=b)*P(B=b).

You then use the joint distribution over all possibilities to get the moments and thus the Var[Y^p].

Note that you will be mixing integrals and sums together.

chiro said:
P(Y=y|C=0) = 1
I'm not at all sure what you mean by that. Rather than work with P(Y|C), how about going straight to E(Y^p|C)?

Gridvvk said:
I'm not sure how to go about finding E(Y^p) and E(Y^(2p)). I thought of using moment generating functions, but the preferred method is supposed to utilize conditional probabilities. Any hint on how to compute E(Y^p) would help.

Besides the other suggestions, you could do it by determining the probability distribution of Y. For example, it is not hard to get ##F(y) = P\{Y \leq y\}## for y ≥ 0, and from that get ##F_p(z) = P\{ Y^p \leq z \}## for z ≥ 0. Since ##Z = Y^p \geq 0## you can get its expected value by using the more-or-less standard expression
$$EZ = \int_0^{\infty} P\{ Z > z \} \, dz$$
Similarly, you can get ##E Y^{2p}##.

haruspex said:
I'm not at all sure what you mean by that. Rather than work with P(Y|C), how about going straight to E(Y^p|C)?

It has a uniform distribution over [0,1] which is P(X=x) = 1 where x is in [0,1].

Thanks for all the hints and quick replies. I saw them right away, but didn't fully know how to proceed, so I thought I'd come back look at it later and figure it out, but I'm drawing a blank.

chiro said:
Y = C + (1-C)X where C = 0 corresponds to heads and C = 1 corresponds to tails.

I really liked the idea of letting Y = C + (1-C)X, where C = 0 corresponds to heads and C = 1 corresponds to tails. But I wasn't sure how exactly that translates into the joint distribution, since Y is defined piecewise as a mixed random variable.

Instead I tried finding E[Y] = 1/2 * E[X] + 1/2 * 1 = 1/2 * 1/2 + 1/2 * 1 = 3/4.
I thought E[Y^2] = 1/2 * E[X^2] + 1/2 * 1 = 1/2 * 1/3 + 1/2 * 1 = 2/3

But this seems E[Y^2] - E[Y]^2 < 0, which cannot happen, so I'm going about it the wrong way.

Ray Vickson said:
Besides the other suggestions, you could do it by determining the probability distribution of Y. For example, it is not hard to get ##F(y) = P\{Y \leq y\}## for y ≥ 0, and from that get ##F_p(z) = P\{ Y^p \leq z \}## for z ≥ 0. Since ##Z = Y^p \geq 0## you can get its expected value by using the more-or-less standard expression
$$EZ = \int_0^{\infty} P\{ Z > z \} \, dz$$
Similarly, you can get ##E Y^{2p}##.

To determine the CDF don't I need the PDF of Y? Let's suppose I do get it, how can one go from ##F(y) = P\{Y \leq y\}## to ##F_p(z) = P\{ Y^p \leq z \}## ?

chiro said:
It has a uniform distribution over [0,1] which is P(X=x) = 1 where x is in [0,1].
Only if you redefine P(X=x) to mean dP[X<x]/dx.

This is just a conditional distribution where C = 1: what do you find wrong with this?

Gridvvk said:
Instead I tried finding E[Y] = 1/2 * E[X] + 1/2 * 1 = 1/2 * 1/2 + 1/2 * 1 = 3/4.
I thought E[Y^2] = 1/2 * E[X^2] + 1/2 * 1 = 1/2 * 1/3 + 1/2 * 1 = 2/3

But this seems E[Y^2] - E[Y]^2 < 0, which cannot happen
I think you've checked E[Y^2] - E[Y] instead of E[Y^2] - E[Y]^2.
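
A quick exact-arithmetic check of the two expressions, using only the values already computed in the quoted post (E[Y] = 3/4 and E[Y^2] = 2/3). This is just a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Values taken from the quoted post: E[Y] = 3/4 and E[Y^2] = 2/3.
mean_y = Fraction(3, 4)
mean_y_squared = Fraction(2, 3)

# Variance formula: E[Y^2] - (E[Y])^2
variance = mean_y_squared - mean_y ** 2

# The suspected slip: subtracting E[Y] instead of (E[Y])^2.
mistaken = mean_y_squared - mean_y

print(variance)  # 5/48
print(mistaken)  # -1/12
```

The properly squared version is positive, so the weighted-average computation itself is not in trouble.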

chiro said:
This is just a conditional distribution where C = 1: what do you find wrong with this?
X has a uniform distribution, so continuous. The probability that it takes any specific value is zero.

Gridvvk said:
To determine the CDF don't I need the PDF of Y? Let's suppose I do get it, how can one go from ##F(y) = P\{Y \leq y\}## to ##F_p(z) = P\{ Y^p \leq z \}## ?

If I tell you that ##Y^p \leq z## what can you tell me about Y?

haruspex said:
I think you've checked E[Y^2] - E[Y] instead of E[Y^2] - E[Y]^2.

Oh, that is true, but when I try to generalize the same method for E[Y^p] I experience some unexpected results. But then again, I didn't use any conditional probabilities this way.

Var[Y^p] = E[Y^(2p)] - E[Y^p]^2

E[Y^p] = (1/2) * E[X^p] + 1/2 * 1 = 1/2(1 / (p + 1)) + 1/2 = (p + 2) / (2p + 2)
Where I used the power-rule for integration to get E[X^p]

Similarly, E[Y^(2p)] = (1/2) * E[X^(2p)] + (1/2) * 1 = 1/2(1 / (2p + 1)) + 1/2 = (p + 1) / (2p + 1)

Var[Y^p] = (p + 1) / (2p + 1) - [(p + 2) / (2p + 2)]^2 = p^2(2p + 3) / [4(p + 1)^2(2p + 1)]
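
One way to sanity-check a closed form like this is a small simulation. The sketch below assumes a fair coin that is independent of X (as in P(C=0) = P(C=1) = 0.5 earlier in the thread); the function names, sample size, and seed are hypothetical choices, not part of the original problem.

```python
import random


def var_yp_closed_form(p: float) -> float:
    """Closed form from the post: p^2 (2p + 3) / [4 (p + 1)^2 (2p + 1)]."""
    return p**2 * (2 * p + 3) / (4 * (p + 1) ** 2 * (2 * p + 1))


def var_yp_simulated(p: float, n: int = 200_000, seed: int = 0) -> float:
    """Estimate Var(Y^p) by simulating the coin toss and the uniform draw."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        # Heads (probability 1/2, an assumption): Y = X ~ Uniform(0, 1); tails: Y = 1.
        y = rng.random() if rng.random() < 0.5 else 1.0
        z = y**p
        total += z
        total_sq += z * z
    mean = total / n
    # Sample version of E[Z^2] - (E[Z])^2
    return total_sq / n - mean * mean


for p in (0.5, 1, 2, 10):
    print(p, var_yp_closed_form(p), var_yp_simulated(p))
```

For each p the two printed numbers should agree to within simulation noise if the formula is right.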

However, the limit of Var[Y^p] as p tends to infinity is 1/4, and not 0 as I thought it would be by the law of large numbers.

Ray Vickson said:
If I tell you that ##Y^p \leq z## what can you tell me about Y?

I'm not entirely sure, would it be ##Y \leq z## as well?

Gridvvk said:
However, the limit of Var[Y^p] as p tends to infinity is 1/4, and not 0 as I thought it would be by the law of large numbers.
In that limit, Y^p is equally likely 1 or 0. Sounds like a var of 1/4 to me.

haruspex said:
In that limit, Y^p is equally likely 1 or 0. Sounds like a var of 1/4 to me.

Yes that does make sense. Just to clarify, the reason this doesn't violate the law of large numbers is because the law doesn't necessarily say that any variance dies out?

Does that mean the method I utilized is correct, even though I did not rely on any conditional probabilities?
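
For reference, the weighted-average step used for E[Y^p] can itself be read as a conditional-probability argument: it is the law of total expectation applied to the coin. A sketch, assuming a fair coin that is independent of X:

$$E[Y^p] = E[Y^p \mid \text{heads}]\,P(\text{heads}) + E[Y^p \mid \text{tails}]\,P(\text{tails}) = \frac{1}{2}\int_0^1 x^p \, dx + \frac{1}{2}\cdot 1 = \frac{1}{2(p+1)} + \frac{1}{2} = \frac{p+2}{2p+2}$$

The same conditioning with 2p in place of p gives ##E[Y^{2p}] = (p+1)/(2p+1)##.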

Gridvvk said:
I'm not entirely sure, would it be ##Y \leq z## as well?

NO! Think about it. If ##Y \leq 1/2## do you honestly believe that ##Y^{10}## can be as large as 1/2, or does it have to be a lot less than 1/2? Conversely, if you know that ##Y^{10} \leq 1/2## do you really think that Y cannot be larger than 1/2?

Ray Vickson said:
NO! Think about it. If ##Y \leq 1/2## do you honestly believe that ##Y^{10}## can be as large as 1/2, or does it have to be a lot less than 1/2? Conversely, if you know that ##Y^{10} \leq 1/2## do you really think that Y cannot be larger than 1/2?

I thought you were referring to a random variable, Y, so I wasn't sure if that property was true for higher moments, but yes, if you keep taking higher powers of a number that is in (0,1) you approach 0. Conversely, if Y^p is bounded by a z in (0,1), then as p gets bigger, the bound on Y approaches 1.

I fail to see the connection this has with the problem though.

Gridvvk said:
I thought you were referring to a random variable, Y, so I wasn't sure if that property was true for higher moments, but yes, if you keep taking higher powers of a number that is in (0,1) you approach 0. Conversely, if Y^p is bounded by a z in (0,1), then as p gets bigger, the bound on Y approaches 1.

I fail to see the connection this has with the problem though.

For ##Z = Y^p## you can get a simple, explicit formula for ##P(Z \leq z)## and can use that to get explicit expressions for ##EZ## and ##\text{Var}\,Z##. In other words, if you know the probability distribution of a random variable you can use that to get the mean and variance, etc.
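
A numerical sketch of this distribution-based route, assuming a fair coin independent of X. Under that assumption ##P(Y \leq y) = y/2## for 0 ≤ y < 1 and 1 for y ≥ 1, so ##P(Z \leq z) = P(Y \leq z^{1/p}) = z^{1/p}/2## for 0 ≤ z < 1. The function names and the midpoint-rule step count are illustrative choices.

```python
def tail_prob(z: float, p: float) -> float:
    """P(Y^p > z) for z >= 0, assuming a fair coin: 1 - z**(1/p)/2 below 1, else 0."""
    if z >= 1.0:
        return 0.0
    return 1.0 - 0.5 * z ** (1.0 / p)


def expected_power(p: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of E[Y^p] = integral over [0, 1] of P(Y^p > z) dz.

    The tail probability is zero for z >= 1, so the integral over
    [0, infinity) reduces to [0, 1].
    """
    h = 1.0 / steps
    return sum(tail_prob((i + 0.5) * h, p) for i in range(steps)) * h


def variance_power(p: float) -> float:
    """Var(Y^p) = E[Y^(2p)] - (E[Y^p])^2, both moments from the tail integral."""
    return expected_power(2 * p) - expected_power(p) ** 2


for p in (1, 2, 5):
    print(p, expected_power(p), variance_power(p))
```

The printed moments can be compared with the values obtained by conditioning on the coin; the two routes should agree up to integration error.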

Ray Vickson said:
For ##Z = Y^p## you can get a simple, explicit formula for ##P(Z \leq z)## and can use that to get explicit expressions for ##EZ## and ##\text{Var}\,Z##. In other words, if you know the probability distribution of a random variable you can use that to get the mean and variance, etc.

Var[Z] = E[Z^2] - E[Z]^2

Doesn't this correspond directly to what I did, without resorting to the substitution ##Z = Y^p##?

Var[Y^p] = E[Y^(2p)] - E[Y^p]^2

E[Y^p] = (1/2) * E[X^p] + 1/2 * 1 = 1/2(1 / (p + 1)) + 1/2 = (p + 2) / (2p + 2)
Where I used the power-rule for integration to get E[X^p]

Similarly, E[Y^(2p)] = (1/2) * E[X^(2p)] + (1/2) * 1 = 1/2(1 / (2p + 1)) + 1/2 = (p + 1) / (2p + 1)

Var[Y^p] = (p + 1) / (2p + 1) - [(p + 2) / (2p + 2)]^2 = p^2(2p + 3) / [4(p + 1)^2(2p + 1)]

Unless it is the case that my work/answer was wrong, or that by letting Z = Y^p you streamline the thinking and present it in a better manner.

Gridvvk said:
Unless it is the case that my work/answer was wrong, or that by letting Z = Y^p you streamline the thinking and present it in a better manner.

No problem. These are just two (slightly different) ways of doing the same problem. If you prefer one way, go for it.

haruspex said:
X has a uniform distribution, so continuous. The probability that it takes any specific value is zero.

You are way too anal.

Stating P(X=x) = 1 for x in [0,1] is a very standard way of describing a probability density function.

You don't need to bring up measure theory: it's entirely unnecessary for this problem.

chiro said:
Stating P(X=x) = 1 for x in [0,1] is a very standard way of describing a probability density function.
Never seen that done before. I'm used to forms like f_X(x) for continuous pdfs. That's why I couldn't understand what you'd written.

In the books I've used and seen, typically the way that the probability function is interpreted is based on the domain.

You are right in that if it's on a continuous space like the real line, the probability of a single value is zero (and the proofs are done in theoretical probability and measure theory). However, it is implicitly understood that the density function corresponds to P(X=x), which takes a value at some value x (possibly a real number).

The density function is denoted as P(X=x) regardless of measure and domain (again in the books I've used) but the probabilistic interpretation will depend on the nature of the random variable.

I can understand that in rigorous treatments, the above might be seen as "hand-wavy", but in your normal A-level books, this is done quite a lot.

chiro said:
The density function is denoted as P(X=x) regardless of measure and domain (again in the books I've used) but the probabilistic interpretation will depend on the nature of the random variable.

I have seen many books on probability and have never seen the notation you use in the context of continuous random variables, only for discrete random variables. Of course, there are dozens and dozens of probability books I have not seen, so maybe that notation appears in some of them. What are the titles/authors of the books you cite, so we can know to stay away from them? There are very good reasons to be concerned about that notation, as it will (guaranteed) confuse students and cause some of them to misunderstand the material.

BTW: I think it is inappropriate of you to refer to another poster as "anal"; it borders on abuse.

I apologize for the anal comment.

I should have clarified that most of my education comes from internal lecture notes as opposed to books (although I do use many books as supplements).

In any case you are right about the notation in the books, which leaves me to look at my notes for any instances of abused notation.

For the OP and any other readers, please disregard my post regarding equating probabilities in continuous random variables with the density function.