Conditional expectation and covariance of function of RVs

perplexabot · Jun 26, 2016

Hey all, I have been doing some math lately where I need to find the conditional expectation of a function of random variables. I also at some point need to find a derivative with respect to the variable that has been conditioned. I am not sure of my work and would appreciate it if you guys can maybe have a look.

Let us assume you have the following function (an AR(1) process): [itex]z_{k+1} = x_{k+1}z_k + \epsilon_{k+1}[/itex], where
[itex]x_{k+1}=x_k + v_{k+1}[/itex],
[itex]v_{k+1}\sim\mathcal{N}(0,\sigma_v^2)[/itex],
[itex]\epsilon_{k+1}\sim\mathcal{N}(0,\sigma_{\epsilon}^2)[/itex] and [itex]\epsilon[/itex] is iid.

So first, here is the conditional expectation of [itex]z_{k+1}[/itex] given [itex]x_{k+1}[/itex] and [itex]z_k[/itex]
[itex]E[z_{k+1}|x_{k+1},z_k]=x_{k+1}z_k = \mu[/itex] (is this correct?)
Here is the conditional covariance of [itex]z_{k+1}[/itex] given [itex]x_{k+1}[/itex] and [itex]z_k[/itex]
[itex] \begin{equation} \begin{split} cov(z_{k+1}|x_{k+1},z_k) &= E[(z_{k+1}-\mu)(z_{k+1}-\mu)^T|x_{k+1},z_k] \\ &= E[z_{k+1}^2|x_{k+1},z_k]-E[z_{k+1}\mu|x_{k+1},z_k]-E[\mu z_{k+1}|x_{k+1},z_k]+E[\mu^2|x_{k+1},z_k] \\ &= E[z_{k+1}^2|x_{k+1},z_k]-\mu^2-\mu^2+\mu^2 \\ &= E[z_{k+1}^2|x_{k+1},z_k]-\mu^2 \\ &= E[(x_{k+1}z_k)^2 + 2x_{k+1}z_k\epsilon_{k+1}+\epsilon_{k+1}^2|x_{k+1},z_k] - \mu^2 \\ &= \mu^2 + \sigma_{\epsilon}^2 - \mu^2 \\ & = \sigma_{\epsilon}^2 \end{split} \end{equation}[/itex]
(is this correct?)

As for my derivative of conditioned RV, I think I will wait for your opinion/feedback on the above before I ask.

Any help and or comments will be greatly appreciated.
Thank you for reading : )

EDIT: Fixed an error

andrewkirk · Jun 27, 2016

perplexabot said:

[itex]E[z_{k+1}|x_{k+1},z_k]=x_{k+1}z_k = \mu[/itex] (is this correct?)

Not necessarily, because ##\epsilon_{k+1}## may be affected by ##x_{k+1}## or ##z_k##. A sufficient condition for it to be correct would be if, in addition to what you have specified, ##\epsilon_{k+1}## is independent of ##v_j## for ##1\leq j\leq k+1##. The iid condition only tells us about dependencies within the sequence of ##\epsilon##s, not between elements of the sequence and other random variables.

Here is the conditional covariance of [itex]z_{k+1}[/itex] given [itex]x_{k+1}[/itex] and [itex]z_k[/itex]

It is a variance, not a covariance. This is only a matter of naming and does not affect the calculations.

[itex] \begin{equation} \begin{split} &= E[(x_{k+1}z_k)^2 + 2x_{k+1}z_k\epsilon_{k+1}+\epsilon_{k+1}^2|x_{k+1},z_k] - \mu^2 \\ &= \mu^2 + \sigma_{\epsilon}^2 - \mu^2 \\ \end{split} \end{equation}[/itex]

This step also relies on the above additional assumption that the ##\epsilon##s are independent of the ##v##s.

In summary, by adding the extra assumption about independence, and replacing 'covariance' by 'variance', it will become sound.
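A rough numerical sketch of why the independence assumption matters. It is not part of the original derivation: the correlation parameter `rho`, the unit variances and the helper name `residual_vs_v` are all assumptions made for illustration. Since ##x_{k+1}=x_k+v_{k+1}##, any correlation between ##\epsilon_{k+1}## and ##v_{k+1}## means that ##\epsilon_{k+1}## carries information about ##x_{k+1}##, so its conditional mean need not vanish.

```python
import math
import random

def residual_vs_v(rho, n_samples=200_000, seed=0):
    """Return the sample mean of eps_{k+1} * v_{k+1}.

    rho is a hypothetical correlation between eps_{k+1} and v_{k+1};
    rho = 0 is the independent case assumed in the reply above.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.gauss(0.0, 1.0)   # v_{k+1}, unit variance for simplicity
        w = rng.gauss(0.0, 1.0)   # independent helper noise
        eps = rho * v + math.sqrt(1.0 - rho ** 2) * w   # eps_{k+1}, unit variance
        total += eps * v
    return total / n_samples

independent = residual_vs_v(rho=0.0)   # close to 0
dependent = residual_vs_v(rho=0.6)     # close to 0.6
print(independent, dependent)
```

In the independent case the sample mean is near zero, which is consistent with ##E[\epsilon_{k+1}|x_{k+1},z_k]=0##. In the correlated case it is not, and the simple result ##\mu = x_{k+1}z_k## would pick up an extra term.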

perplexabot · Jun 27, 2016

andrewkirk said:

Not necessarily, because ##\epsilon_{k+1}## may be affected by ##x_{k+1}## or ##z_k##. A sufficient condition for it to be correct would be if, in addition to what you have specified, ##\epsilon_{k+1}## is independent of ##v_j## for ##1\leq j\leq k+1##. The iid condition only tells us about dependencies within the sequence of ##\epsilon##s, not between elements of the sequence and other random variables.

It is a variance, not a covariance. This is only a matter of naming and does not affect the calculations.

This step also relies on the above additional assumption that the ##\epsilon##s are independent of the ##v##s.

In summary, by adding the extra assumption about independence, and replacing 'covariance' by 'variance', it will become sound.

Thank you very much! I now realize why [itex]v_{k+1}[/itex] and [itex]\epsilon_{k+1}[/itex] must be independent and that I must include this in my assumptions. : ) That's great help!

Now if I may ask my follow up question about differentiating with respect to a conditioned variable. Let [itex]f(x,y)[/itex] be a function of random variables [itex]x[/itex] and [itex]y[/itex]. Let us say that the [itex]E[f(x,y)|y]=y^2[/itex]. What is the derivative of [itex]E[f(x,y)|y][/itex] with respect to random variable [itex]y[/itex]? In other words, what is the answer to [itex]\frac{\partial E[f(x,y)|y]}{\partial y}[/itex]?

Is it simply, [itex]\frac{\partial y^2}{\partial y} = 2y[/itex]?

I ask this question because I am a little confused about [itex]y[/itex] after conditioning. I know that when you condition the random variable, it is setting it to some possible value of its distribution ([itex]E[f(x,y)|y]=E[f(x,y)|y=y_i][/itex], where [itex]y_i[/itex] is some instance of random variable [itex]y[/itex]). So is [itex]y[/itex] still considered when differentiating with respect to it after conditioning it (woops, that is definitely badly worded, sorry), or is it treated as a constant so that [itex]\frac{\partial y_i^2}{\partial y} = 0[/itex]? You may just disregard this paragraph : /

Thanks again.

andrewkirk · Jun 27, 2016

perplexabot said:

Is it simply, [itex]\frac{\partial y^2}{\partial y} = 2y[/itex]?

Yes it is.
I find these things appear clearer if one is fairly formal with one's symbolism, for instance using capital letters for random variables and lower case for items that are not random. Then the expression you wrote as ##E[f(x,y)|y] =y^2## is written as:
$$E\left[f(X,Y)\ |\ Y=y\right]=y^2$$
Since ##y## is an ordinary old non-random variable, we are free to differentiate both sides with respect to it, to get
$$\frac\partial{\partial y}E\left[f(X,Y)\ |\ Y=y\right]=\frac\partial{\partial y}y^2=2y$$
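A small numerical sketch of the same idea. The function `f` below is a made-up example chosen so that ##E[f(X,Y)\ |\ Y=y]=y^2## when ##X## has zero mean and is independent of ##Y##; it is not taken from the thread. The conditional expectation is estimated by Monte Carlo and then differentiated in the ordinary variable ##y## with a central finite difference.

```python
import random

def f(x, y):
    # Hypothetical example function: E[f(X, Y) | Y = y] = y**2 when E[X] = 0.
    return y ** 2 + x * y

def conditional_mean(y, x_samples):
    """Monte Carlo estimate of E[f(X, Y) | Y = y], with X independent of Y."""
    return sum(f(x, y) for x in x_samples) / len(x_samples)

rng = random.Random(1)
x_samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

y, h = 1.5, 1e-4
# Central finite difference in the ordinary variable y,
# reusing the same X samples on both sides.
derivative = (conditional_mean(y + h, x_samples)
              - conditional_mean(y - h, x_samples)) / (2 * h)
print(derivative, 2 * y)   # the two values should be close
```

Nothing random is being differentiated here: `y` is just a number passed to a function, which is the point of the upper/lower case convention.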

Stephen Tashi · Jun 27, 2016

perplexabot said:

Let us assume you have the following function (an AR(1) process): [itex]z_{k+1} = x_{k+1}z_k + \epsilon_{k+1}[/itex], where
[itex]x_{k+1}=x_k + v_{k+1}[/itex],
[itex]v_{k+1}\sim\mathcal{N}(0,\sigma_v^2)[/itex],
[itex]\epsilon_{k+1}\sim\mathcal{N}(0,\sigma_{\epsilon}^2)[/itex] and [itex]\epsilon[/itex] is iid.

That model has a "multiplicative shock" due to ##v_k##, so should you call it an AR(1) process ?

So first, here is the conditional expectation of [itex]z_{k+1}[/itex] given [itex]x_{k+1}[/itex] and [itex]z_k[/itex]
[itex]E[z_{k+1}|x_{k+1},z_k]=x_{k+1}z_k = \mu[/itex] (is this correct?)

Does your notation imply that ##\mu## is a number that can have a different value for (say) k = 5 than it has for k = 6?

perplexabot · Jun 27, 2016

andrewkirk said:

Yes it is.
I find these things appear clearer if one is fairly formal with one's symbolism, for instance using capital letters for random variables and lower case for items that are not random. Then the expression you wrote as ##E[f(x,y)|y] =y^2## is written as:
$$E\left[f(X,Y)\ |\ Y=y\right]=y^2$$
Since ##y## is an ordinary old non-random variable, we are free to differentiate both sides with respect to it, to get
$$\frac\partial{\partial y}E\left[f(X,Y)\ |\ Y=y\right]=\frac\partial{\partial y}y^2=2y$$

Thank you! That makes more sense now. Does one ever differentiate with respect to the random variable (using the notation you provided, [itex]\frac{\partial}{\partial Y}[/itex] rather than [itex]\frac{\partial}{\partial y}[/itex])? I am thinking of cases such as the fisher information matrix.

Stephen Tashi said:

That model has a "multiplicative shock" due to ##v_k##, so should you call it an AR(1) process ?
Does your notation imply that ##\mu## is a number that can have a different value for (say) k = 5 than it has for k = 6?

Hmmm, I see your point, so you are saying since it has a multiplicative shock component then maybe it is an ARMA process? I'm not sure what to call it. I still think it is an AR process.

I have not assumed stationarity so I guess [itex]\mu[/itex] should be [itex]\mu_{k+1}[/itex], right? I guess my work above only applies when [itex]|x_{k+1}|< 1[/itex].

Thank you!

Stephen Tashi · Jun 27, 2016

perplexabot said:

Hmmm, I see your point, so you are saying since it has a multiplicative shock component then maybe it is an ARMA process?

No, if it has a multiplicative shock it is not an "ARMA" process. I think the terminology "ARMA" is restricted to models where the current value of the process is a linear combination of past values plus an additive noise.

In searching the web, "multiplicative shock" produces many hits dealing with economic modeling, so people have ways of dealing with multiplicative shock in particular situations. However, I haven't found any math notes on stochastic modelling that give general procedures for fitting models with multiplicative shock. By contrast, the method of dealing with ARMA models is standardized ("Box-Jenkins").

I have not assumed stationarity so I guess [itex]\mu[/itex] should be [itex]\mu_{k+1}[/itex], right?

Yes.

perplexabot · Jun 27, 2016

Stephen Tashi said:

No, if it has a multiplicative shock it is not an "ARMA" process. I think the terminology "ARMA" is restricted to models where the current value of the process is a linear combination of past values plus an additive noise.

In searching the web, "multiplicative shock" produces many hits dealing with economic modeling, so people have ways of dealing with multiplicative shock in particular situations. However, I haven't found any math notes on stochastic modelling that give general procedures for fitting models with multiplicative shock. By contrast, the method of dealing with ARMA models is standardized ("Box-Jenkins").
Yes.

First, thank you for all the help and clarification.

ARMA, as you said, is where the current value of the process is a linear combination of past values plus an additive noise. BUT, if I am not mistaken, ARMA also depends on past shocks, or past noise values. This can be seen in wiki. I still think that the model above is an AR process. It is an AR process which has random variables as parameters. At least, that is how I am looking at it.

andrewkirk · Jun 27, 2016

perplexabot said:

Does one ever differentiate with respect to the random variable (using the notation you provided, [itex]\frac{\partial}{\partial Y}[/itex] rather than [itex]\frac{\partial}{\partial y}[/itex])? I am thinking of cases such as the fisher information matrix.

A real random variable is actually a function from the sample space ##\Omega## to the real numbers. One cannot perform differentiation, as understood in its 'vanilla' form, with respect to a function. There are extensions, such as in Calculus of Variations, where one can differentiate with respect to functions. There is also the Radon-Nikodym derivative, which is a generalisation of the usual notion of derivative, that is used in probability theory. The latter is important in the analysis of Wiener processes (Brownian Motions), which are closely related to your example. However, IIRC, the Radon-Nikodym derivative differentiates with respect to a measure, not a random variable.

I think most simple cases in probability theory can be expressed in terms of vanilla differentiation with respect to non-random variables.
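The Fisher information is one such case: the derivative is taken with respect to a parameter (an ordinary variable), and only afterwards is an expectation taken over the random variable. A minimal sketch, assuming a normal model ##N(\theta,\sigma^2)## with known ##\sigma##; the values of `theta` and `sigma` and the helper name `score` are illustrative choices, not something from the thread.

```python
import random

def score(x, theta, sigma):
    """Derivative of log N(x; theta, sigma^2) with respect to the parameter theta."""
    return (x - theta) / sigma ** 2

theta, sigma = 1.0, 2.0
rng = random.Random(2)
samples = [rng.gauss(theta, sigma) for _ in range(200_000)]

# Fisher information = E[score^2], estimated by a sample average.
fisher_estimate = sum(score(x, theta, sigma) ** 2 for x in samples) / len(samples)
print(fisher_estimate, 1.0 / sigma ** 2)   # the two values should be close
```

For this model the exact value is ##1/\sigma^2##, so the estimate gives a quick sanity check.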

Stephen Tashi · Jun 27, 2016

perplexabot said:

First, thank you for all the help and clarification.

ARMA, as you said, is where the current value of the process is a linear combination of past values plus an additive noise. BUT, if I am not mistaken, ARMA also depends on past shocks, or past noise values.

If you have multiplicative noise, a linear combination of past values gives a dependence on products of past noise values. In an ARMA process the expression for the current value doesn't involve products of past noise values.
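To make this concrete, here is the recursion from the first post unrolled by one step (a sketch that uses only the model as stated there). Starting from
$$z_{k+1} = (x_k+v_{k+1})\,z_k+\epsilon_{k+1}, \qquad z_k = x_k z_{k-1} + \epsilon_k$$
substitution gives
$$z_{k+1} = x_k^2 z_{k-1} + x_k\epsilon_k + v_{k+1}x_k z_{k-1} + v_{k+1}\epsilon_k + \epsilon_{k+1}$$
The term ##v_{k+1}\epsilon_k## is a product of two noise values, and ##x_k## itself contains the earlier ##v##s, so further unrolling produces more such products. In an ARMA expression the noise terms enter only linearly.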

perplexabot · Jun 27, 2016

andrewkirk said:

A real random variable is actually a function from the sample space ##\Omega## to the real numbers. One cannot perform differentiation, as understood in its 'vanilla' form, with respect to a function. There are extensions, such as in Calculus of Variations, where one can differentiate with respect to functions. There is also the Radon-Nikodym derivative, which is a generalisation of the usual notion of derivative, that is used in probability theory. The latter is important in the analysis of Wiener processes (Brownian Motions), which are closely related to your example. However, IIRC, the Radon-Nikodym derivative differentiates with respect to a measure, not a random variable.

I think most simple cases in probability theory can be expressed in terms of vanilla differentiation with respect to non-random variables.

Wow, that definitely expanded my general knowledge of derivatives. I have never even heard of these extensions you speak of. Very cool! Thank you : )

Stephen Tashi said:

If you have multiplicative noise, a linear combination of past values gives a dependence on products of past noise values. In an ARMA process the expression for the current value doesn't involve products of past noise values.

Once again thank you for directing me in the right direction. Your help is much appreciated.

Thank you all.

perplexabot · Jun 27, 2016

I have one more question if I may. Let us use the following equation (the same one from above): [itex]x_{k+1}=x_k + v_{k+1}[/itex], where once again, [itex]v_{k+1}\sim\mathcal{N}(0,\sigma_v^2)[/itex]. It can be shown that
[itex]E[x_{k+1}|x_k]=x_k[/itex]
and that
[itex]cov(x_{k+1}|x_k)=\sigma_v^2[/itex].
Now, substituting and setting up the normal distribution, we get:
[itex]p(x_{k+1}|x_k)=\frac{1}{\sqrt{2\sigma_v^2\pi}}e^{-\frac{(x_{k+1}-x_k)^2}{2\sigma_v^2}}[/itex]

Now for my question, can you do the following substitution:
[itex]p(x_{k+1}|x_k)=\frac{1}{\sqrt{2\sigma_v^2\pi}}e^{-\frac{((x_k+v_{k+1})-x_k)^2}{2\sigma_v^2}}=\frac{1}{\sqrt{2\sigma_v^2\pi}}e^{-\frac{(v_{k+1})^2}{2\sigma_v^2}}[/itex] ?

Surely that substitution (or the following subtraction) is wrong. Why is it wrong? I have a feeling that the subtraction of the two [itex]x_k[/itex] does not apply because one is a random variable while the other is a conditioned version of the random variable. Is that what is wrong?

Thank you.

andrewkirk · Jun 28, 2016

@perplexabot Again, resorting to formalism can help us to clarify what is otherwise a fairly confusing move.

What you have written as ##p(x_{k+1}|x_k)## is actually
$$p_{X_{k+1}|(X_k=x_k)}(x_{k+1})$$
where ##p_{X_{k+1}|(X_k=x_k)}## is a function from ##\mathbb R## to ##\mathbb R## that is the conditional probability density function of the random variable ##X_{k+1}##, conditioned on the information that ##X_k=x_k##. Here ##x_k,x_{k+1}## are ordinary, non-random variables. Take careful note of which items are upper and which are lower case. It's important.

So we have
$$p_{X_{k+1}|(X_k=x_k)}(x_{k+1})=\frac{1}{\sqrt{2\sigma_v^2\pi}}e^{-\frac{(x_{k+1}-x_k)^2}{2\sigma_v^2}}$$
This is an ordinary old non-random equation, with the only bit that relates to random variables at all being that they were used to identify the function ##p_{X_{k+1}|(X_k=x_k)}##. But having identified that function, it is now a perfectly ordinary, non-random, function.

Hence, if we define ##v_{k+1}## to be ##x_{k+1}-x_k## it follows that:
$$p_{X_{k+1}|(X_k=x_k)}(x_{k+1})=\frac{1}{\sqrt{2\sigma_v^2\pi}}e^{-\frac{v_{k+1}{}^2}{2\sigma_v^2}}$$
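A short sketch of that point in code, using the Gaussian transition density with the usual negative exponent. The function names and the numbers are illustrative assumptions. The density is an ordinary function of two numbers, it depends on them only through their difference, and it integrates to one over ##x_{k+1}##.

```python
import math

def transition_density(x_next, x_curr, sigma_v):
    """Density of X_{k+1} at x_next, given X_k = x_curr, for x_{k+1} = x_k + v_{k+1}."""
    v = x_next - x_curr   # an ordinary number, not a random variable
    return math.exp(-v ** 2 / (2 * sigma_v ** 2)) / math.sqrt(2 * math.pi * sigma_v ** 2)

def integrate(func, lo, hi, n=20_000):
    """Midpoint-rule integral of func over [lo, hi]."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step

sigma_v, x_curr = 0.7, 2.0
total = integrate(lambda x_next: transition_density(x_next, x_curr, sigma_v),
                  x_curr - 10, x_curr + 10)

# Same difference x_next - x_curr = 0.5, so the same density value.
same = transition_density(2.5, 2.0, sigma_v) == transition_density(0.5, 0.0, sigma_v)
print(total, same)
```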

Stephen Tashi · Jun 28, 2016

perplexabot said:

I have a feeling that the subtraction of the two [itex]x_k[/itex] does not apply because one is a random variable while the other is a conditioned version of the random variable. Is that what is wrong?

To add to what andrewkirk wrote, there is a saying:

Random variables are not random and they are not variables.

A "random variable" is defined by a distribution on a probability space, not by a single value. So if ##X## is a random variable, it has a distribution and this distribution is not "randomly" changing from one distribution to another.

Furthermore, if ##X## denotes a "random variable" then it is not a "variable" in the ordinary sense of the word - i.e. it is not a symbol that represents a single (but perhaps unknown) number.

A probability density function is not "a function of a random variable". It is a function of ordinary variables that represent numbers. The interpretation of the value of a probability density function can be related to an event in the probability space of the associated random variable.

For a random variable ##X## we can define a "function of a random variable" like ## Y = X + e^{(5+X)}##. Defining ##Y## in this manner makes ##Y## another random variable - and not a function of ordinary variables.

It would be fair to say that we use the notation for a function of ordinary variables ( ## y = f(x) = x + e^{(5+x)} ##) when we give the definition of ##Y##. However, when you define an ordinary function ##f(x)##, you don't automatically get some other function associated with ##f(x)## that is a "distribution" of ##f(x)##. When you define a function ##f(X)## of a random variable ##X##, you do automatically know that ##f(X)## has some distribution.

To enforce those distinctions, some people prefer to denote random variables by capital letters and possible realizations of those variables by lower case letters. So instead of ##p( x) ## they write ##p(X=x)##.

This isn't a perfect scheme because continuous probability density functions have values that are probability densities, not probabilities. (It is analogous to the difference between the physical units of kilograms, which measure mass, and kilograms per meter, which measure linear mass density.)

For example, if you use ##p(...)## to denote "the probability of" then it's tempting to write ##p(X = x) =\frac{1}{\sqrt{2\sigma^2\pi}} \ e^{ -\frac{x^2}{2\sigma^2}}##. However, the normal density function evaluated at ##x## doesn't give you the probability that ##X## is exactly equal to ##x##. Instead, the density function gives the probability density at ##x##. (By analogy, a physical object cannot have a mass of 5 kg at a point, but it can have a mass density of 5 kg/meter at a point.)
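A quick numerical illustration of the difference, assuming a zero-mean normal with a deliberately small ##\sigma## (the value 0.1 and the function names are arbitrary choices for this sketch). The density at a point can exceed 1, which no probability can, while the probability of a small interval is roughly the density times the interval width.

```python
import math

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) evaluated at the ordinary number x."""
    return math.exp(-x ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_interval_prob(a, b, sigma):
    """P(a <= X <= b) for X ~ N(0, sigma^2), via the error function."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

sigma = 0.1
density_at_zero = normal_pdf(0.0, sigma)                  # about 3.99, larger than 1
prob_small = normal_interval_prob(-0.001, 0.001, sigma)   # an actual probability
print(density_at_zero, prob_small, density_at_zero * 0.002)
```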

Using the notation convention in andrewkirk's post, we write ##p_Y(x)## to denote the probability density of the random variable ##Y## evaluated at the value given by the ordinary variable ##x##.

So ##p_{X_{k+1}|(X_k=x_k)}(x_{k+1})## denotes the probability density of the random variable "##X_{k+1}| (X_k=x_k)##" evaluated at the ordinary variable denoted by ##x_{k+1}##. And to completely appreciate that notation, you have to understand that "conditioning" is another way to define a new random variable in terms of other random variables.

Not all texts use such precise notation.

perplexabot · Jun 28, 2016

Wow! Just WOW! I cannot explain how much both your last posts have helped me. You have cleared up some serious amount of confusion I have been carrying for a while. I am very thankful for your time. Tremendously valuable information!

Thank you andrewkirk!
Thank you Stephen Tashi!
