# Derivative of the log of a normal distribution

1. Aug 2, 2016

### perplexabot

Hey all,
I've had this point of confusion for a bit and I have thought that with time I may be able to clear it out myself. Nope, hasn't happened. I think I need help.

Let us say we have the following
$$\phi_{k+1}=\phi_{k}+v_k$$ where $v_k\overset{iid}{\sim}\mathcal{N}(0,\sigma^2)$ and $\phi_{k+1}$ is a scalar.
Let us find the following first two conditional moments
$$\begin{equation*} \begin{split} E[\phi_{k+1}|\phi_k] &= \phi_k \\ var[\phi_{k+1}|\phi_k] &= E[(\phi_{k+1}-\phi_k)^2|\phi_k] = E[v_k^2] = \sigma^2 \end{split} \end{equation*}$$ We know $p(\phi_{k+1}|\phi_k)$ is a normal distribution, so, dropping the additive constant $-\frac{1}{2}log(2\pi\sigma^2)$ (which does not depend on $\phi_k$), we find $log[p(\phi_{k+1}|\phi_k)]$
$$log[p(\phi_{k+1}|\phi_k)] = \frac{-1}{2\sigma^2}(\phi_{k+1}-\phi_{k})^2$$

I need to find the derivative (with respect to $\phi_k$) of $log[p(\phi_{k+1}|\phi_k)]$.
Finally, my question... When I find the derivative of this quantity, do I need to substitute for $\phi_{k+1}$ such that
$$log[p(\phi_{k+1}|\phi_k)] = \frac{-1}{2\sigma^2}((\phi_k+v_k)-\phi_k)^2 = \frac{-1}{2\sigma^2}(v_k)^2$$
This will end up giving me zero when I take the derivative with respect to $\phi_k$ (right? Or is this just telling me that I can substitute $v_k$ for $\phi_{k+1}-\phi_k$? If I do that, then I am back where I started and will be stuck in a loop of substitution forever, LOL).

On the other hand, I could skip that substitution and end up with a nonzero answer. But that is kind of weird too: if I do that (not substituting), I am basically disregarding the fact that $\phi_{k+1}$ is a function of $\phi_k$.

I find this really confusing. What is the correct way to do this? Please help me clear this up, as it has been an issue for a while : /
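If it helps to see numbers, here is a minimal numerical sketch of the "do not substitute" reading (names like `log_p` and `score` and the sample values are mine, purely for illustration): treat $\phi_{k+1}$ as a fixed observed value and differentiate with respect to $\phi_k$.

```python
import math

def log_p(phi_next, phi_k, sigma=1.0):
    # log N(phi_next; phi_k, sigma^2), normalizing constant included
    return (-0.5 * math.log(2 * math.pi * sigma**2)
            - (phi_next - phi_k) ** 2 / (2 * sigma**2))

def score(phi_next, phi_k, sigma=1.0):
    # analytic d/dphi_k of log_p, holding phi_next fixed:
    # (phi_next - phi_k) / sigma^2
    return (phi_next - phi_k) / sigma**2

# central finite-difference check of the analytic derivative
phi_next, phi_k, eps = 1.3, 0.8, 1e-6
fd = (log_p(phi_next, phi_k + eps) - log_p(phi_next, phi_k - eps)) / (2 * eps)
print(abs(fd - score(phi_next, phi_k)) < 1e-6)  # True
```

The nonzero answer $(\phi_{k+1}-\phi_k)/\sigma^2$ is what comes out when $\phi_{k+1}$ is treated as data; substituting $\phi_{k+1}=\phi_k+v_k$ *before* differentiating instead makes the whole expression constant in $\phi_k$.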

2. Aug 2, 2016

### andrewkirk

Re the confusion: what you are facing is the classic problem of lack of formality when referring to derivatives. A derivative, whether partial or total, is a transformation that takes one function and gives another function. So we need to be absolutely clear what the function is before we start talking about derivatives.

Here is one function:
$$g:\mathbb R\to\mathbb R\textrm{ such that }g(x)=\log f_{\phi_{k+1}}(\phi_{k+1}|\phi_k=x)$$
and here is another
$$h:\mathbb R\to\mathbb R\textrm{ such that }h(x)=\log f_{\phi_{k+1}}(x+v_k|\phi_k=x)$$
The question is, which function do you want to differentiate with respect to $x$? The statement 'the derivative (with respect to $\phi_k$) of $\log p(\phi_{k+1}|\phi_k)$' is ambiguous and could mean either, because it does not specify a function. One doesn't differentiate quantities - notwithstanding that many sloppily-written texts imply that we do. One only ever differentiates functions.
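A small sketch of this distinction (the numbers and names here are mine, purely illustrative): fixing a realized value for the first argument gives $g$, while letting the first argument move with $x$ gives $h$, and their difference quotients behave very differently.

```python
import math

SIGMA = 1.0
PHI_NEXT = 1.3   # a fixed observed value of phi_{k+1}
V = 0.5          # a fixed realized value of v_k

def log_f(y, x, sigma=SIGMA):
    # log density of N(x, sigma^2) evaluated at y
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - x)**2 / (2 * sigma**2)

g = lambda x: log_f(PHI_NEXT, x)   # first argument held fixed
h = lambda x: log_f(x + V, x)      # first argument moves with x

def deriv(func, x, eps=1e-6):
    # central finite difference
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x = 0.8
print(deriv(g, x))  # approx (PHI_NEXT - x)/SIGMA^2 = 0.5, nonzero
print(deriv(h, x))  # approx 0: the (y - x) term is pinned at V
```

So $g'(x)=(\phi_{k+1}-x)/\sigma^2$ while $h'(x)=0$ identically, because in $h$ the difference $(x+v_k)-x=v_k$ never changes as $x$ moves.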

3. Aug 3, 2016

### perplexabot

NOTE: If you don't want to read this whole post, just read "EDIT2," it contains the questions and remarks I have concluded. Everything else is just my train of thought.
Hmmm, are those functions not the same, given that $\phi_{k+1}=\phi_{k}+v_k$ with $v_k\overset{iid}{\sim}\mathcal{N}(0,\sigma^2)$? I would have assumed that taking the derivative with respect to $x$ for both $g$ and $h$ would result in the same answer (since we have defined the relation just stated).
Also, could $h$ be written as follows?
$$h:\mathbb R\to\mathbb R\textrm{ such that }h(x)=\log f_{\phi_{k+1}}(\phi_k+v_k|\phi_k=x)$$

I can't answer this question because I see those two functions you defined to be the same. I believe both PDFs, $f_{\phi_{k+1}}(\phi_{k+1}|\phi_k=x)$ and $f_{\phi_{k+1}}(x+v_k|\phi_k=x)$, will have the same means and variances, right? In the end I need to derive the function that uses the mean and variance I calculated in my initial post.

EDIT1: If I say it is the function $g$ that I need to differentiate (wrt $x$), would it be legitimate to write $\phi_{k+1}=x+v_k$?
EDIT2: I think I know what I am getting confused about. When I take $E[\phi_{k+1}|\phi_k]$, I make the following substitution in order to find the expected value, $E[\phi_k+v_k|\phi_k]$, which I think is correct: I am dealing with a random variable, and I use the random-variable relation ($\phi_{k+1}=\phi_{k}+v_k$) to calculate the expectation. When I try to find the derivative of the log of the pdf, I am no longer dealing with random variables (but with instances of the random variables), and so I am no longer allowed to make that substitution, right? If I am right about this, I have a follow-up question, if I may.

Last edited: Aug 3, 2016
4. Aug 3, 2016

### andrewkirk

Yes I think you are correct, if I have interpreted EDIT2 correctly.

Something that I find helps when there is danger of getting confused between random and non-random variables is to write the random variables out fully, as functions $\phi$ and $v$ from $\mathbb N\times \Omega$ to $\mathbb R$, where $\Omega$ is the sample space. So $\phi_k$ is the random variable that is the function $\omega\mapsto \phi(k,\omega)$, and $v_k$ is the random variable that is the function $\omega\mapsto v(k,\omega)$. Since these items are functions, rather than values, we can differentiate them but we cannot differentiate with respect to them - leaving aside fancy concepts like Radon-Nikodym derivatives, which we don't want to get into here.

The expected value $E[\phi_{k+1}|\phi_k]$ is also a random variable, and so, when written out formally, also has an $\omega$ argument so that an instantiated value is actually $E[\phi_{k+1}|\phi_k](\omega)$. Hence, if we want to differentiate it we need to be careful about what it is that we are differentiating.

Anyway, rather than me rabbit on further, ask your follow-up question if it is not yet answered, and we'll see what can be made of it.
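A toy version of this "random variables are functions on $\Omega$" picture, with a two-point sample space (everything here - `OMEGA`, `v`, `phi` - is a made-up stand-in, just to make the function-of-$\omega$ idea concrete):

```python
# A finite sample space: each outcome omega fixes the whole noise path v(k, omega)
OMEGA = ["heads", "tails"]
PROB = {"heads": 0.5, "tails": 0.5}

# v as a function N x Omega -> R (a deterministic table once omega is fixed)
def v(k, omega):
    return 1.0 if omega == "heads" else -1.0

# phi as a function N x Omega -> R, built by accumulating v
def phi(k, omega, phi0=0.0):
    return phi0 + sum(v(j, omega) for j in range(k))

# The random variable phi_2 is the FUNCTION omega -> phi(2, omega);
# only after plugging in a particular omega do we get a number:
print({omega: phi(2, omega) for omega in OMEGA})

# E[phi_2] integrates over omega, so it collapses the omega argument:
E_phi2 = sum(PROB[w] * phi(2, w) for w in OMEGA)
print(E_phi2)  # 0.0
```

The point of the exercise: `phi(2, omega)` for a specific `omega` is an ordinary number we could differentiate things with respect to, while the bare function `phi_2` is not.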

5. Aug 3, 2016

### perplexabot

Ok, since I am right in saying (along with your help in this older thread) that the PDF which describes the random variable is itself not random but made up of "ordinary" variables, then after taking the derivative we will once again end up with a function that does NOT contain random variables. Is this correct? If it is, then in the case of the Fisher information, given by:
$$E[(\frac{\partial}{\partial\theta}logf(X;\theta))^2]$$
how is it that the expectation here has any meaning or point, since we just said (or rather, I just said) that the function obtained after taking the derivative does not contain any random variables? Isn't the argument of the expectation considered a "constant", that is, a non-random variable?

Please rabbit on as much as you please, this so called "rabbiting on" has benefited me very much. : p Thank you!

Last edited: Aug 3, 2016
6. Aug 3, 2016

### andrewkirk

@perplexabot Here is how I would try to make sense of that one. I will use $H$ instead of $\theta$ in order to make it easier for me to use my beloved convention of using capital letters for RVs and lower case for non-RVs.
First I would make the following definitions:
$$F:\mathbb R^2\to[0,1]\textrm{ such that } F(x,h)\equiv Prob(X(\omega)\leq x\ |\ H(\omega)=h)$$
$$f:\mathbb R^2\to[0,\infty)\textrm{ such that } f(x,h)\equiv \partial_x F(x,h)$$
and we define a stochastic process $W:\mathbb R\times\Omega\to\mathbb R$ (with associated RVs $W_h$) such that:
$$W(h,\omega)\equiv W_h(\omega)\equiv \partial_h\log f(X(\omega),h)$$
This is well-defined because, while $X$ is a RV, $X(\omega)$ is not.
Then we can write the conditional Fisher expectation as a stochastic process $Y:\mathbb R\times\Omega\to\mathbb R$ (with associated RVs $Y_h$) such that:
$$Y(h,\omega)\equiv Y_h(\omega)=E\left[W_h\ |\ H(\omega)=h\right]$$

If we want an unconditional expectation we define RV $U:\Omega\to\mathbb R$ by $U(\omega)=W(H(\omega),\omega)$. The unconditional expectation is then simply $E\left[U\right]$ which, because it is unconditional, is not a RV.

I think that's it, although, because there are several definitions, it's not unlikely that I messed up somewhere.
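As a sanity check on this construction (a Monte Carlo sketch under toy parameters of my own choosing, not part of the derivation above): for $X\sim\mathcal N(h,\sigma^2)$ the score is $\partial_h\log f(X(\omega),h)=(X(\omega)-h)/\sigma^2$, and the conditional Fisher expectation $E[W_h^2]$ should come out to $1/\sigma^2$.

```python
import math
import random

random.seed(0)
h, sigma = 0.7, 2.0
n = 200_000

def score(x, h, sigma):
    # d/dh of log N(x; h, sigma^2) = (x - h) / sigma^2
    # (x is a plain number here - an instantiated X(omega), not the RV itself)
    return (x - h) / sigma**2

# Monte Carlo estimate of E[score(X; h)^2] with X ~ N(h, sigma^2)
samples = (random.gauss(h, sigma) for _ in range(n))
fisher_mc = sum(score(x, h, sigma)**2 for x in samples) / n

print(fisher_mc)  # close to 1/sigma^2 = 0.25
```

Note how the expectation only makes sense because `score` is evaluated at random draws `x = X(omega)`: the derivative was taken with respect to the ordinary variable $h$, but the result is then re-read as a function of $\omega$ before averaging.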

7. Aug 5, 2016

### perplexabot

I have been trying to digest this notation that you have bestowed upon me. I can see its benefit and am willing to put some time to understand it better. I would like to know if you have some references for me that may help me better understand it?

I am very thankful for your help : )

EDIT: By "notation" I mean that of probability spaces (and random variables as functions of events)

8. Aug 5, 2016

### andrewkirk

The approach is that of Kolmogorov who, I believe, was the first to put the study of probability on a firm theoretical footing, in the 1930s. It bases probability theory on measure theory. There are some good texts on this, which maybe others can suggest. I don't have a text because at first I learned probability theory informally and then picked up the Kolmogorov approach later, when I was having trouble understanding continuous stochastic processes in finance. Reading pieces about the Kolmogorov approach off the web was what finally enabled me to break through the confusion. Until then, the idea of a continuous stochastic process seemed to me to be impossible nonsense.

A good place to start is the wiki page on probability spaces. The notion of sigma algebra is important, so you may want to visit the page on that too, although it goes off into various complications that are not essential to probability theory, so you might like to use the probability space page as a base and only go to the other as needed. If there are concepts you get stuck on feel free to ask here.

9. Aug 6, 2016

### perplexabot

Thank you very much. I will continue to read on probability spaces and touch up on sigma algebras. I am really thankful for your time and help. Thank you for offering more help on this topic (I will take you up on your offer : p).
