Derivative of log of normal distribution

In summary, the original poster is unsure how to differentiate log p(φ_{k+1}|φ_k) with respect to φ_k: substituting φ_{k+1} = φ_k + v_k before differentiating gives zero, while differentiating without the substitution gives a nonzero answer, and he cannot tell which is correct.
  • #1
perplexabot
Hey all,
I've had this point of confusion for a while and thought that with time I might be able to clear it up myself. Nope, hasn't happened. I think I need help.

Let us say we have the following
[tex] \phi_{k+1}=\phi_{k}+v_k [/tex] where [itex]v_k\overset{iid}{\sim}\mathcal{N}(0,\sigma^2)[/itex] and [itex]\phi_{k+1}[/itex] is a scalar.
Let us find the following first two conditional moments
[tex]
\begin{equation*}
\begin{split}
E[\phi_{k+1}|\phi_k] &= \phi_k \\
cov[\phi_{k+1}|\phi_k] &= E[(\phi_{k+1}-\phi_k)^2|\phi_k] = E[v_k^2] = \sigma^2
\end{split}
\end{equation*}
[/tex] Since we know [itex]p(\phi_{k+1}|\phi_k)[/itex] is a normal distribution with these moments, its log-density is, up to an additive constant,
[tex]
log[p(\phi_{k+1}|\phi_k)] = \frac{-1}{2\sigma^2}(\phi_{k+1}-\phi_{k})^2
[/tex]

I need to find the derivative (with respect to [itex]\phi_k[/itex]) of [itex]log[p(\phi_{k+1}|\phi_k)][/itex].
Finally, my question... When I find the derivative of this quantity, do I need to substitute for [itex]\phi_{k+1}[/itex] such that
[tex]
log[p(\phi_{k+1}|\phi_k)] = \frac{-1}{2\sigma^2}((\phi_k+v_k)-\phi_k)^2 = \frac{-1}{2\sigma^2}(v_k)^2
[/tex]
This will end up giving me zero if I take the derivative with respect to [itex]\phi_k[/itex] (right? Or is this just telling me that I can substitute [itex]v_k[/itex] for [itex]\phi_{k+1}-\phi_k[/itex]? I have no clue; if I do that, then I am back to where I started and will forever be stuck in a loop of substitution, LOL).

On the other hand, I could NOT do that substitution and would end up with a nonzero answer. But that is kind of weird, because if I do that (not substituting), I am basically disregarding the fact that [itex]\phi_{k+1}[/itex] is a function of [itex]\phi_k[/itex].

I find this really confusing. What is the correct way to do this? Please help me clear this up, as it has been an issue for a while : /

Thank you for reading.
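(As a quick numerical sanity check of the conditional moments above — my own sketch, with arbitrary values for ##\phi_k## and ##\sigma## — simulating ##\phi_{k+1}=\phi_k+v_k## and estimating the conditional mean and variance by Monte Carlo:)

```python
import random

random.seed(1)
SIGMA = 1.5        # hypothetical noise standard deviation
PHI_K = 2.0        # arbitrary fixed conditioning value phi_k
N = 200_000

# simulate phi_{k+1} = phi_k + v_k with v_k ~ N(0, SIGMA^2)
samples = [PHI_K + random.gauss(0, SIGMA) for _ in range(N)]

mean_mc = sum(samples) / N
var_mc = sum((s - mean_mc) ** 2 for s in samples) / N

print(mean_mc)  # close to PHI_K = 2.0
print(var_mc)   # close to SIGMA^2 = 2.25
```

The estimates match ##E[\phi_{k+1}|\phi_k]=\phi_k## and ##cov[\phi_{k+1}|\phi_k]=\sigma^2## to within Monte Carlo error.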
 
  • #2
Re the confusion: what you are facing is the classic problem of a lack of formality when referring to derivatives. A derivative, whether partial or total, is a transformation that takes one function and gives another function. So we need to be absolutely clear about what the function is before we start talking about derivatives.

Here is one function:
$$g:\mathbb R\to\mathbb R\textrm{ such that }g(x)=\log f_{\phi_{k+1}}(\phi_{k+1}|\phi_k=x)$$
and here is another
$$h:\mathbb R\to\mathbb R\textrm{ such that }h(x)=\log f_{\phi_{k+1}}(x+v_k|\phi_k=x)$$
The question is, which function do you want to differentiate with respect to ##x##? The statement 'the derivative (with respect to ##\phi_k##) of ##\log p(\phi_{k+1}|\phi_k)##' is ambiguous and could mean either, because it does not specify a function. One doesn't differentiate quantities - notwithstanding that many sloppily-written texts imply that we do. One only ever differentiates functions.
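(To make the distinction concrete — a minimal sketch of my own, with hypothetical fixed values for the observed ##\phi_{k+1}##, the noise draw ##v_k##, and ##\sigma## — the two functions really do have different derivatives:)

```python
SIGMA = 1.0        # hypothetical noise standard deviation
V_K = 0.7          # one fixed draw of the noise (hypothetical)
PHI_NEXT = 1.3     # one fixed observed value of phi_{k+1} (hypothetical)

def g(x):
    # phi_{k+1} held fixed at its observed value; only the conditioning point varies
    return -(PHI_NEXT - x) ** 2 / (2 * SIGMA ** 2)

def h(x):
    # phi_{k+1} replaced by x + v_k BEFORE differentiating
    return -((x + V_K) - x) ** 2 / (2 * SIGMA ** 2)

step = 1e-6
dg = (g(1.0 + step) - g(1.0 - step)) / (2 * step)  # nonzero: (PHI_NEXT - 1.0) / SIGMA^2
dh = (h(1.0 + step) - h(1.0 - step)) / (2 * step)  # zero: h is constant in x
print(dg, dh)
```

##g'(1.0) = (1.3 - 1.0)/1.0^2 = 0.3## while ##h'(x) = 0## everywhere, which is exactly the substitution dilemma of post #1.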
 
  • #3
NOTE: If you don't want to read this whole post, just read "EDIT2," it contains the questions and remarks I have concluded. Everything else is just my train of thought.
Hmmm, are those functions not the same, given that [itex]\phi_{k+1}=\phi_{k}+v_k[/itex] with ##v_k\overset{iid}{\sim}\mathcal{N}(0,\sigma^2)##? I would have assumed that taking the derivative with respect to ##x## for both ##g## and ##h## would give the same answer (since we have defined the relation just stated).
Also, could [itex]h[/itex] be written as:
[tex]
h:\mathbb R\to\mathbb R\textrm{ such that }h(x)=\log f_{\phi_{k+1}}(\phi_k+v_k|\phi_k=x)
[/tex]

andrewkirk said:
The question is, which function do you want to differentiate with respect to ##x##? The statement 'the derivative (with respect to ##\phi_k##) of ##\log p(\phi_{k+1}|\phi_k)##' is ambiguous and could mean either, because it does not specify a function. One doesn't differentiate quantities - notwithstanding that many sloppily-written texts imply that we do. One only ever differentiates functions.
I can't answer this question because I see those two functions you defined as the same. I believe both PDFs, ##f_{\phi_{k+1}}(\phi_{k+1}|\phi_k=x)## and ##f_{\phi_{k+1}}(x+v_k|\phi_k=x)##, will have the same means and variances, right? In the end I need to differentiate the function that uses the mean and variance I calculated in my initial post.

EDIT1: If I say it is the function ##g## that I need to differentiate (wrt ##x##), would it be legitimate to write ##\phi_{k+1}=x+v_k##?
EDIT2: I think I know where I am getting confused. When I take ##E[\phi_{k+1}|\phi_k]##, I make the substitution ##E[\phi_k+v_k|\phi_k]## in order to find the expected value, which I think is correct: I am dealing with a random variable and I use the defining relation (##\phi_{k+1}=\phi_{k}+v_k##) to calculate the expectation. When I try to find the derivative of the log of the PDF, I am no longer dealing with random variables (but with instances of the random variables), and so I am no longer allowed to make that substitution, right? If I am right about this, I have a follow-up question, if I may.
 
Last edited:
  • #4
Yes I think you are correct, if I have interpreted EDIT2 correctly.

Something that I find helps when there is danger of getting confused between random and non-random variables is to write the random variables out fully, as functions ##\phi## and ##v## from ##\mathbb N\times \Omega## to ##\mathbb R##, where ##\Omega## is the sample space. So ##\phi_k## is the random variable that is the function ##\omega\mapsto \phi(k,\omega)## and ##v_k## is the random variable that is the function ##\omega\mapsto v(k,\omega)##. Since these items are functions, rather than values, we can differentiate them but we cannot differentiate with respect to them - leaving aside fancy concepts like Radon-Nikodym derivatives, which we don't want to get into here.

The expected value ##E[\phi_{k+1}|\phi_k]## is also a random variable, and so, when written out formally, also has an ##\omega## argument so that an instantiated value is actually ##E[\phi_{k+1}|\phi_k](\omega)##. Hence, if we want to differentiate it we need to be careful about what it is that we are differentiating.

Anyway, rather than me rabbit on further, ask your follow-up question if it is not yet answered, and we'll see what can be made of it.
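(This "random variables as functions of ##\omega##" view can be sketched directly in code — my own illustration, modelling ##\Omega## as a finite list of noise paths, with all numerical values hypothetical:)

```python
import random

# Model the sample space Omega as a finite list of outcomes; each outcome
# fixes the whole noise path, so v and phi become deterministic functions
# of (k, omega), matching the formal view of RVs as functions on Omega.
random.seed(42)
SIGMA = 1.0
K_MAX = 5
OMEGA = [[random.gauss(0, SIGMA) for _ in range(K_MAX)] for _ in range(1000)]

def v(k, omega):
    # the noise RV v_k evaluated at the outcome omega
    return omega[k]

def phi(k, omega, phi0=0.0):
    # phi_k = phi_0 + v_0 + ... + v_{k-1}, from the recursion phi_{k+1} = phi_k + v_k
    return phi0 + sum(omega[:k])

# For a fixed omega, phi(., omega) is an ordinary (non-random) sequence of numbers:
w = OMEGA[0]
print([round(phi(k, w), 3) for k in range(3)])
```

Holding ##\omega## fixed turns every random variable into a plain number, which is exactly the step that makes "differentiating the log-density at an instance" well defined.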
 
  • #5
Ok, since I am right in saying (along with your help in this older thread) that the PDF which describes the random variable is itself not random but made up of "ordinary" variables, then after taking the derivative we will once again end up with a function that does NOT contain random variables. Is this correct? If it is, then in the case of the Fisher information, given by:
[tex]E\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right][/tex]
how is it that the expectation here has any meaning? We just said (or rather, I just said) that the function obtained after taking the derivative does not contain any random variables. Isn't the argument of the expectation then a constant, i.e., non-random?

andrewkirk said:
Anyway, rather than me rabbit on further, ask your follow-up question if it is not yet answered, and we'll see what can be made of it.
Please rabbit on as much as you please; this so-called "rabbiting on" has benefited me very much. : p Thank you!
 
Last edited:
  • #6
@perplexabot Here is how I would try to make sense of that one. I will use ##H## instead of ##\theta## in order to make it easier for me to use my beloved convention of using capital letters for RVs and lower case for non-RVs.
First I would make the following definitions:
$$F:\mathbb R^2\to[0,1]\textrm{ such that } F(x,h)\equiv Prob(X(\omega)\leq x\ |\ H(\omega)=h)$$
$$f:\mathbb R^2\to[0,\infty)\textrm{ such that } f(x,h)\equiv \partial_x F(x,h)$$
and we define a stochastic process ##W:\mathbb R\times\Omega\to\mathbb R## (with associated RVs ##W_h##) such that:
$$W(h,\omega)\equiv W_h(\omega)\equiv \partial_h\log f(X(\omega),h)$$
This is well-defined because, while ##X## is a RV, ##X(\omega)## is not.
Then we can write the conditional Fisher expectation as a stochastic process ##Y:\mathbb R\times\Omega\to\mathbb R## (with associated RVs ##Y_h##) such that:
$$Y(h,\omega)\equiv Y_h(\omega)=E\left[W_h\ |\ H(\omega)=h\right]$$

If we want an unconditional expectation we define RV ##U:\Omega\to\mathbb R## by ##U(\omega)=W(H(\omega),\omega)##. The unconditional expectation is then simply ##E\left[U\right]## which, because it is unconditional, is not a RV.

I think that's it, although, because there are several definitions, it's not unlikely that I messed up somewhere.
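(The construction above can be checked numerically — a sketch of my own, taking ##X(\omega)\sim\mathcal N(h,\sigma^2)## with ##h## the mean parameter, so the score is ##\partial_h\log f(X;h)=(X-h)/\sigma^2## and the Fisher information should come out as ##1/\sigma^2##; all numbers hypothetical:)

```python
import random

random.seed(0)
SIGMA = 2.0     # hypothetical, known standard deviation
H_TRUE = 0.5    # hypothetical true value of the mean parameter
N = 200_000

def score(x, h):
    # partial_h log f(x; h) for f = N(h, SIGMA^2): the constant term drops,
    # leaving (x - h) / SIGMA^2
    return (x - h) / SIGMA ** 2

# Monte Carlo estimate of E[score^2], the expectation taken over X at h = H_TRUE
samples = [random.gauss(H_TRUE, SIGMA) for _ in range(N)]
fisher_mc = sum(score(x, H_TRUE) ** 2 for x in samples) / N
print(fisher_mc, 1 / SIGMA ** 2)  # the two should be close (1/SIGMA^2 = 0.25)
```

The point of the exercise: the score is a plain function of ##(x,h)##, but evaluating it at the random ##X(\omega)## makes it a random variable again, which is what gives the outer expectation its meaning.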
 
  • #7
I have been trying to digest this notation that you have bestowed upon me. I can see its benefit and am willing to put in some time to understand it better. Could you recommend some references that may help?

I am very thankful for your help : )

EDIT: By "notation" I mean that of probability spaces (and random variables as functions of events)
 
  • #8
The approach is that of Kolmogorov who, I believe, was the first to put the study of probability on a firm theoretical footing - in the 1930s. It bases probability theory on measure theory. There are some good texts on this, which maybe others can suggest. I don't have a text because at first I learned probability theory informally and then picked up the Kolmogorov approach later, when I was having trouble understanding continuous stochastic processes in finance. The Kolmogorov approach, which in my case I learned by reading pieces off the web, was what finally enabled me to break through the confusion. Until then, the idea of a continuous stochastic process seemed to me to be impossible nonsense.

A good place to start is the wiki page on probability spaces. The notion of sigma algebra is important, so you may want to visit the page on that too, although it goes off into various complications that are not essential to probability theory, so you might like to use the probability space page as a base and only go to the other as needed. If there are concepts you get stuck on feel free to ask here.
 
  • #9
Thank you very much. I will continue to read on probability spaces and touch up on sigma algebras. I am really thankful for your time and help. Thank you for offering more help on this topic (I will take you up on your offer : p).
 
  • #10

1. What is the derivative of log of normal distribution?

The derivative of the log of the normal density with respect to x is -(x-μ)/σ^2, where μ is the mean and σ is the standard deviation. The density itself is f(x) = (1/(σ√(2π)))e^(-(x-μ)^2/(2σ^2)); taking the log removes the normalizing constant from the derivative, leaving this simple linear expression.

2. Why is the derivative of log of normal distribution important?

The derivative of the log of the normal density is the score function of the distribution. Setting it to zero yields the maximum likelihood estimate of the mean, and the expectation of its square gives the Fisher information, which quantifies how much information a sample carries about the parameters. It therefore appears throughout statistics and machine learning wherever normal models are fitted to data.

3. How is the derivative of log of normal distribution calculated?

Take the natural log of the density, log f(x) = -log(σ√(2π)) - (x-μ)^2/(2σ^2), and differentiate with respect to x. The constant term vanishes, and the chain rule applied to the quadratic term gives -(x-μ)/σ^2.
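(A quick check of this chain-rule result, comparing the analytic derivative -(x-μ)/σ^2 against a central finite difference of the full log-density; all numerical values are arbitrary:)

```python
import math

def log_normal_pdf(x, mu, sigma):
    # log of f(x) = (1/(sigma*sqrt(2*pi))) * exp(-(x-mu)^2/(2*sigma^2))
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def dlog_dx(x, mu, sigma):
    # chain rule: the normalizing constant drops out, leaving -(x - mu)/sigma^2
    return -(x - mu) / sigma ** 2

x, mu, sigma, eps = 1.7, 0.4, 2.0, 1e-6
numeric = (log_normal_pdf(x + eps, mu, sigma) - log_normal_pdf(x - eps, mu, sigma)) / (2 * eps)
print(abs(numeric - dlog_dx(x, mu, sigma)) < 1e-6)  # True
```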

4. Can the derivative of log of normal distribution be negative?

Yes. The derivative -(x-μ)/σ^2 is negative for x > μ, meaning the log-density (and hence the density) is decreasing there. It is positive for x < μ and zero exactly at x = μ, the mode of the distribution.

5. What is the relationship between the derivative of log of normal distribution and the standard deviation?

The magnitude of the derivative, |x-μ|/σ^2, is inversely proportional to the variance. A larger standard deviation gives a wider, flatter curve whose log-density changes more slowly, so the derivative is smaller in magnitude at a given distance from the mean; a smaller standard deviation gives a narrower, taller curve with a steeper log-density.
