Derivative of log of normal distribution


Discussion Overview

The discussion revolves around the derivative of the logarithm of the probability density function (PDF) of a normal distribution, specifically in the context of a stochastic process defined by the equation \(\phi_{k+1}=\phi_{k}+v_k\), where \(v_k\) follows a normal distribution. Participants explore the implications of substituting variables and the nature of differentiation in this probabilistic framework.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant expresses confusion about whether to substitute \(\phi_{k+1}\) with \(\phi_k + v_k\) when taking the derivative of \(\log[p(\phi_{k+1}|\phi_k)]\), noting that this substitution leads to a zero derivative.
  • Another participant emphasizes the importance of clearly defining the function being differentiated, suggesting that the ambiguity in the original statement complicates the differentiation process.
  • A later reply questions whether the functions defined by the participants are equivalent, given the relationship \(\phi_{k+1}=\phi_{k}+v_k\), and whether both PDFs share the same means and variances.
  • One participant reflects on the distinction between random variables and their instantiated values, suggesting that this distinction is crucial when differentiating functions related to probability distributions.
  • Another participant raises a question about the meaning of expectations in the context of the Fisher information matrix, arguing that the function resulting from differentiation does not contain random variables.
  • A final post introduces a stochastic process to clarify the relationship between random variables and their derivatives, proposing a framework for understanding conditional expectations in this context.

Areas of Agreement / Disagreement

Participants exhibit a mix of agreement and disagreement, particularly regarding the treatment of random versus non-random variables in differentiation and the implications of substitutions in the context of derivatives. The discussion remains unresolved, with multiple competing views on the correct approach to the problem.

Contextual Notes

Participants note the potential for confusion when transitioning between random variables and their instantiated values, which may affect the differentiation process. The discussion also highlights the need for clarity in defining functions before differentiation.

perplexabot
Hey all,
I've had this point of confusion for a bit and I have thought that with time I may be able to clear it out myself. Nope, hasn't happened. I think I need help.

Let us say we have the following
##\phi_{k+1}=\phi_{k}+v_k##, where ##v_k\overset{iid}{\sim}\mathcal{N}(0,\sigma^2)## and ##\phi_{k+1}## is a scalar.
Let us find the first two conditional moments:
$$\begin{split}
E[\phi_{k+1}|\phi_k] &= \phi_k \\
\mathrm{cov}[\phi_{k+1}|\phi_k] &= E[(\phi_{k+1}-\phi_k)(\phi_{k+1}-\phi_k)^T] = E[v_k^2] = \sigma^2
\end{split}$$
Since we know ##p(\phi_{k+1}|\phi_k)## is a normal distribution, the log of the PDF is (up to an additive constant that does not depend on ##\phi_k##)
$$\log[p(\phi_{k+1}|\phi_k)] = \frac{-1}{2\sigma^2}(\phi_{k+1}-\phi_{k})^2$$
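As a quick sanity check, a small simulation (a minimal sketch; the values ##\sigma=1##, ##\phi_k=2##, and the sample size are arbitrary choices just for the check) reproduces these conditional moments:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, phi_k, n = 1.0, 2.0, 10**6   # arbitrary values, just for the check

v = rng.normal(0.0, sigma, size=n)  # v_k ~ N(0, sigma^2), iid draws
phi_next = phi_k + v                # phi_{k+1} = phi_k + v_k, with phi_k held fixed

print(phi_next.mean())  # ~ 2.0 = phi_k   : E[phi_{k+1} | phi_k]
print(phi_next.var())   # ~ 1.0 = sigma^2 : cov[phi_{k+1} | phi_k]
```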

I need to find the derivative (with respect to ##\phi_k##) of ##\log[p(\phi_{k+1}|\phi_k)]##.
Finally, my question... When I find the derivative of this quantity, do I need to substitute for ##\phi_{k+1}##, so that
$$\log[p(\phi_{k+1}|\phi_k)] = \frac{-1}{2\sigma^2}((\phi_k+v_k)-\phi_k)^2 = \frac{-1}{2\sigma^2}v_k^2\ ?$$
This will end up giving me zero if I take the derivative with respect to ##\phi_k## (right? Or is this just telling me that I can substitute ##v_k## for ##\phi_{k+1}-\phi_k##... I have no clue... if I do that, then I am back where I started and will forever be stuck in a loop of substitution, LOL).

On the other hand, I could NOT do that substitution and would end up with a nonzero answer. But that is kind of weird too, because by not substituting I am basically disregarding the fact that ##\phi_{k+1}## is a function of ##\phi_k##.

I find this really confusing. What is the correct way to do this? Please help me clear this problem up, as it has been an issue for a while : /

Thank you for reading.
 
Re the confusion: what you are facing is the classic problem of lack of formality when referring to derivatives. A derivative, whether partial or total, is a transformation that takes one function and gives another function. So we need to be absolutely clear what the function is before we start talking about derivatives.

Here is one function:
$$g:\mathbb R\to\mathbb R\textrm{ such that }g(x)=\log f_{\phi_{k+1}}(\phi_{k+1}|\phi_k=x)$$
and here is another
$$h:\mathbb R\to\mathbb R\textrm{ such that }h(x)=\log f_{\phi_{k+1}}(x+v_k|\phi_k=x)$$
The question is, which function do you want to differentiate with respect to ##x##? The statement 'the derivative (with respect to ##\phi_k##) of ##\log p(\phi_{k+1}|\phi_k)##' is ambiguous and could mean either, because it does not specify a function. One doesn't differentiate quantities - notwithstanding that many sloppily-written texts imply that we do. One only ever differentiates functions.
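To make the difference tangible, here is a minimal symbolic sketch (the symbol names are my own; ##\phi_{k+1}## and ##v_k## are treated as fixed symbols, i.e. instantiated values rather than random variables):

```python
import sympy as sp

x, phi_next, v, sigma = sp.symbols('x phi_next v sigma')

# g: phi_{k+1} held fixed, unrelated to x (substitute nothing, differentiate)
g = -(phi_next - x)**2 / (2 * sigma**2)

# h: phi_{k+1} replaced by x + v_k BEFORE differentiating
h = g.subs(phi_next, x + v)   # simplifies to -v**2/(2*sigma**2)

print(sp.diff(g, x))  # equals (phi_next - x)/sigma**2, nonzero in general
print(sp.diff(h, x))  # 0
```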
 
NOTE: If you don't want to read this whole post, just read "EDIT2"; it contains the questions and conclusions I arrived at. Everything else is just my train of thought.
Hmmm, are those functions not the same, given that ##\phi_{k+1}=\phi_{k}+v_k## with ##v_k\overset{iid}{\sim}\mathcal{N}(0,\sigma^2)##? I would have assumed that taking the derivative with respect to ##x## of both ##g## and ##h## would give the same answer (since we have defined the relation just stated).
Also, could ##h## be written as
$$h:\mathbb R\to\mathbb R\textrm{ such that }h(x)=\log f_{\phi_{k+1}}(\phi_k+v_k|\phi_k=x)\ ?$$

andrewkirk said:
The question is, which function do you want to differentiate with respect to ##x##? The statement 'the derivative (with respect to ##\phi_k##) of ##\log p(\phi_{k+1}|\phi_k)##' is ambiguous and could mean either, because it does not specify a function. One doesn't differentiate quantities - notwithstanding that many sloppily-written texts imply that we do. One only ever differentiates functions.
I can't answer this question, because I see those two functions you defined as being the same. I believe both PDFs, ##f_{\phi_{k+1}}(\phi_{k+1}|\phi_k=x)## and ##f_{\phi_{k+1}}(x+v_k|\phi_k=x)##, will have the same means and variances, right? In the end I need to differentiate the function that uses the mean and variance I calculated in my initial post.

EDIT1: If I say it is the function ##g## that I need to differentiate (wrt ##x##), would it be legitimate to write ##\phi_{k+1}=x+v_k##?
EDIT2: I think I know what I am getting confused about. When I take ##E[\phi_{k+1}|\phi_k]##, I make the substitution ##E[\phi_k+v_k|\phi_k]## in order to find the expected value, which I think is correct: I am dealing with random variables, and I use the defining relation (##\phi_{k+1}=\phi_{k}+v_k##) to calculate the expectation. When I try to find the derivative of the log of the PDF, however, I am no longer dealing with random variables (but with instances of the random variables), and so I am no longer allowed to make that substitution, right? If I am right about this, I have a follow-up question, if I may.
 
Last edited:
Yes I think you are correct, if I have interpreted EDIT2 correctly.

Something that I find helps when there is danger of getting confused between random and non-random variables is to write the random variables out fully, as functions ##\phi## and ##v## from ##\mathbb N\times \Omega## to ##\mathbb R##, where ##\Omega## is the sample space. So ##\phi_k## is the random variable that is the function ##\omega\mapsto \phi(k,\omega)## and ##v_k## is the random variable that is the function ##\omega\mapsto v(k,\omega)##. Since these items are functions, rather than values, we can differentiate them but we cannot differentiate with respect to them - leaving aside fancy concepts like Radon-Nikodym derivatives, which we don't want to get into here.
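One toy way to make this picture concrete in code - a sketch only, where I choose to represent ##\omega## as the entire sequence of shocks - is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample space: represent omega as the whole shock sequence (v_0, v_1, ...).
def v(k, omega):
    """The random variable v_k, viewed as a function of (k, omega)."""
    return omega[k]

def phi(k, omega, phi0=0.0):
    """The random variable phi_k = phi_0 + v_0 + ... + v_{k-1}."""
    return phi0 + float(np.sum(omega[:k]))

omega = rng.normal(0.0, 1.0, size=10)  # one sample point omega, with sigma = 1
# phi_{k+1}(omega) = phi_k(omega) + v_k(omega) holds pointwise at each omega:
print(phi(3, omega), phi(2, omega) + v(2, omega))  # the two numbers agree
```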

The expected value ##E[\phi_{k+1}|\phi_k]## is also a random variable, and so, when written out formally, also has an ##\omega## argument so that an instantiated value is actually ##E[\phi_{k+1}|\phi_k](\omega)##. Hence, if we want to differentiate it we need to be careful about what it is that we are differentiating.
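For the process in the original post, for instance, ##E[\phi_{k+1}|\phi_k](\omega)=\phi(k,\omega)##: the conditional expectation takes a different value at each sample point ##\omega##, which is exactly what makes it a random variable rather than a single number.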

Anyway, rather than me rabbit on further, ask your follow-up question if it is not yet answered, and we'll see what can be made of it.
 
Ok, since I am right in saying (along with your help in this older thread) that the PDF which describes the random variable is itself not random but made up of "ordinary" variables, then after taking the derivative we will once again end up with a function that does NOT contain random variables. Is this correct? If so, then in the case of the Fisher information matrix, given by
$$E\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right],$$
how does the expectation here have any meaning or point? We just said (or rather, I just said) that the function obtained after taking the derivative does not contain any random variables. Isn't the argument of the expectation then a "constant", i.e. a non-random variable?

andrewkirk said:
Anyway, rather than me rabbit on further, ask your follow-up question if it is not yet answered, and we'll see what can be made of it.
Please rabbit on as much as you please; this so-called "rabbiting on" has benefited me very much. : p Thank you!
 
Last edited:
@perplexabot Here is how I would try to make sense of that one. I will use ##H## instead of ##\theta## in order to make it easier for me to use my beloved convention of using capital letters for RVs and lower case for non-RVs.
First I would make the following definitions:
$$F:\mathbb R^2\to[0,1]\textrm{ such that } F(x,h)\equiv Prob(X(\omega)\leq x\ |\ H(\omega)=h)$$
$$f:\mathbb R^2\to[0,\infty)\textrm{ such that } f(x,h)\equiv \partial_x F(x,h)$$
and we define a stochastic process ##W:\mathbb R\times\Omega\to\mathbb R## (with associated RVs ##W_h##) such that:
$$W(h,\omega)\equiv W_h(\omega)\equiv \partial_h\log f(X(\omega),h)$$
This is well-defined because, while ##X## is a RV, ##X(\omega)## is not.
Then we can write the conditional Fisher expectation as a stochastic process ##Y:\mathbb R\times\Omega\to\mathbb R## (with associated RVs ##Y_h##) such that:
$$Y(h,\omega)\equiv Y_h(\omega)=E\left[W_h\ |\ H(\omega)=h\right]$$

If we want an unconditional expectation we define RV ##U:\Omega\to\mathbb R## by ##U(\omega)=W(H(\omega),\omega)##. The unconditional expectation is then simply ##E\left[U\right]## which, because it is unconditional, is not a RV.

I think that's it, although, because there are several definitions, it's not unlikely that I messed up somewhere.
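As a purely numerical illustration of these definitions - a sketch, where I assume ##X\,|\,H(\omega)=h\ \sim\ \mathcal N(h,\sigma^2)##, so that ##\partial_h\log f(x,h)=(x-h)/\sigma^2##, and where the parameter values and sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
h, sigma, n = 1.5, 2.0, 10**6   # arbitrary parameter value, scale, sample size

# Instantiated values X(omega) for many omegas, conditional on H(omega) = h.
X = rng.normal(loc=h, scale=sigma, size=n)

# W_h(omega) = d/dh log f(X(omega), h) = (X(omega) - h) / sigma^2:
# a different number for each omega, i.e. a genuine random variable.
W = (X - h) / sigma**2

print(W.mean())       # ~ 0         : E[W_h | H = h], the score has mean zero
print((W**2).mean())  # ~ 1/sigma^2 : the Fisher information, here 0.25
```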
 
andrewkirk said:
@perplexabot Here is how I would try to make sense of that one. I will use ##H## instead of ##\theta## in order to make it easier for me to use my beloved convention of using capital letters for RVs and lower case for non-RVs.
First I would make the following definitions:
$$F:\mathbb R^2\to[0,1]\textrm{ such that } F(x,h)\equiv Prob(X(\omega)\leq x\ |\ H(\omega)=h)$$
$$f:\mathbb R^2\to[0,\infty)\textrm{ such that } f(x,h)\equiv \partial_x F(x,h)$$
and we define a stochastic process ##W:\mathbb R\times\Omega\to\mathbb R## (with associated RVs ##W_h##) such that:
$$W(h,\omega)\equiv W_h(\omega)\equiv \partial_h\log f(X(\omega),h)$$
This is well-defined because, while ##X## is a RV, ##X(\omega)## is not.
Then we can write the conditional Fisher expectation as a stochastic process ##Y:\mathbb R\times\Omega\to\mathbb R## (with associated RVs ##Y_h##) such that:
$$Y(h,\omega)\equiv Y_h(\omega)=E\left[W_h\ |\ H(\omega)=h\right]$$

If we want an unconditional expectation we define RV ##U:\Omega\to\mathbb R## by ##U(\omega)=W(H(\omega),\omega)##. The unconditional expectation is then simply ##E\left[U\right]## which, because it is unconditional, is not a RV.

I think that's it, although, because there are several definitions, it's not unlikely that I messed up somewhere.
I have been trying to digest this notation that you have bestowed upon me. I can see its benefit and am willing to put in some time to understand it better. Do you have any references that might help me understand it better?

I am very thankful for your help : )

EDIT: By "notation" I mean that of probability spaces (and random variables as functions of events)
 
The approach is that of Kolmogorov who, I believe, was the first to put the study of probability on a firm theoretical footing - in the 1930s. It bases probability theory on measure theory. There are some good texts on this, which maybe others can suggest. I don't have a text because at first I learned probability theory informally and then picked up the Kolmogorov approach later when I was having trouble understanding continuous stochastic processes in finance. The Kolmogorov approach, which I learned by reading pieces off the web in my case, was what finally enabled me to break through the confusion. Until then, the idea of a continuous stochastic process seemed to me to be impossible nonsense.

A good place to start is the wiki page on probability spaces. The notion of sigma algebra is important, so you may want to visit the page on that too, although it goes off into various complications that are not essential to probability theory, so you might like to use the probability space page as a base and only go to the other as needed. If there are concepts you get stuck on feel free to ask here.
 
andrewkirk said:
The approach is that of Kolmogorov who, I believe, was the first to put the study of probability on a firm theoretical footing - in the 1930s. It bases probability theory on measure theory. There are some good texts on this, which maybe others can suggest. I don't have a text because at first I learned probability theory informally and then picked up the Kolmogorov approach later when I was having trouble understanding continuous stochastic processes in finance. The Kolmogorov approach, which I learned by reading pieces off the web in my case, was what finally enabled me to break through the confusion. Until then, the idea of a continuous stochastic process seemed to me to be impossible nonsense.

A good place to start is the wiki page on probability spaces. The notion of sigma algebra is important, so you may want to visit the page on that too, although it goes off into various complications that are not essential to probability theory, so you might like to use the probability space page as a base and only go to the other as needed. If there are concepts you get stuck on feel free to ask here.
Thank you very much. I will continue to read up on probability spaces and touch up on sigma algebras. I am really thankful for your time and help, and thank you for offering more help on this topic (I will take you up on your offer : p).
 