Relation between the Hessian and the log-likelihood

AI Thread Summary
The discussion centers on proving the information-matrix identity E[∂L/∂θ ∂L'/∂θ] = E[-∂²L/∂θ ∂θ'] for a log-likelihood L = log(Πi f(xi)) built from observed values xi. The author computes the Hessian of L, shows that one term has zero expectation under regularity conditions, and then gets stuck equating the remaining term with the expected outer product of the score. An update in which the author proposes taking the expectation over the parameters θi and θj, with an independence assumption for their joint distribution, is questioned in the replies, which point out that the expectation is taken with respect to the observations. The resolution writes the expectation as a multiple integral over the joint density of the observations and uses the facts that the score of each observation has zero mean and that scores of distinct observations are independent, so the cross terms vanish.
fab13
TL;DR Summary
I would like a rigorous demonstration of relation (1) given at the beginning of my post, involving the Hessian and the general expression of the log-likelihood.
I would like to demonstrate equation (1) below for the general form of the log-likelihood:

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the log-likelihood ##\mathcal{L}## defined by ##\mathcal{L} = \log\big(\Pi_{i} f(x_{i})\big)##, where the ##x_{i}## are all the experimental/observed values.

To start, if I work on the second derivative (left member of (1)), I can get:

##\dfrac{\partial \mathcal{L}}{\partial \theta_{i}} = \dfrac{\partial \log\big(\Pi_{k} f(x_{k})\big)}{\partial \theta_{i}} = \dfrac{\partial \big(\sum_{k} \log f(x_{k})\big)}{\partial \theta_{i}}
=\sum_{k} \dfrac{1}{f(x_{k})} \dfrac{\partial f(x_{k})}{\partial \theta_{i}}##
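(For concreteness, taking as an illustrative special case a single parameter ##\theta## and the exponential density ##f(x)=\theta e^{-\theta x}##, one has ##\log f(x_{k})=\log\theta-\theta x_{k}##, so the formula above gives ##\dfrac{\partial \mathcal{L}}{\partial \theta} = \sum_{k}\big(\tfrac{1}{\theta} - x_{k}\big)##.)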

Now I have to compute: ##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\dfrac{\partial}{\partial \theta_j} \left(\sum_{k}\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}} \right)##
##= -\sum_{k} \bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{ \partial \theta_i \partial \theta_j}\bigg)##
##=-\sum_{k}\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}
\dfrac{\partial \log f(x_{k})}{\partial \theta_{j}}-
\dfrac{1}{f(x_{k})}
\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)##

As we compute an expectation on both sides, the second term vanishes under regularity conditions, i.e.: ##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}
\dfrac{\partial \log f(x_{k})}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

The second term can be expressed as:

##\int\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\text{d}x_k =\dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\int f(x_{k})\text{d}x_k=0##

since ##\int f(x_{k})\,\text{d}x_k = 1##.

Finally, I get the relation :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\,\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

##=\int \sum_{k}\,f(x_k) \bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{j}}\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k\quad\quad(3)##

But I don't know how to make equation (3) equal to:

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_{k}f(x_k)\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\sum_{l}\bigg(\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_k f(x_k) \bigg(\dfrac{\partial \log\big(\Pi_{k}f(x_{k})\big)}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log\big(\Pi_{l}f(x_{l})\big)}{\partial \theta_{j}}\bigg)\,\text{d}x_k\quad\quad(4)##
##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

I just want to prove the equality between (3) and (4): where is my error?

IMPORTANT UPDATE: I realized that I made an error in the calculation of the expectation in equation (2), when I write:

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}
\dfrac{\partial \log f(x_{k})}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

Indeed, I should not integrate with respect to ##\text{d}x_{k}## but rather with respect to the variables ##(\theta_i, \theta_j)##.

But if I do this, to compute the expectation it seems that I need the joint distribution ##f(x_k) = f(x_k, \theta_i, \theta_j)##.

So I guess we can rewrite this joint distribution as if ##\theta_i## and ##\theta_j## were independent, i.e.:

##f(x_k, \theta_i, \theta_j)= f_1(x_k, \theta_i)\, f_2(x_k, \theta_j)##


This way, I could have :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\int \sum_{k} f_1(x_k,\theta_i)\, f_2(x_k,\theta_j)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f_1(x_{k})}{\partial \theta_{i}}\dfrac{\partial f_2(x_{k})}{\partial \theta_{j}}\bigg)\text{d}\theta_i\,\text{d}\theta_j##

instead of :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

So finally, I could obtain from equation (2) :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int \int \sum_{k} f(x_k,\theta_i,\theta_j)\bigg(\dfrac{\partial \log f_1(x_{k})}{\partial \theta_{i}}\,
\dfrac{\partial \log f_2(x_{k})}{\partial \theta_{j}}\bigg)\text{d}\theta_i \,\text{d}\theta_j##

##=\int \int f_1(x_k,\theta_i)\, f_2(x_l,\theta_j) \bigg(\dfrac{\partial \sum_{k} \log f_1(x_{k})}{\partial \theta_{i}}\,
\dfrac{\partial \sum_l \log f_2(x_{l})}{\partial \theta_{j}}\bigg)\text{d}\theta_i \,\text{d}\theta_j\quad(5)##

##= \int\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log f_1(x_{k})}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log f_2(x_{l})}{\partial \theta_{j}}\bigg)\text{d}\theta_j ##

##=\int\int f(x_k, \theta_i, \theta_j) \bigg(\dfrac{\partial \log\big(\Pi_k f_1(x_{k})\big)}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log\big(\Pi_l f_2(x_{l})\big)}{\partial \theta_{j}}\bigg)\text{d}\theta_j##

##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

But I have difficulties with the step involving the two sums ##\sum_k## and ##\sum_l## in equation (5), which I introduce above without justification. Moreover, are my calculations of the expectation correct (I mean, integrating over ##\theta_i## and ##\theta_j##)?

If someone could help me, this would be nice.

Regards
 
fab13 said:
Indeed, I should not integrate with respect to ##\text{d}x_{k}## but rather with respect to the variables ##(\theta_i, \theta_j)##.
How can we integrate with respect to the ##\theta_i## without having a prior joint distribution for them?

The notes: https://mervyn.public.iastate.edu/stat580/Notes/s09mle.pdf indicate that the expectation is taken with respect to the variables representing the observations.
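Concretely, that convention means that for any function ##g## of the sample,

##E\big[g(x_1,\dots,x_n)\big] = \int\cdots\int g(x_1,\dots,x_n)\,\prod_{k} f(x_k)\,\text{d}x_1\cdots \text{d}x_n,##

with ##\theta_i, \theta_j## held fixed; the parameters are not treated as random variables here.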
 
fab13 said:


I would like to demonstrate equation (1) below for the general form of the log-likelihood:

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the log-likelihood ##\mathcal{L}## defined by ##\mathcal{L} = \log\big(\Pi_{i} f(x_{i})\big)##, where the ##x_{i}## are all the experimental/observed values.

To start, if I work on the second derivative (left member of (1)), I can get:

You mean "the right member" of (1).

But I don't know how to make equation (3) equal to:

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k##

One thought is that for ##k \ne l##, the random variables ##\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}## and ##\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}## are independent. So the expected value of their product is the product of their expected values. Is each expected value equal to zero?
 
@Stephen Tashi Yes, sorry, I meant the "right member" when I talk about the second derivative.

Don't forget that I have added an important UPDATE above: I have to integrate over ##\theta_i## and ##\theta_j## and not over ##x_k##.

Have you got a clue or a suggestion? Regards
 
fab13 said:
Don't forget that I have added an important UPDATE above: I have to integrate over ##\theta_i## and ##\theta_j## and not over ##x_k##.
I'm curious why you think eq. 1 is correct when you interpret it as asking for taking the expectation with respect to ##\theta_i, \theta_j##.
 
@Stephen Tashi

Do you mean that I am confusing (##\theta,\theta'##) and (##\theta_i,\theta_j##)? Or maybe that equation (1) is false?

If this is the case, which approach do you suggest for establishing relation (1)? As I said above, I don't know how to justify the passage from step (5) to the next step: there is something wrong in my attempted demonstration and I don't know where it comes from.

Any help would be nice, I am beginning to despair... Regards
 
fab13 said:
@Stephen Tashi

Do you mean that I am confusing (##\theta,\theta'##) and (##\theta_i,\theta_j##)? Or maybe that equation (1) is false?

I mean that equations like eq. 1 that I've seen (for the Fisher Information Matrix) say that the expectation is taken with respect to the observations ##x_i##. The expectations are not taken with respect to the parameters ##\theta_i##. Where did you see eq. 1?
fab13 said:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

? Is that notation equivalent to:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta_r} \frac{\partial \mathcal{L}}{\partial \theta_s}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta_r \partial \theta_s}\Big]\quad(1)##

If so, I doubt eq. 1 is always true when expectations are taken with respect to ##\theta_r, \theta_s##.

But I don't know how to make equation (3) equal to:

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k##

I don't understand your notation for taking the expected value of a function. The expected value of ##w(x_1,x_2,...,x_n)## with respect to the joint density ##p(x_1,x_2,...x_n) = \Pi_{k} f(x_k) ## should be a multiple integral: ##\int \int ...\int p(x_1,x_2,...x_n) w(x_1,x_2,...x_n) dx_1 dx_2 ...dx_n ##

How did you get an expression involving only ##dx_k##?

It seems to me that the terms involved have a pattern like ##\int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_k) h(x_j) dx_1 dx_2 ...dx_n = ##
## \int \int f(x_k) w(x_k) f(x_j) h(x_j) dx_k dx_j = (\int f(x_k) w(x_k) dx_k) ( \int f(x_j) h(x_j) dx_j)##
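For instance, with ##n = 3##, ##k = 1## and ##j = 2##:

##\int\int\int f(x_1) f(x_2) f(x_3)\, w(x_1)\, h(x_2)\, dx_1 dx_2 dx_3 = \Big(\int f(x_1) w(x_1)\, dx_1\Big)\Big(\int f(x_2) h(x_2)\, dx_2\Big)\Big(\int f(x_3)\, dx_3\Big),##

and the last factor equals 1, so every variable that appears in neither ##w## nor ##h## simply integrates out.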
 
@Stephen Tashi

Thanks for your answer. So from what I understand, the quantity ##w(x_k)## would correspond to:

##w(x_{k,i})=\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)##

and if I assume that : ##f(x_1,x_2,.. x_n) = f(x_1) f(x_2) .. f(x_n)##, I get for example :

##E[w_i] = \int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_{k,i}) dx_1 dx_2 ...dx_n##

which can be applied by adding a second quantity ##h_{l,j}##.

However, I have two last requests: in my first post, I calculated that

##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}- \dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\quad(6)##

1) If I follow your reasoning, I should write for the first term of ##(6)##:

##E[\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}] = -\int \int ...\int f(x_1) f(x_2) .. f(x_n) \sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)\,dx_1 dx_2 ...dx_n##

I don't know how to deal with the summation ##\sum_{k}## on the ##\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it into:

##\bigg(\dfrac{\partial \log\big(\Pi_k f(x_{k})\big)}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log\big(\Pi_l f(x_{l})\big)}{\partial \theta_{j}}\bigg)##

?

Sorry if this is obvious to some of you...

2) Moreover, to make the second term of ##(6)## vanish, I divide the total joint probability density by ##f(x_k)## for each ##k##-th term of the sum: is it enough to justify that:

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_{m})\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated by the multiplication by ##f(x_k)##, so this way we keep the relation:

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Regards
 
fab13 said:
I don't know how to deal with the summation ##\sum_{k}## on the ##\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it into:

##\bigg(\dfrac{\partial \log\big(\Pi_k f(x_{k})\big)}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log\big(\Pi_l f(x_{l})\big)}{\partial \theta_{j}}\bigg)##

?

To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E\Big(\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big) = 0##

For ##r \ne s##, ##E\Big( \dfrac{\partial \log f(x_r)}{\partial \theta_i} \dfrac{\partial \log f(x_s)}{\partial \theta_j}\Big) = E\Big( \dfrac{\partial \log f(x_r)}{\partial \theta_i}\Big)E\Big( \dfrac{\partial \log f(x_s)}{\partial \theta_j}\Big)## since ##x_r, x_s## are independent random variables.

So the only nonzero terms on the left side of 1) are those of the above form when ##r = s##.

2) Moreover, to make the second term of ##(6)## vanish, I divide the total joint probability density by ##f(x_k)## for each ##k##-th term of the sum: is it enough to justify that:

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_{m})\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated by the multiplication by ##f(x_k)##, so this way we keep the relation:

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Yes, I agree. However, it never hurts to check abstract arguments about summations by writing out a particular case such as ##n = 2##
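Writing it out for ##n = 2##, the second term reads

##E\bigg[\sum_{k=1}^{2}\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\int\int f(x_1) f(x_2)\bigg(\dfrac{1}{f(x_{1})}\dfrac{\partial^{2} f(x_{1})}{\partial \theta_{i} \partial \theta_{j}}+\dfrac{1}{f(x_{2})}\dfrac{\partial^{2} f(x_{2})}{\partial \theta_{i} \partial \theta_{j}}\bigg)dx_1 dx_2 = \int \dfrac{\partial^{2} f(x_{1})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 + \int \dfrac{\partial^{2} f(x_{2})}{\partial \theta_{i} \partial \theta_{j}}\,dx_2 = 0 + 0,##

using ##\int f(x_1)\,dx_1 = \int f(x_2)\,dx_2 = 1## and differentiating ##\int f\,dx = 1## under the integral sign (the regularity condition).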
 
  • #10
1)
Stephen Tashi said:
To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E\Big(\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big) = 0##

I don't understand the last part, i.e. when you say:

The basic ideas are that
##E\Big(\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big) = 0##

Under which conditions do we have this expression?

Moreover, when you talk about the "left side of 1)", do you mean the calculation of the left side of this relation:

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e, ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

The ideal would be to introduce a Kronecker symbol ##\delta_{rs}## to get this expression :

##\sum_k h(k)w(k) = \sum_r \sum_s h(r)w(s)\,\delta_{rs}##

But I don't know how to justify that ##h(r)\,w(s)=0## with ##r\neq s## or introduce a Kronecker symbol.

@Stephen Tashi: if you could develop your reasoning, this would be nice.

2)

On the other hand, I understand the relation:

##E\Big( \dfrac{\partial \log f(x_r)}{\partial \theta_i} \dfrac{\partial \log f(x_s)}{\partial \theta_j}\Big) = E\Big( \dfrac{\partial \log f(x_r)}{\partial \theta_i}\Big)E\Big( \dfrac{\partial \log f(x_s)}{\partial \theta_j}\Big)##

since ##\dfrac{\partial \log f(x_r)}{\partial \theta_i}## and ##\dfrac{\partial \log f(x_s)}{\partial \theta_j}## are independent variables (for ##r \ne s##).

3)

Just a little remark: why do I make things complicated in this demonstration?

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_{m})\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

I should have directly swapped ##\sum_k## and ##E\big[\dots\big]##; this way, I could write directly:

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_k\,E\bigg[...\bigg] =\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##
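where, written out as a multiple integral, a single term gives

##E\bigg[\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\int f(x_{k})\,\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_k=\dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\int f(x_{k})\,dx_k=\dfrac{\partial^{2}\,1}{\partial \theta_{i} \partial \theta_{j}}=0,##

the other ##n-1## integrals contributing factors of 1.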

Regards
 
  • #11
fab13 said:
Moreover, when you talk about the "left side of 1)", do you mean the calculation of the left side of this relation:

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e, ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

Yes.

I suggest you take the case of two observations ##x_1, x_2## and write out the left hand side of 1).

fab13 said:
I should have directly swapped ##\sum_k## and ##E##

Yes.

Apply that idea to the left-hand side of 1). You will be taking the expectation of sums that involve terms like:

##E\Big(\frac{\partial \log f(x_1)}{\partial \theta_i} \frac{\partial \log f(x_2)}{\partial \theta_j} \Big)##
## = E\Big(\frac{\partial \log f(x_1)}{\partial \theta_i}\Big) E\Big( \frac{\partial \log f(x_2)}{\partial \theta_j} \Big)##
## = (0)(0) = 0 ##
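Each of those factors is zero for the same reason the second term vanished earlier: under the usual regularity conditions (so that differentiation under the integral sign is allowed),

##E\Big(\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big) = \int f(x_k)\,\dfrac{1}{f(x_k)}\dfrac{\partial f(x_k)}{\partial \theta_i}\,dx_k = \dfrac{\partial}{\partial \theta_i}\int f(x_k)\,dx_k = \dfrac{\partial\, 1}{\partial \theta_i} = 0.##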
 
  • #12
@Stephen Tashi

I think I have finally understood, maybe by writing :

Defining ##X_k## and ##Y_l## as:

##X_k=\dfrac{\partial \log f(x_k)}{\partial \theta_i}##
##Y_l=\dfrac{\partial \log f(x_l)}{\partial \theta_j}##

(##E[X_r]=0## and ##E[Y_s]=0##) ##\implies## (##E[\sum_k X_k]=0## and ##E[\sum_l Y_l]=0##), and, using ##E[X_k Y_l]=E[X_k]\,E[Y_l]=0## for ##k \neq l## (independence), ##E\big[\sum_k\sum_l\,X_k Y_l\big]=\sum_k\sum_l\,E[ X_k Y_l]=\sum_k\,E[ X_k Y_k] = E\big[\sum_k X_k Y_k\big]##,

since ##E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

thanks
 
  • #13
fab13 said:
since ##E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

Yes.
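For anyone who wants to see relation (1) numerically, here is a minimal Monte Carlo sketch; the normal model, the parameter values, the sample size and the seed are purely illustrative assumptions. It averages the outer product of the score and minus the Hessian of ##\mathcal{L}## over many simulated samples of ##n## i.i.d. observations from ##N(\mu,\sigma^2)##, with ##\theta=(\mu,\sigma)##; both averages should approach ##n## times the single-observation Fisher information matrix.

Python:
# Monte Carlo check of E[ (dL/dθ)(dL/dθ)' ] = E[ -d²L/dθ dθ' ]
# for n i.i.d. observations from N(mu, sigma^2), with θ = (mu, sigma).
# All numerical choices below (model, true values, sizes, seed) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0        # assumed true parameter values
n, n_rep = 5, 200_000       # sample size and number of Monte Carlo replications

sum_outer = np.zeros((2, 2))     # accumulates (dL/dθ)(dL/dθ)'
sum_neg_hess = np.zeros((2, 2))  # accumulates -d²L/dθ dθ'

for _ in range(n_rep):
    x = rng.normal(mu, sigma, size=n)
    z = x - mu
    # score of L = sum_k log f(x_k):  (dL/dmu, dL/dsigma)
    score = np.array([np.sum(z) / sigma**2,
                      np.sum(-1.0 / sigma + z**2 / sigma**3)])
    # Hessian of L (summed over the n observations)
    d_mumu = -n / sigma**2
    d_musig = np.sum(-2.0 * z / sigma**3)
    d_sigsig = np.sum(1.0 / sigma**2 - 3.0 * z**2 / sigma**4)
    hess = np.array([[d_mumu, d_musig],
                     [d_musig, d_sigsig]])
    sum_outer += np.outer(score, score)
    sum_neg_hess += -hess

print("E[score score']  ~\n", sum_outer / n_rep)
print("E[-Hessian]      ~\n", sum_neg_hess / n_rep)
print("exact n * I_1     =\n", n * np.array([[1 / sigma**2, 0.0],
                                             [0.0, 2 / sigma**2]]))

With these illustrative values both Monte Carlo averages should come out close to the exact matrix ##n\,I_1 = \begin{pmatrix} 1.25 & 0 \\ 0 & 2.5 \end{pmatrix}##, which is what relation (1) predicts.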
 