Relation with Hessian and Log-likelihood

fab13 · Mar 12, 2020

I would like to demonstrate the equation [itex](1)[/itex] below in the general form of the Log-likelihood :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the [itex]\log[/itex] of Likelihood [itex]\mathcal{L}[/itex] defined by [itex]\mathcal{L} = \log\bigg(\Pi_{i} f(x_{i})\bigg)[/itex] with [itex]x_{i}[/itex] all experimental/observed values.

For the instant, if I start from the second derivative (left member of [itex](1)[/itex]), I can get :

##\dfrac{\partial \mathcal{L}}{\partial \theta_{i}} = \dfrac{\partial \log\big(\Pi_{k} f(x_{k})\big)}{\partial \theta_{i}} = \dfrac{\big(\partial \sum_{k} \log\,f(x_{k})\big)}{\partial \theta_{i}}
=\sum_{k} \dfrac{1}{f(x_{k})} \dfrac{\partial f(x_{k})}{\partial \theta_{i}}##

Now I have to compute : ##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\dfrac{\partial}{\partial \theta_j} \left(\sum_{k}\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}} \right)##
##= -\sum_{k} \bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}+\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{ \partial \theta_i \partial \theta_j}\bigg)##
##=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}+
\dfrac{1}{f(x_{k})}
\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)##

As we compute an expectation on both sides, the second term is vanishing to zero under regularity conditions, i.e :##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k})}{\partial \theta_{j}}+\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

Second term can be expressed as :

##\int\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\text{d}x_k =\dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\int f(x_{k})\text{d}x_k=0##

since [itex]\int f(x_{k})\,\text{d}x_k = 1[/itex]

Finally, I get the relation :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\,\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

##=\int \sum_{k}\,f(x_k) \bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{j}}\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k\quad\quad(3)##

But I don't know to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k
##

##=\int \sum_{k}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}\bigg)\sum_{l}\bigg(\dfrac{\partial \log(f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_k f(x_k) \bigg(\dfrac{\partial \log(\Pi_{k}f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(\Pi_{l}f(x_{l})}{\partial \theta_{j}}\bigg)\,\text{d}x_k\quad\quad(4)##
##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

I just want to prove the equality between (3) and (4) : where is my error ?

IMPORTANT UPDATE : I realized that I made an error for the calculation of the expectation into equation [itex](2)[/itex], when I write :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}},
\dfrac{\partial \log(f(x_{k})}{\partial \theta_{j}}+\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

Indeed, I have not to integrate on the parameter [itex]\text{d}x_{k}[/itex] but rather on the variables [itex](\theta_i, \theta_j)[/itex].

But if I do this, to compute the expectation, it seems that I need the joint distribution [itex](f(x_k) = f(x_k, \theta_i, \theta_j)[/itex].

So I guess we can rewrite this joint distribution like if [itex]\theta_i[/itex] and [itex]\theta_j[/itex] were independent, i.e :

##f(x_k, \theta_i, \theta_j)= f_1(x_k, \theta_i)\quad f_2(x_k, \theta_j)##

This way, I could have :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\int \sum_{k} f_1(x_k,\theta_i)\quad f_2(x_k,\theta_j)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f_1(x_{k})}{\partial \theta_{j}}\dfrac{\partial f_2(x_{k})}{\partial \theta_{i}}\bigg)\text{d}\theta_i\theta_j##

instead of :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

So finally, I could obtain from equation (2) :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int \int f\sum_{k} f_1(x_k,\theta_i,\theta_j)\bigg(\dfrac{\partial \log(f_2(x_{k})}{\partial \theta_{i}},
\dfrac{\partial \log(f_1(x_{k})}{\partial \theta_{j}}\bigg)\text{d}\theta_i \text{d}\theta_j##

##=\int \int f_1(x_k,\theta_i) f_2(x_l,\theta_j) \bigg(\dfrac{\partial \sum_{k} \log(f_1(x_{k})}{\partial \theta_{i}},
\dfrac{\partial \sum_l \log(f_2(x_{l})}{\partial \theta_{j}}\bigg)\text{d}\theta_i \text{d}\theta_j\quad(5)##

##= \int\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f_1(x_{k})}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log(f_2(x_{l})}{\partial \theta_{j}}\bigg)\text{d}\theta_j ##

##\int\int f_1(x_k, \theta_i, \theta_j) \bigg(\dfrac{\partial \log(\Pi_k f_2(x_{k})}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log(\Pi_l f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}\theta_j##

##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

But I have difficulties to the step implying the 2 sums into equation[itex](5)[/itex]: [itex]\sum_k[/itex] and [itex]\sum_l[/itex] that I introduce above without justification ? Moreover, are my calculations of expectation are correct (I mean integrate over [itex]\theta_i[/itex] and [itex]\theta_j[/itex] ?

If somecone could help me, this would be nice.

Regards

Stephen Tashi · Mar 12, 2020

fab13 said:

Indeed, I have not to integrate on the parameter [itex]\text{d}x_{k}[/itex] but rather on the variables [itex](\theta_i, \theta_j)[/itex].

How can we integrate with respect to the ##\theta_i## without having a prior joint distribution for them?

The notes: https://mervyn.public.iastate.edu/stat580/Notes/s09mle.pdf indicate that the expectation is taken with respect to the variables representing the observations.

Stephen Tashi · Mar 17, 2020

fab13 said:

I would like to demonstrate the equation [itex](1)[/itex] below in the general form of the Log-likelihood :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the [itex]\log[/itex] of Likelihood [itex]\mathcal{L}[/itex] defined by [itex]\mathcal{L} = \log\bigg(\Pi_{i} f(x_{i})\bigg)[/itex] with [itex]x_{i}[/itex] all experimental/observed values.

For the instant, if I start from the second derivative (left member of [itex](1)[/itex]), I can get :

You mean "the right member" of 1

But I don't know to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k
##

One thought is that for ##k \ne l##, the random variables ##\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}} ## and ##\dfrac{\partial \log(f(x_{l})}{\partial \theta_{j} }## are independent. So the expected value of their product is the product of their expected values. Is each expected value equal to zero?

fab13 · Mar 23, 2020

@Stephen Tashi Yes, sorry, I meant the "right member" when I talk about the second derivative.

Don't forget that I have added an important UPDATE, you can see it since I have to integrate over ##\theta_i## and ##\theta_j## and not on ##x_k##.

Have you got a clue/track/suggestion ? Regards

Stephen Tashi · Mar 24, 2020

fab13 said:

Don't forget that I have added an important UPDATE, you can see it since I have to integrate over ##\theta_i## and ##\theta_j## and not on ##x_k##.

I'm curious why you think eq. 1 is correct when you interpret it as asking for taking the expectation with respect to ##\theta_i, \theta_j##.

fab13 · Mar 24, 2020

@Stephen Tashi

Do you mean that I made confusions between (##\theta,\theta'##) and (##\theta_i,\theta_j##) ? or maybe that equation (1) is false ?

If this is the case, which track do you suggest to conclude about this relation (1) ? As I said above, I don't know how to justify the passing from step (5) to the next step : there is something wrong in my attempt of demo and I don't know where it comes from.

Any help would be nice, I begin to despair ... Regards

Stephen Tashi · Mar 25, 2020

fab13 said:

@Stephen Tashi

Do you mean that I made confusions between (##\theta,\theta'##) and (##\theta_i,\theta_j##) ? or maybe that equation (1) is false ?

I mean that equations like eq. 1 that I've seen (for the Fisher Information Matrix) say that the expectation is taken with respect to the observations ##x_i##. The expectations are not taken with respect to the parameters ##\theta_i##. Where did you see eq. 1?

fab13 said:

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

? Is that notation equivalent to:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta_r} \frac{\partial \mathcal{L}}{\partial \theta_s}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta_r \partial \theta_s}\Big]\quad(1)##

If so, I doubt eq. 1 is always true when expectations are taken with respect to ##\theta_r, \theta_s##.

But I don't know to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l})}{\partial \theta_{j}}\bigg)\text{d}x_k
##

I don't understand your notation for taking the expected value of a function. The expected value of ##w(x_1,x_2,...,x_n)## with respect to the joint density ##p(x_1,x_2,...x_n) = \Pi_{k} f(x_k) ## should be a multiple integral: ##\int \int ...\int p(x_1,x_2,...x_n) w(x_1,x_2,...x_n) dx_1 dx_2 ...dx_n ##

How did you get an expression involving only ##dx_k##?

It seems to me that the terms involved have a pattern like ##\int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_k) h(x_j) dx_1 dx_2 ...dx_n = ##
## \int \int f(x_k) w(x_k) f(x_j) h(x_j) dx_k dx_j = (\int f(x_k) w(x_k) dx_k) ( \int f(x_j) h(x_j) dx_j)##

fab13 · Mar 25, 2020

@Stephen Tashi

Thanks for your answer. So from I understand, the quantity ##w(x_k)## would correspond to :

##w(x_{k,i})=\bigg(\dfrac{\partial \log(f(x_{k})}{\partial \theta_{i}}\bigg)##

and if I assume that : ##f(x_1,x_2,.. x_n) = f(x_1) f(x_2) .. f(x_n)##, I get for example :

##E[w_i] = \int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_{k,i}) dx_1 dx_2 ...dx_n##

which can be applied by adding a second quantity ##h_{l,j}##.

However, I have 2 last requests : in my first post, I have calculated that

##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}+ \dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\quad(6)##

1) If I follow your reasoning, I should write for the first term of ##(6)##:

##E[\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}] = -\int \int ...\int f(x_1) f(x_2) .. f(x_n) \sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)\,dx_1 dx_2 ...dx_n##

I don't know how to deal with the summation ##\sum_{k}## on ##
\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it as :

##\bigg(\dfrac{\partial \log(\Pi_k f_2(x_{k})}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log(\Pi_l f(x_{l})}{\partial \theta_{j}}\bigg)##

?

Sorry if this is evident for some of you ...

2) Morevoer, to make vanish the second term of ##(6)##, I divide for each ##k-th## iteration of sum the total joint density probability by ##f(x_k)## : is it enough to justify that :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\int \int{i\neq k} ...\int f(x_1) f(x_{i\neq k}) .. f(x_n)\,\bigg(\sum_{k}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,\bigg)\,dx_1 dx_2 ...dx_n##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated with the multiplication by ##f(x_k)##, so this way, we keep the relation :

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Regards

Stephen Tashi · Mar 26, 2020

fab13 said:

I don't know how to deal with the summation ##\sum_{k}## on ##
\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it as :

##\bigg(\dfrac{\partial \log(\Pi_k f_2(x_{k})}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log(\Pi_l f(x_{l})}{\partial \theta_{j}}\bigg)##

?

To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

For ##r \ne s##, ##E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})## since ##x_r, x_s## are independent random variables.

So the only nonzero terms on the left side of 1) are those of the above form when ##r = s##.

2) Morevoer, to make vanish the second term of ##(6)##, I divide for each ##k-th## iteration of sum the total joint density probability by ##f(x_k)## : is it enough to justify that :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\int \int{i\neq k} ...\int f(x_1) f(x_{i\neq k}) .. f(x_n)\,\bigg(\sum_{k}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,\bigg)\,dx_1 dx_2 ...dx_n##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated with the multiplication by ##f(x_k)##, so this way, we keep the relation :

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Yes, I agree. However, it never hurts to check abstract arguments about summations by writing out a particular case such as ##n = 2##

fab13 · Mar 28, 2020

1)

Stephen Tashi said:

To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

I don't understand the last part, i.e when you say :

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

Under which conditions have we got this expression ?

Moreover, when you talk about "left side of 1)", you talk about the calculation of left side of this relation :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e, ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

The ideal would be to introduce a Kronecker symbol ##\delta_{rs}## to get this expression :

##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))\,\delta_{rs}##

But I don't know how to justify that ##h(r)\,w(s)=0## with ##r\neq s## or introduce a Kronecker symbol.

@Stephen Tashi : please if you could develop your reasoning, this woul be nice.

2)

On the other side, I understand well the relation :

##E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})##

since ##\dfrac{\partial log(f(x_r))}{\partial \theta_i}## and ##\dfrac{\partial log(f(x_s))}{\partial \theta_j}## are indepedant variables.

3)

Just a little remark : Why I make things complicated in this demo ? :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\int \int{i\neq k} ...\int f(x_1) f(x_{i\neq k}) .. f(x_n)\,\bigg(\sum_{k}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,\bigg)\,dx_1 dx_2 ...dx_n##

I should have directly swap ##\sum_k## and ##E\bigg[...\bigg]##, this way, I could write directly :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_k\,E\bigg[...\bigg] =\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

Regards

Stephen Tashi · Mar 28, 2020

fab13 said:

Moreover, when you talk about "left side of 1)", you talk about the calculation of left side of this relation :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e, ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

Yes.

I suggest you take the case of two observations ##x_1, x_2## and write out the left hand side of 1).

fab13 said:

I should have directly swap ∑k\sum_k and E

Yes.

Apply that idea to the left hand side of 1). You will be taking the expectation in sums that involve terms like:

##E(\frac{\partial log(f(x_1))}{\partial \theta_i} \frac{\partial log(f(x_2))}{\partial \theta_j} )##
## = E(\frac{\partial log(f(x_1))}{\partial \theta_i}) E( \frac{\partial log(f(x_2))}{\partial \theta_j} )##
## = (0)(0) = 0 ##

fab13 · Mar 28, 2020

@Stephen Tashi

I think I have finally understood, maybe by writing :

Given assimilating ##X_k## and ##Y_l## to :

##X_k=\dfrac{\partial log(f(x_k))}{\partial \theta_i}##
##Y_l=\dfrac{\partial log(f(x_l))}{\partial \theta_j}##

(##E[X_r]=0## and ## E[Y_s]=0)\,\implies\,(E[\sum_k X_k]=0## and ##E[\sum_l Y_l]=0##) ##\implies (E[\sum_k\sum_l\,X_k Y_l]=\sum_k\sum_l\,E[ X_k Y_l]=\sum_k\,E[ X_k Y_k] = E[\sum_k X_k Y_k])##,

since ##E[X_k\,Y_k])\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

thanks

Stephen Tashi · Mar 28, 2020

fab13 said:

since ##E[X_k\,Y_k])\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

Yes.

Relation with Hessian and Log-likelihood

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Graduate Probability puzzle

Undergrad The countability paradox of computable numbers

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Relation with Hessian and Log-likelihood

Similar threads