Relation between the Hessian and the Log-likelihood

In summary, the conversation discusses the demonstration of an identity for the general log-likelihood: the expectation of the product of the first partial derivatives of the log-likelihood (the score) equals the expectation of the negative second partial derivatives (the Hessian). The conversation goes through the steps of calculating the derivatives and expectations, with some errors being corrected along the way. The final result matches the desired expression, but further clarification is needed for some of the intermediate steps.
  • #1
fab13
TL;DR Summary
I would like to get a rigorous demonstration of relation (1) given at the beginning of my post, involving the Hessian and the general expression of the log-likelihood.
I would like to demonstrate equation [itex](1)[/itex] below for the general form of the log-likelihood :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the [itex]\log[/itex]-likelihood [itex]\mathcal{L}[/itex] defined by [itex]\mathcal{L} = \log\big(\Pi_{i} f(x_{i})\big)[/itex], where the [itex]x_{i}[/itex] are all the experimental/observed values.
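As a quick numerical sanity check of (1) before the derivation (not a proof, and with the model chosen only for illustration): a minimal Monte Carlo sketch for a single observation drawn from ##N(\mu,\sigma^2)##, with the score and Hessian of ##\log f## written out by hand for that model. Both averages should approach the Fisher information ##\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)##.

[CODE=python]
import numpy as np

# Sanity check of (1) for a single observation x ~ N(mu, sigma^2).
# Score and Hessian of log f are written out analytically for this model.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# score = (d log f / d mu, d log f / d sigma)
s_mu = (x - mu) / sigma**2
s_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
scores = np.stack([s_mu, s_sigma], axis=1)

# Hessian of log f with respect to (mu, sigma)
h_mumu = np.full_like(x, -1.0 / sigma**2)
h_musigma = -2.0 * (x - mu) / sigma**3
h_sigmasigma = 1.0 / sigma**2 - 3.0 * (x - mu) ** 2 / sigma**4

lhs = scores.T @ scores / len(x)                             # E[score score']
rhs = -np.array([[h_mumu.mean(), h_musigma.mean()],
                 [h_musigma.mean(), h_sigmasigma.mean()]])   # -E[Hessian]
fisher = np.array([[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]])

print(lhs)     # ~ [[0.25, 0], [0, 0.5]]
print(rhs)     # ~ the same matrix
print(fisher)  # exact Fisher information
[/CODE]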

For the moment, if I start from the second derivative (the left member of [itex](1)[/itex]), I get :

##\dfrac{\partial \mathcal{L}}{\partial \theta_{i}} = \dfrac{\partial \log\big(\Pi_{k} f(x_{k})\big)}{\partial \theta_{i}} = \dfrac{\big(\partial \sum_{k} \log\,f(x_{k})\big)}{\partial \theta_{i}}
=\sum_{k} \dfrac{1}{f(x_{k})} \dfrac{\partial f(x_{k})}{\partial \theta_{i}}##

Now I have to compute : ##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\dfrac{\partial}{\partial \theta_j} \left(\sum_{k}\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}} \right)##
##= \sum_{k} \bigg(-\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}+\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{ \partial \theta_i \partial \theta_j}\bigg)##
##=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-
\dfrac{1}{f(x_{k})}
\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)##
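To double-check the algebra above (in particular the signs), here is a small symbolic verification of the per-observation identity ##\frac{\partial^{2} \log f}{\partial \theta_i \partial \theta_j} = -\frac{\partial \log f}{\partial \theta_i}\frac{\partial \log f}{\partial \theta_j} + \frac{1}{f}\frac{\partial^{2} f}{\partial \theta_i \partial \theta_j}##, using a normal density only as an illustrative example:

[CODE=python]
import sympy as sp

# Symbolic check of
#   d^2 log f / (d mu d sigma)
#     = -(d log f/d mu)(d log f/d sigma) + (1/f) d^2 f/(d mu d sigma)
# for a concrete density: the N(mu, sigma^2) pdf.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
f = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sp.sqrt(2 * sp.pi) * sigma)

lhs = sp.diff(sp.log(f), mu, sigma)
rhs = -sp.diff(sp.log(f), mu) * sp.diff(sp.log(f), sigma) + sp.diff(f, mu, sigma) / f

print(sp.simplify(lhs - rhs))  # prints 0
[/CODE]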

As we take the expectation of both sides, the second term vanishes under regularity conditions, i.e. :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

The second term can be expressed as :

##\int\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\text{d}x_k =\dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\int f(x_{k})\text{d}x_k=0##

since [itex]\int f(x_{k})\,\text{d}x_k = 1[/itex]
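As an illustration of this interchange of derivative and integral (a numerical sketch only, again using the ##N(\mu,\sigma^2)## density with its mixed partial written out analytically):

[CODE=python]
import numpy as np
from scipy.integrate import quad

# Numerical check that int d^2 f/(d mu d sigma) dx = 0 for the N(mu, sigma^2) pdf.
# The mixed partial derivative is written out analytically for this density.
mu, sigma = 1.0, 2.0

def d2f_dmu_dsigma(x):
    z = x - mu
    f = np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return f * (z**3 / sigma**5 - 3 * z / sigma**3)

val, err = quad(d2f_dmu_dsigma, -np.inf, np.inf)
print(val)  # ~ 0 (up to quadrature error)
[/CODE]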

Finally, I get the relation :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\,\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

##=\int \sum_{k}\,f(x_k) \bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}x_k\quad\quad(3)##

But I don't know how to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_{k}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\sum_{l}\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_k f(x_k) \bigg(\dfrac{\partial \log\big(\Pi_{k}f(x_{k})\big)}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log\big(\Pi_{l}f(x_{l})\big)}{\partial \theta_{j}}\bigg)\,\text{d}x_k\quad\quad(4)##
##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

I just want to prove the equality between (3) and (4): where is my error?

IMPORTANT UPDATE : I realized that I made an error in the calculation of the expectation in equation [itex](2)[/itex], when I wrote :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

Indeed, I should not integrate over the variable [itex]x_{k}[/itex] but rather over the parameters [itex](\theta_i, \theta_j)[/itex].

But if I do this, to compute the expectation, it seems that I need the joint distribution [itex]f(x_k, \theta_i, \theta_j)[/itex].

So I guess we can rewrite this joint distribution as if [itex]\theta_i[/itex] and [itex]\theta_j[/itex] were independent, i.e. :

##f(x_k, \theta_i, \theta_j)= f_1(x_k, \theta_i)\quad f_2(x_k, \theta_j)##


This way, I could have :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\int \sum_{k} f_1(x_k,\theta_i)\, f_2(x_k,\theta_j)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f_1(x_{k})}{\partial \theta_{i}}\dfrac{\partial f_2(x_{k})}{\partial \theta_{j}}\bigg)\text{d}\theta_i\,\text{d}\theta_j##

instead of :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

So finally, I could obtain from equation (2) :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int \int \sum_{k} f(x_k,\theta_i,\theta_j)\bigg(\dfrac{\partial \log(f_1(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f_2(x_{k}))}{\partial \theta_{j}}\bigg)\text{d}\theta_i\, \text{d}\theta_j##

##=\int \int f_1(x_k,\theta_i)\, f_2(x_l,\theta_j) \bigg(\dfrac{\partial \sum_{k} \log(f_1(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \sum_l \log(f_2(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_i\, \text{d}\theta_j\quad(5)##

##= \int\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f_1(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log(f_2(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_j ##

##=\int\int f(x_k, \theta_i, \theta_j) \bigg(\dfrac{\partial \log\big(\Pi_k f_1(x_{k})\big)}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log\big(\Pi_l f_2(x_{l})\big)}{\partial \theta_{j}}\bigg)\text{d}\theta_j##

##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

But I have difficulties with the step involving the two sums in equation [itex](5)[/itex]: the [itex]\sum_k[/itex] and [itex]\sum_l[/itex] that I introduce above without justification. Moreover, are my calculations of the expectation correct (I mean, integrating over [itex]\theta_i[/itex] and [itex]\theta_j[/itex])?

If someone could help me, that would be nice.

Regards
 
  • #2
fab13 said:
Indeed, I should not integrate over the variable [itex]x_{k}[/itex] but rather over the parameters [itex](\theta_i, \theta_j)[/itex].
How can we integrate with respect to the ##\theta_i## without having a prior joint distribution for them?

The notes: https://mervyn.public.iastate.edu/stat580/Notes/s09mle.pdf indicate that the expectation is taken with respect to the variables representing the observations.
 
  • #3
fab13 said:


I would like to demonstrate equation [itex](1)[/itex] below for the general form of the log-likelihood :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the [itex]\log[/itex]-likelihood [itex]\mathcal{L}[/itex] defined by [itex]\mathcal{L} = \log\big(\Pi_{i} f(x_{i})\big)[/itex], where the [itex]x_{i}[/itex] are all the experimental/observed values.

For the moment, if I start from the second derivative (the left member of [itex](1)[/itex]), I get :

You mean "the right member" of 1

But I don't know how to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

One thought is that for ##k \ne l##, the random variables ##\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}## and ##\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}## are independent. So the expected value of their product is the product of their expected values. Is each expected value equal to zero?
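A quick numerical illustration of both points, reusing the ##N(\mu,\sigma^2)## example from above (a sketch, not part of the argument):

[CODE=python]
import numpy as np

# For two independent observations x1, x2 ~ N(mu, sigma^2):
#   E[ s_mu(x1) * s_sigma(x2) ] = E[s_mu(x1)] * E[s_sigma(x2)] = 0 * 0 = 0   (k != l)
#   E[ s_mu(x1) * s_mu(x1)    ] = 1/sigma^2                                  (k == l)
rng = np.random.default_rng(1)
mu, sigma = 1.0, 2.0
x1 = rng.normal(mu, sigma, size=1_000_000)
x2 = rng.normal(mu, sigma, size=1_000_000)

s_mu = lambda x: (x - mu) / sigma**2
s_sigma = lambda x: -1.0 / sigma + (x - mu) ** 2 / sigma**3

print(np.mean(s_mu(x1)))                 # ~ 0    : E[score] vanishes
print(np.mean(s_mu(x1) * s_sigma(x2)))   # ~ 0    : cross term, k != l
print(np.mean(s_mu(x1) * s_mu(x1)))      # ~ 0.25 : diagonal term, k == l (= 1/sigma^2)
[/CODE]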
 
  • #4
@Stephen Tashi Yes, sorry, I meant the "right member" when I talk about the second derivative.

Don't forget that I added an important UPDATE above; as you can see, I have to integrate over ##\theta_i## and ##\theta_j## and not over ##x_k##.

Have you got a clue or suggestion? Regards
 
  • #5
fab13 said:
Don't forget that I added an important UPDATE above; as you can see, I have to integrate over ##\theta_i## and ##\theta_j## and not over ##x_k##.
I'm curious why you think eq. 1 is correct when you interpret it as asking for taking the expectation with respect to ##\theta_i, \theta_j##.
 
  • #6
@Stephen Tashi

Do you mean that I confused (##\theta,\theta'##) and (##\theta_i,\theta_j##)? Or maybe that equation (1) is false?

If this is the case, what approach do you suggest for settling relation (1)? As I said above, I don't know how to justify the passage from step (5) to the next step: there is something wrong in my attempted demonstration and I don't know where it comes from.

Any help would be nice; I am beginning to despair... Regards
 
  • #7
fab13 said:
@Stephen Tashi

Do you mean that I confused (##\theta,\theta'##) and (##\theta_i,\theta_j##)? Or maybe that equation (1) is false?

I mean that equations like eq. 1 that I've seen (for the Fisher Information Matrix) say that the expectation is taken with respect to the observations ##x_i##. The expectations are not taken with respect to the parameters ##\theta_i##. Where did you see eq. 1?
fab13 said:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

? Is that notation equivalent to:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta_r} \frac{\partial \mathcal{L}}{\partial \theta_s}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta_r \partial \theta_s}\Big]\quad(1)##

If so, I doubt eq. 1 is always true when expectations are taken with respect to ##\theta_r, \theta_s##.

But I don't know how to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

I don't understand your notation for taking the expected value of a function. The expected value of ##w(x_1,x_2,...,x_n)## with respect to the joint density ##p(x_1,x_2,...x_n) = \Pi_{k} f(x_k) ## should be a multiple integral: ##\int \int ...\int p(x_1,x_2,...x_n) w(x_1,x_2,...x_n) dx_1 dx_2 ...dx_n ##

How did you get an expression involving only ##dx_k##?

It seems to me that the terms involved have a pattern like ##\int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_k) h(x_j) dx_1 dx_2 ...dx_n = ##
## \int \int f(x_k) w(x_k) f(x_j) h(x_j) dx_k dx_j = (\int f(x_k) w(x_k) dx_k) ( \int f(x_j) h(x_j) dx_j)##
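A small numerical check of this factorization pattern, with the arbitrary illustrative choices ##w(x)=x## and ##h(y)=y^2## and the ##N(\mu,\sigma^2)## density (the point is only Fubini plus independence):

[CODE=python]
import numpy as np
from scipy.integrate import quad, dblquad

# Illustration of the factorization pattern above (Fubini for independent densities):
#   int int f(x) f(y) w(x) h(y) dx dy = (int f(x) w(x) dx) * (int f(y) h(y) dy)
# using the N(mu, sigma^2) pdf and the illustrative choices w(x) = x, h(y) = y^2.
mu, sigma = 1.0, 2.0
f = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
w = lambda x: x
h = lambda y: y**2

lhs, _ = dblquad(lambda y, x: f(x) * f(y) * w(x) * h(y), -30, 30, -30, 30)
rhs = quad(lambda x: f(x) * w(x), -30, 30)[0] * quad(lambda y: f(y) * h(y), -30, 30)[0]
print(lhs, rhs)  # both ~ mu * (sigma^2 + mu^2) = 5.0
[/CODE]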
 
  • #8
@Stephen Tashi

Thanks for your answer. So, from what I understand, the quantity ##w(x_k)## would correspond to :

##w(x_{k,i})=\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)##

and if I assume that ##f(x_1,x_2,.. x_n) = f(x_1) f(x_2) .. f(x_n)##, I get for example :

##E[w_i] = \int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_{k,i}) dx_1 dx_2 ...dx_n##

which can be applied by adding a second quantity ##h_{l,j}##.

However, I have two last questions: in my first post, I calculated that

##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}- \dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\quad(6)##

1) If I follow your reasoning, I should write for the first term of ##(6)##:

##E[\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}] = -\int \int ...\int f(x_1) f(x_2) .. f(x_n) \sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)\,dx_1 dx_2 ...dx_n##

I don't know how to deal with the summation ##\sum_{k}## over the ##\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it to :

##\bigg(\dfrac{\partial \log\big(\Pi_k f(x_{k})\big)}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log\big(\Pi_l f(x_{l})\big)}{\partial \theta_{j}}\bigg)##

?

Sorry if this is evident for some of you ...

2) Moreover, to make the second term of ##(6)## vanish, for each ##k##-th term of the sum I divide the total joint probability density by ##f(x_k)##: is that enough to justify that :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_m)\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated by the multiplication by ##f(x_k)##, so in this way we keep the relation :

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Regards
 
  • #9
fab13 said:
I don't know how to deal with the summation ##\sum_{k}## over the ##\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it to :

##\bigg(\dfrac{\partial \log\big(\Pi_k f(x_{k})\big)}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log\big(\Pi_l f(x_{l})\big)}{\partial \theta_{j}}\bigg)##

?

To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

For ##r \ne s##, ##E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})## since ##x_r, x_s## are independent random variables.

So the only nonzero terms on the left side of 1) are those of the above form when ##r = s##.
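For reference, the first of these facts follows from the same regularity argument used for the vanishing second term above (a short sketch, assuming the derivative and the integral can be interchanged):

##E\Big[\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big]=\int f(x_k)\,\dfrac{1}{f(x_k)}\,\dfrac{\partial f(x_k)}{\partial \theta_i}\,\text{d}x_k=\dfrac{\partial}{\partial \theta_i}\int f(x_k)\,\text{d}x_k=\dfrac{\partial}{\partial \theta_i}(1)=0##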

2) Moreover, to make the second term of ##(6)## vanish, for each ##k##-th term of the sum I divide the total joint probability density by ##f(x_k)##: is that enough to justify that :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_m)\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated by the multiplication by ##f(x_k)##, so in this way we keep the relation :

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Yes, I agree. However, it never hurts to check abstract arguments about summations by writing out a particular case such as ##n = 2##.
 
  • #10
1)
Stephen Tashi said:
To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

I don't understand the last part, i.e. when you say :

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

Under which conditions do we have this expression?

Moreover, when you talk about the "left side of 1)", do you mean the calculation of the left side of this relation :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e. ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

The ideal would be to introduce a Kronecker symbol ##\delta_{rs}## to get this expression :

##\sum_k h(k)w(k) = \sum_r\sum_s h(r)\,w(s)\,\delta_{rs}##

But I don't know how to justify that ##h(r)\,w(s)=0## for ##r\neq s##, or how to introduce a Kronecker symbol.

@Stephen Tashi : if you could develop your reasoning, that would be nice.

2)

On the other hand, I understand the relation :

##E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})##

since ##\dfrac{\partial log(f(x_r))}{\partial \theta_i}## and ##\dfrac{\partial log(f(x_s))}{\partial \theta_j}## are independent variables.

3)

Just a little remark: why did I make things so complicated in this demonstration?

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_m)\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

I should have directly swapped ##\sum_k## and ##E\bigg[...\bigg]##; this way, I could write directly :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_k\,E\bigg[...\bigg] =\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

Regards
 
  • #11
fab13 said:
Moreover, when you talk about the "left side of 1)", do you mean the calculation of the left side of this relation :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e. ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

Yes.

I suggest you take the case of two observations ##x_1, x_2## and write out the left hand side of 1).

fab13 said:
I should have directly swapped ##\sum_k## and ##E##

Yes.

Apply that idea to the left hand side of 1). You will be taking the expectation of sums that involve terms like:

##E(\frac{\partial log(f(x_1))}{\partial \theta_i} \frac{\partial log(f(x_2))}{\partial \theta_j} )##
## = E(\frac{\partial log(f(x_1))}{\partial \theta_i}) E( \frac{\partial log(f(x_2))}{\partial \theta_j} )##
## = (0)(0) = 0 ##
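Writing the suggested ##n = 2## case out in full (a sketch, with the shorthand ##s_i(x)=\partial \log f(x)/\partial\theta_i##):

##\dfrac{\partial \mathcal{L}}{\partial \theta_i}\dfrac{\partial \mathcal{L}}{\partial \theta_j}=\big(s_i(x_1)+s_i(x_2)\big)\big(s_j(x_1)+s_j(x_2)\big)##

##E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i}\dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]=E[s_i(x_1)s_j(x_1)]+E[s_i(x_2)s_j(x_2)]+\underbrace{E[s_i(x_1)]\,E[s_j(x_2)]+E[s_i(x_2)]\,E[s_j(x_1)]}_{=\,0}=\sum_{k=1}^{2}E[s_i(x_k)s_j(x_k)]##

which is exactly the expression obtained for ##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]## in equation (3).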
 
  • #12
@Stephen Tashi

I think I have finally understood, maybe by writing :

Setting ##X_k## and ##Y_l## to:

##X_k=\dfrac{\partial log(f(x_k))}{\partial \theta_i}##
##Y_l=\dfrac{\partial log(f(x_l))}{\partial \theta_j}##

(##E[X_r]=0## and ##E[Y_s]=0##) ##\implies## (##E[\sum_k X_k]=0## and ##E[\sum_l Y_l]=0##) ##\implies## ##E\big[\sum_k\sum_l X_k Y_l\big]=\sum_k\sum_l E[X_k Y_l]=\sum_k E[X_k Y_k] = E\big[\sum_k X_k Y_k\big]##,

since ##E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

thanks
 
  • #13
fab13 said:
since ##E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

Yes.
 

1. What is the relationship between the Hessian and log-likelihood?

The Hessian matrix is a square matrix of second-order partial derivatives of the log-likelihood function. It is used to calculate the standard errors and confidence intervals of the estimated parameters in a statistical model. The log-likelihood function, on the other hand, is a measure of the goodness of fit of a model to a given set of data. The Hessian and log-likelihood are closely related, as the Hessian is used to optimize the log-likelihood function and find the maximum likelihood estimates of the model parameters.

2. How does the Hessian matrix affect the estimation of model parameters?

The Hessian matrix plays a crucial role in the estimation of model parameters. It is used to calculate the standard errors of the estimated parameters, which are important for assessing the precision and significance of the estimated values. The Hessian also helps to determine the shape of the log-likelihood function, which can provide information about the stability and convergence of the model estimation process.
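As a concrete illustration of this use of the Hessian (a minimal sketch with simulated data from an exponential model, chosen because its log-likelihood Hessian is available in closed form):

[CODE=python]
import numpy as np

# Standard error from the inverse of the negative Hessian (observed information),
# illustrated for the exponential model f(x; lam) = lam * exp(-lam * x).
# MLE: lam_hat = 1 / mean(x);  d^2 logL / d lam^2 = -n / lam^2  at the MLE.
rng = np.random.default_rng(2)
lam_true, n = 0.5, 5000
x = rng.exponential(scale=1 / lam_true, size=n)

lam_hat = 1.0 / x.mean()
observed_info = n / lam_hat**2          # = -Hessian of the log-likelihood at lam_hat
se = np.sqrt(1.0 / observed_info)       # standard error of lam_hat

print(lam_hat, se)  # lam_hat ~ 0.5, se ~ lam_hat / sqrt(n) ~ 0.007
[/CODE]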

3. Can the Hessian matrix be used to assess the fit of a model?

While the Hessian matrix is not directly a goodness-of-fit measure, it does provide valuable information about the quality of the estimation. At a proper maximum, the Hessian of the log-likelihood is negative definite; strong curvature (eigenvalues of large magnitude) indicates that the parameters are precisely determined, whereas a nearly singular Hessian signals identifiability problems or a failure of the optimization to converge.

4. How is the Hessian matrix related to the Fisher information matrix?

The Hessian matrix is closely related to the Fisher information matrix, as both are used to calculate the standard errors of the estimated parameters in a model. The Fisher information matrix is the expected value of the negative Hessian of the log-likelihood, and it quantifies the precision of the estimated parameters. In practice, the negative Hessian evaluated at the maximum likelihood estimate (the observed information) is often used as an approximation of the Fisher information matrix.

5. Are there any limitations to using the Hessian matrix in statistical modeling?

While the Hessian matrix is a useful tool in statistical modeling, it does have some limitations. One limitation is that it assumes that the log-likelihood function is well-behaved and has a unique maximum. If this is not the case, the Hessian matrix may not accurately estimate the standard errors of the parameters. Additionally, the Hessian matrix can be computationally intensive to calculate, especially for complex models with many parameters.
