Relation with Hessian and Log-likelihood

Summary:

I would like a rigorous demonstration of relation (1), given at the beginning of my post, which involves the Hessian and the general expression of the log-likelihood.

Main Question or Discussion Point

I would like to demonstrate the equation $(1)$ below in the general form of the Log-likelihood :

$E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)$

with the $\log$ of Likelihood $\mathcal{L}$ defined by $\mathcal{L} = \log\bigg(\Pi_{i} f(x_{i})\bigg)$ with $x_{i}$ all experimental/observed values.

For the moment, if I start from the second derivative (left member of $(1)$), I can get :

$\dfrac{\partial \mathcal{L}}{\partial \theta_{i}} = \dfrac{\partial \log\big(\Pi_{k} f(x_{k})\big)}{\partial \theta_{i}} = \dfrac{\big(\partial \sum_{k} \log\,f(x_{k})\big)}{\partial \theta_{i}} =\sum_{k} \dfrac{1}{f(x_{k})} \dfrac{\partial f(x_{k})}{\partial \theta_{i}}$

Now I have to compute : $\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\dfrac{\partial}{\partial \theta_j} \left(\sum_{k}\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}} \right)$
$= \sum_{k} \bigg(-\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}+\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{ \partial \theta_i \partial \theta_j}\bigg)$
$=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}- \dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)$

When we take the expectation of both sides, the second term vanishes under regularity conditions, i.e. :

$E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)$

The second term can be expressed as :

$\int\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\text{d}x_k =\dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\int f(x_{k})\text{d}x_k=0$

since $\int f(x_{k})\,\text{d}x_k = 1$

Finally, I get the relation :

$E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\,\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k$

$=\int \sum_{k}\,f(x_k) \bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}x_k\quad\quad(3)$

But I don't know how to make equation $(3)$ equal to :

$\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k$

$=\int \sum_{k}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\sum_{l}\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k$

$=\int \sum_k f(x_k) \bigg(\dfrac{\partial \log(\Pi_{k}f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(\Pi_{l}f(x_{l}))}{\partial \theta_{j}}\bigg)\,\text{d}x_k\quad\quad(4)$
$=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]$

I just want to prove the equality between (3) and (4) : where is my error ?

IMPORTANT UPDATE : I realized that I made an error in the calculation of the expectation in equation $(2)$, when I wrote :

$E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)$

Indeed, I should not integrate over the variable $x_{k}$ but rather over the parameters $(\theta_i, \theta_j)$.

But if I do this, to compute the expectation, it seems that I need the joint distribution $f(x_k) = f(x_k, \theta_i, \theta_j)$.

So I guess we can rewrite this joint distribution as if $\theta_i$ and $\theta_j$ were independent, i.e :

$f(x_k, \theta_i, \theta_j)= f_1(x_k, \theta_i)\quad f_2(x_k, \theta_j)$

This way, I could have :

$E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\int \sum_{k} f_1(x_k,\theta_i)\quad f_2(x_k,\theta_j)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f_1(x_{k})}{\partial \theta_{j}}\dfrac{\partial f_2(x_{k})}{\partial \theta_{i}}\bigg)\text{d}\theta_i\,\text{d}\theta_j$

$E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k$

So finally, I could obtain from equation (2) :

$E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int \int \sum_{k} f_1(x_k,\theta_i,\theta_j)\bigg(\dfrac{\partial \log(f_2(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f_1(x_{k}))}{\partial \theta_{j}}\bigg)\text{d}\theta_i \text{d}\theta_j$

$=\int \int f_1(x_k,\theta_i) f_2(x_l,\theta_j) \bigg(\dfrac{\partial \sum_{k} \log(f_1(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \sum_l \log(f_2(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_i \text{d}\theta_j\quad(5)$

$= \int\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f_1(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log(f_2(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_j$

$=\int\int f_1(x_k, \theta_i, \theta_j) \bigg(\dfrac{\partial \log(\Pi_k f_2(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log(\Pi_l f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_j$

$=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]$

But I have difficulties with the step involving the two sums in equation $(5)$, $\sum_k$ and $\sum_l$, which I introduced above without justification. Moreover, are my calculations of the expectation correct (I mean integrating over $\theta_i$ and $\theta_j$) ?

If someone could help me, this would be nice.

Regards

Stephen Tashi
Indeed, I should not integrate over the variable $x_{k}$ but rather over the parameters $(\theta_i, \theta_j)$.
How can we integrate with respect to the $\theta_i$ without having a prior joint distribution for them?

The notes: https://mervyn.public.iastate.edu/stat580/Notes/s09mle.pdf indicate that the expectation is taken with respect to the variables representing the observations.

Stephen Tashi

I would like to demonstrate the equation $(1)$ below in the general form of the Log-likelihood :

$E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)$

with the $\log$ of Likelihood $\mathcal{L}$ defined by $\mathcal{L} = \log\bigg(\Pi_{i} f(x_{i})\bigg)$ with $x_{i}$ all experimental/observed values.

For the moment, if I start from the second derivative (left member of $(1)$), I can get :
You mean "the right member" of (1).

But I don't know how to make equation $(3)$ equal to :

$\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k$
One thought is that for $k \ne l$, the random variables $\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}$ and $\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j} }$ are independent. So the expected value of their product is the product of their expected values. Is each expected value equal to zero?

@Stephen Tashi Yes, sorry, I meant the "right member" when I talk about the second derivative.

Don't forget that I have added an important UPDATE: as you can see there, I have to integrate over $\theta_i$ and $\theta_j$ and not over $x_k$.

Have you got a clue/track/suggestion ? Regards

Stephen Tashi
Don't forget that I have added an important UPDATE: as you can see there, I have to integrate over $\theta_i$ and $\theta_j$ and not over $x_k$.

I'm curious why you think eq. 1 is correct when you interpret it as asking for taking the expectation with respect to $\theta_i, \theta_j$.

@Stephen Tashi

Do you mean that I confused ($\theta,\theta'$) with ($\theta_i,\theta_j$) ? Or maybe that equation (1) is false ?

If this is the case, which approach do you suggest to establish relation (1) ? As I said above, I don't know how to justify going from step (5) to the next step : there is something wrong in my attempted proof and I don't know where it comes from.

Any help would be nice, I begin to despair ... Regards

Stephen Tashi
@Stephen Tashi

Do you mean that I confused ($\theta,\theta'$) with ($\theta_i,\theta_j$) ? Or maybe that equation (1) is false ?
I mean that equations like eq. 1 that I've seen (for the Fisher Information Matrix) say that the expectation is taken with respect to the observations $x_i$. The expectations are not taken with respect to the parameters $\theta_i$. Where did you see eq. 1?

$E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)$
Is that notation equivalent to:
$E\Big[\frac{\partial \mathcal{L}}{\partial \theta_r} \frac{\partial \mathcal{L}}{\partial \theta_s}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta_r \partial \theta_s}\Big]\quad(1)$

If so, I doubt eq. 1 is always true when expectations are taken with respect to $\theta_r, \theta_s$.

But I don't know how to make equation $(3)$ equal to :

$\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k$
I don't understand your notation for taking the expected value of a function. The expected value of $w(x_1,x_2,...,x_n)$ with respect to the joint density $p(x_1,x_2,....x_n) = \Pi_{k} f(x_k)$ should be a multiple integral: $\int \int ....\int p(x_1,x_2,....x_n) w(x_1,x_2,....x_n) dx_1 dx_2 ...dx_n$

How did you get an expression involving only $dx_k$?

It seems to me that the terms involved have a pattern like $\int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_k) h(x_j) dx_1 dx_2 ...dx_n =$
$\int \int f(x_k) w(x_k) f(x_j) h(x_j) dx_k dx_j = (\int f(x_k) w(x_k) dx_k) ( \int f(x_j) h(x_j) dx_j)$

@Stephen Tashi

Thanks for your answer. So from what I understand, the quantity $w(x_k)$ would correspond to :

$w(x_{k,i})=\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)$

and if I assume that : $f(x_1,x_2,.. x_n) = f(x_1) f(x_2) .. f(x_n)$, I get for example :

$E[w_i] = \int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_{k,i}) dx_1 dx_2 ...dx_n$

which can be applied by adding a second quantity $h_{l,j}$.

However, I have two last questions : in my first post, I calculated that

$\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}- \dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\quad(6)$

1) If I follow your reasoning, I should write for the first term of $(6)$:

$E[\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}] = -\int \int ...\int f(x_1) f(x_2) .. f(x_n) \sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)\,dx_1 dx_2 ...dx_n$

I don't know how to deal with the summation $\sum_{k}$ over the $\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)$ terms, in order to convert it into :

$\bigg(\dfrac{\partial \log(\Pi_k f(x_{k}))}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log(\Pi_l f(x_{l}))}{\partial \theta_{j}}\bigg)$

???

Sorry if this is obvious to some of you ...

2) Moreover, to make the second term of $(6)$ vanish, for each $k$-th term of the sum I divide the joint probability density by $f(x_k)$ : is that enough to justify that :

$E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \cdots \int \bigg(\prod_{m\neq k} f(x_m)\bigg)\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 \cdots dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \cdots \int f(x_1) f(x_{2}) \cdots f(x_n)\,dx_1 dx_2 \cdots dx_n\bigg)$

$=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0$

The division by $f(x_k)$ is compensated with the multiplication by $f(x_k)$, so this way, we keep the relation :

$\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1$

don't we ?

Is this reasoning correct ?

Regards

Stephen Tashi
I don't know how to deal with the summation $\sum_{k}$ over the $\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)$ terms, in order to convert it into :

$\bigg(\dfrac{\partial \log(\Pi_k f(x_{k}))}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log(\Pi_l f(x_{l}))}{\partial \theta_{j}}\bigg)$

???
To do that directly would involve introducing terms that are zero so that $\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))$ where terms of the form $h(r)w(s)$ are zero except when $r = s$.

It's more natural to begin with the left side of 1) where the pattern $(\sum_r h(r))(\sum_s w(s))$ appears and show that the terms in that expression are zero when $r \ne s$.

The basic ideas are that
$E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0$

For $r \ne s$, $E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})$ since $x_r, x_s$ are independent random variables.

So the only nonzero terms on the left side of 1) are those of the above form when $r = s$.
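
A brief sketch of why the first of these expectations vanishes, assuming the regularity condition that the derivative with respect to $\theta_i$ can be moved outside the integral over $x_k$ (for instance, the support of $f$ does not depend on $\theta$):

$E\Big(\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big) = \int f(x_k)\,\dfrac{1}{f(x_k)}\,\dfrac{\partial f(x_k)}{\partial \theta_i}\,dx_k = \int \dfrac{\partial f(x_k)}{\partial \theta_i}\,dx_k = \dfrac{\partial}{\partial \theta_i}\int f(x_k)\,dx_k = \dfrac{\partial 1}{\partial \theta_i} = 0$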

2) Moreover, to make the second term of $(6)$ vanish, for each $k$-th term of the sum I divide the joint probability density by $f(x_k)$ : is that enough to justify that :

$E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \cdots \int \bigg(\prod_{m\neq k} f(x_m)\bigg)\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 \cdots dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \cdots \int f(x_1) f(x_{2}) \cdots f(x_n)\,dx_1 dx_2 \cdots dx_n\bigg)$

$=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0$

The division by $f(x_k)$ is compensated with the multiplication by $f(x_k)$, so this way, we keep the relation :

$\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1$

don't we ?

Is this reasoning correct ?
Yes, I agree. However, it never hurts to check abstract arguments about summations by writing out a particular case such as $n = 2$
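
For instance, a sketch of the case $n = 2$ for the second term, assuming the derivatives can be exchanged with the integrals (the regularity condition) and that $x_1, x_2$ are independent with the same density $f$:

$E\bigg[\dfrac{1}{f(x_1)}\dfrac{\partial^{2} f(x_1)}{\partial \theta_i \partial \theta_j} + \dfrac{1}{f(x_2)}\dfrac{\partial^{2} f(x_2)}{\partial \theta_i \partial \theta_j}\bigg] = \int\int f(x_1)\,f(x_2)\bigg(\dfrac{1}{f(x_1)}\dfrac{\partial^{2} f(x_1)}{\partial \theta_i \partial \theta_j} + \dfrac{1}{f(x_2)}\dfrac{\partial^{2} f(x_2)}{\partial \theta_i \partial \theta_j}\bigg)\,dx_1\,dx_2$

$= \bigg(\int \dfrac{\partial^{2} f(x_1)}{\partial \theta_i \partial \theta_j}\,dx_1\bigg)\bigg(\int f(x_2)\,dx_2\bigg) + \bigg(\int f(x_1)\,dx_1\bigg)\bigg(\int \dfrac{\partial^{2} f(x_2)}{\partial \theta_i \partial \theta_j}\,dx_2\bigg)$

$= \dfrac{\partial^{2} 1}{\partial \theta_i \partial \theta_j}\cdot 1 + 1\cdot\dfrac{\partial^{2} 1}{\partial \theta_i \partial \theta_j} = 0$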

1)
To do that directly would involve introducing terms that are zero so that $\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))$ where terms of the form $h(r)w(s)$ are zero except when $r = s$.

It's more natural to begin with the left side of 1) where the pattern $(\sum_r h(r))(\sum_s w(s))$ appears and show that the terms in that expression are zero when $r \ne s$.

The basic ideas are that
$E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0$
I don't understand the last part, i.e when you say :

The basic ideas are that
$E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0$
Under which conditions does this expression hold ?

Moreover, when you talk about "left side of 1)", you talk about the calculation of left side of this relation :

$E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)$

i.e, $E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]$ ???

Ideally, one would introduce a Kronecker symbol $\delta_{rs}$ to get this expression :

$\sum_k h(k)w(k) = \sum_r \sum_s h(r)\,w(s)\,\delta_{rs}$

But I don't know how to justify that $h(r)\,w(s)=0$ with $r\neq s$ or introduce a Kronecker symbol.

@Stephen Tashi : if you could please elaborate on your reasoning, this would be nice.

2)

On the other side, I understand well the relation :

$E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})$

since $\dfrac{\partial \log(f(x_r))}{\partial \theta_i}$ and $\dfrac{\partial \log(f(x_s))}{\partial \theta_j}$ are independent variables.

3)

Just a little remark : why did I make things so complicated in this proof ? :

$E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \cdots \int \bigg(\prod_{m\neq k} f(x_m)\bigg)\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 \cdots dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \cdots \int f(x_1) f(x_{2}) \cdots f(x_n)\,dx_1 dx_2 \cdots dx_n\bigg)$

I should have directly swapped $\sum_k$ and $E\bigg[...\bigg]$; this way, I could write directly :

$E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_k\,E\bigg[...\bigg] =\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0$

Regards

Stephen Tashi
Moreover, when you talk about "left side of 1)", you talk about the calculation of left side of this relation :

$E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)$

i.e, $E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]$ ???
Yes.

I suggest you take the case of two observations $x_1, x_2$ and write out the left hand side of 1).

I should have directly swapped $\sum_k$ and $E\bigg[...\bigg]$
Yes.

Apply that idea to the left hand side of 1). You will be taking the expectation in sums that involve terms like:

$E(\frac{\partial log(f(x_1))}{\partial \theta_i} \frac{\partial log(f(x_2))}{\partial \theta_j} )$
$= E(\frac{\partial log(f(x_1))}{\partial \theta_i}) E( \frac{\partial log(f(x_2))}{\partial \theta_j} )$
$= (0)(0) = 0$
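
The argument can also be checked numerically. The following is only a sketch under assumptions that are not taken from the thread: it picks a normal density $f(x) = N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma)$, uses hand-derived first and second derivatives of $\log f$, and estimates both sides of (1) by averaging over simulated observations $x_1, \ldots, x_n$. The function names `score`, `hessian` and `check_identity` are hypothetical.

```python
import numpy as np

def score(x, mu, sigma):
    """Gradient of the log-likelihood w.r.t. (mu, sigma), one row per simulated sample.

    x has shape (trials, n): each row is one set of n i.i.d. N(mu, sigma^2) observations.
    """
    z = x - mu
    d_mu = np.sum(z / sigma**2, axis=1)
    d_sigma = np.sum(-1.0 / sigma + z**2 / sigma**3, axis=1)
    return np.stack([d_mu, d_sigma], axis=1)  # shape (trials, 2)

def hessian(x, mu, sigma):
    """Second derivatives of the log-likelihood, one 2x2 matrix per row of x."""
    z = x - mu
    trials, n = x.shape
    h = np.empty((trials, 2, 2))
    h[:, 0, 0] = -n / sigma**2
    h[:, 0, 1] = h[:, 1, 0] = np.sum(-2.0 * z / sigma**3, axis=1)
    h[:, 1, 1] = np.sum(1.0 / sigma**2 - 3.0 * z**2 / sigma**4, axis=1)
    return h

def check_identity(mu=1.0, sigma=2.0, n=3, trials=200_000, seed=0):
    """Monte Carlo estimates of both sides of (1); the average is over the observations."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=(trials, n))
    s = score(x, mu, sigma)
    lhs = s.T @ s / trials                     # estimate of E[score score^T]
    rhs = -hessian(x, mu, sigma).mean(axis=0)  # estimate of E[-Hessian]
    exact = n * np.diag([1.0 / sigma**2, 2.0 / sigma**2])  # Fisher information of this model
    return lhs, rhs, exact

lhs, rhs, exact = check_identity()
print(lhs)
print(rhs)
print(exact)
```

Up to Monte Carlo noise, both estimates should come out close to $n\,\mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$, the Fisher information of this particular model. Note that the averaging is done over the observations, with the parameters held fixed.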

@Stephen Tashi

I think I have finally understood, maybe by writing :

Identifying $X_k$ and $Y_l$ with :

$X_k=\dfrac{\partial \log(f(x_k))}{\partial \theta_i}$
$Y_l=\dfrac{\partial \log(f(x_l))}{\partial \theta_j}$

($E[X_k]=0$ and $E[Y_l]=0$, with $X_k$ and $Y_l$ independent for $k\neq l$) $\implies$ ($E[X_k Y_l]=E[X_k]\,E[Y_l]=0$ for $k\neq l$) $\implies (E[\sum_k\sum_l\,X_k Y_l]=\sum_k\sum_l\,E[ X_k Y_l]=\sum_k\,E[ X_k Y_k] = E[\sum_k X_k Y_k])$,

since $E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]$ (no more independence between $X_k$ and $Y_k$).

Is this correct ?

thanks

Stephen Tashi
since $E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]$ (no more independence between $X_k$ and $Y_k$).