Relation between the Hessian and the Log-likelihood

In summary, the conversation discusses the demonstration of an identity for the general log-likelihood: the expectation of the product of the first partial derivatives of the log-likelihood (the score) equals the expectation of the negative second partial derivatives (the Hessian). The conversation goes through the steps of calculating the derivatives and expectations, with some errors being corrected along the way. The final result matches the desired expression, but further clarification is needed for some of the intermediate steps.
  • #1
fab13
TL;DR Summary
I would like to get a rigorous demonstration of relation (1) given at the beginning of my post, involving the Hessian and the general expression of the log-likelihood.
I would like to demonstrate equation [itex](1)[/itex] below for the general form of the log-likelihood :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the [itex]\log[/itex]-likelihood [itex]\mathcal{L}[/itex] defined by [itex]\mathcal{L} = \log\big(\Pi_{i} f(x_{i})\big)[/itex], where the [itex]x_{i}[/itex] are all the experimental/observed values.
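As a quick numerical sanity check of (1) before the derivation (not a proof, and with the model chosen only for illustration): a minimal Monte Carlo sketch for a single observation drawn from ##N(\mu,\sigma^2)##, with the score and Hessian of ##\log f## written out by hand for that model. Both averages should approach the Fisher information ##\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)##.

[CODE=python]
import numpy as np

# Sanity check of (1) for a single observation x ~ N(mu, sigma^2).
# Score and Hessian of log f are written out analytically for this model.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# score = (d log f / d mu, d log f / d sigma)
s_mu = (x - mu) / sigma**2
s_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
scores = np.stack([s_mu, s_sigma], axis=1)

# Hessian of log f with respect to (mu, sigma)
h_mumu = np.full_like(x, -1.0 / sigma**2)
h_musigma = -2.0 * (x - mu) / sigma**3
h_sigmasigma = 1.0 / sigma**2 - 3.0 * (x - mu) ** 2 / sigma**4

lhs = scores.T @ scores / len(x)                             # E[score score']
rhs = -np.array([[h_mumu.mean(), h_musigma.mean()],
                 [h_musigma.mean(), h_sigmasigma.mean()]])   # -E[Hessian]
fisher = np.array([[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]])

print(lhs)     # ~ [[0.25, 0], [0, 0.5]]
print(rhs)     # ~ the same matrix
print(fisher)  # exact Fisher information
[/CODE]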

For the moment, if I start from the second derivative (the left member of [itex](1)[/itex]), I get :

##\dfrac{\partial \mathcal{L}}{\partial \theta_{i}} = \dfrac{\partial \log\big(\Pi_{k} f(x_{k})\big)}{\partial \theta_{i}} = \dfrac{\big(\partial \sum_{k} \log\,f(x_{k})\big)}{\partial \theta_{i}}
=\sum_{k} \dfrac{1}{f(x_{k})} \dfrac{\partial f(x_{k})}{\partial \theta_{i}}##

Now I have to compute : ##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\dfrac{\partial}{\partial \theta_j} \left(\sum_{k}\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}} \right)##
##= \sum_{k} \bigg(-\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}+\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{ \partial \theta_i \partial \theta_j}\bigg)##
##=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-
\dfrac{1}{f(x_{k})}
\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)##
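To double-check the algebra above (in particular the signs), here is a small symbolic verification of the per-observation identity ##\frac{\partial^{2} \log f}{\partial \theta_i \partial \theta_j} = -\frac{\partial \log f}{\partial \theta_i}\frac{\partial \log f}{\partial \theta_j} + \frac{1}{f}\frac{\partial^{2} f}{\partial \theta_i \partial \theta_j}##, using a normal density only as an illustrative example:

[CODE=python]
import sympy as sp

# Symbolic check of
#   d^2 log f / (d mu d sigma)
#     = -(d log f/d mu)(d log f/d sigma) + (1/f) d^2 f/(d mu d sigma)
# for a concrete density: the N(mu, sigma^2) pdf.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
f = sp.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sp.sqrt(2 * sp.pi) * sigma)

lhs = sp.diff(sp.log(f), mu, sigma)
rhs = -sp.diff(sp.log(f), mu) * sp.diff(sp.log(f), sigma) + sp.diff(f, mu, sigma) / f

print(sp.simplify(lhs - rhs))  # prints 0
[/CODE]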

As we take the expectation of both sides, the second term vanishes under regularity conditions, i.e. :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

The second term can be expressed as :

##\int\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\text{d}x_k =\dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\int f(x_{k})\text{d}x_k=0##

since [itex]\int f(x_{k})\,\text{d}x_k = 1[/itex]
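As an illustration of this interchange of derivative and integral (a numerical sketch only, again using the ##N(\mu,\sigma^2)## density with its mixed partial written out analytically):

[CODE=python]
import numpy as np
from scipy.integrate import quad

# Numerical check that int d^2 f/(d mu d sigma) dx = 0 for the N(mu, sigma^2) pdf.
# The mixed partial derivative is written out analytically for this density.
mu, sigma = 1.0, 2.0

def d2f_dmu_dsigma(x):
    z = x - mu
    f = np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return f * (z**3 / sigma**5 - 3 * z / sigma**3)

val, err = quad(d2f_dmu_dsigma, -np.inf, np.inf)
print(val)  # ~ 0 (up to quadrature error)
[/CODE]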

Finally, I get the relation :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\,\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

##=\int \sum_{k}\,f(x_k) \bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}x_k\quad\quad(3)##

But I don't know how to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_{k}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\sum_{l}\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

##=\int \sum_k f(x_k) \bigg(\dfrac{\partial \log\big(\Pi_{k}f(x_{k})\big)}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log\big(\Pi_{l}f(x_{l})\big)}{\partial \theta_{j}}\bigg)\,\text{d}x_k\quad\quad(4)##
##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

I just want to prove the equality between (3) and (4): where is my error?

IMPORTANT UPDATE : I realized that I made an error in the calculation of the expectation in equation [itex](2)[/itex], when I wrote :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}-\dfrac{1}{f(x_{k})}\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\text{d}x_k\quad\quad(2)##

Indeed, I should not integrate over the variable [itex]x_{k}[/itex] but rather over the parameters [itex](\theta_i, \theta_j)[/itex].

But if I do this, to compute the expectation, it seems that I need the joint distribution [itex]f(x_k, \theta_i, \theta_j)[/itex].

So I guess we can rewrite this joint distribution as if [itex]\theta_i[/itex] and [itex]\theta_j[/itex] were independent, i.e. :

##f(x_k, \theta_i, \theta_j)= f_1(x_k, \theta_i)\quad f_2(x_k, \theta_j)##


This way, I could have :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\int \sum_{k} f_1(x_k,\theta_i)\, f_2(x_k,\theta_j)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f_1(x_{k})}{\partial \theta_{i}}\dfrac{\partial f_2(x_{k})}{\partial \theta_{j}}\bigg)\text{d}\theta_i\,\text{d}\theta_j##

instead of :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int\sum_{k} f(x_k)\bigg(\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}\bigg)\text{d}x_k##

So finally, I could obtain from equation (2) :

##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]=\int \int \sum_{k} f(x_k,\theta_i,\theta_j)\bigg(\dfrac{\partial \log(f_1(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \log(f_2(x_{k}))}{\partial \theta_{j}}\bigg)\text{d}\theta_i\, \text{d}\theta_j##

##=\int \int f_1(x_k,\theta_i)\, f_2(x_l,\theta_j) \bigg(\dfrac{\partial \sum_{k} \log(f_1(x_{k}))}{\partial \theta_{i}}
\dfrac{\partial \sum_l \log(f_2(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_i\, \text{d}\theta_j\quad(5)##

##= \int\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f_1(x_{k}))}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log(f_2(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}\theta_j ##

##=\int\int f(x_k, \theta_i, \theta_j) \bigg(\dfrac{\partial \log\big(\Pi_k f_1(x_{k})\big)}{\partial \theta_{i}}\bigg)\text{d}\theta_i \bigg(\dfrac{\partial \log\big(\Pi_l f_2(x_{l})\big)}{\partial \theta_{j}}\bigg)\text{d}\theta_j##

##=E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i} \dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]##

But I have difficulties with the step involving the two sums in equation [itex](5)[/itex]: the [itex]\sum_k[/itex] and [itex]\sum_l[/itex] that I introduce above without justification. Moreover, are my calculations of the expectation correct (I mean, integrating over [itex]\theta_i[/itex] and [itex]\theta_j[/itex])?

If someone could help me, that would be nice.

Regards
 
  • #2
fab13 said:
Indeed, I should not integrate over the variable [itex]x_{k}[/itex] but rather over the parameters [itex](\theta_i, \theta_j)[/itex].
How can we integrate with respect to the ##\theta_i## without having a prior joint distribution for them?

The notes: https://mervyn.public.iastate.edu/stat580/Notes/s09mle.pdf indicate that the expectation is taken with respect to the variables representing the observations.
 
  • #3
fab13 said:


I would like to demonstrate equation [itex](1)[/itex] below for the general form of the log-likelihood :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

with the [itex]\log[/itex]-likelihood [itex]\mathcal{L}[/itex] defined by [itex]\mathcal{L} = \log\big(\Pi_{i} f(x_{i})\big)[/itex], where the [itex]x_{i}[/itex] are all the experimental/observed values.

For the moment, if I start from the second derivative (the left member of [itex](1)[/itex]), I get :

You mean "the right member" of 1

But I don't know how to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

One thought is that for ##k \ne l##, the random variables ##\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}## and ##\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}## are independent. So the expected value of their product is the product of their expected values. Is each expected value equal to zero?
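A quick numerical illustration of both points, reusing the ##N(\mu,\sigma^2)## example from above (a sketch, not part of the argument):

[CODE=python]
import numpy as np

# For two independent observations x1, x2 ~ N(mu, sigma^2):
#   E[ s_mu(x1) * s_sigma(x2) ] = E[s_mu(x1)] * E[s_sigma(x2)] = 0 * 0 = 0   (k != l)
#   E[ s_mu(x1) * s_mu(x1)    ] = 1/sigma^2                                  (k == l)
rng = np.random.default_rng(1)
mu, sigma = 1.0, 2.0
x1 = rng.normal(mu, sigma, size=1_000_000)
x2 = rng.normal(mu, sigma, size=1_000_000)

s_mu = lambda x: (x - mu) / sigma**2
s_sigma = lambda x: -1.0 / sigma + (x - mu) ** 2 / sigma**3

print(np.mean(s_mu(x1)))                 # ~ 0    : E[score] vanishes
print(np.mean(s_mu(x1) * s_sigma(x2)))   # ~ 0    : cross term, k != l
print(np.mean(s_mu(x1) * s_mu(x1)))      # ~ 0.25 : diagonal term, k == l (= 1/sigma^2)
[/CODE]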
 
  • #4
@Stephen Tashi Yes, sorry, I meant the "right member" when I talk about the second derivative.

Don't forget that I added an important UPDATE above; as you can see, I have to integrate over ##\theta_i## and ##\theta_j## and not over ##x_k##.

Have you got a clue or suggestion? Regards
 
  • #5
fab13 said:
Don't forget that I added an important UPDATE above; as you can see, I have to integrate over ##\theta_i## and ##\theta_j## and not over ##x_k##.
I'm curious why you think eq. 1 is correct when you interpret it as asking for taking the expectation with respect to ##\theta_i, \theta_j##.
 
  • #6
@Stephen Tashi

Do you mean that I confused (##\theta,\theta'##) and (##\theta_i,\theta_j##)? Or maybe that equation (1) is false?

If this is the case, what approach do you suggest for settling relation (1)? As I said above, I don't know how to justify the passage from step (5) to the next step: there is something wrong in my attempted demonstration and I don't know where it comes from.

Any help would be nice; I am beginning to despair... Regards
 
  • #7
fab13 said:
@Stephen Tashi

Do you mean that I confused (##\theta,\theta'##) and (##\theta_i,\theta_j##)? Or maybe that equation (1) is false?

I mean that equations like eq. 1 that I've seen (for the Fisher Information Matrix) say that the expectation is taken with respect to the observations ##x_i##. The expectations are not taken with respect to the parameters ##\theta_i##. Where did you see eq. 1?
fab13 said:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

? Is that notation equivalent to:
##E\Big[\frac{\partial \mathcal{L}}{\partial \theta_r} \frac{\partial \mathcal{L}}{\partial \theta_s}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta_r \partial \theta_s}\Big]\quad(1)##

If so, I doubt eq. 1 is always true when expectations are taken with respect to ##\theta_r, \theta_s##.

But I don't know how to make equation [itex](3)[/itex] equal to :

##\int \sum_{k}\sum_{l}f(x_k)\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log(f(x_{l}))}{\partial \theta_{j}}\bigg)\text{d}x_k##

I don't understand your notation for taking the expected value of a function. The expected value of ##w(x_1,x_2,...,x_n)## with respect to the joint density ##p(x_1,x_2,...x_n) = \Pi_{k} f(x_k) ## should be a multiple integral: ##\int \int ...\int p(x_1,x_2,...x_n) w(x_1,x_2,...x_n) dx_1 dx_2 ...dx_n ##

How did you get an expression involving only ##dx_k##?

It seems to me that the terms involved have a pattern like ##\int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_k) h(x_j) dx_1 dx_2 ...dx_n = ##
## \int \int f(x_k) w(x_k) f(x_j) h(x_j) dx_k dx_j = (\int f(x_k) w(x_k) dx_k) ( \int f(x_j) h(x_j) dx_j)##
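A small numerical check of this factorization pattern, with the arbitrary illustrative choices ##w(x)=x## and ##h(y)=y^2## and the ##N(\mu,\sigma^2)## density (the point is only Fubini plus independence):

[CODE=python]
import numpy as np
from scipy.integrate import quad, dblquad

# Illustration of the factorization pattern above (Fubini for independent densities):
#   int int f(x) f(y) w(x) h(y) dx dy = (int f(x) w(x) dx) * (int f(y) h(y) dy)
# using the N(mu, sigma^2) pdf and the illustrative choices w(x) = x, h(y) = y^2.
mu, sigma = 1.0, 2.0
f = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
w = lambda x: x
h = lambda y: y**2

lhs, _ = dblquad(lambda y, x: f(x) * f(y) * w(x) * h(y), -30, 30, -30, 30)
rhs = quad(lambda x: f(x) * w(x), -30, 30)[0] * quad(lambda y: f(y) * h(y), -30, 30)[0]
print(lhs, rhs)  # both ~ mu * (sigma^2 + mu^2) = 5.0
[/CODE]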
 
  • #8
@Stephen Tashi

Thanks for your answer. So, from what I understand, the quantity ##w(x_k)## would correspond to :

##w(x_{k,i})=\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\bigg)##

and if I assume that ##f(x_1,x_2,.. x_n) = f(x_1) f(x_2) .. f(x_n)##, I get for example :

##E[w_i] = \int \int ...\int f(x_1) f(x_2) .. f(x_n) w(x_{k,i}) dx_1 dx_2 ...dx_n##

which can be applied by adding a second quantity ##h_{l,j}##.

However, I have two last questions: in my first post, I calculated that

##\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=-\sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}- \dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg)\quad(6)##

1) If I follow your reasoning, I should write for the first term of ##(6)##:

##E[\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}] = -\int \int ...\int f(x_1) f(x_2) .. f(x_n) \sum_{k}\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}} \dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)\,dx_1 dx_2 ...dx_n##

I don't know how to deal with the summation ##\sum_{k}## over the ##\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it to :

##\bigg(\dfrac{\partial \log\big(\Pi_k f(x_{k})\big)}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log\big(\Pi_l f(x_{l})\big)}{\partial \theta_{j}}\bigg)##

?

Sorry if this is evident for some of you ...

2) Moreover, to make the second term of ##(6)## vanish, for each ##k##-th term of the sum I divide the total joint probability density by ##f(x_k)##: is that enough to justify that :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_m)\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated by the multiplication by ##f(x_k)##, so in this way we keep the relation :

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Regards
 
  • #9
fab13 said:
I don't know how to deal with the summation ##\sum_{k}## over the ##\bigg(\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{i}}\dfrac{\partial \log(f(x_{k}))}{\partial \theta_{j}}\,\bigg)## terms, in order to convert it to :

##\bigg(\dfrac{\partial \log\big(\Pi_k f(x_{k})\big)}{\partial \theta_{i}}\bigg)\,\bigg(\dfrac{\partial \log\big(\Pi_l f(x_{l})\big)}{\partial \theta_{j}}\bigg)##

?

To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

For ##r \ne s##, ##E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})## since ##x_r, x_s## are independent random variables.

So the only nonzero terms on the left side of 1) are those of the above form when ##r = s##.
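For reference, the first of these facts follows from the same regularity argument used for the vanishing second term above (a short sketch, assuming the derivative and the integral can be interchanged):

##E\Big[\dfrac{\partial \log f(x_k)}{\partial \theta_i}\Big]=\int f(x_k)\,\dfrac{1}{f(x_k)}\,\dfrac{\partial f(x_k)}{\partial \theta_i}\,\text{d}x_k=\dfrac{\partial}{\partial \theta_i}\int f(x_k)\,\text{d}x_k=\dfrac{\partial}{\partial \theta_i}(1)=0##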

2) Moreover, to make the second term of ##(6)## vanish, for each ##k##-th term of the sum I divide the total joint probability density by ##f(x_k)##: is that enough to justify that :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_m)\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

##=\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

The division by ##f(x_k)## is compensated by the multiplication by ##f(x_k)##, so in this way we keep the relation :

##\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n=1##

don't we ?

Is this reasoning correct ?

Yes, I agree. However, it never hurts to check abstract arguments about summations by writing out a particular case such as ##n = 2##.
 
  • #10
1)
Stephen Tashi said:
To do that directly would involve introducing terms that are zero so that ##\sum_k h(k)w(k) = (\sum_r h(r))(\sum_s w(s))## where terms of the form ##h(r)w(s)## are zero except when ##r = s##.

It's more natural to begin with the left side of 1) where the pattern ## (\sum_r h(r))(\sum_s w(s))## appears and show that the terms in that expression are zero when ##r \ne s##.

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

I don't understand the last part, i.e. when you say :

The basic ideas are that
##E (\ \dfrac{\partial log( f(x_k))}{\partial \theta_i}\ ) = 0##

Under which conditions do we have this expression?

Moreover, when you talk about the "left side of 1)", do you mean the calculation of the left side of this relation :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e. ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

The ideal would be to introduce a Kronecker symbol ##\delta_{rs}## to get this expression :

##\sum_k h(k)w(k) = \sum_r\sum_s h(r)\,w(s)\,\delta_{rs}##

But I don't know how to justify that ##h(r)\,w(s)=0## for ##r\neq s##, or how to introduce a Kronecker symbol.

@Stephen Tashi : if you could develop your reasoning, that would be nice.

2)

On the other hand, I understand the relation :

##E( \dfrac{\partial log(f(x_r))}{\partial \theta_i} \dfrac{\partial log(f(x_s))}{\partial \theta_j}) = E( \dfrac{\partial log(f(x_r))}{\partial \theta_i})E( \dfrac{\partial log(f(x_s))}{\partial \theta_j})##

since ##\dfrac{\partial log(f(x_r))}{\partial \theta_i}## and ##\dfrac{\partial log(f(x_s))}{\partial \theta_j}## are independent variables.

3)

Just a little remark: why did I make things so complicated in this demonstration?

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_{k}\int \int ...\int \bigg(\prod_{m\neq k} f(x_m)\bigg)\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\,dx_1 dx_2 ...dx_n = \sum_{k} \dfrac{\partial^{2}}{\partial \theta_{i} \partial \theta_{j}}\bigg(\int \int ...\int f(x_1) f(x_{2}) .. f(x_n)\,dx_1 dx_2 ...dx_n\bigg)##

I should have directly swapped ##\sum_k## and ##E\bigg[...\bigg]##; this way, I could write directly :

##E\bigg[\sum_{k}\dfrac{1}{f(x_{k})} \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}\bigg]=\sum_k\,E\bigg[...\bigg] =\sum_{k} \dfrac{\partial^{2} 1}{\partial \theta_{i} \partial \theta_{j}}=0##

Regards
 
  • #11
fab13 said:
Moreover, when you talk about the "left side of 1)", do you mean the calculation of the left side of this relation :

##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]=E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]\quad(1)##

i.e. ##E\Big[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\Big]## ?

Yes.

I suggest you take the case of two observations ##x_1, x_2## and write out the left hand side of 1).

fab13 said:
I should have directly swapped ##\sum_k## and ##E##

Yes.

Apply that idea to the left hand side of 1). You will be taking the expectation of sums that involve terms like:

##E(\frac{\partial log(f(x_1))}{\partial \theta_i} \frac{\partial log(f(x_2))}{\partial \theta_j} )##
## = E(\frac{\partial log(f(x_1))}{\partial \theta_i}) E( \frac{\partial log(f(x_2))}{\partial \theta_j} )##
## = (0)(0) = 0 ##
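Writing the suggested ##n = 2## case out in full (a sketch, with the shorthand ##s_i(x)=\partial \log f(x)/\partial\theta_i##):

##\dfrac{\partial \mathcal{L}}{\partial \theta_i}\dfrac{\partial \mathcal{L}}{\partial \theta_j}=\big(s_i(x_1)+s_i(x_2)\big)\big(s_j(x_1)+s_j(x_2)\big)##

##E\Big[\dfrac{\partial \mathcal{L}}{\partial \theta_i}\dfrac{\partial \mathcal{L}}{\partial \theta_j}\Big]=E[s_i(x_1)s_j(x_1)]+E[s_i(x_2)s_j(x_2)]+\underbrace{E[s_i(x_1)]\,E[s_j(x_2)]+E[s_i(x_2)]\,E[s_j(x_1)]}_{=\,0}=\sum_{k=1}^{2}E[s_i(x_k)s_j(x_k)]##

which is exactly the expression obtained for ##E\Big[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\Big]## in equation (3).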
 
  • #12
@Stephen Tashi

I think I have finally understood, maybe by writing :

Setting ##X_k## and ##Y_l## to:

##X_k=\dfrac{\partial log(f(x_k))}{\partial \theta_i}##
##Y_l=\dfrac{\partial log(f(x_l))}{\partial \theta_j}##

(##E[X_r]=0## and ##E[Y_s]=0##) ##\implies## (##E[\sum_k X_k]=0## and ##E[\sum_l Y_l]=0##) ##\implies## ##E\big[\sum_k\sum_l X_k Y_l\big]=\sum_k\sum_l E[X_k Y_l]=\sum_k E[X_k Y_k] = E\big[\sum_k X_k Y_k\big]##,

since ##E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

thanks
 
  • #13
fab13 said:
since ##E[X_k\,Y_k]\neq E[X_k]\,E[Y_k]## (no more independence between ##X_k## and ##Y_k##).

Is this correct ?

Yes.
 

1. What is the relationship between the Hessian and log-likelihood?

The Hessian matrix is a square matrix of second-order partial derivatives of the log-likelihood function. It is used to calculate the standard errors and confidence intervals of the estimated parameters in a statistical model. The log-likelihood function, on the other hand, is a measure of the goodness of fit of a model to a given set of data. The Hessian and log-likelihood are closely related, as the Hessian is used to optimize the log-likelihood function and find the maximum likelihood estimates of the model parameters.

2. How does the Hessian matrix affect the estimation of model parameters?

The Hessian matrix plays a crucial role in the estimation of model parameters. It is used to calculate the standard errors of the estimated parameters, which are important for assessing the precision and significance of the estimated values. The Hessian also helps to determine the shape of the log-likelihood function, which can provide information about the stability and convergence of the model estimation process.
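As a concrete illustration of this use of the Hessian (a minimal sketch with simulated data from an exponential model, chosen because its log-likelihood Hessian is available in closed form):

[CODE=python]
import numpy as np

# Standard error from the inverse of the negative Hessian (observed information),
# illustrated for the exponential model f(x; lam) = lam * exp(-lam * x).
# MLE: lam_hat = 1 / mean(x);  d^2 logL / d lam^2 = -n / lam^2  at the MLE.
rng = np.random.default_rng(2)
lam_true, n = 0.5, 5000
x = rng.exponential(scale=1 / lam_true, size=n)

lam_hat = 1.0 / x.mean()
observed_info = n / lam_hat**2          # = -Hessian of the log-likelihood at lam_hat
se = np.sqrt(1.0 / observed_info)       # standard error of lam_hat

print(lam_hat, se)  # lam_hat ~ 0.5, se ~ lam_hat / sqrt(n) ~ 0.007
[/CODE]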

3. Can the Hessian matrix be used to assess the fit of a model?

While the Hessian matrix is not directly a goodness-of-fit measure, it does provide valuable information about the quality of the estimation. At a proper maximum, the Hessian of the log-likelihood is negative definite; strong curvature (eigenvalues of large magnitude) indicates that the parameters are precisely determined, whereas a nearly singular Hessian signals identifiability problems or a failure of the optimization to converge.

4. How is the Hessian matrix related to the Fisher information matrix?

The Hessian matrix is closely related to the Fisher information matrix, as both are used to calculate the standard errors of the estimated parameters in a model. The Fisher information matrix is the expected value of the negative Hessian of the log-likelihood, and it quantifies the precision of the estimated parameters. In practice, the negative Hessian evaluated at the maximum likelihood estimate (the observed information) is often used as an approximation of the Fisher information matrix.

5. Are there any limitations to using the Hessian matrix in statistical modeling?

While the Hessian matrix is a useful tool in statistical modeling, it does have some limitations. One limitation is that it assumes that the log-likelihood function is well-behaved and has a unique maximum. If this is not the case, the Hessian matrix may not accurately estimate the standard errors of the parameters. Additionally, the Hessian matrix can be computationally intensive to calculate, especially for complex models with many parameters.
