Bayesian Information Criterion Formula Proof

  • Thread starter mertcan
Hi everyone, while I was digging into the ARIMA model I saw that the BIC value is given as ##k \, log(n) - 2 \, log(L)##, where ##L## is the maximized value of the likelihood function and ##k## is the number of parameters. I have found a proof of the AIC, but no clue about this one. I wonder how it is derived. Could you help me with the proof?
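For context, the formula itself is easy to apply directly. Here is a minimal sketch that fits a normal model to toy data by maximum likelihood and plugs into ##k \, log(n) - 2 \, log(L)## (the data and model are invented for illustration, not anything from an ARIMA fit):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=200)  # toy data

# MLE for a normal model: mean and variance (k = 2 parameters)
mu_hat = y.mean()
sigma2_hat = y.var()          # the MLE uses 1/n, not 1/(n-1)
n, k = y.size, 2

# Maximized log-likelihood of N(mu_hat, sigma2_hat):
# sum of log-densities collapses to -n/2 * (log(2*pi*sigma2_hat) + 1) at the MLE
log_L = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)

bic = k * np.log(n) - 2 * log_L
print(bic)
```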

Regards;
 

stevendaryl

Staff Emeritus
Science Advisor
Insights Author
Here's the way that I sort of understand it. We have the definition:

##P(\overrightarrow{y} | M) = \int P(\overrightarrow{y} | \overrightarrow{\theta}, M) P(\overrightarrow{\theta} | M) d \overrightarrow{\theta}##

##\overrightarrow{y}## is the vector of observations, ##M## is the model, and ##\overrightarrow{\theta}## is the vector of parameters in the model. Now, let ##Q(\overrightarrow{\theta})## be defined by:

##Q(\overrightarrow{\theta}) = log(P(\overrightarrow{y} | \overrightarrow{\theta}, M) P(\overrightarrow{\theta} | M))##

Then we are trying to approximate the integral:

##\int exp(Q(\overrightarrow{\theta})) d \overrightarrow{\theta}##

What we assume is ##Q## has a maximum at some particular value of the vector ##\overrightarrow{\theta}##, call it ##\overrightarrow{\Theta}##, and that it rapidly declines as you get away from its maximum. Under that assumption, you can approximate ##Q## by a Taylor expansion around its maximum:

##Q(\overrightarrow{\theta}) \approx Q(\overrightarrow{\Theta}) + (\overrightarrow{\theta} - \overrightarrow{\Theta}) \cdot \nabla_{\overrightarrow{\theta}} Q \big|_{\overrightarrow{\theta} = \overrightarrow{\Theta}} + \frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) ##

where ##H## is a matrix of the second derivatives of ##Q##:

##\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) = \frac{1}{2} \sum_{ij} (\theta^i - \Theta^i) H_{ij} (\theta^j - \Theta^j)##

where
$$H_{ij} = \frac{\partial^2 Q}{\partial \theta^i \partial \theta^j}|_{\overrightarrow{\theta} = \overrightarrow{\Theta}}$$

Since ##Q## has its maximum at ##\overrightarrow{\Theta}##, the gradient there is zero and the linear term vanishes. So we have:

##Q(\overrightarrow{\theta}) \approx Q(\overrightarrow{\Theta}) + \frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) ##

So the integral becomes:

##\int exp(Q(\overrightarrow{\theta})) d\overrightarrow{\theta} \approx exp(Q(\overrightarrow{\Theta})) \int exp(\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})) d \overrightarrow{\theta}##
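This Laplace-type approximation can be checked numerically in one dimension. A minimal sketch (the peaked function ##Q## below is an invented toy example; the quartic term makes it non-Gaussian so the approximation is close but not exact):

```python
import numpy as np
from scipy.integrate import quad

# Toy 1-D "log posterior", sharply peaked at theta = 1.0.
# The quartic term makes exp(Q) non-Gaussian, so the approximation is not exact.
def Q(theta):
    return -50.0 * (theta - 1.0) ** 2 - 5.0 * (theta - 1.0) ** 4

# Left side: the exact integral of exp(Q) (the interval [0, 2] holds all the mass)
exact, _ = quad(lambda t: np.exp(Q(t)), 0.0, 2.0)

# Right side: exp(Q(Theta)) * sqrt(2*pi / |H|), with H = Q''(Theta)
Theta = 1.0      # the maximizer of Q
H = -100.0       # Q''(Theta); the quartic term contributes 0 at the peak
approx = np.exp(Q(Theta)) * np.sqrt(2 * np.pi / abs(H))

print(exact, approx)
```

The two numbers agree to a fraction of a percent; the agreement gets better the more sharply peaked ##Q## is.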
 

stevendaryl

Staff Emeritus
Science Advisor
Insights Author
8,400
2,569
So the next step is to diagonalize ##H##. Since ##H## is symmetric, there is an orthogonal matrix ##U## such that ##U^T H U## is diagonal, with the eigenvalues ##\lambda_j## of ##H## on the diagonal. So we can change variables: ##X^i \equiv \sum_j U^{ji} (\theta^j - \Theta^j)##. Because ##U## is orthogonal, the Jacobian of this change of variables is 1, and the integral becomes:

##\int exp(\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})) d \overrightarrow{\theta} = \int exp(\frac{1}{2} \sum_j \lambda_j (X^j)^2) d \overrightarrow{X}##

That integral is easily calculated (all the ##\lambda_j## are negative, since ##\overrightarrow{\Theta}## is a maximum):

##\int exp(\frac{1}{2} \sum_j \lambda_j (X^j)^2) d \overrightarrow{X} = \sqrt{\frac{(2 \pi)^k}{|det(H)|}}##

There is one Gaussian integral for each variable ##X^j##, and ##|det(H)| = \prod_j |\lambda_j|##.

To finish: putting this back into the definition from the first post,

##P(\overrightarrow{y} | M) \approx exp(Q(\overrightarrow{\Theta})) \sqrt{\frac{(2 \pi)^k}{|det(H)|}}##

so

##-2 \, log P(\overrightarrow{y} | M) \approx -2 \, log P(\overrightarrow{y} | \overrightarrow{\Theta}, M) - 2 \, log P(\overrightarrow{\Theta} | M) - k \, log(2 \pi) + log|det(H)|##

With ##n## independent observations, each second derivative in ##H## is a sum of ##n## terms, so ##H \approx n H_1## for some fixed per-observation matrix ##H_1##, and hence ##log|det(H)| = k \, log(n) + log|det(H_1)|##. Keeping only the terms that grow with ##n## (and noting that for large ##n## the posterior mode ##\overrightarrow{\Theta}## approaches the maximum-likelihood estimate, so ##P(\overrightarrow{y} | \overrightarrow{\Theta}, M) \approx L##), we get

##-2 \, log P(\overrightarrow{y} | M) \approx k \, log(n) - 2 \, log(L)##

which is exactly the BIC.
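The Gaussian-integral formula can likewise be verified numerically for ##k = 2##. A sketch with an arbitrarily chosen negative-definite ##H## (the matrix below is invented for illustration):

```python
import numpy as np
from scipy.integrate import dblquad

# An arbitrary symmetric negative-definite H (k = 2): trace < 0, det > 0,
# so both eigenvalues are negative.
H = np.array([[-3.0, 1.0],
              [1.0, -2.0]])

def integrand(x2, x1):  # dblquad passes the inner variable first
    v = np.array([x1, x2])
    return np.exp(0.5 * v @ H @ v)

# Integrate over a box wide enough to capture essentially all the mass
numeric, _ = dblquad(integrand, -10.0, 10.0, -10.0, 10.0)

# Closed form: sqrt((2*pi)^k / |det(H)|)
k = 2
closed_form = np.sqrt((2 * np.pi) ** k / abs(np.linalg.det(H)))

print(numeric, closed_form)
```

Here the two values agree essentially exactly, since the integrand really is a Gaussian and the truncation to a finite box is negligible.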
 
mertcan

Thanks, I got it.
 

Want to reply to this thread?

"Bayesian Information Criterion Formula Proof" You must log in or register to reply here.

Physics Forums Values

We Value Quality
• Topics based on mainstream science
• Proper English grammar and spelling
We Value Civility
• Positive and compassionate attitudes
• Patience while debating
We Value Productivity
• Disciplined to remain on-topic
• Recognition of own weaknesses
• Solo and co-op problem solving
Top