# Bayesian Information Criterion Formula Proof

#### mertcan

Hi everyone, while digging into the ARIMA model I saw that the BIC value is given as $k \log(n) - 2 \log(L)$, where $L$ is the maximized value of the likelihood function and $k$ is the number of parameters. I have found a proof of the AIC, but no clue about this one. I wonder how it is derived. Could you help me with the proof?

Regards,


#### stevendaryl

Staff Emeritus
Here's the way that I sort of understand it. We have the definition:

$P(\overrightarrow{y} | M) = \int P(\overrightarrow{y} | \overrightarrow{\theta}, M) P(\overrightarrow{\theta} | M) d \overrightarrow{\theta}$

$\overrightarrow{y}$ is the vector of observations, $M$ is the model, and $\overrightarrow{\theta}$ is the vector of parameters in the model. Now, let $Q(\overrightarrow{\theta})$ be defined by:

$Q(\overrightarrow{\theta}) = \log\left(P(\overrightarrow{y} | \overrightarrow{\theta}, M) P(\overrightarrow{\theta} | M)\right)$

Then we are trying to approximate the integral:

$\int \exp(Q(\overrightarrow{\theta})) \, d \overrightarrow{\theta}$

We assume that $Q$ has a maximum at some particular value of the vector $\overrightarrow{\theta}$, call it $\overrightarrow{\Theta}$, and that it declines rapidly away from that maximum. Under that assumption, you can approximate $Q$ by a Taylor expansion around its maximum:

$Q(\overrightarrow{\theta}) \approx Q(\overrightarrow{\Theta}) + (\overrightarrow{\theta} - \overrightarrow{\Theta}) \cdot \nabla_{\overrightarrow{\theta}} Q \big|_{\overrightarrow{\theta} = \overrightarrow{\Theta}} + \frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})$

where $H$ is a matrix of the second derivatives of $Q$:

$\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) = \frac{1}{2} \sum_{ij} (\theta^i - \Theta^i) H_{ij} (\theta^j - \Theta^j)$

where
$$H_{ij} = \frac{\partial^2 Q}{\partial \theta^i \partial \theta^j}|_{\overrightarrow{\theta} = \overrightarrow{\Theta}}$$

Since $\overrightarrow{\Theta}$ is the maximum of $Q$, the gradient vanishes there, so the linear term drops out:

$Q(\overrightarrow{\theta}) \approx Q(\overrightarrow{\Theta}) + \frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})$

So the integral becomes:

$\int \exp(Q(\overrightarrow{\theta})) \, d\overrightarrow{\theta} \approx \exp(Q(\overrightarrow{\Theta})) \int \exp\left(\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})\right) d \overrightarrow{\theta}$
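Not part of the thread, but here is a quick numerical sanity check of this Laplace-type approximation in one dimension, using a made-up peaked $Q$ (the function and all names are illustrative, not from any post above):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Toy log-integrand: peaked near theta = 1, falls off rapidly, strictly log-concave
def Q(theta):
    return -(theta - 1.0) ** 2 - 0.1 * theta ** 4

# "Exact" value of the integral of exp(Q), by adaptive quadrature
exact, _ = quad(lambda t: np.exp(Q(t)), -np.inf, np.inf)

# Laplace approximation: locate the mode Theta, then use
# exp(Q(Theta)) * sqrt(2*pi / |Q''(Theta)|), the 1-D case of the formula above
mode = minimize_scalar(lambda t: -Q(t)).x
h = 1e-5
Qpp = (Q(mode + h) - 2 * Q(mode) + Q(mode - h)) / h ** 2  # finite-difference Q'' (negative at the mode)
laplace = np.exp(Q(mode)) * np.sqrt(2 * np.pi / abs(Qpp))

print(exact, laplace)  # the two values agree closely
```

Since this $Q$ is strictly concave, the quadratic Taylor approximation is good everywhere near the mode, which is exactly the regime the argument above relies on.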

#### stevendaryl

Staff Emeritus
So the next step is to diagonalize $H$. Since $H$ is symmetric, there is an orthogonal matrix $U$ such that $U H U^T$ is diagonal, with the eigenvalues $\lambda_j$ of $H$ on the diagonal. Change variables to $X^i \equiv \sum_j U^{ij} (\theta^j - \Theta^j)$. Then the above integral becomes:

$\int \exp\left(\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})\right) d \overrightarrow{\theta} = \int \exp\left(\frac{1}{2} \sum_j \lambda_j (X^j)^2\right) d \overrightarrow{X}$

(Because $U$ is orthogonal, the Jacobian of this change of variables is 1.) That integral is easily calculated, provided all the $\lambda_j$ are negative, which they are because $\overrightarrow{\Theta}$ is a maximum:

$\int \exp\left(\frac{1}{2} \sum_j \lambda_j (X^j)^2\right) d \overrightarrow{X} = \sqrt{\frac{(2 \pi)^k}{|\det(H)|}}$

There is one Gaussian integral for each variable $X^j$, each contributing $\sqrt{2 \pi / |\lambda_j|}$, and $|\det(H)| = \prod_j |\lambda_j|$.
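The thread stops at the Laplace approximation; the standard remaining step to reach the BIC formula asked about in the original post (sketched here, not taken from the posts above) goes as follows. Taking logs of the approximation,

$$\log P(\overrightarrow{y} \mid M) \approx Q(\overrightarrow{\Theta}) + \frac{k}{2} \log(2 \pi) - \frac{1}{2} \log |\det(H)|$$

For $n$ independent observations, $H$ grows linearly in $n$ (it is roughly $n$ times an average per-observation information matrix), so $\log |\det(H)| = k \log n + O(1)$. Also $Q(\overrightarrow{\Theta}) = \log L + \log P(\overrightarrow{\Theta} \mid M)$, where $L$ is the maximized likelihood and the prior term is $O(1)$. Dropping all $O(1)$ terms,

$$\log P(\overrightarrow{y} \mid M) \approx \log L - \frac{k}{2} \log n$$

and multiplying by $-2$ gives $\mathrm{BIC} = k \log n - 2 \log L$.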


#### mertcan

Thanks, I got it.

