Undergrad Bayesian Information Criterion Formula Proof

Summary: The discussion centers on the derivation of the Bayesian Information Criterion (BIC), ##BIC = k\, log(n) - 2\, log(L)##, where ##L## is the maximized value of the likelihood function and ##k## is the number of parameters. A user asks how the formula is proved. The replies start from the definition of the marginal likelihood, Taylor-expand the log of the integrand around its maximum, and diagonalize the Hessian matrix to evaluate the resulting Gaussian integral, laying out the mathematical foundations of the BIC in the context of model selection.
mertcan
Hi everyone, while I was digging into the ARIMA model I saw that the BIC value is given as ##k\, log(n) - 2\, log(L)##, where ##L## is the maximized value of the likelihood function, ##k## is the number of parameters, and ##n## is the number of observations. I have found a proof of the AIC, but no clue about this one. I wonder how it is derived. Could you help me with the proof?

Regards;
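(As a quick illustration of the formula itself, here is a minimal Python sketch that computes the BIC for an i.i.d. Gaussian model, where the maximized log-likelihood has a closed form; the data and parameter count are illustrative assumptions, not anything from a specific ARIMA fit.)

```python
import numpy as np

def bic(log_likelihood, k, n):
    """BIC = k*log(n) - 2*log(L), where log_likelihood = log(L)."""
    return k * np.log(n) - 2.0 * log_likelihood

# Toy data: i.i.d. Gaussian, so the model has k = 2 parameters (mean, variance).
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)
n = y.size

# Gaussian MLEs: sample mean and the 1/n (biased) sample variance.
mu_hat, var_hat = y.mean(), y.var()

# Maximized Gaussian log-likelihood log(L) at the MLEs:
# sum_i [-0.5*log(2*pi*var) - (y_i - mu)^2/(2*var)] = -n/2 * (log(2*pi*var) + 1)
log_L = -0.5 * n * (np.log(2 * np.pi * var_hat) + 1.0)

print("BIC =", bic(log_L, k=2, n=n))
```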
 
Here's the way that I sort of understand it. We have the definition:

##P(\overrightarrow{y} | M) = \int P(\overrightarrow{y} | \overrightarrow{\theta}, M) P(\overrightarrow{\theta} | M) d \overrightarrow{\theta}##

##\overrightarrow{y}## is the vector of observations, ##M## is the model, and ##\overrightarrow{\theta}## is the vector of parameters in the model. Now, let ##Q(\overrightarrow{\theta})## be defined by:

##Q(\overrightarrow{\theta}) = log(P(\overrightarrow{y} | \overrightarrow{\theta}, M) P(\overrightarrow{\theta} | M))##

Then we are trying to approximate the integral:

##\int exp(Q(\overrightarrow{\theta})) d \overrightarrow{\theta}##

What we assume is that ##Q## has a maximum at some particular value of the vector ##\overrightarrow{\theta}##, call it ##\overrightarrow{\Theta}##, and that it declines rapidly as you move away from that maximum. Under that assumption, you can approximate ##Q## by a Taylor expansion around the maximum:

##Q(\overrightarrow{\theta}) \approx Q(\overrightarrow{\Theta}) + (\overrightarrow{\theta} - \overrightarrow{\Theta}) \cdot \nabla_{\overrightarrow{\theta}} Q |_{\overrightarrow{\theta} = \overrightarrow{\Theta}} + \frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) ##

where ##H## is a matrix of the second derivatives of ##Q##:

##\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) = \frac{1}{2} \sum_{ij} (\theta^i - \Theta^i) H_{ij} (\theta^j - \Theta^j)##

where
$$H_{ij} = \frac{\partial^2 Q}{\partial \theta^i \partial \theta^j}|_{\overrightarrow{\theta} = \overrightarrow{\Theta}}$$

Since ##\overrightarrow{\Theta}## is the maximum of ##Q##, the gradient vanishes there, so the linear term drops out. So we have:

##Q(\overrightarrow{\theta}) \approx Q(\overrightarrow{\Theta}) + \frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta}) ##

So the integral becomes:

##\int exp(Q(\overrightarrow{\theta})) d\overrightarrow{\theta} \approx exp(Q(\overrightarrow{\Theta})) \int exp(\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})) d \overrightarrow{\theta}##
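(Here is a quick numerical sanity check of this Laplace-type approximation in one dimension; the function ##Q## below is an illustrative choice with a sharp maximum, not anything from the thread.)

```python
import numpy as np
from scipy.integrate import quad

# Toy log-integrand with a sharp maximum at theta = 1 (illustrative choice).
def Q(theta):
    return -50.0 * (theta - 1.0) ** 2 - (theta - 1.0) ** 4

theta_max = 1.0
H = -100.0  # Q''(theta_max): the quadratic term contributes -100, the quartic term 0

# Exact integral of exp(Q) versus the Laplace approximation
# exp(Q(Theta)) * sqrt(2*pi / |H|)  (the k = 1 case of the formula derived below).
exact, _ = quad(lambda t: np.exp(Q(t)), -np.inf, np.inf)
laplace = np.exp(Q(theta_max)) * np.sqrt(2 * np.pi / abs(H))

print(exact, laplace)  # the two values agree to within a few hundredths of a percent
```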
 
The next step is to diagonalize ##H##. Since ##H## is symmetric, there is an orthogonal matrix ##U## such that ##U^T H U## is diagonal, with the eigenvalues ##\lambda_j## of ##H## on the diagonal. So we can change variables to ##\overrightarrow{X} \equiv U^T (\overrightarrow{\theta} - \overrightarrow{\Theta})##, and the above integral becomes:
##\int exp(\frac{1}{2} (\overrightarrow{\theta} - \overrightarrow{\Theta})^T H (\overrightarrow{\theta} - \overrightarrow{\Theta})) d \overrightarrow{\theta} = \int exp(\frac{1}{2} \sum_j \lambda_j (X^j)^2) d \overrightarrow{X}##
(Because ##U## is orthogonal, the Jacobian of this coordinate change is 1, so no extra factor appears.) That integral is easily calculated, since all the ##\lambda_j## are negative: ##\overrightarrow{\Theta}## is a maximum of ##Q##, so ##H## is negative definite there. The result is:

##\int exp(\frac{1}{2} \sum_j \lambda_j (X^j)^2) d \overrightarrow{X} = \prod_j \sqrt{\frac{2 \pi}{|\lambda_j|}} = \sqrt{\frac{(2 \pi)^k}{|det(H)|}}##

There is one Gaussian integral for each of the ##k## variables ##X^j##.
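(And a quick numerical check of this ##k##-dimensional Gaussian-integral formula, here with ##k = 2##; the symmetric negative-definite matrix below is an illustrative assumption.)

```python
import numpy as np
from scipy.integrate import dblquad

# Illustrative 2x2 symmetric matrix H with both eigenvalues negative
# (trace = -5 < 0, det = 5 > 0, so it is negative definite).
H = np.array([[-3.0, 1.0],
              [1.0, -2.0]])

# Integrand exp(0.5 * x^T H x).
def integrand(x1, x2):
    x = np.array([x1, x2])
    return np.exp(0.5 * x @ H @ x)

# Numerical integral over the plane versus the closed form sqrt((2*pi)^k / |det(H)|).
numeric, _ = dblquad(lambda x2, x1: integrand(x1, x2),
                     -np.inf, np.inf, -np.inf, np.inf)
closed_form = np.sqrt((2 * np.pi) ** 2 / abs(np.linalg.det(H)))

print(numeric, closed_form)  # both should be approximately 2.81
```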
 
To go further, look at this: http://www.math.utah.edu/~hbhat/BICderivation.pdf
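(For reference, here is a sketch of the remaining steps of the standard argument, under the usual large-##n## assumptions; it is not taken verbatim from that pdf.) Taking the log of the approximation above,
$$log P(\overrightarrow{y} | M) \approx Q(\overrightarrow{\Theta}) + \frac{k}{2} log(2\pi) - \frac{1}{2} log|det(H)|$$
For ##n## independent observations, ##Q## is a sum of ##n## per-observation terms plus the ##n##-independent log-prior, so ##H## grows linearly in ##n## and ##|det(H)| \sim C n^k## for some constant ##C##. Therefore ##-\frac{1}{2} log|det(H)| \approx -\frac{k}{2} log(n) + O(1)##, and ##Q(\overrightarrow{\Theta}) \approx log(L) + O(1)##, since for large ##n## the maximizer ##\overrightarrow{\Theta}## approaches the maximum-likelihood estimate and the prior contributes only a bounded term. Dropping all the ##O(1)## terms and multiplying by ##-2##:
$$-2\, log P(\overrightarrow{y} | M) \approx k\, log(n) - 2\, log(L) = BIC$$
So up to bounded terms, choosing the model with the smallest BIC amounts to choosing the model with the largest approximate marginal likelihood.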
 
Thanks, I got it.
 