Hello,

I am studying in more depth the theory behind Bayesian optimization with a Gaussian process and the expected improvement (EI) acquisition function.

I would like to lay out what I think I understand and ask you to correct me if I'm wrong.

The aim is to find the best parameters ##\theta## of a parametric function ##f(x, \theta)## (the objective function) whose analytical form is not known.

Bayes' theorem is used to approximate ##f## by the posterior, and the best parameter set is then the one that maximizes the posterior.

A normal distribution is used as the likelihood and a Gaussian process as the prior:

$$\pi = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{Y}-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{Y}-\mathbf{\mu})\right] $$
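To check that I have this formula right, I wrote it out numerically. This is only a minimal sketch; the values of ##\mathbf{Y}##, ##\boldsymbol{\mu}## and ##\boldsymbol{\Sigma}## are made up, and I compare the result with scipy:

```python
# A minimal sketch (made-up Y, mu, Sigma): evaluate the multivariate normal
# density above directly and compare it with scipy's implementation.
import numpy as np
from scipy.stats import multivariate_normal

Y = np.array([0.3, -0.1])                     # observations Y (made up)
mu = np.array([0.0, 0.0])                     # mean vector
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])                # covariance matrix (e.g. from a GP kernel)

k = len(Y)
diff = Y - mu
pdf_manual = np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) \
    / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma)))
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(Y)

print(pdf_manual, pdf_scipy)                  # the two values should agree
```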

Everything happens iteratively, point by point. The points at which the posterior is evaluated are chosen by the acquisition function and collected in a dataset ##\mathcal{D}_t##. The improvement is then defined as

$$
I = \begin{cases}
0 & \text{for}\; f(x^+) \ge h_{t+1}(x) \\
h_{t+1}(x) - f(x^+) & \text{for}\; f(x^+) < h_{t+1}(x)
\end{cases}
$$

where ##h_{t+1}(x)## is the posterior function evaluated at step ##t+1## and ##f(x^+)## is the maximum value reached so far.

From this, one can determine the expected improvement

$$
\alpha_{\rm EI}(x^*|\mathcal{D}_t) = \mathbb{E}[I(h)] = \int I(h)\,\pi\,{\rm d}h
$$

That is, the expected improvement depends on the Gaussian process (the prior).
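If that is right, I imagine the integral can be made concrete with a Monte Carlo estimate: sample ##h## from the Gaussian posterior at a candidate point and average the improvement. A minimal sketch (the posterior mean/std and ##f(x^+)## are made-up numbers):

```python
# A minimal sketch (made-up numbers): Monte Carlo estimate of
# E[I] = integral of I(h) * pi(h) dh, with h drawn from the Gaussian posterior at x.
import numpy as np

rng = np.random.default_rng(0)

mu_x, sigma_x = 1.2, 0.5                      # GP posterior mean and std at a candidate x (assumed)
f_best = 1.0                                  # f(x^+), the best value observed so far (assumed)

h = rng.normal(mu_x, sigma_x, size=100_000)   # samples of h ~ pi
improvement = np.maximum(h - f_best, 0.0)     # I = max(h - f(x^+), 0)
ei_mc = improvement.mean()                    # approximates E[I]

print(ei_mc)
```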

Therefore, at each step, the posterior is calculated at the point ##x_{max}## defined as

$$x_{max} = {\rm argmax}_x \alpha_{\rm EI}(x|\mathcal{D}_{t-1})$$
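Putting the pieces together, this is the loop I have in mind. It is only a sketch on a made-up 1-D toy objective, using scikit-learn's GaussianProcessRegressor and the standard closed form of EI under a Gaussian posterior instead of evaluating the integral numerically:

```python
# A minimal sketch of the loop as I understand it (toy 1-D problem, names made up).
# Closed-form EI for a Gaussian posterior:
# EI(x) = (mu - f_best) * Phi(z) + sigma * phi(z), with z = (mu - f_best) / sigma.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                              # the unknown f, only available point by point
    return -(x - 0.6) ** 2 + 0.1 * np.sin(20 * x)

def expected_improvement(mu, sigma, f_best):
    sigma = np.maximum(sigma, 1e-12)           # avoid division by zero
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

X = np.array([[0.1], [0.5], [0.9]])            # initial design D_0
y = objective(X).ravel()
candidates = np.linspace(0, 1, 500).reshape(-1, 1)

for t in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, y.max())   # alpha_EI(x | D_t)
    x_max = candidates[np.argmax(ei)]               # argmax_x alpha_EI
    X = np.vstack([X, [x_max]])                     # evaluate f at x_max and add it to D
    y = np.append(y, objective(x_max))

print(X[np.argmax(y)], y.max())                # best point found
```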

I don't know if what I wrote is correct. I'm a little confused ...

If I am wrong, could someone explain it to me better? Could you tell me where to find a complete explanation of this topic? Online I find only sketchy explanations.

Thanks!
