Find the Maximum of a Multi-variable Taylor Series

ecastro
Messages
249
Reaction score
8
Homework Statement
What is the maximum (or minimum) of a multi-variable Taylor Series?
Relevant Equations
Assume that the function ##f## to be expanded has three variables ##x, y, z##, so its Taylor series expansion is (centered at ##\left(a, b, c\right)##),
\begin{align*}
f\left(x, y, z\right) &= f\left(a, b, c\right) + \left[\frac{\partial f\left(a, b, c\right)}{\partial x}\left(x - a\right) + \frac{\partial f\left(a, b, c\right)}{\partial y}\left(y - b\right) + \frac{\partial f\left(a, b, c\right)}{\partial z}\left(z - c\right)\right] + \frac{1}{2}\left[\frac{\partial^2 f\left(a, b, c\right)}{\partial x^2}\left(x - a\right)^2 + \frac{\partial^2 f\left(a, b, c\right)}{\partial x \partial y}\left(x - a\right)\left(y - b\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial x \partial z}\left(x - a\right)\left(z - c\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial y \partial x}\left(y - b\right)\left(x - a\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial y^2}\left(y - b\right)^2 + \frac{\partial^2 f\left(a, b, c\right)}{\partial y \partial z}\left(y - b\right)\left(z - c\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial z \partial x}\left(z - c\right)\left(x - a\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial z \partial y}\left(z - c\right)\left(y - b\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial z^2}\left(z - c\right)^2\right],
\end{align*}
which is up to the second-order term.
Firstly, the matrix notation of the series is,
\begin{align*}
f\left(x, y, z\right) &= f\left(a, b, c\right) + \left(\mathbf{x} - \mathbf{a}\right)^T \frac{\partial f\left(a, b, c\right)}{\partial \mathbf{x}} + \frac{1}{2}\left(\mathbf{x} - \mathbf{a}\right)^T \frac{\partial^2 f\left(a, b, c\right)}{\partial \mathbf{x}^2} \left(\mathbf{x} - \mathbf{a}\right),
\end{align*}
where ##\mathbf{x} = \left[x, y, z\right]^T## and ##\mathbf{a} = \left[a, b, c\right]^T##. The first derivative of the series is set to zero to find the maximum or minimum point, then how to apply the first derivative? Is the first derivative matrix to be applied is ##\frac{\partial}{\partial \mathbf{x}} = \left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right]^T##?

Thank you in advance.
 
Physics news on Phys.org
ecastro said:
Is the first derivative matrix to be applied is ##\frac{\partial}{\partial \mathbf{x}} = \left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right]^T##?

Yes, you seem to have the right concept here. You need to look at the gradient
$$\nabla f(\mathbf x)=\frac{\partial f}{\partial \mathbf{x}} = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right]$$
and consider how your Taylor series will behave for ##\mathbf x## in the neighborhood of ##\mathbf a## when ##\nabla f(\mathbf a) = \mathbf 0##. Hints: What conditions on ##\nabla^2 f(\mathbf a)## would have to hold for ##f(\mathbf a)## to be a maximum or a minimum? Are you looking for a global extremum or a local one?
 
Thank you for the confirmation. I arrived at this solution, assuming that ##\left(a, b, c\right) = \left(0, 0, 0\right)## (for short-writing),

\begin{align*}
0 &= \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} f\left(0, 0, 0\right) + \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z} \end{bmatrix} + \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}\end{bmatrix}\begin{bmatrix}x \\ y \\ z \end{bmatrix}
\end{align*}

So, the first term is zero because ##f\left(0, 0, 0\right)## is constant. As for the second term, the right-most matrix is constant because the first derivative of the function was evaluated at ##\left(0, 0, 0\right)## (correct me if I am wrong), so it reduces to:
\begin{align*}
\begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z} \end{bmatrix} &= \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z}\end{bmatrix}
\end{align*}

As for the final term,
\begin{align*}
\frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}\end{bmatrix}\begin{bmatrix}x \\ y \\ z \end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}z\end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2}x^2 + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}xy + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}xz + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}yx + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2}y^2 + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}yz + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}zx + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}zy + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}z^2\end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix} 2x \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}z + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}x + 2y\frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}z + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}y + 2z\frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}
\end{bmatrix}
\end{align*}

If the partial differentiations are commutative, I should have 2 as a common factor in the matrix, which cancels out with the ##\frac{1}{2}##, and I can factor out the ##x##, ##y##, and ##z##. The final matrix notation should be,
\begin{align*}
\mathbf{0} &= 0 + \frac{\partial f\left(0, 0, 0\right)}{\partial \mathbf{x}} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial \mathbf{x}^2} \mathbf{x} \\
\mathbf{x} &= -\frac{\partial^2 f\left(0, 0, 0\right)}{\partial \mathbf{x}^2}^{-1} \frac{\partial f\left(0, 0, 0\right)}{\partial \mathbf{x}}
\end{align*}

And I just need to input this into the original equation to find the maximum (or minimum) point, right?

tnich said:
Hints: What conditions on ∇2f(a)∇2f(a)\nabla^2 f(\mathbf a) would have to hold for f(a)f(a)f(\mathbf a) to be a maximum or a minimum? Are you looking for a global extremum or a local one?

As for the second derivative, I think it should be positive if it's minimum and otherwise if it's maximum. I am interested in the local one, how much will it change if I am somehow interested in the global extremum?
 
First, make sure you understand what a solution looks like. ##f(\mathbf x)## is a generic function, and you may not be able get an exact numerical solution for a maximum or minimum. You can figure out what conditions a maximum or minimum would satisfy, though. You have already guessed that ##\nabla f(\mathbf a)=\mathbf 0## at a local maximum or minimum). Start with

$$f(\mathbf x) = f(\mathbf a) + (\mathbf x - \mathbf a)^T \nabla f(\mathbf a) + \frac 1 2 (\mathbf x - \mathbf a)^T \nabla^2 f(\mathbf a) (\mathbf x - \mathbf a) + \dots$$

Now suppose you have a value of ##\mathbf a## for which ##\nabla f(\mathbf a)=\mathbf 0## and substitute ##\mathbf 0## for ##\nabla f(\mathbf a)## in the Taylor series expansion. Notice that you have an infinite number of terms in the Taylor expansion, so you can't just look at the first three and ignore the rest. You can, however, make ##\mathbf x## as close to ##\mathbf a## as you want. This tells you what happens to ##f(\mathbf x)## in a small neighborhood of ##\mathbf a## and makes the rest of the terms insignificantly small. Now you can consider what the matrix of second partials ##\nabla^2 f(\mathbf a)## needs to look like to make ##f(\mathbf a)## a maximum or minimum point in that neighborhood.

Knowing what a solution looks like, you can start to design numerical methods for approximating maxima or minima. You have come close to deriving a gradient descent method in your final solution, but here are some additional things you should consider.

First, assuming that ##\mathbf a=\mathbf 0## pretty much guarantees that you will get a local min or max near ##\mathbf x=\mathbf 0##, which is fine if that is the one you want, but the function may have other optima you are interested in.

Second, except in special cases, a gradient descent will not get you the exact solution, though you can do successive iterations and get closer to it.

Finally, while a gradient descent method sends you in the right direction toward an extremum, you may overshoot or undershoot it, so you may have to choose how far to go in that direction.
 
Thank you for your reply!

Actually, I am trying to understand the SIFT algorithm and trying to write it in code; it interpolates the neighbourhood of a pixel using the first three terms of the Taylor series. I am pretty much interested in the local extremum point, but I was curious for the absolute extremum. Sorry about that. :P
 
There are two things I don't understand about this problem. First, when finding the nth root of a number, there should in theory be n solutions. However, the formula produces n+1 roots. Here is how. The first root is simply ##\left(r\right)^{\left(\frac{1}{n}\right)}##. Then you multiply this first root by n additional expressions given by the formula, as you go through k=0,1,...n-1. So you end up with n+1 roots, which cannot be correct. Let me illustrate what I mean. For this...
Back
Top