Find the Maximum of a Multi-variable Taylor Series

In summary, the conversation discusses the matrix notation of a series and how to find the maximum or minimum point using the first derivative. The first derivative matrix is represented as a gradient, and the second derivative must meet certain conditions for the point to be a maximum or minimum. The conversation also mentions the difference between a global and local extremum.
  • #1
ecastro
254
8
Homework Statement
What is the maximum (or minimum) of a multi-variable Taylor Series?
Relevant Equations
Assume that the function ##f## to be expanded has three variables ##x, y, z##, so its Taylor series expansion is (centered at ##\left(a, b, c\right)##),
\begin{align*}
f\left(x, y, z\right) &= f\left(a, b, c\right) + \left[\frac{\partial f\left(a, b, c\right)}{\partial x}\left(x - a\right) + \frac{\partial f\left(a, b, c\right)}{\partial y}\left(y - b\right) + \frac{\partial f\left(a, b, c\right)}{\partial z}\left(z - c\right)\right] + \frac{1}{2}\left[\frac{\partial^2 f\left(a, b, c\right)}{\partial x^2}\left(x - a\right)^2 + \frac{\partial^2 f\left(a, b, c\right)}{\partial x \partial y}\left(x - a\right)\left(y - b\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial x \partial z}\left(x - a\right)\left(z - c\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial y \partial x}\left(y - b\right)\left(x - a\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial y^2}\left(y - b\right)^2 + \frac{\partial^2 f\left(a, b, c\right)}{\partial y \partial z}\left(y - b\right)\left(z - c\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial z \partial x}\left(z - c\right)\left(x - a\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial z \partial y}\left(z - c\right)\left(y - b\right) + \frac{\partial^2 f\left(a, b, c\right)}{\partial z^2}\left(z - c\right)^2\right],
\end{align*}
which is up to the second-order term.
Firstly, the matrix notation of the series is,
\begin{align*}
f\left(x, y, z\right) &= f\left(a, b, c\right) + \left(\mathbf{x} - \mathbf{a}\right)^T \frac{\partial f\left(a, b, c\right)}{\partial \mathbf{x}} + \frac{1}{2}\left(\mathbf{x} - \mathbf{a}\right)^T \frac{\partial^2 f\left(a, b, c\right)}{\partial \mathbf{x}^2} \left(\mathbf{x} - \mathbf{a}\right),
\end{align*}
where ##\mathbf{x} = \left[x, y, z\right]^T## and ##\mathbf{a} = \left[a, b, c\right]^T##. The first derivative of the series is set to zero to find the maximum or minimum point, then how to apply the first derivative? Is the first derivative matrix to be applied is ##\frac{\partial}{\partial \mathbf{x}} = \left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right]^T##?

Thank you in advance.
 
Physics news on Phys.org
  • #2
ecastro said:
Is the first derivative matrix to be applied is ##\frac{\partial}{\partial \mathbf{x}} = \left[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right]^T##?

Yes, you seem to have the right concept here. You need to look at the gradient
$$\nabla f(\mathbf x)=\frac{\partial f}{\partial \mathbf{x}} = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right]$$
and consider how your Taylor series will behave for ##\mathbf x## in the neighborhood of ##\mathbf a## when ##\nabla f(\mathbf a) = \mathbf 0##. Hints: What conditions on ##\nabla^2 f(\mathbf a)## would have to hold for ##f(\mathbf a)## to be a maximum or a minimum? Are you looking for a global extremum or a local one?
 
  • #3
Thank you for the confirmation. I arrived at this solution, assuming that ##\left(a, b, c\right) = \left(0, 0, 0\right)## (for short-writing),

\begin{align*}
0 &= \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} f\left(0, 0, 0\right) + \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z} \end{bmatrix} + \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}\end{bmatrix}\begin{bmatrix}x \\ y \\ z \end{bmatrix}
\end{align*}

So, the first term is zero because ##f\left(0, 0, 0\right)## is constant. As for the second term, the right-most matrix is constant because the first derivative of the function was evaluated at ##\left(0, 0, 0\right)## (correct me if I am wrong), so it reduces to:
\begin{align*}
\begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z} \end{bmatrix} &= \begin{bmatrix} \frac{\partial f\left(0, 0, 0\right)}{\partial x} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial y} \\ \frac{\partial f\left(0, 0, 0\right)}{\partial z}\end{bmatrix}
\end{align*}

As for the final term,
\begin{align*}
\frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z} \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y} \quad \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}\end{bmatrix}\begin{bmatrix}x \\ y \\ z \end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix}x \quad y \quad z \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}z\end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix}\frac{\partial}{\partial x} \\ \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z}\end{bmatrix} \begin{bmatrix} \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2}x^2 + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}xy + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}xz + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}yx + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2}y^2 + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}yz + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}zx + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}zy + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}z^2\end{bmatrix} \\
= \frac{1}{2} \begin{bmatrix} 2x \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x^2} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}z + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial y}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y\partial x}x + 2y\frac{\partial^2 f\left(0, 0, 0\right)}{\partial y^2} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}z + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}z \\ \frac{\partial^2 f\left(0, 0, 0\right)}{\partial x \partial z}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial y \partial z}y + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial x}x + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial z \partial y}y + 2z\frac{\partial^2 f\left(0, 0, 0\right)}{\partial z^2}
\end{bmatrix}
\end{align*}

If the partial differentiations are commutative, I should have 2 as a common factor in the matrix, which cancels out with the ##\frac{1}{2}##, and I can factor out the ##x##, ##y##, and ##z##. The final matrix notation should be,
\begin{align*}
\mathbf{0} &= 0 + \frac{\partial f\left(0, 0, 0\right)}{\partial \mathbf{x}} + \frac{\partial^2 f\left(0, 0, 0\right)}{\partial \mathbf{x}^2} \mathbf{x} \\
\mathbf{x} &= -\frac{\partial^2 f\left(0, 0, 0\right)}{\partial \mathbf{x}^2}^{-1} \frac{\partial f\left(0, 0, 0\right)}{\partial \mathbf{x}}
\end{align*}

And I just need to input this into the original equation to find the maximum (or minimum) point, right?

tnich said:
Hints: What conditions on ∇2f(a)∇2f(a)\nabla^2 f(\mathbf a) would have to hold for f(a)f(a)f(\mathbf a) to be a maximum or a minimum? Are you looking for a global extremum or a local one?

As for the second derivative, I think it should be positive if it's minimum and otherwise if it's maximum. I am interested in the local one, how much will it change if I am somehow interested in the global extremum?
 
  • #4
First, make sure you understand what a solution looks like. ##f(\mathbf x)## is a generic function, and you may not be able get an exact numerical solution for a maximum or minimum. You can figure out what conditions a maximum or minimum would satisfy, though. You have already guessed that ##\nabla f(\mathbf a)=\mathbf 0## at a local maximum or minimum). Start with

$$f(\mathbf x) = f(\mathbf a) + (\mathbf x - \mathbf a)^T \nabla f(\mathbf a) + \frac 1 2 (\mathbf x - \mathbf a)^T \nabla^2 f(\mathbf a) (\mathbf x - \mathbf a) + \dots$$

Now suppose you have a value of ##\mathbf a## for which ##\nabla f(\mathbf a)=\mathbf 0## and substitute ##\mathbf 0## for ##\nabla f(\mathbf a)## in the Taylor series expansion. Notice that you have an infinite number of terms in the Taylor expansion, so you can't just look at the first three and ignore the rest. You can, however, make ##\mathbf x## as close to ##\mathbf a## as you want. This tells you what happens to ##f(\mathbf x)## in a small neighborhood of ##\mathbf a## and makes the rest of the terms insignificantly small. Now you can consider what the matrix of second partials ##\nabla^2 f(\mathbf a)## needs to look like to make ##f(\mathbf a)## a maximum or minimum point in that neighborhood.

Knowing what a solution looks like, you can start to design numerical methods for approximating maxima or minima. You have come close to deriving a gradient descent method in your final solution, but here are some additional things you should consider.

First, assuming that ##\mathbf a=\mathbf 0## pretty much guarantees that you will get a local min or max near ##\mathbf x=\mathbf 0##, which is fine if that is the one you want, but the function may have other optima you are interested in.

Second, except in special cases, a gradient descent will not get you the exact solution, though you can do successive iterations and get closer to it.

Finally, while a gradient descent method sends you in the right direction toward an extremum, you may overshoot or undershoot it, so you may have to choose how far to go in that direction.
 
  • #5
Thank you for your reply!

Actually, I am trying to understand the SIFT algorithm and trying to write it in code; it interpolates the neighbourhood of a pixel using the first three terms of the Taylor series. I am pretty much interested in the local extremum point, but I was curious for the absolute extremum. Sorry about that. :P
 

FAQ: Find the Maximum of a Multi-variable Taylor Series

1. What is a Taylor series?

A Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function's derivatives at a single point. It is used to approximate a function and is particularly useful for finding the maximum value of a multi-variable function.

2. How is a multi-variable Taylor series different from a single-variable Taylor series?

A single-variable Taylor series is used to approximate a function of one variable, while a multi-variable Taylor series is used to approximate a function of multiple variables. This means that a multi-variable Taylor series will have more terms and will be more complex than a single-variable Taylor series.

3. Why is it important to find the maximum of a multi-variable Taylor series?

Finding the maximum of a multi-variable Taylor series allows us to determine the maximum value of a function at a specific point. This can be useful in optimization problems, where we want to find the maximum or minimum value of a function in order to optimize a certain outcome.

4. What is the process for finding the maximum of a multi-variable Taylor series?

The process involves taking the partial derivatives of the function with respect to each variable, setting them equal to 0, and solving for the variables. This will give us the coordinates of the critical points, and we can then use the second derivative test to determine whether these points are maximums or minimums.

5. How can the maximum of a multi-variable Taylor series be used in real-world applications?

Finding the maximum of a multi-variable Taylor series can be used in a variety of real-world applications, such as in engineering, economics, and physics. It can be used to optimize the performance of a machine, determine the most efficient way to allocate resources, or predict the trajectory of a projectile, among other things.

Similar threads

Back
Top