# Derivative of argmin/argmax w.r.t. auxiliary parameter?

In summary, the thread discusses using the familiar properties of function minima/maxima in a way the poster could not find in the literature. The poster describes a scalar function with an auxiliary parameter, uses a numerical procedure to obtain a local minimum, and then takes an additional partial derivative to find the rate of change of the solved variables with respect to the parameter. This approach has been successful in obtaining quasi-analytic Jacobian matrices, and the poster asks whether the method has a name. A reply identifies it as an application of the Implicit Function Theorem and explains in detail how the theorem applies.

#### aphirst

Gold Member
As part of my work, I'm making use of the familiar properties of function minima/maxima in a way which I can't seem to find in the literature. I was hoping that by describing it here, someone else might recognise it and be able to point me to a citation. I think it's highly unlikely that I'm the first to do this, since it's so straightforward.

Say you have a scalar function: $$f(\mathbf{x},q)$$ where: $$\mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \end{pmatrix}$$ and where ##q## is an auxiliary parameter - perhaps ##f## represents some physical quantity, and ##q## parametrises the physical system somehow. Subscripts in ##\mathbf{x},q## denote partial derivatives w.r.t. those variables.

Let's assume that ##f(\mathbf{x},q)## is continuous and differentiable in ##\mathbf{x}## and ##q## up to at least ##\mathcal{C}^3##.

Let's say you use a robust numerical procedure to obtain, at some ##q##, the parameters ##\mathbf{x}_i## which give a local minimum of ##f(\mathbf{x},q)## from some (irrelevant) starting point: $$\mathbf{x}_i = \underset{\mathbf{x}}{\mathrm{argmin}}\, f(\mathbf{x},q)$$ Let's say we're actually interested in ##\partial_q \mathbf{x}_i##: the rate of change of the solved variables ##\mathbf{x}_i## with respect to the parameter ##q## describing the system itself. Of course, as you change ##q## (change the system), you expect different solved values of ##\mathbf{x}_i##.

From the definition of a minimum (in fact, any extremum), at the minimum: $$\nabla_{\mathbf{x}} f = 0$$ Taking an additional ##\partial_q## (and swapping the order of partial derivatives via Schwarz's theorem): $$\mathbf{H}_{\mathbf{x}} \partial_q \mathbf{x}_i + \partial_q \nabla_{\mathbf{x}} f = 0$$ where ##\mathbf{H}_\mathbf{x}## is the Hessian of ##f## w.r.t. ##\mathbf{x}##, and where the expression takes advantage of the total derivative of a function ##g## evaluated along the solution ##\mathbf{x}_i(q)##: $$\frac{\mathrm{d} g(\mathbf{x}_i,q)}{\mathrm{d} q} = \frac{\partial g}{\partial x_1} \frac{\partial x_{i,1}}{\partial q} + \frac{\partial g}{\partial x_2} \frac{\partial x_{i,2}}{\partial q} + \dots + \frac{\partial g}{\partial q}$$ From here, obtaining ##\partial_q \mathbf{x}_i## simply involves solving a linear equation system involving the Hessian matrix.
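As a sanity check, the Hessian-solve recipe above can be sketched numerically. The toy function below (a quadratic in ##\mathbf{x}## with a coupling term, chosen purely for illustration; it does not come from the thread) has a closed-form minimiser, so the result of the linear solve can be compared against a finite difference of the argmin itself:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy function (not from the thread): quadratic in x with a
# coupling term, so the minimiser x_i(q) and its derivative are known exactly.
#   f(x, q) = (x0 - q)^2 + (x1 - q^2)^2 + x0*x1
def f(x, q):
    return (x[0] - q)**2 + (x[1] - q**2)**2 + x[0] * x[1]

def grad_x(x, q):
    # Gradient of f w.r.t. x
    return np.array([2.0 * (x[0] - q) + x[1],
                     2.0 * (x[1] - q**2) + x[0]])

def hess_x(x, q):
    # Hessian of f w.r.t. x (constant for this quadratic)
    return np.array([[2.0, 1.0],
                     [1.0, 2.0]])

def dq_grad_x(x, q):
    # partial_q of the gradient, holding x fixed
    return np.array([-2.0, -4.0 * q])

q = 0.7
# Numerically locate x_i = argmin_x f(x, q)
x_i = minimize(f, x0=np.zeros(2), args=(q,), jac=grad_x, tol=1e-10).x

# Sensitivity: solve  H_x * (d_q x_i) = -(d_q grad_x f)
dx_dq = np.linalg.solve(hess_x(x_i, q), -dq_grad_x(x_i, q))

# Cross-check against a central finite difference of the argmin itself
eps = 1e-5
x_plus = minimize(f, x0=x_i, args=(q + eps,), jac=grad_x, tol=1e-10).x
x_minus = minimize(f, x0=x_i, args=(q - eps,), jac=grad_x, tol=1e-10).x
fd = (x_plus - x_minus) / (2.0 * eps)
print(dx_dq, fd)
```

Note the sign convention: since ##\mathbf{H}_{\mathbf{x}} \partial_q \mathbf{x}_i + \partial_q \nabla_{\mathbf{x}} f = 0##, the right-hand side of the linear solve carries a minus sign.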

Issues with my compact notation aside, I can confirm that this approach is very successful, and lets me obtain quasi-analytic Jacobian matrices for physical quantities w.r.t. other physical parameters, which beforehand seemed to unavoidably require numerical derivatives (e.g. finite differences).

Is there a name for what I've just done here? Surely I can't have invented it?

I think what you have done is used an application of the Implicit Function Theorem.

Let the ##\mathbf x## in your problem be in ##\mathbb R^n##. Then we can re-cast your function ##f## as a function from ##\mathbb R^{n+1}\to\mathbb R## so that the first argument is ##q## and the next ##n## arguments are ##\mathbf x##.

Define a function ##h:\mathbb R^{n+1}\to \mathbb R^n## whose ##k##th component function, for ##1\le k\le n##, is ##h^k:\mathbb R^{n+1}\to\mathbb R## such that ##h^k(q,\mathbf x) = \frac{\partial }{\partial x_k} f(q,\mathbf x)##. In other words, ##h## is the gradient of ##f## with respect to ##\mathbf x##.

Then by the Implicit Function Theorem there exists a function ##g:\mathbb R \to \mathbb R^n## whose graph is the set of points for which ##h(q,g(q))=0##, that is, the set of all pairs of parameter ##q## with the coordinates ##g(q)## of the local minimum point of ##f## given parameter ##q##.

I have used the notation of the linked wikipedia article, to make it easier to use their formulas. So my ##g## does not refer to the same function as your ##g## does.

Then the final equation in the section entitled 'Statement of the Theorem' corresponds to the equation you came up with. Note that the wiki's Jacobian corresponds to your Hessian because it is a Jacobian of the function ##h## which is a gradient function.
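To make the correspondence concrete, here is a small symbolic sketch (using SymPy on a hypothetical quadratic toy function, not taken from the thread) checking that the Implicit Function Theorem formula ##g'(q) = -(\partial h/\partial \mathbf{x})^{-1}\,\partial h/\partial q## reproduces the derivative of the explicitly solved minimiser:

```python
import sympy as sp

q, x1, x2 = sp.symbols('q x1 x2')
# Hypothetical toy function, quadratic in (x1, x2) so the minimiser is
# solvable in closed form and the IFT formula can be checked directly.
f = (x1 - q)**2 + (x2 - q**2)**2 + x1 * x2

# h = gradient of f w.r.t. x; its zero set implicitly defines g(q)
h = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])

# Solve h = 0 explicitly (possible here because h is linear in x)
sol = sp.solve(list(h), [x1, x2], dict=True)[0]
g = sp.Matrix([sol[x1], sol[x2]])

# IFT: g'(q) = -(dh/dx)^{-1} (dh/dq), evaluated on the solution branch
J_x = h.jacobian([x1, x2])
J_q = h.diff(q)
ift = (-J_x.inv() * J_q).subs(sol)

# The explicit derivative and the IFT expression agree
assert sp.simplify(g.diff(q) - ift) == sp.zeros(2, 1)
```

Here the Jacobian ##J_x## of the gradient function ##h## is exactly the Hessian of ##f##, as noted above.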


## 1. What is the purpose of taking the derivative of argmin/argmax w.r.t. auxiliary parameter?

The derivative of the argmin/argmax with respect to an auxiliary parameter tells us how the minimising (or maximising) point of the function moves as the parameter changes. This sensitivity information is useful in optimisation and physical modelling, where the parameter describes the system itself and we want to know how the optimum responds to changes in it.

## 2. How do you calculate the derivative of argmin/argmax w.r.t. auxiliary parameter?

It is computed by differentiating the first-order optimality condition ##\nabla_{\mathbf{x}} f = 0## with respect to the parameter. By the chain rule this yields the linear system ##\mathbf{H}_{\mathbf{x}} \partial_q \mathbf{x}_i = -\partial_q \nabla_{\mathbf{x}} f##, where ##\mathbf{H}_{\mathbf{x}}## is the Hessian of ##f## w.r.t. ##\mathbf{x}##; solving it gives the derivative. This is an application of the Implicit Function Theorem.

## 3. Can the derivative of argmin/argmax w.r.t. auxiliary parameter be negative?

Yes, each component of the derivative can be negative. A negative component means that the corresponding coordinate of the minimiser (or maximiser) decreases as the auxiliary parameter increases, while a positive component means it increases.

## 4. How does the derivative of argmin/argmax w.r.t. auxiliary parameter affect the original function?

The derivative does not change the original function; it quantifies how sensitive the location of the optimum is to the auxiliary parameter. This information can be used, for example, to build quasi-analytic Jacobians of derived physical quantities, or to adjust the parameter efficiently in an outer optimisation loop.

## 5. Can the derivative of argmin/argmax w.r.t. auxiliary parameter be used for non-linear functions?

Yes, the approach applies to general non-linear functions, provided the function is sufficiently smooth near the optimum and the Hessian there is non-singular, so that the Implicit Function Theorem applies. When these conditions fail, or when analytic derivatives are unavailable, numerical methods such as finite differences may be used instead.
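For instance, a minimal one-dimensional non-linear sketch (a hypothetical example, assuming SciPy is available): the minimiser of ##f(x,q) = x^4/4 - qx## is ##x^*(q) = q^{1/3}## for ##q > 0##, and the one-dimensional version of the Hessian solve recovers its derivative:

```python
from scipy.optimize import minimize_scalar

# Hypothetical 1-D non-linear example (not from the thread):
# f(x, q) = x**4/4 - q*x has first-order condition x**3 - q = 0,
# so the minimiser is x*(q) = q**(1/3) for q > 0.
q = 2.0
res = minimize_scalar(lambda x: x**4 / 4 - q * x,
                      bounds=(0.0, 5.0), method='bounded')
x_star = res.x

# Differentiate the first-order condition w.r.t. q:
#   f_xx * dx/dq + f_xq = 0,  with f_xx = 3*x**2 and f_xq = -1,
# giving dx/dq = 1 / (3 * x*(q)**2) -- the 1-D "Hessian solve".
dx_dq = 1.0 / (3.0 * x_star**2)
print(x_star, dx_dq)
```

The same value follows from differentiating ##x^*(q) = q^{1/3}## directly, which gives ##\tfrac{1}{3} q^{-2/3}##.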