Derivative of argmin/argmax w.r.t. auxiliary parameter?

In summary: the original poster describes a scalar function ##f(\mathbf{x}, q)## with an auxiliary parameter ##q##, finds a local minimum in ##\mathbf{x}## numerically, and then differentiates the stationarity condition with respect to ##q## to obtain the rate of change of the solved variables with respect to the parameter, which reduces to solving a linear system involving the Hessian. This has let them compute quasi-analytic Jacobian matrices, and they ask whether the technique has a name. The reply identifies it as an application of the Implicit Function Theorem and explains in detail how that theorem applies to the problem.
  • #1
aphirst
Gold Member
As part of my work, I'm making use of the familiar properties of function minima/maxima in a way which I can't seem to find in the literature. I was hoping that by describing it here, someone else might recognise it and be able to point me to a citation. I think it's highly unlikely that I'm the first to do this, since it's so straightforward.

Say you have a scalar function: $$f(\mathbf{x},q)$$ where: $$\mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \end{pmatrix}$$ and where ##q## is an auxiliary parameter - perhaps ##f## represents some physical quantity, and ##q## parametrises the physical system somehow. Subscripts in ##\mathbf{x}, q## denote partial derivatives w.r.t. those variables.

Let's assume that ##f(\mathbf{x},q)## is continuously differentiable in ##\mathbf{x}## and ##q## up to at least ##\mathcal{C}^3##.

Let's say you use a robust numerical procedure to obtain the parameters ##\mathbf{x}_i## which give a local minimum of ##f(\mathbf{x},q)## from some (irrelevant) starting point, at some fixed ##q##: $$\mathbf{x}_i = \underset{\mathbf{x}}{\mathrm{argmin}} \, f(\mathbf{x},q)$$ Let's say we're actually interested in ##\partial_q \mathbf{x}_i##: the rate of change of the solved variables ##\mathbf{x}_i## with respect to the parameter ##q## describing the system itself. Of course, as you change ##q## (change the system), you expect different solved values of ##\mathbf{x}_i##.

From the definition of a minimum (actually any extremum), at the minimum: $$\nabla_{\mathbf{x}} f = 0$$ Taking an additional ##\partial_q## (and swapping the order of partial derivatives via Schwarz's theorem): $$\mathbf{H}_{\mathbf{x}} \, \partial_q \mathbf{x}_i + \partial_q \nabla_{\mathbf{x}} f = 0$$ where ##\mathbf{H}_\mathbf{x}## is the Hessian of ##f## w.r.t. ##\mathbf{x}##, and where the expression takes advantage of the total derivative (chain rule) with respect to ##q##, since ##\mathbf{x}_i## itself depends on ##q##: $$\frac{\mathrm{d} g(\mathbf{x}_i(q),q)}{\mathrm{d} q} = \frac{\partial g}{\partial x_1} \frac{\partial x_{i,1}}{\partial q} + \frac{\partial g}{\partial x_2} \frac{\partial x_{i,2}}{\partial q}+ \dots + \frac{\partial g}{\partial q}$$ From here, obtaining ##\partial_q \mathbf{x}_i## simply means solving the linear equation system involving the Hessian matrix.
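For concreteness, here is a minimal sketch of that last step in Python (NumPy/SciPy). The particular ##f##, matrix A and vector b below are made-up stand-ins, not from any real system; the point is just that once the minimiser is found, ##\partial_q \mathbf{x}_i## comes from a single linear solve with the Hessian, and it agrees with a finite difference in ##q##:

```python
# Toy example: f(x, q) = 0.5 x'Ax - q b'x + 0.1 sum(x^4)   (made up for illustration)
import numpy as np
from scipy.optimize import minimize

A = np.array([[3.0, 0.5],
              [0.5, 2.0]])              # symmetric positive definite
b = np.array([1.0, -2.0])

def f(x, q):
    return 0.5 * x @ A @ x - q * b @ x + 0.1 * np.sum(x**4)

def grad_x(x, q):                        # gradient of f w.r.t. x
    return A @ x - q * b + 0.4 * x**3

def hess_x(x, q):                        # Hessian of f w.r.t. x
    return A + np.diag(1.2 * x**2)

def argmin_x(q):                         # stand-in for the "robust numerical procedure"
    res = minimize(f, np.zeros(2), args=(q,), jac=grad_x,
                   method="BFGS", options={"gtol": 1e-10})
    return res.x

q = 0.7
x_star = argmin_x(q)

# Differentiating grad_x f(x_i(q), q) = 0 w.r.t. q gives
#   H_x @ dx_dq + d/dq(grad_x f) = 0,   and here d/dq(grad_x f) = -b,
# so dx_dq solves  H_x @ dx_dq = b.
dx_dq = np.linalg.solve(hess_x(x_star, q), b)

# Sanity check against a central finite difference in q
h = 1e-4
dx_dq_fd = (argmin_x(q + h) - argmin_x(q - h)) / (2 * h)
print(dx_dq, dx_dq_fd)                   # the two should agree to several significant figures
```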

Issues with my compact notation aside, I can confirm that this approach is very successful, and lets me obtain quasi-analytic Jacobian matrices for physical quantities w.r.t. other physical parameters, which beforehand seemed to unavoidably require numerical derivatives (e.g. finite differences).

Is there a name for what I've just done here? Surely I can't have invented it?
 
  • #2
I think what you have done is apply the Implicit Function Theorem.

Let the ##\mathbf x## in your problem be in ##\mathbb R^n##. Then we can re-cast your function ##f## as a function from ##\mathbb R^{n+1}\to\mathbb R## so that the first argument is ##q## and the next ##n## arguments are ##\mathbf x##.

Define a function ##h:\mathbb R^{n+1}\to \mathbb R^n## whose ##k##th component function, for ##1\le k\le n##, is ##h^k:\mathbb R^{n+1}\to\mathbb R## with ##h^k(q,\mathbf x) = \frac{\partial }{\partial x_k} f(q,\mathbf x)##; that is, ##h## is the gradient of ##f## with respect to ##\mathbf x##.

Then, provided the Jacobian of ##h## with respect to ##\mathbf x## (which is the Hessian of ##f##) is invertible at the point of interest, the Implicit Function Theorem gives a function ##g:\mathbb R \to \mathbb R^n##, defined on a neighbourhood of the given ##q##, whose graph is the set of points for which ##h(q,g(q))=0##, that is, the set of all pairs of parameter ##q## with the coordinates ##g(q)## of the local minimum point of ##f## given parameter ##q##.

I have used the notation of the linked wikipedia article, to make it easier to use their formulas. So my ##g## does not refer to the same function as your ##g## does.

Then the final equation in the section entitled 'Statement of the Theorem' corresponds to the equation you came up with. Note that the wiki's Jacobian corresponds to your Hessian because it is a Jacobian of the function ##h## which is a gradient function.
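Written out explicitly in the notation of post #1, that equation is $$\partial_q \mathbf{x}_i = -\,\mathbf{H}_{\mathbf{x}}^{-1} \, \partial_q \nabla_{\mathbf{x}} f \, ,$$ which is exactly the linear system above, valid wherever the Hessian is invertible (at a non-degenerate local minimum it is positive definite, hence invertible).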
 

1. What is the purpose of taking the derivative of argmin/argmax w.r.t. auxiliary parameter?

The derivative of the argmin/argmax with respect to an auxiliary parameter tells you how the location of the minimum or maximum shifts as that parameter is varied. This is a form of sensitivity analysis: it lets you propagate derivatives through an optimization step, for example to build Jacobians of quantities that depend on the optimum, without resorting to finite differences of the whole optimization.

2. How do you calculate the derivative of argmin/argmax w.r.t. auxiliary parameter?

It follows from the Implicit Function Theorem. At the optimum the gradient of the function with respect to the optimization variables vanishes; differentiating that stationarity condition with respect to the auxiliary parameter ##q## (using the chain rule, since the optimum itself depends on ##q##) gives the linear system $$\mathbf{H}_{\mathbf{x}} \, \partial_q \mathbf{x}_i = -\,\partial_q \nabla_{\mathbf{x}} f,$$ where ##\mathbf{H}_{\mathbf{x}}## is the Hessian at the optimum. Solving this system yields the desired derivative.
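As a one-dimensional illustration (a made-up example, not from the thread): if ##x^*(q) = \mathrm{argmin}_x f(x,q)##, then differentiating ##f_x(x^*(q),q)=0## gives $$\frac{\mathrm{d} x^*}{\mathrm{d} q} = -\frac{f_{xq}(x^*,q)}{f_{xx}(x^*,q)}.$$ For ##f(x,q)=\tfrac12 x^2 - qx## the minimiser is ##x^*(q)=q##, and the formula indeed gives ##-(-1)/1 = 1##.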

3. Can the derivative of argmin/argmax w.r.t. auxiliary parameter be negative?

Yes, any component of the derivative can be negative. A negative component simply means that the corresponding coordinate of the minimizer (or maximizer) decreases as the auxiliary parameter increases, while a positive component means it increases.

4. How does the derivative of argmin/argmax w.r.t. auxiliary parameter affect the original function?

It does not change the original function; it describes how the location of its optimum responds to changes in the auxiliary parameter. That sensitivity information can then be used, for example, to form Jacobians of downstream quantities that depend on the optimum, or to decide how to adjust the parameter so that the optimum moves in a desired direction.

5. Can the derivative of argmin/argmax w.r.t. auxiliary parameter be used for non-linear functions?

Yes. The derivation only requires that the function be sufficiently smooth and that the Hessian at the optimum be nonsingular; it does not require linearity. For a nonlinear function the optimum itself usually has to be found numerically (e.g. with Newton or quasi-Newton methods), but the sensitivity calculation is unchanged: solve the linear system with the Hessian evaluated at the computed optimum. If analytic second derivatives are unavailable, they can be approximated, for instance by finite differences of the gradient, as sketched below.
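A rough sketch of that fallback in Python/NumPy, assuming a user-supplied gradient function grad_f(x, q) (hypothetical here) and an already-computed optimum x_star:

```python
# Sketch: approximate the Hessian H_x and the mixed derivative d/dq(grad_x f)
# by central finite differences of a user-supplied gradient, then solve the
# same linear system as in the analytic case.  grad_f(x, q) is hypothetical.
import numpy as np

def argmin_sensitivity(grad_f, x_star, q, h=1e-6):
    n = x_star.size
    H = np.empty((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        H[:, k] = (grad_f(x_star + e, q) - grad_f(x_star - e, q)) / (2 * h)
    dgrad_dq = (grad_f(x_star, q + h) - grad_f(x_star, q - h)) / (2 * h)
    return np.linalg.solve(H, -dgrad_dq)   # dx*/dq
```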
