Derivative of argmin/argmax w.r.t. auxiliary parameter?

SUMMARY

This discussion centers on deriving the rate of change of the argmin of a scalar function \( f(\mathbf{x}, q) \) with respect to an auxiliary parameter \( q \). By differentiating the stationarity condition, an application of the Implicit Function Theorem, the Hessian matrix \( \mathbf{H}_{\mathbf{x}} \) can be used to solve a linear system for \( \partial_q \mathbf{x}_i \), the change in the local-minimum parameters as \( q \) varies. This yields quasi-analytic Jacobian matrices for physical quantities, eliminating the need for numerical derivatives such as finite differences. The approach is confirmed to be effective and is grounded in established mathematical results.

PREREQUISITES
  • Understanding of scalar functions and their minima/maxima properties
  • Familiarity with the Implicit Function Theorem
  • Knowledge of Hessian matrices and their role in optimization
  • Proficiency in calculus, particularly partial derivatives
NEXT STEPS
  • Study the Implicit Function Theorem in detail to understand its applications
  • Learn about Hessian matrices and their significance in optimization problems
  • Explore numerical methods for optimization, including robust numerical procedures
  • Investigate quasi-analytic methods for computing Jacobians in physical systems
USEFUL FOR

Mathematicians, physicists, and engineers involved in optimization problems, particularly those working with scalar functions and their derivatives in relation to auxiliary parameters.

aphirst
As part of my work, I'm making use of the familiar properties of function minima/maxima in a way which I can't seem to find in the literature. I was hoping that by describing it here, someone else might recognise it and be able to point me to a citation. I think it's highly unlikely that I'm the first to do this, since it's so straightforward.

Say you have a scalar function: $$f(\mathbf{x},q)$$ where: $$\mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \end{pmatrix}$$ and where ##q## is an auxiliary parameter - perhaps ##f(\mathbf{x},q)## represents some physical quantity, and ##q## parametrises the physical system somehow. Subscripts in ##\mathbf{x}, q## denote partial derivatives w.r.t. those variables.

Let's assume that ##f(\mathbf{x},q)## is continuous and differentiable in ##\mathbf{x}## and ##q## up to at least ##\mathcal{C}^3##.

Let's say you use a robust numerical procedure to obtain the parameters ##\mathbf{x}_i## which give a local minimum of ##f(\mathbf{x},q)## from some (irrelevant) starting point, at some fixed ##q##: $$\mathbf{x}_i = \operatorname*{argmin}_{\mathbf{x}} f(\mathbf{x},q)$$ Let's say we're actually interested in ##\partial_q \mathbf{x}_i##: the rate of change of the solved variables ##\mathbf{x}_i## with respect to the parameter ##q## describing the system itself. Of course, as you change ##q## (change the system), you expect different solved values of ##\mathbf{x}_i##.

From the definition of a minimum (actually any extremum), at the minimum: $$\nabla_{\mathbf{x}} f = 0$$ Taking an additional ##\partial_q## of this condition along the solution ##\mathbf{x}_i(q)## (and swapping the order of partial derivatives via Schwarz's theorem) gives: $$\mathbf{H}_{\mathbf{x}} \, \partial_q \mathbf{x}_i + \partial_q \nabla_{\mathbf{x}} f = 0$$ where ##\mathbf{H}_\mathbf{x}## is the Hessian of ##f## w.r.t. ##\mathbf{x}##, and where the expression uses the total derivative (chain rule), applied componentwise with ##g = \partial f / \partial x_k##: $$\frac{\mathrm{d} g(\mathbf{x}_i(q),q)}{\mathrm{d} q} = \frac{\partial g}{\partial x_1} \frac{\partial x_{i,1}}{\partial q} + \frac{\partial g}{\partial x_2} \frac{\partial x_{i,2}}{\partial q}+ \dots + \frac{\partial g}{\partial q}$$ From here, obtaining ##\partial_q \mathbf{x}_i## simply involves solving the linear system ##\mathbf{H}_{\mathbf{x}} \, \partial_q \mathbf{x}_i = -\partial_q \nabla_{\mathbf{x}} f##.
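Here is a minimal numerical sketch of that recipe, assuming Python with NumPy/SciPy; the toy function, starting point and all names are illustrative choices (not the OP's actual ##f##), picked so that the answer can be checked analytically:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem (illustrative only):
#   f(x, q) = (x1 - q)^2 + (x2 - x1^2)^2
# Its minimiser is x_i(q) = (q, q^2), so d(x_i)/dq = (1, 2q) exactly,
# which lets us check the linear-system recipe against a known answer.

def f(x, q):
    return (x[0] - q)**2 + (x[1] - x[0]**2)**2

def grad_x(x, q):
    # Gradient of f with respect to x (at fixed q)
    return np.array([2.0*(x[0] - q) - 4.0*x[0]*(x[1] - x[0]**2),
                     2.0*(x[1] - x[0]**2)])

def hess_x(x, q):
    # Hessian of f with respect to x
    return np.array([[2.0 - 4.0*(x[1] - x[0]**2) + 8.0*x[0]**2, -4.0*x[0]],
                     [-4.0*x[0], 2.0]])

def dq_grad_x(x, q):
    # Mixed derivative: d/dq of grad_x f
    return np.array([-2.0, 0.0])

q = 0.7

# 1) "Robust numerical procedure": minimise f over x at this fixed q.
x_i = minimize(f, x0=np.array([0.0, 0.0]), args=(q,), jac=grad_x).x

# 2) Solve  H_x * (d_q x_i) = -d_q grad_x f  for the sensitivity d_q x_i.
dq_xi = np.linalg.solve(hess_x(x_i, q), -dq_grad_x(x_i, q))

print(dq_xi)              # approximately [1.0, 1.4]
print([1.0, 2.0*q])       # analytic answer [1, 2q] for this toy f
```

The point of the sketch is that the only numerical optimisation happens once, at the fixed ##q##; the sensitivity ##\partial_q \mathbf{x}_i## then comes from a single linear solve rather than from re-minimising at perturbed values of ##q##.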

Issues with my compact notation aside, I can confirm that this approach is very successful, and lets me obtain quasi-analytic Jacobian matrices for physical quantities w.r.t. other physical parameters, which beforehand seemed to unavoidably require numerical derivatives (e.g. finite differences).
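Spelling out the multi-parameter case as a sketch (the vector ##\mathbf{q} = (q_1,\dots,q_m)## and the dimension ##m## are notation introduced here, not from the post above): the same relation holds column by column, so a single factorisation of the Hessian can be reused for the whole Jacobian, $$\mathbf{H}_{\mathbf{x}} \, \partial_{\mathbf{q}} \mathbf{x}_i = -\,\partial_{\mathbf{q}} \nabla_{\mathbf{x}} f ,$$ where ##\partial_{\mathbf{q}} \mathbf{x}_i## is the sought ##n \times m## Jacobian and ##\partial_{\mathbf{q}} \nabla_{\mathbf{x}} f## is the ##n \times m## matrix of mixed second derivatives.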

Is there a name for what I've just done here? Surely I can't have invented it?
 
I think what you have done is an application of the Implicit Function Theorem.

Let the ##\mathbf x## in your problem be in ##\mathbb R^n##. Then we can re-cast your function ##f## as a function from ##\mathbb R^{n+1}\to\mathbb R## so that the first argument is ##q## and the next ##n## arguments are ##\mathbf x##.

Define a function ##h:\mathbb R^{n+1}\to \mathbb R^n## whose ##k##th component function, for ##1\le k\le n##, is ##h^k:\mathbb R^{n+1}\to\mathbb R## such that ##h^k( q,\mathbf x) = \frac{\partial }{\partial x_k} f(q,\mathbf x)## - that is, ##h## is the gradient of ##f## with respect to ##\mathbf x##.

Then, provided the Hessian ##\mathbf H_{\mathbf x}## is invertible at the minimum, the Implicit Function Theorem gives (locally) a function ##g:\mathbb R \to \mathbb R^n## such that ##h(q,g(q))=0##; its graph is the set of all pairs of parameter ##q## with the coordinates ##g(q)## of the local minimum point of ##f## given parameter ##q##.

I have used the notation of the linked Wikipedia article, to make it easier to use its formulas. So my ##g## does not refer to the same function as your ##g## does.

Then the final equation in the section entitled 'Statement of the Theorem' corresponds to the equation you came up with. Note that the wiki's Jacobian corresponds to your Hessian because it is a Jacobian of the function ##h## which is a gradient function.
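Spelled out as a sketch in the notation above (with ##h(q,\mathbf x) = \nabla_{\mathbf x} f(q,\mathbf x)## and ##g(q) = \mathbf{x}_i(q)##), that final equation reads $$\frac{\mathrm{d} g}{\mathrm{d} q}(q) = -\left[ J_{h,\mathbf{x}}\big(q, g(q)\big) \right]^{-1} J_{h,q}\big(q, g(q)\big) = -\,\mathbf{H}_{\mathbf{x}}^{-1}\, \partial_q \nabla_{\mathbf{x}} f ,$$ which is the same linear system as in the original post.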
 
