Derivative of argmin/argmax w.r.t. auxiliary parameter?

Summary
The discussion focuses on deriving the rate of change of the parameters at a local minimum of a scalar function with respect to an auxiliary parameter. Differentiating the stationarity condition with respect to the auxiliary parameter yields a linear system involving the Hessian, whose solution gives the desired derivatives; this allows quasi-analytic Jacobian matrices for physical quantities to be computed without resorting to numerical derivatives. The replies confirm that the technique is not new: it is an application of the Implicit Function Theorem.
aphirst
Gold Member
As part of my work, I'm making use of the familiar properties of function minima/maxima in a way which I can't seem to find in the literature. I was hoping that by describing it here, someone else might recognise it and be able to point me to a citation. I think it's highly unlikely that I'm the first to do this, since it's so straightforward.

Say you have a scalar function: $$f(\mathbf{x},q)$$ where: $$\mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \end{pmatrix}$$ and where ##q## is an auxiliary parameter - perhaps ##f(\mathbf{x},q)## represents some physical quantity, and ##q## parametrises the physical system somehow. Subscripts in ##\mathbf{x},q## denote partial derivatives w.r.t. those variables.

Let's assume that ##f(\mathbf{x},q)## is continuous and differentiable in ##\mathbf{x}## and ##q## up to at least ##\mathcal{C}^3##.

Let's say you use a robust numerical procedure to obtain the parameters ##\mathbf{x}_i## which give a local minimum of ##f(\mathbf{x},q)## from some (irrelevant) starting point, at some fixed ##q##: $$\mathbf{x}_i = \operatorname{argmin}_{\mathbf{x}} f(\mathbf{x},q)$$ Let's say we're actually interested in ##\partial_q \mathbf{x}_i##: the rate of change of the solved variables ##\mathbf{x}_i## with respect to the parameter ##q## describing the system itself. Of course, as you change ##q## (change the system), you expect different solved values of ##\mathbf{x}_i##.

From the definition of a minimum (in fact any extremum), at the minimum: $$\nabla_{\mathbf{x}} f = 0$$ Taking the total derivative of this condition with respect to ##q## along the solution ##\mathbf{x}_i(q)## (and swapping the order of partial derivatives via Schwarz's theorem) gives: $$\mathbf{H}_{\mathbf{x}}\, \partial_q \mathbf{x}_i + \partial_q \nabla_{\mathbf{x}} f = 0$$ where ##\mathbf{H}_\mathbf{x}## is the Hessian of ##f## w.r.t. ##\mathbf{x}##, and where each component uses the total derivative (is that the correct term?): $$\frac{\mathrm{d}\, g(\mathbf{x}_i(q),q)}{\mathrm{d} q} = \frac{\partial g}{\partial x_1} \frac{\partial x_{i,1}}{\partial q} + \frac{\partial g}{\partial x_2} \frac{\partial x_{i,2}}{\partial q} + \dots + \frac{\partial g}{\partial q}$$ From here, obtaining ##\partial_q \mathbf{x}_i## is simply a matter of solving the linear system $$\mathbf{H}_{\mathbf{x}}\, \partial_q \mathbf{x}_i = -\,\partial_q \nabla_{\mathbf{x}} f.$$
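To make the recipe concrete, here is a minimal sketch in Python (not from the thread; the toy objective, its derivatives, and all names are illustrative assumptions): minimise ##f(\mathbf{x},q)## numerically, solve ##\mathbf{H}_{\mathbf{x}}\,\partial_q \mathbf{x}_i = -\partial_q \nabla_{\mathbf{x}} f## at the minimiser, and compare against a finite-difference estimate obtained by re-minimising at ##q \pm h##.

```python
# Sensitivity of a minimiser w.r.t. an auxiliary parameter q,
# via the linear system  H_x @ dx_dq = -d_q(grad_x f).
# Toy example only; the objective and its derivatives are made up for illustration.

import numpy as np
from scipy.optimize import minimize

def f(x, q):
    # Toy objective whose minimiser depends smoothly on q.
    return (x[0] - q)**2 + 2.0*(x[1] - q**2)**2 + 0.1*(x[0]*x[1])**2

def grad_x(x, q):
    # Analytic gradient of f with respect to x.
    return np.array([
        2.0*(x[0] - q) + 0.2*x[0]*x[1]**2,
        4.0*(x[1] - q**2) + 0.2*x[0]**2*x[1],
    ])

def hess_x(x, q):
    # Analytic Hessian of f with respect to x.
    return np.array([
        [2.0 + 0.2*x[1]**2, 0.4*x[0]*x[1]],
        [0.4*x[0]*x[1],     4.0 + 0.2*x[0]**2],
    ])

def dq_grad_x(x, q):
    # Mixed derivative: d/dq of grad_x f, holding x fixed.
    return np.array([-2.0, -8.0*q])

def argmin_x(q, x0=np.zeros(2)):
    # Robust numerical minimisation at fixed q (starting point is irrelevant here).
    return minimize(f, x0, args=(q,), jac=grad_x, method="BFGS").x

q = 0.7
x_star = argmin_x(q)

# Quasi-analytic sensitivity: solve  H_x dx/dq = -d_q grad_x f  at the minimiser.
dx_dq = np.linalg.solve(hess_x(x_star, q), -dq_grad_x(x_star, q))

# Finite-difference check on the full pipeline (re-minimise at q +/- h).
h = 1e-5
dx_dq_fd = (argmin_x(q + h, x_star) - argmin_x(q - h, x_star)) / (2*h)

print("implicit-function sensitivity:", dx_dq)
print("finite-difference estimate:   ", dx_dq_fd)
```

The two printed vectors should agree to within the finite-difference error, which is the point of the exercise: the linear solve replaces repeated re-minimisation.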

Issues with my compact notation aside, I can confirm that this approach works very well in practice: it lets me obtain quasi-analytic Jacobian matrices of physical quantities with respect to other physical parameters, which previously seemed to unavoidably require numerical derivatives (e.g. finite differences).

Is there a name for what I've just done here? Surely I can't have invented it?
 
I think what you have done is apply the Implicit Function Theorem.

Let the ##\mathbf x## in your problem be in ##\mathbb R^n##. Then we can re-cast your function ##f## as a function from ##\mathbb R^{n+1}\to\mathbb R## so that the first argument is ##q## and the next ##n## arguments are ##\mathbf x##.

Define a function ##h:\mathbb R^{n+1}\to \mathbb R^n## whose ##k##th component function, for ##1\le k\le n##, is ##h^k:\mathbb R^{n+1}\to\mathbb R## with ##h^k(q,\mathbf x) = \frac{\partial }{\partial x_k} f(q,\mathbf x)##; that is, ##h(q,\mathbf x) = \nabla_{\mathbf x} f(q,\mathbf x)##.

Then, provided the Jacobian of ##h## with respect to ##\mathbf x## (which is the Hessian of ##f##) is invertible at the solution, the Implicit Function Theorem gives a function ##g:\mathbb R \to \mathbb R^n##, defined near the given ##q##, such that ##h(q,g(q))=0##; its graph is the set of pairs of a parameter ##q## with the coordinates ##g(q)## of the local minimum point of ##f## for that ##q##.

I have used the notation of the linked wikipedia article, to make it easier to use their formulas. So my ##g## does not refer to the same function as your ##g## does.

Then the final equation in the section entitled 'Statement of the theorem' corresponds to the equation you came up with. Note that the wiki's Jacobian corresponds to your Hessian, because it is the Jacobian of the function ##h##, which is itself a gradient.
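Spelled out in the notation above (this is just the standard Implicit Function Theorem formula rewritten in the thread's symbols, nothing additional): differentiating ##h(q,g(q))=0## with respect to ##q## gives $$0 = \frac{\mathrm{d}}{\mathrm{d}q}\, h\bigl(q, g(q)\bigr) = \partial_q \nabla_{\mathbf{x}} f + \mathbf{H}_{\mathbf{x}}\, g'(q) \quad\Longrightarrow\quad g'(q) = -\,\mathbf{H}_{\mathbf{x}}^{-1}\, \partial_q \nabla_{\mathbf{x}} f,$$ which is exactly the linear system above with ##g'(q) = \partial_q \mathbf{x}_i##.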
 