Understanding of Higher-Order Derivatives

  • Context: Graduate
  • Thread starter: shaggymoods
  • Tags: Derivatives
SUMMARY

This discussion focuses on the interpretation of higher-order derivatives of multivariable functions, specifically their mapping properties. The first derivative of a function f: ℝⁿ → ℝ at a point is a linear map λ: ℝⁿ → ℝ, so Df is a map from ℝⁿ to L(ℝⁿ, ℝ), and the second derivative D²f is a map from ℝⁿ to L(ℝⁿ, L(ℝⁿ, ℝ)), which is isomorphic to the space of n × n matrices; its value at a point is the Hessian matrix. The confusion arises in understanding why Dᵏf maps ℝⁿ into ℝ^(nᵏ), rather than mapping ℝ^(nᵏ) into ℝ. The key takeaway is that derivatives are best linear approximations, so evaluating one requires two arguments: a location and a direction.

PREREQUISITES
  • Understanding of multivariable calculus
  • Familiarity with linear maps and their properties
  • Knowledge of tensor products and dual spaces
  • Concept of Hessian matrices in optimization
NEXT STEPS
  • Study the properties of Hessian matrices in multivariable optimization
  • Learn about the implications of higher-order derivatives in Taylor series expansions
  • Explore the concept of Fréchet derivatives in functional analysis
  • Investigate applications of higher-order derivatives in machine learning algorithms
USEFUL FOR

Mathematicians, students of multivariable calculus, and professionals in fields requiring optimization techniques will benefit from this discussion.

shaggymoods
Hey guys, so this may be a really silly question, but I'm trying to grasp a subtle point about higher-order derivatives of multivariable functions. In particular, suppose we have an infinitely differentiable function

[tex]f: \mathbb{R}^{n} \rightarrow \mathbb{R}[/tex]

I know that the first derivative of this function is a linear map [tex]\lambda: \mathbb{R}^{n}\rightarrow\mathbb{R}[/tex]. However, when we take the second derivative of [tex]f[/tex], some questions arise for me:

1.) If we are taking this derivative when considering [tex]\lambda[/tex] as a linear function, then we'd just get back [tex]\lambda[/tex], which isn't the case. So how are we interpreting the first derivative when taking a second?

2.) In general, why do we say that [tex]D^{k}f:\mathbb{R}^{n^{k}}\rightarrow\mathbb{R}[/tex] and not [tex]D^{k}f:\mathbb{R}^{n}\rightarrow\mathbb{R}[/tex]?

Thanks in advance.
 
If [tex]f : \mathbb{R}^n \to \mathbb{R}[/tex], then the derivative of [tex]f[/tex] is a linear map [tex]\lambda : \mathbb{R}^n \to \mathbb{R}[/tex] at each point in [tex]\mathbb{R}^n[/tex]. That is to say, the derivative of [tex]f[/tex], properly considered, is a map [tex]Df : \mathbb{R}^n \to L(\mathbb{R}^n, \mathbb{R})[/tex], where [tex]L(\mathbb{R}^n, \mathbb{R})[/tex] denotes the space of all linear maps [tex]\lambda : \mathbb{R}^n \to \mathbb{R}[/tex], which is just the dual of [tex]\mathbb{R}^n[/tex] (and is thus isomorphic to [tex]\mathbb{R}^n[/tex]). The second derivative of [tex]f[/tex] is then a map [tex]D^2 f : \mathbb{R}^n \to L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})) \cong L(\mathbb{R}^n, \mathbb{R}^n)[/tex], where [tex]L(\mathbb{R}^n, \mathbb{R}^n)[/tex] is the space of all [tex]n \times n[/tex] matrices, and is isomorphic to [tex]\mathbb{R}^{n^2}[/tex]. (The output of the second derivative is usually called the Hessian matrix of [tex]f[/tex].) Continuing in this vein, you can show that [tex]D^k f[/tex] is a map from [tex]\mathbb{R}^n[/tex] to [tex]\mathbb{R}^{n^k}[/tex], not a map from [tex]\mathbb{R}^{n^k} \to \mathbb{R}[/tex] as you suggest in #2.
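To make this concrete, here is a minimal numerical sketch of the chain Df : ℝⁿ → L(ℝⁿ, ℝ) and D²f : ℝⁿ → L(ℝⁿ, ℝⁿ), using central finite differences on a hypothetical example function f(x) = x₀²x₁ + sin(x₂) (the function and step sizes are illustrative choices, not from the thread):

```python
import numpy as np

# Hypothetical smooth example function f: R^3 -> R.
def f(x):
    return x[0]**2 * x[1] + np.sin(x[2])

def grad(f, a, eps=1e-6):
    """Central-difference approximation of Df(a), an element of L(R^n, R),
    represented by its n components (the gradient at a)."""
    n = len(a)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        g[i] = (f(a + e) - f(a - e)) / (2 * eps)
    return g

def hessian(f, a, eps=1e-4):
    """Central-difference approximation of D^2 f(a): differentiating the
    gradient map a -> Df(a) yields an n x n matrix, the Hessian at a."""
    n = len(a)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[i] = (grad(f, a + e) - grad(f, a - e)) / (2 * eps)
    return H

a = np.array([1.0, 2.0, 0.5])
g = grad(f, a)      # Df(a): one linear functional on R^3, stored as 3 numbers
H = hessian(f, a)   # D^2 f(a): the Hessian, a symmetric 3 x 3 matrix
```

Note that the *value* of Df at each point a is a single linear map, but Df itself varies nonlinearly with a; differentiating that dependence is exactly what produces the Hessian.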

Basically, what's going on here is that a derivative, properly defined, is a best linear approximation to a function. Thus, at some point [tex]\mathbf{p} \in \mathbb{R}^n[/tex], the derivative [tex]Df[/tex] takes the value of the linear map [tex]\lambda : \mathbb{R}^n \to \mathbb{R}[/tex] which most closely resembles [tex]f[/tex] near [tex]\mathbf{p}[/tex]. Thus, [tex]Df[/tex] is actually a map from [tex]\mathbb{R}^n[/tex] into the space of all possible such approximations, and [tex]D^k f[/tex] is a map from [tex]\mathbb{R}^n[/tex] into some higher tensor product of [tex]\mathbb{R}^n[/tex] and its dual space. Your answer to #1 is thus that, while elements of the range of [tex]Df[/tex] must be linear maps and have trivial derivatives, [tex]Df[/tex] itself is not necessarily linear. This is why it is necessary to specify two arguments when evaluating [tex]Df[/tex]: a location [tex]\mathbf{a}[/tex], and a direction [tex]\mathbf{h}[/tex]. The location specifies a linear map, i.e., there is some linear map [tex]\lambda[/tex] for which [tex]Df : \mathbf{a} \mapsto \lambda[/tex]. The direction then serves as the argument for [tex]\lambda[/tex], and, in a slight abuse of notation, we usually write [tex]\lambda(\mathbf{h}) \equiv Df(\mathbf{a})(\mathbf{h})[/tex] or [tex]Df(\mathbf{a}, \mathbf{h})[/tex].
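The two-argument evaluation Df(a)(h) can be sketched numerically as well: the location a selects a linear map, and the direction h is that map's argument. The example function below is a hypothetical choice for illustration; the key point is that the result is linear in h even though Df is not linear in a.

```python
import numpy as np

# Hypothetical smooth example function f: R^3 -> R.
def f(x):
    return x[0]**2 * x[1] + np.sin(x[2])

def Df(f, a, h, eps=1e-6):
    """Directional derivative Df(a)(h), approximated by a central
    difference of f along the direction h at the location a."""
    return (f(a + eps * h) - f(a - eps * h)) / (2 * eps)

a = np.array([1.0, 2.0, 0.5])   # location: picks out the linear map lambda = Df(a)
h = np.array([1.0, -1.0, 2.0])  # direction: the argument fed to lambda

v1 = Df(f, a, h)       # lambda(h)
v2 = Df(f, a, 2 * h)   # lambda(2h) = 2 * lambda(h), by linearity in the direction
```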
 
