Hey guys, so this may be a really silly question, but I'm trying to grasp a subtle point about higher-order derivatives of multivariable functions. In particular, suppose we have an infinitely differentiable function [tex]f: \mathbb{R}^{n} \rightarrow \mathbb{R}[/tex]. I know that the first derivative of this function is a linear map [tex]\lambda: \mathbb{R}^{n}\rightarrow\mathbb{R}[/tex]. However, when we take the derivative of [tex]\lambda[/tex] (i.e., the second derivative of [tex]f[/tex]), some questions arise for me:

1.) If we take this derivative treating [tex]\lambda[/tex] as a linear function, then we'd just get back [tex]\lambda[/tex], which isn't the case. So how are we interpreting the first derivative when taking a second?

2.) In general, why do we say that [tex]D^{k}f:\mathbb{R}^{n^{k}}\rightarrow\mathbb{R}[/tex] and not [tex]D^{k}f:\mathbb{R}^{n}\rightarrow\mathbb{R}[/tex]?

Thanks in advance.
If [tex] f : \mathbb{R}^n \to \mathbb{R} [/tex], then the derivative of [tex] f [/tex] is a linear map [tex]\lambda : \mathbb{R}^n \to \mathbb{R} [/tex] at each point of [tex] \mathbb{R}^n [/tex]. That is to say, the derivative of [tex] f [/tex], properly considered, is a map [tex] Df : \mathbb{R}^n \to L(\mathbb{R}^n, \mathbb{R}) [/tex], where [tex] L(\mathbb{R}^n, \mathbb{R}) [/tex] denotes the space of all linear maps [tex] \lambda : \mathbb{R}^n \to \mathbb{R} [/tex], which is just the dual of [tex] \mathbb{R}^n [/tex] (and is thus isomorphic to [tex] \mathbb{R}^n [/tex]). The second derivative of [tex] f [/tex] is the derivative of the map [tex] Df [/tex] itself, so it is a map [tex] D^2 f : \mathbb{R}^n \to L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})) \cong L(\mathbb{R}^n, \mathbb{R}^n) [/tex], where [tex] L(\mathbb{R}^n, \mathbb{R}^n) [/tex] can be identified with the space of all [tex] n \times n [/tex] matrices and is therefore isomorphic to [tex] \mathbb{R}^{n^2} [/tex]. (The matrix [tex] D^2 f(\mathbf{p}) [/tex] is usually called the Hessian matrix of [tex] f [/tex] at [tex] \mathbf{p} [/tex].) Continuing in this vein, you can show that [tex] D^k f [/tex] is a map from [tex] \mathbb{R}^n [/tex] to [tex] L(\mathbb{R}^n, L(\mathbb{R}^n, \ldots, L(\mathbb{R}^n, \mathbb{R})\ldots)) \cong \mathbb{R}^{n^k} [/tex] (with [tex] k [/tex] copies of [tex] \mathbb{R}^n [/tex]), not a map [tex] \mathbb{R}^{n^k} \to \mathbb{R} [/tex] as you suggest in #2.

Basically, what's going on here is that a derivative, properly defined, is a best linear approximation to a function. At a point [tex] \mathbf{p} \in \mathbb{R}^n [/tex], the derivative [tex] Df [/tex] takes as its value the unique linear map [tex]\lambda : \mathbb{R}^n \to \mathbb{R} [/tex] such that [tex] f(\mathbf{p} + \mathbf{h}) = f(\mathbf{p}) + \lambda(\mathbf{h}) + r(\mathbf{h}) [/tex] with [tex] r(\mathbf{h})/\|\mathbf{h}\| \to 0 [/tex] as [tex] \mathbf{h} \to \mathbf{0} [/tex]. So [tex] Df [/tex] is really a map from [tex] \mathbb{R}^n [/tex] into the space of all possible such approximations, and [tex] D^k f [/tex] is a map from [tex] \mathbb{R}^n [/tex] into a higher tensor product, namely the [tex] k [/tex]-fold tensor product of the dual space of [tex] \mathbb{R}^n [/tex], which again has dimension [tex] n^k [/tex].
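The dimension count above can be checked numerically. The sketch below is an illustration added here, not part of the original reply: the helper `derivative`, the step `eps`, and the test function [tex] f(x, y, z) = xyz + e^x [/tex] (so [tex] n = 3 [/tex]) are hypothetical choices, and central differences only approximate the true derivatives. Each application of `derivative` turns a function returning an array of shape S into one returning shape S + (n,), so [tex] D^k f(\mathbf{p}) [/tex] comes out with [tex] n^k [/tex] entries. The last part checks the best-linear-approximation property: the remainder divided by [tex] \|\mathbf{h}\| [/tex] should shrink along with [tex] \mathbf{h} [/tex].

```python
import numpy as np

def derivative(g, eps=1e-3):
    """Given g: R^n -> array of shape S, return an approximation of
    Dg: R^n -> array of shape S + (n,), using central differences."""
    def Dg(x):
        x = np.asarray(x, dtype=float)
        partials = []
        for i in range(x.size):
            e = np.zeros(x.size)
            e[i] = eps
            partials.append((np.asarray(g(x + e)) - np.asarray(g(x - e))) / (2 * eps))
        # One new trailing axis: the slot for the new direction argument h.
        return np.stack(partials, axis=-1)
    return Dg

def f(x):
    # Hypothetical example: f(x, y, z) = x*y*z + exp(x), so n = 3.
    return x[0] * x[1] * x[2] + np.exp(x[0])

p = np.array([0.5, -1.0, 2.0])

# D^k f(p) has shape (n,)*k, i.e. n**k numbers.
Dk = f
for k in range(1, 4):
    Dk = derivative(Dk)
    value = Dk(p)
    print(k, value.shape, value.size)
# 1 (3,) 3
# 2 (3, 3) 9
# 3 (3, 3, 3) 27

# Best linear approximation: the remainder f(p+h) - f(p) - Df(p)(h)
# shrinks faster than |h|, so this ratio should go to 0 as h -> 0.
grad = derivative(f)(p)
direction = np.array([1.0, 2.0, -1.0])
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = t * direction
    remainder = f(p + h) - f(p) - grad @ h
    ratios.append(abs(remainder) / np.linalg.norm(h))
print(ratios)
```

Keeping [tex] D^k f(\mathbf{p}) [/tex] as a nested array with one axis per direction argument mirrors the nested [tex] L(\mathbb{R}^n, L(\mathbb{R}^n, \ldots)) [/tex] picture; flattening it to a vector of length [tex] n^k [/tex] is only a relabeling.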
Your answer to #1 is thus that, while each element of the range of [tex] Df [/tex] is a linear map and is therefore its own derivative at every point (which is the observation in your #1), [tex] Df [/tex] itself is not necessarily linear, so differentiating it does not just give [tex] Df [/tex] back. This is why it is necessary to specify two arguments when evaluating [tex] Df [/tex]: a location [tex] \mathbf{a} [/tex] and a direction [tex] \mathbf{h} [/tex]. The location specifies a linear map, i.e., there is some linear map [tex] \lambda [/tex] for which [tex] Df : \mathbf{a} \mapsto \lambda [/tex]. The direction then serves as the argument for [tex] \lambda [/tex], so we write [tex] \lambda(\mathbf{h}) = Df(\mathbf{a})(\mathbf{h}) [/tex], or, in a slight abuse of notation, [tex] Df(\mathbf{a}, \mathbf{h}) [/tex]. The second derivative is what you get by differentiating [tex] Df [/tex] with respect to the location [tex] \mathbf{a} [/tex], not the direction.
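To make the two-argument picture concrete, here is a minimal plain-Python sketch (an illustration, not from the original thread) that represents [tex] Df(\mathbf{a}) [/tex] as a function of the direction. The test function [tex] f(x, y) = x^2 y + y^3 [/tex] and the names `Df`, `D2f` and `add` are hypothetical; the gradient [tex] (2xy,\ x^2 + 3y^2) [/tex] and Hessian [tex] \begin{pmatrix} 2y & 2x \\ 2x & 6y \end{pmatrix} [/tex] are worked out by hand. It checks that [tex] Df(\mathbf{a}) [/tex] is linear in [tex] \mathbf{h} [/tex], that [tex] \mathbf{a} \mapsto Df(\mathbf{a}) [/tex] is not linear, and that a difference quotient of [tex] Df [/tex] in the location approaches [tex] D^2 f(\mathbf{a})(\mathbf{h})(\mathbf{k}) = \mathbf{h}^T H(\mathbf{a})\, \mathbf{k} [/tex].

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]
LinearMap = Callable[[Vec], float]

def f(p: Vec) -> float:
    """Hypothetical example function f(x, y) = x**2 * y + y**3."""
    x, y = p
    return x**2 * y + y**3

def Df(a: Vec) -> LinearMap:
    """Location a -> the linear map lambda = Df(a), i.e. h -> grad f(a) . h."""
    x, y = a
    grad = (2 * x * y, x**2 + 3 * y**2)  # hand-computed partials of f at a
    def lam(h: Vec) -> float:
        return grad[0] * h[0] + grad[1] * h[1]
    return lam

def D2f(a: Vec) -> Callable[[Vec], LinearMap]:
    """Location a -> (h -> (k -> h^T H(a) k)), with H the hand-computed Hessian."""
    x, y = a
    H = ((2 * y, 2 * x), (2 * x, 6 * y))
    def first(h: Vec) -> LinearMap:
        def second(k: Vec) -> float:
            return sum(h[i] * H[i][j] * k[j] for i in range(2) for j in range(2))
        return second
    return first

def add(u: Vec, v: Vec) -> Vec:
    """Componentwise sum of two vectors in R^2."""
    return (u[0] + v[0], u[1] + v[1])

a, b = (1.0, 2.0), (3.0, -1.0)   # two locations
h, k = (0.5, 1.0), (2.0, -1.0)   # two directions

lam = Df(a)                       # the location picks out a linear map...
print(lam(h))                     # ...and the direction is its argument: 15.0

# Df(a) is linear in the direction (so it is its own derivative):
print(lam(add(h, k)), lam(h) + lam(k))          # 10.0 10.0

# ...but a -> Df(a) is not linear in the location:
print(Df(add(a, b))(h), Df(a)(h) + Df(b)(h))    # 23.0 24.0

# Differentiating Df with respect to the location approximates D^2 f:
t = 1e-6
quotient = (Df(add(a, (t * h[0], t * h[1])))(k) - Df(a)(k)) / t
print(D2f(a)(h)(k), quotient)     # -5.0 and approximately -5.0
```

Returning nested closures (rather than a single function of several arguments) follows the curried reading [tex] Df(\mathbf{a})(\mathbf{h}) [/tex]; the "abuse of notation" [tex] Df(\mathbf{a}, \mathbf{h}) [/tex] corresponds to uncurrying it into one two-argument function.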