# Understanding of Higher-Order Derivatives

by shaggymoods
Tags: higher order, multivariable
Hey guys, so this may be a really silly question, but I'm trying to grasp a subtle point about higher-order derivatives of multivariable functions. In particular, suppose we have an infinitely differentiable function $$f: \mathbb{R}^{n} \rightarrow \mathbb{R}$$. I know that the first derivative of this function is a linear map $$\lambda: \mathbb{R}^{n}\rightarrow\mathbb{R}$$. However, when we take the second derivative, i.e., differentiate $$\lambda$$, some questions arise for me:

1.) If we take this derivative while treating $$\lambda$$ as a linear function, then we'd just get back $$\lambda$$, which isn't the case. So how are we interpreting the first derivative when taking a second?

2.) In general, why do we say that $$D^{k}f:\mathbb{R}^{n^{k}}\rightarrow\mathbb{R}$$ and not $$D^{k}f:\mathbb{R}^{n}\rightarrow\mathbb{R}$$?

Thanks in advance.
If $$f : \mathbb{R}^n \to \mathbb{R}$$, then the derivative of $$f$$ is a linear map $$\lambda : \mathbb{R}^n \to \mathbb{R}$$ at each point in $$\mathbb{R}^n$$. That is to say, the derivative of $$f$$, properly considered, is a map $$Df : \mathbb{R}^n \to L(\mathbb{R}^n, \mathbb{R})$$, where $$L(\mathbb{R}^n, \mathbb{R})$$ denotes the space of all linear maps $$\lambda : \mathbb{R}^n \to \mathbb{R}$$. This is just the dual space of $$\mathbb{R}^n$$, and is thus isomorphic to $$\mathbb{R}^n$$.

The second derivative of $$f$$ is then a map $$D^2 f : \mathbb{R}^n \to L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})) \cong L(\mathbb{R}^n, \mathbb{R}^n)$$, where $$L(\mathbb{R}^n, \mathbb{R}^n)$$ can be identified with the space of all $$n \times n$$ matrices, which is isomorphic to $$\mathbb{R}^{n^2}$$. (The value $$D^2 f(\mathbf{p})$$ is usually called the Hessian matrix of $$f$$ at $$\mathbf{p}$$.) Continuing in this vein, you can show that $$D^k f$$ is a map from $$\mathbb{R}^n$$ to $$\mathbb{R}^{n^k}$$, not a map $$\mathbb{R}^{n^k} \to \mathbb{R}$$ as you suggest in #2.

Basically, what's going on here is that a derivative, properly defined, is a best linear approximation to a function. At a point $$\mathbf{p} \in \mathbb{R}^n$$, the derivative $$Df$$ takes as its value the linear map $$\lambda : \mathbb{R}^n \to \mathbb{R}$$ for which $$f(\mathbf{p}) + \lambda(\mathbf{h})$$ best approximates $$f(\mathbf{p} + \mathbf{h})$$ for small $$\mathbf{h}$$. So $$Df$$ is really a map from $$\mathbb{R}^n$$ into the space of all such approximations, and $$D^k f$$ is a map from $$\mathbb{R}^n$$ into the space of $$k$$-linear maps $$(\mathbb{R}^n)^k \to \mathbb{R}$$, i.e., the $$k$$-fold tensor product of the dual space of $$\mathbb{R}^n$$, which has dimension $$n^k$$.

The answer to #1, then, is that while each value $$Df(\mathbf{p})$$ is a linear map (whose derivative is simply itself), $$Df$$ itself is not necessarily linear, so differentiating it again gives something nontrivial. This is why it is necessary to specify two arguments when evaluating $$Df$$: a location $$\mathbf{a}$$ and a direction $$\mathbf{h}$$.
The location specifies a linear map, i.e., there is some linear map $$\lambda$$ for which $$Df : \mathbf{a} \mapsto \lambda$$. The direction then serves as the argument for $$\lambda$$, so $$\lambda(\mathbf{h}) = Df(\mathbf{a})(\mathbf{h})$$; in a slight abuse of notation, this is often written $$Df(\mathbf{a}, \mathbf{h})$$.
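To make the "location plus directions" picture and the $$n^k$$ count concrete, here is a minimal Python sketch. It assumes $$f$$ is a polynomial stored as a dict from exponent tuples to coefficients (a hypothetical representation chosen only so that partial derivatives come out exact); the helper names `partial`, `evaluate`, `derivative_tensor` and `apply_multilinear` are illustrative, not from any library. Choosing the location $$\mathbf{p}$$ produces the table of all $$n^k$$ mixed partials, and feeding that table $$k$$ directions evaluates the $$k$$-linear map $$D^k f(\mathbf{p})$$.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# Hypothetical representation: a polynomial f: R^n -> R is a dict mapping
# exponent tuples to coefficients, e.g. f(x, y) = x^2 * y is {(2, 1): 1.0}.
Poly = Dict[Tuple[int, ...], float]


def partial(poly: Poly, i: int) -> Poly:
    """Exact partial derivative of a polynomial with respect to x_i."""
    out: Poly = {}
    for exps, c in poly.items():
        if exps[i] == 0:
            continue  # this monomial does not depend on x_i
        new = list(exps)
        new[i] -= 1
        key = tuple(new)
        out[key] = out.get(key, 0.0) + c * exps[i]
    return out


def evaluate(poly: Poly, p: Sequence[float]) -> float:
    """Evaluate the polynomial at the point p."""
    total = 0.0
    for exps, c in poly.items():
        term = c
        for x, e in zip(p, exps):
            term *= x ** e
        total += term
    return total


def derivative_tensor(poly: Poly, k: int, p: Sequence[float]) -> Dict[Tuple[int, ...], float]:
    """D^k f(p): all n^k mixed partials d^k f / dx_i1 ... dx_ik at the location p."""
    n = len(p)
    tensor = {}
    for idx in product(range(n), repeat=k):
        q = poly
        for i in idx:
            q = partial(q, i)
        tensor[idx] = evaluate(q, p)
    return tensor


def apply_multilinear(tensor: Dict[Tuple[int, ...], float], *directions: Sequence[float]) -> float:
    """Feed the k directions h_1, ..., h_k to the k-linear map D^k f(p)."""
    total = 0.0
    for idx, value in tensor.items():
        term = value
        for h, i in zip(directions, idx):
            term *= h[i]
        total += term
    return total
```

For example, with `f = {(2, 1): 1.0}` (that is, $$f(x, y) = x^2 y$$) and `p = (1.0, 3.0)`, `derivative_tensor(f, 1, p)` gives `{(0,): 6.0, (1,): 1.0}`, `derivative_tensor(f, 2, p)` gives the Hessian entries `{(0, 0): 6.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 0.0}`, and `derivative_tensor(f, 3, p)` has $$2^3 = 8$$ entries. Each value is linear in the directions: `apply_multilinear(derivative_tensor(f, 1, p), (2.0, 5.0))` is `6*2 + 1*5 = 17.0`. But $$Df$$ is not linear in the location: at `(2.0, 6.0)` the $$x$$-partial is `24.0`, not twice `6.0`, which is exactly why $$D^2 f$$ is not trivial.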