# The Pantheon of Derivatives – 5 Part Series

#### Differentiation in a Nutshell

I want to gather the various concepts in one place, to reveal the similarities between them, which are often hidden by the serial nature of a curriculum.

There are many terms and special cases that deal with the process of differentiation. The basic idea, however, is the same in all cases: something non-linear, for instance a multiplication, is approximated by something linear, that is, by something we can add.

This already reveals two major consequences: addition is easier than multiplication, which is why we consider it at all, and, being an approximation, it is necessarily a local property around the point of consideration.

Thus the result of our differentiation should always be a linear function, like the straight lines we draw in graphs and call tangents.

And our approximation will get worse the farther away we are from the point we considered. That’s the reason why these ominous infinitesimals come into play. They are nothing obscure, but merely an attempt to quantify and get a handle on the small deviations of our linear approximation from what is really going on.

To begin with, let’s clarify the language:

\begin{align*}

\textrm{differentiation} &- \textrm{certain process to achieve a linear approximation}\\

\textrm{to differentiate} &- \textrm{to carry out a differentiation}\\

\textrm{differential} &-\textrm{infinitesimal linear change of the function value only}\\

\textrm{differentiability}&-\textrm{condition that allows the process of differentiation}\\

\textrm{derivative}&-\textrm{result of a differentiation}\\

\textrm{derivation}&-\textrm{linear mapping that obeys the Leibniz rule}\\

\textrm{to derive}&- \textrm{to deduce a statement by logical means}

\end{align*}

All these terms are context-sensitive and their meanings change if they are used, e.g., in chemistry, mechanical engineering or common language. But even within mathematics, the terms may vary among different authors. E.g. differential has two meanings: as an adjective, or as notation for ##df##, i.e. the infinitesimal linear change of the function values. Differentials are used in various applications with varying meanings and even with different mathematical rigor. This is especially true in calculus, where the differentials in ##\int f(x)dx## and ##\frac{df(x)}{dx}## are only of notational value. The most precise meaning of the term can be found in differential geometry as an exact ##1-##form.

As a rule of thumb: diff… refers to the process, derivative to the result.

As differentiability is a local property, it is defined on a domain ##U## which is open, not empty and connected, at a point ##x_0## or ##z_0## in ##U##. I will not mention these requirements every time I use them. They are helpful, as one doesn’t have to deal with isolated points or the behavior of a function on boundaries, and one always has a way to approach ##x_0## from all sides. So with respect to the intended approximation, they come in naturally. I also won’t distinguish between approximations from the left or from the right, since this article is only an overview; the limit is always meant to be identical from both sides. Moreover, a function is said to be in ##C(U)=C^0(U)## if it is continuous, in ##C^n(U)## if it is ##n-##times continuously differentiable, and in ##C^\infty(U)=\bigcap_{n\in \mathbb{N}}C^n(U)## if it is infinitely many times continuously differentiable. The latter functions are also called **smooth**.

##### Real Functions in one Variable: ##\mathbb{R}##

A function ##f : \mathbb{R} \rightarrow \mathbb{R}## is differentiable at ##x_0##, if the limit

$$\lim_{x\rightarrow x_{0}}{\frac{f(x)-f(x_{0})}{x-x_{0}}}=\lim_{v\rightarrow 0}{\frac{f(x_{0}+v)-f(x_{0})}{v}}=\lim_{\Delta x \rightarrow 0}{\frac{f(x_{0}+\Delta x)-f(x_{0})}{\Delta x}}$$

exists, which is then called the derivative of ##f## at ##x_0## and denoted by

$$f'(x_0)=\left.\frac{d}{dx}\right|_{x=x_0}f(x)$$

This is the definition we learn at school. But I think it hides the crucial point. There is another way to define it, which describes much better the purpose and geometry of the concept, **Weierstraß’ decomposition formula**:

##f## is differentiable at ##x_0## if there is a linear map ##J##, such that

\begin{equation}\label{Diff}

\mathbf{f(x_{0}+v)=f(x_{0})+J(v)+r(v)}

\end{equation}

where the error or remainder function ##r## has the property that it converges to zero faster than linearly, which means

$$\lim_{v \rightarrow 0}\frac{r(v)}{v}=0$$

The **derivative** is now the linear function ##J(.) : \mathbb{R}\rightarrow \mathbb{R}## which approximates ##f## at ##x_0## with an error function ##r## that vanishes faster than linearly. This remainder is, e.g., of quadratic order, as in the Taylor series. Both functions may depend on ##x_0##, which plays the role of a constant parameter for them.
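As a small worked example (chosen here purely for illustration), take ##f(x)=x^2##. Expanding the square gives

$$f(x_0+v)=(x_0+v)^2=x_0^2+2x_0\,v+v^2$$

so that ##J(v)=2x_0\,v## and ##r(v)=v^2##, and indeed ##\lim_{v \rightarrow 0}\frac{r(v)}{v}=\lim_{v \rightarrow 0}v=0##. The derivative in the school sense, ##f'(x_0)=2x_0##, is just the slope of the linear map ##J##.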

##### Real Functions in many Variables: ##\mathbb{R}^{n}##

Here another advantage of the last definition becomes obvious. If our function is defined on real vector spaces, say ##f:\mathbb{R}^n \rightarrow \mathbb{R}^m##, how could we divide vectors? Well, we can’t. So the definition

$$f(x_{0}+v)=f(x_{0})+J(v)+r(v)$$

comes in handy, because it contains no division. We can directly use the same formula without any adjustments. We only have to specify what “the remainder ##r(v)## converges faster to zero than linear” means. Since we’re not especially interested in it except for its general behavior at zero, we simply require

$$\lim_{v \rightarrow 0}\frac{r(v)}{||v||}=0$$

which is more a matter of practicability to quantify “faster than linear” than an essential property. The essential part is the linear approximation ##J(.): \mathbb{R}^n \rightarrow \mathbb{R}^m##, which is what differentiation is done for. Again both functions may depend on the constant ##x_0##.

We can also see that now the direction of ##v## comes into play, which makes sense, since the tangents are now tangent spaces, planes for example. And a plane has many different slopes, generated by two coordinates. It makes a big difference whether we walk around a hill or climb it.

Therefore the process above is called total differentiation and ##J(.)## the **total derivative**, because it includes all possible directions. In standard coordinates it is the **Jacobian matrix** of ##f(x)## in ##\mathbb{M}_{m \times n}(\mathbb{R})##.
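The following is a minimal numerical sketch of this statement, not a definitive implementation. The map `f`, the point `(x0, y0)` and the helper names `jacobian_fd` and `remainder_ratio` are assumptions made up for illustration. The sketch approximates the Jacobian matrix by central differences and then checks that ##||r(v)||/||v||## shrinks as ##v## shrinks.

```python
import math

def f(x, y):
    # Example map R^2 -> R^2, chosen only for illustration.
    return (x * x * y, math.sin(x) + y * y)

def jacobian_fd(func, x, y, h=1e-6):
    # Central differences approximate the partial derivatives column by column.
    fxp, fxm = func(x + h, y), func(x - h, y)
    fyp, fym = func(x, y + h), func(x, y - h)
    return [
        [(fxp[i] - fxm[i]) / (2 * h), (fyp[i] - fym[i]) / (2 * h)]
        for i in range(2)
    ]

def remainder_ratio(func, J, x0, y0, v):
    # ||r(v)|| / ||v||  with  r(v) = f(x0 + v) - f(x0) - J v
    fx0 = func(x0, y0)
    fv = func(x0 + v[0], y0 + v[1])
    r = [fv[i] - fx0[i] - (J[i][0] * v[0] + J[i][1] * v[1]) for i in range(2)]
    return math.hypot(*r) / math.hypot(*v)

x0, y0 = 1.0, 2.0
J = jacobian_fd(f, x0, y0)
# The ratio should decrease roughly in proportion to the step size s.
ratios = [remainder_ratio(f, J, x0, y0, (s, -s)) for s in (1e-1, 1e-2, 1e-3)]
print(J)
print(ratios)
```

For this particular `f` the Jacobian can also be computed by hand as ##\begin{pmatrix}2xy & x^2\\ \cos x & 2y\end{pmatrix}##, which gives a simple way to cross-check the finite-difference result.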

If we are only interested in one special direction, then we get the **directional derivative**. We can take the same definition, only that ##v## is now a specified vector pointing in a certain direction. This means that we approximate ##f## only in one direction. Our directional derivative as linear approximation therefore depends only on one vector ##v##, and it can be written as ##J(v)(f(x))=v \cdot f(x) = \vec{v}^\tau \cdot \vec{f(x)} = \langle v,f(x)\rangle##, which maps a vector ##f(x)## to its part of the slope in direction of ##v##. In this case, the directional derivative is also written as:

\begin{equation}\label{Nabla}

\mathbf{J(v)=D_{v}f(x)=\nabla_{v}{f}(x)=\partial_{v}f(x)=\frac{\partial f(x)}{\partial {v}}=f'_{v}(x)}

\end{equation}

If ##f## is also totally differentiable, then additional notations are in use:

\begin{equation}\label{Grad}

\mathbf{J(v)=Df(x)v=Df_{x}\,v=\operatorname {grad} \ f(x)\cdot v=\nabla f(x)\cdot v= (v\cdot \nabla )f(x)}

\end{equation}

A directional derivative is often defined for scalar functions ##f:\mathbb{R}^n \rightarrow \mathbb{R}##, i.e. ##m=1##. This isn’t really a restriction, because one could always simply take all components of ##f = (f_1,\ldots ,f_m)## separately. Furthermore, it is often defined for unit vectors ##v_0## as direction, namely as the limit of ##\frac{1}{t}(f(x_0+t\cdot v_0)-f(x_0))## for ##t \rightarrow 0##. However, there is no need to do this. It’s a matter of taste and only means that we have to divide by ##||v||## if scales like coordinates are involved.
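A short sketch may make the two descriptions concrete: the limit of the difference quotient, and the formula ##\nabla f(x)\cdot v## for a totally differentiable ##f##. The function `f`, its hand-computed gradient `grad_f`, the point `p` and the direction `v` are all assumptions chosen for illustration.

```python
import math

def f(x, y):
    # Scalar example function, chosen only for illustration.
    return x * x + 3.0 * x * y

def grad_f(x, y):
    # Gradient computed by hand: (2x + 3y, 3x)
    return (2.0 * x + 3.0 * y, 3.0 * x)

def directional_quotient(func, p, v, t):
    # (f(p + t v) - f(p)) / t, the quotient whose limit for t -> 0 is D_v f(p)
    return (func(p[0] + t * v[0], p[1] + t * v[1]) - func(*p)) / t

p = (1.0, 2.0)
v = (3.0, 4.0)  # deliberately not a unit vector
g = grad_f(*p)
via_gradient = g[0] * v[0] + g[1] * v[1]  # grad f(p) . v
via_limit = directional_quotient(f, p, v, 1e-6)

# Normalising v rescales the result by 1 / ||v||.
norm = math.hypot(*v)
unit = (v[0] / norm, v[1] / norm)
via_limit_unit = directional_quotient(f, p, unit, 1e-6)

print(via_gradient, via_limit, via_limit_unit * norm)
```

The last line illustrates the remark about unit vectors: using ##v/||v||## instead of ##v## changes the value only by the factor ##1/||v||##.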

From here **partial derivatives** are obviously simply the directional derivatives in the various variables ##x_1,\ldots,x_n## of ##f##, the coordinates of ##\mathbb{R}^n##.

The dependencies among these differentiability conditions are as follows:

continuously partially differentiable, i.e. all partial derivatives exist and are continuous

##\Downarrow ##

totally differentiable or differentiable for short

##\Downarrow ##

differentiable in any direction

##\Downarrow ##

partially differentiable

All implications are proper implications. (Counter-) Examples are (from Wikipedia):

##f(x,y) =

\begin{cases}

(x^2+y^2) \cdot \sin \frac{1}{x^2+y^2} & \text{if } (x,y) \neq (0,0)\\

0 & \text{if } (x,y) = (0,0)

\end{cases}

##

is totally differentiable but not continuously partially differentiable.

##f(x,y) =

\begin{cases}

\frac{3x^2y-y^3}{x^2+y^2} & \text{if } (x,y) \neq (0,0)\\

0 & \text{if } (x,y) = (0,0)

\end{cases}

##

is differentiable in all directions, but the directional derivatives don’t define a linear function ##J##.

##f(x,y) =

\begin{cases}

\frac{xy^3}{x^2+y^4} & \text{if } (x,y) \neq (0,0)\\

0 & \text{if } (x,y) = (0,0)

\end{cases}

##

is differentiable in all directions, and the directional derivatives define a linear function ##J##, but it is not totally differentiable, because the remainder term doesn’t converge to zero faster than linearly, i.e. ##r(v)/||v||## doesn’t converge to zero.

##f(x,y) =

\begin{cases}

\frac{2xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0)\\

0 & \text{if } (x,y) = (0,0)

\end{cases}

##

is partially differentiable but not all directional derivatives exist.
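The second and third example can be probed numerically at the origin. This is only a sketch under stated assumptions: the names `f2`, `f3` and `dir_deriv_at_origin` are made up here, and a difference quotient with a small fixed step stands in for the limit.

```python
import math

def f2(x, y):
    # Second example: (3x^2 y - y^3) / (x^2 + y^2), and 0 at the origin
    if x == 0.0 and y == 0.0:
        return 0.0
    return (3 * x * x * y - y ** 3) / (x * x + y * y)

def f3(x, y):
    # Third example: x y^3 / (x^2 + y^4), and 0 at the origin
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y ** 3 / (x * x + y ** 4)

def dir_deriv_at_origin(func, v, t=1e-6):
    # Difference quotient (f(t v) - f(0)) / t for a small step t
    return (func(t * v[0], t * v[1]) - func(0.0, 0.0)) / t

# f2: the directional derivatives exist but are not additive in v.
d_e1 = dir_deriv_at_origin(f2, (1.0, 0.0))
d_e2 = dir_deriv_at_origin(f2, (0.0, 1.0))
d_sum = dir_deriv_at_origin(f2, (1.0, 1.0))

# f3: the sampled directional derivatives are (close to) 0, so J = 0 is the only candidate,
d3 = [dir_deriv_at_origin(f3, v) for v in ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))]
# but along the parabola x = y^2 the quotient f3(v) / ||v|| stays near 1/2.
ratios = [f3(s * s, s) / math.hypot(s * s, s) for s in (1e-1, 1e-2, 1e-3)]

print(d_e1, d_e2, d_sum)
print(d3, ratios)
```

For `f2` the values in the directions ##(1,0)##, ##(0,1)## and ##(1,1)## do not add up, so no linear ##J## can produce them. For `f3` the candidate ##J=0## fails only along curved approaches such as ##x=y^2##, which no single direction detects.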

##### Complex Functions: ##\mathbb{C}##

Complex functions ##f: U \rightarrow \mathbb{C}## are somewhat special, and entire textbooks deal with the complex part of analysis. So I will restrict myself to a brief listing of terminology and dependencies. What appears to be more complicated at first glance is to some extent even easier than in the real case.

To begin with, I would like to mention that we haven’t used any specific properties of ##\mathbb{R}## in the previous sections apart from the Euclidean norm and directions. However, both are available over ##\mathbb{C}## as well, and all we have to keep in mind is that linearity in our definition

\begin{equation}\label{Jacobi-C}

\mathbf{f(x_{0}+v)=f(x_{0})+J(v)+r(v)}

\end{equation}

now means ##\mathbb{C}-##linearity of ##J##. A complex differentiable function is called a **holomorphic function** and in older literature sometimes a **regular function**. As regularity is widely used in various areas of mathematics, this term should be avoided here. A function which is holomorphic in the entire complex plane ##U=\mathbb{C}## is called an **entire function** or an **integral function**. These are strong requirements, which means that we sometimes need a weaker condition, namely one that allows us to consider poles. Poles are isolated points at which functions are not defined. Therefore a function which is holomorphic on ##U## except at its poles is called a **meromorphic function**.

It might be due to the many different terms in complex analysis that one sometimes gets the impression that the complex case is more difficult than the real case. I think this is mainly for historical reasons and the need to have useful adjectives for certain properties. Until now I’ve neglected the representation of functions by series, which, besides their practical advantages, have often been the historically first approach to deal with the various concepts. Their names are:

**Power series.**

$$\sum_{n=0}^{\infty}\; a_n(x-x_0)^n$$

**Laurent series.**

$$\sum_{n=-\infty}^{\infty} a_n(x-x_0)^n$$

**Taylor series.**

$$\sum_{n=0}^{\infty}\; \frac{1}{n!}\cdot f^{(n)}(x_0)\cdot (x-x_0)^n$$

**Maclaurin series.**

$$\sum_{n=0}^{\infty}\;\frac{1}{n!}\cdot f^{(n)}(0)\cdot x^n$$
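To see such a series at work, here is a small sketch that sums the first terms of the Maclaurin series of the exponential function, for which all derivatives at zero equal ##1##. The helper name `maclaurin_exp` and the sample point ##x=1.5## are assumptions made for illustration.

```python
import math

def maclaurin_exp(x, terms):
    # Partial sum  sum_{n=0}^{terms-1} x^n / n!  of the Maclaurin series of exp
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turns x^n / n! into x^(n+1) / (n+1)!
    return total

# The error against math.exp shrinks quickly as more terms are added.
errors = [abs(maclaurin_exp(1.5, k) - math.exp(1.5)) for k in (2, 5, 10, 20)]
print(errors)
```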

Now we have to consider what it means that some functions in complex analysis are called analytic. Although often used in the context of complex-valued functions, analytic can equally be defined for real-valued functions.

Let ##\mathbb{K}\in \{\mathbb{R},\mathbb{C}\}## be either the real or complex numbers and ##f: U \rightarrow \mathbb{K}##. Then ##f## is called **analytic** at ##x_0##, if there is a power series

\begin{equation}\label{PS}

\sum_{n=0}^{\infty} a_n (x-x_0)^n

\end{equation}

that converges to ##f(x)## in a neighborhood of ##x_0##. If ##f## is analytic in every point of ##U##, then ##f## is called analytic without the emphasis on any points. Analytic functions are smooth, i.e. in ##C^\infty(U)##. This implication is proper, as the real function

##f(x) =

\begin{cases}

\exp (-x^{-2})& \text{if } x \neq 0\\

0 & \text{if } x = 0

\end{cases}

##

is smooth everywhere, but not analytic at zero.
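A brief sketch of the standard argument, added here for illustration with the details left out: for ##x\neq 0##, every derivative of ##\exp(-x^{-2})## has the form ##p_n(1/x)\exp(-x^{-2})## with a polynomial ##p_n##, and such expressions tend to zero for ##x \rightarrow 0##. This yields ##f^{(n)}(0)=0## for all ##n##, so the Taylor series at zero vanishes identically:

$$\sum_{n=0}^{\infty}\frac{1}{n!}\cdot f^{(n)}(0)\cdot x^n = 0 \neq \exp(-x^{-2}) = f(x) \quad \text{for } x\neq 0$$

Hence no power series around zero can represent ##f## on a whole neighborhood of zero.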

How do all these definitions relate to each other? ##\mathbb{C}## is a two dimensional real vector space and the defining equations are the same. The only difference is the ##\mathbb{C}-## linearity of ##J##. However, this is a quite powerful difference:

A function ##f:U \rightarrow \mathbb{C}## with ##f(x+iy)=u(x,y)+iv(x,y)## is holomorphic (differentiable) at ##z_0=x_0+iy_0## if ##f## is totally differentiable as function on ##\mathbb{R}^2## and the derivative ##J## is a ##\mathbb{C}-##linear mapping. This means that

$$J=\begin{pmatrix}\frac{\partial u}{\partial x}&\frac{\partial u}{\partial y}\\\frac{\partial v}{\partial x}&\frac{\partial v}{\partial y}\end{pmatrix}=\begin{pmatrix}u_x & u_y\\v_x & v_y\end{pmatrix}$$

has, w.r.t. the basis ##\{1,i\}##, the form ##\begin{pmatrix}a & -b\\ b & a\end{pmatrix}## which represents the multiplication by the complex number ##a+ib##, i.e. that for ##f## the **Cauchy-Riemann (differential) equations** hold

\begin{equation}\label{CR}

u_x = \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = v_y \;\;\textrm{ and }\;\; u_y = \frac{\partial u}{\partial y} = - \frac{\partial v}{\partial x} = - v_x

\end{equation}
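The Cauchy-Riemann equations can be tested numerically. The sketch below is an illustration under assumptions: the helpers `partials` and `cr_defect` and the two test functions are made up here, and central differences replace the exact partial derivatives. It uses the fact that both equations together are equivalent to ##f_y = i\cdot f_x## for ##f=u+iv##.

```python
def partials(func, x, y, h=1e-6):
    # Central differences of a complex-valued func(x, y) = u + i v
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)  # u_x + i v_x
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)  # u_y + i v_y
    return fx, fy

def cr_defect(func, x, y):
    # u_x = v_y and u_y = -v_x hold exactly when f_y = i * f_x.
    fx, fy = partials(func, x, y)
    return abs(fy - 1j * fx)

def square(x, y):
    # f(z) = z^2, holomorphic everywhere
    return complex(x, y) ** 2

def conjugate(x, y):
    # f(z) = conjugate of z, real differentiable but nowhere holomorphic
    return complex(x, -y)

defect_square = cr_defect(square, 0.7, -1.3)
defect_conj = cr_defect(conjugate, 0.7, -1.3)
print(defect_square, defect_conj)
```

The defect is close to zero for ##z^2## and equals ##2## for complex conjugation, although both functions are totally differentiable as maps on ##\mathbb{R}^2##.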

In a neighborhood ##U## of ##z_0\in \mathbb{C}## a function ##f(x+iy)=u(x,y)+iv(x,y)## is

holomorphic at ##z_0##

##\Longleftrightarrow ##

once complex differentiable at ##z_0##, i.e. ##f \in C^{1}(U)##

##\Longleftrightarrow ##

infinitely many times complex differentiable at ##z_0##, i.e. ##f \in C^{\infty}(U)##

##\Longleftrightarrow ##

analytic at ##z_0## (locally)

##\Longleftrightarrow ##

##u## and ##v## are at least once real totally differentiable at ##(x_0,y_0)## and satisfy the Cauchy-Riemann differential equations ##(\ref{CR})##

##\Longleftrightarrow ##

##f## is real totally differentiable at ##(x_0,y_0)## and ##\frac{\partial f}{\partial \bar{z}}= 0##

with the **Cauchy-Riemann operator** ##\frac{\partial}{\partial \bar{z}} =\frac{1}{2} \left(\frac{\partial}{\partial x}+i{\frac{\partial}{\partial y}}\right)##

##\Longleftrightarrow ##

##f## is continuous in ##U## and its path integral over any closed, ##0##-homotopic (contractible in ##U##), rectifiable curve ##\gamma## in ##U## is identically zero: ##\oint_\gamma f(z)\,dz=0##

**Cauchy’s integral theorem** or **Cauchy-Goursat theorem**

\begin{equation}\label{CGT}\end{equation}

##\Longleftrightarrow ##

if ##U_0## is a circular disc in ##U## with center ##z_0## then for ##z\in U_0## holds

**Cauchy’s integral formula**

$$f(z)=\frac{1}{2\pi \, i}\oint_{\partial U_0}{\frac{f(\zeta )}{\zeta -z}} \;d \zeta$$

\begin{equation}\label{CI}\end{equation}
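Cauchy’s integral formula can be checked numerically as well. The following is a rough sketch, assuming the exponential function as ##f##, a disc of radius ##1## and an equally spaced sum as quadrature; the function name `cauchy_integral` and all sample values are made up for illustration.

```python
import cmath
import math

def cauchy_integral(func, center, radius, z, n=2000):
    # Parametrise the circle: zeta(t) = center + radius * e^{it},
    # so that d zeta = i * radius * e^{it} dt.
    total = 0j
    for k in range(n):
        t = 2.0 * math.pi * k / n
        e = cmath.exp(1j * t)
        zeta = center + radius * e
        total += func(zeta) / (zeta - z) * (1j * radius * e)
    # The equally spaced sum approximates the integral over [0, 2 pi];
    # dividing by 2 pi i gives the right-hand side of the formula.
    return total * (2.0 * math.pi / n) / (2j * math.pi)

z0 = 0.5 + 0.25j
z = z0 + (0.3 - 0.2j)  # a point inside the disc of radius 1 around z0
approx = cauchy_integral(cmath.exp, z0, 1.0, z)
exact = cmath.exp(z)
print(approx, exact)
```

The values on the boundary circle alone reproduce ##f(z)## inside the disc, which is the striking content of the formula.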

#### Sources

