# Partial Differentiation Without Tears

Differentiation is usually taught quite well. Perhaps that’s because it is the first introduction to calculus, which is considered a big step in a student’s mathematical education, so teachers mostly take a lot of care with that step.

Unfortunately, the same cannot be said for partial differentiation. It is often glossed over in a rush that leaves only a very superficial appreciation of what it really is. Perhaps that’s because some teachers assume that, since the big hurdle of differentiation has already been cleared, very little effort is needed to extend the understanding of that to an understanding of partial differentiation.

Nothing could be further from the truth. Time and again I come across people who, although very bright, and having received an advanced mathematical education, have a completely confused notion of what partial differentiation is.

I was one of those people. Despite working in mathematics one way or another since leaving school, my use of partial derivatives was until recently always more ‘fingers crossed and hope for the best’ than the systematic application of sound principles. I was never absolutely sure what I was doing, and whether it was valid. I usually managed to get the right answer, but I couldn’t have told you why what I did was valid.

This sort of comprehension barrier could be easily avoided if the topic were presented in the right way, which is what I will try to do here. Let me set the scene by giving a few examples of how it can go wrong.

## A couple of messy examples

Let’s say we have been learning about mechanics in physics, and have learned the various equations relating key quantities for linear motion under a constant acceleration ##a##. The quantities are ##s## for distance travelled, ##u## for initial velocity, ##v## for final velocity and ##t## for time.

We then get a homework question asking us to calculate the partial derivative ##\frac{\partial s}{\partial u}## – ie ‘how does the distance travelled depend on the initial velocity?’ Here are three different possible answers:

- we differentiate the right-hand side of equation ##s=ut+\frac12 at^2## with respect to ##u## to obtain answer ##t##;
- we use the equation ##v^2-u^2=2as## to write ##s=\frac1{2a}(v^2-u^2)##, then differentiate with respect to ##u## to get answer ##-\frac ua##;
- we use the first equation to substitute ##\frac 1t (s-\frac12 at^2)## for one of the ##u## in the second equation, obtaining ##v^2-u\times \frac 1t (s-\frac12 at^2)=2as##, from which we obtain ##s=\frac {v^2+uat/2}{2a+u/t}##. Differentiating this with respect to ##u## gives ##\frac{(at/2)(2a+u/t)-(v^2+uat/2)/t}{(2a+u/t)^2}##.

We might hope that these are different expressions for the same number, but they are not. Which answer is correct?

Here’s another example. Let ##f:\mathbb R^2\to\mathbb R## be the function such that ##{f(x,y)=xy}##. We are asked to find ##\frac{\partial}{\partial x}f(x,2x)##.

One student argues that the question wants us to differentiate with respect to the first argument of ##f##, and so obtains the answer ##2x##.

Another argues that ##f(x,2x)=2x^2## so we should differentiate that with respect to ##x##. They obtain the answer ##4x##.

Which student got the question right?

In fact, neither of these questions had a correct answer, because both were ambiguous and hence unanswerable.

Problems like this can be resolved without difficulty if one has the right conceptual framework. So let’s set up that framework.

The key to understanding partial differentiation is to realise that it is defined as an operation that is applied to a *function*. Later we will see that, with suitable caution, it can also be validly applied to a *formula*. But let’s do partial differentiation of functions first.

## Initial terminology and notation

A *function* is a rule that, with every element of a set called the *domain*, associates an element of another set called the *range*. We will call the elements of the domain *input points*, and the element of the range that is associated with an input point is called the ‘*image* of the input point under the specified function’. If we like, we can think of the function as a process and describe the image as the ‘output’ or ‘result’ of the function for the given input point. The image is also sometimes referred to as the ‘value’ of the function at the given input point.

For the functions we will be concerned with, the range will always be the set of real numbers ##\mathbb R##, and the domain will be the set of ordered ##n##-tuples of real numbers, known as ‘##n##-dimensional Euclidean space’ and denoted by ##\mathbb R^n##, where ##n## is some positive integer. Although it is beyond the scope of this note, the concepts and notation can be extended to cases:

- where the domain and/or range use complex numbers; and
- where the range is multi-dimensional, eg ##\mathbb R^m## or ##\mathbb C^m##.

An input point in the domain is an ##n##-dimensional vector, and we use an overhead arrow to signal that, eg ##\vec x##. Each such vector has ##n## components. We use the symbol ##x_k## to denote the ##k##th component of ##\vec x## and we can also write ##\vec x## as a sequence of components using the notation:

$$\vec x=\langle x_1,x_2,…,x_{n-1},x_n\rangle$$

Where ##n=1##, the domain is just ##\mathbb R##, so we may drop the overhead arrow and just write an unadorned letter like ##x## to denote an input point.

We will refer to the components of an input point as *arguments* to the function. We say that the ##k##th component of an input point is the ##k##th *argument to the function*. If the domain is ##\mathbb R^n## the function has ##n## arguments. This is sometimes referred to as an ##n##-ary function, and when ##n## is 1 or 2 the function is called *unary* or *binary* respectively.

To define a function, we need to specify three things: the domain, the range and the rule that assigns an image to each input point. Sometimes only the rule is explicitly stated. In that case the convention is to:

- assume the range is ##\mathbb R##; and
- assume the domain is the subset of ##\mathbb R^n## consisting of all input points for which the rule gives a unique, well-defined real number as image, where ##n## should be somehow indicated in the statement of the rule.

It is usual to use a Roman or Greek letter to denote a function. There are some additional notational shorthands that are often useful:

- writing ##f:A\to B## means that the function with name, or label, ##f## has domain ##A## and range ##B##
- we can also write a function without giving it a label, by using the ‘maps to’ symbol ##\mapsto##. Note the flat bit at the tail of the arrow, which distinguishes it from the more commonly used arrow ##\to##, which has a different meaning. For example:
  - writing ##x\mapsto x^2## denotes the function whose domain and range are both ##\mathbb R## and for which the image of an input point is the square of that input point.
  - writing ##\langle x,y\rangle\mapsto y## denotes the function whose domain is ##\mathbb R^2## and range is ##\mathbb R## and for which the image of an input point is the second component of that input point.

Since this method does not state the domain or range, it is usually only employed when the context implies a natural choice of domain and range.

- to denote the image of a given input point under a given function we first write the function and then write the input point to its right in parentheses:
  - if ##f:\mathbb R\to\mathbb R## returns as image the square of an input point, that image for input point ##x## is written ##f(x)## and we have ##f(x)=x^2##
  - if we write ##(\langle x,y\rangle\mapsto y)(\langle a,b\rangle)## that means we apply the function ##\langle x,y\rangle\mapsto y## to the input point ##\langle a,b\rangle##, which will give image ##b##.

It is worth dwelling on this last point. Not uncommonly we come across references like ‘the function ##f(x)##’. Such a reference is an abuse of notation, and often a confusing one. In most cases where such atrocities occur, ##f(x)## is not a function but a real number that is the image of input point ##x## under function ##f##. If we want to refer to the function we just write ##f##.

- to save on ink, page space and reader fatigue, we allow dropping of the angle brackets around the components of a vector when a function is applied to it. That is, we abbreviate ##f(\langle x_1,…,x_n\rangle)## to ##f(x_1,…,x_n)## when we want to write the image of input point ##\langle x_1,…,x_n\rangle## under function ##f##. It’s usually safe enough to drop the enclosing angle brackets in this context because the components ##x_k## are still hemmed in by the parentheses, so they can’t run away.
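The distinction between a function and its image at an input point has a close analogue in most programming languages. Here is a minimal Python sketch of it; the names `f`, `value` and `image` are purely illustrative, and the unlabelled `lambda` plays the role of the ‘maps to’ notation.

```python
# A labelled function, like f : R -> R with f(x) = x^2.
def f(x):
    return x ** 2

# f is the function; f(3) is a number, the image of input point 3 under f.
value = f(3)

# An unlabelled function, like <x, y> |-> y, applied directly to the
# input point <10, 20>. The input point is passed as a single tuple.
image = (lambda point: point[1])((10, 20))

print(callable(f), callable(value))  # True False
print(value, image)                  # 9 20
```

Writing ‘the function `f(3)`’ in this setting would plainly be a mistake, which is the same mistake as writing ‘the function ##f(x)##’.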

## Ordinary differentiation of a function

The simplest type of function is a unary function, that has only a single argument. The function ##f:\mathbb R\to\mathbb R## such that ##f(x)=x+1## is an example of that, and so too are the functions that map a real number to its square, its absolute value or its cosine. Such functions are also sometimes called ‘single-variable’ functions.

We say that a unary function ##f## ‘is differentiable’ at input point ##x## iff the following limit exists:

$$\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$$

If ##U## is the subset of the domain of ##f## comprising all points ##x## where that limit exists then we use ##f'## to denote the function with domain ##U## and range ##\mathbb R## that, to each input point ##x##, assigns the above limit as image. The function ##f'## is called the *derivative* of ##f##.

This is often described by saying ‘the value of the derivative of ##f## at ##x##’ is ##f'(x)##. Sometimes this is shortened to ‘the derivative of ##f## at ##x## is ##f'(x)##’. Such abbreviations should be avoided except where we can be confident they will not cause confusion. A derivative is a *function*, not a number (value).

Another notational convention that works well is to use the operator ##D## to indicate differentiation of a function symbol that occurs to its right. Then we can write ##f'## as ##Df## and the value of the derivative of ##f## at ##x## can also be written ##Df(x)##.

The derivative is also sometimes written as something like ##\frac{dy}{dx}## or even ##y'##, but the ##f'## and ##Df## notations are preferred because they make the role of the function ##f## explicit, so that confusion is less likely.

## Partial differentiation of a function

With that set-up, it now becomes straightforward and intuitive to define a partial derivative. Let ##f## be an ##n##-ary function. Then we say ##f## is *differentiable with respect to its ##k##th argument* at input point ##\vec x## iff the following limit exists:

$$\lim_{h\to 0}\frac{f(x_1,…,x_{k-1},x_k+h,x_{k+1},…,x_n)-f(x_1,…,x_n)}{h}$$

The first term in the numerator is the same as the second except that ##h## has been added to the ##k##th argument.

If ##U## is the subset of the domain of ##f## comprising all points ##\vec x## where that limit exists then I use ##D_kf## to denote the function with domain ##U## and range ##\mathbb R## that, to each input point ##\vec x##, assigns the above limit as image. The function ##D_kf## is called ‘the *partial derivative* of ##f## with respect to its ##k##th argument’. We say ‘the *value* at ##\vec x## of the partial derivative of ##f## with respect to its ##k##th argument’ is ##D_kf(\vec x)##, which is the image of input point ##\vec x## under function ##D_kf##.
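To make the definition concrete, here is a rough numerical sketch in Python. The helper `partial` is hypothetical (it is not a library function) and it only approximates the limit with a small finite step `h`, so it is an illustration of the idea rather than a definition. What it does capture is that ##D_kf## is built from a function and an argument *position*, and is itself a function.

```python
def partial(f, k, h=1e-6):
    """Return a numerical stand-in for D_k f: the partial derivative of f
    with respect to its k-th argument (k counts from 1, as in the text)."""
    def dkf(*x):
        up = list(x)
        down = list(x)
        up[k - 1] += h      # add h to the k-th argument only
        down[k - 1] -= h    # subtract h from the k-th argument only
        # Central difference: an approximation to the limit, not the limit.
        return (f(*up) - f(*down)) / (2 * h)
    return dkf


def f(x, y):
    return x * y


d1f = partial(f, 1)   # a function, standing in for D_1 f
d2f = partial(f, 2)   # a function, standing in for D_2 f

print(d1f(3.0, 5.0))  # approximately 5.0, the second argument
print(d2f(3.0, 5.0))  # approximately 3.0, the first argument
```

Note that `partial` never needs to know what the arguments of `f` are called; only their positions matter.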

Note how this ##D_kf## notation explicitly identifies the two things that are necessary to have a partial derivative:

- a well-defined function ##f##, and
- a clear identification, by the number ##k##, of the position of the argument with respect to which the derivative is taken.

Unless those two things are clearly identified, references to a partial derivative are at best ambiguous and at worst meaningless. Most notations used do not make these two things clear. We will review common types of such notation, but first, since I promised to cover partial differentiation of formulas, let’s do that.

## Partial differentiation of formulas

If suitable care is taken, partial differentiation can be applied to formulas as well as functions. A ‘formula’ is a symbol string made up of constant and variable symbols connected in various ways using operators such as ##+##, ##\times## and exponentiation, and functions such as ##\sin##, with parentheses used occasionally to enforce precedence in a particular way.

Say we have a formula, which we label as ##\phi##, that uses a bunch of different variable symbols to denote real numbers, and one of those symbols is ##y##. Then we can use the definitions we already have to define the partial derivative of ##\phi## with respect to ##y## as follows:

- (substitution) make a new formula ##\psi## by starting with ##\phi## and replacing ##y## by ##x_1## and all other variable symbols by ##x_2,…,x_n##, where ##n## is the number of different variable symbols in ##\phi##.
- (function definition) define function ##f:\mathbb R^n\to \mathbb R## such that ##f(\vec x)=\psi##.
- (partial differentiation of function) let ##\psi'## be the formula for ##D_1f(\vec x)##.
- (reverse substitution) make a new formula ##\phi'## by replacing each ##x_k## in ##\psi'## by the symbol that was changed to ##x_k## when we formed ##\psi## from ##\phi##.

Then we define the formula ##\phi'## to be ‘the partial derivative of ##\phi## with respect to ##y##’.
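As a small worked illustration of those four steps, take ##\phi## to be the formula ##py^2+q## and differentiate with respect to ##y##. The formula, and the choice to send ##p## to ##x_2## and ##q## to ##x_3##, are picked here purely for illustration; any other assignment of the remaining symbols would do equally well. Here ##n=3##.

\begin{align*}
\phi &= py^2+q\\
\psi &= x_2x_1^2+x_3 &&\textrm{[substitution]}\\
f(\vec x) &= x_2x_1^2+x_3 &&\textrm{[function definition]}\\
\psi' &= 2x_2x_1 &&\textrm{[the formula for }D_1f(\vec x)\ ]\\
\phi' &= 2py &&\textrm{[reverse substitution]}
\end{align*}

So the partial derivative of ##py^2+q## with respect to ##y## is the formula ##2py##, as we would expect.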

While that approach is instructive because it connects the concept of partial differentiation of a formula to the existing concept of partial differentiation of a function, it’s not hard to show that it is equivalent to the following more direct definition, which is the one generally used in practice.

The partial derivative of the above formula ##\phi## with respect to one of its variables ##y## is equal to

$$\lim_{h\to 0}\frac{\phi^{y+h}_y-\phi}{h}$$

where ##\phi^{y+h}_y## denotes the formula that is obtained by replacing every instance of ##y## in ##\phi## by ##y+h##.

Let’s use the second definition to do an example. We’ll partial-differentiate the formula ##a\cos x+b\sin xy## with respect to variable ##y##. The partial derivative is:

\begin{align*}
\lim_{h\to 0}\frac{(a\cos x+b\sin (x(y+h)))-(a\cos x+b\sin xy)}{h}
&=\lim_{h\to 0}\frac{b\sin (x(y+h))-b\sin xy}{h}\\
&=b\lim_{h\to 0}\frac{\sin (x(y+h))-\sin xy}{h}\\
&=bD(u\mapsto\sin xu)(y)\\
&\textrm{[which is just an ordinary derivative]}\\
&=b(u\mapsto x\cos xu)(y)\\
&\textrm{[having performed the ordinary differentiation]}\\
&=bx\cos xy
\end{align*}

A key difference from partial differentiation of a function is that

- arguments to a function are numbered (ordered) but not named; whereas
- variables used in a formula are named but not numbered (ordered)

So, whereas we perform partial differentiation of a function with respect to an argument with a specified number, we perform partial differentiation of a formula with respect to a named variable. In view of this, we can extend the ##D_k## notation we used in the previous section to a notation where we use a variable name as subscript rather than an argument number, eg ##D_y##. Using that, we can write the above partial derivative as

$$D_y\left(a\cos x+b\sin xy\right)$$

We should also note that, whereas the partial derivative of a function is another function, the partial derivative of a formula is a formula. Partial differentiation, like ordinary differentiation, is an operation that takes one sort of thing as an input and produces the same sort of thing as an output.

## Different notations commonly used for partial differentiation

Here we consider the types of notation typically used for partial differentiation, and the circumstances in which it is clear and effective, and those in which it is not.

The notation I have come across most often for partial derivatives uses a ‘curly d’: ##\partial##. Taking for argument’s sake the case where the differentiation is with respect to a variable labelled ##t##, it is written as:

- ##\frac{\partial f}{\partial t}## if ##f## is a function. Other ways that this is sometimes written are ##f_t, D_tf## or ##\partial_tf##.
- ##\frac{\partial}{\partial t}\left(\phi\right)## where ##\phi## is a formula. Other ways that this is sometimes written are ##D_t(\phi)## or ##\partial_t(\phi)##.
- ##\frac{\partial a}{\partial t}## if ##a## is an amount in a theory from some discipline such as physics or economics.

The third of these should never be used, unless the context provides a clear, rigid understanding of a unique formula that expresses ##a## in terms of ##t## and other variables. The examples at the opening of this note demonstrate the mess that can arise otherwise.

The second one just presents two other ways of writing what we would write using the notation of the previous section as ##D_t(\phi)##. Hence it is perfectly meaningful and unambiguous.

The first one can be problematic, because arguments to functions are *numbered*, not *named*, so the use of the name ##t## may generate uncertainty as to which argument is intended. For instance, even if we have been mostly using ##t## for the first argument of ##f##, what are we to make of ##\frac{\partial f}{\partial t}(t,t^2)##?

A way to remove the ambiguity of such an expression is to instead write it as:

$$\frac{\partial f}{\partial u}(u,s)\bigg|_{\langle t,t^2\rangle }
\textrm{ OR }
f_u(u,s)\bigg|_{\langle t,t^2\rangle }
\textrm{ OR }
D_uf(u,s)\bigg|_{\langle t,t^2\rangle }
\textrm{ OR }
\partial_uf(u,s)\bigg|_{\langle t,t^2\rangle }
$$

What this does is first, by virtue of the parentheses ##(u,s)##, **label** the arguments to the function with names so that we can interpret ##\frac{\partial f}{\partial u}## as indicating partial differentiation with respect to the first argument, ie ##D_1f##. Then the bar and the input point ##\langle t,t^2\rangle## at its foot tell us to evaluate (find the image of) the partial derivative function ##D_1f## at input point ##\langle t,t^2\rangle##.

Note that, while that tactic adapts the ##\frac{\partial f}{\partial x}## notation to clearly express the *value* of the partial derivative *at an input point*, there is no obvious, safe, reliable way to use it to express the partial derivative function ##D_kf##. On the other hand, despite being the best for concision and clarity, ##D_k## operators are not widely-used notation so, when I am writing about partial derivatives, I explain the meaning at the first use, with a sentence like:

‘The velocity is ##D_3g##, where ##D_kg## indicates the partial derivative of function ##g## with respect to its ##k##th argument.’

Another notation, that is commonly used in physics, is the ‘overdot’ notation, where a dot is placed over a variable to indicate differentiation of it with respect to time, as in ##\dot r##. This rarely causes problems because it is typically used for location coordinates of a moving particle. The fact that the particle must trace out some path in spacetime implicitly defines a unary function for each of its spatial coordinates, relating that coordinate to time. So ##\dot r## is an *ordinary* derivative which, if we denote by ##f## the function that gives coordinate ##r## in terms of time, can also be written ##Df## or ##\frac{df}{dt}##. There is no risk of ambiguity in this case.

## Revisiting the messy examples

We can now see why the examples with which we opened this article caused such trouble.

We were asked to calculate a partial derivative of the displacement ##s##, but that displacement is a *physical quantity*, which is an amount or a variable, not a function or a formula. No function or formula was specified, so it is meaningless to ask about the partial derivative of the displacement. We can make a guess as to a function that might have been intended, but if there is more than one possible function, that guess may not be what the question-setter had in mind. In the example given, we found three different possible functions for ##s##, leading to three totally different, incompatible, partial derivatives.
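The clash can be reproduced numerically. The sketch below assumes two of the candidate functions from the opening example, each with a different list of arguments, and uses a hypothetical finite-difference helper `partial` to approximate ##D_1## of each. The particular numbers, and the use of the standard relation ##v=u+at## to keep them consistent, are illustrative choices.

```python
def partial(f, k, h=1e-6):
    """Numerical stand-in for D_k f (k counts from 1)."""
    def dkf(*x):
        up, down = list(x), list(x)
        up[k - 1] += h
        down[k - 1] -= h
        return (f(*up) - f(*down)) / (2 * h)
    return dkf


def s_from_u_t_a(u, t, a):
    """Distance as a function of (u, t, a): s = ut + at^2/2."""
    return u * t + 0.5 * a * t ** 2


def s_from_u_v_a(u, v, a):
    """Distance as a function of (u, v, a): s = (v^2 - u^2) / (2a)."""
    return (v ** 2 - u ** 2) / (2 * a)


# One physical situation, described consistently in both sets of variables.
u, a, t = 2.0, 3.0, 4.0
v = u + a * t

# Both functions give the same distance for this situation...
print(s_from_u_t_a(u, t, a), s_from_u_v_a(u, v, a))  # 32.0 32.0

# ...but their partial derivatives with respect to the first argument differ.
print(partial(s_from_u_t_a, 1)(u, t, a))  # approximately t = 4.0
print(partial(s_from_u_v_a, 1)(u, v, a))  # approximately -u/a = -0.667
```

Both functions describe the same distance, yet ‘the’ derivative with respect to ##u## depends entirely on which function, with which other arguments held fixed, we chose.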

In the second example we were asked to find ##\frac{\partial}{\partial x}f(x,2x)##. The dilemma is, what function are we differentiating? If it is ##f## then we might guess that we need to differentiate with respect to the first argument because, when the function was defined as ##f(x,y)=xy##, the variable ##x## was used for the first argument. So the answer would be as follows:

\begin{align*}
D_1f(x,2x)&=D_1(\langle x,y\rangle\mapsto xy)(x,2x)\\
&=(\langle x,y\rangle\mapsto y)(x,2x)\textrm{ [performing the differentiation]}\\
&=2x\textrm{ [evaluating the function at input point }\langle x,2x\rangle\ ]
\end{align*}

Alternatively, if the intention was for us to differentiate with respect to both ##x##s, the function we need to differentiate is the unary function ##g## given by ##g(x)=f(x,2x)=2x^2##. In that case the answer is

\begin{align*}
Dg(x)&=D(x\mapsto 2x^2)(x)\\
&=(x\mapsto 4x)(x)\textrm{ [performing the differentiation]}\\
&=4x\textrm{ [evaluating the function at input point }x\ ]
\end{align*}

Again, the failure of the problem to specify the function that is being differentiated has led to more than one possible answer.
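The two readings can be compared side by side in a short numerical sketch. As before, `partial` is a hypothetical finite-difference helper, and the sample value ##x=3## is arbitrary.

```python
def partial(f, k, h=1e-6):
    """Numerical stand-in for D_k f (k counts from 1)."""
    def dkf(*x):
        up, down = list(x), list(x)
        up[k - 1] += h
        down[k - 1] -= h
        return (f(*up) - f(*down)) / (2 * h)
    return dkf


def f(x, y):
    return x * y


def g(x):
    # The unary function x |-> f(x, 2x) = 2x^2.
    return f(x, 2 * x)


x = 3.0

# Reading 1: differentiate f in its first argument, then evaluate at <x, 2x>.
print(partial(f, 1)(x, 2 * x))  # approximately 2x = 6.0

# Reading 2: differentiate the unary function g, then evaluate at x.
print(partial(g, 1)(x))         # approximately 4x = 12.0
```

The code cannot be written at all without deciding which function is being differentiated, which is exactly the decision the original question left open.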

## Partial differentiation in Thermodynamics

The topic that has, for me, involved more partial differentiation than any other I have encountered, is Thermodynamics. It seems there are ##\partial##s every way one looks in thermodynamics. And they appear to break a key law I have enunciated, because they are applied to amounts, rather than to functions or formulas.

Fortunately, this turns out not to be a problem because it satisfies the proviso I set out above, which is that differentiation of ‘amounts’ can be meaningful if the context provides a clear, rigid understanding of a unique formula that expresses ##a## in terms of ##t## and other variables. In thermodynamics there is such a formula, called the ‘fundamental equation’. In its usual form it expresses internal energy ##U## as a formula involving entropy ##S##, volume ##V## and ##N_1,…,N_r## where each ##N_k## is the number of moles of molecule type ##k##.

Given that context, ##U## is a function from ##\mathbb R^{2+r}## to ##\mathbb R## such that ##U(S,V,N_1,…,N_r)## is the formula specified in the fundamental equation, which is the internal energy in a system whose other extensive parameters have values ##S,V,N_1,…,N_r##.

Hence when a thermodynamic text defines temperature as

$$T=\frac{\partial U}{\partial S}$$

this means that ##T## is the function ##D_1U## with domain ##\mathbb R^{2+r}## and range ##\mathbb R##. That function, given an input point whose components are values for ##S,V,N_1,…,N_r##, tells you what the temperature is, which is the amount

$$D_1U(S,V,N_1,…,N_r)$$

Sometimes in thermodynamics we encounter partial derivatives where the numerator is not ##\partial U##. I’m looking at a page now that has both ##\frac{\partial S}{\partial U}## and ##\frac{\partial S}{\partial V}## on it. To make sense of this we need to rearrange the fundamental equation to make ##S## the subject. That allows us to define a function ##S## from ##\mathbb R^{2+r}## to ##\mathbb R## whose image, given an input point whose components are, in order, internal energy ##U##, volume ##V## and mole numbers ##N_1,…,N_r##, is the entropy of the system, denoted

$$S(U,V,N_1,…,N_r)$$

This context is called the ‘entropy representation’ of the fundamental equation, and is contrasted with the earlier context which is based on the ‘energy representation’ of the fundamental equation.

Note that the symbols ##U## and ##S## switch meanings between the two contexts, with ##U## being a function in the first and an amount in the second, while ##S## is the other way around.

Given that, we interpret ##\frac{\partial S}{\partial U}## and ##\frac{\partial S}{\partial V}## as meaning ##D_1S## and ##D_2S## respectively, both in the context of the entropy representation.
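Here is a sketch of the two representations in Python. The fundamental equation used, ##U=S^3/(NV)## for a single substance with all constants set to 1, is a made-up toy chosen only because it is easy to invert; it is not claimed to describe any real system. The helper `partial` is again a hypothetical finite-difference stand-in for ##D_k##.

```python
def partial(f, k, h=1e-6):
    """Numerical stand-in for D_k f (k counts from 1)."""
    def dkf(*x):
        up, down = list(x), list(x)
        up[k - 1] += h
        down[k - 1] -= h
        return (f(*up) - f(*down)) / (2 * h)
    return dkf


# Energy representation: U is the function, S is an argument.
def energy(S, V, N):
    return S ** 3 / (N * V)          # toy fundamental equation (assumed)


# Entropy representation: the same relation solved for S,
# so now S is the function and U is an argument.
def entropy(U, V, N):
    return (U * N * V) ** (1.0 / 3.0)


S0, V0, N0 = 3.0, 3.0, 4.0
U0 = energy(S0, V0, N0)                   # the matching energy

T = partial(energy, 1)(S0, V0, N0)        # value of D_1 U: the temperature
dS_dU = partial(entropy, 1)(U0, V0, N0)   # value of D_1 S

print(T)          # approximately 2.25
print(T * dS_dU)  # approximately 1.0
```

The product comes out as 1 because, at matching states, the value of ##D_1S## in the entropy representation is the reciprocal of the value of ##D_1U## in the energy representation. That is the familiar statement ##\frac{\partial S}{\partial U}=\frac1T##, which only makes sense once we are clear which function each ##\partial## is acting on.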

## Partial differentiation of fields

Another area where partial differentiation abounds is electromagnetism, or the study of fields more generally.

In Maxwell’s equations, and in the wave equations derived from them, we encounter items like ##\frac{\partial^2 \vec B}{\partial t^2}##, and vector ‘curls’, which contain elements like ##\frac{\partial \vec E}{\partial x}##. Since we are restricting ourselves to scalar-valued functions in this note, let us consider the (scalar) *components* of those vector derivatives in the ##x## spatial direction, which we can denote by ##\frac{\partial^2 B^x}{\partial t^2}## and ##\frac{\partial E^x}{\partial x}## respectively.

##B^x## and ##E^x## might look like amounts, but actually they are functions. A scalar spacetime field (all the fields considered here are spacetime fields) is defined as a function that assigns to each point in spacetime a scalar value. So ##B^x## and ##E^x## are both functions from ##\mathbb R^4## to ##\mathbb R##, such that ##B^x(t,x,y,z)## and ##E^x(t,x,y,z)## are the ##x## component of the magnetic and electric field vectors respectively, at the spacetime point ##\langle t,x,y,z\rangle##.

Hence ##\frac{\partial^2 B^x}{\partial t^2}## means ##D_1D_1B^x## (note the repeated application of the differential operator ##D_1##) and ##\frac{\partial E^x}{\partial x}## means ##D_2E^x##.
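The repeated application of ##D_1## can be sketched by nesting the same hypothetical finite-difference helper used to approximate ##D_k##. The two fields below are arbitrary smooth functions of ##\langle t,x,y,z\rangle## invented for the illustration; they are not claimed to solve Maxwell’s equations.

```python
import math


def partial(f, k, h=1e-4):
    """Numerical stand-in for D_k f (k counts from 1)."""
    def dkf(*x):
        up, down = list(x), list(x)
        up[k - 1] += h
        down[k - 1] -= h
        return (f(*up) - f(*down)) / (2 * h)
    return dkf


# Toy scalar fields on spacetime, with argument order <t, x, y, z>.
def Bx(t, x, y, z):
    return math.sin(x - t)


def Ex(t, x, y, z):
    return math.cos(x - t)


point = (0.5, 2.0, 0.0, 0.0)

d2Bx_dt2 = partial(partial(Bx, 1), 1)   # D_1 D_1 B^x, a function on R^4
dEx_dx = partial(Ex, 2)                 # D_2 E^x, a function on R^4

print(d2Bx_dt2(*point))  # approximately -sin(1.5)
print(dEx_dx(*point))    # approximately -sin(1.5)
```

Because `partial(Bx, 1)` is itself a function of four arguments, applying `partial(..., 1)` to it again is perfectly well defined, which is all that ##D_1D_1B^x## means.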

## Conclusion

The technique set out in this note cannot turn an ambiguous statement or question into an unambiguous one. Nothing can do that. But it should enable a reader who has trouble understanding something that somebody else has written involving partial differentiation to:

- identify whether the problem is with their own comprehension or with the clarity of the author’s writing;
- identify what the different possible interpretations of the question are; and
- formulate a clear question they can ask the author, to point out the problem and get clarification of what had been intended.

It will also equip the reader to express themselves clearly when they need to write something involving partial differentiation.

A useful rule to remember is:

- When we take a partial derivative of a function, we specify the position of the argument with respect to which we are differentiating, and the result is a function.
- When we take a partial derivative of a formula, we specify the symbol/label/name of the variable with respect to which we are differentiating, and the result is a formula.
- If we try to take a partial derivative of a variable or amount then, unless there is a clear, rigid understanding of a unique function or formula that is implied by the context, we probably won’t really understand what we are doing, and will end up with a mess.

Finally, I find that, when problems involving lots of variables and partial differentiation become confusing, setting out to formally define the functions involved, how they relate to one another, and what partial derivatives are being taken of which arguments of which functions, helps me get my thoughts organised enough to overcome obstacles of comprehension that previously seemed insurmountable. As always though, your mileage may vary.

[*Featured illustration by Eleanor Kirk*]

Some of the ambiguity with functions could be cleared up using the lambda calculus (or just the lambda notation), which makes explicit what arguments a function takes. There is always a confusion in an expression such as [itex]f(x)[/itex] as to whether you mean the function, or whether you mean the value of [itex]f[/itex] at a particular point [itex]x[/itex]. The lambda calculus makes this clear. Your notation [itex]x\mapsto x^2[/itex] is basically equivalent (I think) to the lambda calculus, which would indicate that function by [itex]\lambda x . x^2[/itex]. I think that the only reason people don't use the lambda notation is that it's a lot of trouble, and usually it's not necessary.

Thanks andrewkirk!

Especially with respect to thermodynamics, it would be good to be explicit about what variables are held fixed.

http://www.compadre.org/per/document/servefile.cfm?ID=11819&DocID=2669 [PDF]

"Representations of Partial Derivatives in Thermodynamics", Thompson, Manogue, Roundy, Mountcastle

Nice work Andrew!

This issue is particularly important when teaching students variational calculus and Lagrange mechanics. In particular when the base space is multi-dimensional and the Euler-Lagrange equations involve a partial derivative with respect to the base space coordinates of a partial derivative with respect to the arguments of the Lagrangian. This has confused many a student.

I have a suggestion. Start with images to explain all the differentiation concepts on the first day before even simple total derivative is explained. Once students grasp what they are trying to express, then they can better focus on how.

Using this one image, I think the following concepts can be explained in just seconds.

- Partial derivatives: The rate of change of altitude as we walk toward the summit versus the rate of change of altitude as we walk parallel to the summit.
- Total derivative: Imagine the image reduced to a 2D profile choosing a planar slice through the mountain. Then height Y represents altitude, and it varies only with the horizontal variable X. The slope is the total derivative.
- Covariant derivative: Consider the color shading as another independent variable. Now imagine walking on the mountain following a contour with constant shade of green. The rate of change of altitude is now the derivative covariant with color.