Demystifying the Chain Rule in Calculus

Introduction

There are a number of posts on PF involving a general confusion over the multi-variable chain rule. The problem is often caused by a lack of clarity about the roles of functions and variables and what precisely each derivative means. This insight is an attempt to clarify things.

Part A: The Single-Variable Chain Rule

1) Functional Notation

We begin with a review of the chain rule in one dimension.  For this we need two functions ##f## and ##g## and a third function defined by composition of ##f## and ##g##:

$$h = f \circ g$$

If we use ##x## as our variable, then this means that:

$$h(x) = f(g(x))$$

If ##g## is differentiable at ##x## and ##f## is differentiable at ##g(x)##, the chain rule says that:

$$h'(x) = f'(g(x))g'(x) \ \ (1)$$

Note that this means: the derivative of ##h##, evaluated at a point ##x##, equals the derivative of ##f##, evaluated at the point ##g(x)##, times the derivative of ##g##, evaluated at the point ##x##.
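As a quick numerical sanity check of equation (1), here is a minimal Python sketch. The functions ##f(u) = \sin u## and ##g(x) = x^2 + 1## are hypothetical examples chosen only for illustration, and the central-difference helper is an approximate, independent check, not part of the chain rule itself.

```python
import math

# Hypothetical example functions (illustration only): f(u) = sin(u), g(x) = x**2 + 1.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x**2 + 1

def g_prime(x):
    return 2 * x

def h(x):
    # h = f o g
    return f(g(x))

def h_prime_chain_rule(x):
    # Equation (1): f' is evaluated at the point g(x); g' is evaluated at x.
    return f_prime(g(x)) * g_prime(x)

def central_difference(func, x, step=1e-6):
    # Approximate func'(x) numerically, used only as an independent check.
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.7
print(h_prime_chain_rule(x0), central_difference(h, x0))
```

Replacing `f_prime(g(x))` with `f_prime(x)` makes the two numbers disagree, which shows concretely why ##f'## must be evaluated at ##g(x)## rather than at ##x##.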

Although this notation is more complicated than the differential notation below, it has the advantage that it explicitly shows what each derivative means.

Another important point is that the notation ##f', g', h'## is independent of the variable ##x##. In other words, we can use the notation ##f'## to identify unambiguously the function that is the derivative of ##f##, without having to specify a variable.

Dissociating a function (and its derivative) from the dummy variable that is used to define it is an important step in understanding how derivatives and the chain rule really work.  Unfortunately, as we will see, standard mathematical notation makes this harder to do in the multi-variable case.
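The idea that ##f'## is a function in its own right, detached from any variable name, can be sketched in Python, assuming a numerical central-difference approximation stands in for the exact derivative. The helper `derivative` is hypothetical: it takes a function and returns a new function, and the parameter names `x` and `t` used to define the two cubes play no role in the result.

```python
def derivative(func, step=1e-6):
    """Return a new function that approximates func' by central differences.

    The result is a function object; no variable name is attached to it.
    """
    def func_prime(point):
        return (func(point + step) - func(point - step)) / (2 * step)
    return func_prime

# The parameter name used to define a function is only a dummy variable.
def cube_in_x(x):
    return x**3

def cube_in_t(t):
    return t**3

d1 = derivative(cube_in_x)
d2 = derivative(cube_in_t)

# Both are (approximately) the same function, 3 * point**2,
# and can be evaluated at any point we like.
print(d1(2.0), d2(2.0))  # both close to 12
```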

2) Differential Notation

The alternative notation for the chain rule is, of course:

$$\frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \ \ (2)$$

This notation is relatively free and easy and allows calculus to be done quickly, but notice how much precision has been lost:

  1. ##f## now stands for two different functions: the original ##f## and the composition ##f \circ g##.
  2. The points at which these functions are to be evaluated have been lost.
  3. And what exactly is ##\frac{df}{dg}##?

This is why, when using the differential notation, you have to be careful not to lose track of how things are defined and what is a function of what.

Note that equation (2) means exactly the same as equation (1).  If you have any doubt about what (2) means, then go back to equation (1).
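To make the third question concrete, take (as an illustrative choice) ##f(u) = u^3## and ##g(x) = 1 + x^2##. In functional form, equation (1) gives:

$$h'(x) = f'(g(x))\,g'(x) = 3(1 + x^2)^2 \cdot 2x = 6x(1 + x^2)^2$$

In the notation of equation (2), ##\frac{df}{dg} = 3g^2## is simply ##f'## evaluated at the point ##g(x)##, and the ##f## on the left-hand side is really ##h = f \circ g##:

$$\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx} = 3g^2 \cdot 2x = 6x(1 + x^2)^2$$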

Part B: Multi-Variable Chain Rule

In multi-variable calculus, we start with a function ##f## of several independent variables: ##x, y, z##, say.  Assuming ##f## is differentiable, we can then define three new functions, the partial derivatives of ##f##:

$$f_x \equiv \frac{\partial f}{\partial x}, \ f_y \equiv \frac{\partial f}{\partial y}, f_z \equiv \frac{\partial f}{\partial z} \ \ (3)$$

Notice that the notation for partial derivatives is tied to a particular set of dummy variables – ##x, y, z## in this case.  This, as we shall see, may lead to ambiguity.

To make sure we understand what is meant by these derivatives, here is an example. Let

$$f(x, y, z) = xy + x^2z + xyz \ \ (4)$$

Then we have:

$$f_x(x, y, z) \equiv \frac{\partial f}{\partial x} = y + 2xz + yz \ \ (4a)$$

$$f_y(x, y, z) \equiv \frac{\partial f}{\partial y} = x + xz \ \ (4b)$$

$$f_z(x, y, z) \equiv \frac{\partial f}{\partial z} = x^2 + xy \ \ (4c)$$

Note that these partial derivatives are themselves functions of the three variables.  In general, once we have defined ##f## we have also defined ##f_x, f_y, f_z## and these are just three functions that can be applied to any variables we like.  For example, from equation ##(4a)## we get:

$$f_x(u, v, w) = v + 2uw + vw \ \ (5)$$

Note that technically ##f_x## is really “the function obtained by differentiating ##f## with respect to its first argument, which we just happen to call ##x##.” In everyday use there is no counterpart of the single-variable notation ##f'## that would let us avoid mentioning the variable ##x## here (a notation such as ##D_1 f## exists, but is rarely used). For good or bad, we are stuck with the ##f_x## and ##\frac{\partial f}{\partial x}## notation.
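Reading ##f_x## as “the derivative of ##f## with respect to its first argument” can be mimicked in code. The following Python sketch uses a hypothetical helper `partial(func, index)` that differentiates numerically (central differences, so results are approximate) with respect to a positional argument, never by name. Applied to arguments called `u, v, w`, it reproduces equation (5).

```python
def partial(func, index, step=1e-6):
    """Return an approximation of the partial derivative of func with respect
    to its positional argument number `index` (0 = first argument)."""
    def func_partial(*args):
        plus = list(args)
        minus = list(args)
        plus[index] += step
        minus[index] -= step
        return (func(*plus) - func(*minus)) / (2 * step)
    return func_partial

# Equation (4): the names x, y, z are only dummy variables.
def f(x, y, z):
    return x*y + x**2*z + x*y*z

# "f_x" is really "derivative with respect to the first argument", etc.
f_x = partial(f, 0)
f_y = partial(f, 1)
f_z = partial(f, 2)

# Equation (5): apply f_x to any arguments we like, here called u, v, w.
u, v, w = 1.5, -2.0, 0.5
print(f_x(u, v, w), v + 2*u*w + v*w)  # both approximately -1.5
```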

Ambiguity may now arise, however, if ##u, v, w## are themselves functions of ##x, y, z##.  For example, if we define:

$$h(x) = f(u(x), v(x), w(x)) \ \ (6)$$

Then we have defined a new function, ##h##, of a single variable ##x##.  The chain rule says that:

$$h'(x) = f_x(u(x), v(x), w(x))u'(x) + f_y(u(x), v(x), w(x))v'(x) + f_z(u(x), v(x), w(x))w'(x) \ \ (7)$$

But what exactly is ##f_x## here? It is the function formed by taking the partial derivative of ##f## with respect to its first argument. Note that the symbol ##x## is now overloaded: it names both the variable of ##h## and the first argument of ##f## that labels the partial derivative. Again, we are stuck with the notation and have to juggle ##x## in these two roles. In our example, ##f_x## is the function defined in equation ##(5)##, regardless of what ##u, v, w## are.
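Equation (7) can be checked numerically. In this Python sketch, ##f## and ##f_x, f_y, f_z## are taken from equations (4) and (4a) to (4c), while ##u(x) = \cos x##, ##v(x) = x^2## and ##w(x) = e^x## are hypothetical choices made only for illustration. The chain-rule value is compared with a central-difference approximation of ##h'##.

```python
import math

# Equation (4) and its partial derivatives (4a)-(4c). The parameter names
# x, y, z are dummies: f_x means "derivative w.r.t. the first argument".
def f(x, y, z):
    return x*y + x**2*z + x*y*z

def f_x(x, y, z):
    return y + 2*x*z + y*z

def f_y(x, y, z):
    return x + x*z

def f_z(x, y, z):
    return x**2 + x*y

# Hypothetical inner functions and their derivatives (illustration only).
def u(x):
    return math.cos(x)

def u_prime(x):
    return -math.sin(x)

def v(x):
    return x**2

def v_prime(x):
    return 2*x

def w(x):
    return math.exp(x)

def w_prime(x):
    return math.exp(x)

def h(x):
    # Equation (6): h(x) = f(u(x), v(x), w(x))
    return f(u(x), v(x), w(x))

def h_prime(x):
    # Equation (7): every partial derivative is evaluated at (u(x), v(x), w(x)).
    # Here x is h's variable; it is unrelated to the dummy first parameter of f_x.
    point = (u(x), v(x), w(x))
    return (f_x(*point) * u_prime(x)
            + f_y(*point) * v_prime(x)
            + f_z(*point) * w_prime(x))

def central_difference(func, x, step=1e-6):
    # Independent numerical check of func'(x).
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.3
print(h_prime(x0), central_difference(h, x0))
```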

In fact, things get worse. Suppose that simply ##u(x) = x## and, as is often the case, we use ##y(x), z(x)## instead of ##v(x), w(x)##. Now we have overloaded all three symbols ##x, y, z##. We have:

$$h(x) = f(x, y(x), z(x)) \ \ (8)$$

And the chain rule in this case gives:

$$h'(x) = f_x(x, y(x), z(x)) + f_y(x, y(x), z(x))y'(x) + f_z(x, y(x), z(x))z'(x) \ \ (9)$$

Here, technically, ##y, z## are used both for the dummy variables with which ##f## was defined (and which are used to denote the partial derivatives of ##f##) and for functions of ##x##. To illustrate this, we could capitalise the letters where they are dummy variables denoting which partial derivative we mean, and leave them as they are where they represent specific variables and functions. This would give:

$$h'(x) = f_X(x, y(x), z(x)) + f_Y(x, y(x), z(x))y'(x) + f_Z(x, y(x), z(x))z'(x) \ \ (9b)$$

This highlights that the subscripts ##X, Y, Z## denote the partial derivatives of ##f## with respect to its first, second and third arguments and are, in fact, unrelated to our variables ##x, y, z##. The usual notation in equation ##(9)## simply does not make this distinction.
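The difference between ##f_X## evaluated along the curve and the full derivative ##h'(x)## shows up clearly in numbers. This Python sketch uses ##f## from equation (4), with capitalised parameter names to echo equation (9b), and hypothetical choices ##y(x) = \sin x## and ##z(x) = x^3##. Values involving a finite difference are approximate.

```python
import math

# Equation (4) and its partials (4a)-(4c), with capitalised dummy
# parameters as in equation (9b) (unusual Python style, kept on purpose).
def f(X, Y, Z):
    return X*Y + X**2*Z + X*Y*Z

def f_X(X, Y, Z):
    return Y + 2*X*Z + Y*Z

def f_Y(X, Y, Z):
    return X + X*Z

def f_Z(X, Y, Z):
    return X**2 + X*Y

# Hypothetical y(x) and z(x), chosen only for illustration.
def y(x):
    return math.sin(x)

def y_prime(x):
    return math.cos(x)

def z(x):
    return x**3

def z_prime(x):
    return 3 * x**2

def h(x):
    # Equation (8): h(x) = f(x, y(x), z(x))
    return f(x, y(x), z(x))

def h_prime(x):
    # Equation (9b): the first term has no extra factor because dx/dx = 1.
    p = (x, y(x), z(x))
    return f_X(*p) + f_Y(*p) * y_prime(x) + f_Z(*p) * z_prime(x)

x0 = 1.2
partial_only = f_X(x0, y(x0), z(x0))  # "partial f / partial x" evaluated on the curve
total = h_prime(x0)                   # the "total" derivative df/dx, i.e. h'(x0)
step = 1e-6
numeric = (h(x0 + step) - h(x0 - step)) / (2 * step)
print(partial_only, total, numeric)
```

Here `partial_only` is just the first term of equation (9b); `total` and `numeric` agree with each other, but not with it.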

In the differential notation equation (9) becomes:

$$\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx} + \frac{\partial f}{\partial z} \frac{dz}{dx} \ \ (10)$$

Here the left-hand side is often called the “total” derivative of ##f##. Note, however, that it is not really a derivative of ##f## at all, but the derivative of the composite function ##h## defined in equation ##(8)## above.

This analysis, if nothing else, may at least take some of the ambiguity out of the “total” derivative ##\frac{df}{dx}## and the partial derivative ##\frac{\partial f}{\partial x}##.  Equation ##(9)## hopefully makes clear what equation ##(10)## really means.
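A worked example may help. Take ##f## from equation ##(4)## and, as an illustrative choice, ##y(x) = x^2## and ##z(x) = x##. Substituting directly into equation ##(8)##:

$$h(x) = x \cdot x^2 + x^2 \cdot x + x \cdot x^2 \cdot x = 2x^3 + x^4, \qquad h'(x) = 6x^2 + 4x^3$$

Equation ##(9)##, with ##(4a)## to ##(4c)## evaluated at the point ##(x, x^2, x)##, gives the same result term by term:

$$f_x(x, x^2, x) = x^2 + 2x^2 + x^3 = 3x^2 + x^3$$

$$f_y(x, x^2, x)\,y'(x) = (x + x^2)(2x) = 2x^2 + 2x^3$$

$$f_z(x, x^2, x)\,z'(x) = (x^2 + x^3)(1) = x^2 + x^3$$

$$h'(x) = (3x^2 + x^3) + (2x^2 + 2x^3) + (x^2 + x^3) = 6x^2 + 4x^3$$

The partial derivative ##\frac{\partial f}{\partial x}## along the curve, ##f_x(x, x^2, x) = 3x^2 + x^3##, is only the first of the three terms; the “total” derivative ##\frac{df}{dx}## is their sum.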

Although equation ##(10)## may be the form of the chain rule with which most people are familiar, the notation hides a multitude of sins.  If you get confused over the chain rule it is worth being able to deconstruct it back to the functional format (equation ##(9)##) to see what is really going on.  The key step in that process is to recognise that we have used the symbols ##x, y, z##, and indeed the function ##f##, in two different roles.

It’s actually quite rare in standard mathematical notation to have to juggle the same symbol in two different roles, but the multi-variable chain rule in equations ##(9)## and ##(10)## is such a case.

Part C: Example

To illustrate how the multi-variable chain rule is used in a physical context, consider a particle in a time-dependent potential, which is defined as a function of four variables: ##V(x, y, z, t)##.

##V## is a multi-variable function with four partial derivatives (which themselves are multi-variable functions): ##V_x(x, y, z, t), V_y(x, y, z, t), V_z(x, y, z, t), V_t(x, y, z, t)##.  Note that these functions are defined in general for ##V##, independent of what the particle is doing.

Now, if the particle takes a specific trajectory through space, we can define a new function which is the potential of the particle along its trajectory:

$$V_p(t) = V(x(t), y(t), z(t), t) \ \ (11)$$

Note that texts will often use the same symbol for both these functions and simply write:

$$V(t) = V(x(t), y(t), z(t), t) \ \ (12)$$

This overloads the symbol ##V## and trusts that the student doesn’t get confused by how the derivatives are calculated. I’ll stay with this convention, although I personally like to distinguish the functions by writing ##V_p## on the left-hand side, as in equation ##(11)##.

In any case, we can calculate how the potential changes with time (finally, for simplicity, I’ll drop the arguments and write ##V_x \equiv V_x(x(t), y(t), z(t), t)## etc.):

$$V'(t) = V_x x'(t) + V_y y'(t) + V_z z'(t) + V_t \ \ (13)$$

Or, in the differential notation:

$$\frac{dV}{dt} = \frac{\partial V}{\partial x} \frac{dx}{dt}+ \frac{\partial V}{\partial y} \frac{dy}{dt} + \frac{\partial V}{\partial z} \frac{dz}{dt} + \frac{\partial V}{\partial t} \ \ (14)$$

This is standard notation, but you can see from equations ##(11)## and ##(12)## that the ##V## on the left-hand side is actually a different function from the ##V## on the right-hand side.

Note also that ##V_x = \frac{\partial V}{\partial x}## is the partial derivative of the function ##V## with respect to its ##x## coordinate.  This ##x## is not the same as the function ##x(t)## representing the particle’s x-coordinate over time.   In other words ##V_x = \frac{\partial V}{\partial x}## is the same function, regardless of the path of the particle.  The path of the particle ##(x(t), y(t), z(t))## is the set of points at which this function ##V_x## is evaluated for that particular path.

One particular area of confusion is where the student thinks they need to differentiate the function ##V## along the particular particle trajectory to get ##V_x## (or ##\frac{\partial V}{\partial x}##).  This is not the case.  ##V_x## (or ##\frac{\partial V}{\partial x}##) is calculated by a general, spatial derivative before any particular trajectory is considered.  Then, this function is evaluated along a particular trajectory.
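The point that ##V_x, V_y, V_z, V_t## are computed once and then merely evaluated along a path can be sketched in Python. The potential ##V(x, y, z, t) = (x^2 + y^2 + z^2)\cos t## and the two trajectories (a unit circle and a straight line) are hypothetical, chosen only for illustration; `potential_rate_along` implements equation ##(13)##, and `numerical_rate` differentiates ##V_p(t)## directly as an approximate check.

```python
import math

# Hypothetical time-dependent potential: V(x, y, z, t) = (x^2 + y^2 + z^2) cos t
def V(x, y, z, t):
    return (x**2 + y**2 + z**2) * math.cos(t)

# Its partial derivatives, computed once, before any trajectory is considered.
def V_x(x, y, z, t):
    return 2 * x * math.cos(t)

def V_y(x, y, z, t):
    return 2 * y * math.cos(t)

def V_z(x, y, z, t):
    return 2 * z * math.cos(t)

def V_t(x, y, z, t):
    return -(x**2 + y**2 + z**2) * math.sin(t)

def potential_rate_along(path, path_velocity):
    """Return t -> V_p'(t) from equation (13) for one trajectory.

    path(t) gives (x, y, z); path_velocity(t) gives (x'(t), y'(t), z'(t)).
    The same V_x, V_y, V_z, V_t serve every path; only the evaluation points change.
    """
    def rate(t):
        x, y, z = path(t)
        dx, dy, dz = path_velocity(t)
        return (V_x(x, y, z, t) * dx + V_y(x, y, z, t) * dy
                + V_z(x, y, z, t) * dz + V_t(x, y, z, t))
    return rate

# Two hypothetical trajectories and their velocities.
def circle(t):
    return (math.cos(t), math.sin(t), 0.0)

def circle_velocity(t):
    return (-math.sin(t), math.cos(t), 0.0)

def line(t):
    return (t, 0.0, 1.0)

def line_velocity(t):
    return (1.0, 0.0, 0.0)

def numerical_rate(path, t, step=1e-6):
    # Differentiate V_p(t) = V(x(t), y(t), z(t), t) directly, as an independent check.
    def V_p(s):
        return V(*path(s), s)
    return (V_p(t + step) - V_p(t - step)) / (2 * step)

t0 = 0.4
for path, velocity in [(circle, circle_velocity), (line, line_velocity)]:
    print(potential_rate_along(path, velocity)(t0), numerical_rate(path, t0))
```

On the circle, ##x^2 + y^2 + z^2 = 1##, so ##V_p(t) = \cos t## and the rate is ##-\sin t##: the spatial terms cancel even though ##V_x## and ##V_y## are not zero there.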

Conclusion

A thorough understanding of the single-variable chain rule is an important prerequisite for multi-variable calculus.

The multi-variable chain rule involves overloading the symbols ##x, y, z## and, in the usual differential notation, overloading the symbol representing the function. These ambiguities must be faced and understood. In addition, the differential notation leaves out much of the detail about how things are defined and the points at which they are evaluated. Going back to the functional notation can often clarify what is really going on.

In a physical context, it is important to distinguish two steps: first, a function is defined and differentiated with respect to general spatial and time coordinates, yielding its partial derivatives ##V_x## etc.; second, these partial derivatives are evaluated at specific points, for example along the trajectory of a particle.

This insight hopefully provides a useful supplement to anyone learning multi-variable calculus.
