# How to logically derive the total derivative formula?

1. Sep 7, 2016

### OmegaKV

Consider this equation:

$$f(x(t),y(t))=2(x(t))^2+x(t)y(t)+y(t)$$

One way to calculate df/dt is directly, using the single-variable chain and product rules:

$$\frac{df}{dt}=4x(t)\frac{dx}{dt}+\frac{dx}{dt}y(t)+\frac{dy}{dt}x(t)+\frac{dy}{dt}$$
$$\frac{df}{dt}=(4x(t)+y(t))\frac{dx}{dt}+(x(t)+1)\frac{dy}{dt}$$

Another way is by using the formula for the total derivative:

$$\frac{df}{dt}=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}$$
$$\frac{\partial f}{\partial x} = 4x+y$$
$$\frac{\partial f}{\partial y} = x+1$$
$$\frac{df}{dt} = (4x+y)\frac{dx}{dt}+(x+1)\frac{dy}{dt}$$

I see how the formula for the total derivative should work, since it is the multivariable analog of the ordinary derivative, but is there any way to logically derive the formula for the total derivative from single-variable calculus (just using the chain rule, product rule, etc.), without having to visualize things in "3D"? It seems like you should be able to, since f(x(t),y(t))=f(t) is really just a single-variable function.
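For what it's worth, the agreement of the two routes can be checked numerically. A minimal plain-Python sketch follows; the smooth paths x(t) = cos t, y(t) = sin t are illustrative choices, not part of the question:

```python
# Numeric sanity check that the two computations of df/dt agree
# for f(x, y) = 2x^2 + x*y + y along an example path.
import math

def x(t): return math.cos(t)
def y(t): return math.sin(t)
def dx(t): return -math.sin(t)   # x'(t)
def dy(t): return math.cos(t)    # y'(t)

def f(u, v): return 2*u**2 + u*v + v

def df_dt_formula(t):
    # (4x + y) x'(t) + (x + 1) y'(t), the total derivative formula above
    return (4*x(t) + y(t))*dx(t) + (x(t) + 1)*dy(t)

def df_dt_numeric(t, h=1e-6):
    # central difference of the composed single-variable function t -> f(x(t), y(t))
    return (f(x(t + h), y(t + h)) - f(x(t - h), y(t - h))) / (2*h)

for t0 in (0.0, 0.7, 2.3):
    assert abs(df_dt_formula(t0) - df_dt_numeric(t0)) < 1e-6
```

Any other smooth paths would serve equally well here; the check only illustrates that both routes compute the same number.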

2. Sep 8, 2016

### chiro

Hey OmegaKV.

Can you look at the different derivatives in terms of the vector components that they represent?

The derivative of a one-variable function can be viewed in terms of a two-dimensional vector tangent to its graph, and you can extend this geometrically as well as algebraically (using vector geometry).

If you want to do it algebraically, then I'd start by considering two- and three-dimensional systems (i.e. with one and two partial derivatives in the total derivative), build intuition from those algebraic cases, and then extend to n dimensions and see what that says about the total derivative formula.

3. Sep 8, 2016

### slider142

Not quite. The definition of the derivative of a multivariable function is slightly different from the standard definition of the derivative of a single-variable function. The two coincide when the multivariable function in question is actually a function of one variable, of course, but the formula you want comes from differentiating the multivariable expression for the function's value, rather than just taking the ordinary derivative of the single-variable expression, so you will need the multivariable definition of the derivative.
The definition we usually use is the unique linear function of displacement vectors attached to the input that vanishes with the function f at the same rate as linear variation in the domain of f. That is extremely vague and probably ambiguous, so it is preferable to have a strict mathematical expression with only a single interpretation. We interpret the derivative at a point $(a, b)$ in the domain of $f$ to be the unique linear function $D$, whose domain is the tangent space at $(a, b)$ (a fancy name for the space of all possible displacement vectors from $(a, b)$), which satisfies the following limit:
$$\lim_{(h_1 , h_2 )\to (0, 0)} \frac{|f(a+h_1 , b+h_2 ) - f(a, b) - D(h_1 , h_2 )|}{|(h_1 , h_2)|} = 0$$
which is what we mean by $D$ vanishing with f at a linear rate: the error of the linear approximation shrinks faster than the displacement itself.
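This limit can be watched happening for a concrete function. In the plain-Python sketch below, the choice f(x, y) = 2x² + xy + y, the point (1, 2), and the shrinking step sizes are illustrative assumptions; the quotient from the limit above shrinks toward 0 when D is built from the partial derivatives:

```python
# Numeric illustration that D(h1, h2) = fx*h1 + fy*h2 makes the quotient
# |f(a+h1, b+h2) - f(a, b) - D(h1, h2)| / |(h1, h2)| vanish as (h1, h2) -> (0, 0).
import math

def f(x, y): return 2*x**2 + x*y + y   # illustrative choice of f

a, b = 1.0, 2.0      # illustrative base point
fx = 4*a + b         # partial of f with respect to x at (a, b)
fy = a + 1           # partial of f with respect to y at (a, b)

def quotient(h1, h2):
    num = abs(f(a + h1, b + h2) - f(a, b) - (fx*h1 + fy*h2))
    return num / math.hypot(h1, h2)

# Along the diagonal (h, h), the quotient decreases toward 0 as h shrinks.
values = [quotient(10.0**-k, 10.0**-k) for k in range(1, 6)]
assert all(u > v for u, v in zip(values, values[1:]))
assert values[-1] < 1e-4
```

For this particular f the leftover after subtracting the linear part is purely quadratic in the displacement, so the quotient is proportional to |h| and the decay is visible immediately.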
From linear algebra, we know that if D is a linear transformation between two finite dimensional vector spaces, and we choose a basis for each vector space, then the action of D is equivalent to matrix multiplication by a particular matrix of numbers.
If you use the standard basis for $R^2$, the domain of f, and the standard basis for R, the codomain of f, then the matrix form of D at the point $(a, b)$, if the derivative exists there, is the $1 \times 2$ row matrix:
$$\left[ \begin{array}{cc} \left.\frac{\partial f}{\partial x}\right|_{(a, b)} & \left.\frac{\partial f}{\partial y}\right|_{(a, b)} \end{array}\right]$$
When you look at the multivariable definition of the derivative, and consider your question, you will find that what you want is the chain rule for a multivariable function. If you go through the definition, you will find that the chain rule takes the same form as for single-variable functions. If f is a function of g and g is a function of t, then $D[f \circ g](a) = D[f](g(a)) \cdot D[g](a)$. We can use the multiplication $\cdot$ if we write each derivative as a matrix with respect to the same pair of domain and codomain bases.
In your case, f is a function of g, where g(t) = (x(t), y(t)). To apply the chain rule when t = a, we therefore need D[f](g(a)) and D[g](a). D[f](g(a))= D[f](x(a), y(a)) is, with respect to the standard bases, equivalent to multiplication by the matrix
$$\left[ \begin{array}{cc} \left.\frac{\partial f}{\partial x}\right|_{(x(a), y(a))} & \left.\frac{\partial f}{\partial y}\right|_{(x(a), y(a))} \end{array}\right]$$
Likewise, D[g](a) is equivalent to multiplication by the matrix
$$\left[ \begin{array}{c} \left.\frac{\partial x}{\partial t}\right|_{a} \\ \left.\frac{\partial y}{\partial t}\right|_{a} \end{array}\right]$$
The different matrix shapes are due to the fact that the domain of g is R and the codomain is $R^2$, so the derivative D[g](a) is a linear transformation taking displacement vectors at a in R to displacement vectors at g(a) in $R^2$ (the tangent space at g(a)).
Therefore, the matrix form of our derivative $D[f \circ g](a) = D[f](g(a)) \cdot D[g](a)$ with respect to the standard bases is:
$$\left[ \begin{array}{cc} \left.\frac{\partial f}{\partial x}\right|_{(x(a), y(a))} & \left.\frac{\partial f}{\partial y}\right|_{(x(a), y(a))} \end{array}\right]\cdot \left[ \begin{array}{c} \left.\frac{\partial x}{\partial t}\right|_{a} \\ \left.\frac{\partial y}{\partial t}\right|_{a} \end{array}\right] = \left.\frac{\partial f}{\partial x}\right|_{(x(a), y(a))}\left.\frac{\partial x}{\partial t}\right|_{a} + \left.\frac{\partial f}{\partial y}\right|_{(x(a), y(a))}\left.\frac{\partial y}{\partial t}\right|_{a}$$
Since x(t) and y(t) are actually single variable functions, the partial derivatives are equivalent to the ordinary derivatives, as you have in your expression.
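The matrix identity $D[f \circ g](a) = D[f](g(a)) \cdot D[g](a)$ can also be checked with numbers. In the plain-Python sketch below, the paths x(t) = t² and y(t) = 3t and the point a = 0.5 are illustrative choices; the row-times-column product is compared against a finite difference of the composition:

```python
# Numeric check of the matrix chain rule D[f∘g](a) = D[f](g(a)) · D[g](a)
# for f(x, y) = 2x^2 + x*y + y and the illustrative path g(t) = (t^2, 3t).
def f(x, y): return 2*x**2 + x*y + y
def g(t): return (t**2, 3*t)

a = 0.5
xa, ya = g(a)

# D[f](g(a)): a 1x2 row of partial derivatives of f at g(a)
Df = [4*xa + ya, xa + 1]
# D[g](a): a 2x1 column of ordinary derivatives of the path components
Dg = [2*a, 3.0]

# (1x2) times (2x1) is a 1x1 matrix, i.e. a single number
chain = Df[0]*Dg[0] + Df[1]*Dg[1]

def comp(t):
    x, y = g(t)
    return f(x, y)

h = 1e-6
numeric = (comp(a + h) - comp(a - h)) / (2*h)   # central difference of f∘g
assert abs(chain - numeric) < 1e-6
```

Here the composition is the polynomial 2t⁴ + 3t³ + 3t, so the agreement can also be confirmed by hand: its derivative at t = 0.5 is 6.25, exactly the matrix product.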

4. Nov 16, 2016

### Stephen Tashi

That's a common "abuse of notation", but it is technically wrong to name two different functions with the same letter "f". ( Yet it's one of those things that "everybody does", especially in discussing physics.)

A function of a single variable $f(t)$ doesn't have partial derivatives, so the $f$ on the right hand side of your equation ( $f(x(t),y(t)) = f(t)$ ) can't be the same as the $f$ on the left hand side. If we want to use precise notation, we would say:

$f(x,y)$ is a real valued function of two variables.
$r(t)$ and $s(t)$ are each real valued functions of a single variable.
The composition of functions given by $g(t) = f(r(t), s(t))$ is a real valued function $g(t)$ of a single variable.

That notation gives each distinct function a distinct name.

If you have a good technical grasp of derivatives and partial derivatives, it would be helpful to look at tricky examples where the total derivative formula doesn't work. There can be cases when the partial derivatives of $f(x,y)$ either don't exist or exist and are not continuous at a point $(x_0, y_0)$ and yet the derivative $D_t f(r(t), s(t))$ exists for a particular path $(r(t),s(t))$.

Try $f(x,y) = \sqrt[3]{xy} , \ r(t) = t^2, s(t) = t$ and consider the point $x = 0, y = 0$.
Use the definition of a partial derivative as a limit to show $\frac{\partial f}{\partial x}|_{(x = 0, y = 0)} = 0 = \frac{\partial f}{\partial y}|_{(x = 0, y = 0)}$.
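A plain-Python sketch of this counterexample (using a real cube root helper, since `**` on negatives would produce complex values) makes the failure concrete: the composition f(r(t), s(t)) equals t, so its derivative at 0 is 1, yet both partials of f at the origin are 0, so the total derivative formula would predict 0:

```python
# The counterexample f(x, y) = (xy)^(1/3), r(t) = t^2, s(t) = t at the origin.
import math

def cbrt(u):
    # real cube root, valid for negative arguments too
    return math.copysign(abs(u)**(1.0/3.0), u)

def f(x, y): return cbrt(x*y)
def comp(t): return f(t**2, t)     # equals cbrt(t^3) = t for all t

# The composition has derivative 1 at t = 0 ...
h = 1e-6
deriv_at_0 = (comp(h) - comp(-h)) / (2*h)
assert abs(deriv_at_0 - 1.0) < 1e-6

# ... but f(h, 0) = f(0, h) = 0 for every h, so both partials at (0, 0)
# are exactly 0, and the formula predicts 0*r'(0) + 0*s'(0) = 0, not 1.
fx_at_origin = (f(h, 0.0) - f(0.0, 0.0)) / h
fy_at_origin = (f(0.0, h) - f(0.0, 0.0)) / h
assert fx_at_origin == 0.0 and fy_at_origin == 0.0
```

The formula fails here because the partials of f, while they exist at the origin, are not continuous there.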

5. Nov 21, 2016

### zinq

Suppose we want to know the derivative of

g(t) = f(x(t), y(t))​

with respect to t.

I.e., we want to know how much g(t) changes when t changes just a little, say Δt . . . considered as a ratio to the amount t changes. (Since of course the amount that g(t) changes will depend on how much t changes.) That would be

Δg(t) / Δt = (g(t+Δt) - g(t)) / Δt.

Now plugging the definition of g(t) into this expression:

Δg(t) / Δt = (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t))) / Δt.​

The right-hand side is starting to look like the definition of some derivative. We can always add and subtract the same term so:

Δg(t) / Δt = (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t+Δt)) + f(x(t), y(t+Δt)) - f(x(t), y(t))) / Δt

= (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t+Δt))) / Δt + (f(x(t), y(t+Δt)) - f(x(t), y(t))) / Δt.

We're not quite there yet. But wait — we can also multiply and divide by the same thing (as long as it isn't equal to 0):

Δg(t) / Δt = (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t+Δt))) / (x(t+Δt) - x(t)) ⋅ (x(t+Δt) - x(t)) / Δt + (f(x(t), y(t+Δt)) - f(x(t), y(t))) / (y(t+Δt) - y(t)) ⋅ (y(t+Δt) - y(t)) / Δt.

Finally we can let Δt approach 0 (this step uses the continuity of the partial derivatives of f) and see the following formula materialize, as if by magic:

g'(t) = ∂1f(x(t), y(t)) ⋅ x'(t) + ∂2f(x(t), y(t)) ⋅ y'(t)

(The partial derivative symbols ∂1 and ∂2 are preferable to ∂x and ∂y here, since they are unambiguous: they do not depend on which variables are plugged into the first and second slots of f( , ).)
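The two factor pairs in the last display can be watched converging numerically. In the plain-Python sketch below, f and the paths are illustrative choices; each difference quotient is compared against the factor it converges to:

```python
# Numeric trace of the two products in the add-and-subtract derivation,
# for the illustrative choices f(x, y) = 2x^2 + x*y + y, x(t) = cos t, y(t) = sin t.
import math

def f(x, y): return 2*x**2 + x*y + y
def x(t): return math.cos(t)
def y(t): return math.sin(t)

t0, dt = 0.3, 1e-6

# First pair: the f-quotient in x converges to ∂1f(x(t0), y(t0)) = 4x + y,
# and the x-quotient converges to x'(t0) = -sin(t0).
q1 = (f(x(t0+dt), y(t0+dt)) - f(x(t0), y(t0+dt))) / (x(t0+dt) - x(t0))
r1 = (x(t0+dt) - x(t0)) / dt
assert abs(q1 - (4*x(t0) + y(t0))) < 1e-4
assert abs(r1 - (-math.sin(t0))) < 1e-4

# Second pair: the f-quotient in y converges to ∂2f(x(t0), y(t0)) = x + 1,
# and the y-quotient converges to y'(t0) = cos(t0).
q2 = (f(x(t0), y(t0+dt)) - f(x(t0), y(t0))) / (y(t0+dt) - y(t0))
r2 = (y(t0+dt) - y(t0)) / dt
assert abs(q2 - (x(t0) + 1)) < 1e-4
assert abs(r2 - math.cos(t0)) < 1e-4
```

Note the multiply-and-divide step presumes the denominators x(t+Δt) - x(t) and y(t+Δt) - y(t) are nonzero, which holds for these paths at this t0.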

Last edited: Nov 21, 2016