How to logically derive the total derivative formula?

In summary, the chain rule for multivariable functions takes the same form as for single-variable functions, but you need to use the multivariable definition of the derivative to get the formula.
  • #1
OmegaKV
Consider this equation:

[tex]f(x(t),y(t))=2(x(t))^2+x(t)y(t)+y(t)[/tex]

One way to calculate df/dt is directly using the chain rule:

[tex]\frac{df}{dt}=4x(t)\frac{dx}{dt}+\frac{dx}{dt}y(t)+\frac{dy}{dt}x(t)+\frac{dy}{dt}[/tex]
[tex]\frac{df}{dt}=(4x(t)+y(t))\frac{dx}{dt}+(x(t)+1)\frac{dy}{dt}[/tex]

Another way is by using the formula for the total derivative:

[tex]\frac{df}{dt}=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}[/tex]
[tex]\frac{\partial f}{\partial x} = 4x+y[/tex]
[tex]\frac{\partial f}{\partial y} = x+1[/tex]
[tex]\frac{df}{dt} = (4x+y)\frac{dx}{dt}+(x+1)\frac{dy}{dt}[/tex]

I see how the formula for total derivatives should work since it is the multivariable analog of the derivative, but is there any way to logically derive the formula for the total derivative from single-variable calculus (just using the chain rule, product rule, etc.), without having to visualize things in "3D"? It seems like you should be able to, since f(x(t),y(t))=f(t) is really just a single-variable function.
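As a quick sanity check that the two calculations above agree, here is a minimal numerical sketch. The inner functions x(t) = cos t and y(t) = t^2 are hypothetical choices made only for illustration; the post does not specify them. The script compares the formula with a finite-difference estimate of df/dt.

```python
import math

# Hypothetical inner functions, chosen only for illustration.
def x(t):
    return math.cos(t)

def y(t):
    return t * t

def dx(t):
    return -math.sin(t)

def dy(t):
    return 2.0 * t

def g(t):
    """f(x(t), y(t)) = 2x^2 + xy + y, viewed as a function of t alone."""
    return 2.0 * x(t) ** 2 + x(t) * y(t) + y(t)

def total_derivative(t):
    """(4x + y) dx/dt + (x + 1) dy/dt, the expression from the post."""
    return (4.0 * x(t) + y(t)) * dx(t) + (x(t) + 1.0) * dy(t)

def central_difference(func, t, h=1e-6):
    """Symmetric finite-difference estimate of func'(t)."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

t0 = 0.7
numeric = central_difference(g, t0)
formula = total_derivative(t0)
print(numeric, formula)
```

The two printed numbers should agree to many digits, which is consistent with the formula but of course does not prove it.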
 
  • #2
Hey OmegaKV.

Can you look at the different derivatives in terms of the vector components that they represent?

The derivative for a one dimensional function is in terms of a two dimensional vector and you can extend it geometrically as well as algebraically (using vector geometry).

If you want to do it algebraically then I'd start by considering two and three dimensional systems (i.e. with one and two partial derivatives to get the total derivative) and get intuition for the algebraic case when you extend it to n-dimensions and see what it says about the total derivative formula.
 
  • #3
OmegaKV said:
I see how the formula for total derivatives should work since it is the multivariable analog of the derivative, but is there any way to logically derive the formula for the total derivative from single-variable calculus (just using the chain rule, product rule, etc.), without having to visualize things in "3D"? It seems like you should be able to, since f(x(t),y(t))=f(t) is really just a single-variable function.
Not quite. The definition of the derivative of a multivariable function is slightly different to the standard definition of the derivative for a single-variable function. They coincide when the multivariable function in question is actually a single-variable function, of course, but the formula you want depends on getting the derivative from the multivariable expression for the function's value, instead of just taking the ordinary derivative of the single-variable expression for the function, so you will need to use the multivariable definition of the derivative.
The definition we usually use is this: the derivative is the unique linear function of displacement vectors attached to the input point that approximates the change in f so well that the error vanishes faster than the displacement itself. That is still vague, so it is preferable to have a strict mathematical expression that has only a single interpretation. We define the derivative at a point ##(a, b)## in the domain of ##f## to be the unique linear function ##D##, whose domain is the tangent space at ##(a, b)## (a fancy name for the space of all possible displacement vectors from ##(a, b)##), which satisfies the following limit:
[tex]\lim_{(h_1 , h_2 )\to (0, 0)} \frac{|f(a+h_1 , b+h_2 ) - f(a, b) - D(h_1 , h_2 )|}{|(h_1 , h_2)|} = 0[/tex]
which is what we mean by the error vanishing faster than the displacement.
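To make the limit condition concrete (the ratio must tend to 0), here is a small sketch using the function from the first post, ##f(x, y) = 2x^2 + xy + y##, and the candidate linear map ##D(h_1, h_2) = (4a + b)h_1 + (a + 1)h_2##. The point (a, b) = (1, 2) and the sampled directions are arbitrary choices for illustration.

```python
import math

def f(x, y):
    return 2.0 * x * x + x * y + y

def D(a, b, h1, h2):
    """Candidate derivative at (a, b): a linear function of the displacement."""
    return (4.0 * a + b) * h1 + (a + 1.0) * h2

def error_ratio(a, b, h1, h2):
    """|f(a+h1, b+h2) - f(a, b) - D(h1, h2)| / |(h1, h2)|."""
    error = abs(f(a + h1, b + h2) - f(a, b) - D(a, b, h1, h2))
    return error / math.hypot(h1, h2)

a, b = 1.0, 2.0  # arbitrary base point
angles = [i * math.pi / 8 for i in range(16)]

ratios = []
for k in range(1, 6):
    size = 10.0 ** (-k)
    # Largest ratio over several directions of approach at this step size.
    worst = max(
        error_ratio(a, b, size * math.cos(th), size * math.sin(th))
        for th in angles
    )
    ratios.append(worst)
    print(size, worst)
```

The printed ratio shrinks roughly in proportion to the step size, because for this f the error is exactly ##2h_1^2 + h_1 h_2##, which is quadratic in the displacement.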
From linear algebra, we know that if D is a linear transformation between two finite dimensional vector spaces, and we choose a basis for each vector space, then the action of D is equivalent to matrix multiplication by a particular matrix of numbers.
If you use the standard basis for ##R^2##, the domain of f, and the standard basis for ##R##, the codomain of f, then the matrix form of D at the point ##(a, b)##, if the derivative exists there, is the ##1 \times 2## row matrix:
[tex]\left[ \begin{array}{cc} \left.\frac{\partial f}{\partial x}\right|_{(a, b)} & \left.\frac{\partial f}{\partial y}\right|_{(a, b)} \end{array}\right][/tex]
When you look at the multivariable definition of the derivative, and consider your question, you will find that what you want is the chain rule for a multivariable function. If you go through the definition, you will find that the chain rule takes the same form as for single-variable functions. If f is a function of g and g is a function of t, then ##D[f \circ g](a) = D[f](g(a)) \cdot D[g](a)##. We can read ##\cdot## as matrix multiplication if we write each derivative as a matrix, using the same basis for the codomain of g as for the domain of f.
In your case, f is a function of g, where g(t) = (x(t), y(t)). To apply the chain rule when t = a, we therefore need D[f](g(a)) and D[g](a). D[f](g(a)) = D[f](x(a), y(a)) is, with respect to the standard bases, equivalent to multiplication by the ##1 \times 2## matrix
[tex]\left[ \begin{array}{cc} \left.\frac{\partial f}{\partial x}\right|_{(x(a), y(a))} & \left.\frac{\partial f}{\partial y}\right|_{(x(a), y(a))} \end{array}\right][/tex]
Likewise, D[g](a) is equivalent to multiplication by the ##2 \times 1## matrix
[tex]\left[ \begin{array}{c} \left.\frac{\partial x}{\partial t}\right|_{a} \\ \left.\frac{\partial y}{\partial t}\right|_{a} \end{array}\right][/tex]
The different matrix dimensions are due to the fact that the domain of g is ##R## and the codomain is ##R^2##, so the derivative is a linear transformation associating displacement vectors at a in ##R## with displacement vectors at g(a) in ##R^2## (the tangent space at g(a)).
Therefore, the matrix form of our derivative ##D[f \circ g](a) = D[f](g(a)) \cdot D[g](a)## with respect to the standard bases is:
[tex]\left[ \begin{array}{cc} \left.\frac{\partial f}{\partial x}\right|_{(x(a), y(a))} & \left.\frac{\partial f}{\partial y}\right|_{(x(a), y(a))} \end{array}\right]\cdot \left[ \begin{array}{c} \left.\frac{\partial x}{\partial t}\right|_{a} \\ \left.\frac{\partial y}{\partial t}\right|_{a} \end{array}\right] = \left.\frac{\partial f}{\partial x}\right|_{(x(a), y(a))}\left.\frac{\partial x}{\partial t}\right|_{a} + \left.\frac{\partial f}{\partial y}\right|_{(x(a), y(a))}\left.\frac{\partial y}{\partial t}\right|_{a}[/tex]
Since x(t) and y(t) are actually single variable functions, the partial derivatives are equivalent to the ordinary derivatives, as you have in your expression.
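The matrix form of the chain rule can be sketched in a few lines, writing D[f] as a 1×2 row and D[g] as a 2×1 column so that their product is a single number. The function f is the one from the first post; the path g(t) = (cos t, t^2) and the helper `matmul` are hypothetical, introduced only for this illustration.

```python
import math

def matmul(A, B):
    """Plain-list matrix product of an (m x n) and an (n x p) matrix."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def Df(x, y):
    """1 x 2 matrix of partial derivatives of f(x, y) = 2x^2 + xy + y."""
    return [[4.0 * x + y, x + 1.0]]

def Dg(t):
    """2 x 1 matrix of derivatives of the assumed path g(t) = (cos t, t^2)."""
    return [[-math.sin(t)], [2.0 * t]]

a = 0.7
xa, ya = math.cos(a), a * a

# D[f o g](a) = D[f](g(a)) . D[g](a): a (1 x 2) times (2 x 1) gives a 1 x 1 matrix.
product = matmul(Df(xa, ya), Dg(a))
print(product)
```

The single entry of `product` is the sum of the two products of partial and ordinary derivatives, which is the total derivative formula evaluated at t = a.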
 
  • #4
OmegaKV said:
since f(x(t),y(t))=f(t) is really just a single variable function.

That's a common "abuse of notation", but it is technically wrong to name two different functions with the same letter "f". (Yet it's one of those things that "everybody does", especially in discussing physics.)

A function of a single variable ##f(t)## doesn't have partial derivatives, so the ##f## on the right-hand side of your equation (##f(x(t),y(t)) = f(t)##) can't be the same as the ##f## on the left-hand side. If we want to use precise notation, we would say:

##f(x,y)## is a real valued function of two variables.
##r(t)## and ##s(t)## are each real valued functions of a single variable.
The composition of functions given by ##g(t) = f(r(t), s(t))## is a real valued function ##g(t)## of a single variable.

That notation gives each distinct function a distinct name.

If you have a good technical grasp of derivatives and partial derivatives, it would be helpful to look at tricky examples where the total derivative formula doesn't work. There can be cases when the partial derivatives of ##f(x,y)## either don't exist or exist and are not continuous at a point ##(x_0, y_0)## and yet the derivative ##D_t f(r(t), s(t))## exists for a particular path ##(r(t),s(t))##.

Try ## f(x,y) = \sqrt[3]{xy} , \ r(t) = t^2, s(t) = t## and consider the point ##x = 0, y = 0##.
Use the definition of a partial derivative as a limit to show ##\frac{\partial f}{\partial x}|_{(x = 0, y = 0)} = 0 = \frac{\partial f}{\partial y}|_{(x = 0, y = 0)} ##.
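For this example the composite is ##g(t) = \sqrt[3]{t^2 \cdot t} = t##, so ##g'(0) = 1##, while the total derivative formula would give ##0 \cdot r'(0) + 0 \cdot s'(0) = 0##. A rough numerical sketch of that mismatch follows; the helper `cbrt` (a real cube root that handles negative inputs) and the step size are assumptions made for illustration.

```python
import math

def cbrt(v):
    """Real cube root, valid for negative inputs as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def f(x, y):
    return cbrt(x * y)

def r(t):
    return t * t

def s(t):
    return t

def g(t):
    """Composite g(t) = f(r(t), s(t)), which simplifies to t."""
    return f(r(t), s(t))

h = 1e-6

# Partial derivatives at the origin via symmetric difference quotients.
# f vanishes along both axes, so both quotients are exactly zero.
df_dx = (f(h, 0.0) - f(-h, 0.0)) / (2.0 * h)
df_dy = (f(0.0, h) - f(0.0, -h)) / (2.0 * h)

# Derivatives of the path at t = 0: r'(0) = 0 and s'(0) = 1.
dr_dt, ds_dt = 0.0, 1.0

formula_value = df_dx * dr_dt + df_dy * ds_dt   # what the formula predicts
dg_dt = (g(h) - g(-h)) / (2.0 * h)              # direct estimate of g'(0)

print(formula_value, dg_dt)
```

The formula's prediction and the directly computed derivative disagree, which reflects the fact that this f has partial derivatives at the origin but is not differentiable there.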
 
  • #5
Suppose we want to know the derivative of

g(t) = f(x(t), y(t))​

with respect to t.

I.e., we want to know how much g(t) changes when t changes just a little, say Δt . . . considered as a ratio to the amount t changes. (Since of course the amount that g(t) changes will depend on how much t changes.) That would be

Δg(t) / Δt = (g(t+Δt) - g(t)) / Δt.

Now plugging the definition of g(t) into this expression:

Δg(t) / Δt = (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t))) / Δt.

The right-hand side is starting to look like the definition of some derivative. We can always add and subtract the same term so:

Δg(t) / Δt = (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t+Δt)) + f(x(t), y(t+Δt)) - f(x(t), y(t))) / Δt

= (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t+Δt))) / Δt + (f(x(t), y(t+Δt)) - f(x(t), y(t))) / Δt.

We're not quite there yet. But wait — we can also multiply and divide by the same thing (as long as it isn't equal to 0):

Δg(t) / Δt = (f(x(t+Δt), y(t+Δt)) - f(x(t), y(t+Δt))) / (x(t+Δt) - x(t)) ⋅ (x(t+Δt) - x(t)) / Δt + (f(x(t), y(t+Δt)) - f(x(t), y(t))) / (y(t+Δt) - y(t)) ⋅ (y(t+Δt) - y(t)) / Δt.

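Finally we can let Δt approach 0. Provided the partial derivatives of f are continuous near (x(t), y(t)), the first factor of each product tends to a partial derivative of f and the second to x'(t) or y'(t), and the following formula materializes, as if by magic:

g'(t) = ∂1f(x(t), y(t)) ⋅ x'(t) + ∂2f(x(t), y(t)) ⋅ y'(t)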

(Where the partial derivative symbols ∂1 and ∂2 are to be preferred (to ∂x and ∂y) since they are unambiguous, and do not depend on which variables are plugged into the first and second slots of f( , ).)
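The splitting of the difference quotient can also be watched numerically. The sketch below assumes f(x, y) = 2x^2 + xy + y from the first post and the hypothetical path x(t) = cos t, y(t) = t^2, and evaluates the two product terms for a small Δt next to the values they are expected to approach.

```python
import math

def f(x, y):
    return 2.0 * x * x + x * y + y

# Hypothetical path, chosen only for illustration.
def x(t):
    return math.cos(t)

def y(t):
    return t * t

def split_quotient(t, dt):
    """Return the two product terms of the split difference quotient."""
    x0, x1 = x(t), x(t + dt)
    y0, y1 = y(t), y(t + dt)
    # Change in f from moving x only (y held at its new value), times dx/dt.
    term_x = (f(x1, y1) - f(x0, y1)) / (x1 - x0) * ((x1 - x0) / dt)
    # Change in f from moving y only (x held at its old value), times dy/dt.
    term_y = (f(x0, y1) - f(x0, y0)) / (y1 - y0) * ((y1 - y0) / dt)
    return term_x, term_y

t0, dt = 0.7, 1e-6
term_x, term_y = split_quotient(t0, dt)

# Expected limits: (partial of f in slot 1) * x'(t) and (slot 2) * y'(t).
limit_x = (4.0 * x(t0) + y(t0)) * (-math.sin(t0))
limit_y = (x(t0) + 1.0) * (2.0 * t0)

print(term_x, limit_x)
print(term_y, limit_y)
```

The sketch relies on x(t+Δt) ≠ x(t) and y(t+Δt) ≠ y(t) at the chosen t, matching the caveat above about not dividing by 0.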
 

1. What is the total derivative formula?

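The total derivative formula is the multivariable chain rule applied to a composite function such as ##g(t) = f(x(t), y(t))##. It states that ##\frac{dg}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}##: each partial derivative of the outer function is multiplied by the derivative of the corresponding inner function, and the products are summed.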

2. How is the total derivative formula derived?

The total derivative formula is derived from the limit definition of the derivative. The difference quotient of the composite function is split into one term per input variable, each term is rewritten as a product of two difference quotients, and the limit is then taken.

3. Can the total derivative formula be applied to any function?

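Not to every function. The formula applies when the outer function is differentiable at the point in question and the inner functions are differentiable there, which covers compositions of polynomial, exponential, logarithmic, and trigonometric functions on their domains. It can fail when the outer function merely has partial derivatives, as with ##f(x,y) = \sqrt[3]{xy}## at the origin.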

4. How is the total derivative formula used in real-life applications?

The total derivative formula is used in many fields of science and engineering, such as physics, chemistry, economics, and computer science. It is used to calculate rates of change, optimize functions, and model complex systems.

5. Are there any limitations to using the total derivative formula?

The formula requires the outer function to be differentiable at the point in question (existence of the partial derivatives alone is not enough) and the inner functions to be differentiable as well. Where these conditions fail, the derivative along a particular path may still exist, but it has to be computed directly from the limit definition.
