Mastering the Chain Rule: Tips and Tricks for Calculus Students

Drakkith · Mar 27, 2015

I'm in Calc 1 and the Chain Rule is giving me one hell of a rough time. I've spent about 10-12 hours over the last few days just on the homework problems in this one section (only getting about 15-20 problems done) and still feel like I barely understand it. Does anyone have any tips, tricks, hints, or suggestions that you found helpful when learning the chain rule?

I understand that the chain rule is used when you have composite functions, but there's something about the whole process that makes it super difficult for some reason. Perhaps it's trying to keep track of each function as I work? I've tried using letters in place of the functions when I decompose them, but it's almost equally confusing. Writing out the entire function as I go seems to work, but it takes forever and is prone to sign errors and other little mistakes. Any suggestions with this?

And if someone tells me to just do more problems I'm going to wrap the chain rule around their neck and pull it tight!

Thanks!

ShayanJ · Mar 27, 2015

Consider the functions ## f(x,y,v) ##,## x(u,v) ## , ##y(u,v,t)## and ## t(v) ##. Now let's calculate ## \frac{df}{dv} ##!
The thing you should keep in mind, is that you should have a map of all the variables and their relationships and then you should identify all the paths that go from f to v. One path is really straightforward because f explicitly depends on v, so we should have a ## \frac{\partial f}{\partial v}##. The next path, is through x, because x depends on v. So we should traverse this path. The first step is from f to x, which is traversed by ## \frac{\partial f}{\partial x}##. Now we should go from x to v, which is done by multiplying the latter derivative by ## \frac{\partial x}{\partial v} ##. The next two paths are through y. One is traversed from f to y to v and the other from f to y to t to v, which means we should have ## \frac{\partial f}{\partial y} \frac{\partial y}{\partial v} ## and ##\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\frac{d t}{dv}##.
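Adding up the four paths described above gives the following sum. This is only a sketch assembled from the terms already listed, and it assumes that u does not itself depend on v:

```latex
\frac{df}{dv}
  = \frac{\partial f}{\partial v}
  + \frac{\partial f}{\partial x}\frac{\partial x}{\partial v}
  + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v}
  + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\frac{dt}{dv}
```

Each term is one path from f to v, and each factor in a term is one step along that path.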

Drakkith · Mar 27, 2015

I'm sorry, Shyan, but you lost me in your first sentence. I've never seen a function with multiple variables inside it and I can't follow the rest of your post.

ShayanJ · Mar 27, 2015

Drakkith said:

I'm sorry, Shyan, but you lost me in your first sentence. I've never seen a function with multiple variables inside it and I can't follow the rest of your post.

I thought you were asking about the full-grown chain rule!
So I don't understand what your problem with the chain rule is. If you give an example and explain why it's hard for you to do the example, I can help better.

Mark44 · Mar 27, 2015

The chain rule, for functions of one variable, is all about differentiating function compositions. If you have a function that doesn't involve composition, such as f(x) = x³, differentiation is pretty straightforward. ##f'(x) = \frac d {dx} x^3 = 3x^2##.

Now, if you have a function that involves a composition, such as g(x) = (x² + x)³, you could think of this as g(x) = u³, where u(x) = x² + x. Here ##g'(x) = \frac d {dx} u^3 = \frac d {du} u^3 * \frac{du}{dx} = 3u^2 * (2x + 1) = 3(x^2 + x)^2 * (2x + 1)##.

The key to understanding the notation above is realizing that ##\frac {du^3}{dx} = \frac{du^3}{du} * \frac{du}{dx}##. In essence, we have multiplied by ##\frac{du}{du}##. Although you've probably been told not to regard ##\frac{dy}{dx}## and other derivatives as fractions, it does no harm to do so here, and has the added advantage of producing the correct answer.

If you expand (x² + x)³, and then differentiate, you'll get exactly the same thing, providing that you factor what you get using this alternate technique.
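One way to convince yourself of the result for g(x) = (x² + x)³ is a quick numerical check. The sketch below compares the chain-rule answer against a central-difference estimate of the slope. The helper names (`g_prime_chain`, `numerical_derivative`) and the step size `h` are illustrative choices, not anything from the posts above.

```python
def g(x):
    """g(x) = (x^2 + x)^3, the composite function discussed above."""
    return (x**2 + x) ** 3


def g_prime_chain(x):
    """Chain-rule result: 3u^2 * du/dx, with inner function u = x^2 + x."""
    u = x**2 + x          # inner function
    du_dx = 2 * x + 1     # derivative of the inner function
    return 3 * u**2 * du_dx


def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x); h is an arbitrary small step."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Print both values side by side at a few sample points.
for x in (-2.0, 0.5, 1.0, 3.0):
    exact = g_prime_chain(x)
    approx = numerical_derivative(g, x)
    print(f"x = {x}: chain rule = {exact:.6f}, finite difference = {approx:.6f}")
```

The two columns should agree to several decimal places. If they don't, the usual culprit is a forgotten du/dx factor.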

Drakkith · Mar 28, 2015

Shyan said:

If you give an example and explain why it's hard for you to do the example, I can help better.

If I knew exactly what the problem was, I'd tell you. It just feels like crawling through mud when I do these problems.

Mark44 said:

Now, if you have a function that involves a composition,

Okay, how do you know if a function is composite? I know 2^x+2 is a composite function, but I don't feel like I could explain to someone why. Is 2x a composite function? Is (2x)/(1+x)?

Mark44 · Mar 28, 2015

Drakkith said:

Okay, how do you know if a function is composite? I know 2^x+2 is a composite function, but I don't feel like I could explain to someone why.

Let g(x) = x + 2, and f(x) = 2^x. Then 2^{x + 2} = f(g(x)), which is also written as ##(f \circ g)(x)##.
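Composition can also be seen as feeding one function's output into another. In this minimal sketch, the inner function g(x) = x + 2 is applied first and the outer function f(x) = 2^x is applied to its result. The `compose` helper is a hypothetical name introduced only for illustration.

```python
def g(x):
    """Inner function: g(x) = x + 2."""
    return x + 2


def f(x):
    """Outer function: f(x) = 2^x."""
    return 2 ** x


def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


h = compose(f, g)               # h(x) = f(g(x)) = 2^(x + 2)
swapped = compose(g, f)         # g(f(x)) = 2^x + 2, a different function

print(h(3), 2 ** (3 + 2))       # both are 2^5
print(swapped(3), 2 ** 3 + 2)   # both are 2^3 + 2
```

The order matters: f(g(x)) is 2^(x + 2), while g(f(x)) is 2^x + 2.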

Drakkith said:

Is 2x a composite function?

You could write it as a composite function, but it would be trivial, with one of the functions being g(x) = x.

Drakkith said:

Is (2x)/(1+x)?

You could write it as a composite function, but there wouldn't be much point in doing so. f(x) = 2x, g(x) = ##\frac x {1 + x}##. Then ##\frac{2x} {1 + x} = f(g(x))##. It's more important to recognize this one as a rational function: the quotient of two polynomials. To differentiate this function, it would be silly to use the chain rule. The quotient rule would be most applicable here. However, if you wrote ##\frac{2x} {1 + x}## as (2x) * (1 + x)^-1, then you would probably be thinking of using the product rule, followed by the chain rule (to get the derivative of (1 + x)^-1).
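A worked sketch of both routes, to show that they land on the same derivative (the intermediate steps here are filled in for illustration):

```latex
% Route 1: quotient rule
\frac{d}{dx}\,\frac{2x}{1+x}
  = \frac{2(1+x) - 2x \cdot 1}{(1+x)^2}
  = \frac{2}{(1+x)^2}

% Route 2: product rule, with the chain rule for (1+x)^{-1}
\frac{d}{dx}\left[ 2x\,(1+x)^{-1} \right]
  = 2\,(1+x)^{-1} + 2x \cdot (-1)(1+x)^{-2} \cdot 1
  = \frac{2(1+x) - 2x}{(1+x)^2}
  = \frac{2}{(1+x)^2}
```

The trailing factor of 1 in the second route is the derivative of the inner function 1 + x, which is where the chain rule enters.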

Drakkith · Mar 28, 2015

Okay, now I'm confused. I thought you had to use the chain rule if the function is composite...

ShayanJ · Mar 28, 2015

Drakkith said:

Okay, how do you know if a function is composite? I know 2^x+2 is a composite function, but I don't feel like I could explain to someone why. Is 2x a composite function? Is (2x)/(1+x)?

OK. Let's start from the beginning. The definition of a derivative is:
## \displaystyle f'(x)=\lim_{\delta \to 0} \frac{f(x+\delta)-f(x)}{\delta} ##
This definition gives us x'=1. This is a direct computation, no tricks involved. Other examples are ## (e^x)'=e^x##, ## (\sin x)'=\cos x##,etc.
You can do all differentiations like this, directly. But it'll be really painful. So people figured out some tricks:
## (uv)'=u'v+uv'##,##(\frac u v)'=\frac{u'v-uv'}{v^2}##,etc.
So differentiating is now a procedure with different layers of abstraction with the last layer being the direct derivatives with no abstraction. I can say that each layer of abstraction is a composition and so any time you use a rule of the above kind, you're dealing with a composite function. In fact in differentiating something like ## y=\tan(\frac{2x}{x+1})##, we do ## \frac{dy}{dx}=\frac{dy}{du} \frac{du}{dv}\frac{dv}{dx} ## where ##u=\frac{2x}{x+1} ## and ## v=x ##.

Drakkith · Mar 28, 2015

Okay, that all seems to make sense, Shyan. Now my question is when do you use the chain rule? Do you use it only when the other methods don't work?

ShayanJ · Mar 28, 2015

Drakkith said:

Okay, that all seems to make sense, Shyan. Now my question is when do you use the chain rule? Do you use it only when the other methods don't work?

We always use it because the only alternative is using the definition of derivative directly which is painful. The point is that we don't always mention we used it!
Let's go back to my example, ## y=\tan(\frac{2x}{x+1})##. If I want to take the derivative, I write something like:
## y'=\sec^2(\frac{2x}{x+1}) \frac{2(x+1)-2x}{(x+1)^2} ##. Yeah, I didn't put ## u=\frac{2x}{x+1} ## and v=x, but that doesn't mean I didn't use the chain rule. I first calculated the derivative of tangent regardless of what its argument is, and then I took the derivative of the argument w.r.t. x. It's the chain rule, whether I mention it or not.
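The same two-step process (derivative of the outer function, times derivative of the argument) can be written out explicitly and checked numerically. This is a sketch only: the function names and the step size `h` are illustrative assumptions.

```python
import math


def y(x):
    """y(x) = tan(2x / (x + 1))."""
    return math.tan(2 * x / (x + 1))


def y_prime_chain(x):
    """Chain rule: sec^2(u) * du/dx, with u = 2x / (x + 1)."""
    u = 2 * x / (x + 1)                              # the argument of tan
    du_dx = (2 * (x + 1) - 2 * x) / (x + 1) ** 2     # quotient rule on u
    sec_squared = 1 / math.cos(u) ** 2               # derivative of tan at u
    return sec_squared * du_dx


def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x); h is an arbitrary small step."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Compare the two at a few points away from x = -1 and the poles of tan.
for x in (0.0, 0.5, 1.0):
    print(x, y_prime_chain(x), numerical_derivative(y, x))
```

Naming `u` and `du_dx` explicitly is the same bookkeeping as writing the substitution by hand; the formula for y' just does both steps in one line.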

Drakkith · Mar 28, 2015

No no, I mean that if I have a function F(x) = 2x², that's technically a composite function consisting of G(U) = U² and U(x) = 2x, right? But we don't use the chain rule here, we use the power rule and just multiply 2x² by 2, reduce the exponent by 1, and come up with F'(x) = 4x. So why didn't we use the chain rule if the function was composite?

ShayanJ · Mar 28, 2015

Drakkith said:

No no, I mean that if I have a function F(x) = 2x², that's technically a composite function consisting of G(U) = U² and U(x) = 2x, right? But we don't use the chain rule here, we use the power rule and just multiply 2x² by 2, reduce the exponent by 1, and come up with F'(x) = 4x. So why didn't we use the chain rule if the function was composite?

Yeah, you're right, but this is one of those cases where the function being differentiated is an elementary function and there is no layer of abstraction.
You should distinguish between rules like ## (x^n)'=nx^{n-1} ##, ## (\sin x)'=\cos x ##, etc. and rules like ## (uv)'=u'v+uv' ##, ## (u+v)'=u'+v' ##, etc.
The former rules are the rock bottom of the layers of abstraction while the latter rules are for transition between layers, or something like that!
I should say that we can still think of ## (x^n)'=nx^{n-1} ## as a rule of the second kind and only think of x'=1 as a rock bottom rule.

SteamKing · Mar 28, 2015

Drakkith said:

No no, I mean that if I have a function F(x) = 2x², that's technically a composite function consisting of G(U) = U² and U(x) = 2x, right?

Well, here's a basic problem: if U(x) = 2x and G(U) = U², then G(U) ≠ 2x². G(U) = U²(x) = (2x)² = 4x²

If you wanted to find dG(U) / dx, then the chain rule would give dG(U) / dx = 2U(x)*dU(x) / dx = 2* (2x) * 2 = 8x = d(4x²)/dx

Drakkith · Mar 28, 2015

SteamKing said:

Well, here's a basic problem: if U(x) = 2x and G(U) = U², then G(U) ≠ 2x². G(U) = U²(x) = (2x)² = 4x²

Whoops, you're right. Would it be U(x) = x² and G(U) = 2U?

Drakkith · Mar 28, 2015

Shyan said:

The former rules are the rock bottom of the layers of abstraction while the latter rules are for transition between layers, or something like that!

That makes sense.

Drakkith · Mar 28, 2015

So, let's say we have a function F(x) = (2x-1)(x+1).
Is it correct to say that G(x) = 2x-1 and H(x) = x+1 and neither function is part of the other? In other words, you can't use the chain rule because neither H(x) nor G(x) is dependent on the other? This contrasts with the function T(x) = 2(x+1)-1 which can be decomposed into V(U) = 2U-1 and U(x) = x+1, which means that V(U) is dependent on U(x). Is all that right?

ShayanJ · Mar 28, 2015

Drakkith said:

So, let's say we have a function F(x) = (2x-1)(x+1).
Is it correct to say that G(x) = 2x-1 and H(x) = x+1 and neither function is part of the other? In other words, you can't use the chain rule because neither H(x) nor G(x) is dependent on the other? This contrasts with the function T(x) = 2(x+1)-1 which can be decomposed into V(U) = 2U-1 and U(x) = x+1, which means that V(U) is dependent on U(x). Is all that right?

Chain rule, in the restricted sense that you know, can't be used for F(x). But if you consider the full-grown chain rule as I explained in post #2, then it can be applied to F(x) too.

SteamKing · Mar 28, 2015

Drakkith said:

Whoops, you're right. Would it be U(x) = x² and G(U) = 2U?

That would work.

Mark44 · Mar 28, 2015

Drakkith said:

So, let's say we have a function F(x) = (2x-1)(x+1).
Is it correct to say that G(x) = 2x-1 and H(x) = x+1 and neither function is part of the other?

Yes, correct.

Drakkith said:

In other words, you can't use the chain rule because neither H(x) nor G(x) is dependent on the other? This contrasts with the function T(x) = 2(x+1)-1 which can be decomposed into V(U) = 2U-1 and U(x) = x+1, which means that V(U) is dependent on U(x). Is all that right?

Yes.

It is often the case that you can use more than one differentiation rule, and as you get more experience you'll be able to decide which is the better rule to use. Some examples:

1. f(x) = (x + 1)(x - 2)
The product rule would work, or you could expand the right side to x² - x - 2, and differentiate using the sum rule (derivative of a sum is the sum of the derivatives) and the power rule (d/dx(xⁿ) = nx^{n - 1}).
2. g(x) = 3*sin(x)
The product rule would work, but should never be used when one of the factors is a constant (3 here). Instead, use the constant multiple rule (d/dx(k*f(x)) = k * f'(x) ).
3. h(x) = 1/x
Here you could use the quotient rule, or you could rewrite the function's formula as x^-1 and use the power rule, which would be easier in this problem.
4. f(x) = ##\frac{x^2}{2}##
The quotient rule appears to be called for, but it would be silly to use it here, as ##\frac{x^2}{2} = (1/2)x^2##. The better choice is to use the constant multiple rule and the power rule.
5. g(x) = (x² + x)²
You could use the chain rule here, or you could expand the right side and use the sum rule and power rule. If you are not required to use the chain rule, then the alternative I gave is a good choice.

The bottom line -- always use the simplest rule that applies.
Constant multiple rule is "better than" the product rule.
Product rule is usually "better than" the quotient rule.
Simple rules such as the constant rule (d/dx(C) = 0), constant multiple rule, sum rule, and power rule are better to use than the product rule, quotient rule, and chain rule.

By "better than" I mean simpler to use, which means you are less likely to make a mistake. A longer computation provides more opportunities to make a mistake.
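To illustrate with example 5 from the list, here is a worked sketch showing that the chain rule and the expand-first approach give the same derivative for g(x) = (x² + x)² (the steps are written out for illustration):

```latex
% Chain rule: outer function u^2, inner function u = x^2 + x
g'(x) = 2(x^2 + x)(2x + 1) = 4x^3 + 6x^2 + 2x

% Expand first, then apply the sum and power rules
g(x) = (x^2 + x)^2 = x^4 + 2x^3 + x^2
g'(x) = 4x^3 + 6x^2 + 2x
```

Both routes agree, so the choice comes down to which one leaves less room for arithmetic slips in a given problem.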

Mark44 · Mar 28, 2015

Shyan said:

Chain rule, in the restricted sense that you know, can't be used for F(x). But if you consider the full-grown chain rule as I explained in post #2, then it can be applied to F(x) too.

Drakkith is currently working with functions of a single variable, so please try to tailor your help with his current level of knowledge in mind.
