# Chain Rule of Differentiation

1. Mar 27, 2015

### Staff: Mentor

I'm in Calc 1 and the Chain Rule is giving me one hell of a rough time. I've spent about 10-12 hours over the last few days just on the homework problems in this one section (only getting about 15-20 problems done) and still feel like I barely understand it. Does anyone have any tips, tricks, hints, or suggestions that you found helpful when learning the chain rule?

I understand that the chain rule is used when you have composite functions, but there's something about the whole process that makes it super difficult for some reason. Perhaps it's trying to keep track of each function as I work? I've tried using letters in place of the functions when I decompose them, but it's almost equally confusing. Writing out the entire function as I go seems to work, but it takes forever and is prone to sign errors and other little mistakes. Any suggestions with this?

And if someone tells me to just do more problems I'm going to wrap the chain rule around their neck and pull it tight!!

Thanks!

2. Mar 27, 2015

### ShayanJ

Consider the functions $f(x,y,v)$, $x(u,v)$, $y(u,v,t)$ and $t(v)$. Now let's calculate $\frac{df}{dv}$!
The thing to keep in mind is that you should have a map of all the variables and their relationships, and then identify all the paths that go from f to v. One path is really straightforward: f depends explicitly on v, so we have a $\frac{\partial f}{\partial v}$ term. The next path is through x, because x depends on v. The first step is from f to x, which is traversed by $\frac{\partial f}{\partial x}$. Then we go from x to v, which is done by multiplying that derivative by $\frac{\partial x}{\partial v}$. The next two paths are through y. One goes from f to y to v and the other from f to y to t to v, which gives $\frac{\partial f}{\partial y} \frac{\partial y}{\partial v}$ and $\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\frac{d t}{dv}$. Finally, $\frac{df}{dv}$ is the sum of the contributions from all four paths.
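
Collecting the four paths into one expression may make the bookkeeping easier to see. This is a sketch that assumes exactly the dependencies stated at the start of the post, $f(x,y,v)$, $x(u,v)$, $y(u,v,t)$ and $t(v)$, with $u$ treated as independent of $v$:

$$
\frac{df}{dv} = \frac{\partial f}{\partial v} + \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}\frac{dt}{dv}
$$

Each term corresponds to one path from f to v, and each factor to one step along that path.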

3. Mar 27, 2015

### Staff: Mentor

I'm sorry, Shyan, but you lost me in your first sentence. I've never seen a function with multiple variables inside it and I can't follow the rest of your post.

4. Mar 27, 2015

### ShayanJ

So I don't understand what your problem with the chain rule is. If you give an example and explain why it's hard for you to do, I can help better.

5. Mar 27, 2015

### Staff: Mentor

The chain rule, for functions of one variable, is all about differentiating function compositions. If you have a function that doesn't involve composition, such as $f(x) = x^3$, differentiation is pretty straightforward. $f'(x) = \frac d {dx} x^3 = 3x^2$.

Now, if you have a function that involves a composition, such as $g(x) = (x^2 + x)^3$, you could think of this as $g(x) = u^3$, where $u(x) = x^2 + x$. Here $g'(x) = \frac d {dx} u^3 = \frac d {du} u^3 * \frac{du}{dx} = 3u^2 * (2x + 1) = 3(x^2 + x)^2 * (2x + 1)$.

The key to understanding the notation above is realizing that $\frac {du^3}{dx} = \frac{du^3}{du} * \frac{du}{dx}$. In essence, we have multiplied by $\frac{du}{du}$. Although you've probably been told not to regard $\frac{dy}{dx}$ and other derivatives as fractions, it does no harm to do so here, and has the added advantage of producing the correct answer.

If you expand $(x^2 + x)^3$ and then differentiate, you'll get exactly the same thing, provided that you factor the result of this alternate technique.
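
One way to gain confidence in the chain-rule result is to compare it with a numerical estimate. The following is a minimal sketch in Python, assuming $g(x) = (x^2 + x)^3$ as in this post; the helper names (`g`, `g_prime_chain`, `central_difference`) and the step size `h` are illustrative choices, not part of the original discussion.

```python
def g(x):
    """The composite function g(x) = (x^2 + x)^3."""
    return (x**2 + x) ** 3


def g_prime_chain(x):
    """Derivative from the chain rule: 3u^2 * du/dx with u = x^2 + x."""
    u = x**2 + x
    du_dx = 2 * x + 1
    return 3 * u**2 * du_dx


def central_difference(func, x, h=1e-5):
    """Numerical estimate of func'(x); hypothetical helper, step h assumed."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (-2.0, 0.5, 1.0, 3.0):
    exact = g_prime_chain(x)
    approx = central_difference(g, x)
    print(f"x={x}: chain rule {exact:.6f}, numerical {approx:.6f}")
```

The two columns should agree to several decimal places; a large gap would point to a slip in the hand computation.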

6. Mar 28, 2015

### Staff: Mentor

If I knew exactly what the problem was, I'd tell you. It just feels like crawling through mud when I do these problems.

Okay, how do you know if a function is composite? I know 2x+2 is a composite function, but I don't feel like I could explain to someone why. Is 2x a composite function? Is (2x)/(1+x)?

7. Mar 28, 2015

### Staff: Mentor

Let g(x) = x + 2, and f(x) = 2x. Then 2x + 2 = g(f(x)), which is also written as $(g \circ f)(x)$.
As for 2x: you could write it as a composite function, but it would be trivial, with one of the functions being g(x) = x.
As for (2x)/(1+x): you could write it as a composite function, but there wouldn't be much point in doing so. Let f(x) = 2x and g(x) = $\frac x {1 + x}$. Then $\frac{2x} {1 + x} = f(g(x))$. It's more important to recognize this one as a rational function: the quotient of two polynomials. To differentiate this function, it would be silly to use the chain rule. The quotient rule would be most applicable here. However, if you wrote $\frac{2x} {1 + x}$ as $(2x) * (1 + x)^{-1}$, then you would probably be thinking of using the product rule, followed by the chain rule (to get the derivative of $(1 + x)^{-1}$).
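
To see that the two routes agree, here is the derivative of $\frac{2x}{1+x}$ worked both ways (valid for $x \neq -1$):

$$
\frac{d}{dx}\,\frac{2x}{1+x} = \frac{2(1+x) - 2x \cdot 1}{(1+x)^2} = \frac{2}{(1+x)^2} \quad \text{(quotient rule)}
$$

$$
\frac{d}{dx}\left[2x\,(1+x)^{-1}\right] = 2(1+x)^{-1} + 2x \cdot (-1)(1+x)^{-2} \cdot 1 = \frac{2(1+x) - 2x}{(1+x)^2} = \frac{2}{(1+x)^2} \quad \text{(product rule, then chain rule)}
$$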

8. Mar 28, 2015

### Staff: Mentor

Okay, now I'm confused. I thought you had to use the chain rule if the function is composite...

9. Mar 28, 2015

### ShayanJ

OK. Let's start from the beginning. The definition of a derivative is:
$\displaystyle f'(x)=\lim_{\delta \to 0} \frac{f(x+\delta)-f(x)}{\delta}$
This definition gives us x'=1. This is a direct computation, no tricks involved. Other examples are $(e^x)'=e^x$, $(\sin x)'=\cos x$, etc.
You can do all differentiations like this, directly. But it'll be really painful. So people figured out some tricks:
$(uv)'=u'v+uv'$, $(\frac u v)'=\frac{u'v-uv'}{v^2}$, etc.
So differentiating is now a procedure with different layers of abstraction, with the last layer being the direct derivatives with no abstraction. I can say that each layer of abstraction is a composition, and so any time you use a rule of the above kind, you're dealing with a composite function. In fact, in differentiating something like $y=\tan(\frac{2x}{x+1})$, we do $\frac{dy}{dx}=\frac{dy}{du} \frac{du}{dv}\frac{dv}{dx}$ where $u=\frac{2x}{x+1}$ and $v=x$.
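
The limit definition at the top of this post can also be explored numerically. This is only a rough sketch in Python: the helper `difference_quotient` and the chosen values of $\delta$ are assumptions made for illustration, and a small finite $\delta$ only approximates the limit.

```python
import math


def difference_quotient(f, x, delta):
    """(f(x + delta) - f(x)) / delta, the expression inside the limit."""
    return (f(x + delta) - f(x)) / delta


x0 = 1.0
for delta in (1e-1, 1e-3, 1e-6):
    estimate = difference_quotient(math.sin, x0, delta)
    print(f"delta={delta:g}: quotient {estimate:.8f}, cos(x0) {math.cos(x0):.8f}")
```

As $\delta$ shrinks, the quotient for $\sin x$ should move toward $\cos x$, consistent with $(\sin x)'=\cos x$.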

10. Mar 28, 2015

### Staff: Mentor

Okay, that all seems to make sense, Shyan. Now my question is when do you use the chain rule? Do you use it only when the other methods don't work?

11. Mar 28, 2015

### ShayanJ

We always use it, because the only alternative is using the definition of the derivative directly, which is painful. The point is that we don't always mention that we used it!
Let's go back to my example, $y=\tan(\frac{2x}{x+1})$. If I want to take the derivative, I write something like:
$y'=\sec^2(\frac{2x}{x+1}) \frac{2(x+1)-2x}{(x+1)^2}$. Yeah, I didn't put $u=\frac{2x}{x+1}$ and v=x, but that doesn't mean I didn't use the chain rule. I first calculated the derivative of the tangent regardless of what its argument is, and then I took the derivative of the argument w.r.t. x. It's the chain rule, whether I mention it or not.
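
Spelled out step by step, with $u = \frac{2x}{x+1}$ as the argument of the tangent, the computation reads:

$$
\frac{dy}{du} = \sec^2 u, \qquad \frac{du}{dx} = \frac{2(x+1) - 2x}{(x+1)^2} = \frac{2}{(x+1)^2}
$$

$$
\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = \frac{2}{(x+1)^2}\,\sec^2\!\left(\frac{2x}{x+1}\right)
$$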

12. Mar 28, 2015

### Staff: Mentor

No no, I mean that if I have a function $F(x) = 2x^2$, that's technically a composite function consisting of $G(U) = U^2$ and $U(x) = 2x$, right? But we don't use the chain rule here, we use the power rule and just multiply $2x^2$ by 2, reduce the exponent by 1, and come up with F'(x) = 4x. So why didn't we use the chain rule if the function was composite?

13. Mar 28, 2015

### ShayanJ

Yeah, you're right, but this is one of those cases where the function being differentiated is an elementary function and there is no layer of abstraction.
You should distinguish between rules like $(x^n)'=nx^{n-1}$, $(\sin x)'=\cos x$, etc. and rules like $(uv)'=u'v+uv'$, $(u+v)'=u'+v'$, etc.
The former rules are the rock bottom of the layers of abstraction, while the latter rules are for transitions between layers, or something like that!
I should say that we can still think of $(x^n)'=nx^{n-1}$ as a rule of the second kind and only think of x'=1 as a rock-bottom rule.

Last edited: Mar 28, 2015
14. Mar 28, 2015

### SteamKing

Staff Emeritus
Well, here's a basic problem: if $U(x) = 2x$ and $G(U) = U^2$, then $G(U) \neq 2x^2$. Instead, $G(U) = [U(x)]^2 = (2x)^2 = 4x^2$.

If you wanted to find $\frac{dG(U)}{dx}$, then the chain rule would give $\frac{dG(U)}{dx} = 2U(x) * \frac{dU(x)}{dx} = 2 * (2x) * 2 = 8x = \frac{d(4x^2)}{dx}$.

15. Mar 28, 2015

### Staff: Mentor

Whoops, you're right. Would it be $U(x) = x^2$ and $G(U) = 2U$?
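
With that decomposition, $U(x) = x^2$ and $G(U) = 2U$, the chain rule can be carried through and compared with the power-rule answer:

$$
\frac{dG}{dx} = \frac{dG}{dU}\,\frac{dU}{dx} = 2 \cdot 2x = 4x
$$

This matches $F'(x) = 4x$ obtained from $F(x) = 2x^2$ by the constant multiple and power rules.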

16. Mar 28, 2015

### Staff: Mentor

That makes sense.

17. Mar 28, 2015

### Staff: Mentor

So, let's say we have a function F(x) = (2x-1)(x+1).
Is it correct to say that G(x) = 2x-1 and H(x) = x+1 and neither function is part of the other? In other words, you can't use the chain rule because neither H(x) nor G(x) is dependent on the other? This contrasts with the function T(x) = 2(x+1)-1, which can be decomposed into V(U) = 2U-1 and U(x) = x+1, which means that V(U) is dependent on U(x). Is all that right?
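
For the second function, the decomposition can be checked by carrying the chain rule through, assuming $V(U) = 2U - 1$ and $U(x) = x + 1$ as stated:

$$
T'(x) = \frac{dV}{dU}\,\frac{dU}{dx} = 2 \cdot 1 = 2
$$

which agrees with differentiating the simplified form $T(x) = 2x + 1$ directly.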

18. Mar 28, 2015

### ShayanJ

The chain rule, in the restricted sense that you know, can't be used for F(x). But if you consider the full-grown chain rule that I explained in post #2, then it can be applied to F(x) too.
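
As a sketch of what that looks like, treat the product as a function of two intermediate variables, $F = f(G, H) = G \cdot H$ with $G(x) = 2x - 1$ and $H(x) = x + 1$ (the name $f$ is introduced here only for illustration). Following the two paths from $F$ to $x$:

$$
\frac{dF}{dx} = \frac{\partial f}{\partial G}\frac{dG}{dx} + \frac{\partial f}{\partial H}\frac{dH}{dx} = H \cdot 2 + G \cdot 1 = 2(x+1) + (2x-1) = 4x + 1
$$

This is the same result the product rule gives.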

19. Mar 28, 2015

### SteamKing

Staff Emeritus
That would work.

20. Mar 28, 2015

### Staff: Mentor

Yes, correct.
Yes.

It is often the case that you can use more than one differentiation rule, and as you get more experience you'll be able to decide which is the better rule to use. Some examples:

1. f(x) = (x + 1)(x - 2)
The product rule would work, or you could expand the right side to $x^2 - x - 2$, and differentiate using the sum rule (derivative of a sum is the sum of the derivatives) and the power rule ($\frac{d}{dx} x^n = nx^{n-1}$).
2. g(x) = 3*sin(x)
The product rule would work, but should never be used when one of the factors is a constant (3 here). Instead, use the constant multiple rule (d/dx(k*f(x)) = k * f'(x) ).
3. h(x) = 1/x
Here you could use the quotient rule, or you could rewrite the function's formula as $x^{-1}$ and use the power rule, which would be easier in this problem.
4. f(x) = $\frac{x^2}{2}$
The quotient rule appears to be called for, but it would be silly to use it here, as $\frac{x^2}{2} = (1/2)x^2$. The better choice is to use the constant multiple rule and the power rule.
5. g(x) = $(x^2 + x)^2$
You could use the chain rule here, or you could expand the right side and use the sum rule and power rule. If you are not required to use the chain rule, then the alternative I gave is a good choice.
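
For examples 1 and 5, the claim that either approach gives the same derivative can be spot-checked numerically. The sketch below is in Python; the function names are hypothetical, and each pair simply encodes the two hand computations (product or chain rule on one side, expand and then apply the power rule on the other).

```python
import math


def ex1_product(x):
    """Example 1 via the product rule: d/dx[(x + 1)(x - 2)]."""
    return 1 * (x - 2) + (x + 1) * 1


def ex1_expanded(x):
    """Example 1 via expansion: d/dx[x^2 - x - 2] = 2x - 1."""
    return 2 * x - 1


def ex5_chain(x):
    """Example 5 via the chain rule: d/dx[(x^2 + x)^2]."""
    return 2 * (x**2 + x) * (2 * x + 1)


def ex5_expanded(x):
    """Example 5 via expansion: d/dx[x^4 + 2x^3 + x^2]."""
    return 4 * x**3 + 6 * x**2 + 2 * x


# Sample points are arbitrary; agreement here is a sanity check, not a proof.
for x in (-3.0, -0.5, 0.0, 1.0, 2.5):
    assert math.isclose(ex1_product(x), ex1_expanded(x), abs_tol=1e-9)
    assert math.isclose(ex5_chain(x), ex5_expanded(x), abs_tol=1e-9)
print("both methods agree at every sample point")
```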

The bottom line -- always use the simplest rule that applies.
Constant multiple rule is "better than" the product rule.
Product rule is usually "better than" the quotient rule.
Simple rules such as the constant rule (d/dx(C) = 0), constant multiple rule, sum rule, and power rule are better to use than the product rule, quotient rule, and chain rule.

By "better than" I mean simpler to use, which means you are less likely to make a mistake. A longer computation provides more opportunities to make a mistake.