Why can you cancel out the dx in u-substitution?

1. Apr 6, 2012

Say we want to find ∫(-cosx)sinx dx

set u = -cosx
so du/dx = sinx

∫u $\frac{du}{dx}$dx

And then the dx's apparently cancel out? What is going on? I thought $\frac{du}{dx}$ meant $\lim\limits_{Δx \to 0} \frac{u(x+Δx)-u(x)}{Δx}$ and that the dx in the context of an integral was a representation of the Δx's from $\lim\limits_{n \to ∞} \sum^{n}_{i=1} f(x_{i}^{*})Δx$?

So how is this canceling justified with the above definitions?

Last edited: Apr 6, 2012
2. Apr 6, 2012

I like Serena

The notation $\frac {\Delta u}{\Delta x}$ indicates the ratio between a change in u versus a change in x.
If the change is small enough (into the limit), this is equal to the derivative.
Indeed, that is the definition of the derivative.

A "very small" $\Delta x$ is denoted as dx.
When I say "very small", I mean small into the limit.
So the ratio ${du \over dx}$ is equal to the derivative.

Effectively this means that you can treat ${du \over dx}$ as any fraction.

For instance ${dy \over du} \cdot {du \over dx} = {dy \over dx}$ represents the chain rule that says that the derivative of y(u(x)) is equal to y'(u(x)) u'(x).
It's just a more intuitive notation.
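
As a quick illustration (the particular functions are chosen here only for the example), take $y = \sin u$ and $u = x^2$:

$$\frac{dy}{du} = \cos u, \qquad \frac{du}{dx} = 2x, \qquad \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = 2x\cos(x^2)$$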

3. Apr 6, 2012

∫∫→
Thanks, I like Serena, that makes sense, except there is still one thing I am unclear on: I get that $\frac{du}{dx}$ represents how u changes in response to a tiny (as in approaching 0) change in x, but my prof told us that the dx in f(x)dx was just part of the notation of an integral to remind us of its connection to the Riemann sum. So it seems weird to me to mess with the dx since it is a part of the operator.

Can you maybe explain more in-depth the connection between u-substitution and the chain rule and how it allows us to cancel dx's? I understand that we do u-substitution when we see an integrand with a function and its derivative present because the chain rule takes a function f(u) to its derivative by doing f'(u)*$\frac{du}{dx}$, so that means if we have a function with both u and $\frac{du}{dx}$ in it then we know its antiderivative must have had the chain rule used on it to get in that form. So the antiderivative (indefinite integral) of f'(u)*$\frac{du}{dx}$ is just going to be the antiderivative of f'(u).

But where does that all tie in with the concept of differentials that was mentioned in my book? Why does taking the chain rule in reverse require so much explanation and introduction of new terminology? I feel like I am missing something because it has to be more complicated than just: "if going from f(u) to its derivative is achieved by f'(u)*$\frac{du}{dx}$, then going from the derivative of f(u) back to f(u) must be achieved by ignoring $\frac{du}{dx}$ and just taking the antiderivative of f'(u)."

Although that seems like it will mechanically yield f(u) as intended, what in the definitions and meaning of indefinite integrals allows us to do this? It just seems like a trick to me and I am having trouble relating it to the Riemann sum and limit stuff and what an integral actually is (sigh).

Sorry for the novel, but I want to be specific and I'm exasperated!

4. Apr 6, 2012

Char. Limit

Isn't the differential of a function of x defined as the differential of x multiplied by the derivative of said function with respect to x?*

*: In notation, du = u'(x) dx
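
Applied to the integral from the first post, that definition gives the following worked sketch ($C$ is the constant of integration):

$$u = -\cos x, \qquad du = u'(x)\,dx = \sin x\,dx$$

$$\int (-\cos x)\sin x\,dx = \int u\,du = \frac{u^2}{2} + C = \frac{\cos^2 x}{2} + C$$

Differentiating the result recovers the integrand:

$$\frac{d}{dx}\left(\frac{\cos^2 x}{2}\right) = \cos x \cdot (-\sin x) = (-\cos x)\sin x$$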

5. Apr 7, 2012

I like Serena

An integral is the limit of a sum of inscribed rectangles.
Each rectangle has height f(x) and width dx.
Again dx is just a very small width, or increase in x.
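
To connect this with the Riemann sum from the first post, here is a minimal numerical sketch. It assumes the integrand (-cos x) sin x on the interval [0, π/2] (the interval and the function name `riemann_sums` are choices made for this example). With a finite Δx, the product (Δu/Δx)·Δx really is just Δu, so the Δx's cancel exactly; the sum that uses the derivative sin x in place of Δu/Δx differs slightly, and both sums approach the same value, -1/2, as the number of rectangles grows.

```python
import math

def u(x):
    # Substitution from the first post: u = -cos x
    return -math.cos(x)

def du_dx(x):
    # Derivative of u with respect to x: du/dx = sin x
    return math.sin(x)

def riemann_sums(a, b, n):
    """Return two left-endpoint sums over n equal subintervals of [a, b]:
    the sum of u * (du/dx) * dx, and the sum of u * delta_u."""
    dx = (b - a) / n
    with_derivative = 0.0
    with_delta_u = 0.0
    for i in range(n):
        x = a + i * dx                  # left endpoint of the i-th subinterval
        delta_u = u(x + dx) - u(x)      # finite change in u over the subinterval
        with_derivative += u(x) * du_dx(x) * dx
        # u * (delta_u / dx) * dx equals u * delta_u: the finite dx cancels exactly
        with_delta_u += u(x) * delta_u
    return with_derivative, with_delta_u

for n in (10, 100, 1000, 10000):
    s_x, s_u = riemann_sums(0.0, math.pi / 2, n)
    print(n, s_x, s_u)
```

The second sum is a Riemann sum for the integral of u du from u = -1 to u = 0, taken over an uneven partition of the u-axis.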

The problem is that the intuitive notion of infinitesimals is not considered mathematically solid.
There have been a number of attempts to define infinitesimals in a way that is mathematically solid, but those attempts have not been accepted by the mathematical community.
A derivative and an integral are defined using notation with limits.

Mathematically you're not supposed to talk about changes that are zero.
However, the intuitive notion of ratios of infinitesimals and multiplications and divisions with it always works.
That is because a derivative really is a fraction, just taken into the limit.

Not a trick, just a more intuitive notion.
Physicists use it all the time, while mathematicians scoff at it.

6. Apr 9, 2012

jppike

This isn't true. In the 1960s a logician named Abraham Robinson proved the mathematical validity of infinitesimals by constructing the Hyperreal Number System, an ordered field which strictly contains the Real Numbers, but also includes numbers which are smaller than every real number (and larger! and in between!). This can be done (though it is not how Robinson originally constructed it) by defining an ultrafilter, and taking equivalence classes of sequences of real numbers modulo this ultrafilter.

It has now developed into a branch of mathematics known as Non-Standard Analysis, and is completely mathematically rigorous, and if one takes the existence of infinitesimals to be valid (which it is), then one can use this method to teach elementary calculus without resorting to the unfamiliar notion of epsilons and deltas. In fact, this very thread demonstrates exactly why there are many mathematicians who believe that a first introduction to calculus should be done using Non-Standard Analysis, as it is significantly more intuitive.

If the OP is interested enough to take a look at how the calculus is developed with this intuitive approach, I direct him to the free online book:

Elementary Calculus: An Infinitesimal Approach by H. Jerome Keisler

Maybe it will be of some help

7. Apr 9, 2012

I like Serena

I believe you just confirmed my statement.

8. Apr 9, 2012

jppike

I'm sorry? There has been a mathematically rigorous definition of infinitesimals that has been accepted by the mathematical community...

9. Apr 9, 2012

I like Serena

Ah, my difference in opinion is then that I believe that it's not the world-wide mathematical community that has accepted it.
Not everyone considers it valid.

10. Apr 9, 2012

micromass

Yes, they do. Robinson's work is valid, and I don't think you'll find a mathematician that doesn't think so.

The reason why we don't teach Robinson's approach is (I think):

1) It's difficult. Indeed, to define the hyperreals, one needs ultrafilters and Łoś's theorem and such stuff. This is far too difficult to give to students of real analysis or even calculus. OK, one could take the existence of hyperreals on faith, but mathematicians don't tend to like that. As it stands, you'll need some logic to rigorously do everything.
Another approach is the approach of the surreal numbers. These can easily be defined analogously to Dedekind cuts. But they are quite unintuitive.
The standard approach has its difficulties in epsilon-delta stuff. But that is much easier compared to the other approaches mentioned.

2) It's untraditional. All textbooks already treat real analysis with limits and not with infinitesimals. The limit-approach is far too entrenched now to change.
Furthermore, all research articles also use limits. So one will need to learn limits anyway!!! So there is no real reason to start doing things with infinitesimals.

Also, I would like to mention that things like dx have a well-defined meaning in differential geometry. So one does not need infinitesimals to give it meaning. The meaning of dx in differential geometry is made exactly to let $dy=\frac{dy}{dx}dx$ work out. Whether this gives any improvement to the existing theory, I don't know.

11. Apr 9, 2012

micromass

To the OP:

I should praise you for finding this out. Many people justify this rule as "just cancelling out the dx". This is of course NOT what is happening. Indeed, whatever people say, $\frac{du}{dx}$ is NOT a fraction, as it is not defined as a fraction. Therefore, cancelling out things is not allowed just like that, but it requires a rigorous reason.

So we wish to prove that

$$\int_a^b f(g(x))\frac{dg}{dx}dx=\int_{g(a)}^{g(b)} f(u)du$$

Let F be a primitive of f. The right-hand side is equal to F(g(b))-F(g(a)) (by the fundamental theorem of calculus).

By the chain rule, we have that

$$\frac{dF\circ g}{dx}(x)=\frac{dF}{du}(g(x))\frac{dg}{dx}(x)=f(g(x))\frac{dg}{dx}(x)$$

Thus we see that $F\circ g$ is an antiderivative of $f(g(x))\frac{dg}{dx}$. This means (fundamental theorem of calculus) that

$$\int_a^b f(g(x))\frac{dg}{dx}dx=F(g(b))-F(g(a))$$

This proves that the two integrals in question are equal.

So we see that this result was not trivially "eliminating dx from both sides". It involved the chain rule and the fundamental theorem of calculus!!
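
As a concrete check with the integral from the first post (a worked sketch; the limits $a=0$, $b=\pi/2$ are chosen here for illustration), take $f(u)=u$, $g(x)=-\cos x$ and $F(u)=\frac{u^2}{2}$, so that $g(a)=-1$ and $g(b)=0$:

$$\int_{g(a)}^{g(b)} f(u)\,du=\int_{-1}^{0} u\,du=F(0)-F(-1)=0-\frac{1}{2}=-\frac{1}{2}$$

$$\int_0^{\pi/2} (-\cos x)\sin x\,dx=F(g(\pi/2))-F(g(0))=\frac{\cos^2(\pi/2)}{2}-\frac{\cos^2 0}{2}=-\frac{1}{2}$$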

12. Apr 9, 2012

I like Serena

Hmm, I thought that
$$\frac{du}{dx}\overset{\textrm{def}}{=} \lim\limits_{Δx \to 0} \frac{Δu}{Δx}$$
by definition (with the proper definitions for u and x).
Since this is just another way of writing:
$$u'(x)\overset{\textrm{def}}{=}\lim\limits_{h \to 0} \frac{u(x+h)-u(x)}{h}$$

So it's not exactly a fraction, but a fraction with the denominator taken into the limit.
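
A small numerical sketch of that limit, using u = -cos x from the first post (the helper name `difference_quotient` and the sample point x = 1 are assumptions made for this example). For each finite h the quotient is an ordinary fraction; only its limit is the derivative sin x.

```python
import math

def u(x):
    # u = -cos x, whose derivative is sin x
    return -math.cos(x)

def difference_quotient(func, x, h):
    # The ordinary fraction (func(x + h) - func(x)) / h for a finite step h
    return (func(x + h) - func(x)) / h

x0 = 1.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # Print the step, the difference quotient, and the exact derivative sin(x0)
    print(h, difference_quotient(u, x0, h), math.sin(x0))
```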

13. Apr 9, 2012

micromass

Yes, so it's the limit of a fraction and NOT a fraction.
And it is not obvious that the limit of a fraction should have the same properties as a fraction.

14. Apr 9, 2012

I like Serena

Doesn't it?
I think that as long as the denominator is not actually zero and the function is continuously differentiable, it does.

15. Apr 9, 2012

micromass

What does that have to do with it??

You can't say that $\frac{df}{dx}=\frac{df}{du}\frac{du}{dx}$ (for example) by saying that they are both limits of fractions and that thus the same thing must hold for those expressions. The argument is a bit more complicated than that.

16. Apr 9, 2012

I like Serena

Let's see.
I've been puzzling at this before and I wondered if my argument was mathematically solid.

Let $f, u,$ and $f^*\overset{\textrm{def}}{=}f\circ u$ be continuously differentiable functions from $ℝ→ℝ$, let $x \in ℝ$, and let the derivative of u be non-zero at x.

Let $Δx>0$ be small enough such that the derivative of u is non-zero in the interval [x, x+Δx].

Now define:
$$\begin{array}{lll}Δf^* & \overset{\textrm{def}}{=}& f^*(x+Δx) - f^*(x) \\ Δu & \overset{\textrm{def}}{=}& u(x+Δx) - u(x) \\ Δf & \overset{\textrm{def}}{=}& f(u(x)+Δu) - f(u(x)) \end{array}$$
Note that this means that Δu is non-zero.

Then at point x we have:
$$\frac{df^*}{dx}\overset{\textrm{def}}{=} \lim\limits_{Δx \to 0} \frac{Δf^*}{Δx}=\lim\limits_{Δx \to 0} \frac{Δf^*}{Δu} \frac{Δu}{Δx} \qquad\qquad(1)$$

Since $\frac{Δf^*}{Δu}=\frac{(f\circ u)(x+Δx)-(f\circ u)(x)}{Δu}=\frac{f(u(x)+Δu)-f(u(x))}{Δu}$ and $Δu \to 0$ as $Δx \to 0$, it follows that:
$$\lim\limits_{Δx \to 0} \frac{Δf^*}{Δu}=\lim\limits_{Δu \to 0} \frac{Δf}{Δu} \qquad\qquad(2)$$

And thus at point x:
$$\frac{df^*}{dx}=\lim\limits_{Δx \to 0} \frac{Δf^*}{Δu} \frac{Δu}{Δx}=\lim\limits_{Δu \to 0} \frac{Δf}{Δu} \cdot \lim\limits_{Δx \to 0} \frac{Δu}{Δx}=\frac{df}{du} \frac{du}{dx} \qquad\qquad(3)$$
$\Box.$

This is the long version, which I'd like to shorten.
Can you poke holes in it?
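
For what it's worth, steps (1) and (3) can be checked numerically. This is only a sketch under assumed choices: u = -cos x, the outer function f(v) = v^3, the point x = 1 (where u'(x) = sin 1 is non-zero), and the helper name `quotients` are all picked for the example. It illustrates the argument and is not a proof.

```python
import math

def u(x):
    # Inner function; u'(x) = sin x is non-zero at x = 1
    return -math.cos(x)

def f(v):
    # Outer function chosen for illustration; f'(v) = 3 v^2
    return v ** 3

def f_star(x):
    # The composition f* = f o u
    return f(u(x))

def quotients(x, dx):
    """Return delta_f*/dx computed directly and as (delta_f*/delta_u) * (delta_u/dx)."""
    delta_u = u(x + dx) - u(x)
    delta_f_star = f_star(x + dx) - f_star(x)
    direct = delta_f_star / dx
    factored = (delta_f_star / delta_u) * (delta_u / dx)
    return direct, factored

x0 = 1.0
limit = 3 * u(x0) ** 2 * math.sin(x0)   # (df/du)(du/dx) evaluated at x0
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    direct, factored = quotients(x0, dx)
    print(dx, direct, factored, limit)
```

The first two printed columns agree up to rounding for every finite Δx, which is step (1); both approach the third column as Δx shrinks, which is step (3).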

17. Apr 9, 2012

micromass

I think that proof is sound. But your proof isn't as general as it could be, of course, which is a problem imo.

18. Apr 10, 2012

jppike

Before I begin my response I should mention that I am honestly not one of the advocates that infinitesimal calculus should be one's first introduction to the calculus. Indeed, I don't believe I have a sound enough understanding of Non Standard Analysis yet to really make such a decision. I'm just playing devil's advocate:

1) If you're going to argue that in terms of mathematical logic, we have to take something on faith in order to begin a study of the infinitesimal calculus, then you should basically be arguing against all of introductory mathematics. Indeed, in a first introduction to a number of courses we take many things on faith that are deeply rooted into mathematical logic; consider the axiom of choice and induction as the most obvious examples. How often is Zorn's Lemma implicitly used in a first calculus course? Furthermore, in a first course on calculus one is expected to take a lot of results for granted anyways! I recall that in my first calculus course we were expected to just accept, for example, the Intermediate Value Theorem. It wasn't until a year later in analysis that I saw a rigorous proof of it for the Real numbers. In fact, when one is introduced to the calculus for the very first time (say, in high school), one is rarely introduced using Weierstrass's epsilon-delta formalism. A limit is "as x gets arbitrarily close to a, f(x) gets arbitrarily close to L". How is accepting this definition any better than accepting an infinitesimal?

2) Non-Standard Analysis is already starting to produce enough applications that I would suggest that a modern mathematician ought to be at least somewhat familiar with it, if not have studied a semester course in it. In such a case, why is it better to learn standard analysis and then have to try and convert epsilon-delta language into infinitesimal language than the other way around? Indeed, the main argument is that it's not; that once you have learned the intuitive approach using infinitesimals, the epsilon-delta formulation is easy to transition to. At any rate the argument that we shouldn't change the way we do it because that's the way we've been doing it isn't really much of an argument, is it? If one can demonstrate that being introduced to NSA makes for a deeper understanding of the concepts of analysis, then that's the way it should be taught, regardless of how we have been teaching it so far.

19. Apr 10, 2012

micromass

I actually detest playing devil's advocate. I prefer it if people would just say their opinion, instead of trying to set up a pointless argument. So if it's going to be a devil's advocate thing, then this will be my last reply on the topic.

Consider me old-fashioned, but what you accept on the standard approach is much more intuitive than what you accept in the infinitesimal approach. Accepting induction and the axiom of choice are quite obvious things to accept. When I first saw induction, I thought it was very obvious. The same with the axiom of choice (it was only later that I found out that it was problematic). The concept of limit is also an obvious one.
On the other hand, I find infinitesimals less intuitive. I mean: the existence of a number e such that 0<e<1/n for all n. I can hear the questions coming from a high school student:

What's the decimal representation of e??
It doesn't have any.
How can a number not have any representation, is it a fake number??
Uuuh...
Oh, I get it, e=1-0.99999... no??
Hmmm...

In a calculus course, this might be ok. But what about in a real analysis course?? This is supposed to be the foundation of analysis, so it's supposed to have a construction of the reals and the hyperreals. This is too hard to do.

OK, you always take things on faith, but not being able to construct the space you're working with is a big no-no.

I realize I am talking from the point of view of a standard analyst. Perhaps if I encountered hyperreals in high school and before, then I would talk differently.

To be honest, I never encountered an application of it. I just think it's a neat concept.

It's a huge argument. ALL the textbooks are written in standard language. So a lot of wonderful books like Rudin, Pugh, Spivak, etc. become obsolete. It will be a huge undertaking to correct the books or to produce new ones.

And the standard approach needs to be learned anyway. Almost every research article is written in standard language, while almost no article is written with infinitesimals. So what's the point??
You really have the choice between "teaching the standard approach" and "teaching the standard approach AND infinitesimals". That last thing requires extra time, extra books, perhaps confused students. And that for almost no benefits.

I doubt that it really makes for a deeper understanding of the concepts of analysis. If we teach both standard analysis AND hyperreals, then we need to divide the course in half and spend less time on the more important concepts. That will actually reduce the understanding of the concepts.

The same discussion happens with replacing $\pi$ with $\tau=2\pi$. It's a useless undertaking. It doesn't matter if $\tau$ is a more intuitive concept. It's too late to change it now.

If I could go back in time and replace $\pi$ with $\tau$ the moment it was invented, then I would do so. But it's too deeply rooted in the community now. The same thing with hyperreals (ignoring the fact that hyperreals require a lot of very very nontrivial logic).

20. Apr 10, 2012

I like Serena

I think it is general enough for the problem statement of the OP who has a continuously differentiable function from $ℝ→ℝ$.
Indeed, I think this is the only type of function you'll see in high school.

My more general proposition (without proof) is:

Let $u: I→ℝ$ be a continuously differentiable function on an open interval $I \subset ℝ$ around $x \in ℝ$, with $u(b)-u(a)≠0$ for any distinct $a, b \in I$.
Let $f: u(I)→ℝ$ be a continuously differentiable function on u(I).
And let $f^* \equiv f\circ u$.

Then $\frac{df^*}{dx}, \frac{df}{du}, \frac{du}{dx}, \frac{dx}{du}$ all behave as fractions on interval I.

In particular this implies the chain rule: $\frac{df^*}{dx} = \frac{df}{du} \frac{du}{dx}$.
And it also implies the inverse derivative rule: $\frac{du}{dx} = {1 \over \frac{dx}{du}}$.

I am wondering if this is too general, but I believe it is true.
Can you think of a counterexample?
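
Two concrete cases to test the proposition against (illustrations chosen for this example, not part of the proposition itself). First, $u = e^x$ on $I = ℝ$ is strictly increasing, so $u(b)-u(a)≠0$ for distinct $a, b$, and the inverse is $x = \ln u$:

$$\frac{du}{dx} = e^x = u, \qquad \frac{dx}{du} = \frac{1}{u}, \qquad \frac{du}{dx} = {1 \over \frac{dx}{du}}$$

Second, $u = x^3$ on $I = ℝ$ also satisfies $u(b)-u(a)≠0$ for distinct $a, b$, but

$$\frac{du}{dx} = 3x^2 = 0 \quad \text{at } x = 0,$$

and the inverse $x = u^{1/3}$ is not differentiable at $u = 0$, so $\frac{dx}{du}$ does not exist there. This suggests the hypothesis may also need $\frac{du}{dx} ≠ 0$ on $I$.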