Why can you cancel out the dx in u-substitution?

nickadams · Apr 6, 2012

Say we want to find ∫(-cosx)sinx dx

set u = -cosx
so du/dx = sinx

∫u [itex]\frac{du}{dx}[/itex]dx

And then the dx's apparently cancel out? What is going on? I thought [itex]\frac{du}{dx}[/itex] meant lim Δx→0 [itex]\frac{u(x+Δx)-u(x)}{Δx}[/itex] and that the dx in the context of an integral was a representation of the Δx's from the lim n→∞ [itex]\sum^{n}_{i=1}[/itex]f([itex]x_{i}^{*}[/itex])Δx?

So how is this canceling justified with the above def'ns?

I like Serena · Apr 6, 2012

The notation ##\frac {\Delta u}{\Delta x}## indicates the ratio between a change in u versus a change in x.
If the change is small enough (into the limit), this is equal to the derivative.
Indeed, that is the definition of the derivative.

A "very small" ##\Delta x## is denoted as dx.
When I say "very small", I mean small into the limit.
So the ratio ##{du \over dx}## is equal to the derivative.

Effectively this means that you can treat ##{du \over dx}## as any fraction.

For instance ##{dy \over du} \cdot {du \over dx} = {dy \over dx}## represents the chain rule that says that the derivative of y(u(x)) is equal to y'(u(x)) u'(x).
It's just a more intuitive notation.

nickadams · Apr 6, 2012

I like Serena said:

The notation ##\frac {\Delta u}{\Delta x}## indicates the ratio between a change in u versus a change in x.
If the change is small enough (into the limit), this is equal to the derivative.
Indeed, that is the definition of the derivative.

A "very small" ##\Delta x## is denoted as dx.
When I say "very small", I mean small into the limit.
So the ratio ##{du \over dx}## is equal to the derivative.

Effectively this means that you can treat ##{du \over dx}## as any fraction.

For instance ##{dy \over du} \cdot {du \over dx} = {dy \over dx}## represents the chain rule that says that the derivative of y(u(x)) is equal to y'(u(x)) u'(x).
It's just a more intuitive notation.

Thanks, I like Serena, that makes sense, except there is still one thing I am unclear on: I get that [itex]\frac{du}{dx}[/itex] represents how u changes in response to a tiny (as in approaching 0) change in x, but my prof told us that the dx in ∫f(x)dx was just part of the notation of an integral to remind us of its connection to the Riemann sum. So it seems weird to me to mess with the dx since it is a part of the operator.

Can you maybe explain more in-depth the connection between u-substitution and the chain rule and how it allows us to cancel dx's? I understand that we do u-substitution when we see an integrand with a function and its derivative present, because the chain rule takes a function f(u) to its derivative by doing f'(u)*[itex]\frac{du}{dx}[/itex]. That means if we have a function with both u and [itex]\frac{du}{dx}[/itex] in it, then we know its antiderivative must have had the chain rule used on it to get in that form. So the antiderivative (indefinite integral) of f'(u)*[itex]\frac{du}{dx}[/itex] is just going to be the antiderivative of f'(u).

But where does that all tie in with the concept of differentials that was mentioned in my book? Why does taking the chain rule in reverse require so much explanation and introduction of new terminology? I feel like I am missing something because it has to be more complicated than just: "if going from f(u) to its derivative is achieved by f'(u)*[itex]\frac{du}{dx}[/itex], then going from the derivative of f(u) back to f(u) must be achieved by ignoring [itex]\frac{du}{dx}[/itex] and just taking the antiderivative of f'(u)."

Although that seems like it will mechanically yield f(u) as intended, what in the definitions and meaning of indefinite integrals allows us to do this? It just seems like a trick to me and I am having trouble relating it to the Riemann sum and limit stuff and what an integral actually is (sigh).
Sorry for the novel, but I want to be specific and I'm exasperated!

Char. Limit · Apr 6, 2012

Isn't the differential of a function of x defined as the differential of x multiplied by the derivative of said function with respect to x?*

*: In notation, du = u'(x) dx
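A quick numerical look at this definition, as a minimal Python sketch: it takes the OP's substitution u = -cos x and compares the actual change Δu with the differential u'(x) dx. The function name `gap_per_step`, the sample point x = 1.0, and the step sizes are assumptions made only for illustration.

```python
import math

def u(x):
    return -math.cos(x)        # the OP's substitution

def du_dx(x):
    return math.sin(x)         # its derivative

def gap_per_step(x, dx):
    """(Δu - u'(x) dx) / dx: distance between the true change and the differential, per unit step."""
    delta_u = u(x + dx) - u(x)
    return (delta_u - du_dx(x) * dx) / dx

for dx in (1e-1, 1e-2, 1e-3):
    print(dx, gap_per_step(1.0, dx))
```

The printed gap shrinks along with dx, which is the sense in which du = u'(x) dx approximates the true change Δu for small steps.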

I like Serena · Apr 7, 2012

nickadams said:

Thanks, I like Serena, that makes sense, except there is still one thing I am unclear on: I get that [itex]\frac{du}{dx}[/itex] represents how u changes in response to a tiny (as in approaching 0) change in x, but my prof told us that the dx in ∫f(x)dx was just part of the notation of an integral to remind us of its connection to the Riemann sum. So it seems weird to me to mess with the dx since it is a part of the operator.

An integral is the sum of inscribed rectangles.
Each rectangle has height f(x) and width dx.
Again dx is just a very small width, or increase in x.
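The rectangle picture can be tried out on the OP's integral. The sketch below, in Python, builds two finite sums: one with widths Δx and heights u(x)·u'(x), and one with heights u and widths Δu measured along the u-axis. The function names and the interval [0, π/2] are assumptions chosen for illustration, not anything fixed by the thread.

```python
import math

def sum_in_x(u, du_dx, a, b, n):
    """Riemann sum of u(x) * u'(x) over [a, b]: heights u(x_i) u'(x_i), widths Δx."""
    dx = (b - a) / n
    return sum(u(a + i * dx) * du_dx(a + i * dx) * dx for i in range(n))

def sum_in_u(u, a, b, n):
    """Riemann sum of the integrand u along the u-axis: heights u_i, widths Δu_i."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x0 = a + i * dx
        total += u(x0) * (u(x0 + dx) - u(x0))   # Δu_i stands in for u'(x_i) Δx
    return total

u = lambda x: -math.cos(x)
du_dx = lambda x: math.sin(x)
a, b = 0.0, math.pi / 2      # interval chosen only for illustration

for n in (10, 100, 1000):
    print(n, sum_in_x(u, du_dx, a, b, n), sum_in_u(u, a, b, n))
```

Both columns approach -1/2 as n grows. The only difference between the sums is that Δu replaces u'(x)Δx in each rectangle, and that replacement becomes exact in the limit.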

But where does that all tie in with the concept of differentials that was mentioned in my book? Why does taking the chain rule in reverse require so much explanation and introduction of new terminology?

The problem is that the intuitive notion of infinitesimals is not considered mathematically solid.
There have been a number of attempts to define infinitesimals in a way that is mathematically solid, but those attempts have not been accepted by the mathematical community.
A derivative and an integral are defined using notation with limits.

Mathematically you're not supposed to talk about changes that are zero.
However, the intuitive notion of ratios of infinitesimals and multiplications and divisions with it always works.
That is because a derivative really is a fraction, just taken into the limit.

Although that seems like it will mechanically yield f(u) as intended, what in the definitions and meaning of indefinite integrals allows us to do this? It just seems like a trick to me and I am having trouble relating it to the Riemann sum and limit stuff and what an integral actually is (sigh).

Not a trick, just a more intuitive notion.
Physicists use it all the time, while mathematicians scoff at it.

jppike · Apr 9, 2012

I like Serena said:

The problem is that the intuitive notion of infinitesimals is not considered mathematically solid.
There have been a number of attempts to define infinitesimals in a way that is mathematically solid, but those attempts have not been accepted by the mathematical community.
A derivative and an integral are defined using notation with limits.

This isn't true. In the 1960's a logician named Abraham Robinson proved the mathematical validity of infinitesimals by constructing the Hyperreal Number System, an ordered field which strictly contains the Real Numbers, but also includes numbers which are smaller than every real number (and larger! and in between!). This can be done (though it is not how Robinson originally constructed it) by defining an ultrafilter, and taking equivalence classes of sequences of real numbers modulo this ultrafilter.

It has now developed into a branch of mathematics known as Non-Standard Analysis, and is completely mathematically rigorous, and if one takes the existence of infinitesimals to be valid (which it is), then one can use this method to teach elementary calculus without resorting to the unfamiliar notion of epsilons and deltas. In fact, this very thread demonstrates exactly why there are many mathematicians who believe that a first introduction to calculus should be done using Non-Standard Analysis, as it is significantly more intuitive.

If the OP is interested enough to take a look at how the calculus is developed with this intuitive approach, I direct him to the free online book:

Elementary Calculus: An Infinitesimal Approach by H. Jerome Keisler

Maybe it will be of some help

I like Serena · Apr 9, 2012

I believe you just confirmed my statement.

jppike · Apr 9, 2012

I like Serena said:

I believe you just confirmed my statement.

I'm sorry? There has been a mathematically rigorous definition of infinitesimals that has been accepted by the mathematical community...

I like Serena · Apr 9, 2012

Ah, my difference in opinion is then that I believe that it's not the world-wide mathematical community that has accepted it.
Not everyone considers it valid.

micromass · Apr 9, 2012

I like Serena said:

Ah, my difference in opinion is then that I believe that it's not the world-wide mathematical community that has accepted it.
Not everyone considers it valid.

Yes, they do. Robinson's work is valid, and I don't think you'll find a mathematician that doesn't think so.

The reason why we don't teach Robinson's approach is (I think):

1) It's difficult. Indeed, to define the hyperreals, one needs ultrafilters and Łoś's theorem and such stuff. This is far too difficult to give to students of real analysis or even calculus. OK, one could take the existence of hyperreals on faith, but mathematicians don't tend to like that. As it stands, you'll need some logic to rigorously do everything.
Another approach is the approach of the surreal numbers. These can easily be defined analogously to Dedekind cuts. But they are quite unintuitive.
The standard approach has its difficulties in epsilon-delta stuff. But that is much easier compared to the other approaches mentioned.

2) It's untraditional. All textbooks already treat real analysis with limits and not with infinitesimals. The limit-approach is far too entrenched now to change.
Furthermore, all research articles also use limits. So one will need to learn limits anyway! So there is no real reason to start doing things with infinitesimals.

Also, I would like to mention that things like dx have a well-defined meaning in differential geometry. So one does not need infinitesimals to give it meaning. The meaning of dx in differential geometry is made exactly to let [itex]dy=\frac{dy}{dx}dx[/itex] work out. Whether this gives any improvement to the existing theory, I don't know.
You can find more information about this in "Calculus on manifolds" by Spivak.

micromass · Apr 9, 2012

To the OP:

nickadams said:

Say we want to find ∫(-cosx)sinx dx

set u = -cosx
so du/dx = sinx

∫u [itex]\frac{du}{dx}[/itex]dx

And then the dx's apparently cancel out? What is going on? I thought [itex]\frac{du}{dx}[/itex] meant lim Δx→0 [itex]\frac{u(x+Δx)-u(x)}{Δx}[/itex] and that the dx in the context of an integral was a representation of the Δx's from the lim n→∞ [itex]\sum^{n}_{i=1}[/itex]f([itex]x_{i}^{*}[/itex])Δx?

So how is this canceling justified with the above def'ns?

I should praise you for finding this out. Many people justify this rule as "just cancelling out the dx". This is of course NOT what is happening. Indeed, whatever people say [itex]\frac{du}{dx}[/itex] is NOT a fraction as it is not defined as a fraction. Therefore, cancelling out things is not allowed just like that, but it requires a rigorous reason.

So we wish to prove that

[tex]\int_a^b f(g(x))\frac{dg}{dx}dx=\int_{g(a)}^{g(b)} f(u)du[/tex]

Let F be a primitive of f. The right-hand side is equal to F(g(b))-F(g(a)) (by the fundamental theorem of calculus). By the chain rule, we have that

[tex]\frac{dF\circ g}{dx}(x)=\frac{dF}{du}(g(x))\frac{dg}{dx}(x)=f(g(x))\frac{dg}{dx}(x)[/tex]

Thus we see that [itex]F\circ g[/itex] is an antiderivative of [itex]f(g(x))\frac{dg}{dx}[/itex]. This means (fundamental theorem of calculus) that

[tex]\int_a^b f(g(x))\frac{dg}{dx}dx=F(g(b))-F(g(a))[/tex]

This proves that the two integrals in question are equal.

So we see that this result was not trivially "eliminating dx from both sides". It involved the chain rule and the fundamental theorem of calculus!
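As a worked instance of this identity, here is a sketch for the OP's integrand. It assumes f(u) = u and g(x) = -cos x, so that [itex]\frac{dg}{dx}=\sin x[/itex], and it takes [itex]F(u)=\frac{u^2}{2}[/itex] as the primitive:

[tex]\int_a^b (-\cos x)\sin x\,dx=\int_{-\cos a}^{-\cos b} u\,du=\frac{\cos^2 b}{2}-\frac{\cos^2 a}{2}[/tex]

Differentiating [itex]\frac{\cos^2 x}{2}[/itex] gives [itex]-\cos x\sin x[/itex], which recovers the original integrand, so the substitution checks out without any appeal to cancelling.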

I like Serena · Apr 9, 2012

micromass said:

Indeed, whatever people say [itex]\frac{du}{dx}[/itex] is NOT a fraction as it is not defined as a fraction.

Hmm, I thought that
$$\frac{du}{dx}\overset{\textrm{def}}{=} \lim\limits_{Δx \to 0} \frac{Δu}{Δx}$$
by definition (with the proper definitions for u and x).
Since this is just another way of writing:
$$u'(x)\overset{\textrm{def}}{=}\lim\limits_{h \to 0} \frac{u(x+h)-u(x)}{h}$$

So it's not exactly a fraction, but a fraction with the denominator taken into the limit.

micromass · Apr 9, 2012

Yes, so it's the limit of a fraction and NOT a fraction.
And it is not obvious that the limit of a fraction should have the same properties as a fraction.

I like Serena · Apr 9, 2012

micromass said:

Yes, so it's the limit of a fraction and NOT a fraction.
And it is not obvious that the limit of a fraction should have the same properties as a fraction.

Doesn't it?
I think that as long as the denominator is not actually zero and the function is continuously differentiable, it does.

micromass · Apr 9, 2012

I like Serena said:

Doesn't it?
I think that as long as the denominator is not actually zero and the function is continuously differentiable, it does.

What does that have to do with it??

You can't say that [itex]\frac{df}{dx}=\frac{df}{du}\frac{du}{dx}[/itex] (for example) by saying that they are both limits of fractions and that thus the same thing must hold for those expressions. The argument is a bit more complicated than that.

I like Serena · Apr 9, 2012

micromass said:

You can't say that [itex]\frac{df}{dx}=\frac{df}{du}\frac{du}{dx}[/itex] (for example) by saying that they are both limits of fractions and that thus the same thing must hold for those expressions. The argument is a bit more complicated than that.

Let's see.
I've been puzzling over this before and I wondered if my argument was mathematically solid.

Let ##f, u,## and ##f^*\overset{\textrm{def}}{=}f\circ u## be continuously differentiable functions from ##ℝ→ℝ##, let ##x \in ℝ##, and let the derivative of u be non-zero at x.

Let ##Δx>0## be small enough such that the derivative of u is non-zero in the interval [x, x+Δx].

Now define:
$$\begin{array}{lll}Δf^* & \overset{\textrm{def}}{=}& f^*(x+Δx) - f^*(x) \\
Δu & \overset{\textrm{def}}{=}& u(x+Δx) - u(x) \\
Δf & \overset{\textrm{def}}{=}& f(u(x)+Δu) - f(u(x))
\end{array}$$
Note that this means that Δu is non-zero.

Then at point x we have:
$$\frac{df^*}{dx}\overset{\textrm{def}}{=} \lim\limits_{Δx \to 0} \frac{Δf^*}{Δx}=\lim\limits_{Δx \to 0} \frac{Δf^*}{Δu} \frac{Δu}{Δx} \qquad\qquad(1)$$

Since ##\frac{Δf^*}{Δu}=\frac{(f\circ u)(x+Δx)-(f\circ u)(x)}{Δu}=\frac{f(u(x)+Δu)-f(u(x))}{Δu}## and ##Δu \to 0## as ##Δx \to 0##, it follows that:
$$\lim\limits_{Δx \to 0} \frac{Δf^*}{Δu}=\lim\limits_{Δu \to 0} \frac{Δf}{Δu} \qquad\qquad(2)$$

And thus at point x:
$$\frac{df^*}{dx}=\lim\limits_{Δx \to 0} \frac{Δf^*}{Δu} \frac{Δu}{Δx}=\lim\limits_{Δu \to 0} \frac{Δf}{Δu} \cdot \lim\limits_{Δx \to 0} \frac{Δu}{Δx}=\frac{df}{du} \frac{du}{dx} \qquad\qquad(3)$$
##\Box.##

This is the long version, which I'd like to shorten.
Can you poke holes in it?
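Step (1) can be watched numerically. The following Python sketch uses a hypothetical outer function f = exp and inner function u(x) = x³ + x, chosen only because u' is never zero; neither function comes from the thread. It checks that the finite quotient Δf*/Δx factors as (Δf*/Δu)(Δu/Δx) and that both approach f'(u(x))·u'(x).

```python
import math

def difference_quotients(f, u, x, dx):
    """Return (Δf*/Δx, Δf*/Δu, Δu/Δx) for the composite f* = f ∘ u at x."""
    du = u(x + dx) - u(x)
    df_star = f(u(x + dx)) - f(u(x))
    return df_star / dx, df_star / du, du / dx

f = math.exp                       # hypothetical outer function
u = lambda x: x ** 3 + x           # hypothetical inner function; u'(x) = 3x^2 + 1 > 0
x = 0.7
exact = math.exp(u(x)) * (3 * x ** 2 + 1)   # f'(u(x)) * u'(x)

for dx in (1e-1, 1e-3, 1e-5):
    whole, outer, inner = difference_quotients(f, u, x, dx)
    print(dx, whole, outer * inner, exact)
```

For every finite step the factorization is ordinary fraction arithmetic, which is why Δu ≠ 0 has to be guaranteed. The actual content of the proof is that the limit of the product equals the product of the limits.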

micromass · Apr 9, 2012

I think that proof is sound. But your proof isn't as general as it could be, of course, which is a problem imo.

jppike · Apr 10, 2012

micromass said:

Yes, they do. Robinson's work is valid, and I don't think you'll find a mathematician that doesn't think so.

The reason why we don't teach Robinson's approach is (I think):

1) It's difficult. Indeed, to define the hyperreals, one needs ultrafilters and Łoś's theorem and such stuff. This is far too difficult to give to students of real analysis or even calculus. OK, one could take the existence of hyperreals on faith, but mathematicians don't tend to like that. As it stands, you'll need some logic to rigorously do everything.
Another approach is the approach of the surreal numbers. These can easily be defined analogously to Dedekind cuts. But they are quite unintuitive.
The standard approach has its difficulties in epsilon-delta stuff. But that is much easier compared to the other approaches mentioned.

2) It's untraditional. All textbooks already treat real analysis with limits and not with infinitesimals. The limit-approach is far too entrenched now to change.
Furthermore, all research articles also use limits. So one will need to learn limits anyway! So there is no real reason to start doing things with infinitesimals.

Also, I would like to mention that things like dx have a well-defined meaning in differential geometry. So one does not need infinitesimals to give it meaning. The meaning of dx in differential geometry is made exactly to let [itex]dy=\frac{dy}{dx}dx[/itex] work out. Whether this gives any improvement to the existing theory, I don't know.
You can find more information about this in "Calculus on manifolds" by Spivak.

Before I begin my response I should mention that I am honestly not one of the advocates that infinitesimal calculus should be one's first introduction to the calculus. Indeed, I don't believe I have a sound enough understanding of Non Standard Analysis yet to really make such a decision. I'm just playing devil's advocate:

1) If you're going to argue that in terms of mathematical logic, we have to take something on faith in order to begin a study of the infinitesimal calculus, then you should basically be arguing against all of introductory mathematics. Indeed, in a first introduction to a number of courses we take many things on faith that are deeply rooted into mathematical logic; consider the axiom of choice and induction as the most obvious examples. How often is Zorn's Lemma implicitly used in a first calculus course? Furthermore, in a first course on calculus one is expected to take a lot of results for granted anyways! I recall that in my first calculus course we were expected to just accept, for example, the Intermediate Value Theorem. It wasn't until a year later in analysis that I saw a rigorous proof of it for the Real numbers. In fact, when one is introduced to the calculus for the very first time (say, in high school), one is rarely introduced using Weierstrass's epsilon-delta formalism. A limit is "as x gets arbitrarily close to a, f(x) gets arbitrarily close to L". How is accepting this definition any better than accepting an infinitesimal?

2) Non-Standard Analysis is already starting to produce enough applications that I would suggest that a modern mathematician ought to be at least somewhat familiar with it, if not have studied a semester course in it. In such a case, why is it better to learn standard analysis and then have to try and convert epsilon-delta language into infinitesimal language than the other way around? Indeed, the main argument is that it's not; that once you have learned the intuitive approach using infinitesimals, the epsilon-delta formulation is easy to transition to. At any rate the argument that we shouldn't change the way we do it because that's the way we've been doing it isn't really much of an argument, is it? If one can demonstrate that being introduced to NSA makes for a deeper understanding of the concepts of analysis, then that's the way it should be taught, regardless of how we have been teaching it so far.

micromass · Apr 10, 2012

jppike said:

Before I begin my response I should mention that I am honestly not one of the advocates that infinitesimal calculus should be one's first introduction to the calculus. Indeed, I don't believe I have a sound enough understanding of Non Standard Analysis yet to really make such a decision. I'm just playing devil's advocate:

I actually detest playing devil's advocate. I prefer it if people would just say their opinion, instead of trying to set up a pointless argument. So if it's going to be a devil's advocate thing, then this will be my last reply on the topic.

1) If you're going to argue that in terms of mathematical logic, we have to take something on faith in order to begin a study of the infinitesimal calculus, then you should basically be arguing against all of introductory mathematics. Indeed, in a first introduction to a number of courses we take many things on faith that are deeply rooted into mathematical logic; consider the axiom of choice and induction as the most obvious examples. How often is Zorn's Lemma implicitly used in a first calculus course? Furthermore, in a first course on calculus one is expected to take a lot of results for granted anyways! I recall that in my first calculus course we were expected to just accept, for example, the Intermediate Value Theorem. It wasn't until a year later in analysis that I saw a rigorous proof of it for the Real numbers. In fact, when one is introduced to the calculus for the very first time (say, in high school), one is rarely introduced using Weierstrass's epsilon-delta formalism. A limit is "as x gets arbitrarily close to a, f(x) gets arbitrarily close to L". How is accepting this definition any better than accepting an infinitesimal?

Consider me old-fashioned, but what you accept on the standard approach is much more intuitive than what you accept in the infinitesimal approach. Accepting induction and the axiom of choice are quite obvious things to accept. When I first saw induction, I thought it was very obvious. The same with the axiom of choice (it was only later that I found out that it was problematic). The concept of limit is also an obvious one.
On the other hand, I find infinitesimals less intuitive. I mean: the existence of a number e such that 0<e<1/n for all n. I can hear the questions coming from a high school student:

What's the decimal representation of e??
It doesn't have any.
How can a number not have any representation, is it a fake number??
Uuuh...
Oh, I get it, e=1-0.99999... no??
Hmmm...

In a calculus course, this might be ok. But what about a real analysis course?? This is supposed to be the foundation of analysis, so it's supposed to have a construction of the reals and the hyperreals. This is too hard to do.

OK, you always take things on faith, but not being able to construct the space you're working with is a big no-no.

I realize I am talking in a point-of-view of a standard analyst. Perhaps if I encountered hyperreals in high school and before, then I would talk differently.

2) Non-Standard Analysis is already starting to produce enough applications that I would suggest that a modern mathematician ought to be at least somewhat familiar with it, if not have studied a semester course in it.

To be honest, I never encountered an application of it. I just think it's a neat concept.

In such a case, why is it better to learn standard analysis and then have to try and convert epsilon-delta language into infinitesimal language than the other way around? Indeed, the main argument is that it's not; that once you have learned the intuitive approach using infinitesimals, the epsilon-delta formulation is easy to transition to. At any rate the argument that we shouldn't change the way we do it because that's the way we've been doing it isn't really much of an argument, is it?

It's a huge argument. ALL the textbooks are written in standard language. So a lot of wonderful books like Rudin, Pugh, Spivak, etc. become obsolete. It will be a huge undertaking to correct the books or to produce new ones.

And the standard approach needs to be learned anyway. Almost every research article is written in standard language, while almost no article is written with infinitesimals. So what's the point??
You really have the choice between "teaching the standard approach" and "teaching the standard approach AND infinitesimals". That last thing requires extra time, extra books, perhaps confused students. And that for almost no benefits.

If one can demonstrate that being introduced to NSA makes for a deeper understanding of the concepts of analysis, then that's the way it should be taught, regardless of how we have been teaching it so far.

I doubt that it really makes for a deeper understanding of the concepts of analysis. If we teach both standard analysis AND hyperreals, then we need to divide the course in half and spend less time on the more important concepts. That will actually reduce the understanding of the concepts.

The same discussion happens with replacing [itex]\pi[/itex] with [itex]\tau=2\pi[/itex]. It's a useless undertaking. It doesn't matter if [itex]\tau[/itex] is a more intuitive concept. It's too late to change it now.

If I could go back in time and replace [itex]\pi[/itex] with [itex]\tau[/itex] the moment it was invented, then I would do so. But it's too deeply rooted in the community now. The same thing with hyperreals (ignoring the fact that hyperreals require a lot of very very nontrivial logic).

I like Serena · Apr 10, 2012

micromass said:

I think that proof is sound. But your proof isn't as general as you can do, of course, which is a problem imo.

I think it is general enough for the problem statement of the OP, who has a continuously differentiable function from ##ℝ→ℝ##.
Indeed, I think this is the only type of function you'll see in high school.
My more general proposition (without proof) is:

Let ##u: I→ℝ## be a continuously differentiable function on an open interval ##I \subset ℝ## around ##x \in ℝ##, with ##u(b)-u(a)≠0## for any distinct ##a, b \in I##.
Let ##f: u(I)→ℝ## be a continuously differentiable function on u(I).
And let ##f^* \equiv f\circ u##.

Then ##\frac{df\ ^*}{dx}, \frac{df}{du}, \frac{du}{dx}, \frac{dx}{du}## all behave as fractions on interval I.

In particular this implies the chain rule: ##\frac{df\ ^*}{dx} = \frac{df}{du} \frac{du}{dx}##.
And it also implies the inverse derivative rule: ##\frac{du}{dx} = {1 \over \frac{dx}{du}}##.
I am wondering if this is too general, but I believe it is true.
Can you think of a counter example?
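One candidate worth testing against the proposition as stated is u(x) = x³ on I = (-1, 1). It is continuously differentiable and u(b)-u(a)≠0 for distinct a, b, yet u'(0) = 0. A minimal Python sketch compares the two finite-difference ratios at 0 and at 0.5; the helper name `forward_ratio` and the step sizes are illustrative assumptions.

```python
def u(x):
    return x ** 3          # strictly increasing and continuously differentiable on (-1, 1)

def forward_ratio(x, dx):
    """Return (Δu/Δx, Δx/Δu) at x for a finite step dx."""
    du = u(x + dx) - u(x)
    return du / dx, dx / du

for dx in (1e-1, 1e-2, 1e-3):
    print(dx, forward_ratio(0.0, dx), forward_ratio(0.5, dx))
```

At x = 0.5 the two ratios settle near 3/4 and 4/3, as the inverse rule predicts. At x = 0, Δu/Δx shrinks toward 0 while Δx/Δu grows without bound, so dx/du has no finite value there. This suggests the inverse rule needs the extra assumption that du/dx is non-zero.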

micromass · Apr 10, 2012

I like Serena said:

Let ##f: u(I)→ℝ## be a continuously differentiable function on u(I).

Your u(I) is not necessarily open, so differentiability might not make any sense.

Then ##\frac{df\ ^*}{dx}, \frac{df}{du}, \frac{du}{dx}, \frac{dx}{du}## all behave as fractions on interval I.

What does this statement even mean? How would you rigorize it? And even more important: how would you prove it.

I like Serena · Apr 10, 2012

micromass said:

Your u(I) is not necessarily open, so differentiability might not make any sense.

It would be: since u is continuous and u(a)-u(b)≠0, u has to be strictly monotonic and therefore invertible, so the image of an open interval is also an open interval.

What does this statement even mean? How would you rigorize it? And even more important: how would you prove it.

Yes. I realize it's not rigorous.
The only way I know to make it more rigorous, is to create a series of conditions and show that it's true for each condition (like it is for the chain rule and for the inverse derivative).
Then it would be rigorous for each of these conditions.

However, I wanted to make a more general statement and a list of conditions would defeat my purpose.
I'm still wondering if there's not a way to formulate something that's mathematically acceptable, without resorting to the complexities of Robinson's theorems.
Or at least to find a couple of counter examples.

I certainly believe that teaching Leibniz's notation in high school would be better than teaching Lagrange's notation (f').
Especially since I believe it is the most common notation actually used in practical sciences, and even in theoretical physics (and therefore not really untraditional).
I think it's mostly high school and mathematics that stick to Lagrange's notation (f'), but we would need a poll to make sure. :)

jppike · Apr 11, 2012

micromass said:

I actually detest playing devil's advocate. I prefer it if people would just say their opinion, instead of trying to set up a pointless argument. So if it's going to be a devil's advocate thing, then this will be my last reply on the topic.

So the argument is not pointless if I don't announce an opinion outright? Sorry, this just doesn't make sense to me. The point of debate is to discuss a topic from various points of view, so how do my deep and true feelings on the matter change that? I'm taking a specific side either way, and I will discuss its merits as best I can. Would it help if I told you I was firmly of the belief that we should study NSA first? If you don't wish to continue the discussion that is fine of course, but this just seems like a strange reason to me.

Consider me old-fashioned, but what you accept on the standard approach is much more intuitive than what you accept in the infinitesimal approach. Accepting induction and the axiom of choice are quite obvious things to accept. When I first saw induction, I thought it was very obvious. The same with the axiom of choice (it was only later that I found out that it was problematic).

Transfinite induction is also very problematic, and there are many mathematicians who don't believe in it. The unfortunate thing is that it is powerful, and the proofs of many important theorems require it. I choose to believe in induction as well, but the point is that while it may seem obvious, it is certainly not.

In a calculus course, this might be ok. But what about a real analysis course?? This is supposed to be the foundation of analysis, so it's supposed to have a construction of the reals and the hyperreals. This is too hard to do.

OK, you always take things on faith, but not being able to construct the space you're working with is a big no-no.

If you sweep some of the more technical underlying logic under the rug, as you would be doing anyways, you certainly can construct the hyperreal numbers in a way that is as convincing as anything you would see in an introductory analysis course. See any of the introductory books on NSA, Goldblatt for example. The only real logic he explicitly uses in the construction is Zorn's Lemma, which, as we agree, is to be believed anyways. This was my point about the fact that we sweep logic under the rug in an analysis course regardless, I realize that I didn't make it clear but yes I am talking about being able to construct the hyperreals in a convincing way.

To be honest, I never encountered an application of it. I just think it's a neat concept.

There are many applications popping up, even in physics. In fact, if you were ever interested you should look through a book called "Non-Standard Methods in Stochastic Analysis and Mathematical Physics", the authors get some results in these fields that are not all re-workings of known results into the language of NSA (some are), but in fact some new results that came about through the study of Loeb Measures and other NSA methods.

It's a huge argument. ALL the textbooks are written in standard language. So a lot of wonderful books like Rudin, Pugh, Spivak, etc. become obsolete. It will be a huge undertaking to correct the books or to produce new ones.

And the standard approach needs to be learned anyway.

If the standard approach needs to be learned anyway, how are the books listed above becoming obsolete?

You really have the choice between "teaching the standard approach" and "teaching the standard approach AND infinitesimals". That last thing requires extra time, extra books, perhaps confused students. And that for almost no benefits.

This argument is predicated on the idea that NSA has nothing to offer which standard analysis does not. Anyone who argues the case for NSA does not believe that, nor do I, for I don't believe that they are equivalent methods. The extension to non-standard set theory has already been able to produce some previously unknown results, and there seems to be no reason to believe it couldn't continue to do so. So the argument is that mathematicians should be exposed to both.

I doubt that it really makes for a deeper understanding of the concepts of analysis. If we teach both standard analysis AND hyperreals, then we need to divide the course in half and spend less time on the more important concepts. That will actually reduce the understanding of the concepts.

Surely they wouldn't simultaneously develop both language in the same course. But this is a very good point, fitting additional material into a mathematics education would certainly prove quite problematic.

The same discussion happens with replacing [itex]\pi[/itex] with [itex]\tau=2\pi[/itex]. It's a useless undertaking. It doesn't matter if [itex]\tau[/itex] is a more intuitive concept. It's too late to change it now.

If I could go back in time and replace [itex]\pi[/itex] with [itex]\tau[/itex] the moment it was invented, then I would do so. But it's too deeply rooted in the community now. The same thing with hyperreals (ignoring the fact that hyperreals require a lot of very very nontrivial logic).

Again, this is assuming that the two methods can produce identical results, differing only in notation. I don't think anybody advocating that students be exposed to non-standard analysis believes this is the case. Of course, they seem to be equivalent under the most common foundations of set theory (Zermelo–Fraenkel set theory), but as I said, it is the extended non-standard set theory that is capable of providing new results.

Congruent · Apr 12, 2012

How often is Zorn's Lemma implicitly used in a first calculus course?

I can't come up with a single theorem appropriate to discuss in a first calculus course that requires using Zorn's lemma or any transfinite method. Could you provide an example?

micromass · Apr 12, 2012

Congruent said:

I can't come up with a single theorem appropriate to discuss in a first calculus course that requires using Zorn's lemma or any transfinite method. Could you provide an example?

For example, the equivalence of the statements
- [itex]f:\mathbb{R}\rightarrow \mathbb{R}[/itex] is continuous at a point x
- For every sequence [itex](x_n)_n[/itex] that converges to x, we have that [itex]f(x_n)\rightarrow f(x)[/itex]

requires a form of the axiom of choice.

Congruent · Apr 12, 2012

micromass said:

For example, the equivalence of the statements
- [itex]f:\mathbb{R}\rightarrow \mathbb{R}[/itex] is continuous at a point x
- For every sequence [itex](x_n)_n[/itex] that converges to x, we have that [itex]f(x_n)\rightarrow f(x)[/itex]

requires a form of the axiom of choice.

Really? I imagine it has to come up in the direction "if for every sequence [itex]\{x_n\}_{n=1}^{\infty}[/itex] which converges to x, it follows that [itex]f(x_n) \to f(x)[/itex], then f is continuous at x", seeing as the other direction is pretty straightforward. But I imagine that to prove this, you'd do something like the following. Suppose f is not continuous at a point x. Then there's an [itex]\epsilon > 0[/itex] so that for any [itex]\delta > 0[/itex], there's a point [itex]y \in (x - \delta, x + \delta)[/itex] so that [itex]|f(x) - f(y)| \ge \epsilon[/itex]. So for each n, select such an [itex]x_n \in (x - \frac{1}{n}, x + \frac{1}{n})[/itex]. By construction, [itex]x_n \to x[/itex], but [itex]|f(x_n) - f(x)| \ge \epsilon[/itex] for every n, so [itex]f(x_n) \not\to f(x)[/itex].

Can you show me where choice got used? An arbitrary choice over the sets was never required, because there's a guarantee of the existence of a particular point in each interval. Or am I wrong about that last bit?

micromass · Apr 12, 2012

Congruent said:

So for each n, select such an [itex]x_n \in (x - \frac{1}{n}, x + \frac{1}{n})[/itex].

This is where choice is used. You can never give an exact definition of the [itex]x_n[/itex]. This is why it requires choice.

Can you show me where choice got used? An arbitrary choice over the sets was never required, because there's a guarantee of the existence of a particular point in each interval. Or am I wrong about that last bit?

That's not what choice is about. Of course you can pick a point in each interval. But you are required to make the choice simultaneously.

Choice is equivalent to saying that for each collection of nonempty sets [itex]X_i,i\in I[/itex] we have that

[tex]\prod_{i\in I} X_i[/tex]

is nonempty. Of course, since the [itex]X_i[/itex] are nonempty, we can pick an [itex]x\in X_i[/itex] for any single i. But the problem is to make a simultaneous choice. More rigorously: if you don't want to use the axiom of choice, then you should be able to write down the set explicitly.
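
To make this concrete for the proof quoted above, here is a sketch. The names [itex]A_n[/itex] are introduced here for illustration only and do not appear in the original argument; [itex]f[/itex], [itex]x[/itex] and [itex]\epsilon[/itex] are as in that proof.

[tex]
A_n = \left\{ y \in \left(x - \tfrac{1}{n},\, x + \tfrac{1}{n}\right) : |f(y) - f(x)| \ge \epsilon \right\}, \qquad n = 1, 2, 3, \dots
[/tex]

The failure of continuity at x says exactly that each [itex]A_n[/itex] is nonempty. What the proof needs, however, is a whole sequence [itex](x_n)_n[/itex] with [itex]x_n \in A_n[/itex] for every n, that is, an element of

[tex]
\prod_{n \ge 1} A_n.
[/tex]

For a general f there is no evident rule that singles out one particular point of each [itex]A_n[/itex], and that is the step where a choice principle comes in.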

Congruent · Apr 12, 2012

micromass said:

This is where choice is used. You can never give an exact definition of the [itex]x_n[/itex]. This is why it requires choice.
That's not what choice is about. Of course you can pick a point in each interval. But you are required to make the choice simultaneously.

Choice is equivalent to saying that for each collection of nonempty sets [itex]X_i,i\in I[/itex] we have that

[tex]\prod_{i\in I} X_i[/tex]

is nonempty. Of course, since the [itex]X_i[/itex] are nonempty, we can pick an [itex]x\in X_i[/itex] for any single i. But the problem is to make a simultaneous choice. More rigorously: if you don't want to use the axiom of choice, then you should be able to write down the set explicitly.

I see. But I'm only required to make countably many such choices, no? So wouldn't this still only be an instance of countable choice?

Sorry, I'm entirely self-taught, and most of these things are discussed without any mention of choice being used. I suppose mostly because it's so non-controversial.

micromass · Apr 12, 2012

Congruent said:

I see. But I'm only required to make countably many such choices, no? So wouldn't this still only be an instance of countable choice?

Yes, indeed, it will only be countable choice. Even less: it only requires countable choice on subsets of the reals.

Most things in analysis don't need the full axiom of choice. You should rarely need more than the axiom of dependent choice. The Hahn-Banach theorem is a notable exception: it does not follow from dependent choice, but it can be proved from the ultrafilter lemma.

But the point was that not everything in calculus follows from the ZF axioms and that sometimes a form of choice is needed (although not the full axiom of choice).
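
For reference, the weak principle used in the continuity proof discussed above can be stated roughly as follows. This is a sketch of the usual formulation of countable choice restricted to subsets of the reals; the names [itex]A_n[/itex] and [itex]g[/itex] are chosen here for illustration.

[tex]
\text{If } A_1, A_2, A_3, \dots \subseteq \mathbb{R} \text{ are all nonempty, then there is a function } g : \{1, 2, 3, \dots\} \to \mathbb{R} \text{ with } g(n) \in A_n \text{ for every } n.
[/tex]

Taking [itex]A_n[/itex] to be the set of points within [itex]\frac{1}{n}[/itex] of x at which f differs from f(x) by at least [itex]\epsilon[/itex], and setting [itex]x_n = g(n)[/itex], gives the sequence that the proof requires.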

micromass · Apr 12, 2012

Here is a nice paper that describes where choice principles are needed in elementary analysis: http://dml.cz/dmlcz/118951
