Summary: Although the proof of the substitution formula is easy enough, I feel like the formula hides a much more intuitive yet rigorous explanation. The way it is written suggests some direct connection between the two sides of the equation, and I'm trying to express that connection in an explicit mathematical way.

Hello everyone.

First off, I'm sorry if this post is excessively long, but after tackling this for so many hours I've decided I could use some help, and I need to show everything I did to express exactly what I wish to do. Also, to be clear, this post deals with integration by substitution. Now to the matter at hand.

The substitution formula as I know it goes as follows:

If ##f## and ##g'## are continuous, then:

$$\int_{g(a)}^{g(b)}f(u) \, du = \int_a^bf(g(x))\cdot g'(x) \, dx$$
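(As a quick numerical sanity check — not part of any argument, and with ##f##, ##g##, and the interval being my own sample choices — both sides of the formula can be compared with plain Riemann sums:)

```python
# Sanity check of the substitution formula for a sample choice of
# f(u) = cos(u) and g(x) = x^2 on [a, b] = [0, 2], using left Riemann sums.
import math

def riemann(h, lo, hi, n=200000):
    """Left Riemann sum of h over [lo, hi] with n equal subintervals."""
    dx = (hi - lo) / n
    return sum(h(lo + i * dx) for i in range(n)) * dx

f = math.cos
g = lambda x: x**2
g_prime = lambda x: 2 * x
a, b = 0.0, 2.0

lhs = riemann(f, g(a), g(b))                         # integral of f(u) du over [g(a), g(b)]
rhs = riemann(lambda x: f(g(x)) * g_prime(x), a, b)  # integral of f(g(x)) g'(x) dx over [a, b]

print(lhs, rhs)  # both approach sin(4)
assert abs(lhs - rhs) < 1e-3
```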

The standard proof of the formula is easy enough, using the chain rule and the FToC. However, I somewhat feel like that is a roundabout way of proving it that doesn't really express the meaning of the formula. The formula itself, particularly when written in Leibniz notation, seems to show something more elegant about the connection between the two sides of the equation. Viewed at the most basic intuitive level, the integration notation ##\int_a^bf(x) \, dx## expresses the summation of products of interval lengths ##t_i-t_{i-1}## of ever decreasing size with values of ##f## taken somewhere on those intervals — in other words, the values of Riemann sums composed of smaller and smaller intervals. Seen that way, the left side shows the summation of ##f## when the variable is ##g(x)##, over (ever decreasing) intervals of that variable, of length ##g(t_i)-g(t_{i-1})##, only ##g(x)## is denoted simply as ##u##. The right side shows the exact same summation, only it divides and then multiplies each product by ##t_i-t_{i-1}## to get ##\frac {g(t_i)-g(t_{i-1})} {t_i-t_{i-1}} \cdot (t_i-t_{i-1})## instead of ##g(t_i)-g(t_{i-1})##. But since the integral is essentially the limit as the intervals shrink, this difference quotient becomes ##g'(x)\,dx##.

Looking at it this way, the formula seems to express something almost trivial, the same expression on both sides, only on the right side each part of the expression is divided and then multiplied by the same (nonzero) value. Having noticed that, I sought to find out if there is any way to express that intuitive interpretation in a rigorous and explicit way using the basic concepts like Riemann sums and ##\epsilon - \delta## arguments, to prove the formula in a way that fully encapsulates the intuitive interpretation, rather than the conventional (although much simpler) way that seems to miss this.
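(To make the "divide and multiply" rewriting concrete before attempting rigor, here is a small numerical illustration of my own — the sample ##f(u)=e^u## and increasing ##g(x)=x^2## on ##[1,2]## are arbitrary choices — showing that the Riemann sum over the induced partition of ##[g(a),g(b)]## equals the rewritten sum over ##[a,b]## exactly:)

```python
# The Riemann sum over P' = g(P) equals the "divide and multiply" rewritten
# sum over P exactly, for an increasing g and sample points x_i = g(t_{i-1}).
import math

f = math.exp
g = lambda x: x**2
a, b, n = 1.0, 2.0, 100

t = [a + i * (b - a) / n for i in range(n + 1)]   # partition P of [a, b]
s = [g(ti) for ti in t]                           # induced partition P' of [g(a), g(b)]

lhs = sum(f(s[i - 1]) * (s[i] - s[i - 1]) for i in range(1, n + 1))
rhs = sum(f(g(t[i - 1])) * (g(t[i]) - g(t[i - 1])) / (t[i] - t[i - 1]) * (t[i] - t[i - 1])
          for i in range(1, n + 1))

print(lhs, rhs)
assert abs(lhs - rhs) < 1e-9  # identical up to floating-point rounding
```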

However, so far I have succeeded in doing this only when ##g## is one-one on ##[a,b]## (or when ##a=b##, but that case is trivial). The proof I found for this case goes as follows:

Assumptions: ##f## and ##g'## are continuous, ##g## is one-one on ##[a,b]##.

Since ##g'## exists, ##g## is continuous, and since it is one-one on ##[a,b]##, it is either increasing or decreasing on that interval; say it is increasing (the decreasing case is handled at the end). Let ##P=\{t_0,...,t_n\}## be any partition of ##[a,b]##; then ##P## generates a unique partition ##P'=\{g(t_0),...,g(t_n)\}## of ##[g(a),g(b)]##, and vice versa (since ##g## is one-one and increasing). Furthermore, by the Intermediate Value Theorem, for any ##x_i \in [g(t_{i-1}),g(t_i)]## we have ##x_i=g(y_i)## for some unique ##y_i \in [t_{i-1},t_i]##. Therefore, for any partition ##P'=\{s_0,...,s_n\}## of ##[g(a),g(b)]## and any choices of ##x_i \in [s_{i-1},s_i]##, we have:

$$\sum_{i=1}^n f(x_i)(s_i-s_{i-1})=\sum_{i=1}^n f(g(y_i))(g(t_i)-g(t_{i-1}))=\sum_{i=1}^n f(g(y_i)) \frac {(g(t_i)-g(t_{i-1}))} {t_i-t_{i-1}} (t_i-t_{i-1})$$

(This is pretty much exactly the intuition I had with the formula, all that is left to do is show that as the intervals become smaller, the sides of this equation approach the corresponding sides of the original formula). Now, for every ##\epsilon > 0##:

1. By the basic property of Riemann sums, ##\exists \delta' > 0## such that, if ##P'=\{s_0,...,s_n\}## is a partition of ##[g(a),g(b)]## with ##s_i-s_{i-1}<\delta'## for ##1 \leq i \leq n##, then for any choices of ##x_i\in[s_{i-1},s_i]##, $$\left| \sum_{i=1}^n f(x_i)(s_i-s_{i-1})-\int_{g(a)}^{g(b)}f(u) \, du \right| <\frac \epsilon 2$$

2. Since ##g## is continuous, it is uniformly continuous on ##[a,b]##; therefore ##\exists \delta_1 > 0## such that if ##t_i-t_{i-1}<\delta_1##, then ##g(t_i)-g(t_{i-1})<\delta'## (the differences are positive since ##g## is increasing).

3. Since ##(f \circ g)## and ##g'## are continuous, it can be shown that ##\exists \delta_2>0## such that if ##P=\{t_0,...,t_n\}## is a partition of ##[a,b]## with ##t_i-t_{i-1}<\delta_2## for ##1 \leq i \leq n##, then for any choices of ##y_i,z_i \in [t_{i-1},t_i]##,$$\left| \sum_{i=1}^n f(g(y_i)) g'(z_i) (t_i-t_{i-1})- \int_a^bf(g(x))\cdot g'(x) \, dx \right|<\frac \epsilon 2$$

Let ##\delta = \min(\delta_1,\delta_2)##. Suppose ##P=\{t_0,...,t_n\}## is a partition of ##[a,b]## with ##t_i-t_{i-1}<\delta##, and let ##P'=\{g(t_0),...,g(t_n)\}=\{s_0,...,s_n\}## be the corresponding partition of ##[g(a),g(b)]##. By the Mean Value Theorem, ##\frac {g(t_i)-g(t_{i-1})} {t_i-t_{i-1}}=g'(z_i)## for some ##z_i \in (t_{i-1},t_i)##, and by (2.), ##s_i-s_{i-1}<\delta'## for ##1 \leq i \leq n##. Therefore, by (1.): $$\left|\int_{g(a)}^{g(b)}f(u) \, du - \sum_{i=1}^n f(x_i)(s_i-s_{i-1})\right| <\frac \epsilon 2$$ and since $$\sum_{i=1}^n f(x_i)(s_i-s_{i-1})=\sum_{i=1}^n f(g(y_i)) \frac {(g(t_i)-g(t_{i-1}))} {t_i-t_{i-1}} (t_i-t_{i-1})=\sum_{i=1}^n f(g(y_i)) g'(z_i) (t_i-t_{i-1})$$ for ##y_i,z_i \in [t_{i-1},t_i]##, by (3.): $$\left| \sum_{i=1}^n f(x_i)(s_i-s_{i-1})-\int_a^bf(g(x))\cdot g'(x) \, dx \right|<\frac \epsilon 2$$ Thus, we have $$\left|\int_{g(a)}^{g(b)}f(u) \, du - \int_a^bf(g(x))\cdot g'(x) \, dx \right|<\epsilon$$

Since this is true for all ##\epsilon > 0##, we must have:

$$\int_{g(a)}^{g(b)}f(u) \, du = \int_a^bf(g(x))\cdot g'(x) \, dx$$

The proof for when ##g## is decreasing is similar with some minor tweaks, or one can consider the functions ##-g(x)## and ##f(-x)## and use the above proof. Q.E.D.
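(Incidentally, the formula itself clearly still holds when ##g## is not one-one, even though the partition correspondence above breaks down — here is a numerical check of my own using the very function ##g(x)=x^3-x## on ##[-2,2]## that I use as a counterexample below, with a sample ##f(u)=\cos u##:)

```python
# The substitution formula holds numerically even when g is NOT one-one:
# g(x) = x^3 - x on [-2, 2] takes some values three times, yet both sides agree.
import math

def riemann(h, lo, hi, n=200000):
    """Left Riemann sum of h over [lo, hi] with n equal subintervals."""
    dx = (hi - lo) / n
    return sum(h(lo + i * dx) for i in range(n)) * dx

f = math.cos
g = lambda x: x**3 - x
g_prime = lambda x: 3 * x**2 - 1
a, b = -2.0, 2.0                       # g(a) = -6, g(b) = 6; g is not one-one here

lhs = riemann(f, g(a), g(b))
rhs = riemann(lambda x: f(g(x)) * g_prime(x), a, b)

print(lhs, rhs)  # both approach 2*sin(6)
assert abs(lhs - rhs) < 1e-2
```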

So, assuming I haven't messed anything up, I've shown that the intuition matches the actual math for the case of a one-one ##g##. However, things broke down for me once ##g## is no longer one-one, especially since the bijective correspondence between partitions of ##[a,b]## and partitions of ##[g(a),g(b)]## no longer exists. The entire proof relies on the idea that if the distances between the points of one partition approach 0, the distances between the corresponding points of the other partition approach 0 as well, but I can't seem to make that happen.

If I start with a partition ##P=\{t_0,...,t_n\}## of ##[a,b]##, I can create from it a partition ##P'=\{s_0,...,s_n\}## of ##[g(a),g(b)]##, using least upper bounds and the Intermediate Value Theorem for example, and even make the ##s_i##'s get close to each other as the ##t_i##'s get close to each other, but there is no guarantee that if two points of ##P'## are close, then the corresponding points of ##P## are close. For example, take ##g(x)=x^3-x## and ##[a,b]=[-2,2]##, so that ##[g(a),g(b)]=[-6,6]##. Say I take any partition ##P=\{t_0,...,t_n\}## of ##[a,b]## and create from it a partition ##P'=\{s_0,...,s_n\}## of ##[g(a),g(b)]## where each point in ##P'## is the value of ##g## at some point in ##P##. Let ##t_{k_i}## be the point in ##P## such that ##s_i=g(t_{k_i})##. Suppose I create ##P'## so that ##k_j>k_q## for ##j>q##, and let ##t_{k_m}## be the smallest ##t_{k_i}## satisfying ##t_{k_i}\geq -\sqrt {\frac 1 3}##. If ##t_{k_m} \leq 0##, then clearly ##t_{k_{m+1}}>1## since ##s_{m+1}>s_m##, so ##|t_{k_{m+1}}-t_{k_m}|>1##; and if ##t_{k_m}>0##, then ##t_{k_{m-1}}<-1##, so ##|t_{k_m}-t_{k_{m-1}}|>1##. So no matter how small I make the intervals comprising ##P##, there will always be at least one interval in ##P'## such that the distance between the corresponding points in ##P## is greater than 1. However, if I don't create ##P'## in this manner, then there is no correlation between two points in ##P'## becoming close and their corresponding points in ##P## becoming close, since the order of the points in ##P'## is all jumbled up...

On the other hand, if I begin with a partition of ##[g(a),g(b)]## and try to create from it a partition of ##[a,b]##, the situation becomes even more dire, as I can't even guarantee that the ##t_i##'s will become close as the ##s_i##'s become close, as is immediately evident from cases like ##g(x)=(x-1)^2-1## considered on the interval ##[0,2.1]##.
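(To spell out that last example concretely — this is just my own illustration of the obstruction: for ##g(x)=(x-1)^2-1## on ##[0,2.1]##, a partition of ##[g(0),g(2.1)]=[0,0.21]## must start at ##s_0=0##, whose natural preimage is ##t_0=0##, but any ##s_1>0## has its only preimage in ##[0,2.1]## beyond ##x=2##:)

```python
# Two adjacent partition points of [g(0), g(2.1)] = [0, 0.21] that are only
# 0.001 apart, yet whose preimages in [0, 2.1] are more than 2 apart.
import math

g = lambda x: (x - 1)**2 - 1

t0 = 0.0                                 # preimage of s0 = 0 (the partition must start at t_0 = a = 0)
s0, s1 = 0.0, 0.001                      # two very close partition points of [0, 0.21]

# For s1 > 0, the only preimage of s1 in [0, 2.1] is the larger root 1 + sqrt(1 + s1).
t1 = 1 + math.sqrt(1 + s1)

print(s1 - s0, t1 - t0)                  # tiny s-gap, t-gap greater than 2
assert t1 - t0 > 2
```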

So this is where I've been stuck for quite a while now, until I've finally decided to ask for some help. At this point there are several things I need to clarify:

1. Is my intuition regarding the formula actually justified and founded or did I interpret this in the completely wrong way?

2. Was my proof of the case for a one-one ##g## actually correct?

3. Is there any way to extend this proof to the general case?

4. If not, is there any proof of the formula that explicitly demonstrates the intuition, or am I just wasting my time? (Hell, maybe the original proof does this and I just didn't notice).

To anyone who took the time to read through this entire thing, thank you so much, and I look forward to hearing your insight.
