Attempting to find an intuitive proof of the substitution formula

Summary
Although the proof of the substitution formula is easy enough, I feel like the formula hides a much more intuitive yet rigorous explanation. The way it is written suggests some direct connection between the 2 sides of the equation, and I'm trying to express that connection in an explicit mathematical way.
Hello everyone.

First off, I'm sorry if this post is excessively long, but after tackling this for so many hours I've decided I could use some help, and I need to show everything I did to express exactly what I wish to do. Also, to be clear, this post deals with integration by substitution. Now to the matter at hand.

The substitution formula as I know it goes as follows:
If $f$ and $g'$ are continuous, then:
$$\int_{g(a)}^{g(b)}f(u) \, du = \int_a^bf(g(x))\cdot g'(x) \, dx$$

The standard proof of the formula is easy enough, using the chain rule and the FToC. However, I somewhat feel like that is a roundabout way of proving it that doesn't really express the meaning of the formula. The formula itself, particularly when written in Leibniz notation, seems to show something more elegant about the connection between the sides of the equation. If I view it at the most basic intuitive level, the integration notation $\int_a^bf(x) \, dx$ expresses the summation of the products of intervals of ever decreasing size, $t_i-t_{i-1}$, and the value of $f$ somewhere on those intervals, in other words the value of Riemann sums comprised of smaller and smaller intervals. Viewing it that way, the left side shows the summation of $f$ when the variable is $g(x)$, over (ever decreasing) intervals of said variable, $g(t_i)-g(t_{i-1})$, only $g(x)$ is denoted simply as $u$. The right side shows the exact same summation, only it divides and then multiplies each product by $t_i-t_{i-1}$ to get $\frac {g(t_i)-g(t_{i-1})} {t_i-t_{i-1}} \cdot (t_i-t_{i-1})$ instead of $g(t_i)-g(t_{i-1})$. But since the integral is essentially the limit as the intervals get smaller and smaller, this becomes $g'(x)dx$.
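As a quick numerical sanity check of this reading (a sketch only, not part of any proof), the snippet below builds both sums over the same partition. The choices $f=\cos$, $g(x)=x^2$ on $[0,2]$ and the helper names `left_sum` and `right_sum` are assumptions made purely for illustration.

```python
import math

def f(u):
    return math.cos(u)

def g(x):
    return x * x              # increasing on [0, 2]

def g_prime(x):
    return 2.0 * x

def left_sum(a, b, n):
    """Sum of f(g(y_i)) * (g(t_i) - g(t_{i-1})): a Riemann sum in the variable u = g(x)."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        t_prev, t_cur = a + (i - 1) * h, a + i * h
        y = 0.5 * (t_prev + t_cur)            # tag y_i in [t_{i-1}, t_i]
        total += f(g(y)) * (g(t_cur) - g(t_prev))
    return total

def right_sum(a, b, n):
    """Sum of f(g(y_i)) * g'(y_i) * (t_i - t_{i-1}): a Riemann sum in the variable x."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        y = a + (i - 0.5) * h
        total += f(g(y)) * g_prime(y) * h
    return total

# Integral of cos(u) from g(0) = 0 to g(2) = 4.
exact = math.sin(4.0) - math.sin(0.0)
for n in (10, 100, 1000):
    print(n, left_sum(0.0, 2.0, n), right_sum(0.0, 2.0, n), exact)
```

Both columns should approach the same value as `n` grows, which is what the informal "divide and multiply by $t_i-t_{i-1}$" reading predicts for this particular $f$ and $g$.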

Looking at it this way, the formula seems to express something almost trivial, the same expression on both sides, only on the right side each part of the expression is divided and then multiplied by the same (nonzero) value. Having noticed that, I sought to find out if there is any way to express that intuitive interpretation in a rigorous and explicit way using the basic concepts like Riemann sums and $\epsilon - \delta$ arguments, to prove the formula in a way that fully encapsulates the intuitive interpretation, rather than the conventional (although much simpler) way that seems to miss this.

However, so far I succeeded in doing this only when $g$ is one-one on $[a,b]$ (or when $a=b$ but that's redundant). The proof I found for this case goes as follows:

Assumptions: $f$ and $g'$ are continuous, $g$ is one-one on $[a,b]$.

Since $g'$ exists, $g$ is continuous, and since it is one-one on $[a,b]$, it is either increasing or decreasing on that interval, say the first. Let $P=\{t_0,...,t_n\}$ be any partition of $[a,b]$, then $P$ generates a unique partition $P'=\{g(t_0),...,g(t_n)\}$ of $[g(a),g(b)]$ and vice-versa (since $g$ is one-one and increasing). Furthermore, for any $x_i \in [g(t_{i-1}),g(t_i)]$, $x_i=g(y_i)$ for some unique $y_i \in [t_{i-1},t_i]$. Therefore, for any partition $P'=\{s_0,...,s_n\}$ of $[g(a),g(b)]$, and any choices of $x_i \in [s_{i-1},s_i]$, writing $s_i=g(t_i)$ and $x_i=g(y_i)$ as above, we have:

$$\sum_{i=1}^n f(x_i)(s_i-s_{i-1})=\sum_{i=1}^n f(g(y_i))(g(t_i)-g(t_{i-1}))=\sum_{i=1}^n f(g(y_i)) \frac {(g(t_i)-g(t_{i-1}))} {t_i-t_{i-1}} (t_i-t_{i-1})$$

(This is pretty much exactly the intuition I had with the formula, all that is left to do is show that as the intervals become smaller, the sides of this equation approach the corresponding sides of the original formula). Now, for every $\epsilon > 0$:

1. By the basic property of Riemann sums, $\exists \delta' > 0$ such that, if $P'=\{s_0,...,s_n\}$ is a partition of $[g(a),g(b)]$ with $s_i-s_{i-1}<\delta'$ for $1 \leq i \leq n$, then for any choices of $x_i\in[s_{i-1},s_i]$, $$\left| \sum_{i=1}^n f(x_i)(s_i-s_{i-1})-\int_{g(a)}^{g(b)}f(u) \, du \right| <\frac \epsilon 2$$
2. Since $g$ is continuous, it is uniformly continuous on $[a,b]$, therefore $\exists \delta_1 > 0$ such that, if $t_i-t_{i-1}<\delta_1$, then $g(t_i)-g(t_{i-1})<\delta'$

3. Since $(f \circ g)$ and $g'$ are continuous, it can be shown that $\exists \delta_2>0$ such that if $P=\{t_0,...,t_n\}$ is a partition of $[a,b]$ with $t_i-t_{i-1}<\delta_2$ for $1 \leq i \leq n$, then for any choices of $y_i,z_i \in [t_{i-1},t_i]$,$$\left| \sum_{i=1}^n f(g(y_i)) g'(z_i) (t_i-t_{i-1})- \int_a^bf(g(x))\cdot g'(x) \, dx \right|<\frac \epsilon 2$$

Let $\delta = \min(\delta_1,\delta_2)$. Suppose $P=\{t_0,...,t_n\}$ is a partition of $[a,b]$ with $t_i-t_{i-1}<\delta$ for $1 \leq i \leq n$, and let $P'=\{g(t_0),...,g(t_n)\}=\{s_0,...,s_n\}$ be the corresponding partition of $[g(a),g(b)]$. By the Mean Value Theorem, $\frac {g(t_i)-g(t_{i-1})} {t_i-t_{i-1}}=g'(z_i)$ for some $z_i \in (t_{i-1},t_i)$, and by (2.), $s_i-s_{i-1}<\delta'$ for $1 \leq i \leq n$. Therefore, for any choices of $x_i=g(y_i) \in [s_{i-1},s_i]$ with $y_i \in [t_{i-1},t_i]$, by (1.): $$\left|\int_{g(a)}^{g(b)}f(u) \, du - \sum_{i=1}^n f(x_i)(s_i-s_{i-1})\right| <\frac \epsilon 2$$ and since $$\sum_{i=1}^n f(x_i)(s_i-s_{i-1})=\sum_{i=1}^n f(g(y_i)) \frac {g(t_i)-g(t_{i-1})} {t_i-t_{i-1}} (t_i-t_{i-1})=\sum_{i=1}^n f(g(y_i)) g'(z_i) (t_i-t_{i-1})$$ for $y_i,z_i \in [t_{i-1},t_i]$, by (3.): $$\left| \sum_{i=1}^n f(x_i)(s_i-s_{i-1})-\int_a^bf(g(x))\cdot g'(x) \, dx \right|<\frac \epsilon 2$$ Thus, by the triangle inequality, we have $$\left|\int_{g(a)}^{g(b)}f(u) \, du - \int_a^bf(g(x))\cdot g'(x) \, dx \right|<\epsilon$$

Since this is true for all $\epsilon > 0$, we must have:
$$\int_{g(a)}^{g(b)}f(u) \, du = \int_a^bf(g(x))\cdot g'(x) \, dx$$

The proof for when $g$ is decreasing is similar with some minor tweaks, or one can consider the functions $-g(x)$ and $f(-x)$ and use the above proof. Q.E.D.
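The Mean Value Theorem step of this argument can be checked numerically in a case where the point $z_i$ is known in closed form. The sketch below assumes $g(x)=x^2$, for which $z_i$ is exactly the midpoint of $[t_{i-1},t_i]$, and $f=\cos$. Both choices, and the function name `both_sums`, are illustrative assumptions; for a general $g$ the $z_i$ exist but are not explicit.

```python
import math

def g(x):
    return x * x

def g_prime(x):
    return 2.0 * x

def both_sums(a, b, n):
    """Return (sum f(x_i)(s_i - s_{i-1}), sum f(g(y_i)) g'(z_i)(t_i - t_{i-1}))."""
    h = (b - a) / n
    s_side = 0.0
    t_side = 0.0
    for i in range(1, n + 1):
        t_prev, t_cur = a + (i - 1) * h, a + i * h
        y = t_prev                      # tag y_i: left endpoint, so x_i = g(y_i)
        z = 0.5 * (t_prev + t_cur)      # MVT point for g(x) = x^2 (assumed form)
        s_side += math.cos(g(y)) * (g(t_cur) - g(t_prev))
        t_side += math.cos(g(y)) * g_prime(z) * (t_cur - t_prev)
    return s_side, t_side

s_side, t_side = both_sums(0.0, 2.0, 2000)
print(s_side, t_side, math.sin(4.0))
```

The two sums agree up to rounding for every `n`, since they are the same sum written in two ways; only their common distance from the integral shrinks as the partition is refined. Note that the tags `y` and `z` differ, which is why step (3.) needs two independent tags.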

So, assuming I haven't messed anything up, I've shown that the intuition matches the actual math for the case of a one-one $g$. However, things broke down for me once $g$ is no longer one-one, especially since the isomorphism between the partitions of $[a,b]$ and partitions of $[g(a),g(b)]$ no longer exists. The entire proof relies on the idea that if the distance between the points in one partition approaches 0, the distance between the corresponding points in the other partition approaches 0 as well, but I can't seem to make that happen.

If I start with a partition $P=\{t_0,...,t_n\}$ of $[a,b]$, I can create from it a partition $P'=\{s_0,...,s_n\}$ of $[g(a),g(b)]$, using least upper bounds and the Intermediate Value Theorem for example, and even make the $s_i$'s get close to each other as the $t_i$'s get close to each other, but there is no guarantee that if 2 points of $P'$ are close, then the corresponding points of $P$ are close. For example, take $g(x)=x^3-x$ and $[a,b]=[-2,2]$, so that $[g(a),g(b)]=[-6,6]$. Say I take any partition $P=\{t_0,...,t_n\}$ of $[a,b]$ and create from it a partition $P'=\{s_0,...,s_n\}$ of $[g(a),g(b)]$ where each point in $P'$ is the value of $g$ at some point in $P$. Let $t_{k_i}$ be the point in $P$ such that $s_i=g(t_{k_i})$. Suppose I create $P'$ so that $k_j>k_q$ for $j>q$, and let $t_{k_m}$ be the smallest $t_{k_i}$ satisfying $t_{k_m}\geq -\sqrt {\frac 1 3}$. If $t_{k_m} \leq 0$, then clearly $t_{k_{m+1}}>1$ since $s_{m+1}>s_m$, so $|t_{k_{m+1}}-t_{k_m}|>1$, and if $t_{k_m}>0$ then $t_{k_{m-1}}<-1$, so $|t_{k_m}-t_{k_{m-1}}|>1$. So no matter how small I make the intervals comprising $P$, there will always be at least 1 interval in $P'$ such that the distance between the corresponding points in $P$ is greater than 1. However, if I don't create $P'$ in this manner, then there is no correlation between 2 points in $P'$ becoming close and their corresponding points in $P$ becoming close, since the order of the points in $P'$ is all jumbled up...
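The gap described for $g(x)=x^3-x$ on $[-2,2]$ can be observed numerically. The sketch below uses one particular order-preserving construction, a greedy selection from a uniform partition; this construction and the names `greedy_increasing_selection` and `largest_gap` are assumptions for illustration, not the only way to build $P'$.

```python
def g(x):
    return x**3 - x

def greedy_increasing_selection(a, b, n):
    """From a uniform partition of [a, b], keep points in order so that g strictly increases."""
    h = (b - a) / n
    ts = [a + i * h for i in range(n + 1)]
    chosen = [ts[0]]
    for t in ts[1:]:
        if g(t) > g(chosen[-1]):      # keeps both the indices and the values s_i increasing
            chosen.append(t)
    return chosen

def largest_gap(points):
    """Largest distance between consecutive selected points of P."""
    return max(q - p for p, q in zip(points, points[1:]))

for n in (10, 100, 1000, 10000):
    print(n, largest_gap(greedy_increasing_selection(-2.0, 2.0, n)))
```

In this construction the largest gap stays above 1 however fine the uniform partition is, because the selection has to jump over the stretch where $g$ dips below its earlier local maximum.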

On the other hand, if I begin with a partition of $[g(a),g(b)]$ and try to create from it a partition of $[a,b]$, the situation becomes even more dire, as I can't even guarantee that the $t_i$'s will become close as the $s_i$'s become close, as is immediately evident from cases like the function $g(x)=(x-1)^2-1$ considered on the interval $[0,2.1]$.
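For $g(x)=(x-1)^2-1$ on $[0,2.1]$ the obstruction can be made concrete by computing preimages directly. The sketch below assumes a uniform partition of $[g(0),g(2.1)]=[0,0.21]$; the helper names `preimages` and `gap_after_first_point` are hypothetical.

```python
import math

def preimages(s, a=0.0, b=2.1):
    """All t in [a, b] with (t - 1)**2 - 1 == s."""
    if s < -1.0:
        return []
    r = math.sqrt(1.0 + s)
    return sorted({t for t in (1.0 - r, 1.0 + r) if a <= t <= b})

def gap_after_first_point(n):
    """Distance from t_0 = 0 to the only preimage of s_1 = 0.21 / n inside [0, 2.1]."""
    s1 = 0.21 / n
    candidates = preimages(s1)
    return candidates[0] - 0.0

for n in (10, 100, 1000):
    print(n, preimages(0.21 / n), gap_after_first_point(n))
```

Every $s>0$ has its only preimage in $[2,2.1]$, while $s_0=0$ must correspond to $t_0=0$, so the first step in $[a,b]$ has length at least 2 regardless of how small $s_1-s_0$ is.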

So this is where I've been stuck for quite a while now, until I've finally decided to ask for some help. At this point there are several things I need to clarify:

1. Is my intuition regarding the formula actually justified and founded or did I interpret this in the completely wrong way?
2. Was my proof of the case for a one-one $g$ actually correct?
3. Is there any way to extend this proof to the general case?
4. If not, is there any proof of the formula that explicitly demonstrates the intuition, or am I just wasting my time? (Hell, maybe the original proof does this and I just didn't notice).

To anyone who took the time to read through this entire thing, thank you so much, and I look forward to hearing your insight.

Math_QED

I did not read the whole thing, but what exactly isn't intuitive at the proof using chain rule and FToC?

Maybe you should investigate why the chain rule is intuitive and the FToC is intuitive and then the substitution formula will become intuitive too as a corollary?

The chain rule is intuitive to me because it can be visualised as "cancelling" fractions. I think someone recently wrote an insight about the intuition behind FToC. You might want to look into that.

Maybe you should investigate why the chain rule is intuitive and the FToC is intuitive and then the substitution formula will become intuitive too as a corollary?
This was actually one of the first things I tried. I took the time to prove all of these theorems on paper and understand each stage of the proofs (and once you prove them rigorously with $\epsilon-\delta$ arguments, the intuition practically forces itself upon you). At this point I actually understand pretty much perfectly the intuition behind both the chain rule and the FToC. However, I still cannot "express" my intuition about the substitution formula using the intuition of the chain rule and the FToC, despite the fact that these are the only theorems used in the original proof of the formula.

I did not read the whole thing, but what exactly isn't intuitive at the proof using chain rule and FToC?
For me, the original proof makes perfect sense, and I understand it, but I still view it as a clever yet roundabout way of proving the formula. The original proof essentially says "Let $F$ be a function such that $F'=f$; if we view $g(x)$ as a variable "inserted into $F$", then we get the left side of the formula. But if we instead consider the different function $F \circ g$ with $x$ as the variable instead, then we get the right side of the formula. Since it's essentially the same thing viewed from different "perspectives", the 2 sides are identical".

Now, this intuition is true, and I do appreciate its existence, but even though that is a perfectly acceptable way to view the substitution formula, by simply looking at how it is written I feel like it expresses something even more fundamental and elegant, something practically trivial. This something is a direct correlation between the sides of the equation, without even considering $F$, which stems from the basic definition (and intuition) of integrals and is what I expressed in the paragraph below the statement of the substitution formula.

So I'm trying to find a proof that expresses that very basic intuition, mainly in order to see if it is actually founded on the mathematics or is just a coincidence. I hope that clarified things a bit.

mathman

u=g(x) and du=g'(x)dx seems pretty intuitive enough.

u=g(x) and du=g'(x)dx seems pretty intuitive enough.
I mean... it is, I'm just trying to prove it. The assertion that $du = \frac {du} {dx} \cdot dx$ is what drove me to try to find the proof. I can't use this equation in a formal proof because this is just notation; it doesn't mean anything on its own, since standard calculus doesn't actually deal with "infinitely small intervals", it deals with limits. The equation is at best a simplified representation of the actual case, which is why you would never use that equation and just cancel out the $dx$'s in a formal proof. I don't want to accept that notion at face value just because it looks like it might work, which is why I set out to rigorously prove that it really does represent what is going on.

mathman

A formal proof might involve setting up Riemann sums using both representations and showing that they lead to the same limit.

FactChecker

It is often true that the details of a proof are too messy to be intuitive and that an intuitive approach is not a rigorous proof. Not all proofs are simple. You should be happy that the intuition and the proof lead to the same result.

Stephen Tashi

However, things broke down for me once $g$ is no longer one-one, especially since the isomorphism between the partitions of $[a,b]$ and partitions of $[g(a),g(b)]$ no longer exists.
Why must you make the mapping between partitions of $[a,b]$ and partitions of $[g(a),g(b)]$ depend on $g$?

For example, $m(x) = g(a) + (x-a) (g(b) - g(a))/(b-a)$ does the job.
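A minimal sketch of this affine map, assuming $a<b$ and $g(a)\neq g(b)$; the function name `make_affine_map` and the sample endpoints (taken from the $g(x)=x^3-x$ example on $[-2,2]$) are illustrative choices only.

```python
def make_affine_map(a, b, ga, gb):
    """Return m(x) = g(a) + (x - a) * (g(b) - g(a)) / (b - a)."""
    def m(x):
        return ga + (x - a) * (gb - ga) / (b - a)
    return m

# Endpoints for g(x) = x^3 - x on [-2, 2]: g(-2) = -6, g(2) = 6.
a, b = -2.0, 2.0
m = make_affine_map(a, b, -6.0, 6.0)

n = 8
partition = [a + i * (b - a) / n for i in range(n + 1)]
image = [m(t) for t in partition]

mesh = max(q - p for p, q in zip(partition, partition[1:]))
image_mesh = max(abs(q - p) for p, q in zip(image, image[1:]))
print(image)
print(mesh, image_mesh, image_mesh / mesh)
```

Because $m$ is affine, the mesh of the image partition is the original mesh times $|g(b)-g(a)|/(b-a)$, so one mesh tends to 0 exactly when the other does, whether or not $g$ is one-one.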
To anyone who took the time to read through this entire thing
That might be the null set. I haven't read your entire proof.
