Proving Injectivity of a Vector-Valued Function Using the Mean Value Theorem

  • Context: MHB 
  • Thread starter: mathmari
  • Tags: Injective

Discussion Overview

The discussion revolves around proving the injectivity of a vector-valued function using the mean value theorem in the context of differential calculus in $\mathbb{R}^n$. Participants explore the implications of the mean value theorem for continuously differentiable functions and the conditions under which a function can be shown to be injective, particularly focusing on local versus global injectivity.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that if the determinant of the Jacobian matrix is non-zero for all points in a convex region, then the function is injective.
  • Others argue that the proof may only establish local injectivity, as illustrated by a counterexample involving the function $f(x, y)=(e^y\cos(x), e^y\sin(x))$, which has a non-singular derivative everywhere but is not injective.
  • A later reply withdraws the local-injectivity objection after noting that the matrix in the original post has each row evaluated at a different point, so it is not the Jacobian of $f$ at any single point and the hypothesis is strictly stronger than pointwise non-singularity.
  • Some participants clarify that the mean value theorem can be applied to each component of the vector-valued function, leading to a contradiction if injectivity is assumed.
  • There is a discussion about the implications of a linear map evaluating to zero at a non-zero vector, indicating that this does not imply the map itself is zero.

Areas of Agreement / Disagreement

Participants initially disagree over whether the criterion yields global injectivity: one reply argues that in general only local injectivity can be shown, citing the counterexample above. After a re-reading of the hypothesis, in which each row of the matrix is evaluated at an independently chosen point of $G$, the objection is withdrawn, and the thread converges on a proof that applies the one-variable mean value theorem to each component of $f$.

Contextual Notes

The key subtlety is that the hypothesis is stronger than non-singularity of the Jacobian at each individual point: the determinant condition must hold even when the rows are evaluated at different points $c_1, \ldots, c_n\in G$. This extra strength is what separates the global criterion from the merely local injectivity guaranteed by the inverse function theorem, and the discussion highlights the need for care in distinguishing the two.

mathmari
Hey! :o

I want to prove the following criterion using the mean value theorem for differential calculus in $\mathbb{R}^n$:

Let $G\subset \mathbb{R}^n$ be a convex region, let $f:G\rightarrow \mathbb{R}^n$ be continuously differentiable, and suppose that \begin{equation*}\det \begin{pmatrix}\frac{\partial{f_1}}{\partial{x_1}}(c_1) & \ldots & \frac{\partial{f_1}}{\partial{x_n}}(c_1)\\ \vdots & \ddots & \vdots \\ \frac{\partial{f_n}}{\partial{x_1}}(c_n) & \ldots & \frac{\partial{f_n}}{\partial{x_n}}(c_n)\end{pmatrix}\neq 0 \ \text{ for all } c_1, c_2, \ldots , c_n\in G.\end{equation*} Then $f$ is injective. I have done the following:

We assume that there are $a,b\in G$ with $f(a)=f(b)$.
From the mean value theorem for vector-valued functions, such as $f$, it holds that \begin{align*}&f(b)-f(a)=\left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a) \\ & \overset{f(a)=f(b)}{\Longrightarrow} \ \left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a)=0\\ & \overset{a\neq b}{\Longrightarrow} \ \int_0^1J_f(a+t(b-a))\,dt=0\end{align*}

Since $G$ is convex and $a,b\in G$, it follows that $a+t(b-a)\in G$ for every $t\in[0,1]$. Taking $c_1=\cdots=c_n=a+t(b-a)$ in the hypothesis, this implies that $J_f(a+t(b-a))$ is non-singular for every such $t$.

Is everything correct so far? (Wondering)

How can we conclude from this that $\int_0^1J_f(a+t(b-a))\,dt=0$ is impossible? (Wondering)
 
mathmari said:
Hey! :o

I want to prove the following criterion using the mean value theorem for differential calculus in $\mathbb{R}^n$:

Let $G\subset \mathbb{R}^n$ be a convex region, let $f:G\rightarrow \mathbb{R}^n$ be continuously differentiable, and suppose that \begin{equation*}\det \begin{pmatrix}\frac{\partial{f_1}}{\partial{x_1}}(c_1) & \ldots & \frac{\partial{f_1}}{\partial{x_n}}(c_1)\\ \vdots & \ddots & \vdots \\ \frac{\partial{f_n}}{\partial{x_1}}(c_n) & \ldots & \frac{\partial{f_n}}{\partial{x_n}}(c_n)\end{pmatrix}\neq 0 \ \text{ for all } c_1, c_2, \ldots , c_n\in G.\end{equation*} Then $f$ is injective. I have done the following:

We assume that there are $a,b\in G$ with $f(a)=f(b)$.
From the mean value theorem for vector-valued functions, such as $f$, it holds that \begin{align*}&f(b)-f(a)=\left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a) \\ & \overset{f(a)=f(b)}{\Longrightarrow} \ \left(\int_0^1J_f(a+t(b-a))\,dt\right)(b-a)=0\\ & \overset{a\neq b}{\Longrightarrow} \ \int_0^1J_f(a+t(b-a))\,dt=0\end{align*}

Since $G$ is convex and $a,b\in G$, it follows that $a+t(b-a)\in G$ for every $t\in[0,1]$. Taking $c_1=\cdots=c_n=a+t(b-a)$ in the hypothesis, this implies that $J_f(a+t(b-a))$ is non-singular for every such $t$.

Is everything correct so far? (Wondering)

How can we conclude from this that $\int_0^1J_f(a+t(b-a))\,dt=0$ is impossible? (Wondering)
In general one can only show local injectivity. For example, consider the map $f:\mathbf R^2\to \mathbf R^2$ defined as $f(x, y)=(e^y\cos(x), e^y\sin(x))$. Then $f$ has non-singular derivative everywhere but $f$ is not an injective map.

Local injectivity follows from the inverse function theorem, but of course, one can establish this ab initio.
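
A quick numerical sanity check of both claims (a minimal sketch in Python with NumPy; the sample points are arbitrary):

```python
import numpy as np

def f(x, y):
    # f(x, y) = (e^y cos x, e^y sin x)
    return np.array([np.exp(y) * np.cos(x), np.exp(y) * np.sin(x)])

def jacobian(x, y):
    # rows are the gradients of the two components of f
    return np.array([[-np.exp(y) * np.sin(x), np.exp(y) * np.cos(x)],
                     [ np.exp(y) * np.cos(x), np.exp(y) * np.sin(x)]])

# Not injective: f is 2*pi-periodic in x, so distinct points collide.
print(f(0.0, 0.0))        # [1. 0.]
print(f(2 * np.pi, 0.0))  # [1. 0.]  -- same value, different point

# Yet det J_f = -e^{2y} never vanishes.
for x, y in [(0.0, 0.0), (1.0, -2.0), (3.0, 0.5)]:
    print(np.linalg.det(jacobian(x, y)))  # always nonzero
```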
 
caffeinemachine said:
In general one can only show local injectivity. For example, consider the map $f:\mathbf R^2\to \mathbf R^2$ defined as $f(x, y)=(e^y\cos(x), e^y\sin(x))$. Then $f$ has non-singular derivative everywhere but $f$ is not an injective map.

Local injectivity follows from the inverse function theorem, but of course, one can establish this ab initio.

The exercise statement says that this criterion of global injectivity has to be proved using the mean value theorem of differential calculus. So, is the word "global" wrong here?

Is the way I proved that criterion completely wrong? What do I have to do then? Could you give me a hint?

(Wondering)
 
mathmari said:
The exercise statement says that this criterion of global injectivity has to be proved using the mean value theorem of differential calculus. So, is the word "global" wrong here?

Is the way I proved that criterion completely wrong? What do I have to do then? Could you give me a hint?

(Wondering)
I actually misread the problem. The matrix you have in the OP has each of its rows evaluated at a different point of $G$. So it's not the Jacobian matrix of $f$ at any single point.

Given the hypothesis of the problem, global injectivity is easy. Suppose $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then for each component $f_i$ of $f$, we have a point $c_i$ on the line joining $a$ and $b$ such that $Df_i|_{c_i}(b-a)=0$.

(This is because of the mean value theorem in one variable. Basically, we look at the real-valued function obtained by restricting $f_i$ along the line joining $a$ and $b$. The ordinary MVT says that there is a point between $a$ and $b$ where the directional derivative of $f_i$ along $b-a$ is $0$).

Thus we have found points $c_1, \ldots, c_n\in G$ such that the matrix that you have in the OP sends $b-a$ to $0$, contradicting the non-singularity assumption.
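
To spell out the contradiction: stacking the rows $Df_i|_{c_i}$ gives \begin{equation*}\begin{pmatrix}Df_1|_{c_1}\\ \vdots \\ Df_n|_{c_n}\end{pmatrix}(b-a)=\begin{pmatrix}Df_1|_{c_1}(b-a)\\ \vdots \\ Df_n|_{c_n}(b-a)\end{pmatrix}=0,\end{equation*} so the nonzero vector $b-a$ lies in the kernel of the matrix from the OP, forcing its determinant to be $0$.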
 
caffeinemachine said:
Given the hypothesis of the problem, global injectivity is easy. Suppose $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then for each component $f_i$ of $f$, we have a point $c_i$ on the line joining $a$ and $b$ such that $Df_i|_{c_i}(b-a)=0$.

(This is because of the mean value theorem in one variable. Basically, we look at the real-valued function obtained by restricting $f_i$ along the line joining $a$ and $b$. The ordinary MVT says that there is a point between $a$ and $b$ where the directional derivative of $f_i$ along $b-a$ is $0$).

Thus we have found points $c_1, \ldots, c_n\in G$ such that the matrix that you have in the OP sends $b-a$ to $0$, contradicting the non-singularity assumption.

We assume that $f$ is not injective, i.e. that $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then from the MVT for each component $f_i$ of $f$ we have that $$f_i(b)-f_i(a)=Df_i|_{c_i}(b-a)$$ right? (Wondering)

Since $f(a)=f(b)$ it follows that $f_i(a)=f_i(b)$ for each $i$. That implies that $Df_i|_{c_i}(b-a)=0 \ \overset{a\neq b}{\Longrightarrow} \ Df_i|_{c_i}=0, \ \forall i$.

We consider the matrix whose $i$-th row is $Df_i|_{c_i}$. Do we get in that way the matrix as in the initial post? (Wondering)

Since $Df_i|_{c_i}=0$ for each $i$, we get the zero matrix and so the determinant of that matrix will also be equal to $0$, a contradiction.

So, the assumption is wrong and therefore $f$ is injective. Have I understood the proof correctly? (Wondering)
 
mathmari said:
Since $f(a)=f(b)$ it follows that $f_i(a)=f_i(b)$ for each $i$. That implies that $Df_i|_{c_i}(b-a)=0 \ \overset{a\neq b}{\Longrightarrow} \ Df_i|_{c_i}=0, \ \forall i$.
The last implication is incorrect. If a linear map $\mathbf R^n\to \mathbf R$ evaluates to zero at a nonzero vector, that does not mean that the linear map is zero. It just means that it has a nontrivial kernel.
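For instance, the nonzero linear map $T:\mathbf R^2\to \mathbf R$ given by $T(x,y)=x-y$ satisfies $T(1,1)=0$; its kernel is the whole line $x=y$.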

mathmari said:
We consider the matrix whose $i$-th row is $Df_i|_{c_i}$. Do we get in that way the matrix as in the initial post? (Wondering)
Yes. If you think of $Df_i|_{c_i}$ as a vector, then the $j$-th component of this vector is $(\partial f_i/\partial x_j)|_{c_i}$.

mathmari said:
Since $Df_i|_{c_i}=0$ for each $i$, we get the zero matrix and so the determinant of that matrix will also be equal to $0$, a contradiction.

We don't get the zero matrix. We just get a singular matrix, since this matrix sends the nonzero vector $b-a$ to $0$.
 
caffeinemachine said:
The last implication is incorrect. If a linear map $\mathbf R^n\to \mathbf R$ evaluates to zero at a nonzero vector, that does not mean that the linear map is zero. It just means that it has a nontrivial kernel.

So, it is $Df_i|_{c_i}$ at the point $b-a$ and not $Df_i|_{c_i}$ multiplied by $(b-a)$ ? I had misunderstood that.

Which is the general formula of the MVT in this case? Isn't it the difference of the function at two points $a,b$, divided by the difference of $a$ and $b$, and this equals the derivative of $f$ at a point between $a$ and $b$? (Wondering)
 
mathmari said:
So, it is $Df_i|_{c_i}$ at the point $b-a$ and not $Df_i|_{c_i}$ multiplied by $(b-a)$ ? I had misunderstood that.

Which is the general formula of the MVT in this case? Isn't it the difference of the function at two points $a,b$, divided by the difference of $a$ and $b$, and this equals the derivative of $f$ at a point between $a$ and $b$? (Wondering)

$Df_i|_{c_i}$ is a linear map from $\mathbf R^n$ to $\mathbf R$. Its value at the point $b-a$ is $0$. When we have a linear map $T:\mathbf R^n\to \mathbf R$, and we have a vector $v\in \mathbf R^n$, what phrase do we use to refer to $Tv$? Do we say "$T$ multiplied by $v$" or do we say "$T$ at $v$"? I actually do not know. But "multiplied by" would not be my choice of terminology.

Assuming the one-variable MVT, define $g_i:\mathbf R\to \mathbf R$ as $g_i(t)=f_i(a+t(b-a))$. Then $g_i(0)=g_i(1)$. Thus there is $t_i\in (0, 1)$ such that $g_i'(t_i)=0$. Therefore $Df_i|_{a+t_i(b-a)}(b-a) = 0$. Write $c_i$ to denote $a+t_i(b-a)$.

Does this make things clear?
 
caffeinemachine said:
$Df_i|_{c_i}$ is a linear map from $\mathbf R^n$ to $\mathbf R$. Its value at the point $b-a$ is $0$. When we have a linear map $T:\mathbf R^n\to \mathbf R$, and we have a vector $v\in \mathbf R^n$, what phrase do we use to refer to $Tv$? Do we say "$T$ multiplied by $v$" or do we say "$T$ at $v$"? I actually do not know. But "multiplied by" would not be my choice of terminology.

Assuming the one-variable MVT, define $g_i:\mathbf R\to \mathbf R$ as $g_i(t)=f_i(a+t(b-a))$. Then $g_i(0)=g_i(1)$. Thus there is $t_i\in (0, 1)$ such that $g_i'(t_i)=0$. Therefore $Df_i|_{a+t_i(b-a)}(b-a) = 0$. Write $c_i$ to denote $a+t_i(b-a)$.

Does this make things clear?
So, $Df_i|_{a+t_i(b-a)}(b-a)$ is the dot product of the gradient $Df_i|_{a+t_i(b-a)}$ and the vector $(b-a)$. Or am I still thinking wrong? (Wondering)

Because, isn't it as follows?

$$g_i'(t_i)=\frac{\partial}{\partial{t_i}}f_i(a+t(b-a))=\frac{\partial{f_i}}{\partial{x_i}}\cdot \frac{\partial{(a+t(b-a))_i}}{\partial{t_i}}$$
 
  • #10
mathmari said:
So, $Df_i|_{a+t_i(b-a)}(b-a)$ is the dot product of the gradient $Df_i|_{a+t_i(b-a)}$ and the vector $(b-a)$. Or am I still thinking wrong? (Wondering)

Because, isn't it as follows?

$$g_i'(t_i)=\frac{\partial}{\partial{t_i}}f_i(a+t(b-a))=\frac{\partial{f_i}}{\partial{x_i}}\cdot \frac{\partial{(a+t(b-a))_i}}{\partial{t_i}}$$

No, it should be
$$g_i'(t_i)=\left.\frac{d}{dt}f_i(a+t(b-a))\right|_{t_i}= Df_i|_{a+t_i(b-a)}(b-a)$$

The last term is the same as

$$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$
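
A small symbolic check of this chain-rule identity (a minimal sketch in Python with SymPy; the component $f_i$ and the endpoints $a,b$ below are arbitrary choices for illustration):

```python
import sympy as sp

t = sp.symbols('t')
x1, x2 = sp.symbols('x1 x2')

# Hypothetical component f_i and endpoints a, b, chosen only to illustrate.
f_i = x1**2 + sp.sin(x2)
a = (1, 0)
b = (3, 2)

# Restriction to the segment: g_i(t) = f_i(a + t(b - a)).
line = {x1: a[0] + t * (b[0] - a[0]), x2: a[1] + t * (b[1] - a[1])}
g = f_i.subs(line)

# Chain rule: g_i'(t) = sum_j (d f_i / d x_j)|_{a+t(b-a)} * (b_j - a_j).
lhs = sp.diff(g, t)
rhs = sum(sp.diff(f_i, xj).subs(line) * (bj - aj)
          for xj, aj, bj in [(x1, a[0], b[0]), (x2, a[1], b[1])])

print(sp.simplify(lhs - rhs))  # prints 0, confirming the identity
```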
 
  • #11
caffeinemachine said:
No, it should be
$$g_i'(t_i)=\left.\frac{d}{dt}f_i(a+t(b-a))\right|_{t_i}= Df_i|_{a+t_i(b-a)}(b-a)$$

The last term is the same as

$$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$

But at $$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$ isn't $(b_j-a_j)$ multiplied by the derivative? It is not that the derivative is evaluated at $(b_j-a_j)$, is it?

And so at $Df_i|_{a+t_i(b-a)}(b-a)$ we have the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?

(Wondering)
 
  • #12
mathmari said:
But at $$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$ isn't $(b_j-a_j)$ multiplied by the derivative? It is not that the derivative is evaluated at $(b_j-a_j)$, is it?
Indeed, $b_j-a_j$ is multiplied by the $j$-th partial derivative in the above expression.

mathmari said:
And so at $Df_i|_{a+t_i(b-a)}(b-a)$ we have the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?

I suppose your question was " Is $Df_i|_{a+t_i(b-a)}(b-a)$ same as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not? "

Well, strictly speaking, I'd say no. $Df_i|_{a+t_i(b-a)}$ is a linear map $\mathbf R^n\to \mathbf R$ and $b-a$ is a vector in $\mathbf R^n$. One cannot take the dot product of a linear operator with a vector in its domain. But since $\mathbf R^n$ has a standard inner product, $Df_i|_{a+t_i(b-a)}$ can be thought of as a vector.
Once this identification is made, one can think of $Df_i|_{a+t_i(b-a)}(b-a)$ as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$.
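In symbols: writing $\nabla f_i(p)=\left(\frac{\partial f_i}{\partial x_1}(p), \ldots, \frac{\partial f_i}{\partial x_n}(p)\right)$ for that vector, the identification reads $Df_i|_p(v)=\langle \nabla f_i(p), v\rangle$ for every $v\in \mathbf R^n$.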
 
  • #13
caffeinemachine said:
Indeed, $b_j-a_j$ is multiplied by the $j$-th partial derivative in the above expression.
I suppose your question was " Is $Df_i|_{a+t_i(b-a)}(b-a)$ same as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not? "

Well, strictly speaking, I'd say no. $Df_i|_{a+t_i(b-a)}$ is a linear map $\mathbf R^n\to \mathbf R$ and $b-a$ is a vector in $\mathbf R^n$. One cannot take the dot product of a linear operator with a vector in its domain. But since $\mathbf R^n$ has a standard inner product, $Df_i|_{a+t_i(b-a)}$ can be thought of as a vector.
Once this identification is made, one can think of $Df_i|_{a+t_i(b-a)}(b-a)$ as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$.

Ah ok! Thanks a lot! (Smile)
 
