Proving Injectivity of a Vector-Valued Function Using the Mean Value Theorem

mathmari · Jun 1, 2018

Hey!

I want to prove the following criterion using the mean value theorem for differential calculus in $\mathbb{R}^n$:

Let $G\subset \mathbb{R}^n$ be a convex region and let $f:G\rightarrow \mathbb{R}^n$ be continuously differentiable such that \begin{equation*}\det \begin{pmatrix}\frac{\partial{f_1}}{\partial{x_1}}(c_1) & \ldots & \frac{\partial{f_1}}{\partial{x_n}}(c_1)\\ \vdots & \ddots & \vdots \\ \frac{\partial{f_n}}{\partial{x_1}}(c_n) & \ldots & \frac{\partial{f_n}}{\partial{x_n}}(c_n)\end{pmatrix}\neq 0 \ \text{ for all } c_1, c_2, \ldots , c_n\in G.\end{equation*} Then $f$ is injective.

I have done the following:

We assume that there are $a,b\in G$ with $a\neq b$ and $f(a)=f(b)$.
From the mean value theorem for vector-valued functions, applied to $f$, it holds that \begin{align*}&f(b)-f(a)=(b-a)\int_0^1J_f(a+t(b-a))dt\ \\ & \overset{f(a)=f(b)}{\Longrightarrow} \ (b-a)\int_0^1J_f(a+t(b-a))dt=0\\ & \overset{a\neq b}{\Longrightarrow} \ \int_0^1J_f(a+t(b-a))dt=0\end{align*}

Since $G$ is convex and $a,b\in G$, it follows that $a+t(b-a)\in G$ for all $t\in[0,1]$. This implies that $J_f(a+t(b-a))\neq0$.

Is everything correct so far?

How can we conclude from this that $\int_0^1J_f(a+t(b-a))dt=0$ is impossible?

caffeinemachine · Jun 1, 2018

In general one can only show local injectivity. For example, consider the map $f:\mathbf R^2\to \mathbf R^2$ defined as $f(x, y)=(e^y\cos(x), e^y\sin(x))$. Then $f$ has non-singular derivative everywhere but $f$ is not an injective map.

Local injectivity follows from the inverse function theorem, but of course, one can establish this ab initio.
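
A quick numerical sanity check of this example, as a minimal sketch: the helper names `f` and `jacobian_det` are made up for illustration, and the partial derivatives are written out by hand. The Jacobian determinant works out to $-e^{2y}$, which never vanishes, while $f$ is $2\pi$-periodic in $x$, so for instance $f(0,0)=f(2\pi,0)$.

```python
import math

def f(x, y):
    """The map f(x, y) = (e^y cos x, e^y sin x) from the example above."""
    return (math.exp(y) * math.cos(x), math.exp(y) * math.sin(x))

def jacobian_det(x, y):
    """Determinant of the Jacobian of f at (x, y), from hand-computed partials."""
    df1_dx, df1_dy = -math.exp(y) * math.sin(x), math.exp(y) * math.cos(x)
    df2_dx, df2_dy = math.exp(y) * math.cos(x), math.exp(y) * math.sin(x)
    return df1_dx * df2_dy - df1_dy * df2_dx

# The determinant equals -e^{2y}, so it is nonzero at every sampled point.
samples = [(0.0, 0.0), (1.0, -2.0), (3.0, 0.5)]
dets = [jacobian_det(x, y) for x, y in samples]

# Yet f takes the same value at two distinct points (up to rounding).
p, q = f(0.0, 0.0), f(2 * math.pi, 0.0)
same_value = all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(p, q))
```

The comparison uses a small absolute tolerance because $\sin(2\pi)$ is not exactly $0$ in floating point.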

mathmari · Jun 2, 2018

The exercise statement says that this criterion for global invertibility has to be proved using the mean value theorem of differential calculus. So, is the word "global" wrong here?

Is the way I tried to prove the criterion completely wrong? What do I have to do then? Could you give me a hint?

caffeinemachine · Jun 2, 2018

I actually misread the problem. The rows of the matrix in the OP contain partial derivatives evaluated at possibly different points $c_1,\ldots,c_n$ of $G$. So in general it is not the Jacobian matrix of $f$ at any single point.

Given the hypothesis of the problem, global injectivity is easy. Suppose $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then for each component $f_i$ of $f$, there is a point $c_i$ on the line segment joining $a$ and $b$ such that $Df_i|_{c_i}(b-a)=0$.

(This is because of the mean value theorem in one variable. Basically, we look at the real-valued function obtained by restricting $f_i$ to the line segment joining $a$ and $b$. Since $f_i(a)=f_i(b)$, the ordinary MVT says that there is a point between $a$ and $b$ where the directional derivative of $f_i$ along $b-a$ is $0$.)

Thus we have found points $c_1, \ldots, c_n\in G$ such that the matrix in the OP sends the nonzero vector $b-a$ to $0$, contradicting the non-singularity assumption.
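
Written out in coordinates (only a restatement of the argument above, nothing new), the $n$ scalar equations, one for each $i$, stack into a single matrix equation:

$$
\begin{pmatrix}\frac{\partial f_1}{\partial x_1}(c_1) & \ldots & \frac{\partial f_1}{\partial x_n}(c_1)\\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1}(c_n) & \ldots & \frac{\partial f_n}{\partial x_n}(c_n)\end{pmatrix}\begin{pmatrix}b_1-a_1\\ \vdots \\ b_n-a_n\end{pmatrix}=\begin{pmatrix}0\\ \vdots \\ 0\end{pmatrix}
$$

Since $b-a\neq 0$, this matrix has a nontrivial kernel, so its determinant is $0$.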

mathmari · Jun 2, 2018

We assume that $f$ is not injective, i.e. that $f(a)=f(b)$ for some $a\neq b$ in $G$.

Then from the MVT, for each component $f_i$ of $f$ we have that $$f_i(b)-f_i(a)=Df_i|_{c_i}(b-a)$$ right?

Since $f(a)=f(b)$, it follows that $f_i(a)=f_i(b)$ for each $i$. That implies that $Df_i|_{c_i}(b-a)=0 \ \overset{a\neq b}{\Longrightarrow} \ Df_i|_{c_i}=0, \ \forall i$.

We consider the matrix whose $i$-th row is $Df_i|_{c_i}$. Do we get in that way the matrix from the initial post?

Since $Df_i|_{c_i}=0$ for each $i$, we get the zero matrix and so the determinant of that matrix will also be equal to $0$, a contradiction.

So, the assumption is wrong and therefore $f$ is injective. Have I understood the proof correctly?

caffeinemachine · Jun 2, 2018

mathmari said:

Since $f(a)=f(b)$, it follows that $f_i(a)=f_i(b)$ for each $i$. That implies that $Df_i|_{c_i}(b-a)=0 \ \overset{a\neq b}{\Longrightarrow} \ Df_i|_{c_i}=0, \ \forall i$.

The last implication is incorrect. If a linear map $\mathbf R^n\to \mathbf R$ evaluates to zero at a nonzero vector, that does not mean that the linear map is zero. It's just that it has a nontrivial kernel.
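
A concrete instance, chosen here only for illustration: take $T:\mathbf R^2\to \mathbf R$ with $T(x,y)=x-y$ and the nonzero vector $v=(1,1)$. Then

$$
T(v)=1-1=0, \qquad \text{but} \qquad T(1,0)=1\neq 0,
$$

so $T$ is not the zero map even though it vanishes at $v$. Its kernel is the line spanned by $(1,1)$.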

mathmari said:

We consider the matrix whose $i$-th row is $Df_i|_{c_i}$. Do we get in that way the matrix from the initial post?

Yes. If you think of $Df_i|_{c_i}$ as a vector, then the $j$-th component of this vector is $(\partial f_i/\partial x_j)|_{c_i}$.

mathmari said:

Since $Df_i|_{c_i}=0$ for each $i$, we get the zero matrix and so the determinant of that matrix will also be equal to $0$, a contradiction.

We don't get the zero matrix. We just get a singular matrix, since this matrix sends the nonzero vector $b-a$ to $0$.

mathmari · Jun 2, 2018

caffeinemachine said:

The last implication is incorrect. If a linear map $\mathbf R^n\to \mathbf R$ evaluates to zero at a nonzero vector, that does not mean that the linear map is zero. It's just that it has a nontrivial kernel.

So, it is $Df_i|_{c_i}$ at the point $b-a$ and not $Df_i|_{c_i}$ multiplied by $(b-a)$? I had misunderstood that.

What is the general formula of the MVT in this case? Isn't it that the difference of the function values at two points $a,b$, divided by the difference of $a$ and $b$, is equal to the derivative of $f$ at a point between $a$ and $b$?

caffeinemachine · Jun 2, 2018

$Df_i|_{c_i}$ is a linear map from $\mathbf R^n$ to $\mathbf R$. Its value at the point $b-a$ is $0$. When we have a linear map $T:\mathbf R^n\to \mathbf R$, and we have a vector $v\in \mathbf R^n$, what phrase do we use to refer to $Tv$? Do we say "$T$ multiplied by $v$" or do we say "$T$ at $v$"? I actually do not know. But "multiplied by" would not be my choice of terminology.

Assuming the one-variable MVT, define $g_i:[0,1]\to \mathbf R$ as $g_i(t)=f_i(a+t(b-a))$. Then $g_i(0)=g_i(1)$. Thus there is $t_i\in (0, 1)$ such that $g_i'(t_i)=0$. By the chain rule, $g_i'(t_i)=Df_i|_{a+t_i(b-a)}(b-a)$, and therefore $Df_i|_{a+t_i(b-a)}(b-a) = 0$. Write $c_i$ to denote $a+t_i(b-a)$.
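
To see the construction on a concrete non-injective map, here is a small sketch using $f(x,y)=(e^y\cos(x), e^y\sin(x))$ from earlier in the thread with $a=(0,0)$ and $b=(2\pi,0)$, so that $f(a)=f(b)$. The choice of $a$ and $b$ and the helper names are assumptions made for illustration. In this case $g_1(t)=\cos(2\pi t)$ and $g_2(t)=\sin(2\pi t)$, which gives $t_1=1/2$ and $t_2=1/4$.

```python
import math

# Non-injective example from earlier in the thread, with f(a) = f(b).
a = (0.0, 0.0)
b = (2 * math.pi, 0.0)
d = (b[0] - a[0], b[1] - a[1])  # the direction b - a

def grad_f1(x, y):
    """Gradient of f_1(x, y) = e^y cos x."""
    return (-math.exp(y) * math.sin(x), math.exp(y) * math.cos(x))

def grad_f2(x, y):
    """Gradient of f_2(x, y) = e^y sin x."""
    return (math.exp(y) * math.cos(x), math.exp(y) * math.sin(x))

# g_1'(1/2) = 0 and g_2'(1/4) = 0, found by hand for this example.
t1, t2 = 0.5, 0.25
c1 = (a[0] + t1 * d[0], a[1] + t1 * d[1])  # (pi, 0)
c2 = (a[0] + t2 * d[0], a[1] + t2 * d[1])  # (pi/2, 0)

# Row i of the matrix is the gradient of f_i evaluated at c_i.
M = [grad_f1(*c1), grad_f2(*c2)]

# M applied to b - a, and det M; both vanish up to rounding.
M_times_d = [row[0] * d[0] + row[1] * d[1] for row in M]
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
```

So for this map the matrix with rows evaluated at $c_1$ and $c_2$ is singular: the hypothesis of the criterion fails, which is consistent with $f$ not being injective.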

Does this make things clear?

mathmari · Jun 2, 2018

So, $Df_i|_{a+t_i(b-a)}(b-a)$ is the dot product of the gradient $Df_i|_{a+t_i(b-a)}$ and the vector $(b-a)$. Or am I still getting this wrong?

Because, isn't it as follows?

$$g_i'(t_i)=\frac{\partial}{\partial{t_i}}f_i(a+t(b-a))=\frac{\partial{f_i}}{\partial{x_i}}\cdot \frac{\partial{(a+t(b-a))_i}}{\partial{t_i}}$$

caffeinemachine · Jun 2, 2018

No, it should be
$$g_i'(t_i)=\left.\frac{d}{dt}f_i(a+t(b-a))\right|_{t_i}= Df_i|_{a+t_i(b-a)}(b-a)$$

The last expression is the same as

$$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$
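
The formula can be checked numerically. The sketch below compares a central-difference approximation of $g'(t_0)$ with the sum of partial derivatives times $(b_j-a_j)$. The component function $e^y\cos(x)$, the points `a` and `b`, and the value `t0` are arbitrary choices made for illustration.

```python
import math

def f1(x, y):
    """Illustrative component function f_1(x, y) = e^y cos x."""
    return math.exp(y) * math.cos(x)

def grad_f1(x, y):
    """Hand-computed partial derivatives of f1 with respect to x and y."""
    return (-math.exp(y) * math.sin(x), math.exp(y) * math.cos(x))

a = (0.2, -0.3)
b = (1.1, 0.8)
d = (b[0] - a[0], b[1] - a[1])  # b - a

def g(t):
    """Restriction of f1 to the segment from a to b."""
    return f1(a[0] + t * d[0], a[1] + t * d[1])

t0 = 0.4
h = 1e-6
# Central-difference approximation of g'(t0).
numeric = (g(t0 + h) - g(t0 - h)) / (2 * h)

# Chain-rule value: sum over j of (partial f1 / partial x_j) at a + t0 (b - a), times (b_j - a_j).
p = (a[0] + t0 * d[0], a[1] + t0 * d[1])
chain_rule = sum(gj * dj for gj, dj in zip(grad_f1(*p), d))
```

The two values agree up to the discretization error of the finite difference.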

mathmari · Jun 2, 2018

But in $$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$ isn't $(b_j-a_j)$ multiplied by the partial derivative? It is not that the derivative is evaluated at $(b_j-a_j)$, is it?

And so in $Df_i|_{a+t_i(b-a)}(b-a)$ we have the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?

caffeinemachine · Jun 2, 2018

mathmari said:

But in $$
\sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{a+t_i(b-a)}(b_j-a_j)
$$ isn't $(b_j-a_j)$ multiplied by the partial derivative? It is not that the derivative is evaluated at $(b_j-a_j)$, is it?

Indeed, $b_j-a_j$ is multiplied by the $j$-th partial derivative of $f_i$ in the above expression.

mathmari said:

And so in $Df_i|_{a+t_i(b-a)}(b-a)$ we have the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?

I suppose your question was: "Is $Df_i|_{a+t_i(b-a)}(b-a)$ the same as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$, or not?"

Well, strictly speaking, I'd say no. $Df_i|_{a+t_i(b-a)}$ is a linear map $\mathbf R^n\to \mathbf R$ and $b-a$ is a vector in $\mathbf R^n$. One cannot take the dot product of a linear map with a vector in its domain. But since $\mathbf R^n$ has a standard inner product, $Df_i|_{a+t_i(b-a)}$ can be thought of as a vector, namely the gradient of $f_i$ at that point. Once this identification is made, one can think of $Df_i|_{a+t_i(b-a)}(b-a)$ as the dot product of $Df_i|_{a+t_i(b-a)}$ and $(b-a)$.

mathmari · Jul 7, 2018

Ah ok! Thanks a lot!
