Want to understand proof of Dimension/Rank-Nullity theorem

SUMMARY

The discussion centers on the proof of the Rank-Nullity Theorem, which states that for a linear transformation \( f: V \to W \) between finite-dimensional vector spaces, the equation \( \text{dim}(V) = \text{nullity}(f) + \text{rank}(f) \) holds. The participants clarify the definitions of kernel and image, emphasizing that the kernel is a subspace of \( V \) and that the image is spanned by the transformed basis vectors. Key steps in the proof involve demonstrating the linear independence of the image basis and showing that every element in the image can be expressed as a linear combination of these basis vectors.

PREREQUISITES
  • Understanding of linear transformations and vector spaces
  • Familiarity with the concepts of kernel and image in linear algebra
  • Knowledge of linear independence and basis of vector spaces
  • Proficiency in mathematical notation and proofs
NEXT STEPS
  • Study the formal proof of the Rank-Nullity Theorem in linear algebra textbooks
  • Learn about the properties of linear transformations and their implications
  • Explore the concept of basis and dimension in vector spaces
  • Practice proving the linear independence of sets of vectors
USEFUL FOR

Students of linear algebra, mathematics educators, and anyone seeking to deepen their understanding of vector space theory and linear transformations.

Emspak
OK, I am working on proofs of the rank-nullity theorem (otherwise known in my class as the dimension theorem).

Here's a proof that my professor gave in the class. I want to be sure I understand the reasoning. So I will lay out what he had here with a less-precise layman's wording, as I want to be sure I know what I am doing. It makes the proof easier to memorize for me.

So:

Let V and W be vector spaces.

T:V→W is linear
V is finite-dimensional
a function f \in \mathrm{Hom}_K(V,W)
Let dim(V) = n for some n \in \mathbb N and dim(ker(f)) = r

dim(V) = nullity(T) + rank(T) = dim(ker(f)) + dim(Im(f))

in some notations (like the one in our text) this would look like dim(V) = nullity(T) + rank(T) = dim(N(T)) + dim(R(T))
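
As a sanity check on the statement before the proof, here is a minimal numerical sketch. It assumes SymPy, and the matrix A is an arbitrary made-up example standing in for the linear map f; nothing about it comes from the thread itself.

    from sympy import Matrix

    # A hypothetical linear map f : K^4 -> K^3, represented by a 3x4 matrix.
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 0],
                [1, 3, 1, 1]])

    n = A.cols                    # dim(V) = 4
    rank = A.rank()               # dim(Im(f)): number of independent columns
    nullity = len(A.nullspace())  # dim(ker(f)): size of a basis of the kernel

    assert n == rank + nullity    # rank-nullity: 4 == 2 + 2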

on to the proof:

ker(f) \subseteq V. And it is a subspace.

Why a subspace? Because, since the kernel of any function is the set of vectors that goes to zero, adding to those vectors another vector in V will still be in V, as will multiplying them (since they go to zero).

since we let dim(V) = n, all the bases of V will have n elements.

therefore \exists a basis \{x_1, x_2, \dots, x_n\} of ker(f) where r ≤ n.

(The reason is that any basis of a subspace will have no more elements than the dimension of the space containing it. ker(f) is a subspace).

by the exchange lemma, which says that given any linearly independent subset,
\exists \{y_1, y_2, \dots, y_s\} \subseteq V such that \{y_1, y_2, \dots, y_s\} \cap \{x_1, x_2, \dots, x_r\} = \varnothing. The next step says that \{y_1, y_2, \dots, y_s\} \cup \{x_1, x_2, \dots, x_r\} is a basis of V.

Now, my question: is that because the intersection of the two sets is the empty set and they are linearly independent?

After that, we get to saying that \{f(y_1), f(y_2), \dots, f(y_s)\} is a basis of Im(f).

But I am not sure why that is.

He then says we can claim the following:
\lambda_1 f(y_1) + \lambda_2 f(y_2) + \dots + \lambda_s f(y_s) = 0

for some \lambda_1, \lambda_2, \dots, \lambda_s \in K

so taking
f \Big( \sum_{i=1}^s x_i y_i \Big) = 0
we can make that into
\Big[ \Big( \sum_{i=1}^s \lambda_i f(y_i) \Big) \Big] = \sum_{i=1}^s \lambda_i y_i \in ker(f)

That step I am a bit fuzzy on the reasoning. IIUC, it's just saying that taking the sum of f using the union of the x and y sets equals zero (it's just f(x,y)) and the summation of the product of λ and all the f(y) terms is the same as the sum of all the λy terms and they are all in the kernel of f. But I wanted to be sure.

He then said that the above implies that there exists some set of scalars, α1, α2, ... αs\in K s.t. \sum_{i=1}^s \lambda_i y_i = \sum_{j=1}^r y_i x_j and that further implies

\sum_{j=1}^r \alpha_j x_j - \sum_{i=1}^s \alpha_i x_i = 0 which implies αj, λi = 0 for all 1≤j≤r and 1≤i≤s.

and that further implies that the set {f(y1), f(y2), ... ,f(ys)} is linearly independent.

Then he says: for all z in the Im(f) there exists x\in V s.t. z=f(x) (this seems obvious at one level but I felt like it was just sleight of hand).

then

z = f \Big(\sum_{j=1}^r \alpha_j x_j - \sum_{i=1}^s \lambda_i y_i \Big) = \sum_{j=1}^r \alpha_j f(x_i) + \sum_{i=1}^s x_i f(y_i)= 0 + \sum_{i=1}^s x_i f(y_i)

and then he says dim(V) = r + s = dim(ker(f)) + dim(Im(f))

It's the last few steps I can't seem to justify in my head. Any help would be appreciated (and seeing if I copied this wrong from the board).
 
Last edited:
There are a number of transcription errors in your post.

Emspak said:
OK, I am working on proofs of the rank-nullity theorem (otherwise known in my class as the dimension theorem).

Here's a proof that my professor gave in the class. I want to be sure I understand the reasoning. So I will lay out what he had here with a less-precise layman's wording,

That's a bad idea. Precision is vitally important in mathematics.

So:

Let V and V be vector spaces.

V and W.

T:V→W is linear
V is finite-dimensional
a function f \in \mathrm{Hom}_K(V,W)

You need a linear map V \to W. You've defined two: T and f. Since you don't mention T again I'll stick with f.

Let dim(V) = n for some n \in \mathbb N and dim(ker(f)) = r
dim(V) = nullity(T) + rank(T) = dim(ker(f)) + dim(Im(f))

A precise statement of the theorem is:

Let V and W be vector spaces over a field K, and let f : V \to W be linear. Let V be finite dimensional with n = \dim V and r = \dim \ker(f). Then \dim \mathrm{Im}(f) = n - r.


on to the proof:

ker(f) \subseteq V. And it is a subspace.

Why a subspace? Because, since the kernel of any function is the set of vectors that goes to zero, adding to those vectors another vector in V will still be in V, as will multiplying them (since they go to zero).

The proof that \ker(f) is a subspace should be the subject of a separate lemma; otherwise it makes no sense to talk about \dim \ker(f).
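
(For reference, that lemma is a one-line consequence of linearity: if u, v \in \ker(f) and \alpha \in K, then f(u + \alpha v) = f(u) + \alpha f(v) = 0 + \alpha \cdot 0 = 0, so u + \alpha v \in \ker(f); and \ker(f) is non-empty because f(0) = 0.)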

since we let dim(V) = n, all the bases of V will have n elements.

therefore \exists a basis \{x_1, x_2, \dots, x_n\} of ker(f) where r ≤ n.

(The reason is that any basis of a subspace will have no more elements than the dimension of the space containing it. ker(f) is a subspace).

So far so good, but obviously the index should only run from 1 to r: \{x_1, \dots, x_r\} is the basis of \ker(f).

by the exchange lemma, which says that given any linearly independent subset,
\exists \{y_1, y_2, \dots, y_s\} \subseteq V such that \{y_1, y_2, \dots, y_s\} \cap \{x_1, x_2, \dots, x_r\} = \varnothing. The next step says that \{y_1, y_2, \dots, y_s\} \cup \{x_1, x_2, \dots, x_r\} is a basis of V.

Now, my question: is that because the intersection of the two sets is the empty set and they are linearly independent?

It's the result of applying the exchange lemma: Given a basis for a subspace of V and a basis for V, you can obtain a basis of V which contains the basis of the subspace. The notation of the statement is probably going to be confusing here, but the key point is that you can apply the exchange lemma to obtain a basis \{x_1, \dots, x_r, y_1, \dots, y_s\} of V consisting of a basis \{x_1, \dots, x_r\} of \ker (f) and s = n - r vectors \{y_1, \dots, y_s\} which are not in \ker(f).
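
As a computational sketch of that extension step (this greedy loop is just one way to realize the exchange lemma, assuming SymPy and an arbitrary example matrix):

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 0],
                [1, 3, 1, 1]])
    n = A.cols

    # Start from a basis {x_1, ..., x_r} of ker(f).
    xs = A.nullspace()

    # Greedily append standard basis vectors of V that enlarge the span;
    # the vectors kept here play the role of {y_1, ..., y_s}.
    ys = []
    for j in range(n):
        e = eye(n)[:, j]
        if Matrix.hstack(*(xs + ys + [e])).rank() > len(xs) + len(ys):
            ys.append(e)

    assert Matrix.hstack(*(xs + ys)).rank() == n  # the x's and y's together form a basis of V
    assert len(ys) == n - len(xs)                 # s = n - r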

After that, we get to saying that \{f(y_1), f(y_2), \dots, f(y_s)\} is a basis of Im(f).

But I am not sure why that is.

This is an assertion. It requires proof.

Why prove that? You're trying ultimately to prove that \dim \mathrm{Im}(f) = s. The way to do that is to prove that the s vectors f(y_1), \dots, f(y_s) are a basis of \mathrm{Im}(f), since then by definition \dim \mathrm{Im}(f) = s.

How do you do that? You need to show that every element of \mathrm{Im}(f) can be expressed as a linear combination of those vectors, and that those vectors are linearly independent.

Your professor has decided to switch the order of those, so he first shows that those vectors are linearly independent. He does so by contradiction: he assumes that they are not linearly independent, and shows that this means that the basis of V we've chosen is not linearly independent. That's not possible by definition of a basis, so he has the desired contradiction.

He then says we can claim the following:
\lambda_1 f(y_1) + \lambda_2 f(y_2) + \dots + \lambda_s f(y_s) = 0

for some \lambda_1, \lambda_2, \dots, \lambda_s \in K

But the crucial point is that you must assume that at least one \lambda_i is not zero.

so taking
f \Big( \sum_{i=1}^s x_i y_i \Big) = 0
we can make that into
\Big[ \Big( \sum_{i=1}^s \lambda_i f(y_i) \Big) \Big] = \sum_{i=1}^s \lambda_i y_i \in ker(f)

That step I am a bit fuzzy on the reasoning. IIUC, it's just saying that taking the sum of f using the union of the x and y sets equals zero (it's just f(x,y)) and the summation of the product of λ and all the f(y) terms is the same as the sum of all the λy terms and they are all in the kernel of f. But I wanted to be sure.

He's using the linearity of f to show that
0 = \sum_{i = 1}^s \lambda_i f(y_i) = f\left(\sum_{i = 1}^s \lambda_i y_i\right).
The immediate consequence of that is that \sum_{i = 1}^s \lambda_i y_i \in \ker(f).


He then said that the above implies that there exists some set of scalars, α1, α2, ... αs\in K s.t. \sum_{i=1}^s \lambda_i y_i = \sum_{j=1}^r y_i x_j
Yes: \sum_{i = 1}^s \lambda_i y_i \in \ker(f), so it can be written as a linear combination of basis vectors of \ker(f):
\sum_{i = 1}^s \lambda_i y_i = \sum_{j=1}^r \alpha_j x_j

and that further implies

\sum_{j=1}^r \alpha_j x_j - \sum_{i=1}^s \alpha_i x_i = 0 which implies αj, λi = 0 for all 1≤j≤r and 1≤i≤s.

and that further implies that the set {f(y1), f(y2), ... ,f(ys)} is linearly independent.

That's a bit garbled. You've shown that
\sum_{i = 1}^s \lambda_i y_i - \sum_{j=1}^r \alpha_j x_j = 0
Since at least one of the \lambda_i doesn't vanish, it follows that the vectors \{x_1, \dots, x_r, y_1, \dots, y_s\} are not linearly independent. But that's impossible: those vectors are by construction a basis of V, so they are linearly independent. This is the contradiction we're looking for: f(y_1), \dots, f(y_s) must be linearly independent.

Now he shows that \{f(y_1), \dots, f(y_s)\} spans \mathrm{Im}(f).

Then he says: for all z in the Im(f) there exists x\in V s.t. z=f(x) (this seems obvious at one level but I felt like it was just sleight of hand).

That's the definition of \mathrm{Im}(f): it's exactly those things you get from applying f to every member of V. So if z \in \mathrm{Im}(f), it can only be because there exists x \in V such that z = f(x).

You've got a basis for V, so you can express x \in V as a linear combination of basis vectors:
x = \sum_{j = 1}^r \alpha_j x_j + \sum_{i = 1}^s \lambda_i y_i.
Then, because f is linear,
f(x) = \sum_{j = 1}^r \alpha_j f(x_j) + \sum_{i = 1}^s \lambda_i f(y_i) = 0 + \sum_{i = 1}^s \lambda_i f(y_i)
because by construction f(x_j) = 0 for all j. That completes the proof.
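
If it helps to see both halves of the argument (linear independence and spanning) checked on a concrete case, here is an end-to-end sketch in SymPy. The matrix and the greedy extension loop are made-up illustrations, not part of the proof itself:

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 0],
                [1, 3, 1, 1]])
    n = A.cols

    xs = A.nullspace()  # basis {x_1, ..., x_r} of ker(f)

    # Extend to a basis of V; the appended vectors are {y_1, ..., y_s}.
    ys = []
    for j in range(n):
        e = eye(n)[:, j]
        if Matrix.hstack(*(xs + ys + [e])).rank() > len(xs) + len(ys):
            ys.append(e)

    # The candidate basis {f(y_1), ..., f(y_s)} of Im(f).
    images = [A * y for y in ys]

    # Linear independence: s vectors whose stacked matrix has rank s.
    assert Matrix.hstack(*images).rank() == len(ys)

    # Spanning: the images lie in Im(f) and have the same rank as A,
    # so they span the whole column space (= image) of A.
    assert Matrix.hstack(*images).rank() == A.rank()

    # The dimensions add up: dim(V) = dim(ker(f)) + dim(Im(f)).
    assert n == len(xs) + len(images)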
 
Thanks, I noticed some of the typos too later on when they were pointed out. The problem is that this guy asks for proofs on the exam and wants us to basically do them from memory. I feel like I'm watching some guy talk in Latin and we're supposed to memorize the bloody liturgy, you know? At least in vector calculus I felt like there was a procedure to follow, and that I could do the problems knowing that, so I didn't have so much memorization.
 