Want to understand proof of Dimension/Rank-Nullity theorem

SUMMARY

The discussion centers on the proof of the Rank-Nullity Theorem, which states that for a linear transformation \( f: V \to W \) between finite-dimensional vector spaces, the equation \( \text{dim}(V) = \text{nullity}(f) + \text{rank}(f) \) holds. The participants clarify the definitions of kernel and image, emphasizing that the kernel is a subspace of \( V \) and that the image is spanned by the transformed basis vectors. Key steps in the proof involve demonstrating the linear independence of the image basis and showing that every element in the image can be expressed as a linear combination of these basis vectors.

PREREQUISITES
  • Understanding of linear transformations and vector spaces
  • Familiarity with the concepts of kernel and image in linear algebra
  • Knowledge of linear independence and basis of vector spaces
  • Proficiency in mathematical notation and proofs
NEXT STEPS
  • Study the formal proof of the Rank-Nullity Theorem in linear algebra textbooks
  • Learn about the properties of linear transformations and their implications
  • Explore the concept of basis and dimension in vector spaces
  • Practice proving the linear independence of sets of vectors
USEFUL FOR

Students of linear algebra, mathematics educators, and anyone seeking to deepen their understanding of vector space theory and linear transformations.

Emspak
OK, I am working on proofs of the rank-nullity theorem (otherwise known in my class as the dimension theorem).

Here's a proof that my professor gave in the class. I want to be sure I understand the reasoning. So I will lay out what he had here with a less-precise layman's wording, as I want to be sure I know what I am doing. It makes the proof easier to memorize for me.

So:

Let V and W be vector spaces.

T:V→W is linear
V is finite-dimensional
a function f \in \mathrm{Hom}_K(V,W)
Let dim(V) = n for some n \in \mathbb N and dim(ker(f)) = r

dim(V) = nullity(T) + rank(T) = dim(ker(f)) + dim(Im(f))

in some notations (like the one in our text) this would look like dim(V) = nullity(T) + rank(T) = dim(N(T)) + dim(R(T))
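
As a sanity check on the statement before the proof, here is a minimal numerical sketch. It assumes SymPy, and the matrix A is an arbitrary made-up example standing in for the linear map f; nothing about it comes from the thread itself.

    from sympy import Matrix

    # A hypothetical linear map f : K^4 -> K^3, represented by a 3x4 matrix.
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 0],
                [1, 3, 1, 1]])

    n = A.cols                    # dim(V) = 4
    rank = A.rank()               # dim(Im(f)): number of independent columns
    nullity = len(A.nullspace())  # dim(ker(f)): size of a basis of the kernel

    assert n == rank + nullity    # rank-nullity: 4 == 2 + 2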

on to the proof:

ker(f) \subseteq V. And it is a subspace.

Why a subspace? Because, since the kernel of any function is the set of vectors that goes to zero, adding to those vectors another vector in V will still be in V, as will multiplying them (since they go to zero).

since we let dim(V) = n, all the bases of V will have n elements.

therefore \exists a basis \{x_1, x_2, \dots, x_n\} of ker(f) where r ≤ n.

(The reason is that any basis of a subspace will have no more elements than the dimension of the space containing it. ker(f) is a subspace).

by the exchange lemma, which says that given any linearly independent subset,
\exists \{y_1, y_2, \dots, y_s\} \subseteq V such that \{y_1, y_2, \dots, y_s\} \cap \{x_1, x_2, \dots, x_r\} = \varnothing. The next step says that \{y_1, y_2, \dots, y_s\} \cup \{x_1, x_2, \dots, x_r\} is a basis of V.

Now, my question: is that because the intersection of the two sets is the empty set and they are linearly independent?

After that, we get to saying that \{f(y_1), f(y_2), \dots, f(y_s)\} is a basis of Im(f).

But I am not sure why that is.

He then says we can claim the following:
\lambda_1 f(y_1) + \lambda_2 f(y_2) + \dots + \lambda_s f(y_s) = 0

for some \lambda_1, \lambda_2, \dots, \lambda_s \in K

so taking
f \Big( \sum_{i=1}^s x_i y_i \Big) = 0
we can make that into
\Big[ \Big( \sum_{i=1}^s \lambda_i f(y_i) \Big) \Big] = \sum_{i=1}^s \lambda_i y_i \in ker(f)

That step I am a bit fuzzy on the reasoning. IIUC, it's just saying that taking the sum of f using the union of the x and y sets equals zero (it's just f(x,y)) and the summation of the product of λ and all the f(y) terms is the same as the sum of all the λy terms and they are all in the kernel of f. But I wanted to be sure.

He then said that the above implies that there exists some set of scalars, α1, α2, ... αs\in K s.t. \sum_{i=1}^s \lambda_i y_i = \sum_{j=1}^r y_i x_j and that further implies

\sum_{j=1}^r \alpha_j x_j - \sum_{i=1}^s \alpha_i x_i = 0 which implies αj, λi = 0 for all 1≤j≤r and 1≤i≤s.

and that further implies that the set {f(y1), f(y2), ... ,f(ys)} is linearly independent.

Then he says: for all z in the Im(f) there exists x\in V s.t. z=f(x) (this seems obvious at one level but I felt like it was just sleight of hand).

then

z = f \Big(\sum_{j=1}^r \alpha_j x_j - \sum_{i=1}^s \lambda_i y_i \Big) = \sum_{j=1}^r \alpha_j f(x_i) + \sum_{i=1}^s x_i f(y_i)= 0 + \sum_{i=1}^s x_i f(y_i)

and then he says dim(V) = r + s = dim(ker(f)) + dim(Im(f))

It's the last few steps I can't seem to justify in my head. Any help would be appreciated (and seeing if I copied this wrong from the board).
 
Last edited:
There are a number of transcription errors in your post.

Emspak said:
OK, I am working on proofs of the rank-nullity theorem (otherwise known in my class as the dimension theorem).

Here's a proof that my professor gave in the class. I want to be sure I understand the reasoning. So I will lay out what he had here with a less-precise layman's wording,

That's a bad idea. Precision is vitally important in mathematics.

So:

Let V and V be vector spaces.

V and W.

T:V→W is linear
V is finite-dimensional
a function f \in \mathrm{Hom}_K(V,W)

You need a linear map V \to W. You've defined two: T and f. Since you don't mention T again I'll stick with f.

Let dim(V) = n for some n \in \mathbb N and dim(ker(f)) = r
dim(V) = nullity(T) + rank(T) = dim(ker(f)) + dim(Im(f))

A precise statement of the theorem is:

Let V and W be vector spaces over a field K, and let f : V \to W be linear. Let V be finite dimensional with n = \dim V and r = \dim \ker(f). Then \dim \mathrm{Im}(f) = n - r.


on to the proof:

ker(f) \subseteq V. And it is a subspace.

Why a subspace? Because, since the kernel of any function is the set of vectors that goes to zero, adding to those vectors another vector in V will still be in V, as will multiplying them (since they go to zero).

The proof that \ker(f) is a subspace should be the subject of a separate lemma; otherwise it makes no sense to talk about \dim \ker(f).
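
(For reference, that lemma is a one-line consequence of linearity: if u, v \in \ker(f) and \alpha \in K, then f(u + \alpha v) = f(u) + \alpha f(v) = 0 + \alpha \cdot 0 = 0, so u + \alpha v \in \ker(f); and \ker(f) is non-empty because f(0) = 0.)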

since we let dim(V) = n, all the bases of V will have n elements.

therefore \exists a basis \{x_1, x_2, \dots, x_n\} of ker(f) where r ≤ n.

(The reason is that any basis of a subspace will have no more elements than the dimension of the space containing it. ker(f) is a subspace).

So far so good, but obviously the index should only run from 1 to r: \{x_1, \dots, x_r\} is the basis of \ker(f).

by the exchange lemma, which says that given any linearly independent subset,
\exists \{y_1, y_2, \dots, y_s\} \subseteq V such that \{y_1, y_2, \dots, y_s\} \cap \{x_1, x_2, \dots, x_r\} = \varnothing. The next step says that \{y_1, y_2, \dots, y_s\} \cup \{x_1, x_2, \dots, x_r\} is a basis of V.

Now, my question: is that because the intersection of the two sets is the empty set and they are linearly independent?

It's the result of applying the exchange lemma: Given a basis for a subspace of V and a basis for V, you can obtain a basis of V which contains the basis of the subspace. The notation of the statement is probably going to be confusing here, but the key point is that you can apply the exchange lemma to obtain a basis \{x_1, \dots, x_r, y_1, \dots, y_s\} of V consisting of a basis \{x_1, \dots, x_r\} of \ker (f) and s = n - r vectors \{y_1, \dots, y_s\} which are not in \ker(f).
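
As a computational sketch of that extension step (this greedy loop is just one way to realize the exchange lemma, assuming SymPy and an arbitrary example matrix):

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 0],
                [1, 3, 1, 1]])
    n = A.cols

    # Start from a basis {x_1, ..., x_r} of ker(f).
    xs = A.nullspace()

    # Greedily append standard basis vectors of V that enlarge the span;
    # the vectors kept here play the role of {y_1, ..., y_s}.
    ys = []
    for j in range(n):
        e = eye(n)[:, j]
        if Matrix.hstack(*(xs + ys + [e])).rank() > len(xs) + len(ys):
            ys.append(e)

    assert Matrix.hstack(*(xs + ys)).rank() == n  # the x's and y's together form a basis of V
    assert len(ys) == n - len(xs)                 # s = n - r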

After that, we get to saying that \{f(y_1), f(y_2), \dots, f(y_s)\} is a basis of Im(f).

But I am not sure why that is.

This is an assertion. It requires proof.

Why prove that? You're trying ultimately to prove that \dim \mathrm{Im}(f) = s. The way to do that is to prove that the s vectors f(y_1), \dots, f(y_s) are a basis of \mathrm{Im}(f), since then by definition \dim \mathrm{Im}(f) = s.

How do you do that? You need to show that every element of \mathrm{Im}(f) can be expressed as a linear combination of those vectors, and that those vectors are linearly independent.

Your professor has decided to switch the order of those, so he first shows that those vectors are linearly independent. He does so by contradiction: he assumes that they are not linearly independent, and shows that this means that the basis of V we've chosen is not linearly independent. That's not possible by definition of a basis, so he has the desired contradiction.

He then says we can claim the following:
\lambda_1 f(y_1) + \lambda_2 f(y_2) + \dots + \lambda_s f(y_s) = 0

for some \lambda_1, \lambda_2, \dots, \lambda_s \in K

But the crucial point is that you must assume that at least one \lambda_i is not zero.

so taking
f \Big( \sum_{i=1}^s x_i y_i \Big) = 0
we can make that into
\Big[ \Big( \sum_{i=1}^s \lambda_i f(y_i) \Big) \Big] = \sum_{i=1}^s \lambda_i y_i \in ker(f)

That step I am a bit fuzzy on the reasoning. IIUC, it's just saying that taking the sum of f using the union of the x and y sets equals zero (it's just f(x,y)) and the summation of the product of λ and all the f(y) terms is the same as the sum of all the λy terms and they are all in the kernel of f. But I wanted to be sure.

He's using the linearity of f to show that
0 = \sum_{i = 1}^s \lambda_i f(y_i) = f\left(\sum_{i = 1}^s \lambda_i y_i\right).
The immediate consequence of that is that \sum_{i = 1}^s \lambda_i y_i \in \ker(f).


He then said that the above implies that there exists some set of scalars, α1, α2, ... αs\in K s.t. \sum_{i=1}^s \lambda_i y_i = \sum_{j=1}^r y_i x_j
Yes: \sum_{i = 1}^s \lambda_i y_i \in \ker(f), so it can be written as a linear combination of basis vectors of \ker(f):
\sum_{i = 1}^s \lambda_i y_i = \sum_{j=1}^r \alpha_j x_j

and that further implies

\sum_{j=1}^r \alpha_j x_j - \sum_{i=1}^s \alpha_i x_i = 0 which implies αj, λi = 0 for all 1≤j≤r and 1≤i≤s.

and that further implies that the set {f(y1), f(y2), ... ,f(ys)} is linearly independent.

That's a bit garbled. You've shown that
\sum_{i = 1}^s \lambda_i y_i - \sum_{j=1}^r \alpha_j x_j = 0
Since at least one of the \lambda_i doesn't vanish, it follows that the vectors \{x_1, \dots, x_r, y_1, \dots, y_s\} are not linearly independent. But that's impossible: those vectors are by construction a basis of V, so they are linearly independent. This is the contradiction we're looking for: f(y_1), \dots, f(y_s) must be linearly independent.

Now he shows that \{f(y_1), \dots, f(y_s)\} spans \mathrm{Im}(f).

Then he says: for all z in the Im(f) there exists x\in V s.t. z=f(x) (this seems obvious at one level but I felt like it was just sleight of hand).

That's the definition of \mathrm{Im}(f): it's exactly those things you get from applying f to every member of V. So if z \in \mathrm{Im}(f), it can only be because there exists x \in V such that z = f(x).

You've got a basis for V, so you can express x \in V as a linear combination of basis vectors:
x = \sum_{j = 1}^r \alpha_j x_j + \sum_{i = 1}^s \lambda_i y_i.
Then, because f is linear,
f(x) = \sum_{j = 1}^r \alpha_j f(x_j) + \sum_{i = 1}^s \lambda_i f(y_i) = 0 + \sum_{i = 1}^s \lambda_i f(y_i)
because by construction f(x_j) = 0 for all j. That completes the proof.
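
If it helps to see both halves of the argument (linear independence and spanning) checked on a concrete case, here is an end-to-end sketch in SymPy. The matrix and the greedy extension loop are made-up illustrations, not part of the proof itself:

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 0],
                [1, 3, 1, 1]])
    n = A.cols

    xs = A.nullspace()  # basis {x_1, ..., x_r} of ker(f)

    # Extend to a basis of V; the appended vectors are {y_1, ..., y_s}.
    ys = []
    for j in range(n):
        e = eye(n)[:, j]
        if Matrix.hstack(*(xs + ys + [e])).rank() > len(xs) + len(ys):
            ys.append(e)

    # The candidate basis {f(y_1), ..., f(y_s)} of Im(f).
    images = [A * y for y in ys]

    # Linear independence: s vectors whose stacked matrix has rank s.
    assert Matrix.hstack(*images).rank() == len(ys)

    # Spanning: the images lie in Im(f) and have the same rank as A,
    # so they span the whole column space (= image) of A.
    assert Matrix.hstack(*images).rank() == A.rank()

    # The dimensions add up: dim(V) = dim(ker(f)) + dim(Im(f)).
    assert n == len(xs) + len(images)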
 
Thanks, I noticed some of the typos too later on when they were pointed out. The problem is that this guy asks for proofs on the exam and wants us to basically do them from memory. I feel like I'm watching some guy talk in Latin and we're supposed to memorize the bloody liturgy, you know? At least in vector calculus I felt like there was a procedure to follow, and that I could do the problems knowing that, so I didn't have so much memorization.
 