Show that the Gram-Schmidt process gives the shortest length

In summary: I'm not even sure what we are working with anymore. Would it be better to refer to the # of components as ##m## rather than ##n##?In general, I would try to avoid using ##i## as an index for both ##v_i## and ##\lambda_i##. You can write$$\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^2 = \sum_{i=1}^n \lambda_i^2 \|v_i
  • #1
DeadOriginal
274
2

Homework Statement


Let ##v_{1},...,v_{m}## be an orthonormal set of vectors in ##V##. Let ##v\not\in S(v_{1},...,v_{m})##. Show that the vector ##v'=v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}## given by the Gram-Schmidt process has the shortest length among all vectors of the form ##v-x## for ##x\in Sv_{1},...,v_{n})##.

The Attempt at a Solution


Since the problem wants me to deal with the length of vectors I thought that I would probably have to use something along the lines of
$$
||v'||\leq ||v-x||\Leftrightarrow \left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|\leq \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|
$$
Since ##x\in S(v_{1},...,v_{n})## ##x## must be a linear combination of ##v_{1},...,v_{n}## so ##x=\sum\limits_{i=1}^{n}\lambda_{i}v_{i}##.
I then tried to show that
$$
0\leq \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|.
$$
To get rid of the square roots I squared both sides to get
$$
0\leq \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^{2}-2\left(\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|\right) \left(\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|\right)+\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^{2}
$$
and this is equivalent to
$$
0\leq (v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i},v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i})-2 \left(\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|\right)
\left(\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|\right)+(v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i},v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}).
$$
I am not sure how to proceed from here.
 
Physics news on Phys.org
  • #2
DeadOriginal said:
I then tried to show that
$$
0\leq \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|.
$$
To get rid of the square roots I squared both sides to get
$$
0\leq \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^{2}-2\left(\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|\right) \left(\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|\right)+\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^{2}
$$

It might seem like nitpicking, but you should never write a step of a proof that includes the equality that you're trying to show. It's not only bad form, but it can lead to confusion for both the reader and author. Be clear about what is to be shown and then the rest of the proof should contain statements that you can show to be true.

Instead, try something like:

"If we can show that

$$
0\leq \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|,
$$

then the lemma follows. Therefore, we expand

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right| = \left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^{2}-2\left(\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|\right) \left(\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|\right)+\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^{2}.
$$

etc."

and this is equivalent to
$$
0\leq (v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i},v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i})-2 \left(\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|\right)
\left(\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|\right)+(v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i},v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}).
$$
I am not sure how to proceed from here.

You've either got lots of typos or you made mistakes in expanding the expressions. In any case, the cross terms here make the expression very complicated. I would suggest computing

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^2.
$$

If this is positive semidefinite, then the lemma follows. If you expand carefully, you should be able to show that this expression can be written as a sum of squares.
 
  • #3
I can see how the expression

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^2.
$$

becomes the difference between two inner products:

$$
(v-x,v-x)-(v',v')
$$

and writing that out I do get a sum of squares but I don't see how I can show that this sum is greater than 0. In particular, if we let ##(v-x)_{i}## be the ##i##th component of ##v-x##, I don't see how I can show

$$
(v-x)_{1}^{2}+\cdots +(v-x)_{n}^{2}-v_{1}'^{2}-\cdots -v_{n}'^{2}\geq 0.
$$
 
  • #4
DeadOriginal said:
I don't see how I can show

$$
(v-x)_{1}^{2}+\cdots +(v-x)_{n}^{2}-v_{1}'^{2}-\cdots -v_{n}'^{2}\geq 0.
$$

You have to express this in terms of ##v##, ##v_i## and ##\lambda_i##. After some cancellations, you should be able to write the remaining terms as a sum of squares with positive coefficients.
 
  • #5
fzero said:
You have to express this in terms of ##v##, ##v_i## and ##\lambda_i##. After some cancellations, you should be able to write the remaining terms as a sum of squares with positive coefficients.

I tried to expand this further into
$$
(v_{1}-\lambda_{1}v_{1})^{2}+\cdots+(v_{n}-\lambda_{n}v_{n})-(v_{1}-(v,v_{1})v_{1})^{2}-\cdots-(v_{n}-(v,v_{n})v_{n})^{2}
$$
which is equal to
$$
v_{1}^{2}+\cdots+v_{n}^{2}+\lambda_{1}^{2}v_{1}^{2}+\cdots+\lambda_{n}^{2}v_{n}^{2}-2\lambda_{1}v_{1}^{2}-\cdots - 2\lambda_{n}v_{n}^{2} - v_{1}^{2}-\cdots -v_{n}^{2} -(v,v_{1})^{2}v_{1}^{2}-\cdots - (v,v_{n})^{2}v_{n}^{2} + 2(v,v_{1})v_{1}^{2} +\cdots +2(v,v_{n})v_{n}^{2}
$$
which after canceling out becomes
$$
\lambda_{1}^{2}v_{1}^{2}+\cdots+\lambda_{n}^{2}v_{n}^{2}-2\lambda_{1}v_{1}^{2}-\cdots - 2\lambda_{n}v_{n}^{2}-(v,v_{1})^{2}v_{1}^{2}-\cdots - (v,v_{n})^{2}v_{n}^{2} + 2(v,v_{1})v_{1}^{2} +\cdots +2(v,v_{n})v_{n}^{2}.
$$

The only way I found to expand ##(v,v_{i})## further was ##(v,v_{i})=(||v||)(||v_{i}||)\cos\theta##. For ##i=1,...,m## we have ##||v_{i}||=1## but that's as far as I can go.

So factoring out all of the ##v_{i}'s## I am left with
$$
v_{1}^{2}(\lambda_{1}^{2}+2||v||\cos\theta-2\lambda_{1}-||v||^{2}\cos^{2}\theta)+\cdots+v_{n}^{2}(\lambda_{n}^{2}+2||v||\cos \theta-2\lambda_{n}-(||v||)^{2}(||v_{n}||)^{2}\cos^{2}\theta).
$$

I think it might be that I am getting confused by how to denote each component of our vector.
 
Last edited:
  • #6
DeadOriginal said:
I tried to expand this further into
$$
(v_{1}-\lambda_{1}v_{1})^{2}+\cdots+(v_{n}-\lambda_{n}v_{n})-(v_{1}-(v,v_{1})v_{1})^{2}-\cdots-(v_{n}-(v,v_{n})v_{n})^{2}
$$
which is equal to
$$
v_{1}^{2}+\cdots+v_{n}^{2}+\lambda_{1}^{2}v_{1}^{2}+\cdots+\lambda_{n}^{2}v_{n}^{2}-2\lambda_{1}v_{1}^{2}-\cdots - 2\lambda_{n}v_{n}^{2} - v_{1}^{2}-\cdots -v_{n}^{2} -(v,v_{1})^{2}v_{1}^{2}-\cdots - (v,v_{n})^{2}v_{n}^{2} + 2(v,v_{1})v_{1}^{2} +\cdots +2(v,v_{n})v_{n}^{2}
$$
which after canceling out becomes
$$
\lambda_{1}^{2}v_{1}^{2}+\cdots+\lambda_{n}^{2}v_{n}^{2}-2\lambda_{1}v_{1}^{2}-\cdots - 2\lambda_{n}v_{n}^{2}-(v,v_{1})^{2}v_{1}^{2}-\cdots - (v,v_{n})^{2}v_{n}^{2} + 2(v,v_{1})v_{1}^{2} +\cdots +2(v,v_{n})v_{n}^{2}.
$$

The only way I found to expand ##(v,v_{i})## further was ##(v,v_{i})=(||v||)(||v_{i}||)\cos\theta##. For ##i=1,...,m## we have ##||v_{i}||=1## but that's as far as I can go.

So factoring out all of the ##v_{i}'s## I am left with
$$
v_{1}^{2}(\lambda_{1}^{2}+2||v||\cos\theta-2\lambda_{1}-||v||^{2}\cos^{2}\theta)+\cdots+v_{n}^{2}(\lambda_{n}^{2}+2||v||\cos \theta-2\lambda_{n}-(||v||)^{2}(||v_{n}||)^{2}\cos^{2}\theta).
$$

I think it might be that I am getting confused by how to denote each component of our vector.

You are making this MUCH too complicated with all the formulas. Any vector of the form v-x (where x is in the span of the v's) can also be expressed as v'-x' (where x' is a possibly different vector in the span of the v's). v' is orthogonal to any vector in the span of the v's. Since v' and x' are orthogonal, the Pythagorean theorem can tell you about the length of v'-x'.
 
  • #7
DeadOriginal said:
I tried to expand this further into
$$
(v_{1}-\lambda_{1}v_{1})^{2}+\cdots+(v_{n}-\lambda_{n}v_{n})-(v_{1}-(v,v_{1})v_{1})^{2}-\cdots-(v_{n}-(v,v_{n})v_{n})^{2}
$$

This expression is not equal to

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^2.
$$

For instance, the first term in the sum is

$$ \left((v)_1 - \lambda_{1}(v_{1})_1\right)^2 = \left((v)_1 - \lambda_{1}\right)^2. $$

The original notation is somewhat unfortunate. Since we used ##v_1## to stand for an element of an orthonormal basis, if we want to denote the 1st component of the vector ##v##, we should write it in some other way like ##(v)_1## in order not to confuse it with ##v_1##.

Anyway, there's no reason to expand into components. Use inner products to compute the length of the vectors:

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2 = (v,v) - 2\sum\limits_{i=1}^{n}\lambda_{i}(v,v_{i}) + \sum\limits_{i,j=1}^{n}\lambda_{i}\lambda_{j}(v_{i},v_{j}),
$$

with a similar expression for the other term.
 
Last edited:
  • #8
fzero said:
This expression is not equal to

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2-\left|\left|v-\sum\limits_{i=1}^{n}(v,v_{i})v_{i}\right|\right|^2.
$$

For instance, the first term in the sum is

$$ \left((v)_1 - \lambda_{1}(v_{1})_1\right)^2 = \left((v)_1 - \lambda_{1}\right)^2. $$

The original notation is somewhat unfortunate. Since we used ##v_1## to stand for an element of an orthonormal basis, if we want to denote the 1st component of the vector ##v##, we should write it in some other way like ##(v)_1## in order not to confuse it with ##v_1##.

Anyway, there's no reason to expand into components. Use inner products to compute the length of the vectors:

$$
\left|\left|v-\sum\limits_{i=1}^{n}\lambda_{i}v_{i}\right|\right|^2 = (v,v) - \sum\limits_{i=1}^{n}\lambda_{i}(v,v_{i}) + \sum\limits_{i,j=1}^{n}\lambda_{i}\lambda_{j}(v_{i},v_{j}),
$$

with a similar expression for the other term.

Ahh! Yes. That does it. I got too bogged down with my notation before. Thanks.
 

What is the Gram-Schmidt process?

The Gram-Schmidt process is a mathematical method used to orthonormalize a set of vectors in an inner product space. This process is commonly used in linear algebra and serves as the basis for many other important mathematical concepts.

What is the significance of orthonormalization in the Gram-Schmidt process?

Orthonormalization is an important step in the Gram-Schmidt process because it ensures that the resulting set of vectors are orthogonal to each other (meaning they are perpendicular) and have a length of 1. This makes it easier to perform calculations and proofs using these vectors.

How does the Gram-Schmidt process ensure the shortest length?

The Gram-Schmidt process ensures the shortest length by first producing a set of orthonormal vectors from a given set of vectors. Then, the length of each vector is calculated and the shortest length is chosen as the new basis vector, ensuring that the resulting set of vectors are the shortest possible.

Can the Gram-Schmidt process be used for any set of vectors?

Yes, the Gram-Schmidt process can be used for any set of vectors in an inner product space. However, it is most commonly used for linearly independent sets of vectors.

Are there any limitations to the Gram-Schmidt process?

One limitation of the Gram-Schmidt process is that it does not always produce a unique set of orthonormal vectors. Depending on the initial set of vectors, there may be multiple ways to orthonormalize them. Another limitation is that it is computationally intensive and may not be suitable for large sets of vectors.

Similar threads

  • Calculus and Beyond Homework Help
Replies
1
Views
191
  • Calculus and Beyond Homework Help
Replies
6
Views
465
  • Calculus and Beyond Homework Help
Replies
1
Views
228
  • Calculus and Beyond Homework Help
Replies
4
Views
279
  • Calculus and Beyond Homework Help
Replies
2
Views
1K
  • Calculus and Beyond Homework Help
Replies
7
Views
400
  • Differential Equations
Replies
3
Views
1K
  • Linear and Abstract Algebra
Replies
7
Views
733
  • Introductory Physics Homework Help
Replies
6
Views
719
  • Calculus and Beyond Homework Help
Replies
8
Views
865
Back
Top