Two versions of Gram Schmidt orthogonalization method

In summary, the conversation is about proving the mathematical equivalence of two versions of the Gram Schmidt orthogonalization method. The difference between the two algorithms lies in the inner for loop, in the definition of $r_{ji}$ and $s_{ji}$ respectively. To show that they are equivalent, one needs to prove that $r_{ji}=s_{ji}$ for all $j\leq i$ and that $\tilde{q}_i=q_i$. The conversation also includes a discussion about calculating an orthonormal basis for given vectors using both versions of the algorithm. Induction is suggested as the method to prove the equivalence, and a given accuracy $\epsilon=5\cdot 10^{-3}$ is applied by rounding every intermediate result.
  • #1
mathmari
Hey! :eek:

We have the following two versions of Gram Schmidt orthogonalization method:

View attachment 8762

Show that both algorithms are mathematically equivalent, i.e. $s_{ji}=r_{ji}$ for all $j\leq i$ and $\tilde{q}_i=q_i, i=1, \ldots , n$ for the output vectors.
I have done the following:

The difference between the two algorithms is the inner for loop, i.e. the definition of $r_{ji}$ and $s_{ji}$ respectively, or not?

If we show that these two are the same then it follows that $\tilde{q}_i=q_i$, doesn't it?

But how can we show that $r_{ji}=s_{ji}$ ? Could you give me a hint?

(Wondering)
 

Attachments

  • GS.JPG (10.2 KB)
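For reference, here is a minimal sketch of what the two variants typically look like (I'm assuming the attachment shows the standard classical and modified Gram Schmidt loops; the function and variable names below are mine, not those of the original listing):

```python
# Minimal sketches of the two variants, assuming the attachment shows the
# usual classical and modified Gram Schmidt loops.
import numpy as np

def classical_gs(A):
    """Classical variant: r_ji = q_j^T a_i projects against the ORIGINAL column a_i."""
    m, n = A.shape
    Q = np.zeros((m, n), dtype=A.dtype)
    R = np.zeros((n, n), dtype=A.dtype)
    for i in range(n):
        q = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # r_ji = q_j^T a_i
            q = q - R[j, i] * Q[:, j]
        R[i, i] = np.linalg.norm(q)
        Q[:, i] = q / R[i, i]
    return Q, R

def modified_gs(A):
    """Modified variant: s_ji = q_j^T q_i projects against the CURRENT work vector."""
    m, n = A.shape
    Q = np.zeros((m, n), dtype=A.dtype)
    S = np.zeros((n, n), dtype=A.dtype)
    for i in range(n):
        q = A[:, i].copy()
        for j in range(i):
            S[j, i] = Q[:, j] @ q         # s_ji = q_j^T q_i, with q already updated
            q = q - S[j, i] * Q[:, j]
        S[i, i] = np.linalg.norm(q)
        Q[:, i] = q / S[i, i]
    return Q, S
```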
  • #2
There is also a second question:

Calculate an orthonormal basis for the following vectors with both versions, with an accuracy of $\epsilon=5\cdot 10^{-3}$.
\begin{equation*}a_1=\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}, \ a_2=\begin{pmatrix}1 \\ 10^{-3} \\ 0\end{pmatrix}, \ a_3=\begin{pmatrix}1 \\ 0 \\ 10^{-3}\end{pmatrix}\end{equation*}
For the classical method I have done the following:
  • $i=1$ :

    \begin{equation*}q_1=a_1=\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}\end{equation*}

    We don't enter the inner for loop.

    Then $r_{11}=\|q_1\|_2=\sqrt{1+2\cdot 10^{-6}}$ and so $q_1=\frac{q_1}{r_{11}}=\frac{1}{\sqrt{1+2\cdot 10^{-6}}}\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}$.
  • $i=2$ :

    \begin{equation*}q_2=a_2=\begin{pmatrix}1 \\ 10^{-3} \\ 0\end{pmatrix}\end{equation*}

    At the inner loop we have for $j=1$ : \begin{equation*}r_{12}=q_1^T\cdot a_2=\frac{1+ 10^{-6}}{\sqrt{1+2\cdot 10^{-6}}} \ \text{ and } \ q_2=q_2-r_{12}\cdot q_1=\frac{1}{1+2\cdot 10^{-6}}\begin{pmatrix}10^{-6} \\ 10^{-9} \\ -10^{-3}-10^{-9}\end{pmatrix}\end{equation*}
    After the inner for loop we have \begin{equation*}r_{22}=\|q_2\|_2=\frac{\sqrt{3\cdot 10^{-12}+2\cdot 10^{-18}+10^{-6}}}{1+2\cdot 10^{-6}}\end{equation*} and \begin{equation*}q_2=\frac{q_2}{r_{22}}=\frac{1}{\sqrt{3\cdot 10^{-12}+2\cdot 10^{-18}+10^{-6}}}\begin{pmatrix}10^{-6} \\ 10^{-9} \\ -10^{-3}-10^{-9}\end{pmatrix}\end{equation*}
  • $i=3$ :

    \begin{equation*}q_3=a_3=\begin{pmatrix}1 \\ 0 \\ 10^{-3}\end{pmatrix}\end{equation*}

    At the inner for loop we have $j=1$ and $j=2$.

    For $j=1$ we get:

    \begin{equation*}r_{13}=q_1^Ta_3=\frac{1+10^{-6}}{\sqrt{1+2\cdot 10^{-6}}} \ \text{ and } \ q_3=q_3-r_{13}q_1=\frac{1}{1+2\cdot 10^{-6}}\begin{pmatrix}10^{-6} \\ -10^{-3}-10^{-9} \\ 10^{-9}\end{pmatrix}\end{equation*}

    For $j=2$ we get:

    \begin{equation*}r_{23}=q_2^Ta_3=\frac{-10^{-12}}{\sqrt{3\cdot 10^{-12}+2\cdot 10^{-18}+10^{-6}}}\end{equation*} and \begin{equation*}q_3=q_3-r_{23}q_2=\frac{1}{(1+2\cdot 10^{-6})(1+3\cdot 10^{-6}+2\cdot 10^{-12})}\begin{pmatrix}10^{-6}+4\cdot 10^{-12}+4\cdot 10^{-18}\\ -10^{-3}-4\cdot 10^{-9}-4\cdot 10^{-15} \\ 0\end{pmatrix}\end{equation*}

    After the inner for loop we have \begin{equation*}r_{33}=\|q_3\|_2=\frac{\sqrt{[10^{-6}+4\cdot 10^{-12}+4\cdot 10^{-18}]^2+[10^{-3}+4\cdot 10^{-9}+4\cdot 10^{-15}]^2}}{(1+2\cdot 10^{-6})(1+3\cdot 10^{-6}+2\cdot 10^{-12})}\end{equation*} and \begin{equation*}q_3=\frac{q_3}{r_{33}}\end{equation*}
Is everything correct? (Wondering) I also applied the modified algorithm and got the same results; the only difference is how $r_{23}$ and $s_{23}$ are calculated, but their values are the same, or not? (Wondering)
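As a numerical sanity check, one could run the hypothetical classical_gs sketch from post #1 on these vectors:

```python
# A numerical sanity check of the hand calculation above, reusing the
# hypothetical classical_gs sketch from post #1.
import numpy as np

a1 = np.array([1.0, 1e-3, 1e-3])
a2 = np.array([1.0, 1e-3, 0.0])
a3 = np.array([1.0, 0.0, 1e-3])
A = np.column_stack([a1, a2, a3])

Q, R = classical_gs(A)
print(Q.T @ Q)                    # should be close to the identity
print(np.linalg.norm(A - Q @ R))  # should be close to 0, since A = QR
```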
 
  • #3
mathmari said:
View attachment 8762

Show that both algorithms are mathematically equivalent, i.e. $s_{ji}=r_{ji}$ for all $j\leq i$ and $\tilde{q}_i=q_i, i=1, \ldots , n$ for the output vectors.
mathmari said:
I have done the following:

The difference between the two algorithms is the inner for loop, i.e. the definition of $r_{ji}$ and $s_{ji}$ respectively, or not?

If we show that these two are the same then it follows that $\tilde{q}_i=q_i$, doesn't it?

But how can we show that $r_{ji}=s_{ji}$ ? Could you give me a hint?

Hey mathmari! (Wave)

Suppose it's true up to $i-1$.
That is, for all $k<i$ and $j<i$ we have that $r_{jk}=s_{jk}$ and $\mathbf q_k=\mathbf {\tilde q}_k$.

Since we are constructing an orthonormal basis we also have $\mathbf q_j\cdot \mathbf q_k=\delta_{jk}$, don't we? (Wondering)

When we enter the inner loop we have $\mathbf q_i^{(0)}=\mathbf a_i$ and $\mathbf {\tilde q}_i^{(0)}=\mathbf a_i$, which are equal.
Then we calculate $r_{1i}=\mathbf q_1\cdot \mathbf a_i$ and $s_{1i}=\mathbf{\tilde q}_1\cdot \mathbf{\tilde q}_i^{(0)}=\mathbf q_1\cdot \mathbf a_i$, which are equal again.
Then $\mathbf q_i^{(1)}=\mathbf q_i^{(0)}-r_{1i}\mathbf q_1=\mathbf a_i-r_{1i}\mathbf q_1$ and $\mathbf{\tilde q}_i^{(1)}=\mathbf{\tilde q}_i^{(0)}-s_{1i}\mathbf{\tilde q}_1=\mathbf a_i-r_{1i}\mathbf q_1$. Equal again.

Now comes the interesting part.
We calculate $r_{2i}=\mathbf q_2\cdot \mathbf a_i$ and $s_{2i}=\mathbf{\tilde q}_2\cdot \mathbf{\tilde q}_i^{(1)}=\mathbf q_2\cdot (\mathbf a_i-r_{1i}\mathbf q_1) = \mathbf q_2\cdot \mathbf a_i$.
See how we use that $\mathbf q_2\cdot \mathbf q_1=0$?
So the result is yet again the same. (Thinking)
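Empirically, this can be spot-checked with the hypothetical sketches from post #1; in exact arithmetic $r_{ji}=s_{ji}$ and $\mathbf q_i = \mathbf{\tilde q}_i$, so in double precision the two variants should agree up to machine precision:

```python
# An empirical spot-check of the equivalence claim, reusing the hypothetical
# classical_gs / modified_gs sketches from post #1 on a generic input.
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 4))  # generic full-rank columns
Qc, R = classical_gs(A)
Qm, S = modified_gs(A)
print(np.max(np.abs(R - S)))    # ~1e-16: r_ji = s_ji
print(np.max(np.abs(Qc - Qm)))  # ~1e-16: q_i = q~_i
```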
 
  • #4
So, do we show that they are equivalent with induction?

Since we have that $r_{2i}$ and $s_{2i}$ are equal, does it follow that the values that we get at the next steps will be equal again?

(Wondering)
 
  • #5
mathmari said:
So, do we show that they are equivalent with induction?

Since we have that $r_{2i}$ and $s_{2i}$ are equal, does it follow that the values that we get at the next steps will be equal again?

Induction seems to be the way to go, yes.

That is what we still have to prove. I have only shown that it holds for one step of the inner loop where it becomes interesting.
My hint: $\mathbf{\tilde q}_i^{(j)}$ is equal to $\mathbf a_i$ plus a linear combination of $\mathbf q_k$ with $k<j$. (Thinking)
 
  • #6
mathmari said:
Is everything correct?

I also applied the modified algorithm and got the same results; the only difference is how $r_{23}$ and $s_{23}$ are calculated, but their values are the same, or not?

I didn't check everything but the calculations look correct.
Shouldn't we use the given accuracy $\epsilon=5\cdot 10^{-3}$ somewhere though? (Wondering)

Indeed, mathematically the methods are the same, so if we calculate the results exactly, the results must be the same.
But what if we make rounding errors? (Wondering)
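To make the rounding question concrete, here is a sketch (again reusing the hypothetical functions from post #1) that runs both variants in single precision on these nearly parallel columns and measures the loss of orthogonality:

```python
# A sketch of what rounding errors can do: run both variants in single
# precision and measure how far Q is from orthogonal.
import numpy as np

A32 = np.array([[1, 1, 1],
                [1e-3, 1e-3, 0],
                [1e-3, 0, 1e-3]], dtype=np.float32)
for gs in (classical_gs, modified_gs):
    Q, _ = gs(A32)
    loss = np.max(np.abs(Q.T @ Q - np.eye(3, dtype=np.float32)))
    print(gs.__name__, loss)
# Typical outcome: the classical variant loses noticeably more orthogonality
# than the modified one on these nearly parallel columns.
```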
 
  • #7
Klaas van Aarsen said:
I didn't check everything but the calculations look correct.
Shouldn't we use the given accuracy $\epsilon=5\cdot 10^{-3}$ somewhere though? (Wondering)

Indeed, mathematically the methods are the same, so if we calculate the results exactly, the results must be the same.
But what if we make rounding errors? (Wondering)

How can we use the given accuracy? We are supposed to use it in every step. For example how do we apply that for $$q_1=\frac{q_1}{r_{11}}=\frac{1}{\sqrt{1+2\cdot 10^{-6}}}\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}$$ ? (Wondering)
 
  • #8
mathmari said:
How can we use the given accuracy? We are supposed to use it in every step. For example how do we apply that for $$q_1=\frac{q_1}{r_{11}}=\frac{1}{\sqrt{1+2\cdot 10^{-6}}}\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}$$ ?

We have $1+2\cdot 10^{-6} = 1.000002$.
With an accuracy of $\epsilon=5\cdot 10^{-3} = 0.005$ it must be rounded to either $1.01$ or $1.00$, whichever is closest.
Because then the error is at most $\epsilon$.

So after rounding the result becomes:
$$q_1=\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}$$
(Thinking)
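A tiny sketch of this per-step rounding rule (round_eps is my hypothetical helper name, not from the exercise):

```python
# Round every computed value to two decimals, so the rounding error is at
# most eps = 5e-3, as described above.
import numpy as np

def round_eps(x):
    return np.round(x, 2)   # nearest multiple of 0.01, error <= 0.005

r11 = np.sqrt(1 + 2e-6)     # = 1.000001 (approximately)
print(round_eps(r11))       # -> 1.0, so q1 = a1 / 1.0 = a1 after rounding
```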
 
  • #9
Klaas van Aarsen said:
Induction seems to be the way to go, yes.

That is what we still have to prove. I have only shown that it holds for one step of the inner loop where it becomes interesting.
My hint: $\mathbf{\tilde q}_i^{(j)}$ is equal to $\mathbf a_i$ plus a linear combination of $\mathbf q_k$ with $k<j$. (Thinking)

Base Case: For $i=1$ we have $q_i=\tilde{q}_i=a_i$.

Inductive Hypothesis: We suppose that the methods are the same up to $i-1$, i.e. for all $k<i$ and $j<i$ we have that $r_{jk}=s_{jk}$ and $\mathbf q_k=\mathbf {\tilde q}_k$.

Inductive Step: We want to show that $r_{ji}=s_{ji}$ and $\mathbf q_i=\mathbf {\tilde q}_i$, right?
We have that $$s_{ji}=\tilde{q}_j^T\left (\tilde{q}_i-s_{(j-1)i}\tilde{q}_{j-1}\right )$$ or not? (Wondering)
 
  • #10
Klaas van Aarsen said:
We have $1+2\cdot 10^{-6} = 1.000002$.
With an accuracy of $\epsilon=5\cdot 10^{-3} = 0.005$ it must be rounded to either $1.01$ or $1.00$, whichever is closest.
Because then the error is at most $\epsilon$.

So after rounding the result becomes:
$$q_1=\begin{pmatrix}1 \\ 10^{-3} \\ 10^{-3}\end{pmatrix}$$
(Thinking)

So, do we round every result to two digits after the decimal point and check that the difference between the rounded number and the exact number is less than $0.005$ ? (Wondering)
 
  • #11
mathmari said:
Base Case: For $i=1$ we have $q_i=\tilde{q}_i=a_i$.

Inductive Hypothesis: We suppose that the methods are the same up to $i-1$, i.e. for all $k<i$ and $j<i$ we have that $r_{jk}=s_{jk}$ and $\mathbf q_k=\mathbf {\tilde q}_k$.

Inductive Step: We want to show that $r_{ji}=s_{ji}$ and $\mathbf q_i=\mathbf {\tilde q}_i$, right?
We have that $$s_{ji}=\tilde{q}_j^T\left (\tilde{q}_i-s_{(j-1)i}\tilde{q}_{j-1}\right )$$ or not?

Yes.
We will need a nested inductive proof though.
You have set it up for the outer loop.
We need an inductive proof for the inner loop as well. (Thinking)

A complication is that $\mathbf q_i$ is not unique. It gets an initial value, gets changed multiple times in the inner loop, and gets a final value in the outer loop.
That is enough to make any proof for the inner loop confusing unless we somehow distinguish those values.
We might for instance add a superscript to identify which instance we're referring to.
That is, we can let it start at $\mathbf q_i^{(0)}=\mathbf a_i$ before we enter the inner loop.
In each iteration we will then get a new value $\mathbf q_i^{(j)}$ for $j=1$ to $j=i-1$.
And finally we will get $\mathbf q_i=\mathbf q_i^{(i-1)}/r_{ii}$ at the end of the outer iteration. (Nerd)

Either way, we need to come up with an induction hypothesis for the inner loop.
We already had our induction hypothesis up to $i-1$.
Now we need additionally something that we assume to be true for $i$ itself and for all $k<j$.
And then prove it for $j$. (Thinking)
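As a concrete illustration of this bookkeeping, here is a sketch (hypothetical helper, modified variant) that records every inner-loop instance of $\mathbf q_i$ in a list:

```python
# A sketch of the superscript bookkeeping: keep every intermediate value
# q_i^{(0)}, q_i^{(1)}, ... of the inner loop in a list, so the statement
# "q_i^{(j)} equals a_i minus a linear combination of q_1, ..., q_j" can be
# inspected directly.
import numpy as np

def mgs_with_history(A):
    m, n = A.shape
    Q = np.zeros((m, n), dtype=A.dtype)
    for i in range(n):
        hist = [A[:, i].copy()]                      # q_i^{(0)} = a_i
        for j in range(i):
            s_ji = Q[:, j] @ hist[-1]                # s_ji uses the CURRENT value
            hist.append(hist[-1] - s_ji * Q[:, j])   # q_i^{(j+1)}
        Q[:, i] = hist[-1] / np.linalg.norm(hist[-1])  # final, normalized q_i
    return Q
```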
 
  • #12
mathmari said:
So, do we round every result to two digits after the decimal point and check that the difference between the rounded number and the exact number is less than $0.005$ ?

Yes. (Nod)
 
  • #13
Klaas van Aarsen said:
Yes.
We will need a nested inductive proof though.
You have set it up for the outer loop.
We need an inductive proof for the inner loop as well. (Thinking)

A complication is that $\mathbf q_i$ is not unique. It gets an initial value, gets changed multiple times in the inner loop, and gets a final value in the outer loop.
That is enough to make any proof for the inner loop confusing unless we somehow distinguish those values.
We might for instance add a superscript to identify which instance we're referring to.
That is, we can let it start at $\mathbf q_i^{(0)}=\mathbf a_i$ before we enter the inner loop.
In each iteration we will then get a new value $\mathbf q_i^{(j)}$ for $j=1$ to $j=i-1$.
And finally we will get $\mathbf q_i=\mathbf q_i^{(i-1)}/r_{ii}$ at the end of the outer iteration. (Nerd)

Either way, we need to come up with an induction hypothesis for the inner loop.
We already had our induction hypothesis up to $i-1$.
Now we need additionally something that we assume to be true for $i$ itself and for all $k<j$.
And then prove it for $j$. (Thinking)

I got stuck right now... Do we have to do an induction for $i$ and a nested induction for $j$ ? (Wondering)
 
  • #14
mathmari said:
I got stuck right now... Do we have to do an induction for $i$ and a nested induction for $j$ ? (Wondering)

Yes.

Suppose that for $k\le j<i$ we have (outer loop):
  • $r_{kj} = s_{kj}$
  • $\mathbf q_j = \mathbf{\tilde q}_j$
  • $\mathbf q_k \cdot \mathbf q_j=\delta_{kj}$
and that for $k<j<i$ we have (inner loop):
  • $r_{ki} = s_{ki}$
  • $\mathbf{\tilde q}_i^{(j-1)} = \mathbf a_i - \sum_{k=1}^{j-2} s_{ki} \mathbf{\tilde q}_k$
Then we need to prove that we have (inner loop):
  • $r_{ji} = s_{ji}$
  • $\mathbf{\tilde q}_i^{(j)} = \mathbf a_i - \sum_{k=1}^{j-1} s_{ki} \mathbf{\tilde q}_k$
and we need to prove that (outer loop):
  • $r_{ii} = s_{ii}$
  • $\mathbf q_i = \mathbf{\tilde q}_i$
So we get that $s_{ji}=\tilde{q}_j^T \tilde{q}_i^{(j-1)} = \tilde{q}_j^T\left (a_i-\sum_{k=1}^{j-2} s_{ki}\tilde{q}_{k}\right )$
(Thinking)
 
  • #15
Klaas van Aarsen said:
Yes.

Suppose that for $k\le j<i$ we have (outer loop):
  • $r_{kj} = s_{kj}$
  • $\mathbf q_j = \mathbf{\tilde q}_j$
  • $\mathbf q_k \cdot \mathbf q_j=\delta_{kj}$
and that for $k<j<i$ we have (inner loop):
  • $r_{ki} = s_{ki}$
  • $\mathbf{\tilde q}_i^{(j-1)} = \mathbf a_i - \sum_{k=1}^{j-2} s_{ki} \mathbf{\tilde q}_k$
Then we need to prove that we have (inner loop):
  • $r_{ji} = s_{ji}$
  • $\mathbf{\tilde q}_i^{(j)} = \mathbf a_i - \sum_{k=1}^{j-1} s_{ki} \mathbf{\tilde q}_k$
and we need to prove that (outer loop):
  • $r_{ii} = s_{ii}$
  • $\mathbf q_i = \mathbf{\tilde q}_i$
So we get that $s_{ji}=\tilde{q}_j^T \tilde{q}_i^{(j-1)} = \tilde{q}_j^T\left (a_i-\sum_{k=1}^{j-2} s_{ki}\tilde{q}_{k}\right )$
(Thinking)

Ah ok! So from $s_{ji}=\tilde{q}_j^T \tilde{q}_i^{(j-1)} = \tilde{q}_j^T\left (a_i-\sum_{k=1}^{j-2} s_{ki}\tilde{q}_{k}\right )$ we get that this must be equal to $r_{ji}$, and the remaining equalities then follow using that information, or not? (Wondering)
 
  • #16
mathmari said:
Ah ok! So from $s_{ji}=\tilde{q}_j^T \tilde{q}_i^{(j-1)} = \tilde{q}_j^T\left (a_i-\sum_{k=1}^{j-2} s_{ki}\tilde{q}_{k}\right )$ we get that this must be equal to $r_{ji}$, and the remaining equalities then follow using that information, or not? (Wondering)

Yep. (Nod)
 
  • #17
Klaas van Aarsen said:
Yep. (Nod)

To see if I have understood that correctly:

Let's consider the inner for-loop. We suppose that the respective quantities are equal, i.e. $r_{ki}=s_{ki}$ for all $k<j$ and $q_i=\tilde{q}_i$.
Then we want to prove that these equalities hold also at the $j$-th step.

Is this correct so far? (Wondering)
 
  • #18
mathmari said:
To see if I have understood that correctly:

Let's consider the inner for-loop. We suppose that the respective quantities are equal, i.e. $r_{ki}=s_{ki}$ for all $k<j$ and $q_i=\tilde{q}_i$.
Then we want to prove that these equalities hold also at the $j$-th step.

Is this correct so far?

Yes. (Star)
 
  • #19
Klaas van Aarsen said:
Yes. (Star)

We have that $$s_{ji}=\tilde{q}_j^T \tilde{q}_i^{(j-1)} = \tilde{q}_j^T\left (a_i-\sum_{k=1}^{j-2} s_{ki}\tilde{q}_{k}\right )= \tilde{q}_j^Ta_i-\sum_{k=1}^{j-2} s_{ki}\tilde{q}_j^T\tilde{q}_{k}= \tilde{q}_j^Ta_i-\sum_{k=1}^{j-2} s_{ki}\delta_{kj}$$ and since $k\neq j$ we get $$s_{ji}= \tilde{q}_j^Ta_i-\sum_{k=1}^{j-2} s_{ki}\cdot 0=\tilde{q}_j^Ta_i=r_{ji}$$ Is this correct? (Wondering)

For the other relation of the inner for-loop:
$$\tilde{q}_i^{(j)}=\tilde{q}_i^{(j-1)}-s_{ji}\tilde{q}_j=\mathbf a_i - \sum_{k=1}^{j-2} s_{ki} \mathbf{\tilde q}_k-s_{ji}\tilde{q}_j$$ Is this correct so far or should the index of $s_{ji}\tilde{q}_j$ be $j-1$ instead of $j$ ? (Wondering)
 
  • #20
Looks good.

And I think I made a mistake. I think it should be $\tilde{q}_i^{(j-1)} = a_i-\sum_{k=1}^{j-1} s_{ki}\tilde{q}_{k}$. (Blush)
 
  • #21
Klaas van Aarsen said:
Looks good.

And I think I made a mistake. I think it should be $\tilde{q}_i^{(j-1)} = a_i-\sum_{k=1}^{j-1} s_{ki}\tilde{q}_{k}$. (Blush)

So, we have the following:

We have that $$s_{ji}=\tilde{q}_j^T \tilde{q}_i^{(j-1)} = \tilde{q}_j^T\left (a_i-\sum_{k=1}^{j-1} s_{ki}\tilde{q}_{k}\right )= \tilde{q}_j^Ta_i-\sum_{k=1}^{j-1} s_{ki}\tilde{q}_j^T\tilde{q}_{k}= \tilde{q}_j^Ta_i-\sum_{k=1}^{j-1} s_{ki}\delta_{kj}$$ and since $k\neq j$ we get $$s_{ji}= \tilde{q}_j^Ta_i-\sum_{k=1}^{j-1} s_{ki}\cdot 0=\tilde{q}_j^Ta_i=r_{ji}$$
For the other relation of the inner for-loop:
$$\tilde{q}_i^{(j)}=\tilde{q}_i^{(j-1)}-s_{ji}\tilde{q}_j=\mathbf a_i - \sum_{k=1}^{j-1} s_{ki} \mathbf{\tilde q}_k-s_{ji}\tilde{q}_j=\mathbf a_i - \sum_{k=1}^{j} s_{ki} \mathbf{\tilde q}_k$$

So the induction for the inner for-loop is done. For the outer for-loop:
To show that $r_{ii}=s_{ii}$ we have to show that $q_i=\tilde{q}_i$, or not? (Wondering)
 
  • #22
mathmari said:
So the induction for the inner for-loop is done.

For the outer for-loop:
To show that $r_{ii}=s_{ii}$ we have to show that $q_i=\tilde{q}_i$, or not?

Good, and yes. (Nod)
 
  • #23
Klaas van Aarsen said:
Good, and yes. (Nod)

This holds after the inner loop, so we don't have to do anything further for the outer for-loop, do we? (Wondering)
 
  • #24
mathmari said:
This holds after the inner loop, so we don't have to do anything further for the outer for-loop, do we? (Wondering)

Indeed. (Malthe)
 

FAQ: Two versions of Gram Schmidt orthogonalization method

What is the Gram Schmidt orthogonalization method?

The Gram Schmidt orthogonalization method is a mathematical algorithm used to transform a set of linearly independent vectors into a set of orthogonal vectors. It is commonly used in linear algebra and numerical analysis to simplify calculations and solve complex problems.

How does the Gram Schmidt orthogonalization method work?

The method involves a series of steps in which each vector in the set is projected onto the orthogonal complement of the previous vectors. This results in a sequence of orthogonal vectors that are also linearly independent.
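For instance, a single step in code (a minimal sketch with made-up vectors):

```python
# One elementary step of the method: remove from a2 its projection onto the
# already normalized q1, leaving the component orthogonal to q1.
import numpy as np

q1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([2.0, 3.0, 0.0])
q2 = a2 - (q1 @ a2) * q1   # subtract the projection of a2 onto q1
print(q1 @ q2)             # 0.0: the remainder is orthogonal to q1
```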

What are the benefits of using the Gram Schmidt orthogonalization method?

One of the main benefits of this method is that it allows for the simplification of calculations involving linearly independent vectors. It also helps in solving systems of linear equations and can be used in various applications such as signal processing and data analysis.

Are there any limitations to the Gram Schmidt orthogonalization method?

One limitation of this method is that it may lead to numerical instability in certain cases. This can be avoided by using a modified version of the algorithm known as the Modified Gram Schmidt method. Additionally, the method may not be suitable for very large sets of vectors as it can become computationally intensive.

How does the Gram Schmidt orthogonalization method compare to other orthogonalization methods?

There are several other methods for orthogonalizing vectors, such as Householder transformations and Givens rotations. The Gram Schmidt method is generally easier to understand and implement, but may not always produce the most accurate results. The choice of method depends on the specific problem and the desired level of accuracy.
