Orthogonal matrix construction

In summary: append ##p-m## random columns to ##\bar{B}_2##, take a QR factorization of the resulting square matrix, and permute the columns of ##Q##; the transpose of the result is the desired ##T^{-1}=T^T## (an SVD of ##\bar{B}_2## would provide such a basis as well). Geometrically, the first ##p-m## rows of ##T^{-1}## form an orthonormal basis for the orthogonal complement of the range of ##\bar{B}_2## (so they annihilate it), and the remaining ##m## rows form an orthonormal basis for the range itself.
  • #1
p4wp4w
Given a real-valued matrix ## \bar{B}_2=\begin{bmatrix}
\bar{B}_{21}\\
\bar{B}_{22}
\end{bmatrix}\in{R^{p \times m}}
##, I am looking for an orthogonal transformation matrix, i.e., ##T^{-1}=T^T\in{R^{p \times p}}##, that satisfies:
$$
\begin{bmatrix}
T_{11}^T & T_{21}^T\\
T_{12}^T & T_{22}^T
\end{bmatrix}\bar{B}_2=
\begin{bmatrix}
0\\
B_{2}
\end{bmatrix},
$$ where ##B_2\in{R^{m \times m}}## is non-singular. Assuming ##T=
\begin{bmatrix}
T_{11} & T_{12}\\
T_{21} & T_{22}
\end{bmatrix}##, and substituting into the equation above, I can easily write ##T^{-1}## as
$$
\begin{bmatrix}
(T_{11}-T_{12}T_{22}^{-1}T_{21})^{-1} & -T_{11}^{-1}T_{12}(T_{22}-T_{21}T_{11}^{-1}T_{12})^{-1}\\
-T_{22}^{-1}T_{21}(T_{11}-T_{12}T_{22}^{-1}T_{21})^{-1} & (T_{22}-T_{21}T_{11}^{-1}T_{12})^{-1}
\end{bmatrix}=
\begin{bmatrix}
T_{11}^T & -T_{11}^T\bar{B}_{21}\bar{B}_{22}^{-1}\\
T_{12}^T & B_2\bar{B}_{22}^{-1}-T_{12}^T\bar{B}_{21}\bar{B}_{22}^{-1}
\end{bmatrix}.
$$ The literature says that there exists a transformation satisfying the last matrix equality. Accordingly, I wonder how I can synthesize ##T##. There should be a systematic method of obtaining ##T##; otherwise, I may have to carry out an exhaustive search.

As background: in system theory, I am looking for a transformation that decomposes the system into the so-called regular form; ##\bar{B}_2## is the part of the input matrix that defines the range space and causes the invariant RHS zeros.
 
  • #2
edit: I had some ideas involving the polar decomposition, but they ended up solving something similar to, though not quite, what you were looking for.

round 2:

This is actually fairly straightforward.

##\bar{B}_2## is p x m where ##p\gt m## (i.e. a tall and skinny matrix) and its rank is ##m## (full column rank). This means its left nullspace has dimension ##p-m## -- grab an orthonormal basis for it. Now generate ##m## linearly independent vectors via a random number generator over [-1,1], append them to that basis, run Gram-Schmidt on the whole collection (nullspace basis first), and you have your answer.
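Roughly, in NumPy/SciPy terms, a minimal sketch of this first recipe (names like `B2bar` are just placeholders, and I lean on `scipy.linalg.null_space` plus a QR call instead of hand-rolled Gram-Schmidt):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
p, m = 5, 2
B2bar = rng.uniform(-1, 1, size=(p, m))     # stand-in for a tall p x m matrix of full column rank

N = null_space(B2bar.T)                     # p x (p-m) orthonormal basis of the left nullspace
V = rng.uniform(-1, 1, size=(p, m))         # m random vectors to complete the basis
Q, _ = np.linalg.qr(np.hstack([N, V]))      # orthonormalize, nullspace basis kept first
T_inv = Q.T                                 # orthogonal; its first p-m rows annihilate B2bar

print(np.round(T_inv @ B2bar, 10))          # top (p-m) x m block is ~0, bottom m x m block nonsingular
```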
- - - -
An equivalent, more thorough, and faster process would be to append ##p-m## randomly generated column vectors (these will be linearly independent of the columns of ##\bar{B}_2## -- in abstraction, with probability 1; in practice with floats, absurdly close to probability 1) such that you now have the augmented square matrix

##C :=
\bigg[\begin{array}{c|c|c|c|c} \bar{B}_2 & \mathbf v_{m+1} &\cdots & \mathbf v_{p-1} &\mathbf v_{p}
\end{array}\bigg]##

run QR factorization,

##C = Q_C R =
\bigg[\begin{array}{c|c|c|c|c} Q_{\bar{B}_2} & \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p}
\end{array}\bigg]R
##

permute the columns of ##Q_C## such that the vectors associated with the ones you generated at random are now leftmost:

##Q_C P =
\bigg[\begin{array}{c|c|c|c|c} \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p} &Q_{\bar{B}_2}
\end{array}\bigg] ##

where ## P## is a permutation matrix. You then find that

##T^{-1} = T^T = \big(Q_C P\big)^T##.

That should do it. I.e. if you work it through, you'll see

##
T^{-1} \bar{B}_2 = \big(Q_C P\big)^T \bar{B}_2 =
\bigg[\begin{array}{c|c|c|c|c} \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p} &Q_{\bar{B}_2}
\end{array}\bigg]^T \bar{B}_2 =
\begin{bmatrix}
{ \mathbf q_{m+1}}^T \\
\vdots\\
{ \mathbf q}_{p}^T \\
Q_{\bar{B}_2}^T
\end{bmatrix} \bar{B}_2 =
\begin{bmatrix}
\mathbf 0_{p-m} \mathbf 0_m^T\\
R_{m}
\end{bmatrix}##

where ##R_m## is the top left m x m principal submatrix of ##R##, i.e. ##\bar{B}_2 = Q_{\bar{B}_2} R_{m}##.
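In NumPy this whole recipe is only a few lines. A sketch (again, variable names are placeholders; `np.linalg.qr` on a square matrix returns the full ##Q_C##):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 5, 2
B2bar = rng.uniform(-1, 1, size=(p, m))       # stand-in for your tall p x m matrix

V = rng.uniform(-1, 1, size=(p, p - m))       # p-m random columns appended on the right
C = np.hstack([B2bar, V])                     # square p x p, invertible with probability 1
Q_C, R = np.linalg.qr(C)                      # C = Q_C R

P = np.eye(p)[:, np.r_[m:p, 0:m]]             # permutation: random-column q's first, Q_{B2bar} last
T_inv = (Q_C @ P).T                           # T^{-1} = T^T = (Q_C P)^T

print(np.round(T_inv @ B2bar, 10))            # [0; R_m], with R_m = R[:m, :m] upper triangular
```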

If for some reason you don't like having the upper triangular matrix on the right hand side, you could further left multiply each side by

##\begin{bmatrix}
S_{p-m}^T & \mathbf 0_{p-m} \mathbf 0_m^T\\
\mathbf 0_m \mathbf 0_{p-m}^T & U_m^T
\end{bmatrix}##

where ##S_{p-m}## is a (p-m) x (p-m) orthogonal matrix and ##U_m## is an m x m orthogonal matrix, each of your choosing.
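Continuing the sketch above (same placeholder names; I only bother with a nontrivial ##U_m##, since mixing the zero rows changes nothing):

```python
from scipy.linalg import block_diag

# (continues the previous snippet: reuses np, rng, p, m, B2bar, T_inv)
S, _ = np.linalg.qr(rng.uniform(-1, 1, size=(p - m, p - m)))   # any (p-m) x (p-m) orthogonal block
U, _ = np.linalg.qr(rng.uniform(-1, 1, size=(m, m)))           # any m x m orthogonal U_m
T_inv2 = block_diag(S.T, U.T) @ T_inv                          # still orthogonal; zero block preserved

print(np.round(T_inv2 @ B2bar, 10))                            # bottom block is now U^T R_m, not triangular in general
```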
 
Last edited:
  • #3
I am a bit confused by your notation. Since you have used ##Q_{\bar{B}_2}##, which comes out of the QR factorization as ##\bar{B}_2=Q_{\bar{B}_2}R_{m}##, and ##Q_C## in your formulation is defined as ##C=[\bar{B}_2|\textbf{v}_i|_{i=m+1:p}]=Q_CR##, shouldn't ##Q_C=Q_{\bar{B}_2}##? In that case, ##\textbf{q}_i|_{i=m+1:p}=\emptyset##? See this example for ##p=2, m=1##:
$$\bar{B}_2=
\begin{bmatrix}
-4.072 \\
10.3
\end{bmatrix},$$
setting ##\textbf{v}_i=
\begin{bmatrix}
0.547 \\
0.957
\end{bmatrix},## I get the orthogonal $$Q_{\bar{B}_2}=Q_C=
\begin{bmatrix}
-0.367 & 0.93 \\
0.93 & 0.367
\end{bmatrix},$$ the upper-triangular ##R_m=
\begin{bmatrix}
11.0767 \\
0
\end{bmatrix},## and ##R=
\begin{bmatrix}
11.0767 & 0.6894 \\
0 & 0.8606
\end{bmatrix}, ## respectively.
Apart from this, it is not clear to me what the geometrical interpretation of the generated random vectors is. Which degrees of freedom are we exploiting by defining the vectors in that way?
 
  • #4
p4wp4w said:
I am a bit confused by your notation. Since you have used ##Q_{\bar{B}_2}##, which comes out of the QR factorization as ##\bar{B}_2=Q_{\bar{B}_2}R_{m}##, and ##Q_C## in your formulation is defined as ##C=[\bar{B}_2|\textbf{v}_i|_{i=m+1:p}]=Q_CR##, shouldn't ##Q_C=Q_{\bar{B}_2}##?

No. I thought I showed it quite clearly here:
##
C :=
\bigg[\begin{array}{c|c|c|c|c} \bar{B}_2 & \mathbf v_{m+1} &\cdots & \mathbf v_{p-1} &\mathbf v_{p}
\end{array}\bigg]##

That is, ##\bar{B}_2## is tall and skinny, while ##C## is square. Then run QR factorization on C:

##
C = Q_C R =
\bigg[\begin{array}{c|c|c|c|c} Q_{\bar{B}_2} & \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p}
\end{array}\bigg]R##

If you really understand Gram-Schmidt and QR factorization, it should be clear that ##Q_C## comes from the QR factorization of ##C##, and that I took advantage of overlapping subproblems here. Looking back, the issue relates to naming.

To be clear, if you work through the math, there are basically two ways to decompose ##A = QR## for some tall skinny ##A## (with full column rank, as in day-to-day least squares problems). One has square ##Q## and tall skinny ##R## -- apparently Wikipedia likes this version. It seems that this is how SciPy does the QR factorization by default.

The other approach is tall skinny ##Q## and square ##R##. This is how NumPy does the QR factorization by default. I believe I made it clear in my post that I was using the square ##R## case, e.g. when I stated that ##R_m## was an m x m principal submatrix, and when I said ##\bar{B}_2 = Q_{\bar{B}_2} R_{m}##, among other places. According to Wikipedia it seems that some people call this the "thin QR factorization", but not only is a square ##R## preferable, the second approach also maps directly to running Gram-Schmidt on your matrix, while the first approach does not. Hence I used the second approach.

To be honest, I've never really used the first form of QR factorization. If you actually derive QR factorization mathematically via Gram-Schmidt on your tall skinny matrix, you get the "thin" ##Q##, and the square ##R## naturally results from book-keeping. If you are using QR factorization for least squares on an overdetermined system, which is overwhelmingly what it is used for, then you want the "thin" version.
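To make the defaults concrete, a quick sketch (this just reflects the default modes as I understand them: `mode='reduced'` in NumPy versus `mode='full'` in SciPy):

```python
import numpy as np
import scipy.linalg

A = np.random.default_rng(2).uniform(-1, 1, size=(5, 2))   # tall skinny, full column rank

Q1, R1 = np.linalg.qr(A)        # NumPy default ("thin"): Q1 is 5x2, R1 is 2x2 square
Q2, R2 = scipy.linalg.qr(A)     # SciPy default ("full"): Q2 is 5x5 square, R2 is 5x2
print(Q1.shape, R1.shape)       # (5, 2) (2, 2)
print(Q2.shape, R2.shape)       # (5, 5) (5, 2)
```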

If you take the "non-thin" case (i.e. the first one), you have to answer the question: where did the final columns of the fat ##Q## come from? You either extend the set of linearly independent columns in ##\bar{B}_2## (most likely via random number generation) and run Gram-Schmidt, or mumble about Householder reflections or something. I have no qualms about using said reflections in a final implementation (there are even more powerful numerical tools available), but Gram-Schmidt is really the right way to think about this mathematically -- this ties in with your final question about the underlying geometry.

Your problem was:
1) Come up with a matrix whose first ##p-m## orthonormal columns -- which you then transpose and multiply on the left -- annihilate the columns of ##\bar{B}_2##.

2) Come up with ##m## columns that are mutually orthonormal and complete the basis in ##\mathbb R^p##.

That's it. A careful consideration of which unit-length columns are orthogonal to the columns of ##\bar{B}_2## and which orthonormal columns are in the span of the columns of ##\bar{B}_2## gives the answer. Mentally, I use Gram-Schmidt while I do this. If you don't use the "thin" version of QR factorization, then, in my view, you lose the geometric interpretation.
- - - -

p4wp4w said:
See this Example for ##p=2, m=1##:
$$\bar{B}_2=
\begin{bmatrix}
-4.072 \\
10.3
\end{bmatrix},$$
setting ##\textbf{v}_i=
\begin{bmatrix}
0.547 \\
0.957
\end{bmatrix},## I get the orthogonal $$Q_{\bar{B}_2}=Q_C=
\begin{bmatrix}
-0.367 & 0.93 \\
0.93 & 0.367
\end{bmatrix},$$ the upper-triangular ##R_m=
\begin{bmatrix}
11.0767 \\
0
\end{bmatrix},## and ##R=
\begin{bmatrix}
11.0767 & 0.6894 \\
0 & 0.8606
\end{bmatrix}, ## respectively.

note: in this example, if you use the permutation matrix ##
P = \begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}##,

Then:

##\big(Q_C P\big)^T \bar{B}_2 =
\begin{bmatrix}
0 \\
11.07570242
\end{bmatrix}##

This is the result you are looking for.
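For reference, a quick NumPy check of this (just plugging in the numbers from your example; the sign of the bottom entry can depend on the QR sign convention):

```python
import numpy as np

B2bar = np.array([[-4.072], [10.3]])
v = np.array([[0.547], [0.957]])

Q_C, R = np.linalg.qr(np.hstack([B2bar, v]))   # square 2x2 Q_C
P = np.array([[0, 1], [1, 0]])                 # swap the two columns
print((Q_C @ P).T @ B2bar)                     # approx [[0.], [11.0757...]]
```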

and note:

##Q_C =
\begin{bmatrix}
-0.367 & 0.93 \\
0.93 & 0.367
\end{bmatrix}
\neq
\begin{bmatrix}
-0.367 \\
0.93
\end{bmatrix} = Q_{\bar{B}_2}
##

also note

p4wp4w said:
##R_m=
\begin{bmatrix}
11.0767 \\
0
\end{bmatrix}##

This is not how I did it. In fact

##R_m=
\begin{bmatrix}
11.0767
\end{bmatrix}##

In my earlier post I said

"where ##R_m## is the top left m x m principal submatrix" hence it must be square if you follow the solution I provided.

- - - -

Final implementation details are your privilege of course. I tried hard to come up with a solution that is both computationally efficient and close to how I think about the underlying geometry.

I suppose I should have said I was using the "thin QR factorization", though I didn't realize anyone calls it that -- it's just the regular QR factorization I've used a thousand times in NumPy, mostly for least squares problems. (I was surprised to learn that SciPy by default uses the other one, though.) It's also the one I know a clean derivation for.
 
Last edited:

FAQ: Orthogonal matrix construction

1. What is an orthogonal matrix?

An orthogonal matrix is a square matrix in which all the columns and rows are orthonormal, meaning that they are perpendicular to each other and have a length of 1. This property makes orthogonal matrices useful in many mathematical and scientific applications, such as in linear algebra and geometry.

2. How is an orthogonal matrix constructed?

An orthogonal matrix can be constructed in several ways, such as by using the Gram-Schmidt process, the Householder transformation, or the Givens rotation. These methods involve manipulating the columns or rows of a given matrix to create a new matrix that satisfies the orthonormal condition.
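For example, a minimal Python sketch (assuming NumPy is available) that constructs an orthogonal matrix by orthonormalizing the columns of a random matrix with a QR factorization, the numerical analogue of the Gram-Schmidt process:

```python
import numpy as np

M = np.random.default_rng(3).normal(size=(4, 4))   # a generic full-rank square matrix
Q, _ = np.linalg.qr(M)                             # orthonormalize its columns
print(np.allclose(Q.T @ Q, np.eye(4)))             # True: columns are orthonormal
print(np.allclose(Q @ Q.T, np.eye(4)))             # True: rows are orthonormal too
```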

3. What are the benefits of using orthogonal matrices?

Orthogonal matrices have many benefits, including their ability to preserve the length and angles of vectors when multiplied by the matrix. This property makes them useful in solving linear equations, performing geometric transformations, and in data compression techniques such as Principal Component Analysis.

4. Can any matrix be transformed into an orthogonal matrix?

No, not all matrices can be transformed into orthogonal matrices. Only square matrices with linearly independent columns or rows can be transformed into orthogonal matrices. Additionally, non-square matrices can be transformed into matrices with orthogonal columns or rows, known as semi-orthogonal matrices.

5. In what fields are orthogonal matrices commonly used?

Orthogonal matrices have many applications in mathematics, engineering, and physics. They are used in image processing, signal processing, robotics, quantum mechanics, and many other fields. They are also widely used in computer graphics for 3D rotations and transformations.
