Orthogonal matrix construction

In summary: append ##p-m## random columns to ##\bar{B}_2##, take a QR factorization of the resulting square matrix, and permute the columns of ##Q##; the transpose of the result is the desired ##T^{-1}=T^T## (an SVD of ##\bar{B}_2## would provide such a basis as well). Geometrically, the first ##p-m## rows of ##T^{-1}## form an orthonormal basis for the orthogonal complement of the range of ##\bar{B}_2## (so they annihilate it), and the remaining ##m## rows form an orthonormal basis for the range itself.
  • #1
p4wp4w
Given a real-valued matrix ## \bar{B}_2=\begin{bmatrix}
\bar{B}_{21}\\
\bar{B}_{22}
\end{bmatrix}\in{R^{p \times m}}
##, I am looking for an orthogonal transformation matrix, i.e., ##T^{-1}=T^T\in{R^{p \times p}}##, that satisfies:
$$
\begin{bmatrix}
T_{11}^T & T_{21}^T\\
T_{12}^T & T_{22}^T
\end{bmatrix}\bar{B}_2=
\begin{bmatrix}
0\\
B_{2}
\end{bmatrix},
$$ where ##B_2\in{R^{m \times m}}## is non-singular. Assuming ##T=
\begin{bmatrix}
T_{11} & T_{12}\\
T_{21} & T_{22}
\end{bmatrix}##, and substituting into the equation above, I can easily write ##T^{-1}## as
$$
\begin{bmatrix}
(T_{11}-T_{12}T_{22}^{-1}T_{21})^{-1} & -T_{11}^{-1}T_{12}(T_{22}-T_{21}T_{11}^{-1}T_{12})^{-1}\\
-T_{22}^{-1}T_{21}(T_{11}-T_{12}T_{22}^{-1}T_{21})^{-1} & (T_{22}-T_{21}T_{11}^{-1}T_{12})^{-1}
\end{bmatrix}=
\begin{bmatrix}
T_{11}^T & -T_{11}^T\bar{B}_{21}\bar{B}_{22}^{-1}\\
T_{12}^T & B_2\bar{B}_{22}^{-1}-T_{12}^T\bar{B}_{21}\bar{B}_{22}^{-1}
\end{bmatrix}.
$$ The literature says that there exists a transformation satisfying the last matrix equality. Accordingly, I wonder how I can synthesize ##T##. There should be a systematic method of obtaining ##T##; otherwise, I may have to carry out an exhaustive search.

As background: in system theory, I am looking for a transformation that decomposes the system into the so-called regular form; ##\bar{B}_2## is the part of the input matrix that defines the range space and causes the invariant RHS zeros.
 
  • #2
edit: I had some ideas involving the polar decomposition, but they ended up solving something similar to, though not quite, what you were looking for.

round 2:

This is actually fairly straightforward.

##\bar{B}_2## is p x m where ##p\gt m## (i.e. a tall and skinny matrix) and its rank is ##m## (full column rank). This means its left nullspace has dimension ##p-m## -- grab an orthonormal basis for it. Now generate ##m## linearly independent vectors via a random number generator over [-1,1], append them to that basis, run Gram-Schmidt on the whole collection (nullspace basis first), and you have your answer.
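Roughly, in NumPy/SciPy terms, a minimal sketch of this first recipe (names like `B2bar` are just placeholders, and I lean on `scipy.linalg.null_space` plus a QR call instead of hand-rolled Gram-Schmidt):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
p, m = 5, 2
B2bar = rng.uniform(-1, 1, size=(p, m))     # stand-in for a tall p x m matrix of full column rank

N = null_space(B2bar.T)                     # p x (p-m) orthonormal basis of the left nullspace
V = rng.uniform(-1, 1, size=(p, m))         # m random vectors to complete the basis
Q, _ = np.linalg.qr(np.hstack([N, V]))      # orthonormalize, nullspace basis kept first
T_inv = Q.T                                 # orthogonal; its first p-m rows annihilate B2bar

print(np.round(T_inv @ B2bar, 10))          # top (p-m) x m block is ~0, bottom m x m block nonsingular
```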
- - - -
An equivalent, more thorough, and faster process would be to append ##p-m## randomly generated column vectors (these will be linearly independent of the columns of ##\bar{B}_2## -- in abstraction, with probability 1; in practice with floats, absurdly close to probability 1) such that you now have the augmented square matrix

##C :=
\bigg[\begin{array}{c|c|c|c|c} \bar{B}_2 & \mathbf v_{m+1} &\cdots & \mathbf v_{p-1} &\mathbf v_{p}
\end{array}\bigg]##

run QR factorization,

##C = Q_C R =
\bigg[\begin{array}{c|c|c|c|c} Q_{\bar{B}_2} & \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p}
\end{array}\bigg]R
##

permute the columns of ##Q_C## such that the vectors associated with the ones you generated at random are now leftmost:

##Q_C P =
\bigg[\begin{array}{c|c|c|c|c} \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p} &Q_{\bar{B}_2}
\end{array}\bigg] ##

where ## P## is a permutation matrix. You then find that

##T^{-1} = T^T = \big(Q_C P\big)^T##.

That should do it. I.e. if you work it through, you'll see

##
T^{-1} \bar{B}_2 = \big(Q_C P\big)^T \bar{B}_2 =
\bigg[\begin{array}{c|c|c|c|c} \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p} &Q_{\bar{B}_2}
\end{array}\bigg]^T \bar{B}_2 =
\begin{bmatrix}
{ \mathbf q_{m+1}}^T \\
\vdots\\
{ \mathbf q}_{p}^T \\
Q_{\bar{B}_2}^T
\end{bmatrix} \bar{B}_2 =
\begin{bmatrix}
\mathbf 0_{p-m} \mathbf 0_m^T\\
R_{m}
\end{bmatrix}##

where ##R_m## is the top left m x m principal submatrix of ##R##, i.e. ##\bar{B}_2 = Q_{\bar{B}_2} R_{m}##.
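In NumPy this whole recipe is only a few lines. A sketch (again, variable names are placeholders; `np.linalg.qr` on a square matrix returns the full ##Q_C##):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 5, 2
B2bar = rng.uniform(-1, 1, size=(p, m))       # stand-in for your tall p x m matrix

V = rng.uniform(-1, 1, size=(p, p - m))       # p-m random columns appended on the right
C = np.hstack([B2bar, V])                     # square p x p, invertible with probability 1
Q_C, R = np.linalg.qr(C)                      # C = Q_C R

P = np.eye(p)[:, np.r_[m:p, 0:m]]             # permutation: random-column q's first, Q_{B2bar} last
T_inv = (Q_C @ P).T                           # T^{-1} = T^T = (Q_C P)^T

print(np.round(T_inv @ B2bar, 10))            # [0; R_m], with R_m = R[:m, :m] upper triangular
```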

If for some reason you don't like having the upper triangular matrix on the right hand side, you could further left multiply each side by

##\begin{bmatrix}
S_{p-m}^T & \mathbf 0_{p-m} \mathbf 0_m^T\\
\mathbf 0_m \mathbf 0_{p-m}^T & U_m^T
\end{bmatrix}##

where ##S_{p-m}## is a (p-m) x (p-m) orthogonal matrix and ##U_m## is an m x m orthogonal matrix, each of your choosing.
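Continuing the sketch above (same placeholder names; I only bother with a nontrivial ##U_m##, since mixing the zero rows changes nothing):

```python
from scipy.linalg import block_diag

# (continues the previous snippet: reuses np, rng, p, m, B2bar, T_inv)
S, _ = np.linalg.qr(rng.uniform(-1, 1, size=(p - m, p - m)))   # any (p-m) x (p-m) orthogonal block
U, _ = np.linalg.qr(rng.uniform(-1, 1, size=(m, m)))           # any m x m orthogonal U_m
T_inv2 = block_diag(S.T, U.T) @ T_inv                          # still orthogonal; zero block preserved

print(np.round(T_inv2 @ B2bar, 10))                            # bottom block is now U^T R_m, not triangular in general
```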
 
Last edited:
  • #3
I am a bit confused by your notation. Since you have used ##Q_{\bar{B}_2}##, which comes out of the QR factorization as ##\bar{B}_2=Q_{\bar{B}_2}R_{m}##, and ##Q_C## in your formulation is defined as ##C=[\bar{B}_2|\textbf{v}_i|_{i=m+1:p}]=Q_CR##, shouldn't ##Q_C=Q_{\bar{B}_2}##? In that case, ##\textbf{q}_i|_{i=m+1:p}=\emptyset##? See this example for ##p=2, m=1##:
$$\bar{B}_2=
\begin{bmatrix}
-4.072 \\
10.3
\end{bmatrix},$$
setting ##\textbf{v}_i=
\begin{bmatrix}
0.547 \\
0.957
\end{bmatrix},## I get the orthogonal $$Q_{\bar{B}_2}=Q_C=
\begin{bmatrix}
-0.367 & 0.93 \\
0.93 & 0.367
\end{bmatrix},$$ the upper-triangular ##R_m=
\begin{bmatrix}
11.0767 \\
0
\end{bmatrix},## and ##R=
\begin{bmatrix}
11.0767 & 0.6894 \\
0 & 0.8606
\end{bmatrix}, ## respectively.
Apart from this, it is not clear to me what the geometrical interpretation of the generated random vectors is. Which degrees of freedom are we exploiting by defining the vectors in that way?
 
  • #4
p4wp4w said:
I am a bit confused by your notation. Since you have used ##Q_{\bar{B}_2}##, which comes out of the QR factorization as ##\bar{B}_2=Q_{\bar{B}_2}R_{m}##, and ##Q_C## in your formulation is defined as ##C=[\bar{B}_2|\textbf{v}_i|_{i=m+1:p}]=Q_CR##, shouldn't ##Q_C=Q_{\bar{B}_2}##?

No. I thought I showed it quite clearly here:
##
C :=
\bigg[\begin{array}{c|c|c|c|c} \bar{B}_2 & \mathbf v_{m+1} &\cdots & \mathbf v_{p-1} &\mathbf v_{p}
\end{array}\bigg]##

That is, ##\bar{B}_2## is tall and skinny, while ##C## is square. Then run QR factorization on C:

##
C = Q_C R =
\bigg[\begin{array}{c|c|c|c|c} Q_{\bar{B}_2} & \mathbf q_{m+1} &\cdots & \mathbf q_{p-1} &\mathbf q_{p}
\end{array}\bigg]R##

If you really understand Gram-Schmidt and QR factorization, it should be clear that ##Q_C## comes from the QR factorization of ##C##, and that I took advantage of overlapping subproblems here. Looking back, the issue relates to naming.

To be clear, if you work through the math, there are basically two ways to decompose ##A = QR## for some tall skinny ##A## (with full column rank, as in day-to-day least squares problems). One has square ##Q## and tall skinny ##R## -- apparently Wikipedia likes this version. It seems that this is how SciPy does the QR factorization by default.

The other approach is tall skinny ##Q## and square ##R##. This is how NumPy does the QR factorization by default. I believe I made it clear in my post that I was using the square ##R## case, e.g. when I stated that ##R_m## was an m x m principal submatrix, and when I said ##\bar{B}_2 = Q_{\bar{B}_2} R_{m}##, among other places. According to Wikipedia it seems that some people call this the "thin QR factorization", but not only is a square ##R## preferable, the second approach also maps directly to running Gram-Schmidt on your matrix, while the first approach does not. Hence I used the second approach.

To be honest, I've never really used the first form of QR factorization. If you actually derive QR factorization mathematically via Gram-Schmidt on your tall skinny matrix, you get the "thin" ##Q##, and the square ##R## naturally results from book-keeping. If you are using QR factorization for least squares on an overdetermined system, which is overwhelmingly what it is used for, then you want the "thin" version.
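To make the defaults concrete, a quick sketch (this just reflects the default modes as I understand them: `mode='reduced'` in NumPy versus `mode='full'` in SciPy):

```python
import numpy as np
import scipy.linalg

A = np.random.default_rng(2).uniform(-1, 1, size=(5, 2))   # tall skinny, full column rank

Q1, R1 = np.linalg.qr(A)        # NumPy default ("thin"): Q1 is 5x2, R1 is 2x2 square
Q2, R2 = scipy.linalg.qr(A)     # SciPy default ("full"): Q2 is 5x5 square, R2 is 5x2
print(Q1.shape, R1.shape)       # (5, 2) (2, 2)
print(Q2.shape, R2.shape)       # (5, 5) (5, 2)
```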

If you take the "non-thin" case (i.e. the first one), you have to answer the question: where did the final columns of the fat ##Q## come from? You either extend the set of linearly independent columns in ##\bar{B}_2## (most likely via random number generation) and run Gram-Schmidt, or mumble about Householder reflections or something. I have no qualms about using said reflections in a final implementation (there are even more powerful numerical tools available), but Gram-Schmidt is really the right way to think about this mathematically -- this ties in with your final question about the underlying geometry.

Your problem was:
1) Come up with a matrix whose first ##p-m## orthonormal columns -- which you then transpose and multiply on the left -- annihilate the columns of ##\bar{B}_2##.

2) Come up with ##m## columns that are mutually orthonormal and complete the basis in ##\mathbb R^p##.

That's it. A careful consideration of which unit-length columns are orthogonal to the columns of ##\bar{B}_2## and which orthonormal columns are in the span of the columns of ##\bar{B}_2## gives the answer. Mentally, I use Gram-Schmidt while I do this. If you don't use the "thin" version of QR factorization, then, in my view, you lose the geometric interpretation.
- - - -

p4wp4w said:
See this Example for ##p=2, m=1##:
$$\bar{B}_2=
\begin{bmatrix}
-4.072 \\
10.3
\end{bmatrix},$$
setting ##\textbf{v}_i=
\begin{bmatrix}
0.547 \\
0.957
\end{bmatrix},## I get the orthogonal $$Q_{\bar{B}_2}=Q_C=
\begin{bmatrix}
-0.367 & 0.93 \\
0.93 & 0.367
\end{bmatrix},$$ the upper-triangular ##R_m=
\begin{bmatrix}
11.0767 \\
0
\end{bmatrix},## and ##R=
\begin{bmatrix}
11.0767 & 0.6894 \\
0 & 0.8606
\end{bmatrix}, ## respectively.

note: in this example, if you use the permutation matrix ##
P = \begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}##,

Then:

##\big(Q_C P\big)^T \bar{B}_2 =
\begin{bmatrix}
0 \\
11.07570242
\end{bmatrix}##

This is the result you are looking for.
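For reference, a quick NumPy check of this (just plugging in the numbers from your example; the sign of the bottom entry can depend on the QR sign convention):

```python
import numpy as np

B2bar = np.array([[-4.072], [10.3]])
v = np.array([[0.547], [0.957]])

Q_C, R = np.linalg.qr(np.hstack([B2bar, v]))   # square 2x2 Q_C
P = np.array([[0, 1], [1, 0]])                 # swap the two columns
print((Q_C @ P).T @ B2bar)                     # approx [[0.], [11.0757...]]
```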

and note:

##Q_C =
\begin{bmatrix}
-0.367 & 0.93 \\
0.93 & 0.367
\end{bmatrix}
\neq
\begin{bmatrix}
-0.367 \\
0.93
\end{bmatrix} = Q_{\bar{B}_2}
##

also note

p4wp4w said:
##R_m=
\begin{bmatrix}
11.0767 \\
0
\end{bmatrix}##

This is not how I did it. In fact

##R_m=
\begin{bmatrix}
11.0767
\end{bmatrix}##

In my earlier post I said

"where ##R_m## is the top left m x m principal submatrix" hence it must be square if you follow the solution I provided.

- - - -

Final implementation details are your privilege of course. I tried hard to come up with a solution that is both computationally efficient and close to how I think about the underlying geometry.

I suppose I should have said I was using the "thin QR factorization", though I didn't realize anyone calls it that -- it's just the regular QR factorization I've used a thousand times in NumPy, mostly for least squares problems. (I was surprised to learn that SciPy by default uses the other one, though.) It's also the one I know a clean derivation for.
 
Last edited:

FAQ: Orthogonal matrix construction

1. What is an orthogonal matrix?

An orthogonal matrix is a square matrix in which all the columns and rows are orthonormal, meaning that they are perpendicular to each other and have a length of 1. This property makes orthogonal matrices useful in many mathematical and scientific applications, such as in linear algebra and geometry.

2. How is an orthogonal matrix constructed?

An orthogonal matrix can be constructed in several ways, such as by using the Gram-Schmidt process, the Householder transformation, or the Givens rotation. These methods involve manipulating the columns or rows of a given matrix to create a new matrix that satisfies the orthonormal condition.
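For example, a minimal Python sketch (assuming NumPy is available) that constructs an orthogonal matrix by orthonormalizing the columns of a random matrix with a QR factorization, the numerical analogue of the Gram-Schmidt process:

```python
import numpy as np

M = np.random.default_rng(3).normal(size=(4, 4))   # a generic full-rank square matrix
Q, _ = np.linalg.qr(M)                             # orthonormalize its columns
print(np.allclose(Q.T @ Q, np.eye(4)))             # True: columns are orthonormal
print(np.allclose(Q @ Q.T, np.eye(4)))             # True: rows are orthonormal too
```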

3. What are the benefits of using orthogonal matrices?

Orthogonal matrices have many benefits, including their ability to preserve the length and angles of vectors when multiplied by the matrix. This property makes them useful in solving linear equations, performing geometric transformations, and in data compression techniques such as Principal Component Analysis.

4. Can any matrix be transformed into an orthogonal matrix?

No, not all matrices can be transformed into orthogonal matrices. Only square matrices with linearly independent columns or rows can be transformed into orthogonal matrices. Additionally, non-square matrices can be transformed into matrices with orthogonal columns or rows, known as semi-orthogonal matrices.

5. In what fields are orthogonal matrices commonly used?

Orthogonal matrices have many applications in mathematics, engineering, and physics. They are used in image processing, signal processing, robotics, quantum mechanics, and many other fields. They are also widely used in computer graphics for 3D rotations and transformations.
