What are the properties of orthogonal matrices?

In summary: $|v| = |Ov| = |\lambda v| = |\lambda||v|$. This is the quick way to prove (b) (it uses a result proven in a different thread).
  • #1
Jameson
Problem: Let $O$ be an $n \times n$ orthogonal real matrix, i.e. $O^TO=I_n$. Prove that:

a) Any entry in $O$ is between -1 and 1.
b) If $\lambda$ is an eigenvalue of $O$ then $|\lambda|=1$
c) $\det O = 1$ or $-1$

Solution: I want to preface this by noting that although this is a 3-part question and our rules state we should only ask one question at a time, I believe all parts use the same concepts, so it's more efficient to put them together in one thread. If that proves to be untrue, I'll gladly split the thread.

Right now I see the solution for part (c): it uses the fact that $\det AB = \det A \, \det B$.

$1 = \det I_n = \det OO^T = \det O \, \det O^T = \det O \, \det O = (\det O)^2$, thus $\det O$ must be $1$ or $-1$.
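As a quick numerical sanity check of this (not a proof), one can build a random orthogonal matrix from a QR decomposition and inspect its determinant; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# The Q factor of a QR decomposition of a (full-rank) random matrix is orthogonal.
O, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Sanity-check the definition: O^T O = I (up to floating-point rounding).
assert np.allclose(O.T @ O, np.eye(n))

# det(O) comes out very close to +1 or -1.
print(np.linalg.det(O))
```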

Any ideas on (a) and (b)?
 
  • #2
Jameson said:
Any ideas on (a) and (b)?

Hi Jameson!

What can you use?
Can you use that the length of each column vector in an orthogonal matrix is 1?
Can you use that an orthogonal matrix does not change the length of a vector?
These are basic properties of an orthogonal matrix.
 
  • #3
Alternatively, suppose $v$ is an eigenvector corresponding to the eigenvalue $\lambda$.
What can you say about $Ov$, $(Ov)^T$, and their product?
 
  • #4
Let \(\displaystyle v\) be an eigenvector for \(\displaystyle O\) with eigenvalue \(\displaystyle \lambda\).

Then:

\(\displaystyle |v| = |Ov| = |\lambda v| = |\lambda||v|\).

This is the quick way to prove (b) (this uses a result proven in a different thread).

To use the result that if \(\displaystyle O_j\) is a column vector of \(\displaystyle O\), then \(\displaystyle |O_j| = 1\), we ought to prove this first.

But, since:

\(\displaystyle |O_j| = \sqrt{\langle O_j,O_j\rangle} = \sqrt{(O_j)^TO_j}\)

and \(\displaystyle (O_j)^TO_j = (I_{jj}) = 1\) (here I mean the j,j-th coordinate of the matrix \(\displaystyle O^TO = I\)), clearly \(\displaystyle |O_j| = \sqrt{1} = 1\).

If \(\displaystyle u_{ij}\) is the i-th coordinate of \(\displaystyle O_j = (u_{1j},u_{2j},\dots,u_{nj})\), this means that:

\(\displaystyle (u_{ij})^2 \leq (u_{1j})^2 + (u_{2j})^2 + \cdots + (u_{nj})^2 = 1\)

from which it follows that \(\displaystyle |u_{ij}| = \sqrt{(u_{ij})^2} \leq 1\).
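For what it's worth, here is a small numerical illustration of this argument (a sketch, assuming NumPy): every column has unit length, so no entry exceeds 1 in absolute value.

```python
import numpy as np

rng = np.random.default_rng(1)
O, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # O is orthogonal

# Each column O_j has length 1...
print(np.linalg.norm(O, axis=0))        # ~ [1. 1. 1. 1.]

# ...so every entry u_ij satisfies |u_ij| <= 1.
print(np.all(np.abs(O) <= 1 + 1e-12))   # True
```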
 
  • #5
I like Serena said:
Hi Jameson!

What can you use?
Can you use that the length of each column vector in an orthogonal matrix is 1?
Can you use that an orthogonal matrix does not change the length of a vector?
These are basic properties of an orthogonal matrix.

Hi I like Serena! I'm not sure what properties I can use to be honest. This class is more applied and computationally based than theoretically based, so he hasn't taken much time to discuss this topic in detail. :(

I've been reading through the Wikipedia article on orthogonal matrices; just as you said, the rows and columns must be orthonormal. If it's part of the definition then surely I can use that property. I'm not sure how to generalize this properly for a proof but I'll try.

Let matrix $A$ be an $n \times n$ orthogonal real matrix. Let us also think of $A$ as comprising column vectors $[a_1, a_2, \dots, a_n]$ where each $a_i$ is in $\mathbb{R}^n$. Assume that at least one entry in a row/column is greater than 1.

That should violate either $a_i \cdot a_j = 0$ (for $i \neq j$) or $\sqrt{a_i \cdot a_i}=1$, but I can't quite see how to get there.

How is this set up so far?
I like Serena said:
Alternatively, suppose $v$ is the eigenvector corresponding to the eigenvalue $\lambda$.
What can you say about $Ov$, $(Ov)^T$, and their product?

In this setup it follows by definition of an eigenvector that $Ov=\lambda v$, correct? I want to make sure I'm not mixing something up first.

Assuming that is true, then $(Ov)(Ov)^T=(Ov)(v^TO^T)=O(vv^T)O^T=$?
 
  • #6
Deveno said:
\(\displaystyle |O_j| = \sqrt{\langle O_j,O_j\rangle} = \sqrt{(O_j)^TO_j}\)

and \(\displaystyle (O_j)^TO_j = (I_{jj}) = 1\) (here I mean the j,j-th coordinate of the matrix \(\displaystyle O^TO = I\)), clearly \(\displaystyle |O_j| = \sqrt{1} = 1\).

This all makes sense and follows from the definitions of the inner product space and an orthogonal matrix. (Yes)

If \(\displaystyle u_{ij}\) is the i-th coordinate of \(\displaystyle O_j = (u_{1j},u_{2j},\dots,u_{nj})\), this means that:

\(\displaystyle (u_{ij})^2 \leq (u_{1j})^2 + (u_{2j})^2 + \cdots + (u_{nj})^2 = 1\)

from which it follows that \(\displaystyle |u_{ij}| = \sqrt{(u_{ij})^2} \leq 1\).

This part I don't get, especially the inequality. Could you explain more please or reference what I should read up on?
 
  • #7
You should have already proved that if a square matrix has a left-inverse, it also has a right-inverse and the two are equal. In light of this, from:

\(\displaystyle OO^T = I\) we have \(\displaystyle O^TO = I\). The second equation is more useful.

Traditionally, vectors are written as COLUMNS, so one expresses an inner product as:

\(\displaystyle \langle u,v\rangle = v^Tu\) (row times column).

So \(\displaystyle \langle Ov,Ov\rangle = (Ov)^TOv = (v^TO^T)(Ov) = v^T(O^TO)v = v^TIv = v^Tv = \langle v,v\rangle\).

From this it follows that \(\displaystyle \|Ov\| = \|v\|\) (take the square root of both sides).
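To illustrate this length-preservation numerically (a sketch, not part of the argument, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
O, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal O
v = rng.standard_normal(3)

# Since O^T O = I, the norm is preserved: ||Ov|| = ||v||.
print(np.linalg.norm(O @ v), np.linalg.norm(v))   # the two values agree
```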

But what is each entry of the matrix product \(\displaystyle O^TO\)? Isn't the i,j-th entry the i-th row of \(\displaystyle O^T\) (which is the i-th column of \(\displaystyle O\)) times (in the sense of an inner product) the j-th column of \(\displaystyle O\)?

Just consider the entries where the row and column have the same index (the diagonal ones). Each such entry is a sum of squares that adds up to 1. How can any one of those squares be more than 1 (squares are all non-negative)?

(EDIT: the orthogonality of the columns of \(\displaystyle O\) follows from the definition: consider the inner product of any column with any other. If the two columns are not the same, their inner product is the (matrix) product of the i-th row of \(\displaystyle O^T\) with the j-th column of \(\displaystyle O\), which is the i,j-th entry of the identity matrix, and this is 0 if i ≠ j.)
 
  • #8
Bear with me. :confused: I am trying to make sense of this, I promise!

I follow everything on this line.

$\displaystyle \langle Ov,Ov\rangle = (Ov)^TOv = (v^TO^T)(Ov) = v^T(O^TO)v = v^TIv = v^Tv = \langle v,v\rangle$

Since we are dealing with a real space, isn't it true that $\langle u,v\rangle =\langle v,u\rangle$? In $\mathbb{R}^n$ this is the dot product, which is commutative.

I have two main questions:

1) I agree that $\langle Ov,Ov\rangle =\langle v,v\rangle$, but if we state that $\displaystyle \|Ov\| = \|v\|$, doesn't that imply that both values are positive since we'll be taking the square root? How do we justify that those values are positive?

2) I still don't see how we claim that the sum of squares must be 1. Once that is established then obviously it follows that any entry in the sum must be less than or equal to 1, otherwise the sum would exceed 1. I just don't see the first part.
 
  • #9
1) The inner product is positive-definite (if \(\displaystyle u \neq 0, \langle u,u\rangle > 0\)).

2) I think you are still not seeing the core idea:

the inner product of the i-th and j-th columns of \(\displaystyle O\) IS the matrix product of the i-th row of \(\displaystyle O^T\) and the j-th column of \(\displaystyle O\). We know ahead of time what all such products are: taken together, these row-times-column products form the entries of the identity matrix.
 
  • #10
Maybe this will help:

Suppose we have a 3x3 matrix:

\(\displaystyle A = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33} \end{bmatrix}\)

If we want to take the inner product of two columns of \(\displaystyle A\), we can do this two ways, say we want the inner product of column 1: \(\displaystyle u = (a_{11},a_{21},a_{31})\) and column 3: \(\displaystyle v = (a_{13},a_{23},a_{33})\).

Using the standard Euclidean dot-product, we have:

\(\displaystyle \langle u,v\rangle = a_{11}a_{13} + a_{21}a_{23} + a_{31}a_{33}\).

Or, we can form the matrix:

\(\displaystyle A^TA = \begin{bmatrix}a_{11}&a_{21}&a_{31}\\a_{12}&a_{22}&a_{32}\\ a_{13}&a_{23}&a_{33} \end{bmatrix}\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33} \end{bmatrix}\)

in which case the first row of \(\displaystyle A^T\) times the third column of \(\displaystyle A\) (that is to say the 1,3-th entry of the matrix product) also gives the desired inner product: that is if:

\(\displaystyle A^TA = B = (b_{ij}) = \begin{bmatrix}b_{11}&b_{12}&b_{13}\\b_{21}&b_{22}&b_{23}\\b_{31}&b_{32}&b_{33} \end{bmatrix}\)

then:

\(\displaystyle b_{13} = a_{11}a_{13} + a_{21}a_{23} + a_{31}a_{33} = \langle u,v\rangle\).
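The same computation can be mirrored numerically; a sketch assuming NumPy (the matrix here is arbitrary, it need not be orthogonal for this identity to hold):

```python
import numpy as np

A = np.arange(1.0, 10.0).reshape(3, 3)  # an arbitrary 3x3 matrix

u = A[:, 0]  # column 1
v = A[:, 2]  # column 3

# The (1,3) entry of A^T A equals the inner product of columns 1 and 3.
print((A.T @ A)[0, 2])  # 90.0
print(u @ v)            # 90.0
```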
 
  • #11
Jameson said:
Hi I like Serena! I'm not sure what properties I can use to be honest. This class is more applied and computationally based than theoretically based, so he hasn't taken much time to discuss this topic in detail. :(

I've been reading through the Wikipedia article on orthogonal matrices; just as you said, the rows and columns must be orthonormal. If it's part of the definition then surely I can use that property. I'm not sure how to generalize this properly for a proof but I'll try.

For an applied course I would expect you can use anything you can look up.
That's what an engineer does.
As the joke goes (http://mathhelpboards.com/chat-room-9/jokes-5415-4.html#post28665), an engineer has a red ball book in which he can look up the properties of a red ball, rather than trying to deduce its properties from its chemical composition.

Note that these properties are not part of the definition of an orthogonal matrix, but they are consequences of the definition, each of which can be proven.
Jameson said:
Let matrix $A$ be an $n \times n$ orthogonal real matrix. Let us also think of $A$ as comprising column vectors $[a_1, a_2, \dots, a_n]$ where each $a_i$ is in $\mathbb{R}^n$. Assume that at least one entry in a row/column is greater than 1.

That should violate either $a_i \cdot a_j = 0$ (for $i \neq j$) or $\sqrt{a_i \cdot a_i}=1$, but I can't quite see how to get there.

How is this set up so far?

That is the right direction.

Suppose $b$ is one of those column vectors $a_i$.
Then $b \cdot b = 1$.
Writing it out, this is:
$$b \cdot b = b_1^2 + b_2^2 + ... + b_n^2 = 1$$
Note that each of the terms must be at least 0, since they are squares.
So what happens if any of the $b_j$ is either greater than $1$ or less than $-1$?
Jameson said:
In this setup it follows by definition of an eigenvector that $Ov=\lambda v$, correct? I want to make sure I'm not mixing something up first.

Assuming that is true, then $(Ov)(Ov)^T=(Ov)(v^TO^T)=O(vv^T)O^T=$?

Let's try it like this:
$$(Ov)^T(Ov)=(v^TO^T)(Ov)=v^T(O^TO)v$$
Now combine it with what is given: $O^TO=I$ and $Ov=\lambda v$.
 
  • #12
I like Serena said:
That is the right direction.

Suppose $b$ is one of those column vectors $a_i$.
Then $b \cdot b$ = 1.
Writing it out, this is:
$$b \cdot b = b_1^2 + b_2^2 + ... + b_n^2 = 1$$
Note that each of the terms must be at least 0, since they are squares.
So what happens if any of the $b_j$ is either greater than $1$ or less than $-1$?

If $|b_j|>1$ for any $j$ then its square would be larger than 1, which contradicts $b \cdot b = 1$. That I definitely understand. If we use the fact that the columns are orthonormal then $b \cdot b = 1$ is justified by the following: $1=\|b\|=\sqrt{b \cdot b} \implies b \cdot b=1$. I don't see how to prove that without using the definition though.

I like Serena said:
Let's try it like this:
$$(Ov)^T(Ov)=(v^TO^T)(Ov)=v^T(O^TO)v$$
Now combine it with what is given: $O^TO=I$ and $Ov=\lambda v$.

Using Deveno's helpful posts, I know that $(Ov)^T(Ov)=(v^TO^T)(Ov)=v^T(O^TO)v$ leads to $v^T I v=v^Tv$. Using the definition of the dot product, that can be rewritten as $(Ov) \cdot (Ov)= v \cdot v$. Since both sides are non-negative ($a \cdot a \ge 0$ for any real $a$), we can take square roots to get $\|Ov\| = \|v\|$. So we have justified that multiplying by matrix $O$ doesn't change the length of $v$.

I don't know how to go from this to using $\lambda$ though.

Thank you both so much! Deveno is right that I am missing some core concepts and I've found that it takes time for these things to sink in. Hopefully I'll have an "Aha!" moment in the next day or two where it all comes together.
 
  • #13
Jameson said:
If we use the fact that the columns are orthonormal then $b \cdot b = 1$ is justified by the following: $1=\|b\|=\sqrt{b \cdot b} \implies b \cdot b=1$. I don't see how to prove that without using the definition though.

Prove what?
Jameson said:
Using Deveno's helpful posts, I know that $(Ov)^T(Ov)=(v^TO^T)(Ov)=v^T(O^TO)v$ leads to $v^T I v=v^Tv$. Using the definition of the dot product, that can be rewritten as $(Ov) \cdot (Ov)= v \cdot v$. Since both sides are non-negative ($a \cdot a \ge 0$ for any real $a$), we can take square roots to get $\|Ov\| = \|v\|$. So we have justified that multiplying by matrix $O$ doesn't change the length of $v$.

I don't know how to go from this to using $\lambda$ though.
Let's try substituting $Ov=\lambda v$ and $O^TO=I$:
\begin{array}{lcl}
(Ov)^T(Ov) &=& v^T(O^TO)v \\
(\lambda v)^T(\lambda v) &=& v^T I v \\
\lambda^2 \, v^Tv &=& v^T v
\end{array}
Since $v \neq 0$ and the dot product is positive-definite, $v^Tv > 0$, so we may divide by it: $\lambda^2 = 1$, hence $\lambda = \pm 1$. (Strictly speaking, this handles a real eigenvalue with a real eigenvector; for a complex eigenvalue one repeats the computation with the conjugate transpose and gets $|\lambda|^2 = 1$, hence $|\lambda| = 1$, as part (b) states.)
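A numerical check of the general statement in (b), as a sketch assuming NumPy (note that the eigenvalues of a real orthogonal matrix are complex in general, e.g. for a plane rotation, but all have modulus 1):

```python
import numpy as np

rng = np.random.default_rng(3)
O, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # orthogonal O

eigvals = np.linalg.eigvals(O)  # generally complex numbers
print(np.abs(eigvals))          # every modulus is ~1.0
```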
 
  • #14
Some additional facts about orthogonal linear transformations, with a geometric interpretation:

Every column vector of an orthogonal $n \times n$ matrix lies on the unit $(n-1)$-sphere.

Now ask yourself: how can a vector which lies on a unit $(n-1)$-sphere possibly have any coordinate greater than 1?

If we choose a particular vector on the unit $(n-1)$-sphere, the possible orthogonal vectors remaining lie on an $(n-2)$-sphere perpendicular to our chosen vector. For example, on a 2-sphere, using the Earth as a model, the vectors perpendicular to the vector represented by the north pole lie on the equator (which is a 1-sphere, or circle).

Having chosen some vector on the equator, we now have to choose one of two points (a 0-sphere), which lie on the line perpendicular to our equatorial vector and co-planar with the equator.

It's sometimes easier to see what is going on in the special case n = 2:

Suppose we have an orthogonal matrix:

\(\displaystyle O = \begin{bmatrix}a&b\\c&d\end{bmatrix}\).

Working directly from the definition \(\displaystyle O^TO = I\), we get:

\(\displaystyle \begin{bmatrix}a&c\\b&d \end{bmatrix}\begin{bmatrix}a&b\\c&d \end{bmatrix} = \begin{bmatrix}1&0\\0&1 \end{bmatrix}\)

So that:

\(\displaystyle a^2 + c^2 = b^2 + d^2 = 1\)
\(\displaystyle ab + cd = 0\), that is:

\(\displaystyle \|(a,c)\|^2 = \|(b,d)\|^2 = 1\)
\(\displaystyle (a,c)\cdot (b,d) = 0\)
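A standard concrete family satisfying exactly these equations is the rotation matrices

\(\displaystyle O = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}\),

since \(\displaystyle \cos^2\theta + \sin^2\theta = 1\) and \(\displaystyle \cos\theta(-\sin\theta) + \sin\theta\cos\theta = 0\): the columns are the points on the unit circle (the 1-sphere) at angles \(\displaystyle \theta\) and \(\displaystyle \theta + 90^{\circ}\).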
 
  • #15
I think I almost get it. :D That $2 \times 2$ example really helped something click.

In the identity matrix there are only 2 possible values by definition, 1 and 0. Whenever $i=j$ the row-column multiplication will sum to 1, and in all other cases it will result in 0. For each of the $i=j$ cases in the resultant matrix, that term is calculated by something of the form $(a_r)^T(a_r)$, where $a_r$ is a column vector and $1 \le r \le n$. That product can be expressed as $(a_{1r})^2+(a_{2r})^2+\cdots+(a_{nr})^2$. If the magnitude of any of the $a_{ir}$ exceeds 1, then its square exceeds 1, the sum is larger than 1, and we have a contradiction.

Ok, that part is good I think, but I have one last question (for now). Using Deveno's $2 \times 2$ example, I agree that $ab+cd=0$, which is the same as $(a,c) \cdot (b,d)=0$. The only requirement for that to be true is that $ab=-cd$, which doesn't restrict the magnitudes of those four variables to be 1 or less.

EDIT: This can be justified by the fact that all columns are orthonormal. Thank you to I like Serena and Deveno for their patience and wonderful insight! :)
 
  • #16
I want to thank both I like Serena and Deveno for their help once more. Today was my first quiz in this course and there were questions related to this thread that I am sure I answered correctly only due to these two taking time to really help me understand the concepts. (Clapping)
 

What is an orthogonal matrix?

An orthogonal matrix is a square matrix whose columns (and likewise rows) are orthonormal: they are mutually perpendicular unit vectors, each of length 1. This also means that the inverse of an orthogonal matrix is equal to its transpose.

What are the properties of an orthogonal matrix?

There are several properties of an orthogonal matrix, including the following (a quick numerical check of these appears after the list):

  • All columns and rows are unit vectors
  • The dot product of any two distinct columns or rows is 0
  • The inverse is equal to the transpose
  • The determinant is either 1 or -1
  • The product of two orthogonal matrices is also orthogonal
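These can all be verified numerically on a randomly generated orthogonal matrix; a minimal sketch, assuming NumPy (the QR construction is one standard way to obtain such a matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
O, _ = np.linalg.qr(rng.standard_normal((n, n)))    # orthogonal O
P, _ = np.linalg.qr(rng.standard_normal((n, n)))    # a second orthogonal matrix

print(np.allclose(np.linalg.norm(O, axis=0), 1.0))  # unit columns
print(np.allclose(O.T @ O, np.eye(n)))              # distinct columns are orthogonal
print(np.allclose(np.linalg.inv(O), O.T))           # inverse equals transpose
print(np.isclose(abs(np.linalg.det(O)), 1.0))       # determinant is +1 or -1
print(np.allclose((O @ P).T @ (O @ P), np.eye(n)))  # product is orthogonal
```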

How can orthogonal matrices be used in linear algebra?

Orthogonal matrices are useful in linear algebra because they preserve vector lengths and angles. This makes them well suited to representing rigid transformations such as rotations and reflections in 2D and 3D space. They also have applications in solving systems of linear equations (e.g. via the QR decomposition) and in data compression.

What is the difference between an orthogonal and a unitary matrix?

While both orthogonal and unitary matrices have columns and rows that are orthonormal vectors, the difference lies in the type of numbers they contain. Orthogonal matrices have real entries, while unitary matrices may have complex entries. Additionally, the inverse of an orthogonal matrix is equal to its transpose, while the inverse of a unitary matrix is equal to its conjugate transpose.

Can any matrix be transformed into an orthogonal matrix?

A square matrix with linearly independent columns can be transformed into an orthogonal matrix through the process of orthogonalization. This involves using algorithms such as the Gram-Schmidt process or the QR decomposition to convert the columns of the matrix into orthonormal vectors. If the columns are linearly dependent, however, the process breaks down, so not every square matrix can be orthogonalized.
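As a sketch of the idea (using NumPy's QR routine rather than a hand-rolled Gram-Schmidt; the input matrix here is an arbitrary full-rank example):

```python
import numpy as np

# A full-rank matrix whose columns are not orthogonal.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# QR decomposition: Q has orthonormal columns spanning the same space.
Q, R = np.linalg.qr(A)

print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q is orthogonal
```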
