Linear least squares regression for model matrix identification

In summary: the thread discusses identifying a linear model matrix ##[A]## satisfying ##\mathbf{AX} = \mathbf Y## using least squares in the complex space. It covers arranging the known equations in matrix form, the Moore-Penrose inverse of the matrix of stacked input vectors, the use of a well-chosen collection of mutually orthonormal vectors and the "thin" QR factorization to solve the equation, the relation to regular ordinary least squares via the conjugate transpose of the data matrix, and the use of scipy.linalg.solve_triangular in the computer implementation of the solution.
  • #1
synMehdi
TL;DR Summary
I need to identify my linear model matrix using least squares. The aim is to approximate an overdetermined system matrix [A] from known pairs of [x] and [y] input data in the complex space.

I need to do a linear model identification using the least-squares method.
My model to identify is a matrix ##[A]##. My linear system in the complex space is:

$$[A]_{_{n \times m}} \cdot [x]_{_{m \times 1}} = [y]_{_{n \times 1}}$$

where ##n## and ##m## define the matrix sizes. In my notation I define my known arrays ##x## and ##y## as vectors.

To identify ##[A]## I have a set of ##p## equations:

$$[A] \cdot \vec{x}_1 = \vec{y}_1$$
$$[A] \cdot \vec{x}_2 = \vec{y}_2$$
$$...$$
$$[A] \cdot \vec{x}_p = \vec{y}_p$$

Knowing that my system is overdetermined (##p > n, m##) and that each pair of ##\vec{x}## and ##\vec{y}## is known, I want to identify my linear model matrix ##[A]## with least squares.

My approach:

I have arranged my known equations from above as follows:

$$[A] \cdot [\vec{x}_1\>\vec{x}_2\>...\>\vec{x}_p]=[\vec{y}_1\>\vec{y}_2\>...\>\vec{y}_p]$$

My initial linear system becomes a matrix equation:

$$[A]_{_{n \times m}} \cdot [X]_{_{m \times p}} = [Y]_{_{n \times p}}$$

Is this the right thing to do to solve ##[A]## with the Moore-Penrose inverse of ##[X]##? What is the correct way to do this?
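
For concreteness, here is a minimal NumPy sketch of this stacking step (the sizes and the random stand-in data are purely hypothetical, just to show the shapes):

```python
import numpy as np

# hypothetical sizes: n outputs, m inputs, p known pairs (p > n, m)
n, m, p = 4, 3, 50

rng = np.random.default_rng(0)
# stand-ins for the p measured pairs, each a 1-D complex vector
x_list = [rng.standard_normal(m) + 1j * rng.standard_normal(m) for _ in range(p)]
y_list = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(p)]

X = np.column_stack(x_list)   # shape (m, p): columns are the x vectors
Y = np.column_stack(y_list)   # shape (n, p): columns are the y vectors
```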
 
  • #2
It seems like an ok start to me. What is the rank of ##\mathbf X##?

Are you looking for a math or computational solution?
 
  • #3
StoneTemplePython said:
It seems like an ok start to me. What is the rank of ##\mathbf X##?

Are you looking for a math or computational solution?

Thank you, but I'm not able to prove what I've done or if the approach is correct.

Well, ##[X]_{m \times p}## is of rank ##m## because ##p \geq m## and all ##p## equations are independent.

For the type of solution I would say mathematical for the rigorous understanding but the goal is to make a small python script out of it.
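
(Side note, in view of the eventual Python script: a quick numerical sanity check of that rank claim could look like the sketch below, where the random complex data just stands in for the real measurements.)

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 50
# stand-in for the m-by-p matrix of stacked input vectors
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))

# full row rank of X (equivalently, full column rank of X*) means rank == m
print(np.linalg.matrix_rank(X) == m)   # True for generic data
```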
 
  • #4
synMehdi said:
Thank you, but I'm not able to prove what I've done or if the approach is correct.

Well, ##[X]_{m \times p}## is of rank ##m## because ##p \geq m## and all ##p## equations are independent.
Ok, let's tighten up the math first a bit then. (Technical nit: for ##p \gt m## it is impossible that all ##p## equations are linearly independent -- at most ##m## of them are, due to the dimension.)

What do you know about 'regular' ordinary least squares? As it is you have

##\mathbf{AX} = \mathbf Y##

if you take the conjugate transpose of this, you have
##\mathbf{X}^* \mathbf A^* = \mathbf Y^*##

where ##^*## denotes conjugate transpose. This is standard form for ordinary least squares. The approach of solving this that works nicely with scalars in ##\mathbb R## and ##\mathbb C## is to make use of a well-chosen collection of mutually orthonormal vectors.

The idea is you want

##\text{min}_{A^*} \big \Vert \mathbf{X}^* \mathbf A^* - \mathbf Y^*\big \Vert_F^2##
i.e. minimize the squared Frobenius norm of the difference between ##\big(\mathbf{X}^* \mathbf A^*\big)## and ## \mathbf Y^*##, and to do so by selecting the best ##A^*##

using the "thin" QR factorization you have

##\mathbf {QR} = \mathbf X^*##

##\mathbf Q =
\bigg[\begin{array}{c|c|c|c}
\mathbf q_1 & \mathbf q_2 &\cdots & \mathbf q_{m}
\end{array}\bigg]##

##\mathbf R =
\bigg[\begin{array}{c|c|c|c}
\mathbf r_1 & \mathbf r_2 &\cdots & \mathbf r_{m}
\end{array}\bigg]##

where ##\mathbf Q## is tall and skinny (##p \times m##) but ##\mathbf R## is square (##m \times m##) and of course upper triangular. Since ##\mathbf X^*## has full column rank, you know that all diagonal entries of ##\mathbf R## are non-zero.
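
(As a quick numerical illustration of these shapes -- a sketch with made-up dimensions; note that numpy's `np.linalg.qr` returns this thin factorization by default:)

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 50
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))  # stand-in data

Q, R = np.linalg.qr(X.conj().T)          # thin QR of X*, which is p-by-m
print(Q.shape, R.shape)                  # (p, m) and (m, m)
print(np.allclose(R, np.triu(R)))        # R is upper triangular
print(np.abs(np.diag(R)).min() > 0)      # nonzero diagonal, since X* has full column rank
```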

Why? An easy way is to consider that full column rank of ##\mathbf X^*## implies that ##\mathbf X\mathbf X^* \succ 0##, i.e. it is Hermitian positive definite, hence
##0 \lt \det\big(\mathbf X\mathbf X^*\big)##
but with ##\mathbf X^* = \mathbf{QR}##,
##\det\big(\mathbf X\mathbf X^*\big) = \det\big(\mathbf R^*\mathbf R\big) = \Big \vert \det \big(\mathbf R\big)\Big \vert^2 \gt 0##
so ##\det\big(\mathbf R\big) \neq 0##; since the determinant of a triangular matrix is the product of its diagonal entries, every diagonal entry of ##\mathbf R## is non-zero.
- - - -
Now consider the projector given by
##\mathbf P = \mathbf {QQ}^*##
it is obviously Hermitian, and notice that ##\mathbf P^2 = \mathbf P##, so it is idempotent and hence a projector -- and its complement is given by ##\big(\mathbf I - \mathbf P\big)##

you should convince yourself at this point that

##\mathbf P = \mathbf {QQ}^*## is ##p \times p## (with rank ##m##)
but by direct multiplication
##\mathbf Q^* \mathbf Q = \mathbf I_m##
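
(These properties are also easy to confirm numerically -- a quick sketch with stand-in data:)

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 3)) + 1j * rng.standard_normal((50, 3)))

P = Q @ Q.conj().T                              # p-by-p
print(np.allclose(P, P.conj().T))               # Hermitian
print(np.allclose(P @ P, P))                    # idempotent, hence a projector
print(np.allclose(Q.conj().T @ Q, np.eye(3)))   # Q* Q = I_m
print(round(np.trace(P).real))                  # trace of a projector = its rank = m = 3
```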

- - - -

In what amounts to the Pythagorean theorem you have
##\text{min}_{A^*} \Big \Vert \mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \Big( \mathbf I\Big) \Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \Big(\mathbf P+ \big(\mathbf I-\mathbf P\big)\Big)\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \mathbf P\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)+ \big(\mathbf I-\mathbf P\big)\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \mathbf P\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 + \Big \Vert \big(\mathbf I - \mathbf P\big)\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 ##
## = \text{min}_{A^*} \Big \Vert \mathbf P\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 + \Big \Vert \big(\mathbf I - \mathbf P\big)\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 ##
## \geq \text{min}_{A^*} \Big \Vert \big(\mathbf I - \mathbf P\big)\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 ##
## = \text{min}_{A^*} \Big \Vert - \big(\mathbf I - \mathbf P\big) \mathbf Y^*\Big \Vert_F^2 ##
## = \Big \Vert \big(\mathbf I - \mathbf P\big) \mathbf Y^*\Big \Vert_F^2 ##

There is equality iff
##\Big \Vert \mathbf P\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 = 0##
or equivalently,
## \mathbf P\mathbf{QR} \mathbf A^* = \mathbf P\mathbf Y^*##

Thankfully we can solve this the 'regular way' for QR factorization -- i.e. left multiply each side by ##\mathbf Q^*## to get

## \mathbf Q^*\mathbf P\mathbf{QR} \mathbf A^* ##
## =\mathbf Q^*\big(\mathbf Q \mathbf Q^*\big) \mathbf{QR} \mathbf A^* ##
## =\big(\mathbf Q^*\mathbf Q\big)\big( \mathbf Q^*\mathbf Q\big) \mathbf{R} \mathbf A^* ##
## = \mathbf{R} \mathbf A^* ##
and on the right hand side
##= \mathbf Q^* \mathbf P\mathbf Y^* ##
##= \mathbf Q^*\big(\mathbf Q \mathbf Q^*\big) \mathbf Y^*##
##= \mathbf Q^* \mathbf Y^*##
and select
##\mathbf A^* := \mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^*##

(though on your computer you'd not actually invert ##\mathbf R## but instead use something like scipy.linalg.solve_triangular here)
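
(A minimal sketch of that computational step, with hypothetical stand-in data; the names `X`, `Y`, `A_true` are made up for the example:)

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n, m, p = 4, 3, 50
A_true = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
Y = A_true @ X + 1e-6 * rng.standard_normal((n, p))          # slightly noisy measurements

Q, R = np.linalg.qr(X.conj().T)                              # thin QR of X*
A_star = solve_triangular(R, Q.conj().T @ Y.conj().T)        # solves R A* = Q* Y*
A_hat = A_star.conj().T                                      # conjugate transpose back to A
print(np.abs(A_hat - A_true).max())                          # small, on the order of the noise
```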

For avoidance of doubt, since ##\mathbf R^{-1}## exists and is an actual inverse (not a pseudo inverse), you can plug it back into your equation to verify that we actually have equality here:

## \mathbf P\mathbf{QR} \mathbf A^* ##
## = \mathbf P\mathbf{QR}\big( \mathbf A^*\big) ##
## = \mathbf P\mathbf{QR} \big(\mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^*\big) ##
## = \mathbf P\mathbf{Q} \mathbf Q^*\mathbf Y^* ##
## = \mathbf P^2\mathbf Y^* ##
##= \mathbf P\mathbf Y^*##
as required

- - - -
edit:
To belabor the issue of uniqueness of our solution:
The key idea is that once we have operated on the space by left multiplication by ##\mathbf P##, the mapping given by left multiplication by ##\mathbf Q^*## is both injective and surjective, so every step in our solution is invertible (again, after applying the projector). That is, ##\mathbf P \mathbf Y^*## becomes ##\mathbf Q^*\mathbf P \mathbf Y^*## after left multiplication by ##\mathbf Q^*##. But this mapping is invertible -- from here we can left multiply by ##\mathbf Q## to get

##\mathbf Q\mathbf Q^*\mathbf P \mathbf Y^*##
##= \mathbf P\mathbf P \mathbf Y^*##
##= \mathbf P^2 \mathbf Y^*##
##= \mathbf P \mathbf Y^*##
as we originally had. Alternatively we can also directly verify uniqueness of the solution, as shown below:

Let's consider some other solution matrix for ##\mathbf A^*## given by ##\mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^* + \mathbf B## and prove that ##\mathbf B## must be zero. We get this by plugging it in as above

## \mathbf P\mathbf{QR} \mathbf A^* ##
## = \mathbf P\mathbf{QR}\big( \mathbf A^*\big) ##
## = \mathbf P\mathbf{QR} \big(\mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^* + \mathbf B\big) ##
##= \mathbf P\mathbf Y^* + \mathbf P\mathbf{QR} \mathbf B##
making use of the linearity and the prior result.

Now define ##\mathbf C := \mathbf {RB} ##
and observe that since ##\mathbf R## is invertible, we know ##\mathbf C = \mathbf 0## iff ##\mathbf B =\mathbf 0##. So we have

##= \mathbf P\mathbf Y^* + \mathbf P\mathbf Q \big(\mathbf R \mathbf B\big)##
##= \mathbf P\mathbf Y^* + \mathbf P\mathbf Q \big(\mathbf C \big)##
##= \mathbf P\mathbf Y^* + \big(\mathbf Q \mathbf Q^*\big) \mathbf Q \mathbf C ##
##= \mathbf P\mathbf Y^* + \mathbf Q \big(\mathbf Q^* \mathbf Q\big) \big(\mathbf C \big)##
##= \mathbf P\mathbf Y^* + \mathbf Q \big(\mathbf I_m\big) \big(\mathbf C \big)##
##= \mathbf P\mathbf Y^* + \mathbf Q \mathbf C ##
##= \mathbf P\mathbf Y^* ##
as required, iff
## \mathbf Q \mathbf C = \mathbf 0##
but ##\mathbf Q## has mutually orthonormal columns -- i.e. its columns are linearly independent -- and hence, looking at column ##k## of the above, we see
## \mathbf Q \mathbf c_k = \mathbf 0##

iff ##\mathbf c_k = \mathbf 0 \longrightarrow \mathbf C = \mathbf 0 \longrightarrow \mathbf B =\mathbf 0##

which confirms uniqueness

- - - -

Hopefully it is obvious that this gives you ##\mathbf A^*##, so you'll want to take the conjugate transpose at the end to get ##\mathbf A##.

synMehdi said:
For the type of solution I would say mathematical for the rigorous understanding but the goal is to make a small python script out of it.

The above gives you the solution. One way or another you want to lean on orthogonality.

The standard approaches are to use either the Singular Value Decomposition or the QR factorization. You may get a few more analytical insights with SVD, but QR is lighter-weight machinery (i.e. less theory is needed to understand it, and it has a smaller constant factor / faster run time on your computer, so it's what you'd use in practice for least-squares problems). I showed the solution using QR factorization, accordingly.

A final note is that there are technically two 'standard' ways of doing QR factorization -- the one I showed is sometimes called the 'thin' (or reduced) QR factorization, but there is a second, 'full' version. Of particular interest: numpy uses the "thin" one by default -- which you want -- but scipy by default actually uses the "full" factorization, which you don't want. It's always a good idea to read the docs before implementing the code.
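
(For example -- a small illustration of the default modes; the shapes here are made up:)

```python
import numpy as np
import scipy.linalg

M = np.random.default_rng(0).standard_normal((50, 3))

Q1, R1 = np.linalg.qr(M)                       # numpy default: "thin", Q1 is (50, 3)
Q2, R2 = scipy.linalg.qr(M)                    # scipy default: "full", Q2 is (50, 50)
Q3, R3 = scipy.linalg.qr(M, mode='economic')   # thin factorization via scipy
print(Q1.shape, Q2.shape, Q3.shape)
```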
 
Last edited:
  • #5
Great answer, I understand my problem better. Let's see:

StoneTemplePython said:
Ok, let's tighten up the math first a bit then. (Technical nit: for ##p \gt m## it is impossible that all ##p## equations are linearly independent -- at most ##m## of them are, due to the dimension.)

Hmm yes, but I may not have used the right word by saying independent; what I mean is that they are approximations. Let's say that the system is just ##a\cdot x=y## and I have:
$$a\cdot x_1 = y_1 \\ a\cdot x_2=y_2$$
I agree that if ##x_1=1##, ##y_1=1## and ##x_2=2##, ##y_2=2## both equations are not independent and ##a=1## is a solution.
I would not have that in practice; I would have maybe ##x_1=1##, ##y_1=0.9## and ##x_2=2##, ##y_2=2.1##, and I will get an approximate value of ##a## using least squares, which will be ##\approx 1##.
The two equations are independent in the practical case. Am I right?
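
(A quick numerical check of this toy example, using numpy's least-squares routine as a stand-in:)

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([0.9, 2.1])
a, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)   # minimizes ||x a - y||^2
print(a)                                             # [1.02], close to 1 as expected
```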

StoneTemplePython said:
What do you know about 'regular' ordinary least squares? As it is you have

##\mathbf{AX} = \mathbf Y##

if you take the conjugate transpose of this, you have
##\mathbf{X}^* \mathbf A^* = \mathbf Y^*##

where ##^*## denotes conjugate transpose. This is standard form for ordinary least squares. The approach of solving this that works nicely with scalars in ##\mathbb R## and ##\mathbb C## is to make use of a well-chosen collection of mutually orthonormal vectors.

Ok I understand, so if I use the same analogy as before with the simpler system:

$$a \cdot [x_1 \> x_2] = [y_1 \> y_2]$$
becomes:

$$
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
\cdot a =
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix}
$$

which is the standard form of least squares and is equivalent. The solution would be:
$$
[x_1 \> x_2]
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
\cdot a =
[x_1 \> x_2]
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix}
$$

Right? For the rest of your answer it is very clear what we are trying to do (get the minimum residual), but the ##QR## decomposition is not clear to me. Wouldn't just the Moore-Penrose inverse suffice?
$$ \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y $$
 
  • #6
synMehdi said:
Hmm yes, but I may not have used the right word by saying independent; what I mean is that they are approximations. Let's say that the system is just ##a\cdot x=y## and I have:
$$a\cdot x_1 = y_1 \\ a\cdot x_2=y_2$$
I agree that if ##x_1=1##, ##y_1=1## and ##x_2=2##, ##y_2=2## both equations are not independent and ##a=1## is a solution.
I would not have that in practice; I would have maybe ##x_1=1##, ##y_1=0.9## and ##x_2=2##, ##y_2=2.1##, and I will get an approximate value of ##a## using least squares, which will be ##\approx 1##.
The two equations are independent in the practical case. Am I right?
Yes -- as I said, it was a technical nitpick -- I can't really tell whether you get the role of dimension here. What your original post should say is that if you choose any ##m## of these ##p## equations, then that collection will be linearly independent.

synMehdi said:
Right? For the rest of your answer it is very clear what we are trying to do (get the minimum residual), but the ##QR## decomposition is not clear to me. Wouldn't just the Moore-Penrose inverse suffice?
$$ \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y $$
More care is needed -- what you have written is
## \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y ##
this is wrong. It should say that we are working with the conjugate transposes of ##\mathbf A## and ##\mathbf Y##:
## \mathbf{X}\cdot \mathbf{X}^* \mathbf A^* = \mathbf{X}\mathbf Y^* ##

OK, working with that and using ##\mathbf {QR} = \mathbf X^*##, you should be able to verify that they get to the same result. I.e. what I have shown in my prior post is that the optimal (and unique) solution is given by
##\mathbf A^* := \mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^*##

As I've already shown, ##\big(\mathbf{X} \mathbf{X}^*\big)## is non-singular so the solution to your normal equations form of the problem exists and is unique. Proceeding from here.

## \mathbf{X}\cdot \mathbf{X}^* \mathbf A^* = \mathbf{X}\mathbf Y^* ##
## \big(\mathbf{X}^*\big)^* \mathbf{X}^* \mathbf A^* = \big(\mathbf{X}^*\big)^*\mathbf Y^* ##
## \big(\mathbf{QR}\big)^* \mathbf{QR} \mathbf A^* = \big(\mathbf{QR}\big)^*\mathbf Y^* ##
## \mathbf R^* \mathbf{Q}^* \mathbf{QR} \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##
## \mathbf R^* \big(\mathbf{Q}^* \mathbf{Q}\big) \mathbf R \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##
## \mathbf R^* \big(\mathbf{I}_m\big) \mathbf R \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##
## \mathbf R^* \mathbf R \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##

Since ##\mathbf R## is invertible, so is its conjugate transpose, so left-multiply each side by

##\big(\mathbf R^*\big)^{-1}## to get
## \mathbf R \mathbf A^* = \mathbf{Q}^*\mathbf Y^* \longrightarrow \mathbf A^* = \mathbf R^{-1}\mathbf{Q}^*\mathbf Y^* ##

which confirms the solutions are the same. Really, I'd hope that you would have been able to compare the two solutions and verify they are the same (even if you didn't fully understand the derivation). All you need is the definition of orthonormality and the standard rules of matrix algebra.
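
(Numerically, with stand-in data, the two expressions do coincide -- a sketch, with the normal-equations form included only for comparison; see point 3 below on why you should not actually compute it that way:)

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n, m, p = 4, 3, 50
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
Y = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

Q, R = np.linalg.qr(X.conj().T)
A_star_qr = solve_triangular(R, Q.conj().T @ Y.conj().T)        # R^{-1} Q* Y*

A_star_ne = np.linalg.solve(X @ X.conj().T, X @ Y.conj().T)     # (X X*)^{-1} X Y*

print(np.allclose(A_star_qr, A_star_ne))                        # True
```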

synMehdi said:
Right? For the rest of your answer it is very clear what we are trying to do (get the minimum residual), but the ##QR## decomposition is not clear to me. Wouldn't just the Moore-Penrose inverse suffice?
$$ \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y $$
so revisiting this and this
synMehdi said:
Thank you, but I'm not able to prove what I've done or if the approach is correct...

suggests 3 final questions from me.

1.) Do you know a clean, mathematically rigorous way to justify the Normal Equations approach that you are suggesting here -- without using what I've done based on orthogonality? The standard calculus derivation for normal equations is done over the reals -- doing it with scalars in ##\mathbb C## muddies the waters a lot. I wouldn't recommend it. My personal feeling is the calculus approach doesn't give very much geometric insight either, whereas wielding orthogonality (and, when I do it, projectors) does tie in with geometry.

2.) Do you know what Gram-Schmidt is? If not, I guess my suggested approach won't make that much sense, though it is a prerequisite to understanding a modern approach to least squares. You'll need to learn it sometime, as it is used all over the place in math, science and engineering. If you understand it well, you can derive a rudimentary QR factorization on your own.

synMehdi said:
For the type of solution I would say mathematical for the rigorous understanding but the goal is to make a small python script out of it.

3.) I want to make a computational point here. The Normal Equations are not numerically stable -- you should never use them to actually solve this problem. On the other hand, the QR factorization approach is essentially the gold standard for solving your problem on a computer.
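
(A brief illustration of why: forming ##\mathbf X \mathbf X^*## squares the condition number of ##\mathbf X##, so any ill-conditioning in the data is amplified. The matrix below is contrived to be ill-conditioned just to show the effect.)

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 50
X = rng.standard_normal((m, p))
X[2] = X[1] + 1e-7 * rng.standard_normal(p)   # contrived: one row nearly duplicates another

print(np.linalg.cond(X))         # large (roughly 1e7 here)
print(np.linalg.cond(X @ X.T))   # roughly the square of that, near double-precision limits
```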
 
Last edited:

1. What is linear least squares regression?

Linear least squares regression is a statistical method used to identify the relationship between a dependent variable and one or more independent variables. It is a commonly used technique for predicting the values of a dependent variable based on the values of independent variables.

2. How does linear least squares regression work?

In linear least squares regression, the model aims to minimize the sum of squared errors between the predicted values and the actual values of the dependent variable. This is done by finding the line of best fit that minimizes the vertical distance between the data points and the line. The coefficients of the line represent the relationship between the independent and dependent variables.
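
For example, a minimal single-variable fit with numpy (the data here are made up):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])            # roughly y = 2x

M = np.column_stack([np.ones_like(x), x])          # model matrix: intercept column and x
coef, *_ = np.linalg.lstsq(M, y, rcond=None)
print(coef)                                        # [intercept, slope], close to [0, 2]
```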

3. What is a model matrix?

A model matrix is a matrix that contains the values of the independent variables used in a regression model. It is used to represent the relationship between the independent and dependent variables in a mathematical form.

4. How is a model matrix identified in linear least squares regression?

In linear least squares regression, the model matrix is identified by selecting the independent variables that are most relevant to the dependent variable. This can be done through various methods such as stepwise regression or by using domain knowledge to select the most important variables.

5. What are the assumptions of linear least squares regression?

The main assumptions of linear least squares regression include linearity, normality, independence, and homoscedasticity. Linearity assumes that the relationship between the dependent and independent variables is linear. Normality assumes that the errors of the model are normally distributed. Independence assumes that the errors of the model are not correlated with each other. Homoscedasticity assumes that the variance of the errors is constant across all values of the independent variables.
