Linear least squares regression for model matrix identification

In summary: the thread discusses identifying a linear model matrix ##[A]## satisfying ##\mathbf{AX} = \mathbf Y## using least squares in the complex space. It covers arranging the known equations in matrix form, the Moore-Penrose inverse of the matrix of stacked input vectors, the use of a well-chosen collection of mutually orthonormal vectors and the "thin" QR factorization to solve the equation, the relation to regular ordinary least squares via the conjugate transpose of the data matrix, and the use of scipy.linalg.solve_triangular in the computer implementation of the solution.
  • #1
synMehdi
TL;DR Summary
I need to identify my linear model matrix using least squares. The aim is to approximate an overdetermined system matrix [A] from known pairs of [x] and [y] input data in the complex space.

I need to do a linear model identification using the least-squares method.
My model to identify is a matrix ##[A]##. My linear system in the complex space is:

$$[A]_{_{n \times m}} \cdot [x]_{_{m \times 1}} = [y]_{_{n \times 1}}$$

where ##n## and ##m## define the matrix sizes. In my notation I define my known arrays ##x## and ##y## as vectors.

To identify ##[A]## I have a set of ##p## equations:

$$[A] \cdot \vec{x}_1 = \vec{y}_1$$
$$[A] \cdot \vec{x}_2 = \vec{y}_2$$
$$...$$
$$[A] \cdot \vec{x}_p = \vec{y}_p$$

Knowing that my system is overdetermined (##p > n, m##) and that each pair of ##\vec{x}## and ##\vec{y}## is known, I want to identify my linear model matrix ##[A]## with least squares.

My approach:

I have arranged my known equations from above as follows:

$$[A] \cdot [\vec{x}_1\>\vec{x}_2\>...\>\vec{x}_p]=[\vec{y}_1\>\vec{y}_2\>...\>\vec{y}_p]$$

My initial linear system becomes a matrix equation:

$$[A]_{_{n \times m}} \cdot [X]_{_{m \times p}} = [Y]_{_{n \times p}}$$

Is this the right thing to do to solve ##[A]## with the Moore-Penrose inverse of ##[X]##? What is the correct way to do this?
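
For concreteness, here is a minimal NumPy sketch of this stacking step (the sizes and the random stand-in data are purely hypothetical, just to show the shapes):

```python
import numpy as np

# hypothetical sizes: n outputs, m inputs, p known pairs (p > n, m)
n, m, p = 4, 3, 50

rng = np.random.default_rng(0)
# stand-ins for the p measured pairs, each a 1-D complex vector
x_list = [rng.standard_normal(m) + 1j * rng.standard_normal(m) for _ in range(p)]
y_list = [rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(p)]

X = np.column_stack(x_list)   # shape (m, p): columns are the x vectors
Y = np.column_stack(y_list)   # shape (n, p): columns are the y vectors
```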
 
  • #2
It seems like an ok start to me. What is the rank of ##\mathbf X##?

Are you looking for a math or computational solution?
 
  • #3
StoneTemplePython said:
It seems like an ok start to me. What is the rank of ##\mathbf X##?

Are you looking for a math or computational solution?

Thank you, but I'm not able to prove what I've done or if the approach is correct.

Well, ##[X]_{m \times p}## is of rank ##m## because ##p \geq m## and all ##p## equations are independent.

For the type of solution I would say mathematical for the rigorous understanding but the goal is to make a small python script out of it.
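
(Side note, in view of the eventual Python script: a quick numerical sanity check of that rank claim could look like the sketch below, where the random complex data just stands in for the real measurements.)

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 50
# stand-in for the m-by-p matrix of stacked input vectors
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))

# full row rank of X (equivalently, full column rank of X*) means rank == m
print(np.linalg.matrix_rank(X) == m)   # True for generic data
```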
 
  • #4
synMehdi said:
Thank you, but I'm not able to prove what I've done or if the approach is correct.

Well, ##[X]_{m \times p}## is of rank ##m## because ##p \geq m## and all ##p## equations are independent.
Ok, let's tighten up the math first a bit then. (Technical nit: for ##p \gt m## it is impossible that all ##p## equations are linearly independent -- at most ##m## of them are, due to the dimension.)

What do you know about 'regular' ordinary least squares? As it is you have

##\mathbf{AX} = \mathbf Y##

if you take the conjugate transpose of this, you have
##\mathbf{X}^* \mathbf A^* = \mathbf Y^*##

where ##^*## denotes conjugate transpose. This is standard form for ordinary least squares. The approach of solving this that works nicely with scalars in ##\mathbb R## and ##\mathbb C## is to make use of a well-chosen collection of mutually orthonormal vectors.

The idea is you want

##\text{min}_{A^*} \big \Vert \mathbf{X}^* \mathbf A^* - \mathbf Y^*\big \Vert_F^2##
i.e. minimize the squared Frobenius norm of the difference between ##\big(\mathbf{X}^* \mathbf A^*\big)## and ## \mathbf Y^*##, and to do so by selecting the best ##A^*##

using the "thin" QR factorization you have

##\mathbf {QR} = \mathbf X^*##

##\mathbf Q =
\bigg[\begin{array}{c|c|c|c}
\mathbf q_1 & \mathbf q_2 &\cdots & \mathbf q_{m}
\end{array}\bigg]##

##\mathbf R =
\bigg[\begin{array}{c|c|c|c}
\mathbf r_1 & \mathbf r_2 &\cdots & \mathbf r_{m}
\end{array}\bigg]##

where ##\mathbf Q## is tall and skinny (##p \times m##) but ##\mathbf R## is square (##m \times m##) and of course upper triangular. Since ##\mathbf X^*## has full column rank, you know that all diagonal entries of ##\mathbf R## are non-zero.
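
(As a quick numerical illustration of these shapes -- a sketch with made-up dimensions; note that numpy's `np.linalg.qr` returns this thin factorization by default:)

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 50
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))  # stand-in data

Q, R = np.linalg.qr(X.conj().T)          # thin QR of X*, which is p-by-m
print(Q.shape, R.shape)                  # (p, m) and (m, m)
print(np.allclose(R, np.triu(R)))        # R is upper triangular
print(np.abs(np.diag(R)).min() > 0)      # nonzero diagonal, since X* has full column rank
```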

Why? An easy way is to consider that full column rank of ##\mathbf X^*## implies that ##\mathbf X\mathbf X^* \succ 0##, i.e. it is Hermitian positive definite, hence
##0 \lt \det\big(\mathbf X\mathbf X^*\big)##
but with ##\mathbf X^* = \mathbf{QR}##,
##\det\big(\mathbf X\mathbf X^*\big) = \det\big(\mathbf R^*\mathbf R\big) = \Big \vert \det \big(\mathbf R\big)\Big \vert^2 \gt 0##
so ##\det\big(\mathbf R\big) \neq 0##; since the determinant of a triangular matrix is the product of its diagonal entries, every diagonal entry of ##\mathbf R## is non-zero.
- - - -
Now consider the projector given by
##\mathbf P = \mathbf {QQ}^*##
it is obviously Hermitian, and notice that ##\mathbf P^2 = \mathbf P##, so it is idempotent and hence a projector -- and its complement is given by ##\big(\mathbf I - \mathbf P\big)##

you should convince yourself at this point that

##\mathbf P = \mathbf {QQ}^*## is ##p \times p## (with rank ##m##)
but by direct multiplication
##\mathbf Q^* \mathbf Q = \mathbf I_m##
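
(These properties are also easy to confirm numerically -- a quick sketch with stand-in data:)

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 3)) + 1j * rng.standard_normal((50, 3)))

P = Q @ Q.conj().T                              # p-by-p
print(np.allclose(P, P.conj().T))               # Hermitian
print(np.allclose(P @ P, P))                    # idempotent, hence a projector
print(np.allclose(Q.conj().T @ Q, np.eye(3)))   # Q* Q = I_m
print(round(np.trace(P).real))                  # trace of a projector = its rank = m = 3
```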

- - - -

In what amounts to the Pythagorean theorem you have
##\text{min}_{A^*} \Big \Vert \mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \Big( \mathbf I\Big) \Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \Big(\mathbf P+ \big(\mathbf I-\mathbf P\big)\Big)\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \mathbf P\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)+ \big(\mathbf I-\mathbf P\big)\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2##
## = \text{min}_{A^*} \Big \Vert \mathbf P\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 + \Big \Vert \big(\mathbf I - \mathbf P\big)\Big(\mathbf{X}^* \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 ##
## = \text{min}_{A^*} \Big \Vert \mathbf P\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 + \Big \Vert \big(\mathbf I - \mathbf P\big)\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 ##
## \geq \text{min}_{A^*} \Big \Vert \big(\mathbf I - \mathbf P\big)\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 ##
## = \text{min}_{A^*} \Big \Vert - \big(\mathbf I - \mathbf P\big) \mathbf Y^*\Big \Vert_F^2 ##
## = \Big \Vert \big(\mathbf I - \mathbf P\big) \mathbf Y^*\Big \Vert_F^2 ##

There is equality iff
##\Big \Vert \mathbf P\Big(\mathbf{QR} \mathbf A^* - \mathbf Y^*\Big)\Big \Vert_F^2 = 0##
or equivalently,
## \mathbf P\mathbf{QR} \mathbf A^* = \mathbf P\mathbf Y^*##

Thankfully we can solve this the 'regular way' for QR factorization -- i.e. left multiply each side by ##\mathbf Q^*## to get

## \mathbf Q^*\mathbf P\mathbf{QR} \mathbf A^* ##
## =\mathbf Q^*\big(\mathbf Q \mathbf Q^*\big) \mathbf{QR} \mathbf A^* ##
## =\big(\mathbf Q^*\mathbf Q\big)\big( \mathbf Q^*\mathbf Q\big) \mathbf{R} \mathbf A^* ##
## = \mathbf{R} \mathbf A^* ##
and on the right hand side
##= \mathbf Q^* \mathbf P\mathbf Y^* ##
##= \mathbf Q^*\big(\mathbf Q \mathbf Q^*\big) \mathbf Y^*##
##= \mathbf Q^* \mathbf Y^*##
and select
##\mathbf A^* := \mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^*##

(though on your computer you'd not actually invert ##\mathbf R## but instead use something like scipy.linalg.solve_triangular here)
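
(A minimal sketch of that computational step, with hypothetical stand-in data; the names `X`, `Y`, `A_true` are made up for the example:)

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n, m, p = 4, 3, 50
A_true = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
Y = A_true @ X + 1e-6 * rng.standard_normal((n, p))          # slightly noisy measurements

Q, R = np.linalg.qr(X.conj().T)                              # thin QR of X*
A_star = solve_triangular(R, Q.conj().T @ Y.conj().T)        # solves R A* = Q* Y*
A_hat = A_star.conj().T                                      # conjugate transpose back to A
print(np.abs(A_hat - A_true).max())                          # small, on the order of the noise
```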

For avoidance of doubt, since ##\mathbf R^{-1}## exists and is an actual inverse (not a pseudo inverse), you can plug it back into your equation to verify that we actually have equality here:

## \mathbf P\mathbf{QR} \mathbf A^* ##
## = \mathbf P\mathbf{QR}\big( \mathbf A^*\big) ##
## = \mathbf P\mathbf{QR} \big(\mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^*\big) ##
## = \mathbf P\mathbf{Q} \mathbf Q^*\mathbf Y^* ##
## = \mathbf P^2\mathbf Y^* ##
##= \mathbf P\mathbf Y^*##
as required

- - - -
edit:
To belabor the issue of uniqueness of our solution:
The key idea is that once we have operated on the space by left multiplication by ##\mathbf P##, the mapping given by left multiplication by ##\mathbf Q^*## is both injective and surjective, so every step in our solution is invertible (again, after applying the projector). That is, ##\mathbf P \mathbf Y^*## becomes ##\mathbf Q^*\mathbf P \mathbf Y^*## after left multiplication by ##\mathbf Q^*##. But this mapping is invertible -- from here we can left multiply by ##\mathbf Q## to get

##\mathbf Q\mathbf Q^*\mathbf P \mathbf Y^*##
##= \mathbf P\mathbf P \mathbf Y^*##
##= \mathbf P^2 \mathbf Y^*##
##= \mathbf P \mathbf Y^*##
as we originally had. Alternatively we can also directly verify uniqueness of the solution, as shown below:

Let's consider some other solution matrix for ##\mathbf A^*## given by ##\mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^* + \mathbf B## and prove that ##\mathbf B## must be zero. We get this by plugging it in as above

## \mathbf P\mathbf{QR} \mathbf A^* ##
## = \mathbf P\mathbf{QR}\big( \mathbf A^*\big) ##
## = \mathbf P\mathbf{QR} \big(\mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^* + \mathbf B\big) ##
##= \mathbf P\mathbf Y^* + \mathbf P\mathbf{QR} \mathbf B##
making use of the linearity and the prior result.

Now define ##\mathbf C := \mathbf {RB} ##
and observe that since ##\mathbf R## is invertible, we know ##\mathbf C = \mathbf 0## iff ##\mathbf B =\mathbf 0##. So we have

##= \mathbf P\mathbf Y^* + \mathbf P\mathbf Q \big(\mathbf R \mathbf B\big)##
##= \mathbf P\mathbf Y^* + \mathbf P\mathbf Q \big(\mathbf C \big)##
##= \mathbf P\mathbf Y^* + \big(\mathbf Q \mathbf Q^*\big) \mathbf Q \mathbf C ##
##= \mathbf P\mathbf Y^* + \mathbf Q \big(\mathbf Q^* \mathbf Q\big) \big(\mathbf C \big)##
##= \mathbf P\mathbf Y^* + \mathbf Q \big(\mathbf I_m\big) \big(\mathbf C \big)##
##= \mathbf P\mathbf Y^* + \mathbf Q \mathbf C ##
##= \mathbf P\mathbf Y^* ##
as required, iff
## \mathbf Q \mathbf C = \mathbf 0##
but ##\mathbf Q## has mutually orthonormal columns -- i.e. its columns are linearly independent -- and hence, looking at column ##k## of the above, we see
## \mathbf Q \mathbf c_k = \mathbf 0##

iff ##\mathbf c_k = \mathbf 0 \longrightarrow \mathbf C = \mathbf 0 \longrightarrow \mathbf B =\mathbf 0##

which confirms uniqueness

- - - -

Hopefully it is obvious that this gives you ##\mathbf A^*##, so you'll want to take the conjugate transpose at the end to get ##\mathbf A##.

synMehdi said:
For the type of solution I would say mathematical for the rigorous understanding but the goal is to make a small python script out of it.

The above gives you the solution. One way or another you want to lean on orthogonality.

The standard approaches are to use either the Singular Value Decomposition or the QR factorization. You may get a few more analytical insights with SVD, but QR is lighter-weight machinery (i.e. less theory is needed to understand it, and it has a smaller constant factor / faster run time on your computer, so it's what you'd use in practice for least-squares problems). I showed the solution using QR factorization, accordingly.

A final note is that there are technically two 'standard' ways of doing QR factorization -- the one I showed is sometimes called the 'thin' (or reduced) QR factorization, but there is a second, 'full' version. Of particular interest: numpy uses the "thin" one by default -- which you want -- but scipy by default actually uses the "full" factorization, which you don't want. It's always a good idea to read the docs before implementing the code.
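
(For example -- a small illustration of the default modes; the shapes here are made up:)

```python
import numpy as np
import scipy.linalg

M = np.random.default_rng(0).standard_normal((50, 3))

Q1, R1 = np.linalg.qr(M)                       # numpy default: "thin", Q1 is (50, 3)
Q2, R2 = scipy.linalg.qr(M)                    # scipy default: "full", Q2 is (50, 50)
Q3, R3 = scipy.linalg.qr(M, mode='economic')   # thin factorization via scipy
print(Q1.shape, Q2.shape, Q3.shape)
```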
 
Last edited:
  • #5
Great answer, I understand my problem better. Let's see:

StoneTemplePython said:
Ok, let's tighten up the math first a bit then. (Technical nit: for ##p \gt m## it is impossible that all ##p## equations are linearly independent -- at most ##m## of them are, due to the dimension.)

Hmm yes, but I may not have used the right word by saying independent; what I mean is that they are approximations. Let's say that the system is just ##a\cdot x=y## and I have:
$$a\cdot x_1 = y_1 \\ a\cdot x_2=y_2$$
I agree that if ##x_1=1##, ##y_1=1## and ##x_2=2##, ##y_2=2## both equations are not independent and ##a=1## is a solution.
I would not have that in practice; I would have maybe ##x_1=1##, ##y_1=0.9## and ##x_2=2##, ##y_2=2.1##, and I will get an approximate value of ##a## using least squares, which will be ##\approx 1##.
The two equations are independent in the practical case. Am I right?
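
(A quick numerical check of this toy example, using numpy's least-squares routine as a stand-in:)

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([0.9, 2.1])
a, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)   # minimizes ||x a - y||^2
print(a)                                             # [1.02], close to 1 as expected
```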

StoneTemplePython said:
What do you know about 'regular' ordinary least squares? As it is you have

##\mathbf{AX} = \mathbf Y##

if you take the conjugate transpose of this, you have
##\mathbf{X}^* \mathbf A^* = \mathbf Y^*##

where ##^*## denotes conjugate transpose. This is standard form for ordinary least squares. The approach of solving this that works nicely with scalars in ##\mathbb R## and ##\mathbb C## is to make use of a well-chosen collection of mutually orthonormal vectors.

Ok I understand, so if I use the same analogy as before with the simpler system:

$$a \cdot [x_1 \> x_2] = [y_1 \> y_2]$$
becomes:

$$
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
\cdot a =
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix}
$$

which is the standard form of least squares and is equivalent. The solution would be:
$$
[x_1 \> x_2]
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
\cdot a =
[x_1 \> x_2]
\begin{bmatrix}
y_1 \\
y_2
\end{bmatrix}
$$

Right? For the rest of your answer it is very clear what we are trying to do (get the minimum residual), but the ##QR## decomposition is not clear to me. Wouldn't just the Moore-Penrose inverse suffice?
$$ \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y $$
 
  • #6
synMehdi said:
Hmm yes, but I may not have used the right word by saying independent; what I mean is that they are approximations. Let's say that the system is just ##a\cdot x=y## and I have:
$$a\cdot x_1 = y_1 \\ a\cdot x_2=y_2$$
I agree that if ##x_1=1##, ##y_1=1## and ##x_2=2##, ##y_2=2## both equations are not independent and ##a=1## is a solution.
I would not have that in practice; I would have maybe ##x_1=1##, ##y_1=0.9## and ##x_2=2##, ##y_2=2.1##, and I will get an approximate value of ##a## using least squares, which will be ##\approx 1##.
The two equations are independent in the practical case. Am I right?
Yes -- as I said, it was a technical nitpick -- I can't really tell whether you get the role of dimension here. What your original post should say is that if you choose any ##m## of these ##p## equations, then that collection will be linearly independent.

synMehdi said:
Right? For the rest of your answer it is very clear what we are trying to do (get the minimum residual), but the ##QR## decomposition is not clear to me. Wouldn't just the Moore-Penrose inverse suffice?
$$ \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y $$
More care is needed -- what you have written is
## \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y ##
this is wrong. It should say that we are working with the conjugate transposes of ##\mathbf A## and ##\mathbf Y##:
## \mathbf{X}\cdot \mathbf{X}^* \mathbf A^* = \mathbf{X}\mathbf Y^* ##

OK, working with that and using ##\mathbf {QR} = \mathbf X^*##, you should be able to verify that they get to the same result. I.e. what I have shown in my prior post is that the optimal (and unique) solution is given by
##\mathbf A^* := \mathbf{R} ^{-1} \mathbf Q^*\mathbf Y^*##

As I've already shown, ##\big(\mathbf{X} \mathbf{X}^*\big)## is non-singular so the solution to your normal equations form of the problem exists and is unique. Proceeding from here.

## \mathbf{X}\cdot \mathbf{X}^* \mathbf A^* = \mathbf{X}\mathbf Y^* ##
## \big(\mathbf{X}^*\big)^* \mathbf{X}^* \mathbf A^* = \big(\mathbf{X}^*\big)^*\mathbf Y^* ##
## \big(\mathbf{QR}\big)^* \mathbf{QR} \mathbf A^* = \big(\mathbf{QR}\big)^*\mathbf Y^* ##
## \mathbf R^* \mathbf{Q}^* \mathbf{QR} \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##
## \mathbf R^* \big(\mathbf{Q}^* \mathbf{Q}\big) \mathbf R \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##
## \mathbf R^* \big(\mathbf{I}_m\big) \mathbf R \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##
## \mathbf R^* \mathbf R \mathbf A^* = \mathbf R^*\mathbf{Q}^*\mathbf Y^* ##

Since ##\mathbf R## is invertible, so is its conjugate transpose, so left-multiply each side by

##\big(\mathbf R^*\big)^{-1}## to get
## \mathbf R \mathbf A^* = \mathbf{Q}^*\mathbf Y^* \longrightarrow \mathbf A^* = \mathbf R^{-1}\mathbf{Q}^*\mathbf Y^* ##

which confirms the solutions are the same. Really, I'd hope that you would have been able to compare the two solutions and verify they are the same (even if you didn't fully understand the derivation). All you need is the definition of orthonormality and the standard rules of matrix algebra.
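
(Numerically, with stand-in data, the two expressions do coincide -- a sketch, with the normal-equations form included only for comparison; see point 3 below on why you should not actually compute it that way:)

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n, m, p = 4, 3, 50
X = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
Y = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

Q, R = np.linalg.qr(X.conj().T)
A_star_qr = solve_triangular(R, Q.conj().T @ Y.conj().T)        # R^{-1} Q* Y*

A_star_ne = np.linalg.solve(X @ X.conj().T, X @ Y.conj().T)     # (X X*)^{-1} X Y*

print(np.allclose(A_star_qr, A_star_ne))                        # True
```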

synMehdi said:
Right? For the rest of your answer it is very clear what we are trying to do (get the minimum residual), but the ##QR## decomposition is not clear to me. Wouldn't just the Moore-Penrose inverse suffice?
$$ \mathbf{X}\cdot \mathbf{X}^* \mathbf A = \mathbf{X}\mathbf Y $$
so revisiting this and this
synMehdi said:
Thank you, but I'm not able to prove what I've done or if the approach is correct...

suggests 3 final questions from me.

1.) Do you know a clean, mathematically rigorous way to justify the Normal Equations approach that you are suggesting here -- without using what I've done based on orthogonality? The standard calculus derivation for normal equations is done over the reals -- doing it with scalars in ##\mathbb C## muddies the waters a lot. I wouldn't recommend it. My personal feeling is the calculus approach doesn't give very much geometric insight either, whereas wielding orthogonality (and, when I do it, projectors) does tie in with geometry.

2.) Do you know what Gram-Schmidt is? If not, I guess my suggested approach won't make that much sense, though it is a prerequisite to understanding a modern approach to least squares. You'll need to learn it sometime, as it is used all over the place in math, science and engineering. If you understand it well, you can derive a rudimentary QR factorization on your own.

synMehdi said:
For the type of solution I would say mathematical for the rigorous understanding but the goal is to make a small python script out of it.

3.) I want to make a computational point here. The Normal Equations are not numerically stable -- you should never use them to actually solve this problem. On the other hand, the QR factorization approach is essentially the gold standard for solving your problem on a computer.
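
(A brief illustration of why: forming ##\mathbf X \mathbf X^*## squares the condition number of ##\mathbf X##, so any ill-conditioning in the data is amplified. The matrix below is contrived to be ill-conditioned just to show the effect.)

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 50
X = rng.standard_normal((m, p))
X[2] = X[1] + 1e-7 * rng.standard_normal(p)   # contrived: one row nearly duplicates another

print(np.linalg.cond(X))         # large (roughly 1e7 here)
print(np.linalg.cond(X @ X.T))   # roughly the square of that, near double-precision limits
```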
 
Last edited:

1. What is linear least squares regression?

Linear least squares regression is a statistical method used to identify the relationship between a dependent variable and one or more independent variables. It is a commonly used technique for predicting the values of a dependent variable based on the values of independent variables.

2. How does linear least squares regression work?

In linear least squares regression, the model aims to minimize the sum of squared errors between the predicted values and the actual values of the dependent variable. This is done by finding the line of best fit that minimizes the vertical distance between the data points and the line. The coefficients of the line represent the relationship between the independent and dependent variables.
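
For example, a minimal single-variable fit with numpy (the data here are made up):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])            # roughly y = 2x

M = np.column_stack([np.ones_like(x), x])          # model matrix: intercept column and x
coef, *_ = np.linalg.lstsq(M, y, rcond=None)
print(coef)                                        # [intercept, slope], close to [0, 2]
```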

3. What is a model matrix?

A model matrix is a matrix that contains the values of the independent variables used in a regression model. It is used to represent the relationship between the independent and dependent variables in a mathematical form.

4. How is a model matrix identified in linear least squares regression?

In linear least squares regression, the model matrix is identified by selecting the independent variables that are most relevant to the dependent variable. This can be done through various methods such as stepwise regression or by using domain knowledge to select the most important variables.

5. What are the assumptions of linear least squares regression?

The main assumptions of linear least squares regression include linearity, normality, independence, and homoscedasticity. Linearity assumes that the relationship between the dependent and independent variables is linear. Normality assumes that the errors of the model are normally distributed. Independence assumes that the errors of the model are not correlated with each other. Homoscedasticity assumes that the variance of the errors is constant across all values of the independent variables.
