# Strange data fitting problem

I have two matrices, A and D, with the same number of rows and different numbers of columns (A has many more columns than D). I want to find x and y such that ||Ax-Dy||_2 is minimized, i.e., I want to find the closest vectors in span{A} and span{D}. It seems like a simple problem, but I couldn't figure it out. Any suggestions? (The columns of A and D are linearly independent, so that span{A} and span{D} intersect only in the zero vector.)

Stephen Tashi
First, let's try to state the problem clearly. Your statement about finding 'x' and 'y' isn't clear because it isn't clear whether "Ax" is supposed to represent a column vector or whether it represents the matrix "A" times a vector "x".

We could try it this way first:

I have two sets of n-dimensional vectors A and D. Set A has greater cardinality than set D. The span of set A and the span of set D are vector spaces whose only intersection is the zero vector. How do I find vectors x and y such that x is in the span of A and y is in the span of D and the distance between x and y (i.e. $|| x- y||_2$) is minimal?

The answer, of course, is to set both x and y equal to the zero vector. Assuming that's not what you want to do, how do we modify the statement of the problem to say what you want?

Thank you for correcting the problem statement. Following your statement, what I want to minimize is the angle between x and y, i.e., maximize $\frac{<x,y>}{||x||_2||y||_2}$, and I don't want the trivial 0 solution. Where do I go from here?

Stephen Tashi
maximize $\frac{<x,y>}{||x||_2||y||_2}$,

I don't know an easy way to do this. You may as well consider only unit vectors, so the problem becomes to maximize $<\hat{x},\hat{y}>$. As far as I can see this problem falls under the heading of a "bilinear optimization problem" or, more generally, a "multilinear optimization problem".

My intuition is that if you have two vector subspaces that only intersect at the zero vector, then you should be able to find a set of vectors $\{e_1,e_2,\dots,e_n, f_1,f_2,\dots,f_m\}$ such that this set is a (non-orthogonal) basis for the parent n+m dimensional space, the $e_i$ are an orthonormal basis for the first subspace and the $f_j$ are an orthonormal basis for the second subspace.

If that intuition is correct then let $\hat{x} = \sum_{i=1}^n \alpha_i e_i$ and $\hat{y} = \sum_{j=1}^m \beta_j f_j$. Let $c_{i,j} = <e_i, f_j>$.

The problem is to maximize the function $\sum_{i=1}^n \sum_{j=1}^m c_{i,j} \alpha_i \beta_j$ subject to the constraints $\sum_{i=1}^n \alpha_i^2 = 1$ and $\sum_{j=1}^m \beta_j^2 = 1$.

I wonder if there is a simpler formulation.
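
One way to attack the constrained problem above numerically is alternating maximization: with $\beta$ fixed, the best unit $\alpha$ is proportional to $C\beta$, and with $\alpha$ fixed, the best unit $\beta$ is proportional to $C'\alpha$, where $C$ holds the $c_{i,j}$. What follows is only a minimal pure-Python sketch of that idea. The names `normalize` and `maximize_bilinear`, the starting vector and the iteration count are all illustrative assumptions, not something from the thread.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]

def maximize_bilinear(c, iterations=200):
    """Alternately maximize sum_ij c[i][j]*alpha[i]*beta[j] over unit alpha, beta.

    c is an n-by-m list of lists with c[i][j] = <e_i, f_j>.
    Assumes c is not the zero matrix and that the arbitrary start vector
    is not orthogonal to the maximizing direction.
    """
    n, m = len(c), len(c[0])
    beta = normalize([1.0 + 0.1 * j for j in range(m)])  # arbitrary start
    for _ in range(iterations):
        # Best unit alpha for the current beta is proportional to C * beta.
        alpha = normalize([sum(c[i][j] * beta[j] for j in range(m)) for i in range(n)])
        # Best unit beta for the current alpha is proportional to C' * alpha.
        beta = normalize([sum(c[i][j] * alpha[i] for i in range(n)) for j in range(m)])
    value = sum(c[i][j] * alpha[i] * beta[j] for i in range(n) for j in range(m))
    return value, alpha, beta

value, alpha, beta = maximize_bilinear([[0.5, 0.0], [0.0, 0.2]])
print(value, alpha, beta)
```

Each half-step cannot decrease the objective, so the value climbs monotonically; for a generic start it settles at the largest singular value of $C$.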

It seems that
$$<x,y>=(Aa)'(Db)=a'A'Db=a'U'SVb=(Ua)'S(Vb)$$
where $A'D=U'SV$ is the SVD, since both U and V are orthonormal, the minimum angle occurs at the largest singular value in S. Does that sound right?
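
A minimal NumPy sketch of this idea, under an assumption not stated above: the columns of A and D are first orthonormalized (here with a reduced QR factorization), so that unit coefficient vectors give unit x and y. The function name `smallest_angle_cosine` is hypothetical. Note that `np.linalg.svd` returns the factors as `U @ diag(s) @ Vh`, which differs from the $U'SV$ convention above only in naming.

```python
import numpy as np

def smallest_angle_cosine(A, D):
    """Cosine of the smallest angle between span(A) and span(D), plus unit
    vectors x in span(A) and y in span(D) that attain it.

    Assumes A and D each have linearly independent columns.
    """
    Qa, _ = np.linalg.qr(A)              # orthonormal basis for span(A)
    Qd, _ = np.linalg.qr(D)              # orthonormal basis for span(D)
    U, s, Vh = np.linalg.svd(Qa.T @ Qd)  # singular values sorted descending
    x = Qa @ U[:, 0]                     # unit vector in span(A)
    y = Qd @ Vh[0, :]                    # unit vector in span(D)
    return s[0], x, y

A = np.array([[1.0], [1.0], [1.0]])
D = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
cos_theta, x, y = smallest_angle_cosine(A, D)
print(cos_theta, x, y)
```

Orthonormalizing first matters: without it, the singular values of A'D depend on the scaling of the columns and not only on the two subspaces.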

Yes but never needed to implement the details. But my problem does not seem to formulate as an LP problem, does it? It's quadratic.

objective function is the dot product of two unknown vectors x and y.

Stephen Tashi
What do you mean by "at" the largest singular value? Do you mean we set all the entries of vector $a$ equal to zero except for one of them?

If $A = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, $D = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$ then $A'D = \begin{pmatrix} 1 & 1 \end{pmatrix}$.

$A'D$ is equal to the same thing if $A = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}$, $D = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$.
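
A quick numerical check of these two cases, as a pure-Python sketch. It assumes the second matrix spans the first two coordinate axes, as in both cases above, so the closest direction in that span to a vector is simply its projection onto those two coordinates. The helper names `cosine_to_xy_plane` and `a_transpose_d` are hypothetical.

```python
import math

def cosine_to_xy_plane(a):
    """Cosine of the angle between vector a and the plane of the first two axes."""
    proj_norm = math.hypot(a[0], a[1])        # length of the projection onto the plane
    norm = math.sqrt(sum(t * t for t in a))   # length of a itself
    return proj_norm / norm

def a_transpose_d(a):
    """A'D when the second matrix's columns are the first two standard basis vectors."""
    return (a[0], a[1])

for a in [(1, 1, 1), (1, 1, 2)]:
    print(a_transpose_d(a), round(cosine_to_xy_plane(a), 4))
# (1, 1) 0.8165
# (1, 1) 0.5774
```

Both cases give the same A'D but different cosines, so A'D alone cannot fix the angle unless the columns are normalized first.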

Assuming A and D both have orthonormal columns, and the coefficient vectors a and b are both unit vectors, x and y are also unit vectors. Then maximizing <x,y> corresponds to minimizing the angle between x and y. In order to maximize <x,y>=(Ua)'*S*(Vb), given that ||Ua||=1 and ||Vb||=1, I want to choose Ua and Vb to be 1 at the position of the largest singular value and 0 elsewhere; I do not mean that a and b themselves are 1 in one place and 0 everywhere else. Is the logic correct?

Stephen Tashi
I want to choose Ua and Vb to be 1 at the largest singular values and 0 elsewhere

Are you saying that vector 'a' will be chosen so that the vector Ua will be 1 at the jth component iff the largest singular value occurs in S at location S[j][j] and the vector Ua will be zero elsewhere?

In these two examples, do we have the same matrix for A'D but different answers for the minimum angle? (My 4-D intuition isn't good, so I'm not sure.)

Example 1: $A = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}}\\ \frac{2}{\sqrt{15}}\\ \frac{1}{\sqrt{15}} \end{pmatrix} ,\ D = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$

Example 2: $A = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{6}} \end{pmatrix} ,\ D = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$

When I solved this example, since A'*D is the same in both cases, so are a and b; in fact a=1 and b=[1/sqrt(2),1/sqrt(2)]. But x1=A1≠x2=A2. However, the angle is the same, since <x,y>=(Ua)'*S*(Vb) is totally determined by A'*D. That seems logical.
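
The claim can be checked directly for Examples 1 and 2 with a small pure-Python sketch. The helper names `matvec` and `cosine` are hypothetical; the matrices and the coefficients a=1, b=[1/sqrt(2),1/sqrt(2)] are the ones quoted in this thread.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def cosine(x, y):
    """Cosine of the angle between vectors x and y."""
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

D = [[1, 0], [0, 1], [0, 0], [0, 0]]
A1 = [[1 / math.sqrt(3)], [1 / math.sqrt(3)], [2 / math.sqrt(15)], [1 / math.sqrt(15)]]
A2 = [[1 / math.sqrt(3)], [1 / math.sqrt(3)], [1 / math.sqrt(6)], [1 / math.sqrt(6)]]

a = [1.0]
b = [1 / math.sqrt(2), 1 / math.sqrt(2)]

y = matvec(D, b)
cos1 = cosine(matvec(A1, a), y)   # Example 1
cos2 = cosine(matvec(A2, a), y)   # Example 2
print(cos1, cos2)
```

Both cosines come out as sqrt(2/3), even though x1 and x2 are different vectors. Here both A1 and A2 are unit vectors, which is what makes A'*D sufficient.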

Stephen Tashi
I see what you mean. $A'D = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}$

$A'D = \begin{pmatrix} 1 \end{pmatrix}$ $\begin{pmatrix}\frac{\sqrt{2}}{\sqrt{3}} & 0 \end{pmatrix}$ $\begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{pmatrix}$

You have shown that <x,y> is determined by A'D. This isn't a result I have seen before.

You haven't explained why the maximum possible <x,y> is equal to the largest singular value or why vectors a and b must exist that produce this value. (Are we maximizing <x,y> or maximizing the absolute value of <x,y>?)
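
A sketch of one standard argument that fills this gap, assuming the columns of A and D are orthonormal so that unit a and b give unit x and y. Write $p = Ua$, $q = Vb$, and let $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ be the diagonal entries of S; the symbols $p$, $q$ and $\sigma_i$ are introduced here only for illustration. Since U and V are orthogonal, $||p||_2 = ||q||_2 = 1$, and

$$<x,y> = p'Sq = \sum_i \sigma_i p_i q_i \le \sigma_1 \sum_i |p_i||q_i| \le \sigma_1 ||p||_2 ||q||_2 = \sigma_1$$

where the last inequality is Cauchy-Schwarz. Equality holds when p and q are both the first standard basis vector, i.e. $a = U'(1,0,\dots,0)'$ and $b = V'(1,0,\dots,0)'$, so such a and b always exist. Replacing b by -b flips the sign of <x,y>, so the maximum of <x,y> and the maximum of |<x,y>| coincide.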

chiro
objective function is the dot product of two unknown vectors x and y.

All valid dot, inner, or scalar (in this context) products are always bilinear. If you are trying to maximize only an inner product, especially in a flat space, you will always get a bilinear problem. I'm assuming it's flat because you mentioned the dot product, which is usually associated with Cartesian space; if it's not, then please post your inner product definition.

Since you want to minimize ||Ax-Dy||_2 then just minimize <Ax-Dy,Ax-Dy> which is

<Ax-Dy,Ax-Dy> = <Ax,Ax> - 2<Ax,Dy> + <Dy,Dy>

Now if x and y are vectors, Ax will be bilinear in each component as will Dy which means the whole thing will be a multilinear expression.

Also, minimizing the square of the norm is equivalent to minimizing the norm itself, since the norm is always greater than or equal to zero and squaring is monotonically increasing on non-negative values.
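
To connect this expansion with the angle formulation earlier in the thread: if the vectors are restricted to unit length, $||Ax||_2 = ||Dy||_2 = 1$ (an assumption added here to exclude the zero solution), then $<Ax,Ax> = <Dy,Dy> = 1$ and the expansion reduces to

$$||Ax-Dy||_2^2 = 2 - 2<Ax,Dy>$$

so minimizing the distance between unit vectors is the same as maximizing their inner product, i.e. minimizing the angle between them.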