# Matrix derivative of quadratic form?

1. Dec 11, 2014

### perplexabot

1. The problem statement, all variables and given/known data
Find the derivative of f(X).
f(X) = transpose(a) * X * b

where:
X is n x n
a and b are n x 1
ai is the i'th element of a
Xnm is the element in row n and column m
let transpose(a) = aT
let transpose(b) = bT

2. Relevant equations
I tried using the product rule, which I assume is wrong.
I know the answer to be a*bT (but I have not the slightest clue how)

3. The attempt at a solution
I tried many things, to the point where punching a hole through my screen doesn't really seem like a bad idea anymore.

My last attempt was to use the product rule along with some matrix properties, here is what I did:
d(f)/dX = [d(aT*X)/dX]*b + (aT*X)*[d(b)/dX] = [d(aT*X)/dX]*b = (d/dX)[Σai*X1i Σai*X2i ⋅ ⋅ ⋅ Σai*Xni]*b

I have no idea what to do next. I have a feeling the product rule doesn't apply to matrices.

2. Dec 12, 2014

### Stephen Tashi

As an example take $n = 2$

$a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$

$b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$

$X = \begin{pmatrix} x_{1\ 1} & x_{1\ 2} \\ x_{2\ 1} & x_{2\ 2} \end{pmatrix}$

Then $f(X) = a^T X b$ is a single number. (We could say it is a 1x1 matrix.)

Then the answer would be $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \begin{pmatrix} b_1 & b_2 \end{pmatrix}$ but what kind of multiplication does that represent? It can be worked as ordinary matrix multiplication to produce a 2x2 matrix.

$ab^T = \begin{pmatrix} a_1b_1 & a_1b_2 \\ a_2b_1 & a_2 b_2 \end{pmatrix}$

I don't know the details of your class materials, so I must guess about how "the derivative" of f(X) is defined.

One guess is that the derivative of $f$ with respect to $X$ is:

$\begin{pmatrix} \frac{\partial f}{\partial x_{1\ 1}} &\frac{\partial f}{\partial x_{1\ 2}} \\ \frac{\partial f}{\partial x_{2\ 1}} & \frac{\partial f}{\partial x_{2\ 2}} \end{pmatrix}$

Is that the definition you use?
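
Under that guess at the definition, the claimed answer can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `f` and `numerical_derivative`, the step size `h`, and the random test data are illustrative choices, not anything from the class materials. It perturbs one entry of $X$ at a time and compares the resulting matrix of partials with $ab^T$.

```python
import numpy as np

def f(X, a, b):
    """Scalar a^T X b."""
    return a @ X @ b

def numerical_derivative(X, a, b, h=1e-6):
    """Matrix whose (i, j) entry approximates df/dx_ij by a forward difference."""
    n, m = X.shape
    D = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            E = np.zeros((n, m))
            E[i, j] = h          # perturb only entry (i, j)
            D[i, j] = (f(X + E, a, b) - f(X, a, b)) / h
    return D

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 3
a = rng.standard_normal(n)
b = rng.standard_normal(n)
X = rng.standard_normal((n, n))

D = numerical_derivative(X, a, b)
matches = np.allclose(D, np.outer(a, b), atol=1e-6)
```

If the definition in your class uses the transposed layout instead, the same check would be made against `np.outer(b, a)`.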

Last edited: Dec 12, 2014

3. Dec 12, 2014

### RUber

Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
$\lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ?$
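
As a sketch of how that limit works out for $n = 2$, using only the fact that $a^T (X + H) b = a^T X b + a^T H b$:

$$f\left(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix}\right) - f(X) = \begin{pmatrix} a_1 & a_2 \end{pmatrix} \begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = h\, a_1 b_1$$

Dividing by $h$ and letting $h \to 0$ leaves $a_1 b_1$, which is the $(1,1)$ entry of $ab^T$. Moving the $h$ to another position picks out the corresponding product $a_i b_j$ in the same way.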

4. Dec 12, 2014

### RUber

And to take a stab at why the product rule isn't working the way you had it above...
You are treating $b$ like a constant, where really you have a composition of functions of $X$: $g(X) = Xb$ and $h(y) = a^T y$, so $f(X) = h(g(X))$. You should use the chain rule instead of the product rule.

5. Dec 12, 2014

### Ray Vickson

He should not use any of those things; it is just a straightforward matter, like saying $(d/dx) (cx) = c$ for constant $c$. In fact,
$$f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j$$
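
Spelled out as a sketch: each $x_{kl}$ appears in exactly one term of the double sum, so differentiating with respect to it leaves only its coefficient,

$$\frac{\partial f}{\partial x_{kl}} = \sum_{i,j=1}^n c_{ij} \frac{\partial x_{ij}}{\partial x_{kl}} = c_{kl} = a_k b_l$$

If the derivative is taken to be the matrix whose $(k,l)$ entry is $\partial f / \partial x_{kl}$ (an assumption about the convention used in the class), that matrix has entries $a_k b_l$, which is $ab^T$.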

Last edited: Dec 12, 2014

6. Dec 12, 2014

### perplexabot

Yes! However I would like to solve it assuming I don't know what the answer is to be.
I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.
I have a couple questions about what you wrote, if I may.

You wrote $(d/dx) (cx) = x$ for constant $c$; should this not be $(d/dx) (cx) = c$ for constant $c$?
For your equation of f(x): $$f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j$$
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!

7. Dec 12, 2014

### Ray Vickson

Yes, it should have been $(d/dx) (cx) = c$; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was $a^T X b$ in expanded form. And, I don't see why you ask why/how the $x$ went away; it didn't---it is still there. Perhaps you wonder where the $x$ went at the end of the displayed equation? Well, when I said $c_{ij} = a_i b_j$, that was just the definition of $c_{ij}$. In other words, I wrote the sum with a $c_{ij}$ in it, so I have to define $c_{ij}$ somewhere. Perhaps I should have said " ... where $c_{ij} = a_i b_j$".

8. Dec 12, 2014

### perplexabot

Sorry! My last question is wrong. I read your equation as $f(X) = a_i b_j$, so it is my fault.
Ok. I think I understand your equation then.

But what next? Product rule and chain rule? Or do I simply take the derivative of $c_{ij}x_{ij}$ with respect to $x_{ij}$? If I do the latter, I just get the sum of the $c_{ij}$ terms.
EDIT: Actually I am wrong once again! You don't get the sum of $c_{ij}$. You get a column vector with each row being a derivative of $c_{ij}x_{ij}$ with respect to an $x_{ij}$, right?
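
One way to see the shape of the result: because $f$ is linear in $X$, $\partial f / \partial x_{kl}$ equals $f$ evaluated at the matrix with a 1 in position $(k,l)$ and zeros elsewhere. The sketch below, assuming NumPy and made-up values for $a$ and $b$, collects those numbers in the same layout as $X$. Under that layout (an assumption, since conventions differ) the partials form an $n \times n$ matrix rather than a single column.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # illustrative values, not from the thread
b = np.array([4.0, 5.0, 6.0])
n = a.size

def f(X):
    """Scalar a^T X b."""
    return a @ X @ b

# f is linear in X, so df/dx_kl = f(E_kl), where E_kl has a single 1 at (k, l).
grad = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n))
        E[k, l] = 1.0
        grad[k, l] = f(E)

print(grad.shape)                             # (3, 3)
print(np.array_equal(grad, np.outer(a, b)))   # True
```

Stacking the same $n^2$ numbers into one long column would also be a legitimate bookkeeping choice; the entries themselves do not change.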

Thank you for your patience : )

9. Dec 13, 2014

### perplexabot

I was finally able to do this. I was trying to solve it without considering the elements of the matrix, which I now think is not possible. Here is my solution, for anyone who may be interested in the future. Thanks to everyone for the help.