# Matrix derivative of quadratic form?

Gold Member

## Homework Statement

Find the derivative of f(X).
f(X) = transpose(a) * X * b

where:
X is nxn
a and b are n x 1
ai is the i'th element of a
Xij is the element in row i and column j
let transpose(a) = aT
let transpose(b) = bT

## Homework Equations

I tried using the product rule, which I assume is wrong.
I know the answer to be a*bT (but I have not the slightest clue how)

## The Attempt at a Solution

I tried many things, to the point where punching a hole through my screen doesn't really seem like a bad idea anymore.

My last attempt was to use the product rule along with some matrix properties, here is what I did:
d(f)/dX = [d(aT*X)/dX]*b + (aT*X)*[d(b)/dX] = [d(aT*X)/dX]*b = (d/dX)[Σai*X1i Σai*X2i ⋅ ⋅ ⋅ Σai*Xni]*b

I have no idea what to do next. I have a feeling using the product rule doesn't apply to matrices.

Stephen Tashi
a and b are n x 1
As an example take $n = 2$

$a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$

$b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$

$X = \begin{pmatrix} x_{1\ 1} & x_{1\ 2} \\ x_{2\ 1} & x_{2\ 2} \end{pmatrix}$

Then $f(X) = a^T X b$ is a single number. (We could say it is a 1x1 matrix.)

I know the answer to be a*bT

Then the answer would be $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \begin{pmatrix} b_1 & b_2 \end{pmatrix}$ but what kind of multiplication does that represent? It can be worked as ordinary matrix multiplication to produce a 2x2 matrix.

$ab^T = \begin{pmatrix} a_1b_1 & a_1b_2 \\ a_2b_1 & a_2 b_2 \end{pmatrix}$

I don't know the details of your class materials, so I must guess about how "the derivative" of f(X) is defined.

One guess is that the derivative of $f$ with respect to $X$ is:

$\begin{pmatrix} \frac{\partial f}{\partial x_{1\ 1}} &\frac{\partial f}{\partial x_{1\ 2}} \\ \frac{\partial f}{\partial x_{2\ 1}} & \frac{\partial f}{\partial x_{2\ 2}} \end{pmatrix}$

Is that the definition you use?
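
If that is the definition, the guess can be checked numerically before proving anything. The following is only a sketch in plain Python: the helper names `f` and `partials` and the sample values of `a`, `b` and `X` are made up for illustration, and the partial derivatives are approximated with central differences rather than computed exactly.

```python
# Sketch: compare the matrix of partial derivatives of f(X) = a^T X b
# with the outer product a b^T. All names and values are illustrative.

def f(a, X, b):
    """Return the scalar a^T X b = sum_i sum_j a[i] * X[i][j] * b[j]."""
    n = len(a)
    return sum(a[i] * X[i][j] * b[j] for i in range(n) for j in range(n))

def partials(a, X, b, h=1e-6):
    """Approximate every df/dx_ij with a central difference of step h."""
    n = len(a)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]   # copy of X with x_ij nudged up
            Xm = [row[:] for row in X]   # copy of X with x_ij nudged down
            Xp[i][j] += h
            Xm[i][j] -= h
            D[i][j] = (f(a, Xp, b) - f(a, Xm, b)) / (2 * h)
    return D

a = [1.0, 2.0]
b = [3.0, 4.0]
X = [[0.5, -1.0], [2.0, 1.5]]

outer = [[ai * bj for bj in b] for ai in a]   # a b^T as a 2x2 matrix
D = partials(a, X, b)

print("finite differences:", D)
print("a b^T:             ", outer)
```

The two printed matrices should agree up to rounding, and the result should not depend on the entries chosen for `X`, since `f` is linear in each $x_{ij}$.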

RUber
Homework Helper
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
$\lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ?$
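
Adding $h$ only in the top-left corner changes $x_{1\ 1}$ and nothing else, which is exactly what a partial derivative with respect to $x_{1\ 1}$ requires. A small sketch of that quotient in plain Python, for the $n = 2$ case (the names `f` and `quotient` and the sample numbers are assumptions chosen for illustration):

```python
# Sketch: difference quotient for the (1,1) entry of X,
# which is index [0][0] in Python. Values are illustrative only.

def f(a, X, b):
    """Return the scalar a^T X b."""
    n = len(a)
    return sum(a[i] * X[i][j] * b[j] for i in range(n) for j in range(n))

a = [1.0, 2.0]
b = [3.0, 4.0]
X = [[0.5, -1.0], [2.0, 1.5]]

def quotient(h):
    """(f(X + [[h, 0], [0, 0]]) - f(X)) / h for the sample a, X, b."""
    Xh = [row[:] for row in X]
    Xh[0][0] += h
    return (f(a, Xh, b) - f(a, X, b)) / h

for h in (1.0, 0.1, 0.001):
    print(h, quotient(h), "compare with a_1 * b_1 =", a[0] * b[0])
```

Because $f$ is linear in $x_{1\ 1}$, the quotient should come out as $a_1 b_1$ for every $h$ (up to rounding), so the limit is immediate.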

RUber
Homework Helper
And to take a stab at why the product rule isn't working the way you had it above...
You are treating b like a constant, where really you have a composition of functions of X: $g(X) = Xb$ and $h(y) = a^T y$, so $f(X) = h(g(X))$. You should use the chain rule instead of the product rule.

Ray Vickson
Homework Helper
Dearly Missed
He should not use any of those things; it is just a straightforward matter, like saying $(d/dx) (cx) = c$ for constant $c$. In fact,
$$f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j$$
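
To spell out the step this expansion points to (a sketch only: the indices $k, l$ and the Kronecker delta $\delta$ are notation introduced here, and it assumes the derivative is defined as the matrix of partial derivatives $\partial f / \partial x_{kl}$ suggested earlier in the thread): each $x_{kl}$ appears in exactly one term of the double sum, so

$$\frac{\partial f}{\partial x_{kl}} = \sum_{i,j=1}^n c_{ij} \frac{\partial x_{ij}}{\partial x_{kl}} = \sum_{i,j=1}^n c_{ij}\, \delta_{ik}\, \delta_{jl} = c_{kl} = a_k b_l$$

The entry in row $k$ and column $l$ of the derivative is therefore $a_k b_l$, which is also the entry in row $k$ and column $l$ of $a b^T$.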

Gold Member
Is that the definition you use?
Yes! However I would like to solve it assuming I don't know what the answer is to be.
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
$\lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ?$
I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.
He should not use any of those things; it is just a straightforward matter, like saying $(d/dx) (cx) = x$ for constant $c$. In fact,
$$f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j$$
I have a couple questions about what you wrote, if I may.

$(d/dx) (cx) = x$ for constant $c$ should this not be $(d/dx) (cx) = c$ for constant $c$ ?
For your equation of f(x): $$f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j$$
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!

Ray Vickson
Homework Helper
Dearly Missed
Yes, it should have been $(d/dx) (cx) = c$; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was $a^T X b$ in expanded form. And, I don't see why you ask why/how the $x$ went away; it didn't---it is still there. Perhaps you wonder where the $x$ went at the end of the displayed equation? Well, when I said $c_{ij} = a_i b_j$, that was just the definition of $c_{ij}$. In other words, I wrote the sum with a $c_{ij}$ in it, so I have to define $c_{ij}$ somewhere. Perhaps I should have said " ... where $c_{ij} = a_i b_j$".

Gold Member
Sorry! My last question was wrong. I misread your equation as $f(X) = a_i b_j$, so that is my fault.
Ok. I think I understand your equation then.

But what next? Product rule and chain rule? Or do I simply take the derivative of $c_{ij}x_{ij}$ with respect to $x_{ij}$? If I do the latter, I just get the sum of the $c_{ij}$ terms.
EDIT: Actually I am wrong once again! You don't get the sum of the $c_{ij}$. You get a column vector with each row being the derivative of $c_{ij}x_{ij}$ with respect to an $x_{ij}$, right?

Thank you for your patience : )

Gold Member
I was finally able to do this. I was trying to solve it without considering the elements of the matrix, which I think is not possible. Here is my solution, for anyone who may be interested in the future. Thanks to everyone for the help.