Matrix derivative of quadratic form?

perplexabot

Homework Statement


Find the derivative of f(X).
f(X) = transpose(a) * X * b

where:
X is nxn
a and b are n x 1
ai is the i'th element of a
Xnm is the element in row n and column m
let transpose(a) = aT
let transpose(b) = bT

Homework Equations


I tried using the product rule, which I assume is wrong.
I know the answer to be a*bT (but I have not the slightest clue how)

The Attempt at a Solution

I tried many things, to the point where punching a hole through my screen doesn't really seem like a bad idea anymore.

My last attempt was to use the product rule along with some matrix properties, here is what I did:
d(f)/dX = [d(aT*X)/dX]*b + (aT*X)*[d(b)/dX] = [d(aT*X)/dX]*b = (d/dX)[Σai*X1i Σai*X2i ⋅ ⋅ ⋅ Σai*Xni]*b

I have no idea what to do next. I have a feeling that the product rule doesn't apply to matrices.
PLEASE HELP ME!

Thanks for reading...
 
perplexabot said:
a and b are n x 1

As an example take n = 2

$$ a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} $$

$$ b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} $$

$$ X = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} $$

Then ##f(X) = a^T X b## is a single number. (We could say it is a 1x1 matrix.)

perplexabot said:
I know the answer to be a*bT

Then the answer would be ##\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \begin{pmatrix} b_1 & b_2 \end{pmatrix}##, but what kind of multiplication does that represent? It can be worked as ordinary matrix multiplication to produce a 2x2 matrix.

$$ ab^T = \begin{pmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \end{pmatrix} $$

I don't know the details of your class materials, so I must guess about how "the derivative" of f(X) is defined.

One guess is that the derivative of f with respect to X is:

$$ \begin{pmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} \\ \frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} \end{pmatrix} $$

Is that the definition you use?
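
If that entry-by-entry definition is the intended one, the n = 2 case can be worked out by hand. The following is only a sketch under that assumption, using the symbols defined above. Multiplying out ##a^T X b## gives

$$ f(X) = a_1 x_{11} b_1 + a_1 x_{12} b_2 + a_2 x_{21} b_1 + a_2 x_{22} b_2 $$

Each ##x_{ij}## appears in exactly one term, so differentiating with respect to each entry in turn gives

$$ \begin{pmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} \\ \frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} \end{pmatrix} = \begin{pmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \end{pmatrix} = ab^T $$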
 
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
## \lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ? ##
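
The limit above can be tried numerically. The sketch below is illustrative only: the values of `a`, `b` and `X` are made up, and the helper names `f` and `difference_quotient_11` are hypothetical. Adding h to the top-left corner changes only ##x_{11}##, which is what taking a partial derivative with respect to ##x_{11}## means.

```python
# Hypothetical 2x2 example values; any numbers would do.
a = [2.0, 3.0]
b = [5.0, 7.0]
X = [[1.0, 4.0], [6.0, 9.0]]


def f(X, a, b):
    """Return a^T X b as a scalar: the sum over i, j of a_i * x_ij * b_j."""
    n = len(a)
    return sum(a[i] * X[i][j] * b[j] for i in range(n) for j in range(n))


def difference_quotient_11(X, a, b, h):
    """Return (f(X + h*E11) - f(X)) / h, where E11 has a 1 in the top-left corner."""
    X_shifted = [row[:] for row in X]  # copy so the original X is untouched
    X_shifted[0][0] += h               # perturb only the (1,1) entry
    return (f(X_shifted, a, b) - f(X, a, b)) / h


# f is linear in x_11, so the quotient should not depend on h
# (apart from floating-point rounding).
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient_11(X, a, b, h))
```

Because f is linear in each entry of X, the quotient equals ##a_1 b_1## for every h, up to rounding, so the limit is immediate. Perturbing a different entry of X would give the corresponding partial derivative in the same way.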
 
And to take a stab at why the product rule isn't working the way you had it above...
You are treating b like a constant, where really you have a composition of functions of X. g(X) = Xb, h(X) = aX, so f(X) = h(g(x)). You should use the chain rule instead of the product rule.
 
RUber said:
And to take a stab at why the product rule isn't working the way you had it above...
You are treating b like a constant, where really you have a composition of functions of X. g(X) = Xb, h(X) = aX, so f(X) = h(g(x)). You should use the chain rule instead of the product rule.

He should not use any of those things; it is just a straightforward matter, like saying ##(d/dx) (cx) = c## for constant ##c##. In fact,
$$ f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j $$
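
One way to carry that observation a step further is sketched here. It assumes the derivative is taken entry by entry, and the indices k and l are introduced only for this illustration. Since ##\partial x_{ij} / \partial x_{kl}## is 1 when ##(i,j) = (k,l)## and 0 otherwise, only one term of the double sum survives:

$$ \frac{\partial f}{\partial x_{kl}} = \sum_{i,j=1}^n c_{ij} \frac{\partial x_{ij}}{\partial x_{kl}} = c_{kl} = a_k b_l $$

Collecting these into an n x n matrix, with ##a_k b_l## in row k and column l, gives ##ab^T##.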
 
Stephen Tashi said:
Is that the definition you use?
Yes! However I would like to solve it assuming I don't know what the answer is to be.
RUber said:
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
## \lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ? ##
I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.
Ray Vickson said:
He should not use any of those things; it is just a straightforward matter, like saying ##(d/dx) (cx) = x## for constant ##c##. In fact,
$$ f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j $$
I have a couple questions about what you wrote, if I may.

##(d/dx) (cx) = x## for constant ##c## should this not be ##(d/dx) (cx) = c## for constant ##c## ?
For your equation of f(X): ##f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j##
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!
 
perplexabot said:
Yes! However I would like to solve it assuming I don't know what the answer is to be.

I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.

I have a couple questions about what you wrote, if I may.

##(d/dx) (cx) = x## for constant ##c## should this not be ##(d/dx) (cx) = c## for constant ##c## ?
For your equation of f(X): ##f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j##
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!

Yes, it should have been ##(d/dx) (cx) = c##; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was ##a^T X b## in expanded form. And, I don't see why you ask why/how the ##x## went away; it didn't---it is still there. Perhaps you wonder where the ##x## went at the end of the displayed equation? Well, when I said ##c_{ij} = a_i b_j##, that was just the definition of ##c_{ij}##. In other words, I wrote the sum with a ##c_{ij}## in it, so I have to define ##c_{ij}## somewhere. Perhaps I should have said " ... where ##c_{ij} = a_i b_j##".
 
Ray Vickson said:
Yes, it should have been ##(d/dx) (cx) = c##; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was ##a^T X b## in expanded form. And, I don't see why you ask why/how the ##x## went away; it didn't---it is still there. Perhaps you wonder where the ##x## went at the end of the displayed equation? Well, when I said ##c_{ij} = a_i b_j##, that was just the definition of ##c_{ij}##. In other words, I wrote the sum with a ##c_{ij}## in it, so I have to define ##c_{ij}## somewhere. Perhaps I should have said " ... where ##c_{ij} = a_i b_j##".

Sorry! My last question was wrong. I read your equation as f(X) = aibj, so it is my fault.
OK, I think I understand your equation now.

But what next? Product rule and chain rule? Or do I simply take the derivative of ##c_{ij}x_{ij}## with respect to ##x_{ij}##? If I do the latter, I just get the sum of the ##c_{ij}## terms.
EDIT: Actually I am wrong once again! You don't get the sum of the ##c_{ij}##. You get a column vector with each row being a derivative of ##c_{ij}x_{ij}## with respect to an ##x_{ij}##, right?

Thank you for your patience : )
 
I was finally able to do this. I was trying to solve it without considering the elements of the matrix, which I now think is not possible. Here is my solution, for anyone who may be interested in the future. Thanks to everyone for the help.

[Attached image: gotIt.png]
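
The attached image is not reproduced here. As an independent check, and not a reconstruction of the poster's solution, the sketch below compares a finite-difference estimate of every ##\partial f / \partial x_{kl}## with the matrix ##ab^T## for a made-up 3x3 case. All function names and numeric values are hypothetical.

```python
def f(X, a, b):
    """Return a^T X b as a scalar: the sum over i, j of a_i * x_ij * b_j."""
    n = len(a)
    return sum(a[i] * X[i][j] * b[j] for i in range(n) for j in range(n))


def numerical_gradient(X, a, b, h=1e-3):
    """Return the matrix whose (k, l) entry is a forward-difference estimate of df/dx_kl."""
    n = len(a)
    base = f(X, a, b)
    grad = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            X_shifted = [row[:] for row in X]  # fresh copy for each entry
            X_shifted[k][l] += h               # perturb one entry at a time
            grad[k][l] = (f(X_shifted, a, b) - base) / h
    return grad


def outer(a, b):
    """Return the outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


# Made-up example data (n = 3).
a = [1.0, -2.0, 0.5]
b = [3.0, 4.0, -1.0]
X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]

grad = numerical_gradient(X, a, b)
expected = outer(a, b)
max_error = max(
    abs(grad[k][l] - expected[k][l]) for k in range(3) for l in range(3)
)
print(max_error)  # should be tiny, limited only by floating-point rounding
```

The result is an n x n matrix of partial derivatives rather than a column vector. The forward difference is exact here up to rounding because f is linear in every entry of X.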
 
