Matrix derivative of quadratic form?

The discussion focuses on finding the derivative of f(X) = a^T X b with respect to X, where X is an n x n matrix and a, b are n x 1 vectors. The original poster's attempt to use the product rule stalled. One reply suggested the chain rule instead; another pointed out that neither rule is needed, because f is a linear function of the entries of X, namely the double sum of a_i x_ij b_j over i and j. The derivative is therefore the matrix a*b^T, whose (i, j) entry is a_i b_j. The thread concludes with the poster deriving the result element by element.
perplexabot

Homework Statement


Find the derivative of f(X).
f(X) = transpose(a) * X * b

where:
X is nxn
a and b are n x 1
ai is the i'th element of a
Xnm is the element in row n and column m
let transpose(a) = aT
let transpose(b) = bT

Homework Equations


I tried using the product rule, which I assume is wrong.
I know the answer to be a*bT (but I have not the slightest clue how)

The Attempt at a Solution

I tried many things, to the point where punching a hole through my screen doesn't really seem like a bad idea anymore.

My last attempt was to use the product rule along with some matrix properties, here is what I did:
d(f)/dX = [d(aT*X)/dX]*b + (aT*X)*[d(b)/dX] = [d(aT*X)/dX]*b = (d/dX)[Σai*X1i Σai*X2i ⋅ ⋅ ⋅ Σai*Xni]*b

I have no idea what to do next. I have a feeling the product rule doesn't apply to matrices.
PLEASE HELP ME!

Thanks for reading...
 
perplexabot said:
a and b are n x 1

As an example take n = 2

a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}

b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}

X = \begin{pmatrix} x_{1\ 1} & x_{1\ 2} \\ x_{2\ 1} & x_{2\ 2} \end{pmatrix}

Then f(X) = a^T X b is a single number. (We could say it is a 1x1 matrix.)

perplexabot said:
I know the answer to be a*bT

Then the answer would be ##\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \begin{pmatrix} b_1 & b_2 \end{pmatrix}##, but what kind of multiplication does that represent? It can be worked out as ordinary matrix multiplication to produce a 2x2 matrix.

ab^T = \begin{pmatrix} a_1b_1 & a_1b_2 \\ a_2b_1 & a_2 b_2 \end{pmatrix}
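For anyone who wants to see that product with numbers, here is a minimal sketch in plain Python. The helper name `outer` and the sample values are made up for illustration; they are not from the thread.

```python
def outer(a, b):
    """Return the matrix a b^T as a list of rows: entry (i, j) is a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]

# Hypothetical sample vectors for n = 2.
a = [2, 3]
b = [5, 7]

# A column (2x1) times a row (1x2) gives a 2x2 matrix.
print(outer(a, b))  # [[10, 14], [15, 21]]
```

Row i of the result is b scaled by a_i, which matches the 2x2 matrix written above.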

I don't know the details of your class materials, so I must guess about how "the derivative" of f(X) is defined.

One guess is that the derivative of f with respect to X is:

\begin{pmatrix} \frac{\partial f}{\partial x_{1\ 1}} &\frac{\partial f}{\partial x_{1\ 2}} \\ \frac{\partial f}{\partial x_{2\ 1}} & \frac{\partial f}{\partial x_{2\ 2}} \end{pmatrix}

Is that the definition you use?
 
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
## \lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ? ##
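A sketch of how that limit might be evaluated for n = 2, writing ##E_{11}## for the matrix with a 1 in the top-left corner and zeros elsewhere (the name ##E_{11}## is introduced here only for convenience):

$$ f(X + hE_{11}) - f(X) = a^T (X + hE_{11})\, b - a^T X b = h\, a^T E_{11}\, b = h\, a_1 b_1 $$

$$ \lim_{h\to 0} \frac{f(X + hE_{11}) - f(X)}{h} = \lim_{h\to 0} \frac{h\, a_1 b_1}{h} = a_1 b_1 $$

Adding h to a single entry of X changes only that entry, so this limit is the partial derivative of f with respect to ##x_{1\ 1}##. The other three entries can be handled the same way.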
 
And to take a stab at why the product rule isn't working the way you had it above...
You are treating b like a constant, where really you have a composition of functions of X. With g(X) = Xb and h(y) = a^T y, f(X) = h(g(X)). You should use the chain rule instead of the product rule.
 
RUber said:
And to take a stab at why the product rule isn't working the way you had it above...
You are treating b like a constant, where really you have a composition of functions of X. With g(X) = Xb and h(y) = a^T y, f(X) = h(g(X)). You should use the chain rule instead of the product rule.

He should not use any of those things; it is just a straightforward matter, like saying ##(d/dx) (cx) = c## for constant ##c##. In fact,
f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j
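The expansion above can be checked numerically. The following is only a sketch in plain Python: the function names `f` and `numerical_gradient` and the sample values are assumptions made for illustration. It evaluates f as the double sum, bumps one entry of X at a time, and compares the resulting difference quotients with ##c_{ij} = a_i b_j##.

```python
def f(a, X, b):
    """Evaluate a^T X b as the double sum of a_i * x_ij * b_j."""
    n = len(a)
    return sum(a[i] * X[i][j] * b[j] for i in range(n) for j in range(n))

def numerical_gradient(a, X, b, h=1.0):
    """Matrix of difference quotients (f(X + h*E_kl) - f(X)) / h.

    f is linear in the entries of X, so the quotient does not depend on h;
    h = 1.0 keeps the arithmetic exact for the integer sample data below.
    """
    n = len(a)
    base = f(a, X, b)
    grad = []
    for k in range(n):
        row = []
        for l in range(n):
            Xp = [r[:] for r in X]   # copy X, then perturb only entry (k, l)
            Xp[k][l] += h
            row.append((f(a, Xp, b) - base) / h)
        grad.append(row)
    return grad

# Hypothetical sample data for n = 2.
a = [2, 3]
b = [5, 7]
X = [[1, 4], [6, 9]]

grad = numerical_gradient(a, X, b)
expected = [[a[i] * b[j] for j in range(2)] for i in range(2)]
print(grad)      # [[10.0, 14.0], [15.0, 21.0]]
print(expected)  # [[10, 14], [15, 21]]
```

The entries agree, and they do not depend on X at all, just as ##(d/dx)(cx)## does not depend on x.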
 
Stephen Tashi said:
Is that the definition you use?
Yes! However I would like to solve it assuming I don't know what the answer is to be.
RUber said:
Looking at the derivative with respect to the first term (1,1), you could use the limit definition to see what happens in the matrix multiplication.
## \lim_{h\to 0} \frac{f(X+\begin{pmatrix} h & 0 \\ 0 & 0 \end{pmatrix})-f(X)}{h} = ? ##
I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.
Ray Vickson said:
He should not use any of those things; it is just a straightforward matter, like saying ##(d/dx) (cx) = x## for constant ##c##. In fact,
f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j
I have a couple questions about what you wrote, if I may.

##(d/dx) (cx) = x## for constant ##c## should this not be ##(d/dx) (cx) = c## for constant ##c## ?
For your equation of f(x): f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!
 
perplexabot said:
Yes! However I would like to solve it assuming I don't know what the answer is to be.

I know you are sort of using the definition of a derivative but I don't get why you have a matrix with h in the top left corner.

I have a couple questions about what you wrote, if I may.

##(d/dx) (cx) = x## for constant ##c## should this not be ##(d/dx) (cx) = c## for constant ##c## ?
For your equation of f(x): f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j = \sum_{i,j=1}^n c_{ij} x_{ij}, \;\; c_{ij} = a_i b_j
shouldn't the subscripts of x be reversed (ji instead of ij)?
Also how did the x go away : ( ??

Thank you so much!

Yes, it should have been ##(d/dx) (cx) = c##; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was ##a^T X b## in expanded form. And, I don't see why you ask why/how the ##x## went away; it didn't---it is still there. Perhaps you wonder where the ##x## went at the end of the displayed equation? Well, when I said ##c_{ij} = a_i b_j##, that was just the definition of ##c_{ij}##. In other words, I wrote the sum with a ##c_{ij}## in it, so I have to define ##c_{ij}## somewhere. Perhaps I should have said " ... where ##c_{ij} = a_i b_j##".
 
Ray Vickson said:
Yes, it should have been ##(d/dx) (cx) = c##; I have edited out the error.

I don't understand the second question: reverse i and j where? What I wrote was ##a^T X b## in expanded form. And, I don't see why you ask why/how the ##x## went away; it didn't---it is still there. Perhaps you wonder where the ##x## went at the end of the displayed equation? Well, when I said ##c_{ij} = a_i b_j##, that was just the definition of ##c_{ij}##. In other words, I wrote the sum with a ##c_{ij}## in it, so I have to define ##c_{ij}## somewhere. Perhaps I should have said " ... where ##c_{ij} = a_i b_j##".

Sorry! My last question was wrong. I read your equation as f(X) = aibj, so it is my fault.
Ok. I think I understand your equation then.

But what next? Product rule and chain rule? Or do I simply take the derivative of ##c_{ij}x_{ij}## with respect to ##x_{ij}##? If I do the latter, I just get the sum of ##c_{ij}## terms.
EDIT: Actually I am wrong once again! You don't get the sum of ##c_{ij}##. You get a column vector with each row being a derivative of ##c_{ij}x_{ij}## with respect to an ##x_{ij}##, right?

Thank you for your patience : )
 
I was finally able to do this. I was trying to solve it without considering the individual elements of the matrix, which I now think is not possible. Here is my solution, for anyone who may be interested in the future. Thanks to everyone for the help.

[Attachment: gotIt.png (image of the poster's worked solution)]
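The contents of the attachment are not available as text. For future readers, here is a sketch of how an element-by-element derivation can go, reconstructed from the replies in this thread and not necessarily the steps in the image. The Kronecker delta ##\delta_{ik}## (1 if i = k, 0 otherwise) is notation introduced here.

$$ f(X) = \sum_{i=1}^n \sum_{j=1}^n a_i x_{ij} b_j $$

$$ \frac{\partial f}{\partial x_{kl}} = \sum_{i=1}^n \sum_{j=1}^n a_i b_j \frac{\partial x_{ij}}{\partial x_{kl}} = \sum_{i=1}^n \sum_{j=1}^n a_i b_j\, \delta_{ik}\, \delta_{jl} = a_k b_l $$

Using the definition agreed on earlier in the thread, where the (k, l) entry of the derivative is ##\partial f / \partial x_{kl}##, the derivative is the n x n matrix with entries ##a_k b_l##, which is exactly ##ab^T##.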
 
