Solving Matrix Derivatives: Theorems & Examples

In summary, the conversation discusses difficulties with understanding the derivatives of matrices and vectors, specifically the proof for the formula d tr(AXB)/ dX = A^T B^T. The conversation also mentions the use of indices and the summation of repeated indices in order to obtain the trace of a matrix. The conversation ends with a request for theorems related to taking derivatives of vectors and matrices.
  • #1
olds442
I have encountered some problems involving derivatives of matrices. I have NO experience with these and have had little luck finding any theorems. I looked on Wikipedia for help and found a few definitions, but I am still unclear about how these results are proven or obtained. Here is an example from Wikipedia:

d tr(AXB)/ dX = A^T B^T

My question is: how are they getting that?! I seem to have a big mental block with this.

Any theorems about how to take derivatives of vectors or matrices would be great!

Any help would be appreciated!
 
  • #2
Think indices. tr(AXB)=A_ij*X_jk*B_ki. The lm component of the derivative matrix is the derivative of that with respect to X_lm. The only terms that contribute to that are terms where j=l and k=m. Removing the X_lm since it is differentiated leaves A_il*B_mi. That's (A^T)_li*(B^T)_im=(A^T*B^T)_lm. So the lm component of d(tr(AXB))/dX is the same as (A^T*B^T). So they are equal as matrices.
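The index argument above can be checked mechanically. The sketch below assumes small hypothetical matrices and uses plain Python lists rather than any library. It builds the lm component exactly as described (the sum over i of A_il*B_mi) and compares the result with the matrix product A^T B^T. The matrices are deliberately non-square so the shapes have to line up: A is 2x3, so X must be 3x4 and B must be 4x2.

```python
# Sketch: check the index-notation result d tr(AXB)/dX = A^T B^T
# on small hypothetical matrices, using plain Python lists.

def matmul(P, Q):
    """Ordinary matrix product: (PQ)_ik = sum_j P_ij Q_jk."""
    return [[sum(P[i][j] * Q[j][k] for j in range(len(Q)))
             for k in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def grad_from_indices(A, B):
    """lm component of d tr(AXB)/dX. Differentiating A_ij X_jk B_ki with
    respect to X_lm keeps only j=l, k=m, leaving sum_i A_il B_mi.
    X itself drops out because tr(AXB) is linear in X."""
    n_rows = len(A[0])   # X has as many rows as A has columns
    n_cols = len(B)      # X has as many columns as B has rows
    return [[sum(A[i][l] * B[m][i] for i in range(len(A)))
             for m in range(n_cols)] for l in range(n_rows)]

# Hypothetical example: A is 2x3, B is 4x2 (so X would be 3x4).
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [2, 1],
     [0, 3],
     [1, 1]]

G = grad_from_indices(A, B)
assert G == matmul(transpose(A), transpose(B))  # same matrix as A^T B^T
print(G)  # [[1, 6, 12, 5], [2, 9, 15, 7], [3, 12, 18, 9]]
```

The result is 3x4, the same shape as X, which is what the component-by-component definition requires.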
 
  • #3
Hmm... if they are defined as square matrices, then tr(AXB) would be given by A_ii*X_ii*B_ii, so tr(AXB) is a square matrix whose diagonal elements are all AXB, correct? If not, there is definitely something here that I am missing...
 
  • #4
(AXB)_ij = A_ik*X_kl*B_lj. Repeated indices are summed over (I don't think I emphasized that). To get the trace, just set i=j and sum over it. Leave the summed dummy indices alone!
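The recipe above can be followed literally in code. This is a minimal sketch with hypothetical 2x2 matrices: it sums over the dummy indices k and l to get (AXB)_ij, then sets i = j and sums over i. The trace comes out as a single number, and multiplying only diagonal entries gives a different value.

```python
# Sketch of the summation recipe, using hypothetical 2x2 matrices.
# (AXB)_ij = A_ik X_kl B_lj, with the repeated indices k and l summed.

def product_entry(A, X, B, i, j):
    """(AXB)_ij: explicit sum over the dummy indices k and l."""
    return sum(A[i][k] * X[k][l] * B[l][j]
               for k in range(len(X))
               for l in range(len(X[0])))

def trace_AXB(A, X, B):
    """tr(AXB): set i = j and sum over i. The result is one number."""
    return sum(product_entry(A, X, B, i, i) for i in range(len(A)))

A = [[1, 2], [3, 4]]
X = [[0, 1], [1, 0]]
B = [[2, 0], [0, 5]]

t = trace_AXB(A, X, B)
# Multiplying only the diagonal entries, A_ii*X_ii*B_ii, is not the trace:
naive = sum(A[i][i] * X[i][i] * B[i][i] for i in range(len(A)))
print(t, naive)  # 19 0
```

Here AXB = [[4, 5], [8, 15]], so the trace is 4 + 15 = 19, while the diagonal-only product is 0 because X has zeros on its diagonal.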
 

1. What are matrix derivatives?

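Matrix derivatives collect the partial derivatives of a function with respect to the entries of a vector or matrix (or the derivatives of each entry of a matrix with respect to a variable) and arrange them as a vector or matrix. They are useful in many fields of science, particularly in statistics and machine learning.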
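For a scalar function f of an m-by-n matrix X, the convention under which d tr(AXB)/dX = A^T B^T holds (often called denominator layout) places the partial derivatives in a matrix of the same shape as X, so the lm component is the derivative with respect to X_lm:

```latex
\left(\frac{\partial f}{\partial X}\right)_{lm} = \frac{\partial f}{\partial X_{lm}},
\qquad
\frac{\partial f}{\partial X} =
\begin{pmatrix}
\dfrac{\partial f}{\partial X_{11}} & \cdots & \dfrac{\partial f}{\partial X_{1n}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f}{\partial X_{m1}} & \cdots & \dfrac{\partial f}{\partial X_{mn}}
\end{pmatrix}
```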

2. Why are matrix derivatives important?

Matrix derivatives are important because many problems in statistics and machine learning reduce to optimizing a scalar function of a vector or matrix. Setting the derivative to zero, or following it step by step in gradient descent, finds the optimal values of the entries. The derivative also measures how sensitive the function is to each entry of the matrix.

3. What are the main theorems used in solving matrix derivatives?

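The main rules carry over from ordinary calculus: linearity (the derivative of a sum is the sum of the derivatives), the product rule d(XY) = (dX)Y + X(dY), in which the order of the factors must be kept because matrix multiplication is not commutative, and the chain rule. Derivatives of other operations follow from these. Transposition and trace commute with differentiation, d(X^T) = (dX)^T and d tr(X) = tr(dX), and differentiating X X^-1 = I gives the inverse rule d(X^-1) = -X^-1 (dX) X^-1, which takes the place of the scalar quotient rule since there is no matrix division.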
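The inverse rule is easy to get wrong, so a numerical check is a useful habit. The sketch below assumes a hypothetical 2x2 matrix X and a hypothetical perturbation direction D standing in for dX. It compares a central finite difference of X^-1 along D with -X^-1 D X^-1, using plain Python lists.

```python
# Sketch: finite-difference check of the inverse rule
#   d(X^-1) = -X^-1 (dX) X^-1
# for a hypothetical 2x2 matrix X, perturbed along a hypothetical direction D.

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def add_scaled(P, Q, s):
    """Entrywise P + s*Q."""
    return [[p + s * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

X = [[2.0, 1.0], [1.0, 3.0]]
D = [[0.5, -1.0], [2.0, 0.25]]  # plays the role of dX
h = 1e-6

# Central difference: (inv(X + hD) - inv(X - hD)) / (2h)
diff = add_scaled(inv2(add_scaled(X, D, h)), inv2(add_scaled(X, D, -h)), -1.0)
numeric = [[v / (2 * h) for v in row] for row in diff]

# Inverse rule: -X^-1 D X^-1
Xinv = inv2(X)
analytic = [[-v for v in row] for row in matmul(matmul(Xinv, D), Xinv)]

err = max(abs(n - a) for rn, ra in zip(numeric, analytic)
          for n, a in zip(rn, ra))
print(err < 1e-6)  # True
```

The same pattern (perturb, difference, compare with the formula) works for the product rule or any other identity in the list.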

4. Can you provide an example of solving a matrix derivative?

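Sure. Let A(x) = [[2x, 4x], [6x, 8x]] be a matrix whose entries depend on a scalar x. The derivative is taken entry by entry, so dA/dx = [[2, 4], [6, 8]]. Equivalently, writing A(x) = xM with the constant matrix M = [[2, 4], [6, 8]], the product rule gives dA/dx = 1*M + x*0 = M. A constant matrix such as M itself has derivative zero.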
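The entry-by-entry derivative in this example can be approximated numerically. This sketch uses the same M and a central difference as a stand-in for the exact derivative; the evaluation point x = 3.0 and step size are arbitrary choices for illustration.

```python
# Sketch: differentiate a matrix-valued function of a scalar entry by entry.
# A(x) = x * M with M = [[2, 4], [6, 8]], so dA/dx should equal M.
# A central difference stands in for the exact derivative.

M = [[2.0, 4.0], [6.0, 8.0]]

def A(x):
    return [[x * m for m in row] for row in M]

def constant(x):
    return M  # does not depend on x

def entrywise_derivative(F, x, h=1e-5):
    """Approximate dF/dx by differencing every entry separately."""
    upper, lower = F(x + h), F(x - h)
    return [[(u - l) / (2 * h) for u, l in zip(ru, rl)]
            for ru, rl in zip(upper, lower)]

dA = entrywise_derivative(A, 3.0)
dC = entrywise_derivative(constant, 3.0)
print([[round(v, 6) for v in row] for row in dA])  # [[2.0, 4.0], [6.0, 8.0]]
print(dC)  # [[0.0, 0.0], [0.0, 0.0]]
```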

5. Are there any common mistakes when solving matrix derivatives?

Yes. Common mistakes include changing the order of factors (matrix multiplication is not commutative, so d(X^2) = (dX)X + X(dX), not 2X(dX)), mixing the numerator-layout and denominator-layout conventions (which transposes the result, giving BA instead of A^T B^T for d tr(AXB)/dX), and not checking that the dimensions of every term agree. A quick numerical check with finite differences on small test matrices catches most of these.
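A finite-difference gradient check is one way to double check an answer. The sketch below uses hypothetical non-square matrices (A is 2x3, X is 3x4, B is 4x2) so that a transposed, wrong-layout answer cannot even have the right shape. It compares the numerical gradient of f(X) = tr(AXB) with A^T B^T and shows that BA has the transposed shape.

```python
# Sketch: a finite-difference gradient check for f(X) = tr(AXB).
# The matrices are hypothetical and non-square (A 2x3, X 3x4, B 4x2),
# so a transposed answer cannot even have the right shape.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def f(A, X, B):
    AXB = matmul(matmul(A, X), B)
    return sum(AXB[i][i] for i in range(len(AXB)))

def numerical_gradient(A, X, B, h=1e-4):
    """Central difference of f with respect to each entry X_lm."""
    G = []
    for l in range(len(X)):
        row = []
        for m in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[l][m] += h
            Xm[l][m] -= h
            row.append((f(A, Xp, B) - f(A, Xm, B)) / (2 * h))
        G.append(row)
    return G

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X = [[0.1 * (i + j) for j in range(4)] for i in range(3)]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]

G_num = numerical_gradient(A, X, B)
G_formula = matmul(transpose(A), transpose(B))  # A^T B^T: 3x4, same as X
G_transposed = matmul(B, A)                     # BA: 4x3, the transpose

print(max_abs_diff(G_num, G_formula) < 1e-6)    # True
print(len(G_transposed), len(G_transposed[0]))  # 4 3
```

Because f is linear in X, the central difference is exact up to rounding error, so any real disagreement with the formula points to a mistake rather than to the step size.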
