Solving Quadratic Form Derivative: A Frustrating Challenge

SUMMARY

The discussion centers on the differentiation of a quadratic form represented by the equation J = 1/2 (z - Xθ)ᵀ(z - Xθ). The derivative with respect to θ is given as ∂J/∂θ = -Xᵀz + XᵀXθ. Participants highlight the application of the product rule in matrix differentiation, specifically addressing the confusion surrounding the transpose operation and the properties of symmetric matrices. Key insights include the necessity for z and θ to be vectors and the importance of understanding matrix transpose properties in achieving the correct derivative expression.

PREREQUISITES
  • Understanding of matrix calculus, specifically matrix differentiation.
  • Familiarity with quadratic forms in linear algebra.
  • Knowledge of the product rule in differentiation.
  • Concept of symmetric matrices and their properties.
NEXT STEPS
  • Study matrix differentiation techniques, focusing on the product rule.
  • Explore quadratic forms and their applications in optimization problems.
  • Learn about the properties of symmetric matrices and their implications in linear algebra.
  • Review vector calculus, particularly differentiation with respect to vector variables.
USEFUL FOR

Mathematicians, data scientists, and machine learning practitioners who require a solid understanding of matrix calculus and its applications in optimization and statistical modeling.

Cyrus
I'm not sure how to apply the rules of differentiation to a quadratic form. I've been trying to find the solution on Google, but no luck:

Basically:

J=\frac{1}{2} (z-X \theta)^T (z-X \theta)

and

\frac{ \partial J}{\partial \theta}= -X^T z +X^TX\theta

I can't for the life of me figure out how they got from the upper equation to the lower one. The transpose is really screwing things up in terms of the derivatives. There is some rule being applied to matrix differentiation of a transpose of a quadratic form that I am ignorant of, which keeps me from getting to the expression on the second line...

Every time I try to expand the top line out, I end up with a 2*cross product term that doesn't drop out but is clearly not shown in the second line.
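
For reference, one way to see where the cross term goes is to expand the product directly. This is only a sketch, assuming z is an n-vector, θ is a p-vector, and X is an n×p matrix (the thread does not state the dimensions of X). The two cross terms are transposes of each other and both are scalars, so they combine into one term with a factor of 2, which the leading 1/2 then cancels:

```latex
\begin{aligned}
J &= \tfrac{1}{2}\left(z^T z - z^T X\theta - \theta^T X^T z + \theta^T X^T X\theta\right) \\
  &= \tfrac{1}{2}\left(z^T z - 2\,\theta^T X^T z + \theta^T X^T X\theta\right)
     \quad \text{since } z^T X\theta = \left(\theta^T X^T z\right)^T \text{ is a scalar} \\
\frac{\partial J}{\partial \theta}
  &= \tfrac{1}{2}\left(-2\,X^T z + 2\,X^T X\theta\right)
   = -X^T z + X^T X\theta
\end{aligned}
```

The last step treats the derivative as a column vector and uses the standard results that the gradient of θᵀa is a and that the gradient of θᵀAθ is 2Aθ when A is symmetric (here A = XᵀX).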
 
Hi Cyrus! :smile:

It's the usual product rule: (fg)' = f'g + fg',

which here is (f^T g)' = f'^T g + f^T g'

Since f = g, that comes out as f'^T f + f^T f'

and since z (I assume that means zI) and θ commute with anything, you should get the given result :wink:
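
The product-rule idea can be carried through without assuming anything commutes, provided z and θ are treated as column vectors rather than scalars. The following is a sketch under that assumption, written with differentials (the symbols f, df and dθ are introduced here for illustration and are not from the original post):

```latex
\begin{aligned}
f &= z - X\theta, \qquad df = -X\,d\theta \\
dJ &= \tfrac{1}{2}\left(df^T f + f^T df\right) = f^T df
     \quad \text{(the two terms are the same scalar)} \\
   &= -(z - X\theta)^T X\,d\theta \\
\frac{\partial J}{\partial \theta}
   &= -X^T (z - X\theta) = -X^T z + X^T X\theta
\end{aligned}
```

The step that replaces commutativity is that df^T f is a 1×1 quantity, so it equals its own transpose f^T df.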
 
tiny-tim said:
Hi Cyrus! :smile:

It's the usual product rule: (fg)' = f'g + fg',

which here is (f^T g)' = f'^T g + f^T g'

Since f = g, that comes out as f'^T f + f^T f'

and since z (I assume that means zI) and θ commute with anything, you should get the given result :wink:

This isn't working. To be clear z and theta are vectors, not scalars.

After expanding I am getting

(-X^Tz+X^TX\theta) -(Xz^T + X\theta^TX^T)


If the two things in brackets could equal twice each term, then the one half would knock out the two and make things right.

Basically, Xz^T = X^T z
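
A quick numerical check may help here. The sketch below uses a small made-up X, z and θ (hypothetical values, not from the thread) to show that the two cross terms zᵀXθ and θᵀXᵀz are the same scalar, which is what lets them merge into a single term with a factor of 2. With these shapes a product like X zᵀ is not even defined, so the identity that is actually needed is probably not X zᵀ = Xᵀz.

```python
# Hypothetical example data: X is 3x2, z has 3 entries, theta has 2.
X = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]
z = [1.0, 2.0, 3.0]
theta = [0.5, -1.5]


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    """Return the transpose of matrix A (list of rows)."""
    return [list(col) for col in zip(*A)]


def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# The two cross terms from expanding (z - X theta)^T (z - X theta).
cross_a = dot(z, matvec(X, theta))             # z^T (X theta)
cross_b = dot(theta, matvec(transpose(X), z))  # theta^T (X^T z)

print(cross_a, cross_b)
```

Both values are plain numbers, and a number is equal to its own transpose, which is the only "symmetry" this step relies on.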
 
Got the same thing

Xz^T = X^T z


If this is to be true, then Xz must be symmetric.

A matrix is symmetric when it is equal to its transpose:

A = A^T
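
As a side note, the symmetric matrix that matters for this derivative is most likely XᵀX rather than a product of X and z. A short sketch of the two facts involved, with A used as a placeholder symbol for any square matrix:

```latex
(X^T X)^T = X^T (X^T)^T = X^T X,
\qquad
\frac{\partial}{\partial \theta}\left(\theta^T A\,\theta\right)
  = \left(A + A^T\right)\theta = 2A\theta \ \text{ when } A = A^T
```

With A = XᵀX this gives the 2XᵀXθ that the factor of 1/2 in J reduces to XᵀXθ.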
 
Cyrus said:
This isn't working. To be clear z and theta are vectors, not scalars.

After expanding I am getting

(-X^Tz+X^TX\theta) -(Xz^T + X\theta^TX^T)

If the two things in brackets could equal twice each term, then the one half would knock out the two and make things right.

Basically, Xz^T = X^T z

No, the T must always come first: X^T z = (z^T X)^T.

I'm honestly not following this …

if θ is a vector, how can you differentiate with respect to it?

and you originally called it a quadratic … what's quadratic about it unless each bracket is a vector? :confused:

What is the context of this?
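
On differentiating with respect to a vector: the usual reading is that ∂J/∂θ collects the partial derivatives of the scalar J with respect to each component of θ. The sketch below checks the stated result against central finite differences, one component at a time. The data and the function names (`J`, `analytic_grad`, `numeric_grad`) are made up for illustration and do not come from the thread.

```python
# Hypothetical example data: X is 3x2, z has 3 entries, theta has 2.
X = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]
z = [1.0, 2.0, 3.0]
theta = [0.5, -1.5]


def residual(theta, X, z):
    """Return z - X theta as a list."""
    return [zi - sum(x * t for x, t in zip(row, theta))
            for row, zi in zip(X, z)]


def J(theta, X, z):
    """Scalar cost J = 1/2 (z - X theta)^T (z - X theta)."""
    r = residual(theta, X, z)
    return 0.5 * sum(ri * ri for ri in r)


def analytic_grad(theta, X, z):
    """Stated result: -X^T z + X^T X theta, written as -X^T (z - X theta)."""
    r = residual(theta, X, z)
    return [-sum(X[i][k] * r[i] for i in range(len(X)))
            for k in range(len(theta))]


def numeric_grad(theta, X, z, h=1e-6):
    """Central-difference estimate of each partial derivative dJ/dtheta_k."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[k] += h
        dn[k] -= h
        grad.append((J(up, X, z) - J(dn, X, z)) / (2 * h))
    return grad


print(analytic_grad(theta, X, z))
print(numeric_grad(theta, X, z))
```

The two printed lists should agree to within finite-difference rounding error, which supports reading the derivative as a vector with one entry per component of θ.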
 