Cost Function

  1. Mar 24, 2009 #1
    I'm not sure how to apply the rules of differentiation to a quadratic form. I've been trying to find the solution on Google, but no luck:


    [tex]J=\frac{1}{2} (z-X \theta)^T (z-X \theta)[/tex]


    [tex]\frac{ \partial J}{\partial \theta}= -X^T z +X^TX\theta[/tex]

    I can't for the life of me figure out how they got from the upper equation to the lower one. The transpose is really screwing things up in terms of the derivatives. There is some rule for differentiating a quadratic form with a transpose in it that I am ignorant of, and without it I can't get to the expression on the second line.

    Every time I try to expand the top line out I end up with a factor of 2 on the cross term that doesn't drop out, but it clearly isn't in the second line.
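
    For reference, here is one way to get from the first line to the second. It is a sketch under two assumptions not stated above: z and θ are column vectors, and the gradient is written as a column vector with one partial derivative per component of θ. Expanding the product gives

    [tex]J=\frac{1}{2}\left(z^Tz - z^TX\theta - \theta^TX^Tz + \theta^TX^TX\theta\right)[/tex]

    The two middle terms are scalars and each is the transpose of the other, so they are equal and combine:

    [tex]J=\frac{1}{2}z^Tz - \theta^TX^Tz + \frac{1}{2}\theta^TX^TX\theta[/tex]

    With the standard identities [tex]\frac{\partial}{\partial \theta}(\theta^Ta)=a[/tex] and [tex]\frac{\partial}{\partial \theta}(\theta^TA\theta)=(A+A^T)\theta[/tex], and with [tex]A=X^TX[/tex] symmetric, the factor of 2 from the quadratic term cancels the one half:

    [tex]\frac{\partial J}{\partial \theta}= -X^Tz + X^TX\theta[/tex]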
  3. Mar 25, 2009 #2


    Hi Cyrus! :smile:

    It's the usual product rule: (fg)' = f'g + fg',

    which here is [tex](f^Tg)' = f'^Tg + f^Tg'[/tex]

    Since f = g, that comes out as [tex]f'^Tf + f^Tf'[/tex]

    and since z (I assume that means zI) and θ commute with anything, you should get the given result :wink:
  4. Mar 26, 2009 #3
    This isn't working. To be clear, z and theta are vectors, not scalars.

    After expanding, I'm getting

    [tex] (-X^Tz+X^TX\theta) -(Xz^T + X\theta^TX^T)[/tex]

    If the two things in brackets could equal twice each term, then the one half would knock out the two and make things right.

    Basically, this needs [tex]Xz^T = X^T z[/tex]
  5. Mar 26, 2009 #4
    Got the same thing

    [tex]Xz^T = X^T z[/tex]

    If this is to be true, then Xz must be symmetric.

    A matrix is symmetric when it is equal to its transpose:

    [tex]A = A^T[/tex]
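
    A quick dimension count may help here. This is a sketch under an assumption not stated in the thread: X is m×n, z has m components, and θ has n components, all written as columns. Then

    [tex]X^Tz \in \mathbb{R}^{n}, \qquad z^TX \in \mathbb{R}^{1\times n}, \qquad (X^Tz)^T = z^TX[/tex]

    whereas [tex]Xz^T[/tex] multiplies an m×n matrix by a 1×m row, which is only defined when n = 1. The cross terms in the expansion of J are [tex]z^TX\theta[/tex] and [tex]\theta^TX^Tz[/tex], both 1×1.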
  6. Mar 27, 2009 #5


    No, the T must always come first: [tex]X^T z = z^T X[/tex].

    I'm honestly not following this …

    if θ is a vector, how can you differentiate with respect to it?

    and you originally called it a quadratic … what's quadratic about it unless each bracket is a vector? :confused:

    What is the context of this?
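
On differentiating with respect to a vector: the usual reading is that the derivative collects one partial derivative per component of θ. Under that reading, the gradient quoted in post #1 can be checked numerically. The sketch below compares it against central finite differences. The matrix sizes, the random data, and the helper names `cost`, `analytic_grad`, and `numeric_grad` are made up for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def cost(theta, X, z):
    """J = 1/2 (z - X theta)^T (z - X theta), returned as a float."""
    r = z - X @ theta
    return 0.5 * float(r @ r)

def analytic_grad(theta, X, z):
    """Gradient quoted in the thread: -X^T z + X^T X theta."""
    return -X.T @ z + X.T @ X @ theta

def numeric_grad(theta, X, z, h=1e-6):
    """Central differences, perturbing one component of theta at a time."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (cost(theta + e, X, z) - cost(theta - e, X, z)) / (2 * h)
    return g

# Illustrative sizes only: 5 observations, 3 parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
z = rng.normal(size=5)
theta = rng.normal(size=3)

grad_a = analytic_grad(theta, X, z)
grad_n = numeric_grad(theta, X, z)

# Do the two gradients agree to within finite-difference error?
print(np.allclose(grad_a, grad_n, atol=1e-6))

# The two cross terms z^T X theta and theta^T X^T z are the same scalar.
print(np.isclose(z @ X @ theta, theta @ X.T @ z))
```

Because J is quadratic in θ, central differences carry no truncation error here, so any disagreement beyond rounding would point to a mistake in the formula rather than in the step size.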