Differentiation with matrices/vectors

  1. Hello,

    I'm trying to understand this proof:


    Can someone quickly talk me through the differentiation step, bearing in mind that I've never learned how to differentiate with respect to a vector?

    The parts I find most confusing are:

    1. Why are they differentiating with respect to the transpose b' rather than just b?
    2. Where does the -2X'y term come from?
    3. Is there any assumption here that X is square?

    Thanks for any help,
