I'm trying to understand this proof:

http://en.wikipedia.org/wiki/Proofs...st_squares#Least_squares_estimator_for_.CE.B2

Can someone quickly talk me through the differentiation step, bearing in mind that I've never learned how to differentiate with respect to a vector?

The parts I find most confusing are:

1. Why are they differentiating with respect to the transpose b' rather than just b?

2. Where does the -2X'y term come from?

3. Is there any assumption here that X is square?
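
For reference, the step in question is usually written along the following lines. This is a sketch of the standard least squares derivation, not a quote from the linked article: the name S(b) for the sum of squared residuals and the n × p shape of X (n observations, p coefficients) are notation assumed here, and the derivative is written with respect to b as a column vector.

```latex
% Sum of squared residuals, with y (n x 1), X (n x p), b (p x 1).
% The two cross terms b'X'y and y'Xb are scalars and transposes of each
% other, so they are equal and combine into -2 b'X'y.
S(b) = (y - Xb)'(y - Xb) = y'y - 2\,b'X'y + b'X'Xb

% Setting the gradient with respect to b to zero:
\frac{\partial S}{\partial b} = -2\,X'y + 2\,X'Xb = 0
\quad\Longrightarrow\quad X'Xb = X'y
```

Written this way, X'X is p × p and X'y is p × 1 whatever the value of n.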

Thanks for any help,

Mike

# Differentiation with matrices/vectors

