Differentiation with matrices/vectors

  • Context: Graduate
  • Thread starter: mikeph
  • Tags: Differentiation
SUMMARY

The discussion focuses on differentiating the least squares objective in linear regression using the Leibniz (product) rule. The key step is differentiating S(β) = (y - Xβ)ᵀ(y - Xβ) with respect to the vector β, which yields the gradient -2Xᵀy + 2XᵀXβ. The original poster's confusion concerns why the Wikipedia proof differentiates with respect to the transpose b' rather than b, and where the -2X'y term comes from. No assumption that X is square is needed: the least squares solution only requires XᵀX to be invertible, so X may have more rows than columns.
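Setting this gradient to zero gives the normal equations and the familiar closed form of the estimator (a standard step, stated here for completeness, provided ##X^\tau X## is invertible):
$$
-2X^\tau y + 2X^\tau X\hat\beta = 0
\quad\Longrightarrow\quad
X^\tau X\hat\beta = X^\tau y
\quad\Longrightarrow\quad
\hat\beta = (X^\tau X)^{-1}X^\tau y
$$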

PREREQUISITES
  • Understanding of linear regression and least squares estimation
  • Familiarity with matrix calculus and differentiation rules
  • Knowledge of the Leibniz rule for differentiation
  • Basic concepts of matrix transposition and multiplication
NEXT STEPS
  • Study matrix calculus, focusing on differentiation with respect to vectors
  • Explore the derivation of the least squares estimator in detail
  • Learn about the properties of matrix transposition and its implications in regression analysis
  • Investigate the assumptions underlying linear regression models, particularly regarding matrix dimensions
USEFUL FOR

Students and professionals in statistics, data science, and machine learning who are working with linear regression models and need to understand the mathematical foundations of least squares estimation.

mikeph
Hello,

I'm trying to understand this proof:

http://en.wikipedia.org/wiki/Proofs...st_squares#Least_squares_estimator_for_.CE.B2

Can someone quickly talk me through the differentiation step, bearing in mind I've never learned how to differentiate with respect to a vector?

Most confusing for me is:

1. why are they differentiating with respect to the transpose b' rather than just b?
2. where does the -2X'y term come from?
3. is there any assumption here that X is square?

Thanks for any help,
Mike
 
It is easier with the Leibniz (product) rule: ##S(\beta)=(y-X\beta)^\tau(y-X\beta)##. Differentiating with respect to ##\beta## gives
\begin{align*}
S(\beta)'&=[(y-X\beta)^\tau]'\cdot (y-X\beta) + (y-X\beta)^\tau \cdot (y-X\beta)'\\
&=-X^\tau\cdot (y-X\beta) + (y-X\beta)^\tau\cdot (-X)\\
&=-X^\tau y + X^\tau X\beta -y^\tau X + \beta^\tau X^\tau X\\
&=-2X^\tau y +2X^\tau X\beta
\end{align*}
where in the last step the row vectors are identified with their transposed column vectors, ##y^\tau X = (X^\tau y)^\tau## and ##\beta^\tau X^\tau X = (X^\tau X\beta)^\tau##. At the evaluation point ##\beta=\hat \beta## we get
$$
\dfrac{dS}{d\beta}(\hat \beta) = S(\beta)'|_{\beta=\hat \beta} = -2X^\tau y +2X^\tau X \hat \beta
$$
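The gradient formula is easy to sanity-check numerically. Below is a minimal sketch (not from the thread) that compares the closed-form gradient ##-2X^\tau y + 2X^\tau X\beta## against a central finite-difference approximation, using numpy and arbitrarily chosen dimensions; note that X is deliberately non-square, since least squares does not require it to be.

```python
# Numerical check of the gradient dS/dβ = -2 Xᵀy + 2 XᵀXβ for
# S(β) = (y - Xβ)ᵀ(y - Xβ).  Dimensions are arbitrary; X is 5x2,
# i.e. not square.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)

def S(b):
    r = y - X @ b
    return r @ r  # residual sum of squares (y - Xb)ᵀ(y - Xb)

# Closed-form gradient from the derivation above.
grad = -2 * X.T @ y + 2 * X.T @ X @ beta

# Central finite-difference approximation, one coordinate at a time.
eps = 1e-6
fd = np.array([(S(beta + eps * e) - S(beta - eps * e)) / (2 * eps)
               for e in np.eye(p)])

print(np.allclose(grad, fd, atol=1e-4))  # the two gradients agree

# Setting the gradient to zero gives the normal equations XᵀXβ̂ = Xᵀy.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
grad_at_min = -2 * X.T @ y + 2 * X.T @ X @ beta_hat
```

At `beta_hat` the gradient vanishes (up to floating-point error), confirming that the stationary point of S is the least squares solution.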
 
