Layout notation for matrix calculus

SUMMARY

This discussion centers on the confusion between "numerator layout notation" and "denominator layout notation" in matrix calculus, particularly when differentiating a scalar with respect to a vector. The participants note that denominator layout writes such a derivative as a column vector while numerator layout writes it as a row vector, so mixing the two conventions leads to apparent discrepancies. They emphasize that matrix layouts are arbitrary representations and that understanding the underlying scalar partial derivatives is more important than adhering strictly to one notation. The conversation also highlights that rectangular matrices fail to represent derivatives once more than two indices are involved, where index notation is clearer.

PREREQUISITES
  • Understanding of matrix calculus concepts, specifically differentiation of scalars with respect to vectors.
  • Familiarity with Jacobian matrices and their representations.
  • Knowledge of Einstein summation convention in tensor calculus.
  • Basic proficiency in linear algebra and vector operations.
NEXT STEPS
  • Study the differences between numerator and denominator layout notations in matrix calculus.
  • Learn how to compute Jacobians for vector functions using both layout notations.
  • Explore the implications of Einstein summation convention in matrix calculus.
  • Investigate practical applications of matrix differentiation in machine learning algorithms.
USEFUL FOR

Mathematicians, data scientists, and machine learning practitioners who require a deeper understanding of matrix calculus and its applications in optimization and algorithm development.

Dethrone
Hi,

I guess this could be a rather silly question, but I got a bit confused about the "numerator layout notation" and "denominator layout notation" when working with matrix differentiation: https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions

It says that with the denominator layout notation, we interpret the derivative of a scalar with respect to a vector as a column vector: $\frac{\mathrm{d}L}{\mathrm{d}w_1}=\left[\frac{\mathrm{d}L}{\mathrm{d}w_{11}}\ \frac{\mathrm{d}L}{\mathrm{d}w_{12}}\ \cdots\ \frac{\mathrm{d}L}{\mathrm{d}w_{1n}}\right]^T$, where $L$ is a scalar and $w_1$ is an $n \times 1$ vector.

But what if we represent the scalar $L$ differently? E.g. $L=w^Tx$, where $w, x \in \Bbb{R}^{n \times 1}$.
Then we get $\frac{\mathrm{d}L}{\mathrm{d}w}=\frac{\mathrm{d}(w^Tx)}{\mathrm{d}w}=\frac{\mathrm{d}(x^Tw)}{\mathrm{d}w}=x^T$, which is a $1 \times n$ row vector. Doesn't this result disagree with the denominator layout notation? I read somewhere on the wiki that one should stick to one type of notation, but if certain types of calculations favor one notation over the other, wouldn't that be problematic or confusing?
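(For reference, writing the scalar out componentwise makes the layout question explicit:
$$L=\sum_{k=1}^n w_k x_k \quad\Longrightarrow\quad \frac{\mathrm{d}L}{\mathrm{d}w_i}=x_i,$$
and these $n$ partials can be stacked either as the column $x$, which is denominator layout, or laid out as the row $x^T$, which is numerator layout.)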

I came across this when trying to calculate $\frac{\mathrm{d}L}{\mathrm{d}W}=\left[\frac{\mathrm{d}L}{\mathrm{d}w_1}\ \frac{\mathrm{d}L}{\mathrm{d}w_2}\ \cdots\ \frac{\mathrm{d}L}{\mathrm{d}w_c}\right]$, where $W$ is $n \times c$ and each $\frac{\mathrm{d}L}{\mathrm{d}w_i}$ is the derivative of $L$ with respect to the column vector $w_i$. As you can see, I started off with what the wiki calls the "denominator layout notation", but since each $\frac{\mathrm{d}L}{\mathrm{d}w_i}$ ended up being $1 \times n$, the columns didn't fit. Basically, writing the scalar $L$ as $w^Tx$ caused my result to come out in numerator layout, but since I started off using denominator layout, my answer got messed up.
 
Hey Rido12! (Smile)

Indeed, the matrix layouts of derivatives tend to be confusing.
The problem, as I see it, is that the matrix layout is an arbitrary representation.
And it more or less fails as soon as we have more than 2 dimensions, since then we can't properly represent the derivative in a rectangular matrix (the derivative of a matrix with respect to a vector, for instance, needs 3 indices).
When we take the derivative of a vector function with respect to a vector, what we actually have is a set of scalar functions:
$$\frac{\partial \mathbf f}{\partial \mathbf x} = \left(\frac{\partial f_i}{\partial x_j}\right)$$
That is, forget about the matrix layout.
And when we want to multiply it by a vector $\mathbf v$ to find a directional derivative, which is really an application of the chain rule, what we actually need is:
$$\frac{\partial \mathbf f}{\partial \mathbf x} \cdot \mathbf v = \sum_{i=1}^n \sum_{j=1}^n \frac{\partial f_i}{\partial x_j} v_j\, \mathbf e_i$$
where $\mathbf e_i$ is the $i$-th unit vector.
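For instance, with the (arbitrary) example $\mathbf f(\mathbf x)=(x_1^2,\ x_1 x_2)$ this reads:
$$\frac{\partial \mathbf f}{\partial \mathbf x} \cdot \mathbf v = 2x_1 v_1\, \mathbf e_1 + (x_2 v_1 + x_1 v_2)\, \mathbf e_2$$
and note that no layout choice has been made yet.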

Since as humans we like to represent that as something we can write down, and that fits how we usually do matrix manipulations, the most natural form within our conventions is:
$$\begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}
\begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix}$$
This is the Jacobian form, or numerator layout.
The thing to realize is that whenever we do something like this, we need to ensure that the elements get multiplied and summed with the right counterparts.
So if we pick the denominator layout instead, then to make our conventional matrix product work out, we need to write it as:
$$\begin{bmatrix}v_1 & \cdots & v_n\end{bmatrix}\begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial f_1}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}$$
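A quick numerical check makes this concrete. Here's a minimal NumPy sketch (my own example, using the same assumed function $\mathbf f(\mathbf x)=(x_1^2,\ x_1 x_2)$ as above): both layouts hold the same partials and give the same directional derivative, as long as the product is arranged to match.

```python
import numpy as np

# Example point and direction (arbitrary choices for illustration).
x = np.array([2.0, 3.0])
v = np.array([1.0, -1.0])

# Numerator layout (Jacobian): row i holds the partials of f_i.
J_num = np.array([[2 * x[0], 0.0],
                  [x[1],     x[0]]])

# Denominator layout: the transpose, so column i holds the partials of f_i.
J_den = J_num.T

print(J_num @ v)   # [4. 1.]  -- Jacobian times column vector
print(v @ J_den)   # [4. 1.]  -- row vector times denominator-layout matrix
assert np.allclose(J_num @ v, v @ J_den)
```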

Or we can choose to forget about conventional matrix layouts and products, and just write:
$$\sum_{i=1}^n \sum_{j=1}^n \frac{\partial f_i}{\partial x_j} v_j\, \mathbf e_i$$
or for short:
$$\frac{\partial f_i}{\partial x_j} v_j$$
following Einstein summation convention.
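In NumPy, np.einsum expresses exactly this bookkeeping: the subscript string fixes which indices pair up, independent of any row/column convention. A minimal sketch, reusing the example Jacobian from above:

```python
import numpy as np

# df_i/dx_j for the assumed example f(x) = (x1**2, x1*x2) at x = (2, 3).
J = np.array([[4.0, 0.0],
              [3.0, 2.0]])
v = np.array([1.0, -1.0])

# The repeated index j is summed; i stays free, giving the i-th component.
print(np.einsum('ij,j->i', J, v))   # [4. 1.]
```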
 
