How to compute the derivative with respect to a vector in index notation?

  • Context: Graduate
  • Thread starter: EngWiPy
  • Tags: Differentiate, Vectors

Discussion Overview

The discussion revolves around computing the derivative of a scalar function with respect to a vector, specifically in index notation. Participants explore the notation, clarify definitions, and propose methods for differentiation, focusing on the mathematical formulation and assumptions involved.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant presents the derivative expression and suggests writing the function explicitly in terms of vector components.
  • Another participant emphasizes that the derivative of a scalar function with respect to a vector can be represented as a matrix, specifically a "1 by 3" matrix in the context of three variables.
  • Clarifications are made regarding the notation used, including the meaning of E and E[\mathbf{s}], and the nature of the vectors involved.
  • One participant proposes writing the function in index notation and derives the components of the derivative, leading to a specific expression for the derivative with respect to the vector.
  • There is confusion regarding the notation of E[\mathbf{s}] and its distinction from E, prompting further clarification from the original poster.
  • Participants discuss the implications of differentiating a scalar with respect to a vector and the role of the Kronecker delta in the derivation process.

Areas of Agreement / Disagreement

Participants express varying levels of understanding regarding the notation and the mathematical approach to the derivative. While some agree on the method of differentiation, others highlight the need for clearer definitions and notation. The discussion remains unresolved on certain aspects of the notation and its implications.

Contextual Notes

There are limitations in the clarity of notation and definitions, particularly concerning the terms E and E[\mathbf{s}]. The discussion also reflects assumptions about the dimensionality of the vectors involved and the nature of the function being differentiated.

EngWiPy
Dear all;

I need to find this derivative:

[tex]\frac{\partial}{\partial \mathbf{a}}\left(E-2\,\mathbf{a}^{\text{T}}\,E[\mathbf{s}]+\|\mathbf{a}\|^2\right)[/tex]

where boldface letters indicate column vectors, the superscript T indicates the transpose, and ||.|| is the norm of a vector.

Thanks in advance
 
The notation is unusual. I have never seen a definition of a derivative with respect to a vector. The closest thing I can think of is the del operator.
 
I think what is meant is
[tex] \frac{\partial}{\partial\vec a}f(\vec a) = \left(\frac{\partial}{\partial a_1}f(\vec a),\frac{\partial}{\partial a_2}f(\vec a),\frac{\partial}{\partial a_3}f(\vec a)\right)[/tex]
which is of course very similar to the del operator.

So I suggest you write your function
[tex] f(\vec a) =E-2\,\vec a^{\text{T}}\,E[\vec s]+\|\vec a\|^2[/tex]
explicitly in terms of the components [itex]a_1, a_2, a_3[/itex] (I assume it's a three-dimensional vector; it wouldn't make a difference if it were not) and then compute the partial derivatives.
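
As a sketch of that expansion for three components, assuming that E is a constant and that [itex]E[\vec s][/itex] is a fixed vector with components [itex]E[\vec s]_1, E[\vec s]_2, E[\vec s]_3[/itex] (both assumptions are confirmed by the original poster later in the thread):

[tex] f(\vec a) = E - 2\left(a_1 E[\vec s]_1 + a_2 E[\vec s]_2 + a_3 E[\vec s]_3\right) + a_1^2 + a_2^2 + a_3^2 [/tex]

[tex] \frac{\partial f}{\partial a_1} = -2E[\vec s]_1 + 2a_1, \qquad \frac{\partial f}{\partial a_2} = -2E[\vec s]_2 + 2a_2, \qquad \frac{\partial f}{\partial a_3} = -2E[\vec s]_3 + 2a_3 [/tex]

Each partial derivative only picks up the terms that contain the matching component, which is why the same pattern holds for any number of components.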
 
The vectors are N*1 column vectors. At the end, I need to find the derivative with respect to [tex]\mathbf{a}[/tex]. How can I do that?

Thanks in advance
 
In general, the derivative of a function f from [itex]R^n[/itex] to [itex]R^m[/itex] is the linear function from [itex]R^n[/itex] to [itex]R^m[/itex] that "best" approximates f around the given point; "best" is made precise in the definition of the derivative.

Given a coordinate system (basis) in each space, we can then write the derivative as an m by n matrix. In this problem you are differentiating a scalar, in R, with respect to a vector in [itex]R^3[/itex], so this would be a "1 by 3" matrix, which we would interpret as a 3-vector. Essentially, you treat the components of the vector as the three variables, and this derivative is the same as taking the gradient, [itex]\nabla f[/itex], of a numerical function of three variables.
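
The "best linear approximation" idea can be checked numerically. The following is only a sketch in plain Python: the list `m` is a stand-in for the vector E[s], the numbers are made up, and the gradient 2(a - E[s]) is the closed form that the thread arrives at further down. For this particular f, the part of f(a + h) - f(a) not captured by the linear term should be exactly ||h||^2.

```python
# Sketch: the derivative as the best linear approximation of
# f(a) = E - 2 a^T m + ||a||^2, where m stands in for E[s].
# All numeric values below are made up for illustration.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

def f(a, E, m):
    return E - 2.0 * dot(a, m) + dot(a, a)

def gradient(a, m):
    # The "1 by n" derivative, stored as a plain list of n numbers.
    return [2.0 * (ai - mi) for ai, mi in zip(a, m)]

E = 5.0
m = [1.0, -2.0, 0.5]
a = [0.3, 0.7, -1.1]
h = [1e-3, -2e-3, 5e-4]          # a small displacement

a_plus_h = [ai + hi for ai, hi in zip(a, h)]
linear_part = dot(gradient(a, m), h)
remainder = f(a_plus_h, E, m) - f(a, E, m) - linear_part

# The remainder is second order in h; here it matches ||h||^2.
print(remainder, dot(h, h))
```

The two printed numbers agree up to rounding, so the linear term accounts for everything that is first order in h.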
 
S_David said:
Dear all;

I need to find this derivative:

[tex]\frac{\partial}{\partial \mathbf{a}}\left(E-2\,\mathbf{a}^{\text{T}}\,E[\mathbf{s}]+\|\mathbf{a}\|^2\right)[/tex]

where boldface letters indicates column vectors, the superscript T indicates transpose, and ||.|| is the norm of a vector.

The vectors are N*1 column vectors.

Then in that case I'd assume that you are to find a column vector of partial derivatives, as in:

[tex]\frac{\partial f}{\partial \mathbf{a}} = \left[\frac{\partial f}{\partial a_1}, \frac{\partial f}{\partial a_2}, \frac{\partial f}{\partial a_3}, \ldots \right]^{\text{T}} = 2 \left( \mathbf{a} - E[\mathbf{s}] \right)[/tex]

Assuming that "E" is a scalar, "f" is just a scalar (quadratic) function of [itex]a_1, a_2, \ldots, a_n[/itex]. So you should be able to verify the above easily enough.
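
One way to do that verification is numerically. This is a minimal sketch in plain Python, assuming E is a constant and using a made-up list `m` in place of the vector E[s]; the helper `numerical_gradient` is hypothetical and simply applies central differences to each component in turn.

```python
# Sketch: compare the closed form 2 (a - E[s]) with central differences.
# `m` is a stand-in for the vector E[s]; all values are arbitrary.

def f(a, E, m):
    # f(a) = E - 2 a^T m + ||a||^2
    return E - 2.0 * sum(x * y for x, y in zip(a, m)) + sum(x * x for x in a)

def numerical_gradient(a, E, m, step=1e-5):
    grad = []
    for k in range(len(a)):
        up = list(a)
        up[k] += step
        down = list(a)
        down[k] -= step
        # Central difference approximation of df/da_k.
        grad.append((f(up, E, m) - f(down, E, m)) / (2.0 * step))
    return grad

E = 3.0
m = [0.5, -1.0, 2.0, 4.0]
a = [1.5, 0.25, -0.75, 2.0]

closed_form = [2.0 * (ai - mi) for ai, mi in zip(a, m)]
numeric = numerical_gradient(a, E, m)
max_error = max(abs(c - n) for c, n in zip(closed_form, numeric))

print(closed_form)
print(max_error)
```

Note that the constant E drops out of the difference quotient, as expected for a term that does not depend on a.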
 
S_David said:
Dear all;

I need to find this derivative:
You have been asked to clarify what your notation means. Please do so, rather than force people to guess what notational conventions you are using.

Also, I suggest you specify what E and [s] mean. (And if they have any functional dependence on a)


p.s. [itex]\|\mathbf{a}\|^2 = \mathbf{a}^{\text{T}}\mathbf{a}[/itex]
 
Hurkyl said:
You have been asked to clarify what your notation means. Please do so, rather than force people to guess what notational conventions you are using.

Also, I suggest you specify what E and [s] mean. (And if they have any functional dependence on a)


p.s. [itex]\|\mathbf{a}\|^2 = \mathbf{a}^{\text{T}}\mathbf{a}[/itex]

I thought it was obvious, sorry. The terms are:
1- [tex]\mathbf{a}[/tex] is an [tex]N\times 1[/tex] column vector.
2- [tex]E[/tex] is a constant that does not depend on [tex]\mathbf{a}[/tex].
3- [tex]E[\mathbf{s}][/tex] is another [tex]N\times 1[/tex] vector that does not depend on [tex]\mathbf{a}[/tex].
4- [tex]\|\mathbf{a}\|^2[/tex] is as you clarified.

Thank you all, now I get it.

Regards
 
S_David said:
I thought it is obvious, sorry.
I, in particular, was confused by the square brackets around s. (i.e. why not write Es?)
 
  • #10
Hurkyl said:
I, in particular, was confused by the square brackets around s. (i.e. why not write Es?)

Yes, you are right, it is confusing, because the terms [tex]E[/tex] and [tex]E[\mathbf{s}][/tex] are different. Let me re-write the original equation:

[tex]\mathcal{E}-2\,\mathbf{a}^{\text{T}}\,E[\mathbf{s}]+\|\mathbf{a}\|^2[/tex]

and the term [tex]E[\mathbf{s}][/tex] is written this way because, in the context in which I am working, it is the statistical average of a number of vectors [tex]\left\{\mathbf{s}_i\right\}_{i=1}^{M}[/tex].
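
To make the role of E[s] concrete, here is a small sketch in plain Python. It assumes the "statistical average" is the equally weighted sample mean of the M vectors, which the post does not state explicitly; the function name `mean_vector` and the sample values are invented for illustration.

```python
# Sketch: E[s] as the component-wise average of M sample vectors.
# Equal weights are an assumption; the post only says "statistical average".

def mean_vector(samples):
    M = len(samples)        # number of vectors
    N = len(samples[0])     # length of each vector
    return [sum(s[k] for s in samples) / M for k in range(N)]

samples = [
    [1.0, 2.0, 3.0],
    [3.0, 0.0, 1.0],
    [2.0, 4.0, 5.0],
    [2.0, 2.0, 3.0],
]

E_s = mean_vector(samples)
print(E_s)
```

Once computed, `E_s` is just a fixed N-component vector, so it behaves like a constant when differentiating with respect to a.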

Regards
 
  • #11
Perhaps it would help simplify things if I pointed out that the norm of anything is a number, i.e. a scalar.
 
  • #12
Studiot said:
Perhaps it would help simplify things if I pointed out that the norm of anything is a number ie a scalar.
Yes, so the function to be differentiated is a number, not a vector. But it is being differentiated with respect to a vector. That is the whole point.
 
  • #13
S_David said:
The vectors are N*1 column vectors. At the end, I need to find the derivative with respect to [tex]\mathbf{a}[/tex]. How can I do that?

Thanks in advance

Write it in index notation (repeated indices are assumed to be summed over):

[tex]f(\{a\}) = \mathcal E - 2 a_i E[\mathbf{s}]_i + a_i a_i[/tex]

Then, what you want is

[tex]\frac{\partial f}{\partial a_\ell} = -2 \frac{\partial a_i}{\partial a_\ell} E[\mathbf{s}]_i + \frac{\partial a_i}{\partial a_\ell} a_i + a_i \frac{\partial a_i}{\partial a_\ell} = (-2 E[\mathbf{s}]_i + 2a_i )\delta_{i\ell} = -2E[\mathbf{s}]_\ell +2a_\ell[/tex].
where I used the fact that [itex]\partial a_i/\partial a_\ell = \delta_{i\ell}[/itex], the Kronecker delta.
So, the ith component of the vector [itex]\partial f/\partial \mathbf{a}[/itex] is

[tex]\frac{\partial f}{\partial a_i} = -2E[\mathbf{s}]_i +2a_i[/tex]
which means

[tex]\frac{\partial f}{\partial \mathbf{a}} = - 2\mathbf{E}[\mathbf{s}] + 2\mathbf{a}[/tex].

(Since E is a vector that depends on the vector s, I have made E bold in vector notation).
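
The index manipulation above can be mirrored term by term in code. This is only a sketch in plain Python: `m[i]` stands in for the component E[s]_i, `delta` is a hypothetical helper for the Kronecker delta, and the explicit loop over `i` plays the role of the summation convention.

```python
# Sketch: evaluate the index-notation expression term by term.
# m[i] stands in for E[s]_i; delta(i, l) is the Kronecker delta.

def delta(i, l):
    return 1.0 if i == l else 0.0

def df_da(l, a, m):
    # Sum over the repeated index i:
    #   -2 delta_il m_i + delta_il a_i + a_i delta_il
    total = 0.0
    for i in range(len(a)):
        total += (-2.0 * delta(i, l) * m[i]
                  + delta(i, l) * a[i]
                  + a[i] * delta(i, l))
    return total

a = [1.0, -2.0, 0.5, 3.0]
m = [0.5, 1.0, -1.5, 2.0]

# Component-by-component result from the index expression.
via_indices = [df_da(l, a, m) for l in range(len(a))]

# Vector form: -2 E[s] + 2 a.
vector_form = [-2.0 * mi + 2.0 * ai for ai, mi in zip(a, m)]

print(via_indices)
print(vector_form)
```

The delta kills every term of the sum except i = l, which is exactly the step that turns the summed expression into a single component.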
 
