# How to Differentiate Vectors?

1. Jun 29, 2010

### EngWiPy

Dear all;

I need to find this derivative:

$$\frac{\partial}{\partial \mathbf{a}}\left(E-2\,\mathbf{a}^{\text{T}}\,E[\mathbf{s}]+\|\mathbf{a}\|^2\right)$$

where boldface letters indicate column vectors, the superscript T indicates the transpose, and ||.|| is the norm of a vector.

2. Jun 29, 2010

### mathman

The notation is unusual. I have never seen a definition of a derivative with respect to a vector. The closest thing I can think of is the del operator.

3. Jun 29, 2010

### Pere Callahan

I think what is meant is
$$\frac{\partial}{\partial\vec a}f(\vec a) = \left(\frac{\partial}{\partial a_1}f(\vec a),\frac{\partial}{\partial a_2}f(\vec a),\frac{\partial}{\partial a_3}f(\vec a)\right)$$
which is of course very similar to the del operator.

So I suggest you write your function
$$f(\vec a) =E-2\,\vec a^{\text{T}}\,E[\vec s]+\|\vec a\|^2$$
explicitly in terms of the components $a_1, a_2, a_3$ (I assume it's a three-dimensional vector; it wouldn't make a difference if it were not) and then compute the partial derivatives.

4. Jun 30, 2010

### EngWiPy

The vectors are N*1 column vectors. At the end, I need to find the derivative with respect to $$\mathbf{a}$$. How can I do that?

5. Jun 30, 2010

### HallsofIvy

In general, the derivative of a function f from $R^n$ to $R^m$ is the linear function from $R^n$ to $R^m$ that "best" approximates the function f around the given point. "Best" is made precise in the definition of the derivative.

Given a coordinate system (basis) in each space, we can then write the derivative as an m by n matrix. In this problem you are differentiating a scalar, in R, with respect to a vector in $R^3$, so this would be a "1 by 3" matrix, which we would interpret as a 3-vector. Essentially, you treat the components of the vector as the three variables, and this derivative is the same as taking the gradient, $\nabla f$, of a numerical function of three variables.
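To make the "best linear approximation" idea concrete for the function in this thread, here is a minimal Python sketch. The names `f`, `gradient`, `linearization_error`, the parameter `mean_s` (standing in for $E[\mathbf{s}]$), and all the sample numbers are illustrative assumptions, not part of the original problem. It takes the gradient to be $2(\mathbf{a} - E[\mathbf{s}])$ and checks that what is left over after the linear approximation is exactly $\|\mathbf{h}\|^2$, which shrinks faster than $\|\mathbf{h}\|$.

```python
def f(a, mean_s, const=0.0):
    """f(a) = const - 2 a^T mean_s + ||a||^2, with vectors as plain lists."""
    dot = sum(ai * mi for ai, mi in zip(a, mean_s))
    return const - 2.0 * dot + sum(ai * ai for ai in a)


def gradient(a, mean_s):
    """Assumed closed form of the derivative: 2 (a - mean_s)."""
    return [2.0 * (ai - mi) for ai, mi in zip(a, mean_s)]


def linearization_error(a, h, mean_s, const=0.0):
    """Return f(a + h) - [f(a) + gradient(a) . h]."""
    shifted = [ai + hi for ai, hi in zip(a, h)]
    slope_term = sum(g * hi for g, hi in zip(gradient(a, mean_s), h))
    return f(shifted, mean_s, const) - (f(a, mean_s, const) + slope_term)


# Hypothetical sample values, chosen only for illustration.
a = [1.0, -2.0, 0.5]
mean_s = [0.3, 0.1, -0.4]
h = [1e-3, -2e-3, 5e-4]

err = linearization_error(a, h, mean_s, const=7.0)
h_norm_sq = sum(hi * hi for hi in h)
print(err, h_norm_sq)  # the two values should agree up to rounding
```

Because the function is quadratic, the remainder is second order in $\mathbf{h}$, which is what "best linear approximation" asks for.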

6. Jun 30, 2010

### uart

Then in that case I'd assume that you are to find a column vector of partial derivatives, as in:

$$\frac{\partial f}{\partial \mathbf{a}} = \left[\frac{\partial f}{\partial a_1}, \frac{\partial f}{\partial a_2}, \frac{\partial f}{\partial a_3}, \ldots \right]^{\text{T}} = 2 \left[ \mathbf{a} - E[\mathbf{s}] \right]$$

Assuming that "E" is a scalar, then "f" is just a scalar (quadratic) function of $a_1, a_2, \ldots, a_n$, right? So you should be able to verify the above easily enough.
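One way to carry out that verification numerically is a central-difference check, sketched below in plain Python. The helper names `f` and `numerical_gradient`, the step size, and the sample vectors are assumptions made for illustration; `mean_s` stands in for $E[\mathbf{s}]$ and `const` for the scalar $E$.

```python
def f(a, mean_s, const=0.0):
    """f(a) = const - 2 a^T mean_s + ||a||^2, with vectors as plain lists."""
    dot = sum(ai * mi for ai, mi in zip(a, mean_s))
    return const - 2.0 * dot + sum(ai * ai for ai in a)


def numerical_gradient(func, a, step=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for k in range(len(a)):
        plus, minus = list(a), list(a)
        plus[k] += step
        minus[k] -= step
        grad.append((func(plus) - func(minus)) / (2.0 * step))
    return grad


# Hypothetical N = 4 example; any N works the same way.
a = [1.0, -2.0, 0.5, 3.0]
mean_s = [0.3, 0.1, -0.4, 2.0]

closed_form = [2.0 * (ai - mi) for ai, mi in zip(a, mean_s)]
numeric = numerical_gradient(lambda v: f(v, mean_s, const=7.0), a)
max_gap = max(abs(c - n) for c, n in zip(closed_form, numeric))
print(max_gap)  # should be tiny compared with the gradient entries
```

The constant `const` drops out of every difference, which matches the fact that the scalar $E$ does not appear in the derivative.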

Last edited: Jun 30, 2010

7. Jun 30, 2010

### Hurkyl

Staff Emeritus
You have been asked to clarify what your notation means. Please do so, rather than force people to guess what notational conventions you are using.

Also, I suggest you specify what E and [s] mean (and whether they have any functional dependence on a).

P.S. $\|\mathbf{a}\|^2 = \mathbf{a}^{\text{T}}\mathbf{a}$

8. Jun 30, 2010

### EngWiPy

I thought it was obvious, sorry. The terms are:
1- $$\mathbf{a}$$ is an $$N\times 1$$ column vector.
2- $$E$$ is a constant that does not depend on $$\mathbf{a}$$.
3- $$E[\mathbf{s}]$$ is another $$N\times 1$$ vector, that does not depend on $$\mathbf{a}$$.
4- $$\|\mathbf{a}\|^2$$ is as you clarified.

Thank you all, now I get it.

Regards

9. Jun 30, 2010

### Hurkyl

Staff Emeritus
I, in particular, was confused by the square brackets around s. (i.e. why not write Es?)

10. Jul 1, 2010

### EngWiPy

Yes, you are right, it is confusing, because the terms $$E$$ and $$E[\mathbf{s}]$$ are different. Let me re-write the original equation:

$$\mathcal{E}-2\,\mathbf{a}^{\text{T}}\,E[\mathbf{s}]+\|\mathbf{a}\|^2$$

and the term $$E[\mathbf{s}]$$ is written this way because, in the context in which I am working, it is the statistical average of a number of vectors $$\left\{\mathbf{s}_i\right\}_{i=1}^{M}$$.

Regards
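As a small illustration of that reading of $E[\mathbf{s}]$, the sketch below assumes it is the equally weighted sample mean of the $M$ vectors (the thread does not say how the average is weighted). The names `sample_mean` and `gradient` and the data are hypothetical. It uses the gradient $2(\mathbf{a} - E[\mathbf{s}])$ given earlier in the thread and shows that it vanishes when $\mathbf{a}$ equals the mean.

```python
def sample_mean(vectors):
    """Component-wise average of M equal-length vectors (equal weights assumed)."""
    count = len(vectors)
    return [sum(v[k] for v in vectors) / count for k in range(len(vectors[0]))]


def gradient(a, mean_s):
    """Gradient of const - 2 a^T mean_s + ||a||^2 with respect to a."""
    return [2.0 * (ai - mi) for ai, mi in zip(a, mean_s)]


# Hypothetical data: M = 3 vectors with N = 2 components each.
samples = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
mean_s = sample_mean(samples)

grad_at_mean = gradient(mean_s, mean_s)   # every component is zero here
grad_elsewhere = gradient([0.0, 0.0], mean_s)
print(mean_s, grad_at_mean, grad_elsewhere)
```

Since the averaged vectors do not depend on $\mathbf{a}$, only their mean enters the derivative.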

11. Jul 1, 2010

### Studiot

Perhaps it would help simplify things if I pointed out that the norm of anything is a number, i.e. a scalar.

12. Jul 1, 2010

### HallsofIvy

Yes, so the function to be differentiated is a number, not a vector. But it is being differentiated with respect to a vector. That is the whole point.

13. Jul 1, 2010

### Mute

Write it in index notation (repeated indices are assumed to be summed over):

$$f(\{a\}) = \mathcal E - 2 a_i E[\mathbf{s}]_i + a_i a_i$$

Then, what you want is

$$\frac{\partial f}{\partial a_\ell} = -2 \frac{\partial a_i}{\partial a_\ell} E[\mathbf{s}]_i + \frac{\partial a_i}{\partial a_\ell} a_i + a_i \frac{\partial a_i}{\partial a_\ell} = (-2 E[\mathbf{s}]_i + 2a_i )\delta_{i\ell} = -2E[\mathbf{s}]_\ell +2a_\ell$$.
where I used the fact that $\partial a_i/\partial a_\ell = \delta_{i\ell}$, the Kronecker delta.
So, the ith component of the vector $\partial f/\partial \mathbf{a}$ is

$$\frac{\partial f}{\partial a_i} = -2E[\mathbf{s}]_i +2a_i$$
which means

$$\frac{\partial f}{\partial \mathbf{a}} = - 2\mathbf{E}[\mathbf{s}] + 2\mathbf{a}$$.

(Since E is a vector that depends on the vector s, I have made E bold in vector notation).