# How is a vector a directional derivative?

1. Jan 16, 2017

### mp6250

I'm going through a basic introduction to tensors, specifically https://web2.ph.utexas.edu/~jcfeng/notes/Tensors_Poor_Man.pdf and I'm confused by the author when he defines vectors as directional derivatives at the bottom of page 3.

He defines a simple example in which
$$f(x^j) = x^1$$
and then goes on to write the directional derivative along a vector $v$ as:
$$v \cdot \nabla f(x^j) = v \cdot \nabla x^1 = v^i \delta^1_i = v^1$$
Next the author says that all we need to do to get the vector out is to feed the corresponding component into the directional derivative:
$$v^i = v \cdot \nabla x^i$$
I don't understand what is being said here at all. This formula only works for the first component and only when the function is $f(x^j) = x^1$. If we wanted to get the second component out of the vector we would end up with 0 no matter what the vector actually was. If our function was $f(x^j) = (x^1)^2$ then we would have
$$v \cdot \nabla f(x^j) = v \cdot (2x^1, 0, 0)$$
and the first component of our vector would depend on the point of evaluation.

Directional derivatives are scalars, so I don't understand how you could equate them with vectors. Directional derivatives need context (a function, a point of evaluation, and a direction from that point) but a vector alone needs none of that.

I'm very confused by this section and I think I am completely misinterpreting what is being said. What is meant by all this?

2. Jan 16, 2017

### Ibix

I struggle with this, too. The following is what I think I understand, but please be aware that any criticism laid by other members is probably warranted.

The thing to look at is at the bottom of page 3, where he writes $v\cdot\nabla=v^i\partial/\partial x^i$. Remember the summation convention - the right hand side of that is $v^0\partial_0+v^1\partial_1+ v^2\partial_2+v^3\partial_3$, where I've shortened $\partial/\partial x^i$ to $\partial_i$. Note the similarity to the basis vector notation for a vector - $\vec v =v_x\mathbf i+v_y\mathbf j+v_z\mathbf k$.

So what he is saying is that the four numbers $v^i$ are just four numbers. They don't mean anything vector-esque unless you associate each number with a direction. And one way to do that is to talk about "the direction in which exactly one of the coordinates changes". Thus $v^i\partial_i$ (implicit summation!) is a plausible candidate for a vector in a notation derived from the i,j,k notation. The example with $f(x^i)=x^1$ is just a trivial case to show that this does, indeed, make sense. We know what a derivative operator in some direction ought to do to that function - return a value proportional to how much the direction points along the direction in which $x^1$ (only) changes. And hey presto.

So $v^i$ is not a vector. $v^i\partial_i$ is. We often get lazy and don't bother to write out the partials because "everybody knows they're there".

As I say, be wary of the above until it has the nod from proper experts...

Last edited: Jan 16, 2017
3. Jan 16, 2017

### pervect

Staff Emeritus
The abstract properties of a vector space are that you can add vectors together and multiply them by scalars. Directional derivatives have these same abstract properties, so formally they form a vector space. That's probably the most important thing to note, and a good place to start your learning and thinking.

Suppose you have a plane with Cartesian coordinates x and y. Then you can interpret $\partial / \partial x$ as the direction you go in when you change x and hold y constant.

This is partly a matter of convention, but directional derivatives are usually visualized as little lines with arrows on them: the arrow indicates a direction, and the length of the line indicates the magnitude.

There are some more interesting related things, but it would probably be a confusing digression to get into them too much. Probably the least confusing thing is to stay focussed on vectors, but to slowly introduce different coordinate systems and understand how these vectors transform.

To test your understanding, you might try drawing the graphical "arrow" representation of $\partial / \partial r$ and $\partial / \partial \theta$ in polar coordinates. And think about what happens under other simple coordinate transformations; for instance, suppose you define a primed coordinate system $x' = x$ and $y' = (x+y)/\sqrt{2}$. What would the metric tensor be in terms of x' and y', and how would you graphically represent $\partial / \partial x'$ and $\partial / \partial y'$ in the "lines with arrows" notation? Work it out with diagrams, and then also do the associated math with the metric tensor and the tensor transformation rules.
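For anyone who wants to check their algebra on the second exercise, here is one way it can go. This is only a sketch, and it assumes the plane carries the standard Euclidean metric $ds^2 = dx^2 + dy^2$. Inverting the transformation gives $x = x'$ and $y = \sqrt{2}\,y' - x'$, so by the chain rule
$$\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'}\frac{\partial}{\partial x} + \frac{\partial y}{\partial x'}\frac{\partial}{\partial y} = \frac{\partial}{\partial x} - \frac{\partial}{\partial y}, \qquad \frac{\partial}{\partial y'} = \sqrt{2}\,\frac{\partial}{\partial y},$$
and substituting $dx = dx'$, $dy = \sqrt{2}\,dy' - dx'$ into the line element gives
$$ds^2 = dx'^2 + \left(\sqrt{2}\,dy' - dx'\right)^2 = 2\,dx'^2 - 2\sqrt{2}\,dx'\,dy' + 2\,dy'^2 .$$
Note that the arrow for $\partial/\partial x'$ points along $(1,-1)$ rather than along the x axis, even though $x' = x$: what matters is which other coordinate is being held constant.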

If you happen to have Penrose, it could be worth reading his remarks on "the second fundamental confusion of calculus", but it'd be too long and too much of a digression to get into it on this short post. Plus I need to run.

4. Jan 16, 2017

### strangerep

I, too, was puzzled for ages by this "vectors-are-derivatives" notion.

I will say that I think those notes are not as helpful as they could be.
Can you get access to a copy of Wald's "General Relativity" textbook? In section 2.2, when explaining this stuff, he mentions some important insights:

1) When we think of a vector (i.e., anchored direction) on a flat surface, the vector lies wholly within the flat surface. But a direction anchored at a point of a curved surface points out into "space", off the surface. We understand this intuitively only by tacitly embedding the surface and the vector in a larger Euclidean space. E.g., Wald's fig 2.2 shows a tangent plane touching a point on a sphere, and the whole thing is obviously embedded in $R^3$ (which I'll call the "ambient space"). But in physics, where we work with a 4D spacetime, any larger ambient space we might consider is unphysical. It's just a figment of our imagination. Thus the question arises: how can we generalize the definition of "vector at a point on a manifold" such that the definition refers only to the intrinsic structure of the manifold itself, with no reference to any unphysical ambient space?

[Aside: this is similar to the motivation for Riemannian curvature, which gives a generalized definition of curvature of a manifold, relying only on intrinsic properties of the manifold itself. (But Riemannian geometry properly comes later.)]

2) To find such a generalized definition of "vector on a manifold" (which coincides with our usual intuition when the manifold is flat), we appeal to the fact that "direction" only has useful meaning in reference to the change of value of any (arbitrarily-often differentiable) field defined on the manifold, which I'll write generically as $f(x^i)$ where the $x^i$ coordinatize points on the manifold. (I'll ignore the subtleties of coordinate patches here.)

We can Taylor-expand the field around any given $x^i$ as $$f(x^i + \alpha^i) ~=~ f(x^i) ~+~ \alpha^k \frac{\partial f}{\partial x^k} ~+~ \cdots ~.$$ The important insight here is that the operator $$\alpha^k \frac{\partial}{\partial x^k}$$ (called a "directional derivative") can be used to find out how any physical field changes along a specific infinitesimal interval on the manifold, starting from any given point on the manifold.

3) From there, it is a short step to realize that the operators $$\left. \frac{\partial}{\partial x^k} \right|_x$$ span an abstract vector space, associated with each point $x$ of the manifold.

There are a few more technicalities to work through, but the above is enough to show the basic idea: we have generalized the ordinary notion of "vector" in flat space to a corresponding concept on a curved manifold, such that the definition relies only on intrinsic features of the manifold (i.e., its points and continuous differentiability properties).
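A small worked example may make steps 2) and 3) more concrete. The field $f = (x^1)^2\, x^2$ and the point $(1, 2)$ are chosen purely for illustration. The first-order term of the Taylor expansion is
$$\alpha^k \frac{\partial f}{\partial x^k}\bigg|_{(1,2)} = \alpha^1 \left(2 x^1 x^2\right)\Big|_{(1,2)} + \alpha^2 \left(x^1\right)^2\Big|_{(1,2)} = 4\,\alpha^1 + \alpha^2 ,$$
and for any two sets of coefficients $\alpha^k$, $\beta^k$ and any number $c$,
$$\left(\alpha^k + c\,\beta^k\right)\frac{\partial f}{\partial x^k} = \alpha^k \frac{\partial f}{\partial x^k} + c\,\beta^k \frac{\partial f}{\partial x^k} .$$
The second line holds for every differentiable $f$, which is the sense in which the operators themselves can be added and scaled like vectors, with the $\partial/\partial x^k$ at the given point playing the role of a basis.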

Last edited: Jan 17, 2017
5. Jan 18, 2017

### mp6250

The reference from Wald was helpful, thank you. While the details of the proof in 2.2 are a little difficult to follow, what I gather is that vectors which span the mapped space in $ℝ^n$ will also span the open set $O$ which they come from.

I think I'm beginning to understand. Is the following statement correct?

Vectors are directional derivatives of the function which describes the surface they live in.

Would it also be fair to say that in Euclidean space, the gradient of this surface evaluates to 1 for every direction? In that case, for Euclidean space, it is not any different from assigning the $\hat i, \hat j, \hat k$ vectors to the $v^i$ components as Ibix describes in their response. In other words, the gradient operator dotted into the components is what gives a vector its direction. Without it, the vector is just a collection of numbers.

6. Jan 18, 2017

### geometrodynamix

As the author of those notes, I concur with strangerep--they aren't as good as they can be. Someday, I'd like to update the tensor notes (and convert them to TeX), but I must first attend to my dissertation.

I hope I can address a point of confusion in your original post. To get the other components of the vector, $v^2$ and $v^3$, you feed different functions into the directional derivative. For instance, if I wanted to obtain the component $v^2$ from $v^i \partial_i$, I would use $g(x^i)=x^2$. To get the component $v^3$, I would use the function $h(x^i)=x^3$. To clarify, the point I was making at the time was simply that directional derivative operators $v^i \partial_i$ contain the same amount of information as a vector, and the (poorly executed) example was meant to demonstrate how one can explicitly pick out a component of a vector from the directional derivative operator.
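As a rough numerical illustration of picking out components this way, here is a minimal Python sketch. The helper names `directional_derivative` and `coordinate_function` are made up for this example, and central finite differences stand in for the exact partial derivatives; none of this comes from the notes themselves.

```python
def directional_derivative(v, f, point, h=1e-6):
    """Apply the operator v^i d/dx^i to f at `point` (central differences)."""
    total = 0.0
    for i, v_i in enumerate(v):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        total += v_i * (f(forward) - f(backward)) / (2 * h)
    return total


def coordinate_function(i):
    """Return the function that maps a point x to its i-th coordinate."""
    return lambda x: x[i]


v = [3.0, -1.5, 2.0]        # components v^1, v^2, v^3 (arbitrary test values)
point = [0.7, -0.2, 1.1]    # arbitrary point of evaluation

# Feeding x^1, x^2, x^3 in turn recovers v^1, v^2, v^3 at any point.
components = [
    directional_derivative(v, coordinate_function(i), point) for i in range(3)
]

# A different function gives a different scalar: for (x^1)^2 the result
# is 2 * x^1 * v^1, which depends on the point of evaluation.
squared = directional_derivative(v, lambda x: x[0] ** 2, point)
```

The operator is the same in every call; only the function it is fed changes, which is why the result for $(x^1)^2$ depends on the point while the results for the coordinate functions do not.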

With regard to your second post, I'm not sure your statement is quite right (assuming I understand it correctly); in elementary vector calculus on Euclidean space, the gradient of a function yields a vector that is normal/orthogonal to the level surfaces of the function, and if the directional derivative is the dot product of a vector and a gradient, then a vector tangent to a level surface will be orthogonal to the gradient. This is just the statement that the directional derivative vanishes in a direction tangent to the level surface.
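A quick example in the Euclidean plane, with the function chosen just for illustration: take $f(x, y) = x^2 + y^2$, whose level surfaces are circles about the origin. Then
$$\nabla f = (2x,\, 2y), \qquad t = (-y,\, x) \;\Rightarrow\; t \cdot \nabla f = -2xy + 2xy = 0, \qquad r = (x,\, y) \;\Rightarrow\; r \cdot \nabla f = 2\left(x^2 + y^2\right).$$
The vector $t$ is tangent to the circle through $(x, y)$ and its directional derivative vanishes, while the radial vector $r$, which is parallel to the gradient, gives a nonzero result.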

Perhaps the following comments (valid in Euclidean space) will provide some extra intuition--if not, you can ignore them. If I take the gradient of $f(x^i)=x^1$, I get the components of a vector $E_1=\nabla f$ that points in a direction that is normal/orthogonal to a surface of constant $x^1$ (the level surface of $f(x^i)=x^1$). Doing the same for $g(x^i)=x^2$ and $h(x^i)=x^3$, I get the vectors $E_2=\nabla g$ and $E_3=\nabla h$, which respectively point in directions that are normal to surfaces of constant $x^2$ and constant $x^3$. It shouldn't be too difficult to convince oneself that the vectors $E_1$, $E_2$ and $E_3$ form an orthonormal basis. When replacing the basis vectors with partial derivatives, all I'm doing is to get rid of the trivial functions $f$, $g$, and $h$. In doing so, I turn the vector into an operator, but as I argued earlier, it still contains the same amount of information as a list of the components of the vector in some basis.

A final comment: I must emphasize that as a directional derivative, the vector $v=v^i \partial_i$ should be thought of as an operator, without reference to any function. If you feed it a function $\phi (x^i)$, what you get is a scalar quantity, which in Euclidean space can be interpreted as the dot product of $v$ with the gradient $\nabla \phi$, and as argued earlier, one can feed it the appropriate functions to pick out the components.

Hope this helps.

Last edited: Jan 18, 2017
7. Jan 18, 2017

### Orodruin

Staff Emeritus
A word of warning here: the gradient of the coordinate functions would define something which is the equivalent of the dual basis $dx^i$. The coordinate functions are not the redundant object here; the Cartesian basis used to define the gradient as $\vec e_i \partial_i$ in Cartesian coordinates is. Instead, the vectors corresponding to the tangent vector basis would be the partial derivatives of the Euclidean position vector $\vec x$ with respect to the different coordinates, i.e., $\vec E_i = \partial \vec x/\partial x^i$. Now, in a manifold, the position vector is not well defined (and not really of relevance for the definition of the tangent vector basis) and so we are left with the partial derivatives as the basis for the tangent vectors. This is also why you find that $\vec v \cdot dx^i = v(x^i) = dx^i(v^j\partial_j) = v^i$.
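Polar coordinates in the Euclidean plane give a simple case where the two constructions visibly differ (a sketch, with $\vec x = (r\cos\theta,\, r\sin\theta)$ written in Cartesian components):
$$\vec E_r = \frac{\partial \vec x}{\partial r} = (\cos\theta,\, \sin\theta), \qquad \vec E_\theta = \frac{\partial \vec x}{\partial \theta} = (-r\sin\theta,\, r\cos\theta),$$
$$\nabla r = (\cos\theta,\, \sin\theta), \qquad \nabla \theta = \left(-\frac{\sin\theta}{r},\, \frac{\cos\theta}{r}\right).$$
The two sets pair up as a basis and its dual, $\vec E_r \cdot \nabla r = \vec E_\theta \cdot \nabla\theta = 1$ and $\vec E_r \cdot \nabla\theta = \vec E_\theta \cdot \nabla r = 0$, but $|\vec E_\theta| = r$ while $|\nabla\theta| = 1/r$, so they are different objects away from $r = 1$.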

Of course, if you stick to a Cartesian coordinate system, it does not matter whether you take the gradient of the coordinate functions or if you differentiate the position vector with respect to the coordinates.

8. Jan 18, 2017

### pervect

Staff Emeritus
I've got a little more time, so I'll make some more remarks that I hope will help, though they may digress a little.

Suppose we have the 2d surface of a 3d sphere. Consider a point that's not at the poles. At any such point we have a notion of "north" and "east". On a plane, we would represent this notion of direction via a pair of vectors, a unit vector pointing north, and a unit vector pointing east. We wish to do the same thing on a sphere. But how do we define this exactly?

On a plane, we are used to conflating vectors with displacement operators. We run into some difficulties when we try to apply this to a sphere, though. On a plane, if we go 100 units east, and 100 units north, in a straight line, we wind up at the same point as if we go 100 units north first, then 100 units east. But unfortunately, not so on a globe :(.
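To put rough numbers on this, here is a minimal Python sketch. It assumes that "north" means moving along a meridian and "east" means moving along a circle of constant latitude (moving along great circles instead would also fail to commute, but the bookkeeping is longer). The function names, the radius and the starting latitude are all made up for the illustration.

```python
import math


def move_north(lat, lon, distance, radius):
    """Move along a meridian: latitude changes, longitude is unchanged."""
    return lat + distance / radius, lon


def move_east(lat, lon, distance, radius):
    """Move along a circle of latitude, whose radius is radius * cos(lat)."""
    return lat, lon + distance / (radius * math.cos(lat))


radius = 1000.0                      # sphere radius, same units as `step`
step = 100.0                         # length of each leg
start = (math.radians(45.0), 0.0)    # (latitude, longitude) in radians

east_then_north = move_north(*move_east(*start, step, radius), step, radius)
north_then_east = move_east(*move_north(*start, step, radius), step, radius)

# Both routes reach the same latitude but different longitudes.
longitude_gap = north_then_east[1] - east_then_north[1]
print(east_then_north, north_then_east, longitude_gap)
```

The two routes end at the same latitude but at different longitudes, because the eastward leg covers more longitude when it is walked at the higher latitude. Shrinking `step` shrinks the gap faster than the displacements themselves.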

There are several ways we might try to get around this difficulty, but the way that works is to consider a tangent plane to the sphere; then the order doesn't matter. In order to have a vector space, the order can't matter. I don't believe there is any way to "patch up" a notion of a vector space to make it match the behavior of displacements on a sphere; the idea that vector addition must commute is just too fundamental to modify.

We note that if we use small enough displacements, a lot of these issues go away. For instance, we can make a small street map that covers a small section of the globe and use it to navigate, and it will be sufficiently accurate for our purposes. The reason this works is basically that the tangent plane is a good representation of a small section of the sphere, though it doesn't work for larger areas. For instance, if we try to draw a large map of the globe on a flat sheet of paper, we just can't make the drawing "to scale". This is a well-known problem of cartography: there are various ways to "project" the surface of the globe onto a flat sheet of paper, and several popular projections each represent some things well, but none of them can be exactly right.

The notion of small displacements leads us perhaps to the following idea: if we make the displacements so small they are infinitesimal (which involves the process of taking a limit), can we consider a small change in longitude $\lambda$ and a small change in latitude $\varphi$ as a vector? The answer is basically "yes, but". The "yes" part is that infinitesimal coordinate changes $d\lambda$ and $d\varphi$ can indeed be interpreted as elements of a vector space. The "but" part is that these are not the familiar vectors, the lines with arrows that we usually draw, but the duals of these vectors.

In the usual language, $d\lambda$ is a map from a vector to a scalar. This has various names; "one-form" is a common term. We also say that it's in the "dual space". Some study of the concept of duality might be needed, but that won't be a waste, we'll be using that concept with tensors, too. So it may be a digression, but it will ultimately be a useful one.

The leap we need to make is that a vector is not an infinitesimal displacement, but the dual of such a displacement. This turns out to be a directional derivative, though this is not particularly obvious.

When the dust all settles, we have vectors, which are differential operators $\partial / \partial \lambda$, and the duals of vectors, which are $d\lambda$, and we have the important duality relationship between them.

The duality relationship says that for every vector (regarded as an input), we have a "dual vector" or one-form that is a linear map from the input vector to a scalar. And we can turn this around - if we have a one-form that we regard as an input, the dual of this one form is a vector, which is a linear map from the one-form to a scalar.

And those are the two things we need. Well, there is a third thing we need, and that's how we find the length of a vector. This is done via the metric tensor, which we can regard in various ways that I won't get into. This has some implications for what we said about "unit vectors". To define a unit vector, we need some definition of length, and it turns out that while $d\lambda$ and $d\varphi$ do have the necessary properties to be vectors, they don't both turn out to be unit vectors when we introduce the appropriate concepts of length via the metric tensor. One of them does, I think, and one of them doesn't.
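For what it's worth, the lengths can be written out explicitly. This is a sketch that assumes the standard round metric on a sphere of radius $R$, with latitude $\varphi$ measured from the equator and longitude $\lambda$:
$$ds^2 = R^2\, d\varphi^2 + R^2 \cos^2\!\varphi \; d\lambda^2 ,$$
$$\left|\frac{\partial}{\partial \varphi}\right| = R, \qquad \left|\frac{\partial}{\partial \lambda}\right| = R\cos\varphi, \qquad \left|d\varphi\right| = \frac{1}{R}, \qquad \left|d\lambda\right| = \frac{1}{R\cos\varphi} .$$
The lengths of the one-forms come from the inverse metric. On a unit sphere ($R = 1$), $d\varphi$ and $\partial/\partial\varphi$ have unit length everywhere, while $d\lambda$ and $\partial/\partial\lambda$ have unit length only on the equator.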

Unfortunately we wind up needing to tie together several disparate concepts to get this particular "big picture", but once you have a notion of vectors, their duals, and the metric tensor, you have everything you need.

Last edited by a moderator: Jan 19, 2017
9. Jan 18, 2017

### geometrodynamix

Huh, I thought I included a caveat about that. I must have deleted it when I got rid of some redundant text. In any case, you're entirely right--thanks for pointing this out. On a manifold, or even in curvilinear coordinates, the gradient cannot be thought of as a vector of the same type as the directional derivative--the components of the gradient $\nabla \phi$ must transform differently than the components $v^i$ that appear in the directional derivative $v^i\partial_i$. This must be the case if the quantity $v^i\partial_i \phi$ transforms as a scalar.

I should also add that the terms "covariant vector" (or dual vector) and "contravariant vector" are used to distinguish these two types of vectors; the former refers to vectors whose components transform like gradients, and the latter refers to vectors whose components transform like the components $v^i$ of directional derivatives.
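Written out, the two transformation rules and the resulting invariance look like this (a sketch using the chain rule for a change of coordinates $x^i \to x'^i$):
$$v'^i = \frac{\partial x'^i}{\partial x^j}\, v^j, \qquad \frac{\partial \phi}{\partial x'^i} = \frac{\partial x^k}{\partial x'^i}\, \frac{\partial \phi}{\partial x^k},$$
$$v'^i\, \frac{\partial \phi}{\partial x'^i} = \frac{\partial x'^i}{\partial x^j}\, \frac{\partial x^k}{\partial x'^i}\; v^j\, \frac{\partial \phi}{\partial x^k} = \delta^k_j\, v^j\, \frac{\partial \phi}{\partial x^k} = v^j\, \frac{\partial \phi}{\partial x^j} .$$
The two Jacobian factors are inverses of each other, which is why the contraction comes out the same in both coordinate systems.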

10. Jan 18, 2017

### strangerep

I think your statement is not correct. You can use any function defined over the manifold. I.e., any function which has the manifold (or at least an open subset thereof) as its domain.

BTW, if you haven't already done so, it's worthwhile to read a bit further in Wald, on p18, where he makes precise the relationship between tangent vectors and infinitesimal displacements, in terms of 1-parameter group(s) of diffeomorphisms of the manifold.

Last edited: Jan 18, 2017
11. Oct 29, 2017

### haushofer

Each with length one, I'd say, using the standard Euclidean metric.