Exploring Tensors: An Introduction to Covariance and Contravariance

  • Thread starter marschmellow
  • Tags: Tensors
  • #1
marschmellow
Hello all. I am a 17-year-old high school student in the United States. I have taken BC Calculus and will be taking Multi-variable Calculus next year (also known as Calc III in the states), but I already know a lot of it for various reasons. I also have mild experience in topology and abstract algebra. This is to give you an idea of where I am regarding my math skills. I have little difficulty understanding new math concepts EXCEPT when there are undefined characters and strange notation floating around.

I understand the basic concept of a tensor, and I find it fascinating. To me, what constitutes cool math is taking something you already get on one level and generalizing it to see what new cool stuff can come from it, so the idea of generalizing the notation of a scalar, vector and matrix to a quantity with n indices seems really cool to me. I just have a few questions slash statements:
1. My understanding of the difference between the tensor product and the dot product is that the dot product is matrix multiplication and the tensor product is laying out each component of two tensors and multiplying them together as though they were terms of separate polynomials. Is this correct?
2. It seems to me like a particle in motion in Euclidean (or any) space could be modeled with a second-order tensor: one index would tell you which vector you are interested in (position, velocity, etc.) and the other index would tell you which spatial component of that vector you are interested in. Yet I have never encountered a particle being modeled in this way. Is this because the math involved with tensor analysis would never be useful to model a particle in this way, or am I being stupid?
3. Could someone explain in words the concepts of covariant and contravariant indices? I have not taken a course in linear algebra, and I understand that some (or all) of it relates to linear algebra, so "shut up about tensors until you've taken a course in linear algebra" is a completely acceptable answer to this question. But if there is a way to explain it in words to someone that hasn't taken a course in linear algebra, I would appreciate hearing it. I have seen the formal definition that involves the word "transformation," a product of partial derivatives, some undefined x's, bars, and indices, all of which mean nothing to me, so repeating the formal definition found anywhere online would not be helpful.

Thanks.
 
  • #2
marschmellow said:
I understand the basic concept of a tensor, and I find it fascinating. To me, what constitutes cool math is taking something you already get on one level and generalizing it to see what new cool stuff can come from it, so the idea of generalizing the notation of a scalar, vector and matrix to a quantity with n indices seems really cool to me. I just have a few questions slash statements:
1. My understanding of the difference between the tensor product and the dot product is that the dot product is matrix multiplication and the tensor product is laying out each component of two tensors and multiplying them together as though they were terms of separate polynomials. Is this correct?
2. It seems to me like a particle in motion in Euclidean (or any) space could be modeled with a second-order tensor: one index would tell you which vector you are interested in (position, velocity, etc.) and the other index would tell you which spatial component of that vector you are interested in. Yet I have never encountered a particle being modeled in this way. Is this because the math involved with tensor analysis would never be useful to model a particle in this way, or am I being stupid?
3. Could someone explain in words the concepts of covariant and contravariant indices? I have not taken a course in linear algebra, and I understand that some (or all) of it relates to linear algebra, so "shut up about tensors until you've taken a course in linear algebra" is a completely acceptable answer to this question. But if there is a way to explain it in words to someone that hasn't taken a course in linear algebra, I would appreciate hearing it. I have seen the formal definition that involves the word "transformation," a product of partial derivatives, some undefined x's, bars, and indices, all of which mean nothing to me, so repeating the formal definition found anywhere online would not be helpful.

Thanks.

First, props to you, it's good to know that someone is out there carrying the torch.

I'm no expert on this stuff, but I'll give it a crack.

1. The inner product is a contraction operation. It turns two first-order tensors (vectors), or two second-order tensors (represented by matrices), into a scalar value. The inner product of two second-order tensors A and B is the sum of A_ij * B_ij over all indices (some authors use the sum of A_ij * B_ji instead). The tensor product, I believe, is the same thing as the dyadic or outer product, and it produces a tensor of higher order than either of its factors. Hence, the tensor product of two first-order tensors is a second-order tensor.
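If it helps to see this concretely, here is a minimal numpy sketch (my own illustration, not from the thread; the array values are made up):

[code]
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Contraction of two first-order tensors: the ordinary dot product -> a scalar
print(np.dot(u, v))                 # 32.0

# Tensor (outer/dyadic) product: two first-order tensors -> one second-order tensor
T = np.outer(u, v)                  # T[i, j] = u[i] * v[j]
print(T.shape)                      # (3, 3)

# Double contraction of two second-order tensors: sum over A_ij * B_ij -> a scalar
A = np.arange(9.0).reshape(3, 3)
B = np.ones((3, 3))
print(np.einsum('ij,ij->', A, B))   # 36.0
[/code]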

2. A tensor indicates a meaningful physical quantity. As such, you can create a matrix such as the one you describe, but it's not attributable to any meaningful geometric object so it is not a tensor but just a second-order "tuple" of physical values. Of course, each of the individual objects like velocity, strain, etc. are tensors so they are represented separately.

3. You won't find anyone who has been exposed to contravariance and covariance but not linear algebra. There is a series of great posts by llarsen under "inverse function theorem for surfaces" that talk about the fundamental concepts, but in my opinion trying to follow this discussion is probably futile at this point for you.

I would for now simply accept that, as a consequence of using any non-orthogonal coordinate system (curvilinear, for example), there is more than one way to represent the components of a vector, each corresponding to how the basis vectors are chosen. There are two "dual" sets of basis vectors; in Cartesian coordinates the two sets coincide, which is why the distinction is invisible there. The two sets of basis vectors are described as covariant or contravariant. With a covariant basis, the contravariant components are used to represent the vector; with a contravariant basis, the covariant components are used.
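Here is a small numpy sketch of that idea, using a made-up skewed basis in the plane (my own illustration, not from the thread):

[code]
import numpy as np

# Columns of E form a non-orthogonal ("covariant") basis of R^2
E = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# The reciprocal ("contravariant") basis vectors are the rows of inv(E),
# chosen so that e^i . e_j = 1 if i == j and 0 otherwise
E_dual = np.linalg.inv(E)

v = np.array([3.0, 2.0])

# Contravariant components: expansion coefficients of v in the covariant basis
v_contra = np.linalg.solve(E, v)    # [1., 2.]

# Covariant components: dot products of v with the covariant basis vectors
v_co = E.T @ v                      # [3., 5.]

# Either component set reconstructs the same vector, each using the dual basis
print(E @ v_contra)                 # [3. 2.]
print(E_dual.T @ v_co)              # [3. 2.]
[/code]

If E were the identity (an orthonormal Cartesian basis), the two component sets would be equal, which is why the distinction never comes up there.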
 
  • #3
marschmellow said:
1. My understanding of the difference between the tensor product and the dot product is that the dot product is matrix multiplication and the tensor product is laying out each component of two tensors and multiplying them together as though they were terms of separate polynomials. Is this correct?
The dot product is defined by

[tex]\vec x\cdot\vec y=\sum_{i=1}^n x_iy_i[/tex]

If we define

[tex]x=\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix},\quad y=\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}[/tex]

we also have

[tex]x^Ty=\begin{pmatrix}x_1 & \cdots & x_n\end{pmatrix}\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\sum_{i=1}^n x_iy_i=\vec x\cdot\vec y[/tex]

The term "tensor product" refers to a way to construct a new vector space from two given vector spaces, so you might want to use that term cautiously. What you probably have in mind are expressions like [itex]A_{ijk}B_{lmn}[/itex]. Here A is a tensor, but [itex]A_{ijk}[/itex] is the ijk component of A in a specific coordinate system, so [itex]A_{ijk}B_{lmn}[/itex] is just the product of two real numbers. Now, [itex]A_{ijk}B_{lmn}[/itex] can also be interpreted the components of a six-index tensor, and that fact has something to do with tensor products, but it's too complicated to explain it here.

marschmellow said:
2. It seems to me like a particle in motion in Euclidean (or any) space could be modeled with a second-order tensor: one index would tell you which vector you are interested in (position, velocity, etc.) and the other index would tell you which spatial component of that vector you are interested in. Yet I have never encountered a particle being modeled in this way. Is this because the math involved with tensor analysis would never be useful to model a particle in this way, or am I being stupid?
A tensor with two indices (running from 1 to 3) has nine components (in each coordinate system). The number of components isn't really a problem, since you can take the diagonal components to be zero. The problem is that the definition of "tensor" ensures that the components of the tensor in a coordinate system S are related to the components in another coordinate system S' in a certain way, and you have specified the components of a tensor in all inertial coordinate systems without checking whether they satisfy this relationship. And it turns out they don't.

However, in the context of special relativity (where indices run from 0 to 3), something similar can be done with the electric and magnetic fields, which combine into a single antisymmetric two-index tensor (the electromagnetic field tensor).

marschmellow said:
3. Could someone explain in words the concepts of covariant and contravariant indices? I have not taken a course in linear algebra,
I don't think that's possible. You need to at least learn about vector spaces and linear operators first. This is what I can tell you:

Fredrik said:
Forget about manifolds for a moment and let V be an arbitrary finite-dimensional vector space over the real numbers. Now define V* as the set of all linear functions from V into [itex]\mathbb R[/itex]. Then define the sum of two members of V*, and the product of a member of [itex]\mathbb R[/itex] and a member of V* by

[tex](f+g)(v)=f(v)+g(v)[/tex]

[tex](kf)(v)=k(f(v))[/tex]

These definitions give V* the structure of a vector space. It's called the dual space of V. Since V* is a vector space, the members of V* are vectors. However, when V is the tangent space of a manifold, the members of V are called "tangent vectors" and the members of V* are called "cotangent vectors". This is often shortened to the misleading "vectors" and "covectors", or worse (much worse actually) "covariant vectors" and "contravariant vectors".
If you want to see the rest of that post, click the > link at the start of the quote.
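Here is a minimal Python sketch of those two definitions, with covectors represented as plain functions on pairs (my own illustration; the particular linear maps are made up):

[code]
# Two linear functions ("covectors") from V = R^2 into R
def f(v):
    return 2 * v[0] + 3 * v[1]

def g(v):
    return v[0] - v[1]

# The sum and scalar multiple, defined exactly as in the quote above
def add(f, g):
    return lambda v: f(v) + g(v)

def scale(k, f):
    return lambda v: k * f(v)

v = (1.0, 2.0)
print(add(f, g)(v))     # f(v) + g(v) = 8.0 + (-1.0) = 7.0
print(scale(3, f)(v))   # 3 * f(v) = 24.0
[/code]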

marschmellow said:
I have seen the formal definition that involves the word "transformation,"
I consider that "definition" stupid and obsolete, and to make things worse, it's almost always presented in a way that's beyond horrible.

I posted the modern definition here if you're interested. This post contains a few comments about the stupid definition.
 
  • #4
I was already pretty confident in my understanding of the dot product, and your responses have made me more so. Thank you.

As to my second question, the answer I seem to be getting is "sure, you can create a matrix like that, but it would be meaningless and/or pointless." I couldn't follow the details on exactly why that is true, but I will take your word for it.

I think I know what a vector space is (similar to a group or ring in that it involves a certain kind of object, in this case vectors or maybe higher order tensors, and operations defined for them, but also involving a specific coordinate system slash manifold in which the vectors and their components live?), and I get the basic idea of a manifold (an n-dimensional surface ambient in a higher dimensional Euclidean space defined by a one-to-one mapping from every point in n-dimensional Euclidean space to n+1-dimensional Euclidean space?), but I don't understand at all how these concepts work together, especially when it comes to terms like tangent spaces and such, so I think the answer to my third question is way over my head. I might use MIT open courseware to learn about linear algebra if I get impatient or I might just wait for college.

Thank you both for your help. If you have any suggestions for what the next best thing I could do for improving my math knowledge from where I am now, I would appreciate it greatly.
 
  • #5
I think the best thing to do is to study linear algebra as soon as you're able to. My favorite book is Axler.

A vector space is less complicated than you think. The definition involves a set and a couple of functions called "addition" and "scalar multiplication" (not to be confused with scalar products), which are required to satisfy a list of eight specific axioms. Check out the Wikipedia definition (and pretend that it says "the real numbers" instead of "a field F").
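For reference, written out, the eight axioms require that for all u, v, w in V and all real numbers a and b:

[tex]\begin{align*}
& u+(v+w)=(u+v)+w, && u+v=v+u,\\
& v+0=v \text{ for some } 0\in V, && v+(-v)=0 \text{ for some } -v\in V,\\
& a(bv)=(ab)v, && 1v=v,\\
& a(u+v)=au+av, && (a+b)v=av+bv.
\end{align*}[/tex]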

The definition of "manifold" is very complicated, and probably not something you should worry about for a few more years. The connection between manifolds and vector spaces is that there's a vector space associated with each point of the manifold, called the tangent space at that point.
 
  • #6
marschmellow said:
3. Could someone explain in words the concepts of covariant and contravariant indices?
...
I have seen the formal definition that involves the word "transformation," a product of partial derivatives, some undefined x's, bars, and indices, all of which mean nothing to me, so repeating the formal definition found anywhere online would not be helpful.

Let me try and give you a quick explanation of covariant and contravariant indices and what they mean. First, it is useful to understand first-order tensors, by which I mean tensors that have one index. These are called vectors and covectors. Vectors are represented by a contravariant (upper) index and covectors by a covariant (lower) index. So where do these come from?

Suppose you want to identify a point in space. There are two ways you can think of identifying the point. One is to think of the intersection of constant surfaces of the coordinate functions. For example, in x, y, z space, for the point (5,1,2) you might think of the constant surface x=5 intersecting with the constant surfaces y=1 and z=2. Typically, however, we tend to think in terms of coordinate lines. The coordinate line for x falls along constant surfaces of y and z and is parameterized by x. So, rather than identify the point (5,1,2) by the intersection of constant surfaces, you walk along the x coordinate line out to 5, then along the y coordinate line out to 1, then along the z coordinate line out to 2.

The coordinate functions and coordinate lines are duals of one another. If I define the coordinate functions, I automatically define a set of coordinate lines. However, these objects have different properties. A function is characterized by its constant surfaces. It has a direction and rate of change, but there is no 1-D path to characterize a function. A parameterized line has a specific direction along a 1-D path and a rate of change along that path.

A vector (or contravariant tensor) comes from taking the derivative of a parameterized line. For example, the coordinate line for x is (x,0,0), with x being the parameter. If we take the derivative with respect to the parameter we get the coordinate vector (1,0,0). A contravariant vector characterizes a coordinate line and is represented in matrix notation as a column vector. In index notation it is represented as a variable with an upper index. Graphically it is an infinitesimal quantity that points in the direction of the parameterized line. The length of the vector characterizes the rate of change of the parameter along the curve. You can also simply plot the parameterized curve (with marks to represent the parameter) to characterize the vector field, since this conveys the same information. A more interesting parameterized curve is (t, t^2, t^3) with parameter t. The tangent vector associated with this curve is (1, 2t, 3t^2). At t=2, the point on the curve is (2, 2^2, 2^3), and the vector (contravariant tensor) that characterizes the direction the curve is currently pointing in space, and the rate of change along the curve, is (1, 2*2, 3*2^2) = (1, 4, 12).
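A quick sympy check of that worked example (my own sketch, not from the post):

[code]
import sympy as sp

t = sp.symbols('t')
curve = sp.Matrix([t, t**2, t**3])   # the parameterized curve (t, t^2, t^3)

tangent = curve.diff(t)              # derivative w.r.t. the parameter: (1, 2t, 3t^2)
print(tangent.T)                     # Matrix([[1, 2*t, 3*t**2]])
print(tangent.subs(t, 2).T)          # at t = 2: Matrix([[1, 4, 12]])
[/code]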

A covector (or covariant tensor) comes from taking the differential of a function. For example, taking the differential of the coordinate function f = x, you get the differential [itex] (\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}) = (1,0,0) [/itex]. A covariant vector characterizes the direction in which the function is changing and its rate of change. In matrix notation, this is written as a row vector. In index notation, this is written as a variable with a lower index. In graphical form, it is represented by two parallel surfaces whose distance apart matches that of constant surfaces of the function, usually with some sort of arrow indicating the direction of increase. However, I often just prefer to plot the constant surfaces of the function to represent the covector.
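And the covector side in the same style (the second function is a made-up example of my own):

[code]
import sympy as sp

x, y, z = sp.symbols('x y z')

f = x                                               # the coordinate function f = x
df = sp.Matrix([[f.diff(v) for v in (x, y, z)]])    # its differential, as a row: (1, 0, 0)
print(df)                                           # Matrix([[1, 0, 0]])

g = x*y + z**2                                      # an arbitrary made-up function
dg = sp.Matrix([[g.diff(v) for v in (x, y, z)]])
print(dg)                                           # Matrix([[y, x, 2*z]])
[/code]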

I don't have time at the moment to explain further, but hopefully this gives some sense of the difference between the upper and lower indices. The derivatives of functions and parameterized lines are the building blocks of higher-order tensors. Hopefully this gives you at least a taste of what they mean. Unfortunately, it doesn't necessarily help you get a feel for how they are used or why they are important, but having a picture of what they represent is a start.
 
  • #7
marschmellow said:
3. Could someone explain in words the concepts of covariant and contravariant indices? I have not taken a course in linear algebra, and I understand that some (or all) of it relates to linear algebra, so "shut up about tensors until you've taken a course in linear algebra" is a completely acceptable answer to this question. But if there is a way to explain it in words to someone that hasn't taken a course in linear algebra, I would appreciate hearing it. I have seen the formal definition that involves the word "transformation," a product of partial derivatives, some undefined x's, bars, and indices, all of which mean nothing to me, so repeating the formal definition found anywhere online would not be helpful.

Thanks.

I will take a shot at this. Hopefully it is helpful.

Consider a scalar such as temperature. Assume that at each point in space, a temperature can be measured. Scalars in this sense are to be thought of as fields, not just single numbers. You could think of temperature as a function of the 3 spatial coordinates. The derivative of a scalar is known as the gradient, which is a covariant vector (as with scalars, vectors are to be thought of as vector fields, not just single vectors, since they are defined at each point in the space). Why is the gradient a covariant vector? Because its components are derivatives with respect to a coordinate system: if you change the coordinate system from xyz to x'y'z', the components must covary with the change in coordinate systems. This is basically the chain rule (i.e., in one dimension, df/dx' = (df/dx)(dx/dx')).
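A sympy sketch of that covariant (chain-rule) behavior, using a change from Cartesian to polar coordinates and a made-up scalar field of my own:

[code]
import sympy as sp

x, y, r, th = sp.symbols('x y r theta')
f = x**2 + y**2                            # a made-up scalar field (a "temperature")

sub = {x: r*sp.cos(th), y: r*sp.sin(th)}   # the coordinate change
f_polar = f.subs(sub)

# Chain rule: df/dr = (df/dx)(dx/dr) + (df/dy)(dy/dr)
lhs = sp.simplify(f_polar.diff(r))
rhs = sp.simplify((f.diff(x)*sp.cos(th) + f.diff(y)*sp.sin(th)).subs(sub))
print(lhs, rhs)                            # 2*r 2*r -- the two sides agree
[/code]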


Now suppose you have some coordinate system xyz and you are interested in measuring the change in temperature at a point along some direction. This direction is defined as a vector in terms of components of the coordinate system. A vector in this sense can be thought of as the sum of the components projected along the coordinate axes. If you change the coordinate system to x', y', z', the orientation of the vector doesn't change but its components do, as you are now projecting components along different axes. In fact, you can think of the transformation rule as a set of functions x'=f(x,y,z), y'=g(x,y,z), and z'=h(x,y,z). The transformation rule has an inverse (so you can go from one system to the other and back), which can also be thought of as a set of functions x=f^{-1}(x',y',z'), y=g^{-1}(x',y',z'), z=h^{-1}(x',y',z').

But here, the direction we have in mind varies with respect to the inverse transformation rule and is therefore said to be contravariant with changes to the coordinate system.
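A matching sympy sketch of the contravariant side (again with polar coordinates as the made-up example): a direction's components transform with the Jacobian of the map back to Cartesian, i.e. the inverse of the forward transformation rule.

[code]
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# The map from polar to Cartesian coordinates, and its Jacobian
F = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
J = F.jacobian([r, th])

# A direction with polar components (dr, dtheta) = (1, 0): "radially outward"
v_polar = sp.Matrix([1, 0])

# Its Cartesian components follow the inverse transformation rule (here, J)
v_cart = sp.simplify(J * v_polar)
print(v_cart.T)            # Matrix([[cos(theta), sin(theta)]])
[/code]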

It turns out that the change in temperature along some direction is given (in Euclidean space) by the directional derivative, which is the inner product (dot product) of the gradient and the direction vector.
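For example (a minimal sympy sketch with a made-up temperature field and direction of my own):

[code]
import sympy as sp

x, y, z = sp.symbols('x y z')
T = x**2 * y + z                                     # a made-up temperature field

grad_T = sp.Matrix([T.diff(v) for v in (x, y, z)])   # the gradient (a covector)
d = sp.Matrix([1, 2, 0])                             # a made-up direction (a vector)

# Directional derivative = inner product of gradient and direction
print(grad_T.dot(d))                                 # 2*x**2 + 2*x*y
[/code]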

I hope that makes sense.
 
  • #8
Above I should have said that the coordinate lines associated with x are [tex](x, y_0, z_0)[/tex] where [tex]y_0[/tex] and [tex]z_0[/tex] are constants. (x,0,0) is the coordinate axis. But there is a coordinate line associated with every choice of [tex]y_0[/tex] and [tex]z_0[/tex]. However since [tex]y_0[/tex] and [tex]z_0[/tex] are constants, when we take the derivative with respect to the parameter x, the associated vector is still (1,0,0).
 
  • #9
One part of the tensor picture is something called differential forms. These are antisymmetric covariant tensors, equipped with an antisymmetric product called the wedge product. The wedge product can be written in terms of the tensor product. Both are easy to use. If you want to get a feel for tensors in a real-world, applied way, with useful pictures to show the way, there are a few good resources that teach differential forms. One I really like is from some electrical engineering professors at BYU: http://eceformsweb.groups.et.byu.net/ftext.pdf

With differential forms, you don't tend to use the lower-index notation to represent a covariant tensor. A covariant tensor is represented using something like [itex]a\,dx + b\,dy[/itex] for single-index tensors, [tex]a\, dx \wedge dy + b\, dy \wedge dz + \ldots[/tex] for double-index covariant tensors, and [tex]a\, dx \wedge dy \wedge dz + b\, dy \wedge dz \wedge dw + \ldots[/tex] for triple-index tensors, etc. Anyway, differential forms are an important class of tensors, and this introduction is more approachable than most I have seen.
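Here is a small numpy sketch of that connection between the wedge product and the tensor product (my own illustration): for one-forms, dx ∧ dy = dx ⊗ dy − dy ⊗ dx.

[code]
import numpy as np

dx = np.array([1.0, 0.0, 0.0])   # component arrays of the basis one-forms
dy = np.array([0.0, 1.0, 0.0])

def wedge(a, b):
    # Wedge product of two one-forms, written via the tensor (outer) product
    return np.outer(a, b) - np.outer(b, a)

w = wedge(dx, dy)

# Evaluate the resulting two-form on a pair of made-up vectors u, v:
# w(u, v) = dx(u) dy(v) - dy(u) dx(v)
u = np.array([1.0, 2.0, 0.0])
v = np.array([3.0, 4.0, 0.0])
print(u @ w @ v)                 # 1*4 - 2*3 = -2.0
[/code]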
 
  • #10
Wow Fredrik, that's an outstanding response... thumbs up...

Regarding question 3 --

This resource "http://homepages.cae.wisc.edu/~callen/FluxCoordinates.pdf" is frequently used in my department for introducing the notions of "covariant components" and "contravariant components". Chapter 2 and 3 is the material for this.

As Fredrik says, maybe some of this material is obsolete or confusing when compared with other treatments of the subject. In this particular treatment, however, they do introduce these definitions in a simple way, which is also formally accurate, I believe. It involves a good deal of vector calculus, but if you know rudimentary vector calculus and partial differentiation, you should be able to understand it.

Instead of talking about tensors, I will just share with you the simpler idea of "covariant" and "contravariant" vector components. This is maybe the simplest context in which you see lower versus upper indices, and their associated meanings.

The first idea is that any vector in a vector space (like R^3) can be expanded in either of two sets of basis vectors, known as "reciprocal basis sets" or "reciprocal basis vectors". These two sets of vectors obey certain nice algebraic relations with each other. In case you're not familiar with basis vectors, a simple example of a "basis set" in R^3 might be the x, y, and z directions, with which you can write any other vector in the space.

Given some basis set, a certain natural reciprocal basis set can always be derived. More specifically, to specify a vector in a three-dimensional space you need six pieces of information: the component of the vector along the x-direction together with the x-direction itself, the component along the y-direction together with the y-direction itself, and similarly for z. To pin down this information you need three coordinates, since each of these six values may depend on where you are in R^3 (in general).

These three coordinates (which are actually functions) give rise to the "contravariant" basis vectors, denoted with upper indices, which are the reciprocal basis for your original basis set (the "covariant" basis vectors.) The terminology is perhaps obsolete or does not generalize, but I think this is the most elementary treatment of the ideas. Note that in different contexts, these words might indicate different ideas. But this is at least one context in which the words are used.
 
  • #11
Okay, I have read each of your posts several times and tried to digest it all. The best summary of co- and contravariance that I can put in my own words is that a covector is a derivative with respect to space (the coordinate system) and a contravariant vector is a derivative with respect to a parameter (something outside the coordinate system). Is this a false, correct, or incomplete explanation?

Edit: I guess what still bugs me is that I understand the distinction but don't see why it's a useful one worthy of its own notation and terminology. Coordinate system dimensions and parameters are both variables, but one is represented spatially and the other just analytically. Is it only useful in a purely physical context where a variable that is a spatial dimension shares a mathematical distinction (originating from physical laws) from a variable that is not a spatial dimension? If you think of a parameter as its own 1-dimensional coordinate system (the number line) mapped onto another coordinate system, could the derivative of the mapping (aka function) with respect to the parameter be a covector if you are considering coordinate transformations of the number line (the graphical representation of the parameter)?

Let me know if that made no sense.
 
  • #12
marschmellow said:
Okay, I have read each of your posts several times and tried to digest it all. The best summary of co-and contravariance that I can put in my own words is that a covector is a derivative with respect to space (the coordinate system) and a contravariant vector is a derivative with respect to a parameter (something outside the coordinate system).
I wouldn't say that. Suppose that f is a function from a manifold M into the real numbers, that v is a tangent vector at a point p in M, and that ω ("omega") is a cotangent vector at p. Then v is the kind of function that takes f to a real number, and ω is the kind of function that takes v to a real number. v(f) is a kind of derivative of f. In fact, v can be expressed in the form

[tex]v=\sum_\mu v^\mu\frac{\partial}{\partial x^\mu}\bigg|_p[/tex]

The [itex]v^\mu[/itex] are called the components of v, in the coordinate system x. The partial derivative operators are defined this way:

Fredrik said:
If M is a manifold, U is an open subset of M, p is a point in U, and [itex]x:U\rightarrow \mathbb{R}^n[/itex] is a coordinate system, then the partial derivative operators

[tex]\frac{\partial}{\partial x^\mu}\bigg|_p[/tex]

are basis vectors of the tangent space TpM of M at p.

These operators are defined by their action on functions [itex]f:M\rightarrow\mathbb{R}[/itex].

[tex]\frac{\partial}{\partial x^\mu}\bigg|_p f=(f\circ x^{-1}),_\mu(x(p))[/tex]

where [itex],_\mu[/itex] denotes the partial derivative of the function, with respect to the [itex]\mu[/itex]th variable.
 
  • #13
Is it that a covariant index denotes a tensor's location in a coordinate system but a contravariant index denotes other aspects about the tensor (like where it points if it's a vector and how far)?

@Fredrik: Are you simply showing mathematically how to define a tangent space? The tangent space would provide the basis for vectors to live in, correct? So a 2-manifold would have a tangent plane at every point in which 2-component vectors live, one component pointing in an arbitrary direction and the other normal to the first? Do the vectors have to be located at that point of tangency, or can they have any coordinates as long as they do not point outside of the tangent thing (line, plane, space, hyperspace, whatever)?

Is a tensor isomorphic to a function of n variables, each of whose domain is the natural numbers from 1 to m, where m is the number of coordinates in the system? Or does this leave out an order to the array of numbers? I suppose the co- and contravariance concepts get lost with this interpretation.
 

