Nice story.
Here is another try at why covariant and contravariant are different. Logically, covariant means in the "same direction as", while contravariant means in the "opposite direction from". Thus there is no way they can be the same. They are by definition opposites, in the sense of transforming in opposite directions.
Here is the simplest illustration: consider x(t) as a function of t. Thus given a t, we can transform it into an x, i.e. x(t). But we do not therefore transform a FUNCTION of t into a function of x. Just the opposite: we transform a function of x, such as f(x), into a function of t, namely into f(x(t)).
Thus the points, i.e. the coordinate variables, go from t to x, while the functions acting on the points go the opposite way, from f(x) to f(x(t)). Thus functions and points, or functions and coordinates, transform in opposite directions.
This means if the "standard" direction is considered as the direction the coordinates go in, i.e. from t to x, then the other transformation, from f(x) to f(x(t)), should be called contravariant.
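For anyone who likes to see this concretely, here is a minimal sketch in Python. The particular maps x(t) = 2t + 1 and f(x) = x^2, and the helper name pull_back, are made up for illustration; the only point is that the point t is sent forward to x, while the function f is brought back from the x side to the t side by composition.

```python
# Hypothetical example maps, chosen only for illustration.
def x(t):
    """The map on points: sends a point t to a point x."""
    return 2 * t + 1

def f(x_value):
    """A function that lives on the x side."""
    return x_value ** 2

def pull_back(func, point_map):
    """Turn a function of x into a function of t by composing with x(t)."""
    return lambda t: func(point_map(t))

f_of_t = pull_back(f, x)

print(x(3))       # the point t = 3 goes forward to x = 7
print(f_of_t(3))  # f comes back to the t side: f(x(3)) = 49
```

Notice there is no way to use x(t) alone to push a function of t forward to a function of x; composition only works in the one direction.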
This is reflected exactly in the distinction between tangent vectors and cotangent vectors. A tangent vector at p is represented by a curve passing through p. Then if f is a mapping taking p to q, we can apply it to the curve, obtaining a curve through q. This action of f on curves is the "derivative" of f. So the derivative of f goes in the same direction as does f, i.e. tangent vectors v at p go to tangent vectors Dfp(v) at f(p).
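As a rough numerical sketch of "apply f to the curve", assuming a made-up map f(a, b) = (ab, a + b) and a made-up curve through p = (2, 3): the velocity of the image curve at f(p) is what Dfp does to the velocity of the original curve at p. The finite-difference helper is only an approximation, used here to avoid writing any derivative by hand.

```python
# Hypothetical map f: R^2 -> R^2, chosen only for illustration.
def f(p):
    a, b = p
    return (a * b, a + b)

def curve(s):
    """A curve through p = (2, 3) at s = 0, with velocity (1, 0) there."""
    return (2.0 + s, 3.0)

def image_curve(s):
    """The same curve after applying f: a curve through f(p) = (6, 5)."""
    return f(curve(s))

def velocity(c, h=1e-6):
    """Central-difference estimate of the velocity of the curve c at s = 0."""
    ahead, behind = c(h), c(-h)
    return [(a - b) / (2 * h) for a, b in zip(ahead, behind)]

print(velocity(curve))        # approximately [1, 0], a tangent vector at p
print(velocity(image_curve))  # approximately [3, 1], a tangent vector at f(p)
```

So the tangent vector travels from p to f(p), in the same direction as f itself.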
On the other hand, dual vectors go the opposite way. Now this is going to get more complicated notationally, and I apologize. But here goes:
For instance, even if we use an inner product to represent a cotangent vector at f(p) as dotting with a tangent vector w at f(p), i.e. say we think of <w, > as a cotangent vector at f(p), it still transforms the other way, i.e. from the q's back to the p's.
I.e. given a tangent vector w at f(p), if we use the dot product to consider it as the cotangent vector <w, > at f(p), then it gives us a cotangent vector Dfp*(<w, >) at p as follows: to see that Dfp*(<w, >) is a cotangent vector at p, we have to show how it acts on a tangent vector v at p.
Well, given any tangent vector v at p, the pullback covector Dfp*(<w, >) acts on v by first mapping v over to the tangent vector Dfp(v) at f(p), and then applying <w, > to that vector.
I.e. Dfp*(<w, >)(v) = by definition, <w, Dfp(v)>. Thus DOTTING with a tangent vector, transforms in the opposite direction from the tangent vector itself.
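Here is a small sketch of that definition, assuming a made-up map f(a, b) = (ab, a + b) whose Jacobian is written out by hand, and the ordinary dot product at f(p). The names push_forward and pull_back_covector are mine, not standard. The code does exactly what the formula says: to evaluate the pulled-back covector on v, first push v forward, then dot with w.

```python
# Jacobian of the hypothetical map f(a, b) = (a*b, a + b), written by hand.
def jacobian(p):
    a, b = p
    return [[b, a],
            [1.0, 1.0]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def push_forward(p, v):
    """Dfp(v): a tangent vector at p goes to a tangent vector at f(p)."""
    return [dot(row, v) for row in jacobian(p)]

def pull_back_covector(p, covector):
    """Dfp*(covector): a covector at f(p) comes back to a covector at p."""
    return lambda v: covector(push_forward(p, v))

p = (2.0, 3.0)
w = [1.0, -1.0]                # a tangent vector at f(p)
w_dot = lambda u: dot(w, u)    # the covector <w, > at f(p)
v = [1.0, 0.0]                 # a tangent vector at p

pulled = pull_back_covector(p, w_dot)
print(push_forward(p, v))  # [3.0, 1.0], living at f(p)
print(pulled(v))           # <w, Dfp(v)> = 3 - 1 = 2.0
```

The vector v moved from p to f(p), but the covector <w, > moved from f(p) back to p.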
Now it is true that this operation on tangent vectors v at p, CAN be achieved by dotting them with some tangent vector at p, but there is NO natural choice of such! The choice depends on the choice of inner product at p, which is completely arbitrary.
I.e. it is not true that the covector Dfp*(<w, >) at p, obtained by pulling back <w, >, is in any natural way equal to dotting with a tangent vector at p. On the other hand, without any choice of inner product, the operation of composing the derivative of f with a linear function at f(p) is totally natural.
Oh I guess I went overboard here. But heck, it is hard to just give up. Soon school will start again and I will have no such time on my hands. I will be engaged trying to convince people that there is no one distinguished "dependent vector" in a dependent set of vectors.
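To see the arbitrariness in numbers, here is a sketch under made-up assumptions: take a covector at p that sends v = (v1, v2) to 2 v1 + v2, and two different inner products at p, given by the hypothetical Gram matrices below. The vector u that represents the covector by "dotting" is found by solving (Gram matrix) u = (covector components), and it changes when the inner product changes, although the covector itself does not.

```python
# Components of a hypothetical covector at p: it sends (v1, v2) to 2*v1 + 1*v2.
covector = [2.0, 1.0]

def inner(gram, u, v):
    """The inner product <u, v> defined by a 2x2 Gram matrix."""
    return sum(u[i] * gram[i][j] * v[j] for i in range(2) for j in range(2))

def representing_vector(gram, cov):
    """Solve gram * u = cov (2x2, by Cramer's rule), so <u, v> = cov(v) for all v."""
    (a, b), (c, d) = gram
    det = a * d - b * c
    return [(d * cov[0] - b * cov[1]) / det,
            (a * cov[1] - c * cov[0]) / det]

standard = [[1.0, 0.0], [0.0, 1.0]]    # the usual dot product
stretched = [[2.0, 0.0], [0.0, 1.0]]   # another perfectly good inner product

u1 = representing_vector(standard, covector)
u2 = representing_vector(stretched, covector)
print(u1)  # [2.0, 1.0]
print(u2)  # [1.0, 1.0]
```

Both u1 and u2 reproduce the same covector, each with respect to its own inner product, so neither one can claim to be "the" tangent vector corresponding to it.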
Actually it is the same idea, since my whole point is that covariant and contravariant are not properties of a single type of vector, but of a relationship between two things. I.e. to detect covariance you have to compare transformation rules of your object, with those of a standard object, usually the coordinate map on points.
Of course classical differential geometry terminology has screwed this whole covariant contravariant thing up BIG time, and uses the terms backwards. I.e. in differential geometry, "contravariant vectors" are the tangent vectors that transform in the SAME direction as the mapping on points, while "covariant vectors" or "covectors" are the ones that transform in the opposite direction. I.e. in classical differential geometry language, "contravariant vectors" transform covariantly, because Dfp goes in the same direction as f, while "covariant vectors" transform contravariantly, since Dfp* goes in the opposite direction from f.
Of course algebraic topologists are also guilty, since "cohomology" is a contravariant operation. Years ago Peter Hilton tried to change history and call it "contrahomology", but the reason you have never heard of contrahomology is of course that he failed.
No matter, it still follows that covariant and contravariant vectors are distinct because they transform in the opposite direction from EACH OTHER.
(Maybe the classical screwup occurred because classicists were not in possession of the idea of coordinate transformations as maps on points, and were instead referring to the transformation of coordinate FUNCTIONS as opposed to the points of coordinate space. So they were being consistent, in calling contravariant vectors ones which transformed in the opposite direction to the coordinate functions. So possibly the whole confusion began, and persists, by substituting notation, i.e. coordinates, in place of concepts, i.e. geometry.)