I am not a physicist, so take this with a grain of salt. But I'll try to give a math guy's view of this.
A vector represents a physical arrow in a vector space, say a position (with respect to a given origin), or (more intrinsically) a velocity. A covector represents a linear operator on vectors that spits out a number. A vector plus a metric yields a covector. I.e. if A is a vector in V, then A.( ) is the corresponding covector in V*.
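To make that concrete, here is a minimal numpy sketch (the metric matrix G and all the numbers below are my own arbitrary choices, purely for illustration): a metric turns the vector A into the covector A.( ), a linear map that eats a vector and spits out a number.

```python
import numpy as np

# Illustrative metric on V = R^3: any symmetric positive-definite matrix.
G = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])
A = np.array([1.0, -2.0, 0.5])       # a vector in V

# The covector A.( ) : V -> R, i.e. "dot with A" under the metric G.
A_dot = lambda v: A @ G @ v

v = np.array([0.0, 1.0, 4.0])
print(A_dot(v))                      # a single number: -3.0
```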
Some objects in physics are naturally vectors and some are naturally covectors. If we have a basis e1,...,en of our vector space V, we use it to represent our vectors, i.e. to know where they are. Thus we write A in terms of the e1,...,en to locate A. If we have a covector in V* such as the dot product operator A.( ), we would like to write it in terms of a nice basis of V*.
Now if e1,...,en is a basis of V, then the dot products e1.( ), ..., en.( ) will be a basis of V*, but not a particularly useful one unless {ej} is orthonormal. I.e. if we are given a vector A, written in terms of the ej, then A.( ) can be written in terms of the basis ej.( ), but this won't give us much information about the operation A.( ). I.e. we would like to be able to evaluate A.v for any v, if we know the expansion of v as a1e1+...+anen. But this is not so easy, since when {ej} is not orthonormal, we don't even know how e1 acts on itself, or on any of the other ej. Thus writing A.( ) in terms of the ej.( ) is unhelpful, even for computing A.e1. Of course we can do it, but the answer will be complicated, and we want it to be simple.
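A quick numeric illustration of this (again with made-up numbers): applying e1.( ) to the basis vectors just returns entries of the Gram matrix, not the simple values 1, 0, ..., 0 that an orthonormal basis would give.

```python
import numpy as np

# Made-up non-orthonormal basis, as columns of E.
E = np.array([[2.0, 1.0],
              [0.0, 1.0]])           # e1 = (2,0), e2 = (1,1)
G = E.T @ E                          # Gram matrix: G[i,j] = ei.ej

# e1.( ) applied to e1 and e2 gives the first row of G:
print(G[0])                          # [4. 2.], not the simple [1. 0.]
```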
Thus if we are really interested in the operator A.( ) rather than the physical vector A, we want to write A.( ) in terms of a nicer basis of V*. That means finding a basis of V* whose action on the basis {ej} is very simple. This is given by the basis {e^j.( )} for V*. I.e. given the vector basis e1,...,en, and a vector A = a1e1+...+anen, to understand the operator A.( ), we find the "dual basis" e^1,...,e^n for V, and rewrite A in terms of this basis, A = c1e^1+...+cne^n.
Then we have the expansion A.( ) = c1(e^1.( )) + ... + cn(e^n.( )), and this is very useful for computing the operator A.( ) on vectors. I.e. given a vector B = b1e1+...+bnen, A.B is just c1b1+...+cnbn.
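Here is a numpy sketch of the whole construction (toy numbers of my own, in R^2 with the ordinary dot product as the metric). The dual basis vectors come out as the columns of the inverse transpose of the basis matrix, the coefficients are c = Ga, and A.B collapses to c1b1+...+cnbn:

```python
import numpy as np

# A non-orthonormal basis of R^2, as columns of E; the ordinary
# dot product plays the role of the metric.
E = np.array([[2.0, 1.0],
              [0.0, 1.0]])           # e1 = (2,0), e2 = (1,1)
G = E.T @ E                          # Gram matrix: G[i,j] = ei.ej

# Dual basis vectors e^j, as columns: defined by e^j.ei = delta_ij,
# which forces E_dual^T @ E = I.
E_dual = np.linalg.inv(E.T)

a = np.array([3.0, -1.0])            # A = a1 e1 + a2 e2
b = np.array([0.5, 2.0])             # B = b1 e1 + b2 e2
A, B = E @ a, E @ b

c = G @ a                            # c_j = A.ej, the dual-basis coefficients

# Same arrow A, two expansions: a1 e1 + a2 e2 = c1 e^1 + c2 e^2.
assert np.allclose(A, E_dual @ c)

# And the payoff: A.B is just c1 b1 + c2 b2.
assert np.isclose(A @ B, c @ b)
```

I.e. the coefficients cj are just the dot products A.ej, which is exactly why they are so convenient for dotting.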
For comparison, using the expansion A = a1e1+...+anen, we have
A.B = a1b1(e1.e1) + a1b2(e1.e2) + ... + a1bn(e1.en)
    + a2b1(e2.e1) + a2b2(e2.e2) + ... + a2bn(e2.en)
    + ...
    + anb1(en.e1) + ... + anbn(en.en).
So kind of a long mess: n^2 terms.
It's the difference between using an arbitrary matrix and a diagonal matrix to compute the dot product; in fact, just the identity matrix. I.e. given a basis {ej} for V, the dot product on V is given by a matrix, and finding the dual basis {e^j} essentially diagonalizes that matrix, and even makes all diagonal elements equal to 1.
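In the same toy numbers as above, the n^2-term expansion is exactly the matrix expression a^T G b, and rewriting A in the dual basis replaces G by the identity:

```python
import numpy as np

# Same illustrative basis as before.
E = np.array([[2.0, 1.0],
              [0.0, 1.0]])
G = E.T @ E                          # the dot product as a matrix
a = np.array([3.0, -1.0])
b = np.array([0.5, 2.0])
c = G @ a                            # A rewritten in the dual basis

# The n^2-term mess vs. the identity-matrix form:
assert np.isclose(a @ G @ b, c @ np.eye(2) @ b)
```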
To repeat, given A = a1e1+...+anen in V, if what we are really interested in is computing dot products A.B for all B in V, then we want to first rewrite A in terms of the dual basis e^1,...,e^n, as A = c1e^1+...+cne^n.
Then writing A in terms of the "dual basis" {e^j}, and writing B in terms of the original basis {ej}, makes it easy to compute A.B.
So indeed a1e1+...+anen, and c1e^1+...+cne^n are the same object, namely both equal the vector A in V, but the second representation is of interest for studying not so much the physical object A, but its role as an operator A.( ).
On the other hand, even though we may be interested more in the operation of dotting with a given vector, there may still be some meaning in the physical position of that vector, since of course that position does determine its action when dotting. E.g. although the gradient of a function of two variables is, in modern terms, an operator on tangent vectors, telling you the derivative of the function in that direction, still the physical vector whose dot product gives you that directional derivative can be visualized as a vector pointing in the direction where the function increases fastest.
I.e. given a (curve C with) velocity vector v at p, in the domain of f(x,y) in the plane, then grad_pf is the covector whose value at v is grad_pf(v) = the slope of the graph at the point above p, in the direction of v (and multiplied by the length of v). But it also equals the dot product del_pf.v, where del_pf is the vector in the plane whose coordinates are the partials of f. The vector del_pf lives in the plane V, and the covector grad_pf lives in the dual plane V*. This is all confusing of course, since we are used to calling the vector del_pf also "grad f at p", i.e. we speak of "the gradient vector". Thus the gradient as a covector is the operation of dotting with the gradient vector.
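Here is that picture in numpy (the function f(x,y) = x^2 y, the point p, and the velocity v are arbitrary choices of mine): the covector grad_pf applied to v, computed as the derivative of f along the line p + t v, agrees with the dot product del_pf.v.

```python
import numpy as np

f = lambda x, y: x**2 * y            # an arbitrary example function
p = np.array([1.0, 2.0])             # the point p
del_f = np.array([2 * p[0] * p[1],   # del_p f = (df/dx, df/dy) at p
                  p[0] ** 2])

v = np.array([0.6, 0.8])             # a velocity vector at p

# grad_p f(v) = d/dt of f(p + t v) at t = 0, estimated by a central difference:
h = 1e-6
grad_f_of_v = (f(*(p + h * v)) - f(*(p - h * v))) / (2 * h)

# ...and it equals the dot product del_p f . v:
assert np.isclose(grad_f_of_v, del_f @ v)    # both are 3.2 here
```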
So always keep in mind there are two different concepts: w a vector, and w.( ) a covector, the operation of dotting with that vector. The confusion arises by dropping the notation w.( ) for the dot product, and thinking of w and w.( ) as the same object, which they are not.
In particular, the elements e^j of the "dual basis" are not covectors, nor are they a basis of the dual space. Rather, given a basis e1,...,en for the space V, with "dual basis" e^1,...,e^n, the operators e^1.( ), ..., e^n.( ), i.e. the corresponding covectors, give the corresponding good basis of the dual space V*.
In modern terms, it is the operators e^1.( ), ..., e^n.( ) which are properly called the dual basis for e1,...,en. I.e. in modern language, the dual basis of a basis for V is a basis for V*. One can state the relation between the bases {ej} and {e^j} this way: given a basis {ej} for V, find the basis {e^j} for V whose dot products yield the correct, intrinsic, dual basis for {ej} in V*.
Note: if you keep the basis {ej} but change the metric, the basis {e^j} will also change (since the correspondence between V and V* will change), but the intrinsic dual basis for V* given by the operators {e^j.( )} will not change. I.e. if we change the metric, the "dual basis" of {ej} in V will change to some other basis {d^j} for V, but the resulting basis for V*, given by the operators {d^j.( )} (where the dot is now the new metric), will be the same as the old basis {e^j.( )}.
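A last numpy sketch of that invariance (the two metrics below are random symmetric positive-definite matrices, and everything is written in coordinates relative to the fixed basis {ej}): the dual basis vectors move with the metric, but the covectors they produce are the same coordinate-projection functionals either way.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_metric(n):
    # Any symmetric positive-definite matrix serves as a metric.
    M = rng.normal(size=(n, n))
    return M.T @ M + np.eye(n)

def dual_functionals(G):
    D = np.linalg.inv(G)             # columns of D: the dual basis vectors
                                     # d^j, in coordinates w.r.t. {ej}
    # The covector d^j.( ) sends a vector with coordinates v to (G d^j).v,
    # so its coefficient row is the j-th column of G @ D, transposed.
    return (G @ D).T

G1, G2 = random_metric(3), random_metric(3)

# The dual basis *vectors* differ between the two metrics...
assert not np.allclose(np.linalg.inv(G1), np.linalg.inv(G2))

# ...but the resulting *covectors* are identical: each one just reads off
# a coordinate of its input, regardless of the metric.
assert np.allclose(dual_functionals(G1), np.eye(3))
assert np.allclose(dual_functionals(G2), np.eye(3))
```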