I believe that the words covariant and contravariant refer to the way the components of a vector change when the coordinate system changes. Suppose (x^1,\ldots,x^n), (\tilde{x}^1,\ldots,\tilde{x}^n) are two overlapping coordinate systems on a manifold M, both defined around a point p of M. Suppose that for each coordinate system around p, there is a rule that associates to the coordinates (x^1,\ldots,x^n) of p a set of n numbers (a vector in R^n, then) (v^1,\ldots,v^n).
If the components v_i, \tilde{v}_i of the vector (written with lower indices, by convention) corresponding to two coordinate systems (x^1,\ldots,x^n), (\tilde{x}^1,\ldots,\tilde{x}^n) around p are related like so:
\tilde{v}_i=\sum_jv_j\frac{\partial x^j}{\partial \tilde{x}^i}
then the vector v=(v_1,\ldots,v_n), which we consider the same as the vector \tilde{v}=(\tilde{v}_1,\ldots,\tilde{v}_n), is called a covariant vector.
If, on the other hand, the components are related like so:
\tilde{v}^i=\sum_jv^j\frac{\partial \tilde{x}^i}{\partial x^j}
then the vector v=(v^1,\ldots,v^n), which we consider the same as the vector \tilde{v}=(\tilde{v}^1,\ldots,\tilde{v}^n), is called a contravariant vector.
So why the names? Probably because the formula taking you from v to \tilde{v} for a covariant vector involves the rate at which x changes with respect to \tilde{x}, while for a contravariant vector it is the contrary: it involves the rate at which \tilde{x} changes with respect to x.
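To make the two laws concrete, here is a small worked example. It is only a sketch under assumptions that are not in the text above: a one-dimensional manifold with the hypothetical change of coordinates \tilde{x}=2x (think of measuring the same length in units half as large), a made-up curve x(t)=3t, and a made-up function f=5x.

```latex
\[
\begin{aligned}
\tilde{x} = 2x \quad &\Longrightarrow\quad \frac{\partial\tilde{x}}{\partial x} = 2, \qquad \frac{\partial x}{\partial\tilde{x}} = \frac{1}{2},\\
\text{velocity of } x(t)=3t:\quad & v = 3, \qquad \tilde{x}(t) = 6t \ \Rightarrow\ \tilde{v} = 6 = v\,\frac{\partial\tilde{x}}{\partial x},\\
\text{derivative of } f = 5x = \tfrac{5}{2}\tilde{x}:\quad & w = 5, \qquad \tilde{w} = \frac{5}{2} = w\,\frac{\partial x}{\partial\tilde{x}}.
\end{aligned}
\]
```

The velocity component is multiplied by \partial\tilde{x}/\partial x (contravariant), the derivative of f by the inverse factor \partial x/\partial\tilde{x} (covariant), and the product w v=\tilde{w}\tilde{v}=15 is the same in both coordinate systems.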
Examples:
(1) Suppose we have a curve on an n-manifold M passing through the point p at the time t=0. Then for each coordinate system around p, there corresponds a curve in R^n, and we may differentiate this curve at t=0 to obtain a vector in R^n. If you carry out the computation, you will discover that this is an example of a contravariant vector.
(2) For f a smooth function on the manifold, given a coordinate system around p, you can compute the gradient of the coordinate representation of f at p. This is an example of a covariant vector (a covector, for short).
(3) [If you know some classical mechanics] If M=Q is the configuration manifold of a physical system and L:TQ\rightarrow \mathbb{R} is a Lagrangian function, then for each chart (q^1,\ldots,q^n) of Q (i.e. each set of generalized coordinates) the generalized momenta are defined by
p_i:=\frac{\partial L(q^1,\ldots,q^n,v^1,\ldots,v^n)}{\partial v^i}
This too defines a covector.
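Examples (1) and (2) can be checked numerically. The following is a minimal sketch, assuming the plane with Cartesian coordinates (x, y) and polar coordinates (r, theta) as the two coordinate systems. The curve, the function f, and all helper names are hypothetical choices made for illustration, and derivatives are approximated by central finite differences rather than computed exactly.

```python
import math

H = 1e-6  # finite-difference step (an arbitrary small value)


def to_polar(x, y):
    """Change of coordinates (x, y) -> (r, theta), valid away from the origin."""
    return math.hypot(x, y), math.atan2(y, x)


def to_cartesian(r, theta):
    """Inverse change of coordinates (r, theta) -> (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


def curve(t):
    """Hypothetical curve in Cartesian coordinates; passes through p = (1, 1) at t = 0."""
    return 1.0 + 2.0 * t, 1.0 + t + t * t


def f(x, y):
    """Hypothetical smooth function, written in Cartesian coordinates."""
    return x * x * y


def shifted(point, i, h):
    """Copy of point with its i-th coordinate moved by h."""
    moved = list(point)
    moved[i] += h
    return moved


def derivative(g, t):
    """Central difference of a tuple-valued function g of one variable."""
    return [(a - b) / (2 * H) for a, b in zip(g(t + H), g(t - H))]


def gradient(g, point):
    """Central-difference gradient of a scalar function g at point."""
    return [
        (g(*shifted(point, i, H)) - g(*shifted(point, i, -H))) / (2 * H)
        for i in range(len(point))
    ]


def jacobian(phi, point):
    """Matrix J with J[i][j] = d(phi^i)/d(point^j), by central differences."""
    n = len(point)
    cols = [
        [(a - b) / (2 * H)
         for a, b in zip(phi(*shifted(point, j, H)), phi(*shifted(point, j, -H)))]
        for j in range(n)
    ]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


p_cart = curve(0.0)
p_polar = to_polar(*p_cart)
d_polar_d_cart = jacobian(to_polar, p_cart)       # d(tilde x^i)/d(x^j)
d_cart_d_polar = jacobian(to_cartesian, p_polar)  # d(x^j)/d(tilde x^i)

# Example (1): velocity of the curve, computed in each coordinate system.
velocity_cart = derivative(curve, 0.0)
velocity_polar_direct = derivative(lambda t: to_polar(*curve(t)), 0.0)
# Contravariant law: tilde v^i = sum_j v^j d(tilde x^i)/d(x^j).
velocity_polar_predicted = [
    sum(d_polar_d_cart[i][j] * velocity_cart[j] for j in range(2)) for i in range(2)
]

# Example (2): gradient of f, computed in each coordinate system.
grad_cart = gradient(f, p_cart)
grad_polar_direct = gradient(lambda r, th: f(*to_cartesian(r, th)), p_polar)
# Covariant law: tilde w_i = sum_j w_j d(x^j)/d(tilde x^i).
grad_polar_predicted = [
    sum(grad_cart[j] * d_cart_d_polar[j][i] for j in range(2)) for i in range(2)
]

print(velocity_polar_direct, velocity_polar_predicted)
print(grad_polar_direct, grad_polar_predicted)
```

In each printed pair the two lists should agree up to finite-difference error: the velocity needs the Jacobian of the change of coordinates, the gradient needs the Jacobian of the inverse change.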
Now, you will often read things like "a contravariant vector is an element of the tangent space and a covariant vector is an element of the cotangent space". What is meant by that is the following. Given a point p on a manifold, the tangent space at p is the vector space T_pM consisting of all linear maps D:C^{\infty}(M)\rightarrow\mathbb{R} satisfying the Leibniz rule "at p" (i.e. D(fg)=D(f)g(p)+f(p)D(g)). It turns out that for a coordinate system (x^1,\ldots,x^n) around p, there is a natural basis for T_pM which we denote (by no accident) by (\partial/\partial x^1|_p,\ldots, \partial/\partial x^n|_p). So a general element of T_pM is of the form
v=\sum_{i}v^i\left.\frac{\partial}{\partial x^i}\right|_p
and the vector (v^1,\ldots,v^n) is contravariant. Indeed, if (\tilde{x}^1,\ldots,\tilde{x}^n) is another coordinate system around p, then by the chain rule
v=\sum_{i}v^i\left.\frac{\partial}{\partial x^i}\right|_p=\sum_{i}v^i\left(\sum_j \frac{\partial\tilde{x}^j}{\partial x^i}(p)\left.\frac{\partial}{\partial \tilde{x}^j}\right|_p\right)=\sum_j\left(\sum_iv^i\frac{\partial\tilde{x}^j}{\partial x^i}(p)\right)\left.\frac{\partial}{\partial \tilde{x}^j}\right|_p
So, given any contravariant vector (v^1,\ldots,v^n) at p associated to a coordinate system (x^1,\ldots,x^n), you can identify (v^1,\ldots,v^n) with the element
\sum_{i}v^i\left.\frac{\partial}{\partial x^i}\right|_p
of T_pM. Therefore, from the mathematical perspective of structures, the only contravariant vectors at p are the elements of T_pM, since any other can be naturally identified with one of these.
Similarly, if you consider T^*_pM, the dual space of T_pM, and if you denote (again, by no accident) by (dx^1_p,\ldots,dx^n_p) the basis of T^*_pM dual to the basis (\partial/\partial x^1|_p,\ldots, \partial/\partial x^n|_p) of T_pM, then any element of T^*_pM is of the form
v=\sum_{i}v_idx^i_p
and the vector (v_1,\ldots,v_n) is covariant. Indeed, if (\tilde{x}^1,\ldots,\tilde{x}^n) is another coordinate system around p, then by definition of the differential of a function
v=\sum_{i}v_idx^i_p=\sum_{i}v_i\left(\sum_j\frac{\partial x^i}{\partial \tilde{x}^j}d\tilde{x}_p^j\right)=\sum_j\left(\sum_iv_i\frac{\partial x^i}{\partial \tilde{x}^j}\right)d\tilde{x}^j_p
So, given any covariant vector (v_1,\ldots,v_n) at p associated to a coordinate system (x^1,\ldots,x^n), you can identify (v_1,\ldots,v_n) with the element
\sum_{i}v_idx^i_p
of T^*_pM. Therefore, from the mathematical perspective of structures, the only covariant vectors at p are the elements of T^*_pM, since any other can be naturally identified with one of these.
Tensors of rank (k,l) (read "tensor with k contravariant indices and l covariant indices") are defined similarly, as a rule associating an array of numbers T^{i_1,\ldots,i_k}_{j_1,\ldots,j_l} to each chart (x^1,\ldots,x^n) around p (where each index i and j takes any value between 1 and n) such that if \tilde{T}^{i_1,\ldots,i_k}_{j_1,\ldots,j_l} is the array of numbers associated with another coordinate system (\tilde{x}^1,\ldots,\tilde{x}^n), then
\tilde{T}^{i_1,\ldots,i_k}_{j_1,\ldots,j_l}=\sum_{i_1'}\ldots\sum_{j_l'}T^{i_1',\ldots,i_k'}_{j_1',\ldots,j_l'}\frac{\partial \tilde{x}^{i_1}}{\partial x^{i_1'}}\ldots\frac{\partial x^{j_l'}}{\partial \tilde{x}^{j_l}}
But each of those can be canonically identified with an element of
\bigotimes_{r=1}^kT_pM\otimes \bigotimes_{s=1}^lT^*_pM
so we often say that a tensor of rank (k,l) is just an element of the above tensor product.
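As a sanity check on the general rule, the lowest-rank cases can be written out. This is only a specialization of the definition; the dummy indices a, b are names chosen here for readability, and all partial derivatives are understood to be evaluated at p.

```latex
\[
\begin{aligned}
(k,l)=(1,0):\quad & \tilde{T}^{i} = \sum_{a} T^{a}\,\frac{\partial\tilde{x}^{i}}{\partial x^{a}} && \text{(contravariant vector)},\\
(k,l)=(0,1):\quad & \tilde{T}_{j} = \sum_{b} T_{b}\,\frac{\partial x^{b}}{\partial\tilde{x}^{j}} && \text{(covariant vector)},\\
(k,l)=(1,1):\quad & \tilde{T}^{i}_{j} = \sum_{a,b} T^{a}_{b}\,\frac{\partial\tilde{x}^{i}}{\partial x^{a}}\,\frac{\partial x^{b}}{\partial\tilde{x}^{j}}.
\end{aligned}
\]
```

In the (1,1) case, the array T^a_b=\delta^a_b transforms to itself, because the two Jacobian matrices are inverse to each other.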