A good place to start studying tensors is chapter 3 of "A first course in general relativity" by Schutz. Another place that looks good is chapter 8 of "Linear algebra done wrong" by Treil (I know that the rest of the book is very good).
I'm going to quote myself. This was first posted in
this thread, posts #11, #23 and #24.
Post #11:
The components of a vector ##v## with respect to an ordered basis ##(e_1,\dots,e_n)## are the unique real numbers ##v^1,\dots,v^n## such that ##v=\sum_{i=1}^n v^i e_i##.
I will elaborate a bit...
Let ##V## be an n-dimensional vector space over ##\mathbb R##. Let ##V^*## be the set of linear functions from ##V## to ##\mathbb R##. Define addition and scalar multiplication on ##V^*## by ##(f+g)(v)=f(v)+g(v)## and ##(af)(v)=a(f(v))## for all ##v\in V## and all ##a\in\mathbb R##. These definitions turn ##V^*## into a vector space. The ##V^*## defined this way is called the dual space of ##V##.
Let ##(e_i)_{i=1}^n## be an ordered basis for ##V##. (The notation denotes the n-tuple ##(e_1,\dots,e_n)##). It's conventional to put these indices downstairs, and to put the indices on components of vectors in ##V## upstairs. For example, if ##v\in V##, then we write ##v=v^i e_i##. I'm using the summation convention here, so the right-hand side really means ##\sum_{i=1}^n v^i e_i##.
For each ##i\in\{1,\dots,n\}##, we define ##e^i\in V^*## by ##e^i(e_j)=\delta^i_j##. It's not hard to show that ##(e^i)_{i=1}^n## is an ordered basis for ##V^*##. The ordered basis ##(e^i)_{i=1}^n## is called the dual basis of ##(e_i)_{i=1}^n##. It's conventional to put the indices on components of vectors in ##V^*## downstairs. For example, if ##f\in V^*##, then we write ##f=f_ie^i##.
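A quick numerical sketch of the dual basis, using NumPy and a hypothetical basis for ##\mathbb R^3## (the specific matrix is just an example I made up). If each basis vector ##e_j## is stored as column ##j## of a matrix ##E##, and a covector acts by a row-vector dot product, then the dual basis covectors are exactly the rows of ##E^{-1}##. (The last two lines also give away part (a) of the exercise below, so skip them if you want to work that out first.)

```python
import numpy as np

# A hypothetical ordered basis for R^3: basis vector e_j is column j of E.
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Row i of E^{-1} applied to column j of E gives (E^{-1}E)^i_j = delta^i_j,
# so the rows of E^{-1} represent the dual basis covectors e^i.
E_dual = np.linalg.inv(E)

# Verify e^i(e_j) = delta^i_j.
assert np.allclose(E_dual @ E, np.eye(3))

# Applying the dual basis to a vector recovers its components: v = v^i e_i.
v = np.array([2.0, 3.0, 5.0])
components = E_dual @ v
assert np.allclose(E @ components, v)
```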
Exercise: Find an interesting way to rewrite each of the following expressions:
a) ##e^i(v)##
b) ##f(e_i)##
Post #23:
If ##(e_i)_{i=1}^n## and ##(e_i')_{i=1}^n## are ordered bases for ##V##, then for all ##i##, there must exist numbers ##M_i^j## such that ##e_i'=M_i^j e_j##. (In other words, we can always write the new basis vectors as linear combinations of the old).
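With the column convention from before, ##e_i'=M_i^j e_j## says that column ##i## of the new basis matrix is ##E## times column ##i## of ##M##, i.e. ##E'=EM##, so ##M## can be found by solving a linear system. A small sketch with two hypothetical bases for ##\mathbb R^3## (both matrices are made-up examples):

```python
import numpy as np

# Two hypothetical ordered bases for R^3, stored with e_j as column j.
E     = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])   # old basis
E_new = np.array([[1.0, 0.0, 2.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])   # new basis e_i'

# e_i' = M_i^j e_j in matrix form is E_new = E @ M, so M = E^{-1} E_new.
M = np.linalg.solve(E, E_new)

# Each new basis vector is the promised linear combination of the old ones.
assert np.allclose(E @ M, E_new)
```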
Post #24:
Now let ##M## be the matrix such that for all ##i,j##, the component on row ##i##, column ##j## is ##M^i_j##. Recall that the definition of matrix multiplication is ##(AB)^i_j=A^i_k B^k_j##. Let ##v\in V## be arbitrary. We have
$$v=v^j e_j=v^i{}' e_i{}' =v^i{}' M^j_i e_j,$$ and therefore ##v^j=v^i{}' M^j_i##. This implies that
$$(M^{-1})^k_j v^j =v^i{}' (M^{-1})^k_j M^j_i =v^i{}' (M^{-1}M)^k_i =v^i{}' \delta^k_i =v^k{}'.$$ So the n-tuple of components ##(v^1,\dots,v^n)## transforms according to
$$v^i{}'= (M^{-1})^i_j v^j.$$ The fact that the matrix that appears here is ##M^{-1}## rather than ##M## is the reason why an n-tuple of components of an element of ##V## is said to transform contravariantly. The terms "covariant" and "contravariant" should be interpreted respectively as "the same as the ordered basis" and "the opposite of the ordered basis".
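The contravariant rule ##v^i{}'=(M^{-1})^i_j v^j## can be checked numerically. In the column convention, the basis transforms as ##E'=EM## while the components come out multiplied by ##M^{-1}##, opposite to the basis. The basis and the matrix ##M## below are made-up examples:

```python
import numpy as np

E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])      # old basis, e_j as column j
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])      # hypothetical change-of-basis matrix
E_new = E @ M                        # e_i' = M^j_i e_j

v = np.array([1.0, -2.0, 4.0])       # some vector in R^3

v_comp_old = np.linalg.solve(E, v)       # v^j      (solves E x = v)
v_comp_new = np.linalg.solve(E_new, v)   # v^i'

# Components transform with M^{-1}, i.e. contravariantly:
assert np.allclose(v_comp_new, np.linalg.solve(M, v_comp_old))
```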
It's easy to see that the dual basis transforms contravariantly. Let ##N## be the matrix such that ##e^i{}' =N^i_j e^j##. We have
$$\delta^i_j =e^i{}'(e_j{}')=N^i_k e^k (M_j^l e_l) = N^i_k M_j^l e^k(e_l) =N^i_k M_j^l \delta^k_l =N^i_k M_j^k =(NM)^i_j.$$ This implies that ##N=M^{-1}##. So we have
$$e^i{}' =(M^{-1})^i_j e^j.$$ Now we can easily see that an n-tuple of components of an arbitrary ##f\in V^*## transforms covariantly. We can prove it in a way that's very similar to how we determined the transformation properties of the n-tuple of components of ##v##, but the simplest way is to use the formula ##f_i=f(e_i)##, which I left as an easy exercise in post #11.
$$f_i{}' =f(e_i{}')=f(M_i^j e_j) =M_i^j f(e_j)= M_i^j f_j.$$ Note that what's "transforming" under a change of ordered basis in these examples are n-tuples of real numbers or n-tuples of vectors (in ##V## or ##V^*##). In the case of a tensor of type ##(k,l)##, what's transforming isn't the tensor, but its ##n^{k+l}##-tuple of components with respect to the ordered basis ##(e_i)_{i=1}^n##.
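The two transformation rules fit together so that the number ##f(v)=f_iv^i## doesn't depend on the ordered basis, which is a good sanity check to run numerically. In matrix form, ##f_i{}'=M_i^j f_j## is ##f'=M^Tf## and ##v'=M^{-1}v##, so ##f'\cdot v'=f^TMM^{-1}v=f\cdot v##. The matrix and component values below are made-up examples:

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])      # hypothetical change-of-basis matrix

f_old = np.array([1.0, 0.0, -3.0])   # components f_j of some f in V*
v_old = np.array([2.0, 5.0, 1.0])    # components v^j of some v in V

# Covariant rule f_i' = M_i^j f_j (with the basis); contravariant rule
# v^i' = (M^{-1})^i_j v^j (against the basis).
f_new = M.T @ f_old
v_new = np.linalg.solve(M, v_old)

# The scalar f(v) = f_i v^i is basis-independent:
assert np.isclose(f_new @ v_new, f_old @ v_old)
```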
Of course, one can take the point of view that these ##n##-tuples or ##n^{k+l}##-tuples are the tensors, or rather, that the function that associates tuples with ordered bases is what should be called a tensor. I'm not a fan of that view myself. I consider it inferior and obsolete. However, there isn't anything fundamentally wrong with it. The real problem is that it's so hard to find an explanation of this view that isn't unbelievably bad.