In linear algebra, you studied linear operators on vector spaces. These are functions that map vectors to vectors in a linear way: T is linear if, for any two vectors u and v in V and any scalar r, T(ru + v) = rT(u) + T(v).
A physical example of a linear operator is rotation. If we rotate the resultant of two vectors (found by drawing a parallelogram), we expect the result to be the same as if we rotated each individual vector by the same amount and then found the resultant.
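As a quick numerical sanity check of the rotation example, here is a minimal Python sketch. The helper names (rotate, add, scale) and the particular vectors, scalar, and angle are illustrative choices of mine; any other values should behave the same way.

```python
import math

def rotate(v, theta):
    """Rotate the 2D vector v counterclockwise by theta radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def add(u, v):
    """Resultant (vector sum) of u and v."""
    return (u[0] + v[0], u[1] + v[1])

def scale(r, v):
    """Scalar multiple r * v."""
    return (r * v[0], r * v[1])

# Arbitrary illustrative values (assumptions, not from the post).
u, v, r, theta = (2.0, 1.0), (-1.0, 3.0), 2.5, math.radians(40)

lhs = rotate(add(scale(r, u), v), theta)                  # T(ru + v)
rhs = add(scale(r, rotate(u, theta)), rotate(v, theta))   # rT(u) + T(v)

# The two sides should agree up to floating-point rounding.
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))
print(same)
```

Rotating the combined vector and combining the rotated vectors give the same answer, which is exactly the condition T(ru + v) = rT(u) + T(v).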
A related idea is that of a linear map from a vector space to the set of real numbers. Useful real numbers that we associate with geometric objects include areas and volumes. For the area of a parallelogram spanned by two vectors, it can be shown that if A(u, v) is the function giving us the area, it must be linear in u and linear in v separately. That is, A(ru + v, w) = rA(u, w) + A(v, w) and A(u, rv + w) = rA(u, v) + A(u, w). A is then called a multilinear map, which is shortened to tensor.
It should be apparent from linear algebra that since A is linear in each vector argument, once we choose a coordinate basis, we can write components for A, and these components will differ in different coordinate systems.
In particular, if we choose (1, 0) = e1 and (0, 1) = e2 as a basis with Cartesian coordinates, then A((x1, y1), (x2, y2)) = A(x1e1 + y1e2, x2e1 + y2e2) = x1A(e1, x2e1 + y2e2) + y1A(e2, x2e1 + y2e2) = x1x2A(e1, e1) + x1y2A(e1, e2) + y1x2A(e2, e1) + y1y2A(e2, e2).
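If we reasonably define the 2-dimensional area spanned by a vector and itself to be 0, then A(e1, e1) = A(e2, e2) = 0 and also A(e1 + e2, e1 + e2) = 0. Expanding the latter by linearity gives A(e1, e1) + A(e1, e2) + A(e2, e1) + A(e2, e2) = A(e1, e2) + A(e2, e1) = 0, which means A(e1, e2) = -A(e2, e1).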
By choosing to represent area with a tensor, we are forced into contemplating negative areas, which correspond to changing the order of the vectors. This is useful in determining the orientation of the vectors with respect to a chosen coordinate system.
In our chosen system, if we choose units such that A(e1, e2) = 1, we have all 4 components for A: A(e1, e1) = 0, A(e1, e2) = 1, A(e2, e1) = -1, and A(e2, e2) = 0. This can be listed more concisely as A11 = 0, A12 = 1, A21 = -1, and A22 = 0 if we wish to put A in matrix form.
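To make the component form concrete, here is a small Python sketch that stores the four components listed above and evaluates A(u, v) as a double sum over them. The function name area and the sample vectors are illustrative choices of mine, not standard notation.

```python
# Components of the area tensor in the basis e1, e2, with units chosen
# so that A(e1, e2) = 1. Python indexes from 0, so A[0][1] is A12.
A = [[0, 1],
     [-1, 0]]

def area(u, v):
    """Signed area A(u, v) = sum over i, j of A[i][j] * u[i] * v[j]."""
    return sum(A[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

# Illustrative integer vectors and scalar (assumptions).
u, v, w, r = (3, 1), (1, 2), (-2, 5), 4

print(area(u, v))   # 5, i.e. x1*y2 - y1*x2 = 3*2 - 1*1
print(area(v, u))   # -5, swapping the arguments flips the sign
print(area(u, u))   # 0, a vector spans no area with itself

# Linearity in the first argument: A(ru + v, w) = r A(u, w) + A(v, w).
ru_plus_v = (r * u[0] + v[0], r * u[1] + v[1])
print(area(ru_plus_v, w) == r * area(u, w) + area(v, w))   # True
```

With these components the double sum collapses to x1y2 - y1x2, which you may recognize as the 2 by 2 determinant.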
With a volume tensor in 3 dimensions, we use 3 vectors, and thus have 3 indexes instead of 2, one index per vector. For this reason, this tensor, like any other tensor that takes more than two vectors, cannot be put into a flat 2-dimensional matrix form. Such tensors are best dealt with in index form, that is, by working with their components in a specific coordinate system.
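Here is a sketch of what the index form looks like for the volume tensor. I am assuming, by analogy with the area case, that the volume is 0 whenever a vector is repeated and that units are chosen so V(e1, e2, e3) = 1. Under those assumptions the components are +1 or -1 on permutations of the three indexes and 0 elsewhere. The names volume_components, V and volume are my own.

```python
from itertools import permutations

def volume_components():
    """Build the 3 x 3 x 3 nested list of components V[i][j][k].

    Assumes V(e1, e2, e3) = 1 and that V vanishes when an index repeats,
    so each permutation of (0, 1, 2) gets the sign of that permutation.
    """
    comps = [[[0] * 3 for _ in range(3)] for _ in range(3)]
    for i, j, k in permutations(range(3)):
        inversions = (i > j) + (i > k) + (j > k)
        comps[i][j][k] = -1 if inversions % 2 else 1
    return comps

V = volume_components()

def volume(u, v, w):
    """Signed volume: sum over i, j, k of V[i][j][k] * u[i] * v[j] * w[k]."""
    return sum(V[i][j][k] * u[i] * v[j] * w[k]
               for i in range(3) for j in range(3) for k in range(3))

# A 2 x 3 x 4 box aligned with the axes (illustrative values).
a, b, c = (2, 0, 0), (0, 3, 0), (0, 0, 4)

print(volume(a, b, c))   # 24
print(volume(b, a, c))   # -24, swapping two vectors flips the sign
print(volume(a, a, c))   # 0, a repeated vector gives a flat box
```

The components need three indexes, which is why a nested list (or a 3-dimensional array) is needed here rather than a matrix.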
When we move the parallelogram around the plane, attaching the vector tails to points other than the origin, the area of the parallelogram does not change. On a curved surface, however, it is not hard to see that a parallelogram attached to each point may actually have a different area, and thus an area tensor field should assign a different area tensor to each point.
To specify A as an object (i.e., the area of the parallelogram spanned by two vectors) independent of its target's location in the plane, it is thus best to use a tensor field: a function which assigns a tensor A to each point in the plane. In Cartesian coordinates, the tensor field A(p) is constant: it returns the same tensor A with the same components at each point p. Physicists tend to blur the distinction between tensor fields and tensors, which can confuse the layperson consulting non-physics oriented mathematical texts.
Talking about parallelograms changing area when attached to different points requires us to have a concept of "the same vectors attached to different points". This is where we get the concept of the tangent space at a point (which is what tensors act upon) and the machinery of parallel transport, which says how the vectors in the tangent spaces at different points are connected. Flat Euclidean space has a flat connection: the components of a vector attached to (in the tangent space of) a point remain the same if we parallel transport it to (the tangent space of) a different point.
There is a bit more going on. For example, raising and lowering indexes has to do with the dual tangent space, or covectors, and so on. There are also many tensors where you do not actually want to get a real number by giving the tensor its full number of vectors, but instead use it to transform vectors from one space to another. In general, a tensor need not act on two vectors in the same vector space; it may act on many vectors in many different vector spaces. The important thing to remember is that a tensor is just a multilinear map, and thus once you know linear algebra, you already have a good understanding of tensor algebra.
One useful tensor tells us how a vector changes in a curved space when we parallel transport it around a small parallelogram, using that space's connection, back to its original tangent space. For example, if you move a tangent vector around a small parallelogram on the surface of a sphere, maintaining its angle with the curve representing each side of the parallelogram, the vector no longer points in the same direction when it gets back to its starting point. This tensor takes 3 vectors, the original vector and the two vectors defining the parallelogram, and gives us a vector measuring (to leading order) how the transported vector differs from the original; the result is zero if the space is flat. This particular tensor is the Riemann curvature tensor, which you will be using a bit.
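The sphere example can be checked numerically. The sketch below is not the Riemann tensor itself, and instead of a small parallelogram it uses a large closed loop that is easy to compute by hand: a triangle of great-circle arcs covering one octant of the unit sphere. It relies on the fact that, along a great-circle arc, parallel transport on the sphere is the same as rigidly rotating the vector about the axis perpendicular to the plane of the arc. The function names and the choice of loop are mine.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def transport(p, q, v):
    """Parallel transport tangent vector v from p to q on the unit sphere.

    Moves along the shorter great-circle arc by rotating v about the unit
    axis along p x q through the angle between p and q (Rodrigues' rotation
    formula). Assumes p and q are unit vectors that are neither equal nor
    opposite.
    """
    axis = cross(p, q)
    n = math.sqrt(dot(axis, axis))
    k = tuple(x / n for x in axis)
    theta = math.atan2(n, dot(p, q))
    c, s = math.cos(theta), math.sin(theta)
    kxv, kv = cross(k, v), dot(k, v)
    return tuple(v[i] * c + kxv[i] * s + k[i] * kv * (1 - c) for i in range(3))

# Loop: north pole -> equator -> a quarter turn along the equator -> north pole.
N, P, Q = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

v0 = (1.0, 0.0, 0.0)   # tangent vector at the north pole
v = v0
for start, end in [(N, P), (P, Q), (Q, N)]:
    v = transport(start, end, v)

# Angle between the original and the transported vector, in degrees.
angle = math.degrees(math.acos(max(-1.0, min(1.0, dot(v0, v)))))
print(v)       # close to (0, 1, 0), up to rounding
print(angle)   # close to 90
```

The vector comes back rotated by a quarter turn, which in radians (pi/2) equals the area enclosed by the loop on the unit sphere. On a flat plane the same procedure would return the vector unchanged.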
Any good text on differential geometry or general relativity should give you a more than adequate introduction to tensor analysis, which is a bit more involved, and in far more detail than can reasonably be typed in this post. I would recommend Spivak's texts for a complete view, but I also like "Differential Forms and Connections", although it may be a bit too pure mathematical. I've found that Schutz's "A First Course in General Relativity" has a great section on tensors and motivates the Riemann tensor and connections well.