Most textbooks take multiple chapters to explain them. I'd suggest Gravitation by Misner, Thorne and Wheeler, which I consider the ultimate physics book, but any textbook on differential geometry should cover them.
My attempt:
For any vector space V, you can get a dual vector space V^* by considering linear functions on V: f: V \rightarrow \mathbb{R} is in V^* iff f(aX + bY) = af(X) + bf(Y) for all X, Y \in V and all a, b \in \mathbb{R}. For finite-dimensional V the two spaces turn out to have the same dimension. Linear functions on V^* (the double dual V^{**}) can be identified with V itself, and this time in a canonical way: a vector X acts on a covector f simply as X(f) = f(X). We can define a "multiplication" between an element of V and an element of V^* to simply be the application of the function to the vector.
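A minimal sketch of these definitions, assuming V = \mathbb{R}^n with vectors stored as plain lists of components. The helper names (covector_from_components, is_linear, to_double_dual) are hypothetical, chosen only for illustration. Note that the map into V^{**} uses no extra structure at all, which is what "canonical" means here.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Covector = Callable[[Vector], float]  # an element of V*: a function V -> R


def covector_from_components(u: Sequence[float]) -> Covector:
    """Hypothetical helper: build a linear function V -> R from numbers u_mu."""
    def f(v: Vector) -> float:
        return sum(u_mu * v_mu for u_mu, v_mu in zip(u, v))
    return f


def is_linear(f: Covector, x: Vector, y: Vector, a: float, b: float,
              tol: float = 1e-12) -> bool:
    """Check f(aX + bY) == a f(X) + b f(Y) for one particular X, Y, a, b."""
    combo = [a * xi + b * yi for xi, yi in zip(x, y)]
    return abs(f(combo) - (a * f(x) + b * f(y))) < tol


def to_double_dual(v: Vector) -> Callable[[Covector], float]:
    """Canonical map V -> V**: the vector acts on a covector by being fed to it."""
    return lambda f: f(v)
```

For example, with f = covector_from_components([1.0, 2.0, 3.0]), both f([4.0, 5.0, 6.0]) and to_double_dual([4.0, 5.0, 6.0])(f) give 32.0: the "multiplication" is the same number whichever side you view as acting on the other.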
However, there is no canonical way to take an element of V and turn it into an element of V^*: there are many possible isomorphisms between the two spaces, and none of them is preferred in any way. We can pick one by choosing a metric, a symmetric, non-degenerate linear map g: V \rightarrow V^*. Combined with the multiplication defined earlier, this gives an inner product on V (not necessarily positive definite; the Minkowski metric, for instance, is not): \langle \cdot | \cdot \rangle : V \times V \rightarrow \mathbb{R}, with \langle u | v \rangle = g(u)(v).
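To make the "no preferred isomorphism" point concrete, here is a small sketch that jumps slightly ahead and represents a metric by a matrix of components in a fixed basis of \mathbb{R}^2. The two example metrics are arbitrary choices for illustration: both are symmetric and non-degenerate, yet they send the same vector to different covectors.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def lower(g: Matrix, v: Sequence[float]) -> List[float]:
    """The map g: V -> V*, in components: v_mu = g_{mu nu} v^nu."""
    return [sum(g[mu][nu] * v[nu] for nu in range(len(v))) for mu in range(len(g))]


def pair(u_lower: Sequence[float], v: Sequence[float]) -> float:
    """Apply a covector to a vector: u_mu v^mu."""
    return sum(a * b for a, b in zip(u_lower, v))


def inner(g: Matrix, u: Sequence[float], v: Sequence[float]) -> float:
    """<u|v> = g(u)(v): turn u into a covector using g, then apply it to v."""
    return pair(lower(g, u), v)


# Two equally valid (symmetric, non-degenerate) choices of metric.
euclid: Matrix = [[1.0, 0.0], [0.0, 1.0]]
stretched: Matrix = [[2.0, 0.0], [0.0, 3.0]]

v = [1.0, 1.0]
print(lower(euclid, v))     # [1.0, 1.0]
print(lower(stretched, v))  # [2.0, 3.0]: same vector, different covector
```

Each metric therefore defines its own inner product: inner(euclid, v, v) is 2.0 while inner(stretched, v, v) is 5.0.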
Now we can introduce a basis on V, \mathbf{e}_\mu (the subscript index here does *not* denote a component: each \mathbf{e}_\mu is a whole vector!), such that a vector can be expressed as \mathbf{v} = v^\mu \mathbf{e}_\mu, where the v^\mu are the components of the vector and summation over repeated indices is implied. We can also introduce a dual basis \mathbf{\theta}^\mu on V^*, defined by \mathbf{\theta}^\mu(\mathbf{e}_\nu) = \delta^\mu_\nu. Note that placing indices up or down is just a convention; swapping them throughout would have no effect on the maths being done. Now, given v in V and u in V^*, we multiply them as \mathbf{u}\mathbf{v} = u_\mu \mathbf{\theta}^\mu (v^\nu \mathbf{e}_\nu) = u_\mu v^\nu \delta^\mu_\nu = u_\mu v^\mu. Because we only ever multiply elements of V with elements of V^*, the basis elements always drop out, and we usually never write them down explicitly.

Our metric g is linear, so it too can be expressed as an array of numbers, its components in this basis: \mathbf{g} = g_{\mu\nu}\mathbf{\theta}^\mu \otimes \mathbf{\theta}^\nu, with g_{\mu\nu} = g(\mathbf{e}_\mu)(\mathbf{e}_\nu). This lets us create V^* vectors (covectors) from V vectors by a summation: v_\mu = g_{\mu\nu}v^\nu. Technically, it's a bit of an abuse to call both the covector and the vector v; in many textbooks they'd be distinguished as \bar{v} and \tilde{v}.
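To see concretely that the basis elements "drop out", here is a small sketch for V = \mathbb{R}^2 under an assumed, hypothetical change of basis \mathbf{e}'_\nu = A^\mu{}_\nu \mathbf{e}_\mu. Vector components transform with A^{-1} and covector components with A, so the individual numbers change but u_\mu v^\mu does not.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def inverse_2x2(a: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix; a basis change must be invertible."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("basis change must be invertible")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def vector_components_in_new_basis(a: Matrix, v: Sequence[float]) -> List[float]:
    # e'_nu = A^mu_nu e_mu  implies  v'^nu = (A^{-1})^nu_mu v^mu
    inv = inverse_2x2(a)
    return [sum(inv[nu][mu] * v[mu] for mu in range(2)) for nu in range(2)]


def covector_components_in_new_basis(a: Matrix, u: Sequence[float]) -> List[float]:
    # u'_nu = u(e'_nu) = u_mu A^mu_nu
    return [sum(u[mu] * a[mu][nu] for mu in range(2)) for nu in range(2)]


def pair(u: Sequence[float], v: Sequence[float]) -> float:
    """u_mu v^mu, with components taken in the same basis."""
    return sum(x * y for x, y in zip(u, v))


A: Matrix = [[2.0, 1.0], [1.0, 1.0]]  # hypothetical basis change, det = 1
u, v = [3.0, -1.0], [1.0, 4.0]

print(pair(u, v))  # -1.0
print(pair(covector_components_in_new_basis(A, u),
           vector_components_in_new_basis(A, v)))  # -1.0
```

In the new basis the components are u'_\nu = (5, 2) and v'^\nu = (-3, 7), yet the contraction is still -1: the basis-dependent factors cancel, just as \mathbf{\theta}^\mu(\mathbf{e}_\nu) = \delta^\mu_\nu cancels them in the derivation above.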
In your case, g = \eta, the Minkowski metric. As you can now see, it's pretty obvious why the contraction occurred.
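As a quick check of the g = \eta case, here is a sketch assuming \eta is the Minkowski metric with signature (-, +, +, +); many texts use (+, -, -, -) instead, which only flips the overall sign. Lowering the index with \eta and contracting gives the familiar interval -t^2 + x^2 + y^2 + z^2.

```python
from typing import List, Sequence

# Assumption: mostly-plus signature (-, +, +, +).
ETA: List[List[float]] = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def lower(g: List[List[float]], v: Sequence[float]) -> List[float]:
    """v_mu = g_{mu nu} v^nu."""
    return [sum(g[mu][nu] * v[nu] for nu in range(4)) for mu in range(4)]


def contract(u_lower: Sequence[float], v_upper: Sequence[float]) -> float:
    """u_mu v^mu."""
    return sum(a * b for a, b in zip(u_lower, v_upper))


v = [2.0, 1.0, 0.0, 0.0]    # v^mu = (t, x, y, z), an arbitrary example
v_lower = lower(ETA, v)     # v_mu = (-t, x, y, z) = [-2.0, 1.0, 0.0, 0.0]
print(contract(v_lower, v)) # -t^2 + x^2 + y^2 + z^2 = -3.0
```

The negative result reflects that \eta is not positive definite, so the "inner product" here is an indefinite one; a light-like vector such as (1, 1, 0, 0) contracts to 0.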