# Tensor product of vector spaces

1. Dec 2, 2009

### Fredrik

Staff Emeritus
I still don't fully understand the explicit construction of the tensor product space of two vector spaces, in spite of the efforts by several competent posters in another thread about 1.5 years ago. I'm hoping someone can provide the missing pieces. First, a summary of the things I think I do understand (let me know if I have misunderstood something):

A bilinear function $\tau:X\times Y\rightarrow Z$, where X,Y and Z are vector spaces, is said to be a tensor product if, for each bilinear function $\sigma:X\times Y\rightarrow W$, where W is a vector space, there's a unique linear function $\sigma_*:Z\rightarrow W$ such that $\sigma=\sigma_*\circ\tau$.

If $\tau:X\times Y\rightarrow Z$ is a tensor product, we use the notation $\tau(x,y)=x\otimes y$, and also $Z=X\otimes Y$.

The standard way to prove that the tensor product of any two vector spaces X and Y exists uses the concept of a "free vector space", so I'll explain that next. Let S be a set and F(S) the set of functions from S into $\mathbb R$ with finite support (i.e. each function takes the value 0 at all but a finite number of points in S). Define the sum of two such functions, and multiplication by a real number, in the usual way:

$(u+v)(x)=u(x)+v(x)$
$(kv)(x)=k(v(x))$

These definitions turn F(S) into a vector space. For each x in S, we define $e_x$ in F(S) by

$$e_x(y)=\delta_{xy}$$

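The set of functions $\{e_x\mid x\in S\}$ is a basis for F(S): the $e_x$ are linearly independent (evaluating $\sum_x a_xe_x=0$ at any point y gives $a_y=0$), and any v in F(S) can be expressed as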

$$v=\sum_{x\in supp\ v}v(x)e_x$$

where supp v is the support of v, i.e. the set of points where the function has non-zero values. (The support is sometimes defined as the closure of that set, but S has no topology here, and in any case a finite set is closed in any $T_1$ space, for example any metric space.)
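To make the free vector space concrete, here is a minimal Python sketch (an illustration, not part of the original post). It assumes S is any set of hashable Python values and stores each element of F(S) as a dict from the points of its finite support to their coefficients; the helper names `add`, `scale`, `e` and `expand` are hypothetical.

```python
# Sketch of the free vector space F(S): an element is a dict mapping the
# points of its (finite) support to their nonzero real coefficients.

def add(u, v):
    """Pointwise sum (u+v)(x) = u(x) + v(x), dropping zero coefficients."""
    result = dict(u)
    for x, c in v.items():
        result[x] = result.get(x, 0) + c
    return {x: c for x, c in result.items() if c != 0}

def scale(k, v):
    """Pointwise scalar multiple (kv)(x) = k * v(x), dropping zeros."""
    return {x: k * c for x, c in v.items() if k * c != 0}

def e(x):
    """The basis function e_x: value 1 at x, 0 everywhere else."""
    return {x: 1}

def expand(v):
    """Rebuild v as the sum over its support of v(x) e_x."""
    total = {}
    for x, c in v.items():
        total = add(total, scale(c, e(x)))
    return total

v = {"apple": 3, "pear": -2}
assert expand(v) == v  # v = sum of v(x) e_x over supp v
```

Storing only the support mirrors the finite-support condition directly: the zero function is the empty dict.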

We're going to pick a subspace $H$ of $F(X\times Y)$ and then define the tensor product space as

$$X\otimes Y=\frac{F(X\times Y)}{H}$$

To understand this, we need to understand what V/U means when V is a vector space and U is a subspace of V. (What follows here is just my guess about how this is usually done). I have recently studied group actions, so the easiest way for me to do this is to define V/U as the set of orbits in V of the right action $\rho:V\times U\rightarrow V$ of U on V defined by $\rho(v,u)=v+u$. The orbit corresponding to $v\in V$ is then defined as

$$\mathcal O_v=v+U=\{v+u|u\in U\}$$

Each member of V belongs to exactly one orbit. An alternative notation for the orbit $\mathcal O_v$ is [v]. V/U is defined as the set of orbits:

$$\frac V U=\{\mathcal O_v|v\in V\}=\{[v]|v\in V\}$$

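The vector space structure on that set is defined by the following operations, which are well-defined (independent of the chosen representatives) because U is a subspace: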

$$[x]+[y]=[x+y]$$
$$k[x]=[kx]$$
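A toy example may help check that these operations don't depend on which representatives are picked. The Python sketch below assumes $V=\mathbb R^2$ (vectors as tuples) and $U=\text{span}\{(1,1)\}$, a choice made purely for illustration. Every orbit v + U meets the first coordinate axis in exactly one point, which serves as a canonical representative of [v].

```python
# Toy quotient V/U with V = R^2 and U = span{(1, 1)} (illustrative choice).

U_GEN = (1, 1)

def cls(v):
    """Canonical representative of [v] = v + U: subtract v[1] * (1, 1)."""
    t = v[1]
    return (v[0] - t * U_GEN[0], v[1] - t * U_GEN[1])

def add_cls(v, w):
    """[v] + [w] := [v + w]."""
    return cls((v[0] + w[0], v[1] + w[1]))

def scale_cls(k, v):
    """k[v] := [kv]."""
    return cls((k * v[0], k * v[1]))

v, w = (3, 1), (0, 5)
v2 = (v[0] + 4, v[1] + 4)  # v2 - v = 4*(1, 1) lies in U, so [v2] = [v]

assert cls(v) == cls(v2)
assert add_cls(v, w) == add_cls(v2, w)      # sum independent of representative
assert scale_cls(7, v) == scale_cls(7, v2)  # scaling independent too
```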

OK, we're ready to choose the set H. We take it to be the subspace of $F(X\times Y)$ that's spanned by all vectors of the following forms:

$$ae_{(x,y)}-e_{(ax,y)}$$
$$ae_{(x,y)}-e_{(x,ay)}$$
$$e_{(x,y)}+e_{(x',y)}-e_{(x+x',y)}$$
$$e_{(x,y)}+e_{(x,y')}-e_{(x,y+y')}$$

Note that there's one of each of these vectors for each choice of $a\in\mathbb R$, $x,x'\in X$ and $y,y'\in Y$.

Since $F(X\times Y)/H$ is supposed to be the tensor product space, we'll write the orbit that $e_{(x,y)}$ belongs to as $x\otimes y$ instead of as $[e_{(x,y)}]$.

What I would like someone else to do is to tell me if I'm wrong about anything I've said so far, and if I'm not, show me (and others) how to use the above to prove that we have actually constructed a tensor product. For example, how do we prove that

$$k(x\otimes y)=kx\otimes y=x\otimes ky$$

or that

$$(x+y)\otimes z=x\otimes z+y\otimes z$$?
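For the record, the usual way to finish the argument is roughly this. Given a bilinear $\sigma:X\times Y\rightarrow W$, extend it linearly to $\tilde\sigma:F(X\times Y)\rightarrow W$ by $\tilde\sigma(e_{(x,y)})=\sigma(x,y)$. Bilinearity of $\sigma$ says exactly that $\tilde\sigma$ vanishes on each spanning vector of H, hence on all of H, so $\sigma_*([v])=\tilde\sigma(v)$ is well-defined and linear, with $\sigma_*(x\otimes y)=\sigma(x,y)$. It is unique because the $x\otimes y$ span $F(X\times Y)/H$. The Python sketch below checks the key step, that $\tilde\sigma$ kills the four kinds of generators of H; $X=Y=\mathbb R^2$ and the bilinear map $\sigma(x,y)=x_1y_2-x_2y_1$ are assumptions chosen for illustration.

```python
# Illustrative check: the linear extension of a bilinear map to F(X x Y)
# vanishes on the spanning vectors of H. Assumes X = Y = R^2 (tuples).

def sigma(x, y):
    """A sample bilinear map R^2 x R^2 -> R (the 2x2 determinant)."""
    return x[0] * y[1] - x[1] * y[0]

def sigma_tilde(v):
    """Linear extension to F(X x Y): sum of coefficient * sigma(x, y)."""
    return sum(a * sigma(x, y) for (x, y), a in v.items())

def vadd(x, y):
    return tuple(p + q for p, q in zip(x, y))

def vscale(a, x):
    return tuple(a * p for p in x)

def combo(*terms):
    """Build an element of F(X x Y) from (coefficient, (x, y)) pairs."""
    v = {}
    for a, pair in terms:
        v[pair] = v.get(pair, 0) + a
    return v

def generators(a, x, x2, y, y2):
    """The four kinds of spanning vectors of H for one choice of a, x, x', y, y'."""
    return [
        combo((a, (x, y)), (-1, (vscale(a, x), y))),
        combo((a, (x, y)), (-1, (x, vscale(a, y)))),
        combo((1, (x, y)), (1, (x2, y)), (-1, (vadd(x, x2), y))),
        combo((1, (x, y)), (1, (x, y2)), (-1, (x, vadd(y, y2)))),
    ]

gens = generators(3, (1, 2), (0, -1), (4, 5), (2, 7))
assert all(sigma_tilde(g) == 0 for g in gens)
```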

---

Moderator's note: post edited at request of author to correct the definition of tensor product. quasar987 is acknowledged for pointing out the error.

Last edited by a moderator: Aug 18, 2011
2. Dec 3, 2009

### Fredrik

Staff Emeritus
I think I got it. To prove that $k(x\otimes y)=kx\otimes y$, we start by noting that

$$k(x\otimes y)=k[e_{(x,y)}]=[ke_{(x,y)}]=ke_{(x,y)}+H$$

and

$$(kx\otimes y)=[e_{(kx,y)}]=e_{(kx,y)}+H$$

So all we have to do is to prove that

$$ke_{(x,y)}-e_{(kx,y)}\in H$$

but this is obvious since we defined H to be the subspace spanned by all vectors of that form, along with vectors of the forms we need to prove the other identities.
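The same pattern handles the other identity asked about in #1. Sketching it here (with x, y in X and z in Y), the difference of the two sides is

$$x\otimes z+y\otimes z-(x+y)\otimes z=[e_{(x,z)}+e_{(y,z)}-e_{(x+y,z)}]=e_{(x,z)}+e_{(y,z)}-e_{(x+y,z)}+H=H=[0]$$

because $e_{(x,z)}+e_{(y,z)}-e_{(x+y,z)}$ is one of the spanning vectors of H (the third form, with $x'=y$ and z in the role of y).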

3. Dec 3, 2009

### quasar987

Is it possible you mixed things up in the definition:

I think this should read

(I put the changes in bold)

I'm not an expert in the tensor product but everything you've said thus far seems true to me.

Last edited by a moderator: Aug 18, 2011
4. Dec 3, 2009

### quasar987

Just a remark... You defined a tensor product as [...], then you constructed one as the map $X\times Y\rightarrow F(X\times Y)/H$, $(x,y)\mapsto[e_{(x,y)}]$, for a suitable subspace H. So it exists... but it is also worthwhile to note that it is unique (up to isomorphism)! This can be deduced very easily from the definition you gave. The definition you used for a tensor product is an instance of what category theory calls a universal property. Every time an object is defined through a universal property or is shown to satisfy a universal property, it is unique (up to an isomorphism in the relevant category).

5. Dec 3, 2009

### quasar987

As I said, the summary of the "tensor product theory" you gave is correct, but it is also very formal. I think it is also important to have an informal idea of the construction.

Here is how I understand the tensor product informally.

First of all, I think it is never a good idea to define an object through a universal property. I prefer to see an explicit construction, and then see the universal property stated in a theorem as an interesting property of the construction.

So by THE tensor product of X and Y I mean the construction F(XxY)/H you just made.

Ok, so what is this thing? To me, $X\otimes Y$ is just the vector space of the "formal products" $x\otimes y$ where the "formal product operation" $\otimes$ is bilinear: $a(x\otimes y)=ax\otimes y=x\otimes ay$, $x\otimes (y+z)=x\otimes y + x\otimes z$, $(x+y)\otimes z=x\otimes z + y\otimes z$.

How is it constructed?

-First of all, note that the free (real) vector space over the set S can be seen as the vector space whose elements are the formal linear combinations of elements of S. Indeed, just write the map f as
$$\sum_{i=1}^Na_is_i$$
where the $s_i$ are the elements of S at which f does not vanish and $a_i=f(s_i)$.
This is a much less abstract view of F(S) in my opinion! (Just as it is much less abstract to think of the cartesian product $A\times B$ of two sets A and B as the set of all "ordered pairs" (a,b) rather than as the set of maps $f:\{1,2\}\rightarrow A\cup B$ where f(1) is in A and f(2) is in B.)

-Secondly, concerning the quotient space V/U. This is to be regarded as "the vector space V, where every element of U has been transformed into the null vector 0". Indeed, notice that in V/U, for every u in U, [u] = [0], and two elements [v_1] and [v_2] are equal if and only if v_1 and v_2 differ by an element of U: v_1 = v_2 + u for some u in U. (Whereas in V itself v_1 = v_2 + w iff w = 0, in V/U any w in U does the trick.)

-Thirdly, notice that under the above interpretation (presentation) of the free vector space over a set, F(XxY) is to be regarded as the vector space of all formal linear combinations of pairs (x,y). And the elements you wrote as
$$ae_{(x,y)}-e_{(ax,y)}$$
$$ae_{(x,y)}-e_{(x,ay)}$$
$$e_{(x,y)}+e_{(x',y)}-e_{(x+x',y)}$$
$$e_{(x,y)}+e_{(x,y')}-e_{(x,y+y')}$$
take the following simpler form:
$$a(x,y)-(ax,y)$$
$$a(x,y)-(x,ay)$$
$$(x,y)+(x',y)-(x+x',y)$$
$$(x,y)+(x,y')-(x,y+y')$$
So what is, in layman's terms, the subspace H generated by these elements? Well, it is the smallest subspace of F(XxY) containing these elements. And so the quotient F(XxY)/H is the largest quotient of F(XxY) in which the elements of the forms
$$a(x,y)-(ax,y)$$
$$a(x,y)-(x,ay)$$
$$(x,y)+(x',y)-(x+x',y)$$
$$(x,y)+(x,y')-(x,y+y')$$
have been collapsed to zero. But what does it mean, for instance, for a(x,y)-(ax,y) to have been collapsed to zero? It means that a(x,y) = (ax,y). And similarly, a(x,y)-(x,ay)=0 means a(x,y)=(x,ay), (x,y)+(x',y)-(x+x',y) = 0 means (x,y)+(x',y)=(x+x',y) and (x,y)+(x,y')-(x,y+y')=0 means (x,y)+(x,y')=(x,y+y').

So we see that the seemingly complex construction leading to the tensor product is actually a very natural way to create out of X and Y the space of (formal linear combinations of) pairs (x,y) obeying the "bilinearity relations"
$$a(x,y)=(ax,y)$$
$$a(x,y)=(x,ay)$$
$$(x,y)+(x',y)=(x+x',y)$$
$$(x,y)+(x,y')=(x,y+y')$$
And we denote by $x\otimes y$ the class of (x,y) under these relations.
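In finite dimensions there is a familiar concrete model to keep in mind. For $X=\mathbb R^m$ and $Y=\mathbb R^n$, the space of $m\times n$ matrices with $x\otimes y$ taken to be the outer product $(x_iy_j)$ is (a standard fact, not shown in this thread) isomorphic to the quotient construction. The Python sketch below, an illustration under that assumption, only checks that the outer product obeys the four bilinearity relations; it does not by itself prove the universal property.

```python
# Outer-product model of R^m (x) R^n: x (x) y is the matrix [[x_i * y_j]].

def outer(x, y):
    return [[xi * yj for yj in y] for xi in x]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mscale(k, A):
    return [[k * a for a in row] for row in A]

def vadd(x, y):
    return [p + q for p, q in zip(x, y)]

def vscale(k, x):
    return [k * p for p in x]

a = 5
x, x2 = [1, 2, 3], [0, -1, 4]
y, y2 = [2, 7], [-3, 1]

# The four "bilinearity relations" hold for the outer product.
assert mscale(a, outer(x, y)) == outer(vscale(a, x), y)
assert mscale(a, outer(x, y)) == outer(x, vscale(a, y))
assert madd(outer(x, y), outer(x2, y)) == outer(vadd(x, x2), y)
assert madd(outer(x, y), outer(x, y2)) == outer(x, vadd(y, y2))
```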

6. Dec 3, 2009

### Fredrik

Staff Emeritus
Thanks. I checked two of my books and they both agree with you. I thought my f should be an isomorphism (and that because of that it didn't matter if I defined f from W into Z or from Z into W). But we don't need f to be an isomorphism to guarantee that the codomains of two tensor products are vector space isomorphic to each other. George Jones posted a proof of that here.

7. Dec 3, 2009

### Fredrik

Staff Emeritus
Isn't that just a matter of taste? I usually prefer the other option. For example, I like to define the real numbers as a complete* ordered field (any complete ordered field will do), and I think of Dedekind cuts and that stuff about equivalence classes of Cauchy sequences as "constructions" rather than as definitions. Why? Because the alternative gives one construction a higher status than the others even though they're all equally useful.

*) Here "complete" means that every nonempty set that's bounded from above has a least upper bound. I haven't really thought about whether that's equivalent to the other kind of "complete" (all Cauchy sequences convergent).

When I encountered expressions such as "formal linear combinations" in the past, it always bothered me a lot. Everything else is defined in terms of sets, and suddenly we're using something well-defined to define...a string of text?! And why? Just so that we can pretend that we can add things that can't be added!? (This is just how I felt years ago. Now I have a better understanding of what these things mean).

So I think that if we're going to say that we want to define something that behaves like a "linear combination"

$$a_1s_1+\cdots+a_ns_n$$

we should immediately point out that this expression doesn't exist, since no sum is defined on S, and that the only way to get something that behaves the way we want is to replace the $s_k$ above with $\phi(s_k)$ where $\phi$ is a function from S into a vector space.

$$a_1\phi(s_1)+\cdots+a_n\phi(s_n)$$

Then we can start talking about how to construct that vector space.

Last edited: Dec 3, 2009
8. Dec 4, 2009

### Fredrik

Staff Emeritus
Just adding some stuff to make this page more complete. First, I'll correct my definition of tensor product:

A bilinear function $\tau:X\times Y\rightarrow Z$, where X,Y and Z are vector spaces, is said to be a tensor product if, for each bilinear function $\sigma:X\times Y\rightarrow W$, where W is a vector space, there's a unique linear function $\sigma_*:Z\rightarrow W$ such that $\sigma=\sigma_*\circ\tau$.

The next thing I want to do is to prove that the codomains of two tensor products are isomorphic vector spaces. This is just the proof George Jones showed me, but simplified a little bit, and expressed in the notation used in this thread. First a reminder about homomorphisms and isomorphisms. A homomorphism is a structure preserving map. For vector spaces that means that it's a linear function. A homomorphism $f:Z\rightarrow W$ is an isomorphism if there exists a homomorphism $g:W\rightarrow Z$ such that $f\circ g=id_W$ and $g\circ f=id_Z$.

The definition of a tensor product implies that $id_Z$ (the identity map on Z, defined by $id_Z(z)=z$ for all z in Z) is the only linear function from Z into Z that can be composed with $\tau$ without changing it. What I mean by that is that we obviously have

$$id_Z\circ\tau=\tau$$

but if we replace $id_Z$ in this equation with any other linear function from Z into Z, it will no longer be valid. To see this, just take W=Z and $\sigma=\tau$ in the definition of a tensor product.

Now suppose that $\tau:X\times Y\rightarrow Z$ and $\sigma:X\times Y\rightarrow W$ are both tensor products. The definition implies that there exist unique linear functions (i.e. vector space homomorphisms) $\sigma_*:Z\rightarrow W$ and $\tau_*:W\rightarrow Z$ such that

$$\sigma=\sigma_*\circ\tau$$

$$\tau=\tau_*\circ\sigma$$

When we combine these two equations we get

$$\sigma=\sigma_*\circ\tau_*\circ\sigma$$

$$\tau=\tau_*\circ\sigma_*\circ\tau$$

and combined with the result we obtained earlier (which applies to $\sigma$ as well as to $\tau$, since both are tensor products), this implies that

$$\sigma_*\circ\tau_*=id_W$$

$$\tau_*\circ\sigma_*=id_Z$$

which means that $\tau_*$ and $\sigma_*$ are both isomorphisms.

Last edited by a moderator: Aug 18, 2011
9. Dec 4, 2009

### Fredrik

Staff Emeritus
If X and Y are inner product spaces, we also need to define the inner product on $X\otimes Y$. The definition is

$$\langle x\otimes y,x'\otimes y'\rangle_{X\otimes Y}=\langle x,x'\rangle_X\langle y,y'\rangle_Y$$

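This formula only specifies the inner product on elements of the form $x\otimes y$; it is extended to all of $X\otimes Y$ by linearity (conjugate-linearity in one argument for complex spaces). One can verify that the result is well-defined and satisfies the definition of an inner product; positive definiteness is the only axiom that takes some work.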
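In the real finite-dimensional case, and again modelling $x\otimes y$ as the outer product of coordinate vectors (an assumption for illustration), this inner product is just the Frobenius inner product $\langle A,B\rangle=\sum_{ij}A_{ij}B_{ij}$ of matrices. A quick Python check of the defining formula on two simple tensors:

```python
# Frobenius inner product on outer products reproduces <x,x'><y,y'>
# (real, finite-dimensional case; outer-product model assumed).

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def outer(x, y):
    return [[xi * yj for yj in y] for xi in x]

def frobenius(A, B):
    """<A, B> = sum over i, j of A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

x, x2 = [1, 2, 3], [4, 0, -1]
y, y2 = [2, -5], [3, 1]
assert frobenius(outer(x, y), outer(x2, y2)) == dot(x, x2) * dot(y, y2)
```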

If X and Y are Hilbert spaces, we're still not done with the explicit construction of $X\otimes Y$. The construction in #1 along with the above definition of an inner product gives us an inner product space, but it may not be complete. So we have to go through the process of completion to finally end up with a Hilbert space.

This raises another question, which I don't immediately see the answer to. Why can we still use the notation $x\otimes y$? I mean, this is supposed to be the equivalence class $[e_{(x,y)}]$, but when X and Y are Hilbert spaces, the members of $X\otimes Y$ are equivalence classes of Cauchy sequences of such equivalence classes, so to continue using that notation seems to make as much sense as it would to write real numbers as n/m where n and m are integers.

Edit: OK, here's my attempt to answer that question. When we have completed the inner product space to get a Hilbert space, we're no longer dealing with objects of the form $x\otimes y$ (as defined above). Instead we're dealing with equivalence classes of Cauchy sequences of such objects. But suppose we use the notation $x\otimes y$ also for the equivalence class that contains the constant sequence $x\otimes y, x\otimes y, x\otimes y,\dots$. Most members of $X\otimes Y$ still can't be expressed as $x\otimes y$, but we don't care, because the set of equivalence classes that contain constant sequences is dense in $X\otimes Y$, and that means that any member of $X\otimes Y$ can be approximated to arbitrary precision by a finite sum of members that can be expressed as $x\otimes y$.
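Even without any completion, a generic element of $X\otimes Y$ is a finite sum of terms $x\otimes y$ rather than a single one. In the outer-product model of $\mathbb R^2\otimes\mathbb R^2$ as $2\times 2$ matrices (assumed here for illustration), every $x\otimes y$ has determinant 0, so $e_1\otimes e_1+e_2\otimes e_2$, the identity matrix, is not of that form. A small Python check:

```python
# In the 2x2 outer-product model, x (x) y has determinant
# x0*y0*x1*y1 - x0*y1*x1*y0 = 0, so a matrix with nonzero determinant
# is a sum of simple tensors but not itself a simple tensor.

def outer(x, y):
    return [[xi * yj for yj in y] for xi in x]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

e1, e2 = [1, 0], [0, 1]
T = madd(outer(e1, e1), outer(e2, e2))  # e1 (x) e1 + e2 (x) e2

assert T == [[1, 0], [0, 1]]
assert det2(T) == 1                       # nonzero: T is not of the form x (x) y
assert det2(outer([3, -2], [5, 7])) == 0  # every simple tensor has determinant 0
```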

Last edited: Dec 4, 2009
10. Dec 5, 2009

### Landau

Not really equivalent. This kind of completeness is a notion about partially ordered sets, which in general have little to do with metric spaces.

About the construction of the tensor product, I like the explanation in Roman's book Advanced Linear Algebra, see here. He discusses several definitions/constructions of the tensor product. The whole book is great, btw.
Well, yes, but that's why the word 'formal' is always used, and the word 'free' in 'free vector space' also reminds the reader of this fact.

Let S be a set. We want to give meaning to an expression like $F=\sum_{i=1}^n a_ix_i$, where $x_i\in S$ and $a_i\in\mathbb{R}$. Such a formal linear combination of elements of S is just a function $F:S\to\mathbb{R}$ such that $F(s)=0$ for all but finitely many s in S. The set $\mathbb{R}\langle S\rangle$ of such functions (= the set of all formal linear combinations) then becomes a vector space over $\mathbb{R}$ under pointwise addition and scalar multiplication. Now identify $s\in S$ with the function $F_s\in \mathbb{R}\langle S\rangle$ defined by $F_s(s)=1$ and $F_s(x)=0$ for $x\neq s$.

Edit: I apologize, my last paragraph is redundant since this is already discussed in this thread. I should read more carefully.

Last edited: Dec 5, 2009
11. Dec 31, 2009

### Fredrik

Staff Emeritus
Just adding some stuff about the tensor product of linear operators. The standard definition is

$$A\otimes B(x\otimes y)=A(x)B(y)$$

Edit: No, it's not. See #13.

It's very easy to see that the map $(A,B)\mapsto A\otimes B$ is bilinear, and that implies (see the posts above) that this qualifies as a tensor product. (The vector space of such operators is isomorphic to the one we can construct explicitly in the way described in the existence proof above).

This definition is used even when the codomain isn't the same vector space as the domain. In particular, we use it for linear functionals, including bras.

Last edited: Dec 31, 2009
12. Dec 31, 2009

### quasar987

How does that work when the codomain is anything other than the base field? How is A(x)B(y) defined in that case?

Here is how I understand this chapter of the theory.

Take V,W two vector spaces over a field $\mathbb{F}$. Then a simple way of realizing the tensor product of V* with W* is as the vector space Z spanned by all the maps $f\otimes g:V\times W\rightarrow \mathbb{F}:(v,w)\mapsto f(v)g(w)$ for f in V* and g in W*, with addition and scalar multiplication defined pointwise. The bilinear map $\tau:V^*\times W^*\rightarrow Z$ appearing in the definition of the tensor product is then just $\tau(\sum_ia_if_i,\sum_jb_jg_j)=\sum_{i,j}a_ib_jf_i\otimes g_j$. It is easily verified that the pair (Z,$\tau$) does satisfy the universal property characterizing tensor products.

And then, the fun thing is that because of the isomorphism $V\cong V^{**}=(V^*)^*$, which holds for finite-dimensional V, this yields another realization of the tensor product of any two finite-dimensional vector spaces V,W over $\mathbb{F}$, and not just of dual vector spaces. Namely,

$$V\otimes W\cong V^{**}\otimes W^{**}=\left\{\sum_ia_i\eta_i\otimes\xi_i:a_i\in \mathbb{F}, \ \eta_i\in V^{**}, \ \xi_i\in W^{**}\right\}$$
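The realization of $V^*\otimes W^*$ above is easy to play with in finite dimensions. The Python sketch below assumes $V=\mathbb R^m$ and $W=\mathbb R^n$, with each functional represented by its list of coefficients; `evaluate`, `tensor` and `form_sum` are hypothetical helper names. It builds $f\otimes g$ as the function $(v,w)\mapsto f(v)g(w)$ and checks additivity of $\tau$ in its first argument.

```python
# Functionals on R^m as coefficient lists; f (x) g as a bilinear form.

def evaluate(f, v):
    """f(v) = sum of f_i * v_i."""
    return sum(fi * vi for fi, vi in zip(f, v))

def tensor(f, g):
    """Return f (x) g as the function (v, w) -> f(v) * g(w)."""
    return lambda v, w: evaluate(f, v) * evaluate(g, w)

def form_sum(*forms):
    """Pointwise sum of bilinear forms (the vector-space addition in Z)."""
    return lambda v, w: sum(b(v, w) for b in forms)

f, f2 = [1, -2], [0, 3]
g = [4, 1, 1]
v, w = [2, 5], [1, 0, -1]

assert tensor(f, g)(v, w) == evaluate(f, v) * evaluate(g, w)
# Additivity in the first argument: (f + f2) (x) g = f (x) g + f2 (x) g.
f_plus_f2 = [p + q for p, q in zip(f, f2)]
assert tensor(f_plus_f2, g)(v, w) == form_sum(tensor(f, g), tensor(f2, g))(v, w)
```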

Last edited by a moderator: Aug 18, 2011
13. Dec 31, 2009

### Fredrik

Staff Emeritus
D'oh, I need to stop posting just before I go to bed. Yes, the right-hand side of the definition in #11 only makes sense if A(x) and B(y) are members of the field, or at least some structure with a bilinear multiplication operation, like an algebra. It certainly doesn't make sense if they are members of an arbitrary vector space. What I should have written instead of $A\otimes B(x\otimes y)=A(x)B(y)$ is

$$A\otimes B(x\otimes y)=A(x)\otimes B(y)$$

That's how it's defined in QM, but usually with the vectors written in the ket notation

$$A\otimes B\big(|\psi\rangle\otimes|\phi\rangle\big)=A|\psi \rangle \otimes B|\phi\rangle$$

I should also have explained that my point was that this particular use of the symbol $\otimes$ satisfies the definition of a tensor product. Thanks for drawing my attention to this, and for the additional information.
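In finite dimensions the corrected definition has a familiar coordinate form. If vectors are coordinate lists and operators are matrices, the Kronecker product represents both $x\otimes y$ and $A\otimes B$, and $A\otimes B(x\otimes y)=A(x)\otimes B(y)$ becomes the mixed-product property of the Kronecker product. The following Python sketch (an illustration under those assumptions, with hypothetical helper names) checks it on one example.

```python
# Kronecker-product model: x (x) y has entries x_j * y_l (j outer index),
# A (x) B has entries A_ij * B_kl in row (i, k), column (j, l).

def kron_vec(x, y):
    return [xi * yj for xi in x for yj in y]

def kron_mat(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 1, 2], [-1, 0, 5], [2, 2, 2]]
x, y = [1, -1], [3, 0, 2]

# (A (x) B)(x (x) y) == (Ax) (x) (By)
assert matvec(kron_mat(A, B), kron_vec(x, y)) == kron_vec(matvec(A, x), matvec(B, y))
```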

Last edited by a moderator: Aug 18, 2011