# What Is a Tensor?

So without any trouble we have already found that a number is

- An element of a field, e.g. ##\mathbb{R}##.
- A scalar.
- A coordinate.
- A component.
- A transformation of other numbers, an element of a dual vector space, e.g. ##\mathbb{R}^*##
- A matrix.
- A vector.

As you might have noticed, we can easily generalize these properties to higher dimensions, i.e. to arrays of numbers, which we usually call vectors and matrices. I could just as well have asked: what is a vector, what is a matrix? We would have found even more answers, as matrices can be used to solve systems of linear equations, some form matrix groups, others play an important role in calculus as Jacobi matrices, and still others are number schemes in stochastics. In the end they are only two-dimensional arrays of numbers in rectangular shape. A number is even the ##1 \times 1## special case of such a two-dimensional array.

This sums up the difficulties when we ask: What is a tensor? Depending on whom you ask, how much room and time there is for an answer, where the emphasis lies, or what you want to use them for, the answers may vary significantly. In the end they are only multi-dimensional arrays of numbers in rectangular shape.##^{1)}## ##^{2)}##

\begin{equation*}
\begin{aligned}
\label{Ex-I}
\begin{bmatrix} 2 \end{bmatrix} &\; &\begin{bmatrix} 1\\1 \end{bmatrix} &\; &\begin{bmatrix} 0& -1\\1&0 \end{bmatrix} &\; \\
\text{scalar} &\; & \text{vector} &\; & \text{matrix} \\
-&\; & (1,0)\; \text{tensor} &\; & (2,0)\; \text{tensor}
\end{aligned}
\end{equation*}

##(3,0)\; tensor##

Many students are used to dealing with scalars (numbers, mass), vectors (arrows, force) and matrices (linear equations, Jacobi matrices, linear transformations, covariances). The concept of tensors, however, is often new to them at the beginning of their study of physics, although tensors are as important in physics as scalars, vectors and matrices are. The good news is that they aren’t any more difficult than the former. They only have more coordinates. This might seem to come at the expense of clarity, but there are methods to deal with it; after all, a vector also has more coordinates than a scalar. The only difference is that we can sketch an arrow, whereas sketching an object defined by a cube of numbers is impossible. And just as we can do more with matrices than with scalars, we can do even more with tensors, because a cube of numbers, or an even higher-dimensional array of numbers, can represent a lot more than simple scalars and matrices can. Furthermore, scalars, vectors and matrices are also tensors. This is already the entire secret about tensors. Everything beyond this point is methods, examples and language, in order to prepare for how tensors can be used to investigate certain objects.

### Definitions

The possible definitions are as variable as the concept of tensors itself. In coordinates a tensor is a multi-dimensional, rectangular scheme of numbers: a single number as a scalar, an array as a vector, a matrix as a linear function, a cube as a bilinear algorithm and so on. All of them are tensors: just as a scalar is a special case of a matrix, all these are special cases of a tensor. The most abstract formulation is: the tensor product ##\otimes_\mathbb{F}## is a binary covariant functor that represents a solution of a co-universal mapping problem on the category of vector spaces over a field ##\mathbb{F}## ##[3]##. It is a long way from a scheme of numbers to this categorical definition. To be of practical use, the truth lies, as so often, in between. Numbers don’t mean anything without a basis, and categorical terms are useless in everyday business where coordinates are dominant.

**Definition:** A **tensor product** ##U \otimes V## of vector spaces ##U## and ##V## is the vector space generated by the pairs ##(u,v)## of the Cartesian product ##U \times V##, written ##u \otimes v##, subject to the relations

\begin{equation}\label{Tensor Product}
\begin{aligned}
(u+u')\otimes v &= u \otimes v + u' \otimes v\\
u \otimes (v + v') &= u \otimes v + u \otimes v'\\
\lambda (u\otimes v) &= (\lambda u) \otimes v = u \otimes (\lambda v)
\end{aligned}
\end{equation}

This means a tensor product is the vector space freely generated by all pairs ##(u,v)##, subject to additional conditions that make ##\otimes## linear in each argument, i.e. bilinear, which justifies the name product. Tensors form a vector space just as matrices do: they can be added and multiplied by scalars. The tensor product, however, must not be confused with the direct sum ##U \oplus V##, which is of dimension ##\operatorname{dim} U +\operatorname{dim} V## since a basis is ##\{(u_i,0)\, , \,(0 , v_j)\}##, whereas in a tensor product ##U \otimes V## all elements ##u_i \otimes v_j## built from basis vectors are linearly independent, and we get the dimension ##\operatorname{dim} U \cdot \operatorname{dim} V##. A tensor product is not commutative, even if both vector spaces are the same: in general ##u \otimes v \neq v \otimes u##. It can obviously be iterated, and the vector spaces could as well be dual spaces or algebras. In physics tensors are often a mixture of several vector spaces and several dual spaces. Since the tensor product isn’t commutative, it also makes sense to sort the two kinds of factors.
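To make the dimension count concrete, here is a minimal Python sketch. It assumes ##U = \mathbb{R}^m## and ##V = \mathbb{R}^n## with their standard bases; the helper names (`standard_basis`, `direct_sum_basis`, `tensor_product_basis`) are hypothetical and chosen only for this illustration.

```python
from itertools import product


def standard_basis(n):
    """Return the n standard basis vectors of R^n as tuples."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]


def direct_sum_basis(m, n):
    """Basis of U (+) V: the pairs (u_i, 0) and (0, v_j), stored as concatenated tuples."""
    zero_u, zero_v = (0,) * m, (0,) * n
    from_u = [u + zero_v for u in standard_basis(m)]
    from_v = [zero_u + v for v in standard_basis(n)]
    return from_u + from_v


def tensor_product_basis(m, n):
    """Basis of U (x) V: one element u_i (x) v_j per index pair (i, j),
    stored as an m x n coordinate array with a single entry equal to 1."""
    return [
        tuple(tuple(a * b for b in v) for a in u)
        for u, v in product(standard_basis(m), standard_basis(n))
    ]


m, n = 2, 3
print(len(direct_sum_basis(m, n)))      # m + n = 5
print(len(tensor_product_basis(m, n)))  # m * n = 6
```

The direct sum only collects the basis vectors of both spaces, whereas the tensor product needs one basis element for every pair of indices.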

**Definition:** A **tensor ##T_q^p## of type ##(p,q)## of ##V## with ##p## contravariant and ##q## covariant components** is an element (vector) of

$$
\mathcal{T}(V^p;V^{*}_q) = \underbrace{ V\otimes\ldots\otimes V}_{p\text{ times}}\otimes\underbrace{ V^*\otimes\ldots\otimes V^*}_{q\text{ times}}
$$

By (1) this tensor is linear in all its components.

### Examples

From a mathematical point of view it doesn’t matter whether a vector space ##V## or its dual ##V^*## of linear functionals is considered. Both are vector spaces and a tensor product in this context is defined for vector spaces. So we can simply say

- A tensor of rank ##0## is a scalar: ##T^0 \in \mathbb{R}##.
- A tensor of rank ##1## is a vector: ##T^1 = \sum u_i##.
- A tensor of rank ##2## is a matrix: ##T^2 = \sum u_i \otimes v_i##.
- A tensor of rank ##3## is a cube: ##T^3 = \sum u_i \otimes v_i \otimes w_i##.
- A tensor of rank ##4## is a ##4##-cube and we run out of terms for them: ##T^4 = \sum u_i \otimes v_i \otimes w_i \otimes z_i##.
- ##\ldots## etc.

If we build ##u \otimes v## in coordinates we get a matrix. Say ##u = (u_1,\ldots ,u_m)^\tau## and ##v = (v_1,\ldots , v_n)^\tau##. Then

$$
u \otimes v = u \cdot v^\tau = \begin{bmatrix} u_1v_1& u_1v_2 & u_1v_3 & \ldots & u_1v_n\\
u_2v_1& u_2v_2 & u_2v_3 & \ldots & u_2v_n\\
u_3v_1& u_3v_2 & u_3v_3 & \ldots & u_3v_n\\
\vdots & \vdots & \vdots & \ddots & \vdots \\
u_mv_1& u_mv_2 & u_mv_3 & \ldots & u_mv_n
\end{bmatrix}
$$

Note that this is the usual matrix multiplication, row times column. But here the first factor consists of ##m## rows of length ##1## and the second factor of ##n## columns of length ##1##. It also means that this matrix has rank one, since its rows are all multiples of the single row vector ##v^\tau##. To write an arbitrary ##n \times n## matrix ##A## as a tensor, we need up to ##n## of those dyadic tensors, i.e.

$$

A = \sum_{i=1}^n u_i \otimes v_i

$$
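The coordinate picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are plain nested lists, and the function names (`outer`, `dyadic_decomposition`, `has_rank_at_most_one`, `sum_of_dyads`) are hypothetical. The decomposition shown, ##A = \sum_i e_i \otimes a_i## with ##a_i## the ##i##-th row of ##A##, is just one of many possible choices.

```python
def outer(u, v):
    """Coordinates of u (x) v: the matrix with entries u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]


def add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def has_rank_at_most_one(A):
    """A matrix has rank <= 1 exactly when all of its 2x2 minors vanish."""
    rows, cols = len(A), len(A[0])
    return all(
        A[i][j] * A[k][l] - A[i][l] * A[k][j] == 0
        for i in range(rows) for k in range(i + 1, rows)
        for j in range(cols) for l in range(j + 1, cols)
    )


def dyadic_decomposition(A):
    """Return pairs (e_i, a_i) with A = sum_i e_i (x) a_i,
    where e_i is the i-th standard basis vector and a_i is row i of A."""
    n = len(A)
    return [([1 if k == i else 0 for k in range(n)], list(A[i])) for i in range(n)]


def sum_of_dyads(terms):
    """Add up the matrices u (x) v for all pairs (u, v) in terms."""
    total = None
    for u, v in terms:
        term = outer(u, v)
        total = term if total is None else add(total, term)
    return total


print(outer([1, 2, 3], [4, 5]))  # [[4, 5], [8, 10], [12, 15]]

A = [[0, -1], [1, 0]]
print(has_rank_at_most_one(A))                     # False: A is not a single dyad
print(sum_of_dyads(dyadic_decomposition(A)) == A)  # True: two dyads suffice
```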

A generic “cube” ##u \otimes v \otimes w## gives us a “rank ##1## cube”: different multiples of a rank ##1## matrix stacked to a cube. An arbitrary “cube” is a sum of these. And this procedure isn’t bounded by dimensions. We can go on and on. The only thing is that “cube” was already a bit of a crutch to describe a three-dimensional array of numbers, and we run out of words other than tensor. A four-dimensional version (tensor) could be viewed as the tensor product of two matrices, which themselves are tensor products of two vectors, or, in general, sums of such products.
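The “stacked multiples” picture can be checked directly. The following is a small sketch with nested Python lists; `outer3` and `scale` are hypothetical helper names, and the sample vectors are arbitrary.

```python
def outer(u, v):
    """Coordinates of u (x) v: entry [j][k] is u_j * v_k."""
    return [[uj * vk for vk in v] for uj in u]


def outer3(u, v, w):
    """Coordinates of u (x) v (x) w: entry [i][j][k] is u_i * v_j * w_k."""
    return [[[ui * vj * wk for wk in w] for vj in v] for ui in u]


def scale(c, A):
    """Multiply every entry of the matrix A by the scalar c."""
    return [[c * a for a in row] for row in A]


u, v, w = [1, 2], [1, 0, -1], [3, 4]
cube = outer3(u, v, w)
base = outer(v, w)  # the rank 1 matrix v (x) w

# Slice i of the cube is u_i times the same rank 1 matrix.
for i, ui in enumerate(u):
    assert cube[i] == scale(ui, base)
```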

Let us consider now arbitrary ##2 \times 2## matrices ##M## and order their entries such that we can consider them as vectors, because ##\mathbb{M}(2,2)## is a vector space. Say ##(M_{11},M_{12},M_{21},M_{22})##. Then we get in

\begin{equation*} \begin{aligned}
M \cdot N &= \begin{bmatrix} M_{11}&M_{12} \\ M_{21}&M_{22} \end{bmatrix} \cdot \begin{bmatrix} N_{11}&N_{12} \\ N_{21}&N_{22} \end{bmatrix} \\ & \\
&= \begin{bmatrix}M_{11}\cdot N_{11}+M_{12} \cdot N_{21} &M_{11} \cdot N_{12}+M_{12} \cdot N_{22} \\ M_{21}\cdot N_{11}+M_{22} \cdot N_{21} & M_{21} \cdot N_{12}+M_{22} \cdot N_{22} \end{bmatrix}\\ & \\
&= \Big(\sum_{\mu =1}^{7} u_{\mu }^* \otimes v_{\mu }^* \otimes W_{\mu }\Big)(M,N) = \sum_{\mu =1}^{7} u_{\mu}^*(M) \cdot v_{\mu}^*(N) \cdot W_{\mu}\\ & \\
&= \Bigg( \begin{bmatrix} 1\\0 \\0 \\1 \end{bmatrix}\otimes \begin{bmatrix} 1\\0 \\0 \\1 \end{bmatrix} \otimes \begin{bmatrix} 1\\0 \\0 \\1 \end{bmatrix} + \begin{bmatrix} 0\\0 \\1 \\1 \end{bmatrix}\otimes \begin{bmatrix} 1\\0 \\0 \\0 \end{bmatrix} \otimes \begin{bmatrix} 0\\0 \\1 \\-1 \end{bmatrix}\\ & \\
& + \begin{bmatrix} 1\\0 \\0 \\0 \end{bmatrix}\otimes \begin{bmatrix} 0\\1 \\0 \\-1 \end{bmatrix} \otimes \begin{bmatrix} 0\\1 \\0 \\1 \end{bmatrix} + \begin{bmatrix} 0\\0 \\0 \\1 \end{bmatrix}\otimes \begin{bmatrix}-1\\0 \\1 \\0 \end{bmatrix} \otimes \begin{bmatrix} 1\\0 \\1 \\0 \end{bmatrix}\\ & \\
&+ \begin{bmatrix} 1\\1 \\0 \\0 \end{bmatrix}\otimes \begin{bmatrix} 0\\0 \\0 \\1 \end{bmatrix} \otimes \begin{bmatrix}-1\\1 \\0 \\0 \end{bmatrix} + \begin{bmatrix}-1\\0 \\1 \\0 \end{bmatrix}\otimes \begin{bmatrix} 1\\1 \\0 \\0 \end{bmatrix} \otimes \begin{bmatrix} 0\\0 \\0 \\1 \end{bmatrix}\\ & \\
&+ \begin{bmatrix} 0\\1 \\0 \\-1 \end{bmatrix}\otimes \begin{bmatrix} 0\\0 \\1 \\1 \end{bmatrix} \otimes \begin{bmatrix} 1\\0 \\0 \\0 \end{bmatrix} \Bigg)(M,N)
\end{aligned} \end{equation*}

a matrix multiplication of ##2 \times 2## matrices which only needs seven generic multiplications ##u_{\mu}^*(M) \cdot v_{\mu}^*(N)## at the expense of more additions. This (bilinear) algorithm is due to Volker Strassen ##[1]##. It reduced the “matrix exponent” from ##3## to ##\log_2 7 \approx 2.807##, which means matrix multiplication can be done with ##O(n^{2.807})## essential multiplications instead of the obvious ##n^3## obtained by simply multiplying rows and columns. At the time of writing, the record holder was an algorithm by François Le Gall (2014) with an upper bound of ##O(n^{2.3728639})## ##[5]##. For the sake of completeness let me add that these bounds hold for large ##n##, and that the value of ##n## from which they apply differs from algorithm to algorithm. For ##n=2## Strassen’s algorithm is already optimal: one cannot use fewer than seven multiplications to calculate the product of two ##2 \times 2## matrices. For larger matrices, however, there are algorithms with fewer multiplications. Whether these algorithms can be called efficient or useful is a different discussion. I was once told that Strassen’s algorithm had been used in cockpit software, but I’m not sure if this is true.
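The seven triads can be turned into running code almost literally. The sketch below is an illustration, not a tuned implementation: it stores the vectors ##u_\mu^*, v_\mu^*, W_\mu## from the formula above in the coordinate order ##(M_{11},M_{12},M_{21},M_{22})## and compares the result with the ordinary row-times-column product. The names `U`, `V`, `W`, `flatten`, `strassen_2x2` and `naive_2x2` are chosen for this example only.

```python
# The seven triads u_mu (x) v_mu (x) W_mu, in the order they appear in the formula.
U = [(1, 0, 0, 1), (0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 0, 1),
     (1, 1, 0, 0), (-1, 0, 1, 0), (0, 1, 0, -1)]
V = [(1, 0, 0, 1), (1, 0, 0, 0), (0, 1, 0, -1), (-1, 0, 1, 0),
     (0, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
W = [(1, 0, 0, 1), (0, 0, 1, -1), (0, 1, 0, 1), (1, 0, 1, 0),
     (-1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0)]


def flatten(M):
    """Order the entries of a 2x2 matrix as (M11, M12, M21, M22)."""
    return (M[0][0], M[0][1], M[1][0], M[1][1])


def strassen_2x2(M, N):
    """Multiply two 2x2 matrices with seven generic multiplications."""
    m, n = flatten(M), flatten(N)
    result = [0, 0, 0, 0]
    for u, v, w in zip(U, V, W):
        # u*(M) and v*(N) have coefficients 0 or +-1, so they cost only additions.
        left = sum(ui * mi for ui, mi in zip(u, m))
        right = sum(vi * ni for vi, ni in zip(v, n))
        p = left * right  # the single generic multiplication of this triad
        for k in range(4):
            result[k] += w[k] * p  # distribute p into the result along W_mu
    return [[result[0], result[1]], [result[2], result[3]]]


def naive_2x2(M, N):
    """Row-times-column product with eight multiplications."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
print(strassen_2x2(M, N))  # [[19, 22], [43, 50]]
print(naive_2x2(M, N))     # [[19, 22], [43, 50]]
```

Since the entries may themselves be matrix blocks, applying the same scheme recursively to a ##2^k \times 2^k## matrix uses ##7^k## multiplications instead of ##8^k##, which is where the exponent ##\log_2 7## comes from.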

This example shall demonstrate the following points:

- The actual presentation of tensors depends on the choice of basis, just as it does for vectors and matrices.
- Strassen’s algorithm is an easy example of how tensors can be used as mappings to describe certain objects. The set of all those algorithms for this matrix multiplication forms in fact an algebraic variety, i.e. a geometrical object.
- Tensors can be used for various applications, not necessarily only in mathematics and physics; here it is computer science.
- The obvious, here “Matrix multiplication of ##2 \times 2## matrices needs ##8## generic multiplications.”, isn’t necessarily the truth. Strassen saved one.
- A tensor itself is a linear combination of, let’s say, generic tensors of the form ##v_1 \otimes \ldots \otimes v_m##. In case ##m=1## one doesn’t actually speak of tensors, but of vectors instead, although strictly speaking they would be called monads. In case ##m=2## these generic tensors are called dyads and in case ##m=3## triads.
- One cannot simplify the addition of generic tensors; it remains a formal sum. The only exception, as always for multilinear objects, is

$$u_1 \otimes v \otimes w + u_2 \otimes v \otimes w = (u_1 +u_2) \otimes v \otimes w $$

where all but one of the factors are identical, in which case we know it as the distributive property.
- Matrix multiplication isn’t commutative. So we are not allowed to swap the ##u_\mu^*## and ##v_\mu^*## in the above example, i.e. a tensor product isn’t commutative either.
- Tensors in a given basis are number schemes. Which meaning we attach to them depends on our purpose.
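Two of the points above, the distributive exception and the missing commutativity, can be verified in coordinates. The sketch uses dyads instead of triads to keep it short; all helper names and sample vectors are assumptions made for this illustration.

```python
def outer(u, v):
    """Coordinates of u (x) v: the matrix with entries u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]


def add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*A)]


# Distributive property: the sum collapses when all but one factor agree.
u1, u2, v = [1, 2], [0, 5], [3, 4]
u_sum = [a + b for a, b in zip(u1, u2)]
assert add(outer(u1, v), outer(u2, v)) == outer(u_sum, v)

# With no common factor the sum stays a formal sum: e_1 (x) e_1 + e_2 (x) e_2
# is the identity matrix, which has rank 2 and so is not a single dyad.
identity = add(outer([1, 0], [1, 0]), outer([0, 1], [0, 1]))
assert identity == [[1, 0], [0, 1]]

# Not commutative: swapping the factors transposes the coordinate matrix.
u, w = [1, 2], [3, 4]
print(outer(u, w))  # [[3, 4], [6, 8]]
print(outer(w, u))  # [[3, 6], [4, 8]]
assert outer(w, u) == transpose(outer(u, w))
```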

However, these schemes of numbers called tensors can stand for a lot of things: transformations, algorithms, tensor algebras or tensor fields. They can be used as a construction template for Graßmann algebras, Clifford algebras or Lie algebras, because of their (co-)universal property. They occur in many places in physics, e.g. the stress-energy tensor, the Cauchy stress tensor, the metric tensor or curvature tensors such as the Ricci tensor, just to name a few. Not bad for some numbers ordered in multidimensional cubes. This only reflects what we’ve already experienced with matrices. As a single object, they are only some numbers in rectangular form. But we use them for everything from solving linear equations to describing the fundamental forces in our universe.

### Sources

##\underline{Footnotes:}##

1) Tensors don’t need to be of the same size in every dimension, i.e. they don’t have to be built from vectors of the same dimension, so the examples in this article are already a special case, even though it is the *standard case* in the sense that square matrices appear *more often* than rectangular ones.

2) One might call a scalar a (0,0) tensor, but I will leave this up to the logicians.

Good point; the same is the case with tensor contraction, i.e., it assumes/makes use of an isomorphism.

Given a linear map ##L:V →W## between two vector spaces, ##L## determines a map from the algebra of tensor products of vectors in ##V## to the algebra of tensor products of vectors in ##W##. This correspondence is a covariant functor. ##L## also determines a map from the algebra of tensor products of dual vectors in ##W## to the algebra of tensor products of dual vectors in ##V##. This correspondence is a contravariant functor.

One might guess that this is the reason for the terms covariant and contra-variant tensor though I do not know the history.

Yes, but one could as well say ##T_q^p(V) = \underbrace{V \otimes \ldots \otimes V}_{p\text{ times}} \otimes \underbrace{V^* \otimes \ldots \otimes V^*}_{q\text{ times}}## has ##p## covariant factors ##V## and ##q## contravariant factors ##V^*## and in this source

http://www.math.tu-dresden.de/~timmerma/texte/tensoren2.pdf (see beginning of section 2.1)

it is done. So what are the reasons for one or the other? The order in which they are noted? Are the first ones always considered contravariant? As someone who tends to confuse left and right I was looking for some way to remember a convention, one or the other. So I'm still looking for a kind of natural, or if that is not possible, at least a canonical deduction.

I think I said the same thing. The covariant factors are the tensor products of the vectors, the contravariant are the tensors of the dual vectors.

I agree. This would be a natural way to look at it. However, the German Wikipedia does it the other way around and the English speaks of considering ##V## as ##V^{**}## and refers to basis transformations as the origin of terminology. I find this a bit unsatisfactory as motivation but failed to find a good reason for a different convention.

In Physics a contravariant vector is thought of as a displacement dx. In Mathematics this corresponds to a 1 form and at each point in space this is a dual vector to the vector space of tangent vectors.

In primitive terms one does not think of the tangent space as its double dual.

I don't know if this matters in terms of equating the two, but the isomorphism between ## V , V^{**} ## is not a natural one.

Is there a reason why we group together the (contra/co)variant factors? Why not have, e.g., ## T^p_q = V \otimes V^{*} \otimes V \ldots ##, etc.?

One has the isomorphism between ##V## and ##V^{**}##, ##v→v^{**}##, defined by ##v^{**}(w) = w(v)##.

Yes, but AFAIK it is not a natural isomorphism, meaning it is not basis-independent. I wonder to what effect, and when, this makes a difference.

I learned about tensors in college – fluids or thermodynamics, maybe, I cannot recall for sure. I "sort of" got it, but later on in life, I came across this video, which I found useful.

I liked this article. I am well aware of the proof of correctness of Strassen's algorithm, but had never seen where the idea came from — nice.

It occurs to me that if people are only interested in certain properties of higher rank tensors and they don't want the object to jump off the page, they may be interested in things like Kronecker products or wedge products.

It probably should be noted that moving from a 2-D matrix to something like a 3-D or 4-D (or n-D) tensor is a bit like moving from 2-SAT to 3-SAT… most of the interesting things you'd want to do computationally (e.g. numerically finding eigenvalues or singular values) become NP-hard (e.g. see: https://arxiv.org/pdf/0911.1393.pdf )

The grouping allows far better handling. There is no advantage in mixing the factors, so why should it be done? Perhaps in the case where one considers tensors of ##V = U \otimes U^*##. The applications I know are all for low values of ##p,q## and it only matters how the application of a tensor is defined on another object. Formally one could even establish a bijection like the transposition of matrices. But all of this only means more work in writing without any benefits. E.g. Strassen's algorithm can equally be written as ##\sum u^*\otimes v^* \otimes W## or ##\sum W \otimes u^*\otimes v^*##. Only switching ##u^*,v^*## would make a difference, namely between ##A\cdot B## and ##B \cdot A##.

I once calculated the group of all ##(\varphi^*,\psi^*,\chi)## with ##[X,Y]=\chi([\varphi(X),\psi(Y)])## for all semisimple Lie algebras. Nothing interesting except that ##\mathfrak{su}(2)## produced an exception – as usual. But I found a pretty interesting byproduct for non-semisimple Lie algebras. Unfortunately this excludes physics, I guess.

Thanks, but aren't there naturally-occurring tensors in which the factors are mixed? What do you then do?

Yes, interesting, isn't it? This tiny difference between ##2## and ##3## which decides whether we're too stupid to handle those problems, or whether there is a difficulty inherent to the system. And lower bounds are generally hard to prove. I know that Strassen lost a bet on ##NP = P##. I've forgotten the exact year, but he thought we would have found out sometime in the 90's. But I guess he enjoyed the journey in a balloon over the Alps anyway.

Perhaps if you consider tensor algebras of ##\operatorname{Hom}(V,V^*)## or similar. I would group them pairwise in such a case: all even indexed ##V## and all odd indexed ##V^*##. This is what I really learnt about tensors: it all heavily depends on what you want to do.

Maybe we are referring to different things, but if we have a multilinear map defined on, say, ## V \otimes V^{*} \otimes V ## then the map would be altered, wouldn't it?

Say we have a map ##V \otimes V^* \otimes V = V_1 \otimes V^* \otimes V_2 \longrightarrow W##, then it is an element of ##V_1 \otimes V^* \otimes V_2 \otimes W## which could probably be grouped as ##V^* \otimes V_1 \otimes V_2 \otimes W## and we have the original grouping again. I don't know of an example where the placing of ##V^*## depends on the fact that it is in between the copies of ##V##. As soon as algebras play a role, we factor their multiplication rules anyway. Or even better in a way such that the contravariance of ##W## is respected.

I don't know if you mentioned this, but I think another useful perspective here is that the tensor product also defines a map taking a k-linear map into a linear map on the tensor product (let's stick to vector spaces over ## \mathbb R ## and maps into the reals, to keep it simple for now), so that there is a map taking, e.g., the dot product (as a bilinear map, i.e., k=2) on ## \mathbb R^2 \times \mathbb R^2 ## into a linear map defined on ## \mathbb R^2 \otimes \mathbb R^2 ## (into the reals, in this case), so we have a map from {## K :V_1 \times V_2 \times \ldots \times V_k ##} to {## L:V_1 \otimes V_2 \otimes \ldots \otimes V_k ##}, where K is a k-linear map and L is linear. This perspective helps me understand things better.

I listed my originally intended chapters here:

https://www.physicsforums.com/threads/what-is-a-tensor-comments.917927/#post-5788263

where universality, natural isomorphisms and whatever else comes to mind when considering tensors would have been included, but this tended to become about 40-50 pages and I wasn't really prepared for such a long explanation … And after this debate here, I'm sure that even then there would have been some who thought I left out an essential part or described something differently from what they are used to and so on. It would have been interesting to learn more about the physical part of it, all the more so as a tensor to me is merely a multilinear product, which only gets interesting if a subspace is factored out. If only there weren't these coordinate transformations and indices wherever you look.