# Understanding Superposition Physically and Mathematically

### Introduction

One should first and foremost understand that superposition has both a mathematical and physical meaning, or rather it originates as a mathematical property of the system description which will have implications in physics.  The mathematical concepts are fairly straightforward aspects of linear algebra which will be briefly reviewed first.

To understand the physical implications of the mathematics we need to understand how mathematical representations get translated into operationally meaningful statements about the physical phenomena we observe.  So the next step is to consider if, where, and how the mathematical aspects of superposition connect to the logical structure of our empirically verifiable statements about physical systems.  Once this is understood, one has a solid foundation to stand upon while contemplating the physical and philosophical meaning of superposition.

### Vectors and Linearity

So, what is mathematical superposition? It is simply the linearity of the system description, which is to say the system is described by some abstract form of vector in a vector space. Specifically, vectors are mathematical objects which may be added together and multiplied by scalars (numbers) to form other vectors.  Taken together, addition and scalar multiplication amount to the action of taking linear combinations.  Given vectors $X$ and $Y$ we also have, as a defined object, the vector $W = aX+bY$ for any real (or complex) numbers $a$ and $b$.

A linear combination of two vectors is what we mean by referring to a superposition of them. That vector $W$ is a superposition of vectors $X$ and $Y$ says simply that $W$ is a linear combination of $X$ and $Y$.  And note that where this is true and the multipliers are not zero we can likewise solve this linear equation for $X$ or for $Y$, so the more balanced statement is that each of the three is actually a superposition of the other two.  For example:  if $W=aX+bY$ and $a\ne 0$ then $X = \tfrac{1}{a} W - \tfrac{b}{a} Y$.
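As a minimal sketch of this symmetry, the snippet below builds $W$ from $X$ and $Y$ and then recovers $X$ from $W$ and $Y$.  The helper names `scale` and `add` and the numerical values are arbitrary choices made for illustration; they are not part of any particular library.

```python
# Vectors are represented as plain tuples; all values are illustrative.
def scale(c, v):
    """Multiply the vector v by the scalar c."""
    return tuple(c * x for x in v)

def add(u, v):
    """Add two vectors component by component."""
    return tuple(x + y for x, y in zip(u, v))

X = (1.0, 2.0)
Y = (3.0, -1.0)
a, b = 2.0, 0.5

# W is a superposition (linear combination) of X and Y.
W = add(scale(a, X), scale(b, Y))

# Because a != 0 we can solve for X: X = (1/a) W - (b/a) Y,
# so X is just as much a superposition of W and Y.
X_again = add(scale(1 / a, W), scale(-b / a, Y))

print(W, X_again)
```

The same two helpers work unchanged if the components and scalars are complex numbers.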

### Classical Superposition

Now in pure mathematics, we typically do not use the term “superposition”.  This term is more aptly applied to physical phenomena which lend themselves to a linear mathematical description, i.e. vectors.  As an example when considering classical fields (such as electromagnetism) which have a linear set of dynamic equations (Maxwell’s equations) then linear combinations of solutions will also be a solution.  We thus express general solutions as linear combinations of or superpositions of certain standard solutions.  We see this in the physical phenomena where the effect on a hypothetical test charge by the fields of many other charges can be determined by adding the effect due to each of the other charges as it would occur acting alone.  We then describe the immediate effect on a unit test charge as the electromagnetic field at its position and we can describe the fields of any multitude of moving charges simply by adding the fields of each individual charge.

Even if the underlying dynamics are not linear if we consider small perturbations from a stationary state we tend to see approximately linear behavior.  For example, if you apply pressure changes to say the air, it will compress or expand, change temperature, and possibly components like water vapor will condense out.  Very complex non-linear behavior.  But small short-lived perturbations of pressure will behave and propagate linearly as sound.  And you hear the superpositions of sounds constantly.  The hum of your computer fan sounds the same when you also hear someone’s voice out your window.  Your ear hears the sum of the two sounds.

Now, waves have another interesting behavior due to superposition namely interference.  You can hear interference over time in the beat frequency as you tune one guitar string relative to another fretted at the desired note.  As their frequencies grow close, the adding of their pressure waves in your ear (which are oscillating between positive and negative pressure differences) will vary between lining up positive plus positive to reinforce and lining up positive to negative to cancel.  The result is an undulation of the volume of the joint note you hear.  If you don’t have a stringed instrument handy download one of the free function generator apps onto your mobile device and play, say a 200Hz tone out of the right channel and a 201Hz tone out of the left.  You should hear a wobbling of the sound which cycles once per second.  Play with different frequencies and see what happens.
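The once-per-second wobble can be checked numerically.  The sketch below assumes two idealized, equal-amplitude sine tones (real speakers and ears are messier) and uses the identity $\sin A + \sin B = 2\cos\tfrac{A-B}{2}\,\sin\tfrac{A+B}{2}$ to split the sum into a slow envelope and a fast carrier.  The function names are hypothetical.

```python
import math

f1, f2 = 200.0, 201.0  # tone frequencies in Hz, as in the example above

def summed(t):
    """Pressure at time t (seconds): the plain sum of the two tones."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def product_form(t):
    """The same signal written as slow envelope times fast carrier."""
    slow = 2 * math.cos(math.pi * (f2 - f1) * t)
    fast = math.sin(math.pi * (f1 + f2) * t)
    return slow * fast

def envelope(t):
    """Loudness envelope |2 cos(pi (f2 - f1) t)|."""
    return abs(2 * math.cos(math.pi * (f2 - f1) * t))

# The envelope repeats every 1/|f2 - f1| seconds.
beat_period = 1.0 / abs(f2 - f1)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:4.2f} s  envelope={envelope(t):.3f}")
```

The envelope is largest at t = 0 and t = 1 s and vanishes at t = 0.5 s, which is the cancellation heard as the quiet part of each beat.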

So, that is classical superposition.  In quantum superposition, we work within an abstract vector space, the Hilbert space, from the very beginning.  To understand the meaning of quantum superposition and what it implies we must first have a relatively clear understanding of the operational meaning of the Hilbert space vectors.

### Quantum vs Classical Logic

Classical logic is concretely expressed using the algebra of sets.  Our ontological model of classical reality is a set of states of reality.  Propositions about outcomes of deterministic experiments may then be reduced to sets of states of reality that cause the respective outcomes.  Measurements reduce our set of possible states to a subset of those states consistent with the observations.  The operations on sets mirror our Boolean algebra of logical operations generated by “and”-ing, “or”-ing and negating our events.  But the most important algebraic element, for now, is the subset relation which expresses the operation of logical implication.

If you had, say, a three-state system (say you have a chamber partitioned into three parts and you are asking where’s the particle) with states a, b, or c then our “state space” is the set {a, b, c} and all the definite statements we can make about the system resolve as subsets.  The most specific statements (which are not a contradiction) are the singleton sets {a}, {b}, {c} but you could also consider e.g. the set {a, b} which indicates either component state or equivalently “not {c}” in this example.  At the very top of the lattice is the entire state space which translates to the logical tautology (always true) since it is always true by definition of our system that it is in one of these states.  Likewise at the very bottom is the empty set expressing the never true statement or contradiction.
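The lattice for this three-state example is small enough to enumerate.  The following sketch uses Python's built-in `frozenset`; the helper names `all_propositions`, `implies` and `negate` are made up for this illustration.

```python
from itertools import combinations

# The classical state space of the three-chamber example.
states = frozenset({"a", "b", "c"})

def all_propositions(space):
    """Every subset of the state space, from the empty set to the whole set."""
    items = sorted(space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def implies(A, B):
    """Logical implication is the subset relation."""
    return A <= B

def negate(A):
    """Logical negation is the set complement within the state space."""
    return states - A

lattice = all_propositions(states)
print(len(lattice))                           # number of distinct propositions
print(implies(frozenset("a"), frozenset("ab")))
print(sorted(negate(frozenset("c"))))
```

Here "or" is set union (`A | B`) and "and" is set intersection (`A & B`), so the tautology is `states` itself and the contradiction is `frozenset()`.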

This is the structure of the predicate logic for all classical physics, but also for almost all of your everyday experience when you speak and think about definite states of reality.  But of course, we are not always so certain about things.

Stochastic Descriptions:  We may further extend our descriptions of physical systems by introducing probabilistic statements.  When we account for uncertainties in our measurements, or of our knowledge of the future evolution of the system we quantify that uncertainty with probabilities.  In the classical setting, we assign a probability distribution over the lattice of events which we insist obey certain “natural” properties, namely the additive property:

$$P(A\cup B) =P(A)+P(B)- P(A\cap B)$$

In plain English: the probability of A or B equals the probability of A plus the probability of B minus the probability of A and B.  This additivity rule may not be intuitively obvious at first sight but it helps to consider the special cases.
Since the probability of A or not A is necessarily 1, and that of A and not A is necessarily zero, letting B = not A in the above gives $P(A)+P(\text{not } A)=1$.  Similarly, we can chop case B into the sub-cases (B and A) vs (B and not A), whose probabilities must add to P(B).  Do the same for A, add the two results, and you get:

$$P(A)+P(B)=P(A\cap \text{not } B) + P(A\cap B) + P(B\cap A)+P(B\cap \text{not } A)$$

[INSERT Venn Diagram]

Similarly, we can break up the union (or) case into parts: $P(A\cup B) = P(A\cap \text{not } B)+P(A\cap B)+P(\text{not } A \cap B)$, and we see that the earlier sum is this plus one extra intersection (and) term, which recovers our original additivity formula.

Now, this was a circular derivation and it is only intended to show the inner workings of this additivity rule.  It is in the end simply book-keeping on the probabilities.  But, importantly, it is the book-keeping on probabilities based on the assumption that the likelihood of a deterministic measurement of any system can be calculated by summing (or integrating) a distribution function over the space of implied states.  In short, probabilities form a measure over the state space.
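To make the book-keeping concrete, here is a small sketch in which the probability of an event is computed by summing weights over the states it contains.  The weights 1/2, 1/3 and 1/6 are arbitrary illustrative values for the three-state example, and `P` is a hypothetical helper.

```python
from fractions import Fraction

# An illustrative probability distribution over the states a, b, c.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(event):
    """Probability of an event: sum the weights of the states it contains."""
    return sum((weights[s] for s in event), Fraction(0))

A = {"a", "b"}
B = {"b", "c"}

lhs = P(A | B)                 # probability of "A or B"
rhs = P(A) + P(B) - P(A & B)   # P(A) + P(B) minus the double-counted overlap

print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared without rounding error.  The rule, with the overlap subtracted, holds for any choice of non-negative weights because each state in the overlap is counted twice in `P(A) + P(B)`.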

However, an important change of paradigm has occurred in this extension to probabilistic descriptions.  When we present a specific probability distribution as our system description we are no longer pointing out a specific state of reality.  If I give you a probability density for the position and momentum of a classical particle over a given range, the particle isn’t “spread out over space”.  It is rather in some exact state.  So the probabilistic description can’t be referring to just that one particle.  It rather is referring to a class of possible particles, or rather a mode of production of such particles such that, over many repeated instances we should see the probabilities manifest as predicted proportions in the limit of large samples.

However, we note that this stochastic description envelops the prior logical description.  We recover the usual logical structure by restricting our descriptions to statements of absolute certainty.  We equate “$A$ is True” with $P(A)=1$ and “$A$ is False” with $P(A)=0$, and play the exact same games of logical operations… but behind the scenes, we are still working with classes of systems and their modes of production.

#### Quantum Logic

In quantum theory, we replace this set of states with a complex vector space called a Hilbert space.  In our three-state example, we would then have a three-dimensional complex space.  The lattice structure for quantum logic is the lattice of subspaces.  The most specific (non-contradictory) statements about the system would then correspond to the one-dimensional subspaces we identified with these axes.  We may “or” such logical statements by taking the minimal subspace containing these.  We can specify such a subspace with a set of vectors for which that subspace is the span.  This is the role of the Hilbert-space vector.   It projectively represents the state of (our sharp knowledge about) the system.  At the top of our lattice, we again have the whole space and at the bottom, we again have the “empty” zero-dimensional space containing only the zero vector.

The logic of implication which classically is modeled by the subset inclusion operation is now modeled by the subspace inclusion operation.  That “A (definitely) implies B” is expressed by the parallel relationship that the space associated with A is a subspace of the one associated with B.  We can also “and” statements by taking, as with sets, the intersection.  The origin point a.k.a. zero vector then corresponds to the contradictory statement, the logical equivalent to “2=3”.

This alone defines projective quantum mechanics.  However, the operation of logical negation is problematic since, for example, in the three-dimensional case there are infinitely many two-dimensional subspaces which intersect a given axis only at the origin (and thus whose “and” with it results in a null statement).  The resolution of this is why we use a Hilbert space, with its metric structure defined by the inner product, rather than a simple linear space.  The metric defines which pairs of vectors are orthogonal and thence which subspaces are orthogonal.  The negation of the event associated with a given subspace is its orthogonal complement, the subspace of all vectors orthogonal to it.

The result is that if a pair of one-dimensional subspaces are orthogonal, they are mutually exclusive both positively and negatively:  A implies not B and B implies not A.  If, on the other hand, they are parallel then “they” are a single subspace and the statements are equivalent:  A implies B, and B implies A.

But there are in-between cases where they are neither parallel nor perpendicular.  For systems where A is assured and not-A is forbidden, the event B may or may not occur, and the probability of this is determined by the angle $\theta$ between the two subspaces:  $P(B| A \text{ is assured}) = \cos^2(\theta)$.  So to interpret these in-between relationships we again must extend from a language of certainty to a stochastic description.  We must recognize that these descriptions are not of the system as it is but of our knowledge of how the system will or might behave when measured.
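A short numerical sketch of this rule follows.  It assumes the standard identification of $\cos^2(\theta)$ with $|\langle a|b\rangle|^2/(\langle a|a\rangle\langle b|b\rangle)$ for any vectors $a$ and $b$ spanning the two one-dimensional subspaces; the function names and the 30 degree angle are illustrative choices.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def prob(a, b):
    """P(B | A is assured) for the rays spanned by the vectors a and b."""
    overlap = abs(inner(a, b)) ** 2
    return overlap / (inner(a, a).real * inner(b, b).real)

theta = math.radians(30)
A = (1, 0, 0)
B = (math.cos(theta), math.sin(theta), 0)

print(prob(A, B))             # agrees with cos^2(theta)
print(prob((2j, 0, 0), B))    # rescaling A by a complex number changes nothing
print(prob(A, (0, 1, 0)))     # orthogonal subspaces: B is excluded
print(prob(A, (5, 0, 0)))     # parallel subspaces: B is assured
```

Dividing by the norms is what makes the result depend only on the subspaces (the directions) and not on the particular vectors chosen to represent them.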

Now classical logic is embedded in the quantum case in that orthogonal vectors (or rather subspaces) correspond to mutually exclusive sharp outcomes, and so an orthonormal basis, say {$\lvert a \rangle, \lvert b \rangle, \lvert c \rangle$}, corresponds to a complete measurement with three corresponding outcomes.  The system has three “states” when we consider the classical logic of the three possible outcomes of this measurement.  You can think of this as a classical frame, and so long as we’re only talking about these three possibilities and their combinations we can safely apply classical logic just as if it were a set of states.

But the weirdness of quantum theory comes in when we arbitrarily rotate our basis and have a wholly distinct classical frame, {$\lvert a' \rangle, \lvert b' \rangle, \lvert c' \rangle$}.  Any action making a determination in one such frame precludes our simultaneously making an exact determination in the other frame.  Making the distinct measurements in succession will yield different behaviors depending on the causal sequence.  The acts of observation do not commute and we can’t make legitimate statements about circumstances in both frames at the same time.

We can, however, relate one frame to the other.  Definite statements in one frame will resolve as probabilistic statements in the other.  In the vector space representation, the representative vectors of one basis will be linear combinations, or superpositions, of the vectors of the other basis.  To pin this down let us consider a concrete example.

Note also that this means, as some statements are necessarily probabilistic we are at all times in the more abstract mode of describing classes of systems.  Our Hilbert space vectors are not system states, they are states of our knowledge about the system, namely that it is an instance of the indicated class.

#### The Double-Slit Experiment

Now picture in your mind the classic double-slit experiment where an electron is emitted from some source, passes through (somehow) a barrier containing two narrow slits, and is then detected somewhere beyond on a fluorescent screen where we will see a point flash of light caused by the electron.

Let’s call one slit the X slit and the other slit the Y slit. If we think of the electron classically then it must either pass through the X slit or the Y slit on the way to the screen. You can then describe the logical structure in terms of the possible subsets of this set of possibilities: {X,Y}.
Those would be:

• {X,Y} corresponding to “Either the electron passed through X or through Y.”,
• {X} corresponding to “The electron passed through X.”,
• {Y} corresponding to “The electron passed through Y.”, and
• {} corresponding to “The electron passed through neither” (which never happens since we are only considering electrons we’ve later detected at the screen.)

Quantum mechanically we would instead draw an X-axis indicating “It went through X” and a perpendicular Y-axis indicating “It went through Y”.  The plane containing these two axes is the space they span, the X-Y plane, which represents the “Either X or Y” case.  The “Neither” case is expressed by the zero-dimensional space {0}.

But, looking at the plane, there’s no particular natural choice for the pair of perpendicular axes.  You could choose a slightly or even significantly rotated pair of axes which would be just as meaningful.

(It gets a bit more complicated in that we actually work within a complex space where we can take both real and imaginary superpositions and where we are considering 1-complex dimensional subspaces.)

Every vector in this space defines a 1-dimensional subspace, a line through the origin which likewise represents a sharp system description. Such a system model is a superposition of the X and Y modes. Call the modes along one such rotated pair of axes U and V.  One may observe the particle in a U mode or a V mode, each as specific a description as saying either “It went through X” or “It went through Y”.  However, in either case, one can neither say “It went through X” nor “It went through Y” although, by being in the spanning plane, it still (sort of) makes sense to say “It either went through X or Y”.

Beyond simply stating “It is a superposition of the two cases” it is not really proper given a U or V observation to describe the electron’s behavior in terms of passage through X vs through Y.  This is where we realize that we aren’t talking about a particle in the classical sense anymore.  And this alternative resolution of the system, say U vs V, is not just a mathematical abstraction.  You can actually build some apparatus with magnets and charged plates that will separate electrons into beams flowing only into U or V detectors,  just as you can create a magnetic lens that will bring the electrons which pass through X to a given detector and those which pass through Y to a second separate detector.

There are two different frames of logic about electrons in this situation, each equally valid and neither more fundamental nor more natural or more real than the other.  It is just as correct to say the X and Y outcomes are superpositions of the U and V ones as it is to say that the U and V outcomes are superpositions of the X and Y ones.  We have relativity of the classical logic we can embed within the quantum logic.
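As a worked illustration in the real two-dimensional picture (setting aside the complex phases mentioned earlier), suppose the U and V axes are the X and Y axes rotated by some angle $\varphi$.  The angle $\varphi$ and this particular parametrization are assumptions made here for illustration only; in practice the apparatus fixes the actual relation.  Then

$$U = \cos\varphi\, X + \sin\varphi\, Y, \qquad V = -\sin\varphi\, X + \cos\varphi\, Y$$

and solving for X and Y gives expressions of exactly the same form:

$$X = \cos\varphi\, U - \sin\varphi\, V, \qquad Y = \sin\varphi\, U + \cos\varphi\, V$$

Neither pair is privileged.  By the $\cos^2$ rule, an electron prepared in the U mode is found at the X detector with probability $\cos^2\varphi$ and at the Y detector with probability $\sin^2\varphi$, and the same numbers describe an electron prepared in the X mode and sorted into the U and V detectors.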

This loss of absoluteness here is similar to the problem of asking if two separated events happened at the same time in the space-time of special relativity. Unless an observer frame is selected the relationship between space and time for events is relative. Here we are dealing with the relativity of the classical reality of the quantum system.

#### Another Example

So let’s look at another example system (again an electron but in different circumstances) for which we have a clear understanding of the meanings of some of the various quantum superpositions.  If we run an electron through a Stern-Gerlach magnet oriented so as to measure the electron’s spin component in the z-direction we will see the electron deflect along one of two paths corresponding to two and only two possible values for its z-component of spin.  We, therefore, find that one and only one binary bit of information can be encoded in the electron’s spin.  The Hilbert space for this spin system is two-dimensional and we can pick out one direction to correspond to “z-spin +1/2” vs an orthogonal direction for “z-spin -1/2” (the unit of 1/2 here is basically so that the spacing is 1 unit).

Now we could also try to resolve the x-component of spin.  But we find that if we previously measure a z-spin value, then measure the x-spin, then subsequent measurement of the z-spin will not correspond to the earlier measurement.  The x-measurement destroyed the prior z-spin value.  We also note that after repeated examples of these types of z then x then z again experiments, we see, regardless of which z measurement we first saw, a 50-50 split in the x-measurements and then a subsequent 50-50 split in the z measurements with no correlation to the x measurements.  Mathematically this works in conjunction with choosing, in the same 2-dimensional space, the x-spin up direction to be rotated 45 degrees in the z-up z-down plane.

$$|x= +1/2\rangle = \tfrac{1}{\sqrt{2}} \left|z=+1/2\right\rangle +\tfrac{1}{\sqrt{2}}\left |z= -1/2\right\rangle$$

$$|x= -1/2\rangle = \tfrac{1}{\sqrt{2}} \left|z=+1/2\right\rangle -\tfrac{1}{\sqrt{2}}\left |z= -1/2\right\rangle$$

This means the spin measured as +1/2 in the x-direction is in a superposition of the + and –  z-spin values.  This does NOT mean we are adding z spins to get an x-spin.  The physical spins are not getting added here, but rather the abstract logical amplitudes represented in the Hilbert space.  And it’s not that we are adding anything per se because the “amounts” of the things we add do not mean anything.   It is the directions only that matter here.  The “reality” of x-spins is rotated 45 degrees to the “reality” of z-spins for the same electron.  Likewise with the y-spins but to get enough independent directions we must work in complex space so we end up with:

$$|y= +1/2\rangle = \tfrac{1}{\sqrt{2}} \left|z=+1/2\right\rangle +\tfrac{i}{\sqrt{2}}\left |z= -1/2\right\rangle$$

$$|y= -1/2\rangle = \tfrac{1}{\sqrt{2}} \left|z=+1/2\right\rangle -\tfrac{i}{\sqrt{2}}\left |z= -1/2\right\rangle$$

By symmetry, the x-spins are superpositions of the z-spins and vice versa, the y-spins are superpositions of the z-spins and vice versa, and the x-spins are superpositions of the y-spins and vice versa.
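These relations can be checked with a few lines of arithmetic.  The sketch below writes each ket as a pair of components in the z basis, exactly as in the four equations above, and uses $|\langle \text{outcome}|\text{prepared}\rangle|^2$ for the probabilities.  The variable and function names are illustrative.

```python
import math

S = 1 / math.sqrt(2)

# Components (z = +1/2, z = -1/2) of each spin ket.
z_up, z_dn = (1, 0), (0, 1)
x_up, x_dn = (S, S), (S, -S)
y_up, y_dn = (S, 1j * S), (S, -1j * S)

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def prob(outcome, prepared):
    """Probability of finding `outcome` given the normalized state `prepared`."""
    return abs(inner(outcome, prepared)) ** 2

# A definite z-spin gives a 50-50 split in x, and a definite x-spin
# gives a 50-50 split in z, matching the z-x-z experiment described above.
print(prob(x_up, z_up), prob(x_dn, z_up))
print(prob(z_up, x_up), prob(z_dn, x_up))

# The y-spins are likewise 50-50 relative to both z and x.
print(prob(y_up, z_up), prob(y_up, x_up))

# Vice versa: z-up is an equal superposition of x-up and x-down.
z_up_from_x = tuple(S * (p + m) for p, m in zip(x_up, x_dn))
print(z_up_from_x)
```

Note that the real combinations alone cannot supply a third basis that is 50-50 against both the z and x bases, which is why the y kets need the factor of $i$.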

### Summary

The concept of superposition manifests both in classical and quantum physics.  These both have their root in the linearity of the abstract vector spaces used in system descriptions.  Quantum superposition is a more fundamental and less intuitive phenomenon as it occurs in the structure of the very logic of the system description rather than (or more aptly, as well as) at a higher more dynamic level.

To say a system “is in superposition” is meaningless in and of itself.  The phenomenon of an electron or other quantum system “being in superposition” is totally relative to what one has chosen as a starting basis for the Hilbert space representation.  One singular description will manifest as a superposition when expressed in terms of an alternate representation.  To resolve the cognitive dissonance one experiences when trying to understand how a physical object can be “in a superposition of two states” one must reflect upon the very semantics we use and the implicit premises they bring with them.  We never observe physical systems in superpositions; rather, we observe physical systems with measurements which are, in their descriptions and their effects, superpositions of other acts of measurement.

3 replies
1. bhobba says:
After thinking about this for a number of years now I finally decided on the following as a reasonable motivation for QM. Consider a simple Markov chain for turning a coin over each second. Its matrix, A, is dead simple, 0’s on the main diagonal and 1’s otherwise. Now we ask a simple question – what happens if we want to generalise this to what’s going on at 1/2 second. We need to find the matrix B such that B^2 = A. That’s not a hard exercise in linear algebra, but lo and behold, it’s complex. Apply it to the starting state of the Markov chain and what do you get – a complex state. How are we to make sense of this? Well we define this thing called a POVM and apply the modern easier version of Gleason’s Theorem. From that you basically get the two axioms in Ballentine and QM can be developed from that. But – and this is a key point – you need to show how to apply the formalism to problems just like anything in applied math. The following is a good start along those lines:
https://www.scottaaronson.com/democritus/lec9.html

Thanks
Bill

2. Stephen Tashi says:
Classical logic is concretely expressed using the algebra of sets.

As I remember the article “Quantum mechanics made transparent” by Richard C. Henry, it makes the argument that probability distributions for classical states can be conveniently represented by vectors that lie on the surface of an n-dimensional sphere (or equivalence classes of rays that pass through such points).

Perhaps there is a way to present the transition between classical logic and quantum mechanics in a more gradual manner instead of the jump from the logic of sets to the methods of QM. The properties of probability distributions on classical states could be an intermediate step.

• jambaugh says:

I’m not sure I follow, particularly on the convenience part. As to presenting the transition, it is not a gradual transition. It is a paradigm jump and there is no smoothness to it. I have revised the post with more exposition on the classical logic. Please take a look and comment.