
State space, beginner's question

  1. Jun 24, 2011 #1
    Background. I recently started reading A Quantum Mechanics Primer by Daniel T. Gillespie. I'm up to the section on Banach spaces in Kreyszig's Introductory Functional Analysis with Applications, and have looked ahead to the chapter on applications to quantum mechanics, Ch. 11. I've also read the brief section in Griffel's Linear Algebra and its Applications, Vol. 2, section 12K*, on quantum mechanics. I'm left with some questions.

    1. Suppose a quantum mechanical system is being modelled with unit vectors of [itex]L^2(\mathbb{R}^3,\mathbb{C})[/itex] to represent states of the system. How is the state space usually defined? Is it

    (a) the Hilbert space [itex]\cal{H} = L^2(\mathbb{R}^n,\mathbb{C})[/itex], including all the unreachable, impossible, undefined states whose magnitude is not 1, and allowing distinct mathematical states - i.e. points in the state space, state vectors - to (redundantly) represent the same physical state,

    (b) a set of equivalence classes on the subset of [itex]L^2(\mathbb{R}^n,\mathbb{C})[/itex] consisting of all unit vectors, equivalent under the relation [itex](|z|=1) \Rightarrow (\Psi \sim z \Psi)[/itex], made into a Hilbert space by appending to the standard function-space definitions of scaling and vector addition a rule which says "rescale to unit length",

    or something else?

    2. While searching for the answer to this question, I came across what looks like an alternative formulation, using what's called a projective Hilbert space, [itex]\cal{P}\cal{H}[/itex], where the vectors comprise a set whose elements are equivalence classes of L2 vectors in the aforementioned subset, equivalent under the relation

    [tex](\forall z \in \mathbb{C}\setminus \left \{ 0 \right \})(\forall \Psi \in \cal{P}\cal{H})[\Psi \sim z \Psi].[/tex]

    But it seems this is not "how quantum mechanics is ordinarily formulated" (Brody & Hughston, 1999: Geometric Quantum Mechanics), and requires a more complicated-looking version of the Schrödinger equation. While (b) modified the usual definitions of scaling and vector addition to ensure the underlying set is closed under the vector space operations, I guess a projective Hilbert space must modify the definition of the inner product to ensure that [itex](\forall \Psi \in \cal{P}\cal{H})(\left \langle \Psi, \Psi \right \rangle = 1)[/itex]. Is this done by simply specifying that the function integrated to give the value of an inner product is chosen to be an element of the vector (the vector regarded as an equivalence class of functions) such that

    [tex]\int_{\mathbb{R}^3} \overline{f} f = 1?[/tex]

    Is this projective Hilbert space what Kreyszig is describing on p. 574, "Hence we could equally well say that the state of our system is a one-dimensional subspace, [itex]y \subset L^2[/itex]..."? Is "state" synonymous (here or generally) with his "state vector"? And is a projective Hilbert space actually a Hilbert space?*

    Oh, one more question: what is the y in Brody and Hughston's projective Schrödinger equation (p. 4)?

    *EDIT: It seems not! Suppose there exists a projective Hilbert space which is a Hilbert space, and that a is a vector of this space. Then

    a = -a;
    a + a = a + (-a) = a - a = 0;
    a + a = 2 a = a = 0.
    Last edited: Jun 24, 2011
  3. Jun 24, 2011 #2



    I know very little about projective spaces, so I could be wrong, but it seems to me that there's no natural way to even define addition on the projective space whose points are the one-dimensional subspaces of a Hilbert space.

    The set of (pure) states can be defined as the set of one-dimensional subspaces, or as any other set that can be bijectively mapped onto it. It's common to take the set of pure states to be the set of unit rays (the set of equivalence classes you mentioned in (b)). I don't know if it can be given a vector space structure in a natural way (it looks to me like it can't), and I don't see why we would want to. The only reason I can think of to define some kind of structure on the set of states is to make the group of automorphisms smaller. If F and G are unit rays, we can define <F,G>=|<f,g>|. If this function is considered part of the structure, then automorphisms are required to preserve it (i.e. if h is an automorphism, <h(F),h(G)>=<F,G>). Note that when <F,G> is preserved, transition probabilities are preserved. This is very useful, because now it's very natural to incorporate symmetries into the theory by making assumptions about the automorphisms. For example, "space is isotropic" translates to "there's a group homomorphism from SO(3) into Aut(S), where S is the set of states".
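    To make the point about <F,G>=|<f,g>| concrete, here is a minimal sketch in Python. It assumes a finite-dimensional stand-in (ℂ³ instead of L²), and the helper names inner, normalize and ray_product are made up for this illustration. It only checks that the value does not change when each representative is multiplied by its own phase factor, while the plain inner product <f,g> does change.

```python
import cmath
import math

def inner(f, g):
    """Standard inner product on C^n, conjugate-linear in the first slot."""
    return sum(x.conjugate() * y for x, y in zip(f, g))

def normalize(f):
    """Rescale f to unit length."""
    norm = math.sqrt(inner(f, f).real)
    return [x / norm for x in f]

def ray_product(f, g):
    """|<f,g>| for unit representatives f and g of two unit rays."""
    return abs(inner(f, g))

# Two unit vectors in C^3 (a toy stand-in for L^2, chosen arbitrarily).
f = normalize([1 + 0j, 1j, 0j])
g = normalize([1 + 0j, 1 + 0j, 1j])

# Pick different representatives of the same two rays.
f2 = [cmath.exp(0.7j) * x for x in f]
g2 = [cmath.exp(-2.1j) * x for x in g]

original = ray_product(f, g)
rephased = ray_product(f2, g2)

print(original, rephased)            # equal up to rounding
print(inner(f, g), inner(f2, g2))    # these differ by a phase factor
```

    So |<f,g>| is a function of the rays, whereas <f,g> itself is only a function of the chosen representatives.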

    The prettiest definition of the set of states is as the set of probability measures on the lattice of closed linear subspaces of the Hilbert space. This is a convex set, which by Gleason's theorem is (convex) isomorphic to the set of bounded, trace-class, self-adjoint, non-negative operators with trace 1. So an alternative definition of "state" is to say that a state is a member of that set of operators. (Ballentine uses this definition). This includes all the mixed states as well as the pure states. The pure states are the extreme points (corners) of this convex set. They satisfy [itex]\rho^2=\rho[/itex], which makes them projection operators. Because of the condition [itex]\operatorname{Tr}\rho=1[/itex], they are projection operators onto one-dimensional subspaces.
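    As a rough illustration of the ρ²=ρ criterion, the sketch below uses 2×2 matrices (a two-dimensional toy Hilbert space, which is my assumption, not anything from Ballentine). The helper names matmul, trace, projector, is_projection and purity are hypothetical. It compares a pure state |ψ><ψ| with an equal mixture of two orthogonal states.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(psi):
    """Density operator |psi><psi| of a unit vector psi."""
    return [[x * y.conjugate() for y in psi] for x in psi]

def is_projection(rho, tol=1e-12):
    """True if rho squared equals rho up to rounding."""
    sq = matmul(rho, rho)
    n = len(rho)
    return all(abs(sq[i][j] - rho[i][j]) < tol
               for i in range(n) for j in range(n))

def purity(rho):
    """Tr(rho^2): equals 1 for pure states, less than 1 for mixed ones."""
    return trace(matmul(rho, rho)).real

s = 2 ** -0.5
plus = [complex(s), complex(s)]           # unit vector (1, 1)/sqrt(2)

rho_pure = projector(plus)                # a pure state
rho_mixed = [[0.5 + 0j, 0j],              # equal mixture of the two
             [0j, 0.5 + 0j]]              # basis states

print(is_projection(rho_pure), purity(rho_pure))
print(is_projection(rho_mixed), purity(rho_mixed))
```

    Both matrices have trace 1, but only the first one is a projection, which matches the statement that the pure states are exactly the trace-1 projections.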

    However, in introductory courses, the standard appears to be to never mention mixed states and to say that each normalised square-integrable function represents a "state" (which in this context means "pure state"), while every other function that only differs from it by multiplication by a complex number with absolute value 1 represents the same state. Note however that a wavefunction representing a physical state is required to satisfy the Schrödinger equation, which means that it has to be differentiable, and therefore continuous.
    Last edited: Jun 24, 2011
  4. Jun 24, 2011 #3
    Hi, Fredrik! I'm always excited to see that you've replied to a question I've posted, as I know I'm sure to learn something - if not today, eventually... I'll have to come back and reread your answer when I understand more, but here's what I get for now. All corrections and comments welcome.

    I should distinguish between states (elements of the state space) and state vectors (elements of the Hilbert space which the state space is derived from). However defined, a quantum state space - i.e. the set of states of a quantum mechanical system - is not necessarily a vector space; it doesn't generally have an additive structure, nor does it need one. The state "space" doesn't actually need any structure beyond that of a set.

    Introductory texts are likely to deal only with a subset of states called pure states. These may be - and commonly are - defined as 1-dimensional subspaces of the underlying Hilbert space, called rays or, equivalently, points in the relevant "projective Hilbert space", which is not a Hilbert space!?

    Alternatively, pure states may be identified with equivalence classes of unit vectors, under the relation described in (b). But the set of such states doesn't need to be given a Hilbert space structure, so there's no need for the renormalising rule I suggested be appended to the usual function-space definitions of scaling and vector addition. Or is there?

    The only situation where I've come across vector space operations so far in this context is in expanding every state vector in the orthonormal eigen-Schauder-basis associated with a Hermitian operator. (Gillespie has only dealt with the case of a discrete set of eigenvalues so far.) Looking ahead, I see that addition of state vectors also features in describing something called superposition of pure states. This article says the result of a superposition "is a different quantum state (possibly not normalized)". I wonder how this idea would be expressed if we're distinguishing between state vectors (which can be, and are, normalised) and the states themselves, which have no vector space structure, and hence no norm to be normalised by. I also wonder if it's best to get these definitions straight in my head now, or to read on, and hope they become clearer as I see more examples of how they're applied.

    I've studied the definition of a probability measure; I'm just surprised that there should be a whole set of them involved in a single structure/application! I'm looking up lattice now; is it this kind of lattice? Is this closed in the topological sense (complement of an open set), or, tautologically, the algebraic sense (closed under the vector space operations), or some other kind of closed?
  5. Jun 24, 2011 #4



    Yes, this is an excellent summary.

    I'm not sure your definition of addition of unit rays makes sense, but I'm also not sure that I understood what you meant. It seems to me that if F and G are unit rays, and f and g are vectors in F and G respectively, you want to define F+G as the equivalence class of [itex](f+g)/\|f+g\|[/itex]. Is that what you meant? The problem is that F+G defined this way depends on the choice of representatives. (What if you change the phase of g but not f?) So this operation looks ill-defined to me, but maybe you had something else in mind.
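    The dependence on representatives can be seen in a small numerical sketch. It assumes ℂ² as a toy Hilbert space, and the helper names (inner, normalize, same_ray) are made up for the illustration. Two unit vectors lie in the same unit ray exactly when |<f,g>|=1, so that is the test used here.

```python
import math

def inner(f, g):
    """Standard inner product on C^n, conjugate-linear in the first slot."""
    return sum(x.conjugate() * y for x, y in zip(f, g))

def normalize(f):
    norm = math.sqrt(inner(f, f).real)
    return [x / norm for x in f]

def same_ray(f, g, tol=1e-12):
    """Unit vectors f and g differ only by a phase iff |<f,g>| = 1."""
    return abs(abs(inner(f, g)) - 1.0) < tol

f = [1 + 0j, 0j]
g = [0j, 1 + 0j]
g_rephased = [1j * x for x in g]   # another representative of the ray of g

# Candidate "F+G", computed from two different choices of representatives.
sum1 = normalize([a + b for a, b in zip(f, g)])
sum2 = normalize([a + b for a, b in zip(f, g_rephased)])

print(same_ray(g, g_rephased))   # the inputs represent the same rays
print(same_ray(sum1, sum2))      # but the results are different rays
```

    Here sum1 is (1,1)/√2 and sum2 is (1,i)/√2, and |<sum1,sum2>| is 1/√2 rather than 1, so the two choices of representative give different rays.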

    This sounds weird to me too. Sounds like he uses the term "state" for what I call a "state vector".

    There's a probability measure (of the traditional kind, with a σ-algebra as its domain) associated with each pair (s,A) where s is a state and A an observable, and there's a generalized probability measure (with a lattice as its domain) associated with each state.

    Think of it this way. A theory assigns probabilities to verifiable statements. Verifiable statements are statements of the form "if you use the measuring device [itex]\delta[/itex] on the object [itex]\pi[/itex], the result will be in the set [itex]E[/itex]", so they can be uniquely identified by triples [itex](\pi,\delta,E)\in\Pi\times\Delta\times\Sigma[/itex], where [itex]\Pi[/itex] is the set of objects on which measurements are performed (I call those objects preparations), [itex]\Delta[/itex] is the set of measuring devices, and [itex]\Sigma[/itex] is a σ-algebra of subsets of a set that's large enough to contain all possible measurement results (for all measuring devices). ℝ is a natural choice for that set, but so is ℂ. I've been told that it's occasionally useful to label measurement results by complex numbers, so let's pick ℂ, and choose [itex]\Sigma[/itex] to be the Borel algebra of ℂ (the smallest σ-algebra that includes all the open sets). A theory is supposed to assign probabilities to those triples, so each theory must define a function [tex]P:\Pi\times\Delta\times\Sigma\rightarrow[0,1],[/tex]
    such that for each compatible* pair [itex](\pi,\delta)[/itex], the map [itex]E\mapsto P(\pi,\delta,E)[/itex] is a probability measure. (This only makes sense if [itex]\Sigma[/itex] is a σ-algebra).

    *) I would consider an object to be incompatible with a measuring device if e.g. the object is too large to fit in the device. In this case, it wouldn't make sense to require that the map above is a probability measure. Instead we require that it takes every set to 0.

    I will finish this in a separate post. It will be a copy-and-paste from my notes about these things, with a few minor edits. I actually typed this up over the last few days, so your timing is excellent.

    The definition I have in mind is: A partially ordered set such that every nonempty finite subset has a least upper bound and a greatest lower bound. A partially ordered set is just a pair (X,≤) where X is a set and ≤ is a partial order. A partial order is a binary relation that's reflexive (x≤x), transitive (x≤y, y≤ z [itex]\Rightarrow[/itex] x≤z) and antisymmetric (x≤y, y≤x [itex]\Rightarrow[/itex] x=y).
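    For a finite example of this definition, the sketch below checks the lattice property by brute force. The example posets (divisors of 12 ordered by divisibility, and the set {1,2,3} with the same order) and the function names are my own choices for illustration. It is enough to check pairs, since bounds for larger finite subsets then follow by induction.

```python
from itertools import combinations

def least(candidates, leq):
    """Return the least element of candidates, or None if there is none."""
    for c in candidates:
        if all(leq(c, d) for d in candidates):
            return c
    return None

def join(x, y, elements, leq):
    """Least upper bound of {x, y} within elements, or None."""
    upper = [u for u in elements if leq(x, u) and leq(y, u)]
    return least(upper, leq)

def meet(x, y, elements, leq):
    """Greatest lower bound of {x, y} within elements, or None."""
    lower = [l for l in elements if leq(l, x) and leq(l, y)]
    # The greatest element is the least one for the reversed order.
    return least(lower, lambda a, b: leq(b, a))

def is_lattice(elements, leq):
    """Check that every pair of distinct elements has a join and a meet."""
    return all(join(x, y, elements, leq) is not None
               and meet(x, y, elements, leq) is not None
               for x, y in combinations(elements, 2))

def divides(a, b):
    return b % a == 0

divisors_of_12 = [1, 2, 3, 4, 6, 12]

print(is_lattice(divisors_of_12, divides))   # joins are lcm, meets are gcd
print(join(4, 6, divisors_of_12, divides),
      meet(4, 6, divisors_of_12, divides))
print(is_lattice([1, 2, 3], divides))        # 2 and 3 have no upper bound here
```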
    Last edited: Jun 25, 2011
  6. Jun 25, 2011 #5



    The function P implicitly defines several other functions, like the map [itex]E\mapsto P(\pi,\delta,E)[/itex] already mentioned above. We will be interested in the functions that are suggested by the following notations: [tex]P(\pi,\delta,E)=P_\pi(\delta,E)=P^\delta(\pi,E).[/tex]
    We use the [itex]P_\pi[/itex] and [itex]P^\delta[/itex] functions to define equivalence relations on [itex]\Pi[/itex] and [itex]\Delta[/itex]: [tex]\begin{align*}
    &\forall \pi,\rho\in\Pi\qquad &\pi \sim \rho\quad &\text{if}\quad P_\pi=P_\rho\\
    &\forall \delta,\epsilon\in\Delta\qquad &\delta \sim \epsilon\quad &\text{if}\quad P^\delta=P^\epsilon
    \end{align*}[/tex]
    The sets of equivalence classes are denoted by [itex]\mathcal S[/itex] and [itex]\mathcal O[/itex] respectively. The members of [itex]\mathcal S=\Pi/\sim[/itex] are called states, and the members of [itex]\mathcal O =\Delta/\sim[/itex] are called observables. The idea behind these definitions is that if two members of the same set can't be distinguished by experiments, the theory shouldn't distinguish between them either. So from now on, we will be talking about states and observables instead of preparations and measuring devices.
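    A toy version of the passage from preparations to states might look like the sketch below. Everything in it is made up for illustration: the three preparations, two devices, two "events" standing in for Borel sets, and the probabilities in the table. It also assumes that [itex]P_\pi[/itex] means the map [itex](\delta,E)\mapsto P(\pi,\delta,E)[/itex]. Preparations are grouped together when their [itex]P_\pi[/itex] functions coincide.

```python
from collections import defaultdict

devices = ["d1", "d2"]
events = ["E0", "E1"]          # stand-ins for Borel sets

# Made-up table of P(preparation, device, event).
P = {
    ("p1", "d1", "E0"): 0.5, ("p1", "d1", "E1"): 0.5,
    ("p1", "d2", "E0"): 1.0, ("p1", "d2", "E1"): 0.0,
    ("p2", "d1", "E0"): 0.2, ("p2", "d1", "E1"): 0.8,
    ("p2", "d2", "E0"): 1.0, ("p2", "d2", "E1"): 0.0,
    ("p3", "d1", "E0"): 0.5, ("p3", "d1", "E1"): 0.5,
    ("p3", "d2", "E0"): 1.0, ("p3", "d2", "E1"): 0.0,
}

def P_pi(pi):
    """The function (device, event) -> P(pi, device, event), as a tuple."""
    return tuple(P[(pi, d, e)] for d in devices for e in events)

def states(preparations):
    """Equivalence classes of preparations with identical P_pi."""
    classes = defaultdict(list)
    for pi in preparations:
        classes[P_pi(pi)].append(pi)
    return sorted(classes.values())

print(states(["p1", "p2", "p3"]))   # p1 and p3 end up in the same state
```

    The same grouping applied to devices (keyed on [itex]P^\delta[/itex] instead) would give the observables.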

    The symbol P will be used not only for the functions already mentioned, but also for several other functions, as suggested by the notations [tex]P(\pi,\delta,E)=P(s,\delta,E)=P(\pi,A,E)=P(s,A,E),[/tex]
    where [itex]s=[\pi][/itex] and [itex]A=[\delta][/itex]. For each [itex](A,E)\in\mathcal O\times\Sigma[/itex], let [itex]P^{(A,E)}[/itex] denote the map [itex]s\mapsto P(s,A,E)[/itex]. We define an equivalence relation on [itex]\mathcal O\times\Sigma[/itex] as well: [tex]\begin{align*}
    &\forall (A,E),(B,F)\in\mathcal O\times\Sigma\qquad &(A,E)\sim(B,F)\quad &\text{if}\quad P^{(A,E)}=P^{(B,F)}
    \end{align*}[/tex]
    The set of equivalence classes is denoted by [itex]\mathcal L[/itex]. The members of [itex]\mathcal L=(\mathcal O\times\Sigma)/\sim[/itex] are called propositions. (Comment that's not copied from the notes: I may have misunderstood what that term is usually used for. I'm not sure at this point. But it doesn't really matter what term we use in this post, so I'll leave it as "propositions"). We will use the symbols [itex]s,t[/itex] for states, [itex]A,B[/itex] for observables, [itex]a,b[/itex] for propositions, and [itex]E,F[/itex] for Borel sets. This makes expressions like [itex]\forall s[/itex] unambiguous, so we won't have to use the longer notation [itex]\forall s\in\mathcal S[/itex] anymore. When we want to emphasize that propositions are equivalence classes, we will use the notation [itex][A,E][/itex]. As before, several new functions are implicitly defined by the ones we already have. We will be interested in the ones suggested by the following notations (where [itex]a=[A,E][/itex]): [tex]P(s,A,E)=P(s,a)=P_s(a)=P^a(s).[/tex]
    We use the [itex]P^a[/itex] maps to define a partial order on [itex]\mathcal L[/itex]. [tex]\begin{align}
    a\leq b\quad\text{if}\quad \forall s~~ P^a(s)\leq P^b(s).
    \end{align}[/tex]
    This makes [itex](\mathcal L,\leq)[/itex] a partially ordered set.
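    Here is a small sketch of that order on a finite toy example. The two states, three propositions and the probabilities are invented for illustration. It checks reflexivity, transitivity and antisymmetry by brute force, and shows that two propositions can be incomparable, which is why the order is only partial.

```python
from itertools import product

states = ["s1", "s2"]

# Made-up values of P^a(s) for three propositions a, b, c.
P = {
    "a": {"s1": 0.1, "s2": 0.3},
    "b": {"s1": 0.4, "s2": 0.3},
    "c": {"s1": 0.5, "s2": 0.1},
}
props = list(P)

def leq(a, b):
    """a <= b iff P^a(s) <= P^b(s) for every state s."""
    return all(P[a][s] <= P[b][s] for s in states)

reflexive = all(leq(a, a) for a in props)
transitive = all(leq(a, c) or not (leq(a, b) and leq(b, c))
                 for a, b, c in product(props, repeat=3))
# Antisymmetry relies on distinct propositions having distinct P^a maps,
# which is what taking equivalence classes guarantees.
antisymmetric = all(a == b or not (leq(a, b) and leq(b, a))
                    for a, b in product(props, repeat=2))

print(reflexive, transitive, antisymmetric)
print(leq("a", "b"), leq("b", "a"))   # a is below b
print(leq("b", "c"), leq("c", "b"))   # b and c are incomparable
```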

    We have already noted that each state [itex]s[/itex] defines a function [itex]P_s:\mathcal L\rightarrow[0,1][/itex]. The map [itex]s\mapsto P_s[/itex] is obviously injective, so [itex]\mathcal S[/itex] is bijectively mapped onto some set of functions with [itex]\mathcal L[/itex] as their domain. Similarly, [itex]\mathcal O[/itex] can be bijectively mapped onto a set of functions with [itex]\mathcal L[/itex] as their codomain. Each observable [itex]A[/itex] defines a function [itex]a_A:\Sigma\rightarrow\mathcal L[/itex] by [itex]a_A(E)=[A,E][/itex] for all [itex]E\in\Sigma[/itex], and the map [itex]A\mapsto a_A[/itex] is injective.


    OK, that was the copied-and-pasted part. Note that all of the above is just what we get when we think really hard about what a theory is. We haven't at any point used any of the assumptions that define quantum mechanics. The next logical step of this approach is to make additional assumptions. Some will be assumptions about what reality is like, and some will be mathematical idealizations. Some can be thought of as clarifications about what the terms we started with really meant, and others can only be thought of as defining smaller classes of theories. What we eventually find this way is that there's a class of theories that's large enough to contain all the classical theories and all the quantum theories, in which [itex]\mathcal L[/itex] is a lattice and the [itex]P_s[/itex] functions defined above satisfy the definition of a probability measure on a lattice (a generalization of the concept of probability measure on a σ-algebra). This is what I like about this definition. It's almost theory independent. It's certainly more general than the framework of quantum mechanics.

    In quantum theories [itex]\mathcal L[/itex] is the lattice of closed subspaces of a complex Hilbert space. In classical theories, [itex]\mathcal L[/itex] is the lattice of all subsets of phase space. (Maybe that should be "all Borel subsets". I will have to think about that)