# Busch's Gleason-like theorem

1. Jun 15, 2014

### Fredrik

Staff Emeritus
I want to discuss the theorem proved in the article "Quantum states and generalized observables: a simple proof of Gleason's theorem" by P. Busch. http://arxiv.org/abs/quant-ph/9909073v3. I've been avoiding this article for some time because I thought it would require more knowledge of POVMs. I recently started reading about them, but I can't say that I understand them yet. It turned out that you don't need a lot of knowledge about POVMs.

I have written down my thoughts about the article below, but I'll start with my questions, so that you don't have to read the whole post just to find out what I want to ask.
1. Is it correct to say that the article definitely doesn't contain a simple proof of Gleason's theorem?
2. Is it correct to say that what this theorem does is to find all (generalized) probability measures on the partially ordered set $\mathcal E(\mathcal H)$?
3. Is there really a bijective correspondence between probability measures on $\mathcal E(\mathcal H)$ and probability measures on the lattice of projectors? (This would be the consequence if this theorem and Gleason's both establish a bijective correspondence with state operators).
4. What is the definition of $\mathcal E(\mathcal H)$? Is it the set of all bounded positive operators with a spectrum that's a subset of [0,1]?
5. Why is $\mathcal E(\mathcal H)$ interesting? (As I said, I don't really understand this POVM stuff yet). To be more specific, why should we think of probability measures on $\mathcal E(\mathcal H)$ as "states". (OK, if they correspond bijectively to probability measures on the lattice of projectors, then that's a reason, but is there another one?)
6. Suppose that $\Omega=\{\omega_1,\dots,\omega_n\}$ is the set of possible results of a measurement. Let's use the notation $p(\omega_i|\rho)$ for the probability of result $\omega_i$, given state $\rho$. The book (mentioned in my comments below) says that there are positive operators $E_i$ such that $p(\omega_i|\rho)=\operatorname{Tr}(\rho E_i)$. How do you prove this? (This could perhaps help me understand the significance of these "effects").
7. What does it mean for a linear functional to be "normal", and how do you prove that every normal linear functional on the vector space of positive bounded operators is of the form $A\mapsto\operatorname{Tr}(\rho A)$, where $\rho$ is a state operator?
8. How do you prove that the extremal elements of $\mathcal E(\mathcal H)$ are projection operators? (This is unrelated to the theorem, and perhaps a topic for another thread).

These are the thoughts I wrote down to get things straight in my head, and perhaps make it easier to answer my questions:

The proof is easy, but it's difficult to understand both the assumptions that go into it and (especially) the author's conclusions.

The title appears to be seriously misleading. This isn't Gleason's theorem at all. Gleason's theorem is about finding all the probability measures on the lattice of subspaces of a Hilbert space, or equivalently, about finding all the probability measures on the lattice of projection operators on a Hilbert space. This theorem is about a larger partially ordered set that contains that lattice.

He calls that partially ordered set "the full set of effects $\mathcal E(\mathcal H)$", but he doesn't define it in the article. There's also no clearly stated definition in the book he wrote ("Operational quantum physics") with two other guys (Grabowski and Lahti). The book starts by considering an experiment with a finite set of possible results $\Omega=\{\omega_1,\dots,\omega_n\}$. (This is on pages 5-6). It denotes the probability of result $\omega_i$, given state $T$, by $p(\omega_i|T)$, and says that the functional $E_i$ defined by $E_i(T)=p(\omega_i|T)$ is called an effect. Then it claims, without proof, that there's a sequence $\langle E_i\rangle_i$ of positive linear operators, such that $\sum_i E_i=I$ and $E_i(T)=\operatorname{Tr}(TE_i)$ for all i, and all states T. From this point on, the term "effect" refers to the operator $E_i$ that appears on the right, not the functional $E_i$ that appears on the left. This is certainly not an unambiguous definition of the term "effect".

Page 25 (of the book) comes closer to actually defining the term. It says that for each state T, the map $B\mapsto\operatorname{Tr}(TB)$ is a functional on the set of bounded linear operators, and that the requirement that the numbers $\operatorname{Tr}(TB)$ represent probabilities implies that B is positive and such that $B\leq I$ (meaning that $I-B$ is positive). The book claims that this conclusion is equivalent to this: The spectrum of any effect is a subset of [0,1]. (The book doesn't actually say that B is an effect, but I'm guessing that this is what the authors meant).

On the same page, the notation $\mathcal E(\mathcal H)$ is used for "the set of effects". They mention that it's a partially ordered set with a minimum element and a maximum element, but not a lattice. They also say that the set $\mathcal E(\mathcal H)$ is a convex subset of the set of bounded linear operators, and that its extremal elements are the projection operators.

So it appears that an effect is defined as a positive operator B such that $B\leq I$, or equivalently as a bounded linear operator with a spectrum that's a subset of [0,1]. (Is it too much to ask that they actually say that somewhere? It's pretty frustrating to read texts like this). The proof in the article also mentions that there are positive operators that aren't in $\mathcal E(\mathcal H)$.
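To make sure I understand the definition, here is a little numerical sanity check (a sketch of my own in Python/NumPy, finite-dimensional only; the particular matrices are arbitrary examples): an effect is a self-adjoint operator with spectrum in [0,1], and there are positive operators that fail this.

```python
import numpy as np

def is_effect(B, tol=1e-10):
    """Return True if B is an effect: self-adjoint with spectrum inside [0, 1]."""
    if not np.allclose(B, B.conj().T, atol=tol):
        return False
    eig = np.linalg.eigvalsh(B)          # real eigenvalues of a Hermitian matrix
    return bool(eig.min() >= -tol and eig.max() <= 1 + tol)

P = np.array([[1, 0], [0, 0]], dtype=complex)   # a projection: an effect
E = 0.3 * P + 0.35 * np.eye(2)                  # diag(0.65, 0.35): an effect, not a projection
A = 2 * np.eye(2)                               # positive, but spectrum {2}: NOT an effect

print(is_effect(P), is_effect(E), is_effect(A))  # True True False
```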

The proof considers an arbitrary function $\nu:\mathcal E(\mathcal H)\to[0,1]$ that satisfies a number of conditions that are similar to the defining conditions of a probability measure on a lattice. I haven't verified it, but I suspect that if we had been dealing with the lattice of subspaces, then Busch's conditions would have been equivalent to those defining conditions. If I'm right, I think this explains the assumptions of the theorem.

The proof finds (easily) that the arbitrary function $\nu$ can be uniquely extended to a linear functional on the vector space of all positive operators. The proof says that this functional is "normal (due to σ-additivity)", and then claims that it's "well known" that any such functional is obtained from a density operator. (I guess Busch means that there's a density operator $\rho$ such that $\nu(B)=\operatorname{Tr}(\rho B)$ for all positive operators B). The article claims that this is proved in (lemma 1.6.1 of) "Quantum theory of open systems" by E.B. Davies, which I would have to go to a library to find, and also in von Neumann's book from 1932, which supposedly contains "a direct elementary proof". But it doesn't say where in the book. I spent 10-15 minutes looking for it, with no success.

The article then continues "The conclusion of our theorem is the same as that of Gleason's theorem". There's no explanation of what this means. I guess that it means that just like Gleason, he has found a bijection between the set of state operators and a set of generalized probability measures on a partially ordered set. If that's the case, then there's also a bijective correspondence between probability measures on the lattice of projectors and probability measures on the partially ordered set of effects.

Last edited: Jun 15, 2014
2. Jun 15, 2014

### micromass

Yes, this is equivalent.

I severely dislike the terminology in the article. He seems to deal with a linear functional $F:\mathcal{B}(H)\rightarrow \mathbb{C}$. It is much better to use the C*-algebra formalism here, since $\mathcal{B}(H)$ is a C*-algebra in a canonical way. So, what we are dealing with are linear functionals $\tau:A\rightarrow \mathbb{C}$ (where $A$ is a C*-algebra with unit) that are positive and normalized. This means that $\tau(a^*a)\geq 0$ and $\tau(1) = 1$. It will not come as a surprise that C*-algebraists call such a functional a state on the C*-algebra. See http://en.wikipedia.org/wiki/State_(functional_analysis). See also the section "properties of states" for the definition of a "normal state", which is what I guess is meant by normal.

The normal states on $\mathcal{B}(H)$ are exactly those of the form $\tau(A) = \operatorname{tr}(\rho A)$ for some density operator $\rho$. The proof can be found in Kadison & Ringrose, page 462, but doesn't seem very elementary. If you wish, I can try to find an elementary proof for you.

Last edited by a moderator: May 6, 2017
3. Jun 15, 2014

### micromass

They actually do say it in footnote [1]

4. Jun 15, 2014

### Fredrik

Staff Emeritus
Thanks micromass. I will take a look at the proof in Kadison & Ringrose.

5. Jun 15, 2014

### Fredrik

Staff Emeritus
I've had a first look at the proof in K & R. I think I understand that (e) is our assumption and that (a) is the result we want. The implication (e) → (f) looks simple enough, but (f) → (a) could be a problem. They say that this is the content of theorem 7.1.9. The proof of 7.1.9 immediately refers to theorem 7.1.8. The proof of 7.1.8 refers to at least five different numbered theorems. This could be pretty difficult to sort out. On the other hand, I might want to learn some of these things anyway. I will take a break now and take a look at those theorems later.

6. Jun 15, 2014

### micromass

Let us immediately do this in a more general situation since it is easier. So let $A$ be a unital C*-algebra. We define $\mathcal{E}(A)$ as the set of all hermitian elements $a$ of $A$ such that $0\leq a \leq 1$. Recall that a projection in $A$ is a $p$ such that $p^* = p = p^2$.

First, assume that $A$ is abelian. By the Gelfand-Naimark theorem, it has the form $\mathcal{C}(X)$ for $X$ a compact Hausdorff space. In this case $\mathcal{E}(A)$ consists of all the continuous functions $f:X\rightarrow [0,1]$. I don't think it's very difficult to prove that an extreme point $f$ of $\mathcal{E}(A)$ must be an indicator function, and a continuous indicator function is a projection in $\mathcal{C}(X)$.

Now, the general case. Let $p$ be a projection and suppose $p=(a+b)/2$ with $a,b\in\mathcal{E}(A)$. Then $b/2 = p - a/2 \leq p$. This implies that $b$ and $p$ commute (see the lemma below). But then $p$, $a$ and $b$ all commute, so we can look at the abelian C*-algebra generated by $p,~a,~b,~1$. From the abelian case, we see that $p=a=b$. Thus $p$ is an extreme point.

Conversely, let $a$ be an extreme point. Take $B$ the C*-algebra generated by $a$ and $1$, this is a unital abelian C*-algebra and $a$ is still an extreme point of $\mathcal{E}(B)$. Thus $a$ is a projection by the abelian case.

Lemma: If $p$ is a projection and if $0\leq a\leq p$, then $pa = ap = a$.
Indeed, from $0\leq a\leq p$ it follows that for each $c\in A$, we have $0\leq cac^* \leq cpc^*$. In particular, we have $0\leq (1-p)a(1-p)\leq (1-p)p(1-p) = 0$. Thus $0 = (1-p)a(1-p)$. But then by the C*-identity, we have
$$\|a^{1/2}(1-p)\|^2 = \|(1-p)a(1-p)\| = 0$$
Thus $a^{1/2} = a^{1/2}p$ and thus $a = ap$. By taking adjoint, we get $a^* = p^* a^*$ and thus $a = pa$.
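To make the extreme-point claim concrete, here is a quick finite-dimensional illustration (a sketch in Python/NumPy of my own, not from the argument above; the specific matrices are arbitrary): a non-projection effect is a proper midpoint of two distinct effects, hence not extreme.

```python
import numpy as np

def is_effect(B, tol=1e-10):
    """An effect: self-adjoint with spectrum inside [0, 1] (finite-dim sketch)."""
    eig = np.linalg.eigvalsh(B)
    return bool(np.allclose(B, B.conj().T) and eig.min() >= -tol and eig.max() <= 1 + tol)

a = 0.5 * np.eye(2)                  # an effect, but not a projection: a @ a != a
assert not np.allclose(a @ a, a)

b1, b2 = 0.2 * np.eye(2), 0.8 * np.eye(2)   # two distinct effects
assert is_effect(b1) and is_effect(b2)
assert np.allclose(a, (b1 + b2) / 2)        # a is their midpoint, so not extreme
```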

7. Jun 15, 2014

### naima

Another paper on the subject
With small differences.

8. Jun 15, 2014

### micromass

9. Jun 15, 2014

### Staff: Mentor

I have been mucking around with that theorem for a while now and have come up with my own slightly simplified proof. My comments will be about that proof rather than the one in the article - it's essentially the same though.

It's a Gleason-like theorem based on the stronger assumption of POVM's rather than resolutions of the identity (ROI), i.e. Von Neumann measurements. But in modern times it is recognised that Von Neumann measurements are not the most general kind of measurement, so an axiomatic treatment can start with POVM's rather than resolutions of the identity. That's my personal preferred path. One can, by means of a bit of physical insight and Neumark's theorem, derive POVM's from ROI's. I used to view it that way but don't any more, and simply take POVM's as the starting point.

It shows, from the assumption of non-contextuality and the strong principle of superposition, that the only probability measure that can be defined on a POVM is via the Born rule. Partial ordering isn't required. Of course the real key assumption is non-contextuality.

I think the rest of the questions can best be answered if I post up my proof and we can pull it to pieces.

It will take me a little while though.

Thanks
Bill

Last edited: Jun 15, 2014
10. Jun 15, 2014

### Staff: Mentor

OK guys here is the proof I came up with.

Just for completeness let's define a POVM. A POVM is a set of positive operators $E_i$ with $\sum_i E_i = I$, acting on (for the purposes of QM) an assumed complex vector space.

Elements of POVM's are called effects, and it's easy to see that a positive operator E with Trace(E) <= 1 is an effect (its eigenvalues then all lie in [0,1]). The converse fails: I itself is an effect whose trace equals the dimension.

First lets start with the foundational axiom the proof uses as its starting point.

An observation/measurement with possible outcomes i = 1, 2, 3, ... is described by a POVM {Ei} such that the probability of outcome i is determined by Ei, and only by Ei; in particular it does not depend on what POVM it is part of.

"Only by Ei" means that regardless of what POVM the Ei belongs to, the probability is the same. This is the assumption of non-contextuality and is the well-known rock-bottom essence of Born's rule via Gleason. The other assumption, not explicitly stated but used, is the strong law of superposition, i.e. in principle any POVM corresponds to an observation/measurement.

I will let f(Ei) be the probability of Ei. Obviously f(I) = 1 from the law of total probability. Since {I} and {I, 0} are both POVMs, f(0) = 0.

First additivity of the measure for effects.

Let E1 + E2 = E3, where E1, E2 and E3 are all effects. Then with E = I - E3 (an effect), we have E1 + E2 + E = E3 + E = I, so {E1, E2, E} and {E3, E} are both POVMs. Applying f to each and using non-contextuality, f(E1) + f(E2) + f(E) = 1 = f(E3) + f(E). Hence f(E1) + f(E2) = f(E3).

Next, linearity wrt the rationals - it's the usual standard argument from additivity in linear algebra, but I will repeat it anyway.

f(E) = f(n E/n) = f(E/n + ... + E/n) = n f(E/n), so f(E/n) = (1/n) f(E). Then f((m/n) E) = f(E/n + ... + E/n) (m terms) = m f(E/n) = (m/n) f(E), with m <= n to ensure we are dealing with effects.

Now extend the definition to any positive operator E. If E is a positive operator, an n and an effect E1 exist with E = n E1 (any n >= Trace(E) works, since a positive operator with trace <= 1 is an effect). Define f(E) = n f(E1). To show this is well defined, suppose n E1 = m E2. Then (n/(n+m)) E1 = (m/(n+m)) E2, so f((n/(n+m)) E1) = f((m/(n+m)) E2), i.e. (n/(n+m)) f(E1) = (m/(n+m)) f(E2), hence n f(E1) = m f(E2).

From the definition it's easy to see that for any positive operators E1, E2, f(E1 + E2) = f(E1) + f(E2). Then, similarly to effects, one shows that for any rational m/n, f((m/n) E) = (m/n) f(E).

Now we want to show continuity, to extend this to the reals.

If E1 and E2 are positive operators, define E2 < E1 to mean that a positive operator E exists with E1 = E2 + E. This implies f(E2) <= f(E1). Let r1n be an increasing sequence of rationals whose limit is the irrational number c, and let r2n be a decreasing sequence of rationals whose limit is also c. If E is any positive operator, then r1n E < c E < r2n E, so r1n f(E) <= f(cE) <= r2n f(E). Thus by the pinching (squeeze) theorem, f(cE) = c f(E).

Extending it to any Hermitian operator H.

H can be broken down as H = E1 - E2, where E1 and E2 are positive operators, for example by separating the positive and negative eigenvalues of H. Define f(H) = f(E1) - f(E2). To show this is well defined: if E1 - E2 = E3 - E4, then E1 + E4 = E3 + E2, so f(E1) + f(E4) = f(E3) + f(E2), i.e. f(E1) - f(E2) = f(E3) - f(E4). Actually there was no need to show uniqueness, because I could have defined E1 and E2 to be the positive operators from separating the eigenvalues, but what the heck - it's not hard to show uniqueness.

It's easy to show linearity wrt the reals under this extended definition.

It's pretty easy to see the pattern here, but just to complete it I will extend the definition to any operator O. O can be uniquely decomposed as O = H1 + i H2, where H1 and H2 are Hermitian. Define f(O) = f(H1) + i f(H2). Again it's easy to show linearity wrt the reals under this new definition, and then extend it to linearity wrt complex numbers.

Now the final bit. The hard bit - namely linearity wrt any operator - has been done by extending the f defined on effects. The well-known Von Neumann argument can now be used to derive Born's rule, but for completeness I will spell out the detail.

First it's easy to check that <bi|O|bj> = Trace (O |bj><bi|).

O = ∑ <bi|O|bj> |bi><bj| = ∑ Trace (O |bj><bi|) |bi><bj|

Now we use the linearity that the foregoing extensions of f have led to.

f(O) = ∑ Trace (O |bj><bi|) f(|bi><bj|) = Trace (O ∑ f(|bi><bj|)|bj><bi|)

Define P as ∑ f(|bi><bj|)|bj><bi| and we have f(O) = Trace (OP).

P, by definition, is called the state of the quantum system. The following are easily seen. Since f(I) = 1, Trace (P) = 1, so P has unit trace. For any unit vector |u>, f(|u><u|) >= 0 since |u><u| is an effect. Thus Trace (|u><u| P) = <u|P|u> >= 0, so P is positive.

Hence a positive operator of unit trace P exists such that the probability of Ei occurring in the POVM E1, E2, ... is Trace (Ei P).

Now it's out there, we can pull it to pieces and see exactly what's going on.
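For what it's worth, here is a small numerical illustration of the conclusion (a sketch of my own in Python/NumPy; the particular state and POVM are arbitrary choices, not from the proof): with f(E) = Trace(E P) for a density operator P, the outcome probabilities of a POVM are nonnegative and sum to 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary density operator P: positive with unit trace.
d = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
P = d.conj().T @ d                 # d*d is positive
P /= np.trace(P).real              # normalize to unit trace

# An arbitrary 2-outcome POVM {E1, E2}: E1 positive with E1 <= I, and E2 = I - E1.
E1 = np.diag([0.5, 0.3, 0.4]).astype(complex)
E2 = np.eye(3) - E1

probs = [np.trace(E @ P).real for E in (E1, E2)]   # Born rule: Trace(Ei P)
assert all(p >= 0 for p in probs)                  # probabilities are nonnegative
assert abs(sum(probs) - 1) < 1e-10                 # and sum to f(I) = Trace(P) = 1
```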

Thanks
Bill

Last edited: Jun 15, 2014
11. Jun 15, 2014

### Fredrik

Staff Emeritus
I'll be happy to assist, since this is a topic that interests me a lot, but it could take some time, since I'm also looking at the stuff I'm discussing with micromass, in order to fill the gaps in Busch's proof.

That book by Blackadar is really nice. (Link in micro's post). I think I'm going to have to read a big part of it thoroughly. Right now I have to go to bed, but I'll see what I can do tomorrow.

12. Jun 15, 2014

### Staff: Mentor

Hopefully my proof has no gaps. I have seen a number of proofs and picked the eyes out of them so to speak to get the most elegant one.

It interests me as well because it leads to a very elegant axiomatic treatment of QM. Basically the two axioms used in my favourite QM book, Ballentine, are now just one. Very, very elegant.

Thanks
Bill

13. Jun 16, 2014

### naima

Busch writes that this is not enough for d=2.
Could you explain why?

14. Jun 16, 2014

### Staff: Mentor

It's not enough for d=2 in Gleason's usual proof based on resolutions of the identity. But no such restriction exists for the proof based on POVM's - which is one of its advantages. You can check the proof yourself and see that no such restriction is required.

In fact he states exactly that - from his paper 'The statement of the present theorem also extends to the case of 2-dimensional Hilbert spaces where Gleason’s theorem fails.'

Thanks
Bill

15. Jun 17, 2014

### Fredrik

Staff Emeritus
It took me some time to refresh my memory about Gelfand transforms and that kind of stuff, but I think I understand this now, except for a detail that looks simple: $0\leq a\leq p$ implies $0\leq cac^*\leq cpc^*$. I can prove this easily if I can prove that the product of two positive operators is positive, so I tried to prove that. (I thought incorrectly that you had assumed that $c\geq 0$). After some time of failing to do that, I did a google search for "product of positive operators". What I found only made me suspect that there's no such theorem.

I tried to find this result in Blackadar, but the theorem I found assumes that the operators commute. So maybe it just isn't true. In that case, I don't see why the implication should hold for all $c\in A$.

Take your time. I still have a lot of other things to look at, in particular the proof (either Kadison & Ringrose or Blackadar) of the theorem about states, and bhobba's long post.

Last edited: Jun 17, 2014
16. Jun 17, 2014

### micromass

I guess you have defined a positive element $a$ as being self-adjoint and having positive spectrum? Or maybe you are only talking about operators on a Hilbert space, and then you defined an operator $A$ to be positive if it is self-adjoint and $\langle Ax,x\rangle\geq 0$ for each $x$?

Both are fine definitions, but one can prove the following highly nontrivial theorem:

THEOREM: An element $a$ in a $C^*$-algebra is positive if and only if there exists a $d$ in the $C^*$-algebra such that $a=d^*d$.

The proof of this theorem utilizes the Gelfand transform again. See Murphy's "C*-algebras and operator theory". Theorem 2.3.5 gives the equivalence between the operator version of positive and the C*-algebra version of positive. Theorem 2.2.4 proves the above theorem.

The result I used then is that if $a,~b$ are self-adjoint, if $c$ is arbitrary and if $a\leq b$, then $c^*ac \leq c^*bc$. Using the theorem, this is now trivial. Indeed, we know that $b-a\geq 0$, and thus there exists a $d$ such that $b-a = d^*d$. Multiplying by $c$ we get $c^*bc - c^*ac = c^*d^*dc = (dc)^*dc\geq 0$. Thus $c^*ac\leq c^*bc$.
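A quick numerical sanity check of that inequality (my own sketch in Python/NumPy, finite-dimensional, with randomly generated matrices): build $a\leq b$ via $b-a=d^*d$, take an arbitrary $c$, and verify that $c^*bc - c^*ac$ has nonnegative spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_positive(n):
    """A random positive matrix of the form d*d, as in the theorem above."""
    d = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return d.conj().T @ d

n = 4
a = random_positive(n)
b = a + random_positive(n)       # b - a is positive, so a <= b
c = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # arbitrary c

diff = c.conj().T @ b @ c - c.conj().T @ a @ c    # should equal (dc)*(dc) >= 0
assert np.linalg.eigvalsh(diff).min() >= -1e-8    # c*ac <= c*bc, up to roundoff
```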

17. Jun 18, 2014

### micromass

Now, I don't know much QM, so take this with a grain of salt. But whenever I see concepts like that, I always like to compare it with the commutative situation. In that situation, everything should work classically and we should get actual probability measures in the classical sense.

Indeed, in the commutative case we work in a space $\mathcal{C}(X)$ of continuous functions on some (compact) Hausdorff space.
What are the states on this algebra? They are by definition bounded linear functionals $\tau:\mathcal{C}(X)\rightarrow \mathbb{C}$ such that $\tau(f)\geq 0$ if $f\geq 0$ and $\tau(1)=1$. It turns out that every probability measure $\mu$ on $X$ (if $X$ is nice enough) determines a state; indeed, we set $\tau(f) = \int_X f\,d\mu$. The converse is also true. This is a theorem by Riesz, Markov and Kakutani: http://en.wikipedia.org/wiki/Riesz–Markov–Kakutani_representation_theorem
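A toy discrete version of this correspondence (my own sketch; the particular $X$ and $\mu$ are arbitrary choices): on a finite $X$, a probability vector $\mu$ determines a positive, normalized functional $\tau(f) = \sum_x f(x)\mu(x)$, i.e. a state.

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])            # a probability measure on X = {0, 1, 2}
assert mu.min() >= 0 and abs(mu.sum() - 1) < 1e-12

def tau(f):
    """The state on C(X) determined by mu: tau(f) = integral of f d(mu)."""
    return float(np.dot(f, mu))

one = np.ones(3)                          # the unit of the algebra C(X)
f = np.array([1.0, 4.0, 2.0])             # a positive function on X

assert abs(tau(one) - 1) < 1e-12          # normalized: tau(1) = 1
assert tau(f) >= 0                        # positive on positive functions
```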

The space $\mathcal{P}(\mathcal{H})$ corresponds here to the continuous functions $f:X\rightarrow \{0,1\}$. So the projections are just continuous indicator functions.
But if $X = [0,1]$ (for example), then we only have two projections, since $X$ is connected. So the probability measures on the projections don't really show us all the probability measures on $X$, and so don't give us all the states.

The space $\mathcal{E}(\mathcal{H})$ corresponds here to the continuous functions $f:X\rightarrow [0,1]$. Probability measures on these should now correspond to states on the entire algebra. I haven't proved it, but it seems reasonable, since every function $f\in \mathcal{C}(X)$ can be decomposed as $f = f^+ - f^-$, and $f^+$ and $f^-$ can be rescaled to be effects.

Also, the fact that there is no bijection between the probability measures on the projections and the probability measures on the effects in this case might indicate that the answer to your question (3) is no. But of course, in (3) we are dealing with an entirely different C*-algebra!

18. Jun 18, 2014

### micromass

Because of the close connections to Von Neumann algebras, I realized that it is probably better to look at abelian Von Neumann algebras instead of abelian C*-algebras for the classical situation. The difference between these two can be big since a Von Neumann algebra always has many projections.

So what is an abelian Von Neumann algebra? We can prove it is always of the form $L^\infty(X,\mu)$, i.e. the essentially bounded functions on some measure space. The states should again have the form $f\mapsto \int_X f\,d\mu$ for probability measures $\mu$, but I can't seem to find a reference for it. I can try to prove it if you want.

The projections are now all measurable indicator functions on $X$. This is a much better situation since this is the same as the $\sigma$-algebra on $X$. Thus the probability measures on this do in fact correspond to the states.

The effects are now the measurable functions $f:X\rightarrow [0,1]$. The same argument as in my previous post should show that a probability measure on these is indeed the same as a state.

19. Jun 18, 2014

### micromass

My previous post on abelian Von Neumann algebras suggested that this is probably true, since a Von Neumann algebra has a lot of projections. In particular, it can be shown that a Von Neumann algebra is the closure of the linear span of its projections. This is a consequence of the spectral theorem.

In particular, given a probability measure $\nu$ on the lattice of projectors and given an effect $A$, we can write $A$ as a limit of linear combinations of projections: $\sum_i\alpha_i P_i \rightarrow A$. I think it should then definitely be possible to define $\nu(A)$ as the limit of $\sum_i\alpha_i \nu(P_i)$.

Another possibility is to take the spectral measure of $A$. This consists of projections $E(S)$ for every measurable set $S$. Taking $\nu(E(S))$ then defines some kind of probability measure on the sets $S$. Since $A = \int x\,dE$, it might not be a bad idea to define $\nu(A) = \int x\,d\nu(E)$.
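A finite-dimensional sketch of that second idea (my own code; in finite dimensions the spectral integral is just a sum over eigenprojections, and I realize $\nu$ via a density operator purely so the extension can be verified): extend $\nu$ from projections to a self-adjoint $A$ by $\nu(A) = \sum_\lambda \lambda\,\nu(E_\lambda)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Realize nu on projections via a density operator rho, purely for testing.
d = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
rho = d.conj().T @ d
rho /= np.trace(rho).real

def nu(P):
    """nu on projections (here Trace(rho P), so we can check the extension)."""
    return np.trace(rho @ P).real

def nu_extended(A):
    """Extend nu to self-adjoint A via its spectral decomposition."""
    lam, U = np.linalg.eigh(A)
    total = 0.0
    for i, l in enumerate(lam):
        proj = np.outer(U[:, i], U[:, i].conj())   # rank-1 spectral projection
        total += l * nu(proj)                      # "integral" of x d(nu(E))
    return total

A = np.diag([0.2, 0.5, 0.9]).astype(complex)       # an effect
assert abs(nu_extended(A) - np.trace(rho @ A).real) < 1e-10
```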

20. Jun 18, 2014

### Fredrik

Staff Emeritus
I tried to use both of these definitions.

I thought I would be able to prove this without looking at Murphy (because I just read about similar things in Sunder), but I had to assume that d is normal to complete my proof. I see that Murphy's 2.2.4 proves that $d^*d$ is positive without that assumption. The proof is something that I wouldn't have come up with in a very long time. I will have to study it carefully.

Crystal clear. Thanks.

21. Jun 18, 2014

### micromass

Like I said, it is highly nontrivial. It took the great operator algebraist Kaplansky to show this.

22. Jun 22, 2014

### Fredrik

Staff Emeritus
Sorry about being so slow. I haven't abandoned the thread. I decided that I want to know C*-algebras a little bit better before I continue. I expect to spend between 1 and 3 more days on that before I return here.

23. Jul 1, 2014

### naima

Take O = i * Id.
f gives the probability of an operator. What would it be for i * Id?

24. Jul 1, 2014

### Staff: Mentor

Obviously i.

But remember I am extending f. Beyond effects it's not interpreted as a probability.

Thanks
Bill

25. Jul 2, 2014

### naima

Why do you have to extend f for this theorem? You add a new axiom.