How to teach beginners in quantum theory the POVM concept

2019 Award

Summary:

Compared to Born's rule in its traditional squared probability amplitude form, the POVM concept is both more general and easier to introduce on an elementary level.

Main Question or Discussion Point

[Edit 23.12.2019: A much extended, polished version of my contributions to this thread can be found in my paper Born's rule and measurement (arXiv:1912.09906).]

I still wouldn't know how to teach beginners in QT using the POVM concept. [...] I don't think it's possible to introduce POVMs for physicists without using the standard formulation in the usual terms of observables and states.
Well, it is simpler than introducing Born's rule in full generality.

Everything can be motivated and introduced nicely for a qubit, using polarization of classical light, as in my Insight article on A Classical View of the Qubit. That article concentrated on preparation (i.e., the states) rather than measurement (i.e., the POVMs). One can follow it up with the following discussion of measurement.

In a first course, I'd introduce pure states later than in the Insight article, deriving initially von Neumann's dynamics for the density operator rather than the Schrödinger equation. This would emphasize the idealization involved in the latter. The Schrödinger equation is really needed only much later, as a computational tool.

Having the Hilbert space and the unnormalized density operator for sources, one introduces a detector as a collection of detector elements of which at most one responds at any given time, defining a stochastic process of events. The measurement postulate takes the following simple form:

(DRP) (detector response principle)
A detector element ##k## responds to a stationary source in state ##\rho## with a rate ##p_k## depending linearly on the state ##\rho##.

The linearity is well motivated by beam experiments: Changing the intensity amounts to a scalar multiplication of densities, and combining two sources amounts to adding their densities. Thus it is easy to check by experiment the linearity of typical instrument responses, and the motivation is complete.

Postulate (DRP) is the only measurement postulate; everything else can be derived from it when the Hilbert space is finite-dimensional.

By linearity, the rates satisfy ##p_k=\sum_{i,j} P_{kji}\rho_{ij}## for suitable complex numbers ##P_{kji}##. If the Hilbert space has finite dimension ##n##, these coefficients can be found operationally by approximately measuring the rates for at least ##n^2## linearly independent states ##\rho## and solving the resulting linear least squares problem for the coefficients. This is called quantum detection tomography.
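The tomography step can be illustrated by a small simulation. This is only a sketch under assumptions that are not in the post: a qubit (##n=2##), a hypothetical "true" detector given by a three-element trine POVM, six probe states ##(1\pm\sigma)/2##, and Gaussian noise of size ##10^{-3}## on the measured rates. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pauli matrices, used only to build probe states for a qubit (n = 2).
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def trine_povm():
    """Hypothetical 'true' detector with three elements, unknown to the experimenter."""
    elems = []
    for k in range(3):
        half = np.pi * k / 3  # half of the Bloch angle 2*pi*k/3
        psi = np.array([np.cos(half), np.sin(half)], dtype=complex)
        elems.append((2 / 3) * np.outer(psi, psi.conj()))
    return elems

def probe_states():
    """Six normalized probe states (1 +/- sigma)/2; they span all 2x2 Hermitian matrices."""
    return [(I2 + s * S) / 2 for S in (SX, SY, SZ) for s in (+1, -1)]

def measured_rates(povm, states, noise=1e-3):
    """Simulated rates p_k = Tr(rho P_k) plus small measurement noise."""
    exact = np.array([[np.trace(rho @ P).real for P in povm] for rho in states])
    return exact + noise * rng.standard_normal(exact.shape)

def detector_tomography(states, rates):
    """Least-squares estimate of the P_k from p_k = sum_ij P_kji rho_ij."""
    n = states[0].shape[0]
    design = np.array([rho.reshape(-1) for rho in states])  # each row holds rho_ij, row-major
    coeffs, *_ = np.linalg.lstsq(design, rates.astype(complex), rcond=None)
    estimates = []
    for k in range(rates.shape[1]):
        P = coeffs[:, k].reshape(n, n).T        # entry (j, i) of P_k is P_kji
        estimates.append((P + P.conj().T) / 2)  # drop the anti-Hermitian part caused by noise
    return estimates

true_povm = trine_povm()
states = probe_states()
rates = measured_rates(true_povm, states)
estimated_povm = detector_tomography(states, rates)
max_error = max(np.abs(E - P).max() for E, P in zip(estimated_povm, true_povm))
```

Using six probe states instead of the minimal four makes the least squares problem overdetermined, which averages out part of the noise.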

Introducing the matrices ##P_k## with ##(j,i)## entries ##P_{kji}##, this can be written as
$$p_k=\mathrm{Tr}~\rho P_k,$$
thus providing a derivation of the POVM extension of Born's probability formula from very simple first principles. The properties of the matrices can be deduced from the fact that the ##p_k## are rates of a stationary process. Hence they are nonnegative and sum to a constant. Since ##p_k## is real for all states ##\rho##, the ##P_k## must be Hermitian. Picking arbitrary pure states ##\rho=\psi\psi^*## shows that ##P_k## is positive semidefinite. Summing the probabilities shows that the sum of the ##P_k## is a multiple of the identity. Requiring this multiple to be 1 is conventional and amounts to a choice of units for the rate in such a way that if the state of the source is normalized to trace 1, the ##p_k## become probabilities rather than rates. Thus the ##P_k## form a POVM and we have derived everything.
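The three defining properties (Hermitian, positive semidefinite, summing to the identity) are easy to check numerically. The following is a minimal sketch, assuming a made-up two-element qubit detector and an unnormalized source of intensity 3; the helper names `is_povm` and `response_rates` are hypothetical.

```python
import numpy as np

def is_povm(elements, tol=1e-10):
    """Check that the matrices are Hermitian, positive semidefinite, and sum to the identity."""
    n = elements[0].shape[0]
    hermitian = all(np.allclose(P, P.conj().T, atol=tol) for P in elements)
    psd = all(np.linalg.eigvalsh((P + P.conj().T) / 2).min() >= -tol for P in elements)
    complete = np.allclose(sum(elements), np.eye(n), atol=tol)
    return hermitian and psd and complete

def response_rates(rho, elements):
    """Rates p_k = Tr(rho P_k) for a (possibly unnormalized) state rho."""
    return np.array([np.trace(rho @ P).real for P in elements])

# Illustrative detector: two non-projective elements on a qubit.
P0 = np.array([[0.6, 0.2], [0.2, 0.4]])
P1 = np.eye(2) - P0
povm = [P0, P1]

# Unnormalized source state; its trace (3.0) plays the role of the intensity.
rho = np.array([[1.5, 0.75], [0.75, 1.5]])
rates = response_rates(rho, povm)           # rates 1.8 and 1.2, summing to Tr rho = 3
probabilities = rates / np.trace(rho).real  # normalizing the state turns rates into probabilities
```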

If there are a large number of detector elements, the detection events are usually encoded numerically. The value assigned to the ##k##th detection event is pure convention, and can be any number ##a_k##, or even a vector when the elements are arranged in a multidimensional array. It is whatever has been written on the scale the pointer points to, or whatever has been programmed to be written by an automatic digital recording device.

The state-dependent formula for the expectation of the observable measured, which follows from the POVM together with the value assignment, is ##\langle A\rangle=\mathrm{Tr}~\rho A## with the operator (or operator vector) ##A=\sum a_kP_k##. We may say that the detector measures an observable represented by the operator (vector) ##A##.
Note that the same operator ##A## in the expectation can be decomposed in many ways into a linear combination of many POVM terms; thus there may be many different POVMs measuring observables corresponding to the same operator ##A##.
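To illustrate that different POVMs can measure the same operator, here is a small sketch with an assumed example: a sharp (projective) and an unsharp two-element detector on a qubit, both with ##A=\sigma_z##. The unsharpness parameter `eta` and the state are made up for illustration.

```python
import numpy as np

I2 = np.eye(2)
SZ = np.diag([1.0, -1.0])

def observable(values, povm):
    """A = sum_k a_k P_k."""
    return sum(a * P for a, P in zip(values, povm))

def outcome_statistics(rho, values, povm):
    """Probabilities, mean and variance of the recorded values a_k in a normalized state rho."""
    probs = np.array([np.trace(rho @ P).real for P in povm])
    mean = float(np.dot(values, probs))
    variance = float(np.dot(np.square(values), probs) - mean**2)
    return probs, mean, variance

# Sharp detector: orthogonal projectors with labels +1 and -1.
sharp = [(I2 + SZ) / 2, (I2 - SZ) / 2]
sharp_values = [1.0, -1.0]

# Unsharp detector: noisier elements, compensated by rescaled labels +-1/eta.
eta = 0.5
unsharp = [(I2 + eta * SZ) / 2, (I2 - eta * SZ) / 2]
unsharp_values = [1 / eta, -1 / eta]

rho = np.array([[0.7, 0.2], [0.2, 0.3]])  # normalized state, trace 1

A_sharp = observable(sharp_values, sharp)
A_unsharp = observable(unsharp_values, unsharp)
_, mean_sharp, var_sharp = outcome_statistics(rho, sharp_values, sharp)
_, mean_unsharp, var_unsharp = outcome_statistics(rho, unsharp_values, unsharp)
```

Both detectors give the same expectation ##\mathrm{Tr}~\rho A##, but the recorded values scatter more for the unsharp one, so the operator ##A## alone does not fix the full statistics.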

By picking the values carefully one can choose them to approximate a particular operator ##X## of interest, for example the position operator. This corresponds to the classical situation of labeling the scale of a meter to optimally match a desired observable.

If the detector can be tuned by adjusting parameters ##\theta## affecting its responses, the ##P_k=P_k(\theta)## depend on these parameters, giving ##A=\sum a_kP_k(\theta)##. Now both the labels ##a_k## and the parameters ##\theta## can be tuned to improve the accuracy with which the desired ##X## is approximated. This is the process called calibration. Constructing detector devices that allow a high-quality measurement corresponding to theoretically important operators is the challenge of high-precision experimental physics.
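A toy version of calibration can be sketched as follows. Everything specific here is an assumption for illustration: a hypothetical two-element qubit detector ##P_\pm(\theta)=(1\pm\eta\,(\cos\theta\,\sigma_z+\sin\theta\,\sigma_x))/2## with ##\eta=0.8##, the target ##X=\sigma_x##, the Frobenius norm as accuracy measure, and a plain grid search over ##\theta##.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def detector(theta, eta=0.8):
    """Hypothetical tunable detector; theta sets the measured direction, eta the sharpness."""
    axis = np.cos(theta) * SZ + np.sin(theta) * SX
    return [(I2 + eta * axis) / 2, (I2 - eta * axis) / 2]

def best_labels(povm, X):
    """Real labels a_k minimizing the Frobenius norm of sum_k a_k P_k - X."""
    gram = np.array([[np.trace(P @ Q).real for Q in povm] for P in povm])
    rhs = np.array([np.trace(P @ X).real for P in povm])
    labels, *_ = np.linalg.lstsq(gram, rhs, rcond=None)  # solves the normal equations
    A = sum(a * P for a, P in zip(labels, povm))
    return labels, np.linalg.norm(A - X)

def calibrate(X, thetas):
    """Grid search over theta, using the optimal labels for each setting."""
    results = [(theta, *best_labels(detector(theta), X)) for theta in thetas]
    return min(results, key=lambda r: r[2])

thetas = np.linspace(0.0, np.pi, 181)  # settings in steps of one degree
theta_opt, labels_opt, residual = calibrate(SX, thetas)
```

In this toy model the search ends at ##\theta=\pi/2## with labels ##\pm 1/\eta##, where ##A## reproduces ##X## exactly; for other settings the best labels leave a nonzero residual.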

The derivation just given is simple, intuitive, and complete. It tells everything needed to check and if necessary calibrate arbitrary detectors for their claimed measurement properties.

The derivation is far simpler, far more intuitive, and far more complete than what is needed to introduce students new to quantum physics to Born's rule, with its initially very weird formula for probabilities in a pure state.

Born's rule in expectation form is the very idealized case (realized experimentally only approximately, in very special situations) where the ##P_k## are orthogonal projectors, i.e., ##P_k^2=P_k=P_k^*## and ##P_jP_k=P_kP_j## for all ##j,k##. In this special case, and only in this case, the components of ##A## commute and have a joint discrete spectrum, given by the ##a_k##. This special case is distinguished in that by relabeling the values ##a_k## to ##f(a_k)##, the same detector also measures any function ##f(A)## of ##A##.
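The relabeling property can be checked numerically. In this sketch (an assumed example, not from the post) the projective POVM consists of the eigenprojectors of an arbitrary Hermitian ##3\times 3## matrix, ##f## is the cube, and the non-projective comparison mixes each projector with 20% of uniform noise.

```python
import numpy as np

# Arbitrary Hermitian matrix chosen for illustration.
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0j],
              [0.0, -1.0j, -1.0]])

values, vectors = np.linalg.eigh(H)
projectors = [np.outer(vectors[:, k], vectors[:, k].conj()) for k in range(3)]

def observable(labels, povm):
    """A = sum_k a_k P_k."""
    return sum(a * P for a, P in zip(labels, povm))

def cube(M):
    return M @ M @ M

# Projective case: relabeling a_k -> a_k**3 yields exactly A**3.
A = observable(values, projectors)
relabeled_sharp = observable(values**3, projectors)
sharp_ok = np.allclose(relabeled_sharp, cube(A))

# Non-projective case: the same relabeling no longer gives the cube of the measured operator.
eps = 0.2
noisy = [(1 - eps) * P + eps * np.eye(3) / 3 for P in projectors]
A_noisy = observable(values, noisy)
relabeled_noisy = observable(values**3, noisy)
noisy_ok = np.allclose(relabeled_noisy, cube(A_noisy))
```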

To get Born's rule in its traditional textbook form, one has to specialize further the state to be a normalized pure state, ##\rho=\psi\psi^*## with ##\psi^*\psi=1##, and one finds ##p_k=\psi^*P_k\psi##.
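As a quick numerical illustration of this last specialization (with a made-up qubit state), the trace formula, the form ##\psi^*P_k\psi##, and the squared amplitudes agree when the ##P_k## project onto the basis vectors:

```python
import numpy as np

psi = np.array([3 / 5, 4j / 5])   # normalized pure state: psi^* psi = 1
rho = np.outer(psi, psi.conj())   # rho = psi psi^*

# Orthogonal projectors onto the two basis vectors; they sum to the identity.
projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

p_trace = np.array([np.trace(rho @ P).real for P in projectors])         # Tr rho P_k
p_sandwich = np.array([np.vdot(psi, P @ psi).real for P in projectors])  # psi^* P_k psi
p_amplitude = np.abs(psi) ** 2                                           # squared amplitudes
# All three arrays equal [0.36, 0.64].
```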

DarMM
Gold Member
How is the state taking the form of a trace class operator motivated prior to this? Or at least a complex matrix?

2019 Award
How is the state taking the form of a trace class operator motivated prior to this? Or at least a complex matrix?
For the qubit, in the way stated in the insight article cited. It leads very naturally to complex positive semidefinite Hermitian operators. Then by natural generalization to arbitrary Hilbert spaces.

DarMM
Gold Member
For the qubit, in the way stated in the insight article cited. It leads very naturally to complex positive semidefinite Hermitian operators. Then by natural generalization to arbitrary Hilbert spaces.
The Insight article is great. It's the natural generalization I'm curious about. How do we get the tensor product structure for multiple copies? And since the Insight article only deals with photons, what kind of exposition are you imagining to justify treating all physical systems in a formalism introduced for beams of light?

2019 Award
The insight article is great. It's the natural generalization I'm curious about.
From the Insight article one learns in a well-motivated way that a complex Hilbert space of dimension 2 models the simplest quantum phenomena with a positive semidefinite Hermitian ##\rho## describing the state of an arbitrary source, the trace of ##\rho## as the intensity of the source, and certain Hermitian operators representing key quantities.

Having mastered this probably lets every beginning student of quantum physics accept the generalization. It is enough to say that nearly 100 years of experimental work have established beyond reasonable doubt that not only photon polarization but an arbitrary quantum system is describable in terms of an arbitrary complex Hilbert space, with a positive semidefinite Hermitian ##\rho## with finite trace describing the state of an arbitrary source, the trace of ##\rho## defining the macroscopic intensity of the source, and certain Hermitian operators (with details depending on the quantum system) defining key quantities.

Of course this is not a proof, but when creating foundations for a beginners course one may refer to authorities, and only make plausibility arguments that are easily grasped.
How do we get out the tensor product structure for multiple copies
This is a question quite different from the one you had asked in post #2.

After the qubit I'd introduce the anharmonic oscillator, the second simplest system of fundamental importance. This shows that finite-dimensional Hilbert spaces are not enough and infinite dimensions (i.e., functional analysis) are needed. (This is important even for people interested only in quantum information theory. They need to know that real systems oscillate and need an infinite-dimensional Hilbert space. Otherwise they are prone to draw misleading inferences from their limited ##N##-qubit point of view on quantum mechanics.)
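One standard way to see that finite dimensions cannot suffice for an oscillator (added here as an illustration, not taken from the post) is that ##[a,a^*]=1## is impossible for finite matrices: the trace of a commutator vanishes, while the trace of the identity is ##n##. A sketch with the usual truncated annihilation matrix in the number basis:

```python
import numpy as np

def truncated_annihilation(n):
    """n x n truncation of the oscillator annihilation operator in the number basis."""
    return np.diag(np.sqrt(np.arange(1, n)), k=1)

n = 6
a = truncated_annihilation(n)
commutator = a @ a.T - a.T @ a  # a is real, so its adjoint is the transpose

diagonal = np.diag(commutator)  # 1 everywhere except the last entry, which is -(n-1)
trace = np.trace(commutator)    # vanishes, unlike the trace n of the identity
```

The defect sits in the last diagonal entry and only moves further out as ##n## grows; it never disappears for finite ##n##.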

Here a lot of elementary phenomena (related to boundary conditions, bound states and scattering states, tunneling) can be discussed. It is only here that the Schrödinger equation starts to become important. I'd motivate it through the considerations in Sections 2.2 and 2.3 of Part I of my foundational series of papers.

Then I'd introduce Ehrenfest's theorem and the classical limit, as in Part IV of my foundational series of papers. This establishes the close connection to classical mechanics, including the fact that the Poisson bracket is the classical limit of the scaled commutator.

From this one can easily see that coupled harmonic oscillators require a tensor product of Hilbert spaces. Again, it is enough to say that this generalizes to arbitrary composite systems.
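A small numerical sketch of what the tensor product provides (the matrices and states below are arbitrary illustrations): quantities of different constituents commute, just as their classical counterparts have vanishing Poisson brackets, and expectations factorize for product states.

```python
import numpy as np

def embed_first(A, dim_second):
    """Operator A of the first constituent acting on the composite space."""
    return np.kron(A, np.eye(dim_second))

def embed_second(B, dim_first):
    """Operator B of the second constituent acting on the composite space."""
    return np.kron(np.eye(dim_first), B)

# Arbitrary Hermitian operators on a 2-dimensional and a 3-dimensional constituent.
A = np.array([[1.0, 2.0], [2.0, -1.0]])
B = np.array([[0.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 3.0]])

A1 = embed_first(A, 3)
B2 = embed_second(B, 2)
commute = np.allclose(A1 @ B2, B2 @ A1)

# Product state of two normalized density matrices.
rho1 = np.array([[0.7, 0.2], [0.2, 0.3]])
rho2 = np.diag([0.5, 0.3, 0.2])
rho = np.kron(rho1, rho2)

joint = np.trace(rho @ np.kron(A, B))
product = np.trace(rho1 @ A) * np.trace(rho2 @ B)
```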

After that one can introduce annihilation and creation operators and bosonic Fock spaces over an ##N##-dimensional Hilbert space. This leads to the notion of indistinguishability of ##N## oscillators. One can then play with the construction and look at fermionic Fock spaces. This is the Hilbert space of ##N## qubits, and leads to quantum information theory.
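The identification of the fermionic Fock space over an ##N##-dimensional Hilbert space with ##N## qubits can be made concrete. The sketch below uses the Jordan-Wigner construction, which is a choice made for this illustration and is not mentioned in the post, to build annihilation operators on ##N=3## qubits and check the canonical anticommutation relations.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps the occupied state |1> to the empty state |0>

def kron_all(factors):
    """Tensor product of a list of matrices."""
    return reduce(np.kron, factors)

def annihilators(N):
    """Jordan-Wigner matrices c_0, ..., c_{N-1} on N qubits (dimension 2**N)."""
    return [kron_all([Z] * j + [LOWER] + [I2] * (N - j - 1)) for j in range(N)]

def anticommutator(A, B):
    return A @ B + B @ A

N = 3
c = annihilators(N)
dim = c[0].shape[0]  # 2**N

# The matrices are real, so the adjoint is the transpose.
car_ok = all(
    np.allclose(anticommutator(c[i], c[j].T), (i == j) * np.eye(dim))
    and np.allclose(anticommutator(c[i], c[j]), 0)
    for i in range(N) for j in range(N)
)

# Total number operator; its eigenvalues count the occupied modes.
number_op = sum(cj.T @ cj for cj in c)
occupations = sorted(np.diag(number_op))
```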

Then one can raise curiosity about Fock spaces over infinite-dimensional Hilbert spaces and relate them to quantum fields and systems of arbitrarily many free particles.

This concludes the foundation.

The next step would be to discuss approximation methods.

[If anyone is interested to work out a chapter of an introductory course along these lines I'd be happy to cooperate - offline!]

DarMM
Gold Member
Then one can easily see that coupled harmonic oscillators require a tensor product of Hilbert spaces
Just curious about this. Recent work suggests it is Relativity that forces the tensor product structure. I'm wondering how you get at the tensor product of the two Hilbert spaces being the correct one to describe the product system as opposed to some other Hilbert space. What feature are you using?

I don't mean rigorously establish or similar, I'm only asking in terms of pedagogy.

2019 Award
Just curious about this. Recent work suggests it is Relativity that forces the tensor product structure. I'm wondering how you get at the tensor product of the two Hilbert spaces being the correct one to describe the product system as opposed to some other Hilbert space. What feature are you using?

I don't mean rigorously establish or similar, I'm only asking in terms of pedagogy.
The same as Dirac 1926, namely classical correspondence. The Poisson algebra of a composite system is also the tensor product of the Poisson algebras of its constituents. I find this very compelling.

vanhees71
Gold Member
2019 Award
Summary: Compared to Born's rule, the POVM concept is both more general and easier to introduce on an elementary level.
Well, this is how I introduce my teachers students to QM too, of course without the POVM formalism. I start with polarization experiments with (of course idealized) polaroids, letting one linear-polarization component of a classical em. field through and blocking the perpendicularly polarized linear-polarization component completely. Then I argue that when dimming the corresponding laser more and more at some point the stochastic nature of the em. field becomes observable (of course also cautioning that these are detection events for single photons though the state is not a single-photon state). Then you get the probabilistic interpretation and the entire QT formalism in terms of Hilbert-space vectors and self-adjoint operators (hermitean is of course fine in this case, because we deal with a finite-dimensional unitary space here), including Born's rule in the usual way.

Of course, in an advanced lecture for experts the POVM formalism is important and can for sure be nicely taught along the lines of your posting and Insights article. That the classical Maxwell theory is so close to QFT is clear, because you can get very far with linear-response theory, and there the equations for classical fields and their operator analogues don't differ very much. It's even hard to find true examples for the necessity of field quantization. The most simple one is spontaneous emission, which afaik cannot be described in the semiclassical approximation (i.e., with the "matter", i.e., in quantum optics mostly electrons, treated quantum-theoretically and the em. field as classical "background field"). There you are back at the historical development, i.e., the thermodynamical origin of QT, but I'd quote Einstein's derivation of the Planck Law using kinetic arguments, where he clearly discovered the necessity for spontaneous emission.

It's also clear that a complete classical theory of macroscopic electrodynamics needs statistical physics, e.g., that the intensity of light is the temporal average over rapidly oscillating energy-momentum tensor components of the em. field (i.e., quadratic functionals of the field).

Demystifier
Gold Member
One conceptual problem with POVM measurements is that it is not so clear what they correspond to for classical measurements.

2019 Award
One conceptual problem with POVM measurements is that it is not so clear what they correspond to for classical measurements.
How is this special to POVMs? It is also not clear what Born's rule corresponds to for classical measurements.

Demystifier
Gold Member
How is this special to POVMs? It is also not clear what Born's rule corresponds to for classical measurements.
In the usual projective measurement, you measure an observable (position, momentum, energy, ...) that has a classical counterpart. But what is measured in a generalized POVM measurement?

2019 Award
In the usual projective measurement, you measure an observable (position, momentum, energy, ...) that has a classical counterpart. But what is measured in a generalized POVM measurement?
Exactly measured is the operator ##A## constructed in post #1 from the POVM and the assigned values ##a_k##. To measure a prescribed operator ##X## with a classical counterpart (e.g., position, momentum, energy, ...) you must create a detector whose ##A## reproduces ##X## sufficiently well.

Apart from the simplicity and the straightforward motivation, the advantage of the POVM setting in #1 is that it is absolutely clear what measurement amounts to and how accurate it is in operational terms.

In contrast, Born's rule is completely silent about the notion of measurement, specifying only the statistics of the results of a mysterious measurement process. It tells one nothing about how to tune a concrete detector to produce accurate results.

2019 Award
Well, this is how I introduce my teachers students to QM too, of course without the POVM formalism.
But probably also without deriving the density operator and the Schrödinger equation from classical optics. This is new in my approach; I have never seen it anywhere. Together with the POVM approach in post #1, this provides a fully intelligible motivation for all basic features of quantum mechanics.

In the usual treatments, these basic features are addressed by just postulating the required items.
when dimming the corresponding laser more and more at some point the stochastic nature of the em. field becomes observable
Actually, following the semiclassical description of the photoeffect in the book by Mandel and Wolf, the correct explanation is that the stochastic nature of the detector response (to a sufficiently dim classical or quantum electromagnetic field) becomes observable.

Demystifier
Gold Member
Exactly measured is the operator ##A## constructed in post #1 from the POVM and the assigned values ##a_k##. To measure a prescribed operator ##X## (e.g., position, momentum, energy, ...) you must create a detector whose ##A## reproduces ##X## sufficiently well.

Apart from the simplicity and the straightforward motivation, the advantage of the POVM setting in #1 is that it is absolutely clear what measurement amounts to and how accurate it is in operational terms, whereas Born's rule is completely silent about the notion of measurement, specifying only the statistics of the results of a mysterious measurement process. It tells one nothing about how to tune a concrete detector to produce accurate results.
I understand all this, but I suspect that a student of physics familiar with classical concepts who learns QM for the first time will not find this very illuminating.

Anyway, an introductory QM textbook that teaches POVMs (without calling them so) is
https://www.amazon.com/dp/052187534X/?tag=pfamazon01-20&tag=pfamazon01-20
Sec. 9.5 Measurements on open systems.

2019 Award
I suspect that a student of physics familiar with classical concepts who learns QM for the first time will not find this very illuminating.
But surely not less illuminating than the standard introduction of Born's rule!

Demystifier
Gold Member
But surely not less illuminating than the standard introduction of Born's rule!
I agree, but at least one has a feeling that one understands what is measured. QM is shocking at many levels, but perhaps one does not need to get all the shocks at the same time.

2019 Award
I agree, but at least one has a feeling that one understands what is measured. QM is shocking at many levels, but perhaps one does not need to get all the shocks at the same time.
Well, what is measured is not spelled out at all by Born's rule, only what you get when you measure it. So how can one get a feeling that one understands what is measured?

The truth is that one assumes in an introductory course that the student already knows what is measured, or at least trusts the physics community that it knows. The student would be hard pressed to explain which ingredients ensure that what is actually measured is what is claimed to be measured.

The POVM version in post #1 has no such problems. It gives actual understanding, not only the appearance of it.

2019 Award
Then perhaps you would also agree with the pedagogy of the book
https://www.amazon.com/dp/9814579394/?tag=pfamazon01-20&tag=pfamazon01-20
which starts with QFT and ends with classical mechanics.
Perhaps, but I don't currently have access to it, neither to the book you cited in post #14. Thus I cannot comment unless you summarize the essential points that differ from traditional expositions.

vanhees71
Gold Member
2019 Award
But probably also without deriving the density operator and the Schrödinger equation from classical optics. This is new in my approach; I have never seen it anywhere. Together with the POVM approach in post #1, this provides a fully intelligible motivation for all basic features of quantum mechanics.

In the usual treatments, these basic features are addressed by just postulating the required items.

Actually, following the semiclassical description of the photoeffect in the book by Mandel and Wolf, the correct explanation is that the stochastic nature of the detector response (to a sufficiently dim classical or quantum electromagnetic field) becomes observable.
Of course, I derive the Schrödinger equation in the traditional way too. Also the density operator comes much later (if at all). As I said, you need to start with the simple heuristic things first to understand the refined modern views of POVMs.

That the photoeffect does not prove the necessity of field quantization is quite old. I guess it was derived very early in the history of QM using the semiclassical approximation. Ironically the field quantization met much scepticism first: as far as I know the first appearance in the literature is in the famous "Dreimännerarbeit" by Born, Jordan, and Heisenberg, with the QFT part mostly due to Jordan. The usual argument was that the quantization of the em. field was "too much", and indeed the early experimental "proofs" of QM (photoeffect, hydrogen spectrum, Compton effect) don't need field quantization in the lowest order of approximation. There the semiclassical treatment is sufficient. The QFT treatment of the em. field had to be reinvented shortly later by Dirac (1927 or 1928).

2019 Award
Of course, I derive the Schrödinger equation in the traditional way too.
What is this traditional way? The tradition is to postulate the Schrödinger equation, not to derive it.
That the photoeffect does not prove the necessity of field quantization is quite old.
Given that you know this I don't understand how you can
argue that when dimming the corresponding laser more and more at some point the stochastic nature of the em. field becomes observable
because that ''the photoeffect does not prove the necessity of field quantization'' proves that the stochastic nature of the em. field is irrelevant for the occurrence of the photoeffect (and is needed only for quantitative predictions in case nonclassical light is used - not a topic for an introductory course).

Instead, one can argue only that
the stochastic nature of the detector response (to a sufficiently dim classical or quantum electromagnetic field) becomes observable.
Indeed, this is what follows from the semiclassical analysis of Wentzel 1926 (as found in Mandel and Wolf).

vanhees71
Gold Member
2019 Award
Of course you are right. I argue that you can do the polarization experiment with true single-photon states nowadays, but that you cannot really understand before the QT formalism is established.

Of course one cannot "derive" any fundamental equation of physics in a mathematical sense. The standard "derivation" argues via the Einstein-de Broglie relations ##E=\hbar \omega##, ##\vec{p}=\hbar \vec{k}##. It's of course also heuristic and in no way strict.

2019 Award
By picking the values carefully one can choose them to approximate a particular operator ##X## of interest, for example the position operator. This corresponds to the classical situation of labeling the scale of a meter to optimally match a desired observable.

ftr
Hi Arnold
Just a few questions. Would you tie the POVM concept to TI in your "textbook"? If so, have you published TI in any peer-reviewed journal, and what was the response of the mainstream? Also, for TI to become a legitimate interpretation, how must the process proceed? I mean, what kind of authority approval is involved? Thanks.