The standard interpretation says that, in full generality, the state is described by a statistical operator (I'd not call it a density matrix, which is only the special case where you work in the position representation).
The statistical operator has the following properties:
Mathematical
----------------------
(a) it's a self-adjoint positive semidefinite operator
(b) its trace is 1
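To make these two properties concrete, here's a minimal numerical sketch (Python/NumPy; the particular qubit operator below is made up purely for illustration):

```python
import numpy as np

# Illustrative example: a 2x2 statistical operator for a qubit,
# rho = 0.7 |0><0| + 0.3 |1><1| (the weights are arbitrary).
rho = np.array([[0.7, 0.0],
                [0.0, 0.3]])

# (a) self-adjoint: rho equals its conjugate transpose
assert np.allclose(rho, rho.conj().T)

# (a) positive semidefinite: all eigenvalues are >= 0
eigenvalues = np.linalg.eigvalsh(rho)
assert np.all(eigenvalues >= -1e-12)

# (b) trace is 1
assert np.isclose(np.trace(rho), 1.0)
```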
In addition there are self-adjoint operators representing observables. An observable can take only the values given by the eigenvalues of the operator representing it.
A self-adjoint operator also defines a complete set of eigenvectors (in the somewhat sloppy sense of physicists, also including "generalized eigenvectors" if there are continuous eigenvalues like for position and momentum operators).
Physics
-----------
A state, represented by a statistical operator, describes how a system is initially prepared, before you make a measurement.
Now suppose you measure an observable ##A##, represented by a self-adjoint operator ##\hat{A}## with eigenvalues ##a## and orthonormalized eigenstates ##|a,\alpha \rangle##, where ##\alpha## is some parameter (or set of parameters) labeling the different eigenvectors to the same eigenvalue ##a##.
Then the meaning of the statistical operator (according to the minimal statistical interpretation, which is one flavor of the family of Copenhagen interpretations) is that the probability to get the value ##a## when measuring the observable ##A## on a system prepared in the state described by the statistical operator ##\hat{\rho}## is
$$P(a|\hat{\rho})=\sum_{\alpha} \langle a,\alpha|\hat{\rho}|a,\alpha \rangle.$$
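A minimal numerical sketch of this probability formula (Python/NumPy; the state ##\hat{\rho}## and the observable are chosen purely for illustration):

```python
import numpy as np

# Sketch: P(a|rho) = sum_alpha <a,alpha| rho |a,alpha>, illustrated for a
# qubit with A = sigma_z. The eigenvalues +1 and -1 are non-degenerate here,
# so the sum over alpha contains a single term each.
rho = np.array([[0.6, 0.2],
                [0.2, 0.4]])  # a valid statistical operator (illustrative)

up = np.array([1.0, 0.0])     # eigenvector of sigma_z to eigenvalue +1
down = np.array([0.0, 1.0])   # eigenvector of sigma_z to eigenvalue -1

P_plus = up.conj() @ rho @ up       # probability to measure a = +1
P_minus = down.conj() @ rho @ down  # probability to measure a = -1

print(P_plus, P_minus)  # the probabilities sum to 1
```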
Now ##\hat{\rho}##, as a self-adjoint operator, itself has a complete set of orthonormalized eigenvectors ##|p_k \rangle## with eigenvalues ##p_k##.
Because ##\hat{\rho}## is positive semidefinite, you have
$$p_k=\langle p_k|\hat{\rho}|p_k \rangle \geq 0,$$
and, by property (b) above,
$$\mathrm{Tr} \hat{\rho} = \sum_k \langle p_k|\hat{\rho}|p_k\rangle = \sum_k p_k=1.$$
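These two facts about the spectrum of ##\hat{\rho}## can be checked numerically (a sketch; the operator below is the same illustrative example as before):

```python
import numpy as np

# Sketch: diagonalize a statistical operator and check that its eigenvalues
# p_k are non-negative and sum to 1 (rho below is an illustrative example).
rho = np.array([[0.6, 0.2],
                [0.2, 0.4]])

p, vecs = np.linalg.eigh(rho)    # eigenvalues p_k, eigenvectors |p_k> (columns)
assert np.all(p >= -1e-12)       # positive semidefiniteness
assert np.isclose(p.sum(), 1.0)  # Tr rho = sum_k p_k = 1
print(p)
```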
There's one "extreme case", namely that exactly one eigenvalue of ##\hat{\rho}## is 1 and all others then must necessarily vanish. So say ##p_1=1## and ##p_k=0## for ##k \in \{2,3,\ldots \}##.
Due to the completeness of the eigenvectors we then have
$$\hat{\rho}=\sum_k p_k |p_k \rangle \langle p_k|=|p_1 \rangle \langle p_1|.$$
In this case the probability for finding the value ##a## when measuring the observable ##A## is
$$P(a|\hat{\rho})=\sum_{\alpha} \langle a,\alpha|p_1 \rangle \langle p_1|a,\alpha \rangle=\sum_{\alpha} |\langle a,\alpha|p_1 \rangle|^2.$$
Now ##\psi(a,\alpha):=\langle a,\alpha|p_1 \rangle## is nothing else than the wave function in the ##A## representation, and the general formalism with the statistical operator thus also covers the special case where you describe the state in terms of a wave function.
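To see this reduction numerically, here's a sketch (the qubit state below is an arbitrary illustrative choice): for a pure state ##\hat{\rho}=|\psi\rangle\langle\psi|##, the general formula gives the same probability as the Born rule ##|\langle a|\psi\rangle|^2##.

```python
import numpy as np

# Illustrative pure state |psi> = (|0> + |1>)/sqrt(2)
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())   # rho = |psi><psi|

up = np.array([1.0, 0.0])         # eigenvector of the observable to eigenvalue a

# General statistical-operator formula vs. Born rule: both agree
P_general = up.conj() @ rho @ up
P_born = abs(up.conj() @ psi) ** 2
assert np.isclose(P_general, P_born)
print(P_general)  # 0.5 for this state
```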
It should be clear that this case is very special: such a state provides the most complete description of the quantum system you can have. One therefore calls this special case a "pure state".
Now in this thread some more advanced things have also been stated. One is entropy. That's a very important concept. You may have heard about entropy as a thermodynamical quantity appearing in the first and second laws of thermodynamics. The concept is, however, much more general, and the thermodynamical entropy is a special application to equilibrium states.
The entropy is rather a measure for the missing information relative to complete knowledge. This measure of information is by definition a non-negative quantity, and it was discovered in this general sense by C. Shannon in his analysis of communication via noisy channels. The concept has since also been applied in physics, particularly in quantum theory; nowadays it's even an important field of research in its own right, called quantum information theory. The upshot is that the measure for the missing information (the entropy), given that you know the system is prepared in the state ##\hat{\rho}##, is
$$S=-\mathrm{Tr}(\hat{\rho} \ln \hat \rho)=-\sum_{k} p_k \ln p_k.$$
If there are values ##p_k=0## in the sum it has to be understood as
$$\lim_{x \rightarrow +0} x \ln x=0.$$
Now if you have a pure state, only one ##p_k=1##, and all the others are 0. Since also ##1 \ln 1=0##, this entropy (named after its discoverers: the Shannon-von Neumann-Jaynes entropy) vanishes if and only if the system is prepared in a pure state.
That makes sense: since entropy is non-negative by definition (note that ##0 \leq p_k \leq 1## and thus ##\ln p_k \leq 0##), its lowest value is 0, and a statistical operator with 0 entropy provides complete knowledge in the sense of this entropy measure for missing information. This shows that the Shannon-von Neumann-Jaynes definition of entropy makes sense for QT: according to QT the most complete information you can have about a system is that it is prepared in a pure state, and those are precisely the states for which the entropy vanishes.
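A small sketch of this entropy formula, including the ##0 \ln 0 = 0## convention (the two statistical operators below are illustrative examples):

```python
import numpy as np

# Sketch: S = -sum_k p_k ln p_k, computed from the eigenvalues of rho,
# with the convention 0 ln 0 = 0 handled by dropping zero eigenvalues.
def von_neumann_entropy(rho):
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-15]          # drop zero eigenvalues: lim_{x->0+} x ln x = 0
    return float(-np.sum(p * np.log(p)))

rho_pure = np.diag([1.0, 0.0])    # pure state: one eigenvalue 1, rest 0
rho_mixed = np.diag([0.5, 0.5])   # maximally mixed qubit

print(von_neumann_entropy(rho_pure))   # 0.0
print(von_neumann_entropy(rho_mixed))  # ln 2 ≈ 0.693
```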
The information-theoretical point of view also gives an important way to guess the best statistical operator given some (usually incomplete) information. From a scientific point of view, the best choice is the statistical description reflecting the least possible prejudice: you choose the very statistical operator that maximizes the entropy under the constraints of the information you actually have. The probabilities encoded in this "maximum-entropy statistical operator" then do not falsely mimic more knowledge than you have, since maximizing the entropy ensures it is the state of maximum missing information under the constraint of what's really known.
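As a sketch of the maximum-entropy principle in the simplest case: if you know nothing beyond normalization, the maximum-entropy statistical operator on a ##d##-level system is ##\hat{\rho}=\hat{1}/d##, with entropy ##\ln d##. A crude numerical check (random trial spectra, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 1e-15]              # convention 0 ln 0 = 0
    return -np.sum(p * np.log(p))

# With no information beyond normalization, the maximum-entropy statistical
# operator on a d-level system is rho = 1/d; no random trial spectrum beats it.
d = 3
best = entropy(np.full(d, 1.0 / d))   # entropy of rho = 1/d, i.e. ln d

for _ in range(1000):
    p = rng.random(d)
    p /= p.sum()                  # random eigenvalue spectrum of a trial rho
    assert entropy(p) <= best + 1e-12

print(best)  # ln 3 ≈ 1.0986
```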
@DarMM made an important remark in #2, and that's at the heart of the profound difference between the description of nature comparing this description by quantum vs. classical physics:
In classical physics you use statistical methods only because you can't have complete information about the system for some reason. Most obviously that's the case for macroscopic systems. Within classical mechanics everything is in principle determined: if you know the precise positions and momenta of all particles making up the macroscopic system, you know everything completely. That's the "true state" the system is in according to classical mechanics, but of course in practice it's not possible to accurately know the precise values of the positions and momenta of ##10^{24}## particles. Thus you choose some important "macroscopic observables" like the center of mass of the body and its total momentum, and maybe some more quantities of this kind, depending on the problem you want to solve. In statistical physics you describe this, e.g., by a single-particle phase-space distribution function, giving the phase-space density for finding a particle with momentum ##\vec{p}## at the position ##\vec{x}##. This ignores all kinds of correlations between particles; for those you'd need two-particle, three-particle phase-space distribution functions, etc. No matter how detailed you get, in practice you can never know the complete state of the macroscopic body, but it always is in a definite state, i.e., all the possible observables (which are functions of the phase-space variables) take determined values. In other words, the true state is just one point in phase space, but you can't know it because of the complexity of the macroscopic body, and thus you use statistics to describe the really important ("relevant") macroscopic observables as statistical averages over the microscopic degrees of freedom.
Now in quantum theory there are situations where you can have a system in a pure state (providing complete information about it in the quantum sense!) while parts of it are not in a pure state. That's the case if you have entanglement. There are well-defined systems like an electron entangled in both spin and momentum with a positron coming from a decay of a neutral pion at rest: due to the conservation laws fulfilled in electromagnetic transitions, the total ##e^+ e^-## system must be in the spin-singlet state, and due to momentum conservation their momenta are back-to-back, since the total momentum must stay 0. One can now ask what the state of the electron alone is, just ignoring the positron. As it turns out, the spin of the electron is in its maximum-entropy state, which is simply ##\hat{\rho}_{\text{spin of electron}}=\hat{1}/2##, i.e., it is totally unpolarized. Though the spin-z component of the complete system consisting of the electron and the positron is completely determined to be 0, the spin state of the electron alone is as incomplete as it can get.
By the way: The entropy of the spin state in this case is
$$S=-1/2 \ln(1/2)-1/2 \ln(1/2)=-\ln(1/2)=\ln 2.$$
Thus the "information content" (here completely missing) of a "qu-bit" (i.e., a two-level system) is ##\ln 2##.
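The singlet example above can be sketched numerically: build the pure two-spin singlet state, trace out the positron, and check that the electron's reduced statistical operator is ##\hat{1}/2## with entropy ##\ln 2##.

```python
import numpy as np

# Spin-singlet state of the e+ e- pair,
# |psi> = (|up,down> - |down,up>)/sqrt(2),
# in the basis {|uu>, |ud>, |du>, |dd>} of the two-spin Hilbert space.
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
rho_pair = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)

# Reduced statistical operator of the electron: trace out the positron index
rho_electron = np.einsum('ikjk->ij', rho_pair)
assert np.allclose(rho_electron, np.eye(2) / 2)  # totally unpolarized

# Its entropy is ln 2: maximal for a qubit, although the pair state is pure
p = np.linalg.eigvalsh(rho_electron)
S = -np.sum(p * np.log(p))
print(S)  # ln 2 ≈ 0.693
```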