# What really is the density matrix in QM?


## Summary:

What are the basic assumptions of QM about the density matrix?

The subject of the density matrix in quantum mechanics is very unclear to me.
In the books I read (for example Sakurai), they don't state the basic assumptions or show how the results of the density-matrix formalism are derived from them.
Is the basic assumption of the Copenhagen interpretation* of QM that there exists a wave function of the system that we simply don't know, so that we use the density matrix as a statistical method? And if so, how is it formally derived from the laws of QM?
Is the density matrix formulation necessary to understand results from real experiments, or is it possible to do without it?

Copenhagen interpretation* = regular QM

DarMM
Gold Member
> Summary: What are the basic assumptions of QM about the density matrix?
>
> Is the basic assumption of the Copenhagen interpretation* of QM that there exists a wave function of the system that we simply don't know, so that we use the density matrix as a statistical method? And if so, how is it formally derived from the laws of QM?
You'll see that in some books but it's not really valid. More recent introductory books (e.g. Englert's Lectures on Quantum Mechanics) don't describe it this way.

In truth, density matrices are just the most general form of quantum states, not expressions of ignorance about a pure state. A pure state is the special case of maximal knowledge, i.e. zero entropy.

A. Neumaier
2019 Award
> In truth, density matrices are just the most general form of quantum states, not expressions of ignorance about a pure state. A pure state is the special case of maximal knowledge, i.e. zero entropy.
The first half of the second sentence seems to contradict the first sentence. If the density operator is not about ignorance, the pure state cannot be deemed to be about maximal knowledge. Thus:

A density operator is the most general form of a quantum state, not ignorance of a pure state. A pure state is the special case of zero entropy.

Demystifier
DarMM
Gold Member
> The first half of the second sentence seems to contradict the first sentence. If the density operator is not about ignorance, the pure state cannot be deemed to have maximal knowledge.
I don't understand. Lower entropy means higher knowledge, i.e. greater predictive ability right?

> You'll see that in some books but it's not really valid. More recent introductory books (e.g. Englert's Lectures on Quantum Mechanics) don't describe it this way.

So can you derive this density-matrix formalism, or is it a new axiom of QM?

As an operator for a pure state:
$$\hat{\rho}=|\psi\rangle\langle\psi|$$
As matrix elements:
$$\rho_{ij}=\langle i|\psi\rangle\langle\psi|j\rangle$$

For a mixture:
$$\hat{\rho}=\sum_i p_i\,|\psi_i\rangle\langle\psi_i|$$
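These definitions can be sketched numerically (using numpy; the states and weights here are made-up examples). A useful check: both operators have unit trace and are positive semidefinite, but ##\mathrm{Tr}\,\rho^2 = 1## only for the pure state:

```python
import numpy as np

# Pure state |psi> = (|0> + |1>)/sqrt(2) on a qubit
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())          # rho = |psi><psi|

# Mixture of |0> and |1> with weights 0.7 and 0.3 (hypothetical weights)
ket0, ket1 = np.eye(2)
rho_mixed = 0.7 * np.outer(ket0, ket0) + 0.3 * np.outer(ket1, ket1)

for rho in (rho_pure, rho_mixed):
    assert np.isclose(np.trace(rho), 1.0)             # unit trace
    assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)  # positive semidefinite

print(np.trace(rho_pure @ rho_pure))    # 1.0: purity of a pure state
print(np.trace(rho_mixed @ rho_mixed))  # 0.58: purity < 1 for a mixture
```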

A. Neumaier
2019 Award
> I don't understand. Lower entropy means higher knowledge, i.e. greater predictive ability, right?
No. Quantum entropy is unrelated to knowledge.

If you know everything about the state of a quantum system it means that you know its state exactly. This state can be any pure or mixed state. You cannot know more!

> No. Quantum entropy is unrelated to knowledge.
>
> If you know everything about the state of a quantum system it means that you know its state exactly. This state can be any pure or mixed state. You cannot know more!
But in a mixture you don't know the state exactly.

A. Neumaier
2019 Award
> But in a mixture you don't know the state exactly.
According to many textbooks, you would be right. But DarMM was referring to the modern view, see Englert's paper.

> According to many textbooks, you would be right. But DarMM was referring to the modern view, see Englert's paper.
Unfortunately it has a paywall.

vanhees71
Gold Member
2019 Award
The standard interpretation says that, in all generality, the state is described by a statistical operator (I'd not call it a density matrix, which is only the special case where you work in the position representation).

The statistical operator has the following properties

Mathematical
----------------------
(a) it's a self-adjoint positive semidefinite operator
(b) its trace is 1

In addition there are self-adjoint operators representing observables. The observables can take values given by the eigenvalues of the operator representing this observable.

A self-adjoint operator also defines a complete set of eigenvectors (in the somewhat sloppy sense of physicists, also including "generalized eigenvectors" if there are continuous eigenvalues like for position and momentum operators).

Physics
-----------

A state, represented by a statistical operator, describes how a system is initially prepared, before you make a measurement.

Suppose you measure an observable ##A##, represented by a self-adjoint operator ##\hat{A}## with eigenvalues ##a## and orthonormalized eigenstates ##|a,\alpha \rangle##, where ##\alpha## is some parameter (or set of parameters) labeling the different eigenvectors to the eigenvalue ##a##.

Then the meaning of the statistical operator (according to the minimal statistical interpretation, which is a flavor of the family of Copenhagen interpretations) is that the probability to get the value ##a## when measuring the observable ##A## on a system prepared in the state described by the statistical operator ##\hat{\rho}## is
$$P(a|\hat{\rho})=\sum_{\alpha} \langle a,\alpha|\hat{\rho}|a,\alpha \rangle.$$

Now ##\hat{\rho}## as a self-adjoint operator has itself a complete set of orthonormalized eigenvectors ##|p_k \rangle##.

Because ##\hat{\rho}## is positive semidefinite, you have
$$p_k=\langle p_k|\hat{\rho}|p_k \rangle \geq 0.$$
and by property (b) above
$$\mathrm{Tr} \hat{\rho} = \sum_k \langle p_k|\hat{\rho}|p_k\rangle = \sum_k p_k=1.$$
There's one "extreme case", namely that exactly one eigenvalue of ##\hat{\rho}## is 1 and all others then must necessarily vanish. So say ##p_1=1## and ##p_k=0## for ##k \in \{2,3,\ldots \}##.

Due to the completeness of the eigenvectors we then have
$$\hat{\rho}=\sum_k p_k |p_k \rangle \langle p_k|=|p_1 \rangle \langle p_1|.$$
In this case the probability for finding the value ##a## when measuring the observable ##A## is
$$P(a|\hat{\rho})=\sum_{\alpha} \langle a,\alpha|p_1 \rangle \langle p_1|a,\alpha \rangle=\sum_{\alpha} |\langle a,\alpha|p_1 \rangle|^2.$$
Now ##\psi(a,\alpha):=\langle a,\alpha|p_1 \rangle## is nothing else than the wave function in the ##A## representation, and the general formalism with the statistical operator gets you also the special case when you describe the state in terms of a wave function.
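This probability rule can be sketched numerically (numpy; the state and observable are made-up examples, and the measured observable here is non-degenerate, so the sum over ##\alpha## has a single term):

```python
import numpy as np

# Eigenbasis of the measured observable: sigma_z on a qubit.
# Non-degenerate, so P(a|rho) = <a|rho|a> with no sum over alpha.
up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])

# Pure state |psi> = (|up> + |down>)/sqrt(2), prepared along +x (example choice)
psi = (up + down) / np.sqrt(2)
rho = np.outer(psi, psi.conj())   # statistical operator of the pure state

P_up = up.conj() @ rho @ up       # <up|rho|up>
P_down = down.conj() @ rho @ down

# For a pure state this reduces to the Born rule |<a|psi>|^2
assert np.isclose(P_up, abs(up.conj() @ psi) ** 2)
print(P_up, P_down)  # 0.5 0.5
```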

It should be clear that pure states are very special: they are the most detailed kind of state you can have for a quantum system, i.e., they provide the most complete description of the state the system is in. That is why this special case is called a "pure state".

Now in this thread some more advanced things have also been stated. One is entropy. That's a very important concept. You may have heard about entropy as a thermodynamical quantity appearing in the 1st and 2nd fundamental laws of thermodynamics. The concept is, however, much more general, and the thermodynamical entropy is a special application to equilibrium states.

The entropy is rather a measure for the missing information relative to complete knowledge. This measure of information is by definition a positive quantity, and it was discovered in this general sense by C. Shannon in his analysis of communication via noisy channels. The concept has since also been applied in physics, particularly in quantum theory; nowadays it is even an important field of research in its own right, called quantum information theory. The upshot is that the measure for the missing information (the entropy), given that you know the system is prepared in the state ##\hat{\rho}##, is given by
$$S=-\mathrm{Tr}(\hat{\rho} \ln \hat \rho)=-\sum_{k} p_k \ln p_k.$$
If there are values ##p_k=0## in the sum it has to be understood as
$$\lim_{x \rightarrow +0} x \ln x=0.$$
Now if you have a pure state, only one ##p_k=1## and all the others are 0. Since also ##1 \ln 1=0##, this entropy (named after its discoverers the Shannon-von Neumann-Jaynes entropy) vanishes if and only if the system is prepared in a pure state.

That makes sense: Since entropy is non-negative by definition (note that ##0 \leq p_k \leq 1## and thus ##\ln p_k \leq 0##), its lowest value is 0, and a statistical operator that has 0 entropy provides complete knowledge in the sense of this entropy measure for missing information. This shows that the Shannon-von Neumann-Jaynes definition of entropy makes sense for QT: According to QT the most complete information you can have about a system is that it is prepared in a pure state, and those are precisely the states for which the entropy vanishes.
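A minimal numerical sketch of this statement (numpy; the example operators are made up): the entropy computed from the eigenvalues of ##\hat{\rho}## vanishes for a pure state and is positive for a proper mixture:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -Tr(rho ln rho), evaluated via the eigenvalues p_k of rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]               # implements the convention 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])    # p_1 = 1, all other p_k = 0
mixed = np.array([[0.5, 0.0], [0.0, 0.5]])   # maximally mixed qubit state

print(von_neumann_entropy(pure))   # 0.0
print(von_neumann_entropy(mixed))  # ln 2 ≈ 0.6931
```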

The information-theoretical point of view is also an important way to guess the best statistical operator given some (usually incomplete) information. From a scientific point of view, the best choice is the statistical description reflecting the least possible prejudice: you choose the statistical operator that maximizes the entropy under the constraint of the information you actually have. The probabilities encoded in this "maximum-entropy statistical operator" then do not falsely mimic more knowledge than you have, since the maximization ensures that it is the state of maximal missing information compatible with what is really known.

@DarMM made an important remark in #2, and that's at the heart of the profound difference between the description of nature comparing this description by quantum vs. classical physics:

In classical physics you use statistical methods only because you can't have the complete information about the system for some reason. That is most obvious for macroscopic systems. Within classical mechanics everything is in principle determined: if you know the precise positions and momenta of all particles making up the macroscopic system, you know everything completely. That's the "true state" the system is in according to classical mechanics, but of course in practice it's not possible to accurately know the precise values of the positions and momenta of ##10^{24}## particles. Thus you choose some important "macroscopic observables" like the center of mass of the body and its total momentum, and maybe some more quantities of this kind, depending on the problem you want to solve.

In statistical physics you describe this, e.g., by a single-particle phase-space distribution function, giving the phase-space density for finding a particle with momentum ##\vec{p}## at the position ##\vec{x}##. This ignores all kinds of correlations between particles; for those you'd need two-particle, three-particle phase-space distribution functions, etc. No matter how detailed you get, in practice you can never know the complete state of the macroscopic body, but it always is in a definite state, i.e., all the possible observables (which are functions of the phase-space variables) take determined values. In other words, the true state is just one point in phase space, but you can't know it because of the complexity of the macroscopic body, and thus you use statistics to describe the really important ("relevant") macroscopic observables as statistical averages over the microscopic degrees of freedom.

Now in quantum theory there are situations where you can have a system in a pure state (providing complete information about it in the quantum sense!) while parts of it are not in a pure state. That's the case if you have entanglement. For instance, take an electron entangled in both spin and momentum with a positron coming from the decay of a neutral pion at rest: due to the conservation laws fulfilled in electromagnetic transitions, the total ##e^+ e^-## system must be in the spin-singlet state, and due to momentum conservation their momenta are back-to-back, since the total momentum must stay 0. One can now ask what the state of the electron alone is, just ignoring the positron. As it turns out, the spin state of the electron is the maximum-entropy state, which is simply ##\hat{\rho}_{\text{spin of electron}}=\hat{1}/2##, i.e., it is totally unpolarized. Though the spin-z component of the complete system consisting of electron and positron is completely determined to be 0, the spin state of the electron is as incomplete as it can get.

By the way: The entropy of the spin state in this case is
$$S=-1/2 \ln(1/2)-1/2 \ln(1/2)=-\ln(1/2)=\ln 2.$$
Thus the "information content" (here completely missing) of a "qubit" (i.e., a two-level system) is ##\ln 2##.
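This example can be sketched numerically (numpy; the partial trace is done by hand via a reshape, which is one implementation choice among several): the two-spin singlet is pure, yet tracing out the positron leaves the electron spin in the maximally mixed state ##\hat{1}/2## with entropy ##\ln 2##:

```python
import numpy as np

# Spin-singlet state (|ud> - |du>)/sqrt(2) in the basis |uu>, |ud>, |du>, |dd>
singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
rho_pair = np.outer(singlet, singlet.conj())   # pure state of the pair

# Partial trace over the positron: reshape to (2,2,2,2) and contract its indices
rho_electron = np.trace(rho_pair.reshape(2, 2, 2, 2), axis1=1, axis2=3)
print(rho_electron)            # 0.5 * identity: totally unpolarized

p = np.linalg.eigvalsh(rho_electron)
S = float(-np.sum(p * np.log(p)))
print(S)                       # ln 2 ≈ 0.6931, maximal for a single spin
```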

A. Neumaier
2019 Award
> The entropy is rather a measure for the missing information relative to complete knowledge.
It is this only in the very special case where you can interpret the system as being a classical mixture of pure eigenstates!

In general, the classical analogy does not make sense at all: When you prepare an unpolarized beam of light its state is not pure, but no information about the state is missing!

vanhees71
Gold Member
2019 Award
As I stressed, in information theory entropy provides a measure for missing information relative to complete knowledge, i.e., ##S=\ln 2## for the unpolarized-photon state ##\hat{\rho}_{\text{unpol}}=\hat{1}/2## is relative to the case of complete knowledge about the photon's polarization, i.e., relative to the case if it were prepared in a pure state.

Of course for the very photon (or rather the ensemble of photons) prepared in the state ##\hat{\rho}_{\text{unpol}}## no information is missing. That was what was stressed in #2 (as I said also above).

DarMM
Gold Member
> So can you derive this density-matrix formalism, or is it a new axiom of QM?
A state ##\omega## in QM assigns an expectation value ##\omega\left(A\right)## to each operator ##A##. One can prove that, given the algebra of operators you have in QM, every state can be represented by a statistical operator, i.e. ##\omega\left(A\right) = \mathrm{Tr}\left(\rho A\right)##.

So it can be derived. However it's not derived from the pure states in any sense if that is what you mean.
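A small numerical check of the trace formula (numpy; the observable and states are arbitrary examples): for a pure state, ##\mathrm{Tr}(\rho A)## reproduces ##\langle\psi|A|\psi\rangle##, and for a mixture it is the weighted average of the component expectations:

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # example observable (sigma_x)

# Pure state: Tr(rho A) = <psi|A|psi>
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())
assert np.isclose(np.trace(rho_pure @ A), psi.conj() @ A @ psi)

# Mixture of |0> and |1> with equal weights (hypothetical preparation)
ket0, ket1 = np.eye(2)
rho_mix = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket1, ket1)
expect_mix = np.trace(rho_mix @ A)
print(expect_mix)  # 0.0 = 0.5*<0|A|0> + 0.5*<1|A|1>
```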

DarMM
Gold Member
> If you know everything about the state of a quantum system it means that you know its state exactly. This state can be any pure or mixed state. You cannot know more!
As @vanhees71 is saying this is knowledge in the information theoretic sense.

So even a classical information source might be characterised by probabilities ##p_i## for each of its ##i## outcomes, and the entropy measures how informative each outcome is. A classical pure state is one with vanishing entropy, as each outcome is completely uninformative: you know exactly what the outcome will be in advance. Thus you have maximum knowledge of sequences of outcomes/ensembles of outcomes. It's not reflecting lack of knowledge of the state.

So when we say that a mixed state does not have maximum knowledge, we do not mean that we lack knowledge of the true state; we mean a reduced ability to predict outcomes relative to the pure case. Thus each outcome is more informative in a mixed state than in a pure state, as reflected in the non-zero entropy.

A. Neumaier
2019 Award
> As @vanhees71 is saying this is knowledge in the information theoretic sense.
>
> So even a classical information source might be characterised by probabilities ##p_i## for each of its ##i## outcomes, and the entropy measures how informative each outcome is. A classical pure state is one with vanishing entropy, as each outcome is completely uninformative: you know exactly what the outcome will be in advance. Thus you have maximum knowledge of sequences of outcomes/ensembles of outcomes. It's not reflecting lack of knowledge of the state.
>
> So when we say that a mixed state does not have maximum knowledge, we do not mean that we lack knowledge of the true state; we mean a reduced ability to predict outcomes relative to the pure case. Thus each outcome is more informative in a mixed state than in a pure state, as reflected in the non-zero entropy.
I know this story but it is not consistent!

If I measure the up operator on a large sample prepared in state ##\hat{\rho}##, the relevant entropy is that of the diagonal of ##\hat{\rho}## and not that of the eigenvalues of ##\hat{\rho}##. But the latter figures in statistical mechanics!

They agree only in the classical case.

Thus the association with knowledge is justified only in the classical case.

DarMM
Gold Member
> If I measure the up operator on a large sample prepared in state ##\hat{\rho}##, the relevant entropy is that of the diagonal of ##\hat{\rho}## and not that of the eigenvalues of ##\hat{\rho}##. But the latter figures in statistical mechanics!
I'm not sure I understand this. I understand the mathematics, but I'm not sure what the effect is on an information-theoretic reading of the entropy. Could you be more explicit?

Most textbooks on Quantum Information theory tend to define entropy in terms of knowledge, so I'm not sure what the disagreement is here.

atyy
> Summary: What are the basic assumptions of QM about the density matrix?
>
> The subject of the density matrix in quantum mechanics is very unclear to me.
> In the books I read (for example Sakurai), they don't state the basic assumptions or show how the results of the density-matrix formalism are derived from them.
> Is the basic assumption of the Copenhagen interpretation* of QM that there exists a wave function of the system that we simply don't know, so that we use the density matrix as a statistical method? And if so, how is it formally derived from the laws of QM?
> Is the density matrix formulation necessary to understand results from real experiments, or is it possible to do without it?
>
> Copenhagen interpretation* = regular QM
There are two types of density matrices. Both cases can be seen as applications of the Born rule for pure states.
(1) "Proper" mixtures: Assume each system is in a pure state, but the preparation procedure prepares a mixture of pure states, i.e. sometimes the preparation procedure prepares one particular pure state, other times another. In this case the probabilities for measurement outcomes are given by the Born rule for each pure state together with the probability with which each pure state is prepared. The density matrix is a formalism that calculates the probabilities of measurement outcomes for the mixture, and can be considered a generalization of the Born rule from pure states to mixtures.
(2) "Improper" mixtures: Assume a system is in a pure state, but you only make a measurement on a subsystem. The Born rule for pure states applies to the whole system, and allows us to derive rules in which only the subsystem is explicitly considered in order to calculate the probabilities of measurement outcomes. The derived rules involve a density matrix. In this case, called an improper mixture, the density matrix is also called the reduced density matrix.
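A numerical sketch of the two cases (numpy; the particular states are made up for illustration): an equal-weight proper mixture of ##|0\rangle## and ##|1\rangle## and the reduced state of one half of a Bell pair give the same statistical operator, so no measurement restricted to the subsystem can tell them apart:

```python
import numpy as np

ket0, ket1 = np.eye(2)

# (1) Proper mixture: prepare |0> half the time and |1> the other half
rho_proper = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket1, ket1)

# (2) Improper mixture: reduced density matrix of one half of a Bell pair
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
rho_pair = np.outer(bell, bell.conj())
rho_improper = np.trace(rho_pair.reshape(2, 2, 2, 2), axis1=1, axis2=3)

print(np.allclose(rho_proper, rho_improper))  # True: identical predictions
```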

The above can be found in many textbooks, but sometimes the two types of density matrices are described separately. An article discussing both is (sorry, I think it is not freely available):

The role of the density operator in the statistical description of quantum systems
American Journal of Physics 75, 1162 (2007)
https://doi.org/10.1119/1.2785194
Notes by Bertlmann about the density matrix in proper mixtures:
https://homepage.univie.ac.at/reinhold.bertlmann/pdfs/T2_Skript_Ch_9corr.pdf
Notes from a course run by Stefan Kröll, Peter Samuelsson, Andreas Walther about the density matrix for proper and improper mixtures:
http://www.matfys.lth.se/education/quantinfo/QIlect2.pdf
In all elementary quantum mechanics, one can take pure states to be the basis of quantum mechanics. In very, very advanced cases that I have only just learned about from @DarMM (see his post above), and that I don't have much understanding of, there may not be a pure state. However, what I have written above will be OK for lots of things (e.g. Weinberg's QM text still gives the postulates using pure states).

Edit: I have just read the article by Englert linked to by @A. Neumaier, and nothing in there contradicts the elementary view I have given. In Copenhagen, the pure state is already an as-if reality, so there is no problem with as-if-as-if-realities. The case of pure states not existing that I learned about from @DarMM is not referred to by Englert. Also, I consider some of Englert's comments on the Bell theorem to be potentially misleading, even if they are not exactly wrong, so I would not recommend his article without hesitation.

I would recommend Cohen-Tannoudji, Diu and Laloë's textbook as an example of giving the postulates using pure states, and the textbook by Nielsen and Chuang as an example of giving the postulates using density operators. If the elementary postulates with pure states are unsatisfactory, rather than worrying about pure states, I would point more to the less general notion of measurement usually used with them, compared to the more general notion given in Nielsen and Chuang's textbook (even then the elementary notion can motivate the general notion, especially for measurements whose outcomes are discrete).

A. Neumaier
2019 Award
> I'm not sure I understand this. I understand the mathematics, but I'm not sure what the effect is on an information-theoretic reading of the entropy. Could you be more explicit?
>
> Most textbooks on Quantum Information theory tend to define entropy in terms of knowledge, so I'm not sure what the disagreement is here.
They define it by analogy to the classical case, following tradition. But they never make use of it: they need the concept of entropy, and once they have it they forget about knowledge. It is only a play with words and cannot be made consistent.

In fact the only concise definition of the term knowledge in this context is to equate it with an entropy difference. The latter is well-defined, the former is vague if not defined in this way.

An entangled pure state lacks precisely as much knowledge about predicting the outcomes of up measurements as the mixture obtained by taking its diagonal.

vanhees71
Gold Member
2019 Award
> I know this story but it is not consistent!
>
> If I measure the up operator on a large sample prepared in state ##\hat{\rho}##, the relevant entropy is that of the diagonal of ##\hat{\rho}## and not that of the eigenvalues of ##\hat{\rho}##. But the latter figures in statistical mechanics!
>
> They agree only in the classical case.
>
> Thus the association with knowledge is justified only in the classical case.
You are contradicting yourself, and I'm sure that is due to some misunderstanding, because the math is indeed utterly simple and is taught in the first-semester linear-algebra lecture: the trace of a matrix is independent of the basis used to calculate it. Thus it's not only a property of the matrix but a property of the basis-independent linear map on the Hilbert space. (The math generalizes from finite-dimensional vector spaces to Hilbert spaces with the usual care, which you as a mathematician take much more seriously than physicists usually do; but as my math professor ironically said: "The separable Hilbert space is so good-natured that it almost always behaves as if it were a finite-dimensional complex vector space, and that's why the physicists are lucky to get almost always through with their non-rigorous manipulations.")

It's only that, if you have the eigenvectors ##|p_k \rangle## and the eigenvalues ##p_k## of the statistical operator, you can most simply calculate traces of functions of this operator,
$$\mathrm{Tr} f(\hat{\rho})=\sum_k p_k f(p_k).$$
Now set ##f(x)=-\ln x##, and it applies to the entropy.
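A numerical illustration of both points in this exchange (numpy; the state and rotation angle are arbitrary choices): the eigenvalues, and hence ##\mathrm{Tr} f(\hat{\rho})##, do not change under a change of orthonormal basis, while the entropy of the diagonal of ##\hat{\rho}## does depend on the basis:

```python
import numpy as np

rho = np.diag([0.9, 0.1])            # statistical operator, written in its eigenbasis

theta = 0.4                          # arbitrary basis rotation
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rho_rot = U @ rho @ U.T              # the same operator in a rotated basis

# Tr f(rho) is basis-independent: the eigenvalues are unchanged
p = np.linalg.eigvalsh(rho_rot)
S_eigen = float(-np.sum(p * np.log(p)))
print(np.allclose(np.sort(p), [0.1, 0.9]))   # True

# The entropy of the diagonal entries, however, depends on the basis
d = np.diag(rho_rot)
S_diag = float(-np.sum(d * np.log(d)))
print(S_eigen < S_diag)              # True in any non-eigenbasis
```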

> The entropy is rather a measure for the missing information relative to complete knowledge.
In the context of information theory one defines the notion of proper information, which is interpreted as a "surprise" function: a measure of how surprised we will be to find that a random variable has the value x.

The proper information of x must be a function of its probability, I(x) = f(p(x)), with p(x) the probability of the event x. The information specific to the event x (the proper information) is defined by I(x) = −log p(x). For binary messages, log base 2 is used.

Shannon's entropy of a probability space X is defined by
$$H(X) = -\sum_{x} p(x) \log p(x),$$
and thus it is the expected value of I(x) over the set of events x.

/Patrick

Hi,

Entropy is a concept that is difficult to capture and therefore to interpret. There are different notions of entropy:

- Thermodynamic entropy: S = k log W
- Tsallis entropy: a generalization of the thermodynamic/Boltzmann-Gibbs entropy given above. For a parameter q, also called the entropic index, it is defined as
$$S_q = \frac{k}{q-1}\left(1-\sum_i p_i^q\right).$$
In the limit q → 1, it gives the usual Boltzmann-Gibbs entropy. For a continuous probability distribution p(x), it takes the following form:
$$S_q = \frac{k}{q-1}\left(1-\int p(x)^q\,\mathrm{d}x\right).$$

- The von Neumann entropy is its quantum version and is defined by ##S(\rho) = -\mathrm{Tr}(\rho \ln \rho)##, where ##\rho## is the density matrix.

In the context of the classical information theory :
- Shannon entropy: ##H(X) = -\sum_x p(x) \log_2 p(x)##.
- Alfred Renyi (1961) generalized the Shannon definition of information entropy. He searched for the most general case compatible with the axioms of probability and applicable to information. For a discrete random variable X = {x1, x2, …, xN} with probabilities {p1, p2, …, pN}, the Renyi entropy is defined as
$$H_\alpha(X) = \frac{1}{1-\alpha}\log_2\left(\sum_{i=1}^N p_i^\alpha\right)$$
for α > 0, α ≠ 1. Shannon entropy is its special case in the α → 1 limit, ##H_s(X) = H_1(X)##.

- The α = 2 value of the Renyi entropy is known also as the collision entropy: ##H_2(X) = -\log_2 \sum_i p_i^2##.

- The α → ∞ limit of the Renyi entropy is known also as the min-entropy. It is the smallest of all the Renyi entropies, given by ##H_\infty(X) = -\log_2 \max_i p_i##.

- Sharma-Mittal entropy ( https://arxiv.org/abs/cond-mat/0703277 ): It unifies both thermodynamic and information entropies in a single expression by using two parameters:
$$S_{\alpha,\beta}(X) = \frac{1}{1-\beta}\left[\left(\sum_i p_i^\alpha\right)^{\frac{1-\beta}{1-\alpha}} - 1\right].$$
This expression has the following limits: (i) β → 1 gives the Renyi entropy, (ii) β → α gives the Tsallis entropy, and (iii) β → 1, α → 1 gives the Shannon entropy.
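These limits can be checked numerically (numpy; the distribution is a made-up example):

```python
import numpy as np

def renyi(p, alpha):
    """Renyi entropy H_alpha(X) in bits, for alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

p = [0.5, 0.25, 0.25]                               # example distribution

shannon = float(-np.sum(np.array(p) * np.log2(p)))  # 1.5 bits
print(renyi(p, 1.000001))   # -> Shannon entropy in the alpha -> 1 limit
print(renyi(p, 2))          # collision entropy, -log2 sum p_i^2
print(renyi(p, 200))        # -> min-entropy, -log2 max p_i = 1 bit
```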

/Patrick

Demystifier
Gold Member
A non-zero entropy can be associated even with a pure state. Let
$$|\psi\rangle=\sum_k |k\rangle\langle k|\psi\rangle$$
be a pure state expanded in the basis ##\{|k\rangle\}##. The corresponding probability ##p_k=|\langle k|\psi\rangle|^2## defines the entropy
$$S=-\sum_k p_k {\rm ln}\,p_k$$
Unlike the von Neumann entropy, this entropy depends on the basis. It makes physical sense if the basis is chosen such that it corresponds to the basis in which the observations are performed.
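A numerical sketch of this basis dependence (numpy; the state and bases are example choices): the same pure qubit state has entropy ##\ln 2## with respect to the ##z## basis but entropy 0 with respect to the ##x## basis, in which it is itself a basis state:

```python
import numpy as np

psi = np.array([1.0, 1.0]) / np.sqrt(2)   # pure state (|0> + |1>)/sqrt(2)

def basis_entropy(psi, basis):
    """Entropy of p_k = |<k|psi>|^2 for an orthonormal basis given as rows."""
    p = np.abs(basis.conj() @ psi) ** 2
    p = p[p > 1e-12]                       # convention 0 ln 0 = 0
    return float(-np.sum(p * np.log(p)))

z_basis = np.eye(2)                                         # |0>, |1>
x_basis = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # |+>, |->

print(basis_entropy(psi, z_basis))  # ln 2 ≈ 0.6931: outcomes maximally uncertain
print(basis_entropy(psi, x_basis))  # 0.0: |psi> is itself a basis state here
```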

A. Neumaier
2019 Award
> A non-zero entropy can be associated even with a pure state. Let
> $$|\psi\rangle=\sum_k |k\rangle\langle k|\psi\rangle$$
> be a pure state expanded in the basis ##\{|k\rangle\}##. The corresponding probability ##p_k=|\langle k|\psi\rangle|^2## defines the entropy
> $$S=-\sum_k p_k {\rm ln}\,p_k$$
> Unlike the von Neumann entropy, this entropy depends on the basis. It makes physical sense if the basis is chosen such that it corresponds to the basis in which the observations are performed.
Yes, and this is the only entropy that can be interpreted in terms of surprise or gain of knowledge about outcomes.
