# I Boltzmann with degenerate levels

1. Feb 1, 2017

### jostpuur

Suppose we have some model for some system, and that model has given us a sequence $\mathcal{E}_1,\mathcal{E}_2,\mathcal{E}_3,\ldots$, whose values are interpreted as the energy levels of the system. Writing the energy levels in a slightly redundant way, to prepare for a modification below, we state that the energy levels are

$E_1=\mathcal{E}_1$
$E_2=\mathcal{E}_2$
$E_3=\mathcal{E}_3$
$\vdots$

The probabilities defined by the Boltzmann distribution under a temperature $T$ will be

$p(1) = \frac{1}{Z(T)} e^{-\frac{\mathcal{E}_1}{T}}$
$p(2) = \frac{1}{Z(T)} e^{-\frac{\mathcal{E}_2}{T}}$
$p(3) = \frac{1}{Z(T)} e^{-\frac{\mathcal{E}_3}{T}}$
$\vdots$

where the partition function is
$Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + e^{-\frac{\mathcal{E}_2}{T}} + e^{-\frac{\mathcal{E}_3}{T}}+ \cdots$

Suppose we find out that the model was only an approximation of a more accurate model, and according to the new more accurate model the energy values are going to be $\mathcal{E}_n$ and $\mathcal{E}_{2n}+\epsilon$ with some small positive epsilon. Now the energy levels are

$E_1 = \mathcal{E}_1$
$E_2 = \mathcal{E}_2$
$E_3 = \mathcal{E}_2+ \epsilon$
$E_4 = \mathcal{E}_3$
$E_5 = \mathcal{E}_4$
$E_6 = \mathcal{E}_4 + \epsilon$
$E_7 = \mathcal{E}_5$
$\vdots$

Now the probabilities defined by

$p(n) = \frac{1}{Z(T)}e^{-\frac{E_n}{T}}$

turn out to be

$p(1) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_1}{T}}$
$p(2) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_2}{T}}$
$p(3) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_2+\epsilon}{T}}$
$p(4) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_3}{T}}$
$p(5) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_4}{T}}$
$p(6) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_4+\epsilon}{T}}$
$p(7) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_5}{T}}$
$\vdots$

where the partition function is

$Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + e^{-\frac{\mathcal{E}_2}{T}} + e^{-\frac{\mathcal{E}_2 + \epsilon}{T}} + e^{-\frac{\mathcal{E}_3}{T}} + e^{-\frac{\mathcal{E}_4}{T}} + e^{-\frac{\mathcal{E}_4 + \epsilon}{T}} + e^{-\frac{\mathcal{E}_5}{T}} + \cdots$

Suppose we decide that the epsilon is so small that it has not much significance, and we might as well simplify the formulas by taking the limit $\epsilon\to 0$. This limit is going to give us a new probability distribution

$p(1) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_1}{T}}$
$p(2) = \frac{2}{Z(T)}e^{-\frac{\mathcal{E}_2}{T}}$
$p(3) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_3}{T}}$
$p(4) = \frac{2}{Z(T)}e^{-\frac{\mathcal{E}_4}{T}}$
$p(5) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_5}{T}}$
$\vdots$

where the partition function is

$Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + 2e^{-\frac{\mathcal{E}_2}{T}} + e^{-\frac{\mathcal{E}_3}{T}} + 2e^{-\frac{\mathcal{E}_4}{T}} + e^{-\frac{\mathcal{E}_5}{T}} + \cdots$

Now we have two different probability distributions for the case $\epsilon = 0$. Is one of them right, and the other one wrong? Which way around would be the right answer?

2. Feb 2, 2017

### stevendaryl

Staff Emeritus
You have to be careful here. The partition function is defined to be:

$Z = \sum_i e^{\frac{-E_i}{kT}}$

where $i$ ranges over all states. It's not a sum over energy eigenvalues, it's a sum over states. Each state makes a contribution, not just states with distinguishable energy levels.

If you want to sum over energy values, as well, then you have to include a degeneracy factor: $Z = \sum_i g_i e^{\frac{-E_i}{kT}}$, where now the sum is over energy levels, and $g_i$ is the number of states with energy level $E_i$.
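The equivalence of the two ways of writing $Z$ can be checked numerically. The following is only a minimal sketch: the level values, the degeneracies `g`, and the helper names are made up for illustration, and the units are chosen so that $k = 1$, as in the first post. It splits every doubly degenerate level into a pair $E$, $E+\epsilon$, sums over states, and compares the combined probability of the pair with the degeneracy-weighted sum over levels.

```python
import math

def boltzmann_over_states(state_energies, T):
    """Probabilities when every state is listed separately."""
    weights = [math.exp(-E / T) for E in state_energies]
    Z = sum(weights)
    return [w / Z for w in weights]

def boltzmann_over_levels(level_energies, degeneracies, T):
    """Probabilities per energy level, each weighted by its degeneracy g."""
    weights = [g * math.exp(-E / T) for E, g in zip(level_energies, degeneracies)]
    Z = sum(weights)
    return [w / Z for w in weights]

T = 1.0                              # temperature, units with k = 1
levels = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative stand-ins for E_1 .. E_5
g = [1, 2, 1, 2, 1]                  # every second level is doubly degenerate

def split_states(eps):
    """List the states one by one, lifting each degeneracy by eps."""
    states = []
    for E, gi in zip(levels, g):
        states.append(E)
        if gi == 2:
            states.append(E + eps)
    return states

p_levels = boltzmann_over_levels(levels, g, T)
p_naive = boltzmann_over_levels(levels, [1] * len(levels), T)  # degeneracy ignored

for eps in (1e-1, 1e-3, 1e-6):
    p_states = boltzmann_over_states(split_states(eps), T)
    pair = p_states[1] + p_states[2]   # the states with energies E_2 and E_2 + eps
    print(eps, pair, p_levels[1], p_naive[1])
```

As $\epsilon$ shrinks, the combined probability of the pair approaches the degeneracy-weighted value, not the value obtained by counting the level once.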

3. Feb 2, 2017

### jostpuur

I understand the claim, but I don't believe it. Why would the probability distribution have such a uniform background measure over all states?

For example, suppose you have a lot of holes on some special table, and suppose little balls are being thrown at that table so that the balls eventually fall through the small holes. The events where the balls hit the holes are going to be random events. If the holes are not uniformly distributed, and if some of the holes are extremely close to each other, they are going to be competing for the same random events, hence reducing their individual chances of getting a hit.

Perhaps the probability distribution I wrote down above for the case $\epsilon > 0$ was wrong, because perhaps the probabilities should have been
$p(1) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_1}{T}}$
$p(2) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_2}{T}}$
$p(3) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_2 + \epsilon}{T}}$
$p(4) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_3}{T}}$
$p(5) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_4}{T}}$
$p(6) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_4 + \epsilon}{T}}$
$p(7) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_5}{T}}$
$\vdots$
where the partition function would be
$Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_2}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_2+\epsilon}{T}} + e^{-\frac{\mathcal{E}_3}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_4}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_4+\epsilon}{T}} + e^{-\frac{\mathcal{E}_5}{T}} + \cdots$

Perhaps there should have been factors $\frac{1}{2}$ like this because some of the states are so similar that they are competing for the same random events? Why not like this?

Isn't this what being careful looks like, by the way?

4. Feb 2, 2017

### stevendaryl

Staff Emeritus
Well, you can take it as the definition of a system being at equilibrium that if you fix all the macroscopic conserved quantities--total energy, total momentum, total angular momentum, total charge, total number of particles of each type, etc.--then every microscopic state consistent with those macroscopic properties is equally likely. I don't know if there is a justification for that assumption, other than the principle of indifference: if you don't have any other way of distinguishing states, then they have to be equally likely.

Your example is not relevant to this assumption, because there is no notion of "competition" for states. When you compute the partition function it's for the entire system, not for a single particle. So for your example, specifying the state means specifying the location and momentum of each ball. The constraint that two balls can't occupy the same location could either be imposed by just declaring some states as not allowed, or it could be done in a "soft" way by putting in a short-range repulsive force between balls, so that the energy of the system as a whole shoots up if two balls get too close together.

I'll have to think about your example some more to see if I can model it using statistical mechanics.

5. Feb 2, 2017

### rubi

The density matrix of the canonical ensemble is given by $\rho=e^{-\beta\hat H}$, where $\hat H$ is the Hamiltonian operator. The partition function is given by $Z=\mathrm{Tr}(\rho)$. Assume that $\hat H$ has discrete, but possibly degenerate spectrum. Then it can be diagonalized as $\hat H=\sum_n\sum_{k=1}^{g(n)} E_n\left|\Psi_{nk}\right>\left<\Psi_{nk}\right|$, where $\hat H\left|\Psi_{nk}\right>=E_n\left|\Psi_{nk}\right>$, $k$ labels the degeneracy of $E_n$ (i.e. $1\leq k \leq g(n)=\mathrm{dim}(\mathrm{Eig}(\hat H,E_n))$) and the $\left|\Psi_{nk}\right>$ can be chosen to be orthogonal. We then find $e^{-\beta\hat H}=\sum_n \sum_{k=1}^{g(n)} e^{-\beta E_n} \left|\Psi_{nk}\right>\left<\Psi_{nk}\right|$ (by the spectral theorem) and $Z=\sum_n\sum_{k=1}^{g(n)} e^{-\beta E_n}$. The term $e^{-\beta E_n}$ appears $g(n)$ times in this sum, i.e. $Z=\sum_n g(n) e^{-\beta E_n}$, so each term $e^{-\beta E_n}$ must be weighted by its degeneracy.
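A small numerical check of $Z=\mathrm{Tr}\,e^{-\beta\hat H}=\sum_n g(n)e^{-\beta E_n}$ can be sketched as follows. The Hamiltonian here is a toy choice made only for illustration: the $3\times 3$ all-ones matrix, which is not diagonal in the given basis and has eigenvalue $3$ once and eigenvalue $0$ twice. The helper `trace_exp` is a hypothetical name; it sums the Taylor series of the matrix exponential in plain Python, so no basis of eigenvectors is ever constructed.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_exp(M, terms=60):
    """Tr(exp(M)) from the Taylor series sum_k M^k / k! (fine for small matrices)."""
    n = len(M)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0 / 0!
    total = float(n)                                   # trace of the identity
    for k in range(1, terms):
        term = matmul(term, M)
        term = [[x / k for x in row] for row in term]  # now equals M^k / k!
        total += sum(term[i][i] for i in range(n))
    return total

beta = 1.0
H = [[1.0] * 3 for _ in range(3)]   # toy Hamiltonian: eigenvalues 3, 0, 0
minus_beta_H = [[-beta * x for x in row] for row in H]

Z_trace = trace_exp(minus_beta_H)   # basis-independent definition of Z
Z_levels = 2 * math.exp(-beta * 0.0) + 1 * math.exp(-beta * 3.0)  # sum of g(n) e^{-beta E_n}

print(Z_trace, Z_levels)
```

The two numbers agree to rounding error, while a sum over the two distinct eigenvalues without the factor $g(n)=2$ would not reproduce the trace.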

6. Feb 2, 2017

### Staff: Mentor

It is a fundamental assumption of equilibrium statistical physics: all accessible microstates are equally probable.

7. Feb 2, 2017

### jostpuur

In classical models (not quantum) that "fundamental assumption" is contradictory and leads to paradoxes, because the microstates often form some kind of continuum, and the only way to apply the Boltzmann distribution is to first discretize the model somehow. However, there are always multiple ways of discretizing the model, and the different discretizations can lead to different probability distributions for the original continuous model. For this reason I think it is obvious that in general the probability distribution has to be allowed to be proportional to some function $f(n)e^{-\frac{E_n}{T}}$, where $f(n)$ is some "background measure". To me it seems that people have not understood the need for this background measure, because in many examples it is something very uniform and often only a constant. Anyway, if you insist that the probability distribution will be proportional to precisely $e^{-\frac{E_n}{T}}$ and nothing else, it will lead to paradoxes.

8. Feb 2, 2017

### Staff: Mentor

In classical mechanics the definition of states can look a bit arbitrary, and you won't get the correct results, e.g. for blackbody radiation. That should not be surprising: blackbody radiation was one of the key observations that led to the discovery that we do not live in a classical world. With knowledge about quantum mechanics, the more fundamental theory, we also got a deeper motivation for the states in classical physics.

9. Feb 2, 2017

### jostpuur

Can the (often axiomatic) assumption that all accessible (under some energy constraint) microstates are equally probable be derived from Schrödinger's equation (as some accurate approximation)?

Wouldn't the derivation need some model of the form $H=H_0+\epsilon I$, where the eigenstates (eigenvectors) of $H_0$ would be considered as the microstates (of statistical physics), and the term $I$ would be something that somehow mixes the wave function in a statistical way over macroscopic time intervals?

10. Feb 3, 2017

### Staff: Mentor

Objects don't have to be in thermal equilibrium. You cannot force them to be - the Schrödinger equation doesn't tell you if something is in thermal equilibrium.

11. Feb 3, 2017

### jostpuur

If you assume that the true time evolution comes from Schrödinger's equation and also assume something else, something reasonable that would be related to statistics, the assumptions together might imply Boltzmann's distribution as an accurate approximation. A proper derivation of Boltzmann's distribution should look something along those lines, since it is Schrödinger's equation that ultimately produces the true time evolution. I already knew that Schrödinger's equation alone is not going to imply Boltzmann's distribution.

When the Boltzmann's distribution is derived without any use of Schrödinger's equation, some things in the derivation have to be working by accident and good luck.

Related to this topic, I would like to remind you that the assumption that all microstates (accessible under some energy constraint) are equally probable, when the microstates have been identified with the energy eigenstates, contains an implicit assumption which severely contradicts quantum mechanics, because according to quantum mechanics the state of a system does not need to be an energy eigenstate. The states can be linear combinations of energy eigenstates. For this reason it is obvious that the use of microstates is supposed to be some kind of approximate model that merely has similar statistical behavior to some more accurate quantum model with proper wave functions. A proper derivation of Boltzmann's distribution should take into account the nature of this approximation.

12. Feb 3, 2017

### stevendaryl

Staff Emeritus
Yes, you're right. The Boltzmann prescription, that at equilibrium, all states with the same energy are equally likely, is strictly speaking only applicable to a system with a finite number of states. To do classical statistical mechanics, people divide phase space into little cells, and use the volume of the cells as the measure of likelihood. The volume in phase space is a particular way of giving a particularly simple "background measure". You could certainly use other measures, and I don't know whether that has been explored, or not.

I'm assuming that when you talk about paradoxes, you're just saying that you can get different results depending on how you divide the continuum into "states" and take the limit? I don't think that there is anything paradoxical about the usual approach of using phase space volume. At least not for nonrelativistic physics. Famously, Planck's attempt to apply statistical mechanics to electromagnetic radiation led to infinities when he tried to take the continuum limit, but he got reasonable results by using a discrete number of states (leading to QM). Maybe there is some sense in which Boltzmann's rule is nonsensical in the continuum limit, which is a hint that the world isn't classical.

Anyway, getting back to the original post, I think it has been answered. Whether you use one "background measure" or another, it will not be the case that $Z$ is computed by summing over energy levels; you have to include a measure $g(E)$ giving the degeneracy (in the case of discrete states) or the measure (in the case of a continuum of states).

13. Feb 3, 2017

### stevendaryl

Staff Emeritus
Yeah, you can't treat every linear combination of eigenstates as different states for statistical purposes, because they are overlapping. The Boltzmann rule requires the notion of "state" to be exclusive: you can't be in two different states simultaneously. It works to just pick a complete orthonormal set of states and do statistics on those, but there might be a more sophisticated treatment that doesn't require first coming up with a basis. It's getting beyond my knowledge at this point.

14. Feb 3, 2017

### jostpuur

I see that there is a standard answer available to my question.

My original question contained the slight ambiguity that I didn't specify whether it was supposed to be in a classical or a quantum setting. That might affect the answers. For example the quantum mechanical energy states (eigenvectors) will always be orthogonal, no matter how close the energy levels (eigenvalues) are, so the states are perhaps never going to be so similar that they would "compete for the same random events".

15. Feb 3, 2017

### stevendaryl

Staff Emeritus
I don't think that "competing for the same events" really makes sense as a concept. Or maybe I just don't understand what you mean.

16. Feb 3, 2017

### jostpuur

What I meant becomes evident with the following modification to the derivation of Maxwell's speed distribution in a classical setting.

The possible states of gas particles can be parametrized by the momentum vector $\vec{p}$ whose allowed values can be anything from $\mathbb{R}^3$. The continuum brings serious problems, and the allowed values must be discretized. The most obvious choice is to choose some small $\Delta p>0$, and then decide that the allowed values are from $\Delta p\;\mathbb{Z}^3$. If you then decide that probabilities must be proportional to $e^{-\frac{E_{\vec{p}}}{T}}$, and at the end take the limit $\Delta p\to 0$, you get the right Maxwell's speed distribution.

Suppose that for some reason you are given a different discretization: Some discrete set $\Lambda\subset\mathbb{R}^3$, where for example the points are denser close to the origin, and sparser far away from the origin. If you then decide that the probabilities again must be proportional to $e^{-\frac{E_{\vec{p}}}{T}}$ for all $\vec{p}\in\Lambda$, you are going to get a wrong result -- a distorted version of the Maxwell's speed distribution.

This does not necessarily mean that the discretization $\Lambda$ would be wrong. All you have to do is to find nice weights $f(\vec{p})$ which are suitably smaller close to the origin, and larger far away from the origin, and then you have to postulate that the probabilities are going to be proportional to $f(\vec{p})e^{-\frac{E_{\vec{p}}}{T}}$. If the weights are right, you get the correct Maxwell's speed distribution again.

In this case I would say that the points of $\Lambda$, which were denser close to the origin, were "competing for the same random events".

Due to this example, I'm not convinced that the question of finding the "right discretization" would necessarily be the correct question. Equivalently we might think that almost any discretization will be fine, while the real task is going to be finding a way to find the right weights.
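The effect of the discretization and of the weights can be illustrated with a short numerical sketch. To keep it small it uses one momentum dimension instead of $\mathbb{R}^3$, units with $k=1$, and assumed values $m=T=1$, so that the continuum answer is $\langle p^2\rangle = mT = 1$. The non-uniform point set $p=\sinh(u)$ with equally spaced $u$ is just one hypothetical choice of a $\Lambda$ that is denser near the origin, and the weight $f(p)=\cosh(u)$ is the local spacing of those points.

```python
import math

T = 1.0  # temperature in units where k = 1 (assumed value)
m = 1.0  # particle mass (assumed value)

def energy(p):
    return p * p / (2.0 * m)

def mean_p2(points, weights):
    """<p^2> for probabilities proportional to weights[i] * exp(-E(p_i) / T)."""
    boltz = [w * math.exp(-energy(p) / T) for p, w in zip(points, weights)]
    Z = sum(boltz)
    return sum(b * p * p for b, p in zip(boltz, points)) / Z

# Uniform discretization: p in dp * Z, truncated to |p| <= 10.
dp = 0.01
uniform = [k * dp for k in range(-1000, 1001)]
mean_uniform = mean_p2(uniform, [1.0] * len(uniform))

# Non-uniform discretization: p = sinh(u) with equally spaced u,
# so the points are denser near p = 0 and sparser far away.
du = 0.005
us = [k * du for k in range(-700, 701)]
dense = [math.sinh(u) for u in us]

# Same set of points, without and with the background weights f(p) = dp/du.
mean_unweighted = mean_p2(dense, [1.0] * len(dense))
mean_weighted = mean_p2(dense, [math.cosh(u) for u in us])

print("uniform grid:          ", mean_uniform)
print("non-uniform, no weight:", mean_unweighted)
print("non-uniform, weighted: ", mean_weighted)
```

In this sketch the uniform grid and the weighted non-uniform grid both reproduce the continuum value, while the unweighted non-uniform grid gives a value noticeably below it, because the extra points near the origin are each counted with full weight.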

17. Feb 3, 2017

### Staff: Mentor

In classical mechanics, that leads to the same answers, but it ruins the feature that classical mechanics is a special case of quantum mechanics. In quantum mechanics it does not work at all.
I don't see any advantage of a system that is more complicated, less motivated by physics and has a smaller range of applications.

18. Feb 3, 2017

### jostpuur

The quantum mechanical derivation of Maxwell's speed distribution is symbolic garbage that produces the right result by accident and good luck. Or at least it looks like it.

Suppose you have a cube of macroscopic size 10m$\times$10m$\times$10m, and a lot of gas particles in it. The gas particles are going to obey Maxwell's speed distribution, and you can derive it by assuming that the gas particles occupy the quantum energy eigenstates, which are spatially spread over the entire macroscopic cube, and which can be written in terms of trigonometric functions. Seriously speaking, though, the gas particles are not going to be spatially spread like that, because then the macroscopic gas wouldn't behave like a macroscopic gas. The gas particles are probably in states that can be described by spatially localized wave packets with some sensible momentum expectation values $\langle \vec{p}\rangle$, which behave as their classical momenta.

Is there any serious reason to believe that quantum mechanics would have anything to do with the Maxwell's speed distribution?

19. Feb 3, 2017

### Staff: Mentor

Is your intent to add new energy levels, or to shift some of the energy levels without adding any? The actual modified levels you posted indicate the former, but your verbal description seems to indicate the latter. Also, your derivation of the different distributions for $\epsilon = 0$ requires the former, but I don't see why different distributions would be a problem unless you are assuming the latter.

20. Feb 3, 2017

### jostpuur

The original question contained some ambiguities because it was only brainstorming.

21. Feb 3, 2017

### Staff: Mentor

Maybe to you, I can't judge that. It is perfectly fine and the right thing to do.

Nothing in the derivation assumes that the particles are perfectly in energy eigenstates. The classical case is the limit of the quantum case for $h\to 0$. For observables that do not depend on $h$, this limit is trivial to calculate.

22. Feb 3, 2017

### Staff: Mentor

But your question can't be answered unless the ambiguity is resolved. Can you resolve it?

23. Feb 3, 2017

### jostpuur

I think that my original question has been answered in the sense that the relevant ambiguities have gotten pinpointed.

Originally I did not have an answer ready for this. I can see that the standard answer to my original question will depend on which way the clarification will be made.

24. Feb 3, 2017

### Staff: Mentor

Please give a specific reference.

No, you can't, because when you take quantum statistics into account, the correct distribution is not Maxwell-Boltzmann, it's either Bose-Einstein or Fermi-Dirac.

25. Feb 6, 2017

### jostpuur

I'm only responding to the request.

Introductory Statistical Mechanics (second edition) by Bowley and Sanchez has a Chapter 7 titled "Maxwell distribution of molecular speeds". Section 7.1 is titled "The probability that a particle is in a quantum state", and it starts on page 144 with this type of content:

Then the book goes on about issues with densities of states in three dimensions, and eventually on page 152 they get to this
Nowhere in between did they speak about artificial discretization of classical velocities or momenta, so the reader is left with the impression that this result comes naturally from Schrödinger's equation.

The formula appears to contain $\hbar$, but actually the factor $\lambda_D$ was defined in such a way that the Planck constants cancel.

Later in Chapter 10 they speak about Fermi and Bose particles.

Last edited: Feb 6, 2017