Boltzmann with degenerate levels

In summary, the thread discusses the Boltzmann distribution for a system with nearly degenerate energy levels: whether the partition function is a sum over states or over distinct energy values, why a degeneracy factor appears in the latter case, and the assumption that all microscopic states consistent with the macroscopic properties are equally likely. An example of balls falling through holes in a table is used to question that assumption, and the maximum-entropy derivation of the distribution and the choice of background measure for continuous state spaces are also discussed.
  • #1
jostpuur
Suppose we have some model for a system, and that model has given us a sequence [itex]\mathcal{E}_1,\mathcal{E}_2,\mathcal{E}_3,\ldots[/itex], whose values are interpreted as the energy levels of the system. Writing these slightly redundantly, for the sake of a modification made below, we state that the energy levels are

[itex]
E_1=\mathcal{E}_1
[/itex]
[itex]
E_2=\mathcal{E}_2
[/itex]
[itex]
E_3=\mathcal{E}_3
[/itex]
[itex]\vdots[/itex]

The probabilities defined by the Boltzmann distribution at a temperature [itex]T[/itex] (in units where Boltzmann's constant is 1) will be

[itex]
p(1) = \frac{1}{Z(T)} e^{-\frac{\mathcal{E}_1}{T}}
[/itex]
[itex]
p(2) = \frac{1}{Z(T)} e^{-\frac{\mathcal{E}_2}{T}}
[/itex]
[itex]
p(3) = \frac{1}{Z(T)} e^{-\frac{\mathcal{E}_3}{T}}
[/itex]
[itex]
\vdots
[/itex]

where the partition function is
[itex]
Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + e^{-\frac{\mathcal{E}_2}{T}} + e^{-\frac{\mathcal{E}_3}{T}}+ \cdots
[/itex]

Suppose we find out that the model was only an approximation of a more accurate model, and according to the new more accurate model the energy values are going to be [itex]\mathcal{E}_n[/itex] and [itex]\mathcal{E}_{2n}+\epsilon[/itex] with some small positive epsilon. Now the energy levels are

[itex]
E_1 = \mathcal{E}_1
[/itex]
[itex]
E_2 = \mathcal{E}_2
[/itex]
[itex]
E_3 = \mathcal{E}_2+ \epsilon
[/itex]
[itex]
E_4 = \mathcal{E}_3
[/itex]
[itex]
E_5 = \mathcal{E}_4
[/itex]
[itex]
E_6 = \mathcal{E}_4 + \epsilon
[/itex]
[itex]
E_7 = \mathcal{E}_5
[/itex]
[itex]
\vdots
[/itex]

Now the probabilities defined by

[itex]
p(n) = \frac{1}{Z(T)}e^{-\frac{E_n}{T}}
[/itex]

turn out to be

[itex]
p(1) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_1}{T}}
[/itex]
[itex]
p(2) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_2}{T}}
[/itex]
[itex]
p(3) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_2+\epsilon}{T}}
[/itex]
[itex]
p(4) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_3}{T}}
[/itex]
[itex]
p(5) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_4}{T}}
[/itex]
[itex]
p(6) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_4+\epsilon}{T}}
[/itex]
[itex]
p(7) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_5}{T}}
[/itex]
[itex]
\vdots
[/itex]

where the partition function is

[itex]
Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + e^{-\frac{\mathcal{E}_2}{T}} + e^{-\frac{\mathcal{E}_2 + \epsilon}{T}} + e^{-\frac{\mathcal{E}_3}{T}} + e^{-\frac{\mathcal{E}_4}{T}} + e^{-\frac{\mathcal{E}_4 + \epsilon}{T}} + e^{-\frac{\mathcal{E}_5}{T}} + \cdots
[/itex]

Suppose we decide that the epsilon is so small that it has little significance, and that we might as well simplify the formulas by taking the limit [itex]\epsilon\to 0[/itex] and merging the nearly degenerate levels. This limit gives us a new probability distribution

[itex]
p(1) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_1}{T}}
[/itex]
[itex]
p(2) = \frac{2}{Z(T)}e^{-\frac{\mathcal{E}_2}{T}}
[/itex]
[itex]
p(3) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_3}{T}}
[/itex]
[itex]
p(4) = \frac{2}{Z(T)}e^{-\frac{\mathcal{E}_4}{T}}
[/itex]
[itex]
p(5) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_5}{T}}
[/itex]
[itex]
\vdots
[/itex]

where the partition function is

[itex]
Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + 2e^{-\frac{\mathcal{E}_2}{T}} + e^{-\frac{\mathcal{E}_3}{T}} + 2e^{-\frac{\mathcal{E}_4}{T}} + e^{-\frac{\mathcal{E}_5}{T}} + \cdots
[/itex]

Now we have two different probability distributions for the case [itex]\epsilon = 0[/itex]. Is one of them right and the other one wrong? If so, which is which?
 
  • #2
You have to be careful here. The partition function is defined to be:

[itex]Z = \sum_i e^{\frac{-E_i}{kT}}[/itex]

where [itex]i[/itex] ranges over all states. It's not a sum over energy eigenvalues, it's a sum over states. Each state makes a contribution, not just states with distinguishable energy levels.

If you want to sum over energy values instead, then you have to include a degeneracy factor: [itex]Z = \sum_i g_i e^{\frac{-E_i}{kT}}[/itex], where now the sum is over energy levels, and [itex]g_i[/itex] is the number of states with energy level [itex]E_i[/itex].
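
As a quick numerical check of this point (an illustrative Python sketch with made-up energy values): summing [itex]e^{-E/kT}[/itex] over all states, including a nearly degenerate pair, agrees with the degeneracy-weighted sum over levels as the splitting goes to zero.

[code]
import numpy as np

kT = 1.0                                   # temperature in energy units
levels = np.array([0.0, 1.0, 2.0, 3.0])    # hypothetical energy values E_1..E_4
eps = 1e-6                                 # small splitting of the "doubled" levels

# Sum over states: every state contributes a term, even when energies (nearly) coincide.
states = np.concatenate([levels, levels[1::2] + eps])   # add a near-degenerate partner to E_2 and E_4
Z_states = np.sum(np.exp(-states / kT))

# Sum over energy levels with a degeneracy factor g_i (g = 2 for the doubled levels).
g = np.array([1, 2, 1, 2])
Z_levels = np.sum(g * np.exp(-levels / kT))

print(Z_states, Z_levels)                  # agree up to terms of order eps/kT

# Probability of finding the system at energy E_2 (counting both members of the pair):
print(g[1] * np.exp(-levels[1] / kT) / Z_levels)
[/code]

Either bookkeeping is fine; the physical input is that each state, not each distinct energy value, gets the weight [itex]e^{-E/kT}[/itex].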
 
  • #3
I understand the claim, but I don't believe it. Why would the probability distribution have such a uniform background measure over all states?

For example, suppose you have a lot of holes in some special table, and suppose little balls are thrown at the table so that they eventually fall through the holes. The events where the balls hit the holes are random events. If the holes are not uniformly distributed, and some of them are extremely close to each other, those holes are going to be competing for the same random events, which reduces their individual chances of getting a hit.

Perhaps the probability distribution I wrote down above for the case [itex]\epsilon > 0[/itex] was wrong, because perhaps the probabilities should have been
[itex]
p(1) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_1}{T}}
[/itex]
[itex]
p(2) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_2}{T}}
[/itex]
[itex]
p(3) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_2 + \epsilon}{T}}
[/itex]
[itex]
p(4) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_3}{T}}
[/itex]
[itex]
p(5) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_4}{T}}
[/itex]
[itex]
p(6) = \frac{1}{2Z(T)}e^{-\frac{\mathcal{E}_4 + \epsilon}{T}}
[/itex]
[itex]
p(7) = \frac{1}{Z(T)}e^{-\frac{\mathcal{E}_5}{T}}
[/itex]
[itex]
\vdots
[/itex]
where the partition function would be
[itex]
Z(T) = e^{-\frac{\mathcal{E}_1}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_2}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_2+\epsilon}{T}}
+ e^{-\frac{\mathcal{E}_3}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_4}{T}} + \frac{1}{2}e^{-\frac{\mathcal{E}_4+\epsilon}{T}}
+ e^{-\frac{\mathcal{E}_5}{T}} + \cdots
[/itex]

Perhaps there should have been factors [itex]\frac{1}{2}[/itex] like this because some of the states are so similar that they are competing for the same random events? Why not like this?

Isn't this what being careful looks like, by the way?
 
  • #4
Well, you can take it as the definition of a system being at equilibrium that if you fix all the macroscopic conserved quantities--total energy, total momentum, total angular momentum, total charge, total number of particles of each type, etc.--then every microscopic state consistent with those macroscopic properties is equally likely. I don't know if there is a justification for that assumption, other than the principle of indifference: if you don't have any other way of distinguishing states, then they have to be equally likely.

Your example is not relevant to this assumption, because there is no notion of "competition" for states. When you compute the partition function it's for the entire system, not for a single particle. So for your example, specifying the state means specifying the location and momentum of each ball. The constraint that two balls can't occupy the same location could either be imposed by declaring some states as not allowed, or it could be done in a "soft" way by putting in a short-range repulsive force between balls, so that the energy of the system as a whole shoots up if two balls get too close together.

I'll have to think about your example some more to see if I can model it using statistical mechanics.
 
  • #5
The density matrix of the canonical ensemble is given by ##\rho=e^{-\beta\hat H}##, where ##\hat H## is the Hamiltonian operator. The partition function is given by ##Z=\mathrm{Tr}(\rho)##. Assume that ##\hat H## has a discrete, but possibly degenerate, spectrum. Then it can be diagonalized as ##\hat H=\sum_n\sum_{k=1}^{g(n)} E_n\left|\Psi_{nk}\right>\left<\Psi_{nk}\right|##, where ##\hat H\left|\Psi_{nk}\right>=E_n\left|\Psi_{nk}\right>##, ##k## labels the degeneracy of ##E_n## (i.e. ##1\leq k \leq g(n)=\mathrm{dim}(\mathrm{Eig}(\hat H,E_n))##) and the ##\left|\Psi_{nk}\right>## can be chosen to be orthogonal. We then find ##e^{-\beta\hat H}=\sum_n \sum_{k=1}^{g(n)} e^{-\beta E_n} \left|\Psi_{nk}\right>\left<\Psi_{nk}\right|## (by the spectral theorem) and ##Z=\sum_n\sum_{k=1}^{g(n)} e^{-\beta E_n}##. The term ##e^{-\beta E_n}## appears ##g(n)## times in this sum, i.e. ##Z=\sum_n g(n) e^{-\beta E_n}##, so each term ##e^{-\beta E_n}## must be weighted by its degeneracy.
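
A quick numerical illustration of the trace identity (my own sketch, with an arbitrary ##4\times 4## Hamiltonian): build a Hermitian ##\hat H## with a doubly degenerate eigenvalue and compare ##\mathrm{Tr}(e^{-\beta\hat H})## with ##\sum_n g(n)e^{-\beta E_n}##.

[code]
import numpy as np
from scipy.linalg import expm

beta = 1.0

# Hypothetical Hamiltonian with eigenvalues {0, 1, 1, 2}: the level E = 1 is doubly degenerate.
H_diag = np.diag([0.0, 1.0, 1.0, 2.0])
U = np.linalg.qr(np.random.randn(4, 4))[0]     # random orthogonal change of basis
H = U @ H_diag @ U.T                           # same spectrum, but no longer diagonal

Z_trace = np.trace(expm(-beta * H))            # Z = Tr(e^{-beta H}), basis independent

E = np.array([0.0, 1.0, 2.0])
g = np.array([1, 2, 1])
Z_levels = np.sum(g * np.exp(-beta * E))       # sum over levels weighted by degeneracy g(n)

print(Z_trace, Z_levels)                       # equal up to floating-point error
[/code]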
 
  • #6
jostpuur said:
I understand the claim, but don't believe. Why would the probability distribution have such uniform background measure over all states?
It is a fundamental assumption of equilibrium statistical physics: all accessible microstates are equally probable.
 
  • #7
DrClaude said:
It is a fundamental assumption of equilibrium statistical physics: all accessible microstates are equally probable.

In classical models (not quantum) that "fundamental assumption" is contradictory and leads to paradoxes, because the microstates often form some kind of continuum, and the only way to apply the Boltzmann distribution is to first discretize the model somehow. However, there are always multiple ways of discretizing the model, and the different discretizations can lead to different probability distributions for the original continuous model. For this reason I think it is obvious that in general the probability distribution has to be allowed to be proportional to some function [itex]f(n)e^{-\frac{E_n}{T}}[/itex], where [itex]f(n)[/itex] is some "background measure". To me it seems that people have not understood the need for this background measure, because in many examples it is something very uniform and often only a constant. Anyway, if you insist that the probability distribution is proportional to precisely [itex]e^{-\frac{E_n}{T}}[/itex] and nothing else, it will lead to paradoxes.
 
  • #8
In classical mechanics the definition of states can look a bit arbitrary, and you won't get the correct results, e.g. for blackbody radiation. That should not be surprising - blackbody radiation was one of the key observations that led to the discovery that we do not live in a classical world. With knowledge about quantum mechanics - the more fundamental theory - we also got a deeper motivation for the states in classical physics.
 
  • #9
Can the (often axiomatic) assumption that all accessible (under some energy constraint) microstates are equally probable be derived from the Schrödinger equation (as an accurate approximation)?

Wouldn't the derivation need some model of the form [itex]H=H_0+\epsilon I[/itex], where the eigenstates (eigenvectors) of [itex]H_0[/itex] would be considered as the microstates (of statistical physics), and the term [itex]I[/itex] would be something that somehow mixes the wave function in a statistical way over macroscopic time intervals?
 
  • #10
Objects don't have to be in thermal equilibrium. You cannot force them to be - the Schrödinger equation doesn't tell you if something is in thermal equilibrium.
 
  • #11
If you assume that the true time evolution comes from the Schrödinger equation and also assume something else, something reasonable that would be related to statistics, the assumptions together might imply the Boltzmann distribution as an accurate approximation. A proper derivation of the Boltzmann distribution should look something like that, since it is the Schrödinger equation that ultimately produces the true time evolution. I already knew that the Schrödinger equation alone is not going to imply the Boltzmann distribution.

When the Boltzmann distribution is derived without any use of the Schrödinger equation, some things in the derivation have to be working by accident and good luck.

Related to this topic, I would like to remind you that the assumption that all microstates (accessible under some energy constraint) are equally probable, when the microstates are identified with the energy eigenstates, contains an implicit assumption that severely contradicts quantum mechanics, because according to quantum mechanics the state of a system does not need to be an energy eigenstate. The states can be linear combinations of energy eigenstates. For this reason it is obvious that the use of microstates is supposed to be some kind of approximative model that merely has statistical behavior similar to that of some more accurate quantum model with proper wave functions. A proper derivation of the Boltzmann distribution should take into account the nature of this approximation.
 
  • #12
jostpuur said:
In classical models (not quantum) that "fundamental assumption" is contradictory and leads to paradoxes, because the microstates often form some kind of continuum, and the only way to apply the Boltzmann distribution is to first discretize the model somehow. However, there are always multiple ways of discretizing the model, and the different discretizations can lead to different probability distributions for the original continuous model.

Yes, you're right. The Boltzmann prescription, that at equilibrium all states with the same energy are equally likely, is strictly speaking only applicable to a system with a finite number of states. To do classical statistical mechanics, people divide phase space into little cells, and use the volume of the cells as the measure of likelihood. The volume in phase space is one particular, very simple choice of "background measure". You could certainly use other measures, and I don't know whether that has been explored or not.

I'm assuming that when you talk about paradoxes, you're just saying that you can get different results depending on how you divide the continuum into "states" and take the limit? I don't think that there is anything paradoxical about the usual approach of using phase space volume. At least not for nonrelativistic physics. Famously, Planck's attempt to apply statistical mechanics to electromagnetic radiation led to infinities when he tried to take the continuum limit, but he got reasonable results by using a discrete number of states (leading to QM). Maybe there is some sense in which Boltzmann's rule is nonsensical in the continuum limit, which is a hint that the world isn't classical.

Anyway, getting back to the original post, I think it has been answered. Whether you use one "background measure" or another, it will not be the case that [itex]Z[/itex] is computed by summing over energy levels; you have to include a measure [itex]g(E)[/itex] giving the degeneracy (in the case of discrete states) or the measure (in the case of a continuum of states).
 
  • #13
jostpuur said:
Related to this topic, I would like to remind you that the assumption that all microstates (accessible under some energy constraint) are equally probable, when the microstates have been identified with the energy eigenstates, contains implicit assumption which severely contradicts quantum mechanics, because according to quantum mechanics the state of a system does not need to be any energy eigenstate. The states can be linear combinations of energy eigenstates. For this reason it is obvious that the use of microstates is supposed to be some kind of approximative model that merely has similar statistical behavior as some more accurate quantum model with proper wave functions. A proper derivation of Boltzmann's distribution should take into account the nature of this approximation.

Yeah, you can't treat every linear combination of eigenstates as different states for statistical purposes, because they are overlapping. The Boltzmann rule requires the notion of "state" to be exclusive: you can't be in two different states simultaneously. It works to just pick a complete orthonormal set of states and do statistics on those, but there might be a more sophisticated treatment that doesn't require first coming up with a basis. It's getting beyond my knowledge at this point.
 
  • #14
stevendaryl said:
Anyway, getting back to the original post, I think it has been answered.

I see that there is a standard answer available to my question.

My original question contained the slight ambiguity that I didn't specify whether it was supposed to be in classical or quantum setting. It might affect the answers. For example the quantum mechanical energy states (eigenvectors) will always be orthogonal, no matter how close the energy levels (eigenvalues) are, so the states are perhaps never going to be so similar that they would "compete for the same random events".
 
  • #15
jostpuur said:
I see that there is a standard answer available to my question.

My original question contained the slight ambiguity that I didn't specify whether it was supposed to be in classical or quantum setting. It might affect the answers. For example the quantum mechanical energy states (eigenvectors) will always be orthogonal, no matter how close the energy levels (eigenvalues) are, so the states are perhaps never going to be so similar that they would "compete for the same random events".

I don't think that "competing for the same events" really makes sense as a concept. Or maybe I just don't understand what you mean.
 
  • #16
What I meant becomes evident with the following modification to the derivation of the Maxwell's speed distribution in classical setting.

The possible states of gas particles can be parametrized by the momentum vector [itex]\vec{p}[/itex], whose allowed values can be anything in [itex]\mathbb{R}^3[/itex]. The continuum brings serious problems, and the allowed values must be discretized. The most obvious choice is to pick some small [itex]\Delta p>0[/itex], and then decide that the allowed values are from [itex]\Delta p\;\mathbb{Z}^3[/itex]. If you then decide that the probabilities must be proportional to [itex]e^{-\frac{E_{\vec{p}}}{T}}[/itex], and at the end take the limit [itex]\Delta p\to 0[/itex], you get the right Maxwell speed distribution.

Suppose that for some reason you are given a different discretization: some discrete set [itex]\Lambda\subset\mathbb{R}^3[/itex], where for example the points are denser close to the origin and sparser far away from it. If you then decide that the probabilities again must be proportional to [itex]e^{-\frac{E_{\vec{p}}}{T}}[/itex] for all [itex]\vec{p}\in\Lambda[/itex], you are going to get a wrong result -- a distorted version of the Maxwell speed distribution.

This does not necessarily mean that the discretization [itex]\Lambda[/itex] is wrong. All you have to do is find suitable weights [itex]f(\vec{p})[/itex], smaller close to the origin and larger far away from it, and then postulate that the probabilities are proportional to [itex]f(\vec{p})e^{-\frac{E_{\vec{p}}}{T}}[/itex]. If the weights are right, you get the correct Maxwell speed distribution again.

In this case I would say that the points of [itex]\Lambda[/itex], which were denser close to the origin, were "competing for the same random events".

Due to this example, I'm not convinced that the question of finding the "right discretization" would necessarily be the correct question. Equivalently we might think that almost any discretization will be fine, while the real task is going to be finding a way to find the right weights.
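
Here is a minimal one-dimensional sketch of what I mean (my own illustration; the three-dimensional case works the same way). With a uniform momentum grid, the weights [itex]e^{-E_{\vec p}/T}[/itex] alone give the correct thermal average of [itex]p^2[/itex]. With a non-uniform grid that is denser near the origin, you need an extra weight [itex]f(p)[/itex] proportional to the local grid spacing to recover the same answer.

[code]
import numpy as np

m, T = 1.0, 1.0                       # mass and temperature (k_B = 1)
E = lambda p: p**2 / (2 * m)          # single-particle kinetic energy

def thermal_p2(p, weights):
    """<p^2> computed from unnormalized probabilities weights * exp(-E/T)."""
    w = weights * np.exp(-E(p) / T)
    return np.sum(p**2 * w) / np.sum(w)

# Uniform grid: equal weights already give the Maxwell result <p^2> = m*T = 1.
p_uniform = np.arange(-10.0, 10.0, 1e-3)
print(thermal_p2(p_uniform, np.ones_like(p_uniform)))     # ~ 1.0

# Non-uniform grid, denser near the origin: p = sinh(u) with u on a uniform grid.
u = np.arange(-3.0, 3.0, 1e-3)
p_nonuniform = np.sinh(u)
print(thermal_p2(p_nonuniform, np.ones_like(u)))          # distorted, noticeably below 1
print(thermal_p2(p_nonuniform, np.cosh(u)))               # weight = local spacing, ~ 1.0 again
[/code]

The correct weight is just the momentum-space volume carried by each grid point, which is the same "phase-space measure" mentioned above.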
 
  • #17
jostpuur said:
Due to this example, I'm not convinced that the question of finding the "right discretization" would necessarily be the correct question. Equivalently we might think that almost any discretization will be fine, while the real task is going to be finding a way to find the right weights.
In classical mechanics, that leads to the same answers, but it ruins the feature that classical mechanics is a special case of quantum mechanics. In quantum mechanics it does not work at all.
I don't see any advantage of a system that is more complicated, less motivated by physics and has a smaller range of applications.
 
  • #18
The quantum mechanical derivation of Maxwell's speed distribution is symbolic garbage that produces the right result by accident and good luck. Or at least it looks like it.

Suppose you have a cube of macroscopic size 10m[itex]\times[/itex]10m[itex]\times[/itex]10m, and a lot of gas particles in it. The gas particles are going to obey the Maxwell speed distribution, and you can derive it by assuming that the gas particles occupy the quantum energy eigenstates, which are spatially spread over the entire macroscopic cube and which can be written in terms of trigonometric functions. Seriously speaking, the gas particles are not going to be spatially spread like that, because then the gas would not behave like a macroscopic gas. The gas particles are more plausibly in states that can be described by spatially localized wave packets with sensible momentum expectation values [itex]\langle \vec{p}\rangle[/itex], which behave as their classical momenta.

Is there any serious reason to believe that quantum mechanics would have anything to do with the Maxwell speed distribution?
 
  • #19
jostpuur said:
Suppose we find out that the model was only an approximation of a more accurate model, and according to the new more accurate model the energy values are going to be ##\mathcal{E}_n## and ##\mathcal{E}_{2n}+\epsilon## with some small positive epsilon. Now the energy levels are

Is your intent to add new energy levels, or to shift some of the energy levels without adding any? The actual modified levels you posted indicate the former, but your verbal description seems to indicate the latter. Also, your derivation of the different distributions for ##\epsilon = 0## requires the former, but I don't see why different distributions would be a problem unless you are assuming the latter.
 
  • #20
PeterDonis said:
Is your intent to add new energy levels, or to shift some of the energy levels without adding any?

The original question contained some ambiguities because it was only brainstorming.
 
  • #21
jostpuur said:
The quantum mechanical derivation of Maxwell's speed distribution is symbolic garbage that produces the right result by accident and good luck. Or at least it looks like it.
Maybe to you, I can't judge that. It is perfectly fine and the right thing to do.

Nothing in the derivation assumes that the particles are perfectly in energy eigenstates. The classical case is the limit of the quantum case for h->0. For observables that do not depend on h, this limit is trivial to calculate.
 
  • #22
jostpuur said:
The original question contained some ambiguities because it was only brainstorming.

But your question can't be answered unless the ambiguity is resolved. Can you resolve it?
 
  • #23
I think that my original question has been answered in the sense that the relevant ambiguities have gotten pinpointed.

PeterDonis said:
Is your intent to add new energy levels, or to shift some of the energy levels without adding any?

Originally I did not have an answer ready for this. I can see that the standard answer to my original question will depend on which way the clarification will be made.
 
  • #24
jostpuur said:
The quantum mechanical derivation of Maxwell's speed distribution

Please give a specific reference.

jostpuur said:
The gas particles are going to obey the Maxwell's speed distribution, and you can derive it by assuming that the gas particles would be occupying the quantum energy eigenstates

No, you can't, because when you take quantum statistics into account, the correct distribution is not Maxwell-Boltzmann, it's either Bose-Einstein or Fermi-Dirac.
 
  • #25
I'm only responding to the request.

Introductory Statistical Mechanics (second edition) by Bowley and Sanchez has Chapter 7 with the title "Maxwell distribution of molecular speeds". Section 7.1 has the title "The probability that a particle is in a quantum state", and it starts on page 144 with this type of content:

For a particle in a box, the eigenfunction describing standing waves is
[tex]
\phi_i(x,y,z) = A\sin\Big(\frac{n_1\pi x}{L_x}\Big)\sin\Big(\frac{n_2\pi y}{L_y}\Big)\sin\Big(\frac{n_3\pi z}{L_z}\Big)
[/tex]
The corresponding energy eigenvalue is
[tex]
\epsilon_i = \frac{\hbar^2\pi^2}{2m}\Big(\frac{n_1^2}{L_x^2}+\frac{n_2^2}{L_y^2}+\frac{n_3^2}{L_z^2}\Big)
[/tex]
According to Boltzmann, the probability of finding a given particle in a particular single-particle state of energy [itex]\epsilon_i[/itex] is
[tex]
p_i = \frac{e^{-\epsilon_i/k_{\textrm{B}}T}}{Z}
[/tex]

Then the book goes on to discuss the density of states in three dimensions, and eventually on page 152 they get to this:
Let there be [itex]n(u)du[/itex] particles with speeds between [itex]u[/itex] and [itex]u+du[/itex]. For a gas in three dimensions we get
[tex]
n(u)du = \Big(\frac{N\lambda_D^3m^3}{2\pi^2\hbar^3}\Big)u^2e^{-mu^2/2k_{\textrm{B}}T}du
[/tex]
This is called the Maxwell distribution of molecular speeds.

Nowhere in between do they speak about an artificial discretization of classical velocities or momenta, so the reader is left under the impression that this result comes naturally from the Schrödinger equation.

The formula appears to contain [itex]\hbar[/itex], but the factor [itex]\lambda_D[/itex] was defined in such a way that the Planck constants cancel.
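
For concreteness, here is the cancellation written out, assuming the usual definition [itex]\lambda_D=\sqrt{2\pi\hbar^2/(mk_{\textrm{B}}T)}[/itex] (the book's exact convention may differ by a dimensionless factor):
[tex]
\frac{N\lambda_D^3 m^3}{2\pi^2\hbar^3}
= \frac{N m^3}{2\pi^2\hbar^3}\,\Big(\frac{2\pi\hbar^2}{mk_{\textrm{B}}T}\Big)^{3/2}
= \sqrt{\frac{2}{\pi}}\,N\Big(\frac{m}{k_{\textrm{B}}T}\Big)^{3/2}
= 4\pi N\Big(\frac{m}{2\pi k_{\textrm{B}}T}\Big)^{3/2},
[/tex]
which is the prefactor of the classical Maxwell speed distribution, with no [itex]\hbar[/itex] left.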

Later in Chapter 10 they speak about Fermi and Bose particles.
 
  • #26
Let me quote Callen, Thermodynamics and an Introduction to Thermostatistics, 2nd ed., sec. 16-9:
Callen said:
[...] the partition function becomes
$$
z = \frac{1}{h^3} \int e^{-\beta \mathcal{H}} dx \, dy \, dz \, dp_x \, dp_y \, dp_z \quad \quad (16.68)
$$
Except for the appearance of the classically inexplicable prefactor (##1/h^3##), this representation of the partition sum (per mode) is fully classical. It was in this form that statistical mechanics was devised by Josiah Willard Gibbs in a series of papers in the Journal of the Connecticut Academy between 1875 and 1878. Gibbs' postulate of equation 16.68 (with the introduction of the quantity ##h##, for which there was no a priori classical justification) must stand as one of the most inspired insights in the history of physics. To Gibbs, the numerical value of ##h## was simply to be determined by comparison with empirical thermophysical data.
 
  • #27
@jostpuur,
Pardon me for chiming in so late, but I would also point out that you get the same distribution from the fundamental assumption that at equilibrium the entropy is maximized. This works in both classical and quantum settings: solving the constrained optimization problem via Lagrange multipliers gives you the partition function from the probability normalization. The equal weighting of states is built into the definition of entropy, because entropy is calculated as a sum over states (or a trace in the density operator formulation for the quantum case). The [itex]-\sum_k p_k \log p_k[/itex] sum defining entropy is always larger when the p's are uniformly equal across equivalent states... (variations in the probabilities between states appear when you optimize subject to constraints, e.g. that <E> is some specific value.)

Work through the Lagrange multiplier optimization problem yourself and you'll begin to see that it really can't be any other way. I was quite inspired to see the definition of temperature emerge as the (reciprocal) Lagrange multiplier for the fixed expected energy constraint, and the chemical potential similarly emerge from the expected particle number constraint. And you can invoke further constraints by prescribing a fixed expected value for any system observable (mean magnetization, charge polarization, angular momentum, ...).
 
  • #28
Are you speaking about the calculation where we wish to maximize the function

[tex]
f:[0,\infty[^N\to\mathbb{R},\quad f(p_1,p_2,\ldots, p_N) = -\sum_{n=1}^N p_n\log(p_n)
[/tex]

under the constraint

[tex]
0 = \phi(p_1,p_2,\ldots, p_N) = p_1 + p_2 + \cdots + p_N - 1
[/tex]

The equation [itex]\nabla f = \lambda \nabla\phi[/itex] then implies that all [itex]p_n[/itex] have to be equal.

The problems in classical statistical physics start from the fact that there is no right way to discretize the continuous parameters, and this entropy calculation assumes that the discrete set [itex]\{1,2,\ldots, N\}[/itex] has already been fixed and given some interpretation right from the start, so this calculation does not give much aid with those issues.

If you assume that the wave function of a very high dimensional quantum system obeys the Schrödinger equation, and also assume something else concerning statistics, can you prove that the quantity

[tex]
S(t) = -\sum_{n=1}^N |\psi_n(t)|^2 \log(|\psi_n(t)|^2)
[/tex]

has a habit of growing upwards?
 
  • #29
What you say about all the probabilities being equal is true only if you do not impose further constraints. You can constrain the probabilities so that the expected value of the energy is a(n arbitrary) fixed value <E> = e. This gives you the classical (or quantum) distribution, and the partition function emerges from the probability-normalization Lagrange multiplier. You can work with classical probability densities over phase space, with the entropy defined as proportional to [itex]-\kappa \int_S \rho \ln(\rho)\, dx\, dp [/itex], or in the quantum case with either the discrete or the integral trace. There is no difficulty with needing discrete states in the classical continuum case; you just need to pick an arbitrary scale for the entropy.

And as I mentioned, you can further constrain the system to have an arbitrary fixed expected particle number <n>, or an arbitrary fixed expected value of any other observable. These constraints further affect the state probabilities. Shall I work out the details here?
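
For reference, a sketch of those details in the discrete case (my notation): maximize the entropy subject to normalization and a fixed mean energy,
[tex]
\mathcal{L} = -\sum_n p_n\log p_n - \alpha\Big(\sum_n p_n - 1\Big) - \beta\Big(\sum_n p_n E_n - \langle E\rangle\Big),
[/tex]
[tex]
\frac{\partial\mathcal{L}}{\partial p_n} = -\log p_n - 1 - \alpha - \beta E_n = 0
\quad\Rightarrow\quad
p_n = \frac{1}{Z}e^{-\beta E_n},
\qquad Z = \sum_n e^{-\beta E_n} = e^{1+\alpha}.
[/tex]
The multiplier [itex]\beta[/itex] is fixed by the energy constraint and identified with [itex]1/k_{\textrm{B}}T[/itex]; since the sum runs over states, a degenerate energy value contributes once per state, which is the degeneracy factor again when the sum is reorganized over energy values.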
 
  • #30
How do you justify that the formula [itex]-\int p\log(p)\,dx[/itex] is a good formula for entropy? All the mathematical sources I have seen only give it as an axiomatic definition. In the context of the Boltzmann distribution, the formula [itex]\log(W)[/itex] is justified via the need for the identity [itex]\log(W_1W_2)=\log(W_1)+\log(W_2)[/itex] to hold, and only the logarithm has this property.
 
  • #31
jostpuur,
There is a great deal of literature out there justifying the Gibbs formula for entropy as a generalization of Boltzmann's. Note that the Boltzmann entropy formula presupposes that the system is already in thermodynamic equilibrium. Gibbs' formula is a natural generalization which reduces to Boltzmann's under the equilibrium assumption while retaining the additivity, within the classical domain, when you combine systems.

But the short answer to how I would justify that formula is that it leads to empirically confirmed predictions of the behavior of a broad class of systems, from ideal gases to magnetizable materials. You are welcome to try to improve upon that if you like.
 
  • #32
If we've agreed that the entropy for discrete valued random variables is what it is, that's not going to directly imply any unique entropy for continuously valued random variables. For example, suppose we have some probability density [itex]\rho(x)[/itex] defined on the interval [itex][0,1][/itex] so that
[tex]
\int\limits_0^1 \rho(x)dx = 1
[/tex]
holds. Suppose we want to approximate this random variable by a discretely valued one, but instead of the most obvious discretization let's use some non-trivial points [itex]0<x_1<x_2<\cdots <x_N<1[/itex]. Then we get probabilities [itex]p_1,p_2,\ldots, p_N[/itex] from the formula
[tex]
p_n \approx \rho(x_n) (x_{n+1}-x_n)
[/tex]
and they will satisfy
[tex]
\sum_{n=1}^N p_n=1
[/tex]
It turns out that
[tex]
-\sum_{n=1}^N p_n\log(p_n) \approx
-\int\limits_0^1 \rho(x)\log\big(\rho(x)f(x)\big)dx + \underbrace{\log(N)}_{\to\infty}
[/tex]
holds, where the additional function [itex]f(x)[/itex] is something that satisfies [itex]f(x_n)\approx N(x_{n+1}-x_n)[/itex]. If [itex]x_{n+1}-x_n\approx \frac{1}{N}[/itex] holds at least in order of magnitude, this [itex]f(x)[/itex] makes sense. Based on this it would seem reasonable to state that the entropy should be given by the formula
[tex]
S = -\int\limits_0^1 \rho(x)\log\big(\rho(x)f(x)\big)dx
[/tex]

With a change of notation the entropy could also be written as
[tex]
-\int\limits_0^1 \bar{\rho}(x)\log(\bar{\rho}(x))w(x)dx
[/tex]
with a weight [itex]w(x)=\frac{1}{f(x)}[/itex], which perhaps looks nicer.

This means that the use of entropy is not going to solve the issues that arise from the different discretizations of continuous variables in the context of classical physics and the Boltzmann distribution.
 
  • #33
jostpuur said:
If we've agreed that the entropy for discrete valued random variables is what it is, that's not going to directly imply any unique entropy for continuously valued random variables.

Right. One way to say it is that even though there is a unique "most natural" probability distribution on a finite set of possible events, which is to assume they are equally likely, there is no unique "most natural" probability distribution on any infinite set. To get agreement between statistical mechanics and thermodynamics, I think you have to assume something along the lines of "equal volumes in phase space imply equal probabilities". Classically, there is really no motivation for making this assumption, I don't think, other than the fact that it works.
 
  • #34
jostpuur,
What your exposition boils down to is simply the fact that you can carry out an arbitrary change of variable over the state space and change the form of the Gibbs entropy to include the weighting factor you mentioned, and I would point out that once you're done I can introduce a change of coordinates to recover the Gibbs form.

Six of one, half a dozen of the other. The question is: what is particular about the state-space coordinates which yield the Gibbs form? We answer that in quantum theory and link it to the classical case via the correspondence principle.
 
  • #35
Some further elaboration... look at the dimensional factors in the entropy integral.
[tex]S= -\kappa \int \rho(\xi) \log(\rho(\xi)) d\xi[/tex]
where [itex]\xi[/itex] is our state space coordinate multi-variable. We should understand that the state space coordinates are not generally dimensionless quantities. As such, the probability density function is not a dimensionless number either, but rather a probability density. Its occurrence alone within the logarithm calls for the introduction of a unit-canceling factor, let's say [itex]\eta[/itex]. Call this the gauge factor.

Our entropy formula would then become:
[tex]S= -\kappa \int \rho(\xi) \log(\rho(\xi)\eta) d\xi[/tex]
And naturally the question, when considering arbitrary coordinate systems on state space, is the local change of this gauge, and hence the coordinate dependence of the gauge factor.
[tex]S= -\kappa \int \rho(\xi) \log(\rho(\xi)\eta(\xi)) d\xi[/tex]
But if unimodular coordinates are used, then the gauge factor will be a constant, and as mentioned before we recover the Gibbs form plus an additive term corresponding to, in this case, the expectation value of a constant. It just shifts the zero point of the entropy: [itex] S = S' +\kappa \log(\eta)[/itex].

Mind you I'm saying nothing new here, just using an alternative narrative.
 

1. What is Boltzmann with degenerate levels?

Boltzmann with degenerate levels refers to applying the Boltzmann distribution to a system in which several distinct states share the same energy value (a degenerate energy level). This situation is common in systems of identical particles such as gases or solids.

2. How does Boltzmann with degenerate levels differ from regular Boltzmann statistics?

In the simplest presentation of Boltzmann statistics, each energy level corresponds to a single state with a distinct energy value. With degenerate levels, several distinct states share the same energy value, so each level must be weighted by the number of states it contains.

3. What is the Boltzmann distribution equation for degenerate levels?

The Boltzmann distribution equation for degenerate levels is P_i = g_i e^(-E_i / kT) / Z, where P_i is the probability of finding a particle in the i-th energy level, g_i is the degeneracy of that level, Z = Σ_j g_j e^(-E_j / kT) is the partition function, E_i is the energy of the i-th level, k is the Boltzmann constant, and T is the temperature.

4. How does degeneracy affect the distribution of particles in a system?

Degeneracy increases the number of states available at a given energy, so a level with higher degeneracy carries a correspondingly larger total probability. In particular, a highly degenerate level can be more populated in total than a lower-lying level with fewer states.

5. What are some real-world applications of Boltzmann with degenerate levels?

Boltzmann with degenerate levels is used in various fields such as physics, chemistry, and engineering to understand the behavior of particles in systems with degeneracy. It is also used in the study of quantum systems, semiconductors, and superconductors, among others.
