# Questions about deriving the Maxwell-Boltzmann Distribution

Hello,

I was watching a video on the derivation of the Maxwell-Boltzmann distribution function, which eventually leads to:
$$\frac{n_i}{N} = \frac{e^{-\beta \epsilon_i}}{\sum_{j=1}^{n} e^{-\beta \epsilon_j}}$$
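
As a quick numerical check of this formula, here is a minimal Python sketch that evaluates it for a hypothetical set of compartment energies (the levels and ##\beta## below are illustrative assumptions in arbitrary units). Subtracting the lowest energy before exponentiating does not change the result, because the common factor cancels between numerator and denominator; it only avoids overflow.

```python
import math

def boltzmann_fractions(energies, beta):
    """Return n_i / N = exp(-beta*e_i) / sum_j exp(-beta*e_j) for each level."""
    e_min = min(energies)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # the denominator of the formula
    return [w / z for w in weights]

# Hypothetical levels 0, 1, 2, 3 (arbitrary energy units) with beta = 1
fractions = boltzmann_fractions([0.0, 1.0, 2.0, 3.0], beta=1.0)
print([round(p, 4) for p in fractions])
```

The fractions sum to 1, and the ratio of any two depends only on their energy difference, ##e^{-\beta(\epsilon_i - \epsilon_j)}##.
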
To do this, the first step is to find the number of possible permutations ##\Omega## of a total of ##N## molecules in an isolated system distributed over ##n## energy compartments, with ##n_i## molecules in compartment ##i##:
$$\Omega = \frac{N!}{n_1!\, n_2!\, n_3! \cdots n_n!}$$
The video says that one then has to find the distribution that maximizes ##\Omega##, because the state of thermal equilibrium should be the most probable one.
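
The maximization can be checked by brute force for a tiny system. The sketch below assumes equally spaced levels ##0, \delta e, 2\delta e, \ldots## and imposes the two constraints the usual derivation uses, fixed ##N = \sum_i n_i## and fixed total energy ##E = \sum_i n_i \epsilon_i## (here ##N = 6## particles sharing ##E = 6\,\delta e##, both chosen only for illustration). It lists every allowed set of occupation numbers, evaluates ##\Omega## for each, and picks the largest.

```python
from math import factorial, prod

def occupations(n_particles, energy_units):
    """Yield occupation vectors (n_0, n_1, ..., n_E) for levels 0, 1, ..., E
    (energies in units of delta_e) with sum(n_i) == N and sum(i * n_i) == E."""
    n_levels = energy_units + 1

    def build(level, n_left, e_left, prefix):
        if level == n_levels:
            if n_left == 0 and e_left == 0:
                yield tuple(prefix)
            return
        # At most n_left particles fit, and every particle above level 0 costs energy.
        max_here = n_left if level == 0 else min(n_left, e_left // level)
        for k in range(max_here + 1):
            yield from build(level + 1, n_left - k, e_left - k * level, prefix + [k])

    yield from build(0, n_particles, energy_units, [])

def omega(occupation):
    """Number of permutations N! / (n_0! n_1! ...) for one occupation vector."""
    n_total = sum(occupation)
    return factorial(n_total) // prod(factorial(k) for k in occupation)

candidates = list(occupations(6, 6))
best = max(candidates, key=omega)
print(best, omega(best))
```

In this toy case the winner is ##(n_0, n_1, n_2, n_3) = (3, 1, 1, 1)## with ##\Omega = 120##: the occupation falls off with energy instead of being spread as one molecule per level.
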

I have 3 questions regarding this permutations formula:
1. I still find it hard to grasp why a system in thermal equilibrium must have the maximum number of possible permutations. Why can't a system in thermal equilibrium have fewer?

2. To find the maximum, one takes ##\ln \Omega##, differentiates it, and sets the derivative to 0. However, I can already see from the formula that the number of possible permutations is largest when each compartment contains just 1 molecule (every ##n_i = 1##), so that there are ##N## compartments, just as there are ##N## molecules in the whole system. Why can't it be reasoned this way?

3. I understand that this derivation is classical and thus ignores the discrete quantum energy levels. However, shouldn't this mean that the energy varies continuously from one compartment to the next, so that there is an infinite number of energy compartments? If so, how is it possible to calculate the number of permutations with a formula that has only a finite number of energy compartments?

Really looking forward to some clarifications on these questions.

BvU
Homework Helper
Hi John,

1) It's the definition of equilibrium. A state with less than the maximum is a statistical deviation that is highly probable on an incredibly small (molecular) scale, but incredibly improbable on a macroscopic scale. Just follow the reasoning and judge afterwards, looking back.

It's like what you get when you mix white sand with black sand: you get grey sand, but on a microscale the grains are still black or white.
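
The "incredibly improbable on a macro scale" part can be made quantitative with the simplest possible case: two compartments (black and white, or left and right half), where each of ##N## molecules sits in either one with equal probability. The sketch below is an illustration that ignores energy entirely; it counts what fraction of all ##2^N## arrangements lies within 5% of the 50/50 split, using exact integer arithmetic.

```python
from math import comb

def fraction_near_even_split(n, tol_percent=5):
    """Fraction of the 2**n arrangements of n molecules over two compartments
    with |k/n - 1/2| <= tol_percent/100 (k = molecules in the first one)."""
    count = sum(comb(n, k) for k in range(n + 1)
                if 100 * abs(2 * k - n) <= 2 * tol_percent * n)
    return count / 2**n

for n in (10, 100, 1000, 10000):
    print(n, fraction_near_even_split(n))
```

The share of arrangements near the even split grows from about a quarter at ##N = 10## to essentially 1 at ##N = 10\,000##; for ##N \sim 10^{23}## a visible deviation from the even split is hopelessly improbable.
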

2) The number of available energy compartments has nothing to do with the number of particles. And the constraint of fixed total energy excludes extremely high energies for large numbers of particles.

3) Can you give a reference? I wonder how this is meant.

Thanks a lot for your explanations, @BvU.

2) I do understand that it has nothing to do with the number of particles. What I meant is that the formula gives the highest number of permutations when the denominator is 1, which happens only if every compartment holds exactly 1 molecule. The number of compartments must then "coincidentally" equal the number of particles, ##n = N##.
I did consider that extremely high energies are excluded, but then the following argument came to mind: the energies of neighbouring compartments don't have to differ by much, for example by Δ = 0.00000…001 joules. This way I can put just 1 molecule in each compartment and still cover only a very narrow energy range, without reaching extremely high energies. Why isn't this possible?
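
The arithmetic behind this idea can be written out, assuming equally spaced levels ##0, \Delta, 2\Delta, \ldots## (an assumption; the video's compartments need not be equally spaced). One molecule per level costs a total energy ##\Delta\,(0 + 1 + \cdots + (N-1)) = \Delta\,N(N-1)/2##, so the spacing must satisfy ##\Delta \le 2E/(N(N-1))##. Taking, purely as an example, one mole of an ideal monatomic gas at 300 K with ##E = \tfrac{3}{2} N k_B T##, this gives ##\Delta \le 3 k_B T/(N-1)##.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def max_spacing_one_per_level(n_molecules, total_energy):
    """Largest level spacing (J) that still allows one molecule per level,
    for levels 0, d, 2d, ...: requires d * N(N-1)/2 <= E."""
    return 2.0 * total_energy / (n_molecules * (n_molecules - 1))

# Example assumption: one mole of ideal monatomic gas at 300 K, E = 3/2 N k_B T
n = 6.02214076e23
energy = 1.5 * n * K_B * 300.0
print(max_spacing_one_per_level(n, energy))
```

For a macroscopic sample the required spacing comes out at around ##10^{-44}## J.
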

3) This question came to mind when I read about probability densities of molecular velocities in a system. One cannot speak of the probability of one exact velocity, only of a range (##dv##), because velocities are continuous and there is therefore an infinite number of possible exact velocities. Since energy is a function of velocity, it should also be continuous. That's why I'm surprised that the formula yields a finite number of possible permutations, while in reality there could be an infinite number of energy compartments because energy is continuous.
This whole question would be answered if each compartment covered a certain range of energies, but I'm not sure whether that is indeed the case.
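
One way to see how a finite set of compartments relates to a continuous energy is to let each compartment cover a range ##[\epsilon, \epsilon + \delta e)## and compare the probability of that bin with the probability density. The sketch below assumes the standard Maxwell-Boltzmann energy density of a 3D ideal gas, ##f(x) = \tfrac{2}{\sqrt{\pi}}\sqrt{x}\,e^{-x}## with ##x = \epsilon/k_BT##, whose cumulative distribution has the closed form ##\operatorname{erf}(\sqrt{x}) - 2\sqrt{x/\pi}\,e^{-x}##. A single exact energy has probability zero, while the probability of a bin divided by its width approaches the density as the width shrinks.

```python
import math

def mb_energy_pdf(x):
    """Maxwell-Boltzmann energy density, energy in units of k_B T (3D ideal gas)."""
    return 2.0 * math.sqrt(x / math.pi) * math.exp(-x)

def mb_energy_cdf(x):
    """P(energy <= x * k_B T), i.e. the regularized incomplete gamma P(3/2, x)."""
    return math.erf(math.sqrt(x)) - 2.0 * math.sqrt(x / math.pi) * math.exp(-x)

def bin_probability(lower, width):
    """Probability that a molecule's energy lies in [lower, lower + width)."""
    return mb_energy_cdf(lower + width) - mb_energy_cdf(lower)

# Shrink a compartment centred on x = 1 and compare P(bin) / width with f(1)
for width in (0.1, 0.01, 0.001):
    p = bin_probability(1.0 - width / 2, width)
    print(width, p, p / width, mb_energy_pdf(1.0))
```

Summing the bin probabilities over enough bins gives 1, so a finite set of compartments is a discretization of the continuous distribution rather than a contradiction of it.
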

Could anyone help me with the above remarks?

Stephen Tashi

If I can't help, at least I can commiserate. The typical expositions of statistical mechanics are at odds with the modern approach to statistics.

What is the most fundamental idea in modern statistics? It is the concept of a "probability space" (or "sample space"). We begin by defining a set of possible outcomes and talk about a probability measure defined on certain subsets of that set.

From that point of view, we don't begin talking about something like "What's the probability it will rain?" without defining the probability space. For example, are we talking about the probability of rain at 8 AM in Denver, Colorado, on a Monday in 2018? If so, do we intend to select a Monday from 2018 by a sampling process that gives each Monday an equal probability of being the one selected?

You've asked good questions, but I suggest that you need to answer a more fundamental question before they can be answered. When a lecturer begins to talk about "the probability the system is in state j", what probability space is he using?

If he doesn't specify a probability space, he is being too vague to apply statistics correctly. For example, do we have a set of jars on a shelf, each containing a system, and do we select a system "at random" by picking one of those jars and measuring its state? Or perhaps we have one system in a jar on a lab table from 8 AM to 10 AM, select a random time between 8 AM and 10 AM from a uniform probability distribution, and measure the state of the system at that time? Or perhaps we pick a system by a process that combines the random selection of a jar with a random time to measure the system's state?

The exposition given in https://courses.physics.ucsd.edu/2017/Spring/physics4e/boltzmann.pdf fails to clearly define a probability space, but it is clearer than the link you gave in two respects.

> we will assume that the energy E of any individual particle is restricted to one or another of the values ##0, \delta e, 2 \delta e, 3 \delta e, \ldots##

So the energy states are all multiples of ##\delta e##, instead of being defined in some arbitrary "uneven" fashion.

> If we now make the reasonable assumption that all microstates occur with the same probability, then the relative probability ##P_j## that macrostate j will occur is proportional to the number of microstates that exist for that state.

There is no information about what probability space is used, or why it is a "reasonable" assumption that all microstates occur with the same probability in that probability space. But at least it leads naturally to combinatorics, because a macrostate is defined as a subset of the microstates and the goal is to compute the probabilities of the macrostates.
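
The construction quoted above can be spelled out for a toy system. The sketch takes as its probability space all microstates of ##N = 4## distinguishable particles sharing ##E = 4\,\delta e## on the levels ##0, \delta e, 2\delta e, \ldots## (both numbers are illustrative assumptions), with every microstate equally likely. A macrostate is then literally a subset of that space, namely all microstates with the same occupation numbers, and its probability is the size of that subset divided by the total; the loop checks that each size equals ##N!/(n_0!\,n_1!\cdots)##.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial, prod

def macrostate_probabilities(n_particles, energy_units):
    """Uniform probability on microstates -> probabilities of macrostates."""
    levels = range(energy_units + 1)
    # Sample space: per-particle energies (in units of delta_e) with fixed total.
    microstates = [s for s in product(levels, repeat=n_particles)
                   if sum(s) == energy_units]
    # Event for each macrostate: all microstates sharing one occupation vector.
    sizes = Counter(tuple(s.count(level) for level in levels) for s in microstates)
    total = len(microstates)
    return total, {occ: Fraction(size, total) for occ, size in sizes.items()}

total, probs = macrostate_probabilities(4, 4)
for occ, p in sorted(probs.items(), key=lambda item: -item[1]):
    # The event size p * total equals N! / (n_0! n_1! ...) from the formula.
    assert p * total == factorial(4) // prod(factorial(k) for k in occ)
    print(occ, p)
```
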

Both of your questions are related to the problem of getting an answer for a continuous physical model by using discrete approximations. I'm not sure the typical exposition of classical statistical physics accomplishes this from a mathematical point of view!

From calculus, we know that the "double" limit ##\lim_{(x,y)\rightarrow (\infty,\infty)} f(x,y)## need not be given by either of the "iterated limits" ##\lim_{x \rightarrow \infty} \left( \lim_{y \rightarrow \infty} f(x,y) \right)## and ##\lim_{y \rightarrow \infty} \left( \lim_{x \rightarrow \infty} f(x,y) \right)##. One might also get different answers by picking different ways of letting ##(x,y)## get large. For example, one might say "Let ##y = 2x## and let ##x \rightarrow \infty##".
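
A standard textbook example (not taken from the linked notes) is ##f(x,y) = x/(x+y)## for positive ##x, y##: for fixed ##x##, ##f \to 0## as ##y \to \infty##, so one iterated limit is 0; for fixed ##y##, ##f \to 1## as ##x \to \infty##, so the other is 1; and along ##y = 2x## the value is identically ##1/3##. Evaluating at large but finite arguments, as below, only illustrates this numerically; it is not a proof.

```python
def f(x, y):
    return x / (x + y)

BIG = 1e12  # stands in for "infinity" in this rough numerical illustration

y_first = f(1.0, BIG)       # inner limit y -> infinity at fixed x: near 0
x_first = f(BIG, 1.0)       # inner limit x -> infinity at fixed y: near 1
on_path = f(BIG, 2 * BIG)   # along y = 2x: exactly 1/3 for every x > 0
print(y_first, x_first, on_path)
```
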

A typical exposition of statistical mechanics wants to find a physical model given by ##\lim_{(N,M) \rightarrow (\infty,\infty)} f(N,M,\ldots)##, where ##N## is the number of particles and ##M## is the number of discrete states (for example, ##M = E/\delta e## as ##\delta e \rightarrow 0##, with ##E## the total energy). Even if the exposition begins by talking about a fixed number of particles ##N##, the use of Stirling's approximation for ##N!## implicitly says ##N## is "large".
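
How "large" ##N## has to be is easy to check numerically. The sketch compares the exact ##\ln N!## (computed with Python's `math.lgamma`, since ##\ln N! = \ln\Gamma(N+1)##) with the leading Stirling form ##N\ln N - N## used in typical derivations, and with the version that keeps the ##\tfrac12\ln(2\pi N)## correction.

```python
import math

def ln_factorial(n):
    return math.lgamma(n + 1)  # ln(n!) up to floating-point rounding

def stirling_leading(n):
    return n * math.log(n) - n

def stirling_corrected(n):
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

for n in (10, 1000, 10**6):
    exact = ln_factorial(n)
    rel_err = abs(exact - stirling_leading(n)) / exact
    print(n, exact, rel_err, abs(exact - stirling_corrected(n)))
```

The relative error of the leading form is about 14% at ##N = 10## but below ##10^{-6}## at ##N = 10^6##, which is why it is harmless for macroscopic ##N## and misleading for small ones.
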

I suspect that the double limit may not actually exist in the mathematical sense, so physics must be used to argue why it is reasonable to use one of the iterated limits. Your question 2 is pertinent to a situation where we let the number of states grow faster than the number of particles, with the consequence that we are most likely to find at most one particle per state. Perhaps someone can give the physical justification for "No, that's not what we're talking about".
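
The "at most one particle per state" effect can be illustrated with a deliberately simplified model, which is an assumption and not part of the derivation: drop the energy constraint and let each of ##N## particles land independently and uniformly in one of ##M## states. The probability that no two particles share a state is then ##\prod_{k=0}^{N-1}(1 - k/M)##, which is close to 1 once ##M## is much larger than ##N^2## and close to 0 when ##M## is comparable to ##N##.

```python
def prob_all_distinct(n_particles, n_states):
    """P(no state holds two or more particles) for uniform independent placement."""
    p = 1.0
    for k in range(n_particles):
        p *= 1.0 - k / n_states  # k states are already occupied
    return p

for m in (100, 10**4, 10**6):
    print(m, prob_all_distinct(100, m))
```
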
