As this is in the math section, forgive my ignorance if I misunderstand the OP's motives for the question, but here is my understanding of this from the perspective of the foundations of physical law.
The second law and the principle of least action are both, in a way, special cases of the same underlying idea: to maximize the a priori probability, whether a probability of a state or a transition probability. The only thing making "entropy" interesting is that it is a LOG measure of probability, meaning that multiplicative constructions become additive. Other than this, I see no magic in "entropy" vs just "a priori probability".
atyy said:
The thinking in physics is that entropy increase has little to do with quantum mechanics. Instead it is due to limitations on what we are able to observe.
...
The intuition is provided
by Villani (slide #79): "Information still present, but not observable (goes away in very fast velocity oscillations)". What he means is that if we can only measure things up to a certain frequency, then although information is not truly lost, for the observer with finite observation resolution, information is irreversibly lost, since the information is carried to higher and higher frequencies.
Reasoning along these lines is IMO fruitful, can be generalized a lot, and can probably play a part in the reconstruction of QM required for progress.
I would like to highlight this:
If you can make the observer side (whether you call it "background" or matter) reactive, then it should be crystal clear how the mechanism of information being lost to an "observer with finite resolution" MUST stabilize the chaos, and probably EXPLAIN the emergence of stable rules, in spite of the "in principle" lack of detailed knowledge. I see this as one KEY to TAMING the madness you get when removing the classical background.
Ultimately, all P-measures, and consequently entropies, are fundamentally attached to an observer. And it is precisely in this that the explanatory power of a reconstruction lies.
So what I would suggest is to forget about the old Shannon entropy, at least in these contexts.
An IMO simpler derivation is to consider the multinomial distribution. Ponder the "probability" of drawing a given finite sequence of dice throws (approximate a discretized frequency distribution from it, and consider this a "probability" p) for a future sequence, based on a prior. Then what pops out of log P(p|p_prior) is not the Shannon entropy but the relative (Kullback-Leibler) entropy, i.e. log P = a * S_KL + b. But if you assume an equiprobable prior, being independent of the outcomes, then that part can be absorbed into the constant term, and instead the Shannon entropy pops out. The coefficient a seems to be the number of samples defining the discretized version of the drawn p.
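For the curious, here is a minimal numerical sketch of that claim (my own toy illustration, not from the thread; the counts and the fair-die prior q are arbitrary choices): the exact multinomial log-probability of an observed frequency vector p matches -N * D_KL(p||q), up to Stirling corrections of order log N.

```python
import numpy as np
from scipy.stats import multinomial
from scipy.special import rel_entr

N = 10_000
q = np.full(6, 1/6)                                    # prior: a fair die
counts = np.array([3000, 2500, 1500, 1000, 1000, 1000])
p = counts / N                                         # discretized "p"

exact = multinomial.logpmf(counts, n=N, p=q)           # exact log P
leading = -N * rel_entr(p, q).sum()                    # a * S_KL with a = N

print(exact, leading)  # differ only by the O(log N) correction terms
```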
Interestingly, if one expands the Kullback-Leibler entropy differentially (to second order in the perturbation), one gets the Fisher information Riemannian metric.
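Again a quick toy check (my own illustration; p and the perturbation dp are arbitrary, with dp summing to zero so p + dp stays normalized): to second order, D_KL(p || p + dp) equals (1/2) * sum(dp_i^2 / p_i), the quadratic form of the Fisher information metric on the probability simplex.

```python
import numpy as np
from scipy.special import rel_entr

p = np.array([0.2, 0.3, 0.5])
dp = np.array([1e-3, -4e-4, -6e-4])   # small perturbation, sums to zero

kl = rel_entr(p, p + dp).sum()        # exact KL divergence
fisher = 0.5 * (dp**2 / p).sum()      # Fisher metric quadratic form

print(kl, fisher)  # agree to leading (second) order in dp
```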
So the principle of least action can be understood as a principle of minimum information gain.
The interesting thing, though, is whether one can couple this to a reconstruction of QM.
I noted an Einstein quote from the Cohen paper atyy posted:
"Usually W equals the number of complexions. In order to compute W [however] one needs a complete (molecular-mechanical) theory of the system. Therefore it is dubious that the Boltzmann principle has any meaning without a complete molecular-mechanical theory or some other theory which describes the elementary [dynamical] processes [of the system]. In equilibrium, the expression S = k log W + c seems [therefore] devoid of [any] content from a phenomenological point of view, without giving in addition such an elementary theory."
Indeed, in the light of what I tried to outline here, this translates into the need for a description of the microstructure of the observer side. This is part of the task of reconstructing QM. This is also why it is instructive to take the explicit example of the multinomial distribution, where one can define states as a sequence of dice draws, binned up as a relative frequency distribution; then the derivation also becomes explicit. It is possibly also the "simplest of simple cases" (the reason why I looked at it), as one can consider toy models for Markov processes with any memory you want.

I think in the reconstruction of QM we need this: an observer may in principle have "infinite memory", but for the same reasons as above, an observer with limited capacity will necessarily lose some information and be required to recode. So, in an extremely speculative outlook, any given observer possibly corresponds to something like a Markov chain whose memory depends both on its choice of internal structure (i.e. coding) AND its physical mass, assuming the mass constrains what is possible. Then one would need to model "interacting Markov chains" whose memory and internal state machines are in constant evolution.
A computer simulation of this would probably amount to having two algorithms "interact" and looking for what they negotiate upon; a toy sketch follows below. This is quite different from an initial value problem with boundary constraints subject to differential equations.
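To make the speculation concrete, here is a minimal sketch of what such a simulation could look like (entirely my own construction under the assumptions above; the symbol alphabet, memory sizes and Laplace-smoothed sampling are arbitrary choices). Two agents with bounded memory windows each observe only the other's output, so anything beyond their capacity is irreversibly lost to that observer, and one can inspect the empirical distributions they settle on, i.e. what the two algorithms "negotiated upon".

```python
import random
from collections import Counter, deque

SYMBOLS = (0, 1, 2)

class Agent:
    def __init__(self, memory):
        self.window = deque(maxlen=memory)   # finite capacity: old data is dropped

    def observe(self, symbol):
        self.window.append(symbol)

    def emit(self):
        counts = Counter(self.window)        # discretized frequency model
        weights = [counts[s] + 1 for s in SYMBOLS]   # +1: Laplace smoothing
        return random.choices(SYMBOLS, weights=weights)[0]

a, b = Agent(memory=50), Agent(memory=500)   # asymmetric capacities
for _ in range(10_000):
    sa, sb = a.emit(), b.emit()
    a.observe(sb)                            # each agent sees only the other
    b.observe(sa)

print(Counter(a.window), Counter(b.window))  # the "negotiated" statistics
```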
If the objective was proofs of certain mathematical theorems, then ignore this.
/Fredrik