ipsky said:
			
		
	
	
		
		
I came through to the maximum entropy hypothesis from probability and information theory, and ergodic hypothesis. Using ergodicity and the concepts of extensive and intensive quantities, I have come to understand some foundations of equilibrium and non-equilibrium mechanics. The concept of ensembles appears to me as vestigial rather than fundamental. And most modern books appear to be ensemble heavy. Am I mistaken?
I had consulted the chapter on kinetic theory of gases from his book (vol. 1), and I found it hard to follow. I could not detect the physical significance of the concepts from the text. Perhaps it is designed for those who take his courses, or follow his videos online. I'll browse through Kardar's videos.
As my previous degrees have been in engineering, I have not had the chance to study physical mathematics in sufficient detail. If you consult my original post carefully, you will see that the book list covers 'physical mathematics'. All except Piskunov's cover at least some, if not all, aspects of statistical physics. I have been using Piskunov to refresh my mathematics skills, and it does contain very useful tools necessary for studying statistical physics. I'm very satisfied with all of these books. But I'm not knowledgeable enough to know whether they are sufficient and relevant enough for doing physics today. Hence the post!
		
		
	 
I don't know what is considered "vestigial" or "fundamental".  Here's how I was taught:
At an elementary level, all the ensembles are just entropy-maximizing distributions over the microstates of the system, each with different constraints.  The reason they all give the same answers in the limit of large system size is explained in any grad stat mech textbook and is a consequence of the central limit theorem (cartoon-level explanation).  Since they give the same answers, you just pick the ensemble that makes the calculation easiest.
Why are ensembles entropy maximizing?
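To make "same entropy, different constraints" concrete, here is a rough sketch of the standard Lagrange-multiplier argument. The notation ($p_i$ for microstate probabilities, $\alpha$ and $\beta$ for the multipliers, $\Omega$ for the number of accessible microstates) is my own choice for illustration, not taken from any particular book:

```latex
% Maximize the entropy over microstate probabilities p_i
S = -k_B \sum_i p_i \ln p_i

% Only normalization (all states on the same energy shell): microcanonical
\sum_i p_i = 1
\quad\Longrightarrow\quad p_i = \frac{1}{\Omega}

% Normalization plus fixed mean energy: canonical
\frac{\partial}{\partial p_i}\Big[ -\sum_j p_j \ln p_j
   - \alpha \Big(\sum_j p_j - 1\Big)
   - \beta \Big(\sum_j p_j E_j - \langle E \rangle\Big) \Big]
   = -\ln p_i - 1 - \alpha - \beta E_i = 0
\quad\Longrightarrow\quad
p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i}

% Why the ensembles agree for large N: energy fluctuations are relatively small
\frac{\sqrt{\langle E^2 \rangle - \langle E \rangle^2}}{\langle E \rangle}
   \sim \frac{1}{\sqrt{N}}
```

Adding a fixed mean particle number as a third constraint gives the grand canonical ensemble in the same way.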
At an undergraduate level, the fact that "stuff" maximizes entropy is assumed, based on some heuristics.  At a graduate level, as in Kardar, the entropy-maximizing distribution is shown to be the limiting distribution of the equilibration process, by deriving the Boltzmann H-theorem from very benign assumptions.  Once you know the distribution of the microstates, you can compute ensemble-averaged values, which you heuristically convince yourself are equal to the time-averaged values (the ergodic hypothesis).  Digging deeper down this path is rarely done in a stat mech textbook because it is highly intricate.  This is the subject of ergodic theory *.
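If it helps to see "time average equals ensemble average" in action, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: a three-level system with made-up energies, and Metropolis dynamics standing in for the real microscopic motion. It is a stochastic chain that is ergodic by construction, not Hamiltonian flow, so it only illustrates the statement and proves nothing about real systems.

```python
import math
import random


def ensemble_average(energies, beta):
    """Mean energy computed directly from the Boltzmann distribution."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # partition function
    return sum(e * w for e, w in zip(energies, weights)) / z


def time_average(energies, beta, n_steps, seed=0):
    """Mean energy along one long Metropolis trajectory (toy dynamics)."""
    rng = random.Random(seed)
    state = 0
    total = 0.0
    for _ in range(n_steps):
        # Symmetric proposal: pick any level uniformly at random.
        proposal = rng.randrange(len(energies))
        delta = energies[proposal] - energies[state]
        # Metropolis rule: always accept downhill, uphill with prob e^{-beta*delta}.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            state = proposal
        total += energies[state]
    return total / n_steps


energies = [0.0, 1.0, 2.0]  # hypothetical energy levels
beta = 1.0                  # inverse temperature, arbitrary units

ens = ensemble_average(energies, beta)
tim = time_average(energies, beta, n_steps=200_000)
print(f"ensemble average: {ens:.4f}")
print(f"time average:     {tim:.4f}")
```

The two printed numbers should agree to within sampling noise, and the agreement improves as `n_steps` grows.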
Difficulty of kinetic theory (Kardar ch. 3):
Yes! It's very difficult :).  I empathize.  It has to be, because you are literally deriving a fundamental result of stat mech (the Boltzmann H-theorem) from the microscopic laws of motion (Hamiltonian flow).  Kardar will walk you from F=ma (Poisson bracket) -> BBGKY -> Boltzmann equation -> Boltzmann H-theorem in just a few pages, so strap yourself in.  You can of course skip that chapter, just assume entropy is maximized as a starting point, and go straight to ch. 4, which is perfectly fine btw.  However, it seems you are not satisfied with a heuristic understanding of why things maximize entropy in the first place.  If so, you've got to do the hard work!  No free lunch.
Note, btw, that he has posted all his lecture notes on the MIT OCW website.  You'll notice that he covers about 4-5 pages of his book in 1.5 hours of lecture: the book is dense.
A slightly more inspired derivation of the Boltzmann equation is found in Landau vol. 10, sections 1-5 (first 10 pages or so).  Whether you like Landau's or Kardar's better is a matter of taste.
The benefit, of course, is that ch. 3 is the gateway to other non-equilibrium topics (plasma physics, transport phenomena, hydrodynamics).
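For orientation, here is a schematic of that chain. It is only a sketch in my own shorthand (collision terms are not written out, and $f_s$, $\mathcal{H}_s$, $U$ are the usual $s$-particle density, $s$-particle Hamiltonian and external potential); see the book for the precise statements and assumptions:

```latex
% 1. Liouville's equation for the full N-particle phase-space density
\frac{\partial \rho}{\partial t} = -\{\rho, \mathcal{H}\}

% 2. BBGKY hierarchy: integrate out all but s particles; the equation for f_s
%    is not closed, because it couples to f_{s+1}
\frac{\partial f_s}{\partial t} + \{f_s, \mathcal{H}_s\}
   = \big(\text{interaction term involving } f_{s+1}\big)

% 3. Boltzmann equation: dilute gas, close the hierarchy with
%    f_2 \approx f_1 f_1 before collisions (molecular chaos)
\left[ \frac{\partial}{\partial t}
   + \frac{\vec{p}}{m} \cdot \frac{\partial}{\partial \vec{q}}
   - \frac{\partial U}{\partial \vec{q}} \cdot \frac{\partial}{\partial \vec{p}}
\right] f_1
   = \left( \frac{\partial f_1}{\partial t} \right)_{\text{coll}}

% 4. H-theorem: the functional H never increases under the Boltzmann equation
\mathrm{H}(t) = \int d^3q \, d^3p \; f_1 \ln f_1 ,
\qquad \frac{d\mathrm{H}}{dt} \le 0
```

Since the entropy is $-k_B \mathrm{H}$ up to constants, step 4 is the statement that entropy grows until the gas equilibrates.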
Math needed:
The prerequisites for studying a grad stat mech textbook, or anything in the Landau series, are solid multivariable calculus, differential equations, and linear algebra.  Depending on the type of engineering, there's significant overlap with the engineering curriculum.  It seems you are already armed with a slew of those books.
Knowledge of the Lagrangian and Hamiltonian formulations of mechanics and some facility with quantum mechanics are also assumed.
I just wanted to clarify "why" stat mech is taught in a certain way.  There's a pretty coherent logic to it.
* Symmetry breaking is an example of ergodicity breaking, but that's more advanced than Kardar vol. 1, and you'll have to master vol. 1 first to appreciate it.  Most of the time, ergodicity just "makes sense" intuitively.