Why do they introduce the partition function? I have seen it in the derivation of the Boltzmann distribution, but I don't know its physical significance here. And how do they get to (L.11) after that? I follow everything up to and including L.7.

The rest of the proof is here just in case you are interested:

# Usage of partition function in derivation of Sackur-Tetrode

