Check out the Wikipedia article at
http://en.wikipedia.org/wiki/Maxwell–Boltzmann_statistics
Basically, you have the number of particles N and a bunch of energy levels with energy, say 0,1,2,3... for example. Then you have the total energy E. Now you want to know how many ways you can fill those energy levels with N particles to get that total energy. Suppose you have 3 particles and the total energy is 4. There are 4 ways to do this:
0 1 2 3 4 = energy of level
---------
2 0 0 0 1 = number of particles in each level
1 1 0 1 0 "
1 0 2 0 0 "
0 2 1 0 0 "
4 3 3 1 1 = 4 times the average number in each level
You can see the distribution is high at level 0, dropping off for higher energy levels. Boltzmann (and Gibbs) carried out this analysis for N particles with total energy E, and figured out that if you have N_i particles in energy level i with energy \epsilon_i and total energy E, the number of ways (W) this could be done is:
W=\prod_i \frac{1}{N_i!}
for large N. Now we want to find the N_i such that the sum of all the N_i equals N and the sum of all the \epsilon_i N_i equals E. In the table above, it was done for N=3 and E=4, now we want to do it for the general case. Since N is large, we can use Stirlings approximation for the factorial x!\approx x^xe^{-x}. Its better to work in the log of W, so we can do sums instead of products.
\ln(W)=\sum_i N_i-N_i\ln N_i
Now we want to find the N_i where W is maximum. It turns out that that maximum is a HUGE maximum. The set of N_i that gives the largest number of ways gives a number of ways that is MUCH larger than any other configuration. The way we find this maximum is to form a function:
f=\sum_i N_i-N_i\ln N_i +(N-\alpha \sum_iN_i) +(E-\sum_i\beta \epsilon_i N_i)
You can see that if you find the maximum of this function, it will give you the W you are looking for, that also gives you the right total N and E. So now take the derivative and set to zero:
\frac{d \ln(W)}{dN_i}=0=\sum_i -\ln N_i-\alpha-\beta \epsilon_i
or
N_i=e^{-\alpha-\beta\epsilon_i}
and that's Boltzmann's equation. Now you solve for \alpha using the fact that \sum_i N_i=N to get
N_i=N e^{-\beta\epsilon_i}/Z(\beta)
and you can solve for \beta knowing E. PLEASE NOTE - there is a lot more things that go into the derivation. The above derivation leaves a lot out, but if you can follow the above derivation, then you will be very ready for the real derivation.
As far a showing that \beta=1/kT, you have to use Boltzmann's famous equation for entropy S=k\ln(W). Using the N_i that you found above and substituting it into the expression for \ln W you get
S/k=N\alpha+\beta E
differentiating and rearranging, you get
dE=\frac{1}{k\beta}\,dS-\frac{\alpha}{\beta}\,dN
which is just the fundamental equation of thermodynamics at constant volume:
dE=T\,dS+\mu dN
which shows that T=1/k\beta and \mu=-\alpha/\beta is the chemical potential.