Ok... then let's go back in time... why is the entropy increasing? Shouldn't the entropy be 0? I mean, since all there is, is energy, what keeps it moving?
The fine-grained entropy is indeed equal to zero and will always remain zero. In my opinion, when we consider the universe as a whole, it doesn't make much sense to talk about the usual definition of entropy, which is a coarse-grained entropy. The fact that this entropy does not decrease is basically a triviality.
The laws of physics are (as far as we know) such that information is exactly conserved as a function of time. Now, in practice, when we want to describe a system containing a huge number of particles, we are only interested in a few variables, such as the pressure, the energy content, etc. These few variables cannot, of course, uniquely define the exact physical state of the system. So, given these variables, there will be a huge number of states the system could be in. The entropy is the logarithm of this number.
Now, where does this "fine-grained" thing come in? A system in a finite volume can only be in certain energy levels. If you could specify the energy of a system to sufficient accuracy, then you could actually define the exact state the system is in. Given the energy specification, there would then be only one state the system could be in (assuming that the energy levels are not degenerate), and the entropy would thus be zero.
The coarse-grained definition of the entropy is as follows. You fix some small energy range \delta E as your energy uncertainty; this range is supposed to be small on a macroscopic scale. You then count how many microstates have an energy in that range, and the entropy is the logarithm of that number. For a system containing many particles, the choice of \delta E changes the entropy only by a negligible relative amount, so it can be ignored.
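To see this insensitivity to \delta E concretely, here is a minimal Python sketch (my own illustration, not part of the original argument) using the textbook Einstein-solid model, in which the number of microstates of n oscillators sharing q energy quanta is the binomial coefficient C(q + n - 1, n - 1). It counts the microstates in energy windows of different widths and prints the resulting entropy:

```python
import math

def einstein_solid_states(n_osc, quanta):
    """Microstates of n_osc harmonic oscillators sharing 'quanta'
    indivisible energy units: C(quanta + n_osc - 1, n_osc - 1)."""
    return math.comb(quanta + n_osc - 1, n_osc - 1)

n_osc = 100
q0 = 1000                  # energy (in units of hbar*omega) around which we coarse-grain
for dq in (1, 10, 100):    # width of the energy window delta E, in quanta
    omega = sum(einstein_solid_states(n_osc, q) for q in range(q0, q0 + dq))
    print(f"delta E = {dq:3d} quanta:  S = ln(Omega) = {math.log(omega):.2f}")
```

The count Omega itself grows by orders of magnitude as the window widens, but ln(Omega) changes only at the percent level here, and for genuinely macroscopic particle numbers the relative change is far smaller still; that is why \delta E drops out.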
You can also look at this from an information-theoretic perspective. You can define the entropy of a system as the number of bytes you would need to fully specify its state, given the information you already have about it. The distinction between fine-grained and coarse-grained entropy is then easy to understand. If you were to specify the energy of a system so precisely that the energy alone fixes the state, then all of the bytes needed to specify the system are already contained in the energy specification, and no extra bytes would be needed.
Suppose, on the other hand, that the energy specification is of finite accuracy and contains only a few bytes of information. Since the number of bytes needed to specify the system exactly is huge, we can ignore the few bytes already contained in the energy specification.
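As a rough illustration (with numbers of my own choosing, not from the argument above), compare the information needed to pin down the exact microstate of a large collection of dice with the few bytes contained in a specification of the total, which plays the role of the energy here:

```python
import math

n_dice = 10**6                        # a macroscopically large number of dice
bits_per_die = math.log2(6)           # ~2.585 bits to specify one die's face
microstate_bytes = n_dice * bits_per_die / 8
energy_spec_bytes = 8                 # say the total is stored as an 8-byte integer
# Knowing the total narrows things down by only ~log2(5e6) ~ 22 bits (~3 bytes),
# since the sum can take about 5 million different values -- utterly negligible.
print(f"full microstate:       ~{microstate_bytes:,.0f} bytes")
print(f"energy specification:   {energy_spec_bytes} bytes")
```

The microstate takes hundreds of kilobytes to specify, against a handful of bytes for the macroscopic specification, which is the disparity the argument relies on.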
Entropy increases because all the microstates of a physical system are intrinsically equally likely, just as all the outcomes from 1 to 6 are equally likely when you throw a die. Suppose you have N dice and you measure the macroscopic state by adding up all the numbers the dice are showing. If the initial value is N, then every die is showing a 1. That macrostate is compatible with only a single microstate, so the entropy is zero.
But if we now throw all the dice, the macrostate will change: the sum will end up very close to 3.5 N, because that is the macrostate compatible with the largest number of microstates, all of which are equally likely. The entropy has thus increased; it is now the logarithm of the number of microstates compatible with a sum of about 3.5 N. There is nothing mysterious about why this increase happened; the only real question is why the initial state had such a low entropy to begin with.
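A short Python sketch (again my own illustration, with hypothetical parameters) makes this quantitative: it counts, by dynamic programming, how many microstates of N six-sided dice are compatible with a given sum and prints the entropy S = ln(Omega):

```python
import math

def microstate_count(n_dice, total):
    """Number of ways n_dice six-sided dice can show faces summing to 'total',
    built up by adding one die at a time (dynamic programming)."""
    counts = {0: 1}                        # zero dice: one way to reach sum 0
    for _ in range(n_dice):
        new_counts = {}
        for s, c in counts.items():
            for face in range(1, 7):
                new_counts[s + face] = new_counts.get(s + face, 0) + c
        counts = new_counts
    return counts.get(total, 0)

n = 10
for total in (n, round(3.5 * n)):          # all ones vs. the most likely sum
    omega = microstate_count(n, total)
    print(f"sum = {total:2d}:  Omega = {omega:9,d},  "
          f"S = ln(Omega) = {math.log(omega):.3f}")
```

For 10 dice, the all-ones macrostate (sum 10) is compatible with exactly one microstate, so S = 0, while the most probable sum 35 is compatible with about 4.4 million microstates; a random throw therefore overwhelmingly lands near the high-entropy macrostate.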
If the entropy of the universe were maximal, then there could be no life in the universe, so given that we exist, the entropy cannot be maximal. I therefore don't think that the low initial entropy of the universe is such a strange fact. There are perhaps other issues, like why we experience an arrow of time that points in the direction in which the entropy increases. This has to do with the fact that a computer (and, for this purpose, a brain is a computer) in our universe can only be run in the direction of increasing entropy.