Information Theory - Shannon's Self-Information units

I'm familiar with information and coding theory, and I know that the unit of Shannon information content, -log_2(P(A)), is the "bit", where a "bit" is a "binary digit", or a "storage device that has two stable states".

But can someone rigorously prove that the units are actually "bits"? Or should we simply accept this as a definition and then justify it with coding examples?

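No, the unit is not something to be proved: it is a convention fixed by the choice of logarithm base, and you can use whatever base you like. With base 2 the unit is the "bit", and with base e it is the "nat". Changing the base only rescales the value by a constant factor, since log_b(x) = ln(x) / ln(b).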
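
To make the base dependence concrete, here is a minimal Python sketch. The function name `self_information` and the example probability 0.25 are my own choices for illustration, not anything standard. It computes the same quantity in bits and in nats and shows that the two differ only by the factor ln(2).

```python
import math

def self_information(p, base=2.0):
    """Return -log_base(p) for a probability p with 0 < p <= 1.

    Hypothetical helper for illustration: base=2 gives bits,
    base=math.e gives nats.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # Change of base: log_base(p) = ln(p) / ln(base)
    return -math.log(p) / math.log(base)

p = 0.25  # assumed example probability
bits = self_information(p, base=2.0)
nats = self_information(p, base=math.e)

print("bits:", bits)
print("nats:", nats)
# Dividing the value in nats by ln(2) recovers the value in bits.
print("nats / ln(2):", nats / math.log(2))
```

So the number attached to an event changes with the base, but the ordering of events and every ratio between them stay the same, which is why the unit is a matter of definition rather than proof.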

