Einstein's Cat said:
I'm assuming you mean information in a technical sense - because that's a little different from the word's everyday usage.
Basically, information (in its technical sense) is a way of quantifying uncertainty. If we have a fair coin then we are equally likely to obtain a head or a tail on any given toss (which is what defines a fair coin). We have maximum uncertainty about the outcome. Another way of stating this is that we have maximum 'surprise' at the outcome.
If we have a biased coin that always lands on heads then we are never uncertain about the outcome and there is no 'surprise' at the result of a given coin toss.
Another way of thinking about this is that if we have one of these biased coins - which always lands on heads - then we have absolutely nothing to learn from doing a coin toss. We get no new 'information' from doing the experiment.
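If you want to put numbers on that 'surprise', here's a tiny Python sketch (my own illustration, using the standard surprisal -log2(p), measured in bits):

```python
import math

def surprisal(p):
    """Surprisal (self-information) of an outcome that has probability p, in bits."""
    return -math.log2(p)

# Fair coin: each outcome has probability 0.5, so every toss carries 1 bit of surprise.
print(surprisal(0.5))  # 1.0

# Biased coin that always lands heads: probability 1, so there is nothing to learn.
print(surprisal(1.0))  # -0.0, i.e. zero bits of surprise
```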
In terms of a communication channel, suppose we have Alice at one end and Bob at the other. If Alice always inputs a 1 into the channel then there is no conceivable way she can communicate any information to Bob - in order to convey information Alice has to have different possible inputs. But that's still not enough. Suppose Alice always inputs the alternating sequence 10101010... Then, again, no information whatsoever can be conveyed, since the outcome is completely predictable. In order to convey information Alice must make changes to her inputs that are not predictable (to Bob) and, furthermore, there has to be some correlation between the changes Alice makes and Bob's measurements (if there is no correlation between what goes in and what comes out then, again, no information can be conveyed on the channel).
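The standard way of quantifying that correlation is the mutual information between Alice's input and Bob's output. As a rough sketch (my own toy joint distributions, not anything from a real channel):

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in bits, from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]         # marginal for Alice's input x
    py = [sum(col) for col in zip(*joint)]   # marginal for Bob's output y
    info = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                info += pxy * math.log2(pxy / (px[x] * py[y]))
    return info

# Bob's reading always matches Alice's (unpredictable) input: 1 bit per channel use.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0

# Bob's output is completely uncorrelated with Alice's input: nothing gets through.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0

# Alice always inputs 1, so there is only one possible input: again nothing gets through.
print(mutual_information([[0.0, 0.0], [0.5, 0.5]]))      # 0.0
```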
Information in its technical sense is just a way of quantifying all of this. If we assume this 'information' measure is positive and a continuous function of the probability, and we assume that for two independent events (like two independent coin tosses) the uncertainty is simply the sum of the two individual uncertainties, then this is enough to determine that we need a logarithmic measure of uncertainty. The average information (or uncertainty) over many trials is then an entropy.
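Concretely (my notation, taking logs to base 2 so the unit is the bit), that entropy is H = -Σ p_i log2(p_i), and the additivity for independent events comes from the log turning products of probabilities into sums. A quick sketch:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p*log2(p), in bits: the average surprisal."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin_a = [0.5, 0.5]   # fair coin
coin_b = [0.9, 0.1]   # biased coin

# Two independent tosses: the joint probabilities multiply...
joint = [pa * pb for pa in coin_a for pb in coin_b]

# ...and because the log turns products into sums, the uncertainties simply add.
print(entropy(coin_a) + entropy(coin_b))  # ~1.469 bits
print(entropy(joint))                     # the same value (up to floating point)
```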
What we're most interested in, in communication terms, is the change of uncertainty - if our uncertainty before an experiment is the same as our uncertainty after it, then we've learned nothing and no information has been gained. So information, in its technical sense, is a direct measure of the change of uncertainty.
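As a last toy sketch of that (my own made-up numbers): information gained = uncertainty before the experiment minus uncertainty after it.

```python
import math

def entropy(probs):
    """Average surprisal, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Before the experiment: four equally likely possibilities, 2 bits of uncertainty.
before = [0.25, 0.25, 0.25, 0.25]

# The result rules out two of them: 1 bit of uncertainty remains.
after = [0.5, 0.5, 0.0, 0.0]

# Information gained = reduction in uncertainty.
print(entropy(before) - entropy(after))   # 1.0 bit

# If the experiment tells us nothing, the uncertainty is unchanged and we gain nothing.
print(entropy(before) - entropy(before))  # 0.0 bits
```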