# Homework Help: Information theory questions

1. Nov 20, 2007

### T-7

Hi,

Some Qs on information -- something rather new to me. I'm not quite sure what to do with these:

1. The problem statement, all variables and given/known data

a) Lack of information is defined as

$$S_{info} = - \sum_{i} p_{i}log_{2}p_{i}$$

($$p_{i}$$ is the probability of the outcome i). Calculate the entropy associated with the reception of N bits.

b) The human brain has approx. $$10^{10}$$ neurons each connected with 1000 neighbours. If one assumes that memory consists in activiting or deactivating each bond, what is the maximum amount of information which may be stored in the brain?

3. The attempt at a solution

a) Well, a bit is either on or off (1 or 0), so I suppose that each bit received has a probability of 0.5 attached to it (there are two states, and there is no mathematical reason to think one more probable than the other). So I would say all the $$p_{i}$$ are 0.5. If there are N bits, then that's N 'outcomes', each with probability 0.5.

So does this simply boil down to

$$S_{info} = - \sum_{1}^{N} 0.5 \times log_{2}(0.5) = N \times 0.5$$ (?)

b) I guess it's a bit to a bond -- that is, each activation/deactivation corresponds to a bit. But are the 'neighbours' it refers to, do you think, supposed to be in addition to the $$10^{10}$$ collection? In which case, is it merely $$10^{10} \times 1000$$ bits of information? Too easy, I suspect... Hmm.

A little guidance would be appreciated!

Cheers.

Last edited: Nov 20, 2007
2. Nov 20, 2007

### marcusl

The sum is not over bits, but over every string of 1000 bits that gives the same result.
You chose to start with equal numbers of 1's and 0's. Let's call this case A. You'll find out that case A is the most likely of all cases, and the one that has maximum entropy. [BTW, in the language of statistical mechanics a particular string of 1000 bits is a "microstate", and the case of so many 1's and 0's is a "macrostate." Each macrostate can have a huge number of corresponding microstates.]

To calculate the information entropy of A, call it S_A, you need to figure out the number n of different strings that have 500 1's and 500 0's, out of the total number of possible strings N. This is a problem in combinatorics, properly considering permutations. With these numbers in hand, the probabilty of the ith single such string is p_i=n/N. Lacking information to the contrary, we can assume that all p_i for case A have the same value. The sum over the n (identical) p_i's is now simple, and you have the entropy.

Realize there is a value of entropy for every other case S_B, S_C, etc., where these cases consist of all possible outcomes of 1000 bits (1000 1's; 999 1's;.... one 1; zero 1's). You can reason from the binomial distribution that case A is the most likely, so the entropy of all other cases will be smaller. You could calculate one of them for comparison. (Take an extreme case like all 1's, for instance).

Last edited: Nov 20, 2007
3. Nov 21, 2007

### T-7

Unfortunately I have done very little stats. My answer to (a) is evidently wrong. I wonder if someone could exhibit at least part of the correct solution (or perhaps do it for a specific case, say, of 4 bits of information), and then perhaps it will be clearer in my mind, and I'll be able to finish it off (or do it for the general case).

Cheers.

4. Nov 21, 2007

### marcusl

Ok, first things first. Did you understand what the equation in your post means and how it's used? Does your class have a text?

5. Nov 21, 2007

### T-7

No. It has just appeared in a problem sheet. We don't have a text for this part of the course, nor did this formula feature in the lectures.

6. Nov 21, 2007

### marcusl

That's pretty irresponsible of your professor! I don't have time to address this now (I'm at work, and later I'll be preparing for tomorrow's Thanksgiving party). I can steer you to this article
http://en.wikipedia.org/wiki/Information_entropy" [Broken]
Following Eq. (5) is a short discussion of a coin toss and a curve showing maximum entropy for a fair coin. (This curve is for a single toss).

I can get back to you on Friday. Can anyone else help in the meantime?

Last edited by a moderator: May 3, 2017
7. Nov 22, 2007

### T-7

Enjoy the thanksgiving party.

I shall re-launch part of the question in a new thread, along with my rather rushed musings on it so far...

Last edited by a moderator: May 3, 2017
8. Nov 23, 2007

### marcusl

Hi, I'm back a little more relaxed and weighing a little more Hope everyone else (in the US, anyway) had a great holiday.

Here's how to work the problem for N = 4 random bits. For sake of completeness, we note that the number of different strings of 4 bits is M = 2^N = 16.

Case A) Equal probability of 0's and 1's, so number of 0's is k = 0.5 * N = 2. The number of ways m to sample k=2 objects from N=4, where the order doesn't matter, is 6. You can see this by writing down all strings of 2 0s's and 2 1's. When you get to larger numbers, use combinatorics (see, e.g.,
http://en.wikipedia.org/wiki/Combinatorics" [Broken]).
EDIT: Stirling's approximation helps with the big factorials. (See
http://en.wikipedia.org/wiki/Stirling_approximation" [Broken].)

The number of ways to choose k objects from N is
$$m = {N\choose k} = \frac{N!}{k!(N - k)!}$$

In other words, there are 6 microstates that belong to the macrostate two 0's and two 1's.

Let's turn now to the entropy,
$$S = - \sum_{i=1}^{m} p_{i}log_{2}p_{i} .$$
For random bits (or coin tosses or dice throws), the probability of each microstate making up a macrostate is identical and equal to 1/m. The entropy simplifies to
$$S = log_{2} m$$.
For case A,
$$S_{A} = log_{2} 6$$.
This case has the maximal information entropy for N=4 random bits.
Compare to the extreme case B of zero 0's. The number of ways to take N objects from N is m=1, so
$$S_{B} = log_{2} 1 = 0$$.
This (and the case of all 0's) have minimal entropy.

In words, case A makes the least assumptions about the bits (equal probability of 0 or 1), which is justified if we have no information to the contrary. This is the case which conveys the maximal amount of information. If that seems confusing, consider case B where every message consists of all 1's. All messages are the same, and we knew what they'd be in advance, so we receive no information at all!

Hope this helps, and gives you a start towards understanding Shannon's information entropy.

Last edited by a moderator: May 3, 2017
9. Nov 25, 2007

### T-7

Splendid. Thanks very much for your help.