## definition and interpretation of entropy

It is often said that entropy is the average number of yes-no
questions you have to ask to obtain the information "which event"
occurred.

In Honerkamp's "Statistical Physics: An Advanced Approach with
Applications" (Section 2.4.1) I found an explanation of the above dictum,
but I don't understand it. I first cite Honerkamp, then say what I don't
understand, and pose some further questions below.

Perhaps someone can help me!

Honerkamp writes:

"Let {A_1,...,A_N} be a complete, disjoint set of events, i.e. A_1
\cup ... \cup A_N = \Omega.

Furthermore, let P be a probability defined for those events. We then
define the entropy as S = -k \sum_{i=1}^{N} P(A_i) ln(P(A_i)). Here k
represents a factor which we set to 1 for the moment. In the framework of
statistical mechanics k will be Boltzmann's constant k_B."
...
"If an event has occurred, then, as we will show in a moment, -log_2
P(A_j) is a good measure of the number of questions to be asked in
order to find out that it is just A_j which is realized."
...

and now the section I don't understand:

"To show that -log_2 P(A_j) is just equal to the number of required
yes-or-no questions, we first divide \Omega into two disjoint domains
\Omega_1 and \Omega_2 such that \sum_{A_i \in \Omega_1} P(A_i) =
\sum_{A_i \in \Omega_2} P(A_i) = 1/2."

I think this is in general not possible. Consider just the trivial
case P(A_1) = 1 and P(A_i) = 0 for i != 1.

Honerkamp proceeds:

"The first question is now: Is A_j in \Omega_1? Having the answer to
this question we next consider the set containing A_j and multiply the
probabilities for this set by a factor of 2. The sum of probabilities
for this set is now again equal to 1, and we are in the same position
as before with the set \Omega: We divide it again and ask the
corresponding yes-or-no question. This procedure ends after k steps,
where k is the smallest integer such that 2^k P(A_j) becomes equal to
or larger than 1. Consequently, -log_2 P(A_j) is a good measure of the
number of yes-or-no questions needed."

Is there a (simple) way to modify this algorithm to get a satisfactory
interpretation of the concept of entropy? Let me make this question
more concrete:

Assume you know the probabilities P_1,...,P_N, you want to find out
which event (A_1 or ... or A_N) occurred, and you have an algorithm aa
to do so. Then, if you apply your algorithm and the result turns out
to be A_j, you have to ask, say, f_aa(A_j) questions.

Since I have statistical mechanics in mind, let N be a natural number
or infinity, and in the case N = infinity let us assume that the
following series converges (do you agree that this can be assumed?).

In that case, the average number of questions you have to ask is
C_aa((P_k)_{k\in\{1...N\}}) := \sum_j P(A_j) f_aa(A_j).

Then the question is: Is there a modification mm of the algorithm
which Honerkamp described (and which doesn't work, in my opinion) that
leads to f_mm = -log_2? If not, does an algorithm ll exist at all
which leads to f_ll = -log_2? If not, how would you motivate the
definition of entropy? If yes, is it true that there is no other
algorithm aa such that C_aa((P_k)_{k\in\{1...N\}}) < C_ll((P_k)_{k\in
\{1...N\}}) for some (P_k)_{k\in\{1...N\}} (with P_k \in [0,1])? If
yes, can you give me a mathematically rigorous proof? If not, can you
give me an algorithm cc which serves as a counterexample? In the latter
case: why is entropy then not defined as \sum_j P(A_j) f_cc(A_j)?
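A concrete algorithm to compare against here is Huffman coding: it is known to minimize the average number of yes/no questions among all such schemes, and its average C satisfies H <= C < H + 1, with equality C = H exactly when all probabilities are powers of 1/2 (Shannon's noiseless coding theorem; see e.g. Ash). A minimal sketch, with function names of my own choosing:

```python
import heapq
import math

def huffman_lengths(probs):
    """Codeword lengths of an optimal binary prefix code (Huffman's algorithm).
    Length l_j is the number of yes/no questions asked when event j occurs."""
    # heap items: (total probability, tie-breaking counter, symbol indices)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # merged symbols sit one level deeper in the tree
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
lengths = huffman_lengths(probs)                    # here: [1, 2, 3, 3]
avg = sum(p * l for p, l in zip(probs, lengths))    # average question count C
H = -sum(p * math.log2(p) for p in probs)           # Shannon entropy
# noiseless coding theorem: H <= avg < H + 1
```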

Are there other ways to motivate that -log_2 P(A_j) is a good measure
of the "increment of knowledge" gained by receiving the information
that A_j is realized?

Surely -log_2 P(A_j) >= 0, and -log_2 P(A_j) = 0 iff P(A_j) = 1, which
are reasonable properties of such a measure, but that's not enough to
motivate the formula -log_2 P(A_j)...
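One standard further motivation, for what it's worth, is additivity: for independent events A and B the "increments of knowledge" should add, and together with continuity and monotonicity in P this functional equation already forces the logarithm:

```latex
% information of two independent events should add:
f\bigl(P(A)\,P(B)\bigr) = f\bigl(P(A)\bigr) + f\bigl(P(B)\bigr),
% with f continuous and decreasing on (0,1]; the only solutions are
f(P) = -c \log_2 P, \qquad c > 0,
% and the choice c = 1 merely fixes the unit of information (the bit).
```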

References to books or papers where (parts of) my questions were
answered would be nice.

Thanks,

Chris M


## definition and interpretation of entropy

On Jun 24, 8:09am, chris.meyer...@googlemail.com wrote:
> It is often said, that entropy is the average number of yes-no
> questions you have to ask to obtain the information "which event"
> occurred.
>
> In Honerkamp "Statistical Physics An Advanced Approach with
> Applications" (2.4.1) I found an explanation of the above dictum, but
> I don't understand it. I first cite Honerkamp, tell you what I don't
> understand and pose some further questions below.
>

The proof you describe does sound rather handwaving. One good book,
which from memory discusses this interpretation (and others) of
entropy, is 'Information Theory' by Robert Ash; there is a recent
Dover edition.

Entropy is only an 'average' concept, and one has to consider a number
of events to pull out its meaning. You can check Ash or other
textbooks for complete rigour, but here is the basic argument.

1. For a random event generator that generates outcome k with
probability p_k, consider a sequence of N outcomes (k_1, k_2, ..., k_N)
from the generator (e.g., a die is thrown N times). It is assumed that
each outcome is generated independently of the others. The probability
of generating a given sequence is then given by
P = (p_1)^(n_1) (p_2)^(n_2) ... ,
where n_k is the number of times that outcome k appears in the
sequence.

2. Now, as N is increased, we expect that any particular outcome k
will, typically, appear in the sequence a total of approximately
n_k = N p_k
times (rounded to the nearest integer). Such a sequence is called a
'typical sequence'. It follows that the probability of generating a
given typical sequence is given by, approximately,
P_typ = (p_1)^(Np_1) (p_2)^(Np_2) ... = 2^{-NH} ,
where H is the Shannon entropy function
H = - sum_k p_k log_2 p_k .

3. It can be shown rigorously that, for N sufficiently large, the
probability of obtaining a 'typical' sequence of outcomes becomes as
close to 1 as one desires. Hence, any non-typical sequences can be
ignored FAPP (for all practical purposes). You will have to consult
Ash or another textbook for this 'law of large numbers'.

4. Since each typical sequence has the same probability, P_typ, of
occurring, and the sum of these probabilities is very close to unity
for large N, the number of typical sequences of length N must be
approximately given by
N_typ = 1 / P_typ = 2^{NH} .

5. We are now almost finished. To determine which actual typical
sequence has occurred in a given run of N outcomes, how many yes/no
questions must I ask? But this is the same problem as guessing a number
from 1 to N_typ. I number the possible typical sequences from 1 to
N_typ, and ask 'is the sequence of outcomes in the first half or in
the second half?' And so on. This will take at most log_2 N_typ
questions, i.e., the number of questions required is
N_q = log_2 N_typ = NH.

6. Finally, the average number of questions required per outcome is
then
N_q / N = H.
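Steps 2 and 4 above are easy to check numerically. The following sketch (my own illustration, not from Ash) draws a long sequence from a three-outcome generator and compares -(1/N) log_2 P(sequence) with H, then counts the sequences of a biased coin that have exactly Np heads:

```python
import math
import random

random.seed(1)

# step 1: a generator with three outcomes
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
symbols, ps = zip(*probs.items())
H = -sum(p * math.log2(p) for p in ps)  # Shannon entropy, here 1.5 bits

# step 2: a long generated sequence is almost surely typical, i.e. its
# probability P satisfies -(1/N) log2 P ~ H
N = 100_000
seq = random.choices(symbols, weights=ps, k=N)
rate = -sum(math.log2(probs[s]) for s in seq) / N  # close to H

# step 4: for a biased coin with P(heads) = p, the number of length-n
# sequences with exactly n*p heads is C(n, n*p), and log2 of that count
# is approximately n * H(p)
def H2(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.3
n_typ = math.comb(n, int(n * p))
ratio = math.log2(n_typ) / (n * H2(p))  # close to 1
```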

chris.meyer123@googlemail.com schrieb:

> It is often said, that entropy is the average number of yes-no
> questions you have to ask to obtain the information "which event"
> occurred.
>
> References to books or papers where (parts of) my questions were
> answered would be nice.

You may find the discussion in Appendix A of
http://lanl.arxiv.org/abs/0705.3790 illuminating.

Arnold Neumaier
