| Thread Closed |
definition and interpretation of entropy |
Share Thread |
| Jun24-08, 05:00 AM | #1 |
|
|
definition and interpretation of entropy
It is often said, that entropy is the average number of yes-no
questions you have to ask to obtain the information "which event" occurred. In Honerkamp "Statistical Physics An Advanced Approach with Applications" (2.4.1) I found an explanation of the above dictum, but I don't understand it. I first cite Honerkamp, tell you what I don't understand and pose some further questions below. Perhaps someone can help me! Honerkamp writes: "Let {A_1,...,A_N} be a complete, disjoint set of events, i.e A_1 u ....u A_N = \Omega. Furthermore, let P be a probability defined for those events. We then define the entropy as S = -k \sum_{i=1}^{N} P(A_i) ln(P(A_i)). Here k represents a factor which we set 1 for the moment. In the framework of statistical mechanics k will be Boltzmann's constant k_B." ... "If an event has occurred, then, as we will show in a moment, -log_2 P(A_j) is a good measure of the number of questions to be asked in order to find out that it is just A_j which is realized." ... and now the section I don't understand: "To show that -log_2 P(A_j) is just equal to the number of required yes-or-no questions, we first divide \Omega into two disjoint domains \Omega_1 and \Omega_2 such that \sum_{A_i \in \Omega_1} P(A_i) = \sum_{A_i \in \Omega_2} P(A_i) = 1/2." I think this is in general not possible. Consider just the trivial case P(A_1) = 1 and P(A_i) = 0 for i != 1. Honerkamp proceeds: "The first question is now: Is A_j in \Omega_1? Having the answer to this question we next consider the set containing A_j and multiply the probabilities for this set by a factor of 2. The sum of probabilities for this set is now again equal to 1, and we are in the same position as before with the set \Omega: We divide it again and ask the corresponding yes-or-no question.This procedure ends after k steps, where k is the smallest integer such that 2^kP(A_j) becomes equal to or larger than 1. Consequently, -log_2P(A_j) is a good measure of the number of yes-or-no questions needed." Is there a (simple) way to modify this algorithm to get a satisfactory interpretation of the concept of entropy? Let me make this question more concrete: Assume you know the probabilities P_1,...,P_N, you want to find out what event (A_1 or ... or A_N) occurred and you have an algorithm aa to do so. Then, if you apply your algorithm and suppose the result will be A_j, you have to ask, say f_aa(A_j) questions. Since I have statistical mechanics in mind, let N be a natural number or infinity (Honerkamp doesn't say something about this as I can see) and let us assume, that the following series converge (in the case of N = infinity) (do you agree, that this can be assumed??). In that case, the average number of questions you have to ask is C_aa((P_k)_k\in{1...N}) := \sum_j P(A_j)f_aa(A_j). Then the question is: Is there a modification mm of the algorithm which Honerkamp described (and which doesn't work in my opinion), that leads to f_mm = -log_2? If not, does an algorithm ll exist at all, which leads to f_ll = -log_2? If not, how would you motivate the definition of entropy? If yes, is it true that there is no other algorithm aa such that C_aa((P_k)_{k\in\{1...N\}}) < C_ll((P_k)_{k\in \{1...N\}}) for some (P_k)_{k \in \{1...N\}} (P_k \in [0,1]...). If yes, can you give me a mathematical rigorous proof. If not, can you give me an algorithm cc which serves as counterexample? In the latter case: why is then entropy not defined as sum_j P(A_j) f_cc(A_j) ? Are there other ways to motivate, that -log_2P(A_j) is a good measure for the "increment of knowledge" by receiving the information that A_j is realized. Surely -log_2P(A_j) >= 0 and -log_2P(A_j) = 0 iff P(A_j) = 1, which are reasonable properties of such a measure, but that's not enough to motivate the formula -log_2P(A_j)... References to books or papers where (parts of) my questions were answered would be nice. Thanks, Chris M |
| Jun26-08, 05:00 AM | #2 |
|
|
On 23 Jun, 23:09, chris.meyer...@googlemail.com wrote:
> It is often said, that entropy is the average number of yes-no > questions you have to ask to obtain the information "which event" > occurred. > > In Honerkamp "Statistical Physics An Advanced Approach with > Applications" (2.4.1) I found an explanation of the above dictum, but > I don't understand it. I first cite Honerkamp, tell you what I don't > understand and pose some further questions below. > > Perhaps someone can help me! > > Honerkamp writes: > > *"Let {A_1,...,A_N} be a complete, disjoint set of events, i.e A_1 > u ....u A_N = \Omega. > > Furthermore, let P be a probability defined for those events. We then > define the entropy as S = -k \sum_{i=1}^{N} P(A_i) ln(P(A_i)). Here k > represents a factor which we set 1 for the moment. In the framework of > statistical mechanics k will be Boltzmann's constant k_B." > .. > "If an event has occurred, then, as we will show in a moment, -log_2 > P(A_j) is a good measure of the number of questions to be asked in > order to find out that it is just A_j which is realized." > .. > > and now the section I don't understand: > > "To show that -log_2 P(A_j) is just equal to the number of required > yes-or-no questions, we first divide \Omega into two disjoint domains > \Omega_1 and \Omega_2 such that \sum_{A_i \in \Omega_1} P(A_i) = > \sum_{A_i \in \Omega_2} P(A_i) = 1/2." > > I think this is in general not possible. Consider just the trivial > case P(A_1) = 1 and P(A_i) = 0 for i != 1. > > Honerkamp proceeds: > > "The first question is now: Is A_j in \Omega_1? Having the answer to > this question we next consider the set containing A_j and multiply the > probabilities for this set by a factor of 2. The sum of probabilities > for this set is now again equal to 1, and we are in the same position > as before with the set \Omega: We divide *it again and ask the > corresponding yes-or-no question.This procedure ends after k steps, > where k is the smallest integer such that 2^kP(A_j) becomes equal to > or larger than 1. Consequently, -log_2P(A_j) is a good measure of the > number of yes-or-no questions needed." > > Is there a (simple) way to modify this algorithm to get a satisfactory > interpretation of the concept of entropy? Let me make this question > more concrete: > > Assume you know the probabilities P_1,...,P_N, you want to find out > what event (A_1 or ... or A_N) occurred and you have an algorithm aa > to do so. Then, if you apply your algorithm and suppose the result > will be A_j, you have to ask, say f_aa(A_j) questions. > > Since I have statistical mechanics in mind, let N be a natural number > or infinity (Honerkamp doesn't say something about this as I can see) > and let us assume, that the following series converge (in the case of > N = infinity) (do you agree, that this can be assumed??). > > In that case, the average number of questions you have to ask is > C_aa((P_k)_k\in{1...N}) := \sum_j P(A_j)f_aa(A_j). > > Then the question is: Is there a modification mm of the algorithm > which Honerkamp described (and which doesn't work in my opinion), that > leads to f_mm = -log_2? If not, does an algorithm ll exist at all, > which leads to f_ll = -log_2? If not, how would you motivate the > definition of entropy? If yes, is it true that there is no other > algorithm aa such that C_aa((P_k)_{k\in\{1...N\}}) < *C_ll((P_k)_{k\in > \{1...N\}}) for some (P_k)_{k \in \{1...N\}} (P_k \in [0,1]...). If > yes, can you give me a mathematical rigorous proof. If not, can you > give me an algorithm cc which serves as counterexample? In the latter > case: why is then entropy not defined as sum_j P(A_j) f_cc(A_j) ? > > Are there other ways to motivate, that -log_2P(A_j) is a good measure > for the "increment of knowledge" by receiving the information that A_j > is realized. > > Surely -log_2P(A_j) >= 0 and -log_2P(A_j) = 0 iff P(A_j) = 1, which > are reasonable properties of such a measure, but that's not enough to > motivate the formula -log_2P(A_j)... > > References to books or papers where (parts of) my questions were > answered would be nice. > I think your book explains things very badly. Let us look at a quantum mechanical system. No matter what its complexity it can be expressed in terms of a large but finite number of quantum states. A finite number of quantum states can be expressed as a binary number. It says a bit placed un an up/down true/false position. Quantum theory gives us a clear numerical value, without quantum theory we would have to express states in terms of Cantor's infinite sets. The second law of thermodynamics says that heat flows from a higher to a lower temperature. We define temperature such that dE/dS = T T = Temperature E = Energy S = Entropy This is the normal definition of temperature. Entropy increases and heat flows from a higher to a lower temperature. Entropy defined statistically is Log(number of states) We may thus say that the temperature of a body is dE/d(Log N) where N is the number of states. This is where we take our two partitions. Physicists tend to think in terms of "e" whereas information theory tends to use the base 2. Heat flows so as to maximize the number of Quantum states. We thus find out how many states are created by a nanojoule in each partition. A statistical view point means that we can readily assign temperatures to sub systems within a solid or a gas. We can define color temperatures, we can say that the temperature of the electrons in a florescent tube is 20,000K, similarly for carriers in an LED. What is the meaning of these "temperatures"? It is quantum mechanical and statistical. A binary number does have a feel of Hutter. We can express the first 100 Megabytes of Wikipaedia in the form of a binary number. Hutter has defined AI in entropic terms. What is entropy? It is the same thing as compression. - Ian Parker |
| Jun27-08, 05:00 AM | #3 |
|
|
[Moderator's note: Lines reformatted to make them shorter. Although 80
is a maximum to enable nice display on almost all terminals, it's better to keep your own, original, unquoted text in posts at 72 characters per line or less, to enable it to be quoted a couple of times and still keep things at less than 80 characters per line total. Also, don't use more than 2 characters (including space) for a quote symbol. -P.H.] <chris.meyer123@googlemail.com> wrote in message news:a41d7e38-0715-401f-9ebd-430431dc3cc2@f63g2000hsf.googlegroups.com... > It is often said, that entropy is the average number of yes-no > questions you have to ask to obtain the information "which event" > occurred. This is an explanation of entropy which I suppose is more popular in information sciences than in physics, hence Ian Parker's response. However, let me address your issue from the point you make it. > Honerkamp writes: > > "Let {A_1,...,A_N} be a complete, disjoint set of events, i.e A_1 > u ....u A_N = \Omega. > > Furthermore, let P be a probability defined for those events. We then > define the entropy as S = -k \sum_{i=1}^{N} P(A_i) ln(P(A_i)). Here k > represents a factor which we set 1 for the moment. In the framework of > statistical mechanics k will be Boltzmann's constant k_B." > .. > "If an event has occurred, then, as we will show in a moment, -log_2 > P(A_j) is a good measure of the number of questions to be asked in > order to find out that it is just A_j which is realized." > .. > > and now the section I don't understand: > > "To show that -log_2 P(A_j) is just equal to the number of required > yes-or-no questions, we first divide \Omega into two disjoint domains > \Omega_1 and \Omega_2 such that \sum_{A_i \in \Omega_1} P(A_i) = > \sum_{A_i \in \Omega_2} P(A_i) = 1/2." > > I think this is in general not possible. Consider just the trivial > case P(A_1) = 1 and P(A_i) = 0 for i != 1. Well, in your trivial counter-example, you don't have to ask any questions at all: you already know the answer is A_1. However, writing P_i for P(A_i), let's take (P_1,P_2)=(2/3,1/3), and it also breaks down since you always have to ask one question, but the entropy is less than one bit: the reason for this is you don't get the optimal 1/2--1/2 split corresponding to 1 bit of information, but only the 2/3--1/3 split corresponding to less than 1 bit of information. Honerkamp's explanation thus works only to the extent you can divide the remaining alternatives into two groups such that each group has likelihood 1/2. Of course, this will generally not be the case, and the average number of yes-no questions will therefore be at least the number of bits in the entropy: i.e. at least -Sum[P_i*log2(P_i)]. However, let's instead say you have X_1, X_2,...,X_K drawn independently from 1...N with probabilities (P_1,...,P_N). Again, if the entropy of the distribution is S = s bits, then even given an optimal strategy the expected number of questions would have to be at least K*s. However, given any epsilon>0, for sufficiently large K you can do with less than K*(s+epsilon). E.g. with (P_1,P_2)=(2/3,1/3) and K=2, your first question could be "Is X_1=X_2=1?" which could at least help you get the expected number of questions below 2 (to 17/9 if I'm right). > Honerkamp proceeds: > > "The first question is now: Is A_j in \Omega_1? Having the answer to > this question we next consider the set containing A_j and multiply the > probabilities for this set by a factor of 2. The sum of probabilities > for this set is now again equal to 1, and we are in the same position > as before with the set \Omega: We divide it again and ask the > corresponding yes-or-no question.This procedure ends after k steps, > where k is the smallest integer such that 2^kP(A_j) becomes equal to > or larger than 1. Consequently, -log_2P(A_j) is a good measure of the > number of yes-or-no questions needed." > > > Is there a (simple) way to modify this algorithm to get a satisfactory > interpretation of the concept of entropy? Let me make this question > more concrete: I seem to recall there is an algorithm where you encode (X_1,X_2,...,X_K) as a number in the interval [0,1]---X_1 determines which of the intervals [0,P_1),[P_1,P_1+P_2),..., X_2 determines subintervals similarly, etc.---and you can split the remaining interval in half by each yes-no question. I suspect you cannot take the K=infinity limit directly with respect to the encoding in [0,1] since you'd want 0.99999...<1 in this encoding, but if you use the equivalent ordering of (X_1,...), you can always make perfect 1/2--1/2 splits, so Honerkamp's algorithm would have worked (with some caution I suppose since K=infinity). I'm afraid it's too long since I was looking at this to remember any references. Einar |
| Jun27-08, 05:00 AM | #4 |
|
|
definition and interpretation of entropy
On Jun 24, 8:09am, chris.meyer...@googlemail.com wrote:
> It is often said, that entropy is the average number of yes-no > questions you have to ask to obtain the information "which event" > occurred. > > In Honerkamp "Statistical Physics An Advanced Approach with > Applications" (2.4.1) I found an explanation of the above dictum, but > I don't understand it. I first cite Honerkamp, tell you what I don't > understand and pose some further questions below. > The proof you describe does sound rather handwaving. One good book, which from memory discusses this interpretation (and others) of entropy, is `Information Theory' by Robert Ash there is a recent Dover edition. Entropy is only an `average' concept, and one has to consider a number of events to pull out its meaning. You can check Ash or other textbooks for complete rigour, but here is the basic argument. 1. For a random event generator, that generates outcome k with probability p_k, consider a sequence of N outcomes (k1,k2,=85,kN) from the generator (eg, a dice is thrown N times). It is assumed that each outcome is generated independently of the others. The probability of generating a given sequence is then given by P = (p_1)^(n_1) (p_2)^(n_2) =85. , where n_k is the number of times that outcome k appears in the sequence. 2. Now, as N is increased, we expect that any particular outcome k will, typically, appear in the sequence a total of approximately n_k = Np_k times (rounded to the nearest integer). Such a sequence is called a `typical sequence'. It follows that the probability of generating given typical sequence is given by, approximately, P_typ = (p_1)^(Np_1) (p_2)^(Np_2) =85. = 2^{-NH] , where H is the Shannon entropy function H = - sum_k p_k log_2 p_k . 3. It can be shown rigorously is that, for N sufficiently large, the probability of obtaining a `typical' sequence of outcomes becomes as close to 1 as one desires. Hence, any non-typical sequences can be ignored FAPP. You will have to consult Ash or another textbook for this 'law of large numbers'. 4. Since each typical sequence has the same probability, P_typ, of occurring, and the sum of these probabilities is very close to unity for large N, the number of typical sequences of length N must be approximately given by N_typ = 1 / P_typ = 2^{NH} . 5. We are now almost finished. To determine which actual typical sequence has occurred in a given run of N outcomes, how many yes/no questions must I ask? But this is same problem as guessing a number from 1 to N_typ. I number the possible typical sequences from 1 to N_typ, and ask `is the sequence of outcomes in the first half or in the second half. And so on. This will take at most log_2 N_typ questions, i.e., the number of questions required is N_q = log_2 N_typ = NH. 6. Finally, the average number of questions required per outcome is then N_q / N = H. |
| Jun29-08, 05:00 AM | #5 |
|
|
chris.meyer123@googlemail.com schrieb:
> It is often said, that entropy is the average number of yes-no > questions you have to ask to obtain the information "which event" > occurred. > > References to books or papers where (parts of) my questions were > answered would be nice. You may find the discussion in Appendix A of http://lanl.arxiv.org/abs/0705.3790 illuminating. Arnold Neumaier |
| Thread Closed |
Similar discussions for: definition and interpretation of entropy
|
||||
| Thread | Forum | Replies | ||
| Is the Entropy of the Universe Zero?:(Entropy as Entanglement) | Quantum Physics | 10 | ||
| Structure and Interpretation of Classical Mechanics | Advanced Physics Learning Materials | 0 | ||
| Entropy (information entropy) | Advanced Physics Homework | 5 | ||
| Shannon's entropy definition` | General Math | 0 | ||
| Absolute entropy, definition, measurement, practical use | Classical Physics | 5 | ||