Character strings as random variables?

Discussion Overview

The discussion revolves around the concept of character strings generated from a binary alphabet {T,H} and their relationship to Shannon entropy. Participants explore whether such strings can be considered random variables and how entropy is defined in this context, examining both theoretical and conceptual aspects.

Discussion Character

  • Exploratory
  • Debate/contested
  • Conceptual clarification

Main Points Raised

  • Some participants claim that a character string of length L, generated from a uniform distribution, has Shannon entropy S = -\log_{2} P = L bits, where P = (0.5)^{L} is the probability of the string.
  • Others argue that while each element of the string can be viewed as a random variable, the entire string itself is a realization or constant, raising questions about its entropy.
  • A participant suggests that if the string is considered as a unit entity, it could be viewed as a mapping of a random variable onto the event space.
  • There is a discussion about the implications of observing a particular string and whether it retains any information or entropy once realized.
  • Some participants highlight that the entropy of a system with multiple equally probable states can be calculated, but the context of how strings are generated influences the interpretation of their entropy.
  • One participant mentions the thermodynamic definition of entropy and its application to character strings, suggesting that if entropy can be calculated, then the strings possess entropy.

Areas of Agreement / Disagreement

Participants express differing views on whether character strings can be considered to have entropy, with some asserting they do and others contesting this based on the nature of the strings as realizations of random variables. The discussion remains unresolved with multiple competing perspectives.

Contextual Notes

Participants note that the definitions and interpretations of entropy depend on assumptions about how strings are generated and observed, leading to potential limitations in the discussion.

SW VandeCarr
Consider a character string randomly generated from an alphabet {T,H} of length L, where T and H each have a probability of 0.5. For an arbitrary finite L the probability of a given string is p=(0.5)^L.

Shannon entropy (S) is determined solely by the probability distribution. Therefore I'm claiming that such character strings have Shannon entropy which, given a uniform distribution over strings, would be S = -\log_{2} p = -\log_{2} (0.5)^{L} = L bits.

This is my reasoning for claiming that such character strings have entropy. I've been challenged on this based on the argument that each element of the string is a random variable, but the entire string is a "constant". In fact, there is no specification that the string need be generated sequentially. A string, as defined above, where L=10 has 1024 possible outcomes or states. Is this not an example of entropy?

EDIT: In addition, I'm claiming that if L were an RV and the probabilities of T and H are fixed, then S is a random variable with a known distribution.
(see also LuculentCabal:logarithm of discrete RV Jul 12)
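
The L=10 case can be checked numerically. The following is a minimal sketch, not anything posted in the thread: the function names `string_distribution` and `shannon_entropy_bits` and the parameter `p_heads` are hypothetical, and the characters are assumed to be independent.

```python
import itertools
import math

def string_distribution(length, p_heads=0.5):
    """Map every string over {T,H} of the given length to its probability.

    Assumes independent characters; p_heads is an illustrative parameter.
    """
    probs = {"H": p_heads, "T": 1.0 - p_heads}
    dist = {}
    for chars in itertools.product("TH", repeat=length):
        dist["".join(chars)] = math.prod(probs[c] for c in chars)
    return dist

def shannon_entropy_bits(dist):
    """Shannon entropy -sum p*log2(p) over outcomes with nonzero probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

dist = string_distribution(10)
print(len(dist))                   # 1024 possible strings
print(shannon_entropy_bits(dist))  # 10.0 bits
```

Note that the entropy here is computed from the whole distribution over 1024 strings, not from any single string.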
 
SW VandeCarr said:
Consider a character string randomly generated from an alphabet {T,H} of length L, where T and H each have a probability of 0.5. For an arbitrary finite L the probability of a given string is p=(0.5)^L.

Shannon entropy (S) is determined solely by the probability distribution. Therefore I'm claiming that such character strings have Shannon entropy which, given a uniform distribution over strings, would be S = -\log_{2} p = -\log_{2} (0.5)^{L} = L bits.

This is my reasoning for claiming that such character strings have entropy. I've been challenged on this based on the argument that each element of the string is a random variable, but the entire string is a "constant". In fact, there is no specification that the string need be generated sequentially. A string, as defined above, where L=10 has 1024 possible outcomes or states. Is this not an example of entropy?

EDIT: In addition, I'm claiming that if L were an RV and the probabilities of T and H are fixed, then S is a random variable with a known distribution.
(see also LuculentCabal:logarithm of discrete RV Jul 12)

I think the confusion may be that a particular string is not a random variable but a realization/event of a random variable. Maybe in your past debate you were just having a communication problem.
 
John Creighto said:
I think the confusion may be that a particular string is not a random variable but a realization/event of a random variable. Maybe in your past debate you were just having a communication problem.

Well, that is the root of the problem apparently. But if you take the view that an outcome, once observed, has no information, then information/entropy doesn't exist as an observable. If we have a system which has 1024 equally probable states, then the entropy of that system is 10 bits in Shannon's measure, is it not? The string that is observed is one randomly realized state of the system. What is the proper context for the concept of information/entropy?
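
One way to make the distinction concrete is to separate the surprisal of a single realized string, -log2 p(string), from the entropy of the source, which is the expected surprisal over all strings. The sketch below is illustrative only: `surprisal_bits` and `p_heads` are hypothetical names, the characters are assumed independent, and the biased value 0.9 is not from the thread.

```python
import math

def surprisal_bits(string, p_heads=0.5):
    """Return -log2 of the probability of one realized string over {T,H}.

    Assumes independent characters; p_heads is an illustrative parameter.
    """
    probs = {"H": p_heads, "T": 1.0 - p_heads}
    return -sum(math.log2(probs[c]) for c in string)

# Fair coin: every length-10 string has the same surprisal, equal to the entropy.
print(surprisal_bits("HTTHHTHTTH"))  # 10.0
print(surprisal_bits("HHHHHHHHHH"))  # 10.0

# Biased coin (assumed p_heads = 0.9): surprisal now depends on the string,
# and the entropy of the source is the average of these values.
print(surprisal_bits("HHH", p_heads=0.9) < surprisal_bits("TTT", p_heads=0.9))  # True
```

With a fair coin the two quantities coincide numerically, which may be part of why the string and the system are easy to conflate.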
 
SW VandeCarr said:
Well, that is the root of the problem apparently. But if you take the view that an outcome, once observed, has no information, then information/entropy doesn't exist as an observable. If we have a system which has 1024 equally probable states, then the entropy of that system is 10 bits in Shannon's measure, is it not? The string that is observed is one randomly realized state of the system. What is the proper context for the concept of information/entropy?

That all makes sense to me. Keep in mind though I haven't studied Shannon entropy but did try reading his paper once (a long long time ago).
 
John Creighto said:
That all makes sense to me. Keep in mind though I haven't studied Shannon entropy but did try reading his paper once (a long long time ago).

The essential thing you need to know is that entropy is defined as:

S = -k \sum_{i} p(x_{i}) \log_{2} p(x_{i})

Therefore any value that can be calculated from the appropriate input parameters by means of this equation is entropy. If entropy can be calculated for a character string, then the string has entropy. In the thermodynamic version, k is the Boltzmann constant and it applies to a system whose microstate is defined in terms of the kinetic energy (KE) of the individual particles and the macrostate in terms of temperature (T) resulting in S= KE/T in SI units for systems in thermal equilibrium.

In the statistical application the same equation applies, usually with k=1. The input values in the case at hand are the alphabet {T,H}, the length L of the string, and the probability of the string, (0.5)^{L}.
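
As a worked instance of the entropy formula with k=1, assume all 2^{L} strings are equally probable with p(x_{i}) = (0.5)^{L} = 2^{-L}. The index i over strings is notation introduced here for illustration. The sum then collapses:

```latex
S = -\sum_{i=1}^{2^{L}} p(x_{i}) \log_{2} p(x_{i})
  = -2^{L} \cdot 2^{-L} \log_{2} 2^{-L}
  = L
```

For L=10 this gives S = 10 bits, matching the 1024 equally probable states mentioned earlier.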

So character strings can have entropy. The question is how they have entropy. If you assume that a string is generated sequentially, then the character output is known as the process proceeds. Here you can argue that each unit of output is the mapping of a random variable (RV), but the string as a whole is not the mapping of an RV. However, if the string is considered as a unit entity (one state of a system), then the entire string is the mapping of a random variable onto the event space.
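
If the characters are independent, the sequential view and the unit-entity view give the same number: the sum of the per-character entropies equals the entropy of the distribution over whole strings. A minimal sketch under that independence assumption follows; the helper names and the biased value 0.7 are hypothetical and not from the thread.

```python
import itertools
import math

def entropy_bits(probabilities):
    """Shannon entropy -sum p*log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def per_character_total(length, p_heads):
    """Sequential view: add up the entropy of each independent character."""
    return length * entropy_bits([p_heads, 1.0 - p_heads])

def whole_string(length, p_heads):
    """Unit-entity view: entropy of the distribution over all 2**length strings."""
    probs = {"H": p_heads, "T": 1.0 - p_heads}
    return entropy_bits(
        math.prod(probs[c] for c in chars)
        for chars in itertools.product("TH", repeat=length)
    )

print(per_character_total(10, 0.5), whole_string(10, 0.5))  # 10.0 10.0
# The two views also agree (up to rounding) for an assumed biased coin.
print(math.isclose(per_character_total(8, 0.7), whole_string(8, 0.7)))  # True
```

The agreement depends on independence. If characters were correlated, the whole-string entropy would be smaller than the per-character sum.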
 
When I first read this I thought, "So?" So I decided to look at your other thread to try to understand why you are making what seems to be an obvious point. A particular string has no entropy, as it is a single realization of a random variable, but each string is one state or mode. The entropy of the entire system is based upon the number of modes and the probability of each mode. So now that we are in agreement so far, let's get back to your other thread.
 
