# Character strings as random variables?

1. Jul 14, 2009

### SW VandeCarr

Consider a character string randomly generated from an alphabet {T,H} of length L, where T and H each have a probability of 0.5. For an arbitrary finite L the probability of a given string is p=(0.5)^L.

A probability is the sole determinant of Shannon entropy (S). Therefore I'm claiming that such character strings have Shannon entropy which, given a uniform PDF, would be $$S = -\log_{2}(P^{L})$$, where P = 0.5 is the per-character probability, so that S = L bits.

This is my reasoning for claiming that such character strings have entropy. I've been challenged on this based on the argument that each element of the string is a random variable, but the entire string is a "constant". In fact, there is no specification that the string need be generated sequentially. A string, as defined above, where L=10 has 1024 possible outcomes or states. Is this not an example of entropy?
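A minimal Python sketch of the counting argument above, assuming the string is drawn uniformly from all strings of length L over {T,H}. The function name `uniform_string_entropy_bits` is made up for illustration.

```python
import itertools
import math

def uniform_string_entropy_bits(length, alphabet=("T", "H")):
    """Enumerate every string of the given length and return
    (number of strings, Shannon entropy in bits), assuming all
    strings are equally likely."""
    strings = ["".join(chars) for chars in itertools.product(alphabet, repeat=length)]
    p = 1.0 / len(strings)  # uniform probability of each string
    entropy = -sum(p * math.log2(p) for _ in strings)
    return len(strings), entropy

count, entropy = uniform_string_entropy_bits(10)
print(count, entropy)
```

For L=10 this enumerates the 1024 states and gives an entropy of 10 bits for the distribution over strings.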

EDIT: In addition, I'm claiming that if L were an RV and P(T) and P(H) are fixed, then S is a random variable with a known PDF.
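One way to read this claim is that S is the entropy conditional on the realized length, so the distribution of S follows directly from the distribution of L. The sketch below assumes independent characters and uses a hypothetical distribution for L; `binary_entropy_bits`, `entropy_distribution`, and `length_pmf` are illustrative names, not anything from the thread.

```python
import math

def binary_entropy_bits(p_heads):
    """Entropy in bits of one character with P(H) = p_heads."""
    probs = (p_heads, 1.0 - p_heads)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_distribution(length_pmf, p_heads=0.5):
    """Map a distribution over L to a distribution over S = L * h,
    where h is the per-character entropy (independence assumed)."""
    h = binary_entropy_bits(p_heads)
    s_pmf = {}
    for length, prob in length_pmf.items():
        s = length * h
        s_pmf[s] = s_pmf.get(s, 0.0) + prob
    return s_pmf

# Hypothetical distribution for the string length L.
length_pmf = {1: 0.25, 2: 0.5, 3: 0.25}
s_pmf = entropy_distribution(length_pmf)
print(s_pmf)
```

With P(T) = P(H) = 0.5 the per-character entropy is 1 bit, so S takes the same values as L with the same probabilities.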

Last edited: Jul 14, 2009
2. Jul 14, 2009

### John Creighto

I think the confusion may be that a particular string is not a random variable but a realization/event of a random variable. Maybe in your past debate you were just having a communication problem.

3. Jul 14, 2009

### SW VandeCarr

Well, that is the root of the problem apparently. But if you take the view that an outcome, once observed, has no information, then information/entropy doesn't exist as an observable. If we have a system which has 1024 equally probable states, then the entropy of that system is 10 bits in Shannon measure, is it not? The string that is observed is one randomly realized state of the system. What is the proper context for the concept of information/entropy?

Last edited: Jul 14, 2009
4. Jul 14, 2009

### John Creighto

That all makes sense to me. Keep in mind though I haven't studied Shannon entropy but did try reading his paper once (a long long time ago).

5. Jul 15, 2009

### SW VandeCarr

The essential thing you need to know is that entropy is defined as:

$$S = -k \sum_{i} p(x_{i}) \log_{2} p(x_{i})$$

Therefore any value that can be calculated from the appropriate input parameters by means of this equation is entropy. If entropy can be calculated for a character string, then the string has entropy. In the thermodynamic version, k is the Boltzmann constant and it applies to a system whose microstate is defined in terms of the kinetic energy (KE) of the individual particles and the macrostate in terms of temperature (T) resulting in S= KE/T in SI units for systems in thermal equilibrium.

In the statistical application the same equation applies, usually with k=1. The input values in the case at hand are the alphabet {T,H}, the length (L) of the string, and the probability of the string $$(0.5)^{L}$$.
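As a worked check, assuming k=1 and that the sum runs over all $$2^{L}$$ equally likely strings $$x_{i}$$, each with probability $$(0.5)^{L} = 2^{-L}$$:

$$S = -\sum_{i=1}^{2^{L}} 2^{-L} \log_{2} 2^{-L} = 2^{L} \cdot 2^{-L} \cdot L = L$$

So for L=10 the sum has 1024 identical terms and S comes out to 10 bits.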

So character strings can have entropy. The question is how they have entropy. If you assume that a string is generated sequentially, then the character output is known as the process proceeds. Here you can argue that each unit of output is the mapping of a random variable (RV), but the string as a whole is not the mapping of an RV. However, if the string is considered as a unit entity (one state of a system), then the entire string is the mapping of a random variable onto the event space.
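For what it's worth, the two views give the same number when the characters are independent. The sketch below is only an illustration under that independence assumption; the function names and the biased value P(H) = 0.3 are hypothetical, chosen so the agreement is not just a consequence of the uniform case.

```python
import itertools
import math

def per_character_entropy_bits(p_heads=0.5):
    """Entropy in bits of a single character with P(H) = p_heads."""
    probs = (p_heads, 1.0 - p_heads)
    return -sum(p * math.log2(p) for p in probs if p > 0)

def whole_string_entropy_bits(length, p_heads=0.5):
    """Entropy in bits of the string treated as one random variable,
    computed from the joint distribution over all 2**length strings."""
    char_prob = {"H": p_heads, "T": 1.0 - p_heads}
    total = 0.0
    for chars in itertools.product("TH", repeat=length):
        p = math.prod(char_prob[c] for c in chars)  # independent characters
        if p > 0:
            total -= p * math.log2(p)
    return total

sequential = 6 * per_character_entropy_bits(0.3)  # sum of per-character entropies
as_a_unit = whole_string_entropy_bits(6, 0.3)     # entropy of the string as one RV
print(math.isclose(sequential, as_a_unit))
```

Whether the string is generated one character at a time or drawn as a single state, the entropy of the distribution over strings is L times the per-character entropy.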

Last edited: Jul 16, 2009
6. Jul 16, 2009

### John Creighto

When I first read this I thought, "So?" So I decided to look at your other thread to try to understand why you are making what seems to be an obvious point. A string has no entropy, as it is a particular instance of a random variable, but each string is one state or mode. The entropy of the entire system is based upon the number of modes and the probability of each mode. So now that we are in agreement so far, let's get back to your other thread.
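To make the distinction concrete: a single realized string can be given a surprisal, $$-\log_{2} p(\text{string})$$, while entropy is the average of that quantity over all strings. A small sketch, assuming independent characters (`surprisal_bits` is an illustrative name):

```python
import math

def surprisal_bits(string, p_heads=0.5):
    """Surprisal -log2 p(string) of one realized string,
    assuming independent characters with P(H) = p_heads."""
    char_prob = {"H": p_heads, "T": 1.0 - p_heads}
    return -sum(math.log2(char_prob[c]) for c in string)

# Fair coin: every string of length 5 has the same surprisal.
print(surprisal_bits("HTHHT"), surprisal_bits("HHHHH"))

# Biased coin (hypothetical P(H) = 0.25): surprisal depends on the string.
print(surprisal_bits("HH", 0.25), surprisal_bits("TT", 0.25))
```

In the fair-coin case every string has surprisal L, which equals the entropy of the system, so the two quantities are easy to conflate. With a biased coin they separate: the surprisal varies from string to string, while the entropy stays a single number for the whole distribution.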