Logarithm of a discrete random variable

Discussion Overview

The discussion revolves around the entropy of random strings and the implications of using a discrete random variable to represent the number of possible characters in a string. Participants explore how the size of a character set affects entropy, the nature of strings versus random variables, and the calculation of logarithms in this context.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about taking the logarithm of a discrete random variable and its implications for entropy calculation.
  • Another participant asserts that the alphabet length is a constant, not a random variable, challenging the initial premise.
  • A clarification is made regarding the maximum and minimum values of the discrete random variable representing character choices, suggesting it can vary between 1 and 26 for the alphabet.
  • Some participants argue that strings built from independent and identically distributed (iid) random characters can be considered random variables with associated entropy.
  • There is a discussion about whether strings themselves can possess entropy, with some asserting that they do not, while others argue that they can have varying levels of order and thus entropy.
  • One participant introduces the concept of Kolmogorov complexity as a way to assign entropy to fixed strings, noting it differs from Shannon entropy.
  • Concerns are raised about how the randomness of the length of the string (L) or the character set (N) affects the overall entropy calculation.
  • Participants discuss the implications of using random processes to determine L and how that would influence the entropy of the resulting random variable.
  • There is a distinction made between a specific string and a sequence of random variables, emphasizing that a specific string does not have Shannon entropy.

Areas of Agreement / Disagreement

Participants express differing views on whether strings can have entropy and the nature of random variables versus fixed strings. There is no consensus on the implications of these distinctions for entropy calculations.

Contextual Notes

Some participants note the importance of distinguishing between random variables and specific strings, as well as the potential confusion surrounding the definitions of entropy in different contexts.

  • #31
John Creighto said:
For each value N can take, the new random variable would be log_2 (N) and have the same probability as that of N.

I think we are in agreement here. However take a look at my post 15 and see if you agree with that. The OP expressed satisfaction with it in post 17.

In any case I was taking the point of view that if I risked $1 to win $512 given a 1/1024 probability of guessing right, then the surprisal value of learning that I won would be 10 bits. To me surprisal, entropy and information all are essentially the same thing in this context. They are all calculated in the same way. If you read through the thread, it's clear that I agree a known value exists with P=1 and the information value is 0. However, there is something important (I believe) about first learning a result which is informative and perhaps surprising.
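As a quick check of that arithmetic, here is a minimal Python sketch. The function name `surprisal_bits` is only an illustrative choice, not something from the thread; it computes log_2(1/p) for a single outcome of probability p.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprisal (self-information), in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # log2(1/p) is the same quantity as -log2(p).
    return math.log2(1.0 / p)


# The 1/1024 guess from the betting example.
print(surprisal_bits(1 / 1024))
# A result that is already known (P = 1) carries no information.
print(surprisal_bits(1.0))
```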
 
  • #32
John Creighto said:
For each value N can take, the new random variable would be log_2 (N) and have the same probability as that of N.

Thank you, John Creighto. Can you explain a bit further?

Assume that N represents the throw of a six-sided fair die. Are you saying that log_2 (N) would be a discrete random variable with possible values {log_2 (1), log_2 (2), log_2 (3), log_2 (4), log_2 (5), log_2 (6)}? This seems intuitive, but please forgive me: high school offered me no exposure to probability and stats.

Thanks again for contributing. This is really the discussion that I had intended for this thread to be.
 
  • #33
LuculentCabal said:
Thank you, John Creighto. Can you explain a bit further?

Assume that N represents the throw of a six-sided fair die. Are you saying that log_2 (N) would be a discrete random variable with possible values {log_2 (1), log_2 (2), log_2 (3), log_2 (4), log_2 (5), log_2 (6)}? This seems intuitive, but please forgive me: high school offered me no exposure to probability and stats.

Thanks again for contributing. This is really the discussion that I had intended for this thread to be.

Yes. That is exactly what I am saying.
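To make this concrete, here is a small Python sketch under the assumption of a fair six-sided die. The helper name `log2_distribution` is hypothetical. Because log_2 is one-to-one, each value log_2(n) simply inherits the probability of n.

```python
import math
from fractions import Fraction


def log2_distribution(pmf):
    """Map a pmf {n: P(N = n)} to the pmf of log2(N), {log2(n): P(N = n)}.

    log2 is injective on positive integers, so no two values of N collapse
    onto the same value of log2(N) and the probabilities carry over as-is.
    """
    return {math.log2(n): p for n, p in pmf.items()}


# Fair six-sided die: each face 1..6 has probability 1/6 (assumed here).
die = {n: Fraction(1, 6) for n in range(1, 7)}

for value, prob in log2_distribution(die).items():
    print(f"log2(N) = {value:.2f} with probability {prob}")
```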
 
  • #34
LuculentCabal, although what John says is correct, it is probably NOT what you want to do (no fault of John's). However, I wash my hands of this thread.
 
  • #35
LuculentCabal said:
Thank you, John Creighto. Can you explain a bit further?

Assume that N represents the throw of a six-sided fair die. Are you saying that log_2 (N) would be a discrete random variable with possible values {log_2 (1), log_2 (2), log_2 (3), log_2 (4), log_2 (5), log_2 (6)}? This seems intuitive, but please forgive me: high school offered me no exposure to probability and stats.

Thanks again for contributing. This is really the discussion that I had intended for this thread to be.

Hello LuculentCabal.

Using your formula, H would be a random variable over the set {0, 1, 1.58, 2, 2.32, 2.58}, assuming a uniform probability distribution of die face outcomes and a fixed L=1. Is this what you want? N, as you define it, is the number of (distinct) characters in the set for which H is determined. This number is determined by the result of the die throw. We are NOT talking about the set of possible outcomes for the die throw, which has N=6 and a uniform P=1/6. In that case H is constant and equals 2.58 for L=1.
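Written out, the two cases look like this (a worked sketch with L=1 and a fair die assumed; the average E[H] is included only as an illustrative summary of the first case and was not part of the original question):

```latex
% Case 1: the die throw sets the alphabet size N, so H = \log_2 N is random.
P\bigl(H = \log_2 n\bigr) = \tfrac{1}{6}, \quad n = 1, \dots, 6,
\qquad
E[H] = \tfrac{1}{6} \sum_{n=1}^{6} \log_2 n = \tfrac{1}{6} \log_2 720 \approx 1.58

% Case 2: the six die faces are the alphabet, so N = 6 is fixed and H is constant.
H = \log_2 6 \approx 2.58
```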
 
  • #36
SW VandeCarr said:
Hello LuculentCabal.

Using your formula, H would be a random variable over the set {0, 1, 1.58, 2, 2.32, 2.58}, assuming a uniform probability distribution of die face outcomes and a fixed L=1. Is this what you want? N, as you define it, is the number of (distinct) characters in the set for which H is determined. This number is determined by the result of the die throw. We are NOT talking about the set of possible outcomes for the die throw, which has N=6 and a uniform P=1/6. In that case H is constant and equals 2.58 for L=1.

OK, I am starting to confuse myself here.

//-------------------------------Begin Brain Storm----------------------------------
Letting L = 1:

If N were a six-sided fair die throw, there would be six possible outcomes, so N would be six. In this case, H would just be 2.58.

However, if you threw a six-sided fair die to determine the number of sides on your fair die N, then H would be a random variable over the set {0, 1, 1.58, 2, 2.32, 2.58}.
//-------------------------------End Brain Storm----------------------------------

Perhaps I am confusing random variables and random processes, but those are details for another thread. If this brainstorm is correct, then I will have no further questions/comments for this thread.

Thank you all again. It has been greatly appreciated.
 
  • #37
LuculentCabal said:
I am trying to explore a number of things regarding the entropy of random strings and am wondering how a character set of random size would affect the entropy of strings made from that set.

Using the following formula, I need to take the log of a discrete random variable:
H = L log_2 (N)

where:
H is the entropy of the string in bits,
L is the length of the string in characters,
N is the discrete random variable representing the number of possible characters to choose from.

How do you take the logarithm of a discrete random variable? Is there a general method that takes into account any maximum or minimum size of this variable?

Thanks in advance

These are your definitions of N and H. (Actually, N is just the number of characters if you are defining N as a random variable.) Usually these are constants. You're making N (and therefore H) variables. You can do this, but you can't change the number of faces on the die. If you're letting the die determine the value of N, fine, but the character set you are then actually using is some unspecified set of sets ranging from 1 to 6 characters. You need not specify the characters other than that each is unique (no repeats within any of the six subsets: {{a},{a,b},...,{a,b,c,d,e,f}}). Like I said, this is pretty esoteric.

If you were using the die in the usual way it's quite straightforward: H=(L)log_2(N)=log_2(6)=2.58 when L=1. If you want L to also be a variable, then H=L(2.58) in this case.
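A minimal Python sketch of that formula, assuming every character is drawn independently and uniformly from the N available symbols (the function name `string_entropy_bits` is just an illustration):

```python
import math


def string_entropy_bits(length: int, alphabet_size: int) -> float:
    """H = L * log2(N) for L independent, uniformly chosen characters
    drawn from an alphabet of N distinct symbols."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("need length >= 0 and alphabet_size >= 1")
    return length * math.log2(alphabet_size)


# One throw of a six-sided die used in the usual way (L = 1, N = 6).
print(round(string_entropy_bits(1, 6), 2))
# Letting L vary just scales the per-character value.
for length in (2, 5, 10):
    print(length, round(string_entropy_bits(length, 6), 2))
```

With N fixed at 6, H grows linearly in L; with N=1 there is only one possible string, so H is 0 for any L.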
 
  • #38
SW VandeCarr said:
These are your definitions of N and H. (Actually, N is just the number of characters if you are defining N as a random variable.) Usually these are constants. You're making N (and therefore H) variables. You can do this, but you can't change the number of faces on the die. If you're letting the die determine the value of N, fine, but the character set you are then actually using is some unspecified set of sets ranging from 1 to 6 characters. You need not specify the characters other than that each is unique (no repeats within any of the six subsets: {{a},{a,b},...,{a,b,c,d,e,f}}). Like I said, this is pretty esoteric.

If you were using the die in the usual way it's quite straightforward: H=(L)log_2(N)=log_2(6)=2.58 when L=1. If you want L to also be a variable, then H=L(2.58) in this case.

I am defining N as the size of a set of characters, where that size is random (hence rolling a die to determine the number of sides on a die to roll), but I think we are both agreeing on the same thing. As I have said, I have no formal training in any of this, and perhaps N should be defined as a process and not a variable (if there is a difference, which is a detail for another thread).

The bottom line is that I now know what the log (or other functions) of discrete random variables are and that will be all from me.

Thank you all again for your help.
 
