Logarithm of a discrete random variable

The discussion focuses on the entropy of random strings and how the size of the character set affects it. The formula H = L log_2 N is used to calculate entropy, where L is the string length and N is the number of possible characters. Participants clarify that the entropy applies to random variables rather than specific strings, which do not possess entropy themselves. The conversation also touches on the implications of using random variables for L and N, suggesting that if either is random, the resulting entropy becomes a random variable. Overall, the complexities of defining and calculating entropy in this context are explored, emphasizing the distinction between strings and random variables.
  • #31
John Creighto said:
For each value N can take, the new random variable would be log_2 (N) and have the same probability as that of N.

I think we are in agreement here. However take a look at my post 15 and see if you agree with that. The OP expressed satisfaction with it in post 17.

In any case I was taking the point of view that if I risked $1 to win $512 given a 1/1024 probability of guessing right, then the surprisal value of learning that I won would be 10 bits. To me surprisal, entropy and information all are essentially the same thing in this context. They are all calculated in the same way. If you read through the thread, it's clear that I agree a known value exists with P=1 and the information value is 0. However, there is something important (I believe) about first learning a result which is informative and perhaps surprising.
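As a rough check of the 10-bit figure, here is a minimal sketch that computes surprisal as log_2(1/p). The function name `surprisal_bits` is only an illustrative choice, not something from the thread.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprisal (self-information), in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    # log2(1/p) is the same quantity as -log2(p)
    return math.log2(1.0 / p)


print(surprisal_bits(1 / 1024))  # 10.0 (a 1-in-1024 win carries 10 bits)
print(surprisal_bits(1.0))       # 0.0 (a known result with P=1 carries no information)
```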
 
  • #32
John Creighto said:
For each value N can take, the new random variable would be log_2 (N) and have the same probability as that of N.

Thank you, John Creighto. Can you explain a bit further?

Assume that N was representing the throw of a six sided fair die. Are you saying that log_2 (N) would be a discrete random variable with possible values {log_2 (1), log_2 (2), log_2 (3), log_2 (4), log_2 (5), log_2 (6)}? This would seem intuitive but please forgive me, high school offered me no exposure to probability and stats.

Thanks again for contributing. This is really the discussion that I had intended for this thread to be.
 
  • #33
LuculentCabal said:
Thank you, John Creighto. Can you explain a bit further?

Assume that N was representing the throw of a six sided fair die. Are you saying that log_2 (N) would be a discrete random variable with possible values {log_2 (1), log_2 (2), log_2 (3), log_2 (4), log_2 (5), log_2 (6)}? This would seem intuitive but please forgive me, high school offered me no exposure to probability and stats.

Thanks again for contributing. This is really the discussion that I had intended for this thread to be.

Yes. That is exactly what I am saying.
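To make that concrete, here is a small sketch of the general recipe for a function of a discrete random variable: apply the function to each possible value and carry the probability across. The helper name `pmf_of_log2` and the dictionary representation of the distribution are assumptions made for illustration only.

```python
import math
from fractions import Fraction


def pmf_of_log2(pmf_n):
    """Given {n: P(N = n)}, return {log2(n): P(log2(N) = log2(n))}.

    log2 is one-to-one on positive numbers, so each value of N maps to
    exactly one value of log2(N) and simply keeps its probability.
    """
    out = {}
    for n, p in pmf_n.items():
        y = math.log2(n)
        out[y] = out.get(y, 0) + p
    return out


# Fair six-sided die: N takes the values 1..6, each with probability 1/6.
die = {n: Fraction(1, 6) for n in range(1, 7)}
log_die = pmf_of_log2(die)

for y, p in log_die.items():
    print(f"log2(N) = {y:.3f} with probability {p}")
```

For a function that is not one-to-one (say, N mod 2), the same loop would add together the probabilities of all values of N that land on the same output.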
 
  • #34
LuculentCabal, although what John says is correct, it is probably NOT what you want to do (no fault of John's). However, I wash my hands of this thread.
 
  • #35
LuculentCabal said:
Thank you, John Creighto. Can you explain a bit further?

Assume that N was representing the throw of a six sided fair die. Are you saying that log_2 (N) would be a discrete random variable with possible values {log_2 (1), log_2 (2), log_2 (3), log_2 (4), log_2 (5), log_2 (6)}? This would seem intuitive but please forgive me, high school offered me no exposure to probability and stats.

Thanks again for contributing. This is really the discussion that I had intended for this thread to be.

Hello LuculentCabal.

Using your formula, H would be a random variable over the set {0, 1, 1.58, 2, 2.32, 2.58}, assuming a uniform probability distribution of die face outcomes and a fixed L=1. Is this what you want? N, as you define it, is the number of (distinct) characters in the set for which H is determined. This number is determined by the result of the die throw. We are NOT talking about the set of possible outcomes for the die throw, which has N=6 and a uniform P=1/6. In that case H is constant and equals 2.58 for L=1.
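The two readings can be compared side by side in a short sketch. It assumes L=1 and a fair die, as in the post; the variable names are illustrative, and the mean of H is an extra quantity computed here, not something the thread asked for.

```python
import math
from fractions import Fraction

L = 1  # fixed string length, as in the post

# Reading 1: the die throw decides the alphabet size N, so H = L * log2(N)
# is itself a random variable taking six values, each with probability 1/6.
h_values = [L * math.log2(n) for n in range(1, 7)]
p = Fraction(1, 6)
print([round(h, 2) for h in h_values])  # [0.0, 1.0, 1.58, 2.0, 2.32, 2.58]

# Mean of that random variable (the average of the six values).
expected_h = sum(float(p) * h for h in h_values)
print(f"E[H] = {expected_h:.3f} bits")

# Reading 2: the die is the character source, so N = 6 is a constant
# and H is a single number.
h_fixed = L * math.log2(6)
print(f"H with N fixed at 6 = {h_fixed:.3f} bits")
```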
 
  • #36
SW VandeCarr said:
Hello LuculentCabal.

Using your formula, H would be a random variable over the set {0, 1, 1.58, 2, 2.32, 2.58}, assuming a uniform probability distribution of die face outcomes and a fixed L=1. Is this what you want? N, as you define it, is the number of (distinct) characters in the set for which H is determined. This number is determined by the result of the die throw. We are NOT talking about the set of possible outcomes for the die throw, which has N=6 and a uniform P=1/6. In that case H is constant and equals 2.58 for L=1.

OK, I am starting to confuse myself here.

//-------------------------------Begin Brain Storm----------------------------------
Letting L = 1:

If N were a six-sided fair die throw, there would be six possible outcomes, so N would be six. In this case, H would just be 2.58.

However, if you threw a six-sided fair die to determine the number of sides on your fair die N, then H would be a random variable over the set {0, 1, 1.58, 2, 2.32, 2.58}.
//-------------------------------End Brain Storm----------------------------------

Perhaps I am confusing random variables and random processes, but those are details for another thread. If this brainstorm is correct, then I will have no further questions/comments for this thread.

Thank you all again. It has been greatly appreciated.
 
  • #37
LuculentCabal said:
I am trying to explore a number of things regarding the entropy of random strings and am wondering how a character set of random size would affect the entropy of strings made from that set.

Using the following formula, I need to take the log of a discrete random variable
H = L\log_2 N

where:
H is the entropy of the string in bits,
L is the length of the string in characters
N is the discrete random variable representing the number of possible characters to choose from

How do you take the logarithm of a discrete random variable? Is there a general method that takes into account any maximum or minimum size of this variable?

Thanks in advance

These are your definitions of N and H. (Actually N is just the number of characters if you are defining N as a random variable.) Usually these are constants. You're making N (and therefore H) variables. You can do this, but you can't change the number of faces on the die. If you're letting the die determine the value of N, fine, but the character set you are actually then using is some unspecified set of sets ranging from 1 to 6 characters. You need not specify the characters other than they are each unique (no repeats within any of the six subsets: {{a},{a,b},...,{a,b,c,d,e,f}}). Like I said, this is pretty esoteric.

If you were using the die in the usual way it's quite straightforward: H=(L)log_2(N)=log_2(6)=2.58 when L=1. If you want L to also be a variable then H=L(2.58) in this case.
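For the usual case with constant N, the formula is a one-liner. This is only a sketch; `string_entropy_bits` is a hypothetical helper name, and it assumes every character is drawn uniformly and independently from the N available characters.

```python
import math


def string_entropy_bits(length: int, alphabet_size: int) -> float:
    """H = L * log2(N) for L characters drawn uniformly and independently
    from an alphabet of N distinct characters."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size must be >= 1")
    return length * math.log2(alphabet_size)


# Die used in the usual way: N = 6.
for length in (1, 2, 10):
    print(length, round(string_entropy_bits(length, 6), 2))
```

With N held at 6, H grows linearly in L, so letting L vary only rescales the constant log_2(6).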
 
  • #38
SW VandeCarr said:
These are your definitions of N and H. (Actually N is just the number of characters if you are defining N as a random variable.) Usually these are constants. You're making N (and therefore H) variables. You can do this, but you can't change the number of faces on the die. If you're letting the die determine the value of N, fine, but the character set you are actually then using is some unspecified set of sets ranging from 1 to 6 characters. You need not specify the characters other than they are each unique (no repeats within any of the six subsets: {{a},{a,b},...,{a,b,c,d,e,f}}). Like I said, this is pretty esoteric.

If you were using the die in the usual way it's quite straightforward: H=(L)log_2(N)=log_2(6)=2.58 when L=1. If you want L to also be a variable then H=L(2.58) in this case.

I am defining N as being the length of a set of characters where the length of the set is random (hence rolling a die to determine the number of sides on a die to roll), but I think we are both agreeing on the same thing. As I have said, I have no formal training in any of this and perhaps N should be defined as a process and not a variable (if there is a difference [details for another thread]?).

The bottom line is that I now know what the log (or other functions) of discrete random variables are and that will be all from me.

Thank you all again for your help.
 
