# Probability of 0 bit in ASCII text files

by Cylab
Tags: ascii, files, probability, text
P: 54
 Quote by haruspex That's all true, but I'm not sure where it gets you. What are you trying to solve here? Are we done with the N≤8 case?
How about investigation of N > 8 case?
Just cant quiet figure out how N=8A+B works.
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab How about investigation of N > 8 case? Just cant quiet figure out how N=8A+B works.
N=8A+B, B<8, means there are A whole bytes and B odd bits, so either A or A+1 MSBs included.
See what gaps you can fill in here:
Prob that this includes A+1 MSB's = ....?; if A+1 MSBs, prob that all N bits are 0 is ...?
Prob that this includes A MSB's = ....?; if A MSBs, prob that all N bits are 0 is ...?
Adding this up, prob that all N bits are 0 is ...?
P: 54
 Quote by haruspex N=8A+B, B<8, means there are A whole bytes and B odd bits, so either A or A+1 MSBs included. See what gaps you can fill in here: Prob that this includes A+1 MSB's = ....?; if A+1 MSBs, prob that all N bits are 0 is ...? Prob that this includes A MSB's = ....?; if A MSBs, prob that all N bits are 0 is ...? Adding this up, prob that all N bits are 0 is ...?
 Quote by haruspex if A+1 MSBs, prob that all N bits are 0 is ...?
That is not easy for me Sir.
Analysis: Since the N bits are drawn consecutively from ASCII, there is only 1 character (out of 2^256), which is are all 0. So only 1 MSB. Thus, the Prob =1/2^256.
Others seem to follow the conception , or I misunderstood your point?

Question:
1) Pr[0] in ASCII (assume each character appears with same ratio) equals = 1/8+1/2=5/8. Is it OK?
2) Successive 7 bits are drawn at random from ASCII bits (e.g. no bias of character distribution), what is Pr[0] in the 7 bits?
Successive 4 bits are drawn (same condition with above), what is Pr[0] in the 4 bits?
So, say, Successive N bits are drawn (same condition) , what is Pr[0] in the N bits?
Analysis:
Do you think it is same case? remember you explained that N/8*2^-N + (1-N/8)*2^-(N+1) . Does the formula apply to the case of 2).

Shed some lights on please.
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab Analysis: Since the N bits are drawn consecutively from ASCII, there is only 1 character (out of 2^256), which is are all 0. So only 1 MSB. Thus, the Prob =1/2^256.
The N bits might start in the middle of one byte, span several whole bytes, and finish part way through the last. For each whole byte, prob of all zeroes is 1/128 (since MSB always zero).
If the N bits include A MSBs then how many non-MSBs do they include?
What is the prob that the A MSBs are all 0?
What is the prob that the non-MSBs are all 0?
So what is the prob that all N bits are 0?
 Question: 1) Pr[0] in ASCII (assume each character appears with same ratio) equals = 1/8+1/2=5/8. Is it OK?
No. 1/8 that bit is MSB, so 7/8 that it is non-MSB. P[0] = 1/8 + 7/8*1/2 = 9/16. We went through that much earlier in the thread.
 2) Successive 7 bits are drawn at random from ASCII bits (e.g. no bias of character distribution), what is Pr[0] in the 7 bits? Successive 4 bits are drawn (same condition with above), what is Pr[0] in the 4 bits? So, say, Successive N bits are drawn (same condition) , what is Pr[0] in the N bits? Analysis: Do you think it is same case? remember you explained that N/8*2^-N + (1-N/8)*2^-(N+1) . Does the formula apply to the case of 2).
I believe you are misquoting the formula. In post #27 I explained that the prob of N consecutive 0 bits (N≤8) is 2-N(1+N/8).
P: 54
 Quote by haruspex I believe you are misquoting the formula. In post #27 I explained that the prob of N consecutive 0 bits (N≤8) is 2-N(1+N/8).
That is right and it was good explanation.
Now, say, two N consecutive bits are taken (or two groups) (N1=7, and N2=4).
So the prob is different in following cases in comparing of same amount of 0`s?
1st case: (in N1=7) Pr[0], Pr[00],..,Pr[0000] =?
2nd case: (in N2=4) Pr[0], Pr[00],..,Pr[0000] =?
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab 1st case: (in N1=7) Pr[0], Pr[00],..,Pr[0000] =? 2nd case: (in N2=4) Pr[0], Pr[00],..,Pr[0000] =?
I don't understand your question. What does Pr[0] mean in the context of N1=7? Is it the probability that the next bit is zero given the preceding 7 were?
P: 54
 Quote by haruspex I don't understand your question. What does Pr[0] mean in the context of N1=7? Is it the probability that the next bit is zero given the preceding 7 were?
Sorry!
I meant the prob of 0 within N1 =7 consecutively drawn from ASCII .
In other words, say, now we have a group of bits consists of many N1, each of which is consecutive 7 bits drawn from ASCII. what is Prob[0], Prob[00] in the group respectively?

2nd case. another group of bits with same condition, where N2=4. What is Prob[0], Prob[00] within the second group respectively?

Is (N1)Prob[0] = (N2)Prob[0] right, or should it be unequal?
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab Is (N1)Prob[0] = (N2)Prob[0] right,
Of course. The bits don't know how many others were chosen.
P: 54
 Quote by haruspex Of course. The bits don't know how many others were chosen.
So you are saying following are correct?

(N1=7)Prob[0] = (N2=4)Prob[0]
(N1=7)Prob[00] = (N2=4)Prob[00]
(N1=7)Prob[000] = (N2=4)Prob[000]
....
 Homework Sci Advisor HW Helper Thanks P: 9,863 To be completely clear: If you choose N consecutive bits, the probability that the first R of those bits are all zero (R <= N), depends only on R. It cannot depend on N. Further, if you choose N consecutive bits, then choose R consecutive bits from those N, the probability that the first R of those bits are all zero depends only on R. (This seems so obvious that I worry that I have not understood the question.)
P: 54
 Quote by haruspex To be completely clear: If you choose N consecutive bits, the probability that the first R of those bits are all zero (R <= N), depends only on R. It cannot depend on N. Further, if you choose N consecutive bits, then choose R consecutive bits from those N, the probability that the first R of those bits are all zero depends only on R. (This seems so obvious that I worry that I have not understood the question.)
case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII.
take R=2 bits from N1 & N2 respectively, what is prob that they are two 0 bits.

1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / xC4.

Seems it depends on N too.
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab 1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
I have no idea what that notation means.
P: 54
 Quote by haruspex I have no idea what that notation means.
Hypergeometric Distribution.

X: number of ASCII bits , from which N is taken.
Pr[0] = 9/16.

xC7 : The number of combinations of x , taken 7 at a time.
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab Hypergeometric Distribution. X: number of ASCII bits , from which N is taken. Pr[0] = 9/16. xC7 : The number of combinations of x , taken 7 at a time.
So how do I read (9X/16)C2? If I plug in X=7, that gives (63/16)C2, which is meaningless.
P: 54
 Quote by haruspex So how do I read (9X/16)C2? If I plug in X=7, that gives (63/16)C2, which is meaningless.
C: combinations
Pr[0] = 9/16.
X: number of ASCII bits , from which N is taken.
case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII.
Other definitions should be clear
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab C: combinations Pr[0] = 9/16. X: number of ASCII bits , from which N is taken. case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII. Other definitions should be clear
You wrote (9X/16)C2, and you have still offered no reasonable explanation for that notation. Did you mean (9/16)XC2?
P: 54
 Quote by haruspex You wrote (9X/16)C2, and you have still offered no reasonable explanation for that notation. Did you mean (9/16)XC2?
X: The number of bits in ASCII.
9X/16: The number of 0 bits in the X that are classified as successes.
7 or 4: The number(s) of bits taken consecutively from X.
2: The number of 2 zeros in the 7 or 4 that are classified as successes.
(9X/16)C2 : The number of combinations of 9X/16, taken two 0 bits at a time.
Homework
HW Helper
Thanks
P: 9,863
 Quote by Cylab X: The number of bits in ASCII. 9X/16: The number of 0 bits in the X that are classified as successes. 7 or 4: The number(s) of bits taken consecutively from X. 2: The number of 2 zeros in the 7 or 4 that are classified as successes. (9X/16)C2 : The number of combinations of 9X/16, taken two 0 bits at a time.
Now that you have explained that, thankyou, I can see where it is wrong.
For one thing, that analysis treats all bits as independently 0 or 1, regardless of their proximity to each other. Bits multiples of 8 positions apart will be positively correlated, and at other distances negatively correlated.
More significantly, let's look at what these represent:
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / xC4.
The first is the probability of picking 7 bits that are exactly two 0 bits and 5 1 bits; the second is the prob of picking 4 bits that are exactly 2 and 2. No wonder they're different! In the problem I thought we were discussing, P[00] doesn't care what the remaining 2 or 5 bits are.

 Related Discussions Calculators 2 Programming & Computer Science 5 Programming & Computer Science 0 Programming & Computer Science 1 Computing & Technology 1