## Probability of 0 bit in ASCII text files

 Quote by haruspex How about you complete that and post the formula you get? A and B are integers, 0 ≤ B < 8, N=8A+B. That completely defines A and B. A = integer part of N/8; B = N modulo 8.
OK! But the analysis concentrates on MSB only (say, N=7); so the formula maybe like;
P(MSB)=P(MSB |H0)P(H0) + P(MSB |H1)P(H1)
(Let Hi be the event that there are i MSB bits in N, for i = 0, 1, 2, 3….. )
where P(MSB |H0) stands for conditional probability of MSB bit in N given it is H0 which equals 0 (no MSB) and P(MSB |H1) stands for conditional probability of MSB bit in the N given it is H1 which equals 1/7 (one MSB);

But it is little different from the point of N=8A+B. right?
I could not figure out the computation.

Recognitions:
Homework Help
 Quote by Cylab the analysis concentrates on MSB only (say, N=7); so the formula maybe like; P(MSB)=P(MSB |H0)P(H0) + P(MSB |H1)P(H1) (Let Hi be the event that there are i MSB bits in N, for i = 0, 1, 2, 3….. ) where P(MSB |H0) stands for conditional probability of MSB bit in N given it is H0 which equals 0 (no MSB) and P(MSB |H1) stands for conditional probability of MSB bit in the N given it is H1 which equals 1/7 (one MSB);
That's all true, but I'm not sure where it gets you. What are you trying to solve here?
Are we done with the N≤8 case?

 Quote by haruspex That's all true, but I'm not sure where it gets you. What are you trying to solve here? Are we done with the N≤8 case?
How about investigation of N > 8 case?
Just cant quiet figure out how N=8A+B works.

Recognitions:
Homework Help
 Quote by Cylab How about investigation of N > 8 case? Just cant quiet figure out how N=8A+B works.
N=8A+B, B<8, means there are A whole bytes and B odd bits, so either A or A+1 MSBs included.
See what gaps you can fill in here:
Prob that this includes A+1 MSB's = ....?; if A+1 MSBs, prob that all N bits are 0 is ...?
Prob that this includes A MSB's = ....?; if A MSBs, prob that all N bits are 0 is ...?
Adding this up, prob that all N bits are 0 is ...?

 Quote by haruspex N=8A+B, B<8, means there are A whole bytes and B odd bits, so either A or A+1 MSBs included. See what gaps you can fill in here: Prob that this includes A+1 MSB's = ....?; if A+1 MSBs, prob that all N bits are 0 is ...? Prob that this includes A MSB's = ....?; if A MSBs, prob that all N bits are 0 is ...? Adding this up, prob that all N bits are 0 is ...?
 Quote by haruspex if A+1 MSBs, prob that all N bits are 0 is ...?
That is not easy for me Sir.
Analysis: Since the N bits are drawn consecutively from ASCII, there is only 1 character (out of 2^256), which is are all 0. So only 1 MSB. Thus, the Prob =1/2^256.
Others seem to follow the conception , or I misunderstood your point?

Question:
1) Pr[0] in ASCII (assume each character appears with same ratio) equals = 1/8+1/2=5/8. Is it OK?
2) Successive 7 bits are drawn at random from ASCII bits (e.g. no bias of character distribution), what is Pr[0] in the 7 bits?
Successive 4 bits are drawn (same condition with above), what is Pr[0] in the 4 bits?
So, say, Successive N bits are drawn (same condition) , what is Pr[0] in the N bits?
Analysis:
Do you think it is same case? remember you explained that N/8*2^-N + (1-N/8)*2^-(N+1) . Does the formula apply to the case of 2).

Recognitions:
Homework Help
 Quote by Cylab Analysis: Since the N bits are drawn consecutively from ASCII, there is only 1 character (out of 2^256), which is are all 0. So only 1 MSB. Thus, the Prob =1/2^256.
The N bits might start in the middle of one byte, span several whole bytes, and finish part way through the last. For each whole byte, prob of all zeroes is 1/128 (since MSB always zero).
If the N bits include A MSBs then how many non-MSBs do they include?
What is the prob that the A MSBs are all 0?
What is the prob that the non-MSBs are all 0?
So what is the prob that all N bits are 0?
 Question: 1) Pr[0] in ASCII (assume each character appears with same ratio) equals = 1/8+1/2=5/8. Is it OK?
No. 1/8 that bit is MSB, so 7/8 that it is non-MSB. P[0] = 1/8 + 7/8*1/2 = 9/16. We went through that much earlier in the thread.
 2) Successive 7 bits are drawn at random from ASCII bits (e.g. no bias of character distribution), what is Pr[0] in the 7 bits? Successive 4 bits are drawn (same condition with above), what is Pr[0] in the 4 bits? So, say, Successive N bits are drawn (same condition) , what is Pr[0] in the N bits? Analysis: Do you think it is same case? remember you explained that N/8*2^-N + (1-N/8)*2^-(N+1) . Does the formula apply to the case of 2).
I believe you are misquoting the formula. In post #27 I explained that the prob of N consecutive 0 bits (N≤8) is 2-N(1+N/8).

 Quote by haruspex I believe you are misquoting the formula. In post #27 I explained that the prob of N consecutive 0 bits (N≤8) is 2-N(1+N/8).
That is right and it was good explanation.
Now, say, two N consecutive bits are taken (or two groups) (N1=7, and N2=4).
So the prob is different in following cases in comparing of same amount of 0`s?
1st case: (in N1=7) Pr[0], Pr[00],..,Pr[0000] =?
2nd case: (in N2=4) Pr[0], Pr[00],..,Pr[0000] =?

Recognitions:
Homework Help
 Quote by Cylab 1st case: (in N1=7) Pr[0], Pr[00],..,Pr[0000] =? 2nd case: (in N2=4) Pr[0], Pr[00],..,Pr[0000] =?
I don't understand your question. What does Pr[0] mean in the context of N1=7? Is it the probability that the next bit is zero given the preceding 7 were?

 Quote by haruspex I don't understand your question. What does Pr[0] mean in the context of N1=7? Is it the probability that the next bit is zero given the preceding 7 were?
Sorry!
I meant the prob of 0 within N1 =7 consecutively drawn from ASCII .
In other words, say, now we have a group of bits consists of many N1, each of which is consecutive 7 bits drawn from ASCII. what is Prob[0], Prob[00] in the group respectively?

2nd case. another group of bits with same condition, where N2=4. What is Prob[0], Prob[00] within the second group respectively?

Is (N1)Prob[0] = (N2)Prob[0] right, or should it be unequal?

Recognitions:
Homework Help
 Quote by Cylab Is (N1)Prob[0] = (N2)Prob[0] right,
Of course. The bits don't know how many others were chosen.

 Quote by haruspex Of course. The bits don't know how many others were chosen.
So you are saying following are correct?

(N1=7)Prob[0] = (N2=4)Prob[0]
(N1=7)Prob[00] = (N2=4)Prob[00]
(N1=7)Prob[000] = (N2=4)Prob[000]
....
 Recognitions: Homework Help Science Advisor To be completely clear: If you choose N consecutive bits, the probability that the first R of those bits are all zero (R <= N), depends only on R. It cannot depend on N. Further, if you choose N consecutive bits, then choose R consecutive bits from those N, the probability that the first R of those bits are all zero depends only on R. (This seems so obvious that I worry that I have not understood the question.)

 Quote by haruspex To be completely clear: If you choose N consecutive bits, the probability that the first R of those bits are all zero (R <= N), depends only on R. It cannot depend on N. Further, if you choose N consecutive bits, then choose R consecutive bits from those N, the probability that the first R of those bits are all zero depends only on R. (This seems so obvious that I worry that I have not understood the question.)
case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII.
take R=2 bits from N1 & N2 respectively, what is prob that they are two 0 bits.

1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / xC4.

Seems it depends on N too.

Recognitions:
Homework Help
 Quote by Cylab 1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
I have no idea what that notation means.

 Quote by haruspex I have no idea what that notation means.
Hypergeometric Distribution.

X: number of ASCII bits , from which N is taken.
Pr[0] = 9/16.

xC7 : The number of combinations of x , taken 7 at a time.

Recognitions:
Homework Help