Register to reply

Probability of 0 bit in ASCII text files

by Cylab
Tags: ascii, files, probability, text
Share this thread:
Cylab
#37
Nov26-12, 12:20 AM
P: 54
Quote Quote by haruspex View Post
That's all true, but I'm not sure where it gets you. What are you trying to solve here?
Are we done with the N≤8 case?
How about investigation of N > 8 case?
Just can`t quiet figure out how N=8A+B works.
haruspex
#38
Nov26-12, 02:34 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
How about investigation of N > 8 case?
Just can`t quiet figure out how N=8A+B works.
N=8A+B, B<8, means there are A whole bytes and B odd bits, so either A or A+1 MSBs included.
See what gaps you can fill in here:
Prob that this includes A+1 MSB's = ....?; if A+1 MSBs, prob that all N bits are 0 is ...?
Prob that this includes A MSB's = ....?; if A MSBs, prob that all N bits are 0 is ...?
Adding this up, prob that all N bits are 0 is ...?
Cylab
#39
Nov27-12, 05:48 AM
P: 54
Quote Quote by haruspex View Post
N=8A+B, B<8, means there are A whole bytes and B odd bits, so either A or A+1 MSBs included.
See what gaps you can fill in here:
Prob that this includes A+1 MSB's = ....?; if A+1 MSBs, prob that all N bits are 0 is ...?
Prob that this includes A MSB's = ....?; if A MSBs, prob that all N bits are 0 is ...?
Adding this up, prob that all N bits are 0 is ...?
Quote Quote by haruspex View Post
if A+1 MSBs, prob that all N bits are 0 is ...?
That is not easy for me Sir.
Analysis: Since the N bits are drawn consecutively from ASCII, there is only 1 character (out of 2^256), which is are all 0. So only 1 MSB. Thus, the Prob =1/2^256.
Others seem to follow the conception , or I misunderstood your point?

Question:
1) Pr[0] in ASCII (assume each character appears with same ratio) equals = 1/8+1/2=5/8. Is it OK?
2) Successive 7 bits are drawn at random from ASCII bits (e.g. no bias of character distribution), what is Pr[0] in the 7 bits?
Successive 4 bits are drawn (same condition with above), what is Pr[0] in the 4 bits?
So, say, Successive N bits are drawn (same condition) , what is Pr[0] in the N bits?
Analysis:
Do you think it is same case? remember you explained that N/8*2^-N + (1-N/8)*2^-(N+1) . Does the formula apply to the case of 2).

Shed some lights on please.
haruspex
#40
Nov27-12, 02:26 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
Analysis: Since the N bits are drawn consecutively from ASCII, there is only 1 character (out of 2^256), which is are all 0. So only 1 MSB. Thus, the Prob =1/2^256.
The N bits might start in the middle of one byte, span several whole bytes, and finish part way through the last. For each whole byte, prob of all zeroes is 1/128 (since MSB always zero).
If the N bits include A MSBs then how many non-MSBs do they include?
What is the prob that the A MSBs are all 0?
What is the prob that the non-MSBs are all 0?
So what is the prob that all N bits are 0?
Question:
1) Pr[0] in ASCII (assume each character appears with same ratio) equals = 1/8+1/2=5/8. Is it OK?
No. 1/8 that bit is MSB, so 7/8 that it is non-MSB. P[0] = 1/8 + 7/8*1/2 = 9/16. We went through that much earlier in the thread.
2) Successive 7 bits are drawn at random from ASCII bits (e.g. no bias of character distribution), what is Pr[0] in the 7 bits?
Successive 4 bits are drawn (same condition with above), what is Pr[0] in the 4 bits?
So, say, Successive N bits are drawn (same condition) , what is Pr[0] in the N bits?
Analysis:
Do you think it is same case? remember you explained that N/8*2^-N + (1-N/8)*2^-(N+1) . Does the formula apply to the case of 2).
I believe you are misquoting the formula. In post #27 I explained that the prob of N consecutive 0 bits (N≤8) is 2-N(1+N/8).
Cylab
#41
Dec1-12, 12:55 AM
P: 54
Quote Quote by haruspex View Post
I believe you are misquoting the formula. In post #27 I explained that the prob of N consecutive 0 bits (N≤8) is 2-N(1+N/8).
That is right and it was good explanation.
Now, say, two N consecutive bits are taken (or two groups) (N1=7, and N2=4).
So the prob is different in following cases in comparing of same amount of 0`s?
1st case: (in N1=7) Pr[0], Pr[00],..,Pr[0000] =?
2nd case: (in N2=4) Pr[0], Pr[00],..,Pr[0000] =?
haruspex
#42
Dec1-12, 02:17 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
1st case: (in N1=7) Pr[0], Pr[00],..,Pr[0000] =?
2nd case: (in N2=4) Pr[0], Pr[00],..,Pr[0000] =?
I don't understand your question. What does Pr[0] mean in the context of N1=7? Is it the probability that the next bit is zero given the preceding 7 were?
Cylab
#43
Dec1-12, 10:15 PM
P: 54
Quote Quote by haruspex View Post
I don't understand your question. What does Pr[0] mean in the context of N1=7? Is it the probability that the next bit is zero given the preceding 7 were?
Sorry!
I meant the prob of 0 within N1 =7 consecutively drawn from ASCII .
In other words, say, now we have a group of bits consists of many N1, each of which is consecutive 7 bits drawn from ASCII. what is Prob[0], Prob[00] in the group respectively?

2nd case. another group of bits with same condition, where N2=4. What is Prob[0], Prob[00] within the second group respectively?

Is (N1)Prob[0] = (N2)Prob[0] right, or should it be unequal?
haruspex
#44
Dec2-12, 12:18 AM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
Is (N1)Prob[0] = (N2)Prob[0] right,
Of course. The bits don't know how many others were chosen.
Cylab
#45
Dec2-12, 12:34 AM
P: 54
Quote Quote by haruspex View Post
Of course. The bits don't know how many others were chosen.
So you are saying following are correct?

(N1=7)Prob[0] = (N2=4)Prob[0]
(N1=7)Prob[00] = (N2=4)Prob[00]
(N1=7)Prob[000] = (N2=4)Prob[000]
....
haruspex
#46
Dec2-12, 02:10 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
To be completely clear:
If you choose N consecutive bits, the probability that the first R of those bits are all zero (R <= N), depends only on R. It cannot depend on N.
Further, if you choose N consecutive bits, then choose R consecutive bits from those N, the probability that the first R of those bits are all zero depends only on R.
(This seems so obvious that I worry that I have not understood the question.)
Cylab
#47
Dec2-12, 08:40 PM
P: 54
Quote Quote by haruspex View Post
To be completely clear:
If you choose N consecutive bits, the probability that the first R of those bits are all zero (R <= N), depends only on R. It cannot depend on N.
Further, if you choose N consecutive bits, then choose R consecutive bits from those N, the probability that the first R of those bits are all zero depends only on R.
(This seems so obvious that I worry that I have not understood the question.)
case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII.
take R=2 bits from N1 & N2 respectively, what is prob that they are two 0 bits.

1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / xC4.

Seems it depends on N too.
haruspex
#48
Dec2-12, 08:52 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
I have no idea what that notation means.
Cylab
#49
Dec2-12, 09:03 PM
P: 54
Quote Quote by haruspex View Post
I have no idea what that notation means.
Hypergeometric Distribution.

X: number of ASCII bits , from which N is taken.
Pr[0] = 9/16.

xC7 : The number of combinations of x , taken 7 at a time.
haruspex
#50
Dec2-12, 09:11 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
Hypergeometric Distribution.

X: number of ASCII bits , from which N is taken.
Pr[0] = 9/16.

xC7 : The number of combinations of x , taken 7 at a time.
So how do I read (9X/16)C2? If I plug in X=7, that gives (63/16)C2, which is meaningless.
Cylab
#51
Dec2-12, 09:54 PM
P: 54
Quote Quote by haruspex View Post
So how do I read (9X/16)C2? If I plug in X=7, that gives (63/16)C2, which is meaningless.
C: combinations
Pr[0] = 9/16.
X: number of ASCII bits , from which N is taken.
case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII.
Other definitions should be clear
haruspex
#52
Dec2-12, 10:08 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
C: combinations
Pr[0] = 9/16.
X: number of ASCII bits , from which N is taken.
case: N1=7 & N2=4 . Assume N is taken from X bits, which is ASCII.
Other definitions should be clear
You wrote (9X/16)C2, and you have still offered no reasonable explanation for that notation. Did you mean (9/16)XC2?
Cylab
#53
Dec3-12, 05:46 AM
P: 54
Quote Quote by haruspex View Post
You wrote (9X/16)C2, and you have still offered no reasonable explanation for that notation. Did you mean (9/16)XC2?
X: The number of bits in ASCII.
9X/16: The number of 0 bits in the X that are classified as successes.
7 or 4: The number(s) of bits taken consecutively from X.
2: The number of 2 zeros in the 7 or 4 that are classified as successes.
(9X/16)C2 : The number of combinations of 9X/16, taken two 0 bits at a time.
haruspex
#54
Dec3-12, 01:23 PM
Homework
Sci Advisor
HW Helper
Thanks
P: 9,939
Quote Quote by Cylab View Post
X: The number of bits in ASCII.
9X/16: The number of 0 bits in the X that are classified as successes.
7 or 4: The number(s) of bits taken consecutively from X.
2: The number of 2 zeros in the 7 or 4 that are classified as successes.
(9X/16)C2 : The number of combinations of 9X/16, taken two 0 bits at a time.
Now that you have explained that, thankyou, I can see where it is wrong.
For one thing, that analysis treats all bits as independently 0 or 1, regardless of their proximity to each other. Bits multiples of 8 positions apart will be positively correlated, and at other distances negatively correlated.
More significantly, let's look at what these represent:
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / xC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / xC4.
The first is the probability of picking 7 bits that are exactly two 0 bits and 5 1 bits; the second is the prob of picking 4 bits that are exactly 2 and 2. No wonder they're different! In the problem I thought we were discussing, P[00] doesn't care what the remaining 2 or 5 bits are.


Register to reply

Related Discussions
How read text files on CASIO fx-9860GII SD? Calculators 2
Several ascii files, read, write, average? Programming & Computer Science 5
Read multiple binary files to ascii Programming & Computer Science 0
Fortran 90 question about reading files with text Programming & Computer Science 1
Indexing a lot of text/pdf files Computing & Technology 1