Probability of 0 bit in ASCII text files

  • #51
haruspex said:
So how do I read (9X/16)C2? If I plug in X=7, that gives (63/16)C2, which is meaningless.

C: combinations
Pr[0] = 9/16.
X: number of ASCII bits, from which N is taken.
Case: N1=7 and N2=4. Assume N is taken from X bits, which are ASCII.
Other definitions should be clear
 
  • #52
Cylab said:
C: combinations
Pr[0] = 9/16.
X: number of ASCII bits, from which N is taken.
Case: N1=7 and N2=4. Assume N is taken from X bits, which are ASCII.
Other definitions should be clear
You wrote (9X/16)C2, and you have still offered no reasonable explanation for that notation. Did you mean (9/16)XC2?
 
  • #53
haruspex said:
You wrote (9X/16)C2, and you have still offered no reasonable explanation for that notation. Did you mean (9/16)XC2?

X: The number of bits in ASCII.
9X/16: The number of 0 bits in the X that are classified as successes.
7 or 4: The number(s) of bits taken consecutively from X.
2: The number of 0 bits in the 7 or 4 that are classified as successes.
(9X/16)C2 : The number of combinations of 9X/16, taken two 0 bits at a time.
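Written out with a binomial coefficient, and assuming X is a multiple of 16 so that 9X/16 is a whole number (an assumption added here, since the notation is only meaningful in that case), the last definition reads:

$$
\binom{9X/16}{2} = \frac{\tfrac{9X}{16}\left(\tfrac{9X}{16} - 1\right)}{2}
$$

For example, X = 16 gives 9 zero bits and (9)C2 = 36 ways to choose two of them.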
 
  • #54
Cylab said:
X: The number of bits in ASCII.
9X/16: The number of 0 bits in the X that are classified as successes.
7 or 4: The number(s) of bits taken consecutively from X.
2: The number of 0 bits in the 7 or 4 that are classified as successes.
(9X/16)C2 : The number of combinations of 9X/16, taken two 0 bits at a time.
Now that you have explained that, thank you, I can see where it is wrong.
For one thing, that analysis treats all bits as independently 0 or 1, regardless of their proximity to each other. Bits that are a multiple of 8 positions apart will be positively correlated, and bits at other distances negatively correlated.
More significantly, let's look at what these represent:
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / XC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / XC4.
The first is the probability of picking 7 bits that are exactly two 0 bits and five 1 bits; the second is the probability of picking 4 bits that are exactly two 0 bits and two 1 bits. No wonder they're different! In the problem I thought we were discussing, P[00] doesn't care what the remaining 2 or 5 bits are.
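To make the distinction concrete, here is a minimal Python sketch. It assumes X = 16 (so that 9X/16 = 9 is a whole number) and draws without replacement from a pool of bits, as the formulas above do; the function names are made up for illustration, and the sketch ignores the correlation between bit positions mentioned earlier. It compares "exactly two 0 bits among the n drawn" with "two given bits of the n are both 0, whatever the rest are".

```python
from fractions import Fraction
from math import comb


def prob_exactly_k_zeros(total_bits, drawn, k):
    """P(exactly k zeros among `drawn` bits taken without replacement).

    Assumes 9/16 of the `total_bits` are 0 (total_bits a multiple of 16).
    """
    zeros = 9 * total_bits // 16
    ones = total_bits - zeros
    return Fraction(comb(zeros, k) * comb(ones, drawn - k),
                    comb(total_bits, drawn))


def prob_two_given_bits_zero(total_bits, drawn):
    """P(two particular bits of the `drawn` bits are both 0), rest ignored.

    Sums over every possible zero count k, weighting by the chance that
    the two chosen positions fall on zeros.
    """
    return sum(
        prob_exactly_k_zeros(total_bits, drawn, k)
        * Fraction(comb(k, 2), comb(drawn, 2))
        for k in range(drawn + 1)
    )


X = 16  # assumed pool size, chosen so that 9X/16 is an integer
for n in (7, 4):
    print(n, prob_exactly_k_zeros(X, n, 2), prob_two_given_bits_zero(X, n))
```

Under these assumptions the "exactly two zeros" value changes with n, while the "two given bits are both 0" value is the same for n = 7 and n = 4 and equals (9X/16)C2 / XC2.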
 
  • #55
haruspex said:
Now that you have explained that, thank you, I can see where it is wrong.
For one thing, that analysis treats all bits as independently 0 or 1, regardless of their proximity to each other. Bits that are a multiple of 8 positions apart will be positively correlated, and bits at other distances negatively correlated.
More significantly, let's look at what these represent:
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / XC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / XC4.
The first is the probability of picking 7 bits that are exactly two 0 bits and five 1 bits; the second is the probability of picking 4 bits that are exactly two 0 bits and two 1 bits. No wonder they're different! In the problem I thought we were discussing, P[00] doesn't care what the remaining 2 or 5 bits are.

You are right!
P[00] doesn't care what the remaining 2 or 5 bits are.
The same goes for the calculation in the following two cases, which give P[00] for bits taken from N1 and N2 respectively, regardless of the other contents of N1 and N2.
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / XC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / XC4.
 
  • #56
Cylab said:
The same goes for the calculation in the following two cases, which give P[00] for bits taken from N1 and N2 respectively, regardless of the other contents of N1 and N2.
1st. (N1 case) : {(9X/16)C2 * (7X/16)C5 } / XC7.
1st. (N2 case) : {(9X/16)C2 * (7X/16)C2 } / XC4.
Once again, I'm not at all sure what you are saying. Are you insisting that the above formulae are correct for P[00]? I have just explained to you why they are not.
 
  • #57
haruspex said:
Once again, I'm not at all sure what you are saying. Are you insisting that the above formulae are correct for P[00]? I have just explained to you why they are not.

Just focusing on your points.

The following link may help clarify the analysis discussed so far.
http://en.wikipedia.org/wiki/Hypergeometric_distribution
 