Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Probability of 0 bit in ASCII text files

  1. Oct 6, 2012 #1
    text files consists of 0&1 bits. But in ASCII, Most Significant Bit(MSB) is 0, so we have more 0s than 1s. Assume that we draw bits at random. Question: What is probability of drawing 0 in 2nd time, given 1st drawing was 0?

    My Analysis:
    From bayes formula, Pr(0) = 1/8*1 + 7/8*1/2 = 9/16, so Pr(1)=7/16 for first drawing of a bit.
    Then, it should be like: Pr(0&0) = Pr(0) * Pr(0|0), but I can`t figure out the Pr(0|0)...
    Please help!!
     
  2. jcsd
  3. Oct 6, 2012 #2
    Are the draws independent? Are we drawing with replacement?
     
  4. Oct 6, 2012 #3

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    Even if drawing with replacement, the result of the first draw adds information regarding the file. While it may be ok to assume that non-leading bits are in general equally likely 0 or 1, it will not be true of a given file. Consider a file consisting of a single character. Initially, we know the leading bit is zero and the others equally likely 0 or 1. If first draw returns a 0, what is the probability that it was the leading zero? If it was not, how does that affect the probability of drawing a zero again if first was replaced? What if not replaced?
     
  5. Oct 6, 2012 #4
    (sorry) it is without replacement. (this is part of research) calculation should be based on the description, that is, randomly distributed bits are given, with equal probability of 0 and 1, plus Pr(MSB=0)=1/8(each byte has a MSB=0). So the probability of 0 as whole is 9/16. Question is what is probability of drawing 0 in 2nd time, given 1st time was 0?
     
  6. Oct 7, 2012 #5

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    OK. If the file is large then clearly the replacement or otherwise has negligible effect, so the result of the first draw tells you nothing about the second.
    For a more modest file, N bytes say, can you calculate the probability that the first bit drawn was a MSB?
     
  7. Oct 7, 2012 #6
    To consult (please correct it if it is wrong, I am doubtful for the second analysis):
    The probability of being MSB for first bit drawn;
    1) Pr(MSB) = 1/8 (since only 1 MSB in each byte, and characters are stored as byte).
    The probability of being 0 for first bit drawn;
    2) Pr(0) = 1/8*1 + 7/8*1/2 (due to Bayes formula, if drawn bit is MSB, it is 0 with 100%, if it is NOT MSB, then it is 0 with 50%).
     
  8. Oct 7, 2012 #7

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    No, you're not understanding my question. Given that the first bit drawn was a 0, what is the prob that it was an MSB?
     
  9. Oct 7, 2012 #8
    So there are more questions come, I wish I will have answers at least one of them..Please share your thoughts.
     
  10. Oct 7, 2012 #9

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    At first, there are 8 equally likely possibilities, drawing bit 0 (MSB)through to bit 7.
    That splits into 15: a double shot at a 0 from bit 0, then 7 equal for a 0 bit from bits 1 to 7, and 7 equal for a 1 bit from bits 1 to 7. Since it turned out to be a 0, you rule out those last 7. That means the prob that it was an MSB is 2/9.
    If there are N bytes altogether, there are N MSBs. 2/9 of the time you removed one of these in the first selection, leaving only N-1 MSBs. The other 7/9 times you removed a non-msb.
    Can you finish it from there?
     
  11. Oct 7, 2012 #10
    Sorry, I am afraid I can`t!! Will you please shed more light.
    Reading the description several times, still i find myself unable to follow the logic.
    (btw, just in case, with initial requirement, when we draw bits, we don`t know where each byte starts, since these bits(Pr(0)=Pr(1)=1/2, plus Pr(MSB=0)=1/8) are randomly distributed.
     
  12. Oct 7, 2012 #11
    By the way, any idea about the Pr(0)=? of these ASCII bits??
    Sorry taking so much of your time.
     
  13. Oct 7, 2012 #12

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    With prob 2/9, after getting a 0 bit there are N-1 MSBs and 7N non-MSBs remaining. So prob of drawing a 0 next would be ((N-1)+7N/2)/(8N-1). With prob 7/9 there are N MSBs and 7N-1 non-MSBs remaining, in which case prob of drawing a 0 next would be (N+(7N-1)/2)/(8N-1). Total prob = (2*(N-1)+7N + 7N+7(7N-1)/2)/(9(8N-1)). I leave it to you to simplify that.
     
  14. Oct 8, 2012 #13
    Will you please clarify how prob 2/9 is calculated?
    From my understanding is that if there are 15 bits(sequence of bit strings), then
    there are 1 or 2 MSB in the strings. So little confused about 2/9.
     
  15. Oct 8, 2012 #14

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    In the first draw, you had 2/16 chance of a 0 from an MSB, 7/16 chance of a 0 from a non-MSB, and 7/16 chance of a 1 from a non-MSB. Since you got a 0, you can eliminate the third of these. That leaves a probability 'weight' of 2 for an MSB compared with 7 for a non-MSB, so the odds that it was in fact an MSB are 2/9.
    Note, I'm assuming a whole number of bytes in the file.
     
  16. Oct 8, 2012 #15
    What is the "the third of these"?
    What is the "'weight' of 2"?
    what is "odds"?
     
  17. Oct 8, 2012 #16

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    The "1 from a non-MSB" case.
    By 'weight' I just mean a share of the remaining probability space.
    Initially there are 8 equally likely possibilities corresponding to the 8 positions within a byte.
    The MSB always gives a 0, while the other 7 can each give a 0 or a 1. So we can divide the total probability 15 ways, but they are not equal now. A 0 from MSB gets 2 units while each of the other 14 get only 1 unit each. This gave you your 9/16 for a 0 in the first draw.
    Once you know you got a 0, that eliminates 7 of the total weight of 16, leaving only a total weight of 9. An MSB in the first draw made 2 of those 9, so the probability that it was an MSB is 2/9.
    [/QUOTE]
    what is "odds"?[/QUOTE]
    That's just another word for probability.
     
  18. Oct 8, 2012 #17

    rbj

    User Avatar

    if this file is text, like English-language text, then some letters are more common than other letters. there are statistics that have results about which letters and characters are more common. i dunno where to find such statistics, but i remember that the letter "e" is the most common. this unequal probability is what is behind the choice of symbols in Morse code. the letter "e" is a single "dit" or "dot" because it is the most common. some similar reason exists with assigning Huffman codes to particular symbols.
     
  19. Oct 8, 2012 #18

    haruspex

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    OP has stated that all bits except MSB are to be assumed equally likely, and independently, 0 or 1.
     
  20. Oct 9, 2012 #19
    Still confused, say, why getting a 0 eliminates 7 of the total weight of 16 etc?.(saying it in simple: we have Pr(MSB=0)=1/8, plus Pr(0)=Pr(1)=1/2, then draw a bit, (as whole) Pr(0)=?, Pr(MSB=0)=?)
    Anyway, it seems to follow the rule, where i equals number of 0 drawn sequenced/continual, say i=0,1,2.
    1/8 ((i+1)/2^i +(7-i)/2^(i+1))=(i+9)/2^(i+4) .
    Agree?
     
  21. Oct 9, 2012 #20
    what do you mean by unit now? is it soooooooo hard to explain in plain words?
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Probability of 0 bit in ASCII text files
Loading...