
Conditional Prob of 0 in ASCII

  1. Nov 10, 2012 #1
    What is the probability of a 0 in an ASCII text file, treating the file as a bit string?

    The MSB (the first bit in each byte) is always 0, and Pr(MSB) = 1/8.
    The remaining bits give a space of 2^7 = 128 characters. Assuming each character is randomly distributed (appearing in the text with the same frequency), the bits other than the MSB follow the rule Pr(0) = Pr(1) = 1/2.

    Thus, in total (MSB + other bits), Pr(0) = 1/8 + 1/2 = 5/8.
    Is this correct?

    Further, randomly select n bits from all the bits of the ASCII text file.
    What is the probability that the bit in the t-th position is 0, i.e. Pr[t = 0],
    where 0 < t <= n? And what is the probability that the bits in the t-th and (t+1)-th positions are 00 (two consecutive zeros)?

    Could anyone please shed some light on this?

    Thanks in advance for your attention.
    Last edited: Nov 10, 2012
  2. jcsd
  3. Nov 10, 2012 #2


    Science Advisor

    Your first answer isn't correct. By your argument, the probability of a 1 is 0+1/2=1/2. Then the probability of a 0 or a 1 is p(0)+p(1)=5/8+1/2=9/8.

    You need to weight your p(0|MSB) by p(MSB) and p(0|not MSB) by p(not MSB).
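
    As a rough check of this weighting, here is a minimal Python sketch. It assumes that each of the 128 codes 0-127 is equally likely and stored in one 8-bit byte, which is an assumption rather than something the question pins down. The function name `bit_probabilities` is made up for illustration. It counts bits by brute force and compares the result with the weighted sum.

```python
from fractions import Fraction

def bit_probabilities(bits_per_byte=8, code_bits=7):
    """Count 0s and 1s over every code 0..2**code_bits - 1, one byte per code."""
    zeros = ones = 0
    for code in range(2 ** code_bits):
        bits = format(code, f"0{bits_per_byte}b")  # e.g. 65 -> "01000001"
        zeros += bits.count("0")
        ones += bits.count("1")
    total = zeros + ones
    return Fraction(zeros, total), Fraction(ones, total)

p0, p1 = bit_probabilities()

# Weighted sum: p(0) = p(0|MSB) p(MSB) + p(0|not MSB) p(not MSB)
weighted = Fraction(1) * Fraction(1, 8) + Fraction(1, 2) * Fraction(7, 8)

print(p0, p1, weighted)  # 9/16 7/16 9/16
```

    Under that uniform assumption the two probabilities sum to 1, unlike the 5/8 + 1/2 = 9/8 obtained by adding the unweighted terms.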

    I'm not sure quite what you're asking in the second part. Are you just taking n bits at random, or picking the n bits following a randomly chosen bit? If the first, are you allowing the same bit to be picked twice (or more), or no more than once?
  4. Nov 10, 2012 #3


    Gold Member

    In what language? If it is an English text file, it will have a somewhat different distribution of letters than, say, German, and so forth.

    This is not really a problem that is amenable to any simple analysis. It will depend on the distribution of ALL of the characters. How much punctuation is used, etc.

    The one thing that I can definitely say is that the probability of a zero is greater than the probability of a 1, because normal text files pretty much NEVER use any of the characters above 127, so right away you've got 1/8 of the bits always being a zero.
  5. Nov 11, 2012 #4
    NOTE: Assume each character is randomly distributed.
    1st. What is the probability of a 0 in ASCII?
    2nd. Now we have n bits taken randomly from ASCII.
    What is the probability that the bit in the t-th position of these n randomly taken bits is 0 (0<t<n)?
    3rd. Now taking 2 consecutive bits, what is the probability that they are 00?
  6. Nov 11, 2012 #5


    Science Advisor

    I am reading your note as meaning that each symbol in the range 0-127 is equally probable. This is not what it says. You have not specified a distribution, so phinds' comment is reasonable. I will assume a uniform distribution because I would guess that is what you mean if you don't specify - but you should.

    I've given you a hint how to do the first one in my previous post.

    For the second one, you need to explain whether you are drawing bits with replacement or without.

    For the third part, you have identified two separate reasons why a bit might be zero. What are the conditional probabilities of a zero as the second bit in each case?
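
    One way to explore the third part is by brute force. The sketch below is only an illustration under stated assumptions: characters are independent and uniform over the codes 0-127, each is stored in one 8-bit byte, and the first bit of the pair is equally likely to fall at any of the 8 positions within a byte. The function name `prob_two_zeros` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob_two_zeros(code_bits=7, bits_per_byte=8):
    """P(bits t and t+1 are both 0), with t uniform over the byte positions."""
    codes = range(2 ** code_bits)
    hits = trials = 0
    for a, b in product(codes, repeat=2):
        # Two consecutive characters, so a pair may straddle the byte boundary.
        stream = format(a, f"0{bits_per_byte}b") + format(b, f"0{bits_per_byte}b")
        for start in range(bits_per_byte):
            trials += 1
            if stream[start:start + 2] == "00":
                hits += 1
    return Fraction(hits, trials)

p00 = prob_two_zeros()
print(p00)  # 5/16
```

    The same value follows by hand: two of the eight possible pairs include an MSB and are 00 with probability 1/2, while the other six are 00 with probability 1/4, giving (2 * 1/2 + 6 * 1/4) / 8 = 5/16. This is not the same as the square of the single-bit probability, because adjacent bits are not identically distributed.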
  7. Nov 11, 2012 #6
    Random distribution clearly means that each character occurs in equal proportion.
    Your prob of 9/8... what is that? Why do you write the prob of 1?
    Your writing is absolutely not related to the problem.

    E.g. the 2nd part asks about one bit.. why "with replacement or without"??
    I would strongly recommend you read before you write...
  8. Nov 11, 2012 #7


    Science Advisor

    Not in English. Random means that it is not possible to predict a result from other results. Paint one side of a cubic die red and the other five blue. Throw the die many times and record the colour of the top surface each time. The sequence of colours is random, but there will be around five times as many blue results as red ones.
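
    The painted die can be simulated in a few lines. This is a small sketch rather than anything rigorous; the function name `throw_painted_die`, the seed and the number of throws are arbitrary choices made for illustration.

```python
import random
from collections import Counter

def throw_painted_die(n_throws, seed=0):
    """Throw a die with one red face and five blue faces; count the colours."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    faces = ["red"] + ["blue"] * 5
    return Counter(rng.choice(faces) for _ in range(n_throws))

counts = throw_painted_die(60_000)
ratio = counts["blue"] / counts["red"]
print(counts, ratio)
```

    Each throw is unpredictable, yet the ratio of blue to red comes out close to 5: the sequence is random without the two colours being equally likely.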

    You do appear to mean that each letter is equally probable. Fair enough - but precision is very important in statistics, and if you do not learn the right words, other people will not understand you and you will get responses like mine and phinds'.

    To show you that your answer was incorrect. A bit can only be a 1 or a 0, so p(0) + p(1) must equal 1. If you take the same reasoning you used to arrive at p(0)=5/8 and use it to calculate p(1), you will arrive at p(1)=1/2. That makes p(0)+p(1)=9/8. So your reasoning is wrong. I told you how to correct it in my first post. If you didn't understand me that's fine, and I will try to explain further.

    That is simply a restatement of the last sentence in my first post - a question you have not answered. Again, if you did not understand then I am happy to explain further.
  9. Nov 12, 2012 #8

    jim mcnamara


    Staff: Mentor

    FWIW - ASCII defines 7-bit combinations for letters. I think you mean something else, like extended ASCII.

    Phinds is correct.

    http://www.iana.org/assignments/character-sets google for: ANSI_X3.4-1968 (exact name for ASCII)

    You can define your problem however it suits you, but you should be aware of what the standard definition of something is, so you don't confuse others.