Conditional Prob of 0 in ASCII 
#1
Nov10-12, 06:09 AM

P: 54

What is the probability of a 0 in an ASCII text file, treating the file as a bit string?
Analysis: the MSB (the first bit of each byte) is always 0, so Pr(MSB) = 1/8. The remaining 7 bits give a space of 2^7 = 128 characters. Assuming the characters are randomly distributed (each appears in the text with the same frequency), every bit other than the MSB follows the rule Pr(0) = Pr(1) = 1/2. So, taking all bits together (MSB plus the other bits), Pr(0) = 1/8 + 1/2 = 5/8. Is this correct?

Further, suppose we randomly select n bits from all the bits of an ASCII text file. What is the probability that the bit at position t is 0, i.e. Pr[bit t = 0], where 0 < t <= n? And what is the probability that the bits at positions t and t+1 are both 0 (two consecutive zeros)? Can anyone shed some light on this? Thanks in advance for your attention.


#2
Nov10-12, 02:08 PM

P: 378

Your first answer isn't correct. By your argument, the probability of a 1 is 0+1/2=1/2. Then the probability of a 0 or a 1 is p(0)+p(1)=5/8+1/2=9/8.
You need to weight your p(0|MSB) by p(MSB) and your p(0|not MSB) by p(not MSB). I'm not quite sure what you're asking in the second part. Are you just taking n bits at random, or picking the n bits that follow a randomly chosen bit? If the former, are you allowing the same bit to be picked twice (or more), or no more than once?
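Following the weighting hint above, here is a minimal Python sketch of the calculation. It assumes (as the OP later confirms) that each of the 128 ASCII codes is equally likely and that each code is stored in an 8-bit byte whose MSB is 0. It counts zero bits by brute force over all 128 codes and compares the result with the law of total probability. The helper name bit_zero_probability is made up for illustration. Both approaches give 9/16, so p(1) = 7/16 and p(0) + p(1) = 1, as it must.

```python
from fractions import Fraction

# Assumption: every 7-bit ASCII code 0-127 is equally likely,
# and each character occupies an 8-bit byte whose MSB is always 0.
CODES = range(128)

def bit_zero_probability() -> Fraction:
    """Exact Pr(bit = 0) for a bit chosen uniformly from the byte stream."""
    # Each byte has 8 bits; subtract the 1 bits to get the 0 bits.
    zero_bits = sum(8 - bin(c).count("1") for c in CODES)
    total_bits = 8 * len(CODES)
    return Fraction(zero_bits, total_bits)

# Law of total probability:
# Pr(0) = Pr(0|MSB) Pr(MSB) + Pr(0|not MSB) Pr(not MSB)
p_msb = Fraction(1, 8)
weighted = 1 * p_msb + Fraction(1, 2) * (1 - p_msb)

print(bit_zero_probability(), weighted)  # 9/16 9/16
print(1 - weighted)                      # Pr(1) = 7/16
```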


#3
Nov10-12, 02:38 PM

PF Gold
P: 6,360

In what language? If it is an English text file, it will have a somewhat different distribution of letters than, say, German, and so forth.
This is not really a problem that is amenable to any simple analysis. It will depend on the distribution of ALL of the characters, for example how much punctuation is used. The one thing I can definitely say is that the probability of a zero is slightly greater than the probability of a 1, because normal text files pretty much NEVER use any of the characters above 127, so right away you've got 1/8 of the bits always being a zero.


#4
Nov11-12, 12:20 AM

P: 54

NOTE: Assume each character is randomly distributed.
1st. What is the probability of a 0 in ASCII?
2nd. Now we have n bits taken randomly from ASCII text. What is the probability that the bit at position t of the randomly taken n bits is 0 (0<t<n)?
3rd. Now taking 2 consecutive bits, what is the probability that they are 00?


#5
Nov11-12, 02:01 AM

P: 378

I am reading your note as meaning that each symbol in the range 0-127 is equally probable. This is not what it says. You have not specified a distribution, so phinds' comment is reasonable. I will assume a uniform distribution because I would guess that is what you mean if you don't specify one, but you should specify it.
I gave you a hint on how to do the first one in my previous post. For the second one, you need to explain whether you are drawing bits with or without replacement. For the third part, you have identified two separate reasons why a bit might be zero. What are the conditional probabilities of a zero as the second bit?


#6
Nov11-12, 04:34 AM

P: 54

Random distribution clearly means that each character appears with equal frequency.
Your probability of 9/8... what is that? Why did you write the probability of a 1? Your post is not related to the problem at all. E.g. the 2nd question asks about one bit, so why "with replacement or without"?? I would strongly recommend that you read before you write...


#7
Nov11-12, 08:28 AM

P: 378

You do appear to mean that each letter is equally probable. Fair enough, but precision is very important in statistics, and if you do not learn the right words, other people will not understand you and you will get responses like mine and phinds'.


#8
Nov12-12, 07:06 AM

Sci Advisor
PF Gold
P: 1,384

FWIW, ASCII defines 7-bit combinations for characters. I think you mean something else, like extended ASCII.
Phinds is correct. See http://www.iana.org/assignments/character-sets and google for ANSI_X3.4-1968 (the exact name for ASCII). You can define your problem however it suits you, but you should be aware of the standard definition of something, so that you don't confuse others.


