Conditional Prob of 0 in ASCII


by Cylab
Tags: ascii, conditional, prob
Cylab
#1
Nov10-12, 06:09 AM
P: 54
What is the probability of a 0 bit in an ASCII text file, treating the file as a bit string?

Analysis:
The MSB (the first bit in each byte) is always 0. A randomly chosen bit is an MSB with probability Pr(MSB) = 1/8.
The character space is 2^7 = 128. Assuming the characters are randomly distributed (each appears in the text with the same frequency), the bits other than the MSB follow Pr(0) = Pr(1) = 1/2.

Thus, in total (MSB + other bits), Pr(0) = 1/8 + 1/2 = 5/8.
Is this correct?

Further, randomly select n bits from all the bits of the ASCII text file.
What is the probability that the bit at position t is 0, i.e. Pr[t = 0],
where 0 < t <= n? And what is the probability that the bits at positions t and t+1 are 00 (two consecutive zeros)?

Could anyone please shed some light on this?

Thanks in advance for your attention.
Ibix
#2
Nov10-12, 02:08 PM
P: 372
Your first answer isn't correct. By your argument, the probability of a 1 is 0+1/2=1/2. Then the probability of a 0 or a 1 is p(0)+p(1)=5/8+1/2=9/8, which is greater than 1.

You need to weight your p(0|MSB) by p(MSB) and p(0|not MSB) by p(not MSB).

I'm not sure quite what you're asking in the second part. Are you just taking n bits at random, or picking the n bits following a randomly chosen bit? If the first, are you allowing the same bit to be picked twice (or more), or no more than once?
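
For reference, the weighting described above can be checked by brute force. This is only a minimal sketch, assuming (as Cylab clarifies later in the thread) that every character code 0-127 is equally likely and that each character is stored in one 8-bit byte. The function name is made up for illustration.

```python
from fractions import Fraction


def prob_zero_uniform_ascii() -> Fraction:
    """Exact P(bit = 0) when a bit is picked uniformly from bytes 0..127."""
    zeros = 0
    total = 0
    for code in range(128):           # assumption: all 128 codes equally likely
        bits = format(code, "08b")    # 8-bit string, MSB first
        zeros += bits.count("0")
        total += len(bits)
    return Fraction(zeros, total)


# Law of total probability: p(0|MSB) p(MSB) + p(0|not MSB) p(not MSB)
weighted = Fraction(1) * Fraction(1, 8) + Fraction(1, 2) * Fraction(7, 8)

print(prob_zero_uniform_ascii(), weighted)  # both are 9/16
```

Under these assumptions the enumeration and the weighted sum agree, and p(0) + p(1) = 9/16 + 7/16 = 1 as it must.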
phinds
#3
Nov10-12, 02:38 PM
PF Gold
P: 5,720
In what language? If it is an English text file, it will have a somewhat different distribution of letters than, say, a German one, and so forth.

This is not really a problem that is amenable to any simple analysis. It will depend on the distribution of ALL of the characters. How much punctuation is used, etc.

The one thing that I can definitely say is that the probability of a zero is slightly greater than the probability of a 1 because normal text files pretty much NEVER use any of the characters above 127, so right away, you've got 1/8 of the bits always being a zero.
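
Since the answer for real text depends on which characters actually occur, one way to see this is to measure it on a sample. The following is a rough sketch, not a definitive method: the function name and the sample sentence are made up, and it assumes one 8-bit byte per ASCII character.

```python
def zero_bit_fraction(text: str) -> float:
    """Fraction of 0 bits when `text` is stored as one byte per character."""
    data = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    if not data:
        raise ValueError("text must not be empty")
    zeros = sum(format(byte, "08b").count("0") for byte in data)
    return zeros / (8 * len(data))


# Hypothetical sample; any ASCII text can be substituted here.
sample = "The quick brown fox jumps over the lazy dog."
print(zero_bit_fraction(sample))
```

Running it on texts in different languages, or with different amounts of punctuation, gives different fractions, which is the point being made above.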

Cylab
#4
Nov11-12, 12:20 AM
P: 54

NOTE: Assume the characters are randomly distributed.
1st. What is the probability of a 0 bit in ASCII?
2nd. Now we have n bits taken randomly from ASCII text.
What is the probability that the bit at position t among those n bits is 0 (0<t<n)?
3rd. Now taking 2 consecutive bits, what is the probability that they are 00?
Ibix
#5
Nov11-12, 02:01 AM
P: 372
I am reading your note as meaning that each symbol in the range 0-127 is equally probable. This is not what it says. You have not specified a distribution, so phinds' comment is reasonable. I will assume a uniform distribution because I would guess that is what you mean if you don't specify - but you should.

I've given you a hint how to do the first one in my previous post.

For the second one, you need to explain whether you are drawing bits with replacement or without.

For the third part, you have identified two separate reasons why a bit might be zero. In each case, what is the conditional probability that the second bit is zero?
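
One way to check an answer to the third part is to enumerate every case. This is a sketch under stated assumptions, not the only reading of the question: characters are independent and uniform over 0-127, each is stored in one 8-bit byte, and the starting bit of the pair is chosen uniformly. The function name is made up.

```python
from fractions import Fraction
from itertools import product


def prob_double_zero() -> Fraction:
    """Exact P(bits t and t+1 are both 0) for a uniformly chosen start t."""
    hits = 0
    total = 0
    # Two adjacent bytes cover every possible pair, including the pair that
    # straddles a byte boundary (last bit of one byte, MSB of the next).
    for first, second in product(range(128), repeat=2):
        bits = format(first, "08b") + format(second, "08b")
        for start in range(8):        # start position inside the first byte
            total += 1
            if bits[start] == "0" and bits[start + 1] == "0":
                hits += 1
    return Fraction(hits, total)


print(prob_double_zero())  # 5/16
```

The result, 5/16, is not the square of the single-bit probability 9/16, so under these assumptions adjacent bits are not independent: whether the first bit is an MSB changes what is known about the second.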
Cylab
#6
Nov11-12, 04:34 AM
P: 54
Random distribution clearly means that each character appears in equal proportion.
Your prob of 9/8... what is that? Why did you write the prob of 1?
Your reply is absolutely not related to the problem.

E.g. the 2nd part asks about one bit... why "with replacement or without"??
I would strongly recommend that you read before you write...
Ibix
#7
Nov11-12, 08:28 AM
P: 372
Quote by Cylab:
Random distribution clearly means that each character appears in equal proportion.
Not in English. Random means that it is not possible to predict a result from other results. Paint one side of a cubic die red and the other five blue. Throw the die many times and record the colour of the top surface each time. The sequence of colours is random, but there will be around five times as many blue results as red ones.
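
The painted die is easy to try out numerically. A small simulation sketch follows; the function name, the seed, and the number of throws are arbitrary choices for illustration.

```python
import random
from collections import Counter


def throw_painted_die(throws: int, seed: int = 0) -> Counter:
    """Throw a die with one red face and five blue faces `throws` times."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    faces = ["red"] + ["blue"] * 5     # each face is equally likely to land up
    return Counter(rng.choice(faces) for _ in range(throws))


counts = throw_painted_die(60_000)
print(counts["blue"] / counts["red"])  # close to 5, but the two colours are not equally likely
```

Each throw is unpredictable, yet the colours do not occur in equal proportion.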

You do appear to mean that each letter is equally probable. Fair enough - but precision is very important in statistics, and if you do not learn the right words, other people will not understand you and you will get responses like mine and phinds'.

Quote by Cylab:
Your prob of 9/8... what is that? Why did you write the prob of 1?
To show you that your answer was incorrect. A bit can only be a 1 or a 0, so p(0) + p(1) must equal 1. If you take the same reasoning you used to arrive at p(0)=5/8 and use it to calculate p(1), you will arrive at p(1)=1/2. That makes p(0)+p(1)=9/8. So your reasoning is wrong. I told you how to correct it in my first post. If you didn't understand me that's fine, and I will try to explain further.

Quote by Cylab:
Your reply is absolutely not related to the problem.

E.g. the 2nd part asks about one bit... why "with replacement or without"??
That is simply a restatement of the last sentence in my first post - a question you have not answered. Again, if you did not understand then I am happy to explain further.
jim mcnamara
#8
Nov12-12, 07:06 AM
Sci Advisor
PF Gold
P: 1,355
FWIW - ASCII defines 7-bit codes for characters. I think you mean something else, like extended ASCII.

Phinds is correct.

See http://www.iana.org/assignments/character-sets and search for ANSI_X3.4-1968 (the exact name for ASCII).

You can define your problem however it suits you, but you should be aware of what the standard definition of something is, so that you don't confuse others.

