# Why does a letter require 1 byte?

1. Nov 21, 2015

### Raghav Gupta

Why does a letter require 8 bits, each of which is a 0 or a 1?
In one byte we can have 2^8 combinations. Apart from letters and numbers, what else is there?
Why is 2^10 famous?
How does a combination of 0's and 1's, which is just "on" and "off", print letters?

2. Nov 21, 2015

### phyzguy

You can look at the table here and see all 256 symbols. Don't forget upper and lower case, punctuation marks, etc.

3. Nov 21, 2015

### Raghav Gupta

But there can be so many other characters as well, like Japanese letters, etc.
How can a computer understand numbers? I thought it only understands 1 and 0.
If it understands numbers as well, how does the printing of a letter take place?

4. Nov 21, 2015

### Hornbein

In the Stone Age some machines had 7-bit characters to save money. There are upper-case and lower-case letters, numerals, and punctuation, which total more than 64, so 6 bits would not be enough.

2^10 is famous because it is approximately equal to one thousand.

The "on" and "off" printing characters are for teletypes, now obsolete.
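To make the counting argument concrete, here is a small sketch in Python. It assumes the character set is just the ASCII letters, digits, punctuation marks, and the space; the variable names are only for illustration.

```python
import string

# Characters a basic English text system has to represent.
letters = string.ascii_letters      # 26 upper-case + 26 lower-case
digits = string.digits              # 0 through 9
punctuation = string.punctuation    # the ASCII punctuation marks

# +1 for the space character (an assumption about what counts as "needed").
needed = len(letters) + len(digits) + len(punctuation) + 1

# Number of distinct patterns available for a given number of bits.
for bits in (6, 7, 8, 10):
    print(bits, "bits ->", 2 ** bits, "combinations")

print("printable characters needed:", needed)
print("fits in 6 bits:", needed <= 2 ** 6)
print("fits in 7 bits:", needed <= 2 ** 7)
```

With these assumptions the count comes out above 64 but below 128, which is why 7 bits is the smallest width that works, and 2^10 = 1024 is the power of two closest to one thousand.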

5. Nov 21, 2015

### phyzguy

Foreign languages typically use an encoding called Unicode, which in its common UTF-16 form takes two bytes per character (and sometimes four bytes). Do you understand that a byte is 8 bits, so that once you have done this translation, it is translated into 1's and 0's? Also, the number "16" as text is coded differently in the computer than the number 16 as an integer number, which is coded differently than the number 16.0 as a floating point number. Explaining all of this in detail is the topic of a whole book. Why don't you find a good textbook on how computers work and start there? If you have specific questions after reading that, come back and ask.
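As a rough illustration of the three codings of "16", here is a Python sketch. The 4-byte integer width, the big-endian byte order, and the 32-bit float format are assumptions chosen for the example; real programs may use other widths.

```python
import struct

# "16" as text: one byte per character, '1' and '6'.
text_bytes = "16".encode("ascii")

# 16 as an integer: assumed here to be 4 bytes, most significant byte first.
int_bytes = (16).to_bytes(4, "big")

# 16.0 as a floating point number: assumed here to be 32-bit IEEE 754.
float_bytes = struct.pack(">f", 16.0)

def bits(data):
    """Show each byte as eight 0's and 1's."""
    return " ".join(f"{b:08b}" for b in data)

print("text  '16':", bits(text_bytes))
print("int    16 :", bits(int_bytes))
print("float 16.0:", bits(float_bytes))
```

All three results are just bytes; what differs is the rule used to turn the value into bits and back.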

6. Nov 21, 2015

### Staff: Mentor

Unicode (UTF-8) even includes emoji:

https://en.wikipedia.org/wiki/Emoji#Blocks

How many of them display properly depends on the font that your browser uses. When new characters are added to the standard set, fonts have to be updated to include them.
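How many bytes a character takes depends on which Unicode encoding is used. A short Python sketch, with four sample characters picked only for illustration (a Latin letter, an accented letter, a hiragana sign, and an emoji):

```python
# Sample characters, written as escapes so the source stays plain ASCII.
samples = {
    "Latin A": "A",
    "e with acute": "\u00e9",
    "hiragana a": "\u3042",
    "grinning face emoji": "\U0001F600",
}

sizes = {}
for name, ch in samples.items():
    utf8 = ch.encode("utf-8")        # variable length: 1 to 4 bytes
    utf16 = ch.encode("utf-16-be")   # 2 bytes, or 4 for characters beyond U+FFFF
    sizes[name] = (len(utf8), len(utf16))
    print(f"U+{ord(ch):04X} {name}: utf-8 {len(utf8)} bytes, utf-16 {len(utf16)} bytes")
```

The plain letter still fits in one byte under UTF-8, while the emoji needs four bytes in both encodings.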

7. Nov 21, 2015

### Raghav Gupta

Can you give a reference for this type of book?

8. Nov 21, 2015

### SteamKing

Staff Emeritus
The Japanese writing systems (there are at least 4 in common use) don't use letters. One system does use the Roman alphabet (romaji) to transliterate the sounds of Japanese, but the other three use either a syllabary (hiragana and katakana) or full-on ideograms (kanji), of which there are thousands of different signs.

https://en.wikipedia.org/wiki/Japanese_writing_system

The kanji are derived from Chinese ideograms, but are not equivalent for the most part. Typewriters and keyboards which can handle kanji or Chinese ideograms are cumbersome devices with many keys, which take a long time to master. A typical Japanese student does not become fully fluent in speaking and writing his native language (all 4 forms of writing) until he reaches his teen years. Calligraphers practice the art of drawing Japanese ideograms often for a lifetime.

The computer works with binary equivalents of numbers. Integers are converted into their binary equivalents. Floating point numbers are converted to a specially-coded binary format, which is manipulated by the computer, and the results are decoded back to a decimal number for display or printing.

When printing, the computer sends a stream of data to the printer. The printer decodes the data stream and prints the proper character. Likewise, when data is displayed by the computer on screen, the internal data is decoded into human readable characters.
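The two steps described above, converting an integer to binary and decoding a data stream into characters, can be sketched in Python like this. The value 77 and the five-byte stream are made-up examples, and ASCII is assumed as the character code.

```python
# An integer and its binary equivalent.
n = 77
binary = format(n, "b")     # string of 0's and 1's
back = int(binary, 2)       # decode the binary string back to the integer

# A stream of byte values, as might be sent to a printer or a screen.
stream = bytes([72, 101, 108, 108, 111])
text = stream.decode("ascii")   # each byte is looked up as a character code

print(n, "->", binary, "->", back)
print(list(stream), "->", text)
```

The receiving device only ever sees the numbers; the agreed code table is what turns them into readable characters.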

9. Nov 21, 2015

### Raghav Gupta

But if integers can be converted into their binary equivalents, why can't letters be converted directly to their binary equivalents? Why is ASCII needed to convert letters into numbers first? Suppose ASCII assigns the value 65 to A; then what happens if we want to print the number 65 itself?

10. Nov 21, 2015

### SteamKing

Staff Emeritus
What's the binary equivalent of 'A' or 'q' or '&'?

The ASCII code for 'A' is 65 decimal, but the computer uses the binary equivalent 100 0001, which is also 41 hex. If you want to print the numeral '65', you must print each decimal digit, '6' and '5', in the proper order for a human to understand it.
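A quick way to check these values is Python's built-in `ord`, which returns a character's code. This is only a sketch to show the difference between the letter 'A' and the printed numeral '65':

```python
# The letter 'A' is stored as the number 65.
code = ord("A")
print(code, format(code, "07b"), format(code, "x"))

# Printing the numeral 65 means sending two characters, '6' and '5',
# each with its own ASCII code.
numeral = str(65)
numeral_codes = [ord(d) for d in numeral]
print(numeral, "->", numeral_codes, "->", [format(c, "07b") for c in numeral_codes])
```

So the byte holding 65 prints as 'A', while the text "65" is two bytes holding 54 and 53.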

ASCII is a coded representation of letters, numerals, and other characters commonly found in American English writing. ASCII is not the only such system, but it is the one around which most computers operate. There are also extended ASCII, Unicode, and several other code systems in use:

https://en.wikipedia.org/wiki/ASCII

An older coding system, developed by IBM for their mainframes, was known as EBCDIC:

https://en.wikipedia.org/wiki/EBCDIC

11. Nov 21, 2015

### phyzguy

I've heard good things about "The Elements of Computing Systems: Building a Modern Computer from First Principles" by Noam Nisan.

12. Nov 25, 2015

### Svein

Well it does - and does not. Back in the infancy of computers a character (which is the general term for "letter") used a different number of bits:
• Telex code = 5 bits
• Some computers used 6 bits/character
• Communication to early alphanumeric terminals: 7 bits + parity (ASCII code; EBCDIC used 8 bits)
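For the "7 bits + parity" case, a minimal Python sketch of how the eighth bit could be filled in. It assumes even parity and puts the parity bit in the top position; actual terminals varied, and `with_even_parity` is a hypothetical helper name.

```python
def with_even_parity(ch):
    """Return the 8-bit value for a 7-bit ASCII character plus an even-parity bit."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    # Parity bit is 1 when the 7 data bits contain an odd number of 1's,
    # so that the full 8 bits always contain an even number of 1's.
    parity = bin(code).count("1") % 2
    return (parity << 7) | code

for ch in "AC":
    print(ch, format(with_even_parity(ch), "08b"))
```

The receiver can count the 1's in each byte and flag a transmission error when the count is odd.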
The "modern" age of computers:

13. Nov 28, 2015

### jarekduda

So how many bits does a letter really need/carry?

27 letters would directly require 5 bits.
Using a base 27 numeral system, we would need lg(27) ~ 4.75 bits/letter instead.
Huffman would need ~ 4.12 bits/letter instead.
A better order 0 entropy coder ~ 4.08 bits/letter.
Order 1 (Markov: previous letter is the context) ~ 3.3 bits/letter
Order 2: ~ 3.1 bits/letter
Using a probability distribution over words: ~ 2.1 bits/letter.
...
The best compressor, "cmix v8" (http://mattmahoney.net/dc/text.html),
compresses 10^9 bytes to 123,930,173 bytes, which is less than 1 bit/letter.
...
Hilberg's conjecture suggests that, due to long-range correlations, the entropy of text grows in a sublinear way:
H(n) ~ n^beta where beta < 0.9
http://www.ipipan.waw.pl/~ldebowsk/docs/seminaria/hilberg.pdf
In other words, compressing two concatenated texts takes less than the sum of the compressed sizes of the separate files.
So this conjecture suggests that the number of bits per letter approaches zero!?!
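A few of the figures above can be reproduced with a short Python sketch. It assumes the 10^9 in the cmix result counts one-byte characters, and `order0_entropy` is an illustrative helper that measures the entropy of single-character frequencies in whatever text it is given; it does not reproduce the quoted 4.08 value, which depends on the corpus.

```python
import math
from collections import Counter

def order0_entropy(text):
    """Bits per character when each character is coded independently
    according to its frequency in the text (order 0 model)."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# 27 equally likely symbols (26 letters plus space).
uniform_27 = math.log2(27)

# The cmix figure quoted above, converted to bits per input character.
cmix_bits_per_char = 123_930_173 * 8 / 10**9

print(f"log2(27)          = {uniform_27:.2f} bits/letter")
print(f"cmix v8           = {cmix_bits_per_char:.3f} bits/letter")
print(f"order 0 of 'abab' = {order0_entropy('abab'):.2f} bits/letter")
```

Running `order0_entropy` on a long English text gives a value below log2(27), because the letters are not equally likely.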
