How Do Different Factors Affect the Number of Bits Needed to Represent a Letter?

  • Thread starter: Raghav Gupta
AI Thread Summary
A letter typically requires 8 bits for representation due to the binary encoding systems like ASCII, which assigns numerical values to characters. While ASCII uses 7 bits for standard characters, extended systems can accommodate additional symbols, including foreign characters through Unicode, which often uses 16 bits. The discussion also touches on how integers and characters are processed differently in computers, with ASCII converting letters to numerical values before binary representation. Various coding systems exist, including EBCDIC and extended ASCII, each with its own character limits. The conversation highlights the complexity of character encoding and the potential for data compression, suggesting that the number of bits per letter can be reduced significantly through advanced techniques.
Raghav Gupta
Why is it that a letter requires 8 bits, which is a combination of 0's and 1's?
In one byte we can have 2^8 combinations. Apart from letters and numbers, what more?
Why is 2^10 famous?
How does a combination of 0's and 1's, which is just "on" and "off", end up printing letters?
 
You can look at the table here and see all 256 symbols. Don't forget upper and lower case, punctuation marks, etc.
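
For a quick look at that mapping yourself, here is a minimal Python 3 sketch (my own illustration, not from the linked table) that prints the standard printable ASCII range next to its numeric codes:

Python:
# Print each printable ASCII character (codes 32 through 126) next to its number.
for code in range(32, 127):
    print(f"{code:3d}  {chr(code)}")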
 
phyzguy said:
You can look at the table here and see all 256 symbols. Don't forget upper and lower case, punctuation marks, etc.
But there can be so many other characters as well, like Japanese letters, etc.
How can a computer understand numbers? I thought it only understands 1 and 0.
If it understands the number, then how does the printing of a letter take place?
 
Raghav Gupta said:
Why is it that a letter requires 8 bits, which is a combination of 0's and 1's?
In one byte we can have 2^8 combinations. Apart from letters and numbers, what more?
Why is 2^10 famous?
How does a combination of 0's and 1's, which is just "on" and "off", end up printing letters?

In the Stone Age some machines had 7-bit characters to save money. There are upper-case and lower-case letters, numerals, and punctuation, which total more than 64.

2^10 is famous because it is approximately equal to one thousand.

The "on" and "off" printing characters are for teletypes, now obsolete.
 
Foreign languages typically use an encoding called Unicode, which takes two bytes (and sometimes four). Do you understand that a byte is 8 bits, so that once this translation has been done, everything is again 1's and 0's? Also, the number "16" as text is coded differently in the computer than the number 16 as an integer, which is coded differently than the number 16.0 as a floating-point number. Explaining all of this in detail is the topic of a whole book. Why don't you find a good textbook on how computers work and start there? If you have specific questions after reading it, come back and ask.
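
To make that concrete, here is a small Python 3 sketch (my own illustration, assuming a 4-byte big-endian integer and IEEE 754 single precision just for the example) showing that the same "16" ends up as three different bit patterns:

Python:
import struct

# The two-character text "16" as ASCII/UTF-8 bytes:
print('"16" as text :', "16".encode("ascii").hex())     # 3136 (codes 0x31 '1', 0x36 '6')

# The integer 16 as a 4-byte big-endian value:
print("16 as integer:", (16).to_bytes(4, "big").hex())  # 00000010

# The float 16.0 in IEEE 754 single precision:
print("16.0 as float:", struct.pack(">f", 16.0).hex())  # 41800000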
 
Unicode (UTF-8) even includes emoji:

https://en.wikipedia.org/wiki/Emoji#Blocks

How many of them display properly depends on the font that your browser uses. When new characters are added to the standard set, fonts have to be updated to include them.
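
As a small illustration (a Python 3 sketch of my own, using the "grinning face" emoji U+1F600 as the example), one emoji is a single Unicode code point but takes four bytes in UTF-8:

Python:
s = "\U0001F600"                # the "grinning face" emoji
print(hex(ord(s)))              # 0x1f600 - its Unicode code point
print(s.encode("utf-8").hex())  # f09f9880 - the four UTF-8 bytes that store it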
 
phyzguy said:
Why don't you find a good textbook on how computers work and start there?
Can you give a reference for this type of book?
 
Raghav Gupta said:
But there can be so many other characters as well, like Japanese letters, etc.
How can a computer understand numbers? I thought it only understands 1 and 0.
If it understands the number, then how does the printing of a letter take place?
The Japanese writing systems (there are at least four in common use) don't use letters. One system does use the Roman alphabet (romaji) to transliterate the sounds of Japanese, but the other three use either a syllabary (hiragana and katakana) or full-on ideograms (kanji), of which there are thousands of different signs.

https://en.wikipedia.org/wiki/Japanese_writing_system

The kanji are derived from Chinese ideograms, but are not equivalent for the most part. Typewriters and keyboards which can handle kanji or Chinese ideograms are cumbersome devices with many keys, which take a long time to master. A typical Japanese student does not become fully fluent in speaking and writing his native language (all 4 forms of writing) until he reaches his teen years. Calligraphers practice the art of drawing Japanese ideograms often for a lifetime.

The computer works with binary equivalents of numbers. Integers are converted into their binary equivalents. Floating point numbers are converted to a specially-coded binary format, which is manipulated by the computer, and the results are decoded back to a decimal number for display or printing.

When printing, the computer sends a stream of data to the printer. The printer decodes the data stream and prints the proper character. Likewise, when data is displayed by the computer on screen, the internal data is decoded into human readable characters.
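
Here is a minimal Python 3 sketch (my own illustration; assume the sender and receiver have agreed on ASCII) of what that decoding step looks like:

Python:
# A "stream of data" is just numbers; the receiving end decodes them back
# into characters using the agreed-upon encoding.
stream = bytes([72, 101, 108, 108, 111])   # the values a sender might transmit
print(list(stream))            # [72, 101, 108, 108, 111]
print(stream.decode("ascii"))  # Hello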
 
SteamKing said:
Integers are converted into their binary equivalents.
But if integers can be converted into their binary equivalents, why can't letters be directly converted to their binary equivalents? Why is ASCII needed to convert letters into numbers first? Suppose ASCII assigns the value 65 to 'A'; then what happens if we have to print just the number 65?
 
  • #10
Raghav Gupta said:
But if integers can be converted into their binary equivalents, why can't letters be directly converted to their binary equivalents? Why is ASCII needed to convert letters into numbers first? Suppose ASCII assigns the value 65 to 'A'; then what happens if we have to print just the number 65?
What's the binary equivalent of 'A' or 'q' or '&'?

The ASCII code for 'A' is 65 decimal, but the computer uses the binary equivalent 100 0001, which is also 41 hex. If you want to print the numeral '65', you must print each decimal digit, '6' and '5', in the proper order for a human to understand it.
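
A short Python 3 sketch (my own, just to make the distinction visible) shows both sides:

Python:
# The letter 'A' is stored as the number 65 (binary 100 0001, hex 41).
print(ord("A"), bin(ord("A")), hex(ord("A")))   # 65 0b1000001 0x41

# Printing the number 65 as text means printing two characters, '6' and '5',
# each of which has its own ASCII code.
print([ord(digit) for digit in str(65)])        # [54, 53]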

ASCII is a coded representation of letters, numerals, and other characters commonly found in American English writing. ASCII is not the only such system, but it is the one around which most computers operate. There are also extended ASCII, Unicode, and several other code systems in use:

https://en.wikipedia.org/wiki/ASCII

An older coding system, developed by IBM for their mainframes, was known as EBCDIC:

https://en.wikipedia.org/wiki/EBCDIC
 
  • #11
Raghav Gupta said:
Can you give a reference for this type of book?

I've heard good things about "The Elements of Computing Systems: Building a Modern Computer from First Principles" by Noam Nisan and Shimon Schocken.
 
  • #12
Raghav Gupta said:
Why is it that a letter requires 8 bits
Well, it does and it doesn't. Back in the infancy of computers, a character (the general term for "letter") used a different number of bits:
  • Telex code = 5 bits
  • Some computers used 6 bits/character
  • Communication with early alphanumeric terminals: 7 data bits plus a parity bit (ASCII), or 8-bit EBCDIC on IBM equipment
The "modern" age of computers:
 
  • #13
So how many bits does a letter really need/carry?

A 27-symbol alphabet (26 letters plus space) would directly require 5 bits per letter.
Using a base-27 numeral system, we would need lg(27) ~ 4.75 bits/letter instead (the sketch at the end of this post reproduces these first two estimates).
Huffman coding would need ~ 4.12 bits/letter.
A better order-0 entropy coder: ~ 4.08 bits/letter.
Order 1 (Markov: the previous letter is the context): ~ 3.3 bits/letter.
Order 2: ~ 3.1 bits/letter.
Using the probability distribution among whole words: ~ 2.1 bits/letter.
...
The best compressor, "cmix v8" ( http://mattmahoney.net/dc/text.html ),
compresses a 10^9-byte text file to 123,930,173 bytes, which is less than 1 bit/letter.
...
The Hilberg conjecture suggests that, due to long-range correlations, the entropy of text grows sublinearly:
H(n) ~ n^beta, where beta < 0.9.
http://www.ipipan.waw.pl/~ldebowsk/docs/seminaria/hilberg.pdf
In other words, when compressing two texts concatenated, we need less than the sum of the compressed sizes of the separate files.
So does this conjecture suggest that the number of bits per letter approaches zero?
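
Here is a minimal Python 3 sketch (my own; the sample text and the resulting order-0 number are only illustrative, since the ~4.1 figure above refers to real English text) that reproduces the first two estimates in this post:

Python:
import math
from collections import Counter

# A toy sample: letters plus spaces, lower-cased.
text = ("the quick brown fox jumps over the lazy dog " * 100).lower()
letters = [c for c in text if c.isalpha() or c == " "]

# Uniform coding of a 27-symbol alphabet (26 letters + space):
print("uniform, 27 symbols:", round(math.log2(27), 2), "bits/letter")

# Order-0 entropy of the actual symbol frequencies in the sample:
counts = Counter(letters)
total = sum(counts.values())
entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
print("order-0 entropy    :", round(entropy, 2), "bits/letter")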
 