How Do Different Factors Affect the Number of Bits Needed to Represent a Letter?

  • Thread starter Raghav Gupta
  • Start date
In summary, the ASCII code for 'A' is 65 decimal, but the computer uses the binary equivalent 100 0001, which is also 41 hex.
  • #1
Raghav Gupta
Why is it that a letter requires 8 bits, which is a combination of 0's and 1's?
In one byte we can have 2^8 combinations. Apart from letters and numbers, what more?
Why is 2^10 famous?
How does the combination of 0's and 1's, which is "on" and "off", print letters?
 
  • #2
You can look at the table here and see all 256 symbols. Don't forget upper and lower case, punctuation marks, etc.
 
  • #3
phyzguy said:
You can look at the table here and see all 256 symbols. Don't forget upper and lower case, punctuation marks, etc.
But there can be so many other characters as well, like Japanese letters, etc.
How can a computer understand numbers? I thought it only understood 1 and 0.
If it understands numbers too, how does the printing of a letter take place?
 
  • #4
Raghav Gupta said:
Why is it that a letter requires 8 bits, which is a combination of 0's and 1's?
In one byte we can have 2^8 combinations. Apart from letters and numbers, what more?
Why is 2^10 famous?
How does the combination of 0's and 1's, which is "on" and "off", print letters?

In the Stone Age some machines had 7-bit characters to save money. There are upper-case and lower-case letters, numerals, and punctuation, which total more than 64.

2^10 is famous because it is approximately equal to one thousand.

The "on" and "off" printing characters are for teletypes, now obsolete.
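As a quick illustration (a minimal Python sketch, not part of the original reply) of the powers of two in question:

```python
# Character-set sizes and the "famous" 2^10 mentioned above.
for n in (7, 8, 10):
    print(f"2^{n} = {2**n}")
# 2^7  = 128   (the original 7-bit ASCII set)
# 2^8  = 256   (one byte)
# 2^10 = 1024  (approximately one thousand, hence "kilo")
```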
 
  • Like
Likes Raghav Gupta
  • #5
Foreign languages typically use an encoding called Unicode, which takes two bytes (and sometimes four bytes). Do you understand that a byte is 8 bits, so that once you have done this translation, everything is again just 1's and 0's? Also, the number "16" as text is coded differently in the computer than the number 16 as an integer, which in turn is coded differently than the number 16.0 as a floating-point number. Explaining all of this in detail is the topic of a whole book. Why don't you find a good textbook on how computers work and start there? If you have specific questions after reading that, come back and ask.
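As an illustration of that last point (a minimal Python sketch, not from the original reply), the same value 16 gets three different byte patterns depending on whether it is stored as text, as an integer, or as a floating-point number; the struct formats chosen here (32-bit little-endian integer, 64-bit IEEE 754 double) are just one common convention:

```python
import struct

as_text  = "16".encode("ascii")    # two ASCII characters, '1' and '6'
as_int   = struct.pack("<i", 16)   # 32-bit little-endian integer
as_float = struct.pack("<d", 16.0) # 64-bit IEEE 754 double

print(as_text.hex())   # 3136             -> bytes 0x31 ('1') and 0x36 ('6')
print(as_int.hex())    # 10000000         -> 0x00000010, stored little-endian
print(as_float.hex())  # 0000000000003040 -> IEEE 754 encoding of 16.0
```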
 
  • #6
Unicode (UTF-8) even includes emoji:

https://en.wikipedia.org/wiki/Emoji#Blocks

How many of them display properly depends on the font that your browser uses. When new characters are added to the standard set, fonts have to be updated to include them.
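To make that concrete with one character (a minimal Python sketch, not part of the original post), here is the "grinning face" emoji, whose code point lies outside the one-byte ASCII range and therefore takes four bytes in UTF-8:

```python
s = "\N{GRINNING FACE}"         # U+1F600, an emoji outside the Basic Multilingual Plane
print(hex(ord(s)))              # 0x1f600  -> the Unicode code point
print(s.encode("utf-8").hex())  # f09f9880 -> four UTF-8 bytes
print(len(s.encode("utf-8")))   # 4
```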
 
  • Like
Likes Raghav Gupta
  • #7
phyzguy said:
Why don't you find a good textbook on how computers work and start there?
Can you give a reference for this type of book?
 
  • #8
Raghav Gupta said:
But there can be so many other characters as well, like Japanese letters, etc.
How can a computer understand numbers? I thought it only understood 1 and 0.
If it understands numbers too, how does the printing of a letter take place?
The Japanese writing systems (there are at least four in common use) don't use letters. One system does use the Roman alphabet (romaji) to transliterate the sounds of Japanese, but the other three use either a syllabary (hiragana and katakana) or full-on ideograms (kanji), of which there are thousands of different signs.

https://en.wikipedia.org/wiki/Japanese_writing_system

The kanji are derived from Chinese ideograms, but are not equivalent for the most part. Typewriters and keyboards which can handle kanji or Chinese ideograms are cumbersome devices with many keys, which take a long time to master. A typical Japanese student does not become fully fluent in speaking and writing his native language (all 4 forms of writing) until he reaches his teen years. Calligraphers practice the art of drawing Japanese ideograms often for a lifetime.

The computer works with binary equivalents of numbers. Integers are converted into their binary equivalents. Floating point numbers are converted to a specially-coded binary format, which is manipulated by the computer, and the results are decoded back to a decimal number for display or printing.

When printing, the computer sends a stream of data to the printer. The printer decodes the data stream and prints the proper character. Likewise, when data is displayed by the computer on screen, the internal data is decoded into human readable characters.
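A minimal Python sketch of that decode step (the byte values below are made up for illustration): a stream of bytes is just numbers until it is interpreted with a particular encoding, at which point it becomes readable characters:

```python
# Hypothetical byte stream, e.g. as it might arrive at a printer or terminal.
stream = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20, 0x41])

print(list(stream))            # [72, 101, 108, 108, 111, 44, 32, 65] - just numbers
print(stream.decode("ascii"))  # 'Hello, A' - the same bytes read as ASCII text
```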
 
  • Like
Likes Silicon Waffle
  • #9
SteamKing said:
Integers are converted into their binary equivalents.
But if integers can be converted into their binary equivalents, why can't letters be converted directly into their binary equivalents? Why is ASCII needed to convert letters into numbers first? Suppose ASCII assigns the value 65 to 'A'; then what if we have to print the number 65 itself?
 
  • #10
Raghav Gupta said:
But if integers can be converted into their binary equivalents, why can't letters be converted directly into their binary equivalents? Why is ASCII needed to convert letters into numbers first? Suppose ASCII assigns the value 65 to 'A'; then what if we have to print the number 65 itself?
What's the binary equivalent of 'A' or 'q' or '&'?

The ASCII code for 'A' is 65 decimal, but the computer uses the binary equivalent 100 0001, which is also 41 hex. If you want to print the numeral '65', you must print each decimal digit, '6' and '5', in the proper order for a human to understand it.
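A short Python check of those numbers (a minimal sketch, not part of the original reply): the single character 'A' is stored as one code, while the text "65" needs two separate digit characters:

```python
print(ord("A"), hex(ord("A")), bin(ord("A")))  # 65 0x41 0b1000001
print([hex(ord(c)) for c in "65"])             # ['0x36', '0x35'] - the digits '6' and '5'
```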

ASCII is a coded representation of letters, numerals, and other characters commonly found in American English writing. ASCII is not the only such system, but it is the one around which most computers operate. There are also extended ASCII, Unicode, and several other code systems in use:

https://en.wikipedia.org/wiki/ASCII

An older coding system, developed by IBM for their mainframes, was known as EBCDIC:

https://en.wikipedia.org/wiki/EBCDIC
 
  • Like
Likes Raghav Gupta
  • #11
Raghav Gupta said:
Can you give a reference for this type of book?

I've heard good things about "The Elements of Computing Systems: Building a Modern Computer from First Principles" by Noam Nisan and Shimon Schocken.
 
  • Like
Likes Raghav Gupta
  • #12
Raghav Gupta said:
Why is it that a letter requires 8 bits
Well it does - and does not. Back in the infancy of computers a character (which is the general term for "letter") used a different number of bits:
  • Telex code = 5 bits
  • Some computers used 6 bits/character
  • Communication to early alphanumeric terminals: 7 bits + parity (ASCII or EBCDIC code)
The "modern" age of computers:
 
  • #13
So how many bits does a letter really need/carry?

27 letters would directly require 5 bits.
Using a base 27 numeral system, we would need lg(27) ~ 4.75 bits/letter instead.
Huffman coding would need ~4.12 bits/letter.
A better order-0 entropy coder: ~4.08 bits/letter.
Order 1 (Markov: the previous letter is the context): ~3.3 bits/letter.
Order 2: ~3.1 bits/letter.
Using the probability distribution among words: ~2.1 bits/letter.
...
The best compressor, "cmix v8" (http://mattmahoney.net/dc/text.html),
compresses 10^9 bytes of text to 123,930,173 bytes - that is less than 1 bit/letter.
...
The Hilberg conjecture suggests that, due to long-range correlations, the entropy of text grows sublinearly:
H(n) ~ n^beta, where beta < 0.9
http://www.ipipan.waw.pl/~ldebowsk/docs/seminaria/hilberg.pdf
In other words, when compressing two texts concatenated, we need fewer bits than the sum of the compressed sizes of the separate files.
So does this conjecture suggest that the number of bits per letter approaches zero?
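To make the first few numbers in that list concrete, here is a minimal Python sketch (not part of the original post) computing log2(27) and an order-0 entropy estimate from letter frequencies; the sample string is hypothetical and far too short to reproduce the ~4.08 bits/letter figure for real English text:

```python
import math
from collections import Counter

# Fixed-length coding vs. the ideal for a 27-symbol alphabet (26 letters + space).
print(math.ceil(math.log2(27)))  # 5     -> bits needed if each letter gets a whole number of bits
print(math.log2(27))             # ~4.75 -> bits/letter for an ideal base-27 code

def order0_entropy(text):
    """Shannon entropy in bits per symbol, from observed symbol frequencies."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog "
print(order0_entropy(sample))    # order-0 estimate for this tiny sample
```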
 

Related to How Do Different Factors Affect the Number of Bits Needed to Represent a Letter?

1. Why does a letter require 1 byte of memory?

Letters are typically represented in a computer using the ASCII or Unicode character encoding systems. In both of these systems, each character, including letters, is assigned a unique numerical code. This code is then stored in memory using binary digits, or bits. Since 1 byte is equal to 8 bits, this means that each letter requires 1 byte of memory.

2. Can letters require more or less than 1 byte of memory?

Yes, depending on the character encoding system being used, some letters may require more than 1 byte of memory. For example, in the UTF-8 character encoding system, certain letters and symbols can require up to 4 bytes of memory.
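For instance (a minimal Python sketch, not part of the original answer), the UTF-8 byte count grows from 1 to 4 as characters move further from the ASCII range:

```python
for ch in ("A", "é", "あ", "😀"):  # ASCII letter, accented letter, Japanese kana, emoji
    print(ch, len(ch.encode("utf-8")), "byte(s)")
# prints: A 1 byte(s), é 2 byte(s), あ 3 byte(s), 😀 4 byte(s)
```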

3. How does a computer know which letter to retrieve from 1 byte of memory?

When a letter is stored in memory, it is assigned a specific location, or address, in memory. This address is determined by the character encoding system being used. When the computer needs to retrieve the letter, it uses the address to access the correct 1 byte of memory and retrieve the corresponding numerical code, which is then translated into the letter.

4. Why is 1 byte the standard unit for measuring memory?

The byte is the smallest addressable unit of memory, meaning it is the smallest amount of memory that can be accessed at one time. Therefore, it serves as a convenient unit for measuring and comparing memory usage. Additionally, many computer systems and programs are designed to work with data in multiples of 1 byte, making it a practical and widely used unit for memory measurement.

5. Can a letter require more than 1 byte of memory in certain situations?

In some cases, such as when using certain character encoding systems or fonts, a letter may require more than 1 byte of memory. Additionally, when storing multiple letters in a sequence, they may require more than 1 byte of memory depending on the encoding system used and the length of the sequence. However, for most practical purposes, 1 byte is the standard unit for measuring the memory usage of a single letter.
