Free download of any book ever written (real not spam)

  • Thread starter BWV
  • Tags Book
AI Thread Summary
The discussion centers around the Library of Babel website, which claims to contain every possible book through combinations of characters. Participants express skepticism about the site's value, noting that the generated texts are largely gibberish and lack coherence. The limitations of the character set, which includes only lowercase letters, spaces, commas, and periods, make it difficult to find meaningful content. The conversation references the Infinite Monkey Theorem, highlighting the improbability of generating coherent sentences. Some contributors speculate about the potential for encoding punctuation and the implications of the library's vastness, while others question the practicality and purpose of such a collection. Overall, the consensus leans towards viewing the library as an interesting concept but ultimately lacking in substantive literary value.
BWV
if you can find it:

https://libraryofbabel.info

Site also has any book that ever will be written
 
It is just a combination of characters and doesn't give any information. Do you think this can be a real "book"?
 
It isn't as interesting as cat videos :wink:.
 
Daeho Ro said:
It is just a combination of characters and doesn't give any information. Do you think this can be a real "book"?

Maybe it's a real book in another language?
 
It is not. Unless you count "languages I made up just to fit this random string of letters".

Oh, and the website does not have any Chinese book. Or any book with a different alphabet at all.
 
mfb said:
It is not. Unless you count "languages I made up just to fit this random string of letters".

It could be code. Somewhere there is a book with the key.

Oh, and the website does not have any Chinese book. Or any book with a different alphabet at all.

A phonetic Chinese version of everything ever written in both Wade-Giles and Pinyin can be found there.
 
I think it's an interesting idea, but it's hopelessly useless. The odds of finding even a coherent sentence are nil, which makes the whole thing rather boring. Even for just a 100 character string (which is as long as this post up to the word 'coherent'), there are about 10^143 possible combinations, just using letters and spaces. At a minimum you'd need about 64 bytes to encode 100 characters using a 27 letter alphabet, which means you'd need ~64x10^143 bytes of information to store all combinations. That's more storage than there are atoms in the universe.
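
The arithmetic in this post can be checked with a few lines of Python. This is only a sketch under the post's own assumptions (27 symbols, 100 characters); the variable names are mine. It suggests the tight lower bound is 60 bytes per string, so "about 64" is a slightly generous round figure.

```python
import math

ALPHABET_SIZE = 27  # 26 lowercase letters plus the space, as assumed in the post
LENGTH = 100        # characters per string

# Exact count of distinct strings (Python integers have arbitrary precision).
combinations = ALPHABET_SIZE ** LENGTH

# Order of magnitude: log10(27**100) = 100 * log10(27).
exponent = LENGTH * math.log10(ALPHABET_SIZE)

# Information-theoretic minimum needed to tell one such string from another.
bits_per_string = LENGTH * math.log2(ALPHABET_SIZE)
bytes_per_string = math.ceil(bits_per_string / 8)

print(f"27^100 is about 10^{exponent:.1f}")
print(f"minimum bytes per 100-character string: {bytes_per_string}")
```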
 
  • #10
Not completely useless if it made the subject of a great short story
 
  • #11
I don't get it, when I search for a word the only thing it shows are blank pages with the word written on it, while the other pages in the 'book' are filled with random strings. Shouldn't that word be in a similar page with random strings?
 
  • #12
I think I need to cut down on my drinking! I got up to page 36 of one of those books before I realized it was complete gibberish.
 
  • #13
It looks like the stored characters are limited to lowercase letters, commas, periods, and spaces.

So, if that's true, if this "library" does contain J.D. Salinger's The Catcher in the Rye (which I'm guessing it might/does), it would only contain a version of it where all capital letters are first converted to lower case; colons, semicolons, parentheses, brackets, dashes, quotation marks, special characters and new-lines are omitted; and all formatting of text is removed.

In other words, nearly all punctuation is removed. I think it would be pretty hard to read a book without any punctuation except for periods, commas and spaces.
 
  • #14
There are versions where all the missing punctuation is spelled out like 'colon here'
 
  • #15
BWV said:
There are versions where all the missing punctuation is spelled out like 'colon here'
True, except without the single quotes as you used. Some other convention would have to be developed to signal a punctuation direction, to distinguish them from the actual text.

'Still pretty hard to read.

And when all is said and done, if certain punctuation symbols are desired, it may be more efficient (and certainly more robust) to add the desired punctuation symbols to the valid characters. A hit would be taken in terms of compression efficiency, but likely not as much as adding in the punctuation convention "code" using only lowercase letters, commas and periods. That, and having punctuation symbols as valid characters would take care of special cases where actual text in the book corresponds with a convention "code." (E.g., what if one of the characters in the book actually wanted to say [before compression], "... and the teacher said, 'no Davy, don't put the colon at the beginning of the sentence. Put the colon here.'")
 
  • #16
I wonder that as well, if Borges had added three or four more characters it would not have changed the order of magnitude of the size of the library: 25^(1.3*10^6) vs 28^(1.3*10^6). Was it more for the reader's sake?
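
For a rough sense of scale, here is a small sketch comparing the two library sizes through their base-10 exponents, using the rounded book length of 1.3*10^6 from this post (the helper name is made up for illustration). The library itself grows by tens of thousands of orders of magnitude, but the exponent, i.e. the number of digits needed to write the size down, changes by only a few percent.

```python
import math

BOOK_LENGTH = 1.3e6  # characters per book, the rounded figure used in the post


def library_exponent(alphabet_size: int, length: float = BOOK_LENGTH) -> float:
    """Return x such that alphabet_size**length == 10**x."""
    return length * math.log10(alphabet_size)


exp_25 = library_exponent(25)
exp_28 = library_exponent(28)

print(f"25 symbols: about 10^{exp_25:,.0f} books")
print(f"28 symbols: about 10^{exp_28:,.0f} books")
print(f"the exponent grows by {100 * (exp_28 / exp_25 - 1):.1f}%")
```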
 
  • #17
BWV said:
I wonder that as well, if Borges had added three or four more characters it would not have changed the order of magnitude of the size of the library: 25^(1.3*10^6) vs 28^(1.3*10^6). Was it more for the reader's sake?
Essentially, given the same indexing scheme as already present, adding additional characters would effectively reduce the maximum book size (but so would using "codes" to indicate punctuation). Of course you could split a real book into several smaller books.

Ultimately what this comes down to -- regardless of how many legal decompressed characters there are -- the index to the book is merely the compressed version of the book itself. When plugging in the book's index you're really just typing in the compressed version of the book, where the full compression involves more than just reducing the number of valid characters, but also character mapping into a more efficient storage format. Then the "book" it returns is the decompressed index, limited of course to its relatively small set of legal characters.

Btw, I just noticed that numerals (i.e., 0 - 9) don't seem to be valid characters either in the current implementation.
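
To illustrate the "index is just the book in another notation" point, here is a minimal sketch that maps any string over the 29 allowed characters to an integer and back. It is NOT the site's actual scheme, which I haven't seen; it is plain bijective base-29 numbering, and the function names are hypothetical.

```python
# Assumed character set: lowercase letters, space, comma, period (29 symbols).
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."


def text_to_index(text: str) -> int:
    """Map a string to a unique non-negative integer (bijective base-29)."""
    base = len(ALPHABET)
    index = 0
    for ch in text:
        # Digits run from 1 to 29, so leading 'a's are not lost as leading zeros.
        index = index * base + ALPHABET.index(ch) + 1
    return index


def index_to_text(index: int) -> str:
    """Inverse of text_to_index."""
    base = len(ALPHABET)
    chars = []
    while index > 0:
        index, digit = divmod(index - 1, base)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


page = "call me ishmael."
idx = text_to_index(page)
print(idx)
print(index_to_text(idx))
```

The index carries exactly as much information as the text, so for a 3200 character page it would run to thousands of decimal digits. "Looking up" a page this way amounts to typing the page in.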
 
  • #18
collinsmark said:
the index to the book is merely the compressed version of the book itself. When plugging in the book's index you're really just typing in the compressed version of the book

Yes, that was a theme of Borges - for example the 1:1 scale map in "On Exactitude in Science"
 
  • #19
Expressing colons with lowercase letters is no problem. Let "special char colon" be a colon, "special char exclamation mark" be an exclamation mark and so on. If the book text should contain "special char", write "special char special char". Can be shortened to save space, of course.
The same is frequently done with character escaping for programming languages, e.g. how do you add quotation marks to a string that is delimited by quotation marks, or how do you add a backslash to a string where the backslash indicates an escape sequence (like \n for a new line)? Same solution.
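
Here is a minimal sketch of that escaping scheme, assuming the marker "special char" followed by a space and a symbol name. The symbol table and function names are illustrative only, and the names must be prefix-free (no name may be the start of another) for decoding to be unambiguous.

```python
ESC = "special char"

# Illustrative table; extend as needed, keeping the names prefix-free.
NAMES = {":": "colon", "!": "exclamation mark", ";": "semicolon"}


def encode(text: str) -> str:
    """Spell out punctuation; double the marker where it occurs literally."""
    out = text.replace(ESC, ESC + " " + ESC)
    for symbol, name in NAMES.items():
        out = out.replace(symbol, ESC + " " + name)
    return out


def decode(encoded: str) -> str:
    """Scan left to right, resolving each marker by what follows it."""
    out = []
    i = 0
    while i < len(encoded):
        if not encoded.startswith(ESC, i):
            out.append(encoded[i])
            i += 1
            continue
        rest = i + len(ESC) + 1  # skip the marker and the space after it
        if encoded.startswith(ESC, rest):
            out.append(ESC)  # doubled marker means a literal "special char"
            i = rest + len(ESC)
            continue
        for symbol, name in NAMES.items():
            if encoded.startswith(name, rest):
                out.append(symbol)
                i = rest + len(name)
                break
        else:
            raise ValueError(f"unknown escape at position {i}")
    return "".join(out)


original = "put the colon here: special char!"
print(encode(original))
print(decode(encode(original)) == original)
```

The doubled marker plays the same role as `\\` in a string literal: it is the one sequence that lets the escape marker itself appear in the text.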
 
  • #20
I'm looking for a book called isjamgon but could not find it.
 
  • #21
I guess we could find, for a fixed language, the probability of a random string being a word. Say, in English. We count the total number of strings of length, say, ten or less: there are 26 + 26^2 + ... + 26^10 of them. Then we divide the number of English words of that length by this amount. Just a curiosity, I guess.
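
A sketch of that calculation follows. The word count is a made-up placeholder, not a measured figure; plug in the size of whatever word list you trust.

```python
ALPHABET_SIZE = 26
MAX_LENGTH = 10
ASSUMED_WORD_COUNT = 500_000  # hypothetical placeholder, NOT a real statistic

# 26 + 26^2 + ... + 26^10, summed directly.
total_strings = sum(ALPHABET_SIZE ** k for k in range(1, MAX_LENGTH + 1))

# The same geometric series in closed form: (26^11 - 26) / 25.
closed_form = (ALPHABET_SIZE ** (MAX_LENGTH + 1) - ALPHABET_SIZE) // (ALPHABET_SIZE - 1)

probability = ASSUMED_WORD_COUNT / total_strings

print(f"strings of length 1..{MAX_LENGTH}: {total_strings:,}")
print(f"chance a random one is a word (under the assumed count): {probability:.1e}")
```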
 
  • #22
The entropy of written English in long texts is somewhere around 1 to 1.5 bits per letter (e.g. here). That corresponds to (as logarithmic average) about 2-3 reasonable options for each letter, given the previous text. As an examp[.....]le you could guess the next letter "l" after "examp" with nearly 100% confidence, [...]while the letter "w" was not so clear here. That would lead to 1000 to 30000 reasonable strings for 10 letters, or 1 million to 1 billion for 20 letters. That is a huge amount, but tiny compared to 26^10 or 26^20.
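
The estimate above is just 2^(entropy x length). A small sketch, with the 1 to 1.5 bits per letter range taken from this post and a helper name of my own:

```python
def plausible_strings(bits_per_letter: float, letters: int) -> float:
    """Rough count of 'reasonable' strings at a given entropy rate."""
    return 2 ** (bits_per_letter * letters)


for letters in (10, 20):
    low = plausible_strings(1.0, letters)
    high = plausible_strings(1.5, letters)
    all_strings = 26 ** letters
    print(
        f"{letters} letters: {low:,.0f} to {high:,.0f} plausible strings "
        f"out of {all_strings:.2e} possible"
    )
```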
 
  • #23
collinsmark said:
It looks like the stored characters are limited to lowercase letters, commas, periods, and spaces.

So, if that's true, if this "library" does contain J.D. Salinger's The Catcher in the Rye (which I'm guessing it might/does), it would only contain a version of it where all capital letters are first converted to lower case; colons, semicolons, parentheses, brackets, dashes, quotation marks, special characters and new-lines are omitted; and all formatting of text is removed.

In other words, nearly all punctuation is removed. I think it would be pretty hard to read a book without any punctuation except for periods, commas and spaces.

No. Have you considered how much storage is required to store that many books? Look at what I posted... you couldn't even store every possible book if all you used were 100 characters per book. There is nothing interesting to read in that library - I looked around and couldn't even find a single three word string that even resembled something with semantic meaning. And if you were to actually read the about section of that website, the author doesn't actually claim to have produced every possible combination.
 
  • #24
The input textbox below this thread contains every book ever written.
To navigate, type the first symbol of the book you are looking for, then type the second symbol, continue until you found the book you were looking for.
 
  • #25
dipole said:
No. Have you considered how much storage is required to store that many books? Look at what I posted... you couldn't even store every possible book if all you used were 100 characters per book. There is nothing interesting to read in that library - I looked around and couldn't even find a single three word string that even resembled something with semantic meaning. And if you were to actually read the about section of that website, the author doesn't actually claim to have produced every possible combination.
I'm not privy to the author's compression algorithm (which might be no compression at all except for forcing a reduced character set). All I'm saying is that hypothetically, you could break up larger "real" books into several smaller books.

If I'm interpreting the author's claim correctly, the claim is that the "library" contains every possible combination of 3200 characters consisting of lowercase characters, space, comma and period. [Edit: what I mean to say here is that the analogous, smaller "books" are strings of 3200 characters, limited to lower case alphabetic characters, spaces, commas and periods -- those strings can then be combined to make larger "books".]

Of course limiting yourself to only those characters is not very practical. But that's what the author did anyway.
 
  • #26
That said, if the argument is that the site is kinda' silly, not really a "library" at all, and not truly worthy of real "research," then yes I agree. (But that doesn't mean it can't be fun.)
 
  • #27
collinsmark said:
If I'm interpreting the author's claim correctly, the claim is that the "library" contains every possible combination of 3200 characters consisting of lowercase characters, space, comma and period.

The library, conceptually, contains every possible 1.3 million character string constructed from 25 characters, so it contains 25^1,312,000 ≈ 10^1,834,097 books, big enough that the corpses of dead librarians flung down the air shafts will decay into dust long before they hit the floor.
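
The exponent can be checked with logarithms, since the number itself is too large to print comfortably. This sketch uses the book format from Borges' story (410 pages of 40 lines of about 80 characters); the variable names are mine.

```python
import math

PAGES, LINES_PER_PAGE, CHARS_PER_LINE = 410, 40, 80  # book format in the story
SYMBOLS = 25                                         # Borges' alphabet size

chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# log10(25**n) = n * log10(25), which avoids building the huge integer.
exponent = chars_per_book * math.log10(SYMBOLS)

print(f"characters per book: {chars_per_book:,}")
print(f"number of books: about 10^{math.floor(exponent):,}")
```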
 
  • #28
And it would seem it does so by combining combinations of 3200 character strings, only one of which you can specify yourself.
 