Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Limits to Data Compression

  1. Mar 15, 2019 #1
    I am fascinated with data compression, and for the past 7 years i have been trying to create an algorithm that can compress information almost infinitely lets say for example 1GB of information into 1KB of information, and while impossible that sounds after 7 years working at that problem 16 hours a day, creating about 50 different algorithms a day, i am about to make infinite data compression a reality.

    So while working on solving this infinite compression problem, my mind wondered to the stars and how they collapse on them self's gets compressed and create black holes, then i got a little bit worried telling myself maybe if i compress information infinitely it also might create a black hole, then i told myself i don't think information would be included in the set of things that can create a black hole if compresses enough and only physical materials can create one.

    Then while i am about to finish my 7 year journey, i was randomly watching a youtube video about quantum mechanics,and in it, it was suggested that even information compression can create black holes LInk: at 1:13:27.

    So assuming in a couple of weeks i press F5 to run the compiler and my program actually works and data gets compressed to about 99% or more, can it create a black hole? should i be worried?
  2. jcsd
  3. Mar 15, 2019 #2


    Staff: Mentor

    I doubt you literally mean "infinite" compression, since that would mean storing infinite information in zero bytes. That's obviously impossible.

    You mention a figure of 99% later on, which is much more realistic.

    Randomly watching youtube videos is not a good way to learn science. Even videos by scientists will mislead you if you are trying to actually learn the science, instead of just learning about the science.

    Whatever Greene meant by "information compression", it's not what you're doing. Your algorithms are just shuffling bytes around inside a computer. That doesn't change the computer's mass or density or gravitational properties.

  4. Mar 16, 2019 at 9:37 PM #3
    So have I been, both in my studies and work.
    ...really? Lossless, or? That may lead to Claude Shannon (the founder of information theory) starting to decompress in his grave, rising up to remind us that there is a fundamental limit to lossless data compression (check out "information entropy", and see e.g. this and this).
    No. To create a black hole you typically need a lot of mass that occupies a very, very small volume of space. See e.g. Schwarzschild radius (Hyperphysics), where you can compute the needed radius for different masses.
    Last edited: Mar 16, 2019 at 9:47 PM
  5. Mar 16, 2019 at 11:09 PM #4
    By the way, I will give you a pretty good initial method how to test the efficiency of your compression algorithm:
    1. Take for instance a plain text document with a reasonable size, let's say 1 MB. Plain texts have comparatively pretty low data variation, and thus a low information entropy, which means they can be heavily compressed.

    2. Let us assume the plain text document is coded in bytes (8 bits).

      The information entropy S of the text document will be between

      [itex] 0 \leqslant S \leqslant log_{2} (2^n)[/itex], where n is the number of bits, which means

      [itex] 0 \leqslant S \leqslant log_{2} (2^8)[/itex], which means

      [itex] 0 \leqslant S \leqslant 8[/itex].

      So the information entropy of a file coded with 8 bits per character will be between 0 and 8. A text file with little data variation will have low information entropy, while a text file with much data variation will have higher information entropy.

    3. Now compress the text file with a reasonably good known algorithm(s), for instance WinRAR.

    4. The resulting compressed file will have a reduced size and a much larger information entropy than the original plain text file, and the entropy will typically be pretty close to maximum (which is 8).

    5. Now, try to compress the compressed file further with your algorithm, and see if you can reduce the file size of it. If you can, you may have a good algorithm. But since the information entropy of the compressed file likely is pretty high (close to 8), you won't be able to significantly compress the file further.
  6. Mar 16, 2019 at 11:11 PM #5


    User Avatar
    Gold Member

    You may be conflating 'compressing' data via algorithmic manipulation with actually compressing matter via gravity. In the latter case the matter is still present. In the former case bytes are actually removed.

    Let us say we have a string of 50 identical bytes stored on a computer. A hypothetical algorithm reduces the 50 byte string to much fewer bytes; say some control stuff, the number 50 and a single byte of the original string. So we have 'compressed' 50 bytes to, say 4. The four byte string uses the same computer resources as any other 4 byte string. The operating system 'gets' 46 bytes back to reuse. A similar argument holds at the electronic level.

    Let us then say a star of 10 solar masses collapses to a black hole. Ignoring other effects the 10 solar masses are still present.

    Another major difference is that information contained in the star is irretrievable from the black hole. The decompression algorithm can take the 4 bytes and restore the 50 byte string so we can use the original information.
  7. Mar 18, 2019 at 3:12 AM #6
    Sorry for the late reply, was really busy.

    of course, it's not literally infinite compression, the ratio of compression is anysize to 1KB, so this is a really really big ratio in the world of data compression and i thought that infinite is a good word to describe it.

    Thanks for the tip, im planning to walk the path of and study quantum physics the right way after im finished with this project.

    Thats what i was gravitating towards too, but i just wanted to be safe and sure before i proceed.

    ofcourse lossless, i understand the limits very well. at first i didnt admit that there are limits and i kept going at it from every possible different angle and in each of those thousands of attempts the limit manifested no matter how i spin it, twist it, i even integrated four or more different algorithms together multiple times and it still was there to stop me.

    Then after enough time trying to solve this problem i understood that i had to create an algorithm that cannot be stopped by the limits, can such an algorithm exist? the answer is yes.

    ill give you a general idea and explain to you how my algorithm works in its basic form and how it can bypasses such entropic limits.

    Lets say we have a number consisting of 20 digits and we call it NUM1, This number put in a certain formula 1 and generates another number with the same size (20 digits) we call it NUM2, now if you apply a formula 2 on Num2 it reverses the process and generates Num1. now lets say we have binary data we want to compress in zeroes and ones ex: "011010101110101..." of size bigger than the zeroes and ones of 20 digits (it can be any bigger size), now the thing is when Num1 generates Num2 it can carry a binary data of 0 or 1 and depending on the choice of either 0 or 1 it generates a different number (the number is changed) and puts you on a different path for each mix of zeroes and ones that is reversible with formula 2 to the previous number. now for size n of data you want to compress you keep generating numbers (Num1 Num2 Num3....till Num N) and thats it, all you need is the last Num (N) and the distance traveled which generates all previous Numbers each carrying a zero or a 1 till Num1.

    now this is an extremely basic form of the algorithm and it is unfathomably more complex the what is shown above, thats why i am confident to sharing it with you.

    i understand this is not a data compression forums, and i will try not to discuss data compression from now on.
  8. Mar 18, 2019 at 3:47 AM #7


    User Avatar

    Staff: Mentor

    If I understand correctly, you say that you can start with N bits of data and compress it to two numbers. If that is the case, for N bits of random data, you will need on average ~N bits to encode the two numbers. You will not get "infinite" compression.

    To test that, apply you algorithm to an already compressed file (using some common algorithm) and see how much compression you can get.
  9. Mar 18, 2019 at 4:35 AM #8
    why? Please explain more.

    now again, maybe you missed something: if i have a file of binary numbers of size N that i want to compress and i apply formula 1 to the initial Num1 N times with each time the number carrying a zero or a one extracted from the original file of N binary bits, now i end up with Num N (20 digits) and the distance traveled (lets say 10 billion another 11 digits) and formula 2 all together are less than 1KB. now apply formula2 to Num N it gets you Num N-1 and a zero or a one then apply the same formula 2 to Num N-1 you get Num N-2 and again you get a zero or a one keep doing that until the first number Num1 then you get back the original file.
  10. Mar 18, 2019 at 8:28 AM #9


    User Avatar
    Gold Member

    @Mohamed Sabah I repeat what dennis said. Take a 1 Megabyte text file and compress it with some standard compression tool. Now use that as your base and see how much further you can compress it. You claim to have spent a huge amount of work on your algroithm, so now APPLY IT. Quit going back and forth with claims and defenses, just APPLY IT. Tell us the size of the original text file, the size of the standard compressed file, and then the size after your algorithm is applied. All the rest of this thread is just noise.
  11. Mar 18, 2019 at 8:29 AM #10


    User Avatar

    Staff: Mentor

    Because information is conserved. If the original is information dense, then it is impossible to compress it very much.
  12. Mar 18, 2019 at 9:02 AM #11
    if you read my OP you will notice that i am in the process of applying it, it still needs a week and a half of work.

    sure ill apply it, but no claims or defenses here, i never claimed that it worked i just said assuming it works in two weeks.
    the original post was about black holes not about data compression, but it seems other posts want to talk about data compression.

    i posted that process to show that there is a logical way to bypass limits.

    if you are interested in my results ill post here again when i run the program otherwise there is no point of posting here, i got what i came here for , can it create a black hole? no.
  13. Mar 18, 2019 at 9:19 AM #12


    User Avatar
    Gold Member


    Fair enough. I think the topic shifted to data compression pretty quickly because you are claiming to do the impossible.

    No, you have not. Your statement is nowhere near clear or specific enough to support that claim, even beside the point that the impossible is, after all, impossible.

    Yes, I'd be interested in seeing your results. In fact, I would be SO interested that I have done the first half of our suggested test for you. I have created an EXTREMELY compressible 1,000,000 byte text file and compressed it with WinZip. The resulting file is 12,063 bytes long and I post it here for you to work with. Since WinZip is not likely to be the very best possible compression technique, I would not be too surprized if you could compress this down to, say, 10,000 bytes. I think your claims suggest that you think you can take it down to WAY less than 1,000 bytes (correct me if I misunderstand your claim). SO ... finish your algorithm and apply it and let us know the results.

    The attached file has the extension ".txt" because I couldn't seem to upload a ZIP file, but the file is in fact a WinZip file

    Attached Files:

  14. Mar 18, 2019 at 9:21 AM #13


    User Avatar

    Staff: Mentor

    Thread closed for Moderation...
  15. Mar 18, 2019 at 10:15 AM #14


    Staff: Mentor

    Thread will remain closed.

    The question was always data compression, not black holes.
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook