Limits to Data Compression

In summary, a compression algorithm only rearranges and discards bytes inside a computer; it does not change the machine's mass, density, or gravitational properties, so it cannot create a black hole. Lossless compression is also bounded by the information entropy of the data, so "infinite" compression is impossible.
  • #1
Mohamed Sabah
I am fascinated with data compression, and for the past 7 years I have been trying to create an algorithm that can compress information almost infinitely; let's say, for example, 1 GB of information into 1 KB of information. As impossible as that sounds, after 7 years of working on that problem 16 hours a day, creating about 50 different algorithms a day, I am about to make infinite data compression a reality.

So while working on solving this infinite compression problem, my mind wandered to the stars and how they collapse on themselves, get compressed, and create black holes. Then I got a little bit worried, telling myself that maybe if I compress information infinitely it might also create a black hole. Then I told myself I don't think information would be included in the set of things that can create a black hole if compressed enough, and that only physical material can create one.

Then, while I am about to finish my 7-year journey, I was randomly watching a YouTube video about quantum mechanics, and in it, it was suggested that even information compression can create black holes. Link: at 1:13:27.

So assuming that in a couple of weeks I press F5 to run the compiler and my program actually works, and data gets compressed by about 99% or more, can it create a black hole? Should I be worried?
 
  • #2
Mohamed Sabah said:
I am about to make infinite data compression a reality.

I doubt you literally mean "infinite" compression, since that would mean storing infinite information in zero bytes. That's obviously impossible.

You mention a figure of 99% later on, which is much more realistic.

Mohamed Sabah said:
I was randomly watching a YouTube video

Randomly watching youtube videos is not a good way to learn science. Even videos by scientists will mislead you if you are trying to actually learn the science, instead of just learning about the science.

Mohamed Sabah said:
it was suggested that even information compression can create black holes

Whatever Greene meant by "information compression", it's not what you're doing. Your algorithms are just shuffling bytes around inside a computer. That doesn't change the computer's mass or density or gravitational properties.

Mohamed Sabah said:
can it create a black hole? Should I be worried?

No.
 
  • #3
Mohamed Sabah said:
I am fascinated with data compression [...]
So have I been, both in my studies and work.
Mohamed Sabah said:
[...] I am about to make infinite data compression a reality
...Really? Lossless, or lossy? That may lead to Claude Shannon (the founder of information theory) starting to decompress in his grave, rising up to remind us that there is a fundamental limit to lossless data compression (check out "information entropy", and see e.g. this and this).
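(For reference, the limit in question: for a source that emits symbols with probabilities [itex]p_i[/itex], the information entropy is [itex]H = -\sum_i p_i \log_2 p_i[/itex] bits per symbol, and the source coding theorem says no lossless code can average fewer than [itex]H[/itex] bits per symbol.)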
Mohamed Sabah said:
[...] can it create a black hole? Should I be worried?
No. To create a black hole you typically need a lot of mass that occupies a very, very small volume of space. See e.g. Schwarzschild radius (Hyperphysics), where you can compute the needed radius for different masses.
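(For a mass [itex]M[/itex] the Schwarzschild radius is [itex]r_s = 2GM/c^2[/itex]; for the entire Earth that works out to only about 9 mm, which gives an idea of the densities involved.)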
 
  • #4
Mohamed Sabah said:
I am fascinated with data compression, and for the past 7 years I have been trying to create an algorithm that can compress information almost infinitely; let's say, for example, 1 GB of information into 1 KB of information [...]
By the way, I will give you a pretty good initial method for testing the efficiency of your compression algorithm:
  1. Take, for instance, a plain text document of a reasonable size, let's say 1 MB. Plain text has comparatively low data variation, and thus low information entropy, which means it can be heavily compressed.
  2. Let us assume the plain text document is coded in bytes (8 bits).

    The information entropy S of the text document will be between

    [itex] 0 \leqslant S \leqslant \log_{2} (2^n)[/itex], where n is the number of bits per character, which means

    [itex] 0 \leqslant S \leqslant \log_{2} (2^8)[/itex], which means

    [itex] 0 \leqslant S \leqslant 8[/itex].

    So the information entropy of a file coded with 8 bits per character will be between 0 and 8. A text file with little data variation will have low information entropy, while a text file with much data variation will have higher information entropy.
  3. Now compress the text file with a reasonably good, well-known algorithm, for instance WinRAR.
  4. The resulting compressed file will have a reduced size and a much larger information entropy than the original plain text file, and the entropy will typically be pretty close to maximum (which is 8).
  5. Now, try to compress the compressed file further with your algorithm, and see if you can reduce its size. If you can, you may have a good algorithm. But since the information entropy of the compressed file is likely already high (close to 8), you won't be able to compress it much further. (A small Python sketch for measuring the per-byte entropy follows this list.)
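For example, a minimal Python sketch for measuring the per-byte entropy mentioned in step 2 might look like this (the file names are just placeholders):

[code]
import math
from collections import Counter

def byte_entropy(path):
    """Shannon entropy in bits per byte (range 0..8) of a file."""
    data = open(path, "rb").read()
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical file names, for illustration only:
# print(byte_entropy("plain.txt"))   # typically ~4-5 bits/byte for English text
# print(byte_entropy("plain.rar"))   # typically close to 8 bits/byte
[/code]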
 
  • #5
You may be conflating 'compressing' data via algorithmic manipulation with actually compressing matter via gravity. In the latter case the matter is still present. In the former case bytes are actually removed.

Let us say we have a string of 50 identical bytes stored on a computer. A hypothetical algorithm reduces the 50-byte string to far fewer bytes; say some control information, the number 50, and a single byte of the original string. So we have 'compressed' 50 bytes to, say, 4. The 4-byte string uses the same computer resources as any other 4-byte string, and the operating system gets 46 bytes back to reuse. A similar argument holds at the electronic level.
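For instance, a toy run-length scheme of the kind sketched above might look like this (an illustrative sketch only, not a real-world compressor; with no header bytes in this toy, the 50 identical bytes shrink to just 2):

[code]
def rle_encode(data: bytes) -> bytes:
    """Toy run-length encoding: (count, value) pairs, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)

def rle_decode(blob: bytes) -> bytes:
    """Reverse the toy encoding."""
    out = bytearray()
    for k in range(0, len(blob), 2):
        out += bytes([blob[k + 1]]) * blob[k]
    return bytes(out)

# The freed bytes go back to the operating system, but the computer's mass
# is unchanged.
assert rle_decode(rle_encode(b"A" * 50)) == b"A" * 50
print(len(rle_encode(b"A" * 50)))  # 2
[/code]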

Let us then say a star of 10 solar masses collapses to a black hole. Ignoring other effects the 10 solar masses are still present.

Another major difference is that information contained in the star is irretrievable from the black hole. The decompression algorithm can take the 4 bytes and restore the 50 byte string so we can use the original information.
 
  • #6
Sorry for the late reply, was really busy.

PeterDonis said:
I doubt you literally mean "infinite" compression, since that would mean storing infinite information in zero bytes. That's obviously impossible.

Of course it's not literally infinite compression; the ratio of compression is any size to 1 KB, which is a really, really big ratio in the world of data compression, and I thought "infinite" was a good word to describe it.

PeterDonis said:
Randomly watching youtube videos is not a good way to learn science. Even videos by scientists will mislead you if you are trying to actually learn the science, instead of just learning about the science.

Thanks for the tip. I am planning to study quantum physics the right way after I am finished with this project.

PeterDonis said:
Whatever Greene meant by "information compression", it's not what you're doing. Your algorithms are just shuffling bytes around inside a computer. That doesn't change the computer's mass or density or gravitational properties.

That's what I was gravitating towards too, but I just wanted to be safe and sure before I proceed.

DennisN said:
...Really? Lossless, or lossy? That may lead to Claude Shannon (the founder of information theory) starting to decompress in his grave, rising up to remind us that there is a fundamental limit to lossless data compression (check out "information entropy", and see e.g. this and this).

Of course lossless; I understand the limits very well. At first I didn't admit that there are limits, and I kept going at the problem from every possible angle, and in each of those thousands of attempts the limit manifested no matter how I spun it or twisted it; I even integrated four or more different algorithms together multiple times, and it was still there to stop me.

Then, after enough time trying to solve this problem, I understood that I had to create an algorithm that cannot be stopped by the limits. Can such an algorithm exist? The answer is yes.

I'll give you a general idea and explain how my algorithm works in its basic form and how it can bypass such entropic limits.

Let's say we have a number consisting of 20 digits and we call it Num1. This number, put into a certain formula 1, generates another number of the same size (20 digits), which we call Num2; if you apply a formula 2 to Num2, it reverses the process and generates Num1. Now let's say we have binary data we want to compress, zeroes and ones, e.g. "011010101110101...", of a size bigger than 20 digits' worth of zeroes and ones (it can be any bigger size). The thing is, when Num1 generates Num2 it can carry one binary digit, 0 or 1, and depending on the choice of 0 or 1 it generates a different number, putting you on a different path for each mix of zeroes and ones, each path reversible with formula 2 back to the previous number. Now, for data of size n that you want to compress, you keep generating numbers (Num1, Num2, Num3, ... up to Num N), and that's it: all you need is the last number, Num N, and the distance traveled, which regenerates all the previous numbers, each carrying a zero or a one, back to Num1.

Now, this is an extremely basic form of the algorithm, and it is unfathomably more complex than what is shown above; that's why I am comfortable sharing it with you.

I understand this is not a data compression forum, and I will try not to discuss data compression from now on.
 
  • #7
Mohamed Sabah said:
I'll give you a general idea and explain how my algorithm works in its basic form and how it can bypass such entropic limits.

Let's say we have a number consisting of 20 digits and we call it Num1. [...] all you need is the last number, Num N, and the distance traveled, which regenerates all the previous numbers, each carrying a zero or a one, back to Num1.
If I understand correctly, you say that you can start with N bits of data and compress it to two numbers. If that is the case, for N bits of random data, you will need on average ~N bits to encode the two numbers. You will not get "infinite" compression.

To test that, apply your algorithm to an already compressed file (compressed using some common algorithm) and see how much further compression you get.
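For example, assuming Python and the standard zlib module, a quick check of what happens when you try to compress random or already-compressed data might look like this:

[code]
import os
import zlib

random_data = os.urandom(1_000_000)   # incompressible by construction
once = zlib.compress(random_data, 9)
twice = zlib.compress(once, 9)        # try to compress the compressed data

print(len(random_data), len(once), len(twice))
# Typical result: the "compressed" versions are slightly LARGER than the input,
# because random (or already-compressed) data has ~8 bits of entropy per byte.
[/code]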
 
  • #8
DrClaude said:
you will need on average ~N bits to encode the two numbers.

Why? Please explain more.

Now again, maybe you missed something: I have a file of binary digits of size N that I want to compress, and I apply formula 1 to the initial Num1 N times, each time the number carrying a zero or a one extracted from the original file of N binary bits. I end up with Num N (20 digits), the distance traveled (let's say 10 billion, another 11 digits), and formula 2, which all together are less than 1 KB. Now apply formula 2 to Num N and it gets you Num N-1 and a zero or a one; then apply the same formula 2 to Num N-1 and you get Num N-2 and again a zero or a one. Keep doing that until the first number, Num1, and you get back the original file.
 
  • #9
@Mohamed Sabah I repeat what DennisN said. Take a 1-megabyte text file and compress it with some standard compression tool. Now use that as your base and see how much further you can compress it. You claim to have spent a huge amount of work on your algorithm, so now APPLY IT. Quit going back and forth with claims and defenses, just APPLY IT. Tell us the size of the original text file, the size of the standard compressed file, and then the size after your algorithm is applied. All the rest of this thread is just noise.
 
  • #10
Mohamed Sabah said:
Why? Please explain more.
Because information is conserved. If the original is information dense, then it is impossible to compress it very much.
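To make that concrete with a standard counting argument: there are [itex]2^N[/itex] distinct files of [itex]N[/itex] bits, but only [itex]2^0 + 2^1 + \dots + 2^{N-1} = 2^N - 1[/itex] files shorter than [itex]N[/itex] bits, so no lossless scheme can shrink every [itex]N[/itex]-bit file; any scheme that shrinks some inputs must expand others.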
 
  • #11
phinds said:
I repeat what DennisN said. Take a 1-megabyte text file and compress it with some standard compression tool. Now use that as your base and see how much further you can compress it. You claim to have spent a huge amount of work on your algorithm, so now APPLY IT.

If you read my OP you will notice that I am in the process of applying it; it still needs a week and a half of work.

phinds said:
Quit going back and forth with claims and defenses, just APPLY IT.

Sure, I'll apply it, but there are no claims or defenses here: I never claimed that it worked, I just said assuming it works in two weeks.
The original post was about black holes, not about data compression, but it seems other posts want to talk about data compression.

I posted that process to show that there is a logical way to bypass the limits.

If you are interested in my results I'll post here again when I run the program; otherwise there is no point in posting here. I got what I came here for: can it create a black hole? No.
 
  • #12
Mohamed Sabah said:
If you read my OP you will notice that I am in the process of applying it; it still needs a week and a half of work.
OK

Mohamed Sabah said:
Sure, I'll apply it, but there are no claims or defenses here: I never claimed that it worked, I just said assuming it works in two weeks.
The original post was about black holes, not about data compression, but it seems other posts want to talk about data compression.
Fair enough. I think the topic shifted to data compression pretty quickly because you are claiming to do the impossible.

Mohamed Sabah said:
I posted that process to show that there is a logical way to bypass the limits.
No, you have not. Your statement is nowhere near clear or specific enough to support that claim, quite apart from the fact that the impossible is, after all, impossible.

Mohamed Sabah said:
If you are interested in my results I'll post here again when I run the program; otherwise there is no point in posting here. I got what I came here for: can it create a black hole? No.
Yes, I'd be interested in seeing your results. In fact, I would be SO interested that I have done the first half of our suggested test for you. I have created an EXTREMELY compressible 1,000,000-byte text file and compressed it with WinZip. The resulting file is 12,063 bytes long, and I post it here for you to work with. Since WinZip is not likely to be the very best possible compression technique, I would not be too surprised if you could compress this down to, say, 10,000 bytes. I think your claims suggest that you think you can take it down to WAY less than 1,000 bytes (correct me if I misunderstand your claim). SO ... finish your algorithm, apply it, and let us know the results.

The attached file has the extension ".txt" because I couldn't seem to upload a ZIP file, but the file is in fact a WinZip file.
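For anyone who wants a rough idea of how such a highly compressible test file can be produced, here is a minimal sketch using Python's zipfile module (illustrative only; the file names are placeholders, this is not necessarily how the attached file was generated, and the sizes will differ):

[code]
import zipfile

# Build a roughly 1,000,000-byte text file with lots of repetition.
line = "the quick brown fox jumps over the lazy dog\n"
with open("big.txt", "w") as f:
    f.write(line * (1_000_000 // len(line)))

# Zip it with the standard DEFLATE method.
with zipfile.ZipFile("big.zip", "w", zipfile.ZIP_DEFLATED) as z:
    z.write("big.txt")
[/code]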
 

Attachments

  • bigzip.txt
    11.8 KB
  • #13
Thread closed for Moderation...
 
  • #14
Thread will remain closed.

The question was always data compression, not black holes.
 

What is data compression?

Data compression is the process of reducing the size of data to save storage space or improve transmission speeds. It involves encoding data using algorithms to remove redundancies and unnecessary information.

Why are there limits to data compression?

There are limits to data compression because every piece of data contains a certain amount of information (its entropy) that cannot be represented in fewer bits without losing detail. Also, some data, such as already compressed files, has little redundancy left and cannot be compressed much further.

What are the factors that affect data compression?

The factors that affect data compression include the type of data being compressed, the compression algorithm used, and the desired level of compression. Other factors include the amount of available storage space, the speed of data transmission, and the processing power of the system.

What are lossy and lossless compression?

Lossy compression is a type of data compression that permanently removes some information from the original data to reduce its size. Lossless compression, on the other hand, maintains all of the original data and can be fully restored to its original form without any loss of information.
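For example, a minimal lossless round trip using Python's zlib module looks like this (an illustrative sketch; the sample text is arbitrary):

[code]
import zlib

original = b"to be or not to be, that is the question " * 100
packed = zlib.compress(original)
restored = zlib.decompress(packed)

assert restored == original            # lossless: the round trip is exact
print(len(original), len(packed))      # the repetitive input shrinks substantially
[/code]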

What are the potential drawbacks of data compression?

One potential drawback of data compression is the loss of quality or detail in the compressed data, especially with lossy compression. Compressed data is also more fragile: a small amount of corruption in storage or transmission can make a large portion of it unrecoverable. Additionally, decompressing large amounts of data can be time-consuming and require significant processing power.
