Compression Conundrum: Why Does WinRAR Reach 84% at Best?

  • Thread starter Thread starter martix
  • Start date Start date
  • Tags Tags
    Compression
AI Thread Summary
The discussion revolves around the compression capabilities of WinRAR, particularly its performance with different types of data. It was noted that WinRAR can compress random data, such as a segment of a transcendental number, to about 84% efficiency. In contrast, patterned data, like program executables, typically compresses to no better than 95%. The conversation highlights the inefficiency of Binary-Coded Decimal (BCD) encoding, which was initially used for the data. After switching from BCD to other data formats, including random data from random.org and large decimal representations of mathematical constants like PI and Catalan numbers, the user found that compression did not significantly reduce the size of the data. The underlying principle discussed is that truly random data is often not compressible, and the efficiency of compression algorithms can vary greatly depending on the data structure and encoding used. The thread also touches on the implications of data representation in terms of size and compression effectiveness.
martix
Messages
167
Reaction score
5
I have discovered something weird that I want someone to explain to me.

The program in question is winRAR, haven't tested others. Here's the thing.

That program at its best is able to compress random data(a piece of some transcendental number) to exactly 84 percent.
However should you choose to compress certain kinds of patterned data(like program executables), rarely does it manage to go lower than 95%.
Why is that?

I'm also not sure where this topic would be more at home - Computer Science or Number Theory.

P.S. I forgot to specify the format of the data - it's in BCD. Don't know how much that is relevant.
 
Technology news on Phys.org
martix said:
I have discovered something weird that I want someone to explain to me.

The program in question is winRAR, haven't tested others. Here's the thing.

That program at its best is able to compress random data(a piece of some transcendental number) to exactly 84 percent.
However should you choose to compress certain kinds of patterned data(like program executables), rarely does it manage to go lower than 95%.
Why is that?
I must question how random digits in a transcendental number are. Not only is truly random data not compressible, it will actually expand in WinRAR and any other compression scheme you can think of...

I'm also not sure where this topic would be more at home - Computer Science or Number Theory.
I would say that this is the right forum...

P.S. I forgot to specify the format of the data - it's in BCD. Don't know how much that is relevant.
It's very relevant. Would you be surprised to find that an ASCII string of a number compressed really well? If you read the wikipedia page on BCD, you'll see that it's not a particularly efficient encoding scheme so a compressor will find opportunities to remove redundancies...
 
I see...

Well I can see that there are loopholes in my scheme. :)
Did some more experiments. Properly this time. For one, I scrapped BCD. Turn's out it was the culprit. :)
In the mean time, I tried a few other methods:
1. I got a bunch of data(64K) from random.org.
2. A 2^28 dec digit PI(in hex).
3. A 2^24 dec digit CATALAN(in hex).

Turns out none of these compressed to any degree :) All it did was added a few headers, the underlying data remained utterly untouched. I guess it's impossible to say how random 2 or 3 is, but they seem to work in this case just as well.
 
If interested, in hex pi is 3.243f6a8885a308d3 ... . Here is a link to an 8MB file starting with the fractional part 243f6a88... (equivalent to about 19.265 million decimal digits).

http://rcgldr.net/misc/pi.bin
 
Last edited by a moderator:
Well, you have 19.265 mill digits, I got a 2^28 digits. Do the math :)
 
martix said:
Well, you have 19.265 mill digits, I got a 2^28 digits.
Sorry, I miss read your post. Wasn't sure what you meant by "in hex". So yes, a 111.465MB binary string is longer than a 8MB binary string. update - corrected my post.
 
Last edited:
It wasn't 32GB, it was 111465411 bytes(or 106.3MB) of binary data. :)
 
martix said:
It wasn't 32GB, it was 111465411 bytes(or 106.3MB) of binary data.
Not paying attention, was thinking 2^28 hex digits or 2^32 bits of data (32 giga-bits, not giga-bytes, which would be 536 mega-bytes. As you mentioned 2^28 decimal digits would take 111,465,411 bytes. ((1 / log10(256)) ~= 0.415241011861 bytes per decimal digit).
 
Back
Top