Compression conundrum

  • Thread starter martix
  • Start date
  • #1
147
0

Main Question or Discussion Point

I have discovered something weird that I want someone to explain to me.

The program in question is winRAR, haven't tested others. Here's the thing.

That program at its best is able to compress random data(a piece of some transcendental number) to exactly 84 percent.
However should you choose to compress certain kinds of patterned data(like program executables), rarely does it manage to go lower than 95%.
Why is that?

I'm also not sure where this topic would be more at home - Computer Science or Number Theory.

P.S. I forgot to specify the format of the data - it's in BCD. Don't know how much that is relevant.
 

Answers and Replies

  • #2
133
0
I have discovered something weird that I want someone to explain to me.

The program in question is winRAR, haven't tested others. Here's the thing.

That program at its best is able to compress random data(a piece of some transcendental number) to exactly 84 percent.
However should you choose to compress certain kinds of patterned data(like program executables), rarely does it manage to go lower than 95%.
Why is that?
I must question how random digits in a transcendental number are. Not only is truly random data not compressible, it will actually expand in WinRAR and any other compression scheme you can think of...

I'm also not sure where this topic would be more at home - Computer Science or Number Theory.
I would say that this is the right forum...

P.S. I forgot to specify the format of the data - it's in BCD. Don't know how much that is relevant.
It's very relevant. Would you be surprised to find that an ASCII string of a number compressed really well? If you read the wikipedia page on BCD, you'll see that it's not a particularly efficient encoding scheme so a compressor will find opportunities to remove redundancies...
 
  • #3
147
0
I see...

Well I can see that there are loopholes in my scheme. :)
Did some more experiments. Properly this time. For one, I scrapped BCD. Turn's out it was the culprit. :)
In the mean time, I tried a few other methods:
1. I got a bunch of data(64K) from random.org.
2. A 2^28 dec digit PI(in hex).
3. A 2^24 dec digit CATALAN(in hex).

Turns out none of these compressed to any degree :) All it did was added a few headers, the underlying data remained utterly untouched. I guess it's impossible to say how random 2 or 3 is, but they seem to work in this case just as well.
 
  • #4
rcgldr
Homework Helper
8,682
520
If interested, in hex pi is 3.243f6a8885a308d3 ... . Here is a link to an 8MB file starting with the fractional part 243f6a88... (equivalent to about 19.265 million decimal digits).

http://rcgldr.net/misc/pi.bin [Broken]
 
Last edited by a moderator:
  • #5
147
0
Well, you have 19.265 mill digits, I got a 2^28 digits. Do the math :)
 
  • #6
rcgldr
Homework Helper
8,682
520
Well, you have 19.265 mill digits, I got a 2^28 digits.
Sorry, I miss read your post. Wasn't sure what you meant by "in hex". So yes, a 111.465MB binary string is longer than a 8MB binary string. update - corrected my post.
 
Last edited:
  • #7
147
0
It wasn't 32GB, it was 111465411 bytes(or 106.3MB) of binary data. :)
 
  • #8
rcgldr
Homework Helper
8,682
520
It wasn't 32GB, it was 111465411 bytes(or 106.3MB) of binary data.
Not paying attention, was thinking 2^28 hex digits or 2^32 bits of data (32 giga-bits, not giga-bytes, which would be 536 mega-bytes. As you mentioned 2^28 decimal digits would take 111,465,411 bytes. ((1 / log10(256)) ~= 0.415241011861 bytes per decimal digit).
 

Related Threads on Compression conundrum

  • Last Post
Replies
4
Views
2K
  • Last Post
Replies
13
Views
1K
  • Last Post
Replies
9
Views
2K
  • Last Post
Replies
7
Views
2K
  • Last Post
Replies
2
Views
4K
Replies
6
Views
999
Replies
6
Views
8K
Replies
2
Views
455
Replies
11
Views
833
Replies
3
Views
2K
Top