Over the last two years, I have been developing a compression technique that is free of "mathematical" expressions. By doing this, the compression scheme would be free of the traditional mathematical limitations usually applied to compression routines.

I'll go right into it; please let me know if I need to clarify any of the below.

I'll use a random set of digits for my example: 1415 9265 3589 7932

Now the basic point of this method is to observe the frequency of each specific digit: 0 appears zero times, 1 appears twice, 2 appears twice, 3 appears twice, 4 appears once, and so on.
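The frequency step can be sketched in a few lines of plain Python, with the sample digits from above hard-coded:

```python
from collections import Counter

# the 16 sample digits from the example above
digits = "1415926535897932"

freq = Counter(digits)
for d in "0123456789":
    print(d, "appears", freq.get(d, 0), "times")
```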

We then create a binary key sequence consisting of 10 bits, one per decimal digit (bit positions labelled 98 7654 3210), then we proceed to turn them on, one at a time.

98 7654 3210

00 0000 0001

would make the sequence above into the binary pattern 0000 0000 0000 0000 (each digit position in the sample becomes a 1 if the key bit for that digit is on; no 0s appear in the sample, so every position stays 0).

98 7654 3210

00 0000 0010

would make the sequence above into the binary pattern 1010 0000 0000 0000 (the key bit for digit 1 is on, and 1 appears in the first and third positions of the sample).
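The digit-to-bit mapping described above can be sketched as follows (the function name `apply_key` is mine, not part of the original scheme): bit d of the 10-bit key decides whether digit d maps to a 1 or a 0.

```python
def apply_key(digits: str, key: int) -> str:
    """Map each decimal digit to a bit: 1 if the key bit for that digit is on."""
    return "".join("1" if (key >> int(d)) & 1 else "0" for d in digits)

digits = "1415926535897932"

print(apply_key(digits, 0b0000000001))  # bit 0 on; no 0s in the sample -> all zeros
print(apply_key(digits, 0b0000000010))  # bit 1 on -> 1010000000000000
```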

Repeating this for all 1024 keys would create 1024 two-byte patterns, or 2,048 bytes (2 KB) of data, all of which would only need a 10-bit key address in this particular example, as the source data does not change. 10/16 represents a compression ratio of 0.625 out of 1.0.

The bigger picture: let's say my source set was a continuous number (such as PI) and was always restricted to 8192-digit samples, starting at the first digit of PI and shifting over 1 digit per 1024 keys. (Keys 11 1111 1111 and 00 0000 0000 will not be used, as they would make the entire set either all 1's or all 0's.)

8192 digits per sample = 8192 binary digits per key = 1 KB of data created from a number that can easily be generated on any home computer. To recover it, one would only need the address of where to go in PI, and the key used to obtain the data.
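Decoding under this scheme would need only the (offset, key) pair. Here is a minimal sketch, assuming the digits of PI are available as a string; `reconstruct` and its parameter names are mine, not the author's, and the toy demonstration uses a short digit string and an 8-digit sample for readability:

```python
def reconstruct(pi_digits: str, offset: int, key: int, sample_len: int = 8192) -> bytes:
    """Rebuild one block from an (offset, key) address pair."""
    window = pi_digits[offset:offset + sample_len]
    bits = "".join("1" if (key >> int(d)) & 1 else "0" for d in window)
    # pack the bits into bytes: 8192 bits -> 1024 bytes (1 KB)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

# toy demonstration: 8 digits starting one place into "31415926535897932384"
pi_digits = "31415926535897932384"
block = reconstruct(pi_digits, offset=1, key=0b0000000010, sample_len=8)
print(block)  # "14159265" -> bits 10100000 -> b'\xa0'
```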

Now, assuming we only use the currently computed 1.2 trillion digits of PI (the world record held by the Kanada Laboratory), and each digit was held on a hard drive at 4 bits per digit, the drive would have to be no bigger than 600 GB, which is very feasible in today's computer market.

This system could feasibly create 1,200,000,000,000 x 8,192 bits = 9,830,400,000,000,000 bits of data that would only need to be stored on one main computer at the compression facility.
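Both figures are easy to verify; a quick sanity check of the storage and addressable-data arithmetic, assuming 4 bits per decimal digit and decimal GB:

```python
pi_digits = 1_200_000_000_000          # digits of pi assumed available

storage_bits = pi_digits * 4           # 4 bits per decimal digit on disk
print(storage_bits / 8 / 1e9, "GB")    # 600.0 GB to hold the digits

addressable_bits = pi_digits * 8_192   # one 8192-bit block per starting offset
print(addressable_bits, "bits")        # 9,830,400,000,000,000 bits
```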

The key to storing the data: hash checksums.

For every key created, two checksums are computed, one with Adler-32 and one with CRC-32. The odds of two different strings of data producing the same checksum in BOTH hashes are vanishingly small: each hash is 32 bits, so the pair has 2^32 x 2^32 = 2^64 possible values, and any two given data strings have roughly a 1-in-2^64 chance of colliding on both.

So, using the hashes created during the discovery phase, each 1 KB block found via the keys can be stored as 8 bytes (2 separate 4-byte hashes), meaning we could index on a 1 TB hard drive 125,000,000,000 blocks of 1 KB, or about 116.4 TB of data, that would NEVER need to be fully stored anywhere else on the planet, ever again.
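Both checksums are available in Python's standard `zlib` module; a sketch of the 8-byte fingerprint (the all-zero block here is just a stand-in for a real 1 KB block):

```python
import zlib

block = b"\x00" * 1024  # stand-in for one 8192-bit (1 KB) block

# two independent 32-bit checksums form the 8-byte fingerprint
fingerprint = (zlib.crc32(block), zlib.adler32(block))
print(fingerprint)
print(len(fingerprint) * 4, "bytes stored per 1 KB block")
```

One caveat worth noting about the design: with only 2^64 possible fingerprints but 2^8192 possible blocks, distinct blocks sharing a fingerprint cannot be ruled out, only made unlikely for any given pair.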

Sorry for being so passionate about this, but please let me know what you think, and please ask any questions you may have, and any constructive advice would be wonderful.

William Dean Dover

Posted Monday, July 28, 2008 11:33AM CST

**Physics Forums - The Fusion of Science and Community**


# Needing a second opinion on data compression scheme


