# PI Hypothosis

1. Jan 7, 2008

### Dmrwizkidz

Removed. Due to Lameness on my Part

Last edited: Jan 8, 2008
2. Jan 8, 2008

### dodo

Hi, Dean, welcome to the forums.

Your compression method would pay off when you choose large enough chunks of data, so that the size of the key + start index (an end index is not needed, for fixed-size input chunks) is small compared to the size of the chunk.

Put yourself for a moment in the following situation: if your data chunk size is, say, 512 bits, imagine that, instead of using Pi, you use the full collection of $$10^{512}$$ possible strings of digits, each of length 512,
0000 ... 0000
0000 ... 0001
...
9999 ... 9998
9999 ... 9999​
where you are guaranteed to find a match to your input data.

Here is the catch: given your 512-bit input chunk, the start index would be a number between 0 and $$10^{512}-1$$, which needs more than 512 bits to be stored.

If your table was made of all $$2^{512}$$ combinations of binary digits instead of decimal (and you key were either 01 or 10), the size of the index would again be 512 bits. In this case, you can see why the choice of key will not make the index smaller.

In the bottom of this discussion is the idea of how much information you want to compress. Compression algorithms usually tackle some known regularities in the input data (for example, if the data is always ASCII text, or if it is a sound wave, or a picture): the more regular the input data is, the less information it actually contains, so it is possible to convey that information in less bits.

3. Jan 8, 2008

### Dmrwizkidz

Also Removed

Last edited: Jan 8, 2008
4. Jan 8, 2008

### CRGreathouse

Sure, but on average the addresses (in that format) will take up more room than the data you're trying to compress.

Let's say you want to store 0xC3D2C7D2C5C1 (the first 7 letters of my username) by finding digits in pi. 'On average' you'll have to look 215 trillion places deep to find it, giving you an address 12 hexadecimal digits long. Let's say you're rather lucky and find such a string 70 billion digits in, 3000 times sooner than expected. Now you just need to give 10 hexadecimal digits for the start address and 10 hexadecimal digits for the end address and you've compressed 12 hexadecimal digits into 20 for a compression ratio of -66%.

If you store the length of the string instead, and make the length and address self-delimiting, you can get right back to 0% compression on average.

By contrast, noting that the string above is all lowercase letters, it could be compressed into 9 hexadecimal digits. Finally a positive compression ratio -- 25%.

5. Jan 8, 2008

### nanoWatt

I'm not sure about compression too much, but The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. http://en.wikipedia.org/wiki/Data_compression

If you want to use PI, you might want to look into some sort of indexing. Indexing makes searching for patterns much faster. Look into SED and AWK for pattern matching tools. I believe they are open source.