Apology for Lameness: How to Make Amends

  • Thread starter: Dmrwizkidz

Discussion Overview

The discussion revolves around data compression methods, particularly in the context of using mathematical constants like Pi for indexing and retrieval of data. Participants explore the implications of different chunk sizes and indexing strategies in relation to compression efficiency.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant suggests that using larger data chunks can improve the efficiency of a compression method, but highlights the trade-off with the size of the index needed to locate data.
  • Another participant discusses the implications of using a large set of possible strings for indexing, noting that the size of the index may exceed the size of the data being compressed.
  • There is a consideration of how compression algorithms exploit regularities in input data, with the assertion that more regular data allows for greater compression.
  • A participant provides a specific example of attempting to compress a hexadecimal representation of a username using digits from Pi, illustrating the potential for varying compression ratios based on the method employed.
  • Another participant mentions the Lempel-Ziv (LZ) compression methods as popular algorithms for lossless storage and suggests looking into indexing tools to enhance pattern searching.

Areas of Agreement / Disagreement

Participants express differing views on the effectiveness of using Pi for data compression and indexing, with no consensus reached on the best approach or the implications of the discussed methods.

Contextual Notes

The discussion includes assumptions about data regularity and the efficiency of various compression methods, which remain unresolved. The effectiveness of the proposed indexing strategies is also not conclusively determined.

Dmrwizkidz
Removed due to lameness on my part.
 
Hi, Dean, welcome to the forums.

Your compression method would only pay off if you chose chunks of data large enough that the size of the key plus the start index (an end index is not needed for fixed-size input chunks) is small compared to the size of the chunk.

Put yourself for a moment in the following situation: your data chunk size is, say, 512 bits, and, instead of using Pi, you use the full collection of [tex]10^{512}[/tex] possible strings of digits, each of length 512,
0000 ... 0000
0000 ... 0001
...
9999 ... 9998
9999 ... 9999​
where you are guaranteed to find a match to your input data.

Here is the catch: given your 512-bit input chunk, the start index would be a number between 0 and [tex]10^{512}-1[/tex], which needs more than 512 bits to be stored.

If your table were made of all [tex]2^{512}[/tex] combinations of binary digits instead of decimal (and your key were either 01 or 10), the size of the index would again be 512 bits. In this case, you can see why the choice of key will not make the index smaller.
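A quick sanity check of those index sizes in Python (the variable names are mine, not from the thread):

```python
# Sanity-checking the index sizes for a 512-bit chunk.

chunk_bits = 512

# Index into a table of 10**512 decimal strings: a number in 0 .. 10**512 - 1.
decimal_index_bits = (10**512 - 1).bit_length()

# Index into a table of 2**512 binary strings: a number in 0 .. 2**512 - 1.
binary_index_bits = (2**512 - 1).bit_length()

print(decimal_index_bits)  # 1701 -- far more than the 512-bit chunk
print(binary_index_bits)   # 512  -- exactly the size of the chunk
```

Either way, writing down the index costs at least as many bits as the chunk itself.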

At the bottom of this discussion is the question of how much information you actually want to compress. Compression algorithms usually exploit known regularities in the input data (for example, that the data is always ASCII text, or a sound wave, or a picture): the more regular the input data is, the less information it actually contains, so that information can be conveyed in fewer bits.
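"More regular means less information" can be made concrete with Shannon entropy; here is a small sketch (the helper function is my own, not from the thread):

```python
import math
from collections import Counter


def shannon_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the input, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


regular = b"ab" * 128          # highly regular: only two symbols
varied = bytes(range(256))     # every byte value appears once

print(shannon_entropy_bits(regular))  # 1.0 bit/byte: very compressible
print(shannon_entropy_bits(varied))   # 8.0 bits/byte: no regularity to exploit
```

An ideal compressor can approach the entropy rate but not beat it, which is why truly irregular data cannot be compressed.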

If there is something about your idea that I'm missing, please let me know.
 
Also Removed
 
Dmrwizkidz said:
Then my Address would be something like - 000000001-001 through 9502F9000-3FE of course assuming the chunk of data being found is a known constant as well. (6 bytes)

Sure, but on average the addresses (in that format) will take up more room than the data you're trying to compress.

Let's say you want to store 0xC3D2C7D2C5C1 (the first 7 letters of my username) by finding digits in pi. 'On average' you'll have to look 215 trillion places deep to find it, giving you an address 12 hexadecimal digits long. Let's say you're rather lucky and find such a string 70 billion digits in, 3000 times sooner than expected. Now you just need to give 10 hexadecimal digits for the start address and 10 hexadecimal digits for the end address and you've compressed 12 hexadecimal digits into 20 for a compression ratio of -66%.

If you store the length of the string instead, and make the length and address self-delimiting, you can get right back to 0% compression on average.

By contrast, noting that the string above is all lowercase letters, it could be compressed into 9 hexadecimal digits. Finally a positive compression ratio -- 25%.
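The bookkeeping in that example is easy to reproduce, assuming (as the example does) that the hex digits of pi behave like independent uniform random digits:

```python
# Cost of addressing a 12-hex-digit payload by its position in pi,
# treating pi's hex digits as i.i.d. uniform digits (an assumption).
payload_digits = 12

# A pattern with no self-overlap first appears, on average, on the
# order of position 16**12 -- hundreds of trillions of digits in.
expected_depth = 16 ** payload_digits

# Positions up to that depth take 12 hex digits to write down, so a
# start address plus an end address costs 24 digits.
address_digits = len(f"{expected_depth - 1:x}")
stored_digits = 2 * address_digits

print(expected_depth)   # 281474976710656
print(address_digits)   # 12
print(stored_digits)    # 24 digits spent to encode a 12-digit payload
```

Without any luck in the search, the two addresses together take twice as many digits as the data they point at.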
 
I don't know too much about compression, but the Lempel-Ziv (LZ) methods are among the most popular algorithms for lossless storage. http://en.wikipedia.org/wiki/Data_compression

If you want to use pi, you might want to look into some sort of indexing; indexing makes searching for patterns much faster. Look into sed and awk as pattern-matching tools; I believe they are open source.
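For a concrete taste of the LZ family, Python's built-in zlib (DEFLATE, i.e. LZ77 plus Huffman coding) shows the regular-versus-random behaviour discussed earlier in the thread:

```python
import os
import zlib

regular = b"the quick brown fox " * 100   # 2000 bytes of repetitive text
random_ = os.urandom(2000)                # 2000 bytes with no regularity

# LZ77 back-references collapse the repeats; random bytes do not shrink.
print(len(zlib.compress(regular)))   # far fewer than 2000 bytes
print(len(zlib.compress(random_)))   # slightly more than 2000 bytes

# Lossless: decompression restores the input exactly.
assert zlib.decompress(zlib.compress(regular)) == regular
```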
 
