Apology for Lameness: How to Make Amends

  • Thread starter: Dmrwizkidz
AI Thread Summary
The discussion focuses on data compression techniques and the efficiency of using large data chunks for compression. It highlights the importance of choosing the right key and indexing method to optimize storage and retrieval. The conversation emphasizes that the regularity of input data can significantly affect compression effectiveness, with examples illustrating potential compression ratios. Additionally, it suggests exploring tools like SED and AWK for better pattern matching in data compression tasks. Overall, the thread provides insights into the complexities and considerations involved in data compression strategies.
Dmrwizkidz
Removed. Due to Lameness on my Part
 
Hi, Dean, welcome to the forums.

Your compression method would pay off only if you choose large enough chunks of data, so that the size of the key plus the start index (an end index is not needed for fixed-size input chunks) is small compared to the size of the chunk.

Put yourself for a moment in the following situation: suppose your data chunks are, say, 512 decimal digits long, and imagine that, instead of using Pi, you use the full collection of all 10^{512} possible strings of 512 digits,
0000 ... 0000
0000 ... 0001
...
9999 ... 9998
9999 ... 9999
where you are guaranteed to find a match to your input data.

Here is the catch: given your 512-digit input chunk, the start index would be a number between 0 and 10^{512}-1, which itself needs 512 decimal digits (about 1700 bits) to be stored, exactly as much as the chunk you set out to compress.

If your table were made of all 2^{512} strings of 512 binary digits instead of decimal (and your key were either 01 or 10), the index for a 512-bit chunk would again be 512 bits, exactly the size of the data. In either case, you can see why the choice of key will not make the index smaller.
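For anyone who wants to check the counting, here is a minimal Python sketch of the argument above (the 512 and the two alphabets come straight from the example; the helper name is made up). It compares the information carried by one chunk with the size of its start index, and the index is never the smaller of the two:

Python:
import math

def chunk_bits_vs_index_bits(chunk_len, alphabet_size):
    """Bits of information in one chunk vs. bits needed to store its start
    index into a table listing every possible chunk (alphabet_size ** chunk_len rows)."""
    data_bits = chunk_len * math.log2(alphabet_size)
    index_bits = math.ceil(chunk_len * math.log2(alphabet_size))
    return data_bits, index_bits

for alphabet_size, unit in ((10, "decimal digits"), (2, "bits")):
    data_bits, index_bits = chunk_bits_vs_index_bits(512, alphabet_size)
    print(f"512 {unit}: chunk carries {data_bits:.0f} bits, index needs {index_bits} bits")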

At the bottom of this discussion is the question of how much information you actually need to convey. Compression algorithms usually exploit known regularities in the input data (for example, that the data is ASCII text, or a sound wave, or a picture): the more regular the input data is, the less information it actually contains, so it is possible to convey that information in fewer bits.
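To put a rough number on "less information", here is a small Python sketch (both sample strings are invented for the demo) that estimates the order-0 entropy of a heavily skewed digit string and of a uniformly random one. The skewed string carries far fewer bits per symbol, and that gap is exactly the slack a compressor can exploit:

Python:
import math
import random
from collections import Counter

def bits_per_symbol(s):
    """Empirical (order-0) Shannon entropy of a string, in bits per symbol."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

regular = "0" * 9500 + "1" * 500   # 95% of the symbols are the same digit
random.seed(0)
noisy = "".join(random.choice("0123456789") for _ in range(10000))  # no structure

print(f"regular: {bits_per_symbol(regular):.2f} bits/symbol")  # about 0.29
print(f"noisy:   {bits_per_symbol(noisy):.2f} bits/symbol")    # about 3.32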

If there is something about your idea that I'm missing, please let me know.
 
Also Removed
 
Dmrwizkidz said:
Then my Address would be something like - 000000001-001 through 9502F9000-3FE of course assuming the chunk of data being found is a known constant as well. (6 bytes)

Sure, but on average the addresses (in that format) will take up more room than the data you're trying to compress.

Let's say you want to store 0xC3D2C7D2C5C1 (the first 7 letters of my username) by finding it in the digits of pi. 'On average' you'll have to look 215 trillion places deep to find it, giving you an address 12 hexadecimal digits long. Let's say you're rather lucky and find such a string 70 billion digits in, 3000 times sooner than expected. Now you just need 10 hexadecimal digits for the start address and 10 hexadecimal digits for the end address, and you've compressed 12 hexadecimal digits into 20, for a compression ratio of -66%.

If you store the length of the string instead, and make the length and address self-delimiting, you can get right back to 0% compression on average.
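For the curious, "self-delimiting" can be achieved with something like Elias gamma coding; the sketch below is just one classic choice, not necessarily what the post above has in mind. Each codeword announces its own length, so a length field and a start address can be concatenated and later split apart without separators:

Python:
def elias_gamma_encode(n):
    """Elias gamma code for a positive integer: (len-1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    value = int(bits[pos + zeros : pos + 2 * zeros + 1], 2)
    return value, pos + 2 * zeros + 1

# A hypothetical record: length of the matched string, then its start address.
length, address = 12, 70_000_000_000
stream = elias_gamma_encode(length) + elias_gamma_encode(address)

decoded_length, pos = elias_gamma_decode(stream)
decoded_address, _ = elias_gamma_decode(stream, pos)
print(decoded_length, decoded_address)  # 12 70000000000, recovered with no separators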

By contrast, noting that the string above is all lowercase letters, it could be compressed into 9 hexadecimal digits. Finally a positive compression ratio -- 25%.
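To make that last step concrete, here is one possible Python sketch of such a repacking (the 7-letter word "example" is only a stand-in, since the letters behind the hex string above aren't spelled out). Reading the lowercase letters as a base-26 number always fits 7 of them into 9 hexadecimal digits or fewer, because 26^7 < 16^9:

Python:
import string

def pack_lowercase(word):
    """Pack a lowercase string into an integer by reading it as a base-26 number."""
    value = 0
    for ch in word:
        value = value * 26 + string.ascii_lowercase.index(ch)
    return value

def unpack_lowercase(value, length):
    """Invert pack_lowercase, given the original word length."""
    letters = []
    for _ in range(length):
        value, idx = divmod(value, 26)
        letters.append(string.ascii_lowercase[idx])
    return "".join(reversed(letters))

word = "example"                              # any 7 lowercase letters will do
packed = pack_lowercase(word)
print(f"{packed:x}")                          # at most 9 hex digits
print(unpack_lowercase(packed, len(word)))    # round-trips back to 'example'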
 
I don't know too much about compression, but the Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. http://en.wikipedia.org/wiki/Data_compression
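As a quick illustration (Python's zlib module implements DEFLATE, which is built on LZ77, one of the Lempel-Ziv family), the sketch below compresses a highly repetitive byte string and a random one; only the repetitive one actually shrinks:

Python:
import os
import zlib

repetitive = b"abcabcabc" * 1000   # 9000 bytes with obvious structure
random_bytes = os.urandom(9000)    # 9000 bytes with no structure to exploit

for name, data in (("repetitive", repetitive), ("random", random_bytes)):
    compressed = zlib.compress(data, 9)   # level 9 = best compression
    print(f"{name:10s}: {len(data)} -> {len(compressed)} bytes")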

If you want to use pi, you might want to look into some sort of indexing; indexing makes searching for patterns much faster. Look into sed and awk as pattern-matching tools; I believe they are open source.
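For a feel of what that search looks like before any indexing, here is a tiny Python sketch of a naive sed/grep-style substring match over the first 100 decimal digits of pi, hard-coded for the demo; a serious attempt would need a far longer digit file and a real index on top of it:

Python:
# First 100 decimal digits of pi after the decimal point (for the demo only).
PI_DIGITS = (
    "14159265358979323846264338327950288419716939937510"
    "58209749445923078164062862089986280348253421170679"
)

def find_pattern(pattern, digits=PI_DIGITS):
    """Return the 0-based offset of the first match, or -1 if there is none."""
    return digits.find(pattern)

print(find_pattern("8979"))    # 10: store (offset, length) = (10, 4) instead of the pattern
print(find_pattern("999999"))  # -1: not within the first 100 digits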
 