Sweetspot of data compression

  • Thread starter: geekynerd
  • Tags: #compression #data
geekynerd
Each compression technique focuses on a particular aspect: low storage space, faster opening speed, high quality, CPU performance, and so on. But as a normal user, I want a little bit of all of these. Instead of sitting at one extreme, what if we find a sweet spot in which no feature is compromised, one that also adapts to our system by reading its hardware? I am planning to write a research paper on this. I already have a basic idea of Huffman encoding, and from now on I am going to explore the subject of data compression. Does my idea already exist? Share your opinion on my idea.
 
geekynerd said:
Does my idea already exist?
Yes, but the sweet spot for data compression turns out to be a compromise in every dimension.

The science of data compression is well understood.
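The compromise described above can be seen with a single general-purpose compressor. The following is a minimal sketch, assuming Python's standard `zlib` module and a made-up sample input (the `sample` data and the `measure` helper are illustrative, not from the thread). It reports the compression ratio and the time taken at three compression levels.

```python
import time
import zlib

def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (compression ratio, seconds taken) for one zlib level."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == data  # lossless round trip
    return len(data) / len(packed), elapsed

# Hypothetical sample input: mildly repetitive text, roughly 1 MB.
sample = b"".join(
    b"reading %d: temperature=%d\n" % (i, i % 97) for i in range(40000)
)

# Level 1 favors speed, level 9 favors size, level 6 is zlib's usual default.
results = {level: measure(sample, level) for level in (1, 6, 9)}
for level, (ratio, seconds) in results.items():
    print(f"level {level}: ratio {ratio:.2f}, {seconds * 1000:.1f} ms")
```

The exact numbers depend on the data and the machine, which is the point: any "sweet spot" is a choice of level (and algorithm) for a particular workload, not a setting that wins on every axis.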
 
The data compression field has a truly enormous corpus of peer-reviewed papers. Even today, researchers are still looking for better ways to compress the ever-increasing onslaught of scientific data that needs to be stored, transmitted, and analyzed.

Some of the modern compressors are:
- sz3 https://github.com/szcompressor/SZ3
- mgard
- sperr
- fpzip
- zfp https://github.com/llnl/zfp
- pfpl https://github.com/burtscher/PFPL

Compressors are divided into two camps:
- lossless
- lossy

Lossless compressors are for data that must remain intact when decompressed, with zero loss in accuracy. It's very hard to develop lossless compressors. Many work at the bit level.

Lossy compressors handle scientific data, either measured or collected from a simulation. Some level of fuzziness is allowed within a given error bound. Lossy decompression generates similar data, but the decompressed data won't match the source data exactly.
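To make "fuzziness within a given error bound" concrete, here is a minimal sketch of one generic technique, uniform quantization. It is not a description of how any of the tools listed above work internally; the function names and sample values are assumptions chosen for illustration.

```python
def quantize(values, error_bound):
    """Map each value to the index of the nearest multiple of 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]

def dequantize(codes, error_bound):
    """Rebuild approximate values from the integer codes."""
    step = 2.0 * error_bound
    return [c * step for c in codes]

# Hypothetical measurements and a user-chosen absolute error bound.
data = [0.12, 0.14, 0.11, 3.98, 4.02, 4.01]
eps = 0.05

codes = quantize(data, eps)        # small integers with many repeats
restored = dequantize(codes, eps)  # similar to data, but not identical
max_error = max(abs(a - b) for a, b in zip(data, restored))
print(codes, max_error)
```

The integer codes repeat far more often than the original floating-point values, so a lossless stage applied afterwards has a pattern to exploit, while every restored value stays within `eps` of its source (up to floating-point rounding).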

PFPL is my favorite compression tool because it's very fast, uses lossy compression within strict error bounds, and leverages the CPU and, if available, the NVIDIA GPU. Also, it was developed at my university's CS department.

Compressors are tested against some of the toughest datasets in the SDRBench (Scientific Data Reduction Benchmarks) suite.

My feeling is that you are naively entering this well-trodden and busy subfield of CS.

Try out these compressors, along with any others you find online, to see how well they perform on your machine setup.

There is no ideal compressor. Some tools compress certain data files better than others.

There are also some very intractable datasets, i.e., random data with no pattern to exploit.
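The difference between patterned and random input is easy to check. This is a minimal sketch, assuming Python's standard `zlib` and `random` modules. The two inputs are made up for illustration, and the "random" bytes are pseudo-random with a fixed seed so the run is repeatable.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed: pseudo-random, but repeatable
size = 100_000

patterned = b"ABCD" * (size // 4)                       # obvious repetition
noise = bytes(rng.getrandbits(8) for _ in range(size))  # no usable pattern

def ratio(data: bytes) -> float:
    """Original size divided by compressed size at zlib's highest level."""
    return len(data) / len(zlib.compress(data, 9))

patterned_ratio = ratio(patterned)
noise_ratio = ratio(noise)
print(f"patterned: {patterned_ratio:.1f}x, noise: {noise_ratio:.3f}x")
```

The patterned input shrinks enormously, while the noise comes out no smaller than it went in, because the compressor has to add its own framing overhead.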

My work in the field is very primitive in comparison to the approaches used by the best compressors. I'm using linear, quadratic, cubic, and quartic lossy compression, where I try to fit a string of data to a polynomial expression. The "compressed data" is actually the coefficients of the polynomial expressions. When evaluated, they regenerate the data to within a user-supplied error bound.

Sadly, its compression ratio is good only for smooth, slow-moving data, like an undulating sine curve. But when compressing data that borders on random, which other compressors handle well, mine falls apart, yielding a poor compression ratio.

But the hope is I can do better.
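The polynomial-fitting idea described in this post can be sketched in its simplest form. This is not the poster's implementation: it is a linear-only illustration that draws each line through the first and last sample of a segment (endpoint interpolation rather than a least-squares fit), and every name in it is hypothetical.

```python
import math
import random

def _line(y0, y1, length, i):
    """Value at offset i on the straight line from y0 to y1."""
    return y0 if length == 0 else y0 + (y1 - y0) * i / length

def _fits(values, start, end, error_bound):
    """True if the line from values[start] to values[end] stays within the bound."""
    length = end - start
    return all(
        abs(values[start + i] - _line(values[start], values[end], length, i))
        <= error_bound
        for i in range(length + 1)
    )

def fit_segments(values, error_bound):
    """Greedily split values into segments stored as (start, end, y0, y1)."""
    segments, start, n = [], 0, len(values)
    while start < n:
        end = start
        # Extend the segment while every covered sample stays within the bound.
        while end + 1 < n and _fits(values, start, end + 1, error_bound):
            end += 1
        segments.append((start, end, values[start], values[end]))
        start = end + 1
    return segments

def reconstruct(segments):
    """Regenerate one value per original sample from the stored segments."""
    out = []
    for start, end, y0, y1 in segments:
        length = end - start
        out.extend(_line(y0, y1, length, i) for i in range(length + 1))
    return out

eps = 0.01
smooth = [math.sin(2 * math.pi * i / 500) for i in range(1000)]
rng = random.Random(0)
noisy = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

smooth_segments = fit_segments(smooth, eps)
noisy_segments = fit_segments(noisy, eps)
print(len(smooth_segments), len(noisy_segments))
```

On the sine curve a handful of segments cover all 1000 samples, while on the noisy input almost every segment holds only one or two samples, so the "compressed" form can end up larger than the input. That matches the behavior described above.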
 
Baluncore said:
Yes, but the sweet spot for data compression turns out to be a compromise in every dimension.

The science of data compression is well understood.
Regular users don't want a single dimension to be perfect; they want a mix. Yes, professional users care about specific needs, which existing technology can fulfill, but the regular user wants an optimal way to store files without losing too much of anything. That's the idea behind building this.
 
jedishrfu said:
The data compression field is a peer-reviewed paper corpus that is truly enormous. Even today, researchers are still looking for better ways to compress the ever increasing onslaught of scientific data that needs to be stored, transmitted and analyzed.

Yeah, I am just learning about data compression out of my own interest, as this idea sparked it. I hope those tools help me, and thank you for them. And I am not trying to build an ideal one; I am trying to build an optimal one, one that's a "jack of all trades and master of none". My target is to develop this decompressor for normal, regular users.

Actually, while I was researching, I even saw someone using AI for decompression and quantum techniques for decompression. I accept that it's a busy subfield of CS, but I feel like everyone is missing something basic.
 
The thing people forget is that, while data is real-number based, data compression works by looking for patterns in the data, and the vast majority of possible data is effectively random, meaning there is no pattern to find.

So it seems to new researchers that there is potential for growth in the field, but in reality there is a vast desert of random numbers that can't be compressed.
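The "vast desert" can be made precise with a standard counting argument. It is a general fact about lossless coding, stated here for binary strings; the symbols n (input length in bits) and k (bits saved) are introduced only for this sketch. A lossless compressor must map distinct inputs to distinct outputs, so it cannot shrink more inputs than there are shorter outputs available:

```latex
\begin{align*}
\#\{\text{bit strings of length } n\} &= 2^{n}, \\
\#\{\text{bit strings of length} \le n-k\} &= \sum_{j=0}^{n-k} 2^{j} = 2^{n-k+1} - 1, \\
\frac{\#\{\text{inputs of length } n \text{ that shrink by at least } k \text{ bits}\}}{2^{n}}
  &\le \frac{2^{n-k+1} - 1}{2^{n}} < 2^{1-k}.
\end{align*}
```

For example, with k = 10, fewer than 1 in 512 inputs of any given length can be shortened by 10 bits or more, no matter how clever the compressor is.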
 
