Sweetspot of data compression

geekynerd · May 18, 2026

Each compression technique focuses on one aspect: low storage space, faster opening speed, high-quality storage, CPU performance, etc. But as a normal user I want a little bit of all of that. Instead of being at one extreme, what if we find a sweet spot in which no feature is compromised, one that also adapts to our system by reading its hardware? I am planning on writing a research paper on this. I already have a basic idea of Huffman encoding, and from now on I am going to explore the subject of data compression. Does my idea already exist? Please share your opinion on my idea.

Baluncore · May 18, 2026

geekynerd said:

Does my idea already exist?

Yes, but the sweet spot for data compression turns out to be a compromise in every dimension.

The science of data compression is well understood.

jedishrfu · May 19, 2026

The peer-reviewed literature on data compression is truly enormous. Even today, researchers are still looking for better ways to compress the ever-increasing onslaught of scientific data that needs to be stored, transmitted and analyzed.

Some of the modern compressors are:
- sz3 https://github.com/szcompressor/SZ3
- mgard
- sperr
- fpzip
- zfp https://github.com/llnl/zfp
- pfpl https://github.com/burtscher/PFPL

Compressors are divided into two camps:
- lossless
- lossy

Lossless compressors are for data that must remain intact when decompressed, with zero loss in accuracy. It's very hard to develop lossless compressors. Many compress at the bit level.

Lossy compressors handle scientific data, either measured or collected from a simulation. Some level of fuzziness is allowed within a given error bound. Lossy decompression generates similar data, but the decompressed data won't exactly match the source data.
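To make the lossless/lossy distinction concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not how SZ3, ZFP or PFPL work: those tools add prediction or transforms before quantizing, while this only snaps each value to a grid of step 2 × error_bound (plain uniform quantization) and DEFLATEs the integer codes. The function names and the test signal are hypothetical.

```python
import math
import struct
import zlib

def lossless_compress(values):
    """Pack the doubles bit-exactly, then DEFLATE them."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw, 9)

def lossless_decompress(blob):
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

def lossy_compress(values, error_bound):
    """Uniform quantization: snap each value to the nearest multiple of
    2 * error_bound, so no value moves by more than error_bound."""
    step = 2 * error_bound
    # Assumes |value| / step fits in a signed 32-bit integer.
    codes = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(codes)}i", *codes)
    return zlib.compress(raw, 9)

def lossy_decompress(blob, error_bound):
    step = 2 * error_bound
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [c * step for c in codes]

signal = [math.sin(0.03 * i) for i in range(1000)]
lossless = lossless_compress(signal)
lossy = lossy_compress(signal, 1e-3)
print("raw bytes:", 8 * len(signal))
print("lossless:", len(lossless), "lossy (1e-3 bound):", len(lossy))
```

The lossless path gives back the exact doubles. The lossy path returns values that differ by at most the error bound, but because the integer codes are far more regular than raw double mantissas, its compressed blob is smaller.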

PFPL is my favorite compression tool because it's very fast, uses lossy compression within strict error bounds, and leverages the CPU and, if available, the NVIDIA GPU. Also, it was developed at my university's CS department.

Compressors are tested against some of the toughest datasets in the SDR Benchmark suite.

My feeling is that you are naively entering this well-trodden and busy subfield of CS.

Try out these compressors, along with any others you find online, to see how well they perform on your machine setup.

There is no ideal compressor. Some tools compress certain data files better than others.

There are also some very intractable datasets, i.e., random data with no pattern to exploit.
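A quick way to see the "no pattern to exploit" point on your own machine is to DEFLATE a repetitive byte string and a random one of the same length. A minimal sketch, assuming Python 3.9+ for `Random.randbytes`; the sample strings are made up:

```python
import random
import zlib

def ratio(data: bytes) -> float:
    """Compressed size / original size; values >= 1.0 mean no gain at all."""
    return len(zlib.compress(data, 9)) / len(data)

patterned = b"<p>sweet spot</p>\n" * 500             # highly repetitive
noise = random.Random(42).randbytes(len(patterned))  # Python 3.9+

print(f"patterned: {ratio(patterned):.3f}")
print(f"random:    {ratio(noise):.3f}")
```

On random input zlib falls back to storing the data, so the ratio ends up slightly above 1.0 because of header and checksum overhead.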

My work in the field is very primitive in comparison to the approaches used by the best compressors. I'm using linear, quadratic, cubic, and quartic lossy compression, where I try to fit a string of data to a polynomial expression. The "compressed data" is actually the coefficients of the polynomial expressions, which regenerate the data within a user-supplied error bound.

Sadly, its compression ratio is good only for smooth, slow-moving data, like an undulating sine curve. When compressing data that borders on random, which other compressors handle well, mine falls apart, yielding a poor compression ratio.

But the hope is I can do better.
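The polynomial-fitting idea described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the poster's code: it greedily grows each segment while a least-squares polynomial of a chosen degree (1 to 4 for linear through quartic) stays within the absolute error bound, and stores only the segment length plus coefficients. `compress`, `decompress`, `max_len` and the test signals are all assumptions made for illustration, and the normal-equation solver is fine for these small degrees but is not numerically robust in general.

```python
import math
import random

def _xs(n):
    """Sample positions scaled to [0, 1] (a single point sits at 0)."""
    return [i / max(1, n - 1) for i in range(n)]

def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(ys, degree):
    """Least-squares coefficients (lowest power first) via the normal equations."""
    xs = _xs(len(ys))
    a = [[sum(x ** (j + k) for x in xs) for k in range(degree + 1)] for j in range(degree + 1)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(degree + 1)]
    return _solve(a, b)

def evaluate(coeffs, n):
    return [sum(c * x ** k for k, c in enumerate(coeffs)) for x in _xs(n)]

def compress(data, degree=3, error_bound=1e-3, max_len=256):
    """Greedily grow each segment while its polynomial fit stays within error_bound."""
    segments, start = [], 0
    while start < len(data):
        best, length = None, 1
        while start + length <= len(data) and length <= max_len:
            chunk = data[start:start + length]
            coeffs = fit(chunk, min(degree, length - 1))
            worst = max(abs(p - y) for p, y in zip(evaluate(coeffs, length), chunk))
            if worst > error_bound:
                break
            best = (length, coeffs)
            length += 1
        segments.append(best)  # a 1-point segment always fits exactly
        start += best[0]
    return segments

def decompress(segments):
    out = []
    for length, coeffs in segments:
        out.extend(evaluate(coeffs, length))
    return out

def stored_numbers(segments):
    """Numbers kept per segment: its length plus its coefficients."""
    return sum(1 + len(coeffs) for _, coeffs in segments)

smooth = [math.sin(0.03 * i) for i in range(1000)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(500)]
for name, data in (("smooth", smooth), ("noise", noise)):
    segs = compress(data, degree=3, error_bound=1e-3)
    print(f"{name}: {len(data)} samples -> {stored_numbers(segs)} stored numbers")
```

On the smooth sine, each cubic covers dozens of samples, so far fewer numbers are stored than samples. On the uniform noise, in practice only segments of degree + 1 points can be fit (by exact interpolation), so storing length plus coefficients costs more than the raw samples, which is the "falls apart" behaviour described.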

geekynerd · May 22, 2026

Baluncore said:

Yes, but the sweet spot for data compression turns out to be a compromise in every dimension.

The science of data compression is well understood.

Regular users don't want a single dimension to be perfect; they want a mix. Yes, professional users care about specific needs, which existing technology can fulfill, but regular users want an optimal way to store their files without losing too much of anything. That's the idea behind building this.

geekynerd · May 22, 2026

jedishrfu said:

My feeling is that you are naively entering this well-trodden and busy subfield of CS.

Try out these compressors, along with any others you find online, to see how well they perform on your machine setup.

Yeah, I am just learning about data compression out of my own interest, as this idea sparked my curiosity. I hope those tools help me, and thank you for them. I am not trying to build an ideal one; I am trying to build an optimal one, a "jack of all trades, master of none". My target is to develop this decompressor for normal, regular users.

Actually, while I was researching, I even saw someone using AI for decompression and quantum techniques for decompression. I accept it's a busy subfield of CS, but I feel like everyone is missing something basic.

jedishrfu · May 22, 2026

The thing people forget is that data compression works by finding patterns in the numbers that make up the data, but the vast majority of possible data is truly random, meaning there is no pattern to find.

So it seems to new researchers that there is room for growth in the field, but in reality there is a vast desert of random numbers that can't be compressed.
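One way to put a number on "no pattern" is the order-0 Shannon entropy of the byte histogram, which bounds how well any coder that handles bytes independently (such as Huffman coding) can do. A minimal sketch; note it only looks at single-byte frequencies, so a low value proves redundancy, while a value near 8 bits/byte only rules out that kind of coder, not every possible pattern:

```python
import math
import random
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy, -sum(p * log2(p)), over the byte histogram."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

print(bits_per_byte(b"ab" * 1000))                         # 1.0
print(bits_per_byte(random.Random(1).randbytes(100_000)))  # just under 8
```

A Huffman coder driven by these frequencies cannot average fewer bits per byte than this figure, which is why random bytes leave it nothing to save.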

berkeman · May 22, 2026

geekynerd said:

I am planning on writing a research paper on this.

The compression algorithm names and links given by @jedishrfu in reply #3 should form a good basis for your research/survey paper, no?

geekynerd said:

Actually, while I was researching, I even saw someone using AI for decompression and quantum techniques for decompression.

Be sure to keep things straight in your head -- they are compression/decompression algorithms.

jedishrfu · May 22, 2026

geekynerd said:

Actually, while I was researching, I even saw someone using AI for decompression and quantum techniques for decompression. I accept it's a busy subfield of CS, but I feel like everyone is missing something basic.

You can only say this if you've surveyed the field first. All good researchers do this looking for something left unexplored.

But when they find it, there will always be something new hidden away for the next researcher to try.

It seems disingenuous to assume that you have noticed a missing piece but you can't describe it.

I have to admit that I too, when I was a lot younger, would have dreams like this, of finding the magic formula for the theory of everything.

With respect to AI, there's AI Feynman, which has been able to rediscover the formulas behind data generated from equations in the Feynman Lectures on Physics.

There's also PySR, which does something similar using genetic algorithms. PySR is quite good.

geekynerd · May 23, 2026

jedishrfu said:

It seems disingenuous to assume that you have noticed a missing piece but you can't describe it.

I have to admit that I too, when I was a lot younger, would have dreams like this, of finding the magic formula for the theory of everything.

No, I didn't get this idea magically or in my dreams. I was watching a video where they said the image quality of a PNG is good but the compression is bad. At that moment an idea sparked: not all people are editors or cinematographers, but most people use the PNG format without knowing they are wasting a lot of storage.

I have a server that I use as a photo cloud. I was saving images as PNG without knowing the compression was bad; I could have saved so much storage. Switching to some other format saves space but loses a lot of data. So why can't we find a SWEETSPOT between the two? My server has a 2nd-gen i3 CPU, and compressing photos aggressively takes away my performance. We have to take that into account.

So what I am trying to say is: something optimal, something that can be brought mainstream, something that can be used as the default on all systems.

I might not have much knowledge in this field, but that doesn't mean I can't write a research paper on my idea. Obviously I need to go through pre-existing ideas and learn which topics I need to read to cover my research. Actually, this idea gave me an opportunity to explore the field. I understand what you are trying to say, "research before you say something", and yes, I will do it.
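The "read the hardware, then choose" part of the idea can at least be prototyped by benchmarking on the actual machine (for example the 2nd-gen i3 server) instead of guessing. The sketch below is hypothetical: it times zlib levels 1 to 9 on a sample and keeps the highest level that still meets an assumed throughput budget (`budget_mb_per_s`, default 20 MB/s). A real photo tool would also have to choose between image formats and lossy settings, which this does not attempt.

```python
import time
import zlib

def pick_level(sample: bytes, budget_mb_per_s: float = 20.0) -> int:
    """Highest zlib level (1..9) whose measured speed on this machine meets
    the budget. Higher levels usually compress better but run slower.
    Falls back to level 1 if even that is too slow."""
    chosen = 1
    for level in range(1, 10):
        start = time.perf_counter()
        zlib.compress(sample, level)
        elapsed = max(time.perf_counter() - start, 1e-9)
        if len(sample) / elapsed / 1e6 >= budget_mb_per_s:
            chosen = level
    return chosen

sample = b"".join(b"photo metadata row %d\n" % (i % 50) for i in range(20000))
level = pick_level(sample)
blob = zlib.compress(sample, level)
print(f"chosen level {level}: {len(sample)} -> {len(blob)} bytes")
```

Because the choice comes from a measurement, the same code would settle on a lower level on a slow CPU and a higher one on a fast CPU; the budget itself is the user-facing "sweet spot" knob.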

geekynerd · May 23, 2026

berkeman said:

The compression algorithm names and links given by @jedishrfu in reply #3 should form a good basis for your research/survey paper, no?

Be sure to keep things straight in your head -- they are compression/decompression algorithms.

Oh okay. I can't understand those papers yet; I just started learning the basics of data compression.

jedishrfu · May 23, 2026

Have you searched for the best compressors for PNG?

I found a lossless compressor that seems to lead the pack: OxiPNG. Users say it hits the sweet spot: no loss of information and the best compression compared to other lossless compressors.

https://github.com/oxipng/oxipng

And there's a blended compressor, PNGGauntlet. It tries two or three well-known PNG compression algorithms to find the best one for a given PNG file, rather like pkzip.

https://pnggauntlet.com/
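The try-several-and-keep-the-smallest approach can be sketched on generic bytes with Python's built-in codecs. This is not what PNGGauntlet does internally (it works on PNG files specifically); the tag-byte container here is a made-up format for illustration:

```python
import bz2
import lzma
import random
import zlib

# Made-up container: one tag byte naming the codec, then that codec's output.
CODECS = {
    0: (lambda d: zlib.compress(d, 9), zlib.decompress),
    1: (lambda d: bz2.compress(d, 9), bz2.decompress),
    2: (lzma.compress, lzma.decompress),
    3: (bytes, bytes),  # stored as-is when nothing helps
}

def compress_best(data: bytes) -> bytes:
    """Run every codec and keep the smallest output, prefixed with its tag."""
    tag, payload = min(
        ((tag, comp(data)) for tag, (comp, _) in CODECS.items()),
        key=lambda item: len(item[1]),
    )
    return bytes([tag]) + payload

def decompress_best(blob: bytes) -> bytes:
    _, decomp = CODECS[blob[0]]
    return decomp(blob[1:])

text = b"<tr><td>photo_0001.png</td><td>2.4 MB</td></tr>\n" * 300
noise = random.Random(7).randbytes(4096)
for name, data in (("text", text), ("noise", noise)):
    blob = compress_best(data)
    print(f"{name}: codec {blob[0]}, {len(data)} -> {len(blob)} bytes")
```

The cost of this design is CPU time: every codec runs on every input, which is exactly the trade-off a weak server has to weigh.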

I've not tried either compression tool and can't say which performs better. The homework is left to the OP.

Also test pkzip to see how well it compresses. It's used by millions and is usually good enough for most work.

Lastly, I need to mention that there are many competing compression tools for PNG files.

Some folks don't mind lossy compression, so they first use pngquant, which reduces the number of colors (lossy palette quantization) to make a smaller PNG, and then use oxipng to compress the file further.

This is usually done for websites, where smaller files save network bandwidth.
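As a rough illustration of why a lossy pre-pass helps the lossless step, the sketch below zeroes the low 4 bits of every channel in a synthetic noisy gradient and compares zlib sizes. It is much cruder than pngquant, which builds a palette of at most 256 colors (with optional dithering), and it works on raw RGB bytes rather than a real PNG; `synthetic_rgb` and `reduce_bits` are hypothetical helpers:

```python
import random
import zlib

def synthetic_rgb(width=256, height=64, seed=0):
    """Hypothetical test image: smooth RGB gradient plus 0..3 of noise per channel."""
    rng = random.Random(seed)
    out = bytearray()
    for y in range(height):
        for x in range(width):
            for base in (x, y * 4, 128):
                out.append(min(255, base + rng.randint(0, 3)))
    return bytes(out)

def reduce_bits(pixels: bytes, keep_bits: int = 4) -> bytes:
    """Lossy: zero the low (8 - keep_bits) bits of every channel value."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(p & mask for p in pixels)

raw = synthetic_rgb()
lossy = reduce_bits(raw)
print("raw RGB:", len(raw), "bytes")
print("zlib(raw):", len(zlib.compress(raw, 9)))
print("zlib(4-bit channels):", len(zlib.compress(lossy, 9)))
```

Each channel moves by at most 15 levels, and the noise in the low bits disappears, which is what makes the lossy version compress so much better.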

Tuna_lover121 · May 28, 2026

Does anyone know any data processing algorithms that would let me go from using all 256 bytes in my data down to at least 220?

Baluncore · May 28, 2026

Welcome to PF.

Tuna_lover121 said:

Does anyone know any data processing algorithms that would let me go from using all 256 bytes in my data down to at least 220?

That will depend on your data.

Tuna_lover121 · May 28, 2026

Baluncore said:

Welcome to PF.

That will depend on your data.

Binary data, no specific type, but I am looking at slightly compressed HTML data that uses:
- a special bitwise coding involving adaptive Gray coding, which is either applied or not depending on certain byte frequencies,
- MTF encoding,
- an XOR-based encoding whose first step computes (object[i] ^ object[i+1]),
- and another process, similar to XOR, that I recently thought of after looking at XOR operations. It is a lot to explain and I am still correcting it.
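For readers unfamiliar with these transforms, here is a minimal sketch of the standard building blocks mentioned: binary-reflected Gray code, move-to-front, and an XOR delta. It is not the poster's scheme: the adaptive switch for Gray coding and the fourth XOR-like process are not described, and this XOR variant pairs each byte with the previous one. All three are reversible and do not shrink data by themselves; they reshape it so a later entropy coder (Huffman, arithmetic) has more repeated small values to work with.

```python
def gray_encode(data: bytes) -> bytes:
    """Binary-reflected Gray code of each byte: g = b ^ (b >> 1)."""
    return bytes(b ^ (b >> 1) for b in data)

def gray_decode(data: bytes) -> bytes:
    """Undo Gray coding: b = g ^ (g >> 1) ^ (g >> 2) ^ ..."""
    out = bytearray()
    for g in data:
        b, shift = g, g >> 1
        while shift:
            b ^= shift
            shift >>= 1
        out.append(b)
    return bytes(out)

def mtf_encode(data: bytes) -> bytes:
    """Move-to-front: emit each byte's position in a recency list, then move it to the front."""
    table = list(range(256))
    out = bytearray()
    for b in data:
        i = table.index(b)
        out.append(i)
        table.insert(0, table.pop(i))
    return bytes(out)

def mtf_decode(data: bytes) -> bytes:
    table = list(range(256))
    out = bytearray()
    for i in data:
        b = table.pop(i)
        out.append(b)
        table.insert(0, b)
    return bytes(out)

def xor_delta_encode(data: bytes) -> bytes:
    """Keep the first byte, then XOR each byte with the byte before it."""
    return bytes(b ^ (data[i - 1] if i else 0) for i, b in enumerate(data))

def xor_delta_decode(data: bytes) -> bytes:
    out = bytearray()
    prev = 0
    for d in data:
        prev ^= d  # running XOR recovers the original byte
        out.append(prev)
    return bytes(out)

html = b"<div><div><div>aaaa</div></div></div>"
print(list(mtf_encode(html)))        # repeats turn into small numbers
print(list(xor_delta_encode(html)))  # equal neighbours turn into 0
```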

Baluncore · May 28, 2026

If your data is ASCII, then simply deleting the MSB and packing the data will give you a compression of 256 bytes → 224 bytes.
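The arithmetic is 256 × 7 = 1792 bits = 224 bytes. A minimal bit-packing sketch, assuming pure 7-bit ASCII input; `pack7` and `unpack7` are hypothetical names, and the decoder needs the original character count because the final byte may carry padding bits:

```python
def pack7(data: bytes) -> bytes:
    """Drop the MSB of each ASCII byte and pack the 7-bit codes end to end."""
    if any(b > 0x7F for b in data):
        raise ValueError("non-ASCII byte (> 127) cannot be packed into 7 bits")
    out = bytearray()
    acc = nbits = 0
    for b in data:
        acc = (acc << 7) | b
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)

def unpack7(packed: bytes, count: int) -> bytes:
    """Inverse of pack7; count is the original number of characters."""
    out = bytearray()
    acc = nbits = 0
    for byte in packed:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7 and len(out) < count:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)
            acc &= (1 << nbits) - 1
    return bytes(out)

text = (b"<html><body>Hello, PF!</body></html>\n" * 8)[:256]
packed = pack7(text)
print(len(text), "->", len(packed))  # 256 -> 224
assert unpack7(packed, len(text)) == text
```

It refuses bytes above 127, since dropping their MSB would lose information.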

jedishrfu · May 28, 2026

JavaScript has some libraries for this kind of compression. One example is pako, a JavaScript port of zlib.

https://github.com/nodeca/pako

Tuna_lover121 · May 28, 2026

Baluncore said:

If your data is ASCII, then simply deleting the MSB and packing the data will give you a compression of 256 bytes → 224 bytes.

Some parts of my data do have non-ASCII bytes, i.e., full binary values > 127.

Tuna_lover121 · Jun 16, 2026

I have found a solution, thanks for the help!
