Huffman coding is currently being replaced with ANS coding

jarekduda · Jan 11, 2017

Huffman coding (e.g. A -> 010) is fast but approximates probabilities with powers of 1/2, while e.g. symbol of probability 0.99 carries only ~0.014 bits of information. This inaccuracy was repaired by arithmetic coding, but it is an order magnitude slower (more costly).
Above compromise has been recently ended with ANS coding, which is both fast(cheap) and accurate:
wiki: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
benchmarks: https://sites.google.com/site/powturbo/entropy-coder

It is used in compressors since 2014, like the currently default Apple LZFSE or great and free Facebook Zstd - it is 2-5x faster than gzip and provides much better compression:
https://github.com/facebook/zstd/
7-zip version: https://mcmilk.de/projects/7-Zip-zstd/
https://dl.dropboxusercontent.com/u/12405967/ZSTD.png

jedishrfu · Jan 13, 2017

Thanks for sharing. Are you using this coding in a project? What benefits are you seeing?

jarekduda · Jan 14, 2017

Benefits from zstd itself are huge: 2-5x faster compression than standard gzip and much better compression ratio - it has a real chance to finally replace the ancient gzip.
For example the Guardian is already officially using it: https://www.theguardian.com/info/de...-compression-iinovations-brotli-and-zstandard
And the family of such new generation compressors (ANS-based) is quickly growing, most are free and open-source:
http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-implementations

I have experience with ANS coding, the basic concept is to encode information into a single natural number x, what is much more convenient for implementations than the standard arithmetic coding: encoding information into a range ( https://en.wikipedia.org/wiki/Arithmetic_coding ).
In numbers, in 2013 state-of-art arithmetic decoder processed ~50MB/s/core, now analogous task is made by ANS with ~1500MB/s ... we got ~30x implementation-only speedup for the basic transformation of information: the bottleneck of data compressors, and compression in modern world is practically default for all data.
benchmarks: https://sites.google.com/site/powturbo/entropy-coder
fastest open-source implementation: https://github.com/jkbonfield/rans_static

ANS uses philosophy that a natural number x contains lg(x) bits of information, what is equivalent to the Zipf law: Pr(x) ~ 1/x.
Symbol of probability p contains lg(1/p) bits of information, so adding this information to information already stored in x makes we should go to number (state) x' such that
lg(x') ~ lg(x) + lg(1/p)
or equivalently: x' ~ x/p.
Like x' = 2x + s in standard binary system, which is optimal for p = 1/2.

Huffman coding is currently being replaced with ANS coding

Use of AI (ML/DL) in Science

Other than just FizzBuzz to test programmer candidates

File Structure vs Data Structure

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight