Huffman coding is currently being replaced with ANS coding

  • Thread starter Thread starter jarekduda
  • Start date Start date
  • Tags Tags
    Coding
Click For Summary
Huffman coding, while fast, approximates probabilities using powers of 1/2, leading to inaccuracies in information representation, particularly for symbols with high probabilities. Arithmetic coding improves accuracy but is significantly slower. Asymmetric Numeral Systems (ANS) coding addresses these issues by providing a method that is both fast and accurate, making it a preferred choice in modern compression algorithms since 2014. ANS coding allows for encoding information into a single natural number, resulting in substantial speed improvements—up to 30 times faster than traditional arithmetic coding. Notable implementations include Apple’s LZFSE and Facebook’s Zstd, which offer compression speeds 2-5 times faster than gzip while achieving better compression ratios. The growing adoption of ANS-based compressors indicates a shift towards more efficient data compression methods, with many being open-source. The underlying principle of ANS aligns with the Zipf law, optimizing information storage and retrieval in data compression applications.
jarekduda
Messages
82
Reaction score
5
Huffman coding (e.g. A -> 010) is fast but approximates probabilities with powers of 1/2, while e.g. symbol of probability 0.99 carries only ~0.014 bits of information. This inaccuracy was repaired by arithmetic coding, but it is an order magnitude slower (more costly).
Above compromise has been recently ended with ANS coding, which is both fast(cheap) and accurate:
wiki: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
benchmarks: https://sites.google.com/site/powturbo/entropy-coder

It is used in compressors since 2014, like the currently default Apple LZFSE or great and free Facebook Zstd - it is 2-5x faster than gzip and provides much better compression:
https://github.com/facebook/zstd/
7-zip version: https://mcmilk.de/projects/7-Zip-zstd/
https://dl.dropboxusercontent.com/u/12405967/ZSTD.png
 
Last edited by a moderator:
  • Like
Likes Filip Larsen
Technology news on Phys.org
Thanks for sharing. Are you using this coding in a project? What benefits are you seeing?
 
Benefits from zstd itself are huge: 2-5x faster compression than standard gzip and much better compression ratio - it has a real chance to finally replace the ancient gzip.
For example the Guardian is already officially using it: https://www.theguardian.com/info/de...-compression-iinovations-brotli-and-zstandard
And the family of such new generation compressors (ANS-based) is quickly growing, most are free and open-source:
http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-implementations

I have experience with ANS coding, the basic concept is to encode information into a single natural number x, what is much more convenient for implementations than the standard arithmetic coding: encoding information into a range ( https://en.wikipedia.org/wiki/Arithmetic_coding ).
In numbers, in 2013 state-of-art arithmetic decoder processed ~50MB/s/core, now analogous task is made by ANS with ~1500MB/s ... we got ~30x implementation-only speedup for the basic transformation of information: the bottleneck of data compressors, and compression in modern world is practically default for all data.
benchmarks: https://sites.google.com/site/powturbo/entropy-coder
fastest open-source implementation: https://github.com/jkbonfield/rans_static

ANS uses philosophy that a natural number x contains lg(x) bits of information, what is equivalent to the Zipf law: Pr(x) ~ 1/x.
Symbol of probability p contains lg(1/p) bits of information, so adding this information to information already stored in x makes we should go to number (state) x' such that
lg(x') ~ lg(x) + lg(1/p)
or equivalently: x' ~ x/p.
Like x' = 2x + s in standard binary system, which is optimal for p = 1/2.
 
Learn If you want to write code for Python Machine learning, AI Statistics/data analysis Scientific research Web application servers Some microcontrollers JavaScript/Node JS/TypeScript Web sites Web application servers C# Games (Unity) Consumer applications (Windows) Business applications C++ Games (Unreal Engine) Operating systems, device drivers Microcontrollers/embedded systems Consumer applications (Linux) Some more tips: Do not learn C++ (or any other dialect of C) as a...