Huffman coding is currently being replaced with ANS coding

  • Thread starter Thread starter jarekduda
  • Start date Start date
  • Tags Tags
    Coding
Click For Summary
SUMMARY

Huffman coding is being replaced by Asymmetric Numeral Systems (ANS) coding due to its superior speed and accuracy. While Huffman coding approximates probabilities and can lead to inefficiencies, ANS coding offers a significant performance boost, achieving speeds of approximately 1500MB/s per core compared to 50MB/s for traditional arithmetic coding. ANS has been integrated into modern compressors such as Apple LZFSE and Facebook Zstd, which provide 2-5 times faster compression than gzip. The Guardian has adopted Zstd, highlighting its growing acceptance in the industry.

PREREQUISITES
  • Understanding of Huffman coding and its limitations
  • Familiarity with arithmetic coding principles
  • Knowledge of data compression techniques
  • Basic programming skills for implementing compression algorithms
NEXT STEPS
  • Research the implementation details of ANS coding
  • Explore the performance benchmarks of Zstd compared to gzip
  • Learn about the integration of ANS coding in various data compression libraries
  • Investigate the mathematical foundations of information theory related to ANS
USEFUL FOR

Software developers, data compression engineers, and anyone involved in optimizing data storage and transmission will benefit from this discussion on ANS coding and its advantages over traditional methods.

jarekduda
Messages
82
Reaction score
5
Huffman coding (e.g. A -> 010) is fast but approximates probabilities with powers of 1/2, while e.g. symbol of probability 0.99 carries only ~0.014 bits of information. This inaccuracy was repaired by arithmetic coding, but it is an order magnitude slower (more costly).
Above compromise has been recently ended with ANS coding, which is both fast(cheap) and accurate:
wiki: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
benchmarks: https://sites.google.com/site/powturbo/entropy-coder

It is used in compressors since 2014, like the currently default Apple LZFSE or great and free Facebook Zstd - it is 2-5x faster than gzip and provides much better compression:
https://github.com/facebook/zstd/
7-zip version: https://mcmilk.de/projects/7-Zip-zstd/
https://dl.dropboxusercontent.com/u/12405967/ZSTD.png
 
Last edited by a moderator:
  • Like
Likes   Reactions: Filip Larsen
Technology news on Phys.org
Thanks for sharing. Are you using this coding in a project? What benefits are you seeing?
 
Benefits from zstd itself are huge: 2-5x faster compression than standard gzip and much better compression ratio - it has a real chance to finally replace the ancient gzip.
For example the Guardian is already officially using it: https://www.theguardian.com/info/de...-compression-iinovations-brotli-and-zstandard
And the family of such new generation compressors (ANS-based) is quickly growing, most are free and open-source:
http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-implementations

I have experience with ANS coding, the basic concept is to encode information into a single natural number x, what is much more convenient for implementations than the standard arithmetic coding: encoding information into a range ( https://en.wikipedia.org/wiki/Arithmetic_coding ).
In numbers, in 2013 state-of-art arithmetic decoder processed ~50MB/s/core, now analogous task is made by ANS with ~1500MB/s ... we got ~30x implementation-only speedup for the basic transformation of information: the bottleneck of data compressors, and compression in modern world is practically default for all data.
benchmarks: https://sites.google.com/site/powturbo/entropy-coder
fastest open-source implementation: https://github.com/jkbonfield/rans_static

ANS uses philosophy that a natural number x contains lg(x) bits of information, what is equivalent to the Zipf law: Pr(x) ~ 1/x.
Symbol of probability p contains lg(1/p) bits of information, so adding this information to information already stored in x makes we should go to number (state) x' such that
lg(x') ~ lg(x) + lg(1/p)
or equivalently: x' ~ x/p.
Like x' = 2x + s in standard binary system, which is optimal for p = 1/2.