Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Huffman coding is currently being replaced with ANS coding

  1. Jan 11, 2017 #1
    Huffman coding (e.g. A -> 010) is fast but approximates probabilities with powers of 1/2, while e.g. symbol of probability 0.99 carries only ~0.014 bits of information. This inaccuracy was repaired by arithmetic coding, but it is an order magnitude slower (more costly).
    Above compromise has been recently ended with ANS coding, which is both fast(cheap) and accurate:
    wiki: https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems
    benchmarks: https://sites.google.com/site/powturbo/entropy-coder

    It is used in compressors since 2014, like the currently default Apple LZFSE or great and free Facebook Zstd - it is 2-5x faster than gzip and provides much better compression:
    https://github.com/facebook/zstd/
    7-zip version: https://mcmilk.de/projects/7-Zip-zstd/
    https://dl.dropboxusercontent.com/u/12405967/ZSTD.png [Broken]
     
    Last edited by a moderator: May 8, 2017
  2. jcsd
  3. Jan 13, 2017 #2

    jedishrfu

    Staff: Mentor

    Thanks for sharing. Are you using this coding in a project? What benefits are you seeing?
     
  4. Jan 14, 2017 #3
    Benefits from zstd itself are huge: 2-5x faster compression than standard gzip and much better compression ratio - it has a real chance to finally replace the ancient gzip.
    For example the Guardian is already officially using it: https://www.theguardian.com/info/de...-compression-iinovations-brotli-and-zstandard
    And the family of such new generation compressors (ANS-based) is quickly growing, most are free and open-source:
    http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-implementations

    I have experience with ANS coding, the basic concept is to encode information into a single natural number x, what is much more convenient for implementations than the standard arithmetic coding: encoding information into a range ( https://en.wikipedia.org/wiki/Arithmetic_coding ).
    In numbers, in 2013 state-of-art arithmetic decoder processed ~50MB/s/core, now analogous task is made by ANS with ~1500MB/s ... we got ~30x implementation-only speedup for the basic transformation of information: the bottleneck of data compressors, and compression in modern world is practically default for all data.
    benchmarks: https://sites.google.com/site/powturbo/entropy-coder
    fastest open-source implementation: https://github.com/jkbonfield/rans_static

    ANS uses philosophy that a natural number x contains lg(x) bits of information, what is equivalent to the Zipf law: Pr(x) ~ 1/x.
    Symbol of probability p contains lg(1/p) bits of information, so adding this information to information already stored in x makes we should go to number (state) x' such that
    lg(x') ~ lg(x) + lg(1/p)
    or equivalently: x' ~ x/p.
    Like x' = 2x + s in standard binary system, which is optimal for p = 1/2.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook

Have something to add?
Draft saved Draft deleted



Similar Discussions: Huffman coding is currently being replaced with ANS coding
  1. Huffman Code. (Replies: 4)

  2. Code tutorials? (Replies: 4)

Loading...