Main Question or Discussion Point
Suppose you have a Markov source of symbols (or more generally, a stationary ergodic source). If you Huffman encode longer and longer extensions (that is, you take blocks of n source symbols at a time, letting n increase) of this source, then the frequencies of 1's and 0's in the encoded message tend to 1/2 each. How do you prove this? I have an idea of why it is so. Namely, if the frequencies of 1's and 0's were very different, you could Huffman encode the coded message taken in blocks of k << n, and thereby obtain an even shorter code for the original message. But the original code was as short as possible by construction (the Huffman procedure), a contradiction. But this isn't really a proof. It sort of assumes that the expectation of a product is the product of the expectations, which is not generally true.