# Shannon Information Theory: Transducer Entropy doesn't increase

1. Nov 15, 2014

### VVS

HI,

I am reading Shannon's paper on Theory of Communication and I having trouble with a concept.
Shannon writes:

The output of a finite state transducer driven by a finite state statistical source is a finite state statistical source, with entropy (per unit time) less than or equal to that of the input. If the transducer is non-singular they are equal.
Let α represent the state of the source, which produces a sequence of symbols xi; and let β be the state of the transducer, which produces, in its output, blocks of symbols yj. The combined system can be represented by the "product state space" of pairs (α,β). Two points in the space (α1,β1)and (α2,β2), are connected by a line if α1 can produce an x which changes β1 to β2, and this line is given the probability of that x in this case. The line is labelled with the block of yj symbols produced by the transducer. The entropy of the output can be calculated as the weighted sum over the states. If we sum first on β each resulting term is less than or equal to the corresponding term for α, hence the entropy is not increased. If the transducer is non-singular, let its output be connected to the inverse transducer. If H′1, H′2 and H′3 are the output entropies of the source, the first and second transducers respectively, then H′1≥H′2≥H′3=H′1 and therefore H′1=H′2.

I am not able to show the decrease or equality in entropy mathematically. This is what I have got:

$H(y|\beta)=-\sum_{i,j}P(\beta_i)P(y_j|\beta_i)log\left[P(y_j|\beta_i)\right]$
$H(y|\beta)=-\sum_{j}P(\beta_1)P(y_j|\beta_1)log\left[P(y_j|\beta_1)\right]+P(\beta_2)P(y_j|\beta_2)log\left[P(y_j|\beta_2)\right]+..=\sum_iH(y|\beta_i)$
Now assume that there exist states $\alpha_k$ with output $x_{l}^{k}$ which cause the transition from $\beta_1 \rightarrow \beta_2$
Then entropy of those states which cause this particular transition with a particular input is:
$H(\beta_1\rightarrow\beta_2|\alpha_k)=H(x_l^k|\alpha_k)=-\sum_{k,l}P(\alpha_k)P(x_{l}^{k}|\alpha_k)log\left[P(x_{l}^{k}|\alpha_k)\right]$

My guess is that there is exists a relation between $P(\beta_i)P(y_j|\beta_i)$ and $P(\alpha_k)P(x_{l}^{k}|\alpha_k)$ but I just can't see it.

I forgot to add that the output of the transducer and the next state of the transducer are determined by these functions:
$y_n=f(x_n,\beta_n)$
$\beta_{n+1}=g(x_n,\beta_n)$
But Shannon doesn't mention whether these mappings are bijective or not.

Last edited: Nov 15, 2014
2. Nov 20, 2014