# Network information flow

Feb 18, 2009

### mXSCNT

Usually in information theory (at least the introductory version I have learned) one is concerned with the amount of information transmitted over a single channel, from point A to point B. Suppose that this is generalized to a Bayesian network, where each directed edge of the network is viewed as a noiseless channel that transmits information about its start point to its end point. For example, consider the following network:
Code (Text):
A --> B
A is uniform over {0,1}; B is another random variable over {0,1} with P(B=0|A=0)=0.1 and P(B=0|A=1)=0.5. At B, we wish to determine as much as we can about the value of A from our observation of B. One might call this the information that flowed over the edge from A to B; it can be viewed as the reduction in uncertainty about A given B: I(A;B) = H(A)-H(A|B).
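As a sanity check, this quantity can be computed directly from the CPT above (a short Python sketch; the numbers are exactly those in the example):

```python
from math import log2

# CPT from the example: A uniform on {0,1}, P(B=0|A=0)=0.1, P(B=0|A=1)=0.5
p_a = {0: 0.5, 1: 0.5}
p_b_given_a = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.5, 1: 0.5}}

# Joint P(A=a, B=b) and marginal of B
p_ab = {(a, b): p_a[a] * p_b_given_a[a][b] for a in (0, 1) for b in (0, 1)}
p_b = {b: p_ab[(0, b)] + p_ab[(1, b)] for b in (0, 1)}

h_a = -sum(p * log2(p) for p in p_a.values())  # H(A) = 1 bit for a uniform bit
h_a_given_b = -sum(p * log2(p / p_b[b]) for (a, b), p in p_ab.items())  # H(A|B)
i_ab = h_a - h_a_given_b  # reduction in uncertainty about A after seeing B
print(f"I(A;B) = {i_ab:.4f} bits")
```

With these numbers the reduction comes out to roughly 0.15 bits: observing B removes only a modest fraction of the one bit of uncertainty in A, since B is a fairly noisy observation of A.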

However, it's not as clear when you ask different questions. How much information flowed from B to A? I(A;B) = I(B;A) is not zero, but given the direction of the edge, no information could have flowed from B to A; the symmetry of mutual information merely reflects that we can infer A from B just as well as B from A. Suppose we have this network instead:
Code (Text):

A --> B
\--> C
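In this fork there is no directed path between B and C, yet they are correlated through their common parent A. A quick numerical check (the CPTs are my own illustrative choice: B and C are conditionally independent noisy copies of A, each flipped with probability 0.1):

```python
from math import log2
from itertools import product

# Illustrative CPTs (made up): A uniform; B and C are noisy copies of A.
p_a = {0: 0.5, 1: 0.5}
flip = 0.1
p_copy = {a: {x: 1 - flip if x == a else flip for x in (0, 1)} for a in (0, 1)}

# Joint P(B=b, C=c) = sum_a P(a) P(b|a) P(c|a)
p_bc = {(b, c): sum(p_a[a] * p_copy[a][b] * p_copy[a][c] for a in (0, 1))
        for b, c in product((0, 1), repeat=2)}
p_b = {b: p_bc[(b, 0)] + p_bc[(b, 1)] for b in (0, 1)}
p_c = {c: p_bc[(0, c)] + p_bc[(1, c)] for c in (0, 1)}

# I(B;C) is strictly positive despite the absence of any directed B-C path.
i_bc = sum(p * log2(p / (p_b[b] * p_c[c])) for (b, c), p in p_bc.items() if p > 0)
print(f"I(B;C) = {i_bc:.4f} bits")
```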

No information could have flowed from B to C because there is no directed path from B to C, but I(B;C) is nonzero. Or this network:
Code (Text):

A ------> C
\--> B --^

Some information flows from B to C along the edge (B,C). But I(B;C) is greater than that, because B also tells us about A, which in turn tells us about C, even though B -> A -> C is not a directed path.
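To make this concrete, here is a small numerical sketch (all CPT numbers are made up). It compares I(B;C) in the full network against the mutual information that survives when C's dependence on A is severed, leaving only the edge (B,C); when removing A as a parent of C, I average A out under its prior, which is one possible convention, not the only one:

```python
from math import log2
from itertools import product

def mutual_info(p_xy):
    """I(X;Y) in bits from a joint distribution given as {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return sum(p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

# Illustrative CPTs (made up): A uniform; B a noisy copy of A;
# C driven mostly by A, with a smaller direct effect of B.
p_a = {0: 0.5, 1: 0.5}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
p_c1_given_ab = {(0, 0): 0.05, (0, 1): 0.25, (1, 0): 0.75, (1, 1): 0.95}

# Full network: joint P(B, C), marginalizing over A.
p_bc = {}
for a, b, c in product((0, 1), repeat=3):
    pc1 = p_c1_given_ab[(a, b)]
    p = p_a[a] * p_b_given_a[a][b] * (pc1 if c == 1 else 1 - pc1)
    p_bc[(b, c)] = p_bc.get((b, c), 0) + p
i_full = mutual_info(p_bc)

# Severed network: B keeps its marginal; C depends on B only, with A
# averaged out under its prior (an assumed convention for edge deletion).
p_b = {b: sum(p_a[a] * p_b_given_a[a][b] for a in (0, 1)) for b in (0, 1)}
p_c1_given_b = {b: sum(p_a[a] * p_c1_given_ab[(a, b)] for a in (0, 1))
                for b in (0, 1)}
p_bc_del = {(b, c): p_b[b] * (p_c1_given_b[b] if c == 1 else 1 - p_c1_given_b[b])
            for b, c in product((0, 1), repeat=2)}
flow_bc = mutual_info(p_bc_del)

print(f"I(B;C) = {i_full:.4f} bits, over edge (B,C) alone = {flow_bc:.4f} bits")
```

With these numbers the full I(B;C) is far larger than what the edge (B,C) alone accounts for; most of the observed dependence is routed through the common ancestor A.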

One thing one could do is to define the information flow from A to B in a Bayesian network as I(A';B'), where A', B' are the corresponding random variables in the network from which all edges not reachable from A have been deleted. Deleting an edge (Xi,Xj) consists of replacing P(Xj | Xi, ...) with P(Xj | ...). However, I'm having trouble deriving any identities about this. Specifically, I want to write flow(A,B) in terms of flow(A,R) and flow(R,B), where R is a parent of B. I can do this, but it does not look nice. One useful identity in this process is $$P(B=b | A=a) = \sum_{r \in R} P(R=r | A=a) P(B=b | R=r)$$ if R is the only parent of B and there is a path from A to B. This identity can be extended to more than one parent by letting R range over the Cartesian product of all the parents.
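The caveat that R must be B's only parent matters. A quick numerical check (all CPT numbers below are made up) shows the decomposition failing once B gains A as a second parent, because P(B|R) then mixes back in information about A:

```python
# Network: A -> R, A -> B, R -> B.  With a direct A -> B edge, R is no
# longer the only parent of B, and the chain decomposition
#   P(B=b|A=a) = sum_r P(R=r|A=a) P(B=b|R=r)
# no longer holds.  All CPT numbers are made up for illustration.
p_a = {0: 0.5, 1: 0.5}
p_r_given_a = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}
p_b_given_ar = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.2, 1: 0.8}, (1, 1): {0: 0.05, 1: 0.95}}

# Marginals needed to form P(B|R) in the full network
p_r = {r: sum(p_a[a] * p_r_given_a[a][r] for a in (0, 1)) for r in (0, 1)}
p_a_given_r = {(a, r): p_a[a] * p_r_given_a[a][r] / p_r[r]
               for a in (0, 1) for r in (0, 1)}
p_b_given_r = {r: {b: sum(p_a_given_r[(a, r)] * p_b_given_ar[(a, r)][b]
                          for a in (0, 1))
                   for b in (0, 1)}
               for r in (0, 1)}

a, b = 0, 0
# True conditional: marginalize R out of P(R, B | A=a)
lhs = sum(p_r_given_a[a][r] * p_b_given_ar[(a, r)][b] for r in (0, 1))
# Chain decomposition through R alone
rhs = sum(p_r_given_a[a][r] * p_b_given_r[r][b] for r in (0, 1))
print(f"P(B=0|A=0) = {lhs:.4f}, chain decomposition gives {rhs:.4f}")
```

In a pure chain A -> R -> B the two expressions coincide by construction; the discrepancy above comes entirely from the extra A -> B edge.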

Is information flow in a Bayesian network an established subject in information theory?