Subtracting out a random variable

Let X be a discrete random variable and let Y = f(X) for some function f. I wish to find a function g such that Y and Z = g(X) are independent, and such that the uncertainty (Shannon entropy) H(Z) is maximized. For example, suppose X is uniformly distributed over {0,1,2,3,4,5,6,7} and f(x) = 0 if x < 4, f(x) = 1 otherwise. Then g(x) = x mod 4 satisfies the requirements in this example. One could interpret Z as X with the distribution of Y "subtracted out." Encoding the values of X in binary as 000,001,010,011,100,101,110,111, we see that f extracts the leftmost bit of X and g extracts the remaining two bits.
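Here is a quick sanity check of this example, a minimal sketch in Python (the helper names pX, pY, pZ are mine):

```python
from fractions import Fraction
from math import log2

# X uniform on {0,...,7}; f extracts the high bit, g = x mod 4 the low two.
pX = {x: Fraction(1, 8) for x in range(8)}
f = lambda x: 1 if x >= 4 else 0
g = lambda x: x % 4

pYZ, pY, pZ = {}, {}, {}
for x, p in pX.items():
    y, z = f(x), g(x)
    pYZ[y, z] = pYZ.get((y, z), 0) + p
    pY[y] = pY.get(y, 0) + p
    pZ[z] = pZ.get(z, 0) + p

# Independence: P(Y=y, Z=z) = P(Y=y) P(Z=z) for every pair.
assert all(pYZ.get((y, z), 0) == pY[y] * pZ[z] for y in pY for z in pZ)

# H(Z) = 2 bits, the maximum possible since H(X) = 3 and H(Y) = 1.
H = lambda dist: -sum(float(p) * log2(float(p)) for p in dist.values() if p)
print(H(pZ))  # 2.0
```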

However, this is not always possible. For example, suppose X is uniformly distributed over {0,1,2} and f(x) = 1 if x = 2, f(x) = 0 otherwise. Then the only functions g such that g(X) is independent of f(X) are those that map all of X to a single value, which does not capture the idea of "subtracting out" f(X). For one thing, one would like to be able to deduce the value of X by observing the values of f(X) and g(X), and that is not possible here.
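This claim is small enough to brute-force. A sketch: only the partition of {0,1,2} induced by g matters, so enumerating all maps g: {0,1,2} -> {0,1,2} covers every g up to relabeling of its values.

```python
from fractions import Fraction
from itertools import product

pX = {x: Fraction(1, 3) for x in range(3)}
f = lambda x: 1 if x == 2 else 0

def independent(g):
    pYZ, pY, pZ = {}, {}, {}
    for x, p in pX.items():
        y, z = f(x), g[x]
        pYZ[y, z] = pYZ.get((y, z), 0) + p
        pY[y] = pY.get(y, 0) + p
        pZ[z] = pZ.get(z, 0) + p
    return all(pYZ.get((y, z), 0) == pY[y] * pZ[z] for y in pY for z in pZ)

for g in product(range(3), repeat=3):   # g[x] is the value g assigns to x
    if independent(g):
        print(g)   # prints only the constant maps (0,0,0), (1,1,1), (2,2,2)
```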

As a compromise, one could instead seek a function g such that if W = (f(X), g(X)) is the joint random variable, then H(X|W) = 0 and the mutual information I(f(X);g(X)) is minimized. But then, in general, f(X) and g(X) would not be independent.

Any help would be appreciated, especially a pointer to other material that deals with "subtracting out" a random variable in a manner similar to this!
 


In an extension of the problem, allow g(X,W) instead of simply g(X), for some discrete random variable W independent of X. Then seek g and W together such that g(X,W) and f(X) are independent, and such that H(X|g(X,W), f(X)) = 0. If P(X = x) is nonzero for only finitely many values x, this can always be done.

The second example above can be solved by letting W be uniformly distributed over {0,1} and letting

g(x,w) = [x = 0] * 0 + [x = 1] * 1 + [x = 2] * w

where [] is the Iverson bracket: [P] = 1 if the proposition P is true, [P] = 0 if P is false.
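Here is a check that this W does the job, a minimal sketch (the variable names are mine):

```python
from fractions import Fraction

pX = {x: Fraction(1, 3) for x in range(3)}
pW = {w: Fraction(1, 2) for w in range(2)}
f = lambda x: 1 if x == 2 else 0
g = lambda x, w: w if x == 2 else x   # same as the Iverson-bracket formula

pYZ, pY, pZ, sources = {}, {}, {}, {}
for x, px in pX.items():
    for w, pw in pW.items():
        y, z, p = f(x), g(x, w), px * pw
        pYZ[y, z] = pYZ.get((y, z), 0) + p
        pY[y] = pY.get(y, 0) + p
        pZ[z] = pZ.get(z, 0) + p
        sources.setdefault((y, z), set()).add(x)

# g(X,W) is independent of f(X) ...
assert all(pYZ.get((y, z), 0) == pY[y] * pZ[z] for y in pY for z in pZ)
# ... and each (f, g) pair pins down a unique x, so H(X | g(X,W), f(X)) = 0.
assert all(len(s) == 1 for s in sources.values())
```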

From here on, denote the level set \{x : f(x) = y\} by h(y). Note that the condition that g(X,W) and f(X) be independent is equivalent to saying that for every y_1, y_2 in the image of f and every z in the image of g, P(g(X,W) = z | X \in h(y_1)) = P(g(X,W) = z | X \in h(y_2)). Also, the condition H(X|g(X,W),f(X)) = 0 is equivalent to saying that for each y in the image of f and each such z, there is exactly one value x \in h(y) with P(g(x,W) = z) \neq 0.
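These two conditions translate directly into code. Here is a minimal checker, a sketch (the function name check and all variable names are mine), which I'll reuse on the later examples:

```python
from fractions import Fraction

def check(pX, pW, f, g):
    """Return (independent, decodable) for the two conditions above."""
    ys = {f(x) for x in pX}
    h = {y: [x for x in pX if f(x) == y] for y in ys}   # level sets h(y)
    # conditional law of g(X,W) given X in h(y), for each y
    cond = {}
    for y in ys:
        tot = sum(pX[x] for x in h[y])
        d = {}
        for x in h[y]:
            for w, pw in pW.items():
                z = g(x, w)
                d[z] = d.get(z, 0) + pX[x] * pw / tot
        cond[y] = d
    zs = {z for d in cond.values() for z in d}
    # condition 1: P(g(X,W)=z | X in h(y)) is the same for every y
    indep = all(cond[y1].get(z, 0) == cond[y2].get(z, 0)
                for y1 in ys for y2 in ys for z in zs)
    # condition 2: within each h(y), each z arises from at most one x,
    # so (f(X), g(X,W)) determines X, i.e. H(X | g(X,W), f(X)) = 0
    decodable = all(
        sum(1 for x in h[y]
            if any(pw and g(x, w) == z for w, pw in pW.items())) <= 1
        for y in ys for z in zs)
    return indep, decodable
```

A degenerate W with a single value recovers the original no-W setting.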

This can be visualized as follows. Imagine that each value x of X corresponds to a rectangle of width 1 and height proportional to P(X=x). Partition the rectangles into groups h(y), one for each y. Then we seek to "color" the rectangles such that, within each group, each color appears in exactly one rectangle (it need not cover the entire rectangle), and such that the area of each color within a group, divided by the total area of the group, is the same for all groups. Each color represents a possible value of g(X,W). Here is what such a diagram looks like for the preceding example:
[Figure fig1.png: coloring diagram for the example above]

Let's take a slightly more complicated example. Suppose that X is uniformly distributed over {0,1,2,3,4} and f(X) = [X > 2] (that is, f(X) = 1 if X > 2 and f(X) = 0 otherwise). One can solve the problem in this case by letting W be uniformly distributed over {0,1,2,3,4,5} and letting g(x,w) = [x <= 2] * (2x + (w mod 2)) + [x > 2] * (3(x-3) + (w mod 3)). This is illustrated below.
[Figure fig2.png: coloring diagram for this example]
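Running the checker sketched above on this construction (assuming check() as defined earlier):

```python
from fractions import Fraction

# X uniform on {0,...,4}, f(X) = [X > 2], W uniform on {0,...,5}.
pX = {x: Fraction(1, 5) for x in range(5)}
pW = {w: Fraction(1, 6) for w in range(6)}
f = lambda x: 1 if x > 2 else 0
g = lambda x, w: 2 * x + w % 2 if x <= 2 else 3 * (x - 3) + w % 3
print(check(pX, pW, f, g))  # (True, True)
```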

You may be able to see a pattern in these two examples. Now, somewhat more generally, suppose X is uniformly distributed over {1,2,...,n} and the range of f(X) is {1,2,...,m}. Let L = \mathrm{lcm}_{y \in \{1,\dots,m\}} |h(y)|. Let W be uniformly distributed over {0,1,...,L-1}. Let r(x) = i iff x is the i'th smallest element of h(f(x)) (beginning at i = 0). Now define g(x, w) = r(x) \cdot (L / |h(f(x))|) + (w \bmod (L / |h(f(x))|)), so that each x receives its own block of L/|h(f(x))| values of g within its level set; this solves the problem for the case when X is uniformly distributed.
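A sketch of this uniform construction as code (build_uniform is my name for it; it again uses check() from above):

```python
from fractions import Fraction
from math import lcm

def build_uniform(n, f):
    """W and g for X uniform on {1,...,n}."""
    ys = {f(x) for x in range(1, n + 1)}
    h = {y: sorted(x for x in range(1, n + 1) if f(x) == y) for y in ys}
    L = lcm(*(len(h[y]) for y in ys))
    pW = {w: Fraction(1, L) for w in range(L)}
    def g(x, w):
        block = L // len(h[f(x)])    # size of the block of g-values for x
        r = h[f(x)].index(x)         # r(x): rank of x within h(f(x))
        return r * block + w % block
    return pW, g

pX = {x: Fraction(1, 5) for x in range(1, 6)}
f = lambda x: 1 if x > 3 else 0      # level sets of sizes 3 and 2, so L = 6
pW, g = build_uniform(5, f)
print(check(pX, pW, f, g))           # (True, True)
```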

However, it doesn't work for the following case: let X be distributed over {0,1,2,3} such that P(X=x) = \frac{1}{\sqrt{2} + 3}([x = 0]\sqrt{2} + [x > 0]), and f(x) = [x > 1]. This case can be solved by letting W be distributed over {0,1,2,3} with P(W=w) = \frac{1}{2(\sqrt{2} + 1)}([w \le 1]\sqrt{2} + [w > 1]), and letting g(x, w) = [x <= 1] * (2x + (w mod 2)) + [x > 1] * ((x - 2) + 2[w > 1]), as illustrated below.
[Figure fig3.png: coloring diagram for this example]
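A float sanity check of this solution, verifying the independence condition level set by level set (exact arithmetic would need a symbolic sqrt(2), so isclose has to do here):

```python
from math import sqrt, isclose

r2 = sqrt(2)
pX = {x: (r2 if x == 0 else 1) / (r2 + 3) for x in range(4)}
pW = {w: (r2 if w <= 1 else 1) / (2 * (r2 + 1)) for w in range(4)}
f = lambda x: 1 if x > 1 else 0
g = lambda x, w: 2 * x + w % 2 if x <= 1 else (x - 2) + 2 * (1 if w > 1 else 0)

for z in range(4):
    # P(g(X,W) = z | X in h(y)) must agree for y = 0 and y = 1
    c = []
    for y in range(2):
        hy = [x for x in pX if f(x) == y]
        tot = sum(pX[x] for x in hy)
        c.append(sum(pX[x] * pW[w] / tot
                     for x in hy for w in pW if g(x, w) == z))
    assert isclose(c[0], c[1]), (z, c)
```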


Using the intuition from the preceding example, let's again suppose that P(X = x) is nonzero only for x \in {1, 2, ..., n} and that the range of f(X) is {1,2,...,m}, but relax the restriction that X must be uniformly distributed. For each y, let W_y be independently distributed over h(y) with P(W_y = x) = P(X = x | X \in h(y)), and let W = (W_1, W_2, ..., W_m) \in \mathbb{Z}^m. For w \in \mathbb{Z}^m, let w(i,x) denote w with its i'th coordinate replaced by x. Then g(x, w) = w(f(x), x).

Another question is: can this be extended to the case when X is nonzero on arbitrarily many values? I suspect that in this case W might have to be continuous.
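A sketch of this general construction (build_general is my name; it again reuses check() from the earlier sketch):

```python
from fractions import Fraction
from itertools import product

def build_general(pX, f):
    """W = (W_1,...,W_m) and g for an arbitrary finite pX."""
    ys = sorted({f(x) for x in pX})
    h = {y: [x for x in pX if f(x) == y] for y in ys}
    tot = {y: sum(pX[x] for x in h[y]) for y in ys}
    pW = {}
    for combo in product(*(h[y] for y in ys)):
        p = 1
        for y, x in zip(ys, combo):
            p = p * pX[x] / tot[y]     # coordinates W_y are independent
        pW[combo] = p
    idx = {y: i for i, y in enumerate(ys)}
    def g(x, w):                       # w(f(x), x): overwrite one coordinate
        w = list(w)
        w[idx[f(x)]] = x
        return tuple(w)
    return pW, g

# The troublesome {0,1,2} example from earlier now works:
pX = {x: Fraction(1, 3) for x in range(3)}
f = lambda x: 1 if x == 2 else 0
pW, g = build_general(pX, f)
print(check(pX, pW, f, g))   # (True, True)
```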

The technique of using the extra random variable W to satisfy the conditions H(X|g(X,W), f(X)) = 0 and I(g(X,W);f(X)) = 0 comes at a cost: the uncertainty H(g(X,W)) may become relatively large. It would be ideal if H(g(X,W)) + H(f(X)) = H(X), but much of the time this is not nearly the case.
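For instance, in the {0,...,4} example above the overhead is already more than a bit, as a quick computation shows:

```python
from math import log2

# Entropy bookkeeping for the {0,...,4} example.
H = lambda ps: -sum(p * log2(p) for p in ps if p)
H_X = H([1/5] * 5)       # log2(5) ~ 2.322 bits
H_f = H([3/5, 2/5])      # ~ 0.971 bits
H_g = H([1/6] * 6)       # g(X,W) is uniform on {0,...,5}: log2(6) ~ 2.585 bits
print(H_g + H_f - H_X)   # ~ 1.234 bits of overhead vs. the ideal
```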

Any commentary? I am especially looking for references where similar material (subtracting the effect of one random variable from another) has been treated.
 
