Subtracting out a random variable

mXSCNT
"Subtracting out" a random variable

Let X be a discrete random variable, and let Y = f(X) for some function f. I wish to find a function g such that Y and Z = g(X) are independent, and such that the uncertainty (entropy) H(Z) is maximized. For example, suppose X is uniformly distributed over {0,1,2,3,4,5,6,7}, with f(x) = 0 if x < 4 and f(x) = 1 otherwise. Then g(x) = x mod 4 satisfies the requirements. One could interpret Z as the distribution of X with the distribution of Y "subtracted out." Encoding the values of X in binary as 000,001,010,011,100,101,110,111, we see that f extracts the leftmost bit of X, and g extracts the remaining two bits.
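
Here is a quick numerical check of this example (a sketch of my own, not from the original post; the helper name `entropy` is mine):

```python
from itertools import product
from math import log2
from collections import defaultdict

def entropy(p):
    """Shannon entropy in bits of a dict mapping value -> probability."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

f = lambda x: 0 if x < 4 else 1
g = lambda x: x % 4

p_x = {x: 1 / 8 for x in range(8)}
p_yz = defaultdict(float)            # joint pmf of (Y, Z) = (f(X), g(X))
for x, px in p_x.items():
    p_yz[(f(x), g(x))] += px
p_y = defaultdict(float); p_z = defaultdict(float)
for (y, z), p in p_yz.items():
    p_y[y] += p
    p_z[z] += p

# Independence: the joint pmf factorizes into the product of the marginals.
assert all(abs(p_yz[(y, z)] - p_y[y] * p_z[z]) < 1e-12
           for y, z in product(p_y, p_z))
print(entropy(p_z))                  # H(Z) = 2.0 bits, the two remaining bits
```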

However, this is not always possible. For example, suppose X is uniformly distributed over {0,1,2}, with f(x) = 1 if x = 2 and f(x) = 0 otherwise. Then the only functions g such that g(X) is independent of f(X) are the constant functions, which do not capture the idea of "subtracting out" f(X). In particular, one would like to be able to deduce the value of X from the values of f(X) and g(X), and that is not possible here.

As a compromise, one could instead seek a function g such that H(X | f(X), g(X)) = 0 while the mutual information I(f(X); g(X)) is minimized. In general, though, f(X) and g(X) will not then be independent.
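
To make the compromise concrete, here is a brute-force search over this three-point example (a hypothetical sketch; the constraint g(0) != g(1) is exactly what H(X | f(X), g(X)) = 0 requires here, since x = 2 is alone in its level set):

```python
from itertools import product
from math import log2
from collections import defaultdict

def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def mutual_information(pairs):
    """pairs: dict (a, b) -> probability; returns I(A;B) in bits."""
    pa = defaultdict(float); pb = defaultdict(float)
    for (a, b), p in pairs.items():
        pa[a] += p; pb[b] += p
    return entropy(pa) + entropy(pb) - entropy(pairs)

f = lambda x: 1 if x == 2 else 0
best = None
for g_vals in product(range(3), repeat=3):   # all g: {0,1,2} -> {0,1,2}
    if g_vals[0] == g_vals[1]:               # (f, g) must determine x, so
        continue                             # g must split h(0) = {0, 1}
    joint = defaultdict(float)
    for x in range(3):
        joint[(f(x), g_vals[x])] += 1 / 3
    mi = mutual_information(joint)
    if best is None or mi < best[0]:
        best = (mi, g_vals)
print(best)   # minimum I(f(X); g(X)) ~ 0.252 bits; it cannot reach zero
```

The minimum is attained by, e.g., g = (0, 1, 0): g must reuse a value across the two level sets, so f(X) and g(X) remain weakly dependent.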

Any help would be appreciated--especially a pointer to other material that deals with "subtracting out" a random variable in a similar manner to this!
 


In an extension of the problem, allow g(X,W) instead of simply g(X), where W is a discrete random variable independent of X. Then seek g and W together such that g(X,W) and f(X) are independent, and such that H(X | g(X,W), f(X)) = 0. If X has nonzero probability on only finitely many values, this can always be done.

In the second example above, we can solve the problem by letting W be uniformly distributed over {0,1}, and letting

g(x,w) = [x = 0] * 0 + [x = 1] * 1 + [x = 2] * w

where [] is the Iverson bracket: [P] = 1 if the proposition P is true, [P] = 0 if P is false.
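
A numerical check of this construction (a sketch; the variable names are mine):

```python
from collections import defaultdict

f = lambda x: 1 if x == 2 else 0
def g(x, w):
    return w if x == 2 else x      # same as [x=0]*0 + [x=1]*1 + [x=2]*w

joint = defaultdict(float)         # joint pmf of (f(X), g(X,W), X)
for x in range(3):
    for w in range(2):
        joint[(f(x), g(x, w), x)] += (1 / 3) * (1 / 2)

# Independence of f(X) and g(X,W): the joint of (f, g) factorizes.
p_fg = defaultdict(float); p_f = defaultdict(float); p_g = defaultdict(float)
for (y, z, x), p in joint.items():
    p_fg[(y, z)] += p; p_f[y] += p; p_g[z] += p
assert all(abs(p_fg[(y, z)] - p_f[y] * p_g[z]) < 1e-12
           for y in p_f for z in p_g)

# H(X | g(X,W), f(X)) = 0: each observed pair (y, z) pins down x.
support = defaultdict(set)
for (y, z, x), p in joint.items():
    support[(y, z)].add(x)
assert all(len(s) == 1 for s in support.values())
print("both conditions hold")
```

The pair (f(X), g(X,W)) takes the values (0,0), (0,1), (1,0), (1,1) with probabilities 1/3, 1/3, 1/6, 1/6, which factorizes into the product of its marginals, and each pair identifies X uniquely.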

From here on, denote the level set \{x : f(x) = y\} by h(y). Note that the condition that g(X,W) and f(X) are independent is equivalent to saying that for every y_1, y_2 in the image of f, and for every z in the image of g, P(g(X,W) = z | X \in h(y_1)) = P(g(X,W) = z | X \in h(y_2)). Also, the condition H(X | g(X,W), f(X)) = 0 is equivalent to saying that for each y in the image of f and each fixed z, there is at most one value x \in h(y) with P(g(x,W) = z) \neq 0 (and, once independence also holds, exactly one).

This can be visualized as follows. Imagine that each value x of X corresponds to a rectangle of width 1 and height proportional to P(X=x). Partition the rectangles into groups h(y), one for each y. We then seek to "color" the rectangles so that, within each group, each color appears in exactly one rectangle (it need not cover the entire rectangle), and so that the area of each color within a group, divided by the total area of its group, is the same for all groups. Each color represents a possible value of g(X,W). Here is what such a diagram looks like for the preceding example:
[fig1.png: coloring diagram for the {0,1,2} example]

Let's take a slightly more complicated example. Suppose that X is uniformly distributed over {0,1,2,3,4} and f(X) = [X > 2] (that is, f(X) = 1 if X > 2, and f(X) = 0 otherwise). One can solve the problem in this case by letting W be uniformly distributed over {0,1,2,3,4,5} and letting g(x,w) = [x <= 2] * (2*x + (w mod 2)) + [x > 2] * (3*(x-3) + (w mod 3)). This is illustrated below.
[fig2.png: coloring diagram for the {0,1,2,3,4} example]

You may be able to see a pattern between these two examples. Now, somewhat more generally, suppose X is uniformly distributed over {1,2,...,n}, and the range of f(X) is {1,2,...,m}. Let L = lcm_{y \in \{1,...,m\}} |h(y)|, and let W be uniformly distributed over {0,1,...,L-1}. Let r(x) = i iff x is the i'th smallest element of h(f(x)) (beginning at i = 0). Now define g(x, w) = r(x) * (L / |h(f(x))|) + (w mod (L / |h(f(x))|)); this solves the problem for the case when X is uniformly distributed. Each x \in h(y) receives its own block of L / |h(y)| consecutive values, and since L / |h(y)| divides L, the term w mod (L / |h(y)|) is uniform, so g(X,W) is uniform over {0,1,...,L-1} within every group.
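
In code, the construction reads as follows (a sketch with hypothetical helper names; it reproduces the {0,1,2,3,4} example above and verifies both conditions):

```python
from math import lcm
from collections import defaultdict

def build(n, f):
    """For X uniform on range(n), return (L, g) implementing the lcm trick."""
    groups = defaultdict(list)              # level sets h(y), in sorted order
    for x in range(n):
        groups[f(x)].append(x)
    L = lcm(*(len(h) for h in groups.values()))
    def g(x, w):
        h = groups[f(x)]
        block = L // len(h)                 # width of the block owned by x
        return h.index(x) * block + (w % block)
    return L, g

# Reproduce the {0,1,2,3,4} example: f(x) = [x > 2], so L = lcm(3, 2) = 6.
n, f = 5, (lambda x: 1 if x > 2 else 0)
L, g = build(n, f)
joint = defaultdict(float)                  # joint pmf of (f(X), g(X,W))
for x in range(n):
    for w in range(L):
        joint[(f(x), g(x, w))] += 1 / (n * L)
p_f = defaultdict(float); p_g = defaultdict(float)
for (y, z), p in joint.items():
    p_f[y] += p; p_g[z] += p
assert all(abs(joint[(y, z)] - p_f[y] * p_g[z]) < 1e-12
           for y in p_f for z in p_g)       # independence of f(X) and g(X,W)
support = defaultdict(set)
for x in range(n):
    for w in range(L):
        support[(f(x), g(x, w))].add(x)
assert all(len(s) == 1 for s in support.values())   # H(X | g, f) = 0
print(L, dict(p_g))                         # g(X,W) is uniform on {0,...,5}
```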

However, it doesn't work for the following case: let X be distributed over {0,1,2,3} such that P(X=x) = \frac{1}{\sqrt{2} + 3}([x = 0] \sqrt{2} + [x > 0]), and f(x) = [x > 1]. This case can be solved by letting W be distributed over {0,1,2,3} with P(W=w) = \frac{1}{2(\sqrt{2} + 1)}([w <= 1] \sqrt{2} + [w > 1]), and letting g(x, w) = [x <= 1] * (2*x + (w mod 2)) + [x > 1] * ((x - 2) + 2*[w > 1]), as illustrated below.
[fig3.png: coloring diagram for the non-uniform example]
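
A numerical check of this solution (again a sketch; the helper names are mine):

```python
from math import sqrt
from collections import defaultdict

r2 = sqrt(2)
p_x = {0: r2 / (r2 + 3), 1: 1 / (r2 + 3), 2: 1 / (r2 + 3), 3: 1 / (r2 + 3)}
p_w = {0: r2 / (2 * (r2 + 1)), 1: r2 / (2 * (r2 + 1)),
       2: 1 / (2 * (r2 + 1)), 3: 1 / (2 * (r2 + 1))}
f = lambda x: 1 if x > 1 else 0
def g(x, w):
    return 2 * x + (w % 2) if x <= 1 else (x - 2) + 2 * (1 if w > 1 else 0)

joint = defaultdict(float)                 # joint pmf of (f(X), g(X,W), X)
for x, px in p_x.items():
    for w, pw in p_w.items():
        joint[(f(x), g(x, w), x)] += px * pw

p_fg = defaultdict(float); p_f = defaultdict(float); p_g = defaultdict(float)
for (y, z, x), p in joint.items():
    p_fg[(y, z)] += p; p_f[y] += p; p_g[z] += p
assert all(abs(p_fg[(y, z)] - p_f[y] * p_g[z]) < 1e-12
           for y in p_f for z in p_g)      # independence
support = defaultdict(set)
for (y, z, x), p in joint.items():
    support[(y, z)].add(x)
assert all(len(s) == 1 for s in support.values())   # H(X | g, f) = 0
print("independence and zero conditional uncertainty both hold")
```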


Using the intuition from the preceding example, let's again suppose that X is nonzero only over {1, 2, ..., n} and that the range of f(X) is {1,2,...,m}, but relax the restriction that X must be uniformly distributed. For each y, let W_y be independently distributed over h(y) with P(W_y = x) = P(X = x | X \in h(y)), and let W = (W_1, W_2, ..., W_m) \in \mathbb{Z}^m. For w \in \mathbb{Z}^m, let w(i,x) denote w with its i'th coordinate replaced by x. Then g(x, w) = w(f(x), x) solves the problem: conditioned on any value y of f(X), the output vector has the product distribution of W_1, ..., W_m (coordinate y is X given X \in h(y), which is distributed exactly as W_y), so g(X,W) is independent of f(X), and coordinate f(x) of the output recovers x.

Another question is: can this be extended to the case when X is nonzero on arbitrarily many values? I suspect that in this case W might have to be continuous.
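
Here is a sketch of this general construction (hypothetical helper names; it re-checks the sqrt(2) example from above):

```python
from itertools import product
from collections import defaultdict
from math import sqrt

def check(p_x, f):
    """Verify the W_y construction for a finite pmf p_x and function f."""
    cond = defaultdict(dict)                  # y -> {x: P(X=x | X in h(y))}
    for x, px in p_x.items():
        cond[f(x)][x] = px
    for y, h in cond.items():
        tot = sum(h.values())
        cond[y] = {x: p / tot for x, p in h.items()}
    ys = sorted(cond)
    joint = defaultdict(float)                # pmf of (f(X), g(X,W), X)
    for x, px in p_x.items():
        # w ranges over supp(W_1) x ... x supp(W_m) with product weights
        for w in product(*(cond[y].items() for y in ys)):
            pw = 1.0
            z = []
            for y, (wy, pwy) in zip(ys, w):
                pw *= pwy
                z.append(x if y == f(x) else wy)   # replace coord f(x) by x
            joint[(f(x), tuple(z), x)] += px * pw
    p_fg = defaultdict(float); p_f = defaultdict(float); p_g = defaultdict(float)
    for (y, z, x), p in joint.items():
        p_fg[(y, z)] += p; p_f[y] += p; p_g[z] += p
    ok_indep = all(abs(p_fg[(y, z)] - p_f[y] * p_g[z]) < 1e-12
                   for y in p_f for z in p_g)
    xs_per_obs = defaultdict(set)
    for (y, z, x), p in joint.items():
        xs_per_obs[(y, z)].add(x)
    return ok_indep and all(len(s) == 1 for s in xs_per_obs.values())

r2 = sqrt(2)
p_x = {0: r2 / (r2 + 3), 1: 1 / (r2 + 3), 2: 1 / (r2 + 3), 3: 1 / (r2 + 3)}
print(check(p_x, lambda x: 1 if x > 1 else 0))    # True
```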

The technique of using the extra random variable W to satisfy the conditions H(X | g(X,W), f(X)) = 0 and I(g(X,W); f(X)) = 0 comes at a cost: the uncertainty in g(X,W) may become relatively large. It would be ideal if H(g(X,W)) + H(f(X)) = H(X), but much of the time this is far from the case.
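
On the second example above, this cost is easy to quantify (a quick sketch; the distributions are the ones already given). The ideal would be H(g(X,W)) = H(X) - H(f(X)) ~ 0.667 bits, but the construction gives H(g(X,W)) = 1 bit:

```python
from math import log2

def H(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_x = H([1/3, 1/3, 1/3])   # H(X) = log2(3) ~ 1.585 bits
h_f = H([2/3, 1/3])        # H(f(X)) ~ 0.918 bits
h_g = H([1/2, 1/2])        # H(g(X,W)) = 1 bit, from the {0,1,2} solution
print(h_f + h_g - h_x)     # overhead ~ 0.333 bits above the ideal H(X)
```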

Any commentary? I am especially looking for a place where similar material (about the subtraction of the effect of one random variable from another) has been treated elsewhere.
 
