StoneTemplePython said:
Basically yes -- so long as they are bona fide random variables (not defective or anything like that). It is implicit in the 'tests' for mutual independence. A more direct way of getting at this is... do it yourself by writing out the joint distribution and marginalizing out the random variables you are not interested in.
By the way, to refine this a bit
the definition used in Feller is that a collection of ##n## discrete random variables (i.e. a finite set of them) are mutually independent if for any combination of values ##(x, y, ..., w)## assumed by them
##Pr\{X = x, Y=y, ..., W=w\} = Pr\{X =x\}Pr\{Y = y\}...Pr\{W=w\}##
note there are ##n## random variables above (##n## is not necessarily the length of the alphabet)
what I was suggesting is that it is enough to assume that you are interested in the first ##n-1## random variables but not the ##n##th one, while still knowing that all ##n## of them are mutually independent. (Induction/recursion then gives the result for the ##k## random variables you are actually interested in, for ##k \in \{2, 3, ..., n-1\}##.)
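In case a concrete check helps before the derivation: here is a small numeric sketch of the one-step claim (the pmfs below are made up purely for illustration). We build a joint pmf of three mutually independent discrete variables per Feller's definition, sum out the last one, and confirm the remaining two still factor into their marginals.

```python
import itertools

# Hypothetical marginal pmfs, chosen just for illustration
pX = {0: 0.2, 1: 0.8}
pY = {0: 0.5, 1: 0.3, 2: 0.2}
pW = {0: 0.6, 1: 0.4}

# Joint pmf of (X, Y, W) under mutual independence (Feller's definition):
# the joint factors at every point of the sample space
joint = {(x, y, w): pX[x] * pY[y] * pW[w]
         for x, y, w in itertools.product(pX, pY, pW)}

# Marginalize out W: for each fixed (x, y), sum over all values w
marginal_XY = {(x, y): sum(joint[(x, y, w)] for w in pW)
               for x, y in itertools.product(pX, pY)}

# The (X, Y) marginal still factors, i.e. X and Y remain independent
for (x, y), p in marginal_XY.items():
    assert abs(p - pX[x] * pY[y]) < 1e-12
```

The assertion passes because ##\sum_w Pr\{W=w\} = 1##, which is exactly the mechanism the derivation below makes precise.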
- - - -
Since the random variables are discrete, the sample space is discrete and there is a natural correspondence with the integers. Our attention here is really on enumerating ##W##'s sample space.
So you may define events:
##A_1 := \text{event that } X = x, Y=y, ..., W=w^{(1)}##
##A_2 := \text{event that } X = x, Y=y, ..., W=w^{(2)}##
##A_3 := \text{event that } X = x, Y=y, ..., W=w^{(3)}##
and so on
- - - -
(Some technical nits: the above may not be totally satisfying with respect to ordering, though we could easily clean this up via conditional probabilities -- i.e. divide each side by ##Pr\{X = x, Y=y, ...,V=v\}## when that probability is non-zero. The zero-probability cases need a little more care but can ultimately be ignored: for any positive-probability value of ##w##, if the right hand side is zero it must be because the product of the other ##n-1## terms is zero, which gives the desired relation directly, and the zero-probability sample points of ##W## have no impact on the marginalization we're doing here. But I think all of this obscures the argument too much.)
- - - -
Now we know the relationship, by assumption:
##Pr\{X = x, Y=y, ..., V=v, W=w\} = Pr\{X =x\}Pr\{Y = y\}...Pr\{V = v\}Pr\{W=w\}##
where I introduce random variable ##V## as the (n-1)th one -- hopefully that doesn't confuse things.
so summing over ##w##, we have
##\Big(Pr\{A_1\}+Pr\{A_2\} + Pr\{A_3\}+... \Big)= \sum_w Pr\{X = x, Y=y, ..., V=v, W=w\}##
##= \sum_w Pr\{X =x\}Pr\{Y = y\}...Pr\{V = v\}Pr\{W=w\}##
##= Pr\{X =x\}Pr\{Y = y\}...Pr\{V = v\}\sum_w Pr\{W=w\} ##
which, because ##\sum_w Pr\{W=w\} = 1##, simplifies to
##\Big(Pr\{A_1\}+Pr\{A_2\} + Pr\{A_3\}+... \Big)= Pr\{X =x\}Pr\{Y = y\}...Pr\{V = v\}##
from here, apply Boole's Inequality (the union bound) and see
##Pr\{X = x, Y=y, ..., V=v\} = Pr\{A_1 \cup A_2 \cup A_3 \cup ... \}\leq \Big(Pr\{A_1\}+Pr\{A_2\} + Pr\{A_3\}+... \Big)= Pr\{X =x\}Pr\{Y = y\}...Pr\{V = v\}##
But this must be an equality. Two ways to reason: look at the equality conditions for Boole's Inequality -- the events must be mutually exclusive, which holds here because each ##A_i## fixes ##W## at a distinct value ##w^{(i)}##. Alternatively, sum over all ##x, y, ..., v## and note that if the inequality were strict for even one of those combinations, you would end up with ##1= \sum_x\sum_y ... \sum_v Pr\{X = x, Y=y, ..., V=v\} \lt 1##, which is a contradiction.
So the end result is:
##Pr\{X = x, Y=y, ..., V=v\} = Pr\{X =x\}Pr\{Y = y\}...Pr\{V = v\}##
as desired
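To tie it together, here is a small numeric sketch of the recursion mentioned at the top (the pmfs are again made up purely for illustration): start with four mutually independent discrete variables, peel off the last one repeatedly by summing it out, and check at each stage that the reduced joint still factors into the remaining marginals.

```python
import itertools

# Hypothetical marginal pmfs, chosen just for illustration
marginals = [
    {0: 0.3, 1: 0.7},          # X
    {0: 0.5, 1: 0.5},          # Y
    {0: 0.2, 1: 0.3, 2: 0.5},  # V
    {0: 0.9, 1: 0.1},          # W
]

# Joint pmf of (X, Y, V, W) under mutual independence:
# the product of the marginals at every combination of values
joint = {}
for vals in itertools.product(*marginals):
    p = 1.0
    for m, v in zip(marginals, vals):
        p *= m[v]
    joint[vals] = p

# Induction step: drop the last variable by summing it out, then
# verify the reduced joint still factors (i.e. the remaining
# variables are still mutually independent)
while len(marginals) > 1:
    marginals.pop()
    reduced = {}
    for vals, p in joint.items():
        reduced[vals[:-1]] = reduced.get(vals[:-1], 0.0) + p
    joint = reduced
    for vals, p in joint.items():
        prod = 1.0
        for m, v in zip(marginals, vals):
            prod *= m[v]
        assert abs(p - prod) < 1e-12
```

Each pass through the loop is one application of the marginalization argument above; the assertions pass precisely because each dropped variable's pmf sums to one.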