Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Paradox of uniform prob. distribution on Z+ vs cardinality, etc.

  1. Sep 27, 2012 #1

    Stephen Tashi

    User Avatar
    Science Advisor

    There are no actual paradoxes in mathematics (we hope). There are only things that appear to be paradoxes to fallible intuitions. Here is one that bothers me.

    There can be no translation invariant probability measure on the non-negative integers. Yet it is possible to imagine a general class of sampling processes that "ought" to have a particular implementation that picks a non-negative integer at random and favors no particular integer.

    ( I'm aware that number theorists who want to talk about "picking a integer at random" have defined various pseudo-measures or "asymptotic measures" such as http://en.wikipedia.org/wiki/Natural_density. In case anyone was about to post that link, my question isn't about that topic. I'm not asking about how to pick an integer at random. I'm asking about a paradox. )

    On the one hand, there can be no "uniform" probability distribution on the non-negative integers in the sense of a probability distribution satisfying 1) A singleton integer is a measureable set and 2) The measure of any measureable set is the same as that of any other set formed by adding a constant to each of the first set's members.

    The impossibility follows from the countable additivity of measures, which implies the measure of the whole space would be the sum of the measures of each integer. We can't find any constant probability p that makes the sum add up to 1.

    (This does not rule-out the possibility that there could be a translation invariant measure on the non-negative integers where a singleton integer is not a measureable set.)

    On the other hand, consider the following general type of sampling process on the non-negative integers (which I mentioned in the thread https://www.physicsforums.com/showthread.php?t=637049 ).

    The existence of the function F is established by the fact that the cardinality of the real numbers in [0,1] is the same as the cardinality of the set M.

    Since there exists one such function F, we could create arbitrarily many other functions that accomplish a 1-1 mapping between M and [0,1] by exchanging the y-values among the (x,y) ordered pairs in F.

    However, the non-existence of any translation invariant measure on the non-negative integers says that one or more of the following counter-intuitive ideas must be true.

    1. Perhaps any F that exists would create a process that favored some integer more than another. This is counterintuitive because 1-1 mappings that are used in cardinality arguments are not restricted by an concerns about measure or metrics. They can be quite arbitrary. Why should there be some fundamental asymmetry in what we can accomplish with them? - especially since we are picking f= F(x) by using a uniform distribution for x on [0,1].

    2. Perhaps any F that resulted in a translation invariant probability distribution would imply a probability measure where a singleton integer is not a measureable set. This is counterintuitive since the process is described in terms of selecting a single integer as a sample.

    3. Perhaps there is a fundamental logical paradox in assuming that exact samples can be drawn from a uniform distribution on the real numbers. In practical terms it is impossible to select such a sample, but is there any contradiction involved in assuming it can be done theoretically?

    4. Perhaps the assumption that "there exists an F" is not equivalent to saying that an F can ever be explicitly defined. To do a particular sampling process, we must define F explicitly. Can there be things that theoretically exist but that are also theoretically impossible to specify explicitly?
  2. jcsd
  3. Sep 27, 2012 #2
    A singleton integer is a measurable set, and its measure is zero.

    A set which is not measurable has no defined measure. Such sets are rather exotic and difficult to construct. They are basically peculiar exceptions whose exclusion is of no great practical interest.

    As for the rest, I'll repeat myself in saying that your manipulations don't really change anything.
  4. Sep 27, 2012 #3

    Stephen Tashi

    User Avatar
    Science Advisor

    There is no axiom of measure theory that requires that a singleton set in a measure space be a measureable set.
  5. Sep 27, 2012 #4


    User Avatar
    Staff Emeritus
    Science Advisor
    Gold Member

    That would be the natural inference from the fact that a uniform distribution on Z does not exist.

    One obvious problem with your intuition is that the uniformness of the uniform distribution on [0,1] has any bearing on the behavior of the distribution you have defined on M.

    Maybe the error will be exposed if you try using the same idea to pick from [1,2] instead of from M. That is, pick any bijection F from [0,1] to [1,2]. Then, define a distribution on [1,2] that corresponds to choosing a uniform sample on [0,1] and then applying F.

    Another way to repair your intuition might be to rephrase what you are actually doing: you are really just choosing a more-or-less arbitrary function G(r,n) that takes a real number r and an integer n and produces an integer. Then, you are sampling from the integers by choosing r and n from your specified distributions then applying G.

    (the only property you have required G to have is that, for each r and m, G(r,n)=m has a unique solution for n)
  6. Sep 27, 2012 #5
    I'm not going to waste any more time giving you useful information you reject.
  7. Sep 27, 2012 #6
    I didn't follow the other thread but I'm trying to catch up here. You choose k non-uniformly and x uniformly. You let f = F-1(x) and then n = f(k). So how do we know that this way of choosing n is uniform? The numbers n = f(k) will have the same distribution as G.

    All you're doing is taking a non-uniform distribution on the natural numbers and then permuting the natural numbers. A permutation is just a renaming. You haven't gotten rid of the non-uniformity. You've just chosen according to G and then agreed to call each number by a different number's name. The distribution's identical.
  8. Sep 28, 2012 #7

    Stephen Tashi

    User Avatar
    Science Advisor

    Not in general. I'll try to clarify that.

    I agree that the class of sampling methods I propose may behave in various was in regards to the distribution they produce and nothing guarantees that the distribution a particular method produces will be translation invariant. I'm merely stating that its counterintuitive that no particular example of this method can accomplish a translation invariant distribution because we have so much freedom of choice in creating such methods.

    One way to think of this class of methods is that we pick an integer and then we pick a renaming of the integers "at random" from a set of many different renamings. - i.e. we don't always use the same renaming.

    We pick in integer k from the distribution G. We pick an x in [0,1]. Using F, this defines a map of the integers to themselves. The single map f = F(x) is a renaming of the integers. Let v = f(k} The conditional probability of the sample value v given the value of x is simply the value of the probability density G at k since k is renamed as v. However, there are more ways to realize a value of v than to use one particular x and it's associated renaming.

    Computing the probability of getting a sample value of v is done this way

    Pr(sample value = v) = [itex] \sum_{k=0}^\infty [/itex] pr( k is picked from G) pr( k is mapped.
    to v by f = F(x)).

    Pr(k is mapped to v) = [itex] \int_{S_k} 1 dx [/itex] where [itex] S_k [/itex] is the set of all x in the unit interval such that [itex] f = F(x) [/itex] is a renaming of the integers that renames k as v.

    That result (that all such sampling methods produced a non-uniform distribution on the non-negative integers) would require the least adjustment to the way I (and the world) think about taking random samples! However, my current belief is that not all descriptions of taking random samples that trip lightly off one's tongue actually define a random variable that has a probability distribution.

    Thinking about what you said, yes the class of sampling methods could be generalized. It could expand to be the following.

    Let F be a surjective mapping from the interval [0,1] to the set M of 1-to-1 mappings of the non-negative integers onto themselves. (no need to make F 1-to 1) Pick k from some discrete distribution defined on a subset of the non-negative integers. (You could even pick k to be a constant!) Pick x from some continuous distribution w(x) defined on [0,1] (You're correct that there is nothing special about the uniform distribution). Let f = F(x). Set the value of the sample to be v = f(k).

    In computing pr( sample is v) using the sum defined above, I think there are the following possibilities:

    1. For all values of v, the sets [itex] S_k [/itex] are each measureable and some have non-zero measure. The result is a non-translation invariant distribution for the random variable v on the non-negative integers.

    2. For all values of v, the sets [itex] S_k [/itex] are each measureable and each has measure 0. Each value v has zero probabiltiy. The random variable v has no probabilty distribution.

    3. For some values of v, some of the sets [itex] S_k [/itex] are not measureable. I don't see how v can have a probability distribution in this case as long as we treat singleton sets {v} as measureable. However, if we drop that requirement and use summations like the one above to compute the probabilty of some larger subset of the non-negative integers, we might wipe-out the unmeasureability involved in the inegrations because we would be integrating over unions of the various [itex] S_k [/itex]. I don't know what measure theory says about finite or countable unions of un-measureable sets. Are they also un-measureable?

    Maybe the facts about "measureable functions" in measure theory tell us that we aren't permitted to defining a process of sampling using just any arbitrary function of a random variable.
  9. Sep 28, 2012 #8
    Ok, I've understood that much then.

    Now I'm troubled. For fixed f and fixed v,

    pr( k is mapped to v by f = F(x))

    must be 1 for exactly one value of k; and zero for all the others. That's how we defined f. Once you choose x, you fix a permutation f; then f(k) is either v or not. It's v for exactly one value of k.

    You've introduced some notational confusion by using k as a constant and also as the variable in the summation. I think if you used j as the variable in the summation, this point would be more clear. There's exactly one value of j such that f(j) = v; and that value is j = k. For all other values of j, the probability that f(j) = v is zero, and those terms of the summation drop out.

    I can't help feeling that your use of permutations of the natural numbers is adding a level of complexity here for no purpose. No renaming/permutation can possibly change any probability distribution on a set.

    I found this, is it helpful? It's titled, "Uniform Distributions on the Natural Numbers." Of course you can't do exactly that, but the author replaces countable additivity with finite additivity in order to make progress.

    Last edited: Sep 28, 2012
  10. Sep 28, 2012 #9

    Stephen Tashi

    User Avatar
    Science Advisor

    No, because the sampling process involves two random choices per sample. We pick k at random and we pick x at random for each sample. A sample value of v can result from many different possible combinations of k's and x's. We aren't picking one value of x and using in generating all samples. We pick a new x for each new sample.

    From Hurkyl's suggestions (and your doubts about merely permuting the integers), I've come up with this variation of my inuititive conflict:

    Suppose we begin with the following sampling process:

    Let F be a fixed surjection from [0,1] onto M where M is the set of 1-1 mappings of the non-negative integers onto themselves. Let U be the uniform distribution on the real numbers in [0,1].

    Process S1:
    For each sample, set k = 1 and draw x at random from U. Then f = F(x) is some mapping of the non-negative integers to themselves. Let the value of the sample be v = f(k).

    If S1 produces a non-translational invariant measure on the integers, we try adjusting the output, so to speak, by creating different sampling processes. Instead of setting k = 1, we could pick k from some discrete distribution ( - from S1, for example) and we could pick x from some non-uniform distribution on [0,1]. We could also try various F's.

    Hence it's tempting to think that one could eventually find some sampling process that treated all integers on the same footing.

    Perhaps what happens is that most F's that one can find result in sampling processes where the samples have no probability distribution because the sets [itex] S_k [/itex] either all have zero measure or present us with some un-measureable sets. If so, I find it disturbing that one can create a sampling process that is "well defined" as procedure, but whose samples have no probability distribution.
    Last edited: Sep 28, 2012
  11. Sep 28, 2012 #10
    You're engaging in speculation that after performing a bunch of random transformations of probability distributions you will somehow arrive at a uniform distribution over the integers. But you have not given any proof of this, you're simply claiming it for no reason. If you really think it's possible, stop being vague about properties of "F" and actually provide such an F (it's not hard to explicitly construct such an F). Then you'll clearly see that it doesn't work. It actually would be an interesting exercise to construct the F and figure out what the distribution would be - why not try it.

    In any case, a uniform probability distribution over the integers is not possible since if each integer is assigned a probability greater than 0, the sum of their probabilities would be infinite.
  12. Sep 28, 2012 #11
    If one renaming makes no difference, how can a lot of renamings?

    Let's forget math for a moment and consider a visualization. We have a room full of people. Each person wears a name tag, and all the name tags are different. In fact each person's name tag bears a unique natural number. Like in the old tv show The Prisoner. One person is number 1. One person is number 6. And so forth.

    Now we ask everyone their height and we determine the statistical distribution of all the heights of all the people in the room.

    Now we have everyone toss their name tag into a hat; and then we have everyone pick a new name tag. And then we re-determine the distribution of heights. It's blindingly obvious that this can not make the slightest difference in the distribution of heights.

    Ok that's your original situation. Now in your S1 process you do this:

    You take a random person and get their height. Then you do the hat thing with the nametags so everyone has a new nametag. You take another random person and record their height. Then everyone gets a new nametag, etc.

    The distribution of heights is not going to change. The switching of name tags before each random selection is not going to have the slightest effect on the distribution of heights as determined by a random sample.

    And if you imagine the population to be infinite, that won't change. Continually renaming the members of the population before choosing a random sample can not possibly have any effect on the distribution of heights.

    This is how I'm understanding your S1 experiment. Permuting the natural numbers does not make the slightest difference in anything.

    And if you measure the height of Number One, then permute the name tages and measure the height of Number One again, over and over, you are still going to end up with the same original distribution of heights. The permutations themselves serve as a randomizing mechanism. Everyone has an equal chance to be Number One after permuting the name tags.

    Remember your choice of permutations is done by using a uniform distribution on [0,1].

    Now at this point you keep claiming that something about the bijection between [0,1] and the set of permutations might introduce some new distribution. I fail to see how or why you think this. A bijection, just like a permutation, is just a renaming. Each permutation f gets renamed as a real number x between 0 and 1. So I have the permutation f1/2, the permutation f1/e, and so forth. The bijection doesn't have magic properties that allow you to somehow construct a uniform distribution on the natural numbers.

    You keep saying you have an intuition that it does; and I don't doubt that. But you haven't shown the mathematical path to your intuition.
    Last edited: Sep 28, 2012
  13. Sep 28, 2012 #12

    Stephen Tashi

    User Avatar
    Science Advisor

    I'm not claiming it. I'm merely trying to understand why it doesn't work. I'd be delighted to see a specific F if you have one.
  14. Sep 28, 2012 #13


    User Avatar
    Science Advisor
    Gold Member
    2017 Award

    Take a simpler case. fix an integer - says minus 5. Choose a bijection of the integers by you method. Then just choose the value of that function on minus 5.

    This seems the same as just randomly sampling integers without bias.
  15. Sep 28, 2012 #14
    Just to clarify a bit of confusing terminology here ... Stephen's talking about permutations of the nonnegative or positive (makes no difference) integers, ie the natural numbers [itex]\mathbb{N}[/itex].

    One reason this is convenient is that we have an obvious bijection between the subsets of [itex]\mathbb{N}[/itex] and the unit interval. For a given subset, create a binary decimal with 1 in the n-th position if and only if n is in the subset. This gives us an easily visualizable bijection between subsets of [itex]\mathbb{N}[/itex] and [0,1]. As always we have to wave our hands at the countable set of numbers with repeating binary decimals, ie .1111... = 1.0 but we always convince ourselves this doesn't matter. I assume there's a rigorous fix for this problem somewhere.

    In any event it would be better to talk about [itex]\mathbb{N}[/itex] rather than [itex]\mathbb{Z}^+[/itex] so as to avoid confusion.
  16. Sep 28, 2012 #15

    Stephen Tashi

    User Avatar
    Science Advisor

    I'll seize your example to state my questions!

    Consider a property of the name tags such as "the name tag is an even number" and an associated statistical property of the set of name tags such as "70% of the name tags are even numbers". My questions concern the properties of the name tags, not the properties of the people. However, let's also mention a property of the people ( like the person's weight) and an associated statistical property of the set of people, such as "80% of the people weigh over 100 lbs".

    We are to randomly sample the set of people The boss has assigned Gerald to pick the samples and he calls them out by their name tags. We know Gerald is biased toward picking name tags that are odd numbers. That might bias our estimate both of the statistical properties of the name tags and the statistical properties of the people ( for example suppose people with odd numbered name tags tend to weigh less than 100 lbs).

    We ask the boss to replace Gerald with someone who is unbiased. The boss says "No, can't do that. We can't hire unbiased samplers. They don't exist. It's because of countable additivity and stuff like that. You'll have to make do with Gerald. Use math or something to fix it."

    To accomplish a random sampling of a property of the people (such as weight), we can randomly re-assign the name tags each time BEFORE Gerald picks a person. I think your point is that one random reassignment per sample is enough and perhaps that we only need to do a single the random reassignment before taking all the samples. This method does not remove Gerald's bias toward picking name tags with odd numbers. It doesn't randomly sample the properties of the set of name tags.

    If we want to randomly sample the properties of the name tags, we could introduce an intermediary between Gerald and picking the sample. AFTER Gerald picks a name tag k, we could randomly pick a mapping of the name tags onto themselves and call out the name tag v that k mapped to. (This makes Gerald rather redundant, but keeps him on the payroll.)

    The method I describe for seeking a translation invariant or "uniform" distribution on the non-negative integers is analogous to attempting to remove the bias in how a non-uniform distribution Gerald selects samples by using a random reassignment of name tags AFTER Gerald has made his selection. What we can measure from the sample v is only the properties of the name tag. If you wanted to made an analogy to people's weights, you'd have to add more structure to the problem, like a having an associated sequence of weights {W_i }.

    Why does the method of reassigning names, which works well for removing bias in sampling finite sets (and is actually used in practical statistical sampling), break down when we attempt to apply it to countably infinite sets?
    Last edited: Sep 28, 2012
  17. Sep 28, 2012 #16


    User Avatar
    Science Advisor
    Gold Member
    2017 Award

    you missed my point. I do not see the difference between the simple case and the case he is proposing. It seems that the processes are the same - you are sampling integers without bias.
  18. Sep 28, 2012 #17
    Interesting example, though the fundamental issue is that the above process does not necessarily define a measurable function of random variables (showing such might be tricky), so F(U)(N) is not necessarily a random variable and any subsequent statements about probabilities will not make sense.
  19. Oct 4, 2012 #18
    Well you asked for it, here it is. It was a good deal harder to do than I expected. It took a few wrong turns producing invalid F's (that either weren't surjective, or weren't injective) before I found a solution. Reminder:

    First let b be the infinite string of binary digits representing a real number r between 0 and 1 (including 0, but not including 1). For example if r = 0.1101101000011010111... then b = 1101101000011010111.... We adopt the convention that b does NOT end in an infinite series of 1s - because any real number that ends in an infinite series of 1s has an alternate representation that does NOT end in an infinite series of 1s. for example 0.0111111... = 0.100000000... Therefore, b contains an infinite number of 0s.

    First, we map b to an infinite series of positive integers [itex]a_i[/itex]. Let p_1 be the position in b of the (i-1)'th 0 in b (or p_1 = 0 if i = 1), and let p_2 be the position in b of the i'th 0 in b. Then define [itex]a_i = p_2 - p_1[/itex].

    For example, if b = 10011000... (ending in an infinite series of 0s) then:
    [itex]a_1[/itex] = 2 - 0 = 2
    [itex]a_2[/itex] = 1
    [itex]a_3[/itex] = 1
    [itex]a_4[/itex] = 3
    [itex]a_5, a_6, a_7,[/itex] etc. = 1

    Now, to construct m from the [itex]a_i[/itex], we loop over the [itex]a_i[/itex] building m.

    Define a sequence of partial functions [itex]m_i[/itex] (corresponding to our incremental construction of m). Each [itex]m_i[/itex] can be considered as a set of ordered pairs of integers. Define [itex]D_i[/itex] to be the support of [itex]m_i[/itex] (the set of values where it is defined), and [itex]R_i[/itex] to be the image of [itex]D_i[/itex] under [itex]m_i[/itex]. [itex]m_0[/itex] is empty.

    Given [itex]a_0, a_1, ..., a_i, a_{i+1}[/itex] and [itex]m_0, m_1, ..., m_i[/itex], we now construct [itex]m_{i+1}[/itex].

    Define a function f(S, v) where S is a set of positive integers and v is a positive integer. f(S, v) gives the v'th smallest integer NOT an element of S.

    Let [itex]d_{i+1} = f(D_i, 1)[/itex]
    Let [itex]r_{i+1} = f(R_i, 1)[/itex]
    CASE 1: If [itex]d_{i+1} < r_{i+1}[/itex], then [itex]m_{i+1}[/itex] is the same as [itex]m_i[/itex] except for the additional pair [itex](d_{i+1}, f(R_i, a_{i+1}))[/itex]
    CASE 2: Else, If [itex]d_{i+1} \geq r_{i+1}[/itex], then [itex]m_{i+1}[/itex] is the same as [itex]m_i[/itex] except for the additional pair [itex](f(D_i, a_{i+1}), r_{i+1})[/itex].

    Define m as the function where m(a) = b if and only if there exists some i where [itex]m_i[/itex](a) = b.

    Finally, following this process (starting with r, deriving b from it, deriving the a_i, and finally the [itex]m_i[/itex], and then m), we define F to be the function where F(r) = m.

    To produce [itex]F^{-1}(m)[/itex], repeatedly pick the pair (v, w) from m that has not yet been chosen, that minimizes min(v, w). If there is a tie between two different pairs, choose the pair with the larger v and the smaller w. Then, produce the appropriate [itex]a_i[/itex] (based on the domain and range values already used) for the larger element of the pair. Then convert the [itex]\{a_i\}[/itex] back into a binary string b, and finally to the real number r.

    Now, I first claim that every m generated by F is an element of M. For that to be true, two things must be true: First, if (v, w) is in m, we can't have either (y, w) or (v, z) also in m, where y [itex]\neq[/itex] v and z [itex]\neq[/itex] w. Second, for every v in [itex]Z^+[/itex], there must exist a (v, w) in m (for some w), and also there must exist an (x, v) in m (for some x).

    The first thing is guaranteed because every time we added a new pair (v, w) to [itex]m_{i+1}[/itex], the values used were guaranteed not to have been used before in [itex]m_i[/itex], because v was of the form [itex]f(D_i, *)[/itex] and w was of the form [itex]f(R_i, *)[/itex].

    The second thing is guaranteed because at each step, either v or w was the smallest unused element of either the domain or the range. So after 2n steps, [itex]m_{2n}[/itex] is guaranteed to have pairs (v, w) and (x, v) for any v < n.

    So, every m generated by F is an element of M. Next we need to show that F is one-to-one. Suppose we had different real numbers r1, r2 with binary expansions b1, b2, and f(r1) = m = f(r2). Because b1 != b2, the [itex]a_i[/itex] must also differ. So choose the smallest i where [itex]a_{1,i} \neq a_{2,i}[/itex] (denoting the difference between the [itex]a_i[/itex] for b1 vs the [itex]a_i[/itex] for b2 by 1 or 2 in the subscript). Since that was the smallest i, [itex]m_{i-1}[/itex] must be the same for both b1 and b2. So if CASE 1 from the calculation procedure for m_i is met, then it must be met for both, and m1 has [itex](d_{1,i+1}, f(R_{1,i}, a_{1,i+1}))[/itex], whereas m2 has [itex](d_{2,i+1}, f(R_{2,i}, a_{2,i+1}))[/itex]. Since [itex]d_{1, i+1} = d_{2, i+1}[/itex], m1 and m2 have different values for the same element of their domain, so they are different functions. If instead CASE 2 is met, a similar argument shows that again, m1 [itex]\neq[/itex] m2. So, F is one-to-one.

    Next, F is surjective because the method described earlier to construct [itex]F^{-1}(m)[/itex] will always produce a real number r where F(r) = m.


    So what can we tell about this F in the context of your original question?

    Well, it's certainly not "uniform." For example, there's at least a 1/2 chance that given a random real number r between 0 and 1, F(r)(1) = 1. Any binary string that begins with 0 will have that property, corresponding to a real number between 0.0 (zero) and 0.1 (one half). With a random r, m = F(r) is likely to map small numbers to other small numbers.
    Last edited: Oct 5, 2012
  20. Oct 5, 2012 #19

    Stephen Tashi

    User Avatar
    Science Advisor

    In the recursion, I don't understand the initialization step. How are [itex] D_1 [/itex] and [itex] R_1 [/itex] defined? (i.e. I'm asking about the subscript that is the number 1, not the letter "i" ).
  21. Oct 5, 2012 #20
    D_i is the set of all numbers v where (v, w) is in m_i, and R_i is the set of all numbers w where (v, w) is in m_i. They don't really need a base case, but you could say that D_0 and R_0 = the empty set.
    Last edited: Oct 5, 2012
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook