Recognitions:

## Paradox of uniform prob. distribution on Z+ vs cardinality, etc.

There are no actual paradoxes in mathematics (we hope). There are only things that appear to be paradoxes to fallible intuitions. Here is one that bothers me.

There can be no translation invariant probability measure on the non-negative integers. Yet it is possible to imagine a general class of sampling processes that "ought" to have a particular implementation that picks a non-negative integer at random and favors no particular integer.

(I'm aware that number theorists who want to talk about "picking an integer at random" have defined various pseudo-measures or "asymptotic measures" such as http://en.wikipedia.org/wiki/Natural_density. In case anyone was about to post that link, my question isn't about that topic. I'm not asking about how to pick an integer at random. I'm asking about a paradox.)

On the one hand, there can be no "uniform" probability distribution on the non-negative integers in the sense of a probability distribution satisfying 1) A singleton integer is a measurable set and 2) The measure of any measurable set is the same as that of any other set formed by adding a constant to each of the first set's members.

The impossibility follows from the countable additivity of measures, which implies the measure of the whole space would be the sum of the measures of each integer. We can't find any constant probability p that makes the sum add up to 1.
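In symbols: if translation invariance forced every singleton to carry the same mass $p$, countable additivity would give

```latex
\mu(\mathbb{Z}_{\ge 0}) \;=\; \sum_{n=0}^{\infty} \mu(\{n\}) \;=\; \sum_{n=0}^{\infty} p \;=\;
\begin{cases}
0 & \text{if } p = 0,\\[2pt]
\infty & \text{if } p > 0,
\end{cases}
```

so no choice of $p$ yields total mass 1.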

(This does not rule out the possibility that there could be a translation invariant measure on the non-negative integers where a singleton integer is not a measurable set.)

On the other hand, consider the following general type of sampling process on the non-negative integers (which I mentioned in the thread http://www.physicsforums.com/showthread.php?t=637049 ).

 Let M be the set of 1-to-1 mappings of the non-negative integers onto themselves. Let F be a 1-to-1 mapping between the elements of M and the real numbers in the interval [0,1]. Let G be a probability distribution defined on the non-negative integers (such as a Poisson distribution). Consider the following process of sampling from the integers. Pick an integer k from the distribution G. Pick a real number x from the uniform distribution on [0,1]. Find the map f = F(x). Let the value chosen from the integers be f(k).
The existence of the function F is established by the fact that the cardinality of the real numbers in [0,1] is the same as the cardinality of the set M.

Since there exists one such function F, we could create arbitrarily many other functions that accomplish a 1-1 mapping between M and [0,1] by exchanging the y-values among the (x,y) ordered pairs in F.
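To make the proposed process concrete, here is a finite toy version. It is emphatically not the full bijection F (which would map [0,1] onto all of M); instead the uniform draw x seeds a uniformly random permutation of {0, ..., N-1}, and the function names (`toy_sample`, `g_sample`) are my own illustrative choices. The point of the sketch is that on a finite set the scheme really does wash out the bias of G, which is exactly what fails to survive the passage to the full infinite set.

```python
import random
from collections import Counter

def g_sample(rng):
    """A biased 'G': geometric distribution on {0, 1, 2, ...}."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k

def toy_sample(N, rng):
    """Finite stand-in for the proposed process, on {0, ..., N-1}.

    The uniform draw x plays the role of F(x): it seeds a uniformly
    random permutation f (Fisher-Yates shuffle), and the output is f(k).
    """
    k = g_sample(rng) % N            # draw k from G, folded into range
    x = rng.random()                 # uniform draw on [0, 1]
    f = list(range(N))
    random.Random(x).shuffle(f)      # f = "F(x)", a random renaming
    return f[k]

rng = random.Random(1)
N = 10
counts = Counter(toy_sample(N, rng) for _ in range(100_000))
# Each value 0..9 appears roughly equally often: the output is uniform
# on the finite set even though G is heavily biased toward 0.
```

For any finite N this gives Pr(v) = 1/N regardless of G, because position k of a uniformly random permutation is uniform. The paradox under discussion is precisely that no limit of this construction yields a countably additive distribution on all of the non-negative integers.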

However, the non-existence of any translation invariant measure on the non-negative integers says that one or more of the following counter-intuitive ideas must be true.

1. Perhaps any F that exists would create a process that favored some integer more than another. This is counterintuitive because 1-to-1 mappings that are used in cardinality arguments are not restricted by any concerns about measure or metrics. They can be quite arbitrary. Why should there be some fundamental asymmetry in what we can accomplish with them - especially since we are picking f = F(x) by using a uniform distribution for x on [0,1]?

2. Perhaps any F that resulted in a translation invariant probability distribution would imply a probability measure where a singleton integer is not a measurable set. This is counterintuitive since the process is described in terms of selecting a single integer as a sample.

3. Perhaps there is a fundamental logical paradox in assuming that exact samples can be drawn from a uniform distribution on the real numbers. In practical terms it is impossible to select such a sample, but is there any contradiction involved in assuming it can be done theoretically?

4. Perhaps the assumption that "there exists an F" is not equivalent to saying that an F can ever be explicitly defined. To do a particular sampling process, we must define F explicitly. Can there be things that theoretically exist but that are also theoretically impossible to specify explicitly?

 Quote by Stephen Tashi On the one hand, there can be no "uniform" probability distribution on the non-negative integers in the sense of a probability distribution satisfying 1) A singleton integer is a measurable set and 2) The measure of any measurable set is the same as that of any other set formed by adding a constant to each of the first set's members.
A singleton integer is a measurable set, and its measure is zero.

A set which is not measurable has no defined measure. Such sets are rather exotic and difficult to construct. They are basically peculiar exceptions whose exclusion is of no great practical interest.

As for the rest, I'll repeat myself in saying that your manipulations don't really change anything.

Recognitions:
 Quote by ImaLooser A singleton integer is a measurable set
There is no axiom of measure theory that requires that a singleton set in a measure space be a measurable set.

Recognitions:
Gold Member
Staff Emeritus


 1. Perhaps any F that exists would create a process that favored some integer more than another.
That would be the natural inference from the fact that a uniform distribution on Z does not exist.

One obvious problem with your intuition is the assumption that the uniformity of the uniform distribution on [0,1] has any bearing on the behavior of the distribution you have defined on M.

Maybe the error will be exposed if you try using the same idea to pick from [1,2] instead of from M. That is, pick any bijection F from [0,1] to [1,2]. Then, define a distribution on [1,2] that corresponds to choosing a uniform sample on [0,1] and then applying F.
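Hurkyl's [1,2] exercise is easy to carry out numerically. The particular bijection below, $F(x) = 1 + x^2$, is my own choice, not one proposed in the thread; it maps [0,1] onto [1,2] one-to-one, yet the pushforward of the uniform distribution through it is visibly non-uniform:

```python
import random

random.seed(0)
# F(x) = 1 + x^2 is a bijection from [0, 1] onto [1, 2].
# Push 100,000 uniform samples on [0, 1] through it.
vs = [1 + random.random() ** 2 for _ in range(100_000)]

low  = sum(1 for v in vs if v <= 1.25)   # needs x <= 0.5, probability 0.5
high = sum(1 for v in vs if v >= 1.75)   # needs x >= 0.866..., probability ~0.134
# 'low' is far larger than 'high': the image distribution piles up near 1,
# even though the input was uniform and F is a bijection.
```

A bijection preserves cardinality, but says nothing about how it transports measure; that is the gap in the intuition.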

Another way to repair your intuition might be to rephrase what you are actually doing: you are really just choosing a more-or-less arbitrary function G(r,n) that takes a real number r and an integer n and produces an integer. Then, you are sampling from the integers by choosing r and n from your specified distributions then applying G.

(the only property you have required G to have is that, for each r and m, G(r,n)=m has a unique solution for n)

 Quote by Stephen Tashi There is no axiom of measure theory that requires that a singleton set in a measure space be a measureable set.
I'm not going to waste any more time giving you useful information you reject.

 Quote by Stephen Tashi Thank you for the confirmation. Let M be the set of 1-to-1 mappings of the non-negative integers onto themselves. Let F be a 1-to-1 mapping between the elements of M and the real numbers in the interval [0,1]. Let G be a probability distribution defined on the non-negative integers (such as a Poisson distribution). Consider the following process of sampling from the integers. Pick an integer k from the distribution G. Pick a real number x from the uniform distribution on [0,1]. Find the map f = F(x). Let the value chosen from the integers be f(k).
I didn't follow the other thread but I'm trying to catch up here. You choose k non-uniformly and x uniformly. You let f = $F^{-1}(x)$ and then n = f(k). So how do we know that this way of choosing n is uniform? The numbers n = f(k) will have the same distribution as G.

All you're doing is taking a non-uniform distribution on the natural numbers and then permuting the natural numbers. A permutation is just a renaming. You haven't gotten rid of the non-uniformity. You've just chosen according to G and then agreed to call each number by a different number's name. The distribution's identical.
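The renaming invariance described above is easy to check numerically. In this sketch the geometric "G" and the parity-swap permutation are my own illustrative choices:

```python
import random
from collections import Counter

random.seed(3)

def g_sample():
    """A non-uniform 'G': geometric distribution on {0, 1, 2, ...}."""
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

samples = [g_sample() for _ in range(50_000)]

# A fixed renaming f: swap each even number with its successor.
f = lambda n: n + 1 if n % 2 == 0 else n - 1

renamed = [f(k) for k in samples]

orig_counts    = Counter(samples)
renamed_counts = Counter(renamed)
# The labels move, but the multiset of frequencies is identical:
# Pr(renamed value = f(k)) equals Pr(original value = k) for every k.
```

The sorted lists of counts agree exactly, since a single fixed permutation only relabels the outcomes.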

Recognitions:
 Quote by SteveL27 The numbers n = f(k) will have the same distribution as G.
Not in general. I'll try to clarify that.

I agree that the class of sampling methods I propose may behave in various ways with regard to the distribution they produce, and nothing guarantees that the distribution a particular method produces will be translation invariant. I'm merely stating that it's counterintuitive that no particular example of this method can accomplish a translation invariant distribution, because we have so much freedom of choice in creating such methods.

One way to think of this class of methods is that we pick an integer and then we pick a renaming of the integers "at random" from a set of many different renamings - i.e., we don't always use the same renaming.

We pick an integer k from the distribution G. We pick an x in [0,1]. Using F, this defines a map of the integers to themselves. The single map f = F(x) is a renaming of the integers. Let v = f(k). The conditional probability of the sample value v given the value of x is simply the value of the probability density G at k, since k is renamed as v. However, there are more ways to realize a value of v than to use one particular x and its associated renaming.

Computing the probability of getting a sample value of v is done this way

Pr(sample value = v) = $\sum_{k=0}^\infty \Pr(k \text{ is picked from } G)\,\Pr(k \text{ is mapped to } v \text{ by } f = F(x))$

Pr(k is mapped to v) = $\int_{S_k} 1\, dx$ where $S_k$ is the set of all x in the unit interval such that $f = F(x)$ is a renaming of the integers that renames k as v.

 Quote by Hurkyl That would be the natural inference from the fact that a uniform distribution on Z does not exist.
That result (that all such sampling methods produced a non-uniform distribution on the non-negative integers) would require the least adjustment to the way I (and the world) think about taking random samples! However, my current belief is that not all descriptions of taking random samples that trip lightly off one's tongue actually define a random variable that has a probability distribution.

Thinking about what you said, yes the class of sampling methods could be generalized. It could expand to be the following.

Let F be a surjective mapping from the interval [0,1] to the set M of 1-to-1 mappings of the non-negative integers onto themselves (no need to make F 1-to-1). Pick k from some discrete distribution defined on a subset of the non-negative integers. (You could even pick k to be a constant!) Pick x from some continuous distribution w(x) defined on [0,1]. (You're correct that there is nothing special about the uniform distribution.) Let f = F(x). Set the value of the sample to be v = f(k).

In computing pr( sample is v) using the sum defined above, I think there are the following possibilities:

1. For all values of v, the sets $S_k$ are each measurable and some have non-zero measure. The result is a non-translation-invariant distribution for the random variable v on the non-negative integers.

2. For all values of v, the sets $S_k$ are each measurable and each has measure 0. Each value v has zero probability. The random variable v has no probability distribution.

3. For some values of v, some of the sets $S_k$ are not measurable. I don't see how v can have a probability distribution in this case as long as we treat singleton sets {v} as measurable. However, if we drop that requirement and use summations like the one above to compute the probability of some larger subset of the non-negative integers, we might wipe out the non-measurability involved in the integrations, because we would be integrating over unions of the various $S_k$. I don't know what measure theory says about finite or countable unions of non-measurable sets. Are they also non-measurable?

----
Maybe the facts about "measurable functions" in measure theory tell us that we aren't permitted to define a process of sampling using just any arbitrary function of a random variable.

 Quote by Stephen Tashi We pick an integer k from the distribution G. We pick an x in [0,1]. Using F, this defines a map of the integers to themselves. The single map f = F(x) is a renaming of the integers. Let v = f(k). The conditional probability of the sample value v given the value of x is simply the value of the probability density G at k, since k is renamed as v.
Ok, I've understood that much then.

 Quote by Stephen Tashi Pr(sample value = v) = $\sum_{k=0}^\infty$ pr( k is picked from G) pr( k is mapped to v by f = F(x))
Now I'm troubled. For fixed f and fixed v,

pr( k is mapped to v by f = F(x))

must be 1 for exactly one value of k; and zero for all the others. That's how we defined f. Once you choose x, you fix a permutation f; then f(k) is either v or not. It's v for exactly one value of k.

You've introduced some notational confusion by using k as a constant and also as the variable in the summation. I think if you used j as the variable in the summation, this point would be more clear. There's exactly one value of j such that f(j) = v; and that value is j = k. For all other values of j, the probability that f(j) = v is zero, and those terms of the summation drop out.

I can't help feeling that your use of permutations of the natural numbers is adding a level of complexity here for no purpose. No renaming/permutation can possibly change any probability distribution on a set.

I found this; is it helpful? It's titled "Uniform Distributions on the Natural Numbers." Of course you can't do exactly that, but the author replaces countable additivity with finite additivity in order to make progress.

http://www.stat.cmu.edu/tr/tr814/tr814.pdf

Recognitions:
 Quote by SteveL27 Now I'm troubled. For fixed f and fixed v, pr( k is mapped to v by f = F(x)) must be 1 for exactly one value of k; and zero for all the others.
No, because the sampling process involves two random choices per sample. We pick k at random and we pick x at random for each sample. A sample value of v can result from many different possible combinations of k's and x's. We aren't picking one value of x and using it in generating all samples. We pick a new x for each new sample.

From Hurkyl's suggestions (and your doubts about merely permuting the integers), I've come up with this variation of my intuitive conflict:

Suppose we begin with the following sampling process:

Let F be a fixed surjection from [0,1] onto M where M is the set of 1-1 mappings of the non-negative integers onto themselves. Let U be the uniform distribution on the real numbers in [0,1].

Process S1:
For each sample, set k = 1 and draw x at random from U. Then f = F(x) is some mapping of the non-negative integers to themselves. Let the value of the sample be v = f(k).

If S1 produces a non-translation-invariant measure on the integers, we try adjusting the output, so to speak, by creating different sampling processes. Instead of setting k = 1, we could pick k from some discrete distribution (from S1, for example) and we could pick x from some non-uniform distribution on [0,1]. We could also try various F's.

Hence it's tempting to think that one could eventually find some sampling process that treated all integers on the same footing.

Perhaps what happens is that most F's that one can find result in sampling processes where the samples have no probability distribution, because the sets $S_k$ either all have zero measure or present us with some non-measurable sets. If so, I find it disturbing that one can create a sampling process that is "well defined" as a procedure, but whose samples have no probability distribution.
 You're engaging in speculation that after performing a bunch of random transformations of probability distributions you will somehow arrive at a uniform distribution over the integers. But you have not given any proof of this, you're simply claiming it for no reason. If you really think it's possible, stop being vague about properties of "F" and actually provide such an F (it's not hard to explicitly construct such an F). Then you'll clearly see that it doesn't work. It actually would be an interesting exercise to construct the F and figure out what the distribution would be - why not try it. In any case, a uniform probability distribution over the integers is not possible since if each integer is assigned a probability greater than 0, the sum of their probabilities would be infinite.

 Quote by Stephen Tashi Process S1: For each sample, set k = 1 and draw x at random from U. Then f = F(x) is some mapping of the non-negative integers to themselves. Let the value of the sample be v = f(k). If S1 produces a non-translational invariant measure on the integers, we try adjusting the output, so to speak, by creating different sampling processes. Instead of setting k = 1, we could pick k from some discrete distribution ( - from S1, for example) and we could pick x from some non-uniform distribution on [0,1]. We could also try various F's.
If one renaming makes no difference, how can a lot of renamings?

Let's forget math for a moment and consider a visualization. We have a room full of people. Each person wears a name tag, and all the name tags are different. In fact each person's name tag bears a unique natural number. Like in the old tv show The Prisoner. One person is number 1. One person is number 6. And so forth.

Now we ask everyone their height and we determine the statistical distribution of all the heights of all the people in the room.

Now we have everyone toss their name tag into a hat; and then we have everyone pick a new name tag. And then we re-determine the distribution of heights. It's blindingly obvious that this can not make the slightest difference in the distribution of heights.

Ok that's your original situation. Now in your S1 process you do this:

You take a random person and get their height. Then you do the hat thing with the nametags so everyone has a new nametag. You take another random person and record their height. Then everyone gets a new nametag, etc.

The distribution of heights is not going to change. The switching of name tags before each random selection is not going to have the slightest effect on the distribution of heights as determined by a random sample.

And if you imagine the population to be infinite, that won't change. Continually renaming the members of the population before choosing a random sample can not possibly have any effect on the distribution of heights.

This is how I'm understanding your S1 experiment. Permuting the natural numbers does not make the slightest difference in anything.

And if you measure the height of Number One, then permute the name tags and measure the height of Number One again, over and over, you are still going to end up with the same original distribution of heights. The permutations themselves serve as a randomizing mechanism. Everyone has an equal chance to be Number One after permuting the name tags.
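This claim can be checked numerically. The fixed population of heights below is made up purely for the illustration:

```python
import random

random.seed(2)
# A fixed, arbitrary population of 200 heights (invented for illustration).
heights = [150 + (i % 40) for i in range(200)]
pop_mean = sum(heights) / len(heights)

# Repeatedly reshuffle the name tags, then measure whoever currently
# wears the first tag ("Number One").
people = list(range(200))
samples = []
for _ in range(20_000):
    random.shuffle(people)              # everyone draws a new tag
    samples.append(heights[people[0]])  # height of the current Number One

samp_mean = sum(samples) / len(samples)
# samp_mean tracks pop_mean closely: the repeated shuffles act as the
# randomizer, and the fixed tag recovers the population distribution.
```

On a finite population, "fix the label and shuffle" is equivalent to uniform sampling; the dispute in the thread is over what, if anything, this procedure means on an infinite one.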

Remember your choice of permutations is done by using a uniform distribution on [0,1].

Now at this point you keep claiming that something about the bijection between [0,1] and the set of permutations might introduce some new distribution. I fail to see how or why you think this. A bijection, just like a permutation, is just a renaming. Each permutation f gets renamed as a real number x between 0 and 1. So I have the permutation $f_{1/2}$, the permutation $f_{1/e}$, and so forth. The bijection doesn't have magic properties that allow you to somehow construct a uniform distribution on the natural numbers.

You keep saying you have an intuition that it does; and I don't doubt that. But you haven't shown the mathematical path to your intuition.

Recognitions:
 Quote by mXSCNT You're engaging in speculation that after performing a bunch of random transformations of probability distributions you will somehow arrive at a uniform distribution over the integers. But you have not given any proof of this, you're simply claiming it for no reason.
I'm not claiming it. I'm merely trying to understand why it doesn't work. I'd be delighted to see a specific F if you have one.
Recognitions:
Science Advisor

Take a simpler case. Fix an integer - say minus 5. Choose a bijection of the integers by your method. Then just choose the value of that function on minus 5. This seems the same as just randomly sampling integers without bias.

 Quote by lavinia Take a simpler case. Fix an integer - say minus 5. Choose a bijection of the integers by your method. Then just choose the value of that function on minus 5. This seems the same as just randomly sampling integers without bias.
Just to clarify a bit of confusing terminology here ... Stephen's talking about permutations of the nonnegative or positive (makes no difference) integers, ie the natural numbers $\mathbb{N}$.

One reason this is convenient is that we have an obvious bijection between the subsets of $\mathbb{N}$ and the unit interval. For a given subset, create a binary decimal with 1 in the n-th position if and only if n is in the subset. This gives us an easily visualizable bijection between subsets of $\mathbb{N}$ and [0,1]. As always we have to wave our hands at the countable set of numbers with repeating binary expansions, i.e. 0.1111... = 1.0 in binary, but we always convince ourselves this doesn't matter. I assume there's a rigorous fix for this problem somewhere.
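The encoding just described is straightforward to write down for finite subsets (the function name `subset_to_real` is my own; the dyadic-rational ambiguity mentioned above is simply ignored here):

```python
def subset_to_real(s):
    """Map a finite subset of N to a real in [0, 1]: binary digit n
    (after the point) is 1 if and only if n is in the subset."""
    return sum(2.0 ** -(n + 1) for n in s)

# {0}    -> 0.1  in binary = 0.5
# {1}    -> 0.01 in binary = 0.25
# {0, 1} -> 0.11 in binary = 0.75
```

For small indices these sums of negative powers of two are exact in floating point, so the correspondence can be checked directly.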

In any event it would be better to talk about $\mathbb{N}$ rather than $\mathbb{Z}^+$ so as to avoid confusion.

Recognitions:
 Quote by SteveL27 If one renaming makes no difference, how can a lot of renamings? Let's forget math for a moment and consider a visualization. We have a room full of people. Each person wears a name tag, and all the name tags are different. In fact each person's name tag bears a unique natural number. Like in the old tv show The Prisoner. One person is number 1. One person is number 6. And so forth.
I'll seize your example to state my questions!

Consider a property of the name tags such as "the name tag is an even number" and an associated statistical property of the set of name tags such as "70% of the name tags are even numbers". My questions concern the properties of the name tags, not the properties of the people. However, let's also mention a property of the people ( like the person's weight) and an associated statistical property of the set of people, such as "80% of the people weigh over 100 lbs".

We are to randomly sample the set of people. The boss has assigned Gerald to pick the samples, and he calls them out by their name tags. We know Gerald is biased toward picking name tags that are odd numbers. That might bias our estimate both of the statistical properties of the name tags and the statistical properties of the people (for example, suppose people with odd-numbered name tags tend to weigh less than 100 lbs).

We ask the boss to replace Gerald with someone who is unbiased. The boss says "No, can't do that. We can't hire unbiased samplers. They don't exist. It's because of countable additivity and stuff like that. You'll have to make do with Gerald. Use math or something to fix it."

To accomplish a random sampling of a property of the people (such as weight), we can randomly re-assign the name tags each time BEFORE Gerald picks a person. I think your point is that one random reassignment per sample is enough, and perhaps that we only need to do a single random reassignment before taking all the samples. This method does not remove Gerald's bias toward picking name tags with odd numbers. It doesn't randomly sample the properties of the set of name tags.

If we want to randomly sample the properties of the name tags, we could introduce an intermediary between Gerald and picking the sample. AFTER Gerald picks a name tag k, we could randomly pick a mapping of the name tags onto themselves and call out the name tag v that k mapped to. (This makes Gerald rather redundant, but keeps him on the payroll.)

The method I describe for seeking a translation invariant or "uniform" distribution on the non-negative integers is analogous to attempting to remove the bias in how Gerald (a non-uniform distribution) selects samples by using a random reassignment of name tags AFTER Gerald has made his selection. What we can measure from the sample v is only the properties of the name tag. If you wanted to make an analogy to people's weights, you'd have to add more structure to the problem, like having an associated sequence of weights {W_i}.

Why does the method of reassigning names, which works well for removing bias in sampling finite sets (and is actually used in practical statistical sampling), break down when we attempt to apply it to countably infinite sets?
