Convergence of a subsequence of a sum of iid r.v.s

• A
##X_i## is an independent and identically distributed random variable drawn from a non-negative discrete distribution with known mean ##0 < \mu < 1## and finite variance. No probability is assigned to ##\infty##.

Now, given ##1<M##, a sequence ##\{X_i\}## for ##i\in1...n## is said to meet ##conditions (1) and (2) for M## if it satisfies the following conditions: $$(1) S_n = X_1 + ... + X_n < 1 + n - M$$ $$(2) n>=M$$.

It is clear that the unconditional sample mean ##E_n[X_i]\xrightarrow{p}\mu##. But if we were to take ##M\to\infty## by considering sequences of ##\{X_i\}## for ##i\in1...n_k## that meet ##conditions (1) and (2) for M_k##, where ##\{M_k\}## is a sequence of successively larger numbers, does ##S_{n_k}/n_k\xrightarrow{p}\mu##?

Last edited:

FactChecker
Gold Member
How can M be a constant and also go to infinity? Other than the mysterious M, this sounds like it's asking if the Central Limit Theorem will apply.

StoneTemplePython
How can M be a constant and also go to infinity? Other than the mysterious M, this sounds like it's asking if the Central Limit Theorem will apply.
updated the question, so that it may be found less confusing

How can M be a constant and also go to infinity? Other than the mysterious M, this sounds like it's asking if the Central Limit Theorem will apply.
Unless by the C.L.T. ##\sqrt{n}(S_n-\mu) \xrightarrow{d}\mu##, then I'm not so sure it is the exact same question, although interesting enough.

Stephen Tashi
Consider $$S_n = X_1 + ... + X_n < 1 + n - M$$ where ##0<M<n## and each ##X_i## is an independent and identically distributed random variable drawn from a non-negative discrete distribution with known mean ##0 < \mu < 1## and finite variance.
It's unclear what you mean by the inequality ##S_n = X_1 + ... + X_n < 1 + n - M##.

Is this inequality supposed to impose a condition on the (common) distribution of the ##X_i##? When ##M = 50, n = 100## does ##X_3## have the same distribution as it does when ##M = 500, n = 630## ?

It's unclear what you mean by the inequality ##S_n = X_1 + ... + X_n < 1 + n - M##.

Is this inequality supposed to impose a condition on the (common) distribution of the ##X_i##? When ##M = 50, n = 100## does ##X_3## have the same distribution as it does when ##M = 500, n = 630## ?
The ##X_i## are ##i.i.d##. The inequality is conditional information. So we are only considering events where the sequence ##\{X_i\}## for ##i\in1...n## satisfies the condition of the inequality. So the distribution of ##X_3## should not change based on ##M## or ##n##. I will update to try to make this clearer.

StoneTemplePython
Gold Member
To the extent ##S_n## is a sum of an arbitrarily large number of random variables each with positive mean but the Right Hand Side stays bounded (I think that is what ##M \to \infty## is supposed to mean though a lot more care is needed in stating how these limits work), for any ##\mu \gt 0##, the answer is no.

If you think of this in terms of a counting process, then the random variable associated with the counting the number of arrivals (where each ##X_i## gives the time per epoch) then the random variable for the counting process is defective as you can have an infinite number of arrivals in a finite amount of time. This tells you that the time per arrival, given by ##\frac{S_n}{n}##, tends to zero, quickly.

Incidentally, to the extent ##X_i\gt 0## and is discrete there is a strong potential for a contradiction to arise in this setup, which I think needs more care.

To the extent ##S_n## is a sum of an arbitrarily large number of random variables each with positive mean but the Right Hand Side stays bounded (I think that is what ##M \to \infty## is supposed to mean though a lot more care is needed in stating how these limits work), for any ##\mu \gt 0##, the answer is no.

If you think of this in terms of a counting process, then the random variable associated with the counting the number of arrivals (where each ##X_i## gives the time per epoch) then the random variable for the counting process is defective as you can have an infinite number of arrivals in a finite amount of time. This tells you that the time per arrival, given by ##\frac{S_n}{n}##, tends to zero, quickly.

Incidentally, to the extent ##X_i\gt 0## and is discrete there is a strong potential for a contradiction to arise in this setup, which I think needs more care.
Tried to make more clear the convergence in the problem. I am still thinking about your example, but I think I meant to exclude any ##X_i## from being infinite and the ##X_i>=0## with a discrete distribution which is not always a contradiction.

StoneTemplePython
Gold Member
Tried to make more clear the convergence in the problem.

haha I liked the no probability assigned to ##\infty## comment.

Look as is, the key thing is to understand how ##n## grows as ##M## grows. I still don't see that. Is ##(n-m)## that is constant or perhaps there is some kind of constant ratio they tend toward. Or ...

Once you have that nailed down, you may be surprised to find that this becomes a nice exercise in creatively using Markov's Inequality.

haha I liked the no probability assigned to ##\infty## comment.

Look as is, the key thing is to understand how ##n## grows as ##M## grows. I still don't see that. Is ##(n-m)## that is constant or perhaps there is some kind of constant ratio they tend toward. Or ...

Once you have that nailed down, you may be surprised to find that this becomes a nice exercise in creatively using Markov's Inequality.

I think I want to deliberately keep ##n-M## arbitrary, but maybe that is unhelpful. One way to construct such sequences is to imagine starting with ##M## dollars and paying ##1## dollar to draw each ##X_i## adding the result to your bankroll, until you can no longer afford to play (that should be precisely the above condition). I can't say much about the ##n## you achieve in this set up but it is a concrete way to choose ##n## given ##M##.

StoneTemplePython
Gold Member
I think I want to deliberately keep ##n-M## arbitrary, but maybe that is unhelpful.
This is no good.

One way to construct such sequences is to imagine starting with ##M## dollars and paying ##1## dollar to draw each ##X_i## adding the result to your bankroll, until you can no longer afford to play (that should be precisely the above condition). I can't say much about the ##n## you achieve in this set up but it is a concrete way to choose ##n## given ##M##.

This is great, why didn't you say so? It's wise to explicitly say that this is a stopping problem. Are you familiar with Wald's Equality? What do you mean 'choose n'? Like optimize it with respect to...? I can give you expected number of plays via Wald's Equality... in general stopping rules aren't going to give you what (I think) you're hoping for, though they can give some nice structure.

This is no good.

This is great, why didn't you say so? It's wise to explicitly say that this is a stopping problem. Are you familiar with Wald's Equality? What do you mean 'choose n'? Like optimize it with respect to...? I can give you expected number of plays via Wald's Equality... in general stopping rules aren't going to give you what (I think) you're hoping for, though they can give some nice structure.
I don't think I've seen Wald's inequality before, reading it now. By choose ##n## I mean that given ##M_k## for instance simply keeping the first sequence of ##\{X_i\}## you get when the stopping condition is reached.

Last edited:
StoneTemplePython
Gold Member
I don't think I've seen Wald's inequality before, reading it now. By choose ##n## I mean that given ##M_k## for instance simply keeping the first sequence of ##\{X_i\}## you get when the stopping condition is reached.

I'd probably suggest a change of variables, so you have ##0\lt \bar{Y} := 1 - \mu \lt 1## this is your average loss per game but its reframed here as a positive number.

and have a newly defined ##S_n: = Y_1 + Y_2 + ... + Y_n##

Wald's Equality would tell you that

##E\big[S_n\big] = \bar{Y} E\big[J\big]##

where ##J## is the random variable of total number of plays -- note that this seems to be what you had ##n## doing but lower case n didn't seem like a random variable... given your stopping rule you actually want the number of plays to be an r.v. itself which I denoted by ##J##... and you also have ##S_n:=M## which is total 'score' at termination. (There are some technical nits in making sure that you terminate not only with probability 1, but sufficiently fast so that ##E\big[J\big] \lt \infty## but this is, in effect, assured by the fact that your drift of ##\bar{Y}## is constant and positive -- long story I'm afraid.)

after some significant technical maneuvering it ends up almost an accounting identity saying that your expected total 'score' is equal to the expected plays times expected payoff (of ##Y_i = 1 - \mu##) per play. I don't think this is quite what you original wrote, but my sense it is at least close to what you're asking now. This holds for any given ##M \gt 0## large or not.

There may be some fine tuning still, but this is close. Note: there are an awful lot of interesting problems in probability that are framed as "a gambler walks into a casino and bets... "

Stephen Tashi
Now, given ##1<M##, a sequence ##\{X_i\}## for ##i\in1...n## is said to meet ##conditions (1) and (2) for M## if it satisfies the following conditions: $$(1) S_n = X_1 + ... + X_n < 1 + n - M$$ $$(2) n>=M$$.

That description is unclear about the role of ##n##. Here are two different possibilities for how we take samples of the random variable ##S_n##.

1) We are given the value of ##n## and ##M##. We realize samples ##x_1,x_2...x_M## of the ##X_i##. Then we form the sum s = ##x_1 + x_2 + ...x_n##. If ##s < 1 + n - M## we use ##s## as a sample of ##S_n##. Otherwise we discard ##s## and repeat the process.

2) We are given the value of ##M##, but not the value of ##n##. We realize samples ##x_1,x_2,...x_M## of the ##X_i##. Then we form the sum ##s = x_1 + x_2 + ...x_n## where ##n## is the largest integer such that ##s < 1 + n - M##. We use ##s## as a sample of ##S_n##.

That description is unclear about the role of ##n##. Here are two different possibilities for how we take samples of the random variable ##S_n##.

1) We are given the value of ##n## and ##M##. We realize samples ##x_1,x_2...x_M## of the ##X_i##. Then we form the sum s = ##x_1 + x_2 + ...x_n##. If ##s < 1 + n - M## we use ##s## as a sample of ##S_n##. Otherwise we discard ##s## and repeat the process.

2) We are given the value of ##M##, but not the value of ##n##. We realize samples ##x_1,x_2,...x_M## of the ##X_i##. Then we form the sum ##s = x_1 + x_2 + ...x_n## where ##n## is the largest integer such that ##s < 1 + n - M##. We use ##s## as a sample of ##S_n##.
Correct. Does a different method make a difference? A counterexample is sufficient...of course there is not always a largest n in your second example

Last edited:
I'd probably suggest a change of variables, so you have ##0\lt \bar{Y} := 1 - \mu \lt 1## this is your average loss per game but its reframed here as a positive number.

and have a newly defined ##S_n: = Y_1 + Y_2 + ... + Y_n##

Wald's Equality would tell you that

##E\big[S_n\big] = \bar{Y} E\big[J\big]##

where ##J## is the random variable of total number of plays -- note that this seems to be what you had ##n## doing but lower case n didn't seem like a random variable... given your stopping rule you actually want the number of plays to be an r.v. itself which I denoted by ##J##... and you also have ##S_n:=M## which is total 'score' at termination. (There are some technical nits in making sure that you terminate not only with probability 1, but sufficiently fast so that ##E\big[J\big] \lt \infty## but this is, in effect, assured by the fact that your drift of ##\bar{Y}## is constant and positive -- long story I'm afraid.)

after some significant technical maneuvering it ends up almost an accounting identity saying that your expected total 'score' is equal to the expected plays times expected payoff (of ##Y_i = 1 - \mu##) per play. I don't think this is quite what you original wrote, but my sense it is at least close to what you're asking now. This holds for any given ##M \gt 0## large or not.

There may be some fine tuning still, but this is close. Note: there are an awful lot of interesting problems in probability that are framed as "a gambler walks into a casino and bets... "
The problem with using ##E[J]## in practice for my games is that distributions like slot machines with very heavy tails tend to give absurdly high ##E[J]## with very low chances of ever experiencing it, whereas the order statistics have empirically very tantalizing properties just out of reach of my analytic abilities.

It would seem strange to me if the theorem was false, but I am unfamiliar with talking about restricted subsequences in terms of convergence in probability. For the stopping problem example I provided and the distributions I am looking at, I believe the theorem is true and empirically I can verify this. In addition, for that example there is a third condition: Given ##n##, or rather ##j## in your terms, if ##n## was reached we also know that ##S_N>=N-M+1## for all ##N\in1,...,n-1## which gives a very tight bound for the EV conditional on having reached ##n##. On the surface, this all seems independent of ##\mu## for fixed ##M##, but as ##M## gets larger, again empirically I am seeing convergence.

Stephen Tashi
Correct. Does a different method make a difference?
The two methods define two different random variables. Before we worry about two different random variables, which one of them is the ##S_n## that you are asking about?

The two methods define two different random variables. Before we worry about two different random variables, which one of them is the ##S_n## that you are asking about?
I don't mean to specify any one way of constructing the sequences as the "correct" random variable. The random variable is constructed by restricting the event space to only those events which satisfy the condition, then standard projection arguments well define the random variable. You don't need to construct the sequences to know this object unambiguously exists. There may only be one correct way of constructing the sequences in relation to the conditional mean, but I wouldn't know about that. My usual intuition in measure theory is there are probably many representations.

But I guess you are also asking if ##n## is arbitrary? No we are not given ##n##, we are talking about all sequences of ##\{X_i\}## that satisfy the condition including any ##n>=M##. If you'd rather work with a concrete example, I have described a very specific application in the discussions with @StoneTemplePython

Stephen Tashi
I don't mean to specify any one way of constructing the sequences as the "correct" random variable. The random variable is constructed by restricting the event space to only those events which satisfy the condition, then standard projection arguments well define the random variable. You don't need to construct the sequences to know this object unambiguously exists.

In post #10, you are talking about method 2), which makes ##n## a function of the realized values ##x_1,x_2,....x_M##. This is not the same as considering all subsequences of the ##X_i## that satisfy ##S_n < 1 + n - M## because different ##n## may be selected for the same realized values ##x_1,x_2,...x_M##. So if your intent is ask about an ##S_n## defined only by restricting the event space with the constraint ##S_n < 1 + n - M## for some ##n##, the stopping problem stated in post #10 is not a correct example.

In post #10, you are talking about method 2), which makes ##n## a function of the realized values ##x_1,x_2,....x_M##. This is not the same as considering all subsequences of the ##X_i## that satisfy ##S_n < 1 + n - M## because different ##n## may be selected for the same realized values ##x_1,x_2,...x_M##. So if your intent is ask about an ##S_n## defined only by restricting the event space with the constraint ##S_n < 1 + n - M## for some ##n##, the stopping problem stated in post #10 is not a correct example.
True, if you keep looking you'll see there is an additional condition 3 which it satisfies and that problem may be easier to solve. If you have an answer to either, would love to see it. Somehow though, I think if the theorem is true, we would need something pretty exotic to keep further restrictions on the subsequences from overturning the result...

Stephen Tashi
True, if you keep looking you'll see there is an additional condition 3 which it satisfies and that problem may be easier to solve. If you have an answer to either, would love to see it.

I think the situation you intend to ask about is the "stopping problem" described in post #10. Considering the expected value of ##S_n/n## in that problem is interesting, but I'm also interested in whether the constraints you state in post #1 define a different scenario or whether the don't actually define a random variable.

The basic probability space involved (in either scenario) is ##\Omega##, the set of infinite sequences of realized values for ##X_1,X_2,...##. The events in this probability space that have non-zero probabilities are described by specifying that a finite subset of the ##X_i## have specific values and allowing the remainder of the ##X_i## to be arbitrary.

Consider the case M = 3 with each ##X_i## distributed according to ##Pr(X = 0) = 1/2, Pr(X=1) = 1/2##. The definition of ##S_n## in post#1 includes events such as

1) ##X_1 = 1, X_2 = 1, X_3 = 1, X_4 = 1##, ##S_n = S_4 = 4/4##
2) ##X_1 = 1, X_2 = 1, X_3 = 1, X_4 = 1, X_5 = 0 ##, ##S_n = S_5 = 4/5##

However, in the probability space ##\Omega##, events 1) and 2) are not mutually exclusive. So the constraints given in post #1 don't define a set of outcomes for a random variable ##S = S_n## (using the term "outcome" to mean a "point" in a probability space for ##S_n##, i.e. an event that is mutually exclusive of other outcomes in the probability space. )

I think the situation you intend to ask about is the "stopping problem" described in post #10. Considering the expected value of ##S_n/n## in that problem is interesting, but I'm also interested in whether the constraints you state in post #1 define a different scenario or whether the don't actually define a random variable.

The basic probability space involved (in either scenario) is ##\Omega##, the set of infinite sequences of realized values for ##X_1,X_2,...##. The events in this probability space that have non-zero probabilities are described by specifying that a finite subset of the ##X_i## have specific values and allowing the remainder of the ##X_i## to be arbitrary.

Consider the case M = 3 with each ##X_i## distributed according to ##Pr(X = 0) = 1/2, Pr(X=1) = 1/2##. The definition of ##S_n## in post#1 includes events such as

1) ##X_1 = 1, X_2 = 1, X_3 = 1, X_4 = 1##, ##S_n = S_4 = 4/4##
2) ##X_1 = 1, X_2 = 1, X_3 = 1, X_4 = 1, X_5 = 0 ##, ##S_n = S_5 = 4/5##

However, in the probability space ##\Omega##, events 1) and 2) are not mutually exclusive. So the constraints given in post #1 don't define a set of outcomes for a random variable ##S = S_n## (using the term "outcome" to mean a "point" in a probability space for ##S_n##, i.e. an event that is mutually exclusive of other outcomes in the probability space. )
Good point, I see now that for a given sequence of ##\{X_i\},i\in1...\infty##, that ##S_n## may definitely be ambiguous. I think the third condition makes a clear definition of ##S_n## (by choosing the minimum index which would satisfy the conditions).

As for the original problem in the context of infinite sequences, does there have to be a maximum ##n## for a given sequence? It could be that with probability one, that conditions (1) and (2) will be met infinitely often which would mean that all typical sequences in the original ##\Omega## are kept by the conditions.

So I'm not so convinced that infinite sequences are the the most natural setting for this problem, and it remains unclear to me why ##S_n## would be ambiguous when considering only finite sequences. I'm pretty sure you would have similar issues defining the ordinary sample average in any such infinite sequence context, due to ambiguity in choosing the sample size.

Last edited:
StoneTemplePython
Gold Member
I think the situation you intend to ask about is the "stopping problem" described in post #10. Considering the expected value of ##S_n/n## in that problem is interesting, but I'm also interested in whether the constraints you state in post #1 define a different scenario or whether the don't actually define a random variable.

The basic probability space involved (in either scenario) is ##\Omega##, the set of infinite sequences of realized values for ##X_1,X_2,...##. The events in this probability space that have non-zero probabilities are described by specifying that a finite subset of the ##X_i## have specific values and allowing the remainder of the ##X_i## to be arbitrary...

You're welcome to set it up this way but I wouldn't. I'd suggest taking a page from Feller volume 1, and consider a discrete sample space, with finite numbers of trials, then passing limits later on. It's a more careful (but perhaps clunky) approach that avoids introducing extra, purely analytic challenges.

- - - -
22 posts in and I still don't see a direct coherent question from OP, so I'm dropping.

22 posts in and I still don't see a direct coherent question from OP, so I'm dropping.
let's start over...
imagine you have a sample average of N trials and you record that. Repeat, with a larger N. now imagine having any rule that keeps you from recording some of the sample averages based on their underlying draws. in general, the new sample averages do not need to converge to the mean of the underlying distribution. but some rules might: for instance throwing out every second sample mean won't bias the result. what about conditions 1 and 2 above where i only keep the results that satisfy the inequalities? To tie in ##M##, just pick any number bigger than the ##n## from the previous recorded sample average.

Last edited:
StoneTemplePython
Gold Member
let's start over...
imagine you have a sample average of N trials and you record that. Repeat, with a larger N. now imagine having any rule that keeps you from recording some of the sample averages based on their underlying draws. in general, the new sample averages do not need to converge to the mean of the underlying distribution. but some rules might: for instance throwing out every second sample mean won't bias the result. what about conditions 1 and 2 above where i only keep the results that satisfy the inequalities? To tie in ##M##, just pick any number bigger than the ##n## from the previous recorded sample average.

So maybe this is a question about the method of truncation. I really can't tell. If it is, you should be able to figure out that truncating events with positive probability who uniformly have payoffs ##\geq## the mean, must decrease the the observed mean.

I also still don't understand how the ##M \to \infty## works.

I'm still spending a lot of time guessing what your question is. This is not a wise use of my time.
- - - -
A general hint here is: when you crystallize your question, think in terms of realized sample paths (ala SLLN) and figure out a smart/natural way to bipartition these events to reflect the constraint you are considering. (I.e. one set conforms with the constraint and the complement, does not.) Now evaluate both of those sets and confirm that they have non-zero probability. I assume yes.... Then evaluate the mean of (at least) one of them -- technically you only need the mean of one of them because they are complements and you know the mean over their union is ##\mu##. This should answer your question, whatever it truly is.

Good luck.