Measuring the degree of convergence of a stochastic process

estebanox · Jun 11, 2016

Consider a sample consisting of {y1,y2,...,yk} realisations of a random variable Y, and let S(k) denote the variance of the sample as a function of its size; that is

S(k)=1/k( ∑^k_i=1(yi−y¯)²)
for y¯=1/k( ∑^k_i=1 yi)

I do not know the distribution of Y, but I do know that S(k) tends to zero as k tends to infinity.

Suppose that I can increase the sample size gradually, so that I can calculate a sequence of variances {S(1),S(2),...,S(k)} for any strictly positive integer k.

I would like to determine the sample size that guarantees a minimum (arbitrary) 'degree of convergence'; in other words, I would like to determine the minimum value of k for which we expect that S(k+1)=S(k)±ϵ , for some small ϵ.

Ideas? I'm not interested in the analytical answer to this question, but rather something that I can implement numerically.

NOTE: This question has also been asked here

andrewkirk · Jun 11, 2016

If you have observed k realisations you could use non-linear regression (GLM) to fit a curve to the set of data points (1,S(1)), (2,S(2)), ...,(k,S(k)), subject to the constraint that the curve is always positive and asymptotically approaches 0. A very simple curve to fit would be the two-parameter curve ##f(n)=ae^{-bn}##. The linear model for your regression of ##y## against ##n## would then be
$$y=\log a-bn+\epsilon_n$$
where the observed ##y## values are the ##k## values of ##\log S(n)##
If you assume the ##\epsilon_n## error random variables are iid normal with st dev ##\sigma## you can use OLS regression on this to estimate ##a## and ##b##.

mfb · Jun 12, 2016

You cannot make a reliable estimate without more information. Suppose you have k=1000. and S(k) went nicely towards 0 so far. That doesn't tell you anything about k=1100, where it could be much larger, and only get smaller after k=10¹⁰ again. To guarantee anything, you need more information about y_k. Clearly they are not iid, otherwise the variance would not go down.

estebanox · Jun 15, 2016

mfb said:

You cannot make a reliable estimate without more information. Suppose you have k=1000. and S(k) went nicely towards 0 so far. That doesn't tell you anything about k=1100, where it could be much larger, and only get smaller after k=10¹⁰ again. To guarantee anything, you need more information about y_k. Clearly they are not iid, otherwise the variance would not go down.

@mfb: I think I left relevant information when trying to explain my question. Here is a new attempt

--------
Consider a set of random variables ##(X_1,X_2,X_3,...X_k)## that are i.i.d. ##Bernoulli(p)##

While I do not know ##p## I can estimate it using
$$
Y(k)=\frac{1}{k}\sum_{i=1}^k X_i
$$

Notice that ##Y(k)## is a random variable, and its distribution has mean ##p## and variance ##\frac{p(1-p)}{k}##. So ##Y(k)## is a consistent estimator of ##p##.

**Question:**
How can I determine the sample size that guarantees a minimum (arbitrary) 'degree of convergence' of the estimator? In other words, what is the point after which we can be confident that increasing the sample size from ##k## to ##k+1## will only yield a small change in our estimate of ##p## (for any ##p##)?

One idea I had was to look at the convergence of the variance of the sequence of estimators that is obtained by increasing ##k## gradually; that is
$$S(k)=\frac{1}{k}\sum_{i=1}^k (Y(i)-\bar{Y}(i))^2$$
for ##\bar{Y(k)}=\frac{1}{k}\sum_{i=1}^k Y(i)##

Numerically, I find that ##S(k)## converges to ##0## as expected; so perhaps what I need is a condition on ##S(k)##?

mfb · Jun 15, 2016

In your example, going from k to k+1 has the trivial upper limit of changing the estimate by 1/(k+1). If your p is very close to 0 or 1, those changes will happen to a very good approximation, so the upper bound is already optimal in some way.

If you have a Gaussian distribution, then you can use something like ##\frac{n \sigma}{k}## as (not strict) upper estimate where n depends on the fraction of changes you want to cover. Not sure if the k in the denominator is correct, or needs something like ##\sqrt{k(k+1)}##, but for large k it works.

Similar statements should be possible for all distributions if you have some prior knowledge about the distribution. You need some knowledge, otherwise very long tails in your distribution can ruin every estimate.

andrewkirk · Jun 15, 2016

estebanox said:

@mfb**Question:**
... what is the point after which we can be confident that increasing the sample size from ##k## to ##k+1## will only yield a small change in our estimate of ##p## (for any ##p##)?

It depends on how you measure the change in your estimate. If you measure it as (new estimate) minus (old estimate) then the problem is very easy, using what mfb points out.

But if you measure it as ((new estimate) divided by (old estimate)) minus 1 then you cannot place any limits on the size of the change because, for any k, it will always be possible that the first k observations of X were zero and the (k+1)th observation is 1, in which case the estimate of p will go from 0 to 1(k+1) in that step, which is an infinitely large proportional increase.

estebanox · Jun 15, 2016

If you measure it as (new estimate) minus (old estimate) then the problem is very easy, using what mfb points out.

@andrewkirk and @mfb :
I see... but given that I don't know the true value of ##p##, solving the problem assuming the upper bound seems an extreme assumption. My point is that it seems rather wasteful to assume the worst-case scenario to solve for ##k## ex-ante, rather than using information from the interim estimates that I obtain from sequentially increasing sample size...

andrewkirk · Jun 15, 2016

@estebanox It wasn't clear from the reformulated question in post #4 whether the bound was to be calculated ex-ante or ex-post. If it is ex-post then the answer for the differencing case remains the same but the answer for the ratio case changes to ##\frac1k## or ##\frac{1-Y(k)}{Y(k)(k-1)}## according to whether the ##(k+1)##th value is 1 or 0.

What prevents any greater accuracy than that is the word 'guarantees' in your question. Statistics deals with probabilities, not certainties, which means there are no guarantees. A more reasonable question, which might allow for a more helpful answer, would be something like:

'Given a tolerance ##\epsilon>0##, find a formula ##k## as a function of ##\epsilon## and the first ##k## values of ##X_i## (*), such that we can be at least 95% confident that for ##n\geq k## we will have ##\left|\frac{Y(n+1)}{Y(n)}-1\right|<\epsilon##?'

(*) To avoid circularity, we define ##k## as a function of ##\epsilon## and all values of ##X_i##, but require it to obey the additional condition that if the formula delivers result ##K## it makes use of no values of ##X_i## for ##i>K##. This specification is probably much more neatly expressed using conditional probabilities but I have to rush off now so don't have time to work out how to write it that way.

estebanox · Jun 16, 2016

andrewkirk said:

@estebanox

'Given a tolerance ##\epsilon>0##, find a formula ##k## as a function of ##\epsilon## and the first ##k## values of ##X_i## (*), such that we can be at least 95% confident that for ##n\geq k## we will have ##\left|\frac{Y(n+1)}{Y(n)}-1\right|<\epsilon##?'

Thanks @andrewkirk. Indeed, saying "guarantee" in my question was poor use of language – as a matter of fact I also say "In other words, what is the point after which we can be confident that increasing the sample size...". So your proposal to reformulate the problem is indeed helpful. It'd be great to hear any ideas as to what the answer to the reformulated problem might be (do you think it makes sense to post a new thread?)

andrewkirk · Jun 17, 2016

We can write a confidence interval for ##p## based on the first ##k## observations. Let ##B_{k,x}:[0,1]\to[0,1]## be the function such that ##B_{k,x}(y)## is the probability that a binomial random variable with ##k## trials and success probability ##y## is less than or equal to ##x##. Then a two-sided ##1-\alpha## confidence interval jas lower bound
$$L=B_{k,Y(k)}{}^{-1}\left(\frac\alpha2\right)$$
and upper bound
$$U=B_{k,Y(k)}{}^{-1}\left(1-\frac\alpha2\right)$$

micromass · Jun 17, 2016

https://en.wikipedia.org/wiki/Law_of_the_iterated_logarithm

For any distribution which has zero mean and unit variance, if you take means ##S_n/n## and increase the sample size, then the bound you are looking for is
[tex]\frac{\sqrt{2n\log\log(n)}}{n}[/tex]
in the following sense: it has been proven mathematically that only finitely many values for ##n## will have the property that ##S_n/n## will surpass this bound, while infinitely many values will still come close to this bound.

For the means of a Bernouilli variable, this means that the bound for ##|S_n/n - p|## is ##p(1-p)\frac{\sqrt{2n\log\log(n)}}{n}##. This is maximized for the worst case scenario of ##p = 1/2##, which gives us a bound of
[tex]\frac{\sqrt{2n\log\log(n)}}{4n}[/tex]

estebanox · Jun 29, 2016

micromass said:

For the means of a Bernouilli variable, this means that the bound for ##|S_n/n - p|## is ##p(1-p)\frac{\sqrt{2n\log\log(n)}}{n}##. This is maximized for the worst case scenario of ##p = 1/2##, which gives us a bound of
[tex]\frac{\sqrt{2n\log\log(n)}}{4n}[/tex]

Thanks for this @micromass – very helpful!

estebanox · Sep 2, 2016

@micromass: I have continued working on a related problem – I have posted a new thread here. Would be great to hear your thoughts!

Measuring the degree of convergence of a stochastic process

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Graduate Probability puzzle

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Undergrad Understanding permutations and combinations in a coin toss experiment

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect