MHB Estimating the result of an iterative operation involving random numbers

AI Thread Summary
The discussion revolves around estimating the result of an iterative operation involving random numbers, specifically the recursion x(t) = x(t-1) + rand(m..n). A proposed approximation for large iteration counts is x(t) ≈ x(0) + t(m+n)/2, which works well when the mean increment is nonzero but looks much less accurate, in relative terms, when the random range is symmetric about zero. Adjustments to the estimate are discussed when the operation includes a variable k that rescales the random range in each iteration; because the mean increment remains zero, the new estimate is x(t) ≈ x(0). The original poster wants to optimize a simulation engine by avoiding the iterations, proposing a method that calculates the mean and standard deviation of the errors from offline simulations to improve the runtime estimate. The community encourages experimentation with different approaches to find the most effective solution.
jarhead70
Hi,

Suppose there's an operation like the following:

x(t) = x(t-1) + rand(m..n)

with
x(t) = current value
x(t-1) = previous value
rand(m..n) = uniformly distributed random real number in the range m to n

is there a way to estimate the value of x(t) after Y iterations without actually doing the iteration Y times, assuming Y is larger than 1000?
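
For reference, a direct implementation of the iteration being described might look like the following Python sketch (the uniform RNG and the example parameters are assumptions for illustration, not code from the original post):

[code]
import random

def iterate(x0, m, n, Y):
    """Apply x(t) = x(t-1) + rand(m..n) for Y iterations."""
    x = x0
    for _ in range(Y):
        x += random.uniform(m, n)  # uniformly distributed real number in [m, n]
    return x

# Example: x(0) = 0, increments drawn from [1, 9], Y = 1000 iterations
print(iterate(0.0, 1.0, 9.0, 1000))
[/code]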

Thanks
 
I would think that for a large number of iterations, we could reasonably approximate the solution with:

$$x(t)\approx x(0)+t\frac{m+n}{2}$$

In other words, there should be about as many values above the arithmetic mean of the range of random numbers as below.

edit: I ran some simulations, with $x(0)=0,\,m=1,\,n=9$:

[table="width: 300, class: grid"]
[tr]
[td]$Y$[/td]
[td]Estimate[/td]
[td]Actual[/td]
[/tr]
[tr]
[td]1000[/td]
[td]5000[/td]
[td]4952.42945031[/td]
[/tr]
[tr]
[td]10000[/td]
[td]50000[/td]
[td]50297.7891738[/td]
[/tr]
[tr]
[td]100000[/td]
[td]500000[/td]
[td]499801.102515[/td]
[/tr]
[tr]
[td]1000000[/td]
[td]5000000[/td]
[td]5000437.34467[/td]
[/tr]
[/table]
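
A minimal sketch of the kind of comparison tabulated above, assuming a uniform RNG (this is not the code actually used to produce the table):

[code]
import random

def estimate(x0, m, n, Y):
    """Closed-form approximation: x(Y) ~ x(0) + Y*(m+n)/2."""
    return x0 + Y * (m + n) / 2.0

def actual(x0, m, n, Y):
    """One Monte Carlo run of the full iteration, for comparison."""
    return x0 + sum(random.uniform(m, n) for _ in range(Y))

for Y in (1000, 10000, 100000):
    print(Y, estimate(0.0, 1.0, 9.0, Y), actual(0.0, 1.0, 9.0, Y))
[/code]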
 
MarkFL said:
I would think that for a large number of iterations, we could reasonably approximate the solution with:

$$x(t)\approx x(0)+t\frac{m+n}{2}$$

In other words, there should be about as many values above the arithmetic mean of the range of random numbers as below.

edit: I ran some simulations, with $x(0)=0,\,m=1,\,n=9$:
...
Thanks for your reply.

However, your solution fails when the range of the random variable is symmetric about zero, i.e. $-m \le r \le m$.

Sample simulations:
$x(0)=0,\,m=-1,\,n=1$
[table="width: 300, class: grid"]
[tr]
[td]$Y$[/td]
[td]Estimate[/td]
[td]Actual[/td]
[/tr]
[tr]
[td]1000[/td]
[td]0.0[/td]
[td]-12.91132354196654[/td]
[/tr]
[tr]
[td]10000[/td]
[td]0.0[/td]
[td]88.76996209606638[/td]
[/tr]
[tr]
[td]100000[/td]
[td]0.0[/td]
[td]-214.62703168619245[/td]
[/tr]
[tr]
[td]1000000[/td]
[td]0.0[/td]
[td]158.93316829030073[/td]
[/tr]
[/table]
$x(0)=0,\,m=-9,\,n=9$
[table="width: 300, class: grid"]
[tr]
[td]$Y$[/td]
[td]Estimate[/td]
[td]Actual[/td]
[/tr]
[tr]
[td]1000[/td]
[td]0.0[/td]
[td]-28.623493620088897[/td]
[/tr]
[tr]
[td]10000[/td]
[td]0.0[/td]
[td]-504.7370383329392[/td]
[/tr]
[tr]
[td]100000[/td]
[td]0.0[/td]
[td]412.8417707372953[/td]
[/tr]
[tr]
[td]1000000[/td]
[td]0.0[/td]
[td]2313.431324375944[/td]
[/tr]
[/table]

For other random variable ranges, your solution does give an acceptable estimate. Is there a way to make the solution more general, or does the above case require special handling?

Thanks
 
I think the absolute difference between the estimate and actual values will be comparable; it just looks worse when the estimate is small.
 
MarkFL said:
I think the absolute difference between the estimate and actual values will be comparable; it just looks worse when the estimate is small.

Just making sure I'm getting it right: if the error is defined as the absolute difference between the estimate and the actual value (the result of doing the actual iterations), the error becomes larger as the estimate becomes smaller, or, put another way, the error is inversely related to the estimate. Is this right?

And from my sample simulations, is it also safe to say that, in addition to being inversely related to the estimate, the error is linearly related to the number of iterations?

Thanks for your illuminating post, but unfortunately, after re-examining the code, it seems I slightly misunderstood the problem. While the underlying operation is still the same, there's another detail of the operation that I missed.

The actual code for each iteration is the following:

$n = x_{t-1} \times k, \quad k \in \mathbb{R},\ 0 \le k \le 1$
$m = -n$
$x_t = x_{t-1} + \text{rand}(m..n)$

The only thing immutable across iterations is $k$. As you can see, the random value range changes in each iteration, because $x_t$ from the previous iteration becomes $x_{t-1}$ in the current one.
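
In code, the modified iteration might look like the following sketch (again assuming a uniform RNG; the parameter values are placeholders):

[code]
import random

def iterate_scaled(x0, k, Y):
    """Apply x(t) = x(t-1) + rand(-n..n), where n = x(t-1) * k, for Y iterations."""
    x = x0
    for _ in range(Y):
        n = x * k                   # the range endpoint depends on the current value
        x += random.uniform(-n, n)
    return x

# Example: x(0) = 100, k = 0.1, Y = 10000 iterations
print(iterate_scaled(100.0, 0.1, 10000))
[/code]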

Does the estimate solution still hold in this case?

Thanks
PS: the sample simulations I posted earlier used the original operation, the one without the variable $k$ introduced
 
The error should depend only on the number of iterations (growing roughly like its square root) and on the difference between $m$ and $n$, not on the estimate itself. I have run a series of simulations with 100000 iterations each, where in one we have $(m,n)=(1,9)$ and in another $(m,n)=(-4,4)$. So, in the first case our estimate is 500000, while in the second case our estimate is 0:

[table="width: 500, class: grid"]
[tr]
[td]Trial[/td]
[td]Value 1[/td]
[td]Error 1[/td]
[td]Value 2[/td]
[td]Error 2[/td]
[/tr]
[tr]
[td]1[/td]
[td]499400.521646[/td]
[td]599.478354153[/td]
[td]478.231223299[/td]
[td]478.231223299[/td]
[/tr]
[tr]
[td]2[/td]
[td]500512.517299[/td]
[td]512.517299246[/td]
[td]184.730949398[/td]
[td]184.730949398[/td]
[/tr]
[tr]
[td]3[/td]
[td]499101.401411[/td]
[td]898.598589477[/td]
[td]382.213193408[/td]
[td]382.213193408[/td]
[/tr]
[tr]
[td]4[/td]
[td]499573.912897[/td]
[td]426.087102908[/td]
[td]1750.90938973[/td]
[td]1750.90938973[/td]
[/tr]
[tr]
[td]5[/td]
[td]499045.029516[/td]
[td]954.970484013[/td]
[td]-36.4236090278[/td]
[td]36.4236090278[/td]
[/tr]
[tr]
[td]6[/td]
[td]499662.20499[/td]
[td]337.795009887[/td]
[td]-1261.48426344[/td]
[td]1261.48426344[/td]
[/tr]
[tr]
[td]7[/td]
[td]499133.838544[/td]
[td]866.16145649[/td]
[td]-337.858854127[/td]
[td]337.858854127[/td]
[/tr]
[tr]
[td]8[/td]
[td]499269.888445[/td]
[td]730.111554581[/td]
[td]-2670.99289262[/td]
[td]2670.99289262[/td]
[/tr]
[tr]
[td]9[/td]
[td]500014.594257[/td]
[td]14.5942568078[/td]
[td]78.3711088367[/td]
[td]78.3711088367[/td]
[/tr]
[tr]
[td]10[/td]
[td]499827.472268[/td]
[td]172.527732044[/td]
[td]-948.894046287[/td]
[td]948.894046287[/td]
[/tr]
[/table]

In the new recursion, because the mean of $m$ and $n$ is always the same (zero), I would say a good estimate would be:

$$x(t)\approx x(0)$$
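
A quick way to sanity-check that estimate, assuming the same uniform RNG as before, is to average many independent runs of the scaled recursion; the sample mean should stay near $x(0)$ even though individual runs can wander far from it (a sketch, not code from this thread):

[code]
import random

def run_scaled(x0, k, Y):
    """One run of x(t) = x(t-1) + rand(-n..n) with n = x(t-1) * k."""
    x = x0
    for _ in range(Y):
        n = x * k
        x += random.uniform(-n, n)
    return x

# Average over many independent runs; the sample mean should land near x(0),
# even though any single run may end up far from it.
trials, x0, k, Y = 2000, 100.0, 0.05, 1000
mean = sum(run_scaled(x0, k, Y) for _ in range(trials)) / trials
print(mean)  # expected to be close to x0 = 100
[/code]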
 
MarkFL said:
The error should depend only on the number of iterations (growing roughly like its square root) and on the difference between $m$ and $n$, not on the estimate itself. I have run a series of simulations with 100000 iterations each, where in one we have $(m,n)=(1,9)$ and in another $(m,n)=(-4,4)$. So, in the first case our estimate is 500000, while in the second case our estimate is 0:
...
In the new recursion, because the mean of $m$ and $n$ is always the same (zero), I would say a good estimate would be:

$$x(t)\approx x(0)$$

Thanks very much for your help in this.

The original purpose of my question is to optimize a hobbyist simulation engine I'm tweaking. The original engine performs several hundred such iterative operations in each frame, each with its own set of parameters. I am trying to reduce the workload of each frame.

I figure that if there's a way to approximate the result of the iteration with a single operation, then instead of doing millions of iterations per frame, I can get away with doing several hundred operations instead.

On average, the number of iterations of each operation in one frame is roughly 10000. Judging from the simulation results, the error can get quite large, so I'm considering these steps to "improve" the estimate:

  1. Run the actual iteration a large number of times for one set of parameters, compare each run against your equation's estimate, and calculate the mean and standard deviation of the absolute error
  2. Repeat #1 for another set of parameters, differing incrementally from the one used in #1, and again calculate the mean and standard deviation. This gives a series of means and standard deviations
  3. Calculate linear regression parameters for the mean and standard deviation data from #2
After obtaining the regression parameters for the mean and standard deviation of the errors, the engine will estimate the operation at runtime as follows (a sketch of these runtime steps appears after the list):

  1. Get the estimate for a parameter set using your equation
  2. Use the same parameters to compute a mean and standard deviation from the regression parameters calculated beforehand
  3. Use the mean and standard deviation from #2 to generate a number with a Gaussian random number generator
  4. Add the result from #3 to the result from #1 to give the final estimate
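
For illustration, a minimal Python sketch of the runtime side of this plan; the regression coefficients (mean_slope, mean_intercept, std_slope, std_intercept) are hypothetical placeholders standing in for values fitted offline in steps #1 to #3 above, with the iteration count Y as the single predictor:

[code]
import random

def closed_form_estimate(x0, m, n, Y):
    """The approximation discussed above: x(Y) ~ x(0) + Y*(m+n)/2."""
    return x0 + Y * (m + n) / 2.0

def runtime_estimate(x0, m, n, Y,
                     mean_slope, mean_intercept,
                     std_slope, std_intercept):
    """Combine the closed-form estimate with a regressed Gaussian error term."""
    base = closed_form_estimate(x0, m, n, Y)
    # Linear regressions (fitted offline) predicting the error statistics from Y
    err_mean = mean_slope * Y + mean_intercept
    err_std = std_slope * Y + std_intercept
    # Perturb the base estimate with a Gaussian draw using those statistics
    return base + random.gauss(err_mean, err_std)

# Hypothetical usage with placeholder regression coefficients:
print(runtime_estimate(0.0, 1.0, 9.0, 10000, 0.0, 0.0, 0.02, 5.0))
[/code]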

Please tell me what you think of such an approach.

Thanks
 
Just off the top of my head, it would seem that you could simply use for your estimate a random number (with a Gaussian/normal distribution) whose mean is the estimate we have been using, scaled so that six sigmas above and below this mean are the bounds.
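
In code, that suggestion might look like the following sketch, assuming the bounds are taken to be the extreme possible outcomes $x(0)+Ym$ and $x(0)+Yn$:

[code]
import random

def gaussian_estimate(x0, m, n, Y):
    """Draw a Gaussian estimate whose mean is the closed-form value and whose
    six-sigma band spans the extreme possible outcomes."""
    mu = x0 + Y * (m + n) / 2.0       # closed-form estimate (the mean)
    lo, hi = x0 + Y * m, x0 + Y * n   # extreme possible outcomes
    sigma = (hi - lo) / 12.0          # 6 sigmas on each side of the mean
    return random.gauss(mu, sigma)

# Example: x(0) = 0, range [1, 9], Y = 10000 iterations
print(gaussian_estimate(0.0, 1.0, 9.0, 10000))
[/code]

This spread is deliberately conservative; for a sum of $Y$ independent uniform increments the standard deviation is $\sqrt{Y}\,(n-m)/\sqrt{12}$, so a tighter band could also be justified.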

I would encourage you to try your method first, see what you get, and let us know if this works well for you. :)
 