Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Variably spaced data

  1. Dec 18, 2009 #1
    Hi all,

    I'm looking to generate a series of integers in the range [1,1000], that start off with a large spacing, like 1 50 98, etc. At a peak of some distribution, they should be spaced with some minimal [tex]\Delta[/tex] spacing. For example if the peak is at 500 and [tex]\Delta=2[/tex], you might see ... 495, 498, 500, 502, 504, ... Then at the other end of the [1,1000] window, it's back to big spacing 950 999. I'd like the number of integers found in a sliding window of some size in this interval to follow some bell-shaped distribution, but with a constraint on the minimum spacing between numbers.

    I get the feeling this is somehow related to a poisson process, so that's why I'm posting here, but that's about it.

    Suggestions to bump me in the right direction?
    Last edited: Dec 18, 2009
  2. jcsd
  3. Dec 18, 2009 #2
    If you have access to Minitab (or some other similar package with random number generators) you can specify the parameters for a random normal distribution including kurtosis and a minimal interval between numbers.
    Last edited: Dec 18, 2009
  4. Dec 18, 2009 #3
    But can you make that interval follow some distribution, but with a minimum interval? I have matlab, and I could easily obtain R, but I'm looking for some math behind it, really. Maybe this is more of an algorithmic thing than a statistics thing...
  5. Dec 18, 2009 #4
    I haven't used Minitab for a while, but I believe you can write a short program (algorithm) such that [tex]x_{i+1}-x_{i} \geq 2 [/tex] before invoking the random normal generator.

    EDIT: If this can't be done, the OP can run this algorithm over the generated sequence to purge it of consecutive integers and interpolate if necessary. The OP did not specify a particular distribution, only a result.
    Last edited: Dec 18, 2009
  6. Dec 18, 2009 #5
    One way is to set [tex]x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil[/tex] where f(x) is some density function - say it's normal, so choose mu=500 then choose c so that the gap is 2 in the middle then choose sigma to specify the gaps at x=1. If you want the gaps to be more random then simply add or multiply by a random number from some appropriate distribution.

  7. Dec 18, 2009 #6
    The OP is apparently thinking of a discrete distribution (only integers in this case) so f(x) is really a normal approximation. I don't understand how one chooses "c" to assure that the result is a non-consecutive integer.
    Last edited: Dec 18, 2009
  8. Dec 18, 2009 #7
    no it doesn't but the ceiling function [tex]\left\lceil\cdot\right\rceil[/tex] does :)
  9. Dec 18, 2009 #8
    cei-ing the c/f(x) does an adequate job for my purposes, so I'm set for now, but I guess I'm still curious about this problem in general.
    A motivation: Given some bus station has buses arriving throughout the day (or here, seconds 1 to 1000). About noon (500), there are a lot of buses, whereas at midnight they're few and far between. I'm trying to generate these times, but the bus station has a capacity, so I have to have a minimum spacing between arrival times in my generated numbers.
    So the ceil(c/f(x)) works for now. It's just tricky finding a sigma that works for some input window, peak, and capacity without trial and error...

    As an alternative, would something like this work:
    with [tex]x_2 >x_1 + \delta[/tex], then control [tex]\delta[/tex] with some distribution?
    Last edited: Dec 18, 2009
  10. Dec 18, 2009 #9
    At some places I've lived at the buses tended to arrive in groups - the first one crammed full and some empty ones stuck behind (not that that helps the model).

    You might want to take other peak times into account, such as 8am & 6pm, so maybe a multipeaked or piecewise constant or spline shape might be more appropriate as the gap function - then instead of a nonlinear inverse density problem you've just got to choose some interpolation coefficients to match the shape of the desired gap rate function.
  11. Dec 19, 2009 #10
    I haven't gotten the [tex]x_2=x_1+\delta[/tex] to work yet, but I like your [tex]c/f(x)[/tex] suggestion since it can work for multiple peaks, as you say. I can get 2 equally-weighted peaks to work fairly well. I know this isn't a coding forum, but maybe this matlab can help someone
    Code (Text):

    maxVal=1000; %interval from 1 to 1000
    minSpacing=2; %min spacing 2
    maxSpacing=30; %max spacing 30
    peakvec=[200 800]; %location of double peaks
    weights=[1 1];

    %build up a function to minimize
    for i=1:length(peakvec)
       s=[s '+' num2str(weights(i)) '*normpdf([' num2str(1:maxVal) '],' num2str(peakvec(i)) ',sigmavec(' num2str(i) '))'];
    pdf = @(sigma) func(sigma)/max(func(sigma));
    minpdf = @(sigma) abs(max(ceil(minSpacing./pdf(sigma))) - maxSpacing);
    %minimizing minpdf should result in a pair of sigmas that keep maxSpaxing

    sigma_0 = repmat(20,size(peakvec));
    sigmavec = fminsearch(minpdf,sigma_0); %,repmat(0,size(peakvec)),[]) %the second part if you have fminsearchbnd
    disp(['"Best" sigma chosen as ' num2str(sigmavec)]);

    %build up a new pdf to generate x(i+1) from x(i)
    for i=1:length(peakvec)
       s=[s '+' num2str(weights(i)) '*normpdf(x,' num2str(peakvec(i)) ',' num2str(sigmavec(i)) ')'];

    pdf = @(x) func(x)./max(func([1:maxVal]));

    %plot what it looks like
    title('distribution of buses');

    %find the actual maxSpacing,minSpacing
    maxseen=0; minseen=inf;
    maxseenat=[]; minseenat=[];
    while x(i)<=maxVal
        if x(i+1)-x(i)>maxseen
            maxseenat=[maxseenat; i];
        elseif x(i+1)-x(i) < minseen
            minseen =x(i+1)-x(i);
            minseenat=[minseenat; i];

    disp(['Maximum spacing seen: ' num2str(maxseen) ' at position ' num2str(maxseenat')]);
    disp(['Minimum spacing seen: ' num2str(minseen) ' at position ' num2str(minseenat')]);
    title('bus times')
    Now back to stats. Add a third peak in the above peakvec and the distribution of bus times isn't even. Say I want n peaks, weighted with [tex]w_1, w_2, \ldots w_n[/tex]. What would my f(x) look like? If I just sum them as [tex]w_1N(\mu_1,\sigma_1) + ... + w_nN(\mu_n,\sigma_n)[/tex], my least squares business above doesn't work. Peaks are arbitrarily scaled as far as I can tell. Do I have to deal with covariance matrices somehow to make the amount of buses arriving in a peak time weighted as I wish? Also, this is insufficient for large intervals (like 60*60*24 seconds instead of 1000)

    ps. to be honest this has nothing to do with bus times but it's a fun analogy :]
    Last edited: Dec 19, 2009
  12. Dec 20, 2009 #11
    You might like to try the Shape Language Modeling download on Matlab Central, which might give you a simpler way of fitting the gap function. In particular there's no need to restrict yourself to inverted probability densities. Also avoid using strings/inlines to define functions - in your case it should be possible with something like @(x)sum(arrayfun(...)).

  13. Dec 22, 2009 #12
    I think I would need something like cellfun, but my version of matlab doesn't allow anonymous functions in cellfun, just built-in functions, so that won't work. But anyways, that's all fine and good.

    Followup question:
    Is there a term or subject or something I can look into to describe the distribution of an iterative sequence like the one you offered?
    x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil
    but with f(x) general, not specifically a normal? If the ceil() complicates it, real numbers are fine too.
  14. Dec 22, 2009 #13
    Aside from what the name of those iterative distributions might be, here's another question.

    I'm trying to actually find the distribution of [tex]\max(c/f(x))[/tex] so I can fit it, where f(x) is a [tex]N(\mu,\sigma^2)[/tex]. If I have more peaks I can just sum up my mus and sigmas, so this seems general enough.
    So I would want to find the nth order statistic of c/f(x).
    So X is a r.v. from this normal, and let Z=c/X.
    f_Z(z) = \frac{C}{z^2 \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z^2}(C-z\mu)^2\right\} \;\;z\ne 0 [/tex]
    F_z(z) =\begin{cases}
    -\frac{1}{2}\left[1+\mathrm{erf}\left( \frac{C - z\mu}{z\sqrt{2\sigma^2}}\right)\right]&\text{if } z<0 \\
    \frac{1}{2}&\text{if } z=0\\
    \frac{1}{2}\left[1-\mathrm{erf}\left( \frac{C - z\mu}{z\sqrt{2\sigma^2}}\right)\right]&\text{if } z>0\end{cases}
    f(z_{(n)}) = n f(z_{(n)}) \left[F(z_{(n)})\right]^{n-1}
    E(z_{(n)}) = n \left(-\frac{1}{2}\right)^{n-1} \int\limits_{-\infty}^0 \frac{C}{z_{(n)} \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z_{(n)}^2}(C-z\mu)^2\right\}\left[1+\mathrm{erf}\left( \frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}\right) \right]^{n-1} dz_{(n)}
    \;\;\;\;\;+ n \left(\frac{1}{2}\right)^{n-1} \int\limits_{0}^\infty \frac{C}{z_{(n)} \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z_{(n)}^2}(C-z\mu)^2\right\}\left[1-\mathrm{erf}\left( \frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}\right) \right]^{n-1} dz_{(n)}
    Let [tex]u=\frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}[/tex], so [tex] dz_{(n)}= \frac{-z_{(n)}^2\sqrt{2\sigma^2}}{C}du[/tex]
    E(z_{(n)}) = -Cn \left(-\frac{1}{2}\right)^{n-1} \int\limits_{-\mu/2\sigma^2}^\infty \frac{1}{\sqrt{2\pi\sigma^2}u+\sqrt{\pi}\mu} \exp^{-u^2}\left[1+\mathrm{erf}\left( u\right) \right]^{n-1} du
    +Cn \left(\frac{1}{2}\right)^{n-1} \int\limits_{-\mu/2\sigma^2}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}u+\sqrt{\pi}\mu} \exp^{-u^2}\left[1-\mathrm{erf}\left( u\right) \right]^{n-1} du
    Any insight into this integral? :bugeye::rolleyes:

    and finally, if the probabilities for a multinomial arrive from n normals, is there a distribution for the result? Or do you first have to calculate the normal probabilities and plug in?
    Last edited: Dec 22, 2009
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook