Variably spaced data

Hi all,

I'm looking to generate a series of integers in the range [1,1000], that start off with a large spacing, like 1 50 98, etc. At a peak of some distribution, they should be spaced with some minimal [tex]\Delta[/tex] spacing. For example if the peak is at 500 and [tex]\Delta=2[/tex], you might see ... 495, 498, 500, 502, 504, ... Then at the other end of the [1,1000] window, it's back to big spacing 950 999. I'd like the number of integers found in a sliding window of some size in this interval to follow some bell-shaped distribution, but with a constraint on the minimum spacing between numbers.

I get the feeling this is somehow related to a poisson process, so that's why I'm posting here, but that's about it.

Suggestions to bump me in the right direction?
 
Last edited:
2,033
78
If you have access to Minitab (or some other similar package with random number generators) you can specify the parameters for a random normal distribution including kurtosis and a minimal interval between numbers.
 
Last edited:
But can you make that interval follow some distribution, but with a minimum interval? I have matlab, and I could easily obtain R, but I'm looking for some math behind it, really. Maybe this is more of an algorithmic thing than a statistics thing...
 
2,033
78
But can you make that interval follow some distribution, but with a minimum interval? I have matlab, and I could easily obtain R, but I'm looking for some math behind it, really. Maybe this is more of an algorithmic thing than a statistics thing...
I haven't used Minitab for a while, but I believe you can write a short program (algorithm) such that [tex]x_{i+1}-x_{i} \geq 2 [/tex] before invoking the random normal generator.

EDIT: If this can't be done, the OP can run this algorithm over the generated sequence to purge it of consecutive integers and interpolate if necessary. The OP did not specify a particular distribution, only a result.
 
Last edited:
525
5
Hi all,

I'm looking to generate a series of integers in the range [1,1000], that start off with a large spacing, like 1 50 98, etc. At a peak of some distribution, they should be spaced with some minimal [tex]\Delta[/tex] spacing. For example if the peak is at 500 and [tex]\Delta=2[/tex], you might see ... 495, 498, 500, 502, 504, ... Then at the other end of the [1,1000] window, it's back to big spacing 950 999. I'd like the number of integers found in a sliding window of some size in this interval to follow some bell-shaped distribution, but with a constraint on the minimum spacing between numbers.

I get the feeling this is somehow related to a poisson process, so that's why I'm posting here, but that's about it.

Suggestions to bump me in the right direction?
One way is to set [tex]x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil[/tex] where f(x) is some density function - say it's normal, so choose mu=500 then choose c so that the gap is 2 in the middle then choose sigma to specify the gaps at x=1. If you want the gaps to be more random then simply add or multiply by a random number from some appropriate distribution.

HTH
 
2,033
78
One way is to set [tex]x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil[/tex] where f(x) is some density function - say it's normal, so choose mu=500 then choose c so that the gap is 2 in the middle then choose sigma to specify the gaps at x=1.
The OP is apparently thinking of a discrete distribution (only integers in this case) so f(x) is really a normal approximation. I don't understand how one chooses "c" to assure that the result is a non-consecutive integer.
 
Last edited:
525
5
I don't understand how one chooses "c" to assure that the result is a non-consecutive integer.
no it doesn't but the ceiling function [tex]\left\lceil\cdot\right\rceil[/tex] does :)
 
cei-ing the c/f(x) does an adequate job for my purposes, so I'm set for now, but I guess I'm still curious about this problem in general.
A motivation: Given some bus station has buses arriving throughout the day (or here, seconds 1 to 1000). About noon (500), there are a lot of buses, whereas at midnight they're few and far between. I'm trying to generate these times, but the bus station has a capacity, so I have to have a minimum spacing between arrival times in my generated numbers.
So the ceil(c/f(x)) works for now. It's just tricky finding a sigma that works for some input window, peak, and capacity without trial and error...

As an alternative, would something like this work:
http://www.mathpages.com/home/kmath452.htm
with [tex]x_2 >x_1 + \delta[/tex], then control [tex]\delta[/tex] with some distribution?
 
Last edited:
525
5
A motivation: Given some bus station has buses arriving throughout the day (or here, seconds 1 to 1000). About noon (500), there are a lot of buses, whereas at midnight they're few and far between. I'm trying to generate these times, but the bus station has a capacity, so I have to have a minimum spacing between arrival times in my generated numbers.
At some places I've lived at the buses tended to arrive in groups - the first one crammed full and some empty ones stuck behind (not that that helps the model).

You might want to take other peak times into account, such as 8am & 6pm, so maybe a multipeaked or piecewise constant or spline shape might be more appropriate as the gap function - then instead of a nonlinear inverse density problem you've just got to choose some interpolation coefficients to match the shape of the desired gap rate function.
 
I haven't gotten the [tex]x_2=x_1+\delta[/tex] to work yet, but I like your [tex]c/f(x)[/tex] suggestion since it can work for multiple peaks, as you say. I can get 2 equally-weighted peaks to work fairly well. I know this isn't a coding forum, but maybe this matlab can help someone
Code:
clc
maxVal=1000; %interval from 1 to 1000
minSpacing=2; %min spacing 2
maxSpacing=30; %max spacing 30
peakvec=[200 800]; %location of double peaks
weights=[1 1];

%build up a function to minimize
s='';
for i=1:length(peakvec)
   s=[s '+' num2str(weights(i)) '*normpdf([' num2str(1:maxVal) '],' num2str(peakvec(i)) ',sigmavec(' num2str(i) '))'];
end
s(1)=[];
func=inline(s,'sigmavec');
pdf = @(sigma) func(sigma)/max(func(sigma));
minpdf = @(sigma) abs(max(ceil(minSpacing./pdf(sigma))) - maxSpacing);
%minimizing minpdf should result in a pair of sigmas that keep maxSpaxing

sigma_0 = repmat(20,size(peakvec));
warning('off','MATLAB:divideByZero');
sigmavec = fminsearch(minpdf,sigma_0); %,repmat(0,size(peakvec)),[]) %the second part if you have fminsearchbnd
warning('on','MATLAB:divideByZero');
disp(['"Best" sigma chosen as ' num2str(sigmavec)]);

%build up a new pdf to generate x(i+1) from x(i)
s='';
for i=1:length(peakvec)
   s=[s '+' num2str(weights(i)) '*normpdf(x,' num2str(peakvec(i)) ',' num2str(sigmavec(i)) ')'];
end
s(1)=[];

func=inline(s,'x');
pdf = @(x) func(x)./max(func([1:maxVal]));

%plot what it looks like
figure
subplot(2,1,1);
plot([1:maxVal],pdf(1:maxVal));
title('distribution of buses');

%find the actual maxSpacing,minSpacing
maxseen=0; minseen=inf;
maxseenat=[]; minseenat=[];
x(1)=1;
i=1;
while x(i)<=maxVal
    x(i+1)=x(i)+ceil(minSpacing/pdf(x(i)));
    if x(i+1)-x(i)>maxseen
        maxseen=x(i+1)-x(i);
        maxseenat=[maxseenat; i];
    elseif x(i+1)-x(i) < minseen
        minseen =x(i+1)-x(i);
        minseenat=[minseenat; i];
    end
    i=i+1;
end
x(x>maxVal)=[];

disp(['Maximum spacing seen: ' num2str(maxseen) ' at position ' num2str(maxseenat')]);
disp(['Minimum spacing seen: ' num2str(minseen) ' at position ' num2str(minseenat')]);
subplot(2,1,2);
plot(x,1);
title('bus times')
Now back to stats. Add a third peak in the above peakvec and the distribution of bus times isn't even. Say I want n peaks, weighted with [tex]w_1, w_2, \ldots w_n[/tex]. What would my f(x) look like? If I just sum them as [tex]w_1N(\mu_1,\sigma_1) + ... + w_nN(\mu_n,\sigma_n)[/tex], my least squares business above doesn't work. Peaks are arbitrarily scaled as far as I can tell. Do I have to deal with covariance matrices somehow to make the amount of buses arriving in a peak time weighted as I wish? Also, this is insufficient for large intervals (like 60*60*24 seconds instead of 1000)

ps. to be honest this has nothing to do with bus times but it's a fun analogy :]
 
Last edited:
525
5
You might like to try the Shape Language Modeling download on Matlab Central, which might give you a simpler way of fitting the gap function. In particular there's no need to restrict yourself to inverted probability densities. Also avoid using strings/inlines to define functions - in your case it should be possible with something like @(x)sum(arrayfun(...)).

HTH
 
I think I would need something like cellfun, but my version of matlab doesn't allow anonymous functions in cellfun, just built-in functions, so that won't work. But anyways, that's all fine and good.

Followup question:
Is there a term or subject or something I can look into to describe the distribution of an iterative sequence like the one you offered?
[tex]
x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil
[/tex]
but with f(x) general, not specifically a normal? If the ceil() complicates it, real numbers are fine too.
 
Aside from what the name of those iterative distributions might be, here's another question.

I'm trying to actually find the distribution of [tex]\max(c/f(x))[/tex] so I can fit it, where f(x) is a [tex]N(\mu,\sigma^2)[/tex]. If I have more peaks I can just sum up my mus and sigmas, so this seems general enough.
So I would want to find the nth order statistic of c/f(x).
So X is a r.v. from this normal, and let Z=c/X.
[tex]
f_Z(z) = \frac{C}{z^2 \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z^2}(C-z\mu)^2\right\} \;\;z\ne 0 [/tex]
[tex]
F_z(z) =\begin{cases}
-\frac{1}{2}\left[1+\mathrm{erf}\left( \frac{C - z\mu}{z\sqrt{2\sigma^2}}\right)\right]&\text{if } z<0 \\
\frac{1}{2}&\text{if } z=0\\
\frac{1}{2}\left[1-\mathrm{erf}\left( \frac{C - z\mu}{z\sqrt{2\sigma^2}}\right)\right]&\text{if } z>0\end{cases}
[/tex]
[tex]
f(z_{(n)}) = n f(z_{(n)}) \left[F(z_{(n)})\right]^{n-1}
[/tex]
[tex]
E(z_{(n)}) = n \left(-\frac{1}{2}\right)^{n-1} \int\limits_{-\infty}^0 \frac{C}{z_{(n)} \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z_{(n)}^2}(C-z\mu)^2\right\}\left[1+\mathrm{erf}\left( \frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}\right) \right]^{n-1} dz_{(n)}
[/tex]
[tex]
\;\;\;\;\;+ n \left(\frac{1}{2}\right)^{n-1} \int\limits_{0}^\infty \frac{C}{z_{(n)} \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z_{(n)}^2}(C-z\mu)^2\right\}\left[1-\mathrm{erf}\left( \frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}\right) \right]^{n-1} dz_{(n)}
[/tex]
Let [tex]u=\frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}[/tex], so [tex] dz_{(n)}= \frac{-z_{(n)}^2\sqrt{2\sigma^2}}{C}du[/tex]
[tex]
E(z_{(n)}) = -Cn \left(-\frac{1}{2}\right)^{n-1} \int\limits_{-\mu/2\sigma^2}^\infty \frac{1}{\sqrt{2\pi\sigma^2}u+\sqrt{\pi}\mu} \exp^{-u^2}\left[1+\mathrm{erf}\left( u\right) \right]^{n-1} du
+Cn \left(\frac{1}{2}\right)^{n-1} \int\limits_{-\mu/2\sigma^2}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}u+\sqrt{\pi}\mu} \exp^{-u^2}\left[1-\mathrm{erf}\left( u\right) \right]^{n-1} du
[/tex]
Any insight into this integral? :bugeye::rolleyes:

and finally, if the probabilities for a multinomial arrive from n normals, is there a distribution for the result? Or do you first have to calculate the normal probabilities and plug in?
 
Last edited:

The Physics Forums Way

We Value Quality
• Topics based on mainstream science
• Proper English grammar and spelling
We Value Civility
• Positive and compassionate attitudes
• Patience while debating
We Value Productivity
• Disciplined to remain on-topic
• Recognition of own weaknesses
• Solo and co-op problem solving
Top