Variably Spaced Data: Generating Integer Series with Minimal Spacing

  • Context: Undergrad 
  • Thread starter Thread starter coolnessitself
  • Start date Start date
  • Tags Tags
    Data
Click For Summary

Discussion Overview

The discussion revolves around generating a series of integers within the range [1,1000] that exhibit variable spacing, starting with large gaps, transitioning to minimal spacing around a peak, and then returning to larger gaps. Participants explore the mathematical and algorithmic approaches to achieve this while adhering to specific distribution characteristics and spacing constraints.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant suggests generating integers with a large initial spacing that transitions to minimal spacing at a peak, potentially related to a Poisson process.
  • Another participant proposes using Minitab or similar software to specify parameters for a random normal distribution, including kurtosis and minimum intervals.
  • There is a question about whether the interval can follow a distribution while maintaining a minimum spacing, with some participants indicating that this might require an algorithmic approach rather than purely statistical methods.
  • A method is proposed involving the ceiling function to ensure non-consecutive integers, with a focus on using a density function to control spacing.
  • One participant shares a MATLAB code snippet aimed at generating the desired integer series, discussing the challenges of finding appropriate parameters for the distribution.
  • Another participant raises the idea of using multiple peaks in the distribution and questions how to manage weights and scaling for these peaks effectively.

Areas of Agreement / Disagreement

Participants express various methods and ideas for generating the integer series, but there is no consensus on a single approach. Multiple competing views remain regarding the best mathematical or algorithmic strategies to achieve the desired outcome.

Contextual Notes

Participants note the complexity of ensuring that the generated integers adhere to both the desired distribution and the minimum spacing requirements. There are unresolved questions about how to effectively manage multiple peaks and the implications of using different statistical methods.

Who May Find This Useful

This discussion may be useful for those interested in mathematical modeling, algorithm development, or statistical analysis, particularly in contexts where variable spacing and distribution characteristics are important.

coolnessitself
Messages
29
Reaction score
0
Hi all,

I'm looking to generate a series of integers in the range [1,1000], that start off with a large spacing, like 1 50 98, etc. At a peak of some distribution, they should be spaced with some minimal [tex]\Delta[/tex] spacing. For example if the peak is at 500 and [tex]\Delta=2[/tex], you might see ... 495, 498, 500, 502, 504, ... Then at the other end of the [1,1000] window, it's back to big spacing 950 999. I'd like the number of integers found in a sliding window of some size in this interval to follow some bell-shaped distribution, but with a constraint on the minimum spacing between numbers.

I get the feeling this is somehow related to a poisson process, so that's why I'm posting here, but that's about it.

Suggestions to bump me in the right direction?
 
Last edited:
Physics news on Phys.org
If you have access to Minitab (or some other similar package with random number generators) you can specify the parameters for a random normal distribution including kurtosis and a minimal interval between numbers.
 
Last edited:
But can you make that interval follow some distribution, but with a minimum interval? I have matlab, and I could easily obtain R, but I'm looking for some math behind it, really. Maybe this is more of an algorithmic thing than a statistics thing...
 
coolnessitself said:
But can you make that interval follow some distribution, but with a minimum interval? I have matlab, and I could easily obtain R, but I'm looking for some math behind it, really. Maybe this is more of an algorithmic thing than a statistics thing...

I haven't used Minitab for a while, but I believe you can write a short program (algorithm) such that [tex]x_{i+1}-x_{i} \geq 2[/tex] before invoking the random normal generator.

EDIT: If this can't be done, the OP can run this algorithm over the generated sequence to purge it of consecutive integers and interpolate if necessary. The OP did not specify a particular distribution, only a result.
 
Last edited:
coolnessitself said:
Hi all,

I'm looking to generate a series of integers in the range [1,1000], that start off with a large spacing, like 1 50 98, etc. At a peak of some distribution, they should be spaced with some minimal [tex]\Delta[/tex] spacing. For example if the peak is at 500 and [tex]\Delta=2[/tex], you might see ... 495, 498, 500, 502, 504, ... Then at the other end of the [1,1000] window, it's back to big spacing 950 999. I'd like the number of integers found in a sliding window of some size in this interval to follow some bell-shaped distribution, but with a constraint on the minimum spacing between numbers.

I get the feeling this is somehow related to a poisson process, so that's why I'm posting here, but that's about it.

Suggestions to bump me in the right direction?


One way is to set [tex]x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil[/tex] where f(x) is some density function - say it's normal, so choose mu=500 then choose c so that the gap is 2 in the middle then choose sigma to specify the gaps at x=1. If you want the gaps to be more random then simply add or multiply by a random number from some appropriate distribution.

HTH
 
bpet said:
One way is to set [tex]x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil[/tex] where f(x) is some density function - say it's normal, so choose mu=500 then choose c so that the gap is 2 in the middle then choose sigma to specify the gaps at x=1.

The OP is apparently thinking of a discrete distribution (only integers in this case) so f(x) is really a normal approximation. I don't understand how one chooses "c" to assure that the result is a non-consecutive integer.
 
Last edited:
SW VandeCarr said:
I don't understand how one chooses "c" to assure that the result is a non-consecutive integer.

no it doesn't but the ceiling function [tex]\left\lceil\cdot\right\rceil[/tex] does :)
 
cei-ing the c/f(x) does an adequate job for my purposes, so I'm set for now, but I guess I'm still curious about this problem in general.
A motivation: Given some bus station has buses arriving throughout the day (or here, seconds 1 to 1000). About noon (500), there are a lot of buses, whereas at midnight they're few and far between. I'm trying to generate these times, but the bus station has a capacity, so I have to have a minimum spacing between arrival times in my generated numbers.
So the ceil(c/f(x)) works for now. It's just tricky finding a sigma that works for some input window, peak, and capacity without trial and error...

As an alternative, would something like this work:
http://www.mathpages.com/home/kmath452.htm
with [tex]x_2 >x_1 + \delta[/tex], then control [tex]\delta[/tex] with some distribution?
 
Last edited:
coolnessitself said:
A motivation: Given some bus station has buses arriving throughout the day (or here, seconds 1 to 1000). About noon (500), there are a lot of buses, whereas at midnight they're few and far between. I'm trying to generate these times, but the bus station has a capacity, so I have to have a minimum spacing between arrival times in my generated numbers.

At some places I've lived at the buses tended to arrive in groups - the first one crammed full and some empty ones stuck behind (not that that helps the model).

You might want to take other peak times into account, such as 8am & 6pm, so maybe a multipeaked or piecewise constant or spline shape might be more appropriate as the gap function - then instead of a nonlinear inverse density problem you've just got to choose some interpolation coefficients to match the shape of the desired gap rate function.
 
  • #10
I haven't gotten the [tex]x_2=x_1+\delta[/tex] to work yet, but I like your [tex]c/f(x)[/tex] suggestion since it can work for multiple peaks, as you say. I can get 2 equally-weighted peaks to work fairly well. I know this isn't a coding forum, but maybe this MATLAB can help someone
Code:
clc
maxVal=1000; %interval from 1 to 1000
minSpacing=2; %min spacing 2
maxSpacing=30; %max spacing 30
peakvec=[200 800]; %location of double peaks
weights=[1 1];

%build up a function to minimize
s='';
for i=1:length(peakvec)
   s=[s '+' num2str(weights(i)) '*normpdf([' num2str(1:maxVal) '],' num2str(peakvec(i)) ',sigmavec(' num2str(i) '))'];
end
s(1)=[];
func=inline(s,'sigmavec');
pdf = @(sigma) func(sigma)/max(func(sigma));
minpdf = @(sigma) abs(max(ceil(minSpacing./pdf(sigma))) - maxSpacing);
%minimizing minpdf should result in a pair of sigmas that keep maxSpaxing

sigma_0 = repmat(20,size(peakvec));
warning('off','MATLAB:divideByZero');
sigmavec = fminsearch(minpdf,sigma_0); %,repmat(0,size(peakvec)),[]) %the second part if you have fminsearchbnd
warning('on','MATLAB:divideByZero');
disp(['"Best" sigma chosen as ' num2str(sigmavec)]);

%build up a new pdf to generate x(i+1) from x(i)
s='';
for i=1:length(peakvec)
   s=[s '+' num2str(weights(i)) '*normpdf(x,' num2str(peakvec(i)) ',' num2str(sigmavec(i)) ')'];
end
s(1)=[];

func=inline(s,'x');
pdf = @(x) func(x)./max(func([1:maxVal]));

%plot what it looks like
figure
subplot(2,1,1);
plot([1:maxVal],pdf(1:maxVal));
title('distribution of buses');

%find the actual maxSpacing,minSpacing
maxseen=0; minseen=inf;
maxseenat=[]; minseenat=[];
x(1)=1;
i=1;
while x(i)<=maxVal
    x(i+1)=x(i)+ceil(minSpacing/pdf(x(i)));
    if x(i+1)-x(i)>maxseen
        maxseen=x(i+1)-x(i);
        maxseenat=[maxseenat; i];
    elseif x(i+1)-x(i) < minseen
        minseen =x(i+1)-x(i);
        minseenat=[minseenat; i];
    end
    i=i+1;
end
x(x>maxVal)=[];

disp(['Maximum spacing seen: ' num2str(maxseen) ' at position ' num2str(maxseenat')]);
disp(['Minimum spacing seen: ' num2str(minseen) ' at position ' num2str(minseenat')]);
subplot(2,1,2);
plot(x,1);
title('bus times')

Now back to stats. Add a third peak in the above peakvec and the distribution of bus times isn't even. Say I want n peaks, weighted with [tex]w_1, w_2, \ldots w_n[/tex]. What would my f(x) look like? If I just sum them as [tex]w_1N(\mu_1,\sigma_1) + ... + w_nN(\mu_n,\sigma_n)[/tex], my least squares business above doesn't work. Peaks are arbitrarily scaled as far as I can tell. Do I have to deal with covariance matrices somehow to make the amount of buses arriving in a peak time weighted as I wish? Also, this is insufficient for large intervals (like 60*60*24 seconds instead of 1000)

ps. to be honest this has nothing to do with bus times but it's a fun analogy :]
 
Last edited:
  • #11
You might like to try the Shape language Modeling download on Matlab Central, which might give you a simpler way of fitting the gap function. In particular there's no need to restrict yourself to inverted probability densities. Also avoid using strings/inlines to define functions - in your case it should be possible with something like @(x)sum(arrayfun(...)).

HTH
 
  • #12
I think I would need something like cellfun, but my version of MATLAB doesn't allow anonymous functions in cellfun, just built-in functions, so that won't work. But anyways, that's all fine and good.

Followup question:
Is there a term or subject or something I can look into to describe the distribution of an iterative sequence like the one you offered?
[tex] x_{i+1}=x_i+\left\lceil c/f(x_i)\right\rceil[/tex]
but with f(x) general, not specifically a normal? If the ceil() complicates it, real numbers are fine too.
 
  • #13
Aside from what the name of those iterative distributions might be, here's another question.

I'm trying to actually find the distribution of [tex]\max(c/f(x))[/tex] so I can fit it, where f(x) is a [tex]N(\mu,\sigma^2)[/tex]. If I have more peaks I can just sum up my mus and sigmas, so this seems general enough.
So I would want to find the nth order statistic of c/f(x).
So X is a r.v. from this normal, and let Z=c/X.
[tex] f_Z(z) = \frac{C}{z^2 \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z^2}(C-z\mu)^2\right\} \;\;z\ne 0[/tex]
[tex] F_z(z) =\begin{cases}<br /> -\frac{1}{2}\left[1+\mathrm{erf}\left( \frac{C - z\mu}{z\sqrt{2\sigma^2}}\right)\right]&\text{if } z<0 \\<br /> \frac{1}{2}&\text{if } z=0\\<br /> \frac{1}{2}\left[1-\mathrm{erf}\left( \frac{C - z\mu}{z\sqrt{2\sigma^2}}\right)\right]&\text{if } z>0\end{cases}[/tex]
[tex] f(z_{(n)}) = n f(z_{(n)}) \left[F(z_{(n)})\right]^{n-1}[/tex]
[tex] E(z_{(n)}) = n \left(-\frac{1}{2}\right)^{n-1} \int\limits_{-\infty}^0 \frac{C}{z_{(n)} \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z_{(n)}^2}(C-z\mu)^2\right\}\left[1+\mathrm{erf}\left( \frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}\right) \right]^{n-1} dz_{(n)}[/tex]
[tex] \;\;\;\;\;+ n \left(\frac{1}{2}\right)^{n-1} \int\limits_{0}^\infty \frac{C}{z_{(n)} \sqrt{2\pi\sigma^2}} \exp\left\{\frac{-1}{2\sigma^2 z_{(n)}^2}(C-z\mu)^2\right\}\left[1-\mathrm{erf}\left( \frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}\right) \right]^{n-1} dz_{(n)} [/tex]
Let [tex]u=\frac{C - z_{(n)}\mu}{z_{(n)}\sqrt{2\sigma^2}}[/tex], so [tex]dz_{(n)}= \frac{-z_{(n)}^2\sqrt{2\sigma^2}}{C}du[/tex]
[tex] E(z_{(n)}) = -Cn \left(-\frac{1}{2}\right)^{n-1} \int\limits_{-\mu/2\sigma^2}^\infty \frac{1}{\sqrt{2\pi\sigma^2}u+\sqrt{\pi}\mu} \exp^{-u^2}\left[1+\mathrm{erf}\left( u\right) \right]^{n-1} du <br /> +Cn \left(\frac{1}{2}\right)^{n-1} \int\limits_{-\mu/2\sigma^2}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}u+\sqrt{\pi}\mu} \exp^{-u^2}\left[1-\mathrm{erf}\left( u\right) \right]^{n-1} du[/tex]
Any insight into this integral? :bugeye::rolleyes:

and finally, if the probabilities for a multinomial arrive from n normals, is there a distribution for the result? Or do you first have to calculate the normal probabilities and plug in?
 
Last edited:

Similar threads

  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 7 ·
Replies
7
Views
4K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 96 ·
4
Replies
96
Views
12K
  • · Replies 13 ·
Replies
13
Views
4K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 1 ·
Replies
1
Views
4K
  • · Replies 13 ·
Replies
13
Views
10K
  • · Replies 10 ·
Replies
10
Views
5K
Replies
5
Views
18K