Calculating Probability of N Points on a Line Being Within Given Distance

CoNiss · Apr 18, 2023

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...

BvU · Apr 18, 2023

Hello @CoNiss ,

##\qquad## !

Not familiar with Python, and no statistics expert -- just curious. And intrigued by the 5/6 probability of zero instances (see below).

My reasoning:
With one randomly distributed point on a 1000 cm line, the probability that a second randomly distributed point is NOT within 2 cm is 0.996.
For 10 points there are 45 pairs and I think that the probability for zero instances is 0.996⁴⁵=0.835, which is pretty close to 5/6.

For (exactly) one instance I would propose a probability of 0.996⁴⁴ times 0.004 times 45 = 0.151 (the 45 for the number of pairs in 10 points)

This 0.151 is in the neighborhood of the 0.155 a run of your code found:

Code:

================
Monte Carlo Simulation: ( sample size  =  1000000 )

Results:
(10 points on a 1000 cm line segment)

0 instances:  833286
1 instances:  154916
2 instances:  11352
3 instances:  436
4 instances:  10
5 instances:  0
6 instances:  0
7 instances:  0
8 instances:  0

Probability of all distances (per sample of 10  points) being smaller than 2 cm:
1 - ( 833286  /  1000000 ) =  0.16671400000000003
================

But for two instances I haven't found a reasonable probability expression so far.

Interesting exercise, for sure !

##\ ##

Philip Koeck · Apr 19, 2023

CoNiss said:

TL;DR Summary: Probability of any random n points on a line being within a given distance

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...

An idea: You could make the problem discrete by dividing the line segment into a large number of "slots", for example 10 000 or 100 000 of them.
Then you work out in how many ways you can place n points in these slots.
You have to decide whether the points are distinguishable or not and whether several points can be in the same slot.
Then you work out the same number of arrangements for a smaller number of slots to take into account that you don't want two points within a certain distance of each other. Here you also have to exclude the possibility that two points are in the same slot.
Division of the two results should give you 1 minus the probability you are looking for.
Does that sound right?

PeroK · Apr 19, 2023

CoNiss said:

TL;DR Summary: Probability of any random n points on a line being within a given distance

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...

There's a homework problem on here to calculate the expected value of the shortest segment for ##n## random points. And you can find analytic solutions elsewhere online.

You might be able to modify some of Techniques involved to solve your problem.

Stephen Tashi · Apr 29, 2023

I think carrying out the plan indicated by @Philip Koeck requires finding a convenient expression for the total number of iterations of some nested loops where the loop variables have a simple dependence.

To elaborate:
Approximate a continuous segment by dividing it into L equal intervals ("bins"). Define a slection of a point to be a selection of an entire interval, The point's location is given by its interval number, numbering from left to right on the x-axis of a graph. So the smallest possible location of a "point" is 1 and the largest possible location is L.

Assume that we use a sufficiently large number of intervals L so that the probability of two points being chosen in the same interval is approximately zero.

Let the distance between two points at locations j < k be k - j. So for example, adjacent intervals are a distance of 1 apart and a point located at k is a distance M from the point located at k+M.

With those assumptions, the number of possible arrangements of N points in L bins is given by the binomial coefficient ##\binom{N}{L} = \frac{N!}{L! (N-L)!}##. Each arrangement has equal probability by the assumptions of the problem.

Counting the number of these arrangements that do not have two points smaller than the (integer) distance M between them can (in theory) be done by writing a program with nested loops.

For example for N = 3 points, pseudo code for the program looks like:

(a3 is the location of the rightmost point and a1 is the location of the leftmost point )

Count=0
For a3 = 1+2*M to a3= L
For a1 = 1 to a3-2*M,
For a2 = a1+M to a2 = a3-M
Count = Count+1
End for a2
End For a1
End for a3

It wouldn't surprise me if people who study computation complexity or combinatorics have developed formulas for finding the Count.

To solve the whole problem,we need to take the limit of the ratio of Count to the total number of possible arrangements as the number of bins approaches infinity. One hopes that Sterlings approximation to the factorial function will allow that. However, the error between K! and its Sterlings approximation approaches infinity as K approaches infinity. So the expression we are dealing with needs to involve ratios of factorials. If the expression for Count involves sums of factorials, things could get tricky.

Stephen Tashi · May 3, 2023

(I now realize) the analysis implied by the previous post doesn't require factorials.

The proposed solution is ##\lim_{L \rightarrow \infty} \frac {Count} {\binom{L}{N}}##

The denominator ##\binom{L}{N}= (L)(L-1)...(L-N+1)/N! ## is a polynomial of degree ##N## with the highest order term ##1L^N/N!##.

The numerator, Count, can be computed as nested summations. These evaluate to a polynomial of degree ##N## in ##L##.

The limit is the ratio of the coefficients of ##L^N## in the two polynomials.

To illustrate this in the case N=3 of the previous post.

A program loop "from A to B" is evaluated (B-A+1) times. So the loop on a2 increments Count by ## (a_3-M)- (a_1+M) + 1 = a_3-a_1- 2M+1 ##

The final value of Count is given by the nested summations:
##\sum_{a_3=2M}^{a_3=L} ( \sum_{a_1=1}^{a_1=a3-2M} (a_3 - a_1 - 2M + 1))##

To retain the approximation of a length m by a number of intervals M, we also must let M approach infinity as L does. We keep the ratio M/L as close as possible to the ratio of the minimum distance to the length of the line segment. Let this ratio be ##r##. So, in the limit, we can replace M by Lr. With that substitution, the polynomial expression for Count will involve only the variable L and constants.

For example the innermost sum above is
##\sum_{a_1=1}^{a_1=a_3-2M} (a_3-a_1-2M+1)##

## = (a_3-2M-1+1)(a_3-2M+1) - \sum_{a_1=1}^{a_1=a3-2M} a_1 ##

(apply the summation formula ## \sum_{k=1}^n k = (k)(k+1)/2## )

##= (a_3-2M)(a_3-2M + 1) - (a_3-2M)(a_3-2M+1)/2) ##

##= (1/2) (a_3-2M)(a_3-2M+1) ##

The summation on ##a_3## is thus
## \sum_{a_3=2M}^{a_3=L} ((1/2)(a_3-2M)(a_3-2M+1))##

##= (1/2) \sum_{a_3=2M}^{a_3=L} ((a_3)^2 - 4a_3M + 4M^2 + a_3 - 2M)##

## = (1/2) (L -2M+1) (4M^2 - 2M) ##
##\ \ + (1/2)\sum_{a3=2M}^{a_3=L} (a_3)^2 ##
##\ \ + (1/2)(1 - 4M) \sum_{a_3=2M}^{a3=L} a_3 ##

At the moment, I don't have the energy to work this further! However, I hope this illustrates how the formula for Count will have its highest order term a multiple of ##L^3## after applying the summation formula for ##\sum K^2## and substituting ##M = rL##

Vanadium 50 · May 3, 2023

Isn't the same as "a stick is broken into N random pieces"? Don't textbooks have this for small N (exact) and large N (approximations, and possibly exact)?

Stephen Tashi · May 6, 2023

It turns out that the continuous approach leads to simpler algebra than the approximation by bins. Let the line segment be ##[0,L] ## and the desired spacing be ##m##. For example, take the case of n = 3 points.

The joint probability density of 3 points ##(x,y,z)## in the line segment is ##f(x,y,z) = \frac{1}{L^3}##

Integrating this density over the set where ##(x,y,z)## are at least ##m## apart requires doing integrals over sets such as ## \{ (x,y,z)| 2m \le z \le L,\ 0 \le x \le z-2m,\ x+m \le y \le z-m\} ## or ##\{ (x,y,z)| 2m \le y \le L,\ 0 \le x \le y-2m,\ x+m \le z \le y-m\}##
etc.

There are 3! = 6 such mutually exclusive sets.

Let ##a_1< a_2<a_3## be one ordering of the variables ##(x,y,z)##.

##\int_{a_3=2m}^{a_3=L} \int_{a_1=0}^{a_1=a_3-2m} \int_{a_2=a_1 + m}^{a_2=a_3-m} \frac {1}{L^3} \ da_2 \ da_1\ da_3##

## =\frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \int_{a_1=0}^{a_1=a_3-2m} (a_3 - a_1 - 2m)\ da_1\ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L}( \ |_{a_1=0}^{a_1=a_3-2m} ( a_3 - 2m)a_1 - \frac{a_1^2}{2} )\ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L}( (a_3 - 2m)^2 - \frac{ (a^3 - 2m)^2}{2} ) \ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \frac{ (a^3 - 2m)^2}{2} \ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \frac{ (a_3^2 - 4ma_3 + 4m^2)}{2} \ da_3 ##

## = \frac{1}{L^3} |_{a_3=2m}^{a_3=L} \frac {a_3^3}{6} - ma_3^2 + 2 m^2 a_3 ) ##

## = \frac{1}{L^3} ( \frac{ L^3}{6} - mL^2 + 2 m^2 L - \frac{8m^3}{6} + 4m^3 - 4m^3)##

## = \frac{1}{6} - \frac{m}{L} + 2 ( \frac{m}{L})^2 - \frac{8 (\frac{m}{L})^3}{6} ##

Let ##\frac{m}{L} = r#### = \frac{1}{6} - r + 2r^2 - \frac{8r^3}{6}##

Since there are 6 = 3! orderings of the three variables ##x,y,z##, the total probability is 6 times the above result which is ##1 - 6r + 12r^2 -8 r^3 = T(r) ##

Checks:

As ## r \rightarrow 0 ##, ## T(r) \rightarrow 1##.

As ## r \rightarrow 1/2 ##, ##T(r) \rightarrow 0 ##. (A segment of length L containing 3 equally spaced points with distance m between them must be at least 2m long )

Calculating Probability of N Points on a Line Being Within Given Distance

1. What is the formula for calculating probability of N points on a line being within a given distance?

2. How do you determine the value of N for this calculation?

3. Can this formula be applied to any type of line?

4. What does the value of P represent in this formula?

5. How can this formula be useful in real-world applications?

Similar threads

Hot Threads

Recent Insights