Calculating Probability of N Points on a Line Being Within Given Distance

In summary, there is an analytical solution to deal with any number of points, but it requires a bit of work.
  • #1
CoNiss
1
0
TL;DR Summary
Probability of any random n points on a line being within a given distance
Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...
 
Physics news on Phys.org
  • #2
Hello @CoNiss ,

:welcome: ##\qquad## !​

Not familiar with Python, and no statistics expert -- just curious. And intrigued by the 5/6 probability of zero instances (see below).

My reasoning:
With one randomly distributed point on a 1000 cm line, the probability that a second randomly distributed point is NOT within 2 cm is 0.996.
For 10 points there are 45 pairs and I think that the probability for zero instances is 0.99645=0.835, which is pretty close to 5/6.

For (exactly) one instance I would propose a probability of 0.99644 times 0.004 times 45 = 0.151 (the 45 for the number of pairs in 10 points)

This 0.151 is in the neighborhood of the 0.155 a run of your code found:

Code:
================
Monte Carlo Simulation: ( sample size  =  1000000 )

Results:
(10 points on a 1000 cm line segment)

0 instances:  833286
1 instances:  154916
2 instances:  11352
3 instances:  436
4 instances:  10
5 instances:  0
6 instances:  0
7 instances:  0
8 instances:  0

Probability of all distances (per sample of 10  points) being smaller than 2 cm:
1 - ( 833286  /  1000000 ) =  0.16671400000000003
================

But for two instances I haven't found a reasonable probability expression so far.

Interesting exercise, for sure !

##\ ##
 
  • #3
CoNiss said:
TL;DR Summary: Probability of any random n points on a line being within a given distance

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...
An idea: You could make the problem discrete by dividing the line segment into a large number of "slots", for example 10 000 or 100 000 of them.
Then you work out in how many ways you can place n points in these slots.
You have to decide whether the points are distinguishable or not and whether several points can be in the same slot.
Then you work out the same number of arrangements for a smaller number of slots to take into account that you don't want two points within a certain distance of each other. Here you also have to exclude the possibility that two points are in the same slot.
Division of the two results should give you 1 minus the probability you are looking for.
Does that sound right?
 
  • #4
CoNiss said:
TL;DR Summary: Probability of any random n points on a line being within a given distance

Probability of any random n points on a line being within a given distance

Hi,

I am a software engineer trying to solve the following problem analytically

given a line segment in cm and n random points on it
what is the probability that the distance between any 2 consecutive points on the line is less than a given minimum distance?

For Example:
n = 10 points
lineSegment = 1000 cm
minimumDistance = 2 cm

Running a Montecarlo simulation I took the following steps:
1. generate n random points
2. sort the points by order of smaller first
3. calculate the distances between consecutive points
4. count how many distances are smaller or equal to the minimumDistance.

Link to Python Montecarlo simulation on replit:
https://replit.com/@NissimCohen/MonteCarlo1#main.py

Is there an analytical solution to deal with any n number of points?

Thanks...
There's a homework problem on here to calculate the expected value of the shortest segment for ##n## random points. And you can find analytic solutions elsewhere online.

You might be able to modify some of Techniques involved to solve your problem.
 
  • #5
I think carrying out the plan indicated by @Philip Koeck requires finding a convenient expression for the total number of iterations of some nested loops where the loop variables have a simple dependence.

To elaborate:
Approximate a continuous segment by dividing it into L equal intervals ("bins"). Define a slection of a point to be a selection of an entire interval, The point's location is given by its interval number, numbering from left to right on the x-axis of a graph. So the smallest possible location of a "point" is 1 and the largest possible location is L.

Assume that we use a sufficiently large number of intervals L so that the probability of two points being chosen in the same interval is approximately zero.

Let the distance between two points at locations j < k be k - j. So for example, adjacent intervals are a distance of 1 apart and a point located at k is a distance M from the point located at k+M.

With those assumptions, the number of possible arrangements of N points in L bins is given by the binomial coefficient ##\binom{N}{L} = \frac{N!}{L! (N-L)!}##. Each arrangement has equal probability by the assumptions of the problem.

Counting the number of these arrangements that do not have two points smaller than the (integer) distance M between them can (in theory) be done by writing a program with nested loops.

For example for N = 3 points, pseudo code for the program looks like:

(a3 is the location of the rightmost point and a1 is the location of the leftmost point )

Count=0
For a3 = 1+2*M to a3= L
For a1 = 1 to a3-2*M,
For a2 = a1+M to a2 = a3-M
Count = Count+1
End for a2
End For a1
End for a3

It wouldn't surprise me if people who study computation complexity or combinatorics have developed formulas for finding the Count.

To solve the whole problem,we need to take the limit of the ratio of Count to the total number of possible arrangements as the number of bins approaches infinity. One hopes that Sterlings approximation to the factorial function will allow that. However, the error between K! and its Sterlings approximation approaches infinity as K approaches infinity. So the expression we are dealing with needs to involve ratios of factorials. If the expression for Count involves sums of factorials, things could get tricky.
 
  • Like
Likes Philip Koeck
  • #6
(I now realize) the analysis implied by the previous post doesn't require factorials.

The proposed solution is ##\lim_{L \rightarrow \infty} \frac {Count} {\binom{L}{N}}##

The denominator ##\binom{L}{N}= (L)(L-1)...(L-N+1)/N! ## is a polynomial of degree ##N## with the highest order term ##1L^N/N!##.

The numerator, Count, can be computed as nested summations. These evaluate to a polynomial of degree ##N## in ##L##.

The limit is the ratio of the coefficients of ##L^N## in the two polynomials.

To illustrate this in the case N=3 of the previous post.

A program loop "from A to B" is evaluated (B-A+1) times. So the loop on a2 increments Count by ## (a_3-M)- (a_1+M) + 1 = a_3-a_1- 2M+1 ##

The final value of Count is given by the nested summations:
##\sum_{a_3=2M}^{a_3=L} ( \sum_{a_1=1}^{a_1=a3-2M} (a_3 - a_1 - 2M + 1))##

To retain the approximation of a length m by a number of intervals M, we also must let M approach infinity as L does. We keep the ratio M/L as close as possible to the ratio of the minimum distance to the length of the line segment. Let this ratio be ##r##. So, in the limit, we can replace M by Lr. With that substitution, the polynomial expression for Count will involve only the variable L and constants.

For example the innermost sum above is
##\sum_{a_1=1}^{a_1=a_3-2M} (a_3-a_1-2M+1)##

## = (a_3-2M-1+1)(a_3-2M+1) - \sum_{a_1=1}^{a_1=a3-2M} a_1 ##

(apply the summation formula ## \sum_{k=1}^n k = (k)(k+1)/2## )

##= (a_3-2M)(a_3-2M + 1) - (a_3-2M)(a_3-2M+1)/2) ##

##= (1/2) (a_3-2M)(a_3-2M+1) ##

The summation on ##a_3## is thus
## \sum_{a_3=2M}^{a_3=L} ((1/2)(a_3-2M)(a_3-2M+1))##

##= (1/2) \sum_{a_3=2M}^{a_3=L} ((a_3)^2 - 4a_3M + 4M^2 + a_3 - 2M)##

## = (1/2) (L -2M+1) (4M^2 - 2M) ##
##\ \ + (1/2)\sum_{a3=2M}^{a_3=L} (a_3)^2 ##
##\ \ + (1/2)(1 - 4M) \sum_{a_3=2M}^{a3=L} a_3 ##

At the moment, I don't have the energy to work this further! However, I hope this illustrates how the formula for Count will have its highest order term a multiple of ##L^3## after applying the summation formula for ##\sum K^2## and substituting ##M = rL##
 
Last edited:
  • #7
Isn't the same as "a stick is broken into N random pieces"? Don't textbooks have this for small N (exact) and large N (approximations, and possibly exact)?
 
  • #8
It turns out that the continuous approach leads to simpler algebra than the approximation by bins. Let the line segment be ##[0,L] ## and the desired spacing be ##m##. For example, take the case of n = 3 points.

The joint probability density of 3 points ##(x,y,z)## in the line segment is ##f(x,y,z) = \frac{1}{L^3}##

Integrating this density over the set where ##(x,y,z)## are at least ##m## apart requires doing integrals over sets such as ## \{ (x,y,z)| 2m \le z \le L,\ 0 \le x \le z-2m,\ x+m \le y \le z-m\} ## or ##\{ (x,y,z)| 2m \le y \le L,\ 0 \le x \le y-2m,\ x+m \le z \le y-m\}##
etc.

There are 3! = 6 such mutually exclusive sets.

Let ##a_1< a_2<a_3## be one ordering of the variables ##(x,y,z)##.

##\int_{a_3=2m}^{a_3=L} \int_{a_1=0}^{a_1=a_3-2m} \int_{a_2=a_1 + m}^{a_2=a_3-m} \frac {1}{L^3} \ da_2 \ da_1\ da_3##

## =\frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \int_{a_1=0}^{a_1=a_3-2m} (a_3 - a_1 - 2m)\ da_1\ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L}( \ |_{a_1=0}^{a_1=a_3-2m} ( a_3 - 2m)a_1 - \frac{a_1^2}{2} )\ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L}( (a_3 - 2m)^2 - \frac{ (a^3 - 2m)^2}{2} ) \ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \frac{ (a^3 - 2m)^2}{2} \ da_3 ##

## = \frac{1}{L^3} \int_{a_3=2m}^{a_3=L} \frac{ (a_3^2 - 4ma_3 + 4m^2)}{2} \ da_3 ##

## = \frac{1}{L^3} |_{a_3=2m}^{a_3=L} \frac {a_3^3}{6} - ma_3^2 + 2 m^2 a_3 ) ##

## = \frac{1}{L^3} ( \frac{ L^3}{6} - mL^2 + 2 m^2 L - \frac{8m^3}{6} + 4m^3 - 4m^3)##

## = \frac{1}{6} - \frac{m}{L} + 2 ( \frac{m}{L})^2 - \frac{8 (\frac{m}{L})^3}{6} ##

Let ##\frac{m}{L} = r#### = \frac{1}{6} - r + 2r^2 - \frac{8r^3}{6}##

Since there are 6 = 3! orderings of the three variables ##x,y,z##, the total probability is 6 times the above result which is ##1 - 6r + 12r^2 -8 r^3 = T(r) ##

Checks:

As ## r \rightarrow 0 ##, ## T(r) \rightarrow 1##.

As ## r \rightarrow 1/2 ##, ##T(r) \rightarrow 0 ##. (A segment of length L containing 3 equally spaced points with distance m between them must be at least 2m long )
 

1. What is the formula for calculating probability of N points on a line being within a given distance?

The formula for calculating this probability is P = (N choose 2) * (d/L)^(N-1), where N is the number of points, d is the given distance, and L is the length of the line.

2. How do you determine the value of N for this calculation?

The value of N can be determined by counting the number of points on the line or by knowing the number of points beforehand.

3. Can this formula be applied to any type of line?

Yes, this formula can be applied to any type of line, as long as the length of the line is known and the points are evenly distributed along the line.

4. What does the value of P represent in this formula?

The value of P represents the probability of N points on a line being within the given distance d.

5. How can this formula be useful in real-world applications?

This formula can be useful in various fields such as engineering, statistics, and physics, where the probability of points falling within a certain distance on a line is important for analyzing data and making predictions.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
787
  • Precalculus Mathematics Homework Help
Replies
2
Views
1K
  • Calculus and Beyond Homework Help
Replies
4
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
2K
Replies
2
Views
961
  • MATLAB, Maple, Mathematica, LaTeX
Replies
14
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
14
Views
2K
Replies
96
Views
9K
Back
Top