Random set of N points in a unit disc, what is the average nearest distance

Spinnor · Aug 24, 2018

Pick a random set of N points from the unit disc. Calculate the distance between all pairs of points and call the smallest value r. Do this calculation for many such sets. Please give me a hint how to estimate what the average value of r is. I guess a computer program could quickly come up with an accurate estimate as long as N is not too big?

Thanks!

mfb · Aug 24, 2018

If you don't need an analytic solution (and I don't know if one exists for general N) a computer simulation is a nice approach.

For very large N there are good analytic approximations.

Spinnor · Aug 24, 2018

mfb said:

For very large N there are good analytic approximations.

Would 115 billion be considered a large number in this case?

I guess my 2 dimensional problem simulated on a computer could be simplified by doing the 1 dimensional problem on a computer, N random points on the unit interval, coming up with r and then inferring that for the 2 dimensional case r should be approximately (or exactly?) √2 times r for the 1D problem?

Thanks!

EngWiPy · Aug 24, 2018

Spinnor said:

Would 115 billion be considered a large number in this case?

I guess my 2 dimensional problem simulated on a computer could be simplified by doing the 1 dimensional problem on a computer, N random points on the unit interval, coming up with r and then inferring that for the 2 dimensional case r should be approximately (or exactly?) √2 times r for the 1D problem?

Thanks!

You need to increase the number of repetitions to get more accurate results, not ##N##. What you need to do is first generate ##N## random points such that the distance from the center, say (0,0), is less than or equal to 1. This means that ##\sqrt{r_1^2+r_2^2}\leq 1##, where ##\mathbf{r}=(r_1,\,r_2)## is a point in the unit disk. I guess one way to do this is to first generate ##N## random variables in the interval ##[0, 1]## that represent ##r_1##, and then generate another ##N## random variables in ##[0,\sqrt{1-r_1^2}]## that represent ##r_2##. Now you have ##N## random points in the unit disk. Then you find the pairwise distances between all points, select the minimum, and store it. Repeat this process, say, ##K=10^5## times. At the end, add up the minimum distances from all trials and divide by ##K##. This way you get the average minimum distance. Maybe there is a simpler way, but this is what comes to mind now.
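The procedure described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function names, the seed, and the example values of `n` and `trials` are assumptions chosen for illustration. It draws points by rejection sampling from the enclosing square instead of the two-step recipe for ##r_1## and ##r_2##, because that recipe only produces points in the first quadrant and does not give a uniform density over the disk.

```python
import math
import random


def random_point_in_disc(rng):
    """Uniform point in the unit disc, by rejection sampling from [-1, 1]^2."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y


def min_pair_distance(points):
    """Smallest distance over all distinct pairs (brute force, O(N^2))."""
    best = math.inf
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            best = min(best, math.hypot(xi - xj, yi - yj))
    return best


def mean_min_distance(n, trials, seed=0):
    """Average minimum pair distance over `trials` independent sets of n points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        points = [random_point_in_disc(rng) for _ in range(n)]
        total += min_pair_distance(points)
    return total / trials


if __name__ == "__main__":
    # Example values only; increase `trials` (K in the post) for a tighter estimate.
    print(mean_min_distance(n=20, trials=2000))
```

The brute-force pair loop costs on the order of ##N^2## operations per trial, so this is only practical for modest ##N## (it also needs ##N \geq 2## to return a finite value).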

mfb · Aug 24, 2018

Spinnor said:

Would 115 billion be considered a large number in this case?

Yes.
Keep in mind that the distribution of what you want to study is not uniform.
Also keep in mind that (a) protons are not point particles, (b) the bunches collide with a non-zero crossing angle and (c) there are many collisions per bunch crossing.

Spinnor said:

I guess my 2 dimensional problem simulated on a computer could be simplified by doing the 1 dimensional problem on a computer, N random points on the unit interval, coming up with r and then inferring that for the 2 dimensional case r should be approximately (or exactly?) √2 times r for the 1D problem?

For a uniform distribution that should be a reasonable approximation.

Spinnor · Aug 24, 2018

Thank you all for your corrections. So if I used the numbers for focused beam bunches, bunch diameter of order several millimeters and a bunch number of a hundred billion protons should give us a r minimum of order the proton radius? Protons need to be pretty close to interact strongly?

Thank you for your help. Need to look into free easy to use math calculating software.

mfb · Aug 24, 2018

Spinnor said:

bunch diameter of order several millimeters

That is way too much.

Spinnor said:

should give us a r minimum of order the proton radius?

No, as they don't have to overlap to interact, and they are not pool balls with a single sharp border anyway.

We have much more accurate numbers for the proton radius from other measurements. Overall LHC cross sections don't help with that.

Spinnor · Aug 25, 2018

So back to the original math question. I think I have a solution and would appreciate flaws in my approximation pointed out.

If we have N points, we then have (N-1)N/2 distinct pairs of points. Focus on one pair of points and call them a and b. If I give you the coordinates of a, then because b is random it could be anywhere in the unit disc. Divide the unit disc into annular regions centered on point a. These annular regions do not form complete rings unless point a is at the center of the unit disc; consider this an error in my approximation, a factor I think will be between approximately 1/2 and 2 (my brain is too fuzzy to figure out whether the error factor is larger or smaller than 1). Call the distance from a to the random point b r_i. The probability that the random point b is in a particular annular region is proportional to the area of that region.

Considering all pairs of points, there will be (N-1)N/2 values of r_i. Randomly distribute (N-1)N/2 points about point a. The density of points is the number of points divided by the area of the unit disc, [(N-1)N/2]/π; call that ρ. Use this density to calculate the minimum value of r, r_min, such that there is likely one point in a circular region centered on point a with radius r_min. We want,

Density of points times area of circular region of radius r_min = 1

[(N-1)N/2]/π × π(r_min)^2 = 1 or,

(r_min)^2 = 2/[(N-1)N] ≅ 2/N^2 for large N

r_min ≅ √2/N

Extrapolating, in D dimensions r_min ≅ √D/N Edit, unless I can come up with an argument for the D dimensional case let's forget it for now.

Edit, the factor of √2 should be 2, simple math error.

So at this point I should step back and ask myself, should the answer scale as 1/N. Maybe you smarter guys out there can argue yes or no I do not know.

Thanks for your help!

Spinnor · Aug 25, 2018

mfb said:

That is way too much.

"With transverse dimensions of the order a mm, but in a collider as small as possible at the collision point (LHC - 16 microns fully squeezed)"

From https://www.google.com/search?q=Wha...ome..69i57.17199j0j8&sourceid=chrome&ie=UTF-8

Thanks again for the corrections.

mfb · Aug 25, 2018

##r_{min} = \frac{\sqrt{2}}{N}## for a unit disk looks good. Edge effects are negligible, so the shape (disk) doesn't matter.
We can generalize it to ##r_{min} = \frac{\sqrt{2A}}{\sqrt{\pi} N}## for an area A that is not too fractured.

In d dimensions a point will have an expected ##c x^d N## points within a radius x where c is some numerical constant depending on the volume of the unit ball in d dimensions. To estimate the r_min let this value be 2/N: ##\frac{2}{N} = c r_{min}^d N## or ##r_{min}=\left(\frac{2}{cN^2}\right)^{1/d}##.
You can see the curse of dimensionality here. ##r_{min} \propto N^{-2/d}##. For large d the radius will stay quite large even for large N.
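The scaling above can be evaluated numerically. The following is a rough sketch under stated assumptions: the function names are hypothetical, and the constant ##c## is taken to be the volume of the unit ball in ##d## dimensions divided by the total volume of the region, which is one reading of "some numerical constant depending on the volume of the unit ball". Edge effects are ignored, as in the estimate itself.

```python
import math


def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def r_min_estimate(n, d, region_volume):
    """Heuristic r_min = (2 / (c N^2))^(1/d), with c = V_d / region_volume (assumed)."""
    c = unit_ball_volume(d) / region_volume
    return (2.0 / (c * n * n)) ** (1.0 / d)


if __name__ == "__main__":
    n = 10**6  # example value only
    for d in (1, 2, 3, 10, 50):
        # Region taken to be the unit ball itself, so c = 1.
        print(d, r_min_estimate(n, d, unit_ball_volume(d)))
```

For ##d = 2## and a region of area ##\pi## this reduces to ##\sqrt{2}/N##, and for a general area ##A## to ##\sqrt{2A}/(\sqrt{\pi} N)##, matching the two-dimensional formulas above.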

WWGD · Aug 26, 2018

As a general result, assuming each point comes from the same distribution, i.e., the points ##p_1,...,p_N## are IID RVs, then the distribution of the minimum

##\min(p_1,p_2,\ldots,p_N)## is given by ##P(\min(p_1,\ldots,p_N)<y)= 1- (1-\phi(y))^N##, where ##\phi## is the common cdf.
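As a quick illustration of this formula on its own terms, here is a small Python check for the simplest case I could think of: ##N## independent draws from Uniform(0, 1), where the cdf is just ##\phi(y) = y##. The function names, seed, and sample sizes are assumptions for the sketch; it only checks the minimum of IID scalar variables and says nothing about whether pair distances in the disc satisfy the independence assumption.

```python
import random


def min_cdf_formula(y, n):
    """P(min < y) = 1 - (1 - F(y))^n for n IID Uniform(0, 1) variables, F(y) = y."""
    return 1.0 - (1.0 - y) ** n


def min_cdf_empirical(y, n, trials, seed=0):
    """Fraction of trials in which the minimum of n uniform draws falls below y."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if min(rng.random() for _ in range(n)) < y
    )
    return hits / trials


if __name__ == "__main__":
    y, n = 0.1, 5  # example values only
    print(min_cdf_formula(y, n), min_cdf_empirical(y, n, trials=20000))
```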

Spinnor · Aug 27, 2018

WWGD said:

As a general result, assuming each point comes from the same distribution, i.e., the points ##p_1,...,p_N## are IID RVs, then the distribution of the minimum

##Min(p_1,p_2,...,p_N)## is given by ##P(Min(p_1,...,p_N)<y)= 1- (1-\phi)^N##, where ##\phi## is the cdf.

Does the above result agree with that of mfb, I don't see it right now? I will start here, Cumulative distribution function,
https://en.wikipedia.org/wiki/Cumulative_distribution_function

Can you point to a link or translate "IID RVs"?

Thanks!

Spinnor · Aug 27, 2018

Spinnor said:

Can you point to a link or translate "IID RVs"?

Would that be 2 dimensional random values?

WWGD · Aug 27, 2018

Spinnor said:

Does the above result agree with that of mfb, I don't see it right now? I will start here, Cumulative distribution function,
https://en.wikipedia.org/wiki/Cumulative_distribution_function

Can you point to a link or translate "IID RVs"?

Thanks!

Sorry for my laziness :). This stands for Independent, Identically Distributed Random Variables, a condition assumed in some results like the Central Limit Theorem and others.

mfb · Aug 28, 2018

They are not independent - we always have two points with the smallest distance to a neighbor. I would expect the approach with independent distributions to miss a factor sqrt(2) in the result.

ftr · Sep 7, 2018

http://mathworld.wolfram.com/DiskLinePicking.html
