I Computing the Density of a Point Cloud

AI Thread Summary
The discussion focuses on computing the density of a point cloud derived from molecular dynamics simulations of silica fracture in LAMMPS. Participants explore various methods for calculating density, including the use of delta functions and Gaussian convolution to avoid issues with singularities and to achieve a smoother density representation. The nearest neighbors method is also mentioned, but concerns arise regarding its discontinuities. A consensus emerges that Gaussian smearing is a standard and effective approach for density estimation, particularly in MATLAB, where built-in functions for kernel density estimation are available. Overall, the Gaussian method is favored for its ability to provide continuous and interpretable density distributions.
person123
Messages
326
Reaction score
52
TL;DR Summary
I have a point cloud representing broken bonds and I would like to represent the density of the cloud at some location in space.
Hi. For some background, I am running molecular dynamics simulations of silica fracture in LAMMPS. Each point represents the location of a broken bond. I would like to find regions where many bonds are breaking, which I speculate would be locations of crack formation. These computations are being done on MATLAB using data from the LAMMPS simulation.

I could come up with a way to compute the density. For example, I could sum the inverse of the distance from a specified location and each point. The sum would represent the density at that specified location. However, I was wondering if there was some standard and simple way of computing density of a point cloud, or one which would be particularly relevant considering the application. Thank you!
 
  • Like
Likes Delta2
Mathematics news on Phys.org
How about using ##\delta## function ?
\rho=\sum_i \delta(x-x_i)\delta(y-y_i)\delta(z-z_i)
the average of inverse of the distance from Origin is
\sum_i \int dx \int dy \int dz \frac{\delta(x-x_i)\delta(y-y_i)\delta(z-z_i)}{\sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}}
where ##(x_i,y_i,z_i)## is position of i-th fracture.
 
I don't think average of inverse distance would be physically meaningful because it wouldn't take into account the number of points (if there was one broken bond very close to the selected location, it would be represented as very dense location even though there would be essentially no crack formation). It might be more reasonable to just take the weighted sum instead. Wouldn't the equation just simplify to this?
$$\sum_i \frac{1}{\sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}}$$
I'm a bit confused about the integrals. I'm assuming they are definite integrals which would (at least in theory) integrate from ##-\infty## to ##\infty## so the ##\delta## function would be a real value. However, wouldn't this then return a single value as opposed to a function of ##<x,y,z>## (the specified location)?
 
I have scarce knowledge of the physics of your system, so the quantity of your interested in general might be
\frac{1}{V}\int\int\int_V dxdydz f(\ \sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}\ )
where (x,y,z) is the coordinate of the point in system volume V and f is function of your interest, e.g. inverse of the distance of the point and point of broken bond.
 
I realized that one issue with using a function with an asymptote like the inverse function is if the selected location happens to be very close to one of the points, the value can become arbitrarily large even if there's just a single point.

I tried using a nearest neighbors method, computing the number of points which are within a certain cutoff distance from the selected location. Changing the cutoff changes how fine grained the density distribution is. This seems to work reasonably well.
 
If your system is lattice-like, the integration would be replaced with sum i.e.
\frac{1}{V}\int\int\int_V dxdydz \rightarrow \frac{v}{V}\sum_{(l,m,n)\ \ \neq(i,k,j)}
where summation is done on (l,m,n) which denote all the cell, (i,k,j) denotes the lattice point of fracture and v is volume of a cell. We can avoid zero distance so infinity here.
 
I'm still a bit confused with the expressions you're writing. I want a computation which has the coordinate of some location in space as an input and the density at that location as an output. If it's a function, it should be a function of that location. As far as I can tell what you're writing would give a single number instead.
 
I misunderstood that you needed a value for a system. So let us go back to #2 density of fracture
\rho(x,y,z)=\sum_{i} \delta(x-x_i)\delta(y-y_i)\delta(z-z_i)
Dividing space with cells numbered j, coarse grained fracture number density of the j-th cell whose volume is v_j,
\rho_j=\frac{1}{v_j}\int_{v_j} dv_j \sum_{i} \delta(x-x_i)\delta(y-y_i)\delta(z-z_i)
might work for your evaluation avoiding zero-distance.
 
What if we represent each point by an impulse (##\delta##) as discussed above, but then you convolve (smear out) these impulses with a fairly wide gaussian -- wide enough to span a large number of points? This would give you a smoothed out function that would reflect the neighborhood density, but it won't get kicked up by individual points.

Simply put, a kind of 3-D running average with gaussian window. But since each data point has the same magnitude, the running average could easily be scaled to tell us the density.
 
  • Wow
Likes person123
  • #10
Swamp Thing said:
What if we represent each point by an impulse (##\delta##) as discussed above, but then you convolve (smear out) these impulses with a fairly wide gaussian -- wide enough to span a large number of points? This would give you a smoothed out function that would reflect the neighborhood density, but it won't get kicked up by individual points.

Simply put, a kind of 3-D running average with gaussian window. But since each data point has the same magnitude, the running average could easily be scaled to tell us the density.
I'm not sure if I fully understand, but I take it that I would represent each data point with a gaussian, then take the sum (scaling it to find the density). This would give a function representing the density at each point in space. Here's what I'm imagining in one dimension:
gaussian.jpg

I imagine this would work very well, and it wouldn't blow up around the point. It would also be more continuous than my nearest neighbor approach.
 
  • #11
person123 said:
Here's what I'm imagining in one dimension:

Yes, that's the idea.
 
  • #12
OK, I think this would be the calculation then. I'm going to do it for the 2-dimensional case. ##s## will be a parameter which can be adjusted to change how fine-grained the distribution is.

First I would compute the integral of the function so I can normalize it:
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{\frac{-x^2-y^2}{s^2}}dxdy=\pi s^2$$

Then, the density would be the following:
$$d(x,y)=\frac{1}{\pi s^2}\sum_i e^{\frac{-(x-x_i)^2-(y-y_i)^2}{s^2}}$$

I think my two approaches for calculating density are:
  • this method, representing each point with a gaussian
  • finding the number of points within a certain range of the selected location
This might be an overly vague question, but which more method would be more accepted/ more physically meaningful/ more straight-forward/ less likely to run into issues/ overall better? I think an issue with using the range method is there are jumps when points come into range, which show up as faint rings around the point. However, I think what the value represents is a bit more straight-forward than for the gaussian method.
 
  • #13
There are tools that do this smearing as it is a standard approach, and they also choose the width of the Gaussian for each point based on the local density of points (denser points -> smaller width to be more sensitive to local changes in the density). I don't know if the latter is done recursively or with other methods. You want the width to be larger than the typical separation so you are not too sensitive to individual entries, but it shouldn't be too much larger, otherwise you are blurring the overall distribution.
 
  • Like
Likes person123
  • #14
That's good to know that it's a standard approach. I could definitely write MATLAB code to do this (assuming my equation is correct). Adjusting the Gaussian width depending on local density seems to be a lot more difficult; I'm curious if MATLAB has any built-in functions for this (I couldn't find anything in the documentation).
 
  • #15
Don't know about MATLAB, I know ROOT has something. I don't know how it is called, but I have seen colleagues using it long ago.
 
  • #16
Since

\delta (\alpha x)=\lim _{a\to 0}{\frac {1}{|a|{\sqrt {\pi }}}}e^{-(\alpha x/a)^{2}}

Precise density of delta distribution is approximately represented by Gaussian. You may use Gaussian with width parameter a and finally adjust it.
 
Last edited:
  • #17
anuttarasammyak said:
Since

\delta (\alpha x)=\lim _{a\to 0}{\frac {1}{|a|{\sqrt {\pi }}}}e^{-(\alpha x/a)^{2}}

Precise density of delta distribution is approximately represented by Gaussian. You may use Gaussian with width parameter a and finally adjust it.
In the 2D case, would ##\frac{1}{|a|\sqrt{\pi}}## be replaced by ##\frac{1}{a^2\pi}## like in the equation I wrote above?

I wrote it on MATLAB, and it seems to work very well, clearly indicating where the crack formed. I think it will also be much easier to analyze than the number of neighbors function because it's continuous. If this is a standard approach, I think it definitely makes sense to use it for analysis.
 
Last edited:
  • Like
Likes anuttarasammyak
  • #18
person123 said:
In the 2D case, would ##\frac{1}{|a|\sqrt{\pi}}## be replaced by ##\frac{1}{a^2\pi}## like in the equation I wrote above?
Yes, it makes sure the overall integral is one. For the relative density it doesn't matter of course (if all points use the same a).
 
  • Like
Likes person123
  • #19
(I think someone wrote a reply about using kernel estimation and then deleted it). I hadn't heard of that before, but I think using a normal kernel is equivalent to the "gaussian smearing" -- it's good to know the term and that it's a standard procedure. It also seems to be a built-in function on MATLAB (for 1D at least) using fitdist(data,'kernel','Kernel','normal'). I also think that the nearest neighbor approach is equivalent to a box kernel, and the inverse distance idea is the equivalent to an "inverse function kernel," which I'm guessing is never actually used because it blows up.
 
Back
Top