Computing the Density of a Point Cloud

Click For Summary

Discussion Overview

The discussion revolves around methods for computing the density of a point cloud derived from molecular dynamics simulations of silica fracture. Participants explore various mathematical approaches to determine regions of high bond breakage, which may indicate crack formation, using data processed in MATLAB.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant suggests summing the inverse of the distance from a specified location to each point to compute density, questioning if there is a standard method for this computation.
  • Another proposes using the delta function to represent point locations and integrate to find density, but later expresses confusion about the resulting output being a single value rather than a function of location.
  • A different participant critiques the average of inverse distance as potentially misleading, advocating for a weighted sum instead.
  • One participant mentions the use of a nearest neighbors method to count points within a cutoff distance, noting that varying the cutoff affects the granularity of the density distribution.
  • Another suggests a coarse-grained approach using volume elements to avoid issues with zero distance in calculations.
  • Several participants discuss the idea of convolving delta functions with a Gaussian to create a smoothed density function, emphasizing the benefits of avoiding spikes from individual points.
  • One participant expresses uncertainty about the Gaussian approach but recognizes its potential for providing a continuous density function.
  • Another mentions that adjusting the Gaussian width based on local density is a standard approach, although they are unsure of its implementation in MATLAB.
  • One participant references the approximation of delta distributions by Gaussian functions, suggesting that this could be a viable method for density representation.

Areas of Agreement / Disagreement

Participants express a variety of methods for computing density, with no consensus on a single approach. There are differing opinions on the effectiveness and physical meaning of the proposed methods, indicating ongoing debate and exploration of the topic.

Contextual Notes

Participants highlight limitations such as the potential for misleading results when using certain functions (e.g., inverse distance) and the challenges of defining appropriate cutoff distances or Gaussian widths. The discussion reflects a range of assumptions and conditions that may affect the outcomes of the proposed methods.

person123
Messages
326
Reaction score
52
TL;DR
I have a point cloud representing broken bonds and I would like to represent the density of the cloud at some location in space.
Hi. For some background, I am running molecular dynamics simulations of silica fracture in LAMMPS. Each point represents the location of a broken bond. I would like to find regions where many bonds are breaking, which I speculate would be locations of crack formation. These computations are being done on MATLAB using data from the LAMMPS simulation.

I could come up with a way to compute the density. For example, I could sum the inverse of the distance from a specified location and each point. The sum would represent the density at that specified location. However, I was wondering if there was some standard and simple way of computing density of a point cloud, or one which would be particularly relevant considering the application. Thank you!
 
  • Like
Likes   Reactions: Delta2
Physics news on Phys.org
How about using ##\delta## function ?
\rho=\sum_i \delta(x-x_i)\delta(y-y_i)\delta(z-z_i)
the average of inverse of the distance from Origin is
\sum_i \int dx \int dy \int dz \frac{\delta(x-x_i)\delta(y-y_i)\delta(z-z_i)}{\sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}}
where ##(x_i,y_i,z_i)## is position of i-th fracture.
 
I don't think average of inverse distance would be physically meaningful because it wouldn't take into account the number of points (if there was one broken bond very close to the selected location, it would be represented as very dense location even though there would be essentially no crack formation). It might be more reasonable to just take the weighted sum instead. Wouldn't the equation just simplify to this?
$$\sum_i \frac{1}{\sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}}$$
I'm a bit confused about the integrals. I'm assuming they are definite integrals which would (at least in theory) integrate from ##-\infty## to ##\infty## so the ##\delta## function would be a real value. However, wouldn't this then return a single value as opposed to a function of ##<x,y,z>## (the specified location)?
 
I have scarce knowledge of the physics of your system, so the quantity of your interested in general might be
\frac{1}{V}\int\int\int_V dxdydz f(\ \sqrt{(x-x_i)^2+(y-y_i)^2+(z-z_i)^2}\ )
where (x,y,z) is the coordinate of the point in system volume V and f is function of your interest, e.g. inverse of the distance of the point and point of broken bond.
 
I realized that one issue with using a function with an asymptote like the inverse function is if the selected location happens to be very close to one of the points, the value can become arbitrarily large even if there's just a single point.

I tried using a nearest neighbors method, computing the number of points which are within a certain cutoff distance from the selected location. Changing the cutoff changes how fine grained the density distribution is. This seems to work reasonably well.
 
If your system is lattice-like, the integration would be replaced with sum i.e.
\frac{1}{V}\int\int\int_V dxdydz \rightarrow \frac{v}{V}\sum_{(l,m,n)\ \ \neq(i,k,j)}
where summation is done on (l,m,n) which denote all the cell, (i,k,j) denotes the lattice point of fracture and v is volume of a cell. We can avoid zero distance so infinity here.
 
I'm still a bit confused with the expressions you're writing. I want a computation which has the coordinate of some location in space as an input and the density at that location as an output. If it's a function, it should be a function of that location. As far as I can tell what you're writing would give a single number instead.
 
I misunderstood that you needed a value for a system. So let us go back to #2 density of fracture
\rho(x,y,z)=\sum_{i} \delta(x-x_i)\delta(y-y_i)\delta(z-z_i)
Dividing space with cells numbered j, coarse grained fracture number density of the j-th cell whose volume is v_j,
\rho_j=\frac{1}{v_j}\int_{v_j} dv_j \sum_{i} \delta(x-x_i)\delta(y-y_i)\delta(z-z_i)
might work for your evaluation avoiding zero-distance.
 
What if we represent each point by an impulse (##\delta##) as discussed above, but then you convolve (smear out) these impulses with a fairly wide gaussian -- wide enough to span a large number of points? This would give you a smoothed out function that would reflect the neighborhood density, but it won't get kicked up by individual points.

Simply put, a kind of 3-D running average with gaussian window. But since each data point has the same magnitude, the running average could easily be scaled to tell us the density.
 
  • Wow
Likes   Reactions: person123
  • #10
Swamp Thing said:
What if we represent each point by an impulse (##\delta##) as discussed above, but then you convolve (smear out) these impulses with a fairly wide gaussian -- wide enough to span a large number of points? This would give you a smoothed out function that would reflect the neighborhood density, but it won't get kicked up by individual points.

Simply put, a kind of 3-D running average with gaussian window. But since each data point has the same magnitude, the running average could easily be scaled to tell us the density.
I'm not sure if I fully understand, but I take it that I would represent each data point with a gaussian, then take the sum (scaling it to find the density). This would give a function representing the density at each point in space. Here's what I'm imagining in one dimension:
gaussian.jpg

I imagine this would work very well, and it wouldn't blow up around the point. It would also be more continuous than my nearest neighbor approach.
 
  • #11
person123 said:
Here's what I'm imagining in one dimension:

Yes, that's the idea.
 
  • #12
OK, I think this would be the calculation then. I'm going to do it for the 2-dimensional case. ##s## will be a parameter which can be adjusted to change how fine-grained the distribution is.

First I would compute the integral of the function so I can normalize it:
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{\frac{-x^2-y^2}{s^2}}dxdy=\pi s^2$$

Then, the density would be the following:
$$d(x,y)=\frac{1}{\pi s^2}\sum_i e^{\frac{-(x-x_i)^2-(y-y_i)^2}{s^2}}$$

I think my two approaches for calculating density are:
  • this method, representing each point with a gaussian
  • finding the number of points within a certain range of the selected location
This might be an overly vague question, but which more method would be more accepted/ more physically meaningful/ more straight-forward/ less likely to run into issues/ overall better? I think an issue with using the range method is there are jumps when points come into range, which show up as faint rings around the point. However, I think what the value represents is a bit more straight-forward than for the gaussian method.
 
  • #13
There are tools that do this smearing as it is a standard approach, and they also choose the width of the Gaussian for each point based on the local density of points (denser points -> smaller width to be more sensitive to local changes in the density). I don't know if the latter is done recursively or with other methods. You want the width to be larger than the typical separation so you are not too sensitive to individual entries, but it shouldn't be too much larger, otherwise you are blurring the overall distribution.
 
  • Like
Likes   Reactions: person123
  • #14
That's good to know that it's a standard approach. I could definitely write MATLAB code to do this (assuming my equation is correct). Adjusting the Gaussian width depending on local density seems to be a lot more difficult; I'm curious if MATLAB has any built-in functions for this (I couldn't find anything in the documentation).
 
  • #15
Don't know about MATLAB, I know ROOT has something. I don't know how it is called, but I have seen colleagues using it long ago.
 
  • #16
Since

\delta (\alpha x)=\lim _{a\to 0}{\frac {1}{|a|{\sqrt {\pi }}}}e^{-(\alpha x/a)^{2}}

Precise density of delta distribution is approximately represented by Gaussian. You may use Gaussian with width parameter a and finally adjust it.
 
Last edited:
  • #17
anuttarasammyak said:
Since

\delta (\alpha x)=\lim _{a\to 0}{\frac {1}{|a|{\sqrt {\pi }}}}e^{-(\alpha x/a)^{2}}

Precise density of delta distribution is approximately represented by Gaussian. You may use Gaussian with width parameter a and finally adjust it.
In the 2D case, would ##\frac{1}{|a|\sqrt{\pi}}## be replaced by ##\frac{1}{a^2\pi}## like in the equation I wrote above?

I wrote it on MATLAB, and it seems to work very well, clearly indicating where the crack formed. I think it will also be much easier to analyze than the number of neighbors function because it's continuous. If this is a standard approach, I think it definitely makes sense to use it for analysis.
 
Last edited:
  • Like
Likes   Reactions: anuttarasammyak
  • #18
person123 said:
In the 2D case, would ##\frac{1}{|a|\sqrt{\pi}}## be replaced by ##\frac{1}{a^2\pi}## like in the equation I wrote above?
Yes, it makes sure the overall integral is one. For the relative density it doesn't matter of course (if all points use the same a).
 
  • Like
Likes   Reactions: person123
  • #19
(I think someone wrote a reply about using kernel estimation and then deleted it). I hadn't heard of that before, but I think using a normal kernel is equivalent to the "gaussian smearing" -- it's good to know the term and that it's a standard procedure. It also seems to be a built-in function on MATLAB (for 1D at least) using fitdist(data,'kernel','Kernel','normal'). I also think that the nearest neighbor approach is equivalent to a box kernel, and the inverse distance idea is the equivalent to an "inverse function kernel," which I'm guessing is never actually used because it blows up.
 

Similar threads

Replies
7
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 21 ·
Replies
21
Views
8K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 6 ·
Replies
6
Views
1K
  • · Replies 14 ·
Replies
14
Views
3K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 13 ·
Replies
13
Views
3K