'Distance' between MANY points ?

  • Thread starter juliette sekx
  • Start date
  • Tags
    Points
In summary: There are 39 million pairs of matrices. Trusting that you've worked out the computational cost to around 5 minutes of runtime for this calculation, this would be far too slow. I'm doing this for scores of different 8900-tuples of matrices. And then, after the 2x2 case, I will be doing 4x4, 8x8, 16x16, ... 1024x1024 ! If it would take 5 minutes to calcualate the average distance between 39 million pairs of 2x2 matrices, I can't afford to wait days to run this program on 1024x1024 matrices!Also, I think there's some redundancy in calculating the
  • #1
juliette sekx
31
0
'Distance' between MANY points ??

It's easy to calculate the Euclidean distance between two points A and B.
Between 3 points we can do the Eucliean distance between AB, AC and BC, and then take the average of the three to find the "distance" (I don't know what it's called for more than 2 points!).

But I have 8900 points, and to go about finding "how similar" the points are, in this manner, would take (8900 choose 2) = 39600550 calculations of the euclidean distance (not practical!)

Is there a good way to find the 'similarity' between a large number of points ??

One way I considered was, finding the median (centre of mass) of the object whose boundary was formed by connecting the 8900 points, and then determining what the Euclidean distance is between this 'median' and the FARTHEST point (out of the 8900) from that median.

This is not good if my 8900 points form a shape like a tight ball with one small sharp spike sticking out.

Another method would maybe be finding the Euclidean distance between the median and all 8900 other points, and averaging this (8900 calculations is much less than 39600550).

But how is this problem usually addressed ?? Can anyone please tell me what they know, or refer me to some literature ?? Thanks!
 
Physics news on Phys.org
  • #2


How many dimensions do your points occupy?
 
  • #3


Thank you for your response. The points are 2x2 matrices over the non-negative integers.
 
Last edited:
  • #4


Well, the biggest problem is to figure out just what you really want to know. If you are able to figure that out, then that might give us some ideas about what calculation to do.




Anyways, just to throw random ideas out there, I believe it's actually pretty easy to compute the average of the squared distance between pairs of points, as well as the average squared distance from their mean -- the method is to write this calculation as a sum of dot products, and then expand and reorganize te terms.
 
  • #5


Hurkyl said:
Well, the biggest problem is to figure out just what you really want to know. If you are able to figure that out, then that might give us some ideas about what calculation to do.

My calcualtions are still in their infancy right now, so I'm just trying things out. All 8900 matrices should (in theory) be exactly the same, but of course, they have small differences. I want to quantitatively describe how 'close together' all these matrices are. This way when I look at the 8900 matrices produced using a different theory, I can do the same calculation on those 8900 matrices. And the theory that produces matrices that are 'more similar' or 'closer to being all the same' is the better theory. But to do this type of comparison, I need a robust way to measure 'how similar' 8900 points are to each other!

Hurkyl said:
the method is to write this calculation as a sum of dot products, and then expand and reorganize te terms.

I don't under stand what you mean by sum of dot products here :S
To find the Euclidean distance between vectors V and U, i can do sqrt [(U-V) dot (U-V)].
But what do you mean by SUM of dot products ??
 
  • #6


juliette sekx said:
The points are 2x2 matrices over the non-negative integers.

I'm sorry, I don't understand what you mean by the above. Do you want to find the average distance between each pair of points?

Do you have any background in programming? I'd think a computer program to do that shouldn't take more than about 5 minutes. (That is to run, not to write)
 
  • #7


skeptic2 said:
I'm sorry, I don't understand what you mean by the above. Do you want to find the average distance between each pair of points?

Do you have any background in programming? I'd think a computer program to do that shouldn't take more than about 5 minutes. (That is to run, not to write)

What I mean is that they are 2x2 matrices whose entries are non-negative integers. Let's just say, for now, that my distance between two matrices A and B is the 2x2 matrix where each entry is |Aij-Bij|. I can turn this into a scalar later on but this will do for now.

There are 39 million pairs of matrices. Trusting that you've worked out the computational cost to around 5 minutes of runtime for this calculation, this would be far too slow. I'm doing this for scores of different 8900-tuples of matrices. And then, after the 2x2 case, I will be doing 4x4, 8x8, 16x16, ... 1024x1024 ! If it would take 5 minutes to calcualate the average distance between 39 million pairs of 2x2 matrices, I can't afford to wait days to run this program on 1024x1024 matrices!

Also, I think there's some redundancy in calculating the distance between each pair. But I might be wrong.
 
  • #8


I said 5 minutes because I just finished writing a program to calculate the distance between each point and every other point for 1000 points defined by latitude and longitude. Your calculations will at least be with integers. Mine were with Lat. & Lon converted to radians and took about 2.5 seconds including reading in the data and converting to radians.

My calculation involved 500,000 pairs ( 1000^2 / 2) just as yours is 8900^2 / 2 = 39 million.

I understand the 2x2 matrices but why do you then need to do 4x4, 8x8, etc.?

How close do you think you'd be if you calculated the average x distance and the average y distance, and calculate the Euclidean distance between the two?
 
  • #9


skeptic2 said:
I understand the 2x2 matrices but why do you then need to do 4x4, 8x8, etc.?

Without getting too far off topic .. the 2x2 matrices would represent 1D CGR (chaos game representation) , 4x4 = 2D CGR, 2^k x 2^k = k-D CGR. It really doesn't matter though, the point I was trying to make was that calculating pairwise distances for 39 million pairs might be too computationally demanding for me. Also, my intuition says calc'ing all pairwise distances has some redundancy, although my ideas about that aren't fully crystallized.

skeptic2 said:
How close do you think you'd be if you calculated the average x distance and the average y distance, and calculate the Euclidean distance between the two?

Do you mean, I calculate the average A_{1,1} , the average A{1,2} , the averge A{2,1} and the average A_{2,2} and then summed up the squares of these averages and then square-rooted the result ??
 
  • #10


Never mind, it doesn't work.
 
  • #11


juliette sekx said:
Is there a good way to find the 'similarity' between a large number of points ??

There are many ways, which vary in their sensitivity to individual points (robustness) and ability to represent the clusters. The choice you make depends on what you expect the distribution of each cluster to look like. Some of the most common are

1) Distance between means -- simple to calculate, works ok if both distributions are roughly the same scale and tightly grouped
2) Mahalanobis distance - good if distributions can be modeled well by hyper-ellipsoid, or come from multivariate normal distribution. This is generally the best.
3) UPGMA - inefficient, and discrimination ability is usually not improved compared to simpler methods
4) Single link, average link, and complete link - single link is useful if the distributions are highly irregular, but it breaks down for large numbers of points and is inherently sensitive to outliers
 
  • #12


I haven't read through all the posts in this thread but I just wanted to point out that when we talk about the "distance" between two "extended objects" (not just two points) we mean the minimum distance between two points in the two objects, NOT the average distance as suggested in the first post.
 
  • #13


I believe what you want is the sample variance, which is the sum of squared distances from the mean, divided by n-1 where n is the number of points.
 
  • #14


Hi juliette sekx
I am not sure what you are trying to do. But if you are dealing with a cloud of points and seeking some relative distance property among these points, then you may to do a Delaunay triangulation (see also convex hull). Very efficient algorithms exist that can handle many million points in few seconds. I hope this will help and feel free to ask for references may you need any.
 

1. What is the distance between two points on a graph?

The distance between two points on a graph is the length of the straight line connecting the two points. This can be calculated using the Pythagorean theorem, which states that the square of the length of the hypotenuse (the line connecting the two points) is equal to the sum of the squares of the lengths of the other two sides.

2. How do you calculate the distance between multiple points?

To calculate the distance between multiple points, you can use the formula for finding the distance between two points and apply it to each pair of points. Then, you can add up all of the individual distances to find the total distance between all the points.

3. Can the distance between multiple points be negative?

No, the distance between multiple points cannot be negative. Distance is a measure of the length between two points and is always a positive value. If the distance calculated is negative, it is likely that there was an error in the calculation or that the points were not plotted correctly on the graph.

4. How can the distance between multiple points be used in real-world applications?

The distance between multiple points can be used in a variety of real-world applications, such as navigation systems, logistics and route planning, and geographical mapping. It can also be used in scientific research to measure the distance between objects or to track the movement of objects over time.

5. Are there different ways to measure the distance between multiple points?

Yes, there are different ways to measure the distance between multiple points depending on the context and the type of data being analyzed. Some alternative methods include using Manhattan distance, which measures the distance between points along vertical and horizontal lines, and Euclidean distance, which measures the shortest distance between points in a straight line.

Similar threads

  • Differential Geometry
Replies
1
Views
1K
Replies
13
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
7
Views
2K
  • Differential Geometry
Replies
9
Views
13K
  • Precalculus Mathematics Homework Help
Replies
20
Views
2K
Replies
13
Views
1K
Replies
5
Views
1K
Replies
12
Views
2K
  • Differential Geometry
Replies
1
Views
2K
  • Special and General Relativity
2
Replies
40
Views
2K
Back
Top