
#1
May2709, 12:27 PM

P: 31

It's easy to calculate the Euclidean distance between two points A and B.
Between 3 points we can do the Eucliean distance between AB, AC and BC, and then take the average of the three to find the "distance" (I don't know what it's called for more than 2 points!). But I have 8900 points, and to go about finding "how similar" the points are, in this manner, would take (8900 choose 2) = 39600550 calculations of the euclidean distance (not practical!) Is there a good way to find the 'similarity' between a large number of points ?? One way I considered was, finding the median (centre of mass) of the object whose boundary was formed by connecting the 8900 points, and then determining what the Euclidean distance is between this 'median' and the FARTHEST point (out of the 8900) from that median. This is not good if my 8900 points form a shape like a tight ball with one small sharp spike sticking out. Another method would maybe be finding the Euclidean distance between the median and all 8900 other points, and averaging this (8900 calculations is much less than 39600550). But how is this problem usually addressed ?? Can anyone please tell me what they know, or refer me to some literature ?? Thanks!!! 



#2
May2709, 12:45 PM

P: 1,784

How many dimensions do your points occupy?




#3
May2709, 01:18 PM

P: 31

Thank you for your response. The points are 2x2 matrices over the nonnegative integers.




#4
May2709, 01:51 PM

Emeritus
Sci Advisor
PF Gold
P: 16,101

'Distance' between MANY points ??
Well, the biggest problem is to figure out just what you really want to know. If you are able to figure that out, then that might give us some ideas about what calculation to do.
Anyways, just to throw random ideas out there, I believe it's actually pretty easy to compute the average of the squared distance between pairs of points, as well as the average squared distance from their mean  the method is to write this calculation as a sum of dot products, and then expand and reorganize te terms. 



#5
May2709, 02:25 PM

P: 31

To find the Euclidean distance between vectors V and U, i can do sqrt [(UV) dot (UV)]. But what do you mean by SUM of dot products ?? 



#6
May2709, 02:37 PM

P: 1,784

Do you have any background in programming? I'd think a computer program to do that shouldn't take more than about 5 minutes. (That is to run, not to write) 



#7
May2709, 03:24 PM

P: 31

There are 39 million pairs of matrices. Trusting that you've worked out the computational cost to around 5 minutes of runtime for this calculation, this would be far too slow. I'm doing this for scores of different 8900tuples of matrices. And then, after the 2x2 case, I will be doing 4x4, 8x8, 16x16, ... 1024x1024 !!! If it would take 5 minutes to calcualate the average distance between 39 million pairs of 2x2 matrices, I can't afford to wait days to run this program on 1024x1024 matrices! Also, I think there's some redundancy in calculating the distance between each pair. But I might be wrong. 



#8
May2709, 03:50 PM

P: 1,784

I said 5 minutes because I just finished writing a program to calculate the distance between each point and every other point for 1000 points defined by latitude and longitude. Your calculations will at least be with integers. Mine were with Lat. & Lon converted to radians and took about 2.5 seconds including reading in the data and converting to radians.
My calculation involved 500,000 pairs ( 1000^2 / 2) just as yours is 8900^2 / 2 = 39 million. I understand the 2x2 matrices but why do you then need to do 4x4, 8x8, etc.? How close do you think you'd be if you calculated the average x distance and the average y distance, and calculate the Euclidean distance between the two? 



#9
May2709, 04:07 PM

P: 31





#10
May2709, 04:21 PM

P: 1,784

Never mind, it doesn't work.




#11
Jun1509, 12:29 PM

P: 462

1) Distance between means  simple to calculate, works ok if both distributions are roughly the same scale and tightly grouped 2) Mahalanobis distance  good if distributions can be modeled well by hyperellipsoid, or come from multivariate normal distribution. This is generally the best. 3) UPGMA  inefficient, and discrimination ability is usually not improved compared to simpler methods 4) Single link, average link, and complete link  single link is useful if the distributions are highly irregular, but it breaks down for large numbers of points and is inherently sensitive to outliers 



#12
Jun1609, 04:19 PM

Math
Emeritus
Sci Advisor
Thanks
PF Gold
P: 38,904

I haven't read through all the posts in this thread but I just wanted to point out that when we talk about the "distance" between two "extended objects" (not just two points) we mean the minimum distance between two points in the two objects, NOT the average distance as suggested in the first post.




#13
Jun1609, 07:47 PM

P: 330

I believe what you want is the sample variance, which is the sum of squared distances from the mean, divided by n1 where n is the number of points.




#14
Jun2209, 08:05 AM

P: 8

Hi juliette sekx
I am not sure what you are trying to do. But if you are dealing with a cloud of points and seeking some relative distance property among these points, then you may to do a Delaunay triangulation (see also convex hull). Very efficient algorithms exist that can handle many million points in few seconds. I hope this will help and feel free to ask for references may you need any. 


Register to reply 
Related Discussions  
Distance between two points on some surface  General Math  2  
Calculating distance between two points in potential  Special & General Relativity  3  
Find distance between two points on Earth  General Math  8  
Locus of points with distance sum =8  Introductory Physics Homework  5  
average distance between points on a circle  General Math  2 