Quote by juliette sekx
Is there a good way to find the 'similarity' between a large number of points?

There are many ways, which vary in their sensitivity to individual points (robustness) and in their ability to represent the clusters. The choice you make depends on what you expect the distribution of each cluster to look like. Some of the most common are:
1) Distance between means: simple to calculate, works OK if both distributions are roughly the same scale and tightly grouped
2) Mahalanobis distance: good if the distributions can be modeled well by a hyperellipsoid, or come from a multivariate normal distribution. When that holds, this is generally the best choice.
3) UPGMA: inefficient, because it averages the distance over every pair of points drawn from the two groups, and its discrimination ability is usually no better than that of simpler methods
4) Single link, average link, and complete link: single link is useful if the distributions are highly irregular, but it breaks down for large numbers of points and is inherently sensitive to outliers
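
To make the list above concrete, here is a rough NumPy sketch of how each measure could be computed for two groups of points. The function names are my own, not from any library. The Mahalanobis version assumes a pooled covariance matrix for the two groups, which is only one of several reasonable choices. Each group is assumed to be an array of shape (n_points, n_dims).

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every point in a and every point in b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def mean_distance(a, b):
    """1) Distance between the two group means."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

def mahalanobis_between(a, b):
    """2) Mahalanobis distance between the group means.

    Assumption: both groups share one covariance, estimated by pooling.
    """
    na, nb = len(a), len(b)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((na - 1) * cov_a + (nb - 1) * cov_b) / (na + nb - 2)
    delta = a.mean(axis=0) - b.mean(axis=0)
    # solve() avoids forming the inverse covariance explicitly
    return float(np.sqrt(delta @ np.linalg.solve(pooled, delta)))

def linkage_distances(a, b):
    """3) and 4) Link-based distances from the full pairwise matrix.

    The "average" entry is the mean over all cross-group pairs,
    which is the quantity UPGMA uses.
    """
    d = pairwise_distances(a, b)
    return {
        "single": float(d.min()),    # closest pair
        "average": float(d.mean()),  # mean over all pairs
        "complete": float(d.max()),  # farthest pair
    }
```

Note that the link-based measures need the full n-by-m distance matrix, which is where the cost comes from for large groups, while the first two only need means and covariances.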