Rmsd-based clustering, cluster properties

noplacebos · Oct 6, 2009

Greetings,

I have a quick question that may be trivial, but I have been scratching my head for weeks now without finding anything concrete in books, papers or on the web.

I have completed a partitional clustering of a dataset (vectors) using Root Mean Square Deviation as my distance metric. Leaving all other details on the clustering method aside, data points were being assigned to a cluster if their RMSD to the cluster's representative was below a predefined threshold. Let's say that this threshold is 1.

My question is about the shape of the resulting clusters and the general properties of RMSD as a proximity function. Are my clusters spherical (globular) because of this RMSD threshold? Do they actually have a radius of 1? Can I assume by default that any two data points within a cluster will have a pairwise RMSD of less than 2 (the diameter)?

I am telling myself that, as RMSD doesn't exactly reflect the sum of Euclidean distances between two vectors, the cluster's true shape may lie in some multidimensional space. If I apply Multidimensional Scaling down to 3 or 2 dimensions, should I expect a globular cluster shape this time? Does it depend on the nature of the initial vectors (e.g. the number of parameters)?
Please excuse all these questions, but I remain very confused on this matter. Any pointers in the right direction would be most helpful.

Thank you very much for your time.

mmwave · Oct 6, 2009

Thank you for your question regarding the shape and properties of clusters formed using Root Mean Square Deviation (RMSD) as a distance metric. I understand your confusion and would be happy to provide some insights and direction on this matter.

Firstly, the shape of clusters formed with any distance metric depends on the nature of the data and on the clustering algorithm used. Assuming you compute RMSD directly between vectors of n components (with no alignment or superposition step), RMSD is simply the Euclidean distance divided by sqrt(n), so it behaves exactly like a rescaled Euclidean distance. With a threshold of 1, every data point in a cluster has an RMSD of less than 1 to the cluster's representative, so each cluster is contained in a ball of RMSD radius 1 (Euclidean radius sqrt(n)) centred on the representative. That does not mean the cluster fills that ball or is itself spherical: its members can occupy any part of the ball, for example an elongated or lopsided region, depending on the distribution and spread of your data points. The threshold is therefore an upper bound on the radius, and the effective radius of a given cluster may be well below 1.
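The relation between RMSD and Euclidean distance for plain vectors can be checked numerically. Below is a minimal sketch, assuming RMSD is computed component-wise with no alignment step; the vector length n = 12 and the random test vectors are made up for illustration and are not taken from your dataset.

```python
import math
import random


def rmsd(x, y):
    """Root mean square deviation between two equal-length vectors."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


random.seed(0)
n = 12  # hypothetical number of parameters per vector
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

# The ratio Euclidean / RMSD should equal sqrt(n) for any pair of vectors.
ratio = euclidean(x, y) / rmsd(x, y)
print(ratio, math.sqrt(n))
```

So an RMSD threshold of 1 is the same condition as a Euclidean threshold of sqrt(n); only the unit of length changes with the number of parameters.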

As for the pairwise RMSD within a cluster, yes: any two data points assigned to the same cluster will have an RMSD of less than 2, provided every member is within the threshold of the cluster's final representative. This follows from the triangle inequality, which RMSD between plain vectors satisfies because it is a rescaled Euclidean distance: RMSD(a, b) <= RMSD(a, rep) + RMSD(rep, b) < 1 + 1 = 2. Note that 2 is only an upper bound. It does not mean that the clusters are perfectly spherical or that all data points within a cluster are equidistant from each other; the actual pairwise distances depend on how the data are distributed inside the ball.
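The bound on the pairwise RMSD can be illustrated with a small experiment. This is a sketch under stated assumptions: the representative is placed at the origin, the members are random made-up points accepted only if their RMSD to the representative is below the threshold, and the dimension (5) and cluster size (50) are arbitrary choices.

```python
import itertools
import math
import random


def rmsd(x, y):
    """Root mean square deviation between two equal-length vectors."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


random.seed(1)
n = 5                 # hypothetical number of parameters
threshold = 1.0       # RMSD threshold from the question
rep = [0.0] * n       # hypothetical cluster representative

# Accept random candidates only if they are within the threshold of rep.
members = []
while len(members) < 50:
    cand = [random.uniform(-2, 2) for _ in range(n)]
    if rmsd(cand, rep) < threshold:
        members.append(cand)

# Largest RMSD between any two members of the cluster.
max_pair = max(rmsd(a, b) for a, b in itertools.combinations(members, 2))
print(max_pair, "< 2 *", threshold)
```

The largest pairwise value stays below twice the threshold, as the triangle inequality requires, but how close it gets to 2 depends entirely on how the members are spread.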

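In terms of using Multidimensional Scaling to visualize the clusters in a lower dimension, MDS places the points in a low-dimensional space so that their pairwise distances match the original ones as closely as possible. A 2D or 3D embedding generally cannot reproduce all the distances of a higher-dimensional dataset, so the plot is an approximation: a cluster that looks globular in the plot need not be globular in the original space, and vice versa. How much distortion you get depends on the nature of your data (in particular how many of the original parameters vary independently) and on the number of dimensions you reduce to. MDS is best treated as a visualization tool that may not accurately represent the true shape of the clusters in higher dimensions.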
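To make this concrete, here is a minimal sketch of classical (Torgerson) MDS applied to a pairwise RMSD matrix, written with NumPy. It is an illustration and not the procedure you used: the data are random made-up vectors, the function names rmsd_matrix and classical_mds are my own, and other MDS variants (for example stress-minimizing ones) can behave differently. For classical MDS on Euclidean-type distances such as plain RMSD, the embedding is an orthogonal projection, so embedded distances can shrink but never grow.

```python
import numpy as np


def rmsd_matrix(X):
    """Pairwise RMSD between the rows of X (no alignment step)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).mean(axis=2))


def classical_mds(D, k=2):
    """Embed a distance matrix D in k dimensions (classical MDS)."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    lam = np.clip(eigvals[idx], 0.0, None)    # guard against tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)


rng = np.random.default_rng(0)
n = 10                                        # hypothetical number of parameters
rep = np.zeros(n)                             # hypothetical representative
cands = rng.uniform(-2, 2, size=(2000, n))
d_rep = np.sqrt(((cands - rep) ** 2).mean(axis=1))
members = cands[d_rep < 1.0][:60]             # keep points within RMSD 1 of rep

X = np.vstack([rep, members])                 # row 0 is the representative
D = rmsd_matrix(X)
Y = classical_mds(D, k=2)

# Pairwise distances in the 2D embedding, for comparison with D.
D_low = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))
print("largest original RMSD:", D.max())
print("largest embedded distance:", D_low.max())
```

In this setting every member still plots within distance 1 of the representative, but the 2D picture compresses some distances more than others, so the apparent shape of the cluster should be read with caution.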

In conclusion, the RMSD threshold bounds how far members can lie from their representative, but the shape of each cluster within that bound depends on the clustering algorithm and on the nature of your data. It is important to keep these factors in mind when interpreting your results. I hope this helps to clarify your confusion, and I wish you all the best with your research.
