There are many ways to measure this, and they vary in their sensitivity to individual points (robustness) and in how well they represent the clusters. The choice you make depends on what you expect the distribution of each cluster to look like. Some of the most common are:
1) Distance between means - simple to calculate; works well enough if both distributions are roughly the same scale and tightly grouped
2) Mahalanobis distance - good if the distributions can be modeled well by a hyper-ellipsoid, or come from a multivariate normal distribution. When that holds, this is generally the best choice.
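3) UPGMA (this is the same thing as average link: the mean of all pairwise distances between points in the two clusters) - inefficient, and discrimination ability is usually not improved compared to simpler methods
4) Single link and complete link (the smallest and largest pairwise distance between the two clusters, respectively) - single link is useful if the distributions are highly irregular, but it breaks down for large numbers of points and is inherently sensitive to outliers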
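
To make the options above concrete, here is a rough NumPy sketch of each measure. It assumes every cluster is an array of shape (n_points, n_dims). The function names and the toy data are made up for illustration, and using a pooled covariance for the Mahalanobis distance between two clusters is one common choice that I am assuming here, not the only one.

```python
import numpy as np

def centroid_distance(a, b):
    """Euclidean distance between the two cluster means."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

def mahalanobis_between_means(a, b):
    """Mahalanobis distance between the means.

    Assumption: both clusters share one covariance, estimated by pooling
    the two sample covariances weighted by their degrees of freedom.
    """
    diff = a.mean(axis=0) - b.mean(axis=0)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * np.cov(a, rowvar=False)
              + (nb - 1) * np.cov(b, rowvar=False)) / (na + nb - 2)
    # solve() avoids forming the explicit inverse of the covariance
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))

def linkage_distances(a, b):
    """Single, average (UPGMA) and complete link from all pairwise distances."""
    # d[i, j] = Euclidean distance between a[i] and b[j]; costs O(len(a) * len(b))
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return {
        "single": float(d.min()),     # closest pair
        "average": float(d.mean()),   # mean over all pairs
        "complete": float(d.max()),   # farthest pair
    }

# Toy data (hypothetical): two unit squares, 4 units apart along x
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = A + np.array([4.0, 0.0])

print(centroid_distance(A, B))
print(mahalanobis_between_means(A, B))
print(linkage_distances(A, B))
```

Note that the three linkage measures need the full pairwise distance matrix, which is where the cost comes from for large clusters, while the first two only need the means (and a covariance estimate).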