FactChecker said:
A sample can have the mode occur at two widely separated values. It could also have multiple values that are closely tied, and a few more samples can make the mode jump around significantly.
As an extreme example, consider the uniform distribution on the real line between 0 and 1. What is its mode? What kind of behavior could you expect for the mode of a sample from that distribution?
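To see FactChecker's point concretely, here is a minimal Python sketch (standard library only, Python 3.8+ for the tie behaviour of `statistics.mode`). It draws repeated samples of 50 values from the uniform distribution on [0, 1), estimates the mode as the centre of the fullest histogram bin, and compares it with the sample median. The function names, the sample size and the bin width of 0.1 are illustrative assumptions, not anything from the thread.

```python
import math
import random
import statistics

def histogram_mode(sample, bin_width):
    """Centre of the most populated histogram bin (bins start at 0).

    statistics.mode returns the first bin encountered among tied bins
    (Python 3.8+), so ties are broken by the order of the data.
    """
    fullest_bin = statistics.mode(math.floor(x / bin_width) for x in sample)
    return (fullest_bin + 0.5) * bin_width

def repeated_estimates(n=50, repeats=20, bin_width=0.1, seed=1):
    """Draw `repeats` samples of size n from Uniform(0, 1); return (modes, medians)."""
    rng = random.Random(seed)
    modes, medians = [], []
    for _ in range(repeats):
        sample = [rng.random() for _ in range(n)]
        modes.append(histogram_mode(sample, bin_width))
        medians.append(statistics.median(sample))
    return modes, medians

if __name__ == "__main__":
    modes, medians = repeated_estimates()
    print("modes:  ", [round(m, 2) for m in modes])
    print("medians:", [round(m, 2) for m in medians])
```

Typically the estimated modes scatter over most of the interval, while the medians stay in a fairly narrow band around 0.5.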
This is true. However, to a certain extent, that can also be the case for the median when you have few entries in your sample.
Essentially, there are two different issues with "central tendency" measures. The first is whether the concept of central tendency is meaningful at all for the given distribution; the second is the effect of statistical errors due to a small sample.
To address the first: if one talks about a "central tendency" at all, most of the time one ASSUMES that the measured quantity is somehow "lumped" around a central value. This usually comes down to assuming that the distribution is a "single bump". If your distribution is made up of several "bumps", then the very notion of central tendency is questionable. For instance, if you're talking about the body size of a mixed population of rats and dogs, you have two or more bumps, namely one "around the average size of a rat" and then "several around the average sizes of different dog breeds, from a Chihuahua to a Saint Bernard". What conceptual value could a central tendency measure actually have there?
So, even before considering WHICH central tendency measure could be useful, the notion of central tendency itself must have a meaning, which includes the hypothesis that values are somehow "lumped around a central value", that is, that the distribution is a "lump".
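A small simulation makes the rats-and-dogs point concrete. The sketch below uses made-up body lengths (rats around 20 cm, dogs around 70 cm; both numbers and the normal shapes are purely hypothetical) and counts how many animals actually lie near the mean and near the median of the mixed population.

```python
import random
import statistics

def mixed_population(n_rats=500, n_dogs=500, seed=2):
    """Hypothetical body lengths in cm: one 'rat' bump and one 'dog' bump."""
    rng = random.Random(seed)
    rats = [rng.gauss(20, 2) for _ in range(n_rats)]
    dogs = [rng.gauss(70, 10) for _ in range(n_dogs)]
    return rats + dogs

def fraction_near(values, centre, half_width):
    """Fraction of values within +/- half_width of centre."""
    return sum(abs(v - centre) <= half_width for v in values) / len(values)

if __name__ == "__main__":
    sizes = mixed_population()
    mean = statistics.mean(sizes)
    median = statistics.median(sizes)
    print(f"mean   = {mean:.1f} cm, animals within 5 cm: {fraction_near(sizes, mean, 5):.1%}")
    print(f"median = {median:.1f} cm, animals within 5 cm: {fraction_near(sizes, median, 5):.1%}")
```

With these assumed numbers, the mean (around 45 cm) and the median (in the gap between the largest rats and the smallest dogs) both land where almost no animal is found.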
Once we make that hypothesis, the three estimators, namely sample mean, sample median and sample mode, behave differently depending on the properties of the original distribution and on the sampling method.
If we have a small sample, this means two things:
1) we will have big statistical errors
2) the probability of getting "outliers" is small
In that case, the mean is the most reliable estimator, because it filters out statistical noise best. On a small sample, the heights of the different bins in a histogram are noisy, and the "highest one" could be relatively far away from the "central one" because of these fluctuations, so the mode is not appropriate. Also, the median is one of the sample values (or, for an even sample size, the average of the two middle ones), and if you don't have many samples, the chance that it is close to the "good" value is not very high either.
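The small-sample argument can be checked with a quick Monte Carlo sketch. Assuming a standard normal population (true centre 0), samples of 10 values and a histogram bin width of 0.5 (all arbitrary choices for illustration), it measures how far each estimator typically lands from the true centre, using the root-mean-square error over many repeated samples. It needs Python 3.8+ for `statistics.fmean` and the tie behaviour of `statistics.mode`.

```python
import math
import random
import statistics

def histogram_mode(sample, bin_width):
    """Centre of the fullest histogram bin; ties go to the first one met in the data."""
    fullest_bin = statistics.mode(math.floor(x / bin_width) for x in sample)
    return (fullest_bin + 0.5) * bin_width

def rms_errors(n=10, repeats=2000, bin_width=0.5, seed=3):
    """RMS distance from the true centre (0) of mean, median and histogram mode."""
    rng = random.Random(seed)
    sq = {"mean": 0.0, "median": 0.0, "mode": 0.0}
    for _ in range(repeats):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sq["mean"] += statistics.fmean(sample) ** 2
        sq["median"] += statistics.median(sample) ** 2
        sq["mode"] += histogram_mode(sample, bin_width) ** 2
    return {name: math.sqrt(total / repeats) for name, total in sq.items()}

if __name__ == "__main__":
    for name, err in rms_errors().items():
        print(f"{name:>6}: RMS error {err:.3f}")
```

For this normal population you should see the mean with the smallest error (close to 1/sqrt(10), about 0.32), the median somewhat worse, and the histogram mode clearly worst.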
As your sample grows and the statistical noise shrinks, the other two estimators, mode and median, get better, while the mean can get worse if there are outliers (that is, if the original distribution has "long tails"). See the Bill Gates example. Mode and median are barely affected by rare outliers.
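The effect of a single extreme value is easy to show with a handful of numbers. The incomes below are invented for illustration, with one hypothetical billionaire-scale income standing in for the Bill Gates example.

```python
import statistics

def mean_and_median(values):
    """Return (mean, median) of a list of numbers."""
    return statistics.mean(values), statistics.median(values)

# Hypothetical yearly incomes of five people.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
# The same group after one (hypothetical) extremely rich person joins.
with_outlier = incomes + [10_000_000_000]

if __name__ == "__main__":
    print("without outlier:", mean_and_median(incomes))      # (40000, 40000)
    print("with outlier:   ", mean_and_median(with_outlier))  # (1666700000, 42500.0)
```

One extra value multiplies the mean by more than 40,000, while the median only shifts from 40,000 to 42,500.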
As for the median versus the mode, this depends on the actual shape of the "bump". If the bump is "well-peaked", the mode may be a very good estimator. If, however, the bump is "flat-topped", then the median will do better. See the uniform distribution example.
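The role of the bump's shape can be sketched by comparing a flat-topped population (uniform on [0, 1)) with a peaked one on the same interval (a triangular distribution peaking at 0.5). Both have their centre at 0.5; the sample size of 500, the bin width of 0.1 and the number of repeats are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def histogram_mode(sample, bin_width):
    """Centre of the fullest histogram bin; ties go to the first one met in the data (Python 3.8+)."""
    fullest_bin = statistics.mode(math.floor(x / bin_width) for x in sample)
    return (fullest_bin + 0.5) * bin_width

def rms_about_centre(draw, n=500, repeats=300, bin_width=0.1, seed=4):
    """RMS distance from 0.5 (the centre of both test distributions) of histogram mode and median."""
    rng = random.Random(seed)
    sq_mode = sq_median = 0.0
    for _ in range(repeats):
        sample = [draw(rng) for _ in range(n)]
        sq_mode += (histogram_mode(sample, bin_width) - 0.5) ** 2
        sq_median += (statistics.median(sample) - 0.5) ** 2
    return math.sqrt(sq_mode / repeats), math.sqrt(sq_median / repeats)

def flat(rng):
    return rng.random()  # Uniform(0, 1): flat-topped

def peaked(rng):
    return rng.triangular(0.0, 1.0, 0.5)  # Triangular on [0, 1], peak at 0.5

if __name__ == "__main__":
    for name, draw in [("flat-topped", flat), ("peaked", peaked)]:
        mode_err, median_err = rms_about_centre(draw)
        print(f"{name:>11}: mode RMS {mode_err:.3f}, median RMS {median_err:.3f}")
```

On the flat top the histogram mode wanders over the whole interval, while on the peaked shape it stays within a bin or two of the centre. In this particular sketch the median still has the smaller error in both cases; what changes is how badly the mode behaves when the top is flat.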
The median has the extra advantage of being a sample value (for an odd sample size), while the precision of the mode depends on the bin size you choose for your histogram. If the distribution is fairly symmetric, then both are good estimators. If your distribution is asymmetric, then you should think about why you need a central tendency at all. The mode will be closer to what is most likely to happen, while the median will be closer to "half have more, half have less".
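For an asymmetric distribution the three measures really do differ, and the histogram mode also moves with the bin width. The sketch below assumes an exponential population with rate 1, for which the true mode is 0, the median is ln 2 (about 0.693) and the mean is 1; the sample size and the bin widths are arbitrary choices. It needs Python 3.8+.

```python
import math
import random
import statistics

def histogram_mode(sample, bin_width):
    """Centre of the fullest histogram bin, with bins starting at 0."""
    fullest_bin = statistics.mode(math.floor(x / bin_width) for x in sample)
    return (fullest_bin + 0.5) * bin_width

def skewed_summary(n=10_000, bin_widths=(0.25, 0.5, 1.0), seed=5):
    """Histogram modes for several bin widths, plus median and mean, of one exponential sample."""
    rng = random.Random(seed)
    sample = [rng.expovariate(1.0) for _ in range(n)]  # Exponential, rate 1
    modes = {w: histogram_mode(sample, w) for w in bin_widths}
    return modes, statistics.median(sample), statistics.fmean(sample)

if __name__ == "__main__":
    modes, median, mean = skewed_summary()
    for w, m in modes.items():
        print(f"bin width {w}: histogram mode {m}")
    print(f"median {median:.3f}, mean {mean:.3f}")
```

Because the density is highest at 0, the first bin wins every time, so the estimated mode is just half the bin width (0.125, 0.25 and 0.5 here): the choice of bins alone moves it, while the median and the mean need no binning.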