Why is the mode usually not as useful?

Cheesycheese213 · Jan 24, 2018

I know that there are some cases where the mode just isn’t very helpful for finding central tendency, but I have never heard any real specific reason why other than it isn’t too reliable.

mfb · Jan 24, 2018

* In experimental data, the mode doesn't have to be anywhere close to most of the distribution
* Even in a theoretical model: You can have extremely asymmetric distributions, where the mode doesn't tell you much. The mode of an exponential distribution is 0. How does that help?
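
To make the exponential example concrete, here is a short worked summary. It assumes the usual rate parameterization with rate $\lambda > 0$ (the symbol is not from the post):

```latex
f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0, \quad \lambda > 0
\\
\text{mode} = 0, \qquad
\text{median} = \frac{\ln 2}{\lambda}, \qquad
\text{mean} = \frac{1}{\lambda}
```

The density is strictly decreasing, so the mode is 0 for every value of $\lambda$. The mean and median both scale with $1/\lambda$, so they say something about the scale of the data, while the mode does not.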

jedishrfu · Jan 24, 2018

I think the mean, median, mode and range help to characterize the collection of numbers you have, and so the mode is useful in that sense. I could imagine someone wanting a special collection of numbers for software testing where they specify these four values and you are left to generate the collection.

I’ve done something like this to create a fake customer transaction database for a web store. I created lookup lists of codes. The list matched the desired statistical breakdown and then I would generate a random index into the list to select a value for the record I was creating.

As an example, my list of grades might be ‘aabbbbcccccccccdddf’, so I’d query the list with a random index from 0 to 18 (the list as written has 19 elements) to get a letter grade to assign to a student. The list was built to have a median of c, a mode of c (nine c’s), and an average slightly above a c.
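
The lookup-list approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used for the transaction database; the function names `random_grade` and `list_summary` are made up here. Using `len()` avoids hard-coding the index range.

```python
import random
from statistics import mode

# Lookup list from the post: 2 a's, 4 b's, 9 c's, 3 d's, 1 f.
GRADES = "aabbbbcccccccccdddf"


def random_grade(rng):
    """Pick one grade by drawing a uniform random index into the list."""
    return GRADES[rng.randrange(len(GRADES))]


def list_summary():
    """Return (length, mode, median) of the lookup list itself."""
    # The list is sorted and has odd length, so the median is the middle entry.
    middle = GRADES[len(GRADES) // 2]
    return len(GRADES), mode(GRADES), middle


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    print(list_summary())
    print("".join(random_grade(rng) for _ in range(30)))
```

Because each index is equally likely, the long-run frequency of each grade matches its share of the list.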

StoneTemplePython · Jan 24, 2018

mfb said:

* Even in a theoretical model: You can have extremely asymmetric distributions, where the mode doesn't tell you much. The mode of an exponential distribution is 0. How does that help?

This basically is the maximum likelihood criterion though

(I am not a big fan of it, but still)

Stephen Tashi · Jan 25, 2018

Cheesycheese213 said:

I know that there are some cases where the mode just isn’t very helpful for finding central tendency, but I have never heard any real specific reason why other than it isn’t too reliable.

"Central tendency" is a vague property unless you specify a quantitative measure for it. There are more well-known theorems about the mean of a probability distribution than about its mode. So if you use the mean value in your work, you have, in a manner of speaking, more guarantees about the performance of the tool than if you use the mode.

Dale · Jan 25, 2018

Cheesycheese213 said:

I know that there are some cases where the mode just isn’t very helpful for finding central tendency, but I have never heard any real specific reason why other than it isn’t too reliable.

You can have bimodal distributions.

I typically only use the mode for categorical/nominal data.
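
A small illustration of the bimodal case, using made-up numbers (the data and the function name `summarize` are assumptions for this sketch). `statistics.multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

# Made-up sample with two equally tall peaks, at 2 and at 8.
data = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]


def summarize(values):
    """Return the mean, median, and the list of all modes of the values."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # every value tied for most frequent
    }


if __name__ == "__main__":
    print(summarize(data))
```

Here there are two modes, so "the mode" is not a single number. The mean and median both land at 5, between the peaks, where there is no data at all, so none of the three summarizes this sample well on its own.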

sysprog · Apr 3, 2018

If you ask random persons to choose one of the numbers 1, 2, 3, and 4, based only on intuition, a prior knowledge of the mode of a large number of responses can perhaps yield some predictability of which of the numbers a newly asked person is more or less likely to choose.

FactChecker · Apr 3, 2018

1) The Central Limit Theorem gives the mean a significance and statistical properties that the median and mode do not have.

2) Consider the example of the birthdays (1...365) in a group of people (see https://en.wikipedia.org/wiki/Birthday_problem ). Each birthday has a uniform distribution in (1...365).
Consider the mean of a sample of birthdays. One would want to say that a measure of the center of the distribution is at 365/2 = 182.5. That would be the mean. It tells you something about the distribution.

Now consider the mode. With a sample of fewer than 23 birthdays, there is probably not a mode since there are probably no duplicate birthdays. So that tells you nothing. With a sample of 23, there probably is one duplicate and a mode, but it can be anywhere in (1..365). With a sample of 31, there are probably two duplicates, so the mode is again undefined. With a significantly larger sample, there is probably a unique mode which is equally likely to be anywhere within (1..365). The mode is not very useful for indicating anything about the sample.

Now consider the median. The median would tend toward the mean as the sample grows. It has some use.
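
The threshold of 23 mentioned above comes from the standard birthday-problem calculation. A minimal sketch, assuming 365 equally likely and independent birthdays (the function name `prob_shared_birthday` is made up for this example):

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes birthdays are independent and uniform over `days` days.
    """
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 22, 23, 31):
        print(n, round(prob_shared_birthday(n), 4))
```

Under these assumptions the probability of at least one repeated birthday first exceeds one half at n = 23, so smaller samples usually have no repeated value from which to read off a mode.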

mfb · Apr 3, 2018

FactChecker said:

Each birthday has a uniform distribution in (1...365).

It turns out birthdays don't have a uniform distribution. The mode tells you something about days with a higher number of births (e.g. 1.1., 2.2., 3.3., ...), but mean and median only give a comparison between the first and second half of the year. In addition, they depend on the arbitrary definition of the start of a year, while the mode does not.
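
The point about the start of the year can be checked with a toy example. The sample below is made up (0-based day-of-year values clustered around New Year), and `shift` and `unshift` are hypothetical helper names. The sketch moves the start of the year by 10 days, recomputes each statistic, and maps the result back to the original calendar.

```python
from statistics import mean, mode

DAYS = 365


def shift(days, offset):
    """Relabel day-of-year values as if the year started `offset` days earlier."""
    return [(d + offset) % DAYS for d in days]


def unshift(value, offset):
    """Map a value computed in the shifted calendar back to the original one."""
    return (value - offset) % DAYS


# Made-up sample: 0-based days of the year, clustered around New Year.
sample = [359, 361, 363, 363, 1, 3]
offset = 10
shifted = shift(sample, offset)

if __name__ == "__main__":
    print("mode:", mode(sample), "vs", unshift(mode(shifted), offset))
    print("mean:", mean(sample), "vs", unshift(mean(shifted), offset))
```

The mode points at the same calendar day under either convention. The mean lands in a different part of the year depending on where the year is taken to start, because the cluster straddles the boundary in one convention and not in the other.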

FactChecker · Apr 3, 2018

mfb said:

It turns out birthdays don't have a uniform distribution.

I'll buy that. I didn't think of that.

mfb said:

The mode tells you something about days with a higher number of births (e.g. 1.1., 2.2., 3.3., ...)

It tells you something about the one day with the highest number of births, but nothing about the other days.

mfb said:

but mean and median only give a comparison between the first and second half of the year.

I would say that the median tells you the location of the first half of the probability, not necessarily the first half of the year.

mfb said:

In addition, they depend on the arbitrary definition of the start of a year, while the mode does not.

I guess that the same things that influence the probability distribution of birth dates, and therefore the mode, would be reflected some way in both the mean and the median.

mfb · Apr 4, 2018

FactChecker said:

I would say that the median tells you the location of the first half of the probability, not necessarily the first half of the year.

What I meant: A median or mean that is a few days before the middle of the year tells you the first half of the year has more births. The median also allows a rough estimate how many more births there are in the first half.

FactChecker · Apr 4, 2018

An example where the mode tells you virtually nothing is the uniform distribution, which occurs (or is assumed) very often. No matter how large the sample size is, the mode will keep jumping around within the entire range of the distribution. Take, for example, a discrete uniform distribution on the integers 0, 1, ..., 9. As the sample size increases to 1000 in increments of 10, the mean and median settle down quickly but the mode never does.
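
The experiment described above can be re-run with a short simulation. This is a sketch, not the code behind the original post; the function name `running_summaries`, the seed, and the output format are assumptions. It relies on Python 3.8 or later, where `statistics.mode` returns the first of several tied modes instead of raising an error.

```python
import random
from statistics import mean, median, mode


def running_summaries(max_n=1000, step=10, seed=0):
    """Grow a sample of uniform digits 0..9 and record (n, mean, median, mode).

    The sample grows by `step` draws at a time, up to `max_n` draws.
    """
    rng = random.Random(seed)
    sample = []
    rows = []
    for n in range(step, max_n + 1, step):
        sample.extend(rng.randrange(10) for _ in range(step))
        rows.append((n, mean(sample), median(sample), mode(sample)))
    return rows


if __name__ == "__main__":
    # Print every tenth row to keep the output short.
    for n, m, med, mo in running_summaries()[::10]:
        print(f"n={n:4d}  mean={m:.3f}  median={med}  mode={mo}")
```

With a uniform distribution the ten counts stay close to each other, so which digit happens to lead typically keeps changing as more data arrive, while the mean and median move toward 4.5.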
