Why is the mode usually not as useful?

  • Thread starter Cheesycheese213
  • Tags
    Mode
In summary, the mode is not always a reliable measure of central tendency, especially for strongly asymmetric, bimodal, or uniform distributions. The mean in particular has more well-known theorems and statistical properties, such as those that follow from the Central Limit Theorem. The mode is most useful for categorical or nominal data, and together with the mean, median, and range it can help specify a collection of numbers to generate, for example as test data. The mode can also provide some predictability when people are asked to choose from a small set of numbers. The birthday example was debated: if birthdays were uniformly distributed, the sample mode would say little about the distribution, but since they are not, the mode can point to days with unusually many births. Overall, the mean and median are generally more useful measures of central tendency than the mode.
  • #1
Cheesycheese213
I know that there are some cases where the mode just isn’t very helpful for finding central tendency, but I have never heard any real specific reason why other than it isn’t too reliable.
 
  • #2
* In experimental data, the mode doesn't have to be anywhere close to most of the distribution
* Even in a theoretical model: You can have extremely asymmetric distributions, where the mode doesn't tell you much. The mode of an exponential distribution is 0. How does that help?
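As a rough numerical check of the exponential example, here is a Python sketch. The rate (1), sample size, seed, and bin width are arbitrary assumptions, and the binning is needed because a continuous sample has no repeated values. The sample mean should come out near 1/λ = 1 and the median near ln 2 ≈ 0.693, while the most populated bin is the one starting at 0: the mode sits at the edge of the distribution, not in its middle.

```python
import math
import random
import statistics

def exponential_summary(rate: float = 1.0, n: int = 100_000, seed: int = 1):
    """Draw n samples from an exponential distribution with the given rate and
    return (sample mean, sample median, start of the most populated bin)."""
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n)]
    # Continuous values almost never repeat, so group them into bins of width 0.1/rate
    # and take the mode of the bin starts instead.
    width = 0.1 / rate
    bins = [math.floor(x / width) * width for x in xs]
    return statistics.fmean(xs), statistics.median(xs), statistics.mode(bins)

mean, median, modal_bin = exponential_summary()
print(f"mean   ~ {mean:.3f}  (theory: 1/rate = 1)")
print(f"median ~ {median:.3f}  (theory: ln 2 / rate = {math.log(2):.3f})")
print(f"modal bin starts at {modal_bin}  (theory: mode = 0)")
```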
 
  • #3
I think the mean, median, mode, and range help to characterize the collection of numbers you have, so the mode is useful in that sense. I could imagine someone wanting a special collection of numbers for software testing, where they specify these four values and you are left to generate the collection.

I’ve done something like this to create a fake customer transaction database for a web store. I created lookup lists of codes. The list matched the desired statistical breakdown and then I would generate a random index into the list to select a value for the record I was creating.

As an example, my list of grades might be ‘aabbbbcccccccccdddf’, so I’d query the list with a random key from 0 to 18 (the list has 19 elements) to get a letter grade to assign to a student. The list was built to have a median of c, a mode of c (nine c’s), and an average slightly above c.
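A minimal Python sketch of the lookup-list idea, assuming the grade list from the post. The post does not say how grades were averaged, so the 4/3/2/1/0 grade-point scale below is a hypothetical choice. The list has 19 entries, so drawing with `randrange(len(GRADES))` gives indices 0 to 18 without hard-coding the length.

```python
import random
import statistics

# The lookup list from the post: 2 a's, 4 b's, 9 c's, 3 d's, 1 f (19 entries).
GRADES = "aabbbbcccccccccdddf"

# Hypothetical grade-point mapping, used only to check "average slightly above c".
POINTS = {"a": 4, "b": 3, "c": 2, "d": 1, "f": 0}

def random_grade(rng: random.Random) -> str:
    """Pick a grade by drawing a random index into the lookup list (0 .. len - 1)."""
    return GRADES[rng.randrange(len(GRADES))]

def describe(grades: str) -> dict:
    # Letters sort a < b < c < d < f, which matches best-to-worst order.
    ordered = sorted(grades)
    return {
        "n": len(grades),
        "median": ordered[len(ordered) // 2],  # middle element (this list has odd length)
        "mode": statistics.mode(grades),
        "mean_points": statistics.fmean(POINTS[g] for g in grades),
    }

rng = random.Random(42)
fake_records = [random_grade(rng) for _ in range(10)]
print(fake_records)
print(describe(GRADES))
```

Changing the letter counts in `GRADES` is how you would tune the mode, median, and average of the generated records.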
 
  • #4
mfb said:
* Even in a theoretical model: You can have extremely asymmetric distributions, where the mode doesn't tell you much. The mode of an exponential distribution is 0. How does that help?

This basically is the maximum likelihood criterion though

(I am not a big fan of it, but still)
 
  • #5
Cheesycheese213 said:
I know that there are some cases where the mode just isn’t very helpful for finding central tendency, but I have never heard any real specific reason why other than it isn’t too reliable.

"Central tendency" is a vague property unless you specify a quantitative measure for it. There are more well-known theorems about the mean of a probability distribution than about its mode. So if you use the mean in your work, you have, in a manner of speaking, more guarantees about the performance of the tool than if you use the mode.
 
  • #6
Cheesycheese213 said:
I know that there are some cases where the mode just isn’t very helpful for finding central tendency, but I have never heard any real specific reason why other than it isn’t too reliable.
You can have bimodal distributions.

I typically only use the mode for categorical/nominal data.
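To illustrate both points with Python's standard `statistics` module (the data below are made up for illustration): `multimode` reports both peaks of a bimodal sample, `mode` silently returns only one of them, and the mean falls in the gap between the clusters. For nominal data such as colors, only the mode is defined.

```python
import statistics

# Hypothetical bimodal data: one cluster around 1-2 and another around 8-10.
bimodal = [1, 1, 1, 2, 2, 8, 9, 9, 9, 10]
print(statistics.multimode(bimodal))  # [1, 9]  (both peaks)
print(statistics.mode(bimodal))       # 1  (Python 3.8+: first of the tied values)
print(statistics.fmean(bimodal))      # 5.2  (lands between the clusters)

# Hypothetical nominal data: no mean or median exists, but a mode does.
colors = ["red", "blue", "blue", "green", "blue", "red"]
print(statistics.mode(colors))        # blue
```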
 
  • #7
If you ask random people to choose one of the numbers 1, 2, 3, and 4 based only on intuition, prior knowledge of the mode of a large number of responses can perhaps give some predictability about which number a newly asked person is more or less likely to choose.
 
  • #8
1) The Central Limit Theorem gives the mean a significance and statistical properties that the median and mode do not have.

2) Consider the example of the birthdays (1...365) in a group of people (see https://en.wikipedia.org/wiki/Birthday_problem ). Each birthday has a uniform distribution in (1...365).
Consider the mean of a sample of birthdays. One would want to say that a measure of the center of the distribution is at (1 + 365)/2 = 183. That would be the mean. It tells you something about the distribution.

Now consider the mode. With a sample of fewer than 23 birthdays, there is probably no mode, since there are probably no duplicate birthdays. So that tells you nothing. With a sample of 23, there is probably at least one duplicate and hence a mode, but it can be anywhere in (1...365). With a sample of 31, there are probably two duplicates, so the mode is again undefined. With a significantly larger sample, there is probably a unique mode, which is equally likely to be anywhere within (1...365). The mode is not very useful for indicating anything about the sample.
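The Python sketch below assumes uniformly distributed birthdays on days 1 to 365, as in the post. It computes the exact probability that at least two of n people share a birthday (just under one half for 22 people, just over for 23) and draws a few random samples to show where the sample mode lands. `unique_mode` is a hypothetical helper that returns `None` when no value repeats or when the top count is tied; the seed is arbitrary.

```python
import math
import random
from collections import Counter

def p_shared_birthday(n: int, days: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday (uniform days)."""
    p_all_distinct = math.prod((days - k) / days for k in range(n))
    return 1.0 - p_all_distinct

def unique_mode(sample):
    """Return the single most frequent value of a non-empty sample, or None if there
    is no unique mode (no value repeats, or two values tie for the top count)."""
    counts = Counter(sample).most_common()  # sorted by count, highest first
    top = counts[0][1]
    if top == 1 or (len(counts) > 1 and counts[1][1] == top):
        return None
    return counts[0][0]

for n in (22, 23):
    print(n, round(p_shared_birthday(n), 3))

rng = random.Random(0)
for n in (23, 31, 500):
    sample = [rng.randint(1, 365) for _ in range(n)]
    print(n, unique_mode(sample))
```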

Now consider the median. The median would tend toward the mean as the sample grows. It has some use.
 
  • #9
FactChecker said:
Each birthday has a uniform distribution in (1...365).
It turns out birthdays don't have a uniform distribution. The mode tells you something about days with a higher number of births (e.g. 1.1., 2.2., 3.3., ...), but mean and median only give a comparison between the first and second half of the year. In addition, they depend on the arbitrary definition of the start of a year, while the mode does not.
 
  • #10
mfb said:
It turns out birthdays don't have a uniform distribution.
I'll buy that. I didn't think of that.
The mode tells you something about days with a higher number of births (e.g. 1.1., 2.2., 3.3., ...)
It tells you something about the one day with the highest number of births, but nothing about the other days.
but mean and median only give a comparison between the first and second half of the year.
I would say that the median tells you the location of the first half of the probability, not necessarily the first half of the year.
In addition, they depend on the arbitrary definition of the start of a year, while the mode does not.
I guess that the same things that influence the probability distribution of birth dates, and therefore the mode, would be reflected some way in both the mean and the median.
 
  • #11
FactChecker said:
I would say that the median tells you the location of the first half of the probability, not necessarily the first half of the year.
What I meant: a median or mean that is a few days before the middle of the year tells you that the first half of the year has more births. The median also allows a rough estimate of how many more births there are in the first half.
 
  • #12
An example where the mode tells you virtually nothing is the uniform distribution, which occurs (or is assumed) very often. No matter how large the sample size is, the mode keeps jumping around within the entire range of the distribution. Here is an example with a uniform distribution on the integers 0, 1, ..., 9. As the sample size increases to 1000 in increments of 10, the mean and median settle down quickly, but the mode never does:
[Figure: meanMedianModeExample.png, sample mean, median, and mode versus sample size]
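The experiment can be approximated with the Python sketch below. Assumptions not stated in the post: the samples grow cumulatively from one stream of draws, the seed is arbitrary, and ties for the mode are broken by `statistics.mode`, which on Python 3.8+ returns the first tied value it encounters. Re-running with different seeds should show the mean and median settling near 4.5 while the mode keeps moving around 0 to 9.

```python
import random
import statistics

def running_summaries(max_n: int = 1000, step: int = 10, seed: int = 0):
    """Draw max_n values uniformly from the integers 0..9 and return
    (n, mean, median, mode) of the first n values for n = step, 2*step, ..., max_n."""
    rng = random.Random(seed)
    draws = [rng.randint(0, 9) for _ in range(max_n)]
    rows = []
    for n in range(step, max_n + 1, step):
        prefix = draws[:n]
        rows.append((n, statistics.fmean(prefix), statistics.median(prefix), statistics.mode(prefix)))
    return rows

rows = running_summaries()
for n, mean, median, mode in rows[::20]:  # every 20th row: n = 10, 210, 410, 610, 810
    print(f"n={n:4d}  mean={mean:.2f}  median={median}  mode={mode}")
```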
 


What is the mode in statistics?

The mode in statistics is the most frequently occurring value in a dataset. It is a type of measure of central tendency, along with the mean and median.

Why is the mode not as useful as the mean or median?

The mode may not be as useful as the mean or median because it does not take into account all the values in a dataset: it only considers which value occurs most often. That can be misleading for skewed or multimodal distributions, for nearly uniform data where many values occur about equally often, and for small samples in which no value repeats at all.

In what situations is the mode useful?

The mode is useful in situations where determining the most frequently occurring value is important, such as in categorical data or when finding the most popular option in a survey.

How is the mode calculated?

To calculate the mode, you simply count the number of times each value appears in a dataset and see which value occurs the most frequently. In some cases, there may be more than one mode if multiple values occur with the same frequency.
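The counting procedure can be written in a few lines of Python with `collections.Counter`. The helper name `modes` is hypothetical; it returns every value that reaches the highest count, so a tie yields more than one mode. (Python 3.8+ also ships `statistics.multimode`, which behaves the same way.)

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, in order of
    first appearance; ties give several modes, and empty input gives []."""
    counts = Counter(values)  # tally how often each value appears
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([3, 7, 7, 2, 9, 7]))        # [7]
print(modes([1, 1, 4, 4, 6]))           # [1, 4]
print(modes(["tea", "coffee", "tea"]))  # ['tea']
```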

Can the mode be used with all types of data?

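Not equally well. The mode can be computed for any type of data, and it is the only one of the three measures that works for nominal categorical data, which can be neither averaged nor ordered. It is most informative with categorical or discrete data, where values repeat. It is less useful with raw continuous data, where exact repeats are rare and there may be no clear "most frequent" value unless the values are first grouped into bins.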
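A short Python sketch of the continuous case, using simulated standard-normal data (the sample size, seed, and bin width of 0.5 are arbitrary assumptions). With raw floats essentially every value occurs exactly once, so every value ties as a mode; after grouping into bins, the most populated bin lies next to 0, where the normal density peaks.

```python
import math
import random
import statistics

rng = random.Random(1)
# Simulated continuous data: 10,000 draws from a standard normal distribution.
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Exact repeats are practically absent, so every value ties for "most frequent".
print(len(statistics.multimode(data)) == len(data))

# Grouping into bins first gives a meaningful modal class.
width = 0.5
bin_starts = [math.floor(x / width) * width for x in data]
print(statistics.mode(bin_starts))  # start of the most populated bin
```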
