Data analysis, I don't understand why this isn't a Gaussian nor a Ma

Discussion Overview

The discussion revolves around the shape of a histogram representing the Elo ratings of active chess players, questioning why it does not conform to a Gaussian distribution or a Maxwell distribution. Participants explore the implications of selection bias and the characteristics of different statistical distributions in relation to the data presented.

Discussion Character

  • Exploratory
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant expresses confusion about the histogram's shape, suggesting it resembles a reversed Maxwell distribution.
  • Another participant questions the expectation of a Gaussian distribution, prompting a discussion on the nature of Elo ratings.
  • It is noted that Elo ratings have a floor, which some argue precludes a Gaussian distribution.
  • Selection bias is proposed as a reason for the observed histogram shape, with the argument that less skilled players tend to drop out of the active player pool.
  • A participant suggests that the distribution of active players could remain consistent over time, regardless of player dropouts, potentially maintaining the histogram's shape.
  • One participant mentions that Elo ratings are designed to follow a logistic distribution rather than a Gaussian, raising questions about why the initial curve does not appear logistic.
  • Another participant references a paper claiming that the Elo rating system follows a Gaussian distribution, creating further discussion about the validity of this claim.

Areas of Agreement / Disagreement

Participants do not reach a consensus on why the histogram deviates from a Gaussian distribution. Multiple competing views are presented regarding the influence of selection bias, the characteristics of Elo ratings, and the nature of statistical distributions.

Contextual Notes

Participants reference various statistical distributions, including Gaussian and logistic, and discuss the implications of selection bias without resolving the underlying assumptions or definitions that may affect their arguments.

fluidistic
Data "analysis", I don't understand why this isn't a Gaussian nor a Ma

I have downloaded the Elo ratings of all active FIDE chess players for May and made a histogram: rating on one axis, number of players with that rating on the other.
I do not understand why the graph is not a Gaussian. It looks like a reversed Maxwell distribution to me, but I do not understand why it has this shape.
A picture of the graph can be found here: https://www.physicsforums.com/showpost.php?p=4401602&postcount=7.

It's not homework; it's a question that has been growing in me since last May.
If someone has ideas on why the ratings spread this way, I am all ears.
 
Why would you expect a Gaussian distribution?
 
It can't be Gaussian for the simple reason that elo ratings have a floor.

With regard to the somewhat weird shape: Selection bias. That is a histogram plot of "all the elo ratings of all active chess players in May of the FIDE" (emphasis mine). People who don't quite get the game tend to go for other pursuits. They don't remain active with FIDE. They might still play their children or grandchildren on occasion, but you don't need to be registered with FIDE to do that.
 
UltrafastPED said:
Why would you expect a Gaussian distribution?
I don't know. But the graph has a well-defined shape, and if I had to guess, I would have guessed Gaussian.
But D H said it cannot be a Gaussian because it has a floor, just like Maxwell's speed distribution.
It is somewhat similar to a Maxwell speed distribution but reversed (i.e. the peak is above the mean for the Elo ratings and below it for the speed distribution). I don't understand why it has that particular shape. There must be a reason that I am missing.


D H said:
It can't be Gaussian for the simple reason that elo ratings have a floor.

With regard to the somewhat weird shape: Selection bias. That is a histogram plot of "all the elo ratings of all active chess players in May of the FIDE" (emphasis mine). People who don't quite get the game tend to go for other pursuits. They don't remain active with FIDE. They might still play their children or grandchildren on occasion, but you don't need to be registered with FIDE to do that.
Yes, you are right; Kasparov does not appear in the list, for example. But how does this create a bias? One could think that players of all strengths get "removed" from the rating list in proportion to the shape of the graph, more or less. In other words, out of say 100 players who stop being active, 0 or 1 are above 2600 while many are around 2000. The shape of the curve of all active players would then stay the same at any time of the year and would still represent the distribution of players' strength.
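
The two dropout scenarios discussed here can be compared with a small simulation. This is only a sketch under stated assumptions: the underlying pool is taken to be Gaussian (mean 1500, standard deviation 300), and `stay_probability` is a hypothetical retention curve, not anything measured from FIDE data. It shows that dropout independent of rating leaves the shape alone, while dropout that depends on rating shifts and distorts it. It does not attempt to reproduce the actual FIDE histogram.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment divided by sigma cubed."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def stay_probability(rating, midpoint=1500.0, scale=150.0):
    """Hypothetical chance that a player of this rating stays active."""
    return 1.0 / (1.0 + math.exp(-(rating - midpoint) / scale))


rng = random.Random(0)

# Assumed underlying pool: symmetric, mean 1500, standard deviation 300.
pool = [rng.gauss(1500.0, 300.0) for _ in range(100_000)]

# Case 1: every player has the same 50% chance of staying active.
uniform_survivors = [r for r in pool if rng.random() < 0.5]

# Case 2: the chance of staying active rises with rating.
biased_survivors = [r for r in pool if rng.random() < stay_probability(r)]

for name, xs in [("full pool", pool),
                 ("uniform dropout", uniform_survivors),
                 ("rating-dependent dropout", biased_survivors)]:
    print(f"{name:26s} n={len(xs):6d} mean={statistics.fmean(xs):7.1f} "
          f"sd={statistics.pstdev(xs):6.1f} skew={skewness(xs):+.3f}")
```

Swapping in a different retention curve (or adding a hard floor) is the obvious way to test which selection effects could produce a given histogram.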


Edit: I've downloaded all blitz games of the past month on FICS (Free Internet Chess Server), almost a million games, and used bayeselo to calculate the "bayeselo" rating of every player. A rating can be negative there, so there is no floor, and the shape looks like a Gaussian, although I should redo the graph properly and I'm having trouble doing so. Picture attached.
 

Attachment: ficsblitz.jpg
There are other distributions that look Gaussian at first glance - Lorentzian is one, but the tails are longer; there are other examples:

http://en.wikipedia.org/wiki/Fat-tailed_distribution

Rather than look at a graph you should calculate some statistics!
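
As a minimal sketch of what "calculate some statistics" could look like: the helper below (a hypothetical `shape_statistics`, standard library only) returns the mean, standard deviation, skewness and excess kurtosis of a list of ratings. For reference, a Gaussian has skewness 0 and excess kurtosis 0, a logistic distribution has excess kurtosis 1.2, and a Lorentzian has no finite moments at all, so its sample values never settle down. The `ratings` list here is placeholder data drawn from a Gaussian, not the FIDE or FICS data from this thread.

```python
import random
import statistics


def shape_statistics(xs):
    """Return (mean, standard deviation, skewness, excess kurtosis)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    excess_kurtosis = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4) - 3.0
    return mean, sd, skew, excess_kurtosis


# Placeholder data (assumption): replace with the real list of ratings.
rng = random.Random(1)
ratings = [rng.gauss(2000.0, 200.0) for _ in range(50_000)]

mean, sd, skew, kurt = shape_statistics(ratings)
print(f"mean={mean:.1f} median={statistics.median(ratings):.1f} sd={sd:.1f}")
print(f"skewness={skew:+.3f} excess kurtosis={kurt:+.3f}")
```

A peak that sits above the mean, as described for the FIDE histogram, would show up as a negative skewness and a median larger than the mean.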
 
Apparently elo is designed to follow a logistic distribution rather than a Gaussian.

So why doesn't your initial curve look like a logistic distribution? (Note: A logistic distribution looks very similar to a Gaussian, but with slightly longer tails.) I still think the answer is selection bias. If there were some national law that mandated that everyone had to play at least one rated game of chess a month, you would see a curve that looked a lot closer to normal. But there isn't. People are free to stop playing if they wish. Think of which kind of players are more likely to stop playing, and which are more likely to stick with it.
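
To make the "logistic vs. Gaussian, slightly longer tails" comparison concrete, here is a hedged sketch. As far as I can tell, the logistic and normal curves in the Elo system describe the expected score as a function of the rating difference between two players, not the histogram of ratings across the population. The logistic form below is the usual Elo expected-score formula; the normal form uses `sigma = 200 * sqrt(2)`, which is an assumption based on Elo's original choice of a 200-point spread per player, and may not match the exact table FIDE uses.

```python
import math


def expected_score_logistic(diff):
    """Expected score for a rating advantage `diff` under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def expected_score_normal(diff, sigma=200.0 * math.sqrt(2.0)):
    """Expected score under a normal model; `sigma` is an assumed spread."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


print(f"{'diff':>5s} {'logistic':>9s} {'normal':>9s}")
for diff in (0, 100, 200, 400, 800):
    print(f"{diff:5d} {expected_score_logistic(diff):9.4f} "
          f"{expected_score_normal(diff):9.4f}")
```

The two curves agree closely for small rating differences and separate at large ones, where the logistic model gives the underdog a somewhat better chance. That is the "longer tails" property.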
 
D H said:
Apparently elo is designed to follow a logistic distribution rather than a Gaussian.

So why doesn't your initial curve look like a logistic distribution? (Note: A logistic distribution looks very similar to a Gaussian, but with slightly longer tails.) I still think the answer is selection bias. If there were some national law that mandated that everyone had to play at least one rated game of chess a month, you would see a curve that looked a lot closer to normal. But there isn't. People are free to stop playing if they wish. Think of which kind of players are more likely to stop playing, and which are more likely to stick with it.

I've just read a "paper" that stated that the elo rating system used by FIDE follows a Gaussian distribution. (Page 8 of https://docs.google.com/viewer?a=v&...3RhY2hlc3NlbmdpbmV8Z3g6MzU4NDVjMDRkNDgyZDczNA).
It's also written in wikipedia:
Wiki The Great said:
FIDE still uses the normal distribution as the basis for rating calculations as suggested by Elo himself.[14]
(taken from http://en.wikipedia.org/wiki/Elo_rating_system#Most_accurate_distribution_model).
I'm still thinking about this; I don't see why there would be a bias. I still don't understand why the shape of the histogram is "far" from a Gaussian.
 
