# Data "analysis": I don't understand why this is neither a Gaussian nor a Maxwell distribution

Gold Member


I have downloaded all the elo ratings of all active chess players in May of the FIDE and made a histogram, plotting rating against the number of players with that rating.
I do not understand why the graph is not Gaussian. To me it looks like a Maxwell distribution mirrored left to right, but I do not understand why it has this shape.
A picture of the graph can be found here: https://www.physicsforums.com/showpost.php?p=4401602&postcount=7.

This is not homework; it's a question that has been growing in my mind since last May.
If anyone has ideas about why the ratings are spread this way, I am all ears.

UltrafastPED
Gold Member
Why would you expect a Gaussian distribution?

D H
Staff Emeritus
It can't be Gaussian for the simple reason that elo ratings have a floor.

With regard to the somewhat weird shape: Selection bias. That is a histogram plot of "all the elo ratings of all active chess players in May of the FIDE" (emphasis mine). People who don't quite get the game tend to go for other pursuits. They don't remain active with FIDE. They might still play their children or grandchildren on occasion, but you don't need to be registered with FIDE to do that.
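The floor argument can be illustrated with a minimal simulation. It assumes, purely for illustration, that underlying strengths are Gaussian (mean 1500, standard deviation 300) and that a player below a hypothetical floor of 1400 simply never appears on the list. The listed ratings then follow a truncated normal, which is skewed and therefore not Gaussian. The numbers and the helper names `listed_ratings` and `sample_skewness` are made up for this sketch and are not FIDE's rules.

```python
import random
import statistics


def sample_skewness(xs):
    """Third central moment divided by the cube of the population standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def listed_ratings(n, mu, sigma, floor, seed=0):
    """Draw Gaussian strengths and keep only those at or above the floor.

    Hypothetical model: a player below the floor simply never appears on the list.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        x = rng.gauss(mu, sigma)
        if x >= floor:
            kept.append(x)
    return kept


# Illustrative numbers only; these are not FIDE's parameters.
no_floor = listed_ratings(50_000, mu=1500, sigma=300, floor=float("-inf"))
with_floor = listed_ratings(50_000, mu=1500, sigma=300, floor=1400)
print(f"skewness without floor: {sample_skewness(no_floor):+.2f}")
print(f"skewness with floor:    {sample_skewness(with_floor):+.2f}")
```

Without the floor the sample skewness is close to 0, as expected for a Gaussian; with the floor it is clearly positive.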

Gold Member
> Why would you expect a Gaussian distribution?

I don't know. But the graph has a well-defined shape, and if I had to guess, I would have said Gaussian.
But D H said it cannot be Gaussian because it has a floor, just like Maxwell's speed distribution (speeds cannot be negative).
It is somewhat similar to a Maxwell speed distribution, but mirrored: for the elo ratings the peak lies above the mean, whereas for the speed distribution it lies below the mean. I don't understand why it has that particular shape; there must be a reason I am missing.
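The "mirrored" description can be made precise by comparing the mode (peak) and the mean. The sketch below uses the textbook Maxwell speed density with scale a = sqrt(kT/m), for which the most probable speed is sqrt(2)·a and the mean speed is 2a·sqrt(2/π), checks both numerically on a grid, and then mirrors the curve about an arbitrary point `c` (a value chosen only for illustration). Mirroring swaps the sides: the peak ends up above the mean and the long tail points left.

```python
import math


def maxwell_pdf(v, a):
    """Maxwell speed density with scale a = sqrt(kT/m)."""
    return math.sqrt(2 / math.pi) * v * v / a**3 * math.exp(-v * v / (2 * a * a))


a = 1.0
mode_exact = math.sqrt(2) * a                # most probable speed
mean_exact = 2 * a * math.sqrt(2 / math.pi)  # mean speed

# Numerical check on a fine grid (v from 1e-4 up to about 10a).
dv = 1e-4
grid = [i * dv for i in range(1, 100_000)]
dens = [maxwell_pdf(v, a) for v in grid]
mode_num = grid[max(range(len(grid)), key=dens.__getitem__)]
mean_num = sum(v * p for v, p in zip(grid, dens)) * dv

print(f"Maxwell:  mode={mode_exact:.4f}  mean={mean_exact:.4f}")
# Mirror about c (x = c - v): the mode and mean swap order.
c = 5.0
print(f"Mirrored: mode={c - mode_exact:.4f}  mean={c - mean_exact:.4f}")
```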

> It can't be Gaussian for the simple reason that elo ratings have a floor.
>
> With regard to the somewhat weird shape: Selection bias. That is a histogram plot of "all the elo ratings of all active chess players in May of the FIDE" (emphasis mine). People who don't quite get the game tend to go for other pursuits. They don't remain active with FIDE. They might still play their children or grandchildren on occasion, but you don't need to be registered with FIDE to do that.

Yes, you are right; Kasparov, for example, does not appear in the list. But how does this create a bias? One could imagine that players of all strengths drop off the rating list in roughly the same proportions as the graph itself. In other words, out of, say, 100 players who become inactive, 0 or 1 are rated above 2600 while many are rated around 2000. Then the shape of the curve of active players would stay the same at any time of the year and would still represent the proportions of players' strengths.
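The two positions can be compared with a toy simulation. It assumes a hypothetical Gaussian population of strengths (mean 1800, standard deviation 300) and two made-up retention rules: in case A every player stays active with the same probability, which is the scenario described above; in case B the chance of staying active rises with rating, which is the kind of selection D H describes. The sketch only shows that rating-independent dropout leaves the shape unchanged while rating-dependent dropout shifts the mean and adds skew; it does not attempt to reproduce the actual FIDE histogram.

```python
import math
import random
import statistics


def skewness(xs):
    """Third central moment divided by the cube of the population standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def survivors(strengths, keep_prob, rng):
    """Keep each player independently with probability keep_prob(rating)."""
    return [r for r in strengths if rng.random() < keep_prob(r)]


rng = random.Random(1)
# Hypothetical population with illustrative parameters.
pop = [rng.gauss(1800, 300) for _ in range(100_000)]

# Case A: everyone has the same chance of staying active.
same = survivors(pop, lambda r: 0.5, rng)
# Case B (hypothetical): retention rises smoothly with rating, centred on 1800.
biased = survivors(pop, lambda r: 1 / (1 + math.exp(-(r - 1800) / 150)), rng)

for name, xs in [("population", pop), ("same odds", same), ("weak quit more", biased)]:
    print(f"{name:15s} n={len(xs):6d} mean={statistics.fmean(xs):7.1f} skew={skewness(xs):+.2f}")
```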

Edit: I downloaded all blitz games played on FICS (Free Internet Chess Server) over the past month, almost a million games, and then used bayeselo to compute the "bayeselo" rating of every player. In bayeselo a rating can be negative, so there is no floor, and the shape looks Gaussian, although I should replot it more cleanly; I'm having trouble doing so. Picture attached.
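Why a fitted rating has no floor can be seen from a small sketch. This is not bayeselo's algorithm; it is a plain maximum-likelihood fit of the logistic Elo model by gradient ascent, with ratings re-centred to mean 0 after every step. The function names, the step size and the three-player game list are all hypothetical. Nothing in the model bounds a rating from below, so the weakest player simply ends up with a negative number.

```python
def expected(ra, rb):
    """Elo-scale logistic expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))


def fit_ratings(games, players, iters=2000, lr=20.0):
    """Maximum-likelihood ratings by gradient ascent (hypothetical helper).

    games: list of (a, b, score_of_a) with score in {0, 0.5, 1}.
    The log-likelihood gradient for a is proportional to (score - expected);
    lr absorbs the constant ln(10)/400. Ratings are re-centred to mean 0
    after every step, so a weak player's rating can go negative.
    """
    r = {p: 0.0 for p in players}
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for a, b, s in games:
            e = expected(r[a], r[b])
            grad[a] += s - e
            grad[b] -= s - e
        for p in players:
            r[p] += lr * grad[p]
        mean = sum(r.values()) / len(r)
        for p in players:
            r[p] -= mean
    return r


# Made-up results: every player has at least one win/draw and one loss/draw,
# which keeps the maximum-likelihood ratings finite.
games = (
    [("A", "B", 1.0)] * 3 + [("A", "B", 0.0)]
    + [("B", "C", 1.0)] * 3 + [("B", "C", 0.0)]
    + [("A", "C", 1.0)] * 3 + [("A", "C", 0.5)]
)
ratings = fit_ratings(games, ["A", "B", "C"])
for p, x in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(p, round(x))
```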

UltrafastPED
Gold Member
There are other distributions that look Gaussian at first glance. The Lorentzian is one, but its tails are longer; there are other examples:

http://en.wikipedia.org/wiki/Fat-tailed_distribution
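How much longer the Lorentzian tails are can be quantified. The sketch compares P(|X| > k) for a zero-centred Gaussian and a zero-centred Lorentzian (Cauchy) whose peaks have the same half width at half maximum, which requires gamma = sigma·sqrt(2 ln 2). Matching the half width is a choice made here so the two curves look alike near the peak.

```python
import math


def gaussian_tail(k, sigma=1.0):
    """P(|X| > k) for a zero-mean Gaussian with standard deviation sigma."""
    return math.erfc(k / (sigma * math.sqrt(2)))


def lorentzian_tail(k, gamma):
    """P(|X| > k) for a zero-centred Lorentzian (Cauchy) with half-width gamma."""
    return 1 - (2 / math.pi) * math.atan(k / gamma)


# Match the half width at half maximum: gamma = sigma * sqrt(2 ln 2).
sigma = 1.0
gamma = sigma * math.sqrt(2 * math.log(2))
for k in (1, 2, 3, 5, 10):
    print(f"k={k:2d}  Gaussian {gaussian_tail(k, sigma):.2e}  Lorentzian {lorentzian_tail(k, gamma):.2e}")
```

The Gaussian tail collapses like exp(-k²/2), while the Lorentzian tail only falls off like 1/k.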

Rather than looking at a graph, you should calculate some statistics!
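A minimal sketch of such statistics, assuming the ratings have already been loaded into a plain list of numbers (parsing the downloaded rating list is left out because its file format is not given here; `describe` is a hypothetical helper). For a Gaussian, the skewness and excess kurtosis are both 0 and the mean equals the median; a logistic distribution also has skewness 0 but an excess kurtosis of 1.2. A clearly nonzero skewness, or a median well away from the mean, is a numerical sign that the histogram is lopsided.

```python
import statistics


def describe(ratings):
    """Moment-based summary of a list of ratings."""
    n = len(ratings)
    m = statistics.fmean(ratings)
    s = statistics.pstdev(ratings)
    m3 = sum((x - m) ** 3 for x in ratings) / n
    m4 = sum((x - m) ** 4 for x in ratings) / n
    return {
        "n": n,
        "mean": m,
        "median": statistics.median(ratings),
        "stdev": s,
        "skewness": m3 / s**3,              # 0 for any symmetric distribution
        "excess_kurtosis": m4 / s**4 - 3,   # 0 for Gaussian, 1.2 for logistic
    }


# Usage with a tiny made-up list; replace it with the downloaded ratings.
print(describe([1450, 1800, 1950, 2050, 2100, 2150, 2200]))
```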

D H
Staff Emeritus
Apparently elo is designed to follow a logistic distribution rather than a Gaussian.

So why doesn't your initial curve look like a logistic distribution? (Note: a logistic distribution looks very similar to a Gaussian, but with slightly longer tails.) I still think the answer is selection bias. If there were some national law that mandated that everyone play at least one rated game of chess a month, you would see a curve that looked a lot closer to normal. But there isn't. People are free to stop playing if they wish. Think about which kinds of players are more likely to stop playing, and which are more likely to stick with it.
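In the Elo model, the logistic or normal assumption describes how the expected result of a single game depends on the rating difference; this is a separate question from the histogram of ratings across a population. The sketch below compares the two expected-score curves. The normal version assumes each player's performance in a game is normal with a standard deviation of 200 points (`sigma_player`, a parametrisation commonly attributed to Elo's original model and treated here as an assumption), so the performance difference has standard deviation 200·sqrt(2).

```python
import math


def expected_logistic(d):
    """Expected score for a rating advantage d under the logistic Elo curve."""
    return 1 / (1 + 10 ** (-d / 400))


def expected_normal(d, sigma_player=200.0):
    """Expected score if each player's performance is N(rating, sigma_player^2).

    The performance difference then has standard deviation sigma_player * sqrt(2).
    """
    z = d / (sigma_player * math.sqrt(2))
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


for d in (0, 100, 200, 400, 800):
    print(f"d={d:4d}  logistic {expected_logistic(d):.4f}  normal {expected_normal(d):.4f}")
```

Near d = 0 the two curves almost coincide; for large rating gaps the logistic curve leaves the weaker player a noticeably larger expected score, which is the "longer tails" difference.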

Gold Member
> Apparently elo is designed to follow a logistic distribution rather than a Gaussian.
>
> So why doesn't your initial curve look like a logistic distribution? (Note: a logistic distribution looks very similar to a Gaussian, but with slightly longer tails.) I still think the answer is selection bias. If there were some national law that mandated that everyone play at least one rated game of chess a month, you would see a curve that looked a lot closer to normal. But there isn't. People are free to stop playing if they wish. Think about which kinds of players are more likely to stop playing, and which are more likely to stick with it.

I've just read a "paper" stating that the elo rating system used by FIDE follows a Gaussian distribution (page 8 of https://docs.google.com/viewer?a=v&...3RhY2hlc3NlbmdpbmV8Z3g6MzU4NDVjMDRkNDgyZDczNA).
Wikipedia says the same:

> Wiki The Great said:
> FIDE still uses the normal distribution as the basis for rating calculations as suggested by Elo himself.[14]

(taken from http://en.wikipedia.org/wiki/Elo_rating_system#Most_accurate_distribution_model).
I'm still thinking about this; I don't see why there would be a bias, and I still don't understand why the shape of the histogram is "far" from Gaussian.