Data analysis, I don't understand why this isn't a Gaussian nor a Ma

In summary, the conversation discusses the shape of a histogram of the Elo ratings of all active chess players on FIDE's May rating list. The original poster wants to understand why the histogram is not Gaussian and why it resembles a reversed Maxwell distribution. The proposed explanations are selection bias, since the list includes only active players and omits those who have stopped playing, and the rating floor, which rules out an exact Gaussian. Further analysis is needed to fully explain the shape.
  • #1
fluidistic
Data "analysis", I don't understand why this isn't a Gaussian nor a Ma

I have downloaded the Elo ratings of all active chess players on FIDE's May list and made a histogram of them, plotting rating against the number of players with that rating.
I do not understand why the histogram is not a Gaussian. It looks like a reversed Maxwell distribution to me, but I do not understand why.
A picture of the graph can be found here: https://www.physicsforums.com/showpost.php?p=4401602&postcount=7

It's not homework; it's a question that has been nagging at me since last May.
If anyone has ideas on why the ratings spread this way, I am all ears.
 
  • #2
Why would you expect a Gaussian distribution?
 
  • #3
It can't be Gaussian for the simple reason that Elo ratings have a floor.

With regard to the somewhat weird shape: Selection bias. That is a histogram plot of "all the elo ratings of all active chess players in May of the FIDE" (emphasis mine). People who don't quite get the game tend to go for other pursuits. They don't remain active with FIDE. They might still play their children or grandchildren on occasion, but you don't need to be registered with FIDE to do that.
 
  • #4
UltrafastPED said:
Why would you expect a Gaussian distribution?
I don't know. But the histogram has a well-defined shape, and if I had to guess I would have said Gaussian.
But D H said it cannot be a Gaussian because it has a floor, just like Maxwell's speed distribution.
It is somewhat similar to a Maxwell speed distribution but reversed (i.e. the peak lies above the mean for the Elo ratings and below the mean for the speed distribution). I don't understand why it has that particular shape. There must be a reason that I am missing.


D H said:
It can't be Gaussian for the simple reason that Elo ratings have a floor.

With regard to the somewhat weird shape: Selection bias. That is a histogram plot of "all the elo ratings of all active chess players in May of the FIDE" (emphasis mine). People who don't quite get the game tend to go for other pursuits. They don't remain active with FIDE. They might still play their children or grandchildren on occasion, but you don't need to be registered with FIDE to do that.
Yes, you are right; Kasparov does not appear in the list, for example. But how does this create a bias? One could think that players of all strengths get "removed" from the rating list in roughly the same proportions as the shape of the graph. In other words, out of, say, 100 players who stop being active, 0 or 1 rated above 2600 get removed while many rated around 2000 get removed. The shape of the curve for active players would then stay the same at any time of the year and would still be representative of the distribution of playing strength.


Edit: I've downloaded all the blitz games played on FICS (Free Internet Chess Server) over the past month, almost a million games, and used bayeselo to calculate the "bayeselo" rating of every player. These ratings can be negative, so there's no floor, and the shape looks like a Gaussian, although I should regraph it better and I'm having trouble doing so. Picture attached.
 

Attachment: ficsblitz.jpg
  • #5
There are other distributions that look Gaussian at first glance - Lorentzian is one, but the tails are longer; there are other examples:

http://en.wikipedia.org/wiki/Fat-tailed_distribution

Rather than look at a graph you should calculate some statistics!
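
As a rough illustration of what "calculate some statistics" could mean here, the sketch below computes the mean, median, skewness and excess kurtosis of a sample. The two samples are synthetic stand-ins generated on the spot (they are not the FIDE list), and the helper names `skewness`, `excess_kurtosis` and `describe` are made up for this example. For a histogram whose peak sits above the mean, one would expect the mean to fall below the median and the skewness to be negative.

```python
import random
import statistics

def skewness(xs):
    """Third standardized moment; 0 for any symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; 0 for a Gaussian."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0

def describe(xs):
    """Collect the summary statistics discussed above in one dict."""
    return {
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "skewness": skewness(xs),
        "excess_kurtosis": excess_kurtosis(xs),
    }

# Synthetic stand-in data, NOT the FIDE ratings.
rng = random.Random(0)
gaussian_sample = [rng.gauss(2000, 200) for _ in range(20_000)]
# Mirror image of an exponential: long tail towards low values.
left_skewed_sample = [2800 - rng.expovariate(1 / 300) for _ in range(20_000)]

for name, sample in [("gaussian", gaussian_sample),
                     ("left-skewed", left_skewed_sample)]:
    print(name, {k: round(v, 2) for k, v in describe(sample).items()})
```

Running the same `describe` on the real rating list would show at a glance how far its skewness and kurtosis are from the Gaussian values of zero.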
 
  • #6
Apparently Elo is designed to follow a logistic distribution rather than a Gaussian.

So why doesn't your initial curve look like a logistic distribution? (Note: A logistic distribution looks very similar to a Gaussian, but with slightly longer tails.) I still think the answer is selection bias. If there were some national law mandating that everyone play at least one rated game of chess a month, you would see a curve that looked a lot closer to normal. But there isn't. People are free to stop playing if they wish. Think about which kinds of players are more likely to stop playing, and which are more likely to stick with it.
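
A toy simulation can make the selection-bias argument concrete. The sketch below is only an illustration under made-up assumptions: the population is drawn from a logistic distribution, and `stay_probability` is a hypothetical retention curve in which stronger players are more likely to remain active. None of the numbers come from FIDE data. It compares dropping players uniformly at random with dropping them in a rating-dependent way.

```python
import math
import random
import statistics

def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def sample_logistic(rng, mu, scale):
    """Inverse-CDF sampling from a logistic distribution."""
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # keep the log finite
    return mu + scale * math.log(u / (1 - u))

def stay_probability(rating, midpoint=1500.0, width=150.0):
    """Hypothetical retention curve: stronger players stay active more often."""
    return 1.0 / (1.0 + math.exp(-(rating - midpoint) / width))

rng = random.Random(1)
population = [sample_logistic(rng, 1500.0, 150.0) for _ in range(40_000)]

# Case A: everyone quits with the same probability, regardless of rating.
uniform_survivors = [r for r in population if rng.random() < 0.5]
# Case B: the chance of staying active depends on the rating.
biased_survivors = [r for r in population if rng.random() < stay_probability(r)]

for name, group in [("everyone", population),
                    ("uniform dropout", uniform_survivors),
                    ("rating-dependent dropout", biased_survivors)]:
    print(f"{name:26s} mean={statistics.fmean(group):7.1f} "
          f"skewness={skewness(group):+.2f}")
```

In this toy model uniform dropout leaves the mean and skewness essentially unchanged, while rating-dependent dropout shifts the mean and makes the surviving distribution asymmetric. It does not attempt to reproduce the actual FIDE histogram; the direction and size of the distortion depend entirely on the assumed retention curve.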
 
  • #7
D H said:
Apparently Elo is designed to follow a logistic distribution rather than a Gaussian.

So why doesn't your initial curve look like a logistic distribution? (Note: A logistic distribution looks very similar to a Gaussian, but with slightly longer tails.) I still think the answer is selection bias. If there were some national law mandating that everyone play at least one rated game of chess a month, you would see a curve that looked a lot closer to normal. But there isn't. People are free to stop playing if they wish. Think about which kinds of players are more likely to stop playing, and which are more likely to stick with it.

I've just read a "paper" stating that the Elo rating system used by FIDE follows a Gaussian distribution (page 8 of https://docs.google.com/viewer?a=v&...3RhY2hlc3NlbmdpbmV8Z3g6MzU4NDVjMDRkNDgyZDczNA).
It's also written in Wikipedia:
Wiki The Great said:
FIDE still uses the normal distribution as the basis for rating calculations as suggested by Elo himself.[14]
(taken from http://en.wikipedia.org/wiki/Elo_rating_system#Most_accurate_distribution_model).
I'm still thinking about this. I don't see why there would be a bias, and I still don't understand why the shape of the histogram is "far" from a Gaussian.
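
For what it's worth, the logistic-versus-normal question in the quoted sources concerns the model of how two players' performances differ in a game, which is a separate matter from the histogram of ratings across all players. The sketch below compares the two expected-score curves. The logistic form with a scale of 400 is the commonly quoted Elo formula; the normal form assumes each player's performance has a standard deviation of 200 rating points, which is an assumption made for this example, and both function names are made up.

```python
import math
from statistics import NormalDist

def expected_score_logistic(diff):
    """Expected score of the higher-rated player, logistic curve (scale 400)."""
    return 1.0 / (1.0 + 10 ** (-diff / 400))

# Assumption for this sketch: each player's performance is normal with SD 200,
# so the difference between two players has SD 200 * sqrt(2).
PERFORMANCE_SD = 200.0
_difference = NormalDist(mu=0.0, sigma=PERFORMANCE_SD * math.sqrt(2))

def expected_score_normal(diff):
    """Expected score of the higher-rated player under the normal model."""
    return _difference.cdf(diff)

for diff in (0, 100, 200, 400, 800):
    print(diff,
          round(expected_score_logistic(diff), 4),
          round(expected_score_normal(diff), 4))
```

Under these assumptions the two curves agree closely for small rating differences and diverge at large ones, where the logistic version gives the weaker player a somewhat larger expected score. Neither choice says anything directly about what shape the population histogram should have.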
 

1. Why is this data not normally distributed?

There are several possible reasons why data may not follow a normal (Gaussian) distribution: a small sample size, outliers, or an underlying population that is not itself normally distributed, for instance because the values are bounded or the sample was selected in a biased way. Check the normality assumption before running any statistical analysis that depends on it, and consider alternative methods if the data are not normally distributed.

2. What does it mean if the data is not normally distributed?

If the data are not normally distributed, the values do not follow the symmetric bell-shaped curve of a Gaussian: the distribution may be skewed, bounded, or have heavier or lighter tails. This can affect the results of statistical tests that assume normality and may call for alternative methods. Assess the normality of the data before drawing conclusions that rely on it.

3. How can I determine if my data is normally distributed?

There are a few methods to determine if data follows a normal distribution. Some common techniques include visual inspection of a histogram or a Q-Q plot, statistical tests such as the Kolmogorov-Smirnov test or the Shapiro-Wilk test, and examining skewness and kurtosis values. It is important to use multiple methods and not rely on just one to assess normality.
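
As a minimal sketch of two of these checks, the code below computes a Kolmogorov-Smirnov distance between a sample and a normal distribution fitted to it, plus a handful of Q-Q pairs. It uses only the Python standard library; the function names and the synthetic samples are made up for illustration. One caveat: because the normal is fitted to the same data, the usual Kolmogorov-Smirnov critical values do not apply directly, so the number is best read as a relative measure of misfit rather than converted to a p-value.

```python
import random
from statistics import NormalDist

def ks_statistic_vs_fitted_normal(xs):
    """Largest gap between the empirical CDF and a normal fitted to the data."""
    fitted = NormalDist.from_samples(xs)
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def qq_points(xs, probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """(theoretical, observed) quantile pairs; similar values suggest normality."""
    fitted = NormalDist.from_samples(xs)
    ordered = sorted(xs)
    n = len(ordered)
    return [(fitted.inv_cdf(p), ordered[min(n - 1, int(p * n))]) for p in probs]

# Synthetic samples for illustration only.
rng = random.Random(2)
normal_like = [rng.gauss(0.0, 1.0) for _ in range(5_000)]
skewed = [rng.expovariate(1.0) for _ in range(5_000)]

print("KS distance, normal-like:", round(ks_statistic_vs_fitted_normal(normal_like), 3))
print("KS distance, skewed:     ", round(ks_statistic_vs_fitted_normal(skewed), 3))
for theoretical, observed in qq_points(skewed):
    print(f"Q-Q pair: theoretical={theoretical:+.2f} observed={observed:+.2f}")
```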

4. Can I still use parametric tests if my data is not normally distributed?

In general, parametric tests assume that the data follow a normal distribution. However, if the sample size is large enough, the Central Limit Theorem implies that the sampling distribution of the mean is approximately normal even when the data are not, so many parametric tests on means remain usable. If the data are not normally distributed and the sample size is small, non-parametric tests may be more appropriate.
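
The Central Limit Theorem effect can be seen in a small simulation. The sketch below draws repeated samples from a strongly skewed (exponential) distribution and measures the skewness of their means for several sample sizes. The distribution, the sample sizes and the helper names are all arbitrary choices made for this illustration.

```python
import random
import statistics

def skewness(xs):
    """Third standardized moment; 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def sample_means(rng, sample_size, repetitions=3_000):
    """Means of many independent samples drawn from a skewed distribution."""
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(repetitions)
    ]

rng = random.Random(3)
skew_by_size = {n: skewness(sample_means(rng, n)) for n in (1, 10, 100)}

for n, skew in skew_by_size.items():
    print(f"sample size {n:3d}: skewness of the sample means = {skew:+.2f}")
```

The individual values stay just as skewed no matter how many are collected; only the distribution of their averages moves towards a Gaussian as the sample size grows.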

5. Why is it important to have normally distributed data?

In many statistical analyses, it is assumed that the data follows a normal distribution. This allows for easier interpretation of results and the use of parametric tests, which tend to have higher power and efficiency. Having normally distributed data also helps to ensure the validity and reliability of the statistical tests used.
