ELO chess ranking system applied incorrectly in video games

  1. Are there any chess players, statisticians, or smart people who are familiar with the ELO ranking system? This concerns a game with well over 100,000 players, so it's not an obscure concern.

    Basically, I play an online video game that tries to apply an ELO ranking system to rank individual player skill, even though it's a multiplayer, team-oriented game. I think applying ELO this way is severely flawed, and it feels logically obvious to me; however, I don't immediately see how to prove it, and I would like to get input before I venture into calculations, simulations, and models.

    I'll try to describe the method and why I think it's invalid, and then the supporters' reasons for why it's valid.

    The games consist of 2 teams, each with 5 players. The gameplay is heavily dependent on team performance, where 1 player can influence the outcome at times, but in general, it takes a team effort to outperform another team. Imagine it being basketball if you want. So, if you are on a team with bad players, you are severely disadvantaged. At the end of the game, all players of the winning team receive a boost to their ELO rating, while players of the losing team receive a deduction from their rating, independent of each player's individual performance.
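A minimal Python sketch of the update described above, under stated assumptions: each team's strength is taken as the plain average of its five ratings, the standard logistic Elo expected score is used, and K = 32. None of these details are confirmed for the game, and `expected_score` and `team_update` are hypothetical names used only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of side A against side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def team_update(winners: list[float], losers: list[float], k: float = 32.0):
    """Apply one shared rating change to every player on each team.

    Assumption: team strength is the plain average of member ratings.
    Individual performance never enters the formula.
    """
    avg_w = sum(winners) / len(winners)
    avg_l = sum(losers) / len(losers)
    delta = k * (1.0 - expected_score(avg_w, avg_l))
    return [r + delta for r in winners], [r - delta for r in losers]


winners, losers = team_update([1500, 1700, 1400, 1600, 1300], [1500] * 5)
print([round(r, 1) for r in winners])  # [1516.0, 1716.0, 1416.0, 1616.0, 1316.0]
print([round(r, 1) for r in losers])   # [1484.0, 1484.0, 1484.0, 1484.0, 1484.0]
```

Every winner moves by the same 16 points whether he carried the game or not, which is the property the OP objects to.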

    Now, one of the key flaws I see is that the system applies the team's performance to an individual's ranking. If the teams were always constant, meaning the players being rated were always on the same team together, this would actually work, because then the ELO rating would be applied to the team rather than the individual, and the team would have a valid record of statistical datapoints. Say some lucky bad player plays with 4 professional players on his team every game; then his rating would not represent his personal skill, but rather his team's skill.

    In the game I'm playing, each game (or trial, if you can call it that) the teams are completely randomized, with players who could be very experienced or who have never played the game before. The system picks 2 teams of 5 random players each, chosen so that both teams have the same average rating (in an attempt to make the matchup fair). So each game a player plays, his team changes completely, but the end-of-game result (win or lose) is applied to the individual player, and that random team's performance becomes a datapoint for someone's individual, non-random, statistical performance. Does this start to sound flawed? The dataset being compiled is basically a series of random events, since the players are picked out of a random pool to match an average rating.
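One way such a matchmaker could work, sketched as a hypothetical brute-force search (the game's real algorithm isn't described in the thread): try every split of 10 queued players into two teams of 5 and keep the one whose average ratings are closest. Because both teams have 5 players, comparing sums is the same as comparing averages. `balanced_split` is an illustrative name, not a real API.

```python
import itertools
import random


def balanced_split(ratings: list[float]) -> tuple[list[float], list[float]]:
    """Split an even number of ratings into two equal teams with the closest averages.

    Hypothetical brute-force matchmaker; the game's real algorithm is unknown.
    """
    n = len(ratings)
    indices = range(n)
    best = None
    for combo in itertools.combinations(indices, n // 2):
        # Keep player 0 on team A so each split is only considered once.
        if 0 not in combo:
            continue
        team_a = [ratings[i] for i in combo]
        team_b = [ratings[i] for i in indices if i not in combo]
        gap = abs(sum(team_a) - sum(team_b))
        if best is None or gap < best[0]:
            best = (gap, team_a, team_b)
    return best[1], best[2]


random.seed(1)
pool = [random.gauss(1500, 200) for _ in range(10)]
team_a, team_b = balanced_split(pool)
print(sum(team_a) / 5, sum(team_b) / 5)  # the two averages end up close together
```

Equal averages say nothing about the spread inside each team: a 1000/2000/1500/1500/1500 lineup and five 1500s both average 1500.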

    The argument in support of this method is that A.) the player pool used to make teams is random, and B.) the one constant across all trials is that the individual player was an influence in every game (in other words, the common element of all the random games is that the player being rated participated in all of them). Given these premises, the randomness of his team makeup in each game should begin to average out, leaving the individual's influence visible. So with a sufficient number of games played, the accumulated results of the random teams he participated in should start to reflect his own performance.
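The supporters' argument can be checked with a small Monte Carlo simulation, one of the approaches the OP mentions. This is a toy model with loudly stated assumptions: skills are numbers on an arbitrary scale, the 9 other players are drawn fresh from a normal distribution each game (ignoring the matchmaker's rating balancing), team strength is the sum of member skills, and the stronger team wins with logistic probability. `win_rate` and `my_skill` are hypothetical names.

```python
import math
import random


def win_rate(my_skill: float, games: int, spread: float = 1.0, seed: int = 0) -> float:
    """Fraction of games won by one player whose 9 companions are random each game.

    Toy model (an assumption, not the game's real mechanics): team strength is
    the sum of member skills, and the stronger team wins with logistic probability.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        teammates = sum(rng.gauss(0.0, spread) for _ in range(4))
        opponents = sum(rng.gauss(0.0, spread) for _ in range(5))
        diff = (my_skill + teammates) - opponents
        if rng.random() < 1.0 / (1.0 + math.exp(-diff)):
            wins += 1
    return wins / games


for skill in (0.0, 0.5, 1.0):
    print(skill, round(win_rate(skill, 20_000), 3))
```

In this model the win rate rises above 50% as `my_skill` grows, but the edge is small compared with the spread contributed by the other nine players, so it only becomes visible over many games.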

    Now, my argument is this: you cannot base an individual's ranking on how the randomized teams he plays on perform. The +/- received from a win or loss should only be applied to the team's performance, not to an individual's. I would also argue that, given the distribution of player skill in the randomized teams and the fact that the player is only 1/10th of the participants in a match, the "averaging" effect they expect to surface is completely drowned out by "noise". I think of it as a signal-to-noise ratio analogy: the 9 other players are the noise floor, and the individual player is below this noise threshold, so his influence cannot be measured.
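The signal-to-noise framing can be made quantitative. If a player's presence shifts his team's win probability from 50% to 50% + δ, the observed win rate after n independent games has a standard error of about 0.5/√n, so the edge only stands out at roughly two standard errors once n ≥ (2 · 0.5 / δ)² = 1/δ². A short sketch; the edge values below are illustrative assumptions, not measurements from the game.

```python
import math


def games_needed(edge: float, z: float = 2.0) -> int:
    """Games until a win-rate edge exceeds z standard errors of coin-flip noise.

    Assumes independent games and a true win rate near 50%, so the standard
    error of the observed win rate after n games is about 0.5 / sqrt(n).
    """
    return math.ceil((z * 0.5 / edge) ** 2)


for edge in (0.01, 0.02, 0.05, 0.10):
    print(f"edge {edge:.0%}: ~{games_needed(edge)} games")
```

A 5% edge needs on the order of 400 games and a 1% edge on the order of 10,000, so whether the individual signal emerges depends heavily on how large the per-player edge really is.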

    Sorry if this doesn't belong here or doesn't make much sense, but if you have any experience or thoughts on this I'd appreciate comments.
  3. The result will be a progressively smaller number of people whose scores are increasingly skewed.

    For instance, consider this variation on the traditional ELO system: you are scored on your own games only, but after each game a coin is flipped and the loser of the coin toss gives one ELO point to the winner.

    At the end of the first round, half of the players are 1 point ahead, half are 1 point behind. At the end of the second round, half will have correct ELO scores, a quarter will be 2 points ahead, a quarter 2 points behind. Etc.
    Last edited: Feb 10, 2012
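The coin-flip thought experiment above is a simple random walk, and its exact distribution after n rounds is binomial: the rating error is 2h - n, where h is the number of coin tosses won, with probability C(n, h)/2ⁿ. A small sketch (the ±1-point coin flip is the post's hypothetical, not a real rating rule; `offset_distribution` is an illustrative name):

```python
import math
from fractions import Fraction


def offset_distribution(rounds: int) -> dict[int, Fraction]:
    """Exact distribution of rating error after `rounds` coin flips worth +/-1 point."""
    return {
        2 * heads - rounds: Fraction(math.comb(rounds, heads), 2 ** rounds)
        for heads in range(rounds + 1)
    }


print(offset_distribution(2))  # {-2: Fraction(1, 4), 0: Fraction(1, 2), 2: Fraction(1, 4)}

# The share of players with an exactly correct score shrinks, while the
# typical error (standard deviation of the walk) grows like sqrt(rounds).
for rounds in (10, 100, 1000):
    exact = offset_distribution(rounds).get(0, Fraction(0))
    print(rounds, float(exact), math.sqrt(rounds))
```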
  4. I think one thing to remember is that Elo in a team game would not reflect how good a player you are, but rather how well you work with a team. It would be possible to add some factors that would let it value how good a player you are, but that might lead to people just trying to work the system to get scores that reflect that they are good.

    With just about any Elo system, the score becomes more accurate the more games you play. Have you taken this into account?
    Last edited: Feb 10, 2012
  5. If I understand the OP, you don't play as a team; you play as a group of 5 individuals. It is only the rating points that are distributed on the basis of the team's result. When you win a game in the traditional ELO system, you get some of your opponent's points. In this system, you only get your opponent's points if your team wins. Say you defeat your opponent and the other 4 members of your team lose their games. In the traditional system you would win points; in this system you lose points.
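The contrast in this post can be put in numbers with the standard Elo update, ΔR = K·(S - E), where S is the actual score and E the expected score. Assuming, hypothetically, that all ten players are rated 1500 and K = 32, your own duel and your team's result give opposite changes. `elo_delta` is an illustrative name.

```python
def elo_delta(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Standard Elo change for one game; score is 1 for a win, 0 for a loss."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (score - expected)


# Hypothetical: everyone rated 1500, you win your duel, your 4 teammates lose theirs.
individual = elo_delta(1500, 1500, score=1)  # traditional: scored on your own game
team_based = elo_delta(1500, 1500, score=0)  # team lost, and that loss is applied to you
print(individual, team_based)  # 16.0 -16.0
```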
  6. Matterwave

    It seems to me that Jimmy's conclusions are reasonable. With a sufficiently large player pool, most players will see their scores average out to reasonable levels. A small minority may get exceptionally lucky or unlucky, being consistently paired with players above or below their skill level, and will end up with skewed ratings.
  7. Hi all, thanks for the replies.

    You do play as a team. It's a strategic game where 5 players play together to beat the other team. My problem is that you are then rated as an individual based on your team's performance, and your team changes every match and is randomly chosen.
  8. fluidistic

    I also believe the Elo ranking system is flawed in such games (football/soccer too).
    For a few months I played a video game in which teams stayed fixed from one game to the next. We would create "clans" and were teamed up only with people from our own clan against other clans. The number of players wasn't fixed; a game could start 2 vs 5 if there were 5 people from one clan and 2 from the other.
    I was stronger than the average player on my team; when the clan season ended, my Elo rating went up by almost 200 points and I entered the top 50 or so. However, my skills stayed about the same, so my Elo didn't mean anything reliable.

    In one of my first games (Elo 1500), I played a 3 vs 3 against the strongest player in the game (Elo around 2200). I beat him in almost every 1 vs 1 encounter, but my teammates failed to win the game. Result: I lost Elo points and the strongest player gained Elo, even though I personally did better than him in that particular game. Such a ranking system cannot be right. In my opinion, Elo isn't suited for team games.
    P.S.: In that game, if you die entirely but your team manages to win, you still "win" as far as the ranking system is concerned.
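For scale, the standard Elo expected score puts a 1500 player's chances against a 2200 player in a head-to-head game at under 2%, so if that duel had been rated on its own (K = 32 assumed), the win would have been worth almost the full K. A sketch of the standard formula, not the game's actual rating code:

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


K = 32.0  # assumed K-factor; the game's real value is unknown
e = expected_score(1500, 2200)
print(f"expected score {e:.3f}, gain for an individually rated win {K * (1 - e):+.1f}")
```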
  9. One thing to remember: rating systems are meant to be a predictor, not a sure thing (and from the sounds of it, the OP is talking about League of Legends?). Even with team games, over a long enough period of time, a K-weighted ELO system should be relatively accurate as long as ratings are used within their appropriate pool, e.g., multiplayer ratings used for randomly assigned teams. Also, something to note for League of Legends: they use a personal rating (hidden from you) to make matchmaking even more granular.

    Magic: The Gathering (and several other 1v1 tabletop games) have all recently dropped the ELO rating system for their competitions. Over a long enough period of time, the ratings became a bit unwieldy (because they weren't being used for matchmaking, just for large-scale comparisons). People would sit on their ratings for large events and skip small ones (i.e., a sufficiently highly rated player would need to go undefeated at a smaller event to avoid losing significant points). There is also the view that ELO ratings don't accurately represent games with an element of chance.
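The "must go undefeated" effect follows from the Elo update: a rating holds steady only if the actual win rate matches the expected score, which becomes very high against much lower-rated fields. A sketch with hypothetical numbers (a 1600-rated field and K = 32 are assumptions, not tournament data):

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


K = 32.0  # assumed K-factor
for gap in (100, 200, 400):
    e = expected_score(1600 + gap, 1600)
    win, loss = K * (1 - e), K * (0 - e)
    print(f"gap {gap}: break-even win rate {e:.1%}, win {win:+.1f}, loss {loss:+.1f}")
```

With a 400-point gap, each win is worth about +2.9 points and each loss costs about -29.1, so one loss erases roughly ten wins.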

    Finally, especially in a team game, something being discounted is your skill as a team player. Sure, 1v1 you may beat someone, but 2v2 that other individual may be far better at utilizing his partner. I think you're discounting "plays well with others" as a measured skill in this case. To your soccer analogy: your striker may have the best kick in the game, but if he's a ball hog, things may not bode well. But again, over a long period of time, in a randomly selected team environment, the 'better' individual will have a higher rating.
  10. How do you play as a team? Is there a single game and the 5 of you vote on which move to play? Or do you take turns making moves? Can you clarify for me what it means to play a chess game as a team?
  11. Ah, League of Legends. My little brother just got me into that game.

    I guess the main counterargument to the OP is that the high ELO players are legitimately the best players, and the low ELO players are generally crap. I disagree that your performance in the game will get drowned out by the others in the long run. We've all had games where one guy on either team basically solos the entire game, and we've all had games where some horrible player on one of the teams just feeds the opposing carries.

    But, my main point is that players with high ELOs exist, and that is the biggest counterexample to your argument.
  12. It's a real-time game, not turn based. Instead of trying to describe it, let me post a video here. This is a video showing some high ELO players with commentary.

  13. Aha! I thought we were talking about chess.
  14. Is this where I can jump in and flame LoL over HON/Dota?


    I kid.

    This is interesting, I'd like to see what people think about the system that exists in all the above games.
  15. I agree. It was designed for chess, afaik. For chess and other 1-on-1 games, it's a good predictor. For team competitions that keep pretty much the same personnel from game to game, it should be a pretty good predictor of team performance. For rating individuals in team competitions, where team membership changes randomly from contest to contest and individual performance stats aren't taken into account ... definitely not.
  16. Disagree, because your individual performance will impact whether or not your team wins. So, better players will win more often regardless of the rest of their team.

    Again, the biggest counterargument is that the strongest players have the highest ELOs, even in solo queue. This fact cannot be explained if ELO isn't related to individual performance.
  17. fluidistic

    In the game where I was teamed up with my clan mates rather than randomly, my Elo was in the 1400-1500s. When the "season" was over, teams would be created randomly. My Elo suddenly went up to the high 1700s, yet my skills remained the same. I'm not the only one this happened to; many people criticized the Elo ranking system in that particular game because of this, and because of totally unbalanced games where one could guess the outcome from the start regardless of what the Elo said. The same applied even when the teams were randomly balanced. In that particular game, the economy (think of a StarCraft-like one) is shared. If you have a noob on your team who wastes all the economy on useless stuff, even the best player can't do much to win the game.
    This did not happen in my game (the game is Zero-K, and the clan-team seasons are called Planet Wars).
  18. You're talking about a different game than the OP is, so I have no comment on that.

    In League of Legends, there are team queues and solo queues. The teams generally have lower ELOs than individuals, so I don't know how meaningful it is to compare the ELO of a team vs the ELO of an individual, as you seem to be doing. However, the point stands that you're talking about a completely different game.
  19. fluidistic

    My fault; in post #15 I thought you were talking about team games in general rather than League of Legends in particular.
  20. Glad people started using game names; I didn't want to look like too much of a nerd. The game I'm referring to is HoN.

    I'm also glad to see people agreeing with me. But does anyone know how to do a mathematical analysis to prove it's invalid?
  21. Thanks for the counterexample, and this brings up some subtle points I forgot to make originally.

    First, most of the high-rank players also tend to play with other high-rank players, and play in organized teams rather than randomized teams. If not organized teams, they usually at least have a buddy they play with consistently. In my first post, I mentioned that the ranking system becomes more accurate as the makeup of the team remains unchanged rather than randomized. I highly doubt the "strongest" players can do much when they are thrown back down below the average rank and have their hands tied by 4 beginners.

    Secondly, as the "randomness" tilts you in one direction or another, you start to notice a landslide effect. If I go on a bad streak, my rank takes a dive. If I get on a winning streak, I tend to stay up at that position until I get bad luck (horrible teammates) again.

    It's because if you start to win a couple, the system starts to pair you with other people who have won recently, and these winners help pull you further away from the average. It has very little to do with your own actual skill.
    Last edited: Feb 14, 2012
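On streaks: even a player whose games are pure coin flips will see long losing (or winning) runs fairly often. This sketch computes the exact probability of at least one run of k consecutive losses in n independent games, using a dynamic program over the current run length. It assumes, purely for illustration, that every game is an independent flip with a fixed loss probability; it does not model the matchmaker pairing recent winners together, which is the separate feedback effect described above. `prob_streak` is an illustrative name.

```python
def prob_streak(n_games: int, k: int, p: float = 0.5) -> float:
    """Probability of at least one run of k consecutive losses in n independent games.

    Assumes every game is independent with loss probability p, i.e. the
    player's own skill plays no role at all.
    """
    # state[j] = probability that the current losing run has length j
    # and no run of length k has happened yet.
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_games):
        new = [0.0] * k
        for j, prob in enumerate(state):
            if prob == 0.0:
                continue
            new[0] += prob * (1 - p)  # a win resets the run
            if j + 1 == k:
                hit += prob * p  # the run reaches length k
            else:
                new[j + 1] += prob * p
        state = new
    return hit


for k in (3, 5, 8):
    print(k, "losses in a row within 100 games:", round(prob_streak(100, k), 3))
```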