The most outstanding male squash players

Patterns found by analysing ranking data from Wikipedia

Ramy Ashour, the greatest player of the past 20 years

A fun part of following any sport is comparing the top players and trying to work out who really was the best of the best. As a squash fan, I wanted to do this for squash. As a data scientist, I did it by looking for patterns in the list of Top 10 players at the end of each calendar year since 1996, which is available on Wikipedia.

Based on this data, I will answer these three questions:

  • Which players stand out?
  • What criterion separates the absolute best players from the other Top-10 players?
  • What subgroups of players do advanced machine learning models find?

If you want to see the full analysis, all the details and the code are on GitHub: https://github.com/Lovkush-A/squash_wiki

1. Which players stand out?

Here are the top 30 or so rows of the table I created, ordered by average rank:

Based on a short scan of this data, here is what stands out to me:

  • Peter Nicol’s average rank of 2.4 over a 10-year period. This indicates just how much he dominated the game during that time.
  • Gregory Gaultier’s and Nick Matthew’s total number of years in the Top 10, at 15 and 14 years respectively. The next best is 11. If you watch a lot of squash, you will hear commentators mention that Gregory has been an elite player for an impressively long time, but it was only after seeing this data that I really appreciated just how impressive that is.
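
For readers who want to see how a summary table like this can be put together, here is a minimal Python sketch. It assumes the year-end lists have been flattened into (year, rank, player) records, and that the average rank is taken only over the years in which a player appears in the list. The records and player names are made-up placeholders rather than the Wikipedia data, the function name `summarise` is my own invention for this illustration, and the code in the repository may well be organised differently.

```python
from collections import defaultdict
from statistics import mean

# Made-up placeholder records: (year, year-end rank, player).
# These are NOT the real Wikipedia rankings.
records = [
    (2001, 1, "Player A"), (2001, 2, "Player B"), (2001, 3, "Player C"),
    (2002, 1, "Player B"), (2002, 2, "Player A"), (2002, 3, "Player C"),
    (2003, 1, "Player A"), (2003, 2, "Player C"),
]

def summarise(records):
    """Return one summary row per player, ordered by average rank."""
    ranks = defaultdict(list)
    years = defaultdict(list)
    for year, rank, player in records:
        ranks[player].append(rank)
        years[player].append(year)
    rows = [
        {
            "player": player,
            # Average over the years the player appears in the list (assumption).
            "average_rank": mean(ranks[player]),
            "top_rank": min(ranks[player]),
            "years_in_list": len(ranks[player]),
            "first_year": min(years[player]),
            "last_year": max(years[player]),
        }
        for player in ranks
    ]
    # A lower average rank is better, so sort ascending.
    return sorted(rows, key=lambda row: row["average_rank"])

summary = summarise(records)
for row in summary:
    print(row)
```

Sorting by a different key, such as `years_in_list` in descending order, would bring out longevity rather than dominance.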

2. What criterion separates the absolute best players from the other Top-10 players?

I was expecting to do some fancy modelling here, but scanning through the full dataset revealed that the following simple criterion does a good job of identifying the legends of the game:

  • Is their Top Rank 1 or not?

This is unlikely to convince people unfamiliar with the sport, but other squash fans would agree with this rule. If they were to go down the list of players, they would agree that anybody who has reached rank 1 is qualitatively different from those who have not.

It is not perfect though. Using this criterion, you would include Lee Beachill in the list of best players. In my own judgement, I would not group Lee Beachill with the legends; his reputation is not as remarkable as theirs. The fact that his other stats are not as impressive as the other best players’ stats matches my intuition.
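
The rule above amounts to a one-line filter on the summary table. The sketch below shows it in Python, together with one possible way of flagging borderline cases by looking at a second statistic. The rows are made-up placeholders, and the `MIN_YEARS` threshold is a hypothetical value chosen only for illustration, not something derived from the data.

```python
# Made-up placeholder summary rows; not the real statistics.
summary = [
    {"player": "Player A", "top_rank": 1, "average_rank": 2.0, "years_in_top_10": 9},
    {"player": "Player B", "top_rank": 1, "average_rank": 5.5, "years_in_top_10": 4},
    {"player": "Player C", "top_rank": 2, "average_rank": 4.0, "years_in_top_10": 8},
]

def reached_number_one(row):
    """The simple criterion: did the player ever finish a year ranked 1?"""
    return row["top_rank"] == 1

legends = [row["player"] for row in summary if reached_number_one(row)]

# Hypothetical secondary check: players who reached rank 1 but spent
# relatively few years in the Top 10 are flagged for a closer look.
MIN_YEARS = 5
borderline = [
    row["player"]
    for row in summary
    if reached_number_one(row) and row["years_in_top_10"] < MIN_YEARS
]

print("Reached rank 1:", legends)
print("Worth a closer look:", borderline)
```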

3. What subgroups of players do advanced machine learning models find?

There are various techniques in machine learning to help identify subgroups of players, so that everyone in one subgroup is in some sense similar to each other, and different subgroups are somehow distinct.

Applying those techniques to this dataset revealed that there are three main subgroups:

  • Players who played in the late 90s and early 2000s
  • Players who have been in the Top 10 for many years
  • Players who have recently joined the Top 10
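
To give a flavour of what finding subgroups means in practice, here is a minimal sketch of one common clustering technique, k-means, written in plain Python. This is only an illustration and not necessarily one of the techniques used in the analysis. The two features (first year in the Top 10 and number of years in the Top 10), the player names, and all the numbers are made-up placeholders.

```python
import math

# Made-up placeholder features: (first year in Top 10, years in Top 10).
players = {
    "Player A": (1996, 3),
    "Player B": (1997, 4),
    "Player C": (2004, 13),
    "Player D": (2005, 14),
    "Player E": (2017, 2),
    "Player F": (2018, 1),
}

def standardise(points):
    """Rescale each feature to zero mean and unit variance."""
    columns = list(zip(*points))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        tuple((x - m) / s for x, m, s in zip(point, means, stds))
        for point in points
    ]

def k_means(points, initial_centroids, iterations=20):
    """Plain k-means: assign to the nearest centroid, then recompute centroids."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(c) for c in zip(*members))
    return labels

names = list(players)
points = standardise([players[name] for name in names])
# Deterministic starting centroids for this sketch: three hand-picked points.
labels = k_means(points, [points[0], points[2], points[4]])
for name, label in zip(names, labels):
    print(name, "-> group", label)
```

Standardising the features first matters here because calendar years and counts of years are on very different scales, and k-means relies on plain Euclidean distance.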

To see this, I applied three different visualisation techniques and looked at how these groups appeared in each one. Unfortunately, I do not know how to make the plots available on Medium, so I have to ask you to download the files containing the plots and open them in your own browser. It is worth the effort! The visualisations are interactive, so you can look around for yourself and see what patterns you can find.

Conclusion

Exploring data scraped from Wikipedia revealed some patterns and taught me some new things, e.g. how remarkable Peter Nicol’s and Gregory Gaultier’s statistics are.

However, I should stress the limitations of this approach. There are several, but the big ones include:

  • The data only goes back to 1996. This means we miss out on the achievements of players in earlier years and have incomplete pictures of those players who ended their careers in the late 90s, notably Jansher Khan and Jahangir Khan, who are often considered the two greatest players of all time.
  • The data is limited to one snapshot at the end of each year. This is a very crude summary of these players’ histories. I am surprised it contained enough information to pick up on any meaningful patterns!
  • The data is not able to capture the fact that Ramy Ashour is, without a doubt, the best squash player of his generation. A big reason for this is that Ramy had many injuries in his career, which hurt his stats.

Maths lecturer turned Data Scientist.