Mark Carey and Mladen Sormaz presented a poster at the 2018 OptaPro Analytics Forum. This article provides an in-depth write-up of their analysis. Their posters can also be read here:
The role of the full-back has changed dramatically in recent years, with many teams across Europe now focusing predominantly on how they can get their full-backs involved in attack, even more so than protecting their defence. Indeed, the use of wing-backs has seemingly come back into fashion, most prominently with Antonio Conte’s 3-5-2 formation working to great effect in Chelsea’s stroll to the Premier League title last season. But rather than subjectively observe the different playing styles of full-backs across Europe, can these styles be quantified?
We focused on full-backs as their playing style has become so varied in recent seasons, but this method can be applied to any position. What we wanted to do was provide a more detailed, objective measure of the different profiles of full-backs, one which accounts for the broader dimensions of a player’s style rather than their individual statistics. To do this, we used Opta F9 data (aggregated match totals) to first run a Principal Components Analysis (PCA) with varimax rotation. This reduces the dimensionality of the data and determines the broader profiles of the full-backs that we entered into the analysis (details of our PCA can be found here). The next step was to use this information to quantify which full-backs were most similar to each other in playing style.
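As a rough illustration of the PCA-with-varimax step, here is a minimal NumPy sketch. It is not the code used in the analysis: the data are random numbers standing in for per-player match statistics, the number of components is arbitrary, and the function names `varimax` and `pca_varimax` are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a (variables x components) loading matrix using the varimax criterion."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Sum of squared loadings per component, used by the varimax update.
        col_ss = np.sum(rotated ** 2, axis=0)
        target = rotated ** 3 - rotated * (col_ss / n_vars)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # product of orthogonal matrices, so still orthogonal
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


def pca_varimax(X, n_components):
    """PCA on standardised columns, followed by a varimax rotation."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = Z.shape[0]
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    # Loadings: correlations between original variables and components.
    loadings = vt[:n_components].T * (s[:n_components] / np.sqrt(n - 1))
    # Standardised component scores (unit variance, uncorrelated).
    scores = u[:, :n_components] * np.sqrt(n - 1)
    rotated_loadings, rotation = varimax(loadings)
    return rotated_loadings, scores @ rotation


# Placeholder data: 60 hypothetical players, 6 hypothetical metrics.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
loadings, scores = pca_varimax(X, n_components=3)
print(loadings.shape, scores.shape)
```

The rotated scores (one row per player) are the kind of reduced profile that could then be passed to a clustering step.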
The numbers
We analysed 417 full-backs from the 2015/16 and 2016/17 seasons across the ‘Big five’ European competitions. After the PCA showed stable profiles across the two seasons, we entered our data from the 2016/17 season into a cluster analysis. In a cluster analysis, individuals with similar or related profiles are placed together, and are separated from those whose profiles are dissimilar. We should note that player comparison across leagues is difficult, and we did our best to correct for this within our analysis.
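The correction for league differences is not described in this article. One common option, shown here purely as an assumption and not as the method used in the analysis, is to standardise each metric within its own league so that players are compared against their league peers. The function name and the toy numbers below are hypothetical.

```python
import numpy as np


def standardise_within_league(values, leagues):
    """Z-score each metric column separately within each league.

    values  : array of shape (n_players, n_metrics)
    leagues : sequence of length n_players with a league label per player
    """
    values = np.asarray(values, dtype=float)
    leagues = np.asarray(leagues)
    out = np.empty_like(values)
    for league in np.unique(leagues):
        mask = leagues == league
        group = values[mask]
        std = group.std(axis=0)
        # Avoid division by zero when a metric is constant within a league.
        std = np.where(std == 0, 1.0, std)
        out[mask] = (group - group.mean(axis=0)) / std
    return out


# Toy example: two hypothetical leagues, two hypothetical metrics.
values = [[10, 1], [20, 3], [1, 5], [3, 9]]
leagues = ["League A", "League A", "League B", "League B"]
print(standardise_within_league(values, leagues))
```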
The results from our analysis were very interesting, with 11 player clusters created based on their performance statistics (see Figure 1). Importantly, high-profile players who you would expect to cluster together based on their playing style do indeed fall into the same cluster (e.g. Marcelo, Alex Sandro, Davide Zappacosta), underlining the reliability of the analysis and passing the ‘eyeball test’. Figure 1, below, creates broader groups and more specific sub-groups, with each line representing an individual player.
We used an agglomerative method, which looks to match up all the players with a ‘style partner’, and then expands the group by comparing that pair to other pairs and groups iteratively, until it reaches the top of the tree. To reiterate, each line below represents one player, and the distance between two lines is indicative of the similarity in their playing style.
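The agglomerative idea can be sketched with SciPy's hierarchical clustering routines. This is a minimal example under stated assumptions: the six players and their two-dimensional scores are made up, and Ward linkage is chosen only for illustration, since the article does not say which linkage criterion was used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical players with made-up component scores (two loose groups).
names = ["Player A", "Player B", "Player C", "Player D", "Player E", "Player F"]
scores = np.array([
    [0.0, 0.1],
    [0.2, 0.0],
    [0.1, 0.3],
    [5.0, 5.1],
    [5.2, 4.9],
    [4.9, 5.3],
])

# Each row of `tree` records one merge: the two clusters joined,
# the height of the merge, and the size of the new cluster.
tree = linkage(scores, method="ward")

# no_plot=True returns the leaf ordering without drawing anything.
leaf_order = dendrogram(tree, no_plot=True, labels=names)["ivl"]
print(leaf_order)
```

Players that merge at a low height sit next to each other in the leaf ordering, which is the ‘style partner’ pairing described above.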
Traditionally, the number of clusters in a dendrogram is determined by cutting the tree branches at a consistent height. We didn’t do that here, because the principle of losing player similarity as you move up the tree remains whether you cut the tree or not. The coloured branches are there to make the figure easier to read, but if you want to know how it is done there is a good paper titled ‘Defining clusters from a hierarchical cluster tree: The Dynamic Tree Cut package for R’.
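For comparison, the traditional fixed-height cut can be sketched with SciPy's `fcluster`. This is not the Dynamic Tree Cut approach from the paper mentioned above (that is an R package), and the data, linkage method and cut height are all made-up assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up scores for six hypothetical players forming two loose groups.
scores = np.array([
    [0.0, 0.1],
    [0.2, 0.0],
    [0.1, 0.3],
    [5.0, 5.1],
    [5.2, 4.9],
    [4.9, 5.3],
])
tree = linkage(scores, method="ward")

# Cut every branch at the same height: merges above 1.0 are undone,
# so each remaining subtree becomes one flat cluster.
labels = fcluster(tree, t=1.0, criterion="distance")
print(labels)
```

Raising the cut height gives fewer, broader clusters; lowering it gives more, tighter ones.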
Some examples from each cluster can be seen below. As discussed, the players who are clustered together make footballing sense. Given that the clustering algorithm assumes no knowledge of football, such results enhance the reliability of the analysis.
Cluster | Example players (2016/17) | | |
1 | Philipp Lahm | Danny Rose | Kyle Naughton |
2 | Jordi Alba | Lukasz Piszczek | Gaël Clichy |
3 | Marcelo | Alex Sandro | Davide Zappacosta |
4 | Dani Alves | Hector Bellerin | Aleksandar Kolarov |
5 | David Alaba | João Cancelo | Lorenzo De Silvestri |
6 | Danny Simpson | Antonio Barragan | Gabriel Silva |
7 | Pablo Zabaleta | Nathaniel Clyne | Ignazio Abate |
8 | Antonio Valencia | Ryan Bertrand | Marcos Alonso |
9 | Leighton Baines | Neil Taylor | Michel Macedo |
10 | Aaron Cresswell | Juanfran | George Friend |
11 | Martin Kelly | Matteo Darmian | Javier Manquillo |
How can this be used?
The greatest strength of cluster analysis is that the model can objectively quantify and group players who are most similar to each other in their playing style, without any bias or prior footballing knowledge. This can help football clubs hugely in the transfer market, particularly those on strict budgets. For example, clubs can create a transfer shortlist for a profile of player that would fit with their own style of play. Recruitment staff could also identify their ideal signing (e.g. Marcelo) and use the model to explore a number of alternative players who would be very similar for a more affordable price (e.g. Cristian Ansaldi). This can be viewed simply in the dendrogram above, by searching for a player and ‘moving up the tree’ to explore who is most similar to that player. The further you move up the tree, the less similar the profile of player, but clubs may have to compromise between their budget and the exact profile they require.
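‘Moving up the tree’ from a target player can be approximated with the cophenetic distance, which is the height at which two players first join the same branch. The sketch below is an illustration only: the players and scores are invented, average linkage is an arbitrary choice, and `most_similar` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def most_similar(names, tree, target, k=3):
    """Return the k players who join the target's branch lowest in the tree."""
    # Square matrix of merge heights between every pair of players.
    heights = squareform(cophenet(tree))
    i = names.index(target)
    order = np.argsort(heights[i], kind="stable")
    return [names[j] for j in order if j != i][:k]


# Invented scores for four hypothetical players.
names = ["Player A", "Player B", "Player C", "Player D"]
scores = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [10.0, 0.0]])
tree = linkage(scores, method="average")

print(most_similar(names, tree, "Player A", k=2))
```

A club could then filter the returned list by budget, accepting candidates further up the tree when the closest matches are unaffordable.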
As a more realistic method of visualising players’ profiles within a club environment, we have provided an example radar chart of how you can assess the strengths and weaknesses of a player based on certain attributes (see Figure 2). This can be used to compare two transfer targets, or to compare a potential signing with a player you already have at the club.
As the example below shows, Ryan Bertrand’s attributes are grounded in good defending, and he is effective in joining the attack whenever possible. Danny Rose, on the other hand, may not be as strong defensively, but he sees more of the ball and is likely to provide more chances for the team. This is an example of the decisions clubs might make in their transfer plans, and using this tool based on the clustering algorithm could be an objective method to do that.
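A radar chart comparing two profiles can be drawn with Matplotlib's polar axes. The sketch below is only an illustration: the attribute names and values are invented and do not reproduce Figure 2, and `radar_chart` is a hypothetical helper.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def radar_chart(attributes, profiles):
    """Plot one closed polygon per player on a shared polar axis."""
    n = len(attributes)
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
    closed_angles = angles + angles[:1]  # repeat the first angle to close the shape
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for name, values in profiles.items():
        closed_values = list(values) + [values[0]]
        ax.plot(closed_angles, closed_values, label=name)
        ax.fill(closed_angles, closed_values, alpha=0.2)
    ax.set_xticks(angles)
    ax.set_xticklabels(attributes)
    ax.legend(loc="upper right")
    return fig, ax


# Invented attribute scores on a 0-1 scale for two hypothetical players.
attributes = ["Defending", "Crossing", "Passing", "Dribbling", "Chance creation"]
profiles = {
    "Player A": [0.8, 0.5, 0.6, 0.4, 0.5],
    "Player B": [0.5, 0.7, 0.8, 0.6, 0.8],
}
fig, ax = radar_chart(attributes, profiles)
fig.savefig("radar_example.png")
```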
Summary
With access to more data from leagues all around the world, this model can be even stronger in giving clubs a competitive edge in their transfer policy. It provides a fast, cost-effective method which can be easily implemented and updated in a club, both within and between seasons. Of course, as is always the case with analytics, we emphasise that this model is a useful tool to accompany the scouting methods that currently exist, not to replace them altogether. What it does provide is a good basis to objectively identify potential transfer targets, without bias, for scouts to then report on. Indeed, this method can be used as a good calibration between traditional scouts and analysts, to determine whether the subjective profile of a target player matches up with their statistical profile. Given the soaring player prices across Europe, using this model could save clubs a lot of money in the transfer market and help provide a filtering mechanism to avoid the ‘panic buys’ that have plagued so many teams over the years.
We received some very positive feedback when presenting this work at this year’s OptaPro Forum. For more information, you can reach Mark and Mladen on Twitter at @MarkCarey93 and @Mladen_Sormaz.