Stellenbosch Football Club

The Science Behind Our Performance Data

This month in the High-Performance Blog Series we focus on working with data and explore some of the data we have collected as a club. The ability to learn from data is becoming an essential skill. Over time, we gain access to more and more data, and never has it been so important to learn to tap into this information. Data conceals useful information, and it needs to be translated into a format that is easy to comprehend. Too often, great results and important messages are lost at the final and most important hurdle of the scientific process: communication.1

The ideal practitioner

Ideally, high-performance sport should have staff with the expertise to draw together an array of approaches to understanding and using data. This includes data generation, data capture and storage, data manipulation, and a variety of analyses and statistics.2 Furthermore, the practitioner should be able to ask pertinent applied research questions as well as provide innovative, efficient, and sustainable feedback to coaches, players, and key stakeholders.3 While technology and sport haven’t always appeared to be a natural pairing, the two industries have collaborated well over the years. For instance, as we seek to prepare our athletes for competition, understanding the demands of the game and preparing players for them is an important pursuit.

We record data from training sessions and games to help shape pre-match preparation and post-game debriefs and to develop young talent.4 We use that same data to target certain outcomes, such as enhancing athletic performance or mitigating injury risk. We also attempt to optimise training load at different phases throughout the training process, for example by adjusting individual sessions, planning day-to-day, periodising the season, and managing athletes with a long-term view.5

This requires a sustainable and precise quantification of the player’s activities during training and match play. These data are important for practitioners to ensure that aspects of the training process replicate the general characteristics required during competition. Accordingly, the use of data should be a fundamental part of what informs our decision-making, and we should seek to collect quality data to support this process.

Creating a workflow

A data workflow aims to guide the practitioner through a logical progression of steps for carrying out effective and sustainable projects. As we seek to help coaches and players understand and enhance performance, we need to consider our context as well as how we apply and communicate the data. Crucially, as sport scientists we need to be very mindful and very considered in our methods and interpretation when choosing to act based on the data.1 As we seek to inform the decision-making process, we need to understand the uncertainty and limitations involved with any key performance indicators or tools we choose.2 Most importantly, we should never overlook that our role involves the use of scientific principles to provide advice and support to athletes and coaches.

Since the workflow is a sequence of steps in a defined order, its design should be as pluggable and extendable as possible to make the best use of any software or application. From planning through data collection, storage, and processing to reporting the insights, there is a wide variety of tools that can be used.

To date, most of our work has been created within Excel. As a leading spreadsheet application, Microsoft Excel has many benefits for anyone who knows how to use it. Additionally, it is one of the best ways to store and exchange data, as the information is generally easily transferable. Also, free learning resources like Excel Tricks For Sports offer a vast catalogue of YouTube videos covering many aspects of what you may want to do in Excel for sport science/S&C. Open-source programming languages (e.g., R, Python, Java, C++) offer more powerful data manipulation capabilities, easier automation, reproducibility, and easier project organization, and can be used for statistical analysis and for creating automated reports and visualizations.

Application in our context

I suppose it is a question of what the scientific and process goal is, who the target audience is, what we are looking to do in our analysis, and which platform will satisfy our needs. Let’s consider GPS data collected over the last two seasons (19/20 and 20/21) of the Premier Soccer League (PSL). For instance, one pertinent question we might want to ask is: what is the physical activity of our PSL soccer players? For the time being we will focus only on running: total distance and distance above a discrete speed threshold (i.e., distance covered at > 23 km/h). Please note, I am learning these tools myself as well and by no means am I an expert.
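To make the two running metrics concrete, here is a minimal Python sketch of how total distance and distance above the 23 km/h threshold could be derived from a speed trace. It is only an illustration under stated assumptions: the 10 Hz sample rate, the function name and the short trace are hypothetical, and GPS vendors typically apply their own filtering and dwell-time rules before reporting these values.

```python
SPRINT_THRESHOLD_KMH = 23.0  # speed threshold named in the text


def distances_from_speed(speeds_kmh, sample_rate_hz=10.0,
                         threshold_kmh=SPRINT_THRESHOLD_KMH):
    """Return (total_distance_m, distance_above_threshold_m) for one trace.

    Assumes evenly spaced speed samples; sample_rate_hz=10.0 is an
    illustrative default, not a property of any specific device.
    """
    dt = 1.0 / sample_rate_hz  # seconds represented by each sample
    total = 0.0
    above = 0.0
    for v_kmh in speeds_kmh:
        step = (v_kmh / 3.6) * dt  # km/h -> m/s, multiplied by seconds
        total += step
        if v_kmh > threshold_kmh:
            above += step
    return total, above


# Hypothetical one-second trace: half a second jogging, half sprinting.
trace = [18.0] * 5 + [25.2] * 5
total_m, sprint_m = distances_from_speed(trace)
print(round(total_m, 1), round(sprint_m, 1))  # prints: 6.0 3.5
```

Keeping the threshold as a parameter makes it easy to rerun the same calculation with a different speed zone.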

Describing the data

We can use tables to present brief descriptive coefficients that summarise the data. The aim of this process is to understand the data by looking for general characteristics, patterns and features in the collected real-world data. These coefficients are called descriptive statistics and consist of two basic categories of measures: measures of central tendency (mean, median and mode) and measures of variability (range, quartiles, standard deviation).
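As a rough illustration of such a summary table, the sketch below computes the measures listed above with Python's standard `statistics` module. The `describe` helper and the five total-distance values are hypothetical and are not the club's data; the mode is left out because it is rarely informative for continuous measurements.

```python
import statistics


def describe(values):
    """Summarise a list of numbers with common descriptive statistics."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical match total distances in metres (illustrative only).
total_distance_m = [9800, 10250, 10400, 9950, 10100]
summary = describe(total_distance_m)
for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

Note that different tools compute quartiles with slightly different interpolation rules, so the quartile values may not match Excel or R exactly.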

Visualizing the data

We might want to explore this further with the aid of data visualization tools such as R and Tableau. Before making a chart, it is important to understand why you need one. As the idiom says, “A picture is worth a thousand words”. For one, people naturally understand pictures in the form of charts and graphs more readily than raw data. Modern tools are making the extraction of knowledge from data ever more powerful and, perhaps more importantly, easier. These tools are extended by packages, modules and libraries, which are collections of functions and data sets developed by the community. Let’s say we want to find out if there are any trends in the data. A scatter plot is one of the most popular go-to charts for exploratory data analysis. It can be used to show the relationship between multiple variables as well as general trends in our data. Here we use Tableau to create a scatter plot with total distance and customise the data points (i.e., colour by quadrant and shape by season).

As we can see from the chart, we seem to have higher outputs during the 20/21 season compared to the 19/20 season. Now we can ask the next question: is there a statistically significant difference in running outputs between the two seasons? Depending on the type of data (nominal, ordinal, discrete or continuous) and its associated distribution (e.g., normal, uniform, Poisson, chi-square), different hypotheses can be tested.

Another chart we can use is a boxplot. A boxplot is a standardized way of displaying the distribution of data. It is also very useful for displaying some of the descriptive statistics presented in the table above and for showing whether there are any outliers. Here we use the ‘ggstatsplot’ package, an extension of ‘ggplot2’ in R, which allows us to create graphics with details from statistical tests, making data exploration simpler and faster. Specifically, the ‘ggbetweenstats’ function was used for comparisons between seasons for both metrics. You can find the documentation here.
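For readers who want to see what a boxplot actually encodes, the following Python sketch computes the box, whiskers and outliers using the common 1.5 × IQR convention. It is a simplified illustration and not the ‘ggstatsplot’ implementation; the `boxplot_stats` helper and the sprint-distance values are hypothetical.

```python
import statistics


def boxplot_stats(values, whisker=1.5):
    """Return the numbers a conventional boxplot draws.

    Points beyond whisker * IQR from the box are flagged as outliers;
    1.5 is the usual convention, assumed here.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest point still inside the fences
        "whisker_high": max(inside),   # highest point still inside the fences
        "outliers": outliers,
    }


# Hypothetical sprint distances in metres (illustrative only).
sprint_distance_m = [120, 135, 150, 160, 175, 180, 420]
stats = boxplot_stats(sprint_distance_m)
print(stats["median"], stats["outliers"])
```

In this made-up sample the 420 m value falls above the upper fence, so it would be drawn as a separate point rather than as part of the whisker.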

What we observe is that there was indeed a statistically significant difference between the two seasons for both metrics. Measures of effect size, which tell us how much one season differed from the other, are also provided. For instance, we could interpret the results from the Sprint Distance comparison using the benchmarks developed by Jacob Cohen, where 0.2, 0.5 and 0.8 are considered small, medium and large effects. A Hedges’ g of -1.36 would make this a large effect (also visible to the naked eye). Importantly, these values only provide a global aggregation of a series of physical activities and provide very little context for how players accumulate these distances. Physical activity is influenced by numerous factors (e.g., individual player profile, match location, technical and tactical skill required). We can repeat this process over and over with different questions as we seek to extrapolate the potential outcomes of a situation based on what we know about the players and our specific context.
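To show where an effect size like this comes from, here is a small Python sketch of Hedges’ g for two independent groups, using the pooled standard deviation and the common small-sample correction 1 - 3 / (4(n1 + n2) - 9). This is an assumption about the formula, and the R package may use a slightly different variant. The season values are invented, and the "negligible" label for effects below 0.2 is my own addition to Cohen's three benchmarks.

```python
import math
import statistics


def hedges_g(group_a, group_b):
    """Hedges' g for two independent samples (pooled SD, approximate correction)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)  # small-sample bias correction
    return d * correction


def cohen_label(effect):
    """Map an effect size to Cohen's benchmarks (0.2, 0.5, 0.8)."""
    size = abs(effect)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; not one of Cohen's three


# Hypothetical sprint distances in metres for each season (illustrative only).
season_1920 = [100, 110, 120, 130, 140]
season_2021 = [130, 140, 150, 160, 170]
g = hedges_g(season_1920, season_2021)
print(round(g, 2), cohen_label(g))
```

The sign only reflects the order of subtraction: a negative value here means the second group (20/21) had the higher mean.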

Concluding Remarks

The advantage lies in today’s modern computing infrastructure and the freedom of access to resources. Multiple online companies have sprung up teaching data science using Excel and various programming languages. Additionally, there is a fast-growing community of enthusiasts, professionals, and academics who, in most cases, spend their free time continuously developing and maintaining these tools. Help is available everywhere, and in most situations a quick YouTube search will show an endless supply of tutorials. Developing these new skills has never been easier!

The value of being able to use code to repeat your analyses and reproduce the results consistently cannot be overstated. And while that can be scary for someone who has never worked in Excel or written a line of code before, it’s not as daunting as it seems. These are skills that anyone can learn, and as a sport scientist you will be digging increasingly into data science. Those in this field would therefore benefit from learning these extra skills, as it is both empowering and fulfilling to be able to extrapolate, interpret and package the data yourself.

1. Thornton HR, Delaney JA, Duthie GM, et al. Developing athlete monitoring systems in team sports: Data analysis and visualization. Int J Sports Physiol Perform 2019; 14(6):698–705. doi: 10.1123/ijspp.2018-0169.
2. Martin L. Sports Science Data Protocol. Sport Exerc Med – Open J 2019; 5(2):36–41. doi: 10.17140/semoj-5-174.
3. Brink MS, Kuyvenhoven JP, Toering T, et al. What do football coaches want from sport science? Kinesiology 2018; 50(April):150–154.
4. Ravé G, Granacher U, Boullosa D, et al. How to Use Global Positioning Systems (GPS) Data to Monitor Training Load in the “Real World” of Elite Soccer. Front Physiol 2020; 11(August). doi: 10.3389/fphys.2020.00944.
5. West SW, Clubb J, Torres-Ronda L, et al. More than a Metric: How Training Load is Used in Elite Sport for Athlete Management. Int J Sports Med 2021; 42(4):300–306. doi: 10.1055/a-1268-8791.