One of the core applications of tracking data is applying machine learning to gain insight into player tendencies. Unfortunately, due to small samples, we cannot simply measure a particular player’s track paths and say “this player tends to do x.” Instead, we must adopt methods that lift information off an individual player and direct it to a player prototype. It is from this tracking-based prototype that we gain enough signal to start discerning player capabilities. This process is derived from a methodology called kriging, a procedure that every indoctrinated spatio-temporal analyst has endured in their studies. The idea is to develop a spatial indicator by borrowing strength from spatially-local observations. In this case, spatially-local does not mean nearby on the same court, but rather nearby in the space of relative position during the play.
Let’s start with a simple example to break down what we mean.
To start, let’s consider an initial offensive scheme the Minnesota Timberwolves ran last season. For this scheme, we have a 4-out and 1-in initialization with a post player starting at the block. This player will then initiate a pick-and-roll (PnR) to obtain the first looks for the offense.
If we look at the center of mass for this offense, it is offset toward the left elbow. The perimeter players form an arc about the center of mass, with the post player looming in the interior of the arc, much as we would expect for a 4-out / 1-in offense. However, when the PnR motion occurs, the center of mass moves with the offense.
At this point, the center of mass lifts toward the left wing. What’s more challenging is that the point guard and the post player interchange positions. We watch the guard become the interior of the offense, with the post settling at the top of the arc and either staying put or rolling into the key. This is common for the 4-out offense, as the team is attempting to open driving lanes and find mismatches on switches.
We all know who the point guard is and who the center is. However, if we restrict ourselves to positions like these, we immediately correlate the guard and the center, because their locations overlap in tracking. This is the very reason we need to de-correlate the tracking positions, either by increasing the number of observations (that is, watching 400+ PnR plays) or by employing kriging. One method of pre-processing for kriging is role alignment.
The process of role alignment originally comes from tracking in soccer, and was presented in a paper submitted to ICDM back in 2014. The idea is intuitive in soccer and doesn’t translate to basketball properly, but its use in borrowing strength is a net positive, as we will discuss later.
The process is straightforward:
1. Center Every Frame
The first step is to take tracking data, frame by frame, and identify the centroid of the offensive players. This is the very first step we performed in the Minnesota Timberwolves example above, with each image being a frame in the data. This centroid will show how the team is distributed about their center of mass.
2. Arbitrarily Set Roles
Next, we arbitrarily set roles. We can use a dictionary order, or a player order. Either way, we start with an ordering of roles that makes sense. In our case, we use players. The idea is that we start with a role assignment and then walk through the roles asking the question, “Does this role at this frame make sense with respect to this role across all frames?”
3. Iterate through the Roles and Assign Updated Roles
This part is the crux of the algorithm. We take the centered distribution of each role and, for each frame, consider every pairing of player locations with roles. We build a cost function that measures the distance of each player location from each of the five roles, which gives us a cost matrix. We then apply Munkres (Hungarian) linear assignment to identify the optimal role assignment for that frame. This is called an updated role.
After we walk through all frames in the segment, we repeat until convergence of roles.
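The centering in step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the plots in this post: the array layout (frames × players × xy) and the name center_frames are assumptions made for the example.

```python
import numpy as np

def center_frames(tracks):
    """Subtract the per-frame offensive centroid from every player position.

    tracks: array of shape (n_frames, n_players, 2) holding (x, y) court
    coordinates of the offensive players (hypothetical layout).
    """
    tracks = np.asarray(tracks, dtype=float)
    # Centroid of the five offensive players in each frame: (n_frames, 1, 2)
    centroids = tracks.mean(axis=1, keepdims=True)
    return tracks - centroids

# Toy data: the same formation, shifted by (2, 2) in the second frame.
toy = np.array([
    [[10, 10], [20, 10], [30, 10], [20, 20], [20, 0]],
    [[12, 12], [22, 12], [32, 12], [22, 22], [22, 2]],
])
centered = center_frames(toy)
```

Because the second toy frame is only a translation of the first, both frames come out identical after centering, which is the point of the step: the shape of the offense about its center of mass is kept, and where it sits on the court is discarded.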
So let’s see this in action…
Utah Jazz Possession
In this example, we extract a possession from the game between the Utah Jazz and the Sacramento Kings on November 21, 2018. In this possession, Ricky Rubio brings up the ball after a made free throw from Marvin Bagley III, and runs through a PnR action with Rudy Gobert. The track paths look like so:
As the tracks tangle, there is a lot of correlation between players’ paths, which calls for some form of strength borrowing. By applying the centering scheme to every frame, we find that the player track paths show a much different picture.
We no longer see clean segments of tracking. We find that some players rotate, such as the blue path. We see that red is actually split between two groupings. What this shows is a pair of player switches: one off a screen, another off a shallow switch.
Therefore, we step through the third part of the role alignment process and perform the linear assignment until convergence. So we do just that to obtain roles within the offense.
And here we see the clusters organize much more nicely than in the original assignment. There are still some tricky interactions going on, but we clearly see a blue, black, red, and green. The purple action in this plot is primarily the red, blue, and green swapping in the original plot. Therefore, we see the purple segments continuing to try to rotate around the center of the offense before snapping back towards the lower-left-hand side of the offense.
Refitting the roles into the track paths, we obtain these segments:
Apologies for the colors, as the roles keep swapping due to first-in, last-out assignment from Python dictionaries (R does the same, don’t worry). We see how a basketball player’s path breaks up into a sequence of different roles throughout the possession. In soccer, the original playing field for development, it is rare to see this many role swaps. However, we still borrow strength by now defining roles as opposed to individual players. The correlation now transfers to the role-swapping locations, which are a smaller set of landmarks than two overlapping track paths.
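To make the "sequence of different roles" concrete, here is a small sketch that collapses one player's per-frame role labels into contiguous segments and lists the frames where a swap happens. The function name role_segments and the toy label sequence are hypothetical; they are not taken from the possession above.

```python
from itertools import groupby

def role_segments(role_sequence):
    """Collapse a per-frame role sequence into (role, start_frame, end_frame) runs."""
    segments = []
    frame = 0
    for role, run in groupby(role_sequence):
        length = sum(1 for _ in run)
        segments.append((role, frame, frame + length - 1))
        frame += length
    return segments

# Toy labels for one player: role 0, a brief stint in role 1, back to role 0.
example = [0, 0, 0, 1, 1, 0, 0, 0, 0]
segments = role_segments(example)

# Frames where the player picks up a new role (the "landmarks").
swap_frames = [start for _, start, _ in segments[1:]]
```

For the toy sequence, the nine frames reduce to three segments and two swap frames, which is the smaller set of landmarks the paragraph above refers to.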
Little Bit of Code…
The main crux of the algorithm is the iterative process. In this case, we bring in our role dictionary, roles, which is just an index list of tracks. At each iteration, we compute a distribution for every role, Gaussian in this case, with variance one and a mean estimated from the data.
We then walk through each frame of data and compute the 5×5 cost matrix as a Kullback-Leibler divergence between each player’s new point and each role’s distribution. After the cost matrix is built, we can apply the Munkres linear assignment algorithm, which is not part of the Python standard library but is available through third-party packages (for example, SciPy’s linear_sum_assignment).
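One way to read this cost, assuming the new point is treated as the mean of its own unit-variance Gaussian (an assumption, since the code itself is not reproduced here), is that the divergence collapses to half the squared distance between the point and the role mean. The symbols x (a player's centered location), \mu_r (the mean of role r), and k = 2 (the dimension) are notation introduced for this sketch.

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(x, I)\,\|\,\mathcal{N}(\mu_r, I)\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}(I) + (\mu_r - x)^\top (\mu_r - x) - k + \ln\tfrac{\det I}{\det I}\Big]
  = \tfrac{1}{2}\,\lVert x - \mu_r \rVert^2
```

Under that reading, filling the cost matrix amounts to computing squared Euclidean distances from each player to each role mean, and the linear assignment picks the pairing with the smallest total.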
We copy over roles using a temporary newRoles dictionary, and repeat the process. In this simple case, we cut down the iterative process to 10 iterations. For the Utah example, we required 23 iterations before convergence…
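Since the code is not reproduced in the text, here is a minimal sketch of the loop as described. It makes several assumptions: positions are already centered and stored as a frames × players × xy array, the cost is half the squared distance to each role mean (the unit-variance Gaussian case), and SciPy's linear_sum_assignment stands in for whichever Munkres implementation was actually used. The names align_roles and max_iter are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_roles(centered, max_iter=50):
    """Reassign roles frame by frame until the assignments stop changing.

    centered: (n_frames, n_players, 2) centroid-subtracted positions.
    Returns (roles, n_iter), where roles[t, p] is the role held by
    player p at frame t.
    """
    centered = np.asarray(centered, dtype=float)
    n_frames, n_players, _ = centered.shape
    # Arbitrary starting roles: player p holds role p in every frame.
    roles = np.tile(np.arange(n_players), (n_frames, 1))

    for iteration in range(1, max_iter + 1):
        # Mean location of each role across all frames (variance fixed at one).
        role_means = np.zeros((n_players, 2))
        for r in range(n_players):
            role_means[r] = centered[roles == r].mean(axis=0)

        new_roles = np.empty_like(roles)
        for t in range(n_frames):
            # cost[p, r]: half the squared distance from player p to role r.
            diff = centered[t][:, None, :] - role_means[None, :, :]
            cost = 0.5 * (diff ** 2).sum(axis=2)
            players, assigned = linear_sum_assignment(cost)
            new_roles[t, players] = assigned

        if np.array_equal(new_roles, roles):
            return roles, iteration  # converged: no role changed
        roles = new_roles
    return roles, max_iter

# Toy possession: five fixed spots, players 0 and 1 trade spots at frame 6.
spots = np.array([[-10, 5], [10, 5], [0, 12], [-8, -11], [8, -11]], dtype=float)
frames = np.tile(spots, (10, 1, 1))
frames[6:, [0, 1]] = frames[6:, [1, 0]]
roles, n_iter = align_roles(frames)
```

In the toy possession, the two players who trade spots also trade roles from frame 6 onward, so each role stays attached to a location rather than to a player. Real possessions are far messier, which is consistent with the Utah example needing many more iterations.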
Next Steps: Ghosting? Other things?
Once we can get role alignment to work, the next step is to leverage the borrowed-strength data for machine learning algorithms. By themselves, the role alignments are meaningless and their interpretation is non-illuminating. As a data science tool, they are powerful. One of the most powerful algorithms on the market at the moment is Ghosting: the application of Long Short-Term Memory (LSTM) neural networks to processing average motion between offensive and defensive players.
In the Ghosting framework, instead of laboring over positions, we apply role alignment to reduce the amount of learning required to de-correlate overlapping tracks. That is, we learn roles and learn the role swapping. To aid in further de-correlation, blocks of data are created for redundancy and for the ball position. Breaking down this algorithm is almost elementary at this point, as role alignment is the long pole in the tent of the Ghosting process.
So how would you change role alignment? What would you build off this role alignment algorithm? There are many ways to borrow strength. This is just one, and it happens to be fairly effective. But it’s not the only way.
7 thoughts on “Applying Role Alignment to Tracking Data”
Thanks for the article and great explanation. I was wondering where you got the tracking data for the Jazz-Kings game. I was under the impression that this data is no longer public. Thanks.
The data is indeed no longer public, since Second Spectrum took over the process from Stats Inc.
Thanks for the info, I look forward to exploring more of your articles
Justin, this is a much better explanation than whatever I have seen in papers! If you wanted to extend this to multiple possessions so you have role 0 in possession X being the same as role 0 in possession Y, would you simply get the meanRoles converged to for each possession, fit a 5-component Gaussian mixture on them and then rerun the Hungarian algorithm for all possessions and frames using the costs from the centroids of these Gaussians? This seems to be what the ICDM paper you cite alludes to but the details are not there.