In a recent post, we looked at how a team distributes the ball on offense, with a deep dive into the Brooklyn Nets. In that article we showed how to construct a community: the sets of likely passes for scores between players. This also included two-pass assists (hockey assists), where it was revealed that players like Brook Lopez were highly likely to receive a pass back into the post after a kick-out for a score.
In this article, we focus on the spatial distributions of assists for teams and describe some simple tests of player interactions, in an attempt to identify primary, secondary, and tertiary options for scoring plays given particular units on the court.
The definition of an assist is fairly complex, as it has changed over the years. Originally, an assist was defined as a pass that led directly to a score. This meant any move by the shooter, from something as simple as a dribble to even a pump fake, would automatically nullify the assist. Over the years, the definition has relaxed. For example, a steal followed by a pass to a leaking guard once would not count as an assist if that guard took a single dribble, but it counts now. In fact, in some leagues the stricter rule is still the norm. However, the relaxation is not what you might think.
Assists are akin to errors in baseball. While the definition is fairly straightforward, the application of the definition is subjective. Here, we will be at the will of NBA stats and accept assists as-is; that is, we treat the definition as if it were applied uniformly across all games.
Spatial Distribution of Assists
The first thing we look into is a team’s spatial distribution of assists. Let’s start by comparing a few teams, with assist totals in parentheses: Orlando Magic (1820), Houston Rockets (2070), Sacramento Kings (1844), New York Knicks (1786), and Milwaukee Bucks (1984).
We see from almost every distribution that teams have effectively adopted the layups-and-threes mentality: mid-range jumpers are small samples, whereas the perimeter and the rim are effectively blobs. The most successful team of this lot, the Houston Rockets, is the most extreme case, with very few assisted field goals in the mid-range (only 82 for the entire season and playoffs).
We do see some other curious effects, such as the Milwaukee Bucks having a more dominant right-hand distribution of mid-range assisted scoring situations. Similarly, the Orlando Magic have a hole in the left-hand short corner, indicating that any points scored in that position are not part of the passing plan. Or if it is part of the passing plan, points are not being scored there by the Magic.
Understanding the spatial distribution of assists can only go so far in understanding team interaction. Instead, we focus more on the tendencies of players. If we are primed with more information, we can take the analysis of players even further.
Here we will focus on three primary data sets: play-by-play data, SportVU data, and Synergy data. We will primarily work off of play-by-play data, identify how to enhance data by using SportVU, and use Synergy for partitioning and cues.
Our case study will involve our first example: the Orlando Magic.
Distributions According to Units
The first partitioning of the data is to look at how the Orlando Magic units work together. The 2017 Orlando Magic had a total of 19 different roster players who logged at least one minute of NBA action. Choosing five players at a time, this allows 11,628 possible rotations to play on the court. For the 2017 NBA season, the Magic played a total of 282 different rotations over the course of the year.
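The size of the rotation space can be checked with a short sketch. It treats a rotation as an unordered group of five players, which is an assumption about how units are counted; the variable names are illustrative only.

```python
from math import comb

ROSTER_SIZE = 19      # players who logged at least one minute (from the text)
UNIT_SIZE = 5         # players on the court at once
OBSERVED_UNITS = 282  # distinct rotations actually played (from the text)

# Unordered five-player groups: who is on the floor matters, not the order.
possible_units = comb(ROSTER_SIZE, UNIT_SIZE)
share_used = OBSERVED_UNITS / possible_units

print(possible_units)
print(f"{100 * share_used:.1f}% of possible rotations were used")
```

Under that assumption, only a small share of the available five-player groups ever saw the floor together.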
The rotation with the most assists is the Aaron Gordon, Elfrid Payton, Evan Fournier, Nikola Vucevic, and Terrence Ross rotation with 460 of the team’s 1820 assists. The second top rotation? Aaron Gordon, Elfrid Payton, Evan Fournier, Nikola Vucevic, and Serge Ibaka with 290 of the team’s 1820 assists. These are effectively the starters for the Magic after and before the Ross-Ibaka trade. There are 56 games with Ibaka and 26 games with Ross, yet Ross’ rotation has roughly 59% more assists than Ibaka’s rotation. This is due in part to roughly sixteen missed games by Fournier and a twenty-game stretch where Payton and Vucevic were not starters. This is evidenced by the third highest assist total rotation being Aaron Gordon, Bismack Biyombo, D.J. Augustin, Evan Fournier, and Serge Ibaka with 189 assists; the fourth highest assist total rotation of Elfrid Payton, Evan Fournier, Jeff Green, Nikola Vucevic, and Serge Ibaka with 163 assists; and the fifth highest assist total rotation of Elfrid Payton, Evan Fournier, Jeff Green, Nikola Vucevic, and Serge Ibaka with 128 assists. In fact, here is the Python dump for every rotation with 20+ assists:
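A tally of this kind can be sketched from play-by-play rows. The row layout below (a tuple of on-court players plus an assisted flag) and the helper names are hypothetical, not the format of any particular feed, and the four sample rows are made up for illustration.

```python
from collections import Counter

# Hypothetical rows: (offensive players on court, was the made shot assisted?)
events = [
    (("Gordon", "Payton", "Fournier", "Vucevic", "Ross"), True),
    (("Ross", "Gordon", "Vucevic", "Payton", "Fournier"), True),    # same unit, different order
    (("Gordon", "Payton", "Fournier", "Vucevic", "Ibaka"), True),
    (("Gordon", "Payton", "Fournier", "Vucevic", "Ibaka"), False),  # unassisted make
]

def assists_by_unit(rows):
    """Count assisted field goals per five-player unit, ignoring player order."""
    counts = Counter()
    for players, assisted in rows:
        if assisted:
            counts[frozenset(players)] += 1
    return counts

def units_with_at_least(counts, minimum):
    """Return (sorted player names, assists) pairs, highest totals first."""
    kept = [(sorted(unit), n) for unit, n in counts.items() if n >= minimum]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

counts = assists_by_unit(events)
for unit, n in units_with_at_least(counts, 1):
    print(n, ", ".join(unit))
```

Keying on a `frozenset` means the same five players always map to one rotation regardless of the order in which they are listed. Calling `units_with_at_least(counts, 20)` on a full season of rows would give the 20+ assist listing.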
So let’s see where these assists go when the top unit is in.
Gordon, Payton, Fournier, Vucevic, and Ross – 460.
Here, we see the Orlando Magic’s distribution of assists for their top rotation. The plot does not look much different from the team plot, with the majority of assists at the rim and several others outlining the three-point line. If we parse out quarter-by-quarter action, we see that the distributions change over time.
Per Quarter Spatial Distributions
Here we see that the unit is stable in the first and third quarters, meaning that it effectively follows the team distribution. However, in the second and fourth quarters, the offensive scheme changes dramatically: the unit becomes a left-hand dominant scoring team with a much higher probability of obtaining an assist beyond the arc. This indicates a change in scoring philosophy between the first and second quarters, and similarly between the third and fourth quarters.
Let’s quantify this difference. In the third quarter, this rotation obtains 72 assists within 6 feet of the basket. Compare this to the 55 assists on three-point field goals and we see that this rotation is 1.31 times as likely to score at the rim off an assist as from beyond the arc.
In the fourth quarter, we see a different result. Instead we see 16 assists within 6 feet of the basket and 23 assisted three-point field goals. This effectively shows an inversion of game plan, as this rotation is now only 0.70 times as likely to score at the rim off an assist as from beyond the arc.
As a side note, there is one assist by this unit in overtime. It is a lay-up.
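The two ratios quoted for the third and fourth quarters are simple quotients of assist counts. A minimal sketch, with a function name chosen here for illustration:

```python
def rim_to_three_ratio(rim_assists, three_assists):
    """Assisted makes within 6 feet divided by assisted three-pointers."""
    return rim_assists / three_assists

third_quarter = rim_to_three_ratio(72, 55)   # counts from the text
fourth_quarter = rim_to_three_ratio(16, 23)  # counts from the text

print(f"{third_quarter:.2f} {fourth_quarter:.2f}")
```

A value above one means the rim is the more common assisted destination; a value below one means the arc is.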
Testing Changes in Game Plans: Spatial K(t) Tests
While we have seen that the top rotation’s scoring situations change significantly between odd and even quarters, one may object that the sample size is low and that we are therefore seeing faces in clouds (false correlations). To test this, we can compare the two spatial distributions using a well-known family of spatial tests based on the K-function.
A K-function takes a particular spatial location, s, and counts the number of observations within a radius, t, of s. Think of this as building a circle of radius t with s as the center and counting the number of observations within that circle. The function is given by K(t) = E[number of observations within distance t of s] / lambda.
This is the expected number of observations divided by an intensity, lambda. The intensity is merely the density of points that should exist within the circular region of interest. For example, if we assume a Poisson distribution of assists, that is, attempts scattered uniformly at random in the circle, we obtain K(t) = pi*t*t.
Recall that the uniform distribution is one divided by the area of the region of interest. If we expect one observation per region uniformly, we obtain K(t) = 1 / (1 / (pi*t*t)) = pi*t*t, which is exactly the Poisson noise model above.
While we do not know the true intensity of the distribution of assists, we can estimate it. In order to approximate the K-function, we perform the calculation (brace yourselves)…
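One form of the estimator that is consistent with the breakdown that follows is the border-corrected version. The symbol N (the total number of assists) and the index i over circle centers are labels assumed here, and this is a sketch of a standard form rather than a definitive statement of the calculation used:

```latex
\hat{K}(t) \;=\; \frac{|D_s|}{N} \cdot
\frac{\displaystyle\sum_{i=1}^{N} \sum_{j \neq i}
      \mathbf{1}\!\left( \lVert s_i - s_j \rVert \le t,\; d_j > t \right)}
     {\displaystyle\sum_{j=1}^{N} \mathbf{1}\!\left( d_j > t \right)}
```

Because the pair condition is symmetric in i and j, applying the boundary condition to either point of the pair gives the same numerator.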
Let’s break this down…
The first fraction is merely the one-over-intensity, where |D_s| is the area of the region of interest. This will be the NBA court. This means the nasty sum is the expected number of observations to appear in the circle with radius t. The points, s_j, are the actual assist locations (observations). The values d_j are the distances from s_j to the nearest boundary point in D_s. The indicator function, 1, counts the number of observations, s_j, that are within the circle of radius t but farther than a distance of t from the boundary of the region of interest, D_s. Let’s illustrate this with all the parts:
The inclusion of this d_j > t condition means that a circle of radius t centered at the point lies entirely inside the space of interest. Dividing by this sum, we obtain an estimated distribution of spatial points within a circle of radius t.
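The estimator described above can be sketched in a few lines of plain Python. The region is assumed here to be a 47-by-50-foot half court with the origin at a baseline corner; that choice, the coordinate convention, and the function names are assumptions for illustration.

```python
import math

COURT_WIDTH = 50.0        # feet, sideline to sideline (assumed region)
HALF_COURT_LENGTH = 47.0  # feet, baseline to midcourt (assumed region)

def boundary_distance(point):
    """Distance from (x, y) to the nearest edge of the half-court rectangle."""
    x, y = point
    return min(x, COURT_WIDTH - x, y, HALF_COURT_LENGTH - y)

def k_hat(points, t):
    """Border-corrected K estimate at radius t for a list of (x, y) points in feet."""
    area = COURT_WIDTH * HALF_COURT_LENGTH
    interior = 0  # points farther than t from every boundary
    pairs = 0     # ordered pairs within t whose center is an interior point
    for i, center in enumerate(points):
        if boundary_distance(center) <= t:
            continue  # a circle of radius t here would spill out of bounds
        interior += 1
        for j, other in enumerate(points):
            if i != j and math.dist(center, other) <= t:
                pairs += 1
    if interior == 0:
        return float("nan")  # no eligible centers at this radius
    return (area / len(points)) * pairs / interior

# Made-up sample: three makes at the rim and one corner three.
sample = [(25.0, 5.25), (25.0, 5.25), (25.0, 5.25), (3.0, 5.25)]
print(k_hat(sample, 5.0))
print(k_hat(sample, 6.0))
```

With the rim only 5.25 feet from the baseline, any radius above 5.25 feet removes every lay-up from the set of eligible centers, so the second call has nothing left to average over.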
Application to Top Rotation
Applying this to the top assist rotation for the Orlando Magic, we find that 381 of the 460 assists occur more than five feet from the out-of-bounds region. This number is relatively high because the basket is located 5.25 feet from the baseline; therefore all dunks and lay-ups are included. This sets the denominator to 381 in the formula for estimating a K-function. Of the 211,140 possible comparisons between all 460 spatial locations, we find that 18,456 pairwise spatial comparisons are within five feet.
This gives us a K-Function value of 248.6768.
As a quick note, every lay-up will match all other dunks and lay-ups at a distance of zero feet. Hence, for any radius less than 5.25 feet, the K-function should have a large total for rotations with a large number of assists.
Varying the spatial dependence value, t, we find the following K-Function plot.
How do we interpret this plot? Recall that the K-Function identifies the expected number of points within a circle of radius t. If this radius gets larger, the number of points should increase. This is exactly what we see in this plot.
Note that there is a big drop-off at 5.25 feet. This is the point where the dunks and lay-ups become too close to the boundary. The steep drop-off shows that assisted field goals between 5.25 and roughly 8 feet are much less frequent than most other shots. In fact, as we creep out to three-point distance, 22 feet, the expected number of assists continues to climb.
Let’s understand this…
At 22 feet (three-point line range), we have a K-Function value of 725.43478. This means that we expect 1,704 spatial combinations to be made within 22 feet of each other. This doesn’t mean that assists continue to climb at this rate, but as we start to include three-point region shots relative to the basket, we see that more assists are added in.
Once we extend further out from the three-point line, we see that number once again drop off and become unstable. This is due to few shots coming from 23+ feet out, and to the boundary starting to have an effect, as the furthest a player can be from the boundary is 25 feet.
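The arithmetic behind the 22-foot value can be checked with a short sketch. It assumes the region of interest is the 47-by-50-foot half court (2,350 square feet) and that 12 assists lie more than 22 feet from every boundary. The count of 12 is inferred from the reported figures, not stated in the text.

```python
AREA = 47.0 * 50.0      # assumed half-court region, in square feet
N_ASSISTS = 460         # assists for the top rotation (from the text)
PAIRS_WITHIN_22 = 1704  # spatial combinations within 22 feet (from the text)
INTERIOR_POINTS = 12    # inferred count of assists more than 22 feet from the boundary

# One-over-intensity times the average number of neighbors per eligible center.
k_22 = (AREA / N_ASSISTS) * PAIRS_WITHIN_22 / INTERIOR_POINTS
print(round(k_22, 5))
```

Under these assumptions each eligible center has 142 neighbors within 22 feet on average, and scaling by the inverse intensity reproduces the reported K-Function value.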
Testing Uniformity of Assists Using the K-Function
The first step in testing an offensive scheme for a particular rotation is to understand how to develop a test using the K-Function. The most basic test is a test of uniformity. Those who are familiar with a test of uniformity will know immediately that the distribution of assisted field goals is NOT UNIFORM. We can see this in the plot.
If the distribution is indeed uniform, then we would see roughly as many shots taken at the rim as we would at half court. In fact, we can run a Monte Carlo simulation to generate a sample of data points from a uniform process and simply compare. Or… we can look at the K-Function.
Recall that uniform noise on the court is simply shot noise, or the Poisson model we highlighted above. If we plot the K-Function for the uniform model with our data model, we immediately see the disparity.
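The comparison can be illustrated with simulated data. The sketch below draws uniform points on an assumed 47-by-50-foot half court, builds a made-up clustered sample around the rim, and compares both border-corrected estimates with the Poisson value pi*t*t. The seed, sample sizes, and cluster spread are arbitrary choices for illustration.

```python
import math
import random

WIDTH, LENGTH = 50.0, 47.0  # assumed half-court region, in feet

def edge_distance(p):
    """Distance from p = (x, y) to the nearest edge of the region."""
    return min(p[0], WIDTH - p[0], p[1], LENGTH - p[1])

def k_hat(points, t):
    """Border-corrected K estimate at radius t."""
    interior, pairs = 0, 0
    for i, center in enumerate(points):
        if edge_distance(center) <= t:
            continue  # circle of radius t would leave the region
        interior += 1
        for j, other in enumerate(points):
            if i != j and math.dist(center, other) <= t:
                pairs += 1
    if interior == 0:
        return float("nan")
    return (WIDTH * LENGTH / len(points)) * pairs / interior

def clamp(value, upper):
    """Keep a simulated coordinate inside [0, upper]."""
    return min(max(value, 0.0), upper)

rng = random.Random(2017)  # fixed seed so the run is repeatable

# Uniform shot noise: every location on the floor is equally likely.
uniform = [(rng.uniform(0, WIDTH), rng.uniform(0, LENGTH)) for _ in range(400)]

# Made-up clustered sample: 300 points near the rim plus 100 uniform points.
clustered = [(clamp(rng.gauss(25.0, 1.0), WIDTH), clamp(rng.gauss(5.25, 1.0), LENGTH))
             for _ in range(300)]
clustered += [(rng.uniform(0, WIDTH), rng.uniform(0, LENGTH)) for _ in range(100)]

t = 4.0
poisson_k = math.pi * t * t
k_uniform = k_hat(uniform, t)
k_clustered = k_hat(clustered, t)
print(round(poisson_k, 1), round(k_uniform, 1), round(k_clustered, 1))
```

The uniform sample should land near the Poisson value, while the rim-heavy sample should sit far above it, which is the disparity the K-Function comparison is meant to expose.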
The statistical test is effectively a likelihood ratio test that considers the distribution of the observed K-Function and the theoretical Poisson distribution. When we take this ratio, however, we subtract the distance, t, to obtain an indication of clustering within the spatial data. This clustering, in turn, identifies that there is a particular preference of spatial actions performed by the offensive rotation. The uniformity test is given by
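The description above matches the standard L-function transform of Ripley's K-function. Assuming that is the statistic intended, it can be written as:

```latex
L(t) \;=\; \sqrt{\frac{K(t)}{\pi}} \;-\; t
```

Under the Poisson model, K(t) = pi*t*t, so the square root returns t and L(t) is zero at every radius.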
Plugging in observed values of the K-Function, we obtain the L-Function curve (the set of test values for the K-function with respect to uniformity).
Let’s interpret this plot. Values of zero indicate complete spatial randomness. In this case, we actually see a value of zero at roughly 8 feet. However, even without plotting the estimated confidence bounds (they linger close to two until we arrive at roughly 15 feet, and then they “balloon” towards 5), we have significant non-randomness. The value zero here indicates a change from clustering to regularity.
Any significantly positive values indicate clustering that occurs on the court. As we are under 5.25 feet, we naturally will see the clustering at the basket. The faster the function heads back to zero, the more spaced out the assists are (indicating randomness). However, as we head into ten or more feet, we obtain regularity. This means that we identify regions of activity such as three-point attempts, shots from the elbows, and at the rim. However, the data are not tightly clustered in those locations.
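The sign reading described here can be sketched by applying the standard L-function transform, sqrt(K(t) / pi) - t, to the two K-Function values quoted earlier. That transform is assumed to be the one in use, and the function name is illustrative.

```python
import math

def l_function(k_value, t):
    """L(t) = sqrt(K(t) / pi) - t; zero under complete spatial randomness."""
    return math.sqrt(k_value / math.pi) - t

# K-Function values quoted in the text, keyed by radius in feet.
reported = {5.0: 248.6768, 22.0: 725.43478}

for t, k_value in reported.items():
    label = "clustering" if l_function(k_value, t) > 0 else "regularity"
    print(t, round(l_function(k_value, t), 2), label)
```

Under this transform the five-foot value comes out positive and the 22-foot value negative, matching the reading of clustering at the basket and regularity farther out.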
Testing Between Rotations
Now that we understand the concept of a test, we can focus on comparing two rotations: in particular, the top rotation during odd quarters and the top rotation during even quarters. In this vein, we can difference the two resulting K-Functions for the two rotations of interest. Let’s elaborate on this:
If two rotations have the same offensive schemes, they will have the same resulting K-Functions (within variation). Similarly, if two rotations have statistically differing K-Functions, then the offensive schemes CANNOT be the same.
In this case, we are interested in the change of offensive schemes between odd and even quarters for the Magic. K1 is then the K-Function for the fourth-quarter observations. Similarly, K2 is the K-Function for the third-quarter observations. Therefore the difference can be illustrated as follows:
Here, we see that by taking the difference, we are nowhere close to zero in effectively every location, except for the perimeter. What this indicates is that the fourth-quarter unit indeed looks for more perimeter attempts on the pass. This doesn’t indicate that the shooters are effectively better during this time period. To investigate that, we need to look at per-possession and per-chance statistics.
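The differencing step can be sketched as follows. The region (a 47-by-50-foot half court), the function names, and the two tiny point sets are assumptions for illustration; real input would be the assist locations for each quarter.

```python
import math

WIDTH, LENGTH = 50.0, 47.0  # assumed half-court region, in feet

def edge_distance(p):
    """Distance from p = (x, y) to the nearest edge of the region."""
    return min(p[0], WIDTH - p[0], p[1], LENGTH - p[1])

def k_hat(points, t):
    """Border-corrected K estimate at radius t, or None if no center is eligible."""
    interior, pairs = 0, 0
    for i, center in enumerate(points):
        if edge_distance(center) <= t:
            continue
        interior += 1
        for j, other in enumerate(points):
            if i != j and math.dist(center, other) <= t:
                pairs += 1
    if interior == 0:
        return None
    return (WIDTH * LENGTH / len(points)) * pairs / interior

def k_difference(points_1, points_2, radii):
    """K1(t) - K2(t) at each radius; None where either estimate is undefined."""
    diffs = []
    for t in radii:
        k1, k2 = k_hat(points_1, t), k_hat(points_2, t)
        diffs.append(None if k1 is None or k2 is None else k1 - k2)
    return diffs

# Made-up samples: one rim-heavy, one heavy at the top of the key.
rim_heavy = [(25.0, 5.25), (25.0, 5.25), (25.0, 5.25), (25.0, 28.0)]
perimeter_heavy = [(25.0, 5.25), (25.0, 28.0), (25.0, 28.0), (25.0, 28.0)]

print(k_difference(rim_heavy, perimeter_heavy, [5.0, 10.0]))
```

In this toy case the two samples are indistinguishable at five feet, but at ten feet the rim points are no longer eligible centers and the difference swings toward the perimeter-heavy sample. Judging whether a real difference exceeds normal variation still requires a reference distribution, which this sketch does not provide.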
That all said, we have developed a spatial methodology to statistically distinguish changes in offensive schemes either within rotation or across rotations. This type of analysis can help us clue in on particular patterns in offensive flavor or…
Inclusion of Other Variables
… develop statistical methodology to identify how offensive schemes react to defensive personnel. Note that in order to perform this partitioning, we are giving up many degrees of freedom and may quickly find ourselves in high-variance situations. More specifically, these are situations where Simpson’s paradox may arise.
If we have a Synergy database, we may be able to correlate assist distributions on the court with type of offense, such as Pick-and-Roll plays or Catch-and-Shoot plays. Again, be wary of partitioning until the noise rises above the signal in these situations.
Another way we can enhance this data set is to develop a passing indicator. This would yield passing locations on the court. In these situations, we obtain spokes of data and can now look at how ball movement would work within an offensive scheme.
In fact, there are many things we can do now that we are armed with the ability to test spatial data directly. Feel free to try developing this test and see how other teams’ schemes play out. Or more investigatively… which teams run similar styles of rotations…