How likely is a player to rebound a basketball? If you ask Second Spectrum, you will get a function that considers positioning, hustle, and conversion. The argument makes sense. First, a player needs to be in a position to have a chance at obtaining a rebound. Second, the player needs to be able to beat their competitors to the rebound. Finally, a player needs to be able to actually secure the basketball.
In their 2014 Sloan Sports Analytics Conference paper, Second Spectrum focuses on defining position and uses that definition to obtain the probability of the rebound falling into a player's position.
Hustle is defined through the percentage of rebound opportunities a player obtains. This is slightly different from the positioning probability, as players are able to move and the closest player may not get the rebound. A player earns an opportunity by being the closest player to the basketball after the ball has fallen below the rim. Therefore, multiple opportunities exist for players, even when the basketball does not fall within their bin.
Finally, the conversion probability is merely the proportion of rebounds obtained out of the total number of opportunities. Piecing this together, we obtain the formula
Some assumptions are made here. In this article, we break down the assumptions behind each mathematical component as it is used. To be clear, these assumptions may be corrected for by Second Spectrum, but they are not explicitly identified in the paper. Here, we take a look at what to consider when we use similar mechanics.
So let’s begin…
Distance is Measured by Time
If you’re not familiar with this statement, welcome! A common flaw in spatial analytics is that analysts sometimes think that the distance between two points is given by the Pythagorean theorem. Unfortunately, that is merely a mathematical representation of the physical process we are interested in. Let’s drive this point home.
Consider two players located “exactly the same distance” from a rebound opportunity. Let’s further suppose player one is DeAndre Jordan while the other player is Slowy McSlowyton. Is it fair to say they are both equally likely to obtain the rebound? If you said no, then you understand that distance is measured by time!
A more concrete example is found in Global Positioning Systems, or GPS. In GPS, your watch has no idea where you are on the Earth. Instead, it listens for a particular code from each GPS satellite it can hear. Each satellite, in turn, has its own code that it effectively repeats. Suppose this is “I am satellite one! I am satellite one! I am…” When the watch hears the GPS message, it uses an almanac, which is a lookup table of when the satellites should be saying their messages and where in space they should be. Your watch then merely counts the time difference between the satellite barking its message and the watch receiving it. Plugging this time difference into the famous D = RT formula recovers our understood notion of distance. If we hear four satellites, we can measure the distance from each one and solve for our three-dimensional position, as well as the time bias in our shoddy, cheap watch!
Rebounding opportunities are no different. Rather than measuring distances, we’d look at the reactive speeds of players.
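To make the D = RT step concrete, here is a minimal sketch. The rate is the speed of light, since that is how fast the satellite's message travels. The function name `pseudorange` and the timestamps are illustrative assumptions, not values from a real receiver.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second (the "R" in D = RT)

def pseudorange(t_sent, t_received):
    """Distance implied by a signal's travel time, in meters."""
    return SPEED_OF_LIGHT * (t_received - t_sent)

# Hypothetical timestamps in seconds: the message takes 0.07 s to arrive.
distance_m = pseudorange(0.0, 0.07)

# A receiver clock that is off by one microsecond shifts every range by
# roughly 300 m, which is why the clock bias is treated as a fourth unknown.
bias_error_m = SPEED_OF_LIGHT * 1e-6
```

The only measured quantity is a time difference; distance is derived from it.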
Classic Voronoi Tessellation
If we assume that all players are equal, then we can use a Voronoi Tessellation. This mathematical construct is a partitioning algorithm for a surface of interest. The partitioning is conditioned on a set of observed points and answers the question: where are the regions of my surface that are closest to each point?
To build a Voronoi Tessellation, we take each point and grow circles. If any circles intersect, we obtain a boundary between the two points. This continues until the entire surface is covered. A Wikipedia gif file best illustrates this process:
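As a small illustration of that question, the sketch below builds a tessellation with `scipy.spatial.Voronoi` and then checks which region a query location falls in by finding its nearest seed with `scipy.spatial.cKDTree`. The seed coordinates and the helper name `closest_seed` are hypothetical and chosen only for the example.

```python
import numpy as np
from scipy.spatial import Voronoi, cKDTree

# Hypothetical seed points (x, y) in feet; not real tracking data.
points = np.array([
    [10.0, 10.0],
    [40.0, 10.0],
    [25.0, 25.0],
    [10.0, 40.0],
    [40.0, 40.0],
])

# Exact tessellation: vor.vertices and vor.ridge_vertices describe the boundaries,
# and vor.point_region maps each seed to its region.
vor = Voronoi(points)

tree = cKDTree(points)

def closest_seed(query):
    """Index of the seed whose Voronoi region contains the query location."""
    _, index = tree.query(query)
    return int(index)

owner = closest_seed([12.0, 14.0])
```

By definition, a location belongs to the region of its nearest seed, so the nearest-neighbor lookup and the tessellation agree.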
Application to New Orleans Pelicans vs. Los Angeles Lakers
If we apply this to a particular field goal attempt, we obtain a similar partitioning. Here, we only have ten players on the court and therefore will obtain 10 partitions. We are able to partition for each player; however, we will focus on team partitioning.
Note that we used a discretization to obtain the Voronoi regions on the court for each player. This serves a dual purpose. First, we are able to display the Voronoi Tessellation on the court to give a sense of the partitioning. Using the scipy.spatial package, we can quickly obtain the necessary boundaries, but we are unable to plot them on top of a court easily.
Second, using the discretization, if we know where the field goal attempt was taken, we can leverage a trained distribution of misses and simply aggregate the probabilities of the rebound falling within each partition. This aggregated probability is called the value of real estate at the time of the shot.
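The aggregation step can be sketched as follows. The tiny ownership grid and the miss distribution are made-up numbers for illustration, and `real_estate_value` is a hypothetical helper name; a trained distribution of misses would take the place of `miss_prob`.

```python
import numpy as np

# Hypothetical 3x4 grid: owner[i, j] is the team (0 or 1) controlling that cell.
owner = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
])

# Hypothetical probability that a miss from one shot location lands in each cell.
# The entries sum to 1.
miss_prob = np.array([
    [0.05, 0.10, 0.10, 0.05],
    [0.10, 0.20, 0.15, 0.05],
    [0.05, 0.05, 0.05, 0.05],
])

def real_estate_value(owner, miss_prob, n_owners):
    """Sum the landing probabilities over the cells each owner controls."""
    return np.bincount(owner.ravel(), weights=miss_prob.ravel(), minlength=n_owners)

value = real_estate_value(owner, miss_prob, n_owners=2)
```

The same function works per player if the grid stores player indices instead of team indices.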
A simple piece of code can build the spatial partitioning: we walk over a meshgrid on the court and compute the distance from each grid point to each player. This is for illustration only. Computing the actual Voronoi Tessellation would use a different computation and leverage the in function from the scipy.spatial package.
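A minimal sketch of such a meshgrid walk is given here. It assumes a 47 by 50 foot half court, a one-foot grid, and plain straight-line distance; the function name `partition_court` and the player coordinates are hypothetical.

```python
import numpy as np

def partition_court(players, x_max=47.0, y_max=50.0, step=1.0):
    """Label every grid point with the index of the nearest player."""
    xs = np.arange(0.0, x_max + step, step)
    ys = np.arange(0.0, y_max + step, step)
    gx, gy = np.meshgrid(xs, ys)  # each has shape (len(ys), len(xs))

    # Distance from every player to every grid point: (n_players, len(ys), len(xs)).
    dx = gx[None] - players[:, 0, None, None]
    dy = gy[None] - players[:, 1, None, None]
    dist = np.hypot(dx, dy)

    return xs, ys, dist.argmin(axis=0)

# Two hypothetical players at (x, y) in feet.
players = np.array([[5.0, 25.0], [40.0, 25.0]])
xs, ys, labels = partition_court(players)
```

The `labels` array can be drawn over a court image with a standard image plot, which is the display advantage of the discretization.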
But This Assumes Players Have the Same Speed!
In light of the personnel on the court, we can make a simple adjustment and train player movement around the basket. For instance, guards tend to be faster than posts and have a better chance of chasing rebounds down if the ball goes astray. Similarly, one post player may be quicker than another, allowing him to reach a region first.
Therefore, we look at the speed of players as opposed to the distance of players. Much like GPS, distance is measured in seconds; or, more to the point, by which player will arrive at the location first.
If we look at the common speeds of players with respect to their regions, our trained values indicate that Anthony Davis tends to move at a rate of 5.999 feet per second within the lane during rebound attempts. Similarly, Brandon Ingram tends to move at a rate of 6.673 feet per second as a crasher. We can then apply these trained values to obtain a more realistic positioning region.
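To sketch the adjustment, we can divide each distance by the player's speed and assign a location to whoever arrives first. This amounts to a multiplicatively weighted Voronoi partition. The two speeds are the values quoted above; the player positions, the landing spot, and the function name `first_to_arrive` are assumptions made for illustration, and the sketch ignores reaction time and acceleration.

```python
import numpy as np

def first_to_arrive(point, positions, speeds):
    """Return the index of the player with the smallest travel time, plus all times."""
    dist = np.linalg.norm(positions - np.asarray(point, dtype=float), axis=1)
    times = dist / speeds  # seconds, since distance is in feet and speed in ft/s
    return int(times.argmin()), times

# Hypothetical positions (feet): index 0 is Davis, index 1 is Ingram.
positions = np.array([[19.0, 25.0], [39.0, 25.0]])
speeds = np.array([5.999, 6.673])  # feet per second, from the trained values

# A landing spot exactly 10 feet from both players.
winner, times = first_to_arrive([29.0, 25.0], positions, speeds)
```

With equal distances, the classic tessellation calls this spot a tie, while the time-based version gives it to the faster player.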
Now We Can Derive a Player-Tendency Learned Positioning Probability
Given this simple update to building a Voronoi Tessellation, we can look into developing the positioning value in rebounding. Positioning is defined by two quantities: initial rebounding position and terminal rebounding position. Initial and terminal are defined by the field goal attempt.
As a player takes an attempt, other players may crash into the lane, effectively shrinking other players’ regions. Similarly, other players may leak out and effectively give more real estate to other players. The initial position is taken when the field goal is attempted, while the terminal position is taken when the ball is closest to the center of the rim. In this case, we obtain two different Voronoi Tessellations, which in turn yield two entirely different sets of probabilities.
A regression is then computed to measure the expected change in rebounding positioning probability. This will serve as the adjustment for players during a particular possession (such as quantifying the crashing and boxing-out capabilities of each player).
The second part of rebounding, according to Second Spectrum, is hustle. These situations are relatively self-explanatory and tend to be the argument raised against using player-defined Voronoi Tessellations. To show that this argument does not hold, we shall first define hustle.
Hustle is defined as opportunities created after initial positioning. This would identify factors such as boxing out and crashing. Opportunities are counted as the number of times a rebound is available to the closest player when the ball is below ten feet. There can be multiple opportunities for each rebound, as the first closest player may not secure the rebound. If a player is able to obtain an opportunity outside of their initial positioning, then they obtain a hustle value.
Again, a regression is performed. However, this time, instead of looking at terminal position versus initial position, we look at Opportunity Percentage versus the Position Value. This would indicate how likely a player is to gain an opportunity given their movement in positioning.
A slight word of caution here is that a linear regression is being performed. While this may yield answers, this is easily a place to improve on as much of the data is discrete.
The final portion of rebounding is then conversion. Conversion is the process of obtaining a rebound when an opportunity exists. Therefore, it is a percentage. As a word of caution, a player can grab every rebound and still have a conversion rate below 100%. Consider jockeying for position, with the initial opportunity out-hustling a secondary opportunity to gain a third opportunity. These types of rebounds happen when long-range attempts miss long: the initial rebounder misses the rebound, a struggle ensues as the second rebounder is unable to secure the ball, and the initial rebounder then secures the rebound.
Since there is a high correlation between conversion and initial position, a regression is imposed. While this does not remove the correlation, it helps smooth the correlation with respect to the positioning.
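The long-miss scenario can be worked through with a small sketch. The event log, the player labels, and the helper name `conversion_rates` are hypothetical; each entry records who had the opportunity and whether they secured the ball.

```python
from collections import Counter

# Hypothetical log for one missed shot: (player, secured_the_ball)
events = [
    ("A", False),  # initial rebounder is closest but misses the ball
    ("B", False),  # second rebounder gets an opportunity but cannot secure it
    ("A", True),   # initial rebounder gets a third opportunity and secures it
]

def conversion_rates(events):
    """Rebounds secured divided by opportunities, per player."""
    opportunities = Counter(player for player, _ in events)
    rebounds = Counter(player for player, secured in events if secured)
    return {player: rebounds[player] / opportunities[player] for player in opportunities}

rates = conversion_rates(events)
```

Player A ends the play with the only rebound available, yet converts one of two opportunities, so the conversion rate is 50%.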
The final rebounding value is then given as the formula
Much Room for Improvement
As location data becomes better understood, methods for training each of the three components on player tendency will slowly begin to be adopted. One simple correction is to stop thinking of distance as a ruler and instead think of distance as time. This common engineering trick allows us to redefine quantities such as positioning.
Similarly, regressions are good when the data satisfies the criteria for regression. However, if this is not the case, alternative methods should be explored. For instance, here is a screen cap of one regression used by the Second Spectrum team for adjusting hustle.
The R-squared for this is small, as the data piles up on the left-hand side of the plot while sparse, highly variable data populates the right-hand side of the regression. Either some form of leverage correction must be used, or an entirely different correlation-capturing mechanism needs to be employed.
Any which way we look at this, the rebounding quantity is a valiant effort at capturing the rebounding ability of players. However, we should take the probability with a grain of salt, as many published nuances are not quite true, leading us to realize that a 60% rebounding chance may actually be anything from 35 to 85%.
If given the task, how would you develop a rebounding metric that captured player capability?
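As a sketch of what a leverage check looks like, the code below computes the hat-matrix diagonal for a straight-line fit and flags points above the common 2p/n rule of thumb. The x values are invented to mimic data piled on the left with a few sparse points on the right; they are not taken from the Second Spectrum plot, and `leverage` is a hypothetical helper name.

```python
import numpy as np

def leverage(x):
    """Hat-matrix diagonal for a straight-line fit y = a + b * x."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return 1.0 / x.size + centered**2 / np.sum(centered**2)

# Hypothetical predictor values: eight points bunched on the left, two far right.
x = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.90, 1.00])

h = leverage(x)
threshold = 2 * 2 / x.size  # rule of thumb 2p/n, with p = 2 fitted parameters
high_leverage = np.flatnonzero(h > threshold)
```

Here the two right-hand points carry most of the influence on the fitted slope, which is the situation described above.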