In a recent blog post on defending the Hammer Offense, I showed that quantifying distance to the passing lane helps identify the coverage a defender has on an opposing player. In that post, I showed only a graphic and gave no insight into how to compute this quantity. Today, we will walk through constructing this very simple analytic.
Of course, this requires spatial data, which you can either obtain via SportVU or collect with your very own data collection tool!
SportVU data effectively records the spatial locations of the players and the basketball as (x,y)-coordinates on the basketball court. A snapshot of the eleven entities is collected roughly every 0.04 seconds. The resolution is extremely nice, as we are able to model fine-grained motion of each entity and develop rigorous spatio-temporal models.
The (x,y)-coordinates are effectively contained within the rectangular region (0,94) x (0,50). These are the dimensions of the court, with the baseline running in the y-direction and the sideline running in the x-direction. I say “effectively” in part because the camera system is able to track players running out of bounds.
The challenge with SportVU data is that it is relatively large. For a 48-minute game, we effectively obtain 72,000 samples of 11 (x,y)-coordinates! Over the course of a season, not including playoffs, this equates to roughly 88,560,000 samples of spatial data. While this isn’t large by industry standards, operating on it with a basic home PC can be slightly time consuming.
The other challenge is that processing SportVU data is indeed time consuming. Ideally, we would like to develop streaming analytics to run at game time. For instance, if we develop a crude algorithm that does a fantastic job of building a model but takes an hour to process a minute of game, we’d expect to spend roughly a week analyzing a single game, accounting for sleep, eating, and other basic living requirements. This is not so acceptable. Therefore, we need to either develop simple analytics that break complex actions down into simple, approximately streaming computations, or get quite crafty with our data manipulation skills.
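The sample counts above follow from simple arithmetic. Here is a quick sketch, taking the 0.04-second sampling interval, a 48-minute regulation game (no overtime), and a 1230-game regular season as given. The constant names are my own and are not part of any SportVU tooling.

```python
# Back-of-the-envelope data volume, using the figures quoted in the post.
SAMPLE_INTERVAL_S = 0.04       # one snapshot roughly every 0.04 seconds
GAME_MINUTES = 48              # regulation only; overtime is ignored here
ENTITIES_PER_SAMPLE = 11       # ten players plus the ball
REGULAR_SEASON_GAMES = 1230

# round() guards against floating-point noise in the division by 0.04.
samples_per_game = round(GAME_MINUTES * 60 / SAMPLE_INTERVAL_S)
samples_per_season = samples_per_game * REGULAR_SEASON_GAMES
# Each entity contributes an x and a y value per snapshot.
values_per_game = samples_per_game * ENTITIES_PER_SAMPLE * 2

print(samples_per_game)    # 72000
print(samples_per_season)  # 88560000
print(values_per_game)     # 1584000
```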
Brainstorming an Analytic
To start building an analytic, we start with asking a question. Here our question is, “How far away is the player from the passing lane?” This question is derived from a more complex question, “How well is the off-ball defender guarding his man?” Traditional coaching strategy suggests that if your man does not have the ball, then you should hedge off of him just enough to be a nuisance to any developing play away from your man, while still being able to cover your man. This is commonly referred to as the man-you-ball set-up on defense.
Once we have our question, we hypothesize a test function that hopefully answers it. If you notice, I am interested in whether my defender is suitably covering his man off the ball. I should really be asking for an analytic that says, “Is my defender in the passing lane?” I am not. I’ve broken the analytic down into simpler parts, as I do not yet know what constitutes covering a man. By starting simple, we can derive the appropriate test later.
Therefore, we start with measuring distance to the passing lane first.
A Little Calculus
Since I have spatio-temporal data, I am able to take a particular snapshot of (x,y)-coordinates and quickly form simple geometric relationships with the data. This requires a little bit of calculus. We will concurrently illustrate how to build the analytic while identifying the required tools from calculus. Let’s start with the current state of the system.
Let’s consider a fast-break that is occurring. The red team is on a fast break with an unmarked defender that is forced to make a decision. This defender will either have to press the ball-handler or sag and cut off the passing lane. This unmarked defender is relying on his teammate, the green dot, recovering to stop any pass to the lead man. This means we need to identify his distance to the passing lane to identify whether the unmarked defender is able to make a proper choice.
Equation of a Line
The equation of a line is simple. It’s merely y = mx + b, where m is the slope of the line and b is the y-intercept. We care about this line only because it will help us identify the shortest path from the defender to the passing lane. So let’s walk through how to compute the line given two spatial points.
If we consider the ball-handler as (x_1,y_1) and the teammate as (x_2,y_2), then the slope is simply rise over run. In this case it is m = (y_2 - y_1) / (x_2 - x_1).
We can plug either of the coordinates back into the equation to obtain the y-intercept; let’s choose (x_2,y_2). In this case, we get b = y_2 - m*x_2.
This means our equation of the line between a passer and their teammate is given by y = m*x + (y_2 - m*x_2), with m as above.
Rearranging terms, we obtain (y_2 - y_1)*x - (x_2 - x_1)*y + (x_2*y_1 - x_1*y_2) = 0.
This is the equation of the line between two players on the court.
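As a rough sketch of this step, the snippet below computes the slope and y-intercept from two player locations, along with the coefficients of the rearranged form a*x + b*y + c = 0. The function names and the coordinates are hypothetical, chosen only so the arithmetic is easy to follow. One practical point: slope-intercept form breaks down when the passer and teammate share the same x-value, while the rearranged form does not.

```python
def slope_intercept(passer, teammate):
    """Return (m, b) for the line y = m*x + b through two (x, y) points."""
    (x1, y1), (x2, y2) = passer, teammate
    if x2 == x1:
        # Rise over run divides by zero for a lane parallel to the baseline.
        raise ValueError("vertical passing lane: slope is undefined")
    m = (y2 - y1) / (x2 - x1)
    b = y2 - m * x2          # plug (x2, y2) back into y = m*x + b
    return m, b


def general_form(passer, teammate):
    """Return (a, b, c) with a*x + b*y + c = 0 on the passing lane."""
    (x1, y1), (x2, y2) = passer, teammate
    return (y2 - y1, -(x2 - x1), x2 * y1 - x1 * y2)


# Hypothetical court locations in feet (not taken from any real play).
passer, teammate = (20, 20), (36, 32)

print(slope_intercept(passer, teammate))  # (0.75, 5.0)
print(general_form(passer, teammate))     # (12, -16, 80)
```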
By computing this line, we are able to identify the passing lane. However, we do not have to actually compute it! We can draw the line simply by calling a draw function and plugging in the two player coordinates. We only need the coefficients of this line. For now, we will call this passing lane, which is a vector, the vector P.
Normal Vector to a Line
The normal vector to a line is a vector that is orthogonal to that line. Orthogonality merely means that it forms a right angle with the passing lane, intersecting it at a “T”-intersection. This is a standard calculus quantity and is found by computing the gradient of a function.
What the gradient calculates is the direction of greatest change of a function. Our function here is the passing lane, which is of two variables: x and y. This means we obtain a vector quantity that simply collects the derivatives with respect to x and y, individually.
In this case, we obtain the gradient n = (y_2 - y_1, -(x_2 - x_1)).
We call this gradient n because it is orthogonal to the line. Notice that these are the coefficients of x and y in the equation of the line above! While this quantity will be needed in determining the distance between the defender and the passing lane, we don’t even need to store it.
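A small sketch can confirm the orthogonality claim. It assumes the normal is taken as (y_2 - y_1, -(x_2 - x_1)), which is one of two equally valid orientations, and it reuses hypothetical coordinates. The helper names are mine.

```python
def passing_lane_vector(passer, teammate):
    """P: the vector pointing from the passer to the teammate."""
    return (teammate[0] - passer[0], teammate[1] - passer[1])


def normal_vector(passer, teammate):
    """n: P rotated by 90 degrees, so it is orthogonal to the lane."""
    px, py = passing_lane_vector(passer, teammate)
    return (py, -px)


# Hypothetical court locations in feet.
passer, teammate = (20, 20), (36, 32)
P = passing_lane_vector(passer, teammate)
n = normal_vector(passer, teammate)

print(P)                            # (16, 12)
print(n)                            # (12, -16)
# Orthogonal vectors have a dot product of zero.
print(P[0] * n[0] + P[1] * n[1])    # 0
```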
The difficult part here is that we do not know where the orthogonal part of the line that passes through the defender is located within the passing lane. So we just find another way through vector calculus.
The dot product takes two vectors and gives insight into the angle between them. In fact, once we know that angle, we can compute the area of the parallelogram spanned by both vectors of interest! If this is confusing, no worries. We will break this down.
First, we take our passing lane vector, P, and compare it to the vector, D, which is the vector between the passer and the defender of interest. We can compute P and D very quickly by subtracting the passer’s location from both the teammate’s and the defender’s locations, respectively.
Notice that when we combine P, D, and n, we almost form a triangle! In fact, the distance we scale the normal vector by is the distance from the defender to the passing lane!
The dot product is a quantity that gives us P·D = |P| |D| cos(theta).
That is, the dot product is the length of the passing lane times the distance between the passer and the defender, times the cosine of the angle between the passing lane and the line between the passer and the defender. The quantity is exceedingly simple to compute, as it is given by P·D = (x_2 - x_1)(x_3 - x_1) + (y_2 - y_1)(y_3 - y_1).
Here, (x_3,y_3) is the defender position. The dot product is a useful tool in determining projections, which is exactly what we want.
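To make the two forms of the dot product concrete, here is a minimal sketch that computes P·D from coordinates and then recovers the angle from the |P| |D| cos(theta) form. The three player locations are hypothetical, and the function names are my own.

```python
import math


def dot(u, v):
    """Dot product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def angle_between(u, v):
    """Angle in radians between two nonzero 2-D vectors."""
    cos_theta = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


# Hypothetical locations in feet: passer, teammate, defender.
x1, y1 = 20, 20
x2, y2 = 36, 32
x3, y3 = 25, 30

P = (x2 - x1, y2 - y1)   # passer -> teammate
D = (x3 - x1, y3 - y1)   # passer -> defender

print(dot(P, D))                                     # 200
print(round(math.degrees(angle_between(P, D)), 3))   # 26.565
```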
A projection of one vector onto another vector is an orthogonal mapping. Let’s take another look at the Phoenix Suns court with all the vectors mapped out.
The yellow line, D, needs to be projected onto the red line, P, which is the passing lane. The normal vector, n, is the actual direction of this projection! However, the length of n is not the distance between the defender and the passing lane! Instead, we project the yellow line down onto the red line by using a dot product.
What this helps us find is the number, a, by which we multiply the normal vector, n, in order to obtain the distance we need! Let’s call the resulting vector a = a*n. The length of a is then the distance we wish to compute.
So let’s carry out this math by piecing together the above calculus components.
Projective Math Leads to Simple Geometry
By performing this projection of D onto P, we obtain the angle between D and P. Call this theta. Noting that the projection forms a right triangle, we can use simple SOHCAHTOA methods to obtain the distance. What’s SOHCAHTOA you ask? It’s the relationships: sine = opposite / hypotenuse, cosine = adjacent / hypotenuse, and tangent = opposite / adjacent. Here, the length of vector a along n is the side opposite theta, while the hypotenuse is the length of the yellow line, D.
This gives us then the length |a| = |D| sin(theta), where theta = arccos(P·D / (|P| |D|)).
We can verify that this distance is indeed the correct length, as we can compare it to the normal vector, n, and see that it is indeed a scaling of n.
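One way to run that check numerically is to compute the distance twice: once through the angle, as |D| sin(theta), and once as the length of D's component along the normal, |n·D| / |n|. The sketch below does this for a hypothetical trio of players. The second formula is the standard scalar projection onto n, which I am using here as a stand-in for the verification described above.

```python
import math

# Hypothetical locations in feet: passer, teammate, defender.
passer, teammate, defender = (20, 20), (36, 32), (25, 30)

P = (teammate[0] - passer[0], teammate[1] - passer[1])
D = (defender[0] - passer[0], defender[1] - passer[1])
n = (P[1], -P[0])   # normal to the passing lane

# Route 1: angle from the dot product, then opposite = hypotenuse * sin.
theta = math.acos((P[0] * D[0] + P[1] * D[1]) / (math.hypot(*P) * math.hypot(*D)))
distance_via_angle = math.hypot(*D) * math.sin(theta)

# Route 2: length of the component of D along the normal.
distance_via_normal = abs(n[0] * D[0] + n[1] * D[1]) / math.hypot(*n)

print(round(distance_via_angle, 4))   # 5.0
print(distance_via_normal)            # 5.0
```

The second route skips the trigonometry entirely, which may be worth keeping in mind if the arc-cosine ever becomes a bottleneck.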
Applying to Phoenix Situation
Applying this methodology to our Phoenix Suns representation, we find that the teammates are located 32.9192 feet apart while the defender is 26.3454 feet from the ball handler/passer. Computing the dot-product, we obtain a value of 721.9091; a relatively large number.
Using the dot-product, we compute the angle between the passer-defender line and the passing lane. This angle, theta, is that ugly arc-cosine term above. This is 33.6547 degrees.
If we take this computation of theta, we then compute the distance to be 14.6003 feet from the passing lane!
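These figures can be reproduced from the three quantities quoted above alone. The sketch below plugs in the two lengths and the dot product as stated; the variable names are mine, and the raw player coordinates are not needed.

```python
import math

len_P = 32.9192     # feet between passer and teammate
len_D = 26.3454     # feet between passer and defender
dot_PD = 721.9091   # dot product of P and D

# Angle between the passing lane and the passer-defender line (radians).
theta = math.acos(dot_PD / (len_P * len_D))
# Opposite side of the right triangle: distance to the passing lane.
distance = len_D * math.sin(theta)

print(f"{math.degrees(theta):.2f} degrees")  # 33.65 degrees
print(f"{distance:.2f} feet")                # 14.60 feet
```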
Therefore the analytic returns the distance we are interested in. Note as we did all this math, all that we required was the dot product and a couple trigonometric quantities. This is due to all the calculus steps taken to ensure we applied a proper projection and obtain the value we actually cared about. Hence the coding aspect is now straightforward.
We take in the spatial coordinates: (x_1, y_1), (x_2, y_2), (x_3, y_3) as the passer, teammate, and defender, respectively. In fact, we can do all defenders and all teammates simultaneously by vectorizing everything and obtaining all distances of all defenders relative to all four passing lanes. Instead of overwhelming you with those details, we focus on this single trio of players.
We first compute P = (x_2 - x_1, y_2 - y_1). This gives us the P vector. Next we compute D = (x_3 - x_1, y_3 - y_1). This gives us the D vector. We then compute sqrt(D*D') and sqrt(P*P'). These quantities are the distances between the passer and defender as well as the passer and teammate, respectively. In math symbols, this is |D| and |P|, respectively.
Finally, we compute the angle theta = acos(P*D' / (|P| |D|)) and then the last quantity, the distance |D| sin(theta). Note: the angle theta will be in radians! This is important to remember.
Thus our analytic is a sequence of five computations, each of which runs much faster than 0.04 seconds, allowing our analytic to process at a near-streaming rate.
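Pieced together, the five computations might look like the sketch below for a single passer, teammate, and defender. The function name, the handling of degenerate inputs, and the sample coordinates are my own assumptions rather than anything prescribed above.

```python
import math


def distance_to_passing_lane(passer, teammate, defender):
    """Distance in feet from the defender to the line through passer and teammate."""
    # 1. P: vector from the passer to the teammate (the passing lane).
    P = (teammate[0] - passer[0], teammate[1] - passer[1])
    # 2. D: vector from the passer to the defender.
    D = (defender[0] - passer[0], defender[1] - passer[1])
    # 3. Lengths |P| and |D|.
    len_P, len_D = math.hypot(*P), math.hypot(*D)
    if len_P == 0:
        raise ValueError("passer and teammate share a location")
    if len_D == 0:
        return 0.0   # defender is standing on the passer
    # 4. Angle between P and D, in radians (clamped against rounding error).
    cos_theta = (P[0] * D[0] + P[1] * D[1]) / (len_P * len_D)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    # 5. Opposite side of the right triangle.
    return len_D * math.sin(theta)


# Hypothetical locations in feet.
print(round(distance_to_passing_lane((20, 20), (36, 32), (25, 30)), 4))  # 5.0
```

Note that this measures distance to the infinite line through the two players. A defender standing behind the passer or beyond the teammate still gets a small value, so a production version may want to handle those cases separately.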
But Wait There’s More!
Recall that we were actually interested in whether our defender is covering the passing lane! This is where the data science aspect of the analytic comes in. We can declare something silly, like “within three feet of the line is covering.” Or we can use the data to tell us something better.
If we are interested in understanding a player’s ability to cover a passing lane, we can trawl through the data and label situations where a pass was made to the player that defender is guarding. Whichever definition of guarding the passing lane you decide on will determine whether each instance is labeled guarded or unguarded. This definition is critical.
For instance, a defender could be covering a passing lane, yet there is no basket coverage, so a basketball can be thrown over the top of the teammate. This is typical on fast breaks where the teammate breaks for the basket on a full court pass.
If we decided to use recovery time for a player, we can label covered passes as passes where the defender either steals or tips the ball, or where a defender is able to recover such that a shot attempt is not taken as the player is forced to pass or put the ball on the ground to elude the defender.
In this case, we can track the distance covered with respect to the passing lane for every pass and use the covered/uncovered labels to statistically determine recovery speed for every NBA player. In doing this, we can use a classification algorithm to identify the distances between “able to cover” and “unable to cover.”
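The classification algorithm is not specified here, so the following is only an illustrative stand-in: a one-dimensional threshold search (a decision stump) that picks the distance cut-off with the fewest mislabeled passes. The function name is hypothetical, and the sample data is made up purely to exercise the code; it is not derived from SportVU or from any player.

```python
def best_threshold(samples):
    """Find the cut-off t minimizing errors for the rule: covered if distance <= t.

    samples: list of (distance_ft, covered) pairs, covered being True or False.
    Returns (threshold, number_of_misclassified_samples).
    """
    distances = sorted({d for d, _ in samples})
    if len(distances) < 2:
        raise ValueError("need at least two distinct distances")
    # Candidate cut-offs: midpoints between consecutive distinct distances.
    candidates = [(lo + hi) / 2 for lo, hi in zip(distances, distances[1:])]
    best = None
    for t in candidates:
        errors = sum((d <= t) != covered for d, covered in samples)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Made-up labels for one imaginary defender (distance in feet, covered?).
toy_samples = [
    (1.0, True), (2.5, True), (3.0, False), (4.0, True),
    (5.5, True), (6.5, False), (8.0, False),
]

print(best_threshold(toy_samples))  # (6.0, 1)
```

With real labels, a logistic regression or similar model would also give a sense of uncertainty around the cut-off, which a single threshold does not.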
For example, Tony Allen (New Orleans) was able to recover on passes up to 6.243 feet from the passing lane. Therefore, his threshold is generous, whereas Jahlil Okafor (Philadelphia) is one of the slowest in the league and is given a tighter leash of 3.843 feet.
Full disclosure: These estimates are based on sampled data, not a census of all 1230 games this past year. Plus, they are determined from a classification I decided to build, which is possibly not what a team is specifically interested in or plans for.
However, we are able to answer some of our questions and this gives insight into the mechanical process of a data scientist who is attempting to develop analytics for NBA data.