In a previous post, we looked at spatio-temporal data obtained through SportVU technology for the NBA and showed how to use that data for basic tasks such as building convex hulls to illustrate offensive and defensive coverage on a court. We also provided basic Python code for readers to test out their own ideas. That said, this yields a mechanism for illustration only; there are no analytics in these procedures. The only exciting mathematical tool used is the convex hull operation, and this only identifies the tightest convex polygon that contains every player, with players as its vertices.
So let’s ask a basic question: How long does each player possess the basketball? This is a simple question to ask, yet a fairly difficult one to answer. For instance, if a person simply watches game film and records the seconds, the estimates will be rounded to the second. On a touch-pass, one-second resolution is simply too coarse for the reviewer, and the resulting time is biased.
SportVU data resolves possessions down to 0.04 seconds thanks to its image recognition capabilities. This allows the reviewer to freeze-frame the position data of the ten players on the court, as well as the basketball, at every 1/25th of a second. Now the problem is no longer watching game film and writing down seconds, but handling this larger data set without being able to see the actual game footage. This is where the illustration techniques noted above come into play.
Let’s consider a simple 20-second span of time from Game 3 of the Indiana Pacers vs. Cleveland Cavaliers first-round series on April 20, 2017. This span covers 11:50 to 11:30 in the first quarter, which is just the Pacers’ and Cavaliers’ first possessions, respectively. In the NBA data’s JSON format, this is what’s called moment 0. Here, the Pacers score the first basket of the game with an open jumper by Myles Turner at the nail after the Cavaliers’ defense sags on a high pick-and-roll. Kevin Love picks up the made basket and throws it back in to Kyrie Irving to initiate the offense on the Cavaliers’ possession.
Using the SportVU set-up, we can view the data as an animation, using the get-moments code from previous posts.
Since the x- and y-positions are identified by SportVU, we can compute simple measures such as distance between the basketball and every player. Recalling that our question is how long each player has possession of the ball, we can easily ask if a player is within three feet of the basketball. If they are, then they are likely to be in possession of the basketball.
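As a rough illustration, the proximity check might look like the sketch below. It assumes each frame has already been unpacked into a ball position and a dictionary of player positions, both in feet. The function names and the frame layout are hypothetical and are not part of the SportVU format.

```python
import math

FRAME_SECONDS = 0.04  # SportVU samples 25 frames per second
RADIUS_FT = 3.0       # proximity threshold used in this post


def players_near_ball(ball_xy, player_xy, radius=RADIUS_FT):
    """Return the names of players whose (x, y) position is within `radius` feet of the ball."""
    bx, by = ball_xy
    return [name for name, (px, py) in player_xy.items()
            if math.hypot(px - bx, py - by) <= radius]


def seconds_near_ball(frames, radius=RADIUS_FT):
    """Credit each player with one frame (0.04 s) for every frame spent near the ball."""
    totals = {}
    for ball_xy, player_xy in frames:
        for name in players_near_ball(ball_xy, player_xy, radius):
            totals[name] = totals.get(name, 0.0) + FRAME_SECONDS
    return totals


# Made-up coordinates for two frames, purely for illustration.
frames = [
    ((10.0, 10.0), {"Player A": (11.0, 10.0), "Player B": (20.0, 20.0)}),
    ((10.0, 10.0), {"Player A": (12.0, 12.0), "Player B": (10.0, 12.9)}),
]
totals = seconds_near_ball(frames)
```

Because height is ignored, this treats the court as a flat plane, which is the same simplification the rest of this post works with.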
Note that possession itself is not recorded in SportVU data, so proximity to the ball serves as a proxy. Computing this simple measure yields the following results (in seconds):
- Myles Turner 4.24
- Thaddeus Young 2.32
- Kevin Love 1.84
- Kyrie Irving 4.44
- Tristan Thompson 0.56
- Jeff Teague 4.24
- TOTAL TIME: 17.64 seconds
This seems about right. However, Tristan Thompson never once touches the basketball. As Myles Turner sets the screen for Jeff Teague and rolls to the nail, Thompson not only blitzes Teague but also recovers to contest Turner’s jump shot. In these two instances, Thompson is within three feet of the basketball. So how do we fix this? A seemingly simple, but not so simple, solution is to ask which team is on offense. A player cannot be in possession of the basketball unless their team has possession.
This means we must incorporate a new data set and perform data fusion across not only the spatio-temporal data set, but also a play-by-play file. We have explored this to some extent in the past when analyzing possessions for every team in the NBA.
From play-by-play data, we can look up the actions that identify a change of possession: a made field-goal attempt, a turnover, a defensive rebound, a made free throw on the final free-throw attempt, and the end of a period. Special cases such as technical fouls are typically ignored, as possession rarely changes or the resulting possessions are empty.
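These rules can be written as a small predicate. This is only a sketch: the event dictionaries and their "kind", "made", and "final" fields are a hypothetical simplified layout, not the actual stats.nba.com play-by-play schema.

```python
# Event kinds that always end a possession in this simplified layout.
ALWAYS_CHANGES = {"turnover", "defensive_rebound", "end_of_period"}


def changes_possession(event):
    """Return True if a simplified play-by-play event ends the current possession."""
    kind = event["kind"]
    if kind == "field_goal":
        # Only a made field goal hands the ball to the other team.
        return bool(event.get("made", False))
    if kind == "free_throw":
        # Only a made free throw on the final attempt of the trip counts.
        return bool(event.get("made", False)) and bool(event.get("final", False))
    # Anything else (for example technical fouls) is ignored here.
    return kind in ALWAYS_CHANGES
```

A missed shot does not end the possession by itself in this scheme; the defensive rebound that follows it does.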
Data Fusion and Its Difficulties
Now a moment about data fusion. In classical analysis of relational databases, a key must typically be defined so that databases can be merged in a coherent manner. The classical example takes an address list and a phone list and attempts to merge them into a user profile. Sometimes the key is a unique ID shared by both lists, and a simple merge can be performed. Sometimes the key is messy, such as a person’s name. While the keys must be unique, they may not be written the same way in both lists. For instance, John Q. Doe is not the same string as John Quincy Doe. Some normalization must be done.
In even worse cases, the keys mean nothing in relation to each other, such as Post Office ID# 33163 vs. Phone Company ID# 47783572587. Neither of these means anything in relation to the other; however, their associated metadata, such as un-normalized names (John Q. Doe vs. John Quincy Doe), may be useful for key-matching. This is exactly the situation we observe in SportVU data versus play-by-play data from stats.nba.com.
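To make the key-matching idea concrete, here is a minimal sketch that joins two ID tables through normalized names. The helper names are hypothetical, and the normalization (lowercase, strip punctuation, keep first and last name only) is just one simple choice; it is not the matching procedure used for the actual SportVU and play-by-play merge.

```python
import re


def name_key(full_name):
    """Reduce a name to a lowercase (first, last) pair, dropping middle names and punctuation."""
    tokens = re.sub(r"[^a-z\s]", "", full_name.lower()).split()
    return (tokens[0], tokens[-1])


def match_records(left, right):
    """Pair IDs from two {id: name} tables whose normalized names agree."""
    right_by_key = {name_key(name): rid for rid, name in right.items()}
    return {lid: right_by_key[name_key(name)]
            for lid, name in left.items()
            if name_key(name) in right_by_key}


post_office = {33163: "John Q. Doe"}
phone_company = {47783572587: "John Quincy Doe"}
matches = match_records(post_office, phone_company)
```

A key this coarse will collide when two people share a first and last name, and it discards accented characters, so a real merge would need extra metadata (team, jersey number, and so on) to break ties.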
After Fusion, Who’s on Offense?
Once we perform the merge, we can obtain a list of times when each team is in possession of the basketball. We can then use a simple bit of code to identify which team has possession at a given moment:
    # The game clock counts down, so each range is assumed to be (start, end) with start > end.
    for timeRange in possessions[teamPossession]:
        if timeRange[1] < float(t[0]) < timeRange[0]:
            offenseTeam = teamPossession
We can then identify all players within three feet of the basketball that are on offense:
    # ids maps a player_id to a sequence whose first two entries are (name, team).
    playerInfo = ids[t[1]['player_id'].iloc[i]]
    handler = playerInfo[0]
    if playerInfo[1] == offenseTeam:
        # Credit one frame: 0.04 seconds at 25 frames per second.
        whoGotRock[handler] = whoGotRock.get(handler, 0.0) + 0.04
Note that this snippet runs over all players within three feet of the basketball; the offensive-team filter is applied to that small set of eligible players. Continuing with this truncated set of eligible ball-handlers, we obtain updated times with the ball in possession:
- Myles Turner 4.24
- Thaddeus Young 2.32
- Kevin Love 0.88
- Kyrie Irving 4.04
- Jeff Teague 4.24
- TOTAL TIME: 15.72 seconds
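For readers who want something runnable end to end, the two fragments above can be combined into a self-contained sketch. The data layout here is an assumption made for illustration: possessions maps each team to a list of (start, end) game-clock windows in seconds with start > end, and each frame is a pair of the game clock and the players already found within three feet of the ball.

```python
FRAME_SECONDS = 0.04  # one SportVU frame


def offense_team_at(game_clock, possessions):
    """Return the team whose possession window contains game_clock, else None."""
    for team, windows in possessions.items():
        for start, end in windows:
            # The clock counts down, so start > end.
            if end < game_clock < start:
                return team
    return None


def offensive_seconds(frames, possessions, player_team):
    """Sum time near the ball, counting only players on the team with possession."""
    totals = {}
    for game_clock, near_ball in frames:
        offense = offense_team_at(game_clock, possessions)
        for player in near_ball:
            if player_team[player] == offense:
                totals[player] = totals.get(player, 0.0) + FRAME_SECONDS
    return totals


# Made-up values for illustration only.
possessions = {"HOME": [(720.0, 704.0)], "AWAY": [(704.0, 690.0)]}
player_team = {"H1": "HOME", "A1": "AWAY"}
frames = [(710.0, ["H1", "A1"]), (709.96, ["H1"]), (700.0, ["A1", "H1"])]
totals = offensive_seconds(frames, possessions, player_team)
```

Frames that fall outside every possession window credit nobody, which matches the strict inequalities in the original snippet.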
We see that the defense has been eliminated and the ball-handler times have been corrected accordingly! Unfortunately, we are not done yet.
Since height is not accounted for in the data set, we still have the problem of skip-passes and shot attempts traveling over players’ heads. In this case, we can go back to the play-by-play data set and identify when specific actions are taken. For instance, Myles Turner makes a basket at 11:36. But when does the ball leave his hand (by at least three feet)? This is at 11:37:22. This means there is nearly an extra second during which possession can still be wrongly credited, though only to an offensive player.
Underneath the basket, Thaddeus Young picks up an extra 0.44 seconds of ball handling as the basket is made. Removing that time gives the adjusted totals:
- Myles Turner 4.24
- Thaddeus Young 1.88
- Kevin Love 0.88
- Kyrie Irving 4.04
- Jeff Teague 4.24
- TOTAL TIME: 15.28 seconds
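The shot-release adjustment amounts to discarding the frames between the release and the made basket before accumulating time; at 0.04 seconds per frame, Young’s 0.44 seconds corresponds to 11 frames. A minimal sketch, assuming frames are (game clock in seconds, players near the ball) pairs and that the release and made-basket times have already been found; the function name and the sample values are hypothetical:

```python
def drop_shot_flight(frames, release_clock, made_clock):
    """Remove frames recorded while a shot is in the air.

    The game clock counts down, so the release happens at a larger clock
    value than the made basket. Frames with made_clock <= clock < release_clock
    are dropped; everything before the release and after the make is kept.
    """
    return [(clock, near_ball) for clock, near_ball in frames
            if not (made_clock <= clock < release_clock)]


# Made-up frames for illustration only.
frames = [(100.4, ["Shooter"]), (100.0, ["Teammate"]),
          (99.5, ["Teammate"]), (98.9, ["Inbounder"])]
kept = drop_shot_flight(frames, release_clock=100.2, made_clock=99.0)
```

The same filter could be applied to skip-passes if the release and catch times of the pass were known.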
There are still some nuances here. For example, Thaddeus Young gets an extra bump in time from the dribble hand-off with Jeff Teague early in the possession. We can verify this in the game footage, but it is not as clean in the SportVU data. In these instances, we may want to look into filtering techniques such as Kalman filtering to smooth transitions between players. Here, we would look at the velocity of each player and of the basketball, and at how they converge. For instance, the velocity vector of a pass will be on a collision course with the velocity vector of the receiving player. By marking these passes, we can then condition on passes and trim the times accordingly. We will save Kalman filtering for a future post.
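Short of a Kalman filter, the velocity idea can be sketched with finite differences between consecutive frames. This is a deliberate simplification: it compares the ball’s velocity with the direction to each player’s current position rather than with the player’s own velocity, and the speed and angle thresholds are hypothetical values chosen only for illustration.

```python
import math

FRAME_SECONDS = 0.04  # one SportVU frame


def velocity(p_prev, p_curr, dt=FRAME_SECONDS):
    """Finite-difference velocity (ft/s) between two consecutive (x, y) positions."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def heading_angle_deg(ball_prev, ball_curr, player_xy):
    """Angle in degrees between the ball's velocity and the direction from ball to player."""
    vx, vy = velocity(ball_prev, ball_curr)
    dx, dy = player_xy[0] - ball_curr[0], player_xy[1] - ball_curr[1]
    speed, dist = math.hypot(vx, vy), math.hypot(dx, dy)
    if speed == 0.0 or dist == 0.0:
        return None  # direction is undefined
    cos_theta = (vx * dx + vy * dy) / (speed * dist)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def likely_pass_target(ball_prev, ball_curr, players, min_speed=15.0, max_angle=10.0):
    """Return the player the ball is heading toward, or None if it does not look like a pass."""
    if math.hypot(*velocity(ball_prev, ball_curr)) < min_speed:
        return None  # ball is moving too slowly to be a pass
    angles = {name: heading_angle_deg(ball_prev, ball_curr, xy)
              for name, xy in players.items()}
    angles = {name: a for name, a in angles.items() if a is not None}
    if not angles:
        return None
    best = min(angles, key=angles.get)
    return best if angles[best] <= max_angle else None


# Made-up positions: the ball moves one foot along x in a single frame (25 ft/s).
players = {"Receiver": (10.0, 0.0), "Bystander": (1.0, 10.0)}
target = likely_pass_target((0.0, 0.0), (1.0, 0.0), players)
```

Raw frame-to-frame differences are noisy, which is exactly why a smoother such as a Kalman filter is the natural next step.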