The day after a 50-point performance from Derrick Rose, Jalen Rose commented that Derrick Rose took 653 dribbles over the course of the game, in contrast to a 52-point performance from Klay Thompson that took only 56 dribbles. The aim of the comment was to highlight the difference between the players’ roles within the offense. Thompson, a shooting guard-style player, spends much of his time off the ball. In contrast, Rose is a point guard-style player with the added responsibility of initiating the offense. Despite this key difference, many fans responded to Jalen Rose’s comment in an emphatic Homer Simpson voice: ‘NERD!’ The reason? He used a simple-to-obtain analytic called number of dribbles. For the uninitiated in NBA analytics, this is simply a waveform filter, and it doesn’t actually require a pencil jockey to sit and meticulously count each and every thud on the court.
So how do we go about extracting these seemingly odd stats? Most of it comes from player tracking data. Unfortunately, the user agreements for tracking data typically state: no dissemination of data, no dissemination of code showing processing of said data, and no dissemination of analytics/visualizations containing summaries derived from said data; unless, of course, you have prior written consent. That said, we can still develop analytics based on this style of data. They will still work when you finally get a chance to operate on real tracking data!
Tracking Data: Quick Refresher
Whether tracking data is supplied by Second Spectrum, SportVU, KINEXON, or Catapult, the results are nearly identical: we are given a time of play, an absolute time of reference, a player identification key, and a position estimate. Depending on the company, other tags may be associated with a datagram, but this is the meat and potatoes of the tracking data. If we take a look at the defunct SportVU datagram, we have:
Breaking this down, we have the following components:
Absolute Time: 1451772637436 (Unix time… 2:10:37.436 PM January 2nd, 2016 in Sacramento)
Time Remaining in Period: 705.59 seconds (11:45.59 remaining in period)
Time Remaining in Shot Clock: 10.59 seconds
Empty Slot: null, for future use.
Position Vectors: 11 arrays of length five.
Array Slot One: Team ID (-1 for basketball)
Array Slot Two: Player ID (-1 for basketball)
Array Slot Three: Sideline Location (0 to 94)
Array Slot Four: Baseline Location (0 to 50)
Array Slot Five: Height in feet (ball only)
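To make the breakdown concrete, here is a minimal sketch of unpacking one moment laid out in the five-part order listed above. The function name `parse_moment`, the fixed UTC-8 offset, and every ID and coordinate in the example are assumptions for illustration; only the three clock values come from the text. Note that 705.59 seconds works out to 11 minutes and 45.59 seconds.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset for Pacific Standard Time (January, so no daylight saving).
PST = timezone(timedelta(hours=-8), "PST")


def parse_moment(moment):
    """Split one moment into a dict, following the five-part breakdown above."""
    unix_ms, game_clock, shot_clock, _unused, entities = moment
    # Integer arithmetic keeps the millisecond part exact.
    stamp = datetime.fromtimestamp(unix_ms // 1000, tz=PST)
    stamp += timedelta(milliseconds=unix_ms % 1000)
    minutes, seconds = divmod(game_clock, 60)
    # The basketball is the entity whose team ID is -1.
    ball = next(e for e in entities if e[0] == -1)
    players = [e for e in entities if e[0] != -1]
    return {
        "local_time": stamp,
        "game_clock": f"{int(minutes)}:{seconds:05.2f}",
        "shot_clock": shot_clock,
        "ball_xyz": tuple(ball[2:5]),
        "players": [(team, pid, x, y) for team, pid, x, y, _z in players],
    }


# Hypothetical moment: the clock values come from the text,
# while every team ID, player ID, and coordinate is made up.
made_up_players = [
    [1 if i < 5 else 2, 100 + i, 10.0 + 7 * i, 5.0 + 4 * i, 0.0]
    for i in range(10)
]
example = [1451772637436, 705.59, 10.59, None,
           [[-1, -1, 47.0, 25.0, 4.5]] + made_up_players]

parsed = parse_moment(example)
print(parsed["local_time"].strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])  # 2016-01-02 14:10:37.436
print(parsed["game_clock"])                                        # 11:45.59
```

A real feed may carry extra fields (other tags, as noted earlier), so the tuple unpacking would need adjusting to whatever layout you actually receive.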
So how do we even get these positions? Simple… Machine Learning!
Convolutional Neural Networks, Filtering, Classification.
The aforementioned tracking systems vary considerably when it comes to finding “dots on a map.” Second Spectrum and SportVU rely heavily on camera-based technology, whereas KINEXON leverages a radio-frequency (RF) based system. The mathematics are relatively similar, with the key difference being whether the system is invasive or passive. The KINEXON system is invasive, as it requires the player to interact with the system by wearing an RF emitter. The Second Spectrum system is passive, as it requires no extra action from the player. The challenge with the latter system then becomes identification of the player.
While we do not have access to the raw camera data from Second Spectrum, we can start to hammer out details using similar set-ups. We should note that this is not Second Spectrum’s methodology, but rather a machine-learning based approach for cameras (one that’s actually been around for 15+ years!). To start, let’s consider a recent game between the Sacramento Kings and the Atlanta Hawks. Here, we have a video feed from a well-known fixed-point camera system. The first thing we must do is identify where the players are.
Here, we see all the players, the referees, the coach, the basketball, and the crowd. Using this well-known camera angle (and having millions of frames over the years; except you Omari Spellman…), we can apply a convolutional neural network to extract the player entities of interest. This is the most difficult part of the process (especially for side-view camera systems), but the convolutional neural network can be applied in many ways. The way we do it is rather straightforward…
First, we construct a court waveform. This means taking the fixed-point locations of the court, which requires labeling, and computing a Fourier transform of the court boundaries. If we take the Fourier transform of this image, we get a smeared representation of the court because the players obscure parts of it. However, applying a convolution, we find there is enough evidence of the court to treat it as a notch filter, and we can filter out the court. This leaves us with fuzzy players.
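As a toy illustration of the notch-filter idea, and not the pipeline used on real footage, the sketch below builds a synthetic "court" out of a single stripe frequency, drops a Gaussian "player" blob on top, and zeroes the frequency bins where the known court template dominates. The image size, stripe pattern, blob, threshold, and the function name `notch_out_background` are all assumptions chosen to keep the example small.

```python
import numpy as np


def notch_out_background(frame, template, rel_threshold=0.5):
    """Zero the frequency bins where the known background template dominates."""
    template_magnitude = np.abs(np.fft.fft2(template))
    notch = template_magnitude > rel_threshold * template_magnitude.max()
    frame_spectrum = np.fft.fft2(frame)
    frame_spectrum[notch] = 0.0          # apply the notch
    return np.real(np.fft.ifft2(frame_spectrum))


n = 64
rows, cols = np.mgrid[0:n, 0:n]

# Toy "court": vertical stripes at one spatial frequency (not a real court).
court = np.cos(2 * np.pi * 8 * cols / n)

# Toy "player": a Gaussian blob centered at row 20, column 40.
player = np.exp(-((rows - 20) ** 2 + (cols - 40) ** 2) / (2 * 2.0 ** 2))

frame = court + player
residual = notch_out_background(frame, court)

# The brightest pixel left after filtering should sit on the player.
peak = tuple(int(i) for i in np.unravel_index(np.argmax(residual), residual.shape))
print(peak)
```

A real court has energy spread over many frequencies, so the notch would cover far more bins and take some of the player with it, which is one way to read the "fuzzy players" remark.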
Second, we construct a player transform. This is taking a player’s movement and constructing a Fourier transform of the player. This helps in identification. To help illustrate this, let’s play a game…
Sidebar: Find Magic
NBA players are quite distinctive when you break down their body mechanics. Consider this example from the 1992 Olympic Dream Team game versus Angola:
It’s grainy, but we should easily identify Magic Johnson running the break. And it’s not because of his acrobatic backwards hook pass to Bird for three. It’s his gait. Magic has a distinctive run, with his shoulders set high and his head set low and forward, almost forming a hunchback. Karl Malone, by contrast, is commonly a knees-forward runner with a straightened back and shoulders pinned back. Compare both to Michael Jordan, who slinks into a compact form when running. All these players have specific traits.
Given these filters, if we bash the filters against a game log, we can also build a multinomial distribution of likely players on the court. This helps hedge our bets on properly selecting the player on the court. But we will get to that in a moment.
Once we have the filters in place, we will have “boxes” identifying objects on the court. Next, we filter and classify. The first step is filtering. This process attempts to remove erroneous boxes and aid in classification. Commonly, this part works in conjunction with the classifier. A common filter is the Kalman filter. The role of this filter is to ensure the ball stays the ball and the player stays a player. More importantly, if two players screen, this filtering process helps track players correctly as the filters may not convolve enough frames to ensure the players are correct.
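For readers who have not met a Kalman filter before, here is a minimal sketch of a constant-velocity version tracking one coordinate of a box. The 0.04-second frame spacing, the noise variances, the starting uncertainty, and the function name `kalman_track` are assumptions picked for illustration; a production tracker would run this in two or three dimensions and tie it into the classifier.

```python
import numpy as np


def kalman_track(measurements, dt=0.04, meas_var=0.25, accel_var=1.0):
    """Constant-velocity Kalman filter over 1-D position measurements.

    Returns one [position, velocity] estimate per measurement.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])            # state transition
    H = np.array([[1.0, 0.0]])                       # only position is observed
    Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                              [dt**3 / 2, dt**2]])   # process noise
    R = np.array([[meas_var]])                       # measurement noise
    x = np.zeros(2)                                  # initial state guess
    P = np.eye(2) * 100.0                            # large initial uncertainty
    estimates = []
    for z in measurements:
        # Predict where the object should be on this frame.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the new measurement.
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)


# Hypothetical player moving along the sideline axis at 2 ft/s,
# observed with made-up measurement noise.
rng = np.random.default_rng(0)
t = np.arange(150) * 0.04
truth = 2.0 * t
noisy = truth + rng.normal(0.0, 0.5, size=t.size)
track = kalman_track(noisy)
print(track[-1])  # final [position, velocity] estimate
```

Because the filter carries a velocity estimate, it can keep predicting a box through a few frames where two players overlap, which is the screening situation described above.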
The classification step is the labeling process. This process identifies who a player is and can be as simple as a multinomial distribution (this actually performs rather poorly), a neural network (this actually performs rather poorly too…), or a support vector machine (this is commonly used). The result is then given by:
This example turned out really well. I suppressed the referee indicator because Dave Joerger makes me look bad (he classifies as a referee). Also, the basketball is obscured in this sequence of frames. It’s correctly found using back-filtering (reversing the Kalman filter when it’s located a few frames further on), but showing it here would be misleading.
Regardless, we are able to identify where players are within a frame of a camera view. This isn’t their coordinate on the court. In fact, we can extract positions using this camera frame, but it’s tedious and excessively painful. Much of the pain comes from what is effectively single-source collection, multi-source splicing, and the resulting obfuscation. This means the primary source for finding coordinates is the sideline camera, yet we need more than just a sideline camera to identify good points. It also means that the sideline camera isn’t the only feed we get. We also get baseline close-ups, overheads, and sideline close-ups. These immediately kill the CNN requirements above. Finally, this also means that even with our best option, we lack the ability to properly identify depth… the camera doesn’t tell us its zoom level, and unless we have enough of the notch filter from above, we are unable to back out the zoom effect.
Instead, the camera systems for SportVU and Second Spectrum are mounted above the court. This allows a proper fixed-point analysis with little to no obfuscation. Therefore the CNN-filter-classify method runs exceptionally smoothly and the remainder of the positioning problem is a classical angle-of-arrival problem.
Now that we can obtain boxes of players, referees, and basketballs, and we know the location of the cameras, we can start to build a positioning model for identifying the location of a player. So let’s start simple with the basketball.
Since the camera systems are fixed, we know which pixels the court occupies.
Given that knowledge, if we classify the basketball, we obtain six proposed locations, one from each camera. We can think of each pixel as projecting a line from that portion of the camera to the ball. Remember that depth is not solved using one camera.
The resulting position of the ball is then the best intersecting point of these six straight lines. Since the cameras are fixed and the court is fixed, all we really need to know is the angle-of-arrival of the line from the ball into the camera. The width of the resulting pixel at the location of the ball is simply the associated sampling error of the position estimate.
For angle-of-arrival analysis, we set the origin of the reference frame at the center of the six cameras. Therefore, if we average the (x,y,z)-coordinates of the cameras, we obtain (0,0,0). We then treat the location of the basketball as unknown b = (x,y,z) whereas each camera is listed as c_i = (x_i, y_i, z_i), all known locations. Using the pixels which the ball falls into, we obtain two particular angles: alpha_i and beta_i for camera i. Alpha measures the left-right directionality while beta measures the up-down directionality. This is what we measure through the CNN-filter-classify pixel matching method.
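To see how the two angles turn into a position, here is a minimal sketch that converts each (alpha, beta) pair into a unit direction and then finds the point closest, in the least-squares sense, to all six lines. The angle convention (alpha measured in the horizontal plane from the x-axis, beta measured up from horizontal), the camera layout, the ball position, and all function names are assumptions; the cameras are simply placed so their average is (0, 0, 0), with the court below them.

```python
import numpy as np


def direction_from_angles(alpha, beta):
    """Unit vector for azimuth alpha and elevation beta (assumed convention)."""
    return np.array([np.cos(beta) * np.cos(alpha),
                     np.cos(beta) * np.sin(alpha),
                     np.sin(beta)])


def angles_to_target(camera, target):
    """Angles a camera would measure for a target (used to fake measurements)."""
    dx, dy, dz = np.asarray(target, float) - np.asarray(camera, float)
    return np.arctan2(dy, dx), np.arctan2(dz, np.hypot(dx, dy))


def intersect_rays(cameras, angles):
    """Point minimizing the summed squared distance to every camera ray."""
    A = np.zeros((3, 3))
    rhs = np.zeros(3)
    for c, (alpha, beta) in zip(cameras, angles):
        u = direction_from_angles(alpha, beta)
        M = np.eye(3) - np.outer(u, u)   # projects onto the plane normal to the ray
        A += M
        rhs += M @ c
    return np.linalg.solve(A, rhs)


# Hypothetical camera layout whose average is the origin.
cameras = np.array([[-40.0, -25.0, 0.0], [0.0, -25.0, 0.0], [40.0, -25.0, 0.0],
                    [-40.0, 25.0, 0.0], [0.0, 25.0, 0.0], [40.0, 25.0, 0.0]])
ball = np.array([10.0, -5.0, -30.0])     # made-up ball position below the cameras

angles = [angles_to_target(c, ball) for c in cameras]
estimate = intersect_rays(cameras, angles)
print(np.round(estimate, 6))
```

With noise-free angles the six lines meet at a single point. With real pixel error they will not, and the same solve returns the best compromise.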
We can leverage trigonometry to help identify necessary parts of the model before writing the statistical equation for locating the basketball. First, we can use the measured elevation angles (beta) and the distances between the cameras to deduce the estimated distances from the cameras to the basketball.
We can apply the law of cosines to extract estimated distances between each camera and the basketball. Note that, due to error sources within the cameras, we may get differing estimates for the same camera-to-basketball distance. But that’s alright; we propagate those errors into the statistical system. I typically use the mean of the extracted distances. For a basketball, this commonly jitters the center of the ball within 1-2 feet.
Once we obtain the estimated distances, d_i, from each camera c_i to the ball, b, we can solve the system of equations:
This system does not include camera biases, and the error terms are suppressed. The left-hand side holds the measured pseudo-ranges, while the right-hand side is a collection of squared distances between the six cameras and the basketball. We can easily solve this using Newton-Raphson root-finding. By doing this, we obtain the three-dimensional position of the basketball!
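As a rough sketch of that solve, assume the system takes the standard trilateration form d_i^2 = (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 for i = 1, ..., 6. Because six equations constrain only three unknowns, the code below uses the least-squares (Gauss-Newton) form of the Newton step, which is an assumption about how the root-finding is carried out. The camera layout, ball position, starting guess, and the name `locate_ball` are all hypothetical.

```python
import numpy as np


def locate_ball(cameras, distances, start, iterations=25, tol=1e-10):
    """Solve ||b - c_i||^2 = d_i^2 for b with Newton-style least-squares steps."""
    b = np.asarray(start, dtype=float).copy()
    for _ in range(iterations):
        offsets = b - cameras                          # one row per camera
        f = np.sum(offsets ** 2, axis=1) - distances ** 2
        J = 2.0 * offsets                              # Jacobian of f with respect to b
        step, *_ = np.linalg.lstsq(J, -f, rcond=None)
        b += step
        if np.linalg.norm(step) < tol:
            break
    return b


# Hypothetical camera layout whose average is the origin.
cameras = np.array([[-40.0, -25.0, 0.0], [0.0, -25.0, 0.0], [40.0, -25.0, 0.0],
                    [-40.0, 25.0, 0.0], [0.0, 25.0, 0.0], [40.0, 25.0, 0.0]])
ball = np.array([10.0, -5.0, -30.0])                   # made-up truth
distances = np.linalg.norm(cameras - ball, axis=1)     # error-free pseudo-ranges

# These cameras share one height, so a mirror-image solution exists above
# them. Starting the search below the cameras selects the court-side one.
estimate = locate_ball(cameras, distances, start=[0.0, 0.0, -20.0])
print(np.round(estimate, 6))
```

Feeding in distances perturbed by a foot or two, in line with the jitter mentioned earlier, is a quick way to see how that error carries into the position estimate.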
Next Steps: Follow the Bouncing Ball!
Now that the hard work is out of the way, we are able to start looking at characteristics of the basketball. Let’s take, for instance, a snapshot of the game mentioned above: a SportVU snippet from a 2016 Phoenix Suns at Sacramento Kings game. We start crudely using a possession frame. This is the block of time, between 5:28 and 5:15 in the first period, during which the Phoenix Suns have the basketball.
Now, we don’t have video. Instead, we can compute distances and leverage the possession frame to identify who has the ball and what is happening. To compute who has the ball, we can start with computing the distance between the ball and the player. We know the Suns are on offense. Therefore, we know Brandon Knight has the ball.
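A minimal sketch of that distance rule follows. The four-foot reach threshold, the team and player IDs, the coordinates, and the function name `ball_handler` are all made up for illustration; the one idea taken from the text is to restrict the search to the team we know is on offense.

```python
import math


def ball_handler(ball_xy, players, offense_team, max_reach=4.0):
    """Return the offensive player nearest the ball, or None if out of reach.

    players is a list of (team_id, player_id, x, y) tuples in feet.
    """
    candidates = [
        (math.hypot(x - ball_xy[0], y - ball_xy[1]), player_id)
        for team, player_id, x, y in players
        if team == offense_team
    ]
    if not candidates:
        return None
    distance, player_id = min(candidates)
    return player_id if distance <= max_reach else None


# Hypothetical frame: team 1 is on offense, team 2 is defending.
players = [
    (1, 11, 61.5, 25.5),   # offensive guard near the ball
    (1, 12, 75.0, 10.0),
    (2, 21, 60.5, 25.0),   # defender who is actually closer to the ball
    (2, 22, 80.0, 30.0),
]
handler = ball_handler((60.0, 25.0), players, offense_team=1)
print(handler)  # 11
```

The defender sits closer to the ball than the guard here, which is why knowing who is on offense matters before taking the nearest player.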
We can even see when a basic pass occurs. What kind of pass is it and where is it going?
While Brandon Knight brought the ball up-court, he makes a pass to Mirza Teletovic. This occurs roughly 3.5 seconds into the possession. To identify the type of pass, we can look at the z-profile of the basketball.
At that 3.5-second mark, we see the ball get picked up and passed to Teletovic. Notice the ball has a sharp point and comes upward higher than normal. That’s Teletovic receiving a bounce pass and pulling the ball up. Notice that Teletovic never dribbles the ball. Teletovic is on the wing, approximately 19 feet from the basket. In fact, a sequence of screens occurs in an attempt to free up a player Teletovic will eventually pass to, who is…
…not Devin Booker. The ball appears to be closing in on Booker; however, it is skipped back over to Brandon Knight on a fifteen-foot pass. Knight catches the skip pass and does something subtle that shows up in the z-coordinate of the basketball: he pump fakes before taking the shot, a catch-and-shoot field goal attempt from 23 feet out.
Knight misses the three-point attempt, as we see in the secondary bounce above the rim (orange line). DeMarcus Cousins secures the rebound and the possession ends. In three dimensions, the play unfolded as such:
Filtering for Analytics
At this point, we can start developing a rule-based system to tease out some basic analytics. For instance, we can use the distance from the ball to the player to help understand touches. Given touches, we can define what a dribble looks like in the data. Similarly, we can use point-to-point relationships to help understand passes and types of passes. In the example above, we saw four dribbles, one bounce pass, one skip pass, one pump fake, and one catch-and-shoot for the Suns’ offensive possession.
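As one possible starting point for such rules, the sketch below treats a floor contact as a local minimum in the ball’s height under a threshold, then calls it a dribble if the same player touches the ball before and after, or a bounce pass if the toucher changes. The one-foot threshold, the per-frame touch labels, the function names, and the synthetic height profile are assumptions. The profile was built by hand to mimic the four dribbles and one bounce pass described above and is not real tracking data.

```python
def floor_contacts(z, floor_height=1.0):
    """Indices where the ball height hits a local minimum near the floor."""
    return [
        i for i in range(1, len(z) - 1)
        if z[i] < floor_height and z[i] <= z[i - 1] and z[i] < z[i + 1]
    ]


def classify_bounces(z, handler, floor_height=1.0):
    """Count dribbles and bounce passes from height and per-frame touches.

    handler[i] is the player touching the ball on frame i, or None.
    """
    dribbles = bounce_passes = 0
    for i in floor_contacts(z, floor_height):
        before = next((h for h in reversed(handler[:i]) if h is not None), None)
        after = next((h for h in handler[i + 1:] if h is not None), None)
        if before is None or after is None:
            continue                      # loose ball: not enough context
        if before == after:
            dribbles += 1                 # same player before and after
        else:
            bounce_passes += 1            # the ball changed hands via the floor
    return dribbles, bounce_passes


# Synthetic ball heights in feet: four dribbles by player "A",
# then one bounce pass from "A" to "B".
z = [3.0, 2.0, 0.5, 2.0, 3.0] * 4 + [3.0, 1.5, 0.4, 1.8, 3.5]
handler = ["A", None, None, None, "A"] * 4 + ["A", None, None, None, "B"]

print(classify_bounces(z, handler))  # (4, 1)
```

The touch labels here would come from a ball-to-player distance rule, and the same skeleton extends to skip passes or pump fakes by adding rules on how far and how high the ball travels between touches.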
Armed with this knowledge of trilateration and the statistical/machine learning process of extracting position estimates, how would you start developing new techniques to measure some tracking quantities such as dribbling, passes, or even screening? But proceed with caution… someone might just call you a nerd.