Stochastic Tracking

In the era of tracking data, a need for a new style of analysis has emerged. Long gone are the regularized regression models and the simple counting techniques. Instead, we lean on shot-noise distributed systems such as Dan Cervone’s competing risks model, Matthias Kempe’s self-organizing maps, or Peter Carr’s imitation learning. The list is fairly large. Each of these methods attempts to capture the spatio-temporal nature of the tracking data and applies some flavor of artificial intelligence or machine learning to the mix. And the results are, well… mixed.

One of the basic issues is that the tracking data isn’t nearly what it seems. In fact, each player is represented as a singleton: a single point on the court. This is effectively the center of a bounding box fit around a player of interest. It does not take into account the player’s height, broadness, or wingspan. Therefore, the single point treats Ty Lawson and Anthony Davis as the same person when they occupy the same point. To combat this, we can augment the data and include personal information for each player. We’ve even done some of that in correcting “real-estate” for rebounds. The point is, each point is really a point estimate generated from a distribution of likely locations where a player resides.

So what does this mean for us?

A Simple Exercise: A Naive Velocity Estimate

Let’s take an excessively naive estimator of the velocity of a player. Assume that a player’s location at time i is given by p_i. If we want to estimate the velocity of the player at that time, we may compute

v_i = (p_{i+1} - p_{i-1}) / (2*Δt)

Note that we take the future time point and subtract the previous time point and divide by the amount of time elapsed between the measurements. For standard NBA tracking data, the delta-t value is typically .04 seconds. This is effectively the simplest estimator for the velocity, and is usually wrong when put into practice. Let’s see why.

Suppose a player is traveling at a constant speed of 9 miles per hour over a sequence of three points. Suppose the player starts at p_1 = (32.8021, 19.0808). Over the following two time-steps, the player moves at a constant speed of 9 miles per hour to p_2 = (33.2594, 19.3448) and p_3 = (33.7167, 19.6088). On the court, these three consecutive time points look like:

Plot of three consecutive time points of a player moving at 9 miles per hour. Starting point (red dot) to end point (purple dot) lasts .08 seconds in total.

As we can immediately see, there appears to be very little movement over the .08 seconds. If we compute the velocity of the player at the second time step, we apply the above formula and obtain a velocity vector of v_2 = (11.4325, 6.6000). Computing the length of this vector, we obtain 13.2008 feet/second (9.00 miles per hour). Any slight offset from exactly 13.2 feet/second comes from rounding the coordinates.

This means that the player of interest is moving at a speed of 9.00 miles per hour at the second time step. Without any extra information, we are unable to estimate the speeds using this estimation process for time steps 1 or 3. However, this isn’t the problem. Instead, we realize that the data points are still merely a sample estimate.
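For anyone who wants to check the arithmetic, here is a minimal sketch of the naive estimator applied to the three points above. The function name and the unit-conversion constant are our own choices for illustration, and positions are assumed to be in feet.

```python
import math

DT = 0.04                      # seconds between frames, as quoted above
FT_PER_S_TO_MPH = 3600 / 5280  # feet/second -> miles/hour


def central_velocity(p_prev, p_next, dt=DT):
    """Naive velocity at the middle frame: (future - previous) / elapsed time."""
    return tuple((b - a) / (2 * dt) for a, b in zip(p_prev, p_next))


# First and third points from the example (the middle point is not needed).
p1 = (32.8021, 19.0808)
p3 = (33.7167, 19.6088)

vx, vy = central_velocity(p1, p3)
speed_fps = math.hypot(vx, vy)
speed_mph = speed_fps * FT_PER_S_TO_MPH

print(f"v_2 = ({vx:.4f}, {vy:.4f}) ft/s")
print(f"speed = {speed_fps:.4f} ft/s = {speed_mph:.2f} mph")
```

Note that the estimator needs a frame on each side, which is why the first and last frames of a track get no estimate.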

Let’s assume this player is Tobias Harris. His registered wingspan is 6’11”. Let’s assume, for the sake of argument, that Harris’ shoulder-to-shoulder measurement is roughly one-fifth of his wingspan, or about 16 inches. This is the width of the bounding box that the data measurement tool is attempting to capture. From camera resolution and the speed and positioning of the player, we may end up with approximately 1-4 inches of error on the measurement. Let’s suppose the best-case scenario, in which the player’s position is missed by one inch. Then the velocity estimate can fluctuate anywhere between 8.3 mph and 9.6 mph. Immediately, we have a swing of over an entire mile per hour in measuring a player just due to sampling noise, or the stochasticity of the collection process.
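To get a feel for where a swing like that comes from, here is a rough sketch under one particular assumption: only the last of the three points is mis-measured, by exactly one inch, along the direction of travel. That is just one way a one-inch miss could play out, and it lands near the range quoted above rather than reproducing it exactly. The helper names are ours.

```python
import math

DT = 0.04
INCH = 1 / 12                  # one inch, in feet
FT_PER_S_TO_MPH = 3600 / 5280

p1 = (32.8021, 19.0808)
p3 = (33.7167, 19.6088)


def naive_speed_mph(p_prev, p_next, dt=DT):
    """Speed from the naive central-difference estimator, in miles per hour."""
    dx, dy = p_next[0] - p_prev[0], p_next[1] - p_prev[1]
    return math.hypot(dx, dy) / (2 * dt) * FT_PER_S_TO_MPH


# Unit vector along the direction of travel.
dx, dy = p3[0] - p1[0], p3[1] - p1[1]
length = math.hypot(dx, dy)
ux, uy = dx / length, dy / length


def shifted(p, inches):
    """Move a point along the direction of travel by the given number of inches."""
    return (p[0] + inches * INCH * ux, p[1] + inches * INCH * uy)


low = naive_speed_mph(p1, shifted(p3, -1))   # last point measured one inch short
high = naive_speed_mph(p1, shifted(p3, +1))  # last point measured one inch long

print(f"one inch short: {low:.2f} mph, one inch long: {high:.2f} mph")
```

The key point is the division by .08 seconds: a one-inch miss is tiny on the court, but it is roughly 8% of the distance covered in two frames.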

The takeaway here is that when using a naive, and rigid, estimator on tracking data that is stochastic in nature, we may end up with quite undesirable or misleading results. So how do we begin to incorporate the stochasticity of the tracking data?

Physics First…

Before we tackle stochasticity, let’s first revisit how players actually move. We visited this idea in a previous post last year, and in that exercise we used the naive estimators. It was simple [notice how “choppy” the velocity and acceleration plots are] but effective in introducing player motion.

First, for a player to move from a point, p_i, to a different point, p_{i+1}, they must have a non-zero velocity. This is captured in the equation

v_i = (p_{i+1} - p_i) / Δt

Similarly, we must identify changes in velocities. To capture this, we must estimate the acceleration of a player. Here, acceleration is captured using the equation

a_i = (v_{i+1} - v_i) / Δt

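And from the kinematics equations, in the x-direction, writing p_x, v_x, and a_x for the x-components of position, velocity, and acceleration, we have

p_{x,i+1} = p_{x,i} + v_{x,i}*Δt + (1/2)*a_{x,i}*Δt^2
v_{x,i+1} = v_{x,i} + a_{x,i}*Δt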

The result is analogous for the y-direction.

The idea is this:

For a series of two time points, .04 seconds in duration, we assume that acceleration is constant. That is, if a player is slowing down, they are slowing down constantly (piecewise) over a sequence of points. This assumption is required since we cannot estimate the acceleration between two time points. From this estimated constant acceleration, we can update the velocity of the player. And from this estimated velocity, we can retrieve the position of the player. However, for a measured value of the position, we can correct (or smooth) the estimates of the position, velocity, and acceleration.
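As a small sketch of that update, here is one frame of constant-acceleration motion. The function name is ours, and the starting velocity and the braking value of 5 feet/second² are hypothetical numbers chosen only to show the mechanics.

```python
def kinematic_step(p, v, a, dt=0.04):
    """Advance position and velocity by one frame under constant acceleration."""
    p_next = tuple(pi + vi * dt + 0.5 * ai * dt**2 for pi, vi, ai in zip(p, v, a))
    v_next = tuple(vi + ai * dt for vi, ai in zip(v, a))
    return p_next, v_next


# Starting point and velocity taken from the 9 mph example.
p = (32.8021, 19.0808)
v = (11.4325, 6.6)

# With zero acceleration we simply coast to the next point of the example.
p_coast, v_coast = kinematic_step(p, v, a=(0.0, 0.0))

# Hypothetical: the player brakes at 5 ft/s^2 in the x-direction for one frame.
p_brake, v_brake = kinematic_step(p, v, a=(-5.0, 0.0))

print("coasting:", p_coast, v_coast)
print("braking: ", p_brake, v_brake)
```

Over a single frame the acceleration barely moves the position (the Δt² term is tiny), but it shows up clearly in the velocity. That is why velocity estimates are so sensitive to small position errors.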

This leads us to a basic stochastic framework called filtering. And if we make even more rigid assumptions (which are generally accepted), such as Gaussian errors, then we obtain Kalman Filtering.

Kalman Filtering: Application to Player Tracking

With all this theory set up, let’s just walk through an exercise of motion for Tobias Harris. Recall that we only measure position estimates. From above, we see that these estimates may be fairly noisy. Let’s identify where we can model this noise.

To start, we have a motion model. This motion is the physics noted above. For two points, p_{i-1} and p_i, we note that we have no information in between these two time steps and therefore assume acceleration is constant over this .04 second period of time. Note: This does not assume that acceleration is the same between two different intervals; just constant within each interval… hence the velocity is piecewise linear.

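This motion model will update our state of the system. To be clear, the state of our system is the position and velocity of the player of interest. The state of the system and the motion model are given by

x_i = (p_x, p_y, v_x, v_y)’ at time step i
x_i = F*x_{i-1} + G*a_{i-1}

where F is the 4×4 matrix with rows (1,0,Δt,0), (0,1,0,Δt), (0,0,1,0), (0,0,0,1), and G is the 4×2 matrix with rows (Δt^2/2,0), (0,Δt^2/2), (Δt,0), (0,Δt).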

This ugly matrix formulation is just the same equations as above! We just broke out the x- and y-directions of the player motion. To make this a stochastic model, we inject randomness by suggesting that the acceleration is noisy. Therefore, we suggest that a_i is a two-dimensional Gaussian centered at zero with some constant variance. This acceleration will help us capture the changes we see in the velocity (recall the 8.3 versus 9.6 mph example above), and inherently capture the position error as well.
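Here is a minimal sketch of that motion model in NumPy. The state ordering (x-position, y-position, x-velocity, y-velocity), the matrix name G for the acceleration terms, and the acceleration spread SIGMA_A are assumptions made for illustration.

```python
import numpy as np

DT = 0.04
SIGMA_A = 5.0  # hypothetical standard deviation of acceleration, ft/s^2

# State ordering assumed here: (x position, y position, x velocity, y velocity).
F = np.array([[1, 0, DT, 0],
              [0, 1, 0, DT],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

# G maps a 2-D acceleration into its effect on position and velocity.
G = np.array([[0.5 * DT**2, 0],
              [0, 0.5 * DT**2],
              [DT, 0],
              [0, DT]])


def propagate(state, accel):
    """One frame of the motion model: kinematics written as a matrix equation."""
    return F @ state + G @ accel


rng = np.random.default_rng(0)
state = np.array([32.8021, 19.0808, 11.4325, 6.6])

coast = propagate(state, np.zeros(2))        # no acceleration: pure coasting
accel = rng.normal(0.0, SIGMA_A, size=2)     # zero-mean Gaussian acceleration
next_state = propagate(state, accel)

print("coasting state:", coast)
print("noisy state:   ", next_state)
```

With zero acceleration the position simply advances to the next point of the 9 mph example; the random draw nudges the velocity (and, much less, the position) away from that.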

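However, this is not all. We don’t measure velocities; we only measure positions. So we must introduce a measurement model that ties the measured positions to the state, and through it helps capture the velocity information. In our set-up, the measurement model is given by

z_i = H*x_i + (e_x, e_y)’

where H is the 2×4 matrix with rows (1,0,0,0) and (0,1,0,0).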

We call the measurement z, while the true state is given by x. This means we never get to see x in the raw; but rather we see z. The values e_x and e_y are the x- and y-errors in measuring the position of a player. We call this the measurement error. In the case for Tobias Harris, this would be something to the effect of, say, 1 inch in each direction.
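A quick sketch of the measurement model, assuming positions are in feet and reading the one-inch error as the standard deviation of a zero-mean Gaussian in each direction. Both are our assumptions for illustration.

```python
import numpy as np

# H picks the two position entries out of (x, y, x velocity, y velocity).
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

SIGMA_M = 1 / 12               # assumed measurement error: one inch, in feet
R = SIGMA_M**2 * np.eye(2)     # measurement covariance


def measure(state, rng):
    """Observe only the position, corrupted by Gaussian measurement error."""
    return H @ state + rng.normal(0.0, SIGMA_M, size=2)


rng = np.random.default_rng(1)
state = np.array([33.2594, 19.3448, 11.4325, 6.6])  # true state, never seen directly
z = measure(state, rng)

print("true position:    ", H @ state)
print("measured position:", z)
```

Notice that the velocity entries of the state never appear in z. Everything we learn about velocity has to come through how the positions change from frame to frame.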

The way we interpret this model is as follows:

We hold an assumed value for the current state of the system: position and velocity of a player, x. We then observe a measurement for that player’s position, z. Using that measurement, z, we can update the expected state of the system, x, to a smoother, more realistic version x|z.

And it’s this conditional representation that gives rise to the use of Hierarchical Bayesian Modeling for estimating player position and velocities.

Hierarchical Bayesian Modeling

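We start by rewriting the motion model as a simple matrix equation:

x_i = F*x_{i-1} + G*a_{i-1}

where F propagates the state over one time step and G maps the random acceleration into position and velocity.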

Similarly, we rewrite the measurement model as a simple matrix equation:

z_i = H*x_i + e_i

where e_i = (e_x, e_y)’ is the measurement error.

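Our goal is to then find x_i | z_i using the stochastic model above. To do this, we adhere to Hierarchical Bayesian modeling, which suggests that, given measurements z_1, …, z_i:

x_{i|i} = E[x_i | z_1, …, z_i]
P_{i|i} = Var[x_i | z_1, …, z_i]
x_{i|i-1} = E[x_i | z_1, …, z_{i-1}]
P_{i|i-1} = Var[x_i | z_1, …, z_{i-1}]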

These four equations are the mean and variances for the filtering and the forecasting distributions, respectively. The index i|i is the filter process obtained through the measurement model. This is an update of the state vector given the measurements. The index i|i-1 is the forecast process obtained through the motion model. This is a prediction of the current state given the previous state and all previous measurements.

Since we are assuming a Gaussian model (from the error structure), we simply have a Gaussian-Gaussian Hierarchical model. Therefore, computing these expected values is straightforward.

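For simplicity, let’s rewrite the matrix equations one more time:

x_i = F*x_{i-1} + w_i, with w_i ~ N(0, Q)
z_i = H*x_i + e_i, with e_i ~ N(0, R)

Here w_i is the process noise coming from the random acceleration, and e_i is the measurement error.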

All we have done is absorb the errors into an “error” matrix. Without getting bogged down in the details of the mathematics, we obtain the following algorithm:

Parameters of Interest: F, H, R, Q

Initial Conditions: x_0|0 = constant, P_0|0 = constant matrix

Forecast to time-step 1
1. Apply the motion model with no noise to obtain the predicted state of the system: x_1|0 = F*x_0|0
2. Propagate the covariance through the motion model and add the process noise, Q, to obtain the predicted covariance matrix of the system: P_1|0 = Q + F*P_0|0*F’
Compute the Kalman Gain
1. K_1 := P_1|0*H’*(H*P_1|0*H’ + R)^{-1}
Filter using the measurement at time-step 1
1. Filter the state of the system: x_1|1 = x_1|0 + K_1*(z_1 – H*x_1|0)
2. Filter the covariance matrix of the system: P_1|1 = (I – K_1*H)*P_1|0
Repeat for all time-steps in the future.
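The steps above can be sketched as a single function. This is a minimal illustration assuming NumPy arrays, not a tuned implementation, and the function name is ours. The tiny one-dimensional example at the bottom uses made-up numbers so the result can be checked by hand.

```python
import numpy as np


def kalman_step(x, P, z, F, H, Q, R):
    """One pass of forecast, gain, and filter for a single new measurement z."""
    # Forecast: push the state and its covariance through the motion model.
    x_pred = F @ x
    P_pred = Q + F @ P @ F.T

    # Kalman gain: how much of the innovation to apply.
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)

    # Filter: correct the forecast using the measurement.
    innovation = z - H @ x_pred
    x_filt = x_pred + K @ innovation
    P_filt = (np.eye(len(x)) - K @ H) @ P_pred
    return x_filt, P_filt, K


# One-dimensional toy example with made-up numbers.
x0 = np.array([0.0])          # prior state
P0 = np.array([[1.0]])        # prior variance
z1 = np.array([2.0])          # measurement
F1 = H1 = np.eye(1)           # state carries over and is observed directly
Q1 = np.zeros((1, 1))         # no process noise in this toy case
R1 = np.eye(1)                # measurement variance equal to the prior variance

x1, P1, K1 = kalman_step(x0, P0, z1, F1, H1, Q1, R1)
print(x1, P1, K1)  # state 1.0, variance 0.5, gain 0.5
```

In the toy case the prior and the measurement are equally uncertain, so the gain is one half and the filtered state lands exactly halfway between the forecast (0) and the measurement (2).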

There are two very special items of note here: the Kalman Gain, K_i, and the innovations, z_i – H*x_i|i-1. The Kalman gain is an artifact of a matrix inversion algorithm used when computing the posterior distribution of the Hierarchical Bayesian model. Through the algorithm, it identifies how much weight should be given to the measurement versus the current state of the system. We see it is heavily driven by the measurement noise, R, and the process noise, Q.

The innovations are the differences between the measured value of z_i and its predicted value, H*x_i|i-1. This is helpful in understanding how much correction is actually needed in the filtration step. And due to the smoothness of the system, the Kalman gain acts as a gate for how much correction is applied, as we know the measurement is still noisy.
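To see the gating effect, here is a one-dimensional sketch of the gain in which the state is observed directly, so F and H are both 1. The numbers are made up purely to show the trend.

```python
def scalar_gain(P_prev, Q, R):
    """Kalman gain for a directly observed scalar state (F = H = 1)."""
    P_pred = P_prev + Q          # forecast variance
    return P_pred / (P_pred + R)


# Made-up values: previous variance 1, process noise 1.
for R in (0.01, 1.0, 100.0):
    K = scalar_gain(P_prev=1.0, Q=1.0, R=R)
    print(f"R = {R:>6}: gain = {K:.4f}, correction for an innovation of 1 = {K * 1.0:.4f}")
```

When the measurement is trusted (small R), the gain is close to 1 and nearly the whole innovation is applied. When the measurement is noisy (large R), the gain shrinks toward 0 and the forecast is mostly kept. Raising Q has the opposite effect, since a less trusted motion model hands more weight to the measurement.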

With the stochastic model in hand, let’s run the algorithm.

Player Motion in Action!

Let’s revisit Tobias Harris on a transition play. From the April 1st, 2018 game between the Indiana Pacers and the Los Angeles Clippers, we have a transition play during the first quarter. Harris leaks out on transition on the left side-line, eliciting a lead pass from Lou Williams. Harris accelerates to receive the pass and uses his momentum to blow by Indiana’s Darren Collison for a dunk.

Despite pushing the lead for the Clippers to 5, the Pacers would go on to win 111 – 104. But let’s break out the sequence of points that represent this drive. In total, there are roughly 100 points along this break. Plotting the transition, we see the player track:


Track path of Tobias Harris on transition play.

If we compute the naive velocities, we obtain the speed plot:

We see the moments where Harris slows up right before the pass is made, the acceleration and attempt to avoid contact with Collison, and then the pair of dribbles made before bursting towards the basket for the dunk. However, this is a very noisy plot and gives the impression that Harris takes two major slow-downs to dribble. That’s not necessarily the case.

Kalman-Filter Breakdown

If we assume a favorable one-inch error in the measurement model, R, and a slight perturbation in the motion model, Q, we can start building the Kalman Filter for the stochastic distribution. Note that R is a 2×2 matrix representing the errors in the position estimate, and Q is a 4×4 matrix representing the errors associated with the model. Here, we just use R to be the one-inch error along the diagonal; zero everywhere else. For Q, we simply use an identity matrix.

The propagation matrix (motion model), F, is a 4×4 matrix with rows (1,0,.04,0), (0,1,0,.04), (0,0,1,0), (0,0,0,1). The measurement matrix, H, is a 2×4 matrix with (1,0,0,0) and (0,1,0,0) as its rows.

Then, assume that the initial state vector is the first observation with zero velocity and the initial covariance matrix is just the identity matrix. We use this to forward propagate the system to obtain a predicted position and predicted covariance. The first predicted state is merely the first point again.
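As a sketch of this set-up and the very first forecast: the first observation below is a hypothetical point borrowed from the 9 mph example, positions are assumed to be in feet, and the one-inch error is read as a standard deviation.

```python
import numpy as np

DT = 0.04

F = np.array([[1, 0, DT, 0],
              [0, 1, 0, DT],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

R = (1 / 12) ** 2 * np.eye(2)  # assumption: one inch (in feet) as a standard deviation
Q = np.eye(4)                  # identity, as described above

first_obs = np.array([32.8021, 19.0808])        # hypothetical first measurement
x0 = np.concatenate([first_obs, np.zeros(2)])   # first observation, zero velocity
P0 = np.eye(4)                                  # identity initial covariance

# Forecast to time-step 1.
x_pred = F @ x0
P_pred = Q + F @ P0 @ F.T

print("predicted state:\n", x_pred)
print("predicted covariance:\n", P_pred)
```

Because the initial velocity is zero, the forecast leaves the position where it started, while the covariance grows and picks up position-velocity cross terms equal to Δt.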

Due to the position scraping algorithm I’m employing for video, the locations are in a funky notation. Apologies in advance. But here’s the first iteration of the Kalman Filter process:

This may look a little overwhelming, but don’t let it fool you. It’s simple to perform by following the above “Four Step Process.” Carrying out this process, we obtain almost the identical path; except smoothed to account for measurement error.


Comparison of the Raw Data Tracking (Purple Track) and the Stochastic Track (Lime Green Track).

This is what we expect! Next, we should hope to see the actual acceleration of Tobias Harris; with near collision with Collison and two dribbles. Watching the video, we know that he does not perform a “herky-jerky” motion but rather ramps up, avoids contact, and accelerates into the basket.

Comparison of “Naive Estimator” for speed (Blue Line) and the Stochastic Tracking estimate for speed (Red Line).

And this is indeed what we see. We also see the immediate termination at the dunk as Harris’ momentum carries him about the rim, whereas the naive estimate suggests Harris somehow accelerates 8 feet per second while hanging on the rim.

So there you have it: a basic introduction to Kalman Filtering and a basic model for stochastically measuring tracking data. In case you are a glutton for punishment like I am, if you’d like to see the proof of the material, feel free to scroll through my handwritten notes below. Happy tracking!


And yes… simple code:
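What follows is a minimal sketch of the whole loop under stated assumptions, not a definitive implementation. A synthetic constant-velocity track at roughly 9 mph stands in for the scraped positions, measurements carry one inch of Gaussian error, and the process noise is built from a hypothetical acceleration spread (SIGMA_A) rather than the identity matrix used in the walkthrough. The initial covariance is also loosened on the velocity terms so the filter spins up within a few frames; that, too, is our choice.

```python
import numpy as np

DT = 0.04
SIGMA_M = 1 / 12   # assumed measurement std dev: one inch, in feet
SIGMA_A = 5.0      # hypothetical acceleration std dev, ft/s^2

F = np.array([[1, 0, DT, 0], [0, 1, 0, DT], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
G = np.array([[0.5 * DT**2, 0], [0, 0.5 * DT**2], [DT, 0], [0, DT]])
Q = SIGMA_A**2 * (G @ G.T)     # process noise induced by random acceleration
R = SIGMA_M**2 * np.eye(2)     # measurement noise


def kalman_filter(zs, p0_diag=(1.0, 1.0, 100.0, 100.0)):
    """Filter a sequence of position measurements; returns one state per frame."""
    x = np.array([zs[0, 0], zs[0, 1], 0.0, 0.0])  # first observation, zero velocity
    P = np.diag(p0_diag)
    states = [x]
    for z in zs[1:]:
        x_pred = F @ x                               # forecast state
        P_pred = Q + F @ P @ F.T                     # forecast covariance
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        x = x_pred + K @ (z - H @ x_pred)            # filter state
        P = (np.eye(4) - K @ H) @ P_pred             # filter covariance
        states.append(x)
    return np.array(states)


def naive_speed(zs):
    """Central-difference speed; defined for interior frames only."""
    v = (zs[2:] - zs[:-2]) / (2 * DT)
    return np.linalg.norm(v, axis=1)


# Synthetic track: constant velocity plus Gaussian measurement error.
rng = np.random.default_rng(0)
n = 100
true_v = np.array([11.4325, 6.6])
t = np.arange(n)[:, None] * DT
truth = np.array([32.8021, 19.0808]) + t * true_v
zs = truth + rng.normal(0.0, SIGMA_M, size=(n, 2))

states = kalman_filter(zs)
filtered_speed = np.linalg.norm(states[:, 2:], axis=1)
raw_speed = naive_speed(zs)

# Compare both speed estimates to the truth after a short spin-up period.
true_speed = np.linalg.norm(true_v)
burn = 20
naive_rmse = np.sqrt(np.mean((raw_speed[burn:] - true_speed) ** 2))
filt_rmse = np.sqrt(np.mean((filtered_speed[burn:] - true_speed) ** 2))
print(f"naive speed RMSE:    {naive_rmse:.3f} ft/s")
print(f"filtered speed RMSE: {filt_rmse:.3f} ft/s")
```

The choice of Q and R is where the real modeling happens. A larger SIGMA_A lets the filter follow sharp cuts more quickly at the cost of a noisier speed curve, while a larger SIGMA_M smooths harder and lags more.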


Squared Statistics: Understanding Basketball Analytics

Possession-level analytics for the pre-play-by-play NBA era. Historical RAPM data, 1985–1996.