For instance, let’s take a look at the New Orleans Pelicans and the Minnesota Timberwolves from last season. Both teams finished with a 111.4 offensive rating and slightly different defensive ratings: 112.6 (Pelicans) to 112.9 (Timberwolves). At a casual glance, we’d expect these teams to end up roughly the same in the standings, as they are both from the same conference. And to a degree, that’s effectively what happened. The Timberwolves finished the year at 39-43 while the Pelicans clambered in at a Zion-winning 36-46.

Despite these ratings, we have barely scratched the surface with these two teams. For starters, the Pelicans played roughly 220 more possessions than the Timberwolves: 8497 to 8279 on offense and 8504 to 8278 on defense. This suggests that the Pelicans played at a faster **pace** than the Timberwolves. And this is indeed the case at the top level, where the average possession for the Pelicans lasted 13.94 seconds (3rd in the league) to the Timberwolves’ 14.37 seconds per possession (14th in the league).

Remember, these teams have identical offensive ratings. Therefore, combined with adjustments for pace, we should see the same distribution of **potential offensive possession-ending categories**: Field Goal Attempts, Free Throw Attempts, and Turnovers. Here, we make the assumption that end-of-period possessions are negligible, as teams tend to have nearly identical amounts of period-ending possessions.

In this case, we find that the Pelicans had 140 extra turnovers compared to the Timberwolves, but 74 fewer free throw attempts. Weighting free throws by the usual 0.44 factor, that’s an estimated 140 − 0.44 × 74 ≈ 107 extra possessions from the Pelicans. Since the Pelicans played roughly 218 more offensive possessions, we expect, if all extra field goals are misses with defensive rebounds, the Pelicans to have roughly 110 extra FGA’s as a best-case scenario for breaking down an offensive possession. Instead, we find that the Pelicans have a measly 80 extra FGA’s. This means we are missing 30 possessions when comparing these teams…

The reason for this is the **chance**.

A **chance** is defined to be a segment of a possession that results in a field goal attempt, a free throw attempt that results in a potential change of possession, or a turnover. It is the segment of time that breaks up a possession into actions that result in loose balls (rebounds) or outright changes of possession (out-of-bounds, steals). Every chance, like a possession, has a point value attached to it. And a nice relationship of possessions and chances is given by
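Since each offensive rebound starts a new chance within the same possession, this relationship can be written (a reconstruction, assuming that convention):

```latex
\text{Chances} \;=\; \text{Possessions} \;+\; \text{Offensive Rebounds}
```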

From this relationship, we can model possessions as a collection of chances. Using chances, we can start to decompose players into their **roles within a chance**. While we understand that all possessions are not equal, chances are also not equal. Back in May, we took a look at the impact of turnovers on possessions/chances, in both dead ball and live ball situations. Four months later, Seth Partnow took a more in-depth look at the typical “five” categories for ending possessions:

- Live Ball Turnovers (steals)
- Defensive Rebounds on Missed FGA’s
- Dead Ball Situations
- Offensive Rebounds
- Munged Category of FGM’s, FTA’s, and DREBS on FTA’s.

These categories are almost a perfect partitioning of points. Steals lead to zero points on offense. Dead Ball situations are effectively dead ball turnovers with zero points on offense. Defensive Rebounds on Missed FGA’s are zero points on offense. However, Offensive Rebounds are not necessarily pointless chances, as they may come off of missed FTA’s after a basket (and-1) or a missed back-end of FTA’s. We make this note to identify that the category that deserves the most care in analysis of chances is the offensive rebound.

In particular, it is this category quantity that drives the well-known **Second Chance Points** statistic. And it is here that Minnesota “steals” possessions away from New Orleans in the comparison above.

Taking a step back from Seth’s “five” categories, a chance has traditionally been defined through field goal attempts, free throw attempts, and turnovers. Traditionally, from box score data, the number of chances has been represented as
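In box-score terms, this takes the familiar form (reconstructed from the standard convention, with the 0.44 free-throw weight discussed below):

```latex
\text{Chances} \;\approx\; \text{FGA} \;+\; 0.44 \times \text{FTA} \;+\; \text{TOV}
```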

We parse this as follows: a field goal attempt leads to one of four results: a **defensive rebound**, an **offensive rebound**, a transfer of possession due to a **made field goal**, or a **free throw attempt due to a foul**. A free throw attempt, obtained either through continuation or non-continuation, is traditionally viewed as having a forty-four percent chance of ending the chance: transferring possession to the other team through a **made free throw**, becoming a **defensive rebound**, or staying with the offense through an **offensive rebound**. There are other nuanced situations with free throws, such as a free throw that results in a turnover due to a lane violation on the offense, but we will deem them negligible.

Using this traditional setting, we find that the Timberwolves attempted 9,435 chances compared to the Pelicans’ 9,570; a difference of 135 chances. While this doesn’t explain the full 30 possessions that seem to be missing, we find that the Timberwolves had more offensive rebounds and therefore more opportunities to score per possession than the Pelicans.

With the notion of chance outlined above, the real goal of this article is to identify subtle artifacts of team dynamics. For instance one year, while working for an Eastern Conference team, I was out scouting a college game with another analyst. During the game, the analyst mentioned something about how three 25% usage players cannot coexist on the same team because their usages are too high, there’s only one ball.

I mentioned that usage cannot be added as it’s a ratio. I was told, “Usage is not a ratio. Usage is usage.” It was a very alarming comment to hear, especially from an analyst on a team. But nonetheless, I retorted that all probabilities must sum to one in the end. Unfortunately, I was scoffed at over the notion that probabilities had to add to one…

Despite the pushback, the traditional form of **usage** is a ratio of chances completed by a player given the number of chances possible during the player’s time on the court. The current standard model for usage is given by
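Using the chance definition above, this can be written (a reconstruction consistent with the notation explained next, where the subscript **P** denotes the player and **Pt** the team while the player is on the court):

```latex
\text{USG}(P) \;=\; \frac{\text{FGA}_{P} + 0.44\,\text{FTA}_{P} + \text{TOV}_{P}}{\text{FGA}_{Pt} + 0.44\,\text{FTA}_{Pt} + \text{TOV}_{Pt}}
```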

There is an abuse of notation here, but we will explain it. The value **P** is the player of interest, while **Pt** is the time at which a player is on the court. This means the numerator is the number of chances executed by a player, while the denominator is the number of chances executed by the team while the player is on the court. The resulting value of usage is then the percentage of chances executed by the player, **P**. Commonly this value is multiplied by 100 to help readers understand that it is a percentage.

The above formula is the classic **play-by-play** version of usage. In the **box-score** version, adjustments using minutes played and a factor of five emerge to estimate the denominator. This is exactly the form found on Basketball Reference.
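For reference, Basketball Reference’s box-score estimate is:

```latex
\text{USG\%} \;=\; 100 \times \frac{\left(\text{FGA} + 0.44\,\text{FTA} + \text{TOV}\right) \times \left(\text{Tm MP}/5\right)}{\text{MP} \times \left(\text{Tm FGA} + 0.44\,\text{Tm FTA} + \text{Tm TOV}\right)}
```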

What is not so well established is that usage is a **conditional statistic**, dependent on a **sampling frame**. This means usage has to be treated with care when being discussed. Making claims about two 25% usage players is *almost* completely meaningless if they are not sampled using the same sampling frame.

In the end, chances have to be gobbled up by players and ultimately someone on a team will gobble up over 20% of chances. Let’s take a look at this over a simulation…

Consider a game of 2-on-2 with teams of 4 players. In this case, there are six potential lineups. If players are labeled A, B, C, and D; the lineups are labeled as AB, AC, AD, BC, BD, and CD. Suppose we witnessed 1000 chances played by this team and they have the following breakdown:

- Lineup AB: 300 chances
- Lineup AC: 200 chances
- Lineup AD: 200 chances
- Lineup BC: 100 chances
- Lineup BD: 100 chances
- Lineup CD: 100 chances

Also suppose there is a secret **true underlying usage** of each player. That is, there is a real probability that a player would complete a chance given across the team. This probability **must add to one across the team**. Using this true underlying usage probability, we can then simulate chances and obtain an observed usage value.
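As a sketch of how one such simulation can be generated, here is a minimal Python version. The lineup chance counts come from the list above; the truth vector here is a hypothetical stand-in, since the example’s actual true usages are being kept secret for now:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true usages for players A, B, C, D (they must sum to one);
# the example's actual secret values are NOT assumed here.
true_usage = np.array([0.50, 0.25, 0.15, 0.10])

# The six 2-player lineups (as player indices) and their chance counts from above
lineups = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
counts = [300, 200, 200, 100, 100, 100]

executed = np.zeros(4)  # chances executed by each player
exposed = np.zeros(4)   # chances each player was on the floor for

for players, n in zip(lineups, counts):
    p = true_usage[list(players)]
    p = p / p.sum()                   # conditional usage within this lineup
    draws = rng.multinomial(n, p)     # allocate the lineup's n chances
    for idx, player in enumerate(players):
        executed[player] += draws[idx]
        exposed[player] += n

observed_usage = executed / exposed   # the play-by-play usage formula, per player
print(observed_usage)
```

Each lineup’s chances are multinomial draws over that lineup’s renormalized true usages, which is exactly why the observed usages fail to sum to one across the team.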

Note that from this rotation, A plays in 700 chances, B plays in 500 chances, C plays in 400 chances, and D plays in 400 chances. Running one simulation, we find that the usage for each player is given by **(.801, .404, .410, .1825)**, for players A, B, C, and D, respectively. This is obtained from the usage formula above!

First, we see that this is not a true usage, as the players’ probabilities do not sum to one. Normalizing will not give us a remotely close answer. The normalized answer is **(.446, .225, .228, .101). **

Second, we see that **team usage** is much closer to the truth, but not quite there either. Team usage, being the percentage of **total team** chances, changes the denominator in usage to look at all chances, regardless of whether the player of interest is in the game. In this case, the team usage is **(.561, .202, .164, .073)**. However, we can do better with estimation.

Since we have a perfect sampling frame, called a **Balanced Incomplete Block Design **(BIBD), we can apply the associated algebra to recover the true usages of each player with respect to their team.

Have you figured out the true usages of each player?

The example above highlights an important distribution in basketball analytics: the **incomplete multinomial distribution**. This distribution states that while there is a collection of options we can select from, we cannot observe all options simultaneously.

In the case of lineups, we cannot play the entire roster at the same time. We can only select five players. In technical terms, we are looking for the probability that player **Yi** within a lineup **Ci** from a team of players **Ai** will execute the chance:
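In symbols (a reconstruction from the description that follows):

```latex
P\!\left(Y_i \mid C_i\right) \;=\; \frac{p_i}{\sum_{j \in C_i} p_j}
```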

where the **p**‘s are the players’ **true usage** on the team. In real life, we never know the values of **p**, just as we agonizingly forced upon ourselves in the example above. Therefore, we take our sampling design (lineups) and observed usages at the player and lineup levels and perform an estimation procedure.

The likelihood function for the incomplete multinomial distribution is given by:
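In the form used by Dong and Yin (reconstructed to match the variables described below):

```latex
L(\mathbf{p}) \;\propto\; \prod_{i} p_i^{\,a_i} \; \prod_{j} \left(\mathbf{S}_{(j)}\,\mathbf{p}\right)^{b_j}
```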

where **p** is the vector of true usages for the players on a team, **a** is the observed chances executed by each player, **b** is the variable cell counts associated with the different lineups, and **S** is the matrix indicating the player lineups.

For the example above, we have
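Reading the design off the rotation above (the **a** vector here is inferred from the single simulated draw reported earlier; rows of **S** correspond to lineups AB through CD):

```latex
\mathbf{a} = (561,\, 202,\, 164,\, 73), \qquad
\mathbf{b} = -(300,\, 200,\, 200,\, 100,\, 100,\, 100), \qquad
\mathbf{S} = \begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1
\end{pmatrix}
```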

We use the designation of the count **b** as being negative due to the offset of **chances – first player**. Yes, the players are ordered in terms of most used to least used.

Attempting to solve for **p** in this distribution is challenging. The maximum likelihood method leads to a series of equations:
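One way to write these equations, consistent with the fixed-point code further below (with **s** the sum of all entries of **a** and **b**), is:

```latex
\tau_{(i)} \;=\; \frac{b_i}{\mathbf{S}_{(i)}\,\mathbf{p}}, \qquad
p_k \;=\; \frac{a_k}{s - \left(\mathbf{S}^{\mathsf{T}}\boldsymbol{\tau}\right)_k}, \qquad
\sum_k p_k \;=\; 1
```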

Drat, there’s that pesky sum of probabilities must be one, again. Seriously, however, we have four equations resulting from the MLE problem. Note that the **(i)** term indicates the row vector in **S**. The value, **TAU**, is an auxiliary vector that arises in the computation of the MLE; and therefore can be “injected” into the second equation, provided the inverse necessarily exists in the top equation.

As there is no analytic solution for the MLE, we can perform an optimization. A proposed algorithm by Fanghu Dong and Guosheng Yin identifies a numerical method for applying a fixed-point iteration methodology for finding an optimized maximum likelihood estimator for the incomplete multinomial distribution. They call this the **Weaver Algorithm** after the mechanical weaving machine.

```python
import numpy as np

# Lineup incidence matrix S (rows: AB, AC, AD, BC, BD, CD; columns: A, B, C, D)
delta = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
                  [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)

a = np.array([561., 202., 164., 73.])                # chances executed per player (inferred from the team usages above)
b = -np.array([300., 200., 200., 100., 100., 100.])  # (negative) lineup chance counts
trueP = np.array([0.60, 0.20, 0.15, 0.05])           # true underlying usage

p = np.ones(4) / 4.                                  # initial guess
s = np.sum(a) + np.sum(b)
error = 1.
while error > .00001:
    tau = b / np.dot(delta, p)                       # auxiliary vector
    temp = a / (s * np.ones(4) - np.dot(delta.T, tau))
    pup = temp / np.sum(temp)                        # renormalized update
    error = np.dot(p - pup, p - pup)
    p = pup

print('True Usage: ', trueP)
print('Estimate:   ', p)
print('Uses:       ', a)
```

Using this code above, we obtain estimates of the true usages:

That is, the estimated true usages are **(.587, .185, .154, .073)**, which is much closer to the truth of **(.60, .20, .15, .05)**.

For the 2018-19 NBA season, the Brooklyn Nets had a total of 19 players on roster that led to 637 different lineups of the 11,628 possible lineup combinations. Of these 637 lineups, 439 lineups managed to draw at least one chance. That is, a total of 198 lineups such as

Jared Dudley, Ed Davis, Shabazz Napier, Joe Harris, and Treveon Graham

played together in at least one game and registered zero chances.

Restricting ourselves to all lineups that participated in at least one chance, we find the distribution of chances executed by player:

Here, we see that D’Angelo Russell maintained the most chances for the Nets with 1860 estimated chances. Significantly behind Russell was Spencer Dinwiddie at 1137 chances. Using conditional usage, this is 31.1% for Russell and 24.2% for Dinwiddie.

At the team level, however, these numbers turn out to be much smaller. Running Weaver’s algorithm, it turns out that Russell’s overall usage is closer to 24 percent!

We see that Dzanan Musa’s usage gets corrected to reflect his playing time and that, despite only taking 19% of Brooklyn’s overall chances, D’Angelo Russell doesn’t tumble down towards 19%. Instead he “corrects” to 24.16%.

Using the unbalanced incomplete block design, we obtain an estimated 31.4% usage across all lineups, not far off from the measured 31.1% as before. Therefore, recovery using the sampling frame shows that Russell’s “scheduling” as a 24.16% player would indeed result in a 31% usage player.

We can compare the Nets to the Denver Nuggets. The Nuggets played with a total of 18 players during the 2018-19 season and used far fewer lineups that resulted in chances: 329.

The Nuggets are used as a contrast only to show what a “top-heavy” team does with respect to true usage:

In this case, we see the Nuggets primarily use Jamal Murray and Nikola Jokic. This is no surprise. Their respective values of 24.9 and 27.4 come down a bit, but the relationship/offset remains relatively the same.

Here we see the comparison of Denver and Brooklyn as usage of players in order of highest to lowest. In this case, we see that despite Denver having the “star power” of Jokic and Murray, they also maintain a stable 7-man rotation; whereas Brooklyn has a 5-man rotation. Typically, teams want to run with 7-8 man rotations.

Brooklyn’s knock on usage comes from the cost of injury, as the team managed to have only 6 players play 65 or more games; a minimum of 80% of the season.

The takeaway here is that we get to use an incomplete sampling frame to begin to understand the underlying value of players within a system. A significant challenge for this algorithm, however, is the aspect of injury.

Under this model, the rotations are assumed to be at the discretion of the coach. However, a player may not play due to injury. Therefore, a much more advanced model needs to be used. That, in turn, is called the **censored incomplete multinomial** model. But that’s for another day.

The grand challenge is the ability to adequately measure a player’s basketball IQ. Instead, we focus on components such as court vision. A player may have wonderful court vision but limited mechanical ability (compared to their counterparts) to score. Likewise, some players may be physical beasts and can devastate competition without understanding the value of the pass; like Wilt Chamberlain before he was encouraged to pass more and went on to lead the league in assists two seasons later. But how do we measure a player’s court vision?

One method to measure court vision is by **proxy**: the process of taking observable values and applying them to parts of what is agreed upon to be a **sub-task** of true underlying measurement of interest. We say sub-task as many proxies may be used to create an overall understanding of the quantity of interest.

For example, what makes a “great” defender? We could use a proxy of **steals** but not all defenders are credited with a steal even if their defense causes it. We could use a proxy of **blocks **but not all blocks take away possessions (only chances). We could use coverage of a player, but now we have to define that term in a way that people can agree. Or we can eschew defense and infer it from a higher level through **regression methods**.

For court vision, we focus on the offensive component and look at one of the proxies: **passing directionality**. We choose passing directionality because while it is a very simple item to understand, there is an underlying difficulty that arises when trying to say anything intelligent about it, and we have the **cut locus **to blame.

Passing directionality is the **direction in which a player attempts a pass**. For every pass a player makes, the ball exits with an angle from some **reference frame**.

To gain an understanding of a reference frame, consider an airplane traveling over the surface of the Earth. We care about two horizontal vectors, **East** and **North**. North points out the nose of the plane. East points out the right wing. But we also care about **Up**, which identifies where the ground is below us; a very important thing to know when flying. If, at any time, the ground crosses into **North**, our plane is pointed directly at the ground. Here, North is the principal vector of the reference frame and the angle towards the ground is the **azimuth**.

In terms of passing, since we never know the way a player is facing, we proxy their reference frame by assuming players always want to go to the basket. Therefore, the reference frame always has the **principal vector** facing the basket. In this case, any passes in the direction of the basket will have azimuths between **-90 and 90 degrees**. Any passes away from the basket will have azimuths between **-180 and -90 degrees**; as well as between **90 and 180 degrees. **

Here, we also note that passes to the left of the player are positive angles: **0 to 180 degrees **while passes to the right of the player are negative angles: **-180 to 0 degrees. **

Now, if we look at a pass from this player’s position, we have two vectors in the **embedded space** of the court. Using the embedded space of the court allows us to identify the angle from the principal vector in the reference frame. This is through the **dot product:**
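In its familiar form (the sign of the angle, left versus right, then comes from the cross product of the same two vectors):

```latex
\cos\theta \;=\; \frac{\mathbf{P} \cdot \mathbf{Q}}{\lVert \mathbf{P} \rVert \, \lVert \mathbf{Q} \rVert}
```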

Here, **P** is the reference vector, defined by the location of the basket **(25,5.25)** from the player **(x,y)**. Then **P = (25 – x, 5.25 – y)**. We similarly compute **Q**, the pass vector as the receiving player **(x’,y’)** from the player. Hence **Q = (x’ – x, y’ – y)**.

In code, this is relatively straightforward with start being the player and end being the receiving teammate:
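A minimal sketch of that computation (the function name and the use of `atan2` to recover the signed angle are this sketch’s own choices; the basket location **(25, 5.25)** follows the coordinates given above):

```python
import numpy as np

def pass_azimuth(start, end, basket=(25.0, 5.25)):
    """Signed pass direction (degrees) relative to the passer-to-basket vector.

    start: passer location (x, y); end: receiver location (x', y').
    0 degrees points at the basket; +/-180 is the cut locus directly away
    from the basket. The sign convention (left positive, right negative)
    depends on the court coordinate orientation.
    """
    P = (basket[0] - start[0], basket[1] - start[1])  # reference vector
    Q = (end[0] - start[0], end[1] - start[1])        # pass vector
    dot = P[0] * Q[0] + P[1] * Q[1]                   # |P||Q| cos(theta)
    cross = P[0] * Q[1] - P[1] * Q[0]                 # |P||Q| sin(theta)
    return np.degrees(np.arctan2(cross, dot))
```

For example, a pass aimed straight at the basket returns 0 degrees, while a pass directly away from it lands on the cut locus at 180 degrees.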

Now that we have directions of passes computed, we can start to do some analysis… Unfortunately, we just opened a big can of worms. Namely, passes are no longer **Euclidean**. Instead, they are **spherical data of dimension one**, and computing something as simple as a histogram fails (gives false results).

Using the reference frame approach, we now have a collection of angles. As the angles range from -180 to 180 degrees, we describe a **circle** instead of **Euclidean space**. The key differences are that **ONE: **the differences in pass direction are measured in **angles** not distance; and **TWO: **we have a cut locus. A cut locus is a place where multiple “straight lines” converge at the same point. Using key difference one, we are saying that a straight line is the **arc length of the circle **defining the direction of the pass. Using the reference frame above, the cut locus is at 180 degrees! This is a player making a pass directly away from the basket.

Knowing that tracking data is not quite deterministic (we can get different measurements for the same player location), we should not rely on the vectors directly, but instead focus on the **probability distribution** of a player’s pass direction. This amounts to computing a density estimate on the circle.

If we perform a naive analysis and apply a straightforward kernel density estimator, the cut locus will give us a probability jump and throw away potentially important data. For instance, if a pass is made at 179 degrees with a reasonable error of three degrees, then we know the pass can be made between (176, 180) degrees **AND** (-180, -178) degrees. The usual KDE will ignore the second interval and the resulting interpretation is that the pass simply “disappears.” Unfortunately, passes cannot disappear into the Upside Down.

Instead, we must perform **manifold kernel density estimation** to understand the distribution of passing direction.

The usual kernel density estimator is given by
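In its standard form:

```latex
\hat{f}(x) \;=\; \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```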

where **n** is the sample size, **h** is the bandwidth, and **K** is the kernel smoothing function. For a given player, we can look at a collection of **n** passes of interest. Each pass is then viewed as a noisy estimate with some measurement error (bandwidth). The resulting kernel function is how that measurement noise is distributed about the measurement.

In classical kernel density estimation, the most common kernel function is the Gaussian smoother:
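That is:

```latex
K(u) \;=\; \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}
```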

So we must use an analogous version of this with the circle in mind. To this end, we can leverage the **von Mises** distribution:
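Its density is:

```latex
f(\theta \mid \mu, \kappa) \;=\; \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi\, I_0(\kappa)}
```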

which only runs over the angles between -180 and 180 degrees. Here, **mu** is the mean direction and **kappa** is the **concentration**. Think of concentration as the inverse of variance: the larger **kappa** is, the tighter the distribution is about the mean. And thanks to the cyclic nature of the cosine function, we ensure that passes don’t disappear when they cross the cut locus.

The sacrifice we make is that the bandwidth is no longer separable. Under the von Mises distribution, the bandwidth is absorbed into **kappa** and is contained within the **modified Bessel function of the first kind, order zero**, defined by
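In integral form:

```latex
I_0(\kappa) \;=\; \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{\kappa \cos\theta}\, d\theta
```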

which ever-so-nicely ensures that our probability distribution integrates to one!

Using this set-up, our circular kernel density estimator is given by
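Averaging a von Mises kernel centered at each observed pass direction, we get:

```latex
\hat{f}(\theta) \;=\; \frac{1}{2\pi n\, I_0(\kappa)} \sum_{i=1}^{n} e^{\kappa \cos(\theta - \theta_i)}
```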

where now a **larger concentration** plays the role of a **smaller bandwidth** from the traditional kernel density estimation methodology.
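As a sketch, this circular estimator takes only a few lines of Python (the function name and default concentration are illustrative choices, not from the original analysis):

```python
import numpy as np

def vonmises_kde(theta_grid, samples, kappa=25.0):
    """Circular KDE: average of von Mises kernels centered at each sample.

    theta_grid and samples are angles in radians on (-pi, pi]. kappa acts
    as an inverse bandwidth: larger kappa means less smoothing.
    """
    diffs = np.asarray(theta_grid)[:, None] - np.asarray(samples)[None, :]
    # np.i0 is the modified Bessel function of the first kind, order zero
    return np.mean(np.exp(kappa * np.cos(diffs)), axis=1) / (2 * np.pi * np.i0(kappa))
```

Because the kernel depends only on the cosine of the angular difference, the estimate wraps smoothly across the cut locus at ±180 degrees instead of dropping mass there.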

As an application test case, let’s look at a subsample of passes from Steven Adams of the Oklahoma City Thunder. Here, we take the position of Adams at every pass, calculate the angle between the pass and the basket at Adams’ position, and mark that as an orange dot. Using zero degrees as the reference frame’s principal vector, we draw the circular manifold in green and apply the kernel density estimator above:

Here we see that Adams primarily makes passes to his front left at approximately 45 degrees and to his right front at approximately 40 degrees (320 degrees on the plot). Here, the green circle represents the manifold which describes the passing direction. Zero degrees always points to the basket. The blue line is the kernel density estimator. This shows that Adams primarily attacks the rim with his passes, but tends to favor his left.

At a high level this is informative; however, we have lost the court information. We don’t know where Adams is making passes. More importantly, we don’t know if his passes are location dependent. For example, does Adams pass differently from the left elbow than from the right elbow?

To understand this, we must compute a conditional distribution.

When we condition the circular kernel density estimate, we begin to see the dependence of the passing directions based on the player position. Steven Adams is not a great example to show this off. Instead let’s take a look at John Wall.

At the left of the top of the arc, we find that Wall primarily passes towards the corner. Note that this doesn’t mean the passes actually go to the corner; only that they are aimed in that direction. These can be passes leading into a give-and-go with a post player as well. However, we see that at this location, his passes tend to go left and forward at about 80 degrees.

However, if Wall moves in to the foul line, we see the distribution of his passing change to looking in two directions with a slight preference to the right. At a cursory glance, this may be a reaction to a weak-side defender stepping up to cut off potential drives. From the angular point of view, this is most likely a “kick” pass to the weak-side three point line, as the direction points to below the break.

If Wall gets into the lane, we see that his passing almost goes entirely to the right. The small pocket to the left is pointing towards the dunker position, which is most likely a dump pass to get the ball out of congestion. The blip right to the rim is an oop passing lane. However, predominantly (over 75% of the time) the pass is getting kicked out.

Let’s put this all together and simulate a drive by Wall.

Here we see how the “court vision” with respect to passing plays out through the course of a drive. We can now start to perform other methods of analysis to better understand changes in passing vision; such as “do weak side defenders help?” or “If I position a defender in this location…” We can partition the distributions and perform circular distributional tests.

For the remainder of the article, let’s enjoy the subtle differences of players…

While Brogdon is on the Pacers for the 2019-20 season, his data was collected for the 2018-19 season with the Milwaukee Bucks. Here, we see a very Milwaukee-centric style of play.

As he traverses the same path as Wall, the passing vectors go **backwards**, most likely towards Giannis Antetokounmpo and Brook Lopez. But as Brogdon attacks the rim, his passing directions change to the dump pass to the strong-side block and out back to the right wing on the weak side. This was a Bucks’ staple in exposing the weak-side collapse for open looks at the perimeter.

This type of passing regimen comes from a **distributor** who is not a primary option on scoring, but rather a player that protects the ball and forces the defense to swarm. These players tend to look away from the basket and create mid-range and beyond-arc opportunities.

Jaylen Brown follows a similar pattern to Brogdon with one slight difference: A high-low passing vector emerges during the drive. Boston, notorious for off-ball players beating their defenders baseline, shows that Brown looks for that pass during the drive.

Also notice that the zero vector is almost pinched to absolutely no distributional weight. This is very apparent when Brown gets deep into the lane. This indicates that Brown **is not passing to score**. He is going for the layup. In Brogdon’s case, he is still looking for a dump underneath the hoop, or a dunk from a teammate. In Brown’s case, he’s attacking the rim himself.

The left and right bulges in the distribution within the key are dumps and kick-outs. He tends to look for “at-the-break” players and anyone near the strong-side block. This is the profile of an **attacking guard **within a system.

Looking at one of the biggest names in the game, LeBron James is yet another style of player. James fits the profile of an attacking guard but without a system.

James starts with the standard perimeter passing profile at the top of the key, but as he drives into the lane, he predominantly looks short corner strong side and below-the-break weak side. Once he gets into the lane, he becomes one of the most dangerous players in the league: **uniform attacker**.

This type of distribution shows that all the weight goes just forward. There is a slight bulge towards the strong block, but weak elbow, weak below-the-break, and anywhere near the rim becomes primary options. This suggests that James **reads and reacts** to the defense accordingly.

Once he gets deep into the lane, there are three passing options: the strong-side dunker dump, the kick-out to the weak side, and the alley-oop to the strong-side rim. There is no wonder why James averages 8-9 assists per game despite being a premier scorer and playing usual starter minutes (~35 a game).

Steph Curry follows the same form as LeBron James when it comes to attacking the rim. Curry keeps a large distribution in front of him, as opposed to peaking in certain directions.

That is, until he gets into the lane. At this point, he begins to look weak-side block. It is at this position that players such as Klay Thompson are cutting behind the defense (Boston-esque in that nature) or players like Andre Iguodala and Shaun Livingston were waiting for the defense to collapse to get potentially open looks at 3-10 feet from the rim.

Compared to Wall above, who almost completely turns into a right-only vision player, we begin to see why Golden State is more likely than the Washington Wizards to beat you from anywhere: half the court disappears on drives in Wall’s case.

Armed with some simple manifold nonparametric learning such as circular kernel density estimation, we can begin to understand some of the vision associated with players. This is merely one small piece of the pie in decision making.

However, we are able to start performing testing of certain player capabilities or schemes. We are also able to “scout” player tendencies, and more importantly: **quantify them. **

And even more so importantly, we are able to start attaching probabilities to actions. Instead of quantifying passes by proxies through “win probability” or “change in shot quality,” we can now quantify the probability of a pass as “how likely will he actually make this pass?” At the scouting level, it tells me where I can make an adjustment.

Plus, the animations are a little cool, too… right?

where **Y** is the vector of offensive ratings, **W** is a diagonal matrix of possessions played by the stint of interest, **X** is the player-stint matrix, and **(sigma, tau)** are the likelihood and prior variance, respectively. Putting this all together, and leveraging conjugate distributions, we find that the posterior distribution is indeed a Gaussian distribution:
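In symbols, a reconstruction consistent with the estimate and variance discussed below (writing **lambda = sigma^2 / tau^2**):

```latex
\boldsymbol{\beta} \mid \mathbf{Y} \;\sim\; N\!\left(\left(\mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{X} + \lambda \mathbf{I}\right)^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{Y},\;\; \sigma^2\left(\mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{X} + \lambda \mathbf{I}\right)^{-1}\right)
```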

From this seemingly tedious calculation, we find that the RAPM estimate for each player is given by
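That is, the posterior mean:

```latex
\hat{\boldsymbol{\beta}}_{\text{RAPM}} \;=\; \left(\mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{X} + \lambda \mathbf{I}\right)^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{Y}
```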

This is exactly the RAPM estimate that you see given on many of those other fancy websites. To put this into the offensive-defensive RAPM context, let’s understand again what this equation is doing. First, we have the offensive rating, **Y**, which is effectively points scored per 100 possessions (not differential as in some other forms of RAPM). Multiplying by **W** turns the quantity **WY** into a “100 times points scored.” Since the design matrix, **X**, is stints by players, the value **XtWY** is merely identifying the stints for which offensive players contributed points and defensive players discounted points and adding such stints together. This quantity is effectively “100* plus-minus” for each player.

Now, that inverse quantity… The quantity **XtWX** is counting the number of possessions each “tandem” has played in. The diagonal element will be “10 times the number of possessions played,” as there are 10 players on the court. The off-diagonal elements are some multiple of possessions played, where the multiple reflects players who play multiple stints together. In a previous post, we saw that just using this quantity in the inverse led to a mathematically unreasonable solution: reducibility… which led to infinite variances. Here, we rectify this by introducing that prior weight. This is a mechanism that biases the final result but allows us to obtain a reasonable variance on the final estimate. This means the inverse quantity is “inverted” possessions played between teammates. The inverse identifies some extent of the correlation between players playing together (or against each other).

Therefore, the final estimate is in “effective points per 100 possessions given some prior variance **tau**” for each player. In code form, using some pre-determined **tau** (I selected 5000 because that’s from literature) we have
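A minimal sketch of that computation (the function name and interface are illustrative; `lam` plays the role of **lambda = sigma^2/tau^2**, and the design matrix is assumed to carry an intercept as its last column, matching the **beta[-1]** term discussed next):

```python
import numpy as np

def rapm(X, Y, possessions, lam=5000.0):
    """Ridge (RAPM) estimate: beta = (X'WX + lam*I)^{-1} X'WY.

    X: stint-by-player design matrix (last column an intercept),
    Y: offensive rating per stint,
    possessions: per-stint possession counts (the weights W).
    """
    W = np.diag(np.asarray(possessions, dtype=float))
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ W @ Y)
```

Note that this sketch penalizes every coefficient, including the intercept, exactly as the closed form above is written.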

The value **beta[-1] **is the intercept term, effectively meaning “baseline offensive rating.” Here that value was approximately 98. Courtesy of Ryan Davis, we obtain stint data and run this code to get the following output:

It’s not quite what we find on his website, but they are close. In fact, in his tutorial, Davis has Nurkic as 13th overall and Ingles as 14th overall. Above we have them at 12th and 13th. However, LeBron James is 19th in the above list while he drops to 36th on Davis’ list. Also note that Davis’ RAPM estimates are smaller, which indicates his **tau** is even smaller than ours (leading to a larger **lambda**). Regardless, we have effectively the same results.

What we also get from the tedious computation is the variance term associated with each RAPM estimate:
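In this notation, that variance term is the posterior covariance:

```latex
\operatorname{Var}\left(\hat{\beta}\right) = \sigma^{2}\left(X^{\top} W X + \lambda I\right)^{-1}
```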

Since this is a regression model, we are able to estimate **sigma** by computing the residuals of the model:
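Using the weighted residuals, the estimator takes the form:

```latex
\hat{\sigma}^{2} = \frac{\left(Y - X\hat{\beta}\right)^{\top} W \left(Y - X\hat{\beta}\right)}{N - 2P - 1}
```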

where **N** is the number of stints observed, **P** is the total number of players observed, and the term **N-2P-1** identifies the number of degrees of freedom within the regression model. Note that there is a subtlety here: we assumed that every player who has played a single possession on offense has also played at least a single possession on defense. This is not a guarantee; therefore we may change **2P** to be **P_o + P_d**, where **P_o** is the number of players who have played at least one offensive possession and **P_d** is the number of players who have played at least one defensive possession.

At this point, much of the focus of RAPM is placed on determining the prior variance, **tau**. Typically, folks will eschew prior variance estimation by instead applying cross-validation to identify a “best” **lambda** term. In the outside literature, this value has ranged between 500 and 5000. In conjunction with estimation of **sigma** from the regression setting, we can extract an estimate of the prior variance through **hat{sigma} / hat{lambda}**.

What’s even nicer is that since we have a Gaussian posterior distribution, we know the highest posterior density (HPD) interval is equivalent to the standard confidence interval for Gaussian random variables. In this case, we can follow the simple “estimate **+/-** critical value **x** standard error” formulation. For a single player, we can look at the marginal distribution, which is itself a Gaussian distribution. For a lineup of interest, we look at the joint of the individual player marginals.

Let’s apply the above techniques to the 2018-19 NBA Season. Courtesy of Ryan Davis, we obtain a stint file for which a row of data corresponds to indicators for the five players on offense, indicators for the five players on defense, the number of possessions played, and the number of points scored. From this data set, we can extract the values of **Y**, **W**, and **X** accordingly. For simplicity, let’s just assume that **tau = 5000**. The above table shows the “Top 25 players.” In terms of coding the error, we simply run:
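A sketch of that error computation in NumPy (again with illustrative variable names, mirroring the estimator itself) is:

```python
import numpy as np

def rapm_errors(X, possessions, tau=5000.0, sigma2=1.0):
    """Posterior standard errors: the square root of the diagonal of
    sigma^2 * (X'WX + lam*I)^{-1}."""
    W = np.diag(possessions)
    lam = sigma2 / tau
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    cov = sigma2 * np.linalg.inv(A)  # posterior covariance matrix
    return np.sqrt(np.diag(cov))
```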

Replicated with variance terms and marginal confidence bounds we now have

Here we see that Danny Green is atop the leaderboard. While this would suggest that Danny Green is the biggest contributor to net ratings, we know this is really not the case. What’s more important is that we should take a look at his variance term. Looking at the marginal of Danny Green’s Offensive and Defensive ratings, we can compute the confidence interval for the net rating. In this case, the 95% confidence interval for Danny Green’s net rating is given by
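Using the standard difference-of-correlated-Gaussians variance, that interval is:

```latex
\left(\hat{\beta}_{o} - \hat{\beta}_{d}\right) \pm 1.96\,\sqrt{\sigma_{o}^{2} + \sigma_{d}^{2} - 2\rho\,\sigma_{o}\sigma_{d}}
```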

where **sigma_o** and **sigma_d** are Green’s offensive and defensive RAPM variances, respectively. The value **rho** is the correlation between Danny Green’s offensive and defensive numbers. For this exercise, Green’s offensive/defensive covariance matrix is given by

Using this variance-covariance matrix, Green’s Net Rating of 4.66 is really viewed as some value in between

Comparing this to the rest of the league, we see that 39 other players fit within this confidence bound, indicating that despite being the “league leader,” Danny Green really identifies within the Top 39 players in the league. This is a “best case scenario” for identifiability. In fact, if we grab the **200th** player in the league, Kyle Korver, we find that his confidence interval is **[ -1.21, 4.16]**. This indicates that Korver is equivalent to 460 other players in the league; ranging from **Giannis Antetokounmpo (6th) to Damyean Dotson (464th)**.

Now let’s extend this out to a starting unit. For sake of argument, let’s look at a “starting” lineup for the Brooklyn Nets during the 2018-19 NBA season. Using starts as a proxy, suppose the starting lineup is D’Angelo Russell, Joe Harris, Jarrett Allen, Rodions Kurucs, and Caris LeVert. Using the single-season RAPM estimates above, we obtain offense-defense ratings for Russell, Harris, Kurucs, LeVert, and Allen (respectively):

with associated variance-covariance matrix:

The expected Net Rating of this lineup is then **0.5953**. Constructing the univariate variance term, we rely on the variance of a sum of correlated variables derivation:
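For the five players, that derivation is the standard identity:

```latex
\operatorname{Var}\left(\sum_{i=1}^{5} X_{i}\right) = \sum_{i=1}^{5}\operatorname{Var}\left(X_{i}\right) + 2\sum_{i<j}\operatorname{Cov}\left(X_{i}, X_{j}\right)
```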

Reading these right from the variance-covariance matrix above, we obtain a stint net rating variance of **12.7695**. This indicates that the confidence interval for the expected net rating of the Brooklyn Nets’ starting lineup is **[-6.4086, 7.5992]**, which is quite a considerable range over the span of 100 possessions.

It should be noted that we treat this type of analysis with the utmost of care. Recall that we are only using roughly 71,000 stints. For 530 NBA players, this means we have **at best** 1.5x10^-17 **percent** of all possible 10-man lineups. So deviating outside of any observed lineups is quite prohibitive. Therefore, while building a dream lineup around **Joel Embiid, Anthony Davis, Andre Drummond, Jusuf Nurkic,** and **Justise Winslow** might have phenomenal RAPM considerations, it is not part of the **sampling frame** and is therefore non-representable (read that as: not meaningful) in results.

In intel analyst speak: “We cannot determine what’s going on in Zimbabwe if all we do is look at Cincinnati and Rhode Island.”

Over the series we have created about RAPM, we’ve identified several of the benefits gained by regularization while noting the various pitfalls if we simply embrace the numbers. While we know the central limit theorem actually fails and ratings are not necessarily Gaussian at each conditional stint level (part 3), we can perform the regularization to impose a PCA-like solution (part 2) to understand ratings better than in basic APM (part 1). However, we see that the variances are still relatively inflated and that we do not get a great understanding of player impact; see Kyle Korver above as the standard test case. Instead, we obtain a filtered identification of a player. And instead of relying on this muddled number that lacks a considerable amount of context, we can leverage this technique to impose further filtration on player qualities. This is the case for more recent advanced analytics such as RPM from Jeremias Engelmann and PIPM from Jacob Goldstein.

Straying from the technical advancements, we can also leverage RAPM as a “smoothed but biased” estimator for discussing the impact of a player on offense and defense. The reason we suggest “instead of technical advancements” is due to the fact that defining defensive metrics is **really hard**. A great synopsis of using RAPM to discuss this point, as opposed to creating potentially misleading defensive statistics is given by Seth Partnow at the Athletic.

However, we can put to rest the commentary that “understanding errors” in RAPM is difficult and instead embrace what these values are really telling us. (So please stop e-mailing me about this topic!)

With the creation of Synergy, the basketball world gained valuable access to previously hard-to-obtain data on all field goal events in the league. One of the biggest introductions was the “primary defender” tag on field goal events. With play-by-play data, when a player drives to the basket or attempts a step-back three, the plays are logged as **Player A driving layup 2PA** or **Player B step back 3PA**. The only way to exfiltrate the defender information is to go back and watch the actual film. For an NBA season, 1230 games of approximately 2 hours each leads to 2460 hours of film to review. And it’s rare that we can watch, log, and verify in real time. Also, for reference, there are only 2087 hours in a standard **work year** according to the federal government; and that’s taking no vacation days.

Another fantastic introduction in Synergy is the **play-type** field that identifies the action that leads to a field goal attempt. For instance, a pick-and-roll may occur and it frees up a drive to the basket. Again, in play-by-play data the play is logged as **Player A driving layup 2PA**. But in Synergy, we get to know who the screener is, who the ball handler is, and who the primary defenders are. As an analyst, if we wanted to measure the quality of a shooter in say “pick-and-roll” events, all we had to do was open up Synergy and sort on field goal percentage on pick-and-roll events.

The key here is that Synergy leverages **mechanical turk** logging of games. It uses loggers and verifying loggers (as opposed to machine learning) to help ensure the accuracy of their data. There’s also “one-touch” video in Synergy, which allows the analyst to view the play in question; this is undoubtedly the best feature of the system. If we are interested in every pick-and-roll that Damian Lillard plays in, we can filter on Lillard and Pick-and-Roll events and click on any attempt we are interested in. There’s a reason why Synergy is expensive to the casual viewer. There’s definitely a lot of blood, sweat, and tears that go into this platform.

Over the previous six years, Second Spectrum attempted to leverage tracking data to perform similar tasks as Synergy, but also to improve the quantifiability of players in given situations. To this end, instead of mechanical-turking field goal events, Second Spectrum could identify all pick-and-rolls, which include non-field-goal attempts. This was a revolutionary step from Synergy’s sortable table of only field goal attempts. For starters, the analyst could now **track how many pick-and-rolls a defender could disrupt to deter any field goal attempt**. Therefore, instead of seeing a switching defender give up, say, 47-for-80, a rather terrible 58.75% defensive field goal percentage, we may find out that teams actually ran 137 pick-and-rolls against that switch defender. That 58.75% is really 47-for-137, or 34.31%. In case you were wondering: this was Rudy Gobert from a random subsample of games.

Instead of using humans in the loop (which is exhaustive just from an hours standpoint), Second Spectrum employs a proprietary machine learning library that classifies **trajectories** as certain basketball actions. One such classification algorithm focuses on identifying pick-and-roll events. The beauty of Second Spectrum’s work is that not only do they have upwards of 200 actions classified, ranging from screens to fast breaks to field goal types and defender contest style; but they also have the Eagle platform to perform similar tasks as Synergy’s platform: we can select plays on demand and watch the video as well.

Key challenges with both Synergy and Second Spectrum focus on the nuances in their logging system.

With Synergy, an analyst must grapple with the logger’s definition of coverage. Two key stories pop up from Synergy that have been shared around the league: J.J. Hickson and Earl Boykins. If you’re not familiar with these two stories, here’s the short gist.

One season, a team was interested in finding a dominant scorer at the rim. One quick and dirty way was to use the field-goal location tag in Synergy called **Rim** and sort on all players. Immediately J.J. Hickson popped up to the top of the list. This led the team to investigate Hickson as a potential rim scorer. What the team ultimately found out was that Hickson was indeed a top scorer at the rim; but specifically **at the rim**. He could convert dunks and he tried to dunk a lot. As soon as he bumped out to 2-to-5 feet, his FG% would drop significantly and his attempts would fall off a cliff; meaning he wouldn’t take those shots either.

The reason for Hickson popping up is because Synergy’s definition of Rim is the region near the basket. And unless that team could guarantee spacing (a relatively foreign concept at the time) to ensure Hickson could get 6-8 dunks a game; he wasn’t going to be the guy they were looking for.

Another season, a team was looking for a strong perimeter guard. Of course, in a sorting that would make most analysts cringe, the team sorted on defensive three point percentage as primary defender. Out popped Earl Boykins at the top of the list. Furthermore, Boykins had been near the top of the list for multiple seasons.

It turned out that, due to Boykins’ size, teams would attempt shooting over him, thinking it a psychological advantage. Those players would actually take lower quality attempts than usual. For one season, attempts against Boykins per possession led the league while shot quality was near the bottom; and even though teams converted better than their shot quality would suggest, it was that shot selection (decision-making), rather than Boykins’ defensive prowess beyond the arc, that led to the overall lower percentage. Adjusting for quality, Boykins actually turned out to be a solid perimeter defender, but nothing exceptional; and exceptional is what the team was looking for.

What should be clearly stated is that these examples are not showing that Synergy is bad, but rather that there is a nuance to the data that is delivered. In fact, Synergy is a wonderful tool when used thoughtfully in executing player analysis.

In the Second Spectrum case, identifying primary defenders and contests are two looming challenges for analysts. While the company provides labels, these too have nuance. For instance, Second Spectrum uses a Munkres (linear-assignment) type algorithm to identify primary defender match-ups. It’s a fantastic algorithm and is used in several advanced tracking algorithms today; but it’s also nuanced. In some cases, it’s slow to reassign players on switches. Specifically, when a BLUE action occurs, it may not correctly attribute primary defender status on the shooter.
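To give a flavor of that matchup step (this is a toy sketch, not Second Spectrum’s implementation; the distance-only cost is an assumption), a linear-assignment pairing in SciPy looks like:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_defenders(off_xy, def_xy):
    """Pair each offensive player with one defender by minimizing the
    total squared distance (a stand-in cost; production systems blend
    distance to the man, the ball, and the basket)."""
    diff = off_xy[:, None, :] - def_xy[None, :, :]
    cost = (diff ** 2).sum(axis=-1)  # cost[i, j]: offense i vs defender j
    rows, cols = linear_sum_assignment(cost)
    return {int(r): int(c) for r, c in zip(rows, cols)}
```

The nuance described above arises precisely because the cost function, not the assignment solver, decides when a switch “officially” changes the matchup.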

Similarly, defining contesting is a challenge; particularly around the rim. For many years, defining a contest at the rim was poorly applied by Second Spectrum; and the reason is because **tracking data lacks directionality and player verticality**. This is not a fault of Second Spectrum; the cameras can only get what they can get for now.

In the case of directionality, the biggest problem is a player who is back-pedaling on a pass and has no chance to contest a shot, but their momentum carries the defender towards the shooter. They will be labeled almost always as a contester.

Similarly, we do not have any knowledge of the player’s z-axis in the tracking data. This means we have no idea whether a player jumped to contest a shot. So if a player attempts to take a charge or attempts a strip but lets the shooter go; they can easily be listed as a contesting defender.

Given some of the nuances in both Synergy and Second Spectrum, one thing neither system can give is **how a player runs that play**. We’ve been primarily discussing pick-and-rolls. In both Synergy and Second Spectrum, they give us a **marker** and a **result**. What we don’t know is how a team runs the pick-and-roll. Do they run it slower or faster? Do they run it wide or tight? Is it delayed? These may seem rather odd questions, but the answers give way to understanding **how quickly does a team attack the switch** and **how much spacing do they incorporate off the pick** and **how much gravity is expected for the driver to have**, respectively. And it’s these things that can’t get answered directly from Synergy data or Second Spectrum markings.

Instead, we would look back at tracking and perform a well-known task in the geolocation world: **registration**.

Registration is the process of finding a spatial transformation to align multiple point sets. In the case of geolocation, the most common problem is to ensure an aircraft follows its way-points. Using the trajectory of the aircraft, we can compare the aircraft’s trajectory to the intended flight path and identify deviations that may have occurred. The “cool, new” problem is applied to **automated vehicles** such as driver-less cars, to ensure a car is following its course of way-points.

But also, it’s used in many other applications, such as monitoring foot traffic of pedestrians in a park. Measuring trajectories of patrons of a park may help the park officials identify optimal locations for newly proposed sidewalks to be installed. In this case, we look at **thousands **of trajectories and perform registration to see the largest class of paths.

Finally, in basketball, we apply registration to identify **similar plays. **Since we are using registration, we can also identify the amount of **distortion** associated with the play and it’s this distortion (or, technically, **warping**) that gives us insight into the nuances of the players associated in the action of interest.

Spatio-temporal registration is the process of comparing two trajectories through an optimization process combining temporal registration (dynamic time warping) and rigid spatial registration. Combining both the temporal and spatial aspect allows us to compare the trajectories as bodies move along these paths as not only a function of distance, but also of time. The registration process is then identification of the **difference** between two trajectories, allowing us to identify if two trajectories are effectively traveling the same path.

For two temporal processes of spatial locations, **X **and **Y**, of length **Nx **and **Ny**, respectively; we may have to prepare a warping function to align the series. A **warping function** is a function that attempts to find temporal “matches” between two time series:

In this case, if the sampling rates are off, the warping function will attempt some form of interpolation between the two time series. Suppose **X** is a “longer” time series; then the warping function will identify the appropriate slice of time to compare **X** to **Y**. The value, **s**, is then the **segment** in which we compare the two trajectories. Hence the functions **PHI_x** and **PHI_y** are simply looking for the index of each respective series that matches within a segment.
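As an illustration of the temporal piece, a bare-bones dynamic time warping routine (a generic sketch, not the production registration code) can be written as:

```python
import numpy as np

def dtw_path(X, Y):
    """Classic O(Nx*Ny) dynamic time warping over two trajectories
    (arrays of shape (N, 2)); returns the total cost and the warping
    path of matched index pairs (phi_x(s), phi_y(s))."""
    nx, ny = len(X), len(Y)
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            d = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # trace back the warping path from the end of both series
    path, i, j = [], nx, ny
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[nx, ny], path[::-1]
```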

Thankfully, we do not have to apply this interpolation, as the sampling rate in Second Spectrum data is typically uniform, sampled every 0.04 seconds. This means the function **PHI** is simply looking for an **offset** between the start of a trajectory in question and the segment over which the play elapses.

To be clear, suppose a Pick-and-Roll (PnR) action for a team occurs at 11:38 remaining in the first quarter (seen as time **22 seconds**) and it takes 3.7 seconds to complete the action. Then suppose a second PnR is completed by the same team at 2:17 remaining in the first quarter (seen as time **583 seconds**) and it takes 4.1 seconds to complete the action. Then, for our dynamic time warping function of choice, we may select the larger window and synchronize the motion of interest.

Considering the two PnR plays above, we perform a **temporal registration** on the point guard action only. The play on the left shows a **Hedge-and-Under** defensive scheme, which pushes the point guard away from the basket as the on-ball defender gets extra time to sneak underneath the screen to recover. The guard sees that the screen defender is not going to switch and attempts to accelerate to attack the recovering guard.

The play on the right shows a **Show-and-Over** defensive scheme that has gone woefully awry for the defense. The screener tangles up the on-ball defender and this forces the screen defender to ultimately switch on the show. The point guard, seeing that he’s drawn the (hopefully) slower defender, accelerates earlier than in the first play. This allows the screener to slip and keeps the on-ball defender straggling behind the attacked screen defender.

Here, we see that the point-guard action is not identical; however, the pattern is almost exactly the same: guard drives right, attacks the right elbow, screener slips toward the left elbow. Performing a temporal registration will align the motion across both plays.

We see that the lime green lines serving as the warping function do not necessarily find the closest points in space. As the lines turn more sharply than the curves do, this suggests that the second action moves a little faster than the first!

Spatial registration is the task of identifying similar shapes. The most common example is looking at a selection of points and asking, “Are these the same shapes?” Spatial registration therefore looks at **rigid motion**, which includes **rotation**, **reflection**, and **translation**. Spatial registration may also use other tools such as stretching; however, that is for comparing shapes that are measured on different scales. Under the Second Spectrum hypothesis of equal sizes for all frames, we may omit stretching as a factor.

Therefore, the key question is whether a spatial trend is equivalent in **rigid** **motion**. The challenge with spatial registration is that actions may lose their right-left interpretation. For our PnR examples above, spatial registration will identify both a translation and a rotation to match the point guard action.

We see that the action is strikingly similar in pattern, but as a **reflection** and a slight **rotation**. We do lose the information of angle of attack; but we can test for defender effects on this later using the space of rotations, **SO(2)**.

The methodology used for identifying rigid motion (as we see above), is commonly solved using the **Iterative Closest Point (ICP) algorithm**. This algorithm treats the trajectory as a point cloud, regardless of time and looks for optimal matching through an iterative scheme. Unfortunately, this methodology fails to properly register player trajectories as the temporal aspect is too important to ignore.

This leads us to spatio-temporal registration. In this case, we combine both the spatial and temporal registration into a single cost function given by
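A plausible form of that combined cost, consistent with the operators **R**, **T**, and **PHI**, is:

```latex
C\left(R, T, \Phi\right) = \sum_{s=1}^{S} \left\| R\,X_{\Phi_{x}(s)} + T - Y_{\Phi_{y}(s)} \right\|^{2}
```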

where **R** is the rotation operator, **T** is the translation operator, and **PHI** is the time warping function. We can then compute an inner- and outer-optimization scheme where the outer loop solves the dynamic time warping problem, followed by an inner loop of spatial optimization. Iterating over this scheme, we then identify a spatio-temporal distance for comparing two player trajectories.

Now that we are able to spatio-temporally register two player actions, we can start to develop **distributions** of player actions. These can be defined as clusters of low-cost comparisons between two trajectories. From here, the opportunities are endless. Here are a couple of examples:

Now we can start testing the impact of certain defender actions on PnR plays. In the example above, we saw the same PnR get attacked differently. Although the guard responded differently each time, the spatio-temporal registration is actually quite similar. We can then look at the parameter sets of **R**, **T**, and **PHI** and condition on defender response. Using this, we can quantify the changes in directionality and speed, and begin to answer the following question: **How will a hedge compare to a show by my screen defender?**

This allows us to separate ourselves from poorly construed results-based analysis such as “What’s my defensive rating when I perform this defender action?”

Another problem we can begin to answer is: **How well can my ball-handler read defenses?** In this case, we can look at changes in the trajectories and again test on **R**, **T**, and **PHI**. Here, we are not testing good or bad decisions; that requires a target variable. Instead, we are looking at how the **distribution changes** given a new wrinkle in the defense.

For this situation, we may ask how defensive rotation impacts the way a ball-handler attacks the rim. In this case, we may see quite a change in **R**, **T**, and **PHI** dependent on the schemes. We can scan the clusters of registered ball-handler motions and compute the probabilities of making that registered motion given the defensive scheme. From there, we may look at the players associated with the action and gain insight on how players respond to it. Note, this may be quite noisy at the player level, so be very careful in making player-based decisions.

Ultimately, the point here is that the game of basketball is performed in a spatio-temporal manner; therefore it requires tools that analyze the spatio-temporal aspect accordingly. An attack at the rim by **Damian Lillard** may be considerably different than one performed by **De’Aaron Fox**, despite their spatial trajectories looking the same. Registration also allows for follow-on testing without having to rely on results-based analysis. Consider this artifact when discussing perimeter defense, as shooters may not take an attempt despite the defense doing all the right things leading up to an attempt.

This way, we can leverage platforms such as Synergy to identify types of plays, Second Spectrum to extract out markers for the plays; but then build our own custom analytics on top of the tracking to perform the rigorous test.


Whenever we develop an analytic to help describe the game, we typically have to ask three things. First, **“is our analytic representative of the actual thing we are attempting to analyze?”** Second, **“does the analytic yield intelligence?”** Finally, **“is our analytic stable?”** While these seem like obvious requirements, it may come as a surprise that many folks actually miss the mark on one of the three requirements of developing an analytic.

Take for instance, **perimeter defense** metrics. While it has been long known that defensive three point percentages do not truly reflect a team’s perimeter defense; yes that’s three links representing effectively the same view… many folks (including some pro teams!!) still use defensive three point percentage as a barometer for defining how well their team plays perimeter defense. While many will attempt to argue that defensive three point percentage does indeed measure perimeter defensive capability, it has been shown repeatedly (over at least a five season span now) that it is indeed not stable; nor does it yield actionable intelligence.

In response to the survivor bias that comes from play-by-play, savvier teams have focused on **frequency** and **efficiency** relationships, attempting to understand the **“negative space”** of perimeter defense. That is, the deterrence of high quality attempts and the promotion of low quality attempts. Others attempt to mitigate the survivor bias by introducing “luck adjustments.” Whichever direction we choose to go for our analysis, the challenging part remaining is to determine the robustness of our measure.

In this article, we focus on defining a core statistical concept in analytics: **consistency**. For a given analytic, consistency identifies the “biasedness” of an estimator relative to its sample size. As the sample increases, we should expect the estimator to converge to its true value; hopefully the parameter. Consistency is a **probabilistic argument** that is defined by
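That is, convergence in probability:

```latex
\lim_{n \to \infty} P\left(\left|\hat{\theta}_{n} - \theta\right| > \epsilon\right) = 0 \quad \text{for all } \epsilon > 0
```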

for a true parameter, **theta**, and its estimator, **theta_n**, for some sample size **n**. Thus, the goal of the analyst is then to determine if this equation is satisfied and then identify **convergence rates** of the statistic they had just generated.

Let’s start with a simple exercise to demonstrate consistency. Let’s consider an independent, identically distributed (IID) **Bernoulli** process with some probability of success, **p**. The most basic example is the “coin flipping problem.” So let’s start there. Suppose a coin has a probability, **p**, of coming up heads. Suppose we flip this coin **n** times and count the number of heads. Our goal is to estimate the true value of **p** and then determine how **consistent** that estimator is.

If we’d already had some statistical training, we would attack this problem by exposing our knowledge of the distribution and apply **maximum likelihood estimation** to obtain an estimator for **p**. In this case, the sample mean becomes the estimator and its variance is merely **p(1-p)/n**. But how do we check consistency?

First, we see that

is our estimator of the probability of flipping a head. We can either determine the distribution of the estimator directly, or we can work with the original distribution. In this case, it’s straightforward to determine the distribution of the sum of IID Bernoulli random variables. In many situations, determining the distribution is fairly difficult.
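Explicitly, that estimator is the sample mean:

```latex
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_{i}
```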

To identify the distribution of the sum of IID Bernoulli random variables, we can look at the **moment generating function** (MGF) and show that the sum of IID Bernoulli random variables and the Binomial random variable are the same:
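Writing out the MGF of the sum of independent Bernoulli variables:

```latex
\begin{aligned}
M_{\sum_{i} X_{i}}(t) &= \prod_{i=1}^{n} E\left[e^{t X_{i}}\right] \\
&= \prod_{i=1}^{n}\left(1 - p + p e^{t}\right) \\
&= \left(1 - p + p e^{t}\right)^{n}
\end{aligned}
```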

The last line is a moment generating function for the Binomial random variable with mean **np** and variance **np(1-p)**. Using this knowledge, we can then look at the probabilistic argument for consistency. Unfortunately, using the probabilistic statement directly is a challenge as we also need to understand the distribution of the absolute value of the estimator. That’s something I would never attempt for this problem. Instead, we rely on a well-known probabilistic relationship, called the **Chebyshev Inequality**.

The Chebyshev Inequality is a relationship that **bounds** a probabilistic statement in a particular form:
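For a random variable **X** with mean **mu**, the inequality states:

```latex
P\left(\left|X - \mu\right| \geq \epsilon\right) \leq \frac{\operatorname{Var}(X)}{\epsilon^{2}}
```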

This is a particular form of the Markov Inequality, but allows us to identify convergence through the use of the variance associated with the underlying variable of interest. Therefore, writing the probabilistic argument for convergence, we see:
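Applying Chebyshev to the sample mean:

```latex
P\left(\left|\hat{p} - p\right| > \epsilon\right) \leq \frac{\operatorname{Var}\left(\hat{p}\right)}{\epsilon^{2}} = \frac{p(1-p)}{n\,\epsilon^{2}}
```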

Applying the limit (increase in sample size), we see that the result goes to zero! Therefore, our estimator is indeed consistent!

Consistency is a **limit-based argument. **This means that it’s a theoretical value that will never be achieved in practice. To this end, we identify that our estimator indeed converges, and we are given some guidance as to **how well it converges**, thanks to the Chebyshev Inequality.

One way to interpret this relationship is that **epsilon** serves as a bound on the variance; and in turn, on the deviation of our analytic about the true underlying parameter of interest. We see this directly in the first line of the consistency proof for a coin flipping example. For argument’s sake, suppose the coin is **fair**; meaning the probability of obtaining a heads is one-half. Further suppose we are alright with obtaining variational error of one-percent. Then, the sample size required to ensure that we have these conditions met say 95% of the time is
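Plugging into the Chebyshev bound (with **p = 0.5**, squared deviation **epsilon^2 = 0.01**, and tail probability 0.05):

```latex
\frac{p(1-p)}{n\,\epsilon^{2}} \leq 0.05 \quad \Longrightarrow \quad n \geq \frac{0.5 \times 0.5}{0.05 \times 0.01}
```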

which is **500**.

This means we require 500 flips of the coin to ensure that our variance is within 1% with 95% probability. Taking this a step further, this translates to having 10% or more error on the estimator roughly 5% of the time… Yikes.
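A quick sanity-check simulation (my own, not from the original analysis) makes the bound concrete: with 500 flips of a fair coin, the rate of 10%-or-worse errors on the estimate stays under the 5% Chebyshev cap.

```python
import numpy as np

rng = np.random.default_rng(0)
n_flips, n_trials, eps = 500, 10_000, 0.10

# simulate p-hat for many independent 500-flip experiments
p_hat = rng.binomial(n_flips, 0.5, size=n_trials) / n_flips

# fraction of experiments with 10%-or-worse error on p-hat;
# Chebyshev caps this at p(1-p)/(n*eps^2) = 0.05, and because the
# bound is loose the observed rate is far smaller in practice
exceed_rate = np.mean(np.abs(p_hat - 0.5) > eps)
print(exceed_rate)
```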

Let’s consider this from another context…

We come back to our three point shooting argument before. Instead this time we look at it from the shooter’s perspective. The analytic question here is **“How well does my player shoot from the perimeter?” **If we see a player shoot 37% from beyond the arc, does that mean they are a 37% shooter?

Surprisingly, little work has been performed in this field. Darryl Blackport provided a quick treatise in reliability theory four years ago that involved the Kuder-Richardson 21 (KR-21) metric. For a while, a famous interview question from teams involved the dreaded “predict the three point percentage of every player in the league,” which is effectively an exercise in futility if you’re forced to get within 1 percentage point of truth. Over the previous few years, **shot quality** metrics have popped up to understand the **quality** of a shooter, which in turn leads to **eFG+** calculations. However, this categorizes decision making first, and then relies on the same noisy statistic (field goal percentage from the perimeter) in measuring capability.

So let’s take a look at the KR-21 methodology.

The Kuder-Richardson 21 metric is a psychometric-based reliability measure to analyze the “quality” of a test given to students. The goal of the metric is to identify how **consistent** a test is. The original application, from Kuder and Richardson’s 1937 paper, is to identify if two tests applied to the same student population are of equal difficulty. As such, the paper starts with a single test of many questions, splits the test questions in half (at random), treats them as two separate tests, and then computes the cross-correlation matrix of the test with **n** questions. The resulting cross-correlation score is called KR-1; the first equation of Kuder-Richardson.

The remainder of the paper introduces different scenarios and slowly develops a statistical framework for understanding the comparative quality of test questions. It is effectively a permutation test that ultimately results in an analysis of variance (ANOVA) by the time we reach KR-21.

The KR-21 equation is given by:
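A standard statement of the formula, reconstructed here with **n** test items, overall proportion-correct **p**, and score standard deviation **sigma**:

```latex
\rho_{KR21} \;=\; \frac{n}{n-1}\left(1 - \frac{n\,p(1-p)}{\sigma^{2}}\right)
```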

Here, **sigma** is the standard deviation of the test scores for each student and **p** is the proportion of students getting a single test item correct. Notice that the term **np(1-p)** is lingering in the equation. This is due to the fact that each question is treated as a Bernoulli random variable and every test question is assumed to be of equal difficulty (and independent of all other test questions)!

Taking this a step further, since the Binomial distribution is now modeling test scores, we treat this as a basic regression problem: the resulting variance is a **sum-of-squares for error** while the **sigma** term identifies a **total-sum-of-squares.** Then we have:
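Relabeling the numerator and denominator with their sum-of-squares names:

```latex
\rho_{KR21} \;=\; \frac{n}{n-1}\left(1 - \frac{SSE}{SST}\right)
```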

which is indeed the ANOVA equivalent!

Treating the KR-21 value as an ANOVA-like quantity, we effectively have an R-square calculation. Under R-square conventions, a value of .7 is commonly used as a “strong” value of correlation. Now, to perform a KR-21 test, the challenge is to treat each player as a “student” who takes an “examination” of three point attempts. Ideally, we set the “number of questions” to be the number of three point attempts, **n**. Then, for a collection of players who have each taken **n** three point attempts, we compute the population variance of the players’ totals and the mean number of makes across all players.

Starting at a small **n**, say 50, we collect all players across the league who have attempted 50 attempts and compute the KR-21 reliability number. If this number is too small (below 0.7), we simply increment **n** and repeat the study.

One of the unspoken challenges with a reliability measure such as KR-21 is that we may obtain a negative reliability score. For example, let’s generate a sample of fifty shooters that each take 100 3PA’s. Suppose every 3PA is an IID Bernoulli random variable. Using rows as players and columns as 3PA, we obtain a chart that looks like this:

The green column is the number of made 3PA by that player. The yellow row is the number of 3PA made in that attempt number. By computing the SSE component from yellow, we obtain a value of 24.4976. By computing the SST component from green, we obtain a value of 17.6006. **This leads to a KR-21 score of -0.3958.**
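A sketch of this experiment is below, using the KR-21 form with an overall make rate **p** as described in the text. The function and variable names are my own, not code from the original post:

```python
import random

def kr21(matrix):
    """KR-21 reliability for a players-by-attempts 0/1 matrix, under the
    assumption that every attempt (test item) is equally difficult."""
    n_players = len(matrix)
    n_items = len(matrix[0])
    totals = [sum(row) for row in matrix]          # the "green" column
    mean_total = sum(totals) / n_players
    p = mean_total / n_items                       # overall make rate
    sse = n_items * p * (1.0 - p)                  # the n*p*(1-p) term
    sst = sum((t - mean_total) ** 2 for t in totals) / n_players
    return (n_items / (n_items - 1.0)) * (1.0 - sse / sst)

# Fifty shooters, each taking 100 IID Bernoulli(0.5) three point attempts.
random.seed(0)
shots = [[1 if random.random() < 0.5 else 0 for _ in range(100)]
         for _ in range(50)]
print(kr21(shots))  # small in magnitude; can easily come out negative
```

Rerunning without the seed shows the KR-21 score bouncing around zero, which matches the negative score in the example above.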

Why did this happen? First of all, this is an okay result. A negative reliability score only indicates weak-to-no correlation between test items and users. Specifically, it doesn’t identify “equally difficult” problems; rather, it yields “noisy” questions that are randomly solved. In the context of three point attempts, this would suggest all makes are completely random. Which, by the design of our exercise, is exactly what happened.

Now, if I change **p = 0.35**, which was the league average for the 2018-19 NBA season, we see the exact same thing happen. This indicates that the ordering of every single player’s 3PA attempts matters significantly. In fact, we apply a Monte Carlo simulation of KR-21 scores using the above set-up to identify the distribution of possible KR-21 scores:

What this shows, along the lines of Blackport’s analysis (and others in the baseball community), is that **shooters continue shooting and others don’t**. To be able to obtain a positive reliability score, shooters must indeed have tendencies, and those are picked up within the KR-21 test. And once they are keyed in on, the value of **n** needed to nail down a high reliability number is approximately 750.

More importantly, this shows that perimeter shooters’ makes are **not random events**. Instead, they are indeed correlated scorers that have some frame of rhythm. If they were not, then a value of .7 reliability would be **never attainable** except by random chance. Which, as you can see above, has exceptionally small probability.

So let’s go back to the Bernoulli coin flip problem. Instead of a coin, if we model a three point attempt as a Bernoulli process, we obtain the same probabilistic argument. Now suppose, using the worst case scenario of **p = .5** (worst case means highest variance!), we note that 500 3PAs are required to nail down the true value at the 95% probabilistic level with **plus-or-minus 10% error.** That’s incredible.

If we impose a **1% error**, we instead require **50,000 attempts**. Which is much less optimistic than the 750 attempts noted before.

Now, instead of the worst case scenario, if we use the **league-average** of 35.5%, we (under the Bernoulli assumption) require **45,795 attempts** to get within one percentage point of truth at the 95% probabilistic level.
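The three sample sizes above all follow from solving the Chebyshev bound for **n**; a minimal sketch (the function name is mine):

```python
def chebyshev_n(p, eps, delta=0.05):
    """Sample size at which Chebyshev guarantees
    P(|p_hat - p| >= eps) <= delta for a Bernoulli(p) proportion."""
    return p * (1.0 - p) / (delta * eps ** 2)

print(round(chebyshev_n(0.500, 0.10)))  # worst case, 10% error  -> 500
print(round(chebyshev_n(0.500, 0.01)))  # worst case, 1% error   -> 50000
print(round(chebyshev_n(0.355, 0.01)))  # league average, 1% error -> 45795
```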

Leveraging the 750 number, we find that at league average levels, the actual margin of error associated with 750 attempts (bounded by probability) is really **1.8%**. This is indeed a sweet-spot and reinforces the results obtained by Blackport from roughly five years ago. What this tells us is that there are indeed trends in shooting, but they are not strong, as they are effectively within the variance of a Bernoulli process.

To this point, we showed that three point percentages have weak trends, but can be modeled loosely as a Bernoulli random process. What this really tells us is that shooters attempt to **optimize their perimeter scoring chances when they decide to shoot**. This means attempts are not independent. Nor are they truly identically distributed. Furthermore, it’s difficult to obtain **tight** confidence regions on the true, underlying perimeter shooting percentage; which is why we see players fluctuate in rankings through the years.

To this end, there’s an underlying model for not only when shooters make attempts, but also for **when they take attempts**. The natural next step is developing a hierarchical model for the basic **frequency-and-efficiency** analysis. This way we can begin to understand the player’s underlying decision making tendencies, in an effort to better understand their true underlying perimeter shooting capabilities.

In effect as Michael Scott once put it: “You miss 100% of the shots you don’t take. – Wayne Gretzky”

But the moral of the story is this: for every introduced analytic, there must be an adequate understanding of the variational properties related to the game. After all, the goal is always to get the signal above the noise.

For instance, let’s consider effective field goal percentage. The **Golden State Warriors** have posted a .558 eFG% while limiting their opponents to a .518 eFG%. While this is by far the best eFG% in the post-season, the differential (+.041) is only good for second, behind the **Milwaukee Bucks’** +.056. It’s no wonder both teams are deep into the playoffs, as they are outscoring their opponents at such high rates. The second best eFG% in the post-season has been posted by the **Houston Rockets** at .527, with a positive differential of .038; third best in the post-season. Effectively, these are the teams that cannot be “out-shot” in games. Instead, alternative measures must be taken.

Taking a closer look at the Rockets-Warriors series, the Rockets apparently defeated the Warriors in almost every category of the Four Factors:

Here, we see that Houston indeed won three of the four categories, but lost the series two games to four. As every game was decided by **two possessions or less**, there are no “aggregation biases,” such as a blowout win compensating for 2-3 losses. What this series ultimately came down to was the **distribution of turnovers**. More specifically, the **value of a turnover** was much greater in this series than the values for the other three categories.

As a baseline, Basketball Reference posited that both the Warriors and Rockets played 579 offensive possessions, resulting in offensive ratings of 115.7 and 113.8, respectively. Using this baseline, we value the **“average possession” **as 1.157 points for the Warriors and 1.138 points for the Rockets. If we look at the turnover battle, the only category the Rockets lost, Houston turned the ball over **98 times** (including 11 shot clock violations) compared to Golden State’s **83 turnovers**. The latter of which contains zero shot clock violations.

As an average, the Rockets gave up an extra 2.5 possessions per game off the turnover; but this does not account for the “4-6 points per game” lost. Using the baseline, this amounts to only about **2.78 points of differential**. Houston won every other category… so where does the remainder of the differential come from?

A way to break down the value of a turnover is to look at the difference between a “live ball” and “dead ball” turnover. To start, a **live ball** turnover is when a defense is able to immediately move into transition without any stoppage of play. The most common live ball turnover is an errant pass that leads to a steal. **Every live ball turnover must have a steal credited to a defender**. Conversely, a **dead ball** turnover is when the defense’s transition is briefly interrupted by a stoppage in play. **Every dead ball turnover must have an in-bounding pass to initiate transition**.

From a psychological stand-point, live ball and dead ball turnovers can bring about drastic effects on transition defense. For instance, a live ball turnover tends to lead to a scrambling **recovery** defense. As the play is “live,” a defense has much less time to “set” than usual. However, a dead ball turnover can lead to bickering between teammates, between opponents, and between players and referees; causing a disruption in communication on the ensuing possession. For instance, a bad pass out of bounds may lead the passer to voice a grievance to their teammate. For the brief moments this occurs, a transitioning offense may be running a designed attack such as a **Pistol** or a **Pin-Down Floppy** to pick apart the distracted, and potentially frustrated, defenders.

Due to these mechanics (response time, psychological effects, etc.), the value of a turnover differs from team to team. For the Houston – Golden State series, here’s how the types of turnovers looked:

We see that Golden State had a tendency to turn the ball over live for **57.8%** of their turnovers! Compare this to Houston’s much lower **44.9%**, and we see that Houston at least gives themselves much more time to set on defense, as a non-substitution in-bounds pass typically takes between 2 and 8 seconds.

When Golden State turned over the ball live, Houston flourished, posting a 129 offensive rating. However, in dead ball turnover situations, Houston dropped significantly, even falling below their baseline rate of 113.8 with a rating of 109:

Compare this to Golden State’s transitions off of turnovers, and we find that their numbers increased in every case:

What this meant was that while Houston would punish the Warriors for live ball turnovers, if Golden State could protect the ball just enough and ensure the Rockets kept pace with them, Golden State would not just win the turnover battle, **but turn it into enough of a win to compensate for losing the other three categories most associated with winning.**

Case in point: Houston’s turnovers cost them on average 3.27 points per game; more than one possession in two possession games.

While we presented an argument that turnovers were a significant factor in the Houston – Golden State series, we need to come full circle and identify that the point of this exercise is to show the **value of a turnover** and how it can sway games. In fact, the team that won the turnover battle went on to **lose four games in the series!**

In fact, teams that won the offensive rebounding battle went 5-1 in the series. Teams that won the effective field goal percentage battle went 5-1 in the series. Teams that won the free-throw rate battles went 2-4 in the series.

In fact, the story of Game One was offensive rebounding and Golden State’s control of the offensive glass.

In Game Two, Houston improved on the glass greatly (from .099 in Game One to .270 in Game Two), but the weak-side pin down action to open weak-side rebounding for the Warriors kept going strong, as they too improved their offensive rebounding numbers from .258 to .367. While this closed the gap substantially, Houston gave up 20 points on possessions following a turnover; 13 on live ball turnovers. In fact, Golden State started the game scoring **twelve of their first fourteen points on possessions after turnovers**.

In Game Three, Houston dominated the offensive glass much like Golden State did in Game One. In Game Four, Houston continued this trend. Despite losing the turnover battle in both games, by limiting their TOV% to approximately 11%, Houston managed to keep Golden State at bay when it came to increasing their points per possession.

Game Five and Game Six saw the points per turnover take a jump. In Game Five, the Warriors used a mix of offensive rebounding and transition off turnovers to take the narrow win. In Game Six, Golden State scored **35 points** off of **17 turnovers** for an outrageous 2.06 points per turnover.

Throughout the playoffs, it has not been the Warriors who have punished teams for turning over the ball. It’s been the **Toronto Raptors**. Through their first fifteen games, the Raptors have netted the largest turnover differential in the post-season at **+49**. While the entirety of the differential has come at the hands of the Orlando Magic and the Philadelphia 76ers [they are currently losing the turnover battle 40-43 to Milwaukee after three games], the Raptors need to continue their turnover domination in an effort to stay afloat in a challenging Eastern Conference Finals.

As a similar baseline, Toronto has an offensive rating of 106.6 with a defensive rating of 102. This translates to 1.066 points per offensive possession and 1.020 points per defensive possession. However, whenever Toronto generates a turnover, much like in the case of the Houston Rockets, their opponents **increase their scoring**:

The disparity between live ball and dead ball turnovers is outrageous. This is due to the duration of time and plays allowed after a turnover. For instance, the average duration of a possession after a Toronto live ball turnover is 7.3 seconds. For a dead ball possession, Toronto’s opponents slow down their offense to a 15.2 second pace.

What this indicates is that Toronto’s transition defense is sub-optimal when it comes to turnovers. Specifically, the guards are unable to retreat as players such as Serge Ibaka and Kawhi Leonard have actually managed to dissuade attempts on live ball situations.

If we overlay the distribution of (relative) points on top of the duration of the plays, we find that there’s a “sweet spot” for teams to score after a Toronto turnover.

In this case, the first 2-5 seconds yields points for a Toronto opponent. These are live ball turnovers that turn into fast-break layups and threes. In fact, opponents are shooting 41-for-55 for two-point field goals after a live-ball Toronto turnover.

On the flip side, the Raptors perform a little weaker in transition than their opponents. Despite dominating the turnover battle, the Raptors have a lowly 90.9 offensive rating when they create a dead ball turnover on defense. Much of this is due to the slower pace of play the Raptors play at after a dead ball turnover, compared to their counterparts.

Despite the Raptors ending up with an average possession duration of 14.6 seconds, the probability of a possession taking longer than their counterpart’s is close to 60%. This is due to a significant bump at 1-2 seconds due to fouling for free throws (“Hack-a-Player”). Therefore we tend to expect, after a dead ball turnover, the Raptors to take approximately 15.2 seconds per possession compared to 12.9 seconds for their opponents.

If we overlay the (relative) points scored, we obtain a slightly different picture than their opponents:

As the Milwaukee Bucks and Toronto Raptors are leading the playoffs in Defensive Rating, the teams could not be any more different in approaches to their defense. The Bucks dominate the glass on the defensive end, limiting opponents to only 16.4% OREB%. For the roughly 60% of misses an opponent takes in the course of a game [which is approximately 55 misses a game], their opponents are lucky to see more than **NINE** second chance opportunities a game. Similarly, the Bucks play Wisconsin-brand basketball by limiting fouling on field goal attempts; settling in third for the post-season with a .194 free throw rate. In comparison, the Raptors are at 22.7% OREB% and .233 FTr. Playing the point-value game, we would find the Bucks to be a 3-4 point favorite based on these stats alone. Combine this with Milwaukee’s +.02 advantage in eFG% (.526 to .507) and the odds stack even more in favor of the Bucks.

It is TOV% where the Raptors hold a +3% edge over the Bucks, which means they should expect roughly 3 more turnovers a game; if played as live-ball turnovers, this could result in an extra 4-5 points per game. And it’s here that Toronto makes its mark.

Much like the Houston-Golden State series, the Milwaukee – Toronto series is going to be (and is indeed being) dictated by who can control the four factors better. While the teams are evenly aligned point-wise, depending on your viewpoint, either team has a recipe for success: Milwaukee needs to limit turnovers and play their brand of basketball. Toronto needs to continue the defensive effort and focus on keeping Milwaukee out of the paint; thereby reducing each of the Bucks’ effective field goal percentage, attempts at the foul line, and chances at offensive rebounding.

Of course, as the Los Angeles Clippers have shown us twice, having a hot shooting night is always a bonus, too. But we can’t count on that to happen consistently. Effectively, one of these teams has to blink.

So far it has been Toronto.

Over the first three games of the Eastern Conference Finals, Milwaukee has controlled every single Four Factor category. Despite Toronto’s ratcheted defense affecting Milwaukee’s eFG%; Milwaukee has continued to control the glass, and more importantly, **limit turnovers**. Despite Toronto picking up 23 live ball turnovers over three games against Milwaukee, they have only been able to convert them into 29 points (1.26 points per turnover). Compare this to Milwaukee’s 28 live ball turnovers generated off the Toronto offense, and their resulting 40 points (1.43 points per turnover), and the Raptors’ turnover edge has been effectively eradicated this series.

Only in Game Three has Toronto managed to win any Four Factor category: TOV% and eFG%. By playing their style of defense and managing to knock down the Bucks’ eFG%, the Raptors managed to make it to overtime and wait out a Giannis Antetokounmpo foul-out before taking over and winning the game.

Despite winning the turnover battle in Game Three .130 to .146, Toronto generated 14 points on 11 Live Ball turnovers (1.27 points per turnover) and 7 points on 9 Dead Ball turnovers (0.78 points per turnover). Comparing this to Milwaukee scoring 16 points on 14 Live Ball turnovers (1.14 points per turnover) and 0 points on 3 Dead Ball turnovers, we see Toronto eked out only a four point advantage over the number one seed.

Compare this to Milwaukee’s 9 points over 6 Live Ball turnovers and 10 points over 8 Dead Ball turnovers, and this can be seen as a marked improvement for the Raptors transition defense on turnovers between Games Two and Three; despite only getting this game to overtime.

Good defenses take away scoring chances from opponents. Defensive rebounds erase an opponent’s chances at Second-Chance points. Turnovers tend to take away those field goal attempts in the first place. However, when a turnover occurs, chaos ensues.

Some teams race down the court to capitalize on defenses attempting to sort themselves out. Some teams use the transition to work into their rhythm and start their offense with less pressure. Some teams just simply overthink, either taking a low quality field goal attempt or turning the ball over.

It is clear that live ball turnovers are much more detrimental to a team than dead ball turnovers. We also see it’s a way to significantly increase the pace of the game while increasing offensive rating; we’ve seen these possessions run on average 7-10 seconds faster than normal possessions with offensive ratings of 120-140 points.

Teams can thrive on transitioning the turnover. It’s a great equalizer. But only if you can generate the live ball turnover and transition it well.

Two years ago, I posted a basic algorithm that computes the probability of every pick without any trades. This algorithm is able to easily recreate the table we find on Wikipedia, and other sites, when it comes to finding a probability matrix for teams:

Using our aforementioned post, I was able to reconstruct the entire draft lottery algorithm and produce this table within five minutes. Sweet! The code still works! However, these are not the true probabilities for each team thanks to trades made over the previous years. Therefore, other tables that we find on sites like ESPN, HoopsRumors, and even Wikipedia post the incorrect probabilities:

In all cases the trades were either hyperlinked or stuffed within text, forcing the reader to search for context. This season, the trades are rather tame as there are no “pick swap” trades: trades where a team gets the “better” of two picks, contained within the lottery. The closest we get is the Sacramento to Philadelphia/Boston pick swap. Due to this tameness, teams effectively **trade probabilities**. So we can give a pass to the Sacramento Kings having a 1% chance of obtaining the first pick. In reality, it’s **zero **as Philadelphia owns their number one pick.

This is okay, but it requires the reader to search.

But what about Atlanta? Atlanta actually has a **47.02% chance of obtaining the 9th overall pick**. That’s thanks to the Trae Young – Luka Doncic draft night deal. And while Dallas has asterisks next to their odds, it’s Atlanta that doesn’t have any indication.

Similarly, Boston has **two trades** lingering in the draft. They have interesting probabilities floating about the table as well. But that’s not readily apparent either. So let’s incorporate the trades and then update this table. Thanks to Real GM, we are able to turn these trades into code.

From the draft night trade in the 2018 Draft, the Atlanta Hawks managed to move down in the draft in order to allow Dallas to guarantee the rights to Luka Doncic. In order to complete this trade and incentivize Atlanta moving down in the lottery, Atlanta gained a pick-protected lottery pick for this season. That is, if Dallas falls between the 6th and 14th picks, Atlanta gains the Mavericks’ lottery pick. We can represent this code (using the variables from our previous lottery odds post) as:

As a reminder: **remainingProbs** is a **fixed-draw double array** that simply aligns the teams that were not selected in the first four picks. There are a total of ten of these positions: picks 5 through 14. Since pick 5 is protected, we count the last nine spots.

On January 12, 2015 a three-team trade involving five players and three draft picks took place between the Boston Celtics, Memphis Grizzlies, and New Orleans Pelicans. In this trade, Memphis sent Tayshaun Prince to Boston and Quincy Pondexter to New Orleans. In return, New Orleans sent Russ Smith and a traded player exception to Memphis and Boston sent Jeff Green (the centerpiece of the deal) to Memphis. In the process, Boston also obtained Austin Rivers from New Orleans.

To soften the loss of Green, Memphis included a protected future first round pick to Boston. Similarly, to help address the loss of Rivers from New Orleans, Memphis included a second round pick to the Pelicans. This season, that first round pick comes into play as Memphis is slotted as the 8th team, with the highest probability of **keeping their pick**. Despite this, Boston still has a significant chance of nabbing the Memphis pick, provided Memphis hits that unlucky **42.6% chance of getting the 9th, 10th, or 11th pick in the draft**.

Due to the straightforward nature of the trade, we can easily code this as:
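Although the exact script isn’t reproduced here, the bookkeeping for a conveyed pick is just a transfer of probability mass between two rows of the lottery table. Below is a minimal sketch with my own hypothetical `table` layout and function name (the slot numbers echo the Memphis-Boston scenario; the individual probabilities are illustrative, not the true lottery odds):

```python
def convey_pick(table, owner, receiver, picks):
    """Move `owner`'s probability of landing in the given (1-indexed)
    lottery slots over to `receiver`."""
    for pick in picks:
        idx = pick - 1
        table[receiver][idx] += table[owner][idx]
        table[owner][idx] = 0.0

# Hypothetical two-team slice of the lottery table: 14 slots per team.
table = {"MEM": [0.0] * 14, "BOS": [0.0] * 14}
table["MEM"][7] = 0.574   # slot 8: Memphis keeps a top-8 protected pick
table["MEM"][8] = 0.300   # slot 9
table["MEM"][9] = 0.100   # slot 10
table["MEM"][10] = 0.026  # slot 11

convey_pick(table, "MEM", "BOS", picks=[9, 10, 11])
print(sum(table["BOS"]))  # Boston now owns the mass at picks 9 through 11
```

The same transfer generalizes to the Atlanta and Sacramento trades by changing the teams and the conveyed slot range.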

Known as the “Stauskas Trade,” back on July 9, 2015 the Sacramento Kings shipped Nik Stauskas, Carl Landry, Jason Thompson, and two future first round picks for the rights to Arturas Gudaitis and Luka Mitrovic. The move for the Kings was essentially to clear cap space for the 2015-16 NBA season in an attempt to acquire Rajon Rondo, Marco Belinelli, and Kosta Koufos. For the future first round draft picks, a series of pick protections were placed on the 2017 and 2018 draft picks. If those protections were satisfied for Sacramento, then Sacramento’s 2019 first round draft pick went to Philadelphia.

In those years, the Kings managed to keep their picks.

Despite this…

…on June 19th, 2017, the Philadelphia 76ers traded their rights to Sacramento’s 2019 first round pick to the Boston Celtics when they made the move from 3rd in the 2017 draft to 1st. It was part of a conditional trade where Boston gained the 2019 Sacramento pick as long as the Los Angeles Lakers’ 2018 draft pick landed between 2nd and 5th. That draft pick landed 10th, and Boston became the owner of Sacramento’s 2019 draft pick, protected at number one.

To this end, we code this trade as:

Applying these trades as a Python script, we are able to generate the probabilities for every team in the draft of obtaining a lottery pick:

Here, we see Sacramento is completely wiped off the map. Here we also see the updated probabilities for Atlanta as well as the illustrated potential of Memphis possibly losing their pick.

This year is a relatively straightforward year when it comes to lottery trades. But at least we now know how to handle them within our code, as we can visually see everyone’s probabilities. Come May 14th, you now know the true probabilities for your team.

Over the recent year or so, I’ve been contacted by two NBA Analytics team Directors about this particular problem: constructing NBA lottery probabilities. The reason is this: both teams used this problem as an applicant test problem to better understand the applicant’s thought process and coding capabilities. In both instances, reviewers noticed an all-too-eerie duplication in vastly different applicants. The reason? **Code was copied from here and passed off as their own**. Both times I was given evidence. Not cool.

The purpose of this site is to introduce concepts and some basic coding principles to help folks learn **the basics**. Posts with code are meant for folks with remedial-or-beginner capabilities in coding to give them a nudge in testing out ideas on their own. Posts without code are for the more sophisticated readers to understand the thought process and theory; even to just open a small discussion.

However, if this trend continues, the amount of code that appears on the site or becomes available by other means will start to disappear rapidly. So, for benefit of the people that enjoy this site, just **be cool** and **do it on your own**.

So let’s break down what makes a -3.5 rating…

Recall that net rating is calculated by
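With points per possession stretched to per-100 terms, the formula (reconstructed from the definitions in the text) is:

```latex
\mathrm{NetRtg} \;=\; \mathrm{ORtg} - \mathrm{DRtg} \;=\; 100\times\frac{\text{Points Scored}}{\text{Off. Possessions}} \;-\; 100\times\frac{\text{Points Allowed}}{\text{Def. Possessions}}
```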

This is just the difference of offensive and defensive ratings. This is merely a linear stretching of **points per possession** to per 100 possessions, to give the effect of **if these players played a whole game at this uniform consistency**. And that’s okay; it’s mainly there for readers to digest the information in an easier manner.

Rarely does a **rotation** play more of one type of possession over another; particularly within a four game series. For starters, we typically see three-to-four **stints** per game for a starting rotation. Take that over 4 games, and we expect the starters to play **12-16 stints**. Therefore, at its worst, the possession difference would be 32 possessions. In reality, it’s much closer to zero.

Using these facts, we can begin to construct what a -3.5 rating really means: a differential of **-.035 points per possession**. What does this number actually mean? It means that **every 28 possessions played, the Boston starters needed an extra offensive possession to match what their defense was giving up**. Does this mean the Boston starters were outscored? Without extra information, possibly.

**Example:** Boston’s starters having 114 offensive possessions to Indiana’s 109 offensive possessions, with a final score of 110 – 109, leads to the starters outscoring their competition while maintaining a **-3.5 net rating**.
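Checking the arithmetic of this hypothetical stat line:

```python
def rating(points, possessions):
    # Points per possession, linearly stretched to per-100 possessions.
    return 100.0 * points / possessions

boston_off = rating(110, 114)  # Boston: 110 points on 114 possessions
boston_def = rating(109, 109)  # Indiana: 109 points on 109 possessions
net = boston_off - boston_def
print(round(net, 1))  # -3.5, even though Boston outscored Indiana 110-109
```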

While this may not be the reality of the Boston starters; the discussion here is to not fall into the trap of **comparing ratings without context**.

A bigger challenge with ratings is the **randomness** of it all. Over the past couple years, different methods of **smoothing** have been used to reduce the noise in ratings. One of the most-used forms is **luck-adjusted rating**. Even this is just a regression methodology at the **zeroth-order level** with a little first-order effects mixed in. Other models such as **Adjusted Plus-Minus** and all of its various add-ons/follow-ons/hierarchical or Bayesian updates/etc. are again just regression methods applied at the **first-order level**. Interaction methods developed by guys like myself or a couple of my past collaborators (and teams) are still just regression methods applied at the **higher-order levels**. The point is, every single one of these methods treats stints as observations and then applies the smoothing at the response level. Every single one of the methods above is a marked improvement over citing raw net ratings, but even they fail at understanding the randomness of an actual stint.

Let’s take a deep look at a single stint from the Boston-Indiana series.

At the start of game three, the Celtics lit up the floor by scoring on 12 of their first 18 possessions to race out to a 29-18 lead. Buoyed by five three point field goals, Boston maintained an offensive rating of **161.11** for their first stint. In contrast, the Pacers came up empty on half their possessions through bad passes and missed field goals, only converting 44% of their possessions into field goals en route to 18 points; an offensive rating of **100.00**. The differential suggests that the Celtics had a net rating of 61.11, indicating the starters were vastly superior to their opponents. A little troubling for a team that ended up at **-3.5** when all was said and done.

Overall, the distribution of points per possession is given as

- Boston Celtics
  - **0 points:** 6 possessions
  - **1 point:** 0 possessions
  - **2 points:** 7 possessions
  - **3 points:** 5 possessions

- Indiana Pacers
  - **0 points:** 9 possessions
  - **1 point:** 1 possession
  - **2 points:** 7 possessions
  - **3 points:** 1 possession

Let’s play a little game with this “training data.”

By supposing the distributions of points scored per possession are as given above by the Celtics-Pacers stint, we can simulate the 18 possession stint over and over to understand the randomness of the data. Of course, we assume there is noise in the above data, so we will apply a basic Bayesian filter for multinomial data. Furthermore, we **won't even apply luck adjustments**, to bias everything we can towards Boston.
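One way to read "basic Bayesian filter for multinomial data" is a flat Dirichlet prior over the four scoring outcomes: add a pseudo-count to each category and renormalize. A minimal sketch, assuming a Dirichlet(1, 1, 1, 1) prior:

```python
# Posterior mean of multinomial probabilities under a flat Dirichlet prior:
# one pseudo-count per scoring category (0, 1, 2, or 3 points), then renormalize.
def dirichlet_posterior_mean(counts, prior=1.0):
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]

boston = [6, 0, 7, 5]   # possessions ending in 0, 1, 2, 3 points
indiana = [9, 1, 7, 1]

print([round(p, 4) for p in dirichlet_posterior_mean(boston)])
# → [0.3182, 0.0455, 0.3636, 0.2727]
print([round(p, 4) for p in dirichlet_posterior_mean(indiana)])
# → [0.4545, 0.0909, 0.3636, 0.0909]
```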

**The idea here is to look at a net rating and understand, given the randomness of scoring, how noisy that rating really is.**

Here, we apply a simple algorithm that samples the distribution of points scored from the multinomial-Dirichlet model trained on the stint that produced the Celtics' +61.11 net rating.

```python
import random

# Dirichlet-smoothed probabilities of scoring 0, 1, 2, or 3 points
# on a possession, taken from the stint above.
p1 = [0.3182, 0.0455, 0.3636, 0.2727]  # Boston
p2 = [0.4545, 0.0909, 0.3636, 0.0909]  # Indiana

def simulate_stint(p, possessions=18):
    score = 0.
    for _ in range(possessions):
        r = random.random()
        if r < p[0]:
            continue                    # no points scored
        elif r < p[0] + p[1]:
            score += 1.
        elif r < 1. - p[3]:
            score += 2.
        else:
            score += 3.
    return score

Games = 1000000
ratings1, ratings2, netRatings = [], [], []
wins = 0.
for i in range(Games):
    score1 = simulate_stint(p1)
    score2 = simulate_stint(p2)
    ratings1.append(100. * score1 / 18.)
    ratings2.append(100. * score2 / 18.)
    netRatings.append(100. * (score1 - score2) / 18.)
    if score2 > score1:                 # count Pacers wins
        wins += 1.

print(wins / Games)
```

Running the simulation, we see that even with this absurd differential, **the Pacers are expected to win more than 5% of these stints!** The probability of a Pacers win under these scoring distributions is **5.2%.** Now this doesn't mean that when Boston posts a +61.11 net rating, the Pacers will win 5% of the time. This means **when Boston plays like a +61.11 net rating team, the Pacers are still expected to win more than 5% of the time**.

Therefore, the net rating doesn't indicate that Boston is 61 points better; it's merely a **symptom** of whatever the true net rating is. In fact, let's take a look at the distribution of offensive ratings:

We see there is significant overlap in the two distributions. In fact, to illustrate the symptom effect described above, Indiana played to a **72.7 offensive rating**, yet latched onto a posted rating of 100.00. Similarly, Boston's distribution of scoring reflects a **131.84 offensive rating** despite the 161 that was posted. What this shows is that the posted ratings are symptomatic of "**luck.**"

**(Note: **For those fully versed in the statistical analysis and the resulting **continuity correction** applied by the Dirichlet-Multinomial model above: luck is being defined as points over/under expectation, inflated at small probability regions. In this case, that's free throws and three point field goals; hence the drops just noted.**)**

The more important takeaway is that the style of play from Boston led to a **larger variance** in play. That is, their ratings have a standard deviation of **28 points**. Compare this to the Pacers' much smaller **20 points**, and we see that ratings follow a **heteroskedastic process**.

With that in mind, we can look at the net ratings for the Boston starters:

What ends up happening is the phenomenon that beats up most regression analyses on ratings: **skewness**. Here, we can actually see the skewness, as the distribution is left-tailed. In fact, due to randomness, we see that a game **with a given true net rating of +61.11** could **produce a net rating of -100**.

The point here is, a **-3.5 net rating **is relatively meaningless. It’s just another descriptive number that needs **a lot more context**. Negative net ratings still produce wins. That’s a problem when trying to understand how well a unit works together.

Furthermore, even if a very high net rating is used as truth, we can still get wildly varying net ratings.

In fact, a former Sloan presenter once told me that **"six possessions are enough to invoke the Central Limit Theorem,"** which I've never seen to be true. Above is yet another example where we even triple that size and still get heavy skewness in the results: using the tests derived at Columbia University, skewness for this sample is **strong**, with a **p-value** of 4.38 x 10^(-29) for **one million samples**.
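As a rough illustration of how such a skewness test works (a sketch of a standard-error-based z test, not necessarily the Columbia-derived test referenced above): under normality, sample skewness has a standard error near sqrt(6/n), so large z-statistics flag skewed samples. The stand-in exponential sample here is my own assumption, used only because it is clearly skewed.

```python
import math
import random

def sample_skewness(xs):
    # Standardized third central moment of a sample.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def skew_z(xs):
    # Under normality, skewness has standard error ~ sqrt(6/n).
    return sample_skewness(xs) / math.sqrt(6.0 / len(xs))

random.seed(0)
sample = [random.expovariate(1.0) for _ in range(5000)]  # clearly skewed stand-in
print(abs(skew_z(sample)) > 3.0)  # → True
```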

Lastly, ratings are heteroskedastic, meaning every regression model reduces noise poorly if heteroskedasticity is not taken into account.

More importantly, the argument is to recognize that **ratings** are **symptoms** of other phenomena. Instead, we should focus on **transactional interactions**, such as **actions and scenarios that feed into points per possession from possession to possession**. This isn't to suggest using a singular points-per-possession number, but rather to develop an artificial-intelligence-based approach to **understanding the decision making process of a collective unit given the state of the gaming system**.

Currently, several teams are approaching this venture. Some of it is built on play-by-play analysis, such as live and dead ball turnovers, thanks to Mike Beuoy and Seth Partnow. Some of it is built on tracking, such as quantifying actions as competing risk models, thanks to Dan Cervone. These are just a handful of the examples in existence, and even they struggle to maintain fidelity to the game; a fact of the ever-changing landscape of how points are scored.

Until we are able to represent the **stochastic partial differential equation** that defines basketball, we are left nibbling at its edges with summary statistics, regression models, and partial “solutions.” And that’s okay for now.

Just remember that a 61.11 positive net rating match-up is expected to lose over 5% of the time.


However, for the uninitiated, scoring is not simply putting the ball in the basket. It's about getting your team to convert as many points per possession as possible. It's a reason **Steve Nash** was a back-to-back MVP. It's a reason many people remember **Magic Johnson** as an elite scorer; he really wasn't, as he was a sub-20-PPG player for most of his career. And if you've been paying attention this year, it's also why **Ben Simmons** and **Giannis Antetokounmpo** are such scoring threats despite rarely (or never) making a three point attempt during the season.

Once we plot their **spatial scoring distributions**, we immediately see the scoring value of a player by looking beyond "where they make their shots." From here, we look into how many points a player **contributes** to their team within a game. At a cursory level, we take a step back from "deeper" approaches such as the **credit sharing** techniques that Dean Oliver or I have shared in the past, and simply look at the accumulated points from scoring and assists within a game.

**Note: **By looking at **points + assisted points**, we will "score" more points than a team does in a game. In this, we intentionally take a step back from **point splitting**.

In something I have called **points responsible for (PRF)** since the mid-80's, we simply add a player's points scored and assisted points scored. The term comes from my days as a kid watching the **Showtime Lakers** with **Chick Hearn** and **Stu Lantz** commentating the games on television. During one game, Magic Johnson had contributed to a series of baskets, causing Chick Hearn to say (as I most likely faultily remember) **"Another basket by Magic. He's contributed to [x-amount] points over their last [y-points scored]."** And it was his points scored plus his assisted points. Ever since then I, in a nuanced fashion and to confused responses, would tell people in games how many points they contributed.

By counting the contribution of a player through PRF, we start to understand how many points a player is really scoring. Consider it a simpler, poor man's version of **offensive rating**. And in keeping with traditional statistics, such as points, we can start to look at the same question we introduced at the beginning of this article: **who are the greatest single game scorers?**

Recall that since the introduction of the three point line nearly 40 years ago, not all assists are created equal. To this end, we cannot simply count how many assists a player has and multiply by a constant. **Pre-1980, this is easy!** Post-1979, this is much more difficult.

Using play-by-play, this is rather simple. We can look at every basket made and look at the assist tag. Using Python, we can simply plow through each action in the play-by-play and hold the results in a dictionary.

```python
# Accumulate, per player, the vector [2PM pts, 3PM pts, FTM pts, A-2PM pts, A-3PM pts]
# along with player and team point totals from play-by-play rows.
for index, row in df.iterrows():
    if row['event_type'] == 'shot':
        player = row['player'] + ',' + row['team']
        if player not in playersTemp:
            playersTemp[player] = [0., 0., 0., 0., 0.]
            if np.sum(playersTemp[player]) > 0.:
                # Sanity check that a fresh entry starts the game empty.
                print('MASSIVE ERROR: PLAYERS TEMP FILE CAME INTO THE GAME NONEMPTY!!!',
                      player, playersTemp[player])
        if row['points'] > 2.:
            # Three point field goal made.
            if str(row['assist']) != 'nan':
                # Assisted three: the passer earns 3 points of A-3PM.
                assistMan = str(row['assist']) + ',' + row['team']
                if assistMan not in playersTemp:
                    playersTemp[assistMan] = [0., 0., 0., 0., 0.]
                playersTemp[assistMan] = [x + y for x, y in zip(playersTemp[assistMan], [0., 0., 0., 0., 3.])]
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 3., 0., 0., 0.])]
                if assistMan not in playerPTS:
                    playerPTS[assistMan] = row['points']
                else:
                    playerPTS[assistMan] += row['points']
            else:
                # Unassisted three.
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 3., 0., 0., 0.])]
            if player not in playerPTS:
                playerPTS[player] = row['points']
            else:
                playerPTS[player] += row['points']
            teamPTS = buildTeam(row['team'], row['points'], teamPTS)
        elif row['points'] > 1.:
            # Two point field goal made.
            if str(row['assist']) != 'nan':
                # Assisted two: the passer earns 2 points of A-2PM.
                assistMan = str(row['assist']) + ',' + row['team']
                if assistMan not in playersTemp:
                    playersTemp[assistMan] = [0., 0., 0., 0., 0.]
                playersTemp[assistMan] = [x + y for x, y in zip(playersTemp[assistMan], [0., 0., 0., 2., 0.])]
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [2., 0., 0., 0., 0.])]
                if assistMan not in playerPTS:
                    playerPTS[assistMan] = row['points']
                else:
                    playerPTS[assistMan] += row['points']
            else:
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [2., 0., 0., 0., 0.])]
            if player not in playerPTS:
                playerPTS[player] = row['points']
            else:
                playerPTS[player] += row['points']
            teamPTS = buildTeam(row['team'], row['points'], teamPTS)
        elif float(row['points']) > 0.:
            # Free throw made (tagged as a shot event).
            playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 0., 1., 0., 0.])]
            if player not in playerPTS:
                playerPTS[player] = row['points']
            else:
                playerPTS[player] += row['points']
            teamPTS = buildTeam(row['team'], row['points'], teamPTS)
    if row['event_type'] == 'free throw':
        player = str(row['player']) + ',' + str(row['team'])
        if player not in playersTemp:
            playersTemp[player] = [0., 0., 0., 0., 0.]
        if float(row['points']) > 0.:
            playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 0., 1., 0., 0.])]
            if player not in playerPTS:
                playerPTS[player] = row['points']
            else:
                playerPTS[player] += row['points']
            teamPTS = buildTeam(row['team'], row['points'], teamPTS)
```

And to this end, we can easily identify how much PRF Kobe Bryant had in his 81 point game and Devin Booker had in his 70 point game.

**Kobe Bryant: 86 PRF**
- 81 points
  - 42 points on 2PM
  - 21 points on 3PM
  - 18 points on FTM
- 5 assisted points
  - 2 points on A-2PM
  - 3 points on A-3PM

**Devin Booker: 83 PRF**
- 70 points
  - 34 points on 2PM
  - 12 points on 3PM
  - 24 points on FTM
- 13 assisted points
  - 10 points on A-2PM
  - 3 points on A-3PM

And we see that both players contributed to slightly over 80 points. We also see that Kobe Bryant contributed to over **seventy percent **of his team’s scoring in the Toronto game. To this end, we can show Kobe’s scoring chart:

In the play-by-play era, both Kobe and Devin are the points “darlings” of the league. But who has the **highest PRF? **Not **Russell Westbrook** or **LeBron James**. Instead it’s…

Since 2004, James Harden has posted some of the highest PRF games. While Kobe Bryant constructed an 86 point effort, James Harden is the **only player in the play-by-play era** to wrangle **multiple 90-point PRF games**.

On December 31, 2016, James Harden posted one of the most singular games in the history of the league with a **53 point, 42 assisted points** performance (95 PRF) in a 129-122 win over the New York Knicks. This resulted in Harden having a hand in **over 73% of the Rockets' points**.

Just under a year later, Harden posted the second-best PRF total in the play-by-play era with a 91 point effort against the Utah Jazz on November 5, 2017. During this game, Harden dropped **56 points** while contributing to **35 points** via the assist. With the 137-110 victory over the Jazz, Harden contributed to **two-thirds of the team's points**.

However, before play-by-play, we have the added challenge of figuring out how many threes were assisted in the seasons between 1980 and the play-by-play years. To this end, we abandon **Python** in favor of **R** and a package developed by **Alex Bresler** called **nbastatR**. Using nbastatR, we are able to leverage the NBA API to pull **game logs** from every season dating back to 1947.

```r
for (season in seq(1947, 1997)){
  if (season < 1980){
    multiplier <- 2.0
  } else {
    # Set the multiplier to 3 because it's the maximum possible;
    # there's no play-by-play being leveraged here.
    multiplier <- 3.0
  }
  print(season)
  gamesPlayedByPlayer <- game_logs(seasons = season, league = "NBA",
                                   result_types = "player",
                                   season_types = "Regular Season")
  if ("ast" %in% colnames(gamesPlayedByPlayer)){
    cat("YES! ASSISTS ARE HERE!\n")
  } else {
    cat("NAH.... ASSISTS ARE MISSING!\n")
  }
  prf <- multiplier * gamesPlayedByPlayer$ast + gamesPlayedByPlayer$pts
  highPRF <- prf > 79.
  highPRFindices <- which(highPRF)
  print(highPRFindices)
}
```

The above code computes true PRF for all players pre-1980 and provides an **upper bound** for all players between the 1980 NBA season and the play-by-play era. One challenge we run into is that the NBA **does not post assist totals for several seasons…**

Here we see that the NBA stats don't have assist totals for several games. In fact, there are 14 seasons missing assists entirely, including **Bob Cousy's then-record for most assists in a game** (28). Similarly, for seasons that do report assists, some games have no assists included, such as **Wilt Chamberlain's** December 8th, 1961 game. That's alright; we can scan through **Basketball Reference** using the list of missing seasons. For these seasons, it's straightforward: walk through all game logs and compute **PTS + 2*AST**. After all, there was no three point line back then.
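The pre-1980 computation fits in a line of Python (mirroring the R loop's multiplier of 2). The 30 point, 20 assist line in the example is hypothetical, for illustration only:

```python
# Pre-1980: no three point line, so every assisted basket is worth exactly
# two points and PRF is exact from box score totals.
def prf_pre_1980(pts, ast):
    return pts + 2 * ast

# Hypothetical stat line: 30 points, 20 assists.
print(prf_pre_1980(30, 20))  # → 70
```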

Finally, we have to identify non-play-by-play assisted point totals. First, we **assume all assists are three point assists**. By doing this, we obtain an upper bound for all PRF totals. If a player game doesn't hit the threshold we like, it is dropped. From there, we can whittle down by looking at the **box score** and computing the values **diff3AST** and **diff2AST**, which are created off the **difference** between the **player's made threes** and the **team's made threes**. This becomes encapsulated in the code block:

```r
if (season > 1979){
  print(index)
  print(gamesPlayedByPlayer[index,]$namePlayer)
  print(gamesPlayedByPlayer[index,]$idGame)
  boxScores <- box_scores(gamesPlayedByPlayer[index,]$idGame,
                          box_score_types = c("Traditional"))
  for (teamNum in seq(1, 2)){
    if (boxScores$dataBoxScore[[2]][teamNum,]$slugTeam == gamesPlayedByPlayer[index,]$slugTeam){
      # Threes made by teammates: the most threes our player could have assisted.
      tot3PossAST <- boxScores$dataBoxScore[[2]][teamNum,]$fg3m - gamesPlayedByPlayer[index,]$fg3m
      # Assists that must have been on two point field goals.
      diff2AST <- gamesPlayedByPlayer[index,]$ast - tot3PossAST
      if (diff2AST < 0.){
        diff2AST <- 0.
        diff3AST <- gamesPlayedByPlayer[index,]$ast
      } else {
        diff3AST <- gamesPlayedByPlayer[index,]$ast - diff2AST
      }
      astPTS <- 3.*diff3AST + 2.*diff2AST
      prf <- astPTS + gamesPlayedByPlayer[index,]$pts
      print(prf)
    }
  }
}
```

The idea is simple. Suppose a player has **10 assists** and we are interested in an 80 point threshold. Similarly, they made **2 3PM** while their team made **10 3PM**. Then the player **cannot have assisted more than 8 threes!** Suppose the player scored 55 points. Then their maximum possible PRF is 55 + 28 for 83 points. If this happens, we must scan the game footage (if it exists) to see where all the threes happened. If **four are not assisted by our player**, they fall off the list.

Similarly, if that player only had 51 points, their maximum PRF is 79 and they cannot be a member of the **80-Point Club**.
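The bound in the worked example can be sketched as a small helper (a sketch; the function and variable names are mine, not nbastatR's):

```python
# Lower and upper bounds on PRF without play-by-play: a player cannot
# assist on more threes than his teammates made.
def prf_bounds(pts, ast, team_3pm, player_3pm):
    max_3ast = min(ast, team_3pm - player_3pm)  # most assists that could be on threes
    guaranteed = pts + 2 * ast                  # every assist treated as a two
    maximum = pts + 3 * max_3ast + 2 * (ast - max_3ast)
    return guaranteed, maximum

# The worked example above: 55 points, 10 assists, 2 of the team's 10 threes.
print(prf_bounds(55, 10, 10, 2))  # → (75, 83)
# Jason Kidd's 20 point, 25 assist game, with 3 of Dallas' 14 threes:
print(prf_bounds(20, 25, 14, 3))  # → (70, 81)
```

The second call reproduces the "guaranteed 70 PRF, maximum possible 81 PRF" entry in the list above.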

In this case, as seen in the R code above, we also leverage nbastatR's **box_scores** function.

Since 80 points is rare air when it comes to personally scoring 80 points (Wilt and Kobe), we set 80 points as the PRF threshold. In this case, we obtain **five** questionable players when it comes to the **three-point differential** in calculating PRF. These five players:

- Scott Skiles
- December 30, 1990 vs. Denver
- 22 points scored, 30 assists, 2 possible 3P-AST
- Guaranteed 82 PRF, maximum possible 84 PRF

- Michael Jordan
- March 28, 1990 vs. Cleveland
- 69 points scored, 6 assists, 1 possible 3P-AST
- Guaranteed 81 PRF, maximum possible 82 PRF

- David Robinson
- April 24, 1994 vs. LA Clippers
- 71 points scored, 5 assists, 1 possible 3P-AST
- Guaranteed 81 PRF, maximum possible 82 PRF

- Tim Hardaway
- April 25, 1993 vs. Seattle
- 41 points scored, 18 assists, 3 possible 3P-AST
- Guaranteed 77 PRF, maximum possible 80 PRF

- Jason Kidd
- February 2nd, 1996 vs. Utah
- 20 points scored, 25 assists, 11 possible 3P-AST
- Guaranteed 70 PRF, maximum possible 81 PRF

And that’s it. Thank you, nbastatR! We find three players immediately qualify for the 80 Point Club, but we’d like to settle their actual scores. And we have two questionable players who would barely crack the bottom of the club. To find the truth, we seek **game footage**.

Skiles’ 30 assist game is not only a record setter for the then-lowly Orlando Magic, it also gave Skiles entry into the 80 Point Club. Thanks to YouTube:

We can view every single one of Skiles’ assists. In the game, there are only two possible assisted threes, and Skiles indeed picks one of them up, to **Dennis Scott**, securing him with **83 PRF**.

Jordan’s highest scoring game came with one possible assisted three. That would be to **Charles Davis**. Thanks again to YouTube, we are able to see Charles Davis’ three, Chicago’s only non-Michael-Jordan three point field goal:

And we see it was assisted by **Stacey King**. Therefore Jordan stays put at **81 PRF**.

In a double-overtime affair against the Utah Jazz, Jason Kidd set a personal best early in his career with 25 assists. With Dallas hitting a then-remarkable 14 three point field goals on a television-announcer-flabbergasting 32 attempts, Kidd was on the hook for 11 possible assisted threes, as he made 3 of the 14. Thanks once again to YouTube:

We see that Jason Kidd assisted on effectively all but three of the made threes: **David Wood** hit two three point field goals, both with Kidd on the court, but both assisted by **Jim Jackson**. Also, **George McCloud** razzle-dazzled for a step-back three in overtime. These knock Kidd well below the 80 Point Club limit.

Unfortunately, we could not find footage for either the David Robinson or the Tim Hardaway game. In Robinson’s case, there’s only one missing three point field goal, taken by 80 Point Club flirt **Sleepy Floyd** (who had several PRF > 70 games in the 1980’s).

Similarly, Tim Hardaway’s game on April 25, 1993 against the Seattle Supersonics appears to be a mystery. Despite this, there are a total of **three 3PM** that could have been assisted by Hardaway, all belonging to **Latrell Sprewell**. To this end, Hardaway sits at 77 PRF and will not be included in the 80 Point Club until proper verification is provided.

Finally, onto the club:

Since 1947, there have been only **31 confirmed cases** of players hitting 80+ PRF in a game. Despite the 31 cases, there are only **20 players** who have reached the 80 PRF mark. As indicated earlier, the king of PRF is **James Harden**, who has hit this threshold **6 times!**

The next most frequent players on the list are **Oscar Robertson** and **Russell Westbrook**, with **three entries** apiece. A curious note about Westbrook is that he is **the only player to hit 80 PRF in a playoff game.** Only one. Ever.

If you’re keeping track, this means there are only **two players** remaining with **two 80+ PRF** games. Those two players are **Wilt Chamberlain** and **LeBron James**. That’s it.

More strikingly, Magic Johnson never reached 80 PRF despite repeatedly hitting 75 throughout the 1980’s. Larry Bird? Not close. Michael Jordan? Once. And he needed 69 points. Kobe Bryant? Same, but with 81 points.

To date, here’s the list of verified 80 Point Club Games:

By looking at the list, we see a couple of reflections. First, we see the “high scoring”-“low scoring”-“high scoring” phenomenon of the league, from the early years through the muddling 80’s and 90’s to today’s three point revolution. Second, we see the exponential increase due to the three point revolution in the league through its stars. Graphically, these reflections are viewed this way:

Using the graph, we see **Oscar, Wilt, and Cousy** dominating the 60’s. Throughout the 70’s and into the early 90’s, we have special cases by singular players: Jordan’s 69 point game and Skiles’ 30 assist game for the 90’s; Pistol Pete, Rick Barry, and Nate Archibald buoying the 70’s with their singular performances.

We also see the death of the 80 Point Club in the late 90’s and early 2000’s as games ground to low scoring affairs. Then, as the three point revolution has taken off over the last 4-5 years, we see the number of 80 Point Games explode, thanks primarily to Harden and Westbrook.

As players continue to adapt to the three point line, we will start to see this list expand and eventually have to ditch the 80 Point Club in favor of a more exclusive club. Possibly a 100 Point Club? If so… how soon will it be until we have three members?

To start, let’s first identify how points are scored. In a single game, only **free throws**, **two-point field goals**, and **three-point field goals** can score points. Therefore the basic points model is given by
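The equation (originally an image) can be reconstructed from the sentence above; in LaTeX notation:

```latex
\text{PTS} = 1 \cdot \text{FTM} + 2 \cdot \text{2PM} + 3 \cdot \text{3PM}
```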

On offense, our goal is to increase **all three** categories. Conversely, on defense, our goal is to reduce **all three categories**. Using this formula for points scored leads to uninspiring models; lots of information is left on the cutting room floor. However, we can identify the **Madden Equation**
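Given how the equation is read in the following paragraph, the Madden Equation is presumably the indicator relationship, in LaTeX notation:

```latex
\text{Win} = \mathbb{1}\left\{ \text{PTS}_{\text{offense}} > \text{PTS}_{\text{defense}} \right\}
```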

For the uninitiated, this equation is read as **a win equals a situation where an offense scores more than a defense**. In other words, “**the team who scores more than their opponent usually wins**.”

Ultimately, we become interested in increasing our number of FTM, 2PM, 3PM while decreasing our opponents’ FTM, 2PM, and 3PM. Or do we?

Nearly a decade ago, Kirk Goldsberry (then of Grantland fame) wrote about the **frequency** and **efficiency** of teams and players. It wasn’t a new idea, but it was one of the first times it was explicitly put into practice. While phrases such as “limiting opponents’ attempts” have been around for decades, the points model could be explicitly written as
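A reconstruction of the missing equation, consistent with the multiply-by-one trick, in LaTeX notation:

```latex
\text{PTS} = \text{FTA}\cdot\frac{\text{FTM}}{\text{FTA}}\cdot 1
           + \text{2PA}\cdot\frac{\text{2PM}}{\text{2PA}}\cdot 2
           + \text{3PA}\cdot\frac{\text{3PM}}{\text{3PA}}\cdot 3
```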

The trick here is that we multiplied by one and applied the commutative law of products. Voilà! **Frequency** and **efficiency**.

Now we can interpret the points model as gaining or limiting attempts (**frequency**) while increasing or decreasing field goal or free throw percentages (**efficiency**).

While this is a much more helpful “model” when it comes to understanding points, it’s still light on intelligence. Thankfully, there are many ways to branch from here. Let’s start with a little bit of **Four Factors.**

In 2002, Dean Oliver introduced the idea of the **Four Factors** to the world. Within the four factors, **Effective Field Goal Percentage** and **Free Throw Rate** were identified as key components for understanding team success. We can obtain the relationship of these to **points scored** by again introducing the multiply by one trick:
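One reconstruction of the missing equation, consistent with the text (eFG% and free throw rate as multipliers on FGA), in LaTeX notation:

```latex
\text{PTS} = \text{FGA}\left( 2\cdot\text{eFG\%}
           + \frac{\text{FTA}}{\text{FGA}}\cdot\text{FT\%} \right),
\qquad \text{eFG\%} = \frac{\text{FGM} + 0.5\cdot\text{3PM}}{\text{FGA}}
```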

In this case, we obtain points as a **multiplier** on FGA. There is no marked difference between this model and the Goldsberry model above, other than the follow-along analysis, which we will eventually get to below. Before that, a few more representations of the model.

The zonal model builds off of the Goldsberry model above by introducing **spatial components** into the attempts. Using this model, we determine ahead of time which locations, called **bins**, we would like to aggregate attempts into. The traditional model used **seven regions**: 2 corner three slots, the rim, the paint, non-paint 2-pt FGA, and above-the-break left/right 3PA. As of this morning, NBA stats uses a total of **fifteen** spatial locations:

Writing out the model for this becomes tedious. Instead, we use **summation notation** to help condense the points scored model. We label **S_2** as zones within the 2-point range and **S_3** as zones within the 3-point range. By using the notation **“i in S_2”**, we are simply taking **bin “i”** from **S_2**. Understanding this, we have the zonal model of
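A reconstruction of the missing zonal equation in LaTeX notation, applying the frequency-efficiency form per bin (the free throw term is assumed to carry over unchanged from the earlier models):

```latex
\text{PTS} = \sum_{i \in S_2} 2\cdot\text{FGA}_i\cdot\frac{\text{FGM}_i}{\text{FGA}_i}
           + \sum_{i \in S_3} 3\cdot\text{FGA}_i\cdot\frac{\text{FGM}_i}{\text{FGA}_i}
           + \text{FTM}
```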

From this model, we can start to ask **where** players are taking (and making) their field goal attempts. Using this block model, we can discretize field goal locations. It’s simple to understand and helps to quickly tease out locations.

Building off of the spatial model, if we apply **the counting measure** instead of a discrete binning structure, we obtain the traditional shot chart. The chart is a “well, duh,” but the mathematics behind it is much more complicated and allows us to do some fairly wild analysis. Instead of making this a PhD level analysis, we will use the term counting measure to **illustrate** field goal attempts. In reality, we are using the **Lebesgue measure**, and this allows us to immediately develop **random process models**, which are leveraged near-explicitly in Expected Point Values (EPV) and what I’ve called my **“Left-Of-Boom”** Model from 2015.

The idea here is that every field goal attempt is part of a **spatial point process** that occurs with some **random structure**. In doing this, we divide the court into spatial regions again, this time using **D_2** for the region of two point attempts and **D_3** as the region of three point attempts. The model is then written as
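Given the two-integral reading mentioned just afterward, the model presumably takes a form like the following in LaTeX notation, where N is a spatial counting process over made field goal locations (free throws assumed to enter as before):

```latex
\text{PTS} = \int_{D_2} 2 \, dN(s) + \int_{D_3} 3 \, dN(s) + \text{FTM}
```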

Using the two integrals allows us to plot attempts such as these for **Trae Young, Giannis Antetokounmpo,** and **Ben Simmons**.

Immediately, we see that Trae Young is primarily an above-the-break shooter with a tendency to kick to the corners. Similarly, Giannis Antetokounmpo is a left-wing three point shooter, but will primarily attack the rim. Finally, Ben Simmons is not a three point shooter. He gets a **ton** of assists out there, but he’s primarily a paint player.

From here, we would then model points as a process model. The benefit of this is that we can get surgical with our analysis. The drawback is that we need **lots** of data, which we rarely have.

While we’ve primarily focused on **counting** the number of attempts, we also would like to look at **Per Possession Models**. The easiest way to limit points for a team on defense is to reduce the number of possessions. However, looking at the **Madden Model** above, limiting possessions, while indeed reducing points, does not necessarily translate to wins; as the number of offensive possessions decrease as well. Instead we may be more interested in **points per possession**, or (effectively) **ratings**; that is, points per 100 possessions. Since games tend to hover near the 100 possession mark, we will focus on a **ratings model** for scoring.

To build a rating, we simply count the number of possessions and divide that into the total number of points scored. To make the number more palatable, we then multiply by 100, obtaining what we call a **rating**. Applying this to the points model, we obtain
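Dividing the basic points model by possessions and scaling by 100, the ratings model presumably reads, in LaTeX notation:

```latex
\text{Rating} = \frac{100}{\text{POSS}}\left( \text{FTM} + 2\cdot\text{2PM} + 3\cdot\text{3PM} \right)
```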

This can be used in the Goldsberry Frequency-Efficiency model as well:

And immediately, we focus on the likelihood of getting at least one 2PA or 3PA **within a single possession**. More importantly, we begin to introduce, through possession counting, non-scoring factors such as **turnovers** and **missed FGA/FTA with defensive rebounds**. These are situations with potentially **zero points** scored on a possession. In fact, this leads us to the last model we will touch on: the **zero point model**.

The zero point model includes **zero point possessions**. It’s implicitly defined in all the above models, but we intentionally left it out to build up the thought process in modeling points scored. In this model, we simply introduce the **zero point**:

Notice we put an **X** in the model. This is why we left the zero points out of all the above models. The challenge that arises is placing the appropriate features into **X**. Since **X** is flattened by the zero, interpretation of the feature is lost, and hence a **ratings model** becomes much more favorable. Initially, we can think of the obvious: **turnovers**. On a similar accord, **defensive rebounds** lead to zero points for the action.

It is important to note that points can still be scored within a possession that contains a turnover or defensive rebound. Despite this, all of the models above partition these instances into points scored prior to a defensive rebound or turnover.

While these two values are obvious, we are missing **much much more** when it comes to analysis of points. And it’s here where we actually **put action into the term MODEL**.

The term model simply means a **description of a system**. In the above, we described points as a function of field goals, turnovers, and defensive rebounds. In reality, the **system** of the game of basketball is much more, and its impact on points scored is important. For instance, what’s the value of Andre Drummond and Steven Adams to their respective teams? **They are big on offensive rebounds**.

Similarly, what’s the impact of a drive from De’Aaron Fox? From Trae Young? From Lonzo Ball? If we run the Hammer to Patty Mills, how likely is he to score? More importantly, how likely will that shot attempt be available? It’s these questions that begin to drive the components of the models above.

Treating outcomes as observations from a probabilistic model, we can begin to statistically model points scored. Let’s go back to that **counting model**. Let’s treat the Milwaukee Bucks as an opponent. The Bucks still lead the league as of this morning in **percentage of FGA at the rim** with **35.2% of attempts within three feet**. They are second in the league in **percentage of FGA beyond the arc **with **41.8%**. And the Bucks are very effective at the rim (70% – 3rd in the league) despite being pedestrian from beyond the arc (35% – 15th in the league).

Now, we are going to start at a basic level. Anything more would be too complicated for this setting. Let’s just assume that all attempts are generated by some distribution and nothing more. No assists, drives, screens, etc. At the novice level, we’d say FTM, FGM, and TOV all follow a **Poisson **distribution, but that’s if we’re completely disregarding the game of basketball. Instead, let’s **understand the system**.
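To see what the naive independent-Poisson view buys us, here is a minimal simulation sketch. The per-game rates below are made-up assumptions for illustration, not fitted Bucks numbers, and independence between the counts is assumed (which, as we'll see shortly, the data do not support).

```python
# Toy sketch of the "novice" model: treat FTM, 2PM, 3PM (and TOV) as
# independent Poisson counts and simulate points scored per game.
# The rates below are illustrative assumptions, not fitted values.
import numpy as np

rng = np.random.default_rng(0)

rates = {"FTM": 16.0, "2PM": 28.0, "3PM": 15.0}
n_games = 10_000

ftm = rng.poisson(rates["FTM"], n_games)
fg2 = rng.poisson(rates["2PM"], n_games)
fg3 = rng.poisson(rates["3PM"], n_games)

# Points = 1*FTM + 2*2PM + 3*3PM; turnovers contribute zero points.
points = ftm + 2 * fg2 + 3 * fg3

# Under independence, the mean should sit near 16 + 2*28 + 3*15 = 117.
print(points.mean())
```

Note that under this model, the variance structure is fixed by the Poisson assumption; there is no room for the components to trade off against one another.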

The **counting vector** is a vector that contains the number of instances of a component of interest. For our most basic scoring model, these might be **FTM, 2PM, 3PM,** and **TOV**. Then we look at the distribution of each over the course of a game. For our opponent, the Milwaukee Bucks, some examples (FTM, 2PM, 3PM, TOV) are:

- 15, 28, 14, 21
- 13, 27, 17, 17
- 17, 28, 17, 14
- 28, 28, 13, 11
- 8, 30, 19, 17
- 21, 31, 10, 21
- 15, 26, 19, 17
- 22, 32, 9, 14
- 30, 24, 22, 12
- 7, 24, 16, 11
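The correlation structure can be computed directly from counting vectors like these. As a sketch, here is that computation over just the ten games listed above; the correlations quoted below come from the full season, so these toy-sample values will differ.

```python
# Correlation structure of the counting vector (FTM, 2PM, 3PM, TOV)
# using only the ten Bucks games listed above as a toy sample.
import numpy as np

games = np.array([
    [15, 28, 14, 21],
    [13, 27, 17, 17],
    [17, 28, 17, 14],
    [28, 28, 13, 11],
    [ 8, 30, 19, 17],
    [21, 31, 10, 21],
    [15, 26, 19, 17],
    [22, 32,  9, 14],
    [30, 24, 22, 12],
    [ 7, 24, 16, 11],
])  # columns: FTM, 2PM, 3PM, TOV

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(games, rowvar=False)

labels = ["FTM", "2PM", "3PM", "TOV"]
for i in range(4):
    for j in range(i + 1, 4):
        print(f"corr({labels[i]}, {labels[j]}) = {corr[i, j]:+.2f}")
```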

These are the numbers from the Bucks' first 10 games. For all games this season, we end up with a correlation structure that is not overwhelmingly strong, but also not surprising.

Here we see that whenever they make fewer 3PM, they make more 2PM (correlation −.34), and that turnovers are inversely correlated with 3PM and essentially uncorrelated with FTM (correlations of magnitude .18 and .03, respectively). For the 3PM and 2PM relationship, we have a significant enough correlation (p-value < .0005, effect size > 3.0) to suggest that there is an inverse relationship between the two components. Comparing 3PM to turnovers, we see a similar relationship: a p-value of .03 and a significant effect size of over 3.0. This indicates the Bucks turn the ball over more often when 3-point attempts are not falling.

Similarly, there is a weak relationship between 2PM and turnovers. Here we obtain a p-value of .03 but an effect size of only 0.17, which indicates a weak but potentially existent **positive** relationship. This would indicate that the team is more likely to turn the ball over when attacking inside the 3-point line.
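P-values for sample correlations like these can be approximated with the classic Fisher z-transform. A minimal stdlib sketch follows; the `r` and `n` in the example call are illustrative stand-ins, not the exact season data behind the figures quoted above.

```python
# Sketch: test H0 that a Pearson correlation is zero via the Fisher
# z-transform (an approximation; scipy.stats.pearsonr does this more
# carefully).  The example r and n below are illustrative.
import math

def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0, given sample correlation r
    from n paired observations (requires n > 3 and |r| < 1)."""
    z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)             # approximate std. error
    stat = abs(z) / se
    # Two-sided normal tail probability via the complementary error fn.
    return math.erfc(stat / math.sqrt(2))

# Example: a correlation of -0.34 observed over an 82-game season.
p = fisher_z_pvalue(-0.34, 82)
print(round(p, 4))
```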

Finally, all other relationships are weak-to-nonexistent.

What this exercise emphasizes is that there is a **correlative relationship** between the different mechanical parts of the basic model. This is a big deal, as we **cannot assume independence**. Therefore the model, even in the basic case, is:
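One hedged way to write this (a sketch, keeping the coefficients of the counting model above): the counting vector is a mean plus correlated error, and points are a fixed linear function of that vector,

$$
\begin{pmatrix} \text{FTM} \\ \text{2PM} \\ \text{3PM} \\ \text{TOV} \end{pmatrix} = \boldsymbol{\mu} + \boldsymbol{\varepsilon}, \qquad \mathbb{E}[\boldsymbol{\varepsilon}] = \mathbf{0}, \quad \operatorname{Cov}(\boldsymbol{\varepsilon}) = \Sigma,
$$

$$
\text{PTS} = 1\cdot\text{FTM} + 2\cdot\text{2PM} + 3\cdot\text{3PM} + 0\cdot\text{TOV}.
$$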

And here we don't specify the error distribution. Instead, we identify that the error is unbiased with some covariance structure. Therefore, we may be interested in expanding the model to include **interactions**.

Or we try a different approach.

A popular method in basketball analytics is to develop a **conditional** or **hierarchical model**. These models assume that quantities like FGM or FGA are **sub-targets** that are responses of other basketball characteristics such as passes, drives, "openness" for the attempt, and defensive pressure. The most common example is the **Shot Quality** model. In this model, we typically (implicitly or explicitly) model FG% based on **minutes played, distance to closest defender, shot location, number of dribbles taken**, etc. In this case, we can write the distribution of points scored as
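a conditional decomposition, sketched here with $q$ standing in for shot quality (FG%) and $\mathbf{X}$ for the covariates just listed:

$$
P(\text{PTS}) \;=\; \int P(\text{PTS} \mid q)\, P(q \mid \mathbf{X})\, dq .
$$

That is, points are modeled given shot quality, and shot quality is itself modeled given the contextual features.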

We can also begin to derive more complicated models, such as EPV, through the counting-process model. In these cases, we can begin surgically building a hierarchical model that takes the spatial, temporal, and mechanical components of the game and develops a sophisticated model that ultimately goes back to quantifying the **Madden Model**.

The resulting coefficients of these hierarchical components then help us identify the **contribution** a player makes within the scoring model. Want to improve scoring when Lonzo Ball is on the court? We can now measure the impact of pick-and-roll offenses that lead to drives, with an understanding of the personnel on the court.

But be cautious when performing a hierarchical analysis. Small samples will begin to creep in, and borrowing strength becomes critically important. And it's here where the edge is to be gained.
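Borrowing strength can be as simple as shrinking small-sample estimates toward a group mean. Here is a minimal Beta-Binomial shrinkage sketch; the player makes/attempts and the prior pseudo-counts are made-up assumptions for illustration.

```python
# Sketch of "borrowing strength": shrink each player's small-sample
# FG% toward a common prior with a Beta-Binomial posterior mean.
# All numbers below are made up for illustration.
players = {
    "A": (9, 15),     # (makes, attempts) -- tiny sample
    "B": (210, 450),  # large sample, barely moves
    "C": (3, 4),      # 75% raw FG% on only 4 attempts
}

# Prior: treat the group as Beta(alpha, beta).  These pseudo-counts
# (mean 0.46, weight of ~100 shots) are an assumed choice.
alpha, beta = 46.0, 54.0

for name, (makes, att) in players.items():
    raw = makes / att
    shrunk = (makes + alpha) / (att + alpha + beta)
    print(f"{name}: raw {raw:.3f} -> shrunk {shrunk:.3f}")
```

Note the behavior: player C's 75% on four attempts gets pulled hard toward the prior, while player B's 450-attempt estimate barely moves. That asymmetry is exactly the small-sample protection the paragraph above warns is needed.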
