For the sake of argument, let’s suppose that every single red circle in that image is an offensive rebound. Then as one chance is exterminated (through a rebound), a new chance begins in that location; extending the possession. In this case, the shot clock resets to 14 seconds and the play continues. In a post from over a year ago, we looked at where the new chances go after an offensive rebound. You would have seen this Steven Adams plot:

From this Steven Adams plot, you’d see that not only does an offensive rebound create a new chance for the possession, but it creates a new chance for any player on the court. This means a significant amount of second chances generated by a single player are not necessarily put-backs.

But how do we model the number of second chances on a possession? After all, there are only so many ways to terminate a chance: **made Field Goal Attempt (no foul)**, **made x-of-x Free Throw**, **Defensive Rebound**, **Turnover**, **End-of-Period**\*, and **Offensive Rebound**. Of these, only the **Offensive Rebound** extends the possession. Note the asterisk on end-of-period attempts. Typically, a team rebound to the defense is tagged on heaves that would be rebounded after time has expired. Example from the 2018-19 NBA season:

Therefore, to model the amount of second chances, we can focus on **chaining** the probabilities of obtaining offensive rebounds together to obtain the **expected number of second chances** in a possession. To this end, we will start simple and then raise expectations.

To start simple, we introduce the geometric distribution. To give an illustration, let’s consider a toy example. Suppose that you are at a court taking a series of field goal attempts; aka “shooting” at the park. At the end of the period, you decide to “leave on a make” and take a series of three point attempts until you make one to feel good about yourself leaving the court. Finally, let’s suppose you are realistically a 27% three point shooter. The question is then, how many attempts do we expect you to take until you get to leave the court (somewhat) happy?

Assuming that every shot is independent (effectively saying that the hot-hand fallacy is indeed a fallacy) and identically distributed (every three point attempt has the same probability of going in), then we have what is called the **geometric distribution**.

Illustrating its probabilities is simple:

**Probability of Taking 1 3PA:** .27 = **27%**

**Probability of Taking 2 3PA:** (1-.27)*.27 = **19.71%**

**Probability of Taking 3 3PA:** (1-.27)*(1-.27)*.27 = **14.39%**

And so on and so forth…
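The pattern above is just the geometric probability mass function, which we can verify numerically; a minimal sketch, with p = .27 taken from the toy example:

```python
def geom_pmf(k: int, p: float) -> float:
    """Probability the k-th attempt is the first make: (k-1) misses, then a make."""
    return (1 - p) ** (k - 1) * p

p = 0.27  # the toy example's three point percentage
for k in (1, 2, 3):
    print(f"Probability of taking {k} 3PA: {geom_pmf(k, p):.2%}")
```

Running this reproduces the 27%, 19.71%, and 14.39% figures above.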

The way we calculate the probabilities for the number of attempts is to multiply the probability of a miss **until** we hit the final attempt. Note that we always leave (somewhat) happy and therefore the final shot is the only made attempt. Therefore, the probability distribution function is given by

**P(X = k) = (1 – p)^(k – 1) * p**

where **p** is the probability of making the three point attempt. Go ahead and plug in .27 for **p**; you will recreate the probabilities above. Here, **X** is just the random variable: the number of three point attempts taken. Theoretically, we could have the worst luck and take infinitely many attempts. Under this case, taking exactly 20 attempts would amount to roughly .0007 (.07%) probability. Note that when we port this over to possessions, **k** will be the **number of chances**!

But how many attempts are **expected**? To compute this, we just compute the expected value. However, for the uninitiated, there is a bit of “sorcery” in computing this value:

**E[X] = Σ_{k ≥ 1} k (1 – p)^(k – 1) p = 1 / p**

Using the **p = .27** from above, we obtain an expected **3.70** attempts to close out the day. So the next time it takes you approximately 2-7 attempts to close out your session, congratulations! You’re most likely a 27% shooter.
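As a sanity check on the 1/p “sorcery,” a quick Monte Carlo simulation (again assuming the .27 three point percentage from the example) should land near 3.70 attempts:

```python
import random

def attempts_until_make(p: float, rng: random.Random) -> int:
    """Simulate shooting until the first make; return the number of attempts."""
    attempts = 1
    while rng.random() >= p:
        attempts += 1
    return attempts

rng = random.Random(0)  # seeded for repeatability
p = 0.27
n_trials = 200_000
mean_attempts = sum(attempts_until_make(p, rng) for _ in range(n_trials)) / n_trials
print(f"Closed form: {1 / p:.2f} attempts; simulated: {mean_attempts:.2f}")
```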

Applying the geometric distribution to offensive rebounding produces an extra wrinkle. First, not all chances end in rebounds. Recall that offensive rebound percentage (OREB%) is computed as the number of offensive rebounds divided by the number of rebounding opportunities. Hence that **22.9%** for the 2018-19 NBA season is on **rebounds alone**. Therefore, we need to identify the probability a rebound occurs! To this end, we look at the proportion of chances that end in rebounds.

To start, there were a total of 118,396 missed field goal attempts and 6,805 missed final free throw attempts during the 2018-19 NBA season. That accounts for 125,201 potential rebounds, of which 25,454 were offensive rebounds and 85,653 were defensive rebounds. This leaves 14,094 rebounds unaccounted for (11.45 rebounds per game). These are **team rebounds**. For example, the particular game mentioned above contained 20 such rebounds. Simply comparing offensive to defensive rebounds above, we obtain the 22.9% number used on Basketball-Reference.
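The rebound bookkeeping above can be tallied directly; all counts are the 2018-19 season totals quoted in the text:

```python
missed_fga = 118_396
missed_final_fta = 6_805
oreb, dreb = 25_454, 85_653

potential_rebounds = missed_fga + missed_final_fta
team_rebounds = potential_rebounds - oreb - dreb
oreb_pct = oreb / (oreb + dreb)  # the Basketball-Reference style OREB%

print(f"Potential rebounds: {potential_rebounds}")  # 125201
print(f"Team rebounds: {team_rebounds}")            # 14094
print(f"OREB%: {oreb_pct:.1%}")                     # 22.9%
```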

Given the number of possible rebounds, we must also account for non-rebound situations. In this case, we have 101,062 made field goals and 24,803 made final free throw attempts. Note that we also must identify “Plus-1” situations, of which there were 7,419 attempts; 1,788 of those missed. Note that these only take away chances!

Therefore, tallying up the numbers: **Field Goals Made** account for 101,062 chances. **Free Throws Made** account for 24,803 chances. **Turnovers** account for 34,644 chances. **Missed Field Goals** account for 118,396 chances. **Missed Final Free Throws** account for 6,805 chances. Then we must subtract the **Made “Plus-One” Events** of 5,631 chances. Note that this is back-of-the-envelope math; plus-one events include technical fouls, as the goal here is showing how the model would work.

To this end, we end up with this generalized chance-ending probability distribution:

- **Missed Field Goal:** 42.27%
- **Made Field Goal:** 34.07%
- **Turnover:** 12.37%
- **Free Throw Made:** 8.86%
- **Free Throw Missed:** 2.43%

Wow, that actually turned out to be 100% of chances. Note that we eliminated end-of-period chances as they have no bearing on games; heaves are included (but insignificant) above.

From all this, the **probability of extending a possession is 10.24%**: the 44.70% of chances that end in a missed field goal or missed final free throw, multiplied by the 22.9% offensive rebound rate. This, of course, assumes the same 22.9% OREB% on free throw attempts, which (except for Steven Adams) we know is not true. Again, the goal is to show how the model works.

Applying the **geometric distribution** above, we find that for any random possession we expect **1.114** chances; that is, an extra 0.114 second chances per possession. For a game of 100 possessions, this equates to roughly 11 second chance opportunities!
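A minimal sketch tying the tallies together, from the chance-ending distribution to expected chances per possession. As the text warns, the 22.9% OREB% is applied uniformly here only to show the mechanics:

```python
# Chance-ending tallies from the text (2018-19 season, back-of-the-envelope)
made_fg, made_final_ft = 101_062, 24_803
turnovers = 34_644
missed_fg, missed_final_ft = 118_396, 6_805
made_plus_one = 5_631  # subtracted to avoid double counting and-one events

total_chances = (made_fg + made_final_ft + turnovers
                 + missed_fg + missed_final_ft - made_plus_one)

p_miss = (missed_fg + missed_final_ft) / total_chances  # chances producing a rebound
p_extend = p_miss * 0.229                               # uniform OREB% assumption

# Geometric logic: expected chances per possession = 1 / P(terminate)
exp_chances = 1 / (1 - p_extend)
print(f"P(extend possession): {p_extend:.2%}")                   # ~10.24%
print(f"Expected chances per possession: {exp_chances:.3f}")     # ~1.114
print(f"Extra chances per 100 possessions: {100 * (exp_chances - 1):.1f}")
```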

For the example above, we saw that on average, we should expect a team to gather 11 second chance opportunities in a game. Unfortunately, not all teams are equal. For instance, the Chicago Bulls and Milwaukee Bucks tended to eschew offensive rebounds last season whereas the Oklahoma City Thunder tended to hunt them down. To their credit, this is due to a combination of positioning, game planning, and luck of the bounce.

However, that 22.9% is completely unrealistic for free throws. For example, a single season tends to have approximately 600 offensive rebounds on missed FTA’s. With roughly 6,800 missed final free throw attempts, that’s an OREB% of **8.8%**, far below the 22.9%.
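The free throw figure works out the same way; 600 and 6,800 are the approximate season totals quoted above:

```python
ft_oreb, missed_final_fta = 600, 6_800  # approximate season totals from the text
ft_oreb_pct = ft_oreb / missed_final_fta
print(f"OREB% on missed free throws: {ft_oreb_pct:.1%}")  # ~8.8%
```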

Similarly, we can look at the distribution of offensive rebounds off of missed field goal attempts:

Here, we see that shots at the rim push up towards 30-35% for offensive rebounds. The rate dies out dramatically at the midrange (dropping to approximately 12%) before jumping back up to 22-27% around the perimeter. There is a little flare-up of 23% at the 30 foot mark. That’s primarily Damian Lillard’s attempts and Portland reacting accordingly.

Now instead of writing everything out in terms of the geometric distribution, we start with the geometric distribution and swap out the over-generalized **p** parameter of offensive rebounding percentage. Instead, we obtain this form of the “geometric” distribution:

This is the exact same form as the old geometric distribution, but instead of **p** we have **p_{j,j+1} **which is the probability of obtaining a rebound for a chance starting in state **j** and ending in state **j+1**.

For instance, let’s start with **p_{0,1}**. This is the **initial chance**. This probability would be dictated by the start of a chance: rebounded miss, turnover into transition, dead ball. A good place to start understanding the impact of initial chances can be found from Seth Partnow of the Athletic. Hence **p_{0,1} **in this context is the probability of terminating a possession and **1 – p_{0,1} **is the probability of obtaining an offensive rebound!

Continuing on, we can look at the probability of terminating a possession after an offensive rebound. This would be **p_{1,2}**. Note again that **2** here means **second chance**. Furthermore, **p_{2,3}** would govern the chance that follows **2 offensive rebounds**. Again, this probability would be identified by the initial position of the rebound starting a new chance (see the Steven Adams plot above for context).
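One way to sketch the chained version: replace the single extension probability with state-dependent termination probabilities. This is a minimal sketch, and the state-dependent probabilities below are purely illustrative, not measured values:

```python
def expected_chances(terminate_probs, max_chances=200):
    """Expected number of chances in a possession.

    terminate_probs[j] gives p_{j,j+1}: the probability that chance j+1
    terminates the possession (so 1 - p_{j,j+1} is the offensive rebound
    probability). The final entry is reused for deeper states, and the
    infinite sum is truncated once survival is negligible.
    """
    expected, survival = 0.0, 1.0
    for j in range(max_chances):
        expected += survival  # survival = P(possession reaches chance j+1)
        p_term = terminate_probs[min(j, len(terminate_probs) - 1)]
        survival *= 1 - p_term
        if survival < 1e-12:
            break
    return expected

# With a constant termination probability we recover the geometric answer (~1.114):
print(expected_chances([1 - 0.1024]))
# Hypothetical state-dependent probabilities for illustration:
print(expected_chances([0.90, 0.85, 0.80]))
```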

Now, applying the **same** analysis as in the geometric distribution, we find that the Milwaukee Bucks would expect a mere 8 second chance attempts per 100 possessions while the Oklahoma City Thunder would expect 16 second chance attempts! And now we start gaining insight into the baseline model of extending possessions for any given team.

Given this set-up, we can now start asking and answering questions such as “how do teams generate second chance opportunities?” and “how do teams utilize their second chances?” down to “how are we at defending against second chance opportunities?”

For instance, limiting the Thunder to 11 second chance opportunities is a good thing. Against the Chicago Bulls, not so much.

The challenge of offensive rebounding is that there are two antipodal thought processes at war with each other. Do we attempt to extend the possession and give ourselves a second chance, or do we forfeit the second chance opportunity and focus on limiting fast break opportunities?

As a simple proxy, we can look at fast break points against second chance points. While this is not a one-to-one relationship thanks to turnovers, it gives a little insight as to how a team crashes.

Here we see that teams tend to play a more conservative style. Teams such as the New York Knicks lead the league in second chance points per game, while they are 21st in opponent fast break points. This is due in part to two primary factors: the Knicks are currently one of the worst shooting teams (.482 eFG%, 29th / .422 FG%, 30th) and yet have the best offensive rebound percentage (.263, 1st). This equates to not only more chances at obtaining second chance points, but being effective enough at securing those opportunities.

To put this in perspective, the Knicks have had 167 offensive rebounds against 649 missed field goal attempts.

In contrast, the Toronto Raptors currently post the worst offensive rebounding percentage in the league at .171, through a total of 92 offensive rebounds. While the team does shoot significantly better than the Knicks (.455, 15th), the Raptors still don’t pressure the glass as much. In relation, the Raptors have one of the best opponent points off fast breaks numbers in the league at 11.3 (3rd).

Including the net-zero boundary, we see that the New York Knicks and the Sacramento Kings are hedging towards the “Fantastic” region; despite neither of these teams being all that “Fantastic” so far this season. It’s relatively easy to conclude the simple proxy is not indicative of on-court performance.

For instance, Toronto is an 8-4 team despite having the 4th worst differential (-2.8 points per game) between opponent fast break points and second chance points. In fact, there are a total of 18 teams above the “break-even” line. Again, this is due to some mixing of fast breaks initiated by turnovers.

That said, of the 12 teams in the “net-positive” region, six teams have losing records and six teams have winning records.

But what we start to take away from this simple plot is this: good teams don’t necessarily get second chance points. However, we do see a trend that effectively states that as second chance points increase, opponent fast break points increase.

Running a terrible linear regression on this data, we find a shoddy R^2 of 0.22 (thanks in part to New Orleans, Memphis, New York, and Sacramento) but also a significant positive trend. This is a decent first step in investigating the impact of crashing.

As a simple use case, we take a look at a particular play in the Atlanta Hawks versus Los Angeles Lakers game from last night (17 November 2019). In this play, Trae Young uses a screen from Alex Len in an effort to shake Kentavious Caldwell-Pope as he brings up the ball. Caldwell-Pope jumps the screen, allowing Len to rescreen Pope as Young throws a right-left crossover and pulls up to shoot over a moderate contest from JaVale McGee.

As Young releases the ball, three of Young’s teammates are positioned outside the arc, while Bruno Fernando is slightly inside the arc. As Young takes this 25 foot three point attempt, no Atlanta Hawk player is within twenty feet of the basket. The probability of obtaining such a rebound should be considerably low.

Despite this, Len crashes from 28 feet out. Fernando trails and makes it a mere five feet as his man, Anthony Davis, absorbs Len in the paint. Given the position of Young’s three point attempt, the ball is expected to travel between 2 and 8 feet, slightly to the left of the basket. Here, Davis has attempted to nudge Len out of the play and leave the entire paint open to LeBron James and Danny Green. Davis misses the box on Len, as Len hunts out the left-of-basket rebound.

As the ball falls short, both Evan Turner and Allen Crabbe start to trot back on defense. As they fixate on the ball, they’ve failed to notice that Caldwell-Pope has leaked out on the fast break. As James grabs the rebound, Turner has barely cleared the break in transition defense and Crabbe is aligned with McGee, who is still roughly 72 feet from the Lakers’ basket.

It’s not on camera (it is on Second Spectrum), but James notices that Turner is at the three point line and Fernando is still seven feet from half court, while Caldwell-Pope is ten feet beyond half court. James hits Caldwell-Pope with a full-court pass for an easy dunk.

What this use case shows is the coach’s rebounding nightmare. This is not a commonality, but it encapsulates the conscious decision to rebound or retreat. Most times, it’s a simple decision: shots at close range will already have rebounders jostling for position. It’s difficult to leak out and go unchecked. With 45.1 percent of field goal attempts coming within ten feet, we see that effectively half of possessions already have non-crashing rebounders in place.

To hammer the use case’s final point home, we used 2018-19 NBA data to identify the average distance of a rebound given the distance of a field goal attempt. By plotting the results, we obtain a rather wonky-looking plot.

Here, we see that dunks tend to become rebounds roughly 5-6 feet from the basket. This occurs primarily due to the fact they are dunks, and a defender is typically contesting the dunk. At this point, a secondary player obtains the rebound.

As the shot distance stretches out to 2-3 feet, we obtain contested jumpers, layups, and hook shots. These rebounds again tend to fall to secondary defenders, who are moving within the flow of attacking the basket. The force of a missed shot from this distance is also much less than that of a missed dunk.

After this, the rebounding distance trends downward as field goal attempts push further out into the midrange. This is expected as time of flight increases for these attempts and there are typically rebounders in position in the lane. Also, we will comment on a midrange phenomenon that occurs later on.

Once we get out to three point range, average rebound distance increases. It’s not noted in the plot, but the variance of rebound distance also increases. Where it is effectively 1 foot for close attempts, the value is closer to 2 feet for three point attempts. The rebound distance effect is a duel between the force of the ball hitting the rim and the spacing that typically accompanies three point attempts. That is, fewer rebounders in position.

Using this plot, we begin to understand that crashing is more of a question surrounding three point shooting than it is for midrange attempts.

To better understand the “probability of a rebound” we need to start asking questions such as “where do rebounds go?” In this case, we can build what is called a rebounding plot. Here, we segment a set of field goal attempts from the “same” location and mark where the rebounds fall.

Let’s take, for instance, attempts from a particular 1-foot-by-1-foot box on a three point attempt near the top of the key:

While we see a few rebounds trickle out to the midrange and three point line, the bulk of the rebounds fall within the restricted area, with a bulge towards the left (your right) of the hoop. The sharp, flat jut out to the right of the hoop (your left) comes from short attempts. This mirrors the Trae Young use case, where the first dot outside the paint along the baseline is the position where LeBron James secured the rebound.

Now if we move that shooting position slightly in:

We find the majority of attempts fall short. This is because the majority of these attempts are pull-up jump shots, which tend to fall short when missed. It almost makes no sense to have a weakside crasher in this situation, as over ninety percent of rebounds fall short on the strongside. Crashing this attempt leads to a weakside leak and a potential fast break. It is here we see that a ~20 foot jump shot leads to a short rebound.

As we come into the midrange, we find that shots are peppered about the key:

And as we move even closer in:

Here, we are fighting with floaters, which tend to miss on the opposite side of the hoop as they tend to hit the back of the iron.

Given these examples, we can start building a “probability” of where the rebound will go. Conditioned on player position when the field goal is attempted, this gives us a **prior** probability of how likely a team will rebound the basketball. From there, we can look at the effect of crashing by computing the **posterior** probability of getting a rebound: observing where players end up when the rebound is secured and tallying who secured it.

We can also perform a similar analysis on starting fast breaks by computing the probability of a fast break igniting when a team crashes versus not crashing.

The resulting **odds ratio** then allows us to identify the capability of a team crashing. Running this across the entire league on three point attempts for the 2018-19 NBA season, crashing is actually a **negative return on investment**. Meaning that crashing does indeed tend to ignite fast breaks. And if a team is poor in transition defense, this can spell disaster.
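A hedged sketch of the odds-ratio comparison, using a hypothetical 2x2 table of chances (crash vs. retreat against fast break allowed vs. not); the counts below are invented for illustration only:

```python
def odds_ratio(fb_crash, no_fb_crash, fb_retreat, no_fb_retreat):
    """Odds of allowing a fast break when crashing vs. when retreating."""
    odds_crash = fb_crash / no_fb_crash
    odds_retreat = fb_retreat / no_fb_retreat
    return odds_crash / odds_retreat

# Hypothetical counts: 120 fast breaks allowed on 500 crashed chances,
# 60 fast breaks allowed on 700 retreated chances.
ratio = odds_ratio(120, 380, 60, 640)
print(f"Odds ratio: {ratio:.2f}")  # > 1 means crashing ignites more fast breaks
```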

Across the coaching staffs that I have interacted with over the years, there is one piece of transition defense philosophy that is shared despite having different names. And that is: **get back on defense**.

In particular, no leading transition defensive player should be in the back court when the ball crosses their own three point line. This area has been referred to as **No Man’s Land** and the **Zone of Death**. Basically, these are phrases certain assistant coaches have used to express that it is a poor location to be on defense. I have even heard it called the “Horseshit Area.”

In the Hawks use case from above, this is exactly the problem encountered: Young is the shooter and falling forward on the attempt. Len actively chooses to crash, leaving Caldwell-Pope wide open to leak. Turner and Crabbe fail to get out of No Man’s Land in time.

As James’ pass crosses the three point line, Turner is stuck deep in No Man’s Land and Crabbe (who is not in frame) is still three feet inside the region. By the time the ball crosses half court, Crabbe has managed to take two steps to get to three feet on the correct side of the court, but he’s already too late.

Where this transition defense ultimately breaks down is Allen Crabbe’s indecision about whether to crash or retreat. Not only does his man secure the rebound, but he is caught in transition defense’s No Man’s Land. If Crabbe crashes, he makes the pass from James much more difficult to attempt. It may actually have forced James to put the ball on the ground. If Crabbe doesn’t crash, he has to recognize that he’s gotta do everything to get across half-court, as opposed to tracking the field goal attempt until it’s too late.

One drill that I’ve seen across multiple pro teams is the “transition defense” drill. Of course, that’s a vague name, so let’s break it down. In a 3-on-3 setting, defenders are stationed in semi-typical positions on offense. Offensive players are positioned along the baseline, with a coach holding the ball.

When the coach throws the ball to a random offensive player, the corresponding defensive player must run and touch the baseline as the other two defensive players retreat.

There are three objectives to the defensive retreat. First, the teammates must stop transition. Second, the teammates need to communicate properly. Third, the two primary defenders must **clear No Man’s Land**:

If the defense is unable to do that, the offense is awarded a point on top of what they can score.

Over the last 15 years, we have started to see the decline of offensive rebounding percentages.

While last season saw a bit of an uptick (.229 compared to .223), it’s predominantly crashing from the three point line that has decreased dramatically, bottoming out at .190 last season. This is almost in lock-step with the league trend of retreating, as the remedial math tends to back this up:

If a team makes 35% of their 3PA, and crashing produces a 24% chance of securing a rebound with a following 55% chance of scoring 2 points, but yields a 30% chance of giving up a transition opportunity with an 80% chance of the opponent scoring 2 points, the expected gain of crashing is .65 × (.24 × .55 × 2 – .30 × .80 × 2) = **-.1404 points per attempt**. This seems arbitrarily small, but it suggests that over 30 3PA’s in a game, we will tend to give up an average of 4.2 points if we crash consistently. To put this number into context, 51 of the first 188 games (27%) of the 2019-20 NBA season have ended regulation within 4.2 points.
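The remedial math spelled out; the inputs are the ones quoted in the paragraph, and the .65 factor is the miss rate implied by 35% three point shooting:

```python
miss_rate = 1 - 0.35         # crash/retreat only matters on a missed 3PA
p_oreb_if_crash = 0.24       # chance of securing the rebound when crashing
p_score_after_oreb = 0.55    # chance the extra chance yields 2 points
p_fastbreak_if_crash = 0.30  # chance of surrendering a transition opportunity
p_score_fastbreak = 0.80     # chance the opponent converts it for 2 points

gain = p_oreb_if_crash * p_score_after_oreb * 2
loss = p_fastbreak_if_crash * p_score_fastbreak * 2
expected_gain_per_3pa = miss_rate * (gain - loss)

print(f"Expected gain per 3PA: {expected_gain_per_3pa:.4f}")      # -0.1404
print(f"Over 30 3PA per game: {30 * expected_gain_per_3pa:.1f}")  # -4.2
```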

That said, the lesson is not to retreat on all possessions, but rather to better understand how to tactically crash. That is, design schemes that not only create space for shooting, but also place players in position to rebound. Similarly, design schemes for retreating when appropriate, such as tagging transitioning offensive players and chasing if the tag is too far.

In a game of slim margins, a team cannot afford to let opponents get offsides in transition consistently. Here’s one symptom of such a case. It’s up to the teams to plan accordingly.

For instance, let’s take a look at the New Orleans Pelicans and the Minnesota Timberwolves from last season. Both teams finished with a 111.4 offensive rating and slightly different defensive ratings: 112.6 (Pelicans) to 112.9 (Timberwolves). From a casual level, we’d expect these teams to end up roughly the same in the standings, as they are both from the same conference. And to a degree, that’s effectively what happens. The Timberwolves finished the year at 39-43 while the Pelicans clambered in at a Zion-winning 36-46.

Despite these ratings, we barely have scratched the surface with these two teams. For starters, the Pelicans played in roughly 220 more possessions than the Timberwolves: 8497 to 8279 on offense and 8504 to 8278 on defense. This suggests that the Pelicans played at a faster **pace** than the Timberwolves. And this is indeed the case at the topical level where the average possession for the Pelicans is 13.94 seconds (3rd in the league) to the Timberwolves’ 14.37 seconds per possession (14th in the league).

Remember, these teams have identical offensive ratings. Therefore, combined with adjustments for pace, we should see the same distribution of **potential offensive possession ending categories**: Field Goal Attempts, Free Throw Attempts, and Turnovers. Here, we make the assumption that end of period possessions are negligible as teams tend to have nearly identical amounts of period ending possessions.

In this case, we find that the Pelicans had 140 extra turnovers compared to the Timberwolves, but 74 fewer free throws. That’s an estimated 107 extra possessions from the Pelicans. We would expect, if all extra field goals are misses with defensive rebounds, the Pelicans to have roughly 110 extra FGA’s as a best case scenario for breaking down an offensive possession. Instead, we find that the Pelicans have a measly 80 extra FGA’s. This means we are missing roughly 30 possessions when comparing these teams…

The reason for this is the **chance**.

A **chance** is defined to be a segment of a possession that results in a field goal attempt, a free throw attempt that results in a potential change of possession, or a turnover. It is the segment of time that breaks a possession into actions that result in loose balls (rebounds) or outright changes of possession (out-of-bounds, steals). Every chance, like a possession, has a point value attached to it. And a nice relationship between possessions and chances is given by

**Chances = Possessions + Offensive Rebounds**

From this relationship, we can model possessions as a collection of chances. Using chances, we can start to decompose players into their **roles within a chance**. While we understand that all possessions are not equal, chances are also not equal. Back in May, we took a look at the impact of turnovers on possessions/chances; in both dead ball and live ball situations. Four months later, Seth Partnow took a more in-depth look at the typical “five” categories for ending possessions:

- Live Ball Turnovers (steals)
- Defensive Rebounds on Missed FGA’s
- Dead Ball Situations
- Offensive Rebounds
- Munged Category of FGM’s, FTA’s, and DREBS on FTA’s.

These categories are almost a perfect partitioning of points. Steals lead to zero points on offense. Dead Ball situations are effectively dead ball turnovers with zero points on offense. Defensive Rebounds on Missed FGA’s are zero points on offense. However, Offensive Rebounds are not necessarily pointless chances, as they may come off of missed FTA’s after a basket (and-1) or a missed back-end of FTA’s. We make this note to identify that the category that deserves the most care in analysis of chances is the offensive rebound.

In particular, it is this category quantity that drives the well-known **Second Chance Points** statistic. And it is here that Minnesota “steals” possessions away from New Orleans in the comparison above.

Taking a step back from Seth’s “five” categories, a chance has traditionally been defined through field goal attempts, free throw attempts, and turnovers. Traditionally, from box score data, the number of chances has been represented as

**Chances = FGA + 0.44 * FTA + TOV**

We parse this down as follows: a field goal attempt will lead to one of four results: a **defensive rebound**, an **offensive rebound**, a transfer of possession due to a **made field goal**, or a **free throw attempt due to a foul**. A free throw attempt, obtained either through continuation or non-continuation, is traditionally viewed as having a forty-four percent chance of ending the chance: transferring possession through a **made free throw**, becoming a **defensive rebound**, or staying with the offense through an **offensive rebound**. There are other nuanced situations with free throws, such as the free throw that results in a turnover due to a lane violation on the offense, but we will deem them negligible.
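The traditional box-score chance count can be sketched as a one-liner; 0.44 is the conventional free throw weight described above, and the team totals below are hypothetical:

```python
def chances(fga: int, fta: int, tov: int) -> float:
    """Traditional box-score estimate of chances: FGA + 0.44 * FTA + TOV."""
    return fga + 0.44 * fta + tov

# Hypothetical season totals for illustration:
print(round(chances(7_000, 1_800, 1_200), 2))  # 8992.0
```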

Using this traditional setting, we find that the Timberwolves attempted 9,435 chances compared to the Pelicans’ 9,570; a difference of 135 chances. While this doesn’t explain the full 30 possessions that seem to be missing, we find that the Timberwolves had more offensive rebounds, and therefore more opportunities to score per possession, than the Pelicans.

With the notion of chance outlined above, the real goal of this article is to identify subtle artifacts of team dynamics. For instance, one year, while working for an Eastern Conference team, I was out scouting a college game with another analyst. During the game, the analyst mentioned something about how three 25% usage players cannot coexist on the same team because their usages are too high; there’s only one ball.

I mentioned that usage cannot be added as it’s a ratio. I was told, “Usage is not a ratio. Usage is usage.” It was a very alarming comment to hear, especially from an analyst on a team. But nonetheless, I retorted that all probabilities must sum to one in the end. Unfortunately, I was scoffed at over the notion that probabilities had to add to one…

Despite the pushback, the traditional form of **usage** is a ratio of chances completed by a player to the number of chances possible during the player’s time on the court. The current standard model for usage is given by

**USG(P) = (FGA_P + 0.44 * FTA_P + TOV_P) / (FGA_Pt + 0.44 * FTA_Pt + TOV_Pt)**

There is an abuse of notation here, but we will explain it. The subscript **P** indicates the player of interest, while **Pt** indicates team totals over the time the player is on the court. This means the numerator is the number of chances executed by the player, while the denominator is the number of chances executed by the team while the player is on the court. The resulting value of usage is then the percentage of chances executed by the player, **P**. Commonly this value is multiplied by 100 to help readers understand that it is a percentage.

The above formula is the classic **play-by-play** version of usage. In the **box-score** version, adjustments using minutes played and a factor of five emerge to estimate the denominator. This is exactly the form found on Basketball-Reference.
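A play-by-play flavored sketch of the usage computation under the definition above (player chances over team chances while on court); the counts below are hypothetical:

```python
def usage(player_fga, player_fta, player_tov, team_fga_on, team_fta_on, team_tov_on):
    """Play-by-play usage: chances executed by the player divided by
    team chances while the player was on the court."""
    player_chances = player_fga + 0.44 * player_fta + player_tov
    team_chances = team_fga_on + 0.44 * team_fta_on + team_tov_on
    return player_chances / team_chances

# Hypothetical: a player records 400 FGA, 150 FTA, 80 TOV while the team
# records 1600 FGA, 500 FTA, 300 TOV with them on the court.
print(f"{usage(400, 150, 80, 1600, 500, 300):.1%}")  # 25.8%
```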

What is not so well established is that usage is a **conditional statistic**, dependent on a **sampling frame**. This means usage has to be treated with care when being discussed. Making claims about two 25% usage players is *almost* completely meaningless if they are not sampled using the same sampling frame.

In the end, chances have to be gobbled up by players, and ultimately someone on a team will gobble up over 20% of chances. Let’s take a look at this through a simulation…

Consider a game of 2-on-2 with teams of 4 players. In this case, there are six potential lineups. If players are labeled A, B, C, and D; the lineups are labeled as AB, AC, AD, BC, BD, and CD. Suppose we witnessed 1000 chances played by this team and they have the following breakdown:

- Lineup AB: 300 chances
- Lineup AC: 200 chances
- Lineup AD: 200 chances
- Lineup BC: 100 chances
- Lineup BD: 100 chances
- Lineup CD: 100 chances

Also suppose there is a secret **true underlying usage** for each player. That is, there is a real probability that a given player completes a chance, and these probabilities **must add to one across the team**. Using this true underlying usage, we can then simulate chances and obtain an observed usage value.

Note that from this rotation, A plays in 700 chances, B plays in 500 chances, C plays in 400 chances, and D plays in 400 chances. Running one simulation, we find that the usage for each player is given by **(.801, .404, .410, .1825)**, for players A, B, C, and D, respectively. This is obtained from the usage formula above!

First, we see that this is not a true usage, as the players’ probabilities do not sum to one. Normalizing will not give us a remotely close answer. The normalized answer is **(.446, .225, .228, .101). **

Second, we see that **team usage** is much closer to the truth, but not quite there either. Team usage, being the percentage of **total team** chances, changes the denominator in usage to look at all chances, regardless of whether the player of interest is in the game. In this case, the team usage is **(.561, .202, .164, .073)**. However, we can do better with estimation.
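The 2-on-2 example can be simulated directly. The true usages below are an assumption (the post never reveals them); the point is only that observed usage fails to sum to one while team usage does:

```python
import random

rng = random.Random(7)  # seeded for repeatability

# Assumed true usages (hypothetical; must sum to one across the team)
true_usage = {"A": 0.50, "B": 0.20, "C": 0.20, "D": 0.10}
lineup_chances = {("A", "B"): 300, ("A", "C"): 200, ("A", "D"): 200,
                  ("B", "C"): 100, ("B", "D"): 100, ("C", "D"): 100}

executed = {p: 0 for p in true_usage}
on_court = {p: 0 for p in true_usage}

for (x, y), n in lineup_chances.items():
    # Within a lineup, a chance goes to x with probability p_x / (p_x + p_y)
    px = true_usage[x] / (true_usage[x] + true_usage[y])
    for _ in range(n):
        who = x if rng.random() < px else y
        executed[who] += 1
    on_court[x] += n
    on_court[y] += n

total = sum(lineup_chances.values())
observed = {p: executed[p] / on_court[p] for p in true_usage}
team_usage = {p: executed[p] / total for p in true_usage}

print("Observed usage:", observed)  # does NOT sum to one
print("Team usage:", team_usage)    # sums to exactly one
```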

Since we have a perfect sampling frame, called a **Balanced Incomplete Block Design** (BIBD), we can apply the associated algebra to recover the true usages of each player with respect to their team.

Have you figured out the true usages of each player?

The example above highlights an important distribution in basketball analytics: the **incomplete multinomial distribution**. This distribution descriptively states that while there is a collection of options we can select from, we cannot observe all options simultaneously.

In the case of lineups, we cannot play the entire roster at the same time. We can only select five players. In technical terms, we are looking for the probability that player **Yi** within a lineup **Ci** from a team of players **Ai** will execute the chance:

where the **p**‘s are the players’ **true usage** on the team. In real life, we never know the values of **p**, just as we agonizingly forced upon ourselves in the example above. Therefore, we take our sampling design (lineups) and observed usages at the player and lineup levels and perform an estimation procedure.

The likelihood function for the incomplete multinomial distribution is given by:
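A LaTeX reconstruction, following Dong and Yin's incomplete multinomial formulation (treat the exact notation as an assumption; **S**_(j) denotes the j-th row of **S**):

$$
L(\mathbf{p};\, \mathbf{a}, \mathbf{b}) \;\propto\; \prod_{i=1}^{k} p_i^{a_i} \; \prod_{j=1}^{m} \left( \mathbf{S}_{(j)}\, \mathbf{p} \right)^{b_j}
$$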

where **p** is the vector of true usages for the players on a team, **a** is the vector of observed chances executed by each player, **b** is the variable cell counts associated with the different lineups, and **S** is the matrix indicating the player lineups.

For the example above, we have

We use the designation of the count **b** as being negative due to the offset of **chances – first player**. Yes, the players are ordered in terms of most used to least used.

Attempting to solve for **p** in this distribution is challenging. The maximum likelihood method leads to a series of equations:

Drat, there’s that pesky constraint that the probabilities must sum to one, again. Seriously, however, we have four equations resulting from the MLE problem. Note that the **(i)** term indicates the row vector in **S**. The value, **TAU**, is an auxiliary vector that arises in the computation of the MLE and can therefore be “injected” into the second equation, provided the inverse in the top equation exists.

As there is no analytic solution for the MLE, we perform an optimization. Fanghu Dong and Guosheng Yin propose a numerical method that applies a fixed-point iteration to find the maximum likelihood estimator for the incomplete multinomial distribution. They call this the **Weaver Algorithm**, after the mechanical weaving machine.

In code, with the toy example’s inputs reconstructed (the exact construction of `a`, `b`, and the lineup matrix `delta` is our reconstruction of the example):

```python
import numpy as np

# Inputs from the toy example: delta is the lineup matrix S (rows index
# lineups AB, AC, AD, BC, BD, CD); b holds the negative lineup chance
# counts; a holds the observed uses per player -- a reconstruction.
delta = np.array([[1,1,0,0],[1,0,1,0],[1,0,0,1],
                  [0,1,1,0],[0,1,0,1],[0,0,1,1]], dtype=float)
b = -np.array([300., 200., 200., 100., 100., 100.])
a = np.array([561., 202., 164., 73.])
trueP = np.array([.60, .20, .15, .05])

p = np.ones(4) / 4.
s = np.sum(a) + np.sum(b)
error = 1.
while error > .00001:
    tau = b / np.dot(delta, p)
    temp = a / (s * np.ones(4) - np.dot(delta.T, tau))
    pup = temp / np.sum(temp)
    error = np.dot(p - pup, p - pup)
    p = pup

print('True Usage: ', trueP)
print('Estimate: ', p)
print('Uses: ', a)
```

Using this code above, we obtain estimates of the true usages:

That is, the estimated true usages are **(.587, .185, .154, .073)**, which are much closer to the truth of **(.60, .20, .15, .05)**.

For the 2018-19 NBA season, the Brooklyn Nets had a total of 19 players on roster that led to 637 different lineups of the 11,628 possible lineup combinations. Of these 637 lineups, 439 lineups managed to draw at least one chance. That is, a total of 198 lineups such as

Jared Dudley, Ed Davis, Shabazz Napier, Joe Harris, and Treveon Graham

played together for at least one game and registered zero chances.

Restricting ourselves to all lineups that participated in at least one chance, we find the distribution of chances executed by player:

Here, we see that D’Angelo Russell maintained the most chances for the Nets with 1860 estimated chances. Significantly behind Russell was Spencer Dinwiddie at 1137 chances. Using conditional usage, this is 31.1% for Russell and 24.2% for Dinwiddie.

At the team level, however, these numbers turn out to be much smaller. Running the Weaver algorithm, it turns out that Russell’s overall usage is closer to 24 percent!

We see that Dzanan Musa’s usage gets corrected to reflect his playing time and that, despite only taking 19% of Brooklyn’s overall chances, D’Angelo Russell doesn’t tumble down towards 19%. Instead he “corrects” to 24.16%.

Using the unbalanced incomplete block design, we obtain an estimated 31.4% usage across all lineups, not far off from the measured 31.1% as before. Therefore, recovery using the sampling frame shows that Russell’s “scheduling” as a 24.16% player would indeed result in a 31% usage player.

We can compare the Nets to the Denver Nuggets. The Nuggets played a total of 18 players during the 2018-19 season and used far fewer lineups that resulted in chances: 329.

The Nuggets are used as a contrast only to show what a “top-heavy” team does with respect to true usage:

In this case, we see the Nuggets primarily use Jamal Murray and Nikola Jokic. This is no surprise. Their respective values of 24.9 and 27.4 come down a bit, but the relationship/offset remains relatively the same.

Here we see the comparison of Denver and Brooklyn as usage of players in order of highest to lowest. In this case, we see that despite Denver having the “star power” of Jokic and Murray, they also maintain a stable 7-man rotation; whereas Brooklyn has a 5-man rotation. Typically, teams want to run with 7-8 man rotations.

Brooklyn’s knock on usage comes from the cost of injury, as the team managed to have only 6 players play 65 or more games; a minimum of 80% of the season.

The takeaway here is that we get to use an incomplete sampling frame to begin to understand the underlying value of players within a system. A significant challenge for this algorithm, however, is the aspect of injury.

Under this model, the rotations are assumed to be at the discretion of the coach. However, a player may not play due to injury. Therefore a much more advanced model is needed. That, in turn, is called the **censored incomplete multinomial** model. But that’s for another day.

The grand challenge is the ability to adequately measure a player’s basketball IQ. Instead, we focus on components such as court vision. A player may have wonderful court vision but limited mechanical ability (compared to their counterparts) to score. Likewise, some players may be physical beasts who can devastate the competition without understanding the value of the pass; like Wilt Chamberlain before he was encouraged to pass more and went on to lead the league in assists two seasons later. But how do we measure a player’s court vision?

One method to measure court vision is by **proxy**: the process of taking observable values and applying them to what is agreed upon to be a **sub-task** of the true underlying measurement of interest. We say sub-task as many proxies may be used to create an overall understanding of the quantity of interest.

For example, what makes a “great” defender? We could use a proxy of **steals**, but not all defenders are credited with a steal even if their defense causes it. We could use a proxy of **blocks**, but not all blocks take away possessions (only chances). We could use coverage of a player, but now we have to define that term in a way that people can agree on. Or we can eschew defense and infer it from a higher level through **regression methods**.

For court vision, we focus on the offensive component and look at one of the proxies: **passing directionality**. We choose passing directionality because while it is a very simple item to understand, there is an underlying difficulty that arises when trying to say anything intelligent about it, and we have the **cut locus **to blame.

Passing directionality is the **direction in which a player attempts a pass**. For every pass a player makes, the ball exits with an angle from some **reference frame**.

To gain an understanding of a reference frame, consider an airplane traveling over the surface of the Earth. We care about two horizontal vectors, **East** and **North**. North points out the nose of the plane. East points out the right wing. But we also care about **Up**, which identifies where the ground is below us; a very important thing to know when flying. If, at any time, the ground crosses into **North**, our plane is pointed directly at the ground. Here, North is the principal vector of the reference frame and the angle towards the ground is the **azimuth**.

In terms of passing, since we never know the way a player is facing, we proxy their reference frame by assuming players always want to go to the basket. Therefore, the reference frame always has the **principal vector** facing the basket. In this case, any passes in the direction of the basket will have azimuths between **-90 and 90 degrees**. Any passes away from the basket will have azimuths between **-180 and -90 degrees**, as well as between **90 and 180 degrees**.

Here, we also note that passes to the left of the player are positive angles, **0 to 180 degrees**, while passes to the right of the player are negative angles, **-180 to 0 degrees**.

Now, if we look at a pass from this player’s position, we have two vectors in the **embedded space** of the court. Using the embedded space of the court allows us to identify the angle from the principal vector in the reference frame. This is through the **dot product:**
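In symbols, with θ the angle between the two vectors:

$$
\cos\theta = \frac{\mathbf{P} \cdot \mathbf{Q}}{\lVert \mathbf{P} \rVert \, \lVert \mathbf{Q} \rVert}
$$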

Here, **P** is the reference vector, defined by the location of the basket **(25,5.25)** from the player **(x,y)**. Then **P = (25 – x, 5.25 – y)**. We similarly compute **Q**, the pass vector as the receiving player **(x’,y’)** from the player. Hence **Q = (x’ – x, y’ – y)**.

In code, this is relatively straightforward, with start being the player and end being the receiving teammate:
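The original snippet is not reproduced here, so the following is a minimal sketch of that computation. The function name `pass_azimuth` is ours, and the choice that counterclockwise angles correspond to “left of the player” is an assumption that depends on the court’s coordinate orientation:

```python
import math

def pass_azimuth(start, end, basket=(25.0, 5.25)):
    """Signed angle (degrees) from the player-to-basket vector P to the
    pass vector Q. Positive is taken as the player's left (assumed CCW)."""
    x, y = start
    px, py = basket[0] - x, basket[1] - y   # P = (25 - x, 5.25 - y)
    qx, qy = end[0] - x, end[1] - y         # Q = (x' - x, y' - y)
    dot = px * qx + py * qy                 # |P||Q| cos(theta)
    cross = px * qy - py * qx               # |P||Q| sin(theta)
    return math.degrees(math.atan2(cross, dot))

print(pass_azimuth((25., 30.), (25., 10.)))  # straight at the basket: 0.0
```

Using `atan2` on the cross and dot products gives the signed angle directly, avoiding the sign ambiguity of `acos`.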

Now that we have directions of passes computed, we can start to do some analysis… Unfortunately, we just opened a big can of worms. Namely, passes are no longer **Euclidean**. Instead, they are **spherical data of dimension one**, and computing something as simple as a histogram fails (gives false results).

Using the reference frame approach, we now have a collection of angles. As the angles range from -180 to 180 degrees, we describe a **circle** instead of **Euclidean space**. The key differences are that **ONE:** the differences in pass direction are measured in **angles**, not distances; and **TWO:** we have a cut locus. A cut locus is a place where multiple “straight lines” converge at the same point. Using key difference one, we are saying that a straight line is the **arc length of the circle** defining the direction of the pass. Using the reference frame above, the cut locus is at 180 degrees! This is a player making a pass directly away from the basket.

Knowing that tracking data is not quite deterministic (we can get different measurements for the same player location), we should not rely on the vectors directly, but instead focus on the **probability distribution** of a player’s pass direction. This amounts to computing a density estimate on the circle.

If we perform a naive analysis and apply a straightforward kernel density estimator, the cut locus will give us a probability jump and throw away potentially important data. For instance, if a pass is made at 179 degrees with a reasonable error of three degrees, then we know the pass can be made between (176,180) degrees **AND** (-180,-178) degrees. The usual KDE will ignore the second interval and the resulting interpretation is that the pass simply “disappears.” Unfortunately, passes cannot disappear into the Upside Down.

Instead, we must perform **manifold kernel density estimation** to understand the distribution of passing direction.

The usual kernel density estimator is given by
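In standard form:

$$
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)
$$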

where **n** is the sample size, **h** is the bandwidth, and **K** is the kernel smoothing function. For a given player, we can look at a collection of **n** passes of interest. Each pass is then viewed as a noisy estimate with some measurement error (bandwidth). The resulting kernel function is how that measurement noise is distributed about the measurement.

In classical kernel density estimation, the most common kernel function is the Gaussian smoother:
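That is,

$$
K(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2}
$$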

So we must use an analog version of this with the circle in mind. To this end, we can leverage the **von Mises** distribution:
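Its density can be written as:

$$
f(\theta;\, \mu, \kappa) = \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi\, I_0(\kappa)}
$$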

which only runs over the angles between -180 and 180 degrees. Here, **mu** is the mean direction and **kappa** is the **concentration**. Think of concentration as the inverse of variance: the larger **kappa** is, the tighter the distribution is about the mean. Thanks to the cyclic nature of the cosine function, we also ensure that passes don’t disappear when they cross the cut locus.

The sacrifice we make is that the bandwidth is no longer separable. Under the von Mises distribution, the bandwidth is absorbed into **kappa** and is contained within the **modified Bessel function of the first kind, order zero**, defined by
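One standard integral form:

$$
I_0(\kappa) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{\kappa \cos\theta}\, d\theta
$$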

which ever-so-nicely ensures that our probability distribution integrates to one!

Using this set-up, our circular kernel density estimator is given by
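One common way to write it:

$$
\hat{f}(\theta) = \frac{1}{2\pi n\, I_0(\kappa)} \sum_{i=1}^{n} e^{\kappa \cos(\theta - \theta_i)}
$$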

where now a **larger** value of **kappa** plays the role of a **smaller** bandwidth in the traditional kernel density estimation methodology.
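As a minimal sketch of such an estimator (the function name and inputs are ours; `np.i0` is NumPy’s modified Bessel function of the first kind, order zero):

```python
import numpy as np

def vonmises_kde(grid, samples, kappa):
    """Circular KDE: average a von Mises kernel centered at each observed
    pass direction. All angles are in radians on [-pi, pi)."""
    diffs = grid[:, None] - samples[None, :]
    kernels = np.exp(kappa * np.cos(diffs)) / (2.0 * np.pi * np.i0(kappa))
    return kernels.mean(axis=1)

# Passes at 179 and -179 degrees reinforce each other across the cut locus
grid = np.linspace(-np.pi, np.pi, 720, endpoint=False)
density = vonmises_kde(grid, np.radians([179.0, -179.0]), kappa=25.0)
```

Because the kernel is periodic in θ, mass near +180 degrees spills correctly into −180 degrees instead of vanishing, which is exactly the failure mode of the naive KDE described above.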

As an application test case, let’s look at a subsample of passes from Steven Adams of the Oklahoma City Thunder. Here, we take the position of Adams at every pass, calculate the angle between the pass and the basket at Adams’ position, and mark that as an orange dot. Using zero degrees as the reference frame’s principal vector, we draw the circular manifold in green and apply the kernel density estimator above:

Here we see that Adams primarily makes passes to his front left at approximately 45 degrees and to his front right at approximately -40 degrees (320 degrees on the plot). The green circle represents the manifold which describes the passing direction. Zero degrees always points to the basket. The blue line is the kernel density estimator. This shows that Adams primarily attacks the rim with his passes, but tends to favor his left.

At a high level this is informative; however, we have lost the court information. We don’t know where Adams is making passes. More importantly, we don’t know if his passes are location dependent. For example, does Adams pass differently from the left elbow than from the right elbow?

To understand this, we must compute a conditional distribution.

When we condition the circular kernel density estimate, we begin to see the dependence of the passing directions based on the player position. Steven Adams is not a great example to show this off. Instead let’s take a look at John Wall.

At the left of the top of the arc, we find that Wall primarily passes towards the corner. Note that this doesn’t suggest the passes actually go to the corner; only that they go in that direction. These can be passes leading into a give-and-go with a post player as well. However, we see that at this location, his passes tend to go left and forward at about 80 degrees.

However, if Wall moves in to the foul line, we see the distribution of his passing change to looking in two directions with a slight preference to the right. At a cursory glance, this may be a reaction to a weak-side defender stepping up to cut off potential drives. From the angular point of view, this is most likely a “kick” pass to the weak-side three point line, as the direction points to below the break.

If Wall gets into the lane, we see that his passing goes almost entirely to the right. The small pocket to the left points towards the dunker position, which is most likely a dump pass to get the ball out of congestion. The blip right at the rim is an oop passing lane. However, predominantly (over 75% of the time) the pass is getting kicked out.

Let’s put this all together and simulate a drive by Wall.

Here we see how the “court vision” with respect to passing plays out through the course of a drive. We can now start to perform other methods of analysis to better understand changes in passing vision; such as “do weak side defenders help?” or “If I position a defender in this location…” We can partition the distributions and perform circular distributional tests.

For the remainder of the article, let’s enjoy the subtle differences of players…

While Brogdon is on the Pacers for the 2019-20 season, his data was collected for the 2018-19 season with the Milwaukee Bucks. Here, we see a very Milwaukee-centric style of play.

As he traverses the same path as Wall, the passing vectors go **backwards**, most likely towards Giannis Antetokounmpo and Brook Lopez. But as Brogdon attacks the rim, his passing directions change to the dump pass to the strong-side block and out back to the right wing on the weak side. This was a Bucks staple in exposing the weak-side collapse for open looks at the perimeter.

This type of passing regimen comes from a **distributor** who is not a primary option on scoring, but rather a player that protects the ball and forces the defense to swarm. These players tend to look away from the basket and create mid-range and beyond-arc opportunities.

Jaylen Brown follows a similar pattern to Brogdon with one slight difference: A high-low passing vector emerges during the drive. Boston, notorious for off-ball players beating their defenders baseline, shows that Brown looks for that pass during the drive.

Also notice that the zero vector is almost pinched to absolutely no distributional weight. This is very apparent when Brown gets deep into the lane. This indicates that Brown **is not passing to score**. He is going for the layup. In Brogdon’s case, he is still looking for a dump underneath the hoop, or a dunk from a teammate. In Brown’s case, he’s attacking the rim himself.

The left and right bulges in the distribution within the key are dumps and kick-outs. He tends to look for “at-the-break” players and anyone near the strong-side block. This is the profile of an **attacking guard **within a system.

Looking at one of the biggest names in the game, we see that LeBron James is yet another style of player. James fits the profile of an attacking guard but without a system.

James starts with the standard perimeter passing profile at the top of the key, but as he drives into the lane, he predominantly looks short corner strong side and below-the-break weak side. Once he gets into the lane, he becomes one of the most dangerous players in the league: **uniform attacker**.

This type of distribution shows that all the weight goes just forward. There is a slight bulge towards the strong block, but weak elbow, weak below-the-break, and anywhere near the rim becomes primary options. This suggests that James **reads and reacts** to the defense accordingly.

Once he gets deep into the lane, there are three passing options: strong-side dunker dump, kick-out to the weak side, and alley-oop to the strong-side rim. It is no wonder James averages 8-9 assists per game despite being a premier scorer and playing usual starter minutes (~35 a game).

Steph Curry follows the same form as LeBron James when it comes to attacking the rim. Curry keeps a large distribution in front of him, as opposed to peaking in certain directions.

That is, until he gets into the lane. At this point, he begins to look to the weak-side block. It is at this position that players such as Klay Thompson are cutting behind the defense (Boston-esque in that nature) or players like Andre Iguodala and Shaun Livingston were waiting for the defense to collapse to get potentially open looks at 3-10 feet from the rim.

Compared to Wall above, who almost completely turns into a right-only vision player, we begin to see why Golden State is more likely to beat you from anywhere than the Washington Wizards: half the court disappears on drives in Wall’s case.

Armed with some simple manifold nonparametric learning such as circular kernel density estimation, we can begin to understand some of the vision associated with players. This is merely one small piece of the pie in decision making.

However, we are able to start performing testing of certain player capabilities or schemes. We are also able to “scout” player tendencies and, more importantly, **quantify them**.

Even more importantly, we are able to start attaching probabilities to actions. Instead of quantifying passes by proxies such as “win probability” or “change in shot quality,” we can now quantify the probability of a pass as “how likely will he actually make this pass?” At the scouting level, it tells me where I can make an adjustment.

Plus, the animations are a little cool, too… right?

where **Y** is the vector of offensive ratings, **W** is a diagonal matrix of possessions played by the stint of interest, **X** is the player-stint matrix, and **(sigma, tau)** are the likelihood and prior variance, respectively. Putting this all together, and leveraging conjugate distributions, we find that the posterior distribution is indeed a Gaussian distribution:

From this seemingly tedious calculation, we find that the RAPM estimate for each player is given by
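In LaTeX form, consistent with the description that follows (λ is the ridge weight induced by the prior, λ = σ²/τ):

$$
\hat{\beta} = \left( X^{\mathsf{T}} W X + \lambda I \right)^{-1} X^{\mathsf{T}} W Y
$$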

This is exactly the RAPM estimate that you see on many of those other fancy websites. To put this into the offensive-defensive RAPM context, let’s understand again what this equation is doing. First, we have the offensive rating, **Y**, which is effectively points scored per 100 possessions (not the differential as in some other forms of RAPM). Multiplying by **W** turns the quantity **WY** into “100 times points scored.” Since the design matrix, **X**, is stints by players, the value **XtWY** merely identifies the stints in which offensive players contributed points and defensive players discounted points, and adds such stints together. This quantity is effectively “100 * plus-minus” for each player.

Now, that inverse quantity… The quantity **XtWX** counts the number of possessions each “tandem” has played in. The diagonal elements will be “10 * the number of possessions played,” as there are 10 players on the court. The off-diagonal elements are some multiple of possessions played, where the multiple reflects players who play multiple stints together. In a previous post, we saw that just using this quantity in the inverse led to a mathematically unreasonable solution: reducibility… which led to infinite variances. Here, we rectify this by introducing that prior weight. This is a mechanism that biases the final result but allows us to obtain a reasonable variance on the final estimate. This means the inverse quantity is “inverted” possessions played between teammates. The inverse identifies some extent of the correlation between players playing together (or against each other).

Therefore, the final estimate is in “effective points per 100 possessions given some prior variance **tau**” for each player. In code form, using some pre-determined **tau** (I selected 5000 because that value appears in the literature), we have
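The original code isn’t reproduced here, so the following is a minimal sketch of that computation (the function `rapm` and its inputs are our naming, not the original code; the intercept column is appended last so that `beta[-1]` is the baseline rating, and `lam` stands in for the ratio of likelihood to prior variance):

```python
import numpy as np

def rapm(X, Y, possessions, lam):
    """Ridge estimate beta = (X'WX + lam*I)^(-1) X'WY with an intercept
    column appended last, so beta[-1] is the baseline rating."""
    Xi = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    W = np.diag(possessions)                       # possessions per stint
    A = Xi.T @ W @ Xi + lam * np.eye(Xi.shape[1])
    return np.linalg.solve(A, Xi.T @ W @ Y)
```

For simplicity the intercept is penalized along with the player coefficients here; many implementations leave it unpenalized.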

The value **beta[-1] **is the intercept term, effectively meaning “baseline offensive rating.” Here that value was approximately 98. Courtesy of Ryan Davis, we obtain stint data and run this code to get the following output:

It’s not quite what we find on his website, but they are close. In fact, in his tutorial, Davis has Nurkic as 13th overall and Ingles as 14th overall. Above we have them at 12th and 13th. However, LeBron James is 19th in the above list while he drops to 36th on Davis’ list. Also note that Davis’ RAPM estimates are smaller, which indicates his **tau** is even smaller than ours (leading to a larger lambda). Regardless, we have effectively the same results.

What we also get from the tedious computation is the variance term associated with each RAPM estimate:

Since this is a regression model, we are able to estimate **sigma** by computing the residuals of the model:
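One way to write this estimate (the weighting by **W** is our assumption, consistent with the weighted regression):

$$
\hat{\sigma}^2 = \frac{ \left( Y - X\hat{\beta} \right)^{\mathsf{T}} W \left( Y - X\hat{\beta} \right) }{ N - 2P - 1 }
$$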

where **N** is the number of stints observed, **P** is the total number of players observed, and the term **N-2P-1** identifies the number of degrees of freedom within the regression model. Note that there is a subtlety here: we assumed that every player who has played a single possession on offense has also played at least a single possession on defense. This is not guaranteed; therefore we may change **2P** to **P_o + P_d**, where **P_o** is the number of players who have played at least one offensive possession and **P_d** is the number of players who have played at least one defensive possession.

At this point, much of the focus of RAPM is placed on determining the prior variance, **tau**. Typically, folks will eschew prior variance estimation by instead applying cross-validation to identify a “best” **lambda** term. In the outside literature, this value has ranged between 500 and 5000. In conjunction with the estimation of **sigma** from the regression setting, we can extract an estimate of the prior variance through **hat{sigma} / hat{lambda}**.

What’s even nicer is that since we have a Gaussian posterior distribution, we know the highest posterior density (HPD) interval determining confidence is equivalent to the standard confidence interval for Gaussian random variables. In this case, we can follow the simple “estimate **+/-** critical value **x** standard error” formulation. For a single player, we can look at the marginal distribution, which is itself also a Gaussian distribution. For a lineup of interest, we look at the joint distribution of the individual player marginals.

Let’s apply the above techniques to the 2018-19 NBA Season. Courtesy of Ryan Davis, we obtain a stint file for which a row of data corresponds to indicators for the five players on offense, indicators for the five players on defense, the number of possessions played, and the number of points scored. From this data set, we can extract the values of **Y**, **W**, and **X** accordingly. For simplicity, let’s just assume that **tau = 5000**. The above table shows the “Top 25 players.” In terms of coding the error, we simply run:
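The original snippet isn’t reproduced here; below is a sketch of one way to code the error terms. The covariance form σ̂²(X'WX + λI)⁻¹ is a simplifying assumption on our part (some treatments use a sandwich form instead), and the function name is ours:

```python
import numpy as np

def rapm_errors(X, Y, possessions, lam, P_off, P_def):
    """Fit the ridge model, estimate sigma^2 from weighted residuals with
    N - P_off - P_def - 1 degrees of freedom, and return per-player
    standard errors from the (assumed) covariance sigma^2 * A^{-1}."""
    Xi = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.diag(possessions)
    A = Xi.T @ W @ Xi + lam * np.eye(Xi.shape[1])
    beta = np.linalg.solve(A, Xi.T @ W @ Y)
    resid = Y - Xi @ beta
    dof = X.shape[0] - P_off - P_def - 1
    sigma2 = (resid * possessions * resid).sum() / dof  # weighted RSS / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A)))
    return beta, sigma2, se
```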

Replicated with variance terms and marginal confidence bounds we now have

Here we see that Danny Green is atop the leaderboard. While this would suggest that Danny Green is the biggest contributor to net ratings, we know this is really not the case. What’s more important is that we should take a look at his variance term. Looking at the marginal of Danny Green’s Offensive and Defensive ratings, we can compute the confidence interval for the net rating. In this case, the 95% confidence interval for Danny Green’s net rating is given by

where **sigma_o** and **sigma_d** are Green’s offensive and defensive RAPM variances, respectively. The value **rho** is the correlation between Danny Green’s offensive and defensive numbers. For this exercise, Green’s offensive/defensive covariance matrix is given by

Using this variance-covariance matrix, Green’s Net Rating of 4.66 is really viewed as some value in between

Comparing this to the rest of the league, we see that 39 other players fit within this confidence bound, indicating that despite being the “league leader,” Danny Green really identifies within the Top 39 players in the league. This is a “best case scenario” for identifiability. In fact, if we grab the **200th** player in the league, Kyle Korver, we find that his confidence interval is **[-1.21, 4.16]**. This indicates that Korver is equivalent to 460 other players in the league, ranging from **Giannis Antetokounmpo (6th) to Damyean Dotson (464th)**.

Now let’s extend this out to a starting unit. For the sake of argument, let’s look at a “starting” lineup for the Brooklyn Nets during the 2018-19 NBA season. Using starts as a proxy, suppose the starting lineup is D’Angelo Russell, Joe Harris, Jarrett Allen, Rodions Kurucs, and Caris LeVert. Using the single-season RAPM estimates above, we obtain offense-defense ratings for Russell, Harris, Kurucs, LeVert, and Allen (respectively):

with associated variance-covariance matrix:

The expected Net Rating of this lineup is then **0.5953**. Constructing the univariate variance term, we rely on the variance of a sum of correlated variables derivation:

Reading these right from the variance-covariance matrix above, we obtain a stint net rating variance of **12.7695**. This indicates that the confidence interval for the expected net rating of the Brooklyn Nets’ starting lineup is **[-6.4086, 7.5992]**, which is quite a considerable range over the span of 100 possessions.
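The variance computation generalizes to any weighted sum of correlated ratings; here is a toy sketch (the Nets’ actual covariance matrix is not reproduced here, so the numbers below are illustrative only):

```python
import numpy as np

def combined_variance(cov, weights):
    """Var(sum_i w_i X_i) = w' Sigma w, i.e. the sum of w_i^2 * Var(X_i)
    plus 2 * sum_{i<j} w_i * w_j * Cov(X_i, X_j)."""
    w = np.asarray(weights, dtype=float)
    return float(w @ np.asarray(cov, dtype=float) @ w)

# Two correlated ratings with variances 4 and 9 and covariance 1:
print(combined_variance([[4., 1.], [1., 9.]], [1., 1.]))  # 15.0
```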

It should be noted that we must treat this type of analysis with the utmost care. Recall that we are only using roughly 71,000 stints. For 530 NBA players, this means we have **at best** 1.5e-17 **percent** of all possible 10-man lineups. So deviating outside of any observed lineups is quite prohibitive. Therefore, while building a dream lineup around **Joel Embiid, Anthony Davis, Andre Drummond, Jusuf Nurkic,** and **Justise Winslow** would have phenomenal RAPM considerations, it is not part of the **sampling frame** and is therefore non-representable (read that as not meaningful) in results.

In intel analyst speak: “We cannot determine what’s going on in Zimbabwe if all we do is look at Cincinnati and Rhode Island.”

Over the series we have created about RAPM, we’ve identified several of the benefits gained by regularization while noting the various pitfalls of simply embracing the numbers. While we know the central limit theorem actually fails and ratings are not necessarily Gaussian at each conditional stint level (part 3), we can perform the regularization to impose a PCA-like solution (part 2) to understand ratings better than in basic APM (part 1). However, we see that the variances are still relatively inflated and that we do not get a great understanding of player impact; see Kyle Korver above as the standard test case. Instead, we obtain a filtered identification of a player. And instead of relying on this muddled number that lacks a considerable amount of context, we can leverage this technique to impose further filtration on player qualities. This is the case for more recent advanced analytics such as RPM from Jerry Engelmann and PIPM from Jacob Goldstein.

Straying from the technical advancements, we can also leverage RAPM as a “smoothed but biased” estimator for discussing the impact of a player on offense and defense. The reason we stray from technical advancements is that defining defensive metrics is **really hard**. A great synopsis of using RAPM to discuss this point, as opposed to creating potentially misleading defensive statistics, is given by Seth Partnow at The Athletic.

However, we can put to rest the commentary that “understanding errors” in RAPM is difficult and instead embrace what these values are really telling us. (So please stop e-mailing me about this topic!)

With the creation of Synergy, the basketball world gained valuable access to previously hard-to-obtain data on all field goal events in the league. One of the biggest introductions was the “primary defender” tag on field goal events. With play-by-play data, when a player drives to the basket or attempts a step-back three, the players are logged as **Player A driving layup 2PA** or **Player B step back 3PA**. The only way to exfiltrate the defender information is to go back and watch the actual film. For an NBA season, 1230 games each of approximately 2 hours leads to 2460 hours of film to review. And it’s rare that we can watch, log, and verify at real time. Also, for reference, there’s only 2087 hours in a standard **work year** according to the federal government; and that’s taking no vacation days.

Another fantastic introduction in Synergy is the **play-type** field that identifies the action that leads to a field goal attempt. For instance, a pick-and-roll may occur and it frees up a drive to the basket. Again, in play-by-play data the play is logged as **Player A driving layup 2PA**. But in Synergy, we get to know who the screener is, who the ball handler is, and who the primary defenders are. As an analyst, if we wanted to measure the quality of a shooter in say “pick-and-roll” events, all we had to do was open up Synergy and sort on field goal percentage on pick-and-roll events.

The key here is that Synergy leverages **mechanical turk** logging of games. It uses loggers and verifying loggers (as opposed to machine learning) to help ensure the accuracy of their data. There’s also “one-touch” video in Synergy, which allows the analyst to view the play in question; which is undoubtedly the best feature of the system. If we are interested in every pick-and-roll that Damian Lillard plays in, we can filter on Lillard and Pick-and-Roll events and click on any attempt we are interested in. There’s a reason why Synergy is expensive to the casual viewer. There’s definitely a lot of blood, sweat, and tears that go into this platform.

Over the previous six years, Second Spectrum attempted to leverage tracking data to perform similar tasks as Synergy, but also to improve the quantifiability of players in given situations. To this end, instead of mechanical turk-ing field goal events, Second Spectrum could identify all pick-and-rolls; which include non-field-goal attempts. This was a revolutionary step beyond Synergy’s sortable table of only field goal attempts. For starters, the analyst could now **track how many pick-and-rolls defenders could disrupt, deterring any field goal attempt**. Therefore, instead of seeing a switching defender give up say 47-for-80, a rather terrible 58.75% defensive field goal percentage, we may find out that teams actually ran 137 pick-and-rolls against that switch defender. That 58.75% is really 47-for-137; or 34.31%. In case you were wondering: this was Rudy Gobert from a random subsample of games.

Instead of using humans in the loop (which is exhausting just from an hours standpoint), Second Spectrum employs a proprietary machine learning library that classifies **trajectories** as certain basketball actions. One such classification algorithm focuses on identifying pick-and-roll events. The beauty of Second Spectrum’s work is that not only do they have upwards of 200 actions classified, ranging from screens to fast breaks to field goal types and defender contest styles; they also have the Eagle platform to perform similar tasks as Synergy’s platform: we can select plays on demand and watch the video as well.

Key challenges with both Synergy and Second Spectrum focus on the nuances in their logging system.

With Synergy, an analyst must grapple with the logger’s definition of coverage. Two key stories from Synergy have been shared around the league: J.J. Hickson and Earl Boykins. If you’re not familiar with these two stories, here’s the short gist.

One season, a team was interested in finding a dominant scorer at the rim. One quick and dirty way was to use the field-goal location tag in Synergy called **Rim** and sort on all players. Immediately J.J. Hickson popped up to the top of the list. This led the team to investigate Hickson as a potential rim scorer. What the team ultimately found out was that Hickson was indeed a top scorer at the rim; but specifically **at the rim**. He could convert dunks and he tried to dunk a lot. As soon as he bumped out to 2-to-5 feet, his FG% would drop significantly and his attempts would fall off a cliff; meaning he wouldn’t take those shots either.

The reason Hickson popped up is that Synergy’s definition of Rim is the region near the basket. And unless that team could guarantee spacing (a relatively foreign concept at the time) to ensure Hickson could get 6-8 dunks a game, he wasn’t going to be the guy they were looking for.

Another season, a team was looking for a strong perimeter guard. Of course, in a sorting that would make most analysts cringe, the team sorted on defensive three point percentage as primary defender. Out popped Earl Boykins at the top of the list. Furthermore, Boykins had been near the top of the list for multiple seasons.

It turned out that, due to Boykins’ size, teams would attempt to shoot over him, thinking it a psychological advantage. Those players would actually take lower quality attempts than usual. For one season, attempts against Boykins per possession led the league while shot quality was near the bottom; and despite teams converting better than that quality would suggest, it was the shot quality (decision-making) that led to an overall lower percentage, rather than Boykins’ defensive prowess beyond the arc. Adjusting for quality, Boykins actually turned out to be a solid perimeter defender, but not the exceptional one the team was looking for.

What should be clearly stated is that these examples are not showing that Synergy is bad, but rather that there is nuance to the data that is delivered. In fact, Synergy is a wonderful tool when used thoughtfully while executing player analysis.

In the Second Spectrum case, identifying primary defenders and contests are two looming challenges for analysts. While the company provides labels, they too have nuance. For instance, Second Spectrum uses a Munkres (linear assignment) type algorithm to identify primary defender match-ups. It’s a fantastic algorithm and is used in several advanced tracking systems today; but it’s also nuanced. In some cases, it’s slow to reassign players on switches. Specifically, when a BLUE action occurs, it may not correctly attribute primary defender status on the shooter.

Similarly, defining a contest is a challenge; particularly around the rim. For many years, contests at the rim were poorly identified by Second Spectrum; and the reason is that **tracking data lacks directionality and player verticality**. This is not a fault of Second Spectrum; the cameras can only get what they can get for now.

In the case of directionality, the biggest problem is a player who is back-pedaling on a pass and has no chance to contest a shot, but whose momentum carries them towards the shooter. Such a defender will almost always be labeled as a contester.

Similarly, we do not have any knowledge of the player’s z-axis in the tracking data. This means we have no idea whether a player jumped to contest a shot. So if a player attempts to take a charge or attempts a strip but lets the shooter go; they can easily be listed as a contesting defender.

Given some of the nuances in both Synergy and Second Spectrum, one thing neither system can give is **how a player runs that play**. We’ve been primarily discussing pick-and-rolls. Both Synergy and Second Spectrum give us a **marker** and a **result**. What we don’t know is how a team runs the pick-and-roll. Do they run it slower or faster? Do they run it wide or tight? Is it delayed? These may seem rather odd questions, but the answers give way to understanding **how quickly a team attacks the switch**, **how much spacing they incorporate off the pick**, and **how much gravity the driver is expected to have**, respectively. And it’s these things that can’t be answered directly from Synergy data or Second Spectrum markings.

Instead, we would look back at tracking and perform a well-known task in the geolocation world: **registration**.

Registration is the process of finding a spatial transformation to align multiple point sets. In the case of geolocation, the most common problem is to ensure an aircraft follows its way-points. Using the trajectory of the aircraft, we can compare it to the intended flight path and identify deviations that may have occurred. The “cool, new” version of the problem applies to **automated vehicles**, such as driverless cars, to ensure a car is following its course of way-points.

But also, it’s used in many other applications, such as monitoring foot traffic of pedestrians in a park. Measuring the trajectories of patrons may help park officials identify optimal locations for newly proposed sidewalks. In this case, we look at **thousands** of trajectories and perform registration to see the largest class of paths.

Finally, in basketball, we apply registration to identify **similar plays**. Since we are using registration, we can also identify the amount of **distortion** associated with the play, and it’s this distortion (or, technically, **warping**) that gives us insight into the nuances of the players associated with the action of interest.

Spatio-temporal registration is the process of comparing two trajectories through an optimization process combining temporal registration (dynamic time warping) and rigid spatial registration. Combining both the temporal and spatial aspects allows us to compare trajectories as bodies move along these paths, not only as a function of distance, but also of time. The registration process is then the identification of the **difference** between two trajectories, allowing us to identify whether two trajectories are effectively traveling the same path.

For two temporal processes of spatial locations, **X** and **Y**, of length **Nx** and **Ny**, respectively; we may have to prepare a warping function to align the series. A **warping function** is a function that attempts to find temporal “matches” between two time series:
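A standard way to write such a warping function (a sketch using conventional dynamic time warping notation, since the exact symbols are assumptions) is:

```latex
\phi(s) \;=\; \big(\phi_X(s),\, \phi_Y(s)\big), \qquad s = 1, \dots, S
```

so that segment $s$ compares $X_{\phi_X(s)}$ with $Y_{\phi_Y(s)}$, subject to the usual boundary conditions ($\phi_X(1) = \phi_Y(1) = 1$, $\phi_X(S) = N_x$, $\phi_Y(S) = N_y$) and monotonicity of both index maps.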

In this case, if the sampling rates are off, the warping function will attempt some form of interpolation between the two time series. Suppose **X** is a “longer” time series; then the warping function will identify the appropriate slice of time to compare **X** to **Y**. The value, **s**, is then the **segment** in which we compare the two trajectories. Hence the functions **PHI_x** and **PHI_y** are simply looking for the index of each respective series that match within a segment.

Thankfully, we do not have to apply this interpolation, as the sampling rate in Second Spectrum data is uniform: one sample every 0.04 seconds (25 Hz). This means the function **PHI** is simply looking for an **offset** between the start of the trajectory in question and the segment over which the play elapses.

To be clear, suppose a Pick-and-Roll (PnR) action for a team occurs at 11:38 remaining in the first quarter (seen as elapsed time **22 seconds**) and it takes 3.7 seconds to complete the action. Then suppose a second PnR is completed by the same team at 2:17 remaining in the first quarter (seen as elapsed time **583 seconds**) and it takes 4.1 seconds to complete. Then, for our dynamic time warping function of choice, we may select the larger window and synchronize the motion of interest.
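To make the temporal piece concrete, here is a minimal dynamic time warping sketch in Python. This is a textbook implementation, not Second Spectrum’s; the trajectories are assumed to be arrays of (x, y) court coordinates sampled at 25 Hz.

```python
import numpy as np

def dtw_distance(X, Y):
    """Dynamic time warping cost between two trajectories.

    X: (Nx, 2) array and Y: (Ny, 2) array of (x, y) court coordinates.
    Returns the cumulative cost of the cheapest warping path."""
    nx, ny = len(X), len(Y)
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])  # pointwise distance
            # Extend the cheapest path: match both, or stretch either series.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[nx, ny]
```

Two plays tracing the same path at different speeds (say, the 3.7-second and 4.1-second PnRs above, sampled into roughly 93 and 103 frames) receive a small warping cost; genuinely different actions do not.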

Considering the two PnR plays above, we perform a **temporal registration** on the point guard action only. The play on the left shows a **Hedge-and-Under** defensive scheme, which pushes the point guard away from the basket as the on-ball defender gets extra time to sneak underneath the screen to recover. The guard sees that the screen defender is not going to switch and attempts to accelerate to attack the recovering guard.

The play on the right shows a **Show-and-Over** defensive scheme that has gone woefully awry for the defense. The screener tangles up the on-ball defender and this forces the screen defender to ultimately switch on the show. The point guard, seeing that he’s drawn the (hopefully) slower defender, accelerates earlier than in the first play. This allows the screener to slip and keeps the on-ball defender straggling behind the attacked screen defender.

Here, we see that the two point-guard paths are not identical; however, the action is effectively the same: guard drives right, attacks the right elbow, screener slips towards the left elbow. Performing a temporal registration will align the motion across both plays.

We see that the lime green lines serving as the warping function do not necessarily find the closest points in space. As the lines turn more sharply than the curves do, this suggests that the second action moves a little faster than the first!

Spatial registration is the task of identifying similar shapes. The most common example is looking at a selection of points and asking, “Are these the same shapes?” Spatial registration therefore looks at **rigid motion**, which includes **rotation**, **reflection**, and **translation**. Spatial registration may also use other tools such as stretching; however, that is for comparing shapes measured on different scales. Under the Second Spectrum assumption of equal scales across all frames, we may omit stretching as a factor.

Therefore, the key question is whether a spatial trend is equivalent under **rigid motion**. The challenge with spatial registration is that actions may lose their right-left interpretation. For our PnR examples above, spatial registration will identify both a translation and a rotation to match the point guard action.
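As a sketch of how that rotation and translation can be recovered, here is a standard Kabsch (orthogonal Procrustes) solution in Python; it assumes the point correspondences have already been fixed, for instance by the time warping.

```python
import numpy as np

def rigid_align(X, Y):
    """Least-squares rotation R and translation T such that Y ~ X @ R.T + T.

    X, Y: (N, 2) arrays of corresponding trajectory points (Kabsch algorithm)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    H = (X - mx).T @ (Y - my)                # 2x2 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    T = my - R @ mx
    return R, T
```

Dropping the sign guard permits reflections as well, which matters when an action mirrors left to right.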

We see that the action is strikingly similar in pattern, but as a **reflection** and a slight **rotation**. We do lose the angle-of-attack information; but we can test for defender effects on this later using the space of rotations, **SO(2)**.

The methodology for identifying rigid motion (as we see above) is commonly solved using the **Iterative Closest Point (ICP) algorithm**. This algorithm treats the trajectory as a point cloud, regardless of time, and looks for an optimal matching through an iterative scheme. Unfortunately, this methodology fails to properly register player trajectories, as the temporal aspect is too important to ignore.

This leads us to spatio-temporal registration. In this case, we combine both the spatial and temporal registration into a single cost function given by
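One standard form for such a combined cost (a sketch; the exact functional used by any given implementation is an assumption) is:

```latex
C(\mathbf{R}, \mathbf{T}, \phi) \;=\; \sum_{s=1}^{S} \left\| \mathbf{R}\, X_{\phi_X(s)} + \mathbf{T} - Y_{\phi_Y(s)} \right\|^{2}
```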

where **R** is the rotation operator, **T** is the translation operator, and **PHI** is the time warping function. We can then compute an inner- and outer-optimization scheme, where the outer loop solves the dynamic time warping problem, followed by an inner loop of spatial optimization. Iterating over this scheme, we then identify a spatio-temporal distance for comparing two player trajectories.

Now that we are able to spatio-temporally register two player actions, we can start to develop **distributions** of player actions. These can be defined as clusters of low-cost comparisons between trajectories. From here, the opportunities are endless. Here are a couple of examples:

Now we can start testing the impact of certain defender actions on PnR plays. In the example above, we saw the same PnR get attacked differently. Although the guard responded differently each time, the spatio-temporal registration is actually quite similar. We can then look at the parameter sets of **R**, **T**, and **PHI** and condition on defender response. Using this, we can quantify the changes in directionality and speed; and begin to answer the following question: **How will a hedge compare to a show by my screen defender?**

This allows us to separate ourselves from poorly constructed results-based analysis such as “What’s my defensive rating when I perform this defender action?”

Another problem we can begin to answer is: **How well can my ball-handler read defenses?** In this case, we can look at changes in the trajectories and again test on **R**, **T**, and **PHI**. Here, we are not testing good or bad decisions; that requires a target variable. Instead, we are looking at how the **distribution changes** given a new wrinkle in the defense.

For this situation, we may ask how a defensive rotation impacts how a ball-handler attacks the rim. In this case, we may see quite a change in R, T, and PHI depending on the scheme. We can scan the clusters of registered ball-handler motions and compute the probabilities of making that registered motion given the defensive scheme. From there, we may look at the players associated with the action and gain insight on how players respond to it. Note, this may be quite noisy at the player level; so be very careful in making player-based decisions.

Ultimately, the point here is that the game of basketball is played in a spatio-temporal manner; therefore it requires tools that analyze the spatio-temporal aspect accordingly. An attack at the rim by **Damian Lillard** may be considerably different than one performed by **De’Aaron Fox**, despite their spatial trajectories looking the same. Registration also allows for follow-on testing without having to rely on results-based analysis. Consider this artifact when discussing perimeter defense; as shooters may not take an attempt despite the defender doing all the right things leading up to an attempt.

This way, we can leverage platforms such as Synergy to identify types of plays and Second Spectrum to extract markers for the plays; but then build our own custom analytics on top of the tracking to perform the rigorous tests.


Whenever we develop an analytic to help describe the game, we typically have to ask three things. First, **“is our analytic representative of the actual thing we are attempting to analyze?”** Second, **“does the analytic yield intelligence?”** Finally, **“is our analytic stable?”** While these seem like obvious requirements, it may come as a surprise that many folks actually miss the mark on one of the three requirements of developing an analytic.

Take, for instance, **perimeter defense** metrics. While it has long been known that defensive three point percentages do not truly reflect a team’s perimeter defense (yes, that’s three links representing effectively the same view…), many folks (including some pro teams!!) still use defensive three point percentage as a barometer for how well their team plays perimeter defense. While many will attempt to argue that defensive three point percentage does indeed measure perimeter defensive capability, it has been shown repeatedly (over at least a five season span now) that it is indeed not stable; nor does it yield actionable intelligence.

In response to the survivor bias that comes from play-by-play data, savvier teams have focused on **frequency** and **efficiency** relationships; attempting to understand the **“negative space”** of perimeter defense. That is, the deterrence of high quality attempts and the promotion of low quality attempts. Others attempt to mitigate the survivor bias by introducing “luck adjustments.” Whichever direction we choose for our analysis, the challenging part remaining is to determine the robustness of our measure.

In this article, we focus on defining a core statistical concept in analytics: **consistency**. For a given analytic, consistency describes how an estimator behaves as its sample size grows. As the sample size increases, we should expect the estimator to converge to its true value; hopefully the parameter. Consistency is a **probabilistic argument** that is defined by
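In symbols, an estimator is (weakly) consistent when, for every $\epsilon > 0$,

```latex
\lim_{n \to \infty} P\left( \left| \hat{\theta}_n - \theta \right| > \epsilon \right) \;=\; 0
```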

for a true parameter, **theta**, and its estimator, **theta_n**, for some sample size **n**. Thus, the goal of the analyst is to determine whether this equation is satisfied and then identify the **convergence rate** of the statistic they have just generated.

Let’s start with a simple exercise to demonstrate consistency. Let’s consider an independent, identically distributed (IID) **Bernoulli** process with some probability of success, **p**. The most basic example is the “coin flipping problem,” so let’s start there. Suppose a coin has a probability, **p**, of coming up heads. Suppose we flip this coin **n** times and count the number of heads. Our goal is to estimate the true value of **p** and then determine how **consistent** that estimator is.

If we’ve had some statistical training, we would attack this problem by leveraging our knowledge of the distribution and applying **maximum likelihood estimation** to obtain an estimator for **p**. In this case, the sample mean becomes the estimator and its variance is merely **p(1-p)/n**. But how do we check consistency?

First, we see that
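writing $X_i \in \{0, 1\}$ for the outcome of the $i$-th flip,

```latex
\hat{p}_n \;=\; \frac{1}{n} \sum_{i=1}^{n} X_i
```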

is our estimator of the probability of flipping a head. We can either determine the distribution of the estimator directly, or we can work with the original distribution. In this case, it’s straightforward to determine the distribution of the sum of IID Bernoulli random variables. In many situations, determining the distribution is fairly difficult.

To identify the distribution of the sum of IID Bernoulli random variables, we can look at the **moment generating function** (MGF) and show that the sum of IID Bernoulli random variables and the Binomial random variable are the same:
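A sketch of that computation, using $M_X(t) = E\!\left[e^{tX}\right]$ and independence:

```latex
M_{\sum_{i=1}^{n} X_i}(t) \;=\; \prod_{i=1}^{n} E\!\left[e^{t X_i}\right] \;=\; \left(1 - p + p e^{t}\right)^{n}
```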

The last line is a moment generating function for the Binomial random variable with mean **np** and variance **np(1-p)**. Using this knowledge, we can then look at the probabilistic argument for consistency. Unfortunately, using the probabilistic statement directly is a challenge as we also need to understand the distribution of the absolute value of the estimator. That’s something I would never attempt for this problem. Instead, we rely on a well-known probabilistic relationship, called the **Chebyshev Inequality**.

The Chebyshev Inequality is a relationship that **bounds** a probabilistic statement in a particular form:
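For a random variable $X$ with mean $\mu$ and finite variance,

```latex
P\left( \left| X - \mu \right| \geq \epsilon \right) \;\leq\; \frac{\operatorname{Var}(X)}{\epsilon^{2}}
```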

This is a particular form of the Markov Inequality, but allows us to identify convergence through the use of the variance associated with the underlying variable of interest. Therefore, writing the probabilistic argument for convergence, we see:
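applying Chebyshev to $\hat{p}_n$, whose variance is $p(1-p)/n$,

```latex
P\left( \left| \hat{p}_n - p \right| > \epsilon \right) \;\leq\; \frac{p(1-p)}{n\,\epsilon^{2}}
```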

Applying the limit (increase in sample size), we see that the result goes to zero! Therefore, our estimator is indeed consistent!

Consistency is a **limit-based argument**. This means that it’s a theoretical value that will never be achieved in practice. To this end, we identify that our estimator indeed converges, and we are given some guidance as to **how well it converges**, thanks to the Chebyshev Inequality.

One way to interpret this relationship is that **epsilon** serves as a bound on the deviation of our analytic about the true underlying parameter of interest. We see this directly in the first line of the consistency proof for the coin flipping example. For argument’s sake, suppose the coin is **fair**; meaning the probability of obtaining a heads is one-half. Further suppose we are alright with a variational error of one percent; that is, **epsilon-squared = 0.01**, or a deviation of **epsilon = 0.1**. Then, the sample size required to ensure that these conditions are met, say, 95% of the time is
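(rearranging the Chebyshev bound with $p(1-p) = 0.25$, $\epsilon^{2} = 0.01$, and tail probability $\alpha = 0.05$)

```latex
n \;=\; \frac{p(1-p)}{\alpha\,\epsilon^{2}} \;=\; \frac{0.25}{0.05 \times 0.01}
```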

which is **500**.

This means we require 500 flips of the coin to ensure that our variational error is within 1% with 95% probability. Taking this a step further, this translates to having 10% or more error on the estimator roughly 5% of the time… Yikes.

Let’s consider this from another context…

We come back to our three point shooting argument from before. This time, we look at it from the shooter’s perspective. The analytic question here is **“How well does my player shoot from the perimeter?”** If we see a player shoot 37% from beyond the arc, does that mean they are a 37% shooter?

Surprisingly, little work has been done in this field. Darryl Blackport provided a quick treatise on reliability theory four years ago that involved the Kuder-Richardson 21 (KR-21) metric. For a while, a famous interview question from teams involved the dreaded “predict the three point percentage of every player in the league”; which is effectively an exercise in futility if you’re forced to get within 1 percentage point of truth. Over the previous few years, **shot quality** metrics have risen as a way to understand the **quality** of a shooter, which in turn leads to **eFG+** calculations. However, this categorizes decision making first, and then relies on the same noisy statistic (field goal percentage from the perimeter) to measure capability.

So let’s take a look at the KR-21 methodology.

The Kuder-Richardson 21 metric is a psychometric-based reliability measure used to analyze the “quality” of a test given to students. The goal of the metric is to identify how **consistent** a test is. The original application, from Kuder and Richardson’s 1937 paper, is to identify whether two tests applied to the same student population are of equal difficulty. As such, the paper starts with a single test of many questions, splits the test questions in half (at random), treats them as two separate tests, and then computes the cross-correlation of the two halves of the test with **n** questions. The resulting cross-correlation score is called KR-1; the first equation of Kuder-Richardson.

The remainder of the paper introduces different scenarios and slowly develops a statistical framework for understanding the comparative quality of test questions. It is effectively a permutation test that ultimately results in an analysis of variance (ANOVA) by the time we reach KR-21.

The KR-21 equation is given by:
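In the notation used here, with $n$ test items and $\bar{p}$ the mean proportion correct (the symbols are assumptions chosen to match the surrounding discussion):

```latex
\mathrm{KR}_{21} \;=\; \frac{n}{n-1} \left( 1 - \frac{n\,\bar{p}\,(1 - \bar{p})}{\sigma^{2}} \right)
```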

Here, **sigma** is the standard deviation of the test scores across students and **p** is the proportion of students getting a single test item correct. Notice that the term **np(1-p)** is lingering in the equation. This is due to the fact that each question is seen as a Bernoulli random variable and every test question is assumed to be of equal difficulty (and independent of all other test questions)!

Taking this a step further, since the Binomial distribution is now modeling test scores, we treat this as a basic regression problem: the resulting variance is a **sum-of-squares for error**, while the **sigma** term identifies a **total-sum-of-squares**. Then we have:
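identifying $n\,\bar{p}(1-\bar{p})$ with a sum-of-squares for error ($SSE$) and $\sigma^{2}$ with the total sum of squares ($SST$),

```latex
\mathrm{KR}_{21} \;=\; \frac{n}{n-1} \left( 1 - \frac{SSE}{SST} \right)
```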

which is indeed the ANOVA equivalent!

Treating the KR-21 value as an ANOVA-like quantity, we effectively have an R-square calculation. Under R-square conventions, a value of .7 is commonly used as a “strong” value of correlation. Now, to perform a KR-21 test, the challenge is to treat each player as a “student” who takes an “examination” of three point attempts. Ideally, we set the “number of questions,” **n**, to be the number of three point attempts. Then, for a collection of players who have taken **n** three point attempts, we compute the population variance of the players and the mean number of makes across all players.

Starting at a small **n**, say 50, we collect all players across the league who have attempted 50 attempts and compute the KR-21 reliability number. If this number is too small (below 0.7), we simply increment **n** and repeat the study.

One of the unspoken challenges with a reliability measure such as KR-21 is that we may obtain a negative reliability score. For example, let’s generate a sample of fifty shooters that each take 100 3PAs. Suppose every 3PA is an IID Bernoulli random variable. Using rows as players and columns as 3PA, we obtain a chart that looks like this:

The green column is the number of made 3PA by that player. The yellow row is the number of 3PA made in that attempt number. By computing the SSE component from yellow, we obtain a value of 24.4976. By computing the SST component from green, we obtain a value of 17.6006. **This leads to a KR-21 score of -0.3958.**
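A minimal sketch of that simulation in Python (the random seed is arbitrary, so the exact SSE, SST, and KR-21 values will differ from the chart above; any IID draw lands near zero and is frequently negative):

```python
import numpy as np

rng = np.random.default_rng(7)
n_players, n_attempts, p = 50, 100, 0.5

# Rows are players, columns are 3PA: every attempt is an IID Bernoulli(p).
shots = rng.binomial(1, p, size=(n_players, n_attempts))

# "Yellow" component: per-attempt make proportions -> SSE, the sum of item variances.
p_item = shots.mean(axis=0)
sse = np.sum(p_item * (1.0 - p_item))

# "Green" component: per-player made totals -> SST, the variance of player totals.
totals = shots.sum(axis=1)
sst = totals.var()

# KR-21 as the ANOVA-like ratio: n/(n-1) * (1 - SSE/SST).
kr21 = (n_attempts / (n_attempts - 1)) * (1.0 - sse / sst)
print(round(kr21, 4))  # near zero, often negative, for purely random shooting
```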

Why did this happen? First of all, this is an okay result. A negative reliability score simply indicates weak-to-no correlation between test items and users. Specifically, it doesn’t identify “equally difficult” problems; rather, it flags “noisy” questions that are randomly solved. In the context of three point attempts, this would suggest all makes are completely random. Which, by the definition of our exercise, is exactly what happened.

Now, if we change **p = 0.35**, which was roughly the league average for the 2018-19 NBA season, we see the exact same thing happen. This indicates that the ordering of every single player’s 3PA matters significantly. In fact, we apply a Monte Carlo simulation of KR-21 scores using the above set-up to identify the distribution of possible KR-21 scores:

What this shows, along the lines of Blackport’s analysis (and others in the baseball community), is that **shooters continue shooting and others don’t**. To obtain a positive reliability score, shooters must indeed have tendencies, and those tendencies are picked up within the KR-21 test. And once they are keyed in on, the value of **n** needed to nail down a high reliability number is approximately 750.

More importantly, this shows that perimeter scoring events are **not random**. Instead, shooters are indeed correlated scorers that have some frame of rhythm. If they were not, then a reliability value of .7 would be **never attainable** except by random chance. Which, as you can see above, has exceptionally small probability.

So let’s go back to the Bernoulli coin flip problem. Instead of a coin, if we model a three point attempt as a Bernoulli process, we obtain the same probabilistic argument. Now, using the worst case scenario of **p = .5** (worst case means highest variance!), we note that 500 3PA are required to nail down the true value at the 95% probabilistic level with **plus-or-minus 10% error**. That’s incredible.

If we instead impose a **1% error**, we require **50,000 attempts**. Which is much less optimistic than the 750 attempts noted before.

Now, instead of the worst case scenario, if we use the **league average** of 35.5%, we (under the Bernoulli assumption) require **45,795 attempts** to get within one percent error of truth at the 95% probabilistic level.
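These Chebyshev-style sample sizes are simple to reproduce; here is a minimal sketch (the function name is ours):

```python
def required_n(p, eps, alpha):
    """Chebyshev-based sample size: n such that p(1-p)/(n*eps^2) <= alpha,
    i.e. P(|p_hat - p| > eps) <= alpha for a Bernoulli(p) process.
    Rounded to the nearest attempt (the bound is exact at these inputs)."""
    return round(p * (1.0 - p) / (alpha * eps ** 2))

print(required_n(0.50, 0.10, 0.05))   # 500 attempts for +/-10% error
print(required_n(0.50, 0.01, 0.05))   # 50,000 attempts for +/-1% error
print(required_n(0.355, 0.01, 0.05))  # 45,795 attempts at the 35.5% league average
```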

Leveraging the 750 number, we find that at league-average levels, the actual margin of error associated with 750 attempts is really about **1.8%** (roughly one standard error under the Bernoulli model). This is indeed a sweet spot and reinforces the results obtained by Blackport roughly five years ago. What this tells us is that there are indeed trends in shooting, but they are not strong, as they are effectively within the variance of a Bernoulli process.

To this point, we showed that three point percentages have weak trends, but can be modeled loosely as a Bernoulli random process. What this really tells us is that shooters attempt to **optimize their perimeter scoring chances when they decide to shoot**. This means attempts are not independent. Nor are they truly identically distributed. Furthermore, it’s difficult to obtain **tight** confidence regions on the true, underlying perimeter shooting percentage; which is why we see players fluctuate in rankings through the years.

To this end, there’s an underlying model not only for when shooters make attempts, but also for **when they take attempts**. The natural next step is a hierarchical model for the basic **frequency-and-efficiency** analysis. This way, we can begin to understand the player’s underlying decision-making tendencies, in an effort to better understand their true underlying perimeter shooting capabilities.

In effect, as Michael Scott once put it: “You miss 100% of the shots you don’t take. – Wayne Gretzky”

And the moral of the story: for every introduced analytic, there must be an adequate understanding of the variational properties related to the game. After all, the goal is always to get the signal above the noise.

For instance, let’s consider effective field goal percentage. The **Golden State Warriors** have posted a .558 eFG% while limiting their opponents to a .518 eFG%. While this is by far the best eFG%, the differential (+.041) is only good for second in the post-season, behind the **Milwaukee Bucks’** +.056. It’s no wonder both teams are deep into the playoffs, as they are outscoring their opponents at such high rates. The second best eFG% in the post-season has been posted by the **Houston Rockets** at .527, with a positive differential of .038; third best in the post-season. Effectively, these are the teams that cannot be “out-shot” in games. Instead, alternative measures must be taken.

Taking a closer look at the Rockets-Warriors series, the Rockets apparently defeated the Warriors in almost every category of the Four Factors:

Here, we see that Houston indeed won three of the four categories, but lost the series two games to four. As every game was decided by **two possessions or less**, there are no “aggregation biases,” such as a blowout win compensating for 2-3 losses. What this series ultimately came down to was the **distribution of turnovers**. More specifically, the **value of a turnover** was much greater in this series than the values of the other three categories.

As a baseline, Basketball Reference posited that both the Warriors and Rockets played 579 offensive possessions, resulting in offensive ratings of 115.7 and 113.8, respectively. Using this baseline, we value the **“average possession”** as 1.157 points for the Warriors and 1.138 points for the Rockets. If we look at the turnover battle, the only category the Rockets lost, Houston turned the ball over **98 times** (including 11 shot clock violations) compared to Golden State’s **83 turnovers**, none of which were shot clock violations.

On average, the Rockets gave up an extra 2.5 possessions per game to turnovers; but this does not account for the “4-6 points per game” lost. Using the baseline, this amounts to only about **2.78 points of differential**. Houston won every other category… so where does the remainder of the differential come from?
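As a sanity check, here is the back-of-the-envelope arithmetic for that differential. The exact 2.78 figure depends on which baseline rating you value the possession at; valuing it at the midpoint of the two teams’ baselines, as in this sketch, lands in the same neighborhood.

```python
# Series turnover totals quoted above.
hou_tov, gsw_tov = 98, 83
games = 6

# Value a possession at the midpoint of the two teams' baseline ratings.
avg_poss_value = (1.157 + 1.138) / 2.0

extra_tov_per_game = (hou_tov - gsw_tov) / games            # 2.5 per game
points_lost_per_game = extra_tov_per_game * avg_poss_value  # roughly 2.8-2.9
```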

A way to break down the value of a turnover is to look at the difference between a “live ball” and “dead ball” turnover. To start, a **live ball** turnover is when a defense is able to immediately move into transition without any stoppage of play. The most common live ball turnover is an errant pass that leads to a steal. **Every live ball turnover must have a steal credited to a defender**. Conversely, a **dead ball **turnover is when the defense’s transition is briefly interrupted by a stoppage in play. **Every dead ball turnover must have an in-bounding pass to initiate transition**.

From a psychological standpoint, live ball and dead ball turnovers can bring about drastically different effects on transition defense. For instance, a live ball turnover tends to lead to a scrambling **recovery** defense. As the play is “live,” a defense has much less time to “set” than usual. However, a dead ball turnover can lead to bickering between teammates, between opponents, and between players and referees; causing a disruption in communication on the ensuing possession. For instance, a bad pass out of bounds may lead a passer to voice a grievance to their teammate. For the brief moments this occurs, a transitioning offense may be running a designed attack such as a **Pistol** or a **Pin-Down Floppy** to pick apart the distracted, and potentially frustrated, defenders.

Due to these mechanical natures (response time, psychological effects, etc.), the value of a turnover differs from team to team. For the Houston – Golden State series, here’s how the type of turnovers looked:

We see that Golden State had a tendency to turn the ball over live on **57.8%** of their turnovers! Compare this to Houston’s much lower **44.9%**, and we see that Houston at least gives themselves much more time to set up on defense, as a non-substitution in-bound typically takes between 2 and 8 seconds.

When Golden State turned over the ball live, Houston flourished, posting a 129 offensive rating. However, in dead ball turnover situations, Houston dropped significantly, even falling below their baseline rate of 113.8 with a rating of 109:

Compare this to Golden State’s transitions off of turnovers, and we find that their numbers increased in every case:

What this meant was that while Houston would punish the Warriors for live ball turnovers, if Golden State could protect the ball just enough and ensure the Rockets kept pace with them, Golden State would not just win the turnover battle, **but turn it into enough of a win to compensate for losing the other three categories most associated with winning.**

Case in point: Houston’s turnovers cost them on average 3.27 points per game; more than one possession in two possession games.

While we presented an argument that turnovers were a significant factor in the Houston – Golden State series, we need to come full circle and identify that the point of this exercise is to show the **value of a turnover** and how it can sway games. In fact, the team that won the turnover battle went on to **lose four games in the series!**

In fact, teams that won the offensive rebounding battle went 5-1 in the series. Teams that won the effective field goal percentage battle went 5-1 in the series. Teams that won the free-throw rate battles went 2-4 in the series.

In fact, the story of Game One was offensive rebounding and Golden State’s control of the offensive glass.

In Game Two, Houston improved greatly on the glass (from .099 in Game One to .270 in Game Two), but the weak-side pin-down action that opened up weak-side rebounding for the Warriors kept going strong, as they too improved their offensive rebounding numbers, from .258 to .367. While this closed the gap substantially, Houston gave up 20 points on possessions following a turnover; 13 on live ball turnovers. In fact, Golden State started the game scoring **twelve of their first fourteen points on possessions after turnovers**.

In Game Three, Houston dominated the offensive glass much like Golden State did in Game One. In Game Four, Houston continued this trend. Despite losing the turnover battle in both games, by limiting their TOV% to approximately 11%, Houston managed to keep Golden State at bay when it came to increasing their points per possession.

Game Five and Game Six saw the points per turnover take a jump. In Game Five, the Warriors used a mix of offensive rebounding and transition off turnovers to take a narrow win. In Game Six, Golden State scored **35 points** off of **17 turnovers** for an outrageous 2.06 points per turnover.

Throughout the playoffs, it has not been the Warriors who have punished teams for turning over the ball. It’s been the **Toronto Raptors**. Through their first fifteen games, the Raptors have netted the largest turnover differential in the post-season with **a +49 turnover differential**. While the entirety of the differential has come at the hands of the Orlando Magic and the Philadelphia 76ers [they are currently losing the turnover battle 40-43 to Milwaukee after three games], the Raptors need to continue their turnover domination in an effort to stay afloat in a challenging Eastern Conference Finals.

As a similar baseline, Toronto has an offensive rating of 106.6 with a defensive rating of 102. This translates to 1.066 points per offensive possession and 1.020 points per defensive possession. However, whenever Toronto generates a turnover, much like in the case of the Houston Rockets, their opponents **increase their scoring**:

The disparity between live ball and dead ball turnovers is outrageous. This is due to the duration of time, and the plays allowed, after each type of turnover. For instance, the average duration of a possession after a Toronto live ball turnover is 7.3 seconds. After a dead ball turnover, Toronto’s opponents slow their offense down to a 15.2-second pace.

What this indicates is that Toronto’s transition defense is sub-optimal when it comes to turnovers. Specifically, the guards are unable to retreat as players such as Serge Ibaka and Kawhi Leonard have actually managed to dissuade attempts on live ball situations.

If we overlay the distribution of (relative) points on top of the duration of the plays, we find that there’s a “sweet spot” for teams to score after a Toronto turnover.

In this case, the first 2-5 seconds yield points for a Toronto opponent. These are live ball turnovers that turn into fast-break layups and threes. In fact, opponents are shooting 41-for-55 on two-point field goals after a live-ball Toronto turnover.

On the flip side, the Raptors perform a little weaker in transition than their opponents. Despite dominating the turnover battle, the Raptors have a lowly 90.9 offensive rating when they create a dead ball turnover on defense. Much of this is due to the slower pace of play the Raptors play at after a dead ball turnover, compared to their counterparts.

Despite the Raptors ending up with an average possession duration of 14.6 seconds, the probability of a possession taking longer than their counterpart’s is close to 60%. This is due to a significant bump at 1-2 seconds caused by fouling for free throws (“Hack-a-Player”). Therefore we tend to expect that, after a dead ball turnover, the Raptors take approximately 15.2 seconds per possession compared to their opponents’ 12.9 seconds.

If we overlay the (relative) points scored, we obtain a slightly different picture than their opponents:

As the Milwaukee Bucks and Toronto Raptors lead the playoffs in Defensive Rating, the two teams could not be more different in their approaches to defense. The Bucks dominate the glass on the defensive end, limiting opponents to only a 16.4% OREB%. For the roughly 60% of attempts an opponent misses over the course of a game [approximately 55 misses a game], their opponents are lucky to see more than **NINE** second chance opportunities a game. Similarly, the Bucks play Wisconsin-brand basketball by limiting fouling on field goal attempts; settling in third for the post-season with a .194 free throw rate. In comparison, the Raptors are at a 22.7% OREB% and a .233 FTr. Playing the point-value game, we would find the Bucks to be 3-4 point favorites based on these stats alone. Combine this with Milwaukee’s +.02 advantage in eFG% (.526 to .507) and the odds stack even more in favor of the Bucks.

It is TOV% where the Raptors hold a +3% edge over the Bucks, which means they should expect roughly 3 more turnovers a game; if played as live-ball turnovers, those could result in an extra 4-5 points per game. And it’s here that Toronto makes its mark.

Much like the Houston-Golden State series, the Milwaukee – Toronto series is going to be (and is indeed being) dictated by who can control the four factors better. While the teams are evenly aligned point-wise, depending on your viewpoint, either team has a recipe for success: Milwaukee needs to limit turnovers and play their brand of basketball. Toronto needs to continue the defensive effort and focus on keeping Milwaukee out of the paint; thereby reducing each of the Bucks’ effective field goal percentage, attempts at the foul line, and chances at offensive rebounding.

Of course, as the Los Angeles Clippers have shown us twice, a hot shooting night is always a bonus, too. But we can’t count on that happening consistently. Effectively, one of these teams has to blink.

So far it has been Toronto.

Over the first three games of the Eastern Conference Finals, Milwaukee has controlled every single Four Factor category. Despite Toronto’s ratcheted-up defense affecting Milwaukee’s eFG%, Milwaukee has continued to control the glass, and more importantly, **limit turnovers**. Despite Toronto picking up 23 live ball turnovers over three games against Milwaukee, they have only been able to convert them into 29 points (1.26 points per turnover). Compare this to Milwaukee’s 28 live ball turnovers generated off the Toronto offense, and their resulting 40 points (1.43 points per turnover), and the Raptors’ turnover edge has been effectively eradicated this series.

Only in Game Three has Toronto managed to win any Four Factor category: TOV% and eFG%. By playing their style of defense and managing to knock down the Bucks’ eFG%, the Raptors managed to make it to overtime and wait out a Giannis Antetokounmpo foul-out before taking over and winning the game.

Despite winning the turnover battle in Game Three .130 to .146, Toronto generated 14 points on 11 Live Ball turnovers (1.27 points per turnover) and 7 points on 9 Dead Ball turnovers (0.78 points per turnover). Comparing this to Milwaukee scoring 16 points on 14 Live Ball turnovers (1.14 points per turnover) and 0 points on 3 Dead Ball turnovers, we see Toronto eked out only a five point advantage (21 points to 16) over the number one seed.

Compare this to Milwaukee’s 9 points over 6 Live Ball turnovers and 10 points over 8 Dead Ball turnovers, and this can be seen as a marked improvement for the Raptors transition defense on turnovers between Games Two and Three; despite only getting this game to overtime.

Good defenses take away scoring chances from opponents. Defensive rebounds erase an opponent’s chances at Second-Chance points. Turnovers tend to take away those field goal attempts in the first place. However, when a turnover occurs, chaos ensues.

Some teams race down the court to capitalize on defenses attempting to sort themselves out. Some teams use the transition to work into their rhythm and start their offense with less pressure. Some teams simply overthink, either taking a low quality field goal attempt or turning the ball over.

It is clear that live ball turnovers are much more detrimental to a team than dead ball turnovers. We also see they are a way to significantly increase the pace of the game while increasing offensive rating; as we’ve seen, these possessions run on average 7-10 seconds faster than normal possessions, with offensive ratings of 120-140 points.

Teams can thrive on transitioning the turnover. It’s a great equalizer. But only if you can generate the live ball turnover and transition it well.

Two years ago, I posted a basic algorithm that counts every probability of every pick without any trades. This algorithm is able to easily recreate the table we find in Wikipedia, and other sites, when it comes to finding a probability matrix for teams:

Using our aforementioned post, I was able to reconstruct the entire draft lottery algorithm and produce this table within five minutes. Sweet! The code still works! However, these are not the true probabilities for each team thanks to trades made over the previous years. Therefore, other tables that we find on sites like ESPN, HoopsRumors, and even Wikipedia post the incorrect probabilities:

In all cases the trades were either hyperlinked or stuffed within text, forcing the reader to search for context. This season, the trades are rather tame as there are no “pick swap” trades: trades where a team gets the “better” of two picks, contained within the lottery. The closest we get is the Sacramento to Philadelphia/Boston pick swap. Due to this tameness, teams effectively **trade probabilities**. So we can give a pass to the Sacramento Kings having a 1% chance of obtaining the first pick. In reality, it’s **zero **as Philadelphia owns their number one pick.

This is okay, but it requires the reader to search.

But what about Atlanta? Atlanta actually has a **47.02% chance of obtaining the 9th overall pick**. That’s thanks to the Trae Young – Luka Doncic draft night deal. And while Dallas has asterisks next to their odds, it’s Atlanta that doesn’t have any indication.

Similarly, Boston has **two trades** lingering in the draft. They have interesting probabilities floating about the table as well. But that’s not readily apparent either. So let’s incorporate the trades and then update this table. Thanks to Real GM, we are able to turn these trades into code.

From the draft night trade in the 2018 Draft, the Atlanta Hawks managed to move down in the draft in order to allow Dallas to guarantee the rights to Luka Doncic. In order to complete this trade and incentivize Atlanta moving down in the lottery, Atlanta gained a pick-protected lottery pick for this season. That is, if Dallas falls between the 6th and 14th picks, Atlanta gains the Mavericks’ lottery pick. We can represent this trade in code (using the variables from our previous lottery odds post) as:
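A minimal sketch of that protection logic, assuming `remainingProbs` is a mapping from team to a ten-entry array of probabilities over picks 5 through 14 (index 0 corresponding to pick 5); the keys `'ATL'` and `'DAL'` and the function name are illustrative:

```python
def convey_protected_pick(remainingProbs, atl='ATL', dal='DAL'):
    """Apply the top-5-protected Dallas pick owed to Atlanta.

    If Dallas falls to picks 6 through 14 (array indices 1-9), the pick
    conveys: Atlanta inherits Dallas's probability mass at those slots.
    Returns the total probability that the pick conveys.
    """
    convey = 0.0
    for slot in range(1, 10):          # picks 6-14; pick 5 (index 0) is protected
        convey += remainingProbs[dal][slot]
        remainingProbs[atl][slot] += remainingProbs[dal][slot]
        remainingProbs[dal][slot] = 0.0
    return convey
```

Pick 5 is left untouched, matching the nine unprotected spots counted in the reminder below.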

As a reminder: **remainingProbs** is a **fixed-draw double array** that simply aligns the teams that were not selected in the first four picks. There are a total of ten of these positions: picks 5 through 14. Since pick 5 is protected, we count the last nine spots.

On January 12, 2015 a three-team trade involving five players and three draft picks took place between the Boston Celtics, Memphis Grizzlies, and New Orleans Pelicans. In this trade, Memphis sent Tayshaun Prince to Boston and Quincy Pondexter to New Orleans. In return, New Orleans sent Russ Smith and a traded player exception to Memphis and Boston sent Jeff Green (the centerpiece of the deal) to Memphis. In the process, Boston also obtained Austin Rivers from New Orleans.

To soften the loss of Green, Memphis included a protected future first round pick to Boston. Similarly, to help New Orleans address the loss of Rivers, Memphis included a second round pick to the Pelicans. This season, that first round pick comes into play as Memphis is slotted as the 8th seed, with the highest probability of **keeping their pick**. Despite this, Boston still has a significant chance of nabbing the Memphis pick, provided Memphis hits that unlucky **42.6% chance of landing the 9th, 10th, or 11th pick in the draft**.

Due to the straightforward nature of the trade, we can easily code this as:
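A minimal sketch under stated assumptions: `pickProbs` maps each team to a dict of probabilities keyed by final pick number, and, per the protection described above, the pick conveys to Boston only on a 9th, 10th, or 11th finish. The team keys, function name, and any demo probabilities are illustrative.

```python
def memphis_pick_to_boston(pickProbs, mem='MEM', bos='BOS'):
    """Move Memphis's probability mass at picks 9-11 over to Boston.

    Returns the total probability that the pick conveys.
    """
    convey = 0.0
    for pick in (9, 10, 11):
        mass = pickProbs[mem].get(pick, 0.0)
        convey += mass
        pickProbs[bos][pick] = pickProbs[bos].get(pick, 0.0) + mass
        pickProbs[mem][pick] = 0.0
    return convey
```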

Known as the “Stauskas Trade,” back on July 9, 2015 the Sacramento Kings shipped Nik Stauskas, Carl Landry, Jason Thompson, and two future first round picks for the rights to Arturas Gudaitis and Luka Mitrovic. The move for the Kings was essentially to clear cap space for the 2015-16 NBA season in an attempt to sign Rajon Rondo, Marco Belinelli, and Kosta Koufos. For the future first round draft picks, a series of pick protections were placed on the 2017 and 2018 draft picks. If those protections were satisfied for Sacramento, then Sacramento’s 2019 first round draft pick went to Philadelphia.

In those years, the Kings managed to keep their picks.

Despite this…

…on June 19th, 2017 the Philadelphia 76ers traded their rights to Sacramento’s 2019 first round pick to the Boston Celtics when they made the move from 3rd in the 2017 draft to 1st. It was part of a conditional trade where Boston gained the 2019 Sacramento pick as long as the Los Angeles Lakers’ 2018 Draft pick landed between 2nd and 5th. That draft pick landed 10th, and Boston became the owner of Sacramento’s 2019 Draft Pick, protected at number one.

To this end, we code this trade as:
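A minimal sketch of the routing, consistent with the terms above (the number one pick stays on the Philadelphia side of the 2015 deal; everything else belongs to Boston); the function name and team labels are illustrative:

```python
def route_sacramento_pick(sac_pick):
    """Return which team receives Sacramento's 2019 first rounder.

    Protected at number one: if the pick lands first overall it goes to
    Philadelphia per the 2015 terms; otherwise Boston owns it.
    """
    return 'PHI' if sac_pick == 1 else 'BOS'
```

This is why Sacramento’s listed 1% chance at the first pick is really zero from the Kings’ point of view.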

Applying these trades as a Python script, we are able to generate the probabilities for every team in the draft of obtaining a lottery pick:

Here, we see Sacramento is completely wiped off the map. Here we also see the updated probabilities for Atlanta as well as the illustrated potential of Memphis possibly losing their pick.

This year is relatively straightforward when it comes to lottery trades. But at least we now know how to handle them within our code, and we can visually see everyone’s probabilities. Come May 14th, you now know the true probabilities for your team.

Over the past year or so, I’ve been approached by two NBA Analytics team Directors about this particular problem: constructing NBA lottery probabilities. The reason is this: both teams used this problem as an applicant test problem to better understand the applicant’s thought process and coding capabilities. In both instances, reviewers noticed an all-too-eerie duplication between vastly different applicants. The reason? **Code was copied from here and passed off as their own**. Both times I was given evidence. Not cool.

The purpose of this site is to introduce concepts and some basic coding principles to help folks learn **the basics**. Posts with code are meant for folks with remedial-or-beginner capabilities in coding to give them a nudge in testing out ideas on their own. Posts without code are for the more sophisticated readers to understand the thought process and theory; even to just open a small discussion.

However, if this trend continues, the amount of code that appears on the site or becomes available by other means will start to disappear rapidly. So, for benefit of the people that enjoy this site, just **be cool** and **do it on your own**.

So let’s break down what makes a -3.5 rating…

Recall that net rating is calculated by
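Written out, this is the standard difference of per-100-possession ratings:

```latex
\text{NetRtg} = \text{ORtg} - \text{DRtg}
             = 100 \cdot \frac{\text{Points Scored}}{\text{Offensive Possessions}}
             - 100 \cdot \frac{\text{Points Allowed}}{\text{Defensive Possessions}}
```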

This is just the difference of offensive and defensive ratings. This is merely a linear stretching of **points per possession** to per 100 possessions, to give the effect of **if these players played a whole game at this uniform consistency**. And that’s okay; it’s mainly there for readers to digest the information in an easier manner.

Rarely does a **rotation** play more of one type of possession than another, particularly within a four game series. For starters, we typically see three-to-four **stints** per game for a starting rotation. Take that over 4 games, and we expect the starters to play **12-16 stints**. Therefore, at its worst, the possession difference would be 32 possessions. In reality, it’s much closer to zero.

Using these facts, we can begin to construct what a -3.5 rating really means: a differential of **-.035 points per possession**. What does this number actually mean? It means **every 28 possessions played, the Boston starters needed an extra offensive possession to match what their defense was giving up**. Does this mean the Boston starters were outscored? Without extra information, possibly.

**Example: **Boston starters have 114 offensive possessions to Indiana’s 109 offensive possessions with a final score of 110 – 109 leads to the starters outscoring their competition while maintaining a **-3.5 net rating**.
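The example can be checked in a few lines; the possession counts and final score are the illustrative ones above, not actual series data:

```python
# Hypothetical stat line from the example above.
bos_pts, bos_poss = 110, 114   # Boston offensive possessions
ind_pts, ind_poss = 109, 109   # Indiana offensive possessions

off_rtg = 100.0 * bos_pts / bos_poss   # about 96.5
def_rtg = 100.0 * ind_pts / ind_poss   # exactly 100.0
net_rtg = off_rtg - def_rtg            # about -3.5, yet Boston outscored Indiana
```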

While this may not be the reality of the Boston starters; the discussion here is to not fall into the trap of **comparing ratings without context**.

A bigger challenge with ratings is the **randomness** of it all. Over the past couple of years, different methods of **smoothing** have been used to reduce the noise in ratings. One of the most-used forms is the **luck-adjusted rating**. Even this is just a regression methodology at the **zeroth-order level** with a little first-order effect mixed in. Other models such as **Adjusted Plus-Minus** and all of its various add-ons/follow-ons/hierarchical or Bayesian updates/etc. are again just regression methods applied at the **first-order level**. Interaction methods developed by guys like myself or a couple of my past collaborators (and teams) are still just regression methods applied at the **higher-order levels**. The point is, every single one of these methods treats stints as observations and then applies the smoothing at the response level. Every one of the methods above is a marked improvement over citing raw net ratings, but even they fail at understanding the randomness of an actual stint.

Let’s take a deep look at a single stint from the Boston-Indiana series.

At the start of game three, the Celtics lit up the floor by scoring on 12 of their first 18 possessions to race out to a 29-18 lead. Buoyed by five three point field goals, Boston maintained an offensive rating of **161.11** for their first stint. In contrast, the Pacers spent half their possessions coming up empty through bad passes and missed field goals, converting only 44% of their possessions into field goals en route to 18 points; an offensive rating of **100.00**. The differential suggests that the Celtics had a net rating of 61.11; indicating the starters were vastly superior to their opponents. A little troubling for a team that ended up at **-3.5** when all was said and done.

When all was said and done, the distribution of points per possession are given as

- Boston Celtics
  - **0 points:** 6 possessions
  - **1 point:** 0 possessions
  - **2 points:** 7 possessions
  - **3 points:** 5 possessions

- Indiana Pacers
  - **0 points:** 9 possessions
  - **1 point:** 1 possession
  - **2 points:** 7 possessions
  - **3 points:** 1 possession

Let’s play a little game with this “training data.”

By supposing the distribution of points scored per possession are given above by the Celtics-Pacers stint, we can simulate the 18 possession stint over and over to understand the randomness of the data. Of course, we assume there is noise on the above data, so we will apply a basic Bayesian filter for multinomial data. Furthermore, we **won’t even apply luck adjustments **to bias everything we can towards Boston.
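A minimal sketch of that multinomial-Dirichlet filter, assuming a flat Dirichlet(1, 1, 1, 1) prior (the exact prior is not specified above, so this is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed points-per-possession counts from the stint: 0, 1, 2, 3 points.
bos_counts = np.array([6, 0, 7, 5])
ind_counts = np.array([9, 1, 7, 1])

def posterior_probs(counts, prior=1.0, size=1):
    # Multinomial-Dirichlet: the posterior over scoring probabilities is
    # Dirichlet(prior + counts); draw smoothed probability vectors from it.
    return rng.dirichlet(counts + prior, size=size)

p_bos = posterior_probs(bos_counts)[0]
p_ind = posterior_probs(ind_counts)[0]
```

Each draw is a plausible “true” scoring distribution; the smoothing keeps zero-count bins, such as Boston’s one-point possessions, from being treated as impossible.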

**The idea here is to look at a net rating and understand, given the randomness of scoring, how noisy that rating really is.**

Here, we apply a simple algorithm that samples the distribution of points scored from the multinomial-Dirichlet model trained by the Celtics’ +61.11 net rating.

```python
import random

# Points-per-possession probabilities, P(0, 1, 2, 3 points), smoothed from
# the stint counts by the Dirichlet prior.
p1 = [0.3182, 0.0455, 0.3636, 0.2727]  # Boston
p2 = [0.4545, 0.0909, 0.3636, 0.0909]  # Indiana

def simulate_stint(p, possessions=18):
    """Simulate one stint and return the points scored."""
    score = 0.
    for _ in range(possessions):
        r = random.random()
        if r < p[0]:
            continue                   # No points scored.
        elif r < p[0] + p[1]:
            score += 1.
        elif r < 1. - p[3]:
            score += 2.
        else:
            score += 3.
    return score

scores1, scores2 = [], []
ratings1, ratings2, netRatings = [], [], []
wins = 0.
Games = 1000000

for i in range(Games):
    score1 = simulate_stint(p1)        # Simulate Team 1 (Boston).
    score2 = simulate_stint(p2)        # Simulate Team 2 (Indiana).
    scores1.append(score1)
    scores2.append(score2)
    ratings1.append(100. * score1 / 18.)
    ratings2.append(100. * score2 / 18.)
    netRatings.append(100. * (score1 - score2) / 18.)
    if score2 > score1:
        wins += 1.                     # Count the Pacers' stint wins.
```

Running the simulation, we see that even with this absurd differential, **the Pacers are expected to win more than 5% of these stints!** The probability of a Pacers win under these scoring distributions is **5.2%**. Now this doesn’t mean that when Boston posts a +61.11 net rating, the Pacers will win 5% of the time. It means that **when Boston plays like a +61.11 net rating team, the Pacers are still expected to win more than 5% of the time**.

Therefore, the net rating doesn’t indicate that Boston is 61 points better, it’s merely a **symptom** of whatever the true net rating is. In fact, let’s take a look at the distribution of offensive ratings:

We see there is significant overlap in the two distributions. In fact, to illustrate the symptom effect described above, Indiana played at a **72.7 offensive rating**, and yet latched onto a 100.00 offensive rating. Similarly, Boston’s distribution of scoring reflects a **131.84 offensive rating** despite the 161 that was posted. What this shows is that the teams are symptomatic of “**luck**.”

**(Note: **For those who are fully aware of statistical analysis and resulting **continuity correction **being applied by the Dirichlet-Multinomial model above, luck is being defined as points over/under expectation, inflated at small probability regions. In this case, it’s free throws and three point field goals; hence the drops just noted.**)**

The more important takeaway is that the style of play from Boston led to a **larger variance** in play. That is, their ratings have a standard deviation of **28 points**. Compare this to the Pacers’ much smaller **20 points**, and we see that ratings follow a **heteroskedastic process**.

With that in mind, we can look at the net ratings for the Boston starters:

What ends up happening is the phenomenon that beats up most regression analyses on ratings: **skewness**. Here, we can actually see the skewness as the distribution is left-tailed. In fact, due to randomness we see that the game **with a given true net rating of +61.11** could **produce a net rating of** **-100**.

The point here is, a **-3.5 net rating **is relatively meaningless. It’s just another descriptive number that needs **a lot more context**. Negative net ratings still produce wins. That’s a problem when trying to understand how well a unit works together.

Furthermore, even if a very high net rating is used as truth, we can still get wildly varying net ratings.

In fact, a former Sloan presenter once told me that **“six possessions are enough to invoke the Central Limit Theorem,”** which I’ve never seen to be true. Above is yet another example where we even triple that size and still get heavy skewness in the results. Using the skewness tests derived at Columbia University, the skewness for this sample is **strong**, with a **p-value** of 4.38 x 10^-29 over **one million samples**.
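One way to run such a check is with a moment-based skewness statistic. The sketch below uses a right-skewed stand-in distribution rather than the simulated net ratings, and the bare third-moment estimator rather than the exact test cited above, so treat it as illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# A right-skewed stand-in sample; the true skewness of Gamma(2) is 2/sqrt(2).
sample = rng.gamma(shape=2.0, scale=1.0, size=100_000)

def sample_skewness(x):
    # Standardized third central moment: g1 = m3 / m2^(3/2).
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / x.std() ** 3

g1 = sample_skewness(sample)   # well away from zero for this distribution
```

Even at 100,000 samples, the skewness has not washed out, which is the point: heavy skew survives sample sizes far beyond six possessions.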

Lastly, ratings are heteroskedastic, meaning every regression model reduces noise poorly if heteroskedasticity is not taken into account.

More importantly, the argument is to recognize that **ratings** are **symptoms** of other phenomena. Instead, we should focus on **transactional interactions** such as **actions and scenarios that feed into points per possession from possession to possession**. This isn’t to suggest using a singular point-per-possession value, but rather developing an artificial-intelligence-based approach to **understanding the decision making process of a collective unit given the state of the gaming system**.

Currently, several teams are approaching this venture. Some of it is built on play-by-play analysis, such as the live and dead ball turnover work of Mike Beuoy and Seth Partnow. Some of it is built on tracking data, such as quantifying actions as competing risk models, thanks to Dan Cervone. These are just a handful of examples in existence, and even they struggle to maintain fidelity to the game; a fact of the ever-changing landscape of how points are scored.

Until we are able to represent the **stochastic partial differential equation** that defines basketball, we are left nibbling at its edges with summary statistics, regression models, and partial “solutions.” And that’s okay for now.

Just remember that a 61.11 positive net rating match-up is expected to lose over 5% of the time.
