For instance, let’s consider effective field goal percentage. The **Golden State Warriors** have posted a .558 eFG% while limiting their opponents to a .518 eFG%. While this is by far the best eFG% in the post-season, the differential (+.041) is only good for second, behind the **Milwaukee Bucks’** +.056. It’s no wonder both teams are deep into the playoffs, as they are outscoring their opponents at such high rates. The second best eFG% in the post-season has been posted by the **Houston Rockets** at .527, with a positive differential of .038, third best in the post-season. Effectively, these are the teams that cannot be “out-shot” in games. Instead, alternative measures must be taken.

Taking a closer look at the Rockets-Warriors series, the Rockets apparently defeated the Warriors in almost every category of the Four Factors:

Here, we see that Houston indeed won three of the four categories, but lost the series two games to four. As every game was decided by **two possessions or less** there are no “aggregation biases,” such as a blowout win compensating for 2-3 losses. What this series ultimately came down to was the **distribution of turnovers**. More specifically, the **value of a turnover** was much greater in this series than the values for the other three categories.

As a baseline, Basketball Reference posited that both the Warriors and Rockets played 579 offensive possessions, resulting in offensive ratings of 115.7 and 113.8, respectively. Using this baseline, we value the **“average possession”** as 1.157 points for the Warriors and 1.138 points for the Rockets. If we look at the turnover battle, the only category the Rockets lost, Houston turned the ball over **98 times** (including 11 shot clock violations) compared to Golden State’s **83 turnovers**, none of which were shot clock violations.

On average, the Rockets gave up an extra 2.5 possessions per game via the turnover; but this alone does not account for the “4-6 points per game” lost. Using the baseline, this amounts to only about **2.78 points of differential** per game. Houston won every other category… so where does the remainder of the differential come from?
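As a sanity check, here is a minimal sketch of that arithmetic, using the Warriors’ 1.157 possession value from above (the published 2.78 differs slightly, presumably from rounding in the underlying possession values):

```python
# Back-of-the-envelope value of the series turnover differential,
# using the totals and possession values quoted above.
warriors_tov = 83
rockets_tov = 98
games = 6

extra_tov_per_game = (rockets_tov - warriors_tov) / games      # 2.5
points_per_possession = 1.157                                  # Warriors' average possession
points_lost_per_game = extra_tov_per_game * points_per_possession

print(round(extra_tov_per_game, 2), round(points_lost_per_game, 2))  # 2.5 2.89
```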

A way to break down the value of a turnover is to look at the difference between a “live ball” and a “dead ball” turnover. To start, a **live ball** turnover is when a defense is able to immediately move into transition without any stoppage of play. The most common live ball turnover is an errant pass that leads to a steal. **Every live ball turnover must have a steal credited to a defender**. Conversely, a **dead ball** turnover is when the defense’s transition is briefly interrupted by a stoppage in play. **Every dead ball turnover must have an in-bounding pass to initiate transition**.
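Given those two definitions, classifying turnovers from play-by-play data reduces to checking for a credited steal. A minimal sketch (the `event_type` and `steal` keys are hypothetical column names, not from any specific data feed):

```python
# Tag each turnover as live ball or dead ball, per the definitions above:
# a credited steal means live ball; no steal means the ball must be
# in-bounded, i.e., dead ball. The dict keys here are hypothetical.
def classify_turnover(row):
    if row.get('event_type') != 'turnover':
        return None
    return 'live' if row.get('steal') else 'dead'

plays = [
    {'event_type': 'turnover', 'steal': 'P. Tucker'},  # errant pass, stolen
    {'event_type': 'turnover', 'steal': None},         # stepped out of bounds
    {'event_type': 'shot', 'steal': None},             # not a turnover
]
print([classify_turnover(p) for p in plays])  # ['live', 'dead', None]
```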

From a psychological standpoint, live ball and dead ball turnovers can bring about drastically different effects on transition defense. For instance, a live ball turnover tends to lead to a scrambling **recovery** defense. As the play is “live,” a defense has much less time to “set” than usual. However, a dead ball turnover can lead to bickering between teammates, between opponents, and between players and referees; causing a disruption in communication on the ensuing possession. For instance, a bad pass out of bounds may lead the passer to voice a grievance to a teammate. For the brief moments this occurs, a transitioning offense may be running a designed attack such as a **Pistol** or a **Pin-Down Floppy** to pick apart the distracted, and potentially frustrated, defenders.

Due to these mechanical natures (response time, psychological effects, etc.), the value of a turnover differs from team to team. For the Houston – Golden State series, here’s how the type of turnovers looked:

We see that Golden State had a tendency to turn the ball over live: **57.8%** of their turnovers! Compare this to Houston’s much lower **44.9%**, and we see that Houston at least gives themselves much more time to set on defense, as a non-substitution in-bound typically takes between 2 and 8 seconds.

When Golden State turned over the ball live, Houston flourished, posting a 129 offensive rating. However, in dead ball turnover situations, Houston dropped significantly, even falling below their baseline rate of 113.8 with a rating of 109:

Compare this to Golden State’s transitions off of turnovers, and we find that their numbers increased in every case:

What this meant was that while Houston would punish the Warriors for live ball turnovers, if Golden State could protect the ball just enough and ensure the Rockets kept pace with them, Golden State would not just win the turnover battle, **but turn it into enough of a win to compensate for losing the other three categories most associated with winning.**

Case in point: Houston’s turnovers cost them, on average, 3.27 points per game; more than one possession’s worth in games decided by two possessions.

While we presented an argument that turnovers were a significant factor in the Houston – Golden State series, we need to come full circle and identify that the point of this exercise is to show the **value of a turnover** and how it can sway games. In fact, the team that won the turnover battle in a given game went on to **lose four of the six games in the series!**

In fact, teams that won the offensive rebounding battle went 5-1 in the series. Teams that won the effective field goal percentage battle went 5-1 in the series. Teams that won the free-throw rate battles went 2-4 in the series.

In fact, the story of Game One was offensive rebounding and Golden State’s control of the offensive glass.

In Game Two, Houston improved on the glass greatly (from .099 in Game One to .270 in Game Two), but the weak-side pin down action to open weak-side rebounding for the Warriors kept going strong, as they too improved their offensive rebounding numbers from .258 to .367. While this closed the gap substantially, Houston gave up 20 points on possessions following a turnover; 13 on live ball turnovers. In fact, Golden State started the game scoring **twelve of their first fourteen points on possessions after turnovers**.

In Game Three, Houston dominated the offensive glass much like Golden State did in Game One. In Game Four, Houston continued this trend. Despite losing the turnover battle in both games, by limiting their TOV% to approximately 11%, Houston managed to keep Golden State at bay when it came to increasing their points per possession.

Game Five and Game Six saw points per turnover take a jump. In Game Five, the Warriors used a mix of offensive rebounding and transition off turnovers to take the narrow win. In Game Six, Golden State scored **35 points** off of **17 turnovers** for an outrageous 2.06 points per turnover.

Throughout the playoffs, it has not been the Warriors who have punished teams for turning over the ball. It’s been the **Toronto Raptors**. Through their first fifteen games, the Raptors have netted the largest turnover differential in the post-season with **a +49 turnover differential**. While the entirety of the differential has come at the hands of the Orlando Magic and the Philadelphia 76ers [they are currently losing the turnover battle 40-43 to Milwaukee after three games], the Raptors need to continue their turnover domination in an effort to stay afloat in a challenging Eastern Conference Finals.

As a similar baseline, Toronto has an offensive rating of 106.6 with a defensive rating of 102. This translates to 1.066 points per offensive possession and 1.020 points per defensive possession. However, whenever Toronto generates a turnover, much like in the case of the Houston Rockets, their opponents **increase their scoring**:

The disparity between the live ball and dead ball turnovers is outrageous. This is due to the duration of time and the plays available after each type of turnover. For instance, the average duration of a possession after a Toronto live ball turnover is 7.3 seconds. After a dead ball turnover, Toronto’s opponents slow down their offense to a 15.2 second pace.

What this indicates is that Toronto’s transition defense is sub-optimal when it comes to turnovers. Specifically, the guards are unable to retreat in time, whereas players such as Serge Ibaka and Kawhi Leonard have actually managed to dissuade attempts in live ball situations.

If we overlay the distribution of (relative) points on top of the duration of the plays, we find that there’s a “sweet spot” for teams to score after a Toronto turnover.

In this case, the first 2-5 seconds yields points for a Toronto opponent. These are live ball turnovers that turn into fast-break layups and threes. In fact, opponents are shooting 41-for-55 for two-point field goals after a live-ball Toronto turnover.

On the flip side, the Raptors perform a little worse in transition than their opponents. Despite dominating the turnover battle, the Raptors have a lowly 90.9 offensive rating when they create a dead ball turnover on defense. Much of this is due to the slower pace the Raptors play at after a dead ball turnover, compared to their counterparts.

Despite the Raptors ending up with an average possession duration of 14.6 seconds, the probability of a possession taking longer than their counterpart’s is close to 60%. This is due to a significant bump at 1-2 seconds caused by fouling for free throws (“Hack-a-Player”). Therefore we tend to expect that, after a dead ball turnover, the Raptors take approximately 15.2 seconds per possession compared to their opponents’ 12.9 seconds.

If we overlay the (relative) points scored, we obtain a slightly different picture than their opponents:

While the Milwaukee Bucks and Toronto Raptors lead the playoffs in Defensive Rating, the two teams could not be any more different in their approaches to defense. The Bucks dominate the glass on the defensive end, limiting opponents to only a 16.4% OREB%. For the roughly 60% of misses an opponent takes in the course of a game [approximately 55 misses a game], their opponents are lucky to see more than **NINE** second chance opportunities a game. Similarly, the Bucks play Wisconsin-brand basketball by limiting fouling on field goal attempts, settling in third for the post-season with a .194 free throw rate. In comparison, the Raptors are at a 22.7% OREB% and a .233 FTr. Playing the point-value game, we would find the Bucks to be a 3-4 point favorite based on these stats alone. Combine this with Milwaukee’s +.02 advantage in eFG% (.526 to .507) and the odds stack even more in favor of the Bucks.
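The second-chance figure above follows directly from the rebounding rate; a quick sketch of that arithmetic:

```python
# Expected second-chance opportunities per game for Bucks opponents:
# roughly 55 misses a game, rebounded by the offense at a 16.4% clip.
misses_per_game = 55
opp_oreb_pct = 0.164

second_chances = misses_per_game * opp_oreb_pct
print(round(second_chances, 1))  # 9.0
```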

It is TOV% where the Raptors hold a +3% edge over the Bucks, which means they should expect roughly 3 more turnovers a game; if played as live-ball turnovers, those could result in an extra 4-5 points per game. And it’s here that Toronto makes its mark.
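A rough sketch of how that +3% edge turns into 4-5 points, assuming a pace of roughly 100 possessions a game and an illustrative live-ball conversion rate of about 1.5 points per turnover (both figures are assumptions for illustration, not from the article’s data):

```python
# Translate a +3% TOV% edge into points, under assumed pace and
# live-ball conversion values (both are illustrative assumptions).
possessions_per_game = 100       # assumed pace
tov_pct_edge = 0.03              # Raptors' TOV% advantage
points_per_live_ball = 1.5      # illustrative conversion rate

extra_turnovers = possessions_per_game * tov_pct_edge
extra_points = extra_turnovers * points_per_live_ball
print(extra_turnovers, extra_points)  # 3.0 4.5
```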

Much like the Houston-Golden State series, the Milwaukee – Toronto series is going to be (and is indeed being) dictated by who can control the four factors better. While the teams are evenly aligned point-wise, depending on your viewpoint, either team has a recipe for success: Milwaukee needs to limit turnovers and play their brand of basketball. Toronto needs to continue the defensive effort and focus on keeping Milwaukee out of the paint; thereby reducing each of the Bucks’ effective field goal percentage, attempts at the foul line, and chances at offensive rebounding.

Of course, as the Los Angeles Clippers have shown us twice, having a hot shooting night is always a bonus, too. But we can’t count on that to happen consistently. Effectively, one of these teams has to blink.

So far it has been Toronto.

Over the first three games of the Eastern Conference Finals, Milwaukee has controlled every single Four Factor category. Despite Toronto’s ratcheted-up defense affecting Milwaukee’s eFG%, Milwaukee has continued to control the glass and, more importantly, **limit turnovers**. Despite Toronto picking up 23 live ball turnovers over three games against Milwaukee, they have only been able to convert them into 29 points (1.26 points per turnover). Compare this to Milwaukee’s 28 live ball turnovers generated off the Toronto offense, and their resulting 40 points (1.43 points per turnover), and the Raptors’ turnover edge has been effectively eradicated this series.

Only in Game Three has Toronto managed to win any Four Factor category: TOV% and eFG%. By playing their style of defense and managing to knock down the Bucks’ eFG%, the Raptors managed to make it to overtime and wait out a Giannis Antetokounmpo foul-out before taking over and winning the game.

Despite winning the turnover battle in Game Three, .130 to .146, Toronto generated 14 points on 11 Live Ball turnovers (1.27 points per turnover) and 7 points on 9 Dead Ball turnovers (0.78 points per turnover). Comparing this to Milwaukee scoring 16 points on 14 Live Ball turnovers (1.14 points per turnover) and 0 points on 3 Dead Ball turnovers, we see Toronto eked out only a five point advantage over the number one seed.
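The per-turnover rates above can be checked directly:

```python
# Verify the Game Three points-per-turnover figures quoted above.
tor_live = (14, 11)   # (points scored, live ball turnovers forced)
tor_dead = (7, 9)
mil_live = (16, 14)

def ppt(points, turnovers):
    return round(points / turnovers, 2)

print(ppt(*tor_live), ppt(*tor_dead), ppt(*mil_live))  # 1.27 0.78 1.14
```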

Compare this to Milwaukee’s 9 points over 6 Live Ball turnovers and 10 points over 8 Dead Ball turnovers in Game Two, and this can be seen as a marked improvement for the Raptors’ transition defense on turnovers between Games Two and Three; despite only getting this game to overtime.

Good defenses take away scoring chances from opponents. Defensive rebounds erase an opponent’s chances at Second-Chance points. Turnovers tend to take away those field goal attempts in the first place. However, when a turnover occurs, chaos ensues.

Some teams race down the court to capitalize on defenses attempting to sort themselves out. Some teams use the transition to work into their rhythm and start their offense with less pressure. Some teams simply overthink, either taking a low quality field goal attempt or turning the ball over.

It is clear that live ball turnovers are much more detrimental to a team than dead ball turnovers. We also see that they are a way to significantly increase the pace of the game while increasing offensive rating; we’ve seen these possessions run, on average, 7-10 seconds faster than normal possessions with offensive ratings of 120-140 points.

Teams can thrive on transitioning the turnover. It’s a great equalizer. But only if you can generate the live ball turnover and transition it well.

Two years ago, I posted a basic algorithm that counts every probability of every pick without any trades. This algorithm is able to easily recreate the table we find in Wikipedia, and other sites, when it comes to finding a probability matrix for teams:
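The counting algorithm itself lives in that earlier post; as a stand-in, here is a small Monte Carlo sketch of a no-trades lottery draw. The team names and combination counts are illustrative only (three equally weighted teams), not the full 14-team table:

```python
import random

# Draw lottery picks without replacement, weighted by each team's
# number of combinations. Weights below are illustrative, not real odds.
def draw_lottery(weights, picks):
    remaining = dict(weights)
    order = []
    for _ in range(picks):
        teams = list(remaining)
        combos = [remaining[t] for t in teams]
        winner = random.choices(teams, weights=combos, k=1)[0]
        order.append(winner)
        del remaining[winner]
    return order

weights = {'TeamA': 140, 'TeamB': 140, 'TeamC': 140}
trials = 100000
first = sum(draw_lottery(weights, picks=3)[0] == 'TeamA' for _ in range(trials))
print(round(first / trials, 2))  # ~0.33 with equal weights
```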

Using our aforementioned post, I was able to reconstruct the entire draft lottery algorithm and produce this table within five minutes. Sweet! The code still works! However, these are not the true probabilities for each team thanks to trades made over the previous years. Therefore, other tables that we find on sites like ESPN, HoopsRumors, and even Wikipedia post the incorrect probabilities:

In all cases the trades were either hyperlinked or stuffed within text, forcing the reader to search for context. This season, the trades are rather tame as there are no “pick swap” trades: trades where a team gets the “better” of two picks, contained within the lottery. The closest we get is the Sacramento to Philadelphia/Boston pick swap. Due to this tameness, teams effectively **trade probabilities**. So we can give a pass to the Sacramento Kings having a 1% chance of obtaining the first pick. In reality, it’s **zero **as Philadelphia owns their number one pick.

This is okay, but it requires the reader to search.

But what about Atlanta? Atlanta actually has a **47.02% chance of obtaining the 9th overall pick**. That’s thanks to the Trae Young – Luka Doncic draft night deal. And while Dallas has asterisks next to their odds, it’s Atlanta that doesn’t have any indication.

Similarly, Boston has **two trades** lingering in the draft. They have interesting probabilities floating about the table as well. But that’s not readily apparent either. So let’s incorporate the trades and then update this table. Thanks to Real GM, we are able to turn these trades into code.

From the draft night trade in the 2018 Draft, the Atlanta Hawks moved down in the draft in order to allow Dallas to guarantee the rights to Luka Doncic. To complete this trade and incentivize Atlanta moving down in the lottery, Atlanta gained a pick-protected lottery pick for this season. That is, if Dallas falls between the 6th and 14th picks, Atlanta gains the Mavericks’ lottery pick. We can represent this in code (using the variables from our previous lottery odds post) as:
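The article’s original snippet uses its `remainingProbs` array; as a stand-in, here is a hypothetical sketch on a simpler structure, `probs`, a dict mapping each team to its list of 14 pick probabilities (index 0 = 1st overall):

```python
# If Dallas's pick lands 6th through 14th (indices 5-13), it conveys
# to Atlanta; otherwise Dallas keeps it. `probs` is a hypothetical
# stand-in for the article's remainingProbs structure.
def apply_atl_dal_trade(probs):
    for pick in range(5, 14):
        probs['ATL'][pick] += probs['DAL'][pick]
        probs['DAL'][pick] = 0.0
    return probs

# Toy example: give Dallas a 47.02% chance at the 9th pick (index 8).
probs = {'ATL': [0.0] * 14, 'DAL': [0.0] * 14}
probs['DAL'][8] = 0.4702
probs = apply_atl_dal_trade(probs)
print(probs['ATL'][8], probs['DAL'][8])  # 0.4702 0.0
```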

As a reminder: **remainingProbs** is a **fixed-draw double array** that simply aligns the teams that were not selected in the first four picks. There are a total of ten of these positions: picks 5 through 14. Since pick 5 is protected, we count the last nine spots.

On January 12, 2015 a three-team trade involving five players and three draft picks took place between the Boston Celtics, Memphis Grizzlies, and New Orleans Pelicans. In this trade, Memphis sent Tayshaun Prince to Boston and Quincy Pondexter to New Orleans. In return, New Orleans sent Russ Smith and a traded player exception to Memphis and Boston sent Jeff Green (the centerpiece of the deal) to Memphis. In the process, Boston also obtained Austin Rivers from New Orleans.

To soften the loss of Green, Memphis included a protected future first round pick to Boston. Similarly, to help address the loss of Rivers from New Orleans, Memphis included a second round pick to the Pelicans. This season, that first round pick comes into play as Memphis is slotted in the 8th position, with the highest probability of **keeping their pick**. Despite this, Boston still has a significant chance of nabbing the Memphis pick, provided Memphis hits that unlucky **42.6% chance of getting the 9th, 10th, or 11th pick in the draft**.

Due to the straightforward nature of the trade, we can easily code this as:
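A sketch in the same hypothetical `probs` style (a dict of team to 14 pick probabilities); per the article, the pick conveys to Boston when it lands 9th, 10th, or 11th:

```python
# Memphis keeps its pick inside the protection; if it lands 9th-11th
# (indices 8-10), it conveys to Boston. `probs` is a hypothetical
# dict mapping team -> list of 14 pick probabilities.
def apply_mem_bos_trade(probs):
    for pick in range(8, 11):
        probs['BOS'][pick] += probs['MEM'][pick]
        probs['MEM'][pick] = 0.0
    return probs

# Toy example: Memphis with some chance of sliding to the 10th pick.
probs = {'BOS': [0.0] * 14, 'MEM': [0.0] * 14}
probs['MEM'][7] = 0.30   # stays 8th: Memphis keeps it
probs['MEM'][9] = 0.20   # slides to 10th: conveys to Boston
probs = apply_mem_bos_trade(probs)
print(probs['BOS'][9], probs['MEM'][7])  # 0.2 0.3
```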

Known as the “Stauskas Trade,” back on July 9, 2015 the Sacramento Kings shipped Nik Stauskas, Carl Landry, Jason Thompson, and two future first round picks for the rights to Arturas Gudaitis and Luka Mitrovic. The move for the Kings was essentially to clear cap space for the 2015-16 NBA season in an attempt to sign Rajon Rondo, Marco Belinelli, and Kosta Koufos. For the future first round draft picks, a series of pick protections were placed on the 2017 and 2018 draft picks. If those protections were satisfied for Sacramento, then Sacramento’s 2019 first round draft pick went to Philadelphia.

In those years, the Kings managed to keep their picks.

Despite this…

…on June 19th, 2017 the Philadelphia 76ers traded their rights to Sacramento’s 2019 first round pick to the Boston Celtics when they made the move from 3rd in the 2017 draft to 1st. It was part of a conditional trade in which Boston would gain the 2019 Sacramento pick unless the Los Angeles Lakers’ 2018 Draft pick landed between 2nd and 5th. That draft pick landed 10th, and Boston became the owner of Sacramento’s 2019 Draft pick, protected at number one.

To this end, we code this trade as:
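Continuing with the same hypothetical `probs` structure: the first overall slot stays with Philadelphia (per the protection described above), with every other slot conveying to Boston:

```python
# Sacramento's pick: the 1st overall is owned by Philadelphia, and
# every other slot conveys to Boston. `probs` is the same hypothetical
# dict of team -> 14 pick probabilities used above.
def apply_sac_trade(probs):
    probs['PHI'][0] += probs['SAC'][0]
    for pick in range(1, 14):
        probs['BOS'][pick] += probs['SAC'][pick]
    probs['SAC'] = [0.0] * 14
    return probs

# Toy example: Sacramento with a 1% shot at 1st and a 50% shot at 9th.
probs = {'SAC': [0.0] * 14, 'BOS': [0.0] * 14, 'PHI': [0.0] * 14}
probs['SAC'][0] = 0.01
probs['SAC'][8] = 0.50
probs = apply_sac_trade(probs)
print(probs['PHI'][0], probs['BOS'][8], sum(probs['SAC']))  # 0.01 0.5 0.0
```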

Applying these trades as a Python script, we are able to generate the probabilities for every team in the draft of obtaining a lottery pick:

Here, we see Sacramento is completely wiped off the map. Here we also see the updated probabilities for Atlanta as well as the illustrated potential of Memphis possibly losing their pick.

This year is a relatively straightforward year when it comes to lottery trades. But at least we now know how to handle them within our code, and we can visually see everyone’s probabilities. Come May 14th, you now know the true probabilities for your team.

Over the recent year or so, I’ve been contacted by two NBA Analytics team Directors about this particular problem: constructing NBA lottery probabilities. The reason is this: both teams used this problem as an applicant test problem to better understand the applicant’s thought process and coding capabilities. In both instances, reviewers noticed an all-too-eerie duplication across vastly different applicants. The reason? **Code was copied here and passed off as their own**. Both times I was given evidence. Not cool.

The purpose of this site is to introduce concepts and some basic coding principles to help folks learn **the basics**. Posts with code are meant for folks with remedial-or-beginner capabilities in coding to give them a nudge in testing out ideas on their own. Posts without code are for the more sophisticated readers to understand the thought process and theory; even to just open a small discussion.

However, if this trend continues, the amount of code that appears on the site or becomes available by other means will start to disappear rapidly. So, for the benefit of the people that enjoy this site, just **be cool** and **do it on your own**.

So let’s break down what makes a -3.5 rating…

Recall that net rating is calculated by

Net Rating = Offensive Rating − Defensive Rating

This is just the difference of offensive and defensive ratings. This is merely a linear stretching of **points per possession** to per 100 possessions, to give the effect of **if these players played a whole game at this uniform consistency**. And that’s okay; it’s mainly there for readers to digest the information in an easier manner.

Rarely does a **rotation** play more of one type of possession than another, particularly within a four game series. For starters, we typically see three-to-four **stints** per game for a starting rotation. Take that over 4 games, and we expect the starters to play **12-16 stints**. Therefore, at its worst, the possession difference would be 32 possessions. In reality, it’s much closer to zero.

Using these facts, we can begin to construct what a -3.5 rating really means: a differential of **-.035 points per possession**. What does this number actually mean? It means **every 28 possessions played, the Boston starters needed an extra offensive possession to match what their defense was giving up**. Does this mean the Boston starters were outscored? Without extra information, possibly.

**Example:** Boston starters having 114 offensive possessions to Indiana’s 109, with a final score of 110 – 109, leads to the starters outscoring their competition while maintaining a **-3.5 net rating**.
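The example checks out arithmetically:

```python
# Boston scores 110 on 114 offensive possessions while allowing 109 on
# Indiana's 109 possessions: Boston outscores Indiana yet posts -3.5.
bos_ortg = 100 * 110 / 114
bos_drtg = 100 * 109 / 109
net_rating = bos_ortg - bos_drtg
print(round(net_rating, 1))  # -3.5
```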

While this may not be the reality of the Boston starters; the discussion here is to not fall into the trap of **comparing ratings without context**.

A bigger challenge with ratings is the **randomness** of it all. Over the past couple years, different methods of **smoothing** have been used to reduce the noise in ratings. One of the most-used forms is **luck-adjusted rating**. Even this is just a regression methodology at the **zeroth-order level** with a little first-order effects mixed in. Other models such as **Adjusted Plus-Minus** and all of its various add-ons/follow-ons/hierarchical or Bayesian updates/etc. are again just regression methods applied at the **first-order level**. Interaction methods developed by guys like myself or a couple of my past collaborators (and teams) are still just regression methods applied at **higher-order levels**. The point is, every single one of these methods treats stints as observations and then applies the smoothing at the response level. Each of the methods above is a marked improvement over citing raw net ratings, but even they fail at understanding the randomness of an actual stint.

Let’s take a deep look at a single stint from the Boston-Indiana series.

At the start of game three, the Celtics lit up the floor by scoring on 12 of their first 18 possessions to race out to a 29-18 lead. Buoyed by five three point field goals, Boston maintained an offensive rating of **161.11** for their first stint. In contrast, the Pacers squandered half their possessions through bad passes and missed field goals, converting only 44% of their possessions into field goals en route to 18 points; an offensive rating of **100.00**. The differential suggests that the Celtics had a net rating of 61.11, indicating the starters were vastly superior to their opponents. A little troubling for a team that ended up at **-3.5** when all was said and done.

The distribution of points per possession for the stint was:

- Boston Celtics
  - **0 points:** 6 possessions
  - **1 point:** 0 possessions
  - **2 points:** 7 possessions
  - **3 points:** 5 possessions

- Indiana Pacers
  - **0 points:** 9 possessions
  - **1 point:** 1 possession
  - **2 points:** 7 possessions
  - **3 points:** 1 possession

Let’s play a little game with this “training data.”

By supposing the distribution of points scored per possession is given above by the Celtics-Pacers stint, we can simulate the 18 possession stint over and over to understand the randomness of the data. Of course, we assume there is noise in the above data, so we will apply a basic Bayesian filter for multinomial data. Furthermore, we **won’t even apply luck adjustments** to bias everything we can towards Boston.

**The idea here is to look at a net rating and understand, given the randomness of scoring, how noisy that rating really is.**

Here, we apply a simple algorithm that samples the distribution of points scored from the multinomial-Dirichlet model trained by the Celtics’ +61.11 net rating.

```python
import random

# Dirichlet-smoothed points-per-possession distributions: P(0), P(1), P(2), P(3)
p1 = [0.3182, 0.0455, 0.3636, 0.2727]  # Boston
p2 = [0.4545, 0.0909, 0.3636, 0.0909]  # Indiana

scores1 = []
scores2 = []
ratings1 = []
ratings2 = []
netRatings = []
wins = 0.
Games = 1000000

for i in range(Games):
    # Simulate Team 1 (Boston)
    score1 = 0.
    for j in range(18):
        r1 = random.random()
        if r1 < p1[0]:
            # No Points Scored.
            continue
        elif r1 < (p1[0] + p1[0]):
            score1 += 1.
        elif r1 < (1. - p1[3]):
            score1 += 2.
        else:
            score1 += 3.

    # Simulate Team 2 (Indiana)
    score2 = 0.
    for j in range(18):
        r2 = random.random()
        if r2 < p2[0]:
            # No Points Scored.
            continue
        elif r2 < (p2[0] + p2[0]):
            score2 += 1.
        elif r2 < (1. - p2[3]):
            score2 += 2.
        else:
            score2 += 3.

    scores1.append(score1)
    scores2.append(score2)
    ratings1.append(100. * score1 / 18.)
    ratings2.append(100. * score2 / 18.)
    netRatings.append(100. * (score1 - score2) / 18.)

    if score2 > score1:
        # Pacers outscore the Celtics in this stint.
        wins += 1.

print(wins / Games)  # probability of a Pacers win
```

Running the simulation, we see that even with this absurd differential, **the Pacers are expected to win more than 5% of these stints!** The probability of a Pacers win under these scoring distributions is **5.2%**. Now this doesn’t mean that when Boston posts up a +61.11 net rating, the Pacers will win 5% of the time. This means **when Boston plays like a +61.11 net rating team, the Pacers are still expected to win more than 5% of the time**.

Therefore, the net rating doesn’t indicate that Boston is 61 points better, it’s merely a **symptom** of whatever the true net rating is. In fact, let’s take a look at the distribution of offensive ratings:

We see there is significant overlap in the two distributions. In fact, to illustrate the symptom effect described above, Indiana played at a **72.7 offensive rating** yet latched onto a 100.00 offensive rating. Similarly, Boston’s distribution of scoring reflects a **131.84 offensive rating** despite the 161 that was posted. What this shows is that the teams are symptomatic of “**luck**.”

**(Note:** For those who are fully aware of statistical analysis and the resulting **continuity correction** being applied by the Dirichlet-Multinomial model above, luck is being defined as points over/under expectation, inflated at small probability regions. In this case, it’s free throws and three point field goals; hence the drops just noted.**)**

The more important takeaway is that the style of play from Boston led to a **larger variance** in play. That is, their ratings have a standard deviation of **28 points**. Compare this to the Pacers’ much smaller **20 points**, and we see that ratings follow a **heteroskedastic process**.

With that in mind, we can look at the net ratings for the Boston starters:

What ends up happening is the phenomenon that beats up most regression analyses on ratings: **skewness**. Here, we can actually see the skewness as the distribution is left-tailed. In fact, due to randomness we see that the game **with a given true net rating of +61.11** could **produce a net rating of** **-100**.

The point here is, a **-3.5 net rating **is relatively meaningless. It’s just another descriptive number that needs **a lot more context**. Negative net ratings still produce wins. That’s a problem when trying to understand how well a unit works together.

Furthermore, even if a very high net rating is used as truth, we can still get wildly varying net ratings.

In fact, a former Sloan presenter once told me that **“Six possessions are enough to invoke the Central Limit Theorem,”** which I’ve never found to be true. Above is yet another example where we even triple that size and still get heavy skewness in the results. Using the tests derived from Columbia University, the skewness for this sample is **strong**, with a **p-value** of 4.38 x 10^(-29) for **one million samples**.
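For readers who want to check this kind of claim themselves, here is a minimal sketch that computes the sample skewness (third standardized moment) of simulated 18-possession stint scores. The scoring probabilities are illustrative, in the spirit of the Dirichlet model above, and no statistics library is required:

```python
import random

def sample_skewness(xs):
    # Third standardized moment of a sample.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Illustrative P(0), P(1), P(2), P(3) points per possession.
p = [0.32, 0.05, 0.36, 0.27]
random.seed(0)
stints = [sum(random.choices([0, 1, 2, 3], weights=p, k=18))
          for _ in range(100000)]

# Even at 18 possessions, the distribution of stint scores keeps a
# (left-tailed) skew rather than collapsing to normality.
print(sample_skewness(stints))
```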

Lastly, ratings are heteroskedastic, meaning every regression model poorly reduces noise if heteroskedasticity is not taken into account.

More importantly, the argument is to identify that **ratings** are **symptoms** of other phenomena. Instead, we should focus on **transactional interactions**, such as **actions and scenarios that feed into points per possession from possession to possession**. This isn’t to suggest using a singular point per possession, but rather to develop an artificial-intelligence-based approach to **understanding the decision making process of a collective unit given the state of the gaming system**.

Currently, several teams are approaching this venture. Some are developed on play-by-play analysis, such as live and dead ball turnovers, thanks to Mike Beuoy and Seth Partnow. Some are developed by tracking, such as trying to quantify actions as competing risk models, thanks to Dan Cervone. These are just a handful of examples in existence, and even then they struggle to maintain fidelity to the game; a fact of the ever-changing landscape of how points are scored.

Until we are able to represent the **stochastic partial differential equation** that defines basketball, we are left nibbling at its edges with summary statistics, regression models, and partial “solutions.” And that’s okay for now.

Just remember that a 61.11 positive net rating match-up is expected to lose over 5% of the time.


However, for the uninitiated, scoring is not just simply putting the ball in the basket. It’s about getting your team to convert as many points per possession as possible. It’s a reason **Steve Nash** was a back-to-back MVP. It’s a reason why many people remember **Magic Johnson** as an elite scorer; he really wasn’t, as he was a sub-20-PPG player for most of his career. And if you’ve been paying attention this year, it’s also why **Ben Simmons** and **Giannis Antetokounmpo** are such scoring threats despite rarely (or never) making a three point attempt during the season.

Once we plot their **spatial scoring distributions**, we immediately see the scoring value of a player by looking beyond “where they make their shots.” From here, we look into how many points a player **contributes** to their team within a game. At a cursory look, we take a step back from “deeper” approaches, such as the **credit sharing** techniques that Dean Oliver or I have shared in the past, and simply look at the accumulated points from points and assists within a game.

**Note:** By looking at **points + assisted points**, we will “score” more points than a team does in a game. In this, we try to take a step back from **point splitting**.

In something I have called **points responsible for (PRF)** since the mid-80’s, we simply add the points scored and the assisted points scored. The reason for such a term is from my days as a kid watching the **Showtime Lakers** with **Chick Hearn** and **Stu Lantz** commentating the game on television. During one game, Magic Johnson had contributed to a series of baskets, causing Chick Hearn to say (as I most likely faultily remember) **“Another basket by Magic. He’s contributed to [x-amount] points over their last [y-points scored].”** And it was his points scored and assisted points. Ever since then I, in a nuanced fashion and to confused response, would tell people in games how many points they contributed.

By counting the contribution of a player through PRF, we start to understand how many points a player is really scoring. Consider it a simpler, poor man's version of **offensive rating**. And in keeping with traditional statistics, such as points, we can start to look at the same question we introduced at the beginning of this article: **who are the greatest single game scorers?**

Recall that since the introduction of the three point line nearly 40 years ago, not all assists are created equal. To this end, we cannot simply count how many assists a player has and multiply it by a constant. **Pre-1980 this is easy!** Post-1979 it is much more difficult.

Using play-by-play, this is rather simple. We can look at every basket made and check for the assist tag. Using Python, we can simply plow through each action in the play-by-play and hold results in a dictionary.

```python
# playersTemp holds per-player point vectors:
# [2PM pts, 3PM pts, FTM pts, assisted-2 pts, assisted-3 pts]
for index, row in df.iterrows():
    if row['event_type'] == 'shot':
        player = row['player'] + ',' + row['team']
        if player not in playersTemp:
            playersTemp[player] = [0., 0., 0., 0., 0.]
            # Sanity check: a newly added player should come into the game empty.
            if np.sum(playersTemp[player]) > 0.:
                print('MASSIVE ERROR: PLAYERS TEMP FILE CAME INTO THE GAME NONEMPTY!!!',
                      player, playersTemp[player])
        if row['points'] > 2.:  # Three Point FG Made
            assistMan = str(row['assist']) + ',' + row['team']
            if assistMan not in playersTemp:
                playersTemp[assistMan] = [0., 0., 0., 0., 0.]
            if assistMan in playersTemp:
                # Credit the assister with three assisted points and the shooter with the make.
                playersTemp[assistMan] = [x + y for x, y in zip(playersTemp[assistMan], [0., 0., 0., 0., 3.])]
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 3., 0., 0., 0.])]
                if assistMan not in playerPTS:
                    playerPTS[assistMan] = row['points']
                else:
                    playerPTS[assistMan] += row['points']
                if player not in playerPTS:
                    playerPTS[player] = row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
                else:
                    playerPTS[player] += row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
            else:  # Unassisted three
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 3., 0., 0., 0.])]
                if player not in playerPTS:
                    playerPTS[player] = row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
                else:
                    playerPTS[player] += row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
        elif row['points'] > 1.:  # Two Point FG Made
            assistMan = str(row['assist']) + ',' + row['team']
            if assistMan not in playersTemp:
                playersTemp[assistMan] = [0., 0., 0., 0., 0.]
            if assistMan in playersTemp:
                playersTemp[assistMan] = [x + y for x, y in zip(playersTemp[assistMan], [0., 0., 0., 2., 0.])]
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [2., 0., 0., 0., 0.])]
                if assistMan not in playerPTS:
                    playerPTS[assistMan] = row['points']
                else:
                    playerPTS[assistMan] += row['points']
                if player not in playerPTS:
                    playerPTS[player] = row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
                else:
                    playerPTS[player] += row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
            else:  # Unassisted two
                playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [2., 0., 0., 0., 0.])]
                if player not in playerPTS:
                    playerPTS[player] = row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
                else:
                    playerPTS[player] += row['points']
                    teamPTS = buildTeam(row['team'], row['points'], teamPTS)
        elif float(row['points']) > 0.:  # Free Throw Made
            print('FREE THROW MADE')
            playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 0., 1., 0., 0.])]
            if player not in playerPTS:
                playerPTS[player] = row['points']
                teamPTS = buildTeam(row['team'], row['points'], teamPTS)
            else:
                playerPTS[player] += row['points']
                teamPTS = buildTeam(row['team'], row['points'], teamPTS)
    if row['event_type'] == 'free throw':
        player = str(row['player']) + ',' + str(row['team'])
        if player not in playersTemp:
            playersTemp[player] = [0., 0., 0., 0., 0.]
        if float(row['points']) > 0.:  # Free Throw Made
            playersTemp[player] = [x + y for x, y in zip(playersTemp[player], [0., 0., 1., 0., 0.])]
            if player not in playerPTS:
                playerPTS[player] = row['points']
                teamPTS = buildTeam(row['team'], row['points'], teamPTS)
            else:
                playerPTS[player] += row['points']
                teamPTS = buildTeam(row['team'], row['points'], teamPTS)
```

And to this end, we can easily identify how much PRF Kobe Bryant had in his 81 point game and Devin Booker had in his 70 point game.

**Kobe Bryant: 86 PRF**

- 81 points
  - 42 points on 2PM
  - 21 points on 3PM
  - 18 points on FTM
- 5 assisted points
  - 2 points on A-2PM
  - 3 points on A-3PM

**Devin Booker: 83 PRF**

- 70 points
  - 34 points on 2PM
  - 12 points on 3PM
  - 24 points on FTM
- 13 assisted points
  - 10 points on A-2PM
  - 3 points on A-3PM

And we see that both players contributed to slightly over 80 points. We also see that Kobe Bryant contributed to over **seventy percent** of his team's scoring in the Toronto game. To this end, we can show Kobe's scoring chart:

In the play-by-play era, both Kobe and Devin are the points “darlings” of the league. But who has the **highest PRF? **Not **Russell Westbrook** or **LeBron James**. Instead it’s…

Since 2004, James Harden has posted some of the highest PRF games. While Kobe Bryant has constructed an 86 point effort, James Harden is the **only player in the play-by-play era **to wrangle **multiple 90-point games**.

On December 31, 2016, James Harden posted one of the most singularly epic games in the history of the league with a **53 point, 42 assisted points** game in a 129-122 win over the New York Knicks. This resulted in Harden having a hand in **over 73% of the Rockets' points**.

Just under a year later, Harden posted the second-best PRF total in the play-by-play era with a 91 point effort over the Utah Jazz on November 5, 2017. During this game, Harden dropped **56 points** while contributing to **45 points** via the assist. With a 137-110 victory over the Jazz, Harden contributed to **two-thirds of the team's points**.

However, before play-by-play we have the added challenge of attempting to figure out how many threes were assisted in the seasons between 1980 and the play-by-play years. To this end, we abandon **Python** in favor of **R** and a package developed by **Alex Bresler** called **nbastatR**. Using nbastatR, we are able to leverage the NBA API to pull **game logs** from every season dating back to 1947.

```r
for (season in seq(1947, 1997)) {
  if (season < 1980) {
    multiplier <- 2.0
  } else {
    multiplier <- 3.0
    # We set this multiplier at 3 because it's the maximum possible.
    # There's no play-by-play being leveraged here.
  }
  print(season)
  gamesPlayedByPlayer <- game_logs(seasons = season, league = "NBA",
                                   result_types = "player",
                                   season_types = "Regular Season")
  print('made it here!')
  if ("ast" %in% colnames(gamesPlayedByPlayer)) {
    cat("YES! ASSISTS ARE HERE!\n")
  } else {
    cat("NAH.... ASSISTS ARE MISSING!\n")
  }
  prf <- multiplier * gamesPlayedByPlayer$ast + gamesPlayedByPlayer$pts
  highPRF <- prf > 79.   # Flag games at or above the 80 PRF threshold
  highPRFindices <- which(highPRF)
  print(highPRFindices)
}
```

The above code identifies true PRF for all players pre-1980 and provides an **upper bound** for all players between the 1980 NBA season and the play-by-play era. One challenge we run into is that the NBA **does not post assist totals for several seasons…**

Here we see that the NBA stats don't have assist totals for several games. In fact, there are 14 seasons missing assists entirely, including **Bob Cousy's then-record for most assists in a game** (28). Similarly, for seasons that do report assists, some games have no assists included, such as **Wilt Chamberlain's** December 8th, 1961 game. That's alright; we can scan through **Basketball Reference** using the list of missing seasons. For these seasons, it's straightforward: walk through all game logs and compute **PTS + 2*AST**. After all, there was no three point line back then.

Finally, we have to identify non-play-by-play assisted point totals. First we **assume all assists are three point assists**. By doing this, we provide an upper bound for all PRF totals. If no player game hits a threshold we like, the player game is dropped. From there, we can whittle down by looking at the **box score** and computing the values **diff3AST** and **diff2AST**, which are built off the **difference** between the **team's made threes** and the **player's own made threes**. This becomes encapsulated in the code block:

```r
if (season > 1979) {
  print(index)
  print(gamesPlayedByPlayer[index,]$namePlayer)
  print(gamesPlayedByPlayer[index,]$idGame)
  boxScores <- box_scores(gamesPlayedByPlayer[index,]$idGame,
                          box_score_types = c("Traditional"))
  for (teamNum in seq(1, 2)) {
    if (boxScores$dataBoxScore[[2]][teamNum,]$slugTeam == gamesPlayedByPlayer[index,]$slugTeam) {
      # Threes possibly assisted by this player: team 3PM minus the player's own 3PM
      tot3PossAST <- boxScores$dataBoxScore[[2]][teamNum,]$fg3m - gamesPlayedByPlayer[index,]$fg3m
      diff2AST <- gamesPlayedByPlayer[index,]$ast - tot3PossAST
      print(diff2AST)
      if (diff2AST < 0.) {
        diff2AST <- 0.
        diff3AST <- gamesPlayedByPlayer[index,]$ast
      } else {
        diff3AST <- gamesPlayedByPlayer[index,]$ast - diff2AST
      }
      print("Diff 3AST")
      print(diff3AST)
      print("Diff 2AST")
      print(diff2AST)
      print(gamesPlayedByPlayer[index,]$ast)
      astPTS <- 3. * diff3AST + 2. * diff2AST
      prf <- astPTS + gamesPlayedByPlayer[index,]$pts
      print(prf)
    }
  }
}
```

The idea is simple. Suppose a player has **10 assists** and we are interested in an 80 point threshold. Similarly, they made **2 3PM** while their team made **10 3PM**. Then the player **cannot have assisted more than 8 threes!** Suppose the player scored 55 points. Then their maximum possible PRF is 55 + 28 for 83 points. If this happens, then we must scan the game footage (if it exists) to see where all the threes happened. If **four of those threes are not assisted by our player**, they fall off the list.

Similarly, if that player only had 51 points, their maximum PRF is 79 and they cannot be a member of the **80-Point Club**.
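The bound above can be sketched as a small helper function (a hypothetical illustration, not code from the original pipeline):

```python
def max_prf(points, assists, player_3pm, team_3pm):
    """Upper bound on points responsible for (PRF) without play-by-play:
    assume as many of the player's assists as possible were on threes."""
    # A player cannot assist his own threes, so at most (team 3PM - player 3PM)
    # of his assists can be three-point assists.
    possible_3p_ast = min(assists, team_3pm - player_3pm)
    two_point_ast = assists - possible_3p_ast
    return points + 3 * possible_3p_ast + 2 * two_point_ast

# The worked example: 55 points, 10 assists, 2 of the team's 10 threes were his own.
print(max_prf(55, 10, 2, 10))  # 55 + 3*8 + 2*2 = 83
```

With 51 points instead of 55, the same call returns 79, matching the reasoning above.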

In this case, as seen in the R code above, we also leverage nbastatR's **box_scores** function.

Since 80 points is rare air when it comes to personally scoring (only Wilt and Kobe have done it), we consider 80 points as the PRF threshold. In this case, we obtain **five** questionable players when it comes to the **three-point differential** in calculating PRF. These five players are:

- Scott Skiles
- December 30, 1990 vs. Denver
- 22 points scored, 30 assists, 2 possible 3P-AST
- Guaranteed 82 PRF, maximum possible 84 PRF

- Michael Jordan
- March 28, 1990 vs. Cleveland
- 69 points scored, 6 assists, 1 possible 3P-AST
- Guaranteed 81 PRF, maximum possible 82 PRF

- David Robinson
- April 24, 1994 vs. LA Clippers
- 71 points scored, 5 assists, 1 possible 3P-AST
- Guaranteed 81 PRF, maximum possible 82 PRF

- Tim Hardaway
- April 25, 1993 vs. Seattle
- 41 points scored, 18 assists, 3 possible 3P-AST
- Guaranteed 77 PRF, maximum possible 80 PRF

- Jason Kidd
- February 2nd, 1996 vs. Utah
- 20 points scored, 25 assists, 11 possible 3P-AST
- Guaranteed 70 PRF, maximum possible 81 PRF

And that’s it. Thank you, nbastatR! We find three players immediately qualify for the 80 Point Club, but we’d like to settle their actual scores. And we have two questionable players who would barely crack the bottom of the club. To find the truth, we seek **game footage**.

Skiles’ 30 assist game is not only a record setter for the then-lowly Orlando Magic, it also gave Skiles entry into the 80 Point Club. Thanks to YouTube:

We can view every single one of Skiles’ assists. In the game, there are only two possible three-point assists and Skiles indeed picks one of them up, to **Dennis Scott**, securing him **83 PRF**.

Jordan’s highest scoring game came with one possible assist for three. That would be to **Charles Davis**. Thanks again to YouTube, we are able to see Charles’ and Chicago’s only non-Michael Jordan three point field goal:

And we see it was assisted by **Stacey King**. Therefore Jordan stays put at **81 PRF**.

In a double-overtime affair against the Utah Jazz, Jason Kidd set a personal best early in his career with 25 assists. With Dallas hitting a then-remarkable 14 three point field goals on a television-announcer-flabbergasting 32 attempts, Kidd was on the hook for 11 possible three-point assists, as he made 3 of the 14 himself. Thanks once again to YouTube:

We see that Jason Kidd assisted on all but three of the remaining three point makes: **David Wood** hit two three point attempts, both with Kidd on the court, but both assisted by **Jim Jackson**. Also, **George McCloud** razzle-dazzled for a step-back three in overtime to knock Kidd well below the 80 Point Club limit.

Unfortunately, we could not find footage for either the David Robinson or Tim Hardaway games. In Robinson’s case, there’s only one missing three point field goal, taken by 80 Point Club-flirting **Sleepy Floyd** (who had several PRF > 70 games in the 1980’s).

Similarly, Tim Hardaway’s game on April 25, 1993 against the Seattle Supersonics appears to be a mystery. Despite this, there are a total of **three 3PM** that could have been assisted by Hardaway, all belonging to **Latrell Sprewell**. To this end, Hardaway sits at 77 PRF and will not be inducted into the 80 Point Club until proper verification is provided.

Finally, onto the club:

Since 1947, there have been only **31 confirmed cases** of players hitting 80+ PRF in a game. Across those 31 cases, there are only **20 players** who have reached the 80 PRF mark. As indicated earlier, the king of PRF is **James Harden**, who has hit this threshold **6 times!**

The next most frequent players on the list are **Oscar Robertson** and **Russell Westbrook**, with **three entries** each. A curious note about Westbrook is that he is **the only player to hit 80 PRF in a playoff game.** Only one. Ever.

If you’re keeping track, this means there are only **two players** remaining with **two 80+ PRF** games. Those two players are **Wilt Chamberlain** and **LeBron James**. That’s it.

More strikingly, Magic Johnson never reached 80 PRF despite repeatedly hitting 75 throughout the 1980’s. Larry Bird? Not close. Michael Jordan? Once. And he needed 69 points. Kobe Bryant? Same, but with 81 points.

To date, here’s the list of verified 80 Point Club Games:

Looking at the list, we make a couple of observations. First, we see the “high scoring”-“low scoring”-“high scoring” phenomenon of the league between the early years, through the muddling 80’s and 90’s, to today’s three point revolution. Second, we see the exponential increase due to the three point revolution in the league through its stars. Graphically, these observations are viewed this way:

Using the graph, we see **Oscar, Wilt, and Cousy** dominating the 60’s. Throughout the 70’s and early 90’s we have special cases by singular players: Pistol Pete, Rick Barry, and Nate Archibald buoying the 70’s with their singular performances; Jordan’s 69 point game and Skiles’ 30 assist game carrying the 90’s.

We also see the death of the 80 Point Club in the late 90’s and early 2000’s as games ground down to low scoring affairs. Then, as the three point revolution has taken off over the last 4-5 years, we see the number of 80 Point Games explode, thanks primarily to Harden and Westbrook.

As players continue to adapt to the three point line, we will start to see this list expand and eventually have to ditch the 80 Point Club in favor of a more exclusive club. Possibly a 100 Point Club? If so… how soon will it be until we have three members?

To start, let’s first identify how points are scored. In a singular game only **free throws**, **two-point field goals**, and **three-point field goals** can score points. Therefore the basic points model is given by
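In symbols, with FTM, 2PM, and 3PM denoting made free throws, two-point field goals, and three-point field goals:

```latex
\text{PTS} = \text{FTM} + 2\cdot\text{2PM} + 3\cdot\text{3PM}
```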

On offense, our goal is to increase **all three** categories. Conversely, on defense, our goal is to reduce **all three categories**. Using this formula for points scored leads to uninspiring models. Lots of information is left on the cutting room floor. However, we identify the **Madden Equation**
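One way to express the Madden Equation, using the points model above for each side of the ball:

```latex
\text{Win} \iff \text{PTS}_{\text{off}} > \text{PTS}_{\text{def}}
```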

For the uninitiated, this equation is read as **a win equals a situation where an offense scores more than a defense**. In other words, “**The team who scores more than their opponent usually wins**.”

Ultimately, we become interested in increasing our number of FTM, 2PM, 3PM while decreasing our opponents’ FTM, 2PM, and 3PM. Or do we?

Nearly a decade ago, Kirk Goldsberry (then of Grantland fame) wrote about the **frequency** and **efficiency** of teams and players. It wasn’t a new idea, but it was one of the first times it was explicitly put into practice. While phrases such as “limiting opponents’ attempts” have been around for decades, the points model could now be explicitly written as
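Multiplying each term by attempts over attempts yields the frequency-efficiency form:

```latex
\text{PTS} = \text{FTA}\cdot\frac{\text{FTM}}{\text{FTA}}
 + 2\cdot\text{2PA}\cdot\frac{\text{2PM}}{\text{2PA}}
 + 3\cdot\text{3PA}\cdot\frac{\text{3PM}}{\text{3PA}}
```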

The trick here is that we multiplied by one and applied the commutative law of multiplication. Voilà! **Frequency** and **Efficiency**.

Now we can interpret the points model as gaining or limiting attempts (**frequency**) while increasing or decreasing field goal or free throw percentages (**efficiency**).

While this is a much more helpful “model” when it comes to understanding points, it’s still light on intelligence. Thankfully, there are many ways to branch from here. Let’s start with a little bit of **Four Factors.**

In 2002, Dean Oliver introduced the idea of the **Four Factors** to the world. Within the four factors, **Effective Field Goal Percentage** and **Free Throw Rate** were identified as key components for understanding team success. We can obtain the relationship of these to **points scored** by again introducing the multiply by one trick:
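One consistent way to write this, using Oliver’s effective field goal percentage and free throw rate (FTM/FGA):

```latex
\text{PTS} = \text{FGA}\left(2\cdot\text{eFG\%} + \frac{\text{FTM}}{\text{FGA}}\right),
\qquad \text{eFG\%} = \frac{\text{FGM} + 0.5\cdot\text{3PM}}{\text{FGA}}
```

Expanding the right-hand side recovers FTM + 2·2PM + 3·3PM, so this is the same points model, repackaged around FGA.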

In this case, we obtain points as a **multiplier** on FGA. There is no marked difference between this model and the Goldsberry model above, other than the follow-on analysis, which we will eventually get to below. Before that, a few more representations of the model.

The zonal model builds off of the Goldsberry model above by introducing **spatial components** into the attempts. Using this model, we determine ahead of time which locations, called **bins**, we would like to aggregate attempts into. The traditional model used **seven regions:** 2 corner three slots, rim, paint, non-paint 2-pt FGA, and above-the-break left / right 3PA. As of this morning, NBA stats uses a total of **fifteen** spatial locations:

Writing out the model for this becomes tedious. Instead, we use **summation notation** to help condense the points scored model. We label **S_2** as the zones within the 2-point range and **S_3** as the zones within the 3-point range. By using the notation **“i in S_2”** we are simply taking **bin “i”** from **S_2**. Understanding this, we have the zonal model of
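In the notation just introduced, with per-bin attempts and percentages:

```latex
\text{PTS} = \text{FTM}
 + \sum_{i \in S_2} 2\cdot\text{FGA}_i\cdot\text{FG\%}_i
 + \sum_{j \in S_3} 3\cdot\text{FGA}_j\cdot\text{FG\%}_j
```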

From this model, we can start to ask **where** players are taking (and making) their field goal attempts. Using this block model, we can discretize field goal locations. It’s simple to understand and helps to quickly tease out locations.

Building off of the spatial model, if instead of building a discrete binning structure we apply **the counting measure**, we obtain the traditional shot chart. The chart is a “well, duh,” but the mathematics behind it is much more complicated and allows us to do some fairly wild analysis. Instead of making this a PhD level analysis, we will use the term counting measure to **illustrate** field goal attempts. In reality, we are using the **Lebesgue measure**, and this allows us to immediately develop **random process models**, which are leveraged near-explicitly in Expected Point Values (EPV) and what I’ve called my **“Left-Of-Boom”** Model from 2015.

The idea here is that every field goal attempt is part of a **spatial point process** that occurs with some **random structure**. In doing this, we divide the court into spatial regions again, this time using **D_2 **for the region of two point attempts and **D_3** as the region of three point attempts. The model is then written as
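With N denoting the spatial point process of made field goals at locations s, one way to write this is:

```latex
\text{PTS} = \text{FTM} + 2\int_{D_2} \mathrm{d}N(s) + 3\int_{D_3} \mathrm{d}N(s)
```

Each integral simply counts the makes falling inside its region, weighted by the value of the shot there.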

Using the two integrals allows us to plot attempts such as these for **Trae Young, Giannis Antetokounmpo,** and **Ben Simmons**.

Immediately, we see that Trae Young is primarily an above-the-break shooter with a tendency to kick to the corners. Similarly, Giannis Antetokounmpo is a left-wing three point shooter, but will primarily attack the rim. Finally, Ben Simmons is not a three point shooter. He gets a **ton** of assists out there, but he’s primarily a paint player.

From here, we would then model points as a process model. The benefit of this is that we can get surgical with our analysis. The drawback is that we need **lots** of data, which we rarely have.

While we’ve primarily focused on **counting** the number of attempts, we also would like to look at **Per Possession Models**. The easiest way to limit points for a team on defense is to reduce the number of possessions. However, looking at the **Madden Model** above, limiting possessions, while indeed reducing points, does not necessarily translate to wins; as the number of offensive possessions decrease as well. Instead we may be more interested in **points per possession**, or (effectively) **ratings**; that is, points per 100 possessions. Since games tend to hover near the 100 possession mark, we will focus on a **ratings model** for scoring.

To build a rating, we simply count the number of possessions and divide that into the total number of points scored. To make the number more palatable, we then multiply this total by 100, obtaining what we call a **rating**. Applying this to the points model, we obtain
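With POSS as the number of possessions:

```latex
\text{Rating} = 100 \cdot \frac{\text{PTS}}{\text{POSS}}
 = \frac{100}{\text{POSS}}\left(\text{FTM} + 2\cdot\text{2PM} + 3\cdot\text{3PM}\right)
```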

This can be used in the Goldsberry Frequency-Efficiency model as well:
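Distributing the possession count through each term gives per-possession frequencies alongside the efficiencies:

```latex
\text{Rating} = 100\left(
 \frac{\text{FTA}}{\text{POSS}}\cdot\frac{\text{FTM}}{\text{FTA}}
 + 2\cdot\frac{\text{2PA}}{\text{POSS}}\cdot\frac{\text{2PM}}{\text{2PA}}
 + 3\cdot\frac{\text{3PA}}{\text{POSS}}\cdot\frac{\text{3PM}}{\text{3PA}}\right)
```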

And immediately, we focus on the likelihood of getting at least one 2PA or 3PA **within a single possession**. More importantly, we begin to introduce, through possession counting, non-scoring factors such as **turnovers** and **missed FGA/FTA with defensive rebounds**. These are situations with potentially **zero points** scored on a possession. In fact, this leads us to the last model we will touch on: the **zero point model**.

The zero point model includes **zero point possessions**. It’s implicitly defined in all the above models, but we intentionally left it out to build up the thought process in modeling points scored. In this model, we simply introduce the **zero point**:
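In the same notation as before, with X collecting the zero-point events:

```latex
\text{PTS} = \text{FTM} + 2\cdot\text{2PM} + 3\cdot\text{3PM} + 0\cdot X
```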

Notice we put an **X** in the model. This is why we left out the zero-points in all the above models. The challenge that arises is placing the appropriate features into **X**. Since **X** is flattened by zero, interpretation of the feature is lost, and hence a **ratings model** becomes much more favorable. Initially, we can think of the obvious: **turnovers**. In a similar vein, **defensive rebounds** would lead to zero points for the action.

It is important to note that points can still be scored within a possession that contains a turnover or defensive rebound. Despite this, all of the models above partition these instances into points scored prior to a defensive rebound or turnover.

While these two values are obvious, we are missing **much much more** when it comes to analysis of points. And it’s here where we actually **put action into the term MODEL**.

The term model simply means a **description of a system**. In the above, we described points as a function of field goals, turnovers, and defensive rebounds. In reality, the **system** of the game of basketball is much more, and its impact on points scored is important. For instance, what’s the value of Andre Drummond and Steven Adams to their respective teams? **They are big on offensive rebounds**.

Similarly, what’s the impact of a drive from De’Aaron Fox? From Trae Young? From Lonzo Ball? If we run the Hammer to Patty Mills, how likely is he to score? More importantly, how likely will that shot attempt be available? It’s these questions that begin to drive the components of the models above.

Treating outcomes as observations from a probabilistic model, we can begin to statistically model points scored. Let’s go back to that **counting model**. Let’s treat the Milwaukee Bucks as an opponent. The Bucks still lead the league as of this morning in **percentage of FGA at the rim** with **35.2% of attempts within three feet**. They are second in the league in **percentage of FGA beyond the arc **with **41.8%**. And the Bucks are very effective at the rim (70% – 3rd in the league) despite being pedestrian from beyond the arc (35% – 15th in the league).

Now, we are going to start at a basic level. Anything more would be too complicated for this setting. Let’s just assume that all attempts are generated by some distribution and nothing more. No assists, drives, screens, etc. At the novice level, we’d say FTM, FGM, and TOV all follow a **Poisson **distribution, but that’s if we’re completely disregarding the game of basketball. Instead, let’s **understand the system**.

The **counting vector** is a vector that contains the number of instances of a component of interest. For our most basic scoring model, these might be **FTM, 2PM, 3PM, **and **TOV. **Then we look at the distribution of each over the course of a game. For our opponent, the Milwaukee Bucks, some examples (FTM, 2PM, 3PM, TOV) are:

- 15, 28, 14, 21
- 13, 27, 17, 17
- 17, 28, 17, 14
- 28, 28, 13, 11
- 8, 30, 19, 17
- 21, 31, 10, 21
- 15, 26, 19, 17
- 22, 32, 9, 14
- 30, 24, 22, 12
- 7, 24, 16, 11

These are the numbers from the Bucks’ first 10 games. For all games this season, we end up with a correlation structure that’s not excessively significant, but also not surprising.
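The computation can be sketched in pure Python on just these ten games (the full-season correlations discussed below will differ, since this uses only a tenth of the data):

```python
from statistics import mean

# (FTM, 2PM, 3PM, TOV) counting vectors from the Bucks' first ten games
games = [
    (15, 28, 14, 21), (13, 27, 17, 17), (17, 28, 17, 14), (28, 28, 13, 11),
    (8, 30, 19, 17), (21, 31, 10, 21), (15, 26, 19, 17), (22, 32, 9, 14),
    (30, 24, 22, 12), (7, 24, 16, 11),
]

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

cols = list(zip(*games))  # transpose: one tuple per component
labels = ['FTM', '2PM', '3PM', 'TOV']
for i in range(4):
    for j in range(i + 1, 4):
        print(labels[i], labels[j], round(pearson(cols[i], cols[j]), 2))
```

In practice this would be a one-liner with `numpy.corrcoef` or `pandas.DataFrame.corr`; the sketch is only to make the mechanics explicit.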

Here we see that whenever they make fewer 3PM, they get more 2PM (correlation -.34), and that turnovers are positively correlated with FTM and 3PM (correlations .03 and .18, respectively). For the 3PM and 2PM relationship, we have a significant enough correlation (p-value < .0005, effect size > 3.0) to suggest that there is an inverse relationship between the two components. Comparing 3PM to turnovers, we see a similar relationship: p-value of .03 and a significant effect size of over 3.0. This indicates the Bucks turn the ball over more often when 3-point attempts are not falling.

Similarly, there is a weak relationship between 2PM and turnovers. Here we obtain a p-value of .03 but an effect size of 0.17, which indicates a weak but potentially existent **positive** relationship. This would indicate that the team is more likely to turn the ball over when attacking inside the 3-point line.

Finally, all other relationships are weak-to-nonexistent.

What this exercise emphasizes is that there is a **correlative relationship** between the different mechanical parts of the basic model. This is a big deal, as we **cannot assume independence**. Therefore the model, even in the basic case, is:
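One plausible rendering: the counting vector as a mean plus unbiased, correlated error:

```latex
\begin{pmatrix}\text{FTM}\\ \text{2PM}\\ \text{3PM}\\ \text{TOV}\end{pmatrix}
 = \boldsymbol{\mu} + \boldsymbol{\varepsilon},
\qquad \mathbb{E}[\boldsymbol{\varepsilon}] = \mathbf{0},
\qquad \operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Sigma}
```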

And here we don’t explicitly specify the error distribution. Instead, we identify that the error is unbiased with some covariance structure. Therefore, we may be interested in expanding the model to include **interactions**.

Or we try a different approach.

A popular method in basketball analytics is to develop a **conditional** or **hierarchical model**. These models assume that quantities like FGM or FGA are **sub-targets** that are responses of other basketball characteristics such as passes, drives, “openness” of the attempt, and defensive pressure. The most common example is the **Shot Quality** model. In this model, we typically (implicitly or explicitly) model FG% based on **minutes played, distance to closest defender, shot location, number of dribbles taken**, etc. In this case, we can write the distribution of points scored as
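One common way to sketch such a hierarchy (the covariates listed are illustrative):

```latex
\text{FGM} \mid \text{FGA}, \theta \sim \operatorname{Binomial}(\text{FGA}, \theta),
\qquad \theta = g(\text{defender distance}, \text{location}, \text{dribbles}, \ldots)
```

The points distribution then follows by pushing FGM through the points model, with the covariates doing the explanatory work.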

We can also begin to derive more complicated models, such as EPV, through the counting process model. In these cases, we can begin to surgically build a hierarchical model that takes the spatial, temporal, and mechanical components of the game and develops into a sophisticated model that ultimately goes back to quantifying the **Madden Model**.

The resulting coefficients of these hierarchical components help us then identify the **contribution** a player makes within the scoring model. Want to improve scoring when Lonzo Ball is on the court? We can now measure the impact of pick-and-roll offenses that lead to drives; with understanding of personnel on the court.

But be cautious when performing a hierarchical analysis. Small samples will begin to creep in and borrowing strength will become ultimately important. And it’s here where the edge gets to be gained.


This is Kevin Durant‘s percentage of field goal attempts, aggregated by specific distance, for the first two seasons of his career. This table gives some information, indeed. However, does it really paint the picture of where Durant takes his shots? More importantly, are we able to make proper decisions about the style of play for Kevin Durant?

The short answer is, well… not really.

Commonly, we find that much of the analysis about player tendency and capability stops here. We talk about at what distance a player takes their shots and then typically jump to **effective field goal percentage** and translate that to rudimentary calculations of **expected point value **per field goal attempt. Some analysts attempt to take this one step forward and produce a **shot quality metric** to identify the quality of shot, which actually doesn’t use the above information explicitly.

What happens if we produce another player with almost an identical table? Are these two players the same? Sure, we could build a **Chi-Square Test** to compare the players, but we may be rudely woken up to the fact that neither player is the same. Let’s take a look at these two players:

Can you guess the two players? They have very similar distributions and, while still being **significantly different** according to the Chi-Square test, that is mainly due to the **failure of the Normal assumption** for the small values in the table. **14 versus 45** causes **73% of the test statistic**. But who are these players?
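The comparison can be sketched as a two-sample chi-square test of homogeneity on binned attempt counts. The counts below are hypothetical, not the actual tables for these two players:

```python
def chi_square_stat(counts_a, counts_b):
    """Two-sample chi-square statistic comparing two binned count vectors.
    Expected counts come from the pooled distribution across both players."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        pooled = (a + b) / total          # pooled bin proportion under H0
        if pooled == 0:
            continue                      # empty bin for both contributes nothing
        for obs, n in ((a, n_a), (b, n_b)):
            expected = pooled * n
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical attempt counts across five distance bins
player_a = [120, 10, 8, 14, 250]
player_b = [115, 12, 9, 45, 240]
print(chi_square_stat(player_a, player_b))
```

Small expected counts (like the 14-versus-45 bin here) can dominate the statistic, which is exactly the Normal-approximation failure noted above; `scipy.stats.chi2_contingency` would run the same test with the p-value included.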

On the left we have **P.J. Tucker **of the Houston Rockets. On the right, we have **Brook Lopez** of the Milwaukee Bucks. They are both three-ball-dominant shooters with a tendency to attack the rim. As Milwaukee has modeled their offense much like the Houston Rockets, it’s no surprise these two shooters appear to have the same distribution of field goal attempts. Or do they?

If we take a quick glance at Brook Lopez’s shot distribution, we find that he primarily takes attempts between the **-45 degree** to **45 degree** range along the top of the key.

We see the ghost town of field goal attempts in the mid-range, as well as the string of short-range attempts that litter the key.

Comparing this to PJ Tucker, we obtain an entirely different story.

We see that almost all FGA occur in the corners. We also see the ghost town of mid-range attempts. The shots in the lane? More along the baseline than being a steady stream towards the free throw line.

It is clear that the distributions are no longer the same. But how do we measure their difference? One solution is to use **shooting zones.**

A shooting zone is a region of the court that encapsulates field goal attempts at different locations on the court. It’s a step in the right direction as we can now differentiate between a corner three and a top-of-the-key three. Similarly, we are able to differentiate between a left-corner three versus a right-corner three.

Take for instance, Brook Lopez’s shot chart from NBA Stats. It’s a little misleading only due to the fact that they combine both frequency and efficiency. The colors indicate efficiency while the fractions indicate frequency. Here we see the high volume along the top-of-the-key zones.

We see the same misleading representation with PJ Tucker and again focus on the fractions.

And we see a nearly “inverted” plot, as the majority of PJ Tucker’s three-point attempts are located in the corners.

While this “one step further” plot helps us, there’s still a ton of information left on the cutting room floor. For instance, Brook Lopez is a -45 to 45 degree shooter; the zonal plots do not capture that activity. Right elbow and left elbow are not differentiated, even though almost every player favors one over the other. A dunk is also valued as much as a hook shot according to the zone distributions.

There’s just a lot of information still being lost.

We turn to the next step. Basketball shot charts have been around for **decades**. Kernel density plotting of basketball shot charts has also been around for decades. In 2001, I had to write code to feed a list of **x,y-coordinates** into a **kernel density algorithm** using a seemingly newfangled programming language called MATLAB (it wasn’t new, and I wasn’t alone). And when the KDE revolution finally started to take hold in the media nearly a **decade later**, by then called **heat maps**, there were still significant flaws in some people’s designs. For instance, old plots would not include distance skewing such as a **log-transform**, a requirement for showing actual three-point effects in scoring. Yes, that is a post from **four years ago**, written as a knee-jerk response to poorly displayed ESPN shot charts at the time, and it shows the log-transform representation.

If we apply the density function formulation here, we can obtain KDE plots for both Lopez and Tucker.
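As a sketch of how such a KDE grid can be produced in Python, here is the out-of-the-box approach with `scipy.stats.gaussian_kde`. The shot coordinates below are synthetic stand-ins, not Lopez’s or Tucker’s actual attempts:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic (x, y) shot locations standing in for a player's FGA; a real
# chart would use play-by-play coordinates. Units are feet, basket near origin.
rng = np.random.default_rng(0)
corner_shots = rng.normal(loc=[22.0, 3.0], scale=1.5, size=(150, 2))
rim_shots = rng.normal(loc=[0.0, 2.0], scale=2.0, size=(100, 2))
shots = np.vstack([corner_shots, rim_shots]).T  # gaussian_kde wants shape (d, n)

kde = gaussian_kde(shots)  # bandwidth defaults to Scott's rule

# Evaluate the density on a grid covering the half court
xs, ys = np.mgrid[-25:25:100j, -5:45:100j]
grid = np.vstack([xs.ravel(), ys.ravel()])
density = kde(grid).reshape(xs.shape)
```

The resulting `density` array is what gets rendered as the heat map (for example with `matplotlib`’s `contourf` or `imshow`).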

Of course, we’d like to play with the bandwidth to make the charts “prettier”; this is simply an out-of-the-box method using Python. We use the **jet color map**, a MATLAB classic, to display the **heat** associated with field goal attempts.

We are immediately able to surgically identify the locations of every field goal attempt by both players. More importantly, we have a **nonparametric approximate distribution** for each shooter’s field goal attempts. And unlike the “second step further” plots that we skipped over with **scatter (hexagon) plotting**, we’re not solely dealing with empirical data points, which, by the way, **are noisy to begin with**.

And armed with this distributional knowledge, we can finally start to say something intelligent with shot chart data. Yes… there’s been negligible intelligence obtained thus far.

Our discussion started by asking about the similarities between **two players**. While this is helpful in understanding where players are positioned, this is rarely the question that we would like to answer. In order to understand the question we really want to answer (and we haven’t asked it just yet), we will tackle this thought exercise first in an effort to understand **Kullback-Leibler Divergence**.

Kullback-Leibler Divergence is a method for measuring the similarity between two distributions. Developed by **Solomon Kullback** and **Richard Leibler** and released **publicly** in 1951, KL-Divergence aims to identify the **divergence** of a probability distribution given a **baseline** distribution. That is, for a target distribution, **P**, we compare a competing distribution, **Q**, by computing the **expected value of the log-odds of the two distributions:**
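Written out in standard notation (reconstructed here, as the original displayed the formula as an image), with densities p and q for P and Q:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \int_{-\infty}^{\infty} p(t)\,\log\frac{p(t)}{q(t)}\,dt
```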

Here, we use the one-dimensional version; the two-dimensional version is similar: just use a double integral with **t := (x,y)** and **dt := dx dy**. It’s obvious that if the two distributions are identical, then the integral is **zero**.

Also, with a little bit of work we can show that the KL-Divergence is **non-negative**. This means the smallest possible value is zero (the distributions are equal) and the maximum value is **infinity**. We obtain infinity when **P** is defined in a region where **Q** can never exist. Therefore, it is common to assume both distributions exist on the same support.

The KL-Divergence is a technique that spawned from research performed at the National Security Agency. Richard Leibler, who would eventually become the Director of Mathematical Research, and Solomon Kullback, who then focused on COMSEC operations, developed the methodology while analyzing bit strings in relation to known coding algorithms. The aim was to identify **shared information** in an effort to exploit **weaknesses** shared between known crypto-algorithms and crypto-algorithms in the wild. Since its public release, KL-Divergence has been used extensively across many fields, and it is still considered one of the most important **entropy measuring tools** in cryptography and information theory.

If we apply KL-Divergence to shot charts, we can immediately begin to compare the **spatial representation** of the two shooters’ tendencies. To do this, we must build a **quadrature** to estimate the integral from the KDE. This is relatively straightforward using the **scipy.integrate.dblquad** function in Python, or, crudely, the **midpoint rule**. Either way, the answers are similar. Just be sure to **store the shot charts as numpy arrays**.
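A crude midpoint-rule version might look like the following sketch, assuming both shot charts have already been evaluated on the same grid. The function name and the `eps` guard are my own choices, not anything standard:

```python
import numpy as np

def kl_divergence_grid(P, Q, cell_area, eps=1e-12):
    """Midpoint-rule estimate of D(P || Q) from two density grids.

    P, Q : 2-D arrays of KDE values evaluated on the same court grid.
    cell_area : dx * dy for one grid cell, so sums approximate integrals.
    eps : guard against log(0) where a shooter never attempts shots.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Renormalize so each grid integrates to one over the court
    P = P / (P.sum() * cell_area)
    Q = Q / (Q.sum() * cell_area)
    integrand = P * np.log((P + eps) / (Q + eps))
    return integrand.sum() * cell_area

# Sanity check: identical distributions diverge by zero
uniform = np.ones((50, 50))
print(kl_divergence_grid(uniform, uniform, cell_area=1.0))  # → 0.0
```

The `eps` guard matters in practice: a KDE grid cell where one shooter has literally zero density would otherwise push the divergence to infinity, which is the support issue discussed above.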

For the case of Brook Lopez and PJ Tucker, we obtain a **KL-Divergence of 0.0929**. This is a relatively small KL-Divergence, but it could be smaller! Let’s compare this to **Rudy Gobert** of Utah. As Gobert rarely shoots three point attempts, we expect a much larger **KL-Divergence**. In fact, the divergence of Gobert from Tucker is **47.5551**!

Immediately, we gain an idea of the differentiation between the players’ shot location tendencies. In order to identify **where players differ**, all we need to do is look at the integration process, **exactly like we did with the Chi-Square Test above!** And it’s here that we see the specific locations mentioned above that differ between Lopez and Tucker.

Now that we know how to compute KL-Divergence, we need to understand what it is telling us. First, **KL-Divergence is not a metric!** A metric, by definition, is a measurement function that satisfies three conditions: symmetry, non-negativity with equality at zero, and the triangle inequality. **KL-Divergence only satisfies the second condition**. Due to this, we call it a divergence instead of a measurement.

Since the divergence is not symmetric, we **must specify the baseline distribution**. This distribution is **Q**. This seems counter-intuitive since the expectation is taken with respect to **P**. But there’s a simple explanation for this.

We think of **Q** as prior knowledge. Either a known cryptosystem in 1945, or a **current player of interest**. We then introduce a new observation: a new bit sequence or a new player. Now, given knowledge of the current player, how “**alike**” is the new player to the old? In order to understand the new player, we consider the new player **as new information introduced to the old player**. Therefore, the new player is a **posterior distribution**. If the **posterior does not change**, then the new player is exactly the same as the current (prior) player.

Therefore, the **0.0929** indicates how much **PJ Tucker diverges from Brook Lopez in shooting frequency**.

Now… that’s not so much the intelligence part. Let’s get to that.

We can leverage the KL-Divergence to understand changes to **offensive schemes** and reactions to **defensive maneuverings**. The **most explosive revelation leveraging KL-Divergence** is **measuring field goal attempts with respect to BLUE action**; that is, when perimeter defenders in PnR situations move to a seemingly unfavorable defensive position in an effort to divert the PnR into a favorable defensive match-up. This past year alone, BLUE situations on the left wing led to a KL-Divergence of **10.373** when compared to non-BLUE situations. That’s almost entirely generated by shots shifting to left-wing locations in BLUE situations versus right-side and at-rim attempts from the middle of the lane in non-BLUE situations.

We can also begin to analyze changes in shot frequency, a bane for understanding perimeter defenders. Using the KL-Divergence, we can start measuring the **changes in frequencies** due to **close-outs** and **quality perimeter defenders**, to help understand when teams are **not taking the three they usually take**. Granted, we cannot simply use **defensive three point shooting as a metric**, and we certainly cannot use simple shooting frequencies (there are too few attempts in a game). But we can build a distribution and measure the KL-Divergence, which **borrows strength** from nearby field goal locations and allows us to start asking which features lead to changes in KL-Divergence.

In doing this, for this given year, you’ll immediately start seeing the defensive differences in two former Spurs: **Danny Green** and **Jonathon Simmons**, one being significantly “better” at perimeter defense than the other.

Similarly, if an offense uses a PnR action that leads to a rim-running event, **where are the field goal attempts likely to be generated?** If DeAndre Jordan is swapped with Enes Kanter, **we will see a ridiculously different result**. This indicates that the same action with different personnel yields different results. We can peel back the integral and see exactly where the spatial locations vary **and understand how those locations impact the divergence**.

Combining this knowledge with those players’ efficiencies, we start gaining insight into where we want to push the ball on defense, and, more importantly, how we might want to rotate on defense.

Remember, though, that a change in KL-Divergence does not mean good or bad. It simply means **change**. It’s not a **target variable**, but rather a methodology to quickly run through several iterations of teams and players, giving insight into which players are similar in which situations, which teams are similar in others, and even (if applied to the same team) how a team makes adjustments over the course of a game.

To gain insight into **good or bad**, we must then build the analytical model that identifies good and bad, be it an expected point value or some other win-shares type of action.


In this game, with the use of a 21-foot three point line, Columbia defeated Fordham 73-58. Columbia managed to knock down 11 three point attempts to Fordham’s 9 makes. The 73 points marked a Columbia school record at the time. It was proposed that the **actual score** would have been 59-44 in favor of **Columbia***. Despite the increased scoring, many fans were left confused and upset over the new rule. It even led the New York Times to write that the three point line “experiment” had been “far from a howling success” and that the three point line would “die a natural death.”

If you notice that the **actual** score was supposedly 59-44 despite only 11 and 9 threes for each team, respectively, then you are quick to realize that something else was being experimented with in this game. If we were to subtract out the extra 11 and 9 points, respectively, the score would have been **62-49**, leaving another 3 points on the table for Columbia and 5 points for Fordham.

This is due to an extra experiment, **one that has never caught on since**, where players had the option of shooting free throws from the foul line for 1 point or from the top of the key for two points. On free throw trips with two or more free throw attempts, the player could only score a maximum of three points, as only one attempt could be selected as a two-point try.

The three point play was seen as diminishing team play, as players would race to the three point line to shoot instead of passing the ball. In fact, according to the New York Times, several players were called for **traveling** as they forgot to dribble while sprinting to the three point line.

Similarly, there were complaints that the three point line **ruined zone defenses** and required less strategy for offensive teams. This complaint was exacerbated **by a third rule change during that Columbia-Fordham game**: the lane was widened to 12 feet from its original six feet to test spacing effects.

What is interesting about the Columbia-Fordham game is that among the 1000+ spectators were roughly **250 collegiate coaches and league representatives**. Shortly after the game, they submitted votes on whether the league should invest in possibly establishing the new rule changes. The votes were as follows:

- 148 in favor of a three point line, 105 opposed
- 152 in favor of widening the foul lane, 65 opposed
- 133 in favor of the 2-point foul shot, 85 opposed

It would be a while before the NCAA accepted any changes.

The collegiate ranks attempted the three point line a couple more times over the following two decades. On February 1st, 1959**, thirteen years after the previous experiment, Siena and St. Francis used a **23′ three point line**, where it was reported that each team scored once from that range and “then forgot all about it.”

** In an attempt to track down a source for this game, the January 4th game is not listed on any major media news outlet. The game between Siena and St. Francis on February 2nd, 1958 (the only other 1958 meeting between the two teams) is listed in the New York Times, and no three point field goals are mentioned, while the box score accounts for all points as free throws and two-point field goals. Upon further research, the game was erroneously listed as 1958 in the Dartmouth magazine; it actually happened on **February 1, 1959**.

In this game, St. Francis defeated Siena 67-50, attempting 6 three-point tries while Siena attempted 9 of their own. Each team did indeed connect on one apiece, as indicated in the box score and summary:

The three point experiment would not be revisited again until 1961, in a game between Dartmouth and Boston University with a **wildly different three point plan:** every FGA counted as three points. Dartmouth’s head coach at the time, Alvin Julian, was growing infuriated with fouling and increased foul shooting. His response was to go to the Ivy League board and get permission to experiment for a game with three point field goals instead of two, in an effort to incentivize scoring over foul shooting. Boston University, also mired in a dismal season, agreed to the experiment. The result did not do much to change the game, but it was **the only time in the NCAA and NBA that the three point line was at zero feet for an official game**. Take that for trivia.

Despite the first three attempts gaining mixed reactions to outright discouragement, the three point line slowly began to take hold. The American Basketball League used a **25′ line** in 1961. The Eastern Professional Basketball League adopted a similar rule in their 1964 season. Unfortunately, the ABL folded in December of 1962 after one and a half seasons, and the EPBL rebranded itself in 1971 as the Eastern Basketball Association (and eventually the Continental Basketball Association), a “feeder” system for the NBA and ABA.

Seeing the development of skilled shooters in the EBA, George Mikan, then Commissioner and Founder of the ABA, adopted the three point line in 1968 as a means to supposedly “give the smaller player a chance to score and open up the defense to make the game more enjoyable for the fans”. This is according to Wikipedia, as the Associated Press link is now defunct.

The three point shot was viewed as a gimmick, as the previous experiments had been decried by critics and the other leagues that used it had folded quickly. However, the ABA turned this into a marketing tool. The NBA was viewed as a slog, with its focus on small ball-handlers, dominant big-men, and a repetitive high-paced dump-and-chase attack of 5-10 foot hook shots and rebounds. Instead, the ABA had high-flying dunks and three pointers. In fact, the ABA had not only adopted the three point line, they were embracing it, with teams averaging **over five 3PA a game from their first season!**

For comparison, the NBA wouldn’t hit that mark until their **tenth season** (1989) using the three point line, when teams were finally attempting **6.6 3PA per game**. It should be noted that the league scratched the 5.0 attempts mark in their ninth season, still less than the ABA rate.

In 1979, twelve years after the ABA, the NBA finally adopted the three point line. In its inaugural season, the three point shot was used an average of **2.8 times per game**, a far cry from the ABA’s **5.0 attempts**. It took quite a while for teams to adapt to the three point line, as it was still seen as a gimmick, and shooters had not yet developed to effectively and consistently knock down three point attempts in the early 1980’s.

Even more telling, the **three point offense** was almost never used, as the shot itself was seen as having little value. **Effective field goal percentage** had been forgotten about since its inception in 1945.

In 1984, FIBA adopted the three point line, setting the stage for International teams to develop their skill set from beyond the arc. The line was slightly shorter than all previous attempts at **20.5′**. And many teams still did not adopt offenses that could maximize its potential. It was still seen as a gimmick, but also leveraged as a means to spread the court and possibly give more value to smaller players.

In 1986, after five years of scattershot experimentation in conference play, the NCAA finally adopted the three point line. Like FIBA’s, the NCAA three point line was shorter: this time a mere **19.75′** from the basket. Despite it still being seen as a gimmick and merely a means of aiding smaller players against bigger, supposedly more-athletic players, teams quickly adopted the three point line. Michigan attempted 11.4 3PA per game in its inaugural season, at 16.8% of FGA (366 of 2175 FGA). Duke was also taking over 11 3PA per game (11.2), with 18.9% of FGA coming from three. Even the famed Loyola Marymount team, who had received transfer Bo Kimble (sitting out the year), only attempted 14.25 3PA per game, with 21.2% of FGA resulting in a 3PA.

In fact, during the Loyola Marymount run-and-gun days, the Lions never crossed the 30% frequency mark despite posting scoring totals upwards of 130 points; the 1989-90 team averaged 122.4 points per game. In their final year with Kimble, Hank Gathers, and Jeff Fryer, the Lions would raise the bar to **23 3PA per game** (737 3PA over 32 games), but only **26.2% of their FGA** (737 of 2808 FGA).

The NCAA three point revolution may have started, but it hadn’t really taken hold for any team just yet.

In the 1988 Seoul Olympics, international teams were finally able to test their abilities at three point shooting. Some teams were sheepish. For instance, the 1988 USA Men’s team attempted a whopping **14 3PA** out of **181 FGA** over their tournament play. That’s less than **5 3PA** per game, with only **7.7%** of FGA being 3PA.

On the other end of the spectrum, the 1988 China Men’s basketball team attempted **40 3PA** over their two classification games, for **20 3PA per game!** With **38.5%** of their 104 FGA coming from deep, this was the first team to attempt a significant share of threes at the international level.

Almost every other team in the 1988 Olympics settled in at between 13 and 17 attempts per game, all at roughly 20-25% of their FGA. This was on par with traditional NCAA teams at the time; except for the bronze-medal NCAA all-star team that the USA put out.

The Chinese Men’s basketball team was the third smallest team in the field, with a mean height of **6’4″** and a median height of **6’5″**, and only one player standing taller than **6’7″**. Comparatively, the **fourth shortest team, Egypt,** boasted only a slightly higher average height (still only **6’4″**), with a median height of **6’5″** as well. The shortest team in the tournament? **South Korea**, with a mean height of **6’2″** and a median height of **6’2″**. The second shortest? **Central African Republic**, with a mean height of **6’4″** and a median height of **6’3″**. It was no surprise that these four teams finished at the bottom of the classification. And, with the exception of **Australia**, these teams had taken the most threes per game, with South Korea posting one high game of 30 3PA out of 70 FGA.

From the 1988 Olympics, the old tale of the three point line trying to aid unskilled short players in a big-man’s game still rang true. It was still a gimmick, and teams that used the three point line could only hope to keep games close in an otherwise “should-be-routed” game.

**Interesting Side-Note:** The United States had the fifth smallest team in the tournament, with an average and median height of **6’6″**. Only two players were taller than 6’10″ for that team: Charles Smith and David Robinson.

Unfortunately, footage from the 1988 games is relatively sparse and there exists no play-by-play from those games to measure the amount of impact China had with their three point shooting.

In 1992, the Barcelona Summer Olympics had two milestone achievements in basketball: the Dream Team of the United States and the second Olympics with a three point line. And **Angola** flipped the script, taking an **excessive number of threes**.

In 1992, the Dream Team came to fruition and took the Barcelona Olympics by storm. Not only had the United States brought much more skilled players, as they were finally able to leverage their professional system, but they also had much more size on their roster. With an average and median height of **6’9″**, the Dream Team instilled a fear of attacking the rim in their opponents. With **Michael Jordan** and **Charles Barkley** being the two shortest players (Stockton was listed, but sidelined with a knee injury) at **6’6″**, teams reverted back to using perimeter offense as a means to survive.

First victim? **Angola**.

Angola became the first team to try such an offense: the three point and rim offense. Of their **68 FGA**, Angola attempted **37 3PA**, a rate of **54.4% of FGA attempted as 3PA**. Over half of their field goal attempts were threes! Similarly, the Angolans attempted to get shots at the rim, with **11** of their **31** two-point **FGA** coming within three feet of the rim. It was a short-lived plan, as **Patrick Ewing, David Robinson, Karl Malone**, and especially an elbow-happy **Charles Barkley** denied interior shots as the game wore on, forcing Angolan players to settle for the mid-range.

And with a terrible efficiency from Angola, the United States settled in after a shaky first seven minutes to rout Angola 116-48. Angola had finally tried something that hadn’t been done before: layups and threes. It was still viewed as an inferior team trying to equalize against a far superior team, but the table was set for high-percentage three point teams. It only needed more skilled shooters.

In the 1993 season, head coach David Arseneault of Grinnell College identified that the Pioneers were not having fun playing basketball. Before his arrival in 1989, the Pioneers had had 25 consecutive losing seasons, and in his first couple of years, players who were not receiving enough playing time were quitting after their first year. In response, he decided to make the game “more fun” and developed elements from the fairly tame Loyola Marymount up-tempo offense. Grinnell is a team I became personally familiar with in college, spending two of my four years of collegiate basketball at a small Division III school.

The Grinnell **System** is an unorthodox offense that focuses on full-court pressing, quick shooting, crashing for rebounds, and **giving up uncontested layups**. Yes, you read that right. Before my first game against Grinnell, I had been warned that the Pioneers will abandon the defense if the shot clock drops to 25 seconds. In my first possession against Grinnell, we set into the Princeton 4-out offense; one pass was made, two cuts, and **three Grinnell defenders sprinted back to their half of the court**. This left a lane open for our wing to drive to the hoop. As he drove in, **one Grinnell defender went underneath the basket to collect the make** as **the other defender ran to the sideline for an outlet pass**.

We didn’t recover well enough as four of our players crashed. Grinnell sent the ball down the sideline, took a three and missed. The two other cherry-picking players crashed and kicked the ball right back out for a second chance, this time making it. Score: 3-2 Grinnell.

Prior to our game, our assistant coach told us about the **System**. They try to attempt **100 FGA in a game with a minimum of 50 3PA**. They also attempt to **grab 33% of their missed FGA**. This would hopefully equate to **1 point per possession even when they miss**. And they **refuse to let the clock stop**. The reasoning was that long defensive possessions and free throws would stall their offense. To avoid both, if a team was able to break their press, which typically results in a layup, they rewarded the team with a free layup. And, boy, did they run: **every two minutes a wholesale change would occur**, as five new players would come in for the five players on the court. Hot hands stayed on.

It was bizarre. However, **it was the first time in NCAA and NBA where a team specifically dictated 50% of their FGA should be from 3-point range**.

We fortunately won our first match-up 116-92, as the Pioneers were a measly 19-55 from beyond the arc. In fact, I still have our **hot-wash** report, which included their shot chart:

As we can see, once again the Angola methodology was used: layups and threes. However, the corners were not being exploited, and the layups were almost exclusively off turnovers and put-backs.

Later, in 2001, we got a peek behind the scenes, thanks to USA Today. The numbers were almost on par with what our coach had told us. But we now had a **clearly defined gameplan:**

- Take 94 or more FGA
- Ensure 50% of FGA are 3PA
- Force 32 turnovers
- Get offensive rebounds on 33% of missed shots
- Take 25 or more FGA than your opponent

It would be years before any other team would adopt the 50% three point strategy.

As the NBA was reluctant to embrace the three point line, it took until the 2017-18 season, **29 seasons later**, before the **Houston Rockets** finally crossed the **50% of FGA being 3PA** threshold. And it worked: Houston finished first in the Western Conference and, had a polar vortex not crashed down on them in the 4th quarter of Game 7 of the Western Conference Finals, would have made an NBA Finals appearance. That season, Houston provided a road map for teams to fully weaponize the three point line.

It was no secret that the three point line was being embraced by the league more and more over the years. We’ve all seen the same plot of 3PA per game over the course of a season:

Back in 1990, Paul Westhead attempted to bring his run-and-gun offense to the NBA through the Denver Nuggets. They went on to put up eye-popping numbers on offense, with 119.2 points per game. But they suffered on defense, giving up 130 points per game. And their three point attempts were rather pedestrian at only **12.9 3PA** per game, a lowly **11.9% of FGA as 3PA**. There was no revolution happening.

Houston, however, took a look at **effective field goal percentage**, or, more importantly, **true shooting percentage**, taking a page out of Hobson’s 1945 analysis from Columbia and turning it into intelligence in today’s game; it is now household knowledge for any NBA analyst. Houston effectively **adopted the Angola** offense, but used players **capable of playing at an NBA level**. That is, attack the rim and knock down threes. Increase the frequency of both to almost **Grinnell** levels, and we should have a recipe for success.

And success it was.

Just fourteen years prior, the NBA was still mired in mid-range purgatory. By performing a non-negative matrix factorization on the shot locations from the 2005 NBA season, we find that a couple of three point ranges appear as preferred shot locations, but two mid-range preferences dominated the distribution of field goal attempts.
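For readers who want to experiment, the factorization itself can be sketched with plain NumPy multiplicative updates (the classic Lee-Seung rule for the Frobenius objective). The shot-count matrix below is randomly generated stand-in data, not the 2005 season’s actual locations:

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Minimal non-negative matrix factorization via multiplicative updates.

    V : (players x court-cells) non-negative shot count matrix.
    Returns W (player loadings) and H (spatial basis patterns), V ~ W @ H.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    eps = 1e-9  # keeps denominators strictly positive
    for _ in range(iters):
        # Lee & Seung updates for the Frobenius-norm objective
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical player-by-cell shot counts; the article used real 2005 data
rng = np.random.default_rng(1)
V = rng.poisson(3.0, size=(50, 60)).astype(float)
W, H = nmf(V, k=6)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative fit error
```

Each row of `H` is one recovered spatial “shot type” over the court cells; reshaping a row back to the court grid and plotting it gives the preferred-location maps described above.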

Feel free to thumb through the different types of FGA here:

Now compare this to the current 2019 NBA season:

Notice the severe difference? That’s Houston’s influence on the league. Notice we use the **Milwaukee Bucks court** as the backdrop for this current season. That’s because the Bucks have adopted the Houston strategy and have ridden it to a **league-leading 37-13 record as of today**.

The big difference with the Houston system has been **mobile** bigs and highly skilled guard play, with bigs capable of attacking the rim and knocking down the three and guards slicing up a switching defense. The emergence of **positionless basketball** has also helped develop the **6’7″-6’11″** point-forward; seen as an anomaly with **Magic Johnson** but now common with players like **Giannis Antetokounmpo, Ben Simmons**, and **Kevin Durant**. It has also developed skilled scorers such as **Stephen Curry** and **James Harden**, as today there are 3-4 knock-down shooters per team on the court at any given moment; a rare thought in 1990.

And despite this emergence, only a couple of years ago former players were still calling this a gimmick, having grown up professionally in a “live by the three, die by the three” era. However, elbow-throwing Charles Barkley eventually had to eat crow as the Warriors showed that the three point ball could drastically alter opponents’ game plans.

It only took **seventy years** to get to this point… from Columbia via Angola and Iowa to Houston. It begs one question:

What’s the next 70 year revolution going to be?


Let’s see a typical defensive rotation in action:

In this play, **Blake Griffin** of Detroit runs a **dribble hand-off** (DHO) with **Langston Galloway**. Despite Griffin being a relatively strong perimeter shooter, **DJ Wilson** of Milwaukee **drops** to allow **George Hill** to slip the screen. I use the term slip because there is no fight: Wilson gave him room to avoid the screen.

While the screen is happening, **Bruce Brown** of Detroit runs a **deep cut** from the top of the perimeter to the weak side corner. The aim is to **pull Khris Middleton** off the nail and tangle him with **Brook Lopez** in the paint. Middleton merely checks his weak side and sees that it is currently clogged, with **Glenn Robinson III** and **Eric Bledsoe** hovering about.

The defensive plan here is to keep drivers out of the paint and chase shooters off the line. The primary option for Detroit’s offense is to find a driving lane, which is now gone thanks to Middleton, Hill, and Wilson. As Wilson had dropped, the only true option for Griffin is to pop to the perimeter, which will require a pass over the shoulder from Galloway; who is not entirely known for hook passes for pick-and-pop three’s. Instead, the Pistons run through their second option.

As Galloway deep cuts to the weak side, we should expect the Bucks to anticipate a 1-on-1 situation between Griffin and Wilson. In the 1990’s this would spell almost certain doom, as Griffin is as strong as they come. Instead, the Bucks drop back into their zone-style of play, using a **check** from Brook Lopez that allows him to stray deep into the paint with a fresh set of three seconds.

Bledsoe begins to sag back onto the nail in an attempt to cover Griffin’s dominant hand should he come crashing into the lane. Thanks to Lopez’s **switch**, Hill is able to **slip** the switch back onto Galloway, allowing Lopez to **show** within his three second window.

As predicted, Griffin turns over to his strong hand, causing Bledsoe to **blitz** Griffin. This minor miscue allows Robinson III to backdoor cut toward the basket, and Griffin slips a nice pass to Robinson III in the paint. Despite this, Lopez is in **show** position and contests the field goal attempt with a **block**, leading to an Eric Bledsoe **defensive rebound**.

In the annals of play-by-play data, this Detroit possession will be logged as **Robinson III FGA (3′ Cutting Layup), Lopez BLK, Bledsoe DREB**. It will ultimately be seen as **zero points on one offensive possession** for Detroit and **one stop on one defensive possession** for Milwaukee. In this case, the term **stop** simply refers to a defensive possession where the offense scores zero points.

So the question is, **how do we quantify this stop?**

Commonly throughout a game we will hear phrases such as, “all we need is one stop” or “[Team A] made this a two possession ball game.” What these statements are referring to is the stop. A stop is simply a defensive possession that results in zero points: the defensive team has stopped an offensive team from scoring in that given possession. Ideally, we would like to assign credit to the defenders. In models such as **adjusted plus-minus** and **RAPM**, there is no credit-assigning mechanism other than a **regression-based methodology** that isn’t an actual regression model (i.e., the short story is, there can never be a 0.14 player in a game, nor is he isolated; these are more akin to poorly managed fractional-factorial designs with heavy aliasing). Using such a model will identify some key traits of players, but the numbers themselves are effectively meaningless when relating to true defensive impact. That is, having a defensive RAPM of 4 just means you’re on the court during situations that positively affect the defensive rating more often than someone who has, say, a defensive RAPM of 3. It doesn’t mean that player is one more point better per 100 possessions (it’s a biased estimator, remember) and it certainly doesn’t mean that the player contributes 4 points’ worth of defensive efficiency (due to aliasing).

We can also use **RPM-style Bayesian models**. While RAPM is a Bayesian process, it’s not a Bayesian process in the eyes of a player. It’s merely a regularizer that controls the variation of the parameter space, not the model space where the players exist. In this case, we can apply priors based on **box score stats** that help reduce the bias of the aforementioned “regression” methods. Using box score statistics as a prior distribution helps smooth the RAPM estimates to allow for **some** credit to defenders. For instance, the Bucks play above will give more credit to Brook Lopez and Eric Bledsoe, but only because Brook Lopez obtained a block and Eric Bledsoe obtained a defensive rebound. It’s certainly a flawed system, but it better accounts for defensive actions.

Another method is the **Stop Percentage**, as developed by Dean Oliver. In this case, Oliver focuses on the instances in which a defensive player **terminates** an offensive possession. It is broken into two “orthogonal” parts, which we will liberally call the **personal effect** and the **team effect**. The result is a cascading equation that breaks down a play from zero points per defensive possession to the box score actions taken over that possession.

Let’s break this all down using Justin Kubatko’s breakdown of Oliver’s stops calculation.

The first step is partitioning stops into **personal stops** and **team stops**. This is reflected in the equation **Stops = Personal Stops + Team Stops**.

We define personal stops to be **steals**, **weighted blocks**, and **weighted defensive rebounds**. While steals completely terminate a possession, field goal misses do not. More importantly, defensive rebounds are not entirely attributed to missed field goal attempts; and using box score data, we cannot necessarily separate out free throw and field goal attempt defensive rebounds. Therefore, we need to incorporate a weighting scheme to understand how often a **block** becomes a stop and how often a **defensive rebound** becomes a stop. To do this, we need to compute three quantities: the **defensive field goal percentage**, the **opponent offensive rebounding percentage**, and the **forced miss weight**.

Defensive Field Goal Percentage (DFG%) is simply defined as the field goal percentage of an opponent. It is given by **DFG% = Opp FG / Opp FGA**.

Opponent Offensive Rebounding Percentage (DOR%) is also simply defined as the percentage of rebounds obtained by the offense during a defensive possession. It is given by **DOR% = Opp ORB / (Opp ORB + DRB)**.

Forced Miss Weight (FMwt) is a slightly more difficult number to compute. It is given by **FMwt = DFG% * (1 - DOR%) / (DFG% * (1 - DOR%) + (1 - DFG%) * DOR%)**.

This quantity appears to be backwards because we think of obtaining defensive rebounds on missed field goal attempts, while this equation coyly places defensive rebounds on made field goals. But that’s not the aim of this equation. The aim here is to **weight the value of a missed FG** versus a **defensive rebound**. In this case, the product is looking at **field goal attempts that are either made or defensive rebounded** versus **missed field goal attempts that are offensive rebounded**; that is, possession-ending events on a FGA versus possession-continuing events.
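These three quantities can be sketched in a few lines of code. The formulas here follow Kubatko’s Basketball-Reference presentation of Oliver’s calculation (an assumption on our part, since the original equation images are not reproduced), and the sample numbers come from the Pistons-Bucks box score used later in this post.

```python
# The three weighting quantities behind Oliver's stops calculation,
# in the Basketball-Reference/Kubatko form.

def dfg_pct(opp_fg, opp_fga):
    """Defensive field goal percentage: opponent FG / opponent FGA."""
    return opp_fg / opp_fga

def dor_pct(opp_orb, team_drb):
    """Opponent offensive rebounding percentage."""
    return opp_orb / (opp_orb + team_drb)

def fm_wt(dfg, dor):
    """Forced miss weight: value of a forced miss vs. a defensive rebound."""
    return (dfg * (1 - dor)) / (dfg * (1 - dor) + (1 - dfg) * dor)

# Pistons-Bucks example used later in the post: Detroit shot 42-of-89 (.472)
# and grabbed 10 of 43 available rebounds (.233 DOR%), giving FMwt of about .746.
dfg = dfg_pct(42, 89)   # ~ 0.472
dor = dor_pct(10, 33)   # ~ 0.233 (10 offensive boards against 33 defensive)
fmwt = fm_wt(dfg, dor)  # ~ 0.746
```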

With these three components in hand, we can compute personal stops as **Personal Stops = STL + BLK * FMwt * (1 - 1.07 * DOR%) + DRB * (1 - FMwt)**.

How do we read this equation? Let’s walk through it. A personal stop is when a player obtains a **steal**, **block**, or **defensive rebound**. Those are the three additive components.

However, blocks and defensive rebounds don’t necessarily create stops. Take, for instance, a made field goal, an And-1 foul, a missed free throw, and a defensive rebound. In this case, there is no stop on the possession. **This is where that DFG% comes in with FMwt above!**

The value of 1.07, while not in Oliver’s original work, is an adjusted value to account for the number of rebounds off of And-1 (and similar) free throws. In this case, for **blocks**, we have two components: the **blocks that result in forced misses** and the **blocks that result in made baskets**. The first part is obvious. The second part is nuanced, as these are blocks that go out of bounds and stay in the offense’s possession or are offensive rebounds that result in points. We must subtract these out.

The third component, on **defensive rebounds**, is simply the remaining part of the personal stop, as we count all defensive rebounds and subtract out the ones where points were scored on the possession prior to the defensive rebound.
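The personal-stops term above can be sketched directly; again, this is the Basketball-Reference form of Oliver’s calculation, and the example values use the rounded FMwt and DOR% from the Pistons-Bucks game worked later in the post.

```python
# Personal stops: steals, weighted blocks, and weighted defensive rebounds.

def personal_stops(stl, blk, drb, fmwt, dor):
    return (stl
            + blk * fmwt * (1 - 1.07 * dor)  # blocks that truly end the possession
            + drb * (1 - fmwt))              # rebounds not already credited to the miss

# Rounded game values from the example below (FMwt = .746, DOR% = .233):
lopez = personal_stops(stl=0, blk=1, drb=0, fmwt=0.746, dor=0.233)    # about .560
bledsoe = personal_stops(stl=0, blk=0, drb=1, fmwt=0.746, dor=0.233)  # .254
```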

Now, the second step in computing stops is **team stops**. These are computed rather straightforwardly, albeit lengthily, using the formula **Team Stops = (((Opp FGA - Opp FG - Tm BLK) / Tm MP) * FMwt * (1 - 1.07 * DOR%) + (Opp TOV - Tm STL) / Tm MP) * MP + (PF / Tm PF) * 0.4 * Opp FTA * (1 - Opp FT%)^2**.

We call this a **team stop** as these components focus more on the team’s role in gaining a defensive stop. For instance, the first component identifies all opponent non-blocked field goal misses and estimates how many will result in defensive rebounds with no made field goals prior on the possession.

The second component counts the number of non-stolen turnovers committed by the offense and, assuming a uniform distribution over time, estimates the number that should have occurred while a player was on the court.

The third component estimates the number of free throw situations that result in two misses given the personal fouls committed by a player.
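The three components above can be sketched as one function. This is a hedged rendering of the Basketball-Reference form of Oliver’s team-stops term, with each component commented to match the description above; the example inputs at the bottom are made up purely to exercise the function, not taken from the Bucks-Pistons game.

```python
# Team stops (Stops2). tm_* are team totals, opp_* are opponent totals,
# mp/pf are the individual player's minutes and personal fouls.

def team_stops(mp, pf, tm_mp, tm_stl, tm_blk, tm_pf,
               opp_fga, opp_fg, opp_tov, opp_fta, opp_ft_pct,
               fmwt, dor):
    # 1) unblocked opponent misses that become clean defensive rebounds
    forced_misses = ((opp_fga - opp_fg - tm_blk) / tm_mp) * fmwt * (1 - 1.07 * dor)
    # 2) non-stolen turnovers, spread uniformly over team minutes
    dead_tov = (opp_tov - tm_stl) / tm_mp
    # 3) free throw trips ending in two misses, apportioned by personal fouls
    ft_bricks = (pf / tm_pf) * 0.4 * opp_fta * (1 - opp_ft_pct) ** 2
    return (forced_misses + dead_tov) * mp + ft_bricks

# Illustrative, made-up inputs:
example = team_stops(mp=48, pf=5, tm_mp=240, tm_stl=0, tm_blk=0, tm_pf=20,
                     opp_fga=20, opp_fg=10, opp_tov=0, opp_fta=8,
                     opp_ft_pct=0.75, fmwt=0.5, dor=0.0)
```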

Let’s see how these components tie together with the Bucks example from above.

In the play above between the Bucks and the Pistons, we saw that there was indeed one stop on the play. We’d like to give much of the credit to Brook Lopez, but how much credit does he, and his teammates, deserve? Let’s start naively and suppose the entire game lasts one possession for illustration purposes.

In this case, we compute **DOR%** to be **0** as there are no offensive rebounds and **DFG%** to be **0** as there are no made field goal attempts. This will cause stress in the computation of **FMwt** as the denominator will become **0*1 + 1*0 = 0**. As this is a box score estimate, we should require a complete box score for this play. So let’s go back and leverage the teams’ box score stats.

For this Pistons-Bucks game, the Pistons were 42-89 from the field for **.472**. The Pistons also secured 10 offensive rebounds out of 43 available rebounds. Note that we are skirting the true rebound total, as the actual NBA box score does not list team defensive rebounds. This gives us an estimated **.233 DOR%**. From these, we ascertain **FMwt** to be **.746**.

Since the entire team played this segment without breaks, for this particular play, we will have a factor of **0.2** on the unstolen turnovers. However, there are **zero unblocked FGAs**, **zero unstolen turnovers**, and **zero personal fouls**. Therefore the team stops for this particular play is zero. This means that all contributions are personally driven.

For Brook Lopez, we recorded one block on the play. This translates to a personal stop value of **.5600**.

As Eric Bledsoe obtained the rebound, he also contributes significantly to the stop. In this case, Bledsoe’s personal stop value is **.254.**

For Wilson, Hill, and Middleton, they obtained no steals, blocks, or defensive rebounds on the play. In this case, they all come up Milhouse with a value of **.000**. What this ultimately means is that the credit for the stop comes out to be **.814**, slightly shy of the entire **one stop**.

Now, granted, this is a **box score result**. Therefore, the game should be completed before we make the estimates; applying the method to one play is unfair to the analytic.

By using the box score, we are able to extract out the estimated number of stops in the game. In this case, we have the following:

In case it is too difficult to read, this suggests that **Giannis Antetokounmpo **obtained 5.898 personal stops with 3.726 team stops for a total of 9.624 stops; leading the team for the night. Brook Lopez, on the other hand obtained 3.204 personal stops with 3.592 team stops for a total of 6.796 stops; good for second best on the team.

Continuing in this manner:

- Giannis Antetokounmpo: **9.624**
- Brook Lopez: **6.796**
- Eric Bledsoe: **6.180**
- George Hill: **6.111**
- Khris Middleton: **5.046**
- Tony Snell: **3.733**
- Pat Connaughton: **3.041**
- Ersan Ilyasova: **2.997**
- DJ Wilson: **2.474**
- Christian Wood: **0.000**

This would suggest there were a total of 46.003 stops in the game. But how would we actually verify this?

The easiest way is to crawl through play-by-play. By doing this, we find that there are exactly 12 stops in the first quarter, 9 stops in the second quarter, 11 stops in the third quarter, and 8 stops in the fourth quarter, for a total of **40 stops**, identifying six over-estimated stops for the game. This means that despite the box score underestimating the credit for a single possession, the estimated number of stops actually runs higher than the true count across the entire game. **This is not always the case**.

This is actually expected, as box score analysis is coarser than play-by-play analysis. However, if we shift our focus to play-by-play, what are some methods we can use to determine credit for stops?

One way is to count the number of stops throughout the course of the game, fit the defensive statistics to those stop counts, and perform a “regression” of sorts. This will give answers, but will be quite volatile.

The next way is to simply assign credit to each player based on their stats. We can walk through each possession, and if a stop occurs, we can either **blindly set attribution to 0.2** per player (uniform credit) which will drastically undervalue real defensive stoppers, or we can **weight** defensive statistics. For instance, in the example above, Brook Lopez gets the block and Eric Bledsoe gets the rebound. Let’s credit them with **.5 each**.
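As a toy illustration of these two crediting schemes, uniform 0.2 per defender versus weighting by box actions, here is a minimal sketch. The weights are arbitrary choices for illustration, not any league-standard scheme.

```python
# Split one stop's worth of credit across the five defenders on the floor.

def split_credit(actions, weights=None):
    """actions maps each defender to the set of box actions they recorded."""
    if weights is None:  # uniform credit: 1/5 per defender
        return {p: 1 / len(actions) for p in actions}
    scores = {p: sum(weights.get(a, 0.0) for a in acts)
              for p, acts in actions.items()}
    total = sum(scores.values())
    return {p: s / total for p, s in scores.items()}

lineup = {"Lopez": {"BLK"}, "Bledsoe": {"DREB"},
          "Wilson": set(), "Hill": set(), "Middleton": set()}

uniform = split_credit(lineup)                               # 0.2 apiece
weighted = split_credit(lineup, {"BLK": 1.0, "DREB": 1.0})   # .5 Lopez, .5 Bledsoe
```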

But in doing this, we drastically underestimate the amount of contribution supplied by Wilson, Hill, and Middleton. If we recall, Khris Middleton’s hold at the nail as Brown attempted to pull the defense, along with Wilson’s drop and Hill’s slip on the fight-through, stopped the primary option from occurring. If Middleton blindly follows Brown on the deep cut, Detroit sets themselves up with a 2-on-1 of Galloway/Drummond on Lopez.

Furthermore, how do we credit Bledsoe’s gaffe that ultimately led to a pass to Robinson III for a layup? Should Lopez get more credit for the stop because of his read of the play and ensuing block? How do we put some of the onus on Robinson III for not taking a floater and instead crazy-braving himself into a 7′ tall shot blocker?

There’s only really one way: measuring the decision making process of defenders.

The effort to understand how stops are created requires digging deep into the X’s and O’s of a defense in response to an offense. Ultimately, the game of basketball boils down to an offense making a series of decisions in an effort to force the defense to lose synchronization and open up regions of the court where there is a high probability to score. It’s a chess match where the offense primarily dictates the motion.

The defense, in response, can only implement counter-actions to force an offense to make poor decisions. In this vein, we measure defensive contribution through the defender’s ability to move the offense into low-probability areas of interest. This, by the way, is an open thread of research.

One way to start crediting stops is to look at Detroit’s early offense. Recall this was a DHO between Griffin and Galloway after a reversal and a weak-side deep cut from Brown. Middleton’s hold on the nail along with the slip by Hill eliminated the driving lane, which may have been there in the past. Therefore, we can look back at all initializations of this early offense (across all roles) and see all the various directions of play that occurred. One way to do this is through **ghosting** to train the **average defensive response**. Then, using the ghosting output, we can build a Markov model that estimates the decisions made by the offensive team. Using the ghosting/Markov model, we obtain a **probability distribution** on the actions. And we find in this case that, based on this year, the Pistons tend to score an effective 1.02 points off that action.

Thanks to the two actions from Hill and Middleton, Detroit’s expected points scored on the play dips to 0.83 points. **That’s a positive 0.19 differential.** If we remove Bledsoe, attach him as a tether to Brown by implementing a **Brown Position – Brown Velocity + Noise** model, and run the Markovian model, Detroit’s expected point value increases to 1.11 points. Therefore that **Middleton action may have seemed meaningless**, but it saved the Bucks potentially **0.28 points**.

The challenge then becomes “how do we integrate these components on defense?” For instance, Detroit’s expected point value actually increases with the blitz from Eric Bledsoe; from 0.92 points to 1.08 points. Fortunately, Lopez’s show and Robinson III’s extra step drops the value down to 0.98 before the shot, which is ultimately blocked.

If we integrate out the actions, we flatten out most of the work performed. Therefore, some form of localization between defenders needs to be identified. In the end, Lopez, Hill, and Middleton should get most of the credit for the play, as they thwarted the primary option (Hill/Middleton) and then eliminated a gaffe on a back-turning blitz (Lopez).

And on a further note, the next question is whether the **template of the defensive scheme** is the real stopper in this situation, as Middleton and Lopez merely play their roles correctly. How do we quantify this effect? How much credit does Budenholzer get for this? Does he deserve credit?

It’s definitely a real challenge. But if you can figure this out, I’ll see you at the next Sloan Conference presenting your work. For now, we rely on the carefully thought out work of Dean Oliver, as missing six stops isn’t bad at all. The next game the estimate may be off by -3, the next by 1. It’s all an approximate process holding a place for when we figure out how to better quantify the X’s and O’s.

For instance, how well does a player protect the ball? I pick this category because I’ve long believed that a turnover is as bad as a missed field goal attempt with a defensive rebound. They serve the same purpose: no points are scored while the ball falls back into the opponent’s possession. Due to this, my clunky version of computing **adjusted field goal percentage** back in 1997 would divide by FGA + TOV; I hadn’t thought of “points per possession” as a high school kid. Despite this philosophy, we have seen that all turnovers are not created equal, as loose ball turnovers can lead to fast breaks much more often than an offensive foul turnover, or a “kick-the-ball-20-rows-deep” turnover.

In our quest to break down turnovers, we found some much lesser known turnover types. In this post, we look at the distribution of turnovers, describe some of the lesser known types, and then take a look at a select few players with respect to their distributions of turnovers.

As of this morning (27 January 2019) there have been **21,280 turnovers**. That sounds like a lot; however, a total of 734 games have been played, for an average of 29 turnovers a game. That breaks down to 14-15 turnovers per team per game.

The most common type of turnover is the **Bad Pass**. This type of turnover is a **live ball** turnover and has occurred 7570 times throughout the season. The second most common type of turnover is the **Lost Ball**, yet another **live ball** turnover. This occurred 3995 times during the season. This means that at least 11,565 of the 21,280 turnovers, **over half**, are live-ball turnovers that potentially turn into fast breaks for opponents.

After we see that **54.34%** of turnovers are live ball turnovers, we then see a flurry of **dead ball** turnovers, such as the **Offensive Foul** (2790 times), **Bad Pass: Out-of-Bounds** (2357 times), **Traveling** (1479 times), and **Lost Ball: Out-of-Bounds** (1133 times). In total, these make up 7759 turnovers, or **36.46%** of all turnovers.
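A quick sanity check of the percentages quoted above, using only the counts from the text:

```python
# Turnover counts as of 27 January 2019, taken from the text above.
total_tov = 21280
live = {"Bad Pass": 7570, "Lost Ball": 3995}
dead = {"Offensive Foul": 2790, "Bad Pass: Out-of-Bounds": 2357,
        "Traveling": 1479, "Lost Ball: Out-of-Bounds": 1133}

live_share = sum(live.values()) / total_tov  # 11565 / 21280, about .5434
dead_share = sum(dead.values()) / total_tov  #  7759 / 21280, about .3646
```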

After these two collections of turnovers, we run into the **shot clock violation**, which has occurred 791 times over the course of the season, or approximately 1 per game. These turnovers occur when a team runs out of time on the shot clock before attempting a field goal that hits the rim. **Note:** a deflected pass off the rim does not reset the shot clock.

Despite running through a total of seven types of turnovers, we still have at least another **seventeen types **of turnovers to monitor. Some are quite obvious, but rare: **offensive goaltending**, **backcourt violation**, **double dribble**, and **Kicked Ball**. However, there are a couple rather little known types of turnovers such as the **Illegal Assist**, the **Illegal Screen**, the **Punched Ball**, and the **No Turnover**.

Yes, the “No Turnover” Turnover.

Before we discuss the rare type turnovers, here is the distribution of turnovers as of this morning:

The “No Turnover” turnover occurs when possession of the basketball is lost prior to a field goal or free throw attempt **but the opponent does not gain possession of the ball**. That’s right, there’s a turnover category where the opponent does not gain possession of the ball. Let’s take a look at the nuance of this turnover type.

In the October 19th match-up between the New Orleans Pelicans and the Sacramento Kings, **Julius Randle** became one of the first players to pick up the **No Turnover** turnover. In this play, **Darius Miller** is guarding **Justin Jackson **on a drive to the basket that resulted in a miss. Just before Miller secures the rebound, referee **Sean Corbin** calls **Julius Randle **for a loose ball foul.

With the placement of the basketball, the fact that the Kings had given up possession with a missed field goal attempt, and the positioning and timing of the foul, the ball was deemed a **defensive rebound to the Pelicans** without their securing the ball. This resulted in a **turnover**, as a defensive rebound identifies a transfer of possession to the Pelicans, despite the Pelicans never having possession of the ball.

In a similar play during a November 16, 2018 match-up between the Brooklyn Nets and the Washington Wizards, **Joe Harris** picked up a **No Turnover** turnover when committing a loose ball foul against Washington’s **Bradley Beal** after **D’Angelo Russell** attempted a field goal.

Again, the ball was ruled an **offensive team rebound** as the foul occurred during the loose ball scramble, which results in two free throws for Washington. In this case, a new **chance** continues within the offensive possession; however, no field goal or free throw attempt is credited before Washington gets a chance to shoot.

What separates this example from the one above is that in the Pelicans’ case the fouler was on defense, while in the Nets’ case the fouler is on offense. What this actually shows is that on a loose ball rebound foul after a field goal attempt, it is common for play-by-play to mark a team rebound for the fouling party with a **No Turnover** turnover.

Loose ball fouls on field goal attempts are not the only kind of no turnover turnovers. In fact, there are situations where a No Turnover turnover occurs and no team loses possession. Consider the NBA’s rule book video example. In this case, the possession never really ceases for the red team, but only one turnover is listed.

In this case, Portland never gains possession of the ball, but a turnover is noted. This is listed as **2 possessions with one turnover** according to possession counting. Some may look at this as **one possession with no turnover** despite one being listed. Others may look at this as **three possessions with one phantom turnover**. Just keep this in mind as you count team-to-team possessions, as this will add in an extra possession and potentially shift the identifier of which team has the basketball, depending on your methodology of possession counting.

The illegal assist is a fun turnover to track. This turnover type identifies players that hang on the rim in an effort to use the rim to assist a rebound. This has only occurred three times so far this season. But why not enjoy the beauty, and potential hazards if you’re **Derrick Jones Jr.**

Not to single out Jones Jr., **Reggie Jackson** and **Jerami Grant** are the other two culprits to pull this stunt this year.

There are other types of odd-ball turnovers, including one instance this season of an **Excess Timeout**, which occurred during a Dallas Mavericks versus Oklahoma City Thunder game on December 31, 2018. In this game, at 6:43 in the 4th quarter with the Mavericks trailing, Steven Adams tipped in a missed Russell Westbrook field goal attempt. Rick Carlisle immediately called a timeout, which he unfortunately did not have.

Despite having the time for a Thunder t-shirt toss game break to discuss things over with his team, Carlisle’s gaffe cost the Mavericks their ensuing possession, resulting in a Westbrook technical free throw.

If the ball is ever going to be turned over, ideally a team would prefer that the turnover be a dead ball situation, allowing the defense to reset and force the opposing team into a half-court possession. Looking at the turnovers across the league, we find that the following players have the highest rates of turnovers that result in live ball situations.

Notice who is missing from the Top 25 players? In fact, the **Atlanta Hawks** lead the league in live ball turnover percentage, which is one of the primary reasons they fall behind in games. That is, **533** of their **868** turnovers are live ball, resulting in potential fast breaks, for an astonishing **61.4%** of turnovers. Compare that to the Toronto Raptors’ 56%, Minnesota Timberwolves’ 51%, Golden State Warriors’ 54%, Brooklyn Nets’ 54%, and even the Chicago Bulls’ 54%, and you begin to see that the Hawks are well ahead of teams when it comes to live ball turnover rate. The third place team on this list, the **Cleveland Cavaliers**, only sit at 57%. The second place team, the **Houston Rockets**, settle in at **59.96%**, but have committed fewer than **650 turnovers** compared to Atlanta’s 868.

Playing the analytics game of rates versus counts, that’s a differential of **three fewer live ball turnovers a game for Houston** when comparing the two teams and their rates.

On the flip side, by inverting the live-ball list, we obtain the Dead Ball Turnover “Specialists.” These players tend to kill the clock when turning over the ball. While a team would prefer to avoid turning the ball over, these players at least give their team a chance to set their defense up.

Notice that these players are primarily **post players**. This makes sense, as their turnovers tend to be loose ball fouls, offensive fouls, and lost balls out of bounds. Some highlighted players are **Aaron Gordon** of Orlando, **Giannis Antetokounmpo** of Milwaukee, **PJ Tucker** of Houston, **Kris Dunn** of Chicago, and **Jayson Tatum** of Boston. These players all have significant “touch time” at the perimeter and drive to the basket, and yet their turnovers tend to result in dead ball situations.

Since we are in the middle of a historic run, let’s take a look at James Harden of the Houston Rockets. According to Basketball Reference, Harden is 58-143 from between 3-and-10 feet, 22-42 from between 10-and-16 feet, 6-20 from long-range twos, and 218-583 from beyond the arc. This leads Harden to a sample expectation of **1.0482 points per non-rim FGA**. Recall that we come to this number by computing the **effective field goal percentage** over the regions of interest and multiplying by two.

In comparison, through January 25th, the league has attempted a total of 115,241 non-rim field goals. The league has taken 37,174 attempts between 3-10 feet, converting 14,817 of them. Similarly, the league is 8491-for-20,731 from 10-16 feet, 4942-for-12,321 from 16-to-3pt, and 15,963-for-45,015 from three-point range. This leads to a league average of **0.9058 points per non-rim FGA.**

Applying this to Harden’s 788 non-rim attempts from the field, we see that Harden is a whopping **+112.2112 points over league average** on shooting attempts.

If we compare Harden to his MVP “nemesis” Russell Westbrook, we find that Westbrook’s numbers are 10-for-63 from 3-10 feet, 41-for-130 from 10-16 feet, 47-for-123 from 16′-to-3pt, and 46-for-189 from three-point range. This leads to an estimated expected **0.6614 points per non-rim FGA.** Yikes. This leads to a **-123.43 points over league average**. Pay attention to the negative in that statement. Read that as Westbrook is one of the most detrimental “shooters” in the league. This is consistent with Fromal’s analysis last season as Westbrook was second on the list for the 2017-18 NBA season.

If we turn to Klay Thompson, we find an entirely different story. This season, Thompson has a shot distribution of 25-for-63 from 3-10 feet, 58-for-134 from 10-16 feet, 104-for-218 from 16′-3pt, and 138-for-363 from beyond the arc. This leads to an estimated expected **1.0129 points per non-rim FGA. **Comparing Thompson’s efficiency and volume relative to the average shooter in the league, and we find that Thompson is much like Harden in picking up a **+83.2876 points over league average**.
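All three players’ figures can be replicated directly from the shot splits listed above. A small sketch, where each tuple is (makes, attempts, point value) over the four non-rim zones:

```python
def pts_per_nonrim_fga(splits):
    """splits: list of (makes, attempts, point_value) tuples per zone."""
    points = sum(makes * value for makes, attempts, value in splits)
    attempts = sum(a for _, a, _ in splits)
    return points / attempts, attempts

harden, h_fga = pts_per_nonrim_fga([(58, 143, 2), (22, 42, 2),
                                    (6, 20, 2), (218, 583, 3)])
westbrook, w_fga = pts_per_nonrim_fga([(10, 63, 2), (41, 130, 2),
                                       (47, 123, 2), (46, 189, 3)])
thompson, t_fga = pts_per_nonrim_fga([(25, 63, 2), (58, 134, 2),
                                      (104, 218, 2), (138, 363, 3)])
league, _ = pts_per_nonrim_fga([(14817, 37174, 2), (8491, 20731, 2),
                                (4942, 12321, 2), (15963, 45015, 3)])

# harden ~ 1.0482, westbrook ~ 0.6614, thompson ~ 1.0129, league ~ 0.9058
# points over league average, e.g. Harden: h_fga * (harden - league), about +112.2
```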

Note that we focus on volume of shots to separate out **shooters** from non-shooters who happened to have luck on their side.

And before we continue, we selected Klay Thompson instead of Stephen Curry for a very specific reason. For those who may be interested, Stephen Curry leads the league in points per non-rim attempt (at high volume) with a phenomenal **1.2187 points per non-rim FGA**. This leads to a yet-again league leading **+197.4402 points over league average** when considering his volume.

Given the three players above: James, Russ, and Klay, we have identified three different types of shooters:

Harden is the **playmaking scorer-shooter combo**. This type of player generates their own points and can tear apart a team from long range. This is the deadliest type of player in the league. Defenses have to make conscious decisions on whether to guard the drive, guard the pullup/stepback, whether to blitz/double and leave another shooter potentially open, or have to leave the shooter in off-ball situations in help defense.

If we think of the **scorer-shooter combo**, there are levels to this type of player. Harden is a **SCORER-shooter**, while Curry, mentioned above, is more of a **scorer-SHOOTER**. Something we will touch on later.

Westbrook is the **playmaking scorer**. Westbrook is a high-usage player due to his ability to get to the rim and collapse defenses. Not known for his shooting touch, Westbrook shoots just enough, call it “Marcus Smart enough”, to make defenses think twice before giving him space at the perimeter. Westbrook generates offense more through his scoring abilities but will tend to lose games if forced to take all the big shots outside of 3-feet. Hence the reason for Paul George’s over-the-top strong emergence this season; reminding us of the Indiana days of PG13.

Klay Thompson is the **shooter**. This type of player is a pure shooter that can pick apart a team at any time they want. Sure, Thompson can generate points on his own, but he’s best utilized as an off-the-ball catch-and-shoot monster that can put up 20-30 points in a hurry. He is the perfect complement to a **playmaker** such as Stephen Curry or Russell Westbrook.

**Side Note: **If you are unsure of the difference between a shooter and scorer, feel free to have a discussion in the comments. This is a very important distinction that is made when discussing players around the league (and has been for well over a decade).

Now suppose we are interested in evaluating three players that are respective teammates to Russell Westbrook, James Harden, and Klay Thompson. Suppose these players are considered equivalent defensive players. And furthermore, to constrain the problem, suppose they play the same number of possessions as each other with their respective teammates, playing identical opponents, and have identical net ratings.

We’d like to ask, **which of these three smaller-fish players are more important to their offenses? **And it’s here where the “missing-ness” of stats rears its ugly head. This one being the **missed FGA off a** **pass**, also known as the **potential assist**.

A **potential assist** is a situation where a ball-handler makes a pass to a player who takes a field goal attempt within the window of time and effort that would have earned an assist, called the **assist window**, had the field goal been converted. Tracking assists is easy: when a field goal is made, the play-by-play logs track who the passer was, if there was a passer within the assist window. However, when a field goal is missed, the assist field is zeroed out, as no assist was made. Tracking these potential assists is relatively easy; it just isn’t done.

Instead, we are forced to look at other methods for determining a potential assist. For instance, we can look at **tracking data** and devise a **filtering algorithm** akin to the one used for extracting passes. But does that actually work for assists? Let’s look at what the league has to say:

An assist is a pass that directly leads to a basket. This can be a pass to the low post that leads to a direct score, a long pass for a layup, a fast break pass to a teammate for a layup, and/or a pass that results in an open perimeter shot for a teammate. In basketball, an assist is awarded only if, in the judgement of the statistician, the last player’s pass contributed directly to a made basket. An assist can be awarded for a basket scored after the ball has been dribbled if the player’s pass led to the field goal being made.

Therefore, unlike passes, there is no distinct rule-based definition of what constitutes an assist. It is literally defined as a **subjective statistic**, which can be applied differently across different teams. Therefore, we cannot easily place a rule-based mechanism like we did in the past for passes after all. Instead, we turn to **machine learning**.

Ultimately, we need to know whether passing to James Harden, Russell Westbrook, or Klay Thompson is going to improve a teammate’s chances of receiving a reward such as an **assist** for a made basket or a bump in **points produced**, and therefore increase their **offensive rating**. Looking at the hard numbers above, if we all wanted to pad our stats, then we’d all want to be Klay Thompson’s or James Harden’s teammate. Or do we?

In an effort to build a potential assist model, let's apply a **supervised learning** technique, which introduces **labels** and **training** into our system. Fortunately, we have a sample of labels already gathered for us through the play-by-play assist log. To start, we can walk through every **made field goal attempt** and split them into two classes: **assisted field goals** and **un-assisted field goals**. Using a 0/1 label as our **response variable**, we can employ a model to identify the differences between certain **explanatory variables**, such as dribbles taken, feet traveled, and seconds between pass and shot, in an effort to understand whether, say, a player taking two dribbles after receiving a pass **could** still earn the passer credit for an assist.
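As a sketch of that setup, here is how the labeled training set might be assembled. The field names and rows below are hypothetical stand-ins for the play-by-play and tracking feeds, not the real schema:

```python
import numpy as np

# Hypothetical records for made field goals, with tracking-derived
# features. These names and values are illustrative only.
made_fgs = [
    {"dribbles_after_catch": 0, "feet_traveled": 2.1, "secs_pass_to_shot": 0.9, "assisted": True},
    {"dribbles_after_catch": 4, "feet_traveled": 18.5, "secs_pass_to_shot": 4.2, "assisted": False},
    {"dribbles_after_catch": 1, "feet_traveled": 5.0, "secs_pass_to_shot": 1.4, "assisted": True},
]

# Explanatory-variable matrix X and the 0/1 response y, where
# 1 = assisted field goal and 0 = un-assisted field goal.
X = np.array([[r["dribbles_after_catch"], r["feet_traveled"], r["secs_pass_to_shot"]]
              for r in made_fgs])
y = np.array([1 if r["assisted"] else 0 for r in made_fgs])
```

From here, any binary classifier can be fit to (X, y); the rest of the post walks through which one to choose.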

Immediately, to the novice user, a **logistic regression** model comes to mind since the response is binary. However, one issue with logistic regression is that we must **assume the log-odds are linear in the explanatory variables, with negligible multicollinearity among them**. More importantly, this conditional model must satisfy the **exponential family assumptions** in the log-odds space, which, unfortunately, usually fails in basketball analytics.
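Concretely, the conditional-linearity assumption being flagged here is that the log-odds of an assist are a linear function of the explanatory variables:

```latex
\log \frac{P(\text{assist} \mid x)}{1 - P(\text{assist} \mid x)}
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p
```

If the true boundary between the classes is curved in the feature space, no choice of the coefficients can make this hold.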

Next, we could leverage a **neural network** to do our dirty work for us. And indeed we could. However, we have a better idea for introducing neural networks in a future post, so why not go crazy and learn something entirely different…

A fairly flexible methodology in classification is the **support vector machine (SVM)**. In practice, this is called a **separating hyperplane** algorithm: it takes the explanatory variables and **splits** the classes using hyperplanes until the classes are partitioned into uniform regions. Let's look at a really basic example.

Suppose we sample **1,000 points** within the unit square, with a decision boundary given by a **4th-order polynomial**. Anything below the polynomial is considered **class 1**, while anything above the polynomial is considered **class 2**. Given the 1,000 samples, we can easily see the boundary:

To show we’re not hiding any cards up our sleeves, here’s the plotted decision boundary between the two classes:

And you can even try this at home:

```python
import numpy as np
import random
import matplotlib.pyplot as plt

x = [[], []]
y = np.array([])
cols = []
for i in range(1000):
    p = random.random()
    q = random.random()
    boundary = .5 - (124./15.)*p + 44.*p*p - (1016./15.)*p*p*p + 32.*p*p*p*p
    if q < boundary:
        y = np.append(y, 0)
        cols.append('blue')
    else:
        y = np.append(y, 1)
        cols.append('green')
    x[0].append(p)
    x[1].append(q)

# Plot the true decision boundary over the sampled points.
dots = np.linspace(0, 1, 100)
bounds = np.zeros(100)
for i in range(100):
    p = dots[i]
    bounds[i] = .5 - (124./15.)*p + 44.*p*p - (1016./15.)*p*p*p + 32.*p*p*p*p

plt.plot(dots, bounds)
plt.scatter(x[0], x[1], c=cols)
plt.show()
```

Now, if we apply a Logistic Regression, we obtain the following results:

```python
from sklearn.linear_model import LogisticRegression

# x and y come from the sampling snippet above.
X = np.array(x).transpose()
clf = LogisticRegression(solver='lbfgs').fit(X, y)
yhat = clf.predict(X)
```

And we find that we have a success rate of approximately 75% in correctly classifying the points! That’s actually not very good, given that we can easily see the boundary. This poor result comes from the fact that this particular boundary problem and its associated distribution require a **curved exponential family** to improve on the boundary. That is, we’d have to develop a weighting scheme in order to satisfy the assumptions of logistic regression. In two dimensions, this is rather straightforward. In multiple dimensions, however, we get into a lot of trouble, as we cannot view the results.

A **support vector machine** will look for a collection of separating hyperplanes to partition the two classes. In the two-dimensional case, we will identify segments of straight lines that partition the data. If we assume a linear boundary, this will give us the best-fitting “linear model”:

However, we don’t restrict ourselves to the linear model in SVMs. We actually employ what are called **kernels**, which implicitly map the data into a higher-dimensional feature space by weighting pairs of data points. Paired with the candidate separating hyperplane and the observed classification labels (assist or non-assist), this yields a **“linear”** boundary in that space, as such:

The image on the left shows the logistic-regression-type model with a linear discriminant. The image on the right shows the learned “linear boundary” from SVMs. (Image from Elements of Statistical Learning)

If we apply this to our scheme, we find we obtain a much better classifier.

```python
from sklearn import svm

clf2 = svm.SVC(kernel='rbf', gamma=10)
clf2.fit(X, y)
yhat2 = clf2.predict(X)
```

Here we applied a **radial basis function** as the kernel and settled on a value of **10** for gamma, the smoothing parameter of the radial basis function. Selecting this parameter should be performed by **cross-validation**. In this case, the value of 10 from cross-validation gave us an average error of **0.03%**. Much better than the **25%** from logistic regression. **And this was on well-separated data.**

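The selection step can be sketched with scikit-learn's `GridSearchCV`. The quartic-boundary data is regenerated here so the snippet stands alone, and the gamma grid is illustrative:

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Regenerate the unit-square sample with the quartic decision boundary.
rng = np.random.default_rng(0)
p, q = rng.random(1000), rng.random(1000)
boundary = .5 - (124./15.)*p + 44.*p**2 - (1016./15.)*p**3 + 32.*p**4
X = np.column_stack([p, q])
y = (q >= boundary).astype(int)

# Grid-search the RBF smoothing parameter gamma with 5-fold CV.
search = GridSearchCV(svm.SVC(kernel='rbf'), {'gamma': [0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The best cross-validated accuracy lands well above the ~75% ceiling of the linear-logit model on this data.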
Crediting an assist to a made field goal is not a well-separated distribution. There have been several instances where a play is credited as an assist for one player, but the same action is not credited for another player. In these cases, it boils down to differences in judgement between two different crediting statisticians. Using the assist crediting for converted field goals, we can train an SVM model to identify key features for determining an assist when a FG is made.

Many of these features need to be teased out of tracking data and, unfortunately, due to the exclusivity of the data, I cannot share code or even the data itself. However, if you get your hands on tracking data, you can test out some of these features. Note that in these results, we will use the notation of **class zero** being **no assist on attempt** and **class one** being **assist on attempt**. Here are the primary features that yielded great results.

Yes, this is an obvious one. Passes are highly correlated with assists. Due to this, we can immediately assign field goal attempts where no pass was made to class zero. This is a well-separating feature and is by far the most dominant feature in determining assists.

The second feature that separates the classes well is the number of dribbles after the pass. It’s also the one that starts to mix the results a little. In fact, this season there have been a couple of assists credited off of three or more dribbles after the pass. For the most part, though, it’s effectively **one or zero** dribbles. Due to this spread, there’s some room for error in predicting an assist.

We can also measure the amount of time between receiving a pass and taking a field goal attempt. The significant range lies within the first 1.5 seconds after the shooter receives the ball; a lot can happen in 1.5 seconds of action. Despite this, we find that a significant bulk of assists lie (softly) around this boundary. This is the third most significant feature, and it actually gets tangled up with the previous feature and the next most significant feature.

Through film study, we notice that a “swing” in a player’s velocity impacts whether an assist gets credited. A “swing” in this case is when the player’s velocity vector swings from **going along the axis of a field goal attempt** into an **entirely different direction**. Just like a swing.

We use the axis notation because a player may be slowing down on their way toward the basket. In fact, it’s not the player we measure, but rather **the basketball**. As an example, consider a pass into the post. A player who catches the ball may be on the run, and hence their velocity vector is pointed at the basket.

In the case of a turn-around, the velocity vector will point away from the basket, but along the same arc. These tend to be credited as assists as well. However, if the player makes an extra move, then the assist may no longer be credited. For example, a player may stop and pump fake, or perform a cross-over or spin move. It’s at these points that the judgement begins to get mixed.

Despite the mixed judgement, we see the velocity vector of the basketball start to become **orthogonal** to the direction of the basket, which indicates a **basketball move** is occurring and the assist is more than likely going to evaporate.

Therefore, a velocity swing is the product of the **cosine of the angle** between the player’s velocity vector and the direction of the basket, and the cosine of the angle between the basketball’s velocity and the player’s velocity. **Note that this value is always between 0 and 1**. If we integrate these cosines over the time between reception and attempt (feature three), we obtain our total amount of **velocity swing**. Small values of this lead to assists.
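A minimal sketch of that computation, assuming we already have sampled velocity vectors for the player and the ball; the array names and sampling setup are hypothetical, not a tracking-data API:

```python
import numpy as np

def cos_angle(u, v):
    """Cosine of the angle between two 2-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def velocity_swing(player_vel, ball_vel, to_basket, dt):
    """Integrate the cosine product from reception to attempt.

    player_vel, ball_vel: (T, 2) arrays of sampled velocities;
    to_basket: (T, 2) unit vectors toward the basket; dt: sample
    spacing in seconds. Trapezoid rule approximates the integral.
    """
    c = np.array([cos_angle(pv, b) * cos_angle(bv, pv)
                  for pv, bv, b in zip(player_vel, ball_vel, to_basket)])
    return float(np.sum(0.5 * (c[1:] + c[:-1])) * dt)
```

For a player driving straight at the rim with the ball moving with them, both cosines stay near one; when the ball's velocity turns orthogonal to the player's, the integrand collapses toward zero.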

Using these features and leave-one-out cross-validation, we obtain a **98.77% recall rate** for crediting an assist when a field goal attempt is made. Not too shabby! This means we will typically mess up 1-3 shots per game, as the two teams combined tend to shoot between 150 and 200 shots over the course of a game. We can live with this; after all, assists are subjective to begin with.
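The evaluation procedure can be sketched with scikit-learn's leave-one-out utilities. Since the real tracking features cannot be shared, the snippet below runs on synthetic stand-in data with well-separated classes:

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
# Synthetic stand-ins for (dribbles, seconds to shot, velocity swing);
# class 1 = assisted. Shifted means make the classes mostly separable.
X0 = rng.normal(loc=[3.0, 3.0, 0.8], scale=0.5, size=(60, 3))
X1 = rng.normal(loc=[0.5, 1.0, 0.2], scale=0.5, size=(60, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 60 + [1] * 60)

# Leave-one-out cross-validated predictions, then recall on class 1.
yhat = cross_val_predict(svm.SVC(kernel='rbf', gamma=1.0), X, y, cv=LeaveOneOut())
print(recall_score(y, yhat))
```

Recall on class one is the fraction of true assists the model recovers, which is the 98.77% figure quoted above for the real data.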

Now recall that we used actual credited assists to train our SVM. Despite this, we never actually used the make itself as a feature. Therefore, a missed field goal attempt looks exactly the same as a made one in the eyes of the model. As a thought exercise: show a creditor 100 “made field goals,” but cut off the video before each ball is released, tell the creditor that “yeah, it was made anyways,” and ask whether the play should be credited as an assist. If it then turns out all of those attempts were actually misses, it does not change the outcome of the experiment.

In this case, we apply the potential assist model to all of the games that James Harden, Russell Westbrook, and Klay Thompson have played. Due to the availability of the data, we only have games through January 16th, 2019. Despite this, we have the following results from our SVM:

And immediately we see the differences between these three candidates, and the reason why we selected these three players.

Immediately popping off the page is James Harden and his **7.57% of field goals being potentially assisted!** That’s absurd. Furthermore, when he is potentially assisted, Harden posts an effective field goal percentage of .6094, which translates to an expected **1.2188 points per FGA**. Of course, rim attempts are tangled in here, so be cautious with the stats.
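The points-per-attempt figure follows directly from the definition of eFG%, which already gives made threes a 1.5x weight:

```python
def points_per_fga(efg):
    # eFG% = (FGM + 0.5 * 3PM) / FGA, so every unit of eFG% is worth
    # two points: expected points per attempt is simply 2 * eFG%.
    return 2.0 * efg

print(points_per_fga(0.6094))  # → 1.2188
```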

That said, we find that only **8.81% of Harden’s three point attempts are potentially assisted**. Again, a counter-intuitive game plan according to the catch-and-shoot trends in the league. In fact, Harden’s 3P% on potentially assisted attempts is **37.2%**, which is almost identical to his **pullup and stepback** three point game, at **37.5%**.

What this suggests is that we should up-weight a teammate’s assist totals when they play with a high-usage player like Harden, because Harden will make significant moves after receiving the ball. Being Harden’s teammate is rough when it comes to measuring true passing vision, as most passes will not end up as attempted shots. Therefore, a simulation mechanism needs to be in place for ascertaining the value of the pass.

On the other end of the spectrum, Thompson is a passer’s best friend. Here we see that Thompson is fairly high up in percentage, with **67.69% of all his FGA being potentially assisted**. More staggeringly, **over 90% of Thompson’s three-point attempts are potentially assisted**. Among high-volume shooters, this is the highest in the league (by far).

Much like Harden, Thompson’s efficiency barely changes on potentially assisted three-point attempts: he is a **37.8%** shooter in potential assist situations and slightly over **38%** in all other situations.

Westbrook is the passing teammate’s nightmare, in the sense that an assist is not likely to be credited if Westbrook shoots the ball. Since Westbrook is an MVP-caliber player capable of making plays and winning games, his teammates still need to make the pass. With this in mind, we can up-weight these players’ assist totals much like Harden’s teammates’, as they are making the passes and just not getting the results. In Harden’s case, it’s an extra action that’s taken. In Westbrook’s case, it’s just bad luck.

As a note, Westbrook shoots **27% on potentially assisted three point attempts** while dropping down to **23%** on pullup and stepback attempts. Here we actually see a fairly significant improvement in percentage, despite the low overall percentages.

By leveraging a machine learning algorithm like the support vector machine, we are able to start developing models that help us understand difficult-to-measure quantities such as the potential assist. There are many more ideas we can explore using this type of machine learning capability. For instance, a follow-on question may be: **can we use extra features to identify in-game play designs that will CREATE potential assists?**

The short answer?

Yes.
