This is Kevin Durant’s percentage of field goal attempts, aggregated by distance, for the first two seasons of his career. The table gives us some information, but does it really paint a picture of where Durant takes his shots? More importantly, can we make sound decisions about Kevin Durant’s style of play from it?

The short answer is, well… not really.

Much of the analysis of player tendency and capability stops here. We talk about the distance at which a player takes their shots, then typically jump to **effective field goal percentage** and translate that into rudimentary calculations of **expected point value** per field goal attempt. Some analysts take this one step further and produce a **shot quality metric**, which doesn’t actually use the above information explicitly.
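As a rough sketch of those two calculations: the eFG% formula below is the standard one (a made three counts as 1.5 made field goals), while the function names and the stat line are made up purely for illustration.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """Standard eFG%: a made three counts as 1.5 made field goals."""
    if fga == 0:
        return 0.0
    return (fgm + 0.5 * fg3m) / fga


def expected_points_per_fga(fgm, fg3m, fga):
    """Naive expected point value: points scored on field goals per attempt."""
    if fga == 0:
        return 0.0
    points = 2 * (fgm - fg3m) + 3 * fg3m
    return points / fga


# Hypothetical stat line (not a real player):
# 400 made field goals, 100 of them threes, on 900 attempts.
efg = effective_fg_pct(400, 100, 900)
epv = expected_points_per_fga(400, 100, 900)
print(f"eFG% = {efg:.3f}, points per FGA = {epv:.3f}")
```

Note that this naive expected point value is just twice the eFG%, which is part of why neither number says anything about where the shots come from.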

What happens if we produce another player with an almost identical table? Are the two players the same? Sure, we could run a **Chi-Square Test** to compare them, but we may be rudely awakened to the fact that the two players are not the same. Let’s take a look at these two players:

Can you guess the two players? They have very similar distributions and, while they are still **significantly different** according to the Chi-Square test, that is mainly due to the **failure of the Normal assumption** for the small counts in the table. A single cell, **14 versus 45**, accounts for **73% of the test statistic**. But who are these players?
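To see how a single cell can dominate the test, here is a minimal sketch of a Chi-Square test of homogeneity that reports each bin’s share of the statistic. The counts are invented for illustration (only the 14-versus-45 cell is borrowed from the text), so the shares will not match the 73% above, and the sketch assumes every bin has at least one attempt.

```python
def chi_square_contributions(counts_a, counts_b):
    """Chi-square test of homogeneity for two rows of counts.

    Returns the test statistic and each bin's share of it
    (both players' cells in that bin combined).
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    per_bin = []
    for a, b in zip(counts_a, counts_b):
        col = a + b
        exp_a = total_a * col / grand   # expected count for player A
        exp_b = total_b * col / grand   # expected count for player B
        per_bin.append((a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b)
    stat = sum(per_bin)
    return stat, [c / stat for c in per_bin]


# Hypothetical attempts by distance bin; the second bin is the 14-vs-45 cell.
player_a = [150, 14, 20, 30, 400]
player_b = [160, 45, 22, 28, 410]
stat, shares = chi_square_contributions(player_a, player_b)
for i, share in enumerate(shares):
    print(f"bin {i}: {share:.1%} of the statistic")
```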

On the left we have **P.J. Tucker** of the Houston Rockets. On the right, we have **Brook Lopez** of the Milwaukee Bucks. They are both three-ball-dominant shooters with a tendency to attack the rim. As Milwaukee has modeled its offense much like the Houston Rockets, it’s no surprise these two shooters appear to have the same distribution of field goal attempts. Or do they?

If we take a quick glance at Brook Lopez’s shot distribution, we find that he primarily takes attempts in the **-45 degree** to **45 degree** range along the top of the key.

We see the ghost town of field goal attempts in the mid-range, as well as the string of short-range attempts that litter the key.

Compare this to PJ Tucker and we obtain an entirely different story.

We see that almost all FGA occur in the corners. We also see the ghost town of mid-range attempts. The shots in the lane? More along the baseline than being a steady stream towards the free throw line.

It is clear that the distributions are no longer the same. But how do we measure their difference? One solution is to use **shooting zones.**

A shooting zone is a region of the court that groups together the field goal attempts taken within it. It’s a step in the right direction, as we can now differentiate between a corner three and a top-of-the-key three. Similarly, we are able to differentiate between a left-corner three and a right-corner three.
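A minimal sketch of what a zone assignment might look like, assuming hoop-centered coordinates in feet and NBA line distances (23.75 feet above the break, 22 feet in the corners). The zone names, the 4-foot "at rim" cutoff, and the left/right sign convention are assumptions for illustration, not the NBA Stats definitions.

```python
import math

ARC_RADIUS_FT = 23.75   # NBA three-point arc radius above the break
CORNER_DIST_FT = 22.0   # NBA corner three distance from the hoop's center line


def shot_angle_deg(x, y):
    """Angle off the hoop's center line: 0 is straight on, +/-90 is the baseline."""
    return math.degrees(math.atan2(x, y))


def shot_zone(x, y):
    """Classify a shot by hoop-centered coordinates in feet.

    x: lateral offset (negative = left; an assumed convention)
    y: distance from the hoop toward half court
    """
    dist = math.hypot(x, y)
    # The corner line runs parallel to the sideline until it meets the arc.
    corner_break_y = math.sqrt(ARC_RADIUS_FT**2 - CORNER_DIST_FT**2)
    if y <= corner_break_y and abs(x) >= CORNER_DIST_FT:
        return "left corner three" if x < 0 else "right corner three"
    if y > corner_break_y and dist >= ARC_RADIUS_FT:
        return "above-the-break three"
    if dist <= 4.0:  # assumed cutoff for "at rim"
        return "at rim"
    return "mid-range / paint"


print(shot_zone(-22.5, 2.0), shot_zone(0.0, 25.0), shot_angle_deg(10.0, 10.0))
```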

Take, for instance, Brook Lopez’s shot chart from NBA Stats. It’s a little misleading because it combines frequency and efficiency in one graphic: the colors indicate efficiency while the fractions indicate frequency. Here we see the high volume along the top-of-the-key zones.

We see the same misleading representation with PJ Tucker and again focus on the fractions.

And we see a nearly “inverted” plot, as the majority of PJ Tucker’s three-point attempts are located in the corners.

While this “one step further” plot helps us, there’s still a ton of information left on the cutting room floor. For instance, Brook Lopez is a -45 to 45 degree shooter, and the zonal plots do not capture that. The right elbow and left elbow are not differentiated, even though almost every player favors one over the other. A dunk is also valued the same as a hook shot according to the zone distributions.

There’s just a lot of information still being lost.

We now take the next step further. Basketball shot charts have been around for **decades**, and so has kernel density plotting of them. In 2001, I had to write code to feed a list of **x,y-coordinates** into a **kernel density algorithm** using a seemingly newfangled programming language called MATLAB (it wasn’t new, and I wasn’t alone). And when the KDE revolution finally started to take hold in the media nearly a **decade later**, by then under the name **heat maps**, there were still significant flaws in some people’s designs. For instance, old plots would not include distance skewing such as a **log-transform**, which is required to show actual three-point effects in scoring. Yes, that is a post from **four years ago**, written as a knee-jerk response to poorly displayed ESPN shot charts at the time. It shows the log-transform representation.

If we apply the density function formulation here, we can obtain KDE plots for both Lopez and Tucker.

Of course, we’d like to play with the bandwidth to make the charts “prettier.” This is simply an out-of-the-box method using Python. We of course use the **jet** **color map option** from Python, a MATLAB classic color map, to display the **heat** associated with a field goal attempt.
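For a sense of what “out of the box” might look like, here is a sketch using `scipy.stats.gaussian_kde`. The shot coordinates are synthetic stand-ins (not Lopez’s or Tucker’s data), and the grid extents and default bandwidth are assumptions rather than the settings behind the plots above.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-in for a shot chart: (x, y) in feet, hoop at the origin.
shots = np.column_stack([
    rng.normal(0.0, 8.0, size=300),
    np.abs(rng.normal(0.0, 10.0, size=300)),
])

# gaussian_kde expects shape (n_dims, n_points); bw_method is the bandwidth knob.
kde = gaussian_kde(shots.T, bw_method="scott")

# Evaluate the estimated density on a regular grid over the half court.
xs = np.linspace(-25, 25, 51)
ys = np.linspace(0, 35, 36)
grid_x, grid_y = np.meshgrid(xs, ys)
points = np.vstack([grid_x.ravel(), grid_y.ravel()])
density = kde(points).reshape(grid_x.shape)
print(density.shape, float(density.max()))
```

With matplotlib available, `plt.imshow(density, origin="lower", extent=[-25, 25, 0, 35], cmap="jet")` would render the grid as a heat map; passing a number as `bw_method` is the bandwidth to play with.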

We are immediately able to surgically identify the locations of field goal attempts for both players. More importantly, we have a **nonparametric approximate distribution** for each shooter’s field goal attempts. And unlike the “second step further” plots that we skipped over, **scatter (hexagon) plotting**, we’re not solely dealing with empirical data points, which, by the way, **are noisy to begin with**.

And armed with this distributional knowledge, we can finally start to say something intelligent with shot chart data. Yes… there’s been negligible intelligence obtained thus far.

Our discussion started by asking about the similarities between **two players**. While this is helpful in understanding where players are positioned, it is rarely the question that we would like to answer. To get to the question we really want to answer (and haven’t asked just yet), we will tackle this thought exercise first in an effort to understand **Kullback-Leibler Divergence**.

Kullback-Leibler Divergence is a method for measuring how far apart two distributions are. Developed by **Solomon Kullback** and **Richard Leibler** for **public release** in 1951, KL-Divergence aims to identify the **divergence** of a probability distribution from a **baseline** distribution. That is, for a target distribution, **P**, we compare a competing distribution, **Q**, by computing the **expected value, under P, of the log-ratio of the two densities:**
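For reference, the standard definition in one dimension, writing p and q for the densities of P and Q (the lowercase notation is assumed here), is:

$$
D_{KL}(P \,\|\, Q) = \mathbb{E}_{P}\left[\log\frac{p(t)}{q(t)}\right] = \int p(t)\,\log\frac{p(t)}{q(t)}\,dt
$$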

Here we used the one-dimensional notation; the two-dimensional notation is similar: just use a double integral with **t := (x,y)** and **dt := dxdy**. If the two distributions are identical, the log-ratio is zero everywhere, so the integral is **zero**.

Also, with a little bit of work we can show that the KL-Divergence is **non-negative**. That means the smallest possible value is zero (the distributions are equal) and the maximum value is **infinity**. We obtain infinity when **P** places probability on a region where **Q** has none. Therefore, it is common to assume both distributions exist on the same support.

The KL-Divergence is a technique that spawned from research performed at the National Security Agency. Richard Leibler, who would eventually become the Director of Mathematical Research, and Solomon Kullback, who then focused on COMSEC operations, developed the methodology while analyzing bit strings in relation to known coding algorithms. The aim was to identify **shared information** in an effort to exploit **weaknesses** shared between known crypto-algorithms and crypto-algorithms in the wild. Since its public release, KL-Divergence has been used extensively across many fields, and it is still considered one of the most important **entropy measuring tools** in cryptography and information theory.

If we apply KL-Divergence to shot charts, we can immediately begin to compare the **spatial representation** of the two shooters’ tendencies. To do this, we must build a **quadrature** to estimate the integral from the KDE. This is relatively straightforward using the **scipy.integrate.dblquad** function in Python, or crudely using the **midpoint rule**. Either way, the answers are similar. Just be sure to **store the shot charts as numpy arrays**.
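Here is a crude midpoint-rule sketch of that quadrature. To keep it checkable, the two densities are stand-in bivariate Gaussians rather than fitted shot-chart KDEs, and the function names, grid size, and integration box are all assumptions for illustration.

```python
import numpy as np


def kl_divergence_midpoint(p, q, x_range, y_range, n=200, eps=1e-300):
    """Approximate D_KL(P || Q) over a rectangle with the midpoint rule.

    p, q: callables taking arrays (x, y) and returning density values.
    eps guards against log(0); cells where p is zero contribute 0.
    Returns the estimate and the per-cell contributions.
    """
    x_edges = np.linspace(*x_range, n + 1)
    y_edges = np.linspace(*y_range, n + 1)
    x_mid = 0.5 * (x_edges[:-1] + x_edges[1:])
    y_mid = 0.5 * (y_edges[:-1] + y_edges[1:])
    gx, gy = np.meshgrid(x_mid, y_mid)
    cell_area = (x_edges[1] - x_edges[0]) * (y_edges[1] - y_edges[0])

    p_vals = np.asarray(p(gx, gy), dtype=float)
    q_vals = np.asarray(q(gx, gy), dtype=float)
    ratio = np.maximum(p_vals, eps) / np.maximum(q_vals, eps)
    integrand = np.where(p_vals > 0, p_vals * np.log(ratio), 0.0)
    contributions = integrand * cell_area
    return contributions.sum(), contributions


def gaussian_2d(mx, my, sigma=1.0):
    """Stand-in density: an isotropic bivariate Gaussian centered at (mx, my)."""
    def density(x, y):
        r2 = (x - mx) ** 2 + (y - my) ** 2
        return np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return density


p = gaussian_2d(0.0, 0.0)
q = gaussian_2d(1.0, 0.0)
kl_pq, cells = kl_divergence_midpoint(p, q, (-8, 8), (-8, 8))
print(round(float(kl_pq), 4))
```

For these two stand-ins the closed-form divergence is 0.5, so the estimate should land very close to that. With a fitted `gaussian_kde` object (call it `kde_p`, a hypothetical name), a wrapper such as `lambda x, y: kde_p(np.vstack([x.ravel(), y.ravel()])).reshape(x.shape)` would plug straight in, and the returned `cells` array shows which parts of the court drive the divergence.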

For the case of Brook Lopez and PJ Tucker, we obtain a **KL-Divergence of 0.0929**. This is a relatively small KL-Divergence, but it could be smaller! Let’s compare this to **Rudy Gobert** of Utah. As Gobert rarely shoots three-point attempts, we expect a much larger **KL-Divergence**. In fact, the divergence of Gobert from Tucker is **47.5551!**

Immediately, we gain an idea of the differences between the players’ shot location tendencies. In order to identify **where players differ**, all we need to do is look at the integrand across the court, **exactly like we did with the Chi-Square Test above!** And it’s here that we see it’s the specific locations mentioned above that differ between Lopez and Tucker.

Now that we know how to compute KL-Divergence, we need to understand what it is telling us. First, **KL-Divergence is not a metric!** A metric, by definition, is a measurement function that satisfies three conditions: symmetry, non-negativity with equality at zero, and the triangle inequality. **KL-Divergence only satisfies the second condition**. Due to this, we call it a divergence instead of a distance.
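The lack of symmetry is easy to see on a toy example. The two three-outcome distributions below are made up, and the sketch assumes **Q** is positive wherever **P** is.

```python
import math


def kl_discrete(p, q):
    """D_KL(P || Q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over three "zones".
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]

forward = kl_discrete(p, q)   # P measured against baseline Q
reverse = kl_discrete(q, p)   # Q measured against baseline P
print(forward, reverse)
```

The two numbers come out different, which is exactly why the baseline has to be stated.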

Since the divergence is not symmetric, we **must specify the baseline distribution**. This distribution is **Q**. This seems counter-intuitive since the expectation is taken with respect to **P**. But there’s a simple explanation for this.

We think of **Q** as prior knowledge. Either a known cryptosystem in 1945, or a **current player of interest**. We then introduce a new observation: a new bit sequence or a new player. Now, given knowledge of the current player, how “**alike**” is the new player to the old? In order to understand the new player, we consider the new player **as new information introduced to the old player**. Therefore, the new player is a **posterior distribution**. If the **posterior does not change**, then the new player is exactly the same as the current (prior) player.

Therefore, the **0.0929** indicates how much **PJ Tucker diverges from Brook Lopez in shooting frequency**.

Now… that’s not so much the intelligence part. Let’s get to that.

We can leverage the KL-Divergence in an effort to understand changes to **offensive schemes** and reaction to **defensive maneuverings**. The **most explosive revelation leveraging KL-Divergence** is **measuring field goal attempts with respect to BLUE action**. That is, when perimeter defenders in PnR situations move to a seemingly unfavorable defensive position in an effort to divert the PnR into a favorable defensive match-up. This past year alone, BLUE situations on the left wing led to a KL-Divergence of **10.373** when compared to non-BLUE situations. That’s almost entirely generated off the changes in shots becoming left-wing / left-wing in BLUE situations versus right-side/at-rim from middle-lane location in non-BLUE attempts.

We can also begin to analyze changes in shot frequency, a bane for understanding perimeter defenders. Using the KL-divergence, we can start measuring the **changes in frequencies** due to **close-outs** and **quality perimeter defenders** to help understand when teams are **not taking the three they usually take**. Granted, we cannot simply use **defensive three point shooting as a metric** and we certainly cannot use simple frequencies of shooting (there are too few shots in a game). But we can build a distribution and measure the KL-divergence, which helps **borrow strength** from nearby field goal locations and allows us to start asking which features lead to changes in KL-Divergence.

In doing this for this year, you’ll immediately start seeing the defensive differences between two former Spurs: **Danny Green** and **Jonathon Simmons**. One is significantly “better” at perimeter defense than the other.

Similarly, if an offense uses a PnR action that leads to a rim-running event, **where are the field goal attempts likely going to be generated?** If DeAndre Jordan is swapped with Enes Kanter, **we will see a ridiculously different result**. This indicates that the same action with different personnel yields different results. We can peel back the integral and see exactly where the spatial locations vary **and understand how those locations impact the divergence**.

Combine this knowledge with those players’ efficiencies, and we start gaining insight into where we want to push the ball on defense and, more importantly, how we might want to rotate on defense.

Remember, though, that a change in KL-Divergence does not mean good or bad. It simply means **change**. It’s not a **target variable**, but rather a methodology to quickly run through several iterations of teams and players, giving insight into which players are similar in which situations, which teams are similar in which situations, and even (if applied to a single team) how a team makes adjustments over the course of a game.

To gain insight into **good or bad**, we must then build the analytical model that identifies good and bad, be it an expected point value or some other win-shares type of measure.

]]>

In this game, played with a 21-foot three point line, Columbia defeated Fordham 73-58. Columbia knocked down 11 three point attempts to Fordham’s 9. The 73 points marked a Columbia school record at the time. It was proposed that the **actual score**, under the usual rules, would have been 59-44 in favor of **Columbia**. Despite the increased scoring, many fans were left confused and upset over the new rule. The New York Times even wrote that the three point line “experiment” had been “far from a howling success” and that the three point line would “die a natural death.”

If you noticed that the **actual** score was supposedly 59-44 despite only 11 and 9 threes for the two teams, respectively, then you quickly realize that something else was being experimented with in this game. If we subtract the extra 11 and 9 points, respectively, the score would have been **62-49**, leaving another 3 points on the table for Columbia and 5 points for Fordham.

This is due to an extra experiment **that has never caught on since** where players had the option of shooting free throws from the foul line for 1 point or the top of the key for two points. On free throw trips with two or more free throw attempts, the player could only score a maximum of three points, as only one attempt could be selected as a two-point try.

The three point play was seen as diminishing team play, as players would race to the three point line to shoot instead of passing the ball. In fact, according to the New York Times, several players were called for **traveling** as they forgot to dribble while sprinting to the three point line.

Similarly, there were complaints that the three point line **ruined zone defenses** and required less strategy for offensive teams. This complaint was exacerbated **by a third rule change during that Columbia-Fordham game**: the lane was widened to 12 feet from its original six feet to test spacing effects.

What is interesting about the Columbia-Fordham game is that, of the 1000+ spectators, roughly **250 were collegiate coaches and league representatives**. Shortly after the game, they submitted votes on whether the league should look into establishing the new rule changes. The votes were as follows:

- 148 in favor of a three point line, 105 opposed
- 152 in favor of widening the foul lane, 65 opposed
- 133 in favor of the 2-point foul shot, 85 opposed

It would be a while before the NCAA accepted any changes.

The collegiate ranks attempted the three point line a couple more times over the following two decades. On February 1st, 1959****, thirteen years after the previous experiment, Siena and St. Francis used a **23′ three point line**, where it was reported that each team scored once from that range and “then forgot all about it.”

**** In an attempt to track down a source for this game, I found that the January 4th game is not listed in any major news outlet. The game between Siena and St. Francis on February 2nd, 1958 (the only other 1958 meeting between the two teams) is listed in the New York Times, and no three point field goals are mentioned while the box score accounts for all points as free throws and two-point field goals. Upon further research, the game was erroneously listed for 1958 in the Dartmouth magazine, as it actually happened on **February 1, 1959**.

St. Francis defeated Siena 67-50, attempting 6 three-point tries to Siena’s 9. Each team did indeed connect on one apiece, as indicated in the box score and summary:

The three point experiment would not be revisited until 1961, in a game between Dartmouth and Boston University with a **wildly different three point plan:** every field goal counted as three points. Dartmouth’s head coach at the time, Alvin Julian, was growing infuriated with fouling and increased foul shooting. His response was to go to the Ivy League board and get permission to experiment for a game with three point field goals instead of two, in an effort to incentivize scoring over foul shooting. Boston University, also mired in a dismal season, agreed to the experiment. The result did not do much to change the game, and it was **the only time in the NCAA and NBA that the three point line was at zero feet for an official game**. Take that for trivia.

Despite the first three attempts gaining mixed reactions to outright discouragement, the three point line slowly began to take hold. The American Basketball League used a **25-foot line** in 1961. The Eastern Professional Basketball League adopted a similar rule in their 1964 season. Unfortunately, the ABL folded in December of 1962 after one and a half seasons, and the EPBL rebranded itself in 1971 as the Eastern Basketball Association (and eventually the Continental Basketball Association), a “feeder” system into the NBA and ABA.

Seeing the development of skilled shooters in the EBA, George Mikan, then Commissioner and Founder of the ABA, adopted the three point line in 1968 as a means to supposedly “give the smaller player a chance to score and open up the defense to make the game more enjoyable for the fans”. This is according to Wikipedia, as the Associated Press link is now defunct.

The three point shot was viewed as a gimmick, as the previous experiments had been decried by critics and other leagues that used them had folded so quickly. However, the ABA turned this into a marketing tool. The NBA was viewed as a slog with focus on small ball-handlers, dominant big-men, and repetitive high-paced dump and chase attack of 5-10 foot hook shots and rebounds. Instead the ABA had high flying dunks and three pointers. In fact, the ABA had not only adopted the three point line, they were embracing it with teams averaging **over five 3PA a game from their first season!**

For comparison, the NBA wouldn’t hit that mark until their **tenth season** (1989) using the three point line, when teams were finally attempting **6.6 3PA per game**. It should be noted that the league scratched the 5.0 attempts mark in their ninth season, still less than the ABA rate.

In 1979, twelve years after the ABA, the NBA finally adopted the three point line. In its inaugural season, the three point shot was used an average of **2.8 times per game**, a far cry from the ABA’s **5.0 attempts**. It took quite a while for teams to adapt to the three point line, as it was still seen as a gimmick. Shooters had not yet developed to effectively and consistently knock down three point attempts in the early 1980s.

Beyond that, the **three point offense** was almost never used, as the shot was seen as having little value. **Effective field goal percentage** had been forgotten about since its inception in 1945.

In 1984, FIBA adopted the three point line, setting the stage for International teams to develop their skill set from beyond the arc. The line was slightly shorter than all previous attempts at **20.5′**. And many teams still did not adopt offenses that could maximize its potential. It was still seen as a gimmick, but also leveraged as a means to spread the court and possibly give more value to smaller players.

In 1986, after five years of scattershot experimentation in conference play, the NCAA finally adopted the three point line. Like FIBA’s, the line was shorter: this time a mere **19.75′** from the basket. Despite the shot still being seen as a gimmick and as only a means of aiding smaller players against bigger, supposedly more athletic, players, teams quickly adopted the three point line. Michigan attempted 11.4 3PA per game in the line’s inaugural season, 16.8% of its FGA (366 of 2175). Duke was also taking over 11 3PA per game (11.2), with 18.9% of FGA coming from three. Even the famed Loyola Marymount team, which had received transfer Bo Kimble (sitting out the year), only attempted 14.25 3PA per game, with 21.2% of FGA coming as 3PA.

In fact, during the Loyola Marymount run-and-gun days, the Lions never crossed the 30% frequency mark despite posting scoring totals of upwards of 130+ points; the 1989-90 team averaged 122.4 points per game. In their final year with Kimble, Hank Gathers, and Jeff Fryer, the Lions would raise the bar to attempting **23 3PA per game** (737 3PA over 32 games) but only as **26.2% of their FGA** (737 of 2808 FGA).

The NCAA three point revolution may have started, but it hadn’t really taken hold for any team just yet.

In the 1988 Seoul Olympics, international teams were finally able to test their three point shooting. Some teams were sheepish. For instance, the 1988 USA Men’s team attempted a whopping **14 3PA** out of **181** FGA over their tournament play. That’s less than **5 3PA** per game with only **7.7%** of FGA being 3PA.

On the other end of the spectrum, the 1988 China Men’s basketball team attempted **40 3PA** over their two classification games for **20 3PA per game!** With **38.5%** of their 104 FGA being 3PA, this was the first team to attempt a significant number of threes at the international level.

Almost every other team in the 1988 Olympics settled in at between 13 and 17 attempts per game, roughly 20-25% of their FGA. This was on par with traditional NCAA teams at the time, except for the bronze-medal NCAA all-star team that the USA put out.

The Chinese Men’s basketball team was the third smallest team in the field, with a mean height of **6’4″** and a median height of **6’5″**, with only one player on the team standing more than **6’7″**. Comparatively, the **fourth shortest team, Egypt**, boasted only a slightly higher average height of (still only) **6’4″** with a median height of **6’5″** as well. The shortest team in the tournament? **South Korea** with a mean height of **6’2″** and a median height of **6’2″** as well. The second shortest team? **Central African Republic** with a mean height of **6’4″** and median height of **6’3″**. It was no surprise that these four teams finished at the bottom of the classification. And, with the exception of **Australia**, these teams had taken the most threes per game, as South Korea had one high game of 30 3PA out of 70 FGA.

From the 1988 Olympics, the old tale of the three point line trying to aid unskilled short players in a big-man’s game still rang true. It was still a gimmick, and teams that used the three point line could only hope to keep games close in an otherwise “should-be-routed” game.

**Interesting Side-Note:** The United States had the fifth smallest team in the tournament with an average and median height of **6’6″**. Only two players were taller than 6’10” for that team: Charles Smith and David Robinson.

Unfortunately, footage from the 1988 games is relatively sparse and there exists no play-by-play from those games to measure the amount of impact China had with their three point shooting.

In 1992, the Barcelona Summer Olympics had two milestone achievements in basketball: the Dream Team of the United States and the second Olympics with a three point line. **Angola** flipped the script and took an **excessive amount of three’s**.

In 1992, the Dream Team came to fruition and took the Barcelona Olympics by storm. Not only had the United States brought in much more skilled players, as they were finally able to leverage their professional system, but they also had much more size on their roster. With an average and median height of **6’9″**, the Dream Team instilled a fear of attacking the rim in their opponents. With **Michael Jordan** and **Charles Barkley** being the two shortest players (Stockton was listed, but sidelined with a knee injury) at **6’6″**, teams reverted to using perimeter offense as a means to survive.

First victim? **Angola**.

Angola became the first team to try such an offense: the three point and rim offense. Of their **68 FGA**, Angola attempted **37 3PA** for a rate of **54.4% of FGA attempted as 3PA**. Over half of their field goal attempts were threes! Similarly, the Angolans attempted to get shots at the rim, with **11** of their **31 two-point FGA** within three feet of the rim. It was a short-lived plan, as **Patrick Ewing, David Robinson, Karl Malone**, and especially an elbow-happy **Charles Barkley** denied interior shots as the game wore on, forcing Angolan players to settle for the mid-range.

And with terrible efficiency from Angola, the United States settled in after a shaky first seven minutes to rout Angola 116 – 48. Angola had finally tried something that hadn’t been done before: layups and threes. It was still viewed as an inferior team trying to get equalization against a far superior team, but the table was set for high-percentage three point teams. It only needed more skilled shooters.

In the 1993 season, head coach David Arseneault of Grinnell College identified that the Pioneers were not having fun playing basketball. Before his arrival in 1989, the Pioneers had 25 consecutive losing seasons; and in his first couple of years, players that were not receiving enough playing time were quitting after their first year. In response, he decided to make the game “more fun” and adapted elements from the fairly tame Loyola-Marymount up-tempo offense. It is a team I personally became familiar with in college, spending two of my four years of collegiate basketball playing at a small Division III school.

The Grinnell **System** is an unorthodox offense that focuses on full-court pressing, quick shooting, crashing for rebounds, and **giving up uncontested layups**. Yes, you read that right. Before my first game against Grinnell, I had been warned that the Pioneers would abandon the defense if the shot clock dropped to 25 seconds. In my first possession against Grinnell, we set into the Princeton 4-out offense, one pass was made, two cuts, and **three Grinnell defenders sprinted back to their half of the court**. This left a lane open for our wing to drive in to the hoop. As he drove in, **one Grinnell defender went underneath the basket to collect the make** as **the other defender ran to the sideline for an outlet pass**.

We didn’t recover well enough as four of our players crashed. Grinnell sent the ball down the sideline, took a three and missed. The two other cherry-picking players crashed and kicked the ball right back out for a second chance, this time making it. Score: 3-2 Grinnell.

Prior to our game, our assistant coach told us about the **System**. They try to attempt **100 FGA in a game with a minimum of 50 3PA**. They also attempt to **grab 33% of their missed FGA’s**. This would hopefully equate to **1 point per possession if they miss**. And they **refuse to let the clock stop**. The thinking was that long defensive possessions and free throws would stall their offense. To avoid both, if a team was able to break their press, which usually results in a layup anyway, they rewarded the team with a free layup. And, boy, did they run: **every two minutes a wholesale change would occur** as five new players would come in for the five players on the court. Hot hands stayed on.

It was bizarre. However, **it was the first time in NCAA and NBA where a team specifically dictated 50% of their FGA should be from 3-point range**.

We fortunately won our first match-up 116 – 92 as the Pioneers were a measly 19-55 from beyond the arc. In fact, I still have our **hot-wash** report, which included their shot-chart:

As we can see, once again the Angola methodology was used: layups and threes. However, the corner was not being exploited and the layups were almost all exclusively off of turnovers and put-backs.

Later, in 2001, we got a peek behind the scenes, thanks to USA Today. The numbers were almost on par with what our coach had told us. But we now had a **clearly defined gameplan:**

- Take 94 or more FGA
- Ensure 50% of FGA are 3PA
- Force 32 turnovers
- Get offensive rebounds on 33% of missed shots
- Take 25 or more FGA than your opponent

It would be years before any other team would adopt the 50% three point strategy.

As the NBA was reluctant to embrace the three point line, it took until **29 seasons later**, in the 2017-18 season, before the **Houston Rockets** finally crossed the threshold of **50% of FGA being 3PA**. And it worked: Houston finished first in the Western Conference and, but for a polar vortex crashing down on them in the 4th quarter of Game 7 of the Western Conference Finals, almost made an NBA Finals appearance. That season, Houston provided a road map for teams to fully weaponize the three point line.

It was no secret that the three point line was being embraced by the league more and more over the years. We’ve all seen the same plot of 3PA per game over the course of a season:

Back in 1990, Paul Westhead attempted to bring over his run-and-gun offense to the NBA through the Denver Nuggets. They went on to put up eye-popping numbers on offense, with 119.2 points per game. But they suffered on defense, giving up 130 points per game. And their three point attempts were rather pedestrian with only **12.9 3PA** per game, at a lowly **11.9% of FGA as 3PA**. There was no revolution happening.

However, Houston took a look at **effective field goal percentage**, or more importantly, **true shooting percentage**, taking a page out of Hobson’s 1945 analysis from Columbia and turning it into intelligence for today’s game; it is now household knowledge for any NBA analyst. It effectively **adopted the Angola** offense, but used players **capable of playing at an NBA level**. That is, attack the rim and knock down threes. Increase the frequency of both to almost **Grinnell** levels, and we should have a recipe for success.

And success it was.

Just fourteen years prior, the NBA was still mired in mid-range purgatory. By performing a non-negative matrix factorization on the shot locations in the 2005 NBA season, we find that there are a couple three point ranges as preferred shot locations, but there were two mid-range preferences that had dominance over the distribution of field goal attempts.
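For readers unfamiliar with the technique, here is a minimal sketch of non-negative matrix factorization on a players-by-court-cells count matrix. It uses plain Lee-Seung multiplicative updates in numpy, which is a swapped-in stand-in and not necessarily the solver used for the slides; the toy matrix, the two “shot type” patterns, and every name below are invented for illustration.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (players x court cells) as W @ H
    using Lee-Seung multiplicative updates for the squared-error loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps   # player loadings on each shot type
    H = rng.random((k, m)) + eps   # shot types as patterns over court cells
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: 8 hypothetical players mixing two made-up shot patterns
# over 6 court cells.
rng = np.random.default_rng(1)
basis = np.array([
    [5.0, 5.0, 0.0, 0.0, 0.0, 0.0],   # e.g. a "corner three" pattern
    [0.0, 0.0, 0.0, 0.0, 4.0, 4.0],   # e.g. an "at rim" pattern
])
mix = rng.random((8, 2))
V = mix @ basis

W, H = nmf_multiplicative(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```

Each row of `H` plays the role of one preferred shot location pattern, and each row of `W` says how heavily a player leans on each pattern.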

Feel free to thumb through the different types of FGA here:

Now compare this to the current 2019 NBA season:

Notice the severe difference? That’s Houston’s influence on the league. Notice we use the **Milwaukee Bucks court** as the backdrop for this current season. That’s because the Bucks have adopted the Houston strategy and have ridden it to a **league-leading 37-13 record as of today**.

The big difference with the Houston system has been **mobile** bigs and highly skilled guard play, with bigs capable of attacking the rim and knocking down the three and the guards slicing up a switching defense. The emergence of **positionless basketball** has also helped develop the **6’7″-6’11”** point-forward, seen as an anomaly with **Magic Johnson** but now common with players like **Giannis Antetokounmpo, Ben Simmons**, and **Kevin Durant**. It has also developed skilled scorers such as **Stephen Curry** and **James Harden**, as today there are 3-4 knock-down shooters per team on the court at any given moment; a rare thought in 1990.

And despite this emergence, only a couple years ago former players were still calling this a gimmick, having grown up professionally in a “live by the three, die by the three” era. However, elbow-throwing Charles Barkley eventually had to eat crow as the Warriors showed that the three point ball could drastically alter opponents’ game plans.

It only took **seventy years** to get to this point… From Columbia via Angola and Iowa to Houston. It leaves us with one question:

What’s the next 70 year revolution going to be?


Let’s see a typical defensive rotation in action:

In this play, **Blake Griffin** of Detroit runs a **dribble hand-off** (DHO) with **Langston Galloway**. Despite Griffin being a relatively strong perimeter shooter, **DJ Wilson** of Milwaukee **drops** to allow **George Hill** to slip the screen. I use the term slip because there is no fight, as Wilson gave him room to avoid the screen.

While the screen is happening, **Bruce Brown** of Detroit runs a **Deep Cut** from the top of the perimeter to the weak side corner. The aim is to **pull** **Khris Middleton** off the nail and tangle him with **Brook Lopez** in the paint. Middleton merely checks his weak side and sees that it is currently clogged, with **Glenn Robinson III** and **Eric Bledsoe** hovering about.

The defensive plan here is to keep drivers out of the paint and chase shooters off the line. The primary option for Detroit’s offense is to find a driving lane, which is now gone thanks to Middleton, Hill, and Wilson. As Wilson had dropped, the only true option for Griffin is to pop to the perimeter, which will require a pass over the shoulder from Galloway, who is not entirely known for hook passes for pick-and-pop threes. Instead, the Pistons run through their second option.

As Galloway deep cuts to the weak side, we should expect the Bucks to anticipate a 1-on-1 situation between Griffin and Wilson. In the 1990’s this would spell almost certain doom as Griffin is as strong as they come. Instead, the Bucks drop back into their zone-style of play using a **check** from Brook Lopez to allow him to stray deep into the paint with a fresh set of three seconds.

Bledsoe begins to sag back onto the nail in an attempt to cover Griffin’s dominant hand should he come crashing into the lane. Thanks to Lopez’s **switch**, Hill is able to **slip** the switch back onto Galloway, allowing Lopez to **show** within his three second window.

As predicted, Griffin turns over to his strong hand, causing Bledsoe to **blitz** Griffin. This minor miscue allows Robinson III to backdoor cut towards the basket. Griffin slips a nice pass to Robinson III in the paint. Despite this, Lopez is in **show** position and contests the field goal attempt, resulting in a **block** and an Eric Bledsoe **defensive rebound**.

In the annals of play-by-play data, this Detroit possession will be logged as a **Robinson III FGA (3′ Cutting Layup), Lopez BLK, Bledsoe DREB.** It will ultimately be seen as **zero points out of one offensive possession** for Detroit and **one stop out of one defensive possession** for Milwaukee. In this case, the term **stop** simply refers to a defensive possession where the offense scores zero points.

So the question is, **how do we quantify this stop?**

Commonly throughout a game we will hear phrases such as, “all we need is one stop” or “[Team A] made this a two possession ball game.” What these statements are referring to is the stop. A stop is simply a defensive possession that results in zero points: the defensive team has stopped an offensive team from scoring in that given possession. Ideally, we would like to assign credit to the defenders. In models such as **adjusted plus-minus** and **RAPM**, there is no credit assigning mechanism other than a **regression-based methodology** that isn’t an actual regression model (the short story is that there can never be a 0.14 player in the game, nor is a player ever isolated; these models are more akin to poorly managed fractional-factorial designs with heavy aliasing). Using such a model will identify some key traits of players, but the numbers themselves are effectively meaningless when relating to true defensive impact. That is, having a defensive RAPM of 4 just means you’re on the court during situations that positively affect the defensive rating more often than someone who has, say, a defensive RAPM of 3. It doesn’t mean that player is one point better per 100 possessions (it’s a biased estimator, remember) and it certainly doesn’t mean that the player contributes 4 points worth of defensive efficiency (due to aliasing).

We can also use **RPM-style Bayesian models**. While RAPM is a Bayesian process, it’s not a Bayesian process in the eyes of a player. It’s merely a regularizer that controls the variation effect of the parameter space, not the model space where the players exist. In this case, we can apply priors based on **box score stats** that help reduce the effect of the bias of the aforementioned “regression” methods. Using box score type statistics as a prior distribution helps smooth the RAPM estimates to allow for **some** credit of defenders. For instance, the Bucks play above will give more credit to Brook Lopez and Eric Bledsoe, but only because Brook Lopez obtained a block and Eric Bledsoe obtained a defensive rebound. It’s certainly a flawed system, but it better accounts for the defensive actions.

Another method is the **Stop Percentage**, as developed by Dean Oliver. In this case, Oliver focuses in on the instances in which a defensive player **terminates** an offensive possession. And it is broken up into two “orthogonal” parts, which we will liberally call the **personal effect** and the **team effect**. The result is a cascading equation that breaks down a play from zero points per defensive possession to the box score actions taken over that possession.

Let’s break this all down using Justin Kubatko’s breakdown of Oliver’s stops calculation.

The first step is partitioning stops into **personal stops** and **team stops**. This is reflected in the equation

$$\text{Stops} = \text{Stops}_{\text{personal}} + \text{Stops}_{\text{team}}$$

We define personal stops to be **steals**, **weighted blocks**, and **weighted defensive rebounds**. We know that steals completely terminate the possession; field goal misses do not. More importantly, defensive rebounds are not entirely attributed to missed field goal attempts, and using box score data we cannot necessarily separate out free throw and field goal attempt defensive rebounds. Therefore, we need to incorporate a weighting scheme to understand how much of a **block** becomes a stop and how much of a **defensive rebound** becomes a stop as well. To do this, we need to compute three quantities: the **defensive field goal percentage**, the **opponent offensive rebounding percentage**, and the **forced miss weight**.

Defensive Field Goal Percentage (DFG%) is simply defined as the field goal percentage of an opponent. It is given by

$$\text{DFG\%} = \frac{\text{Opponent FGM}}{\text{Opponent FGA}}$$

Opponent Offensive Rebounding Percentage (DOR%) is also simply defined as the percentage of rebounds obtained by the offense during a defensive possession. It is given by

$$\text{DOR\%} = \frac{\text{Opponent ORB}}{\text{Opponent ORB} + \text{Team DRB}}$$

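Forced Miss Weight (FMwt) is a slightly more difficult number to compute. It is given by

$$\text{FMwt} = \frac{\text{DFG\%} \cdot (1 - \text{DOR\%})}{\text{DFG\%} \cdot (1 - \text{DOR\%}) + (1 - \text{DFG\%}) \cdot \text{DOR\%}}$$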

This quantity appears to be backwards because we think of obtaining defensive rebounds on missed field goal attempts, while this equation coyly places defensive rebounds on made field goals. But that’s not the aim of this equation. The aim here is to **weight the value of a missed FG** versus a **defensive rebound**. In this case, the product is looking at **field goal attempts that either are made or defensive rebounded** versus **missed field goal attempts that are offensive rebounds**. That is, possession ending events on a FGA versus possession continuing events.

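With these three components in hand, we can compute personal stops as

$$\text{Stops}_{\text{personal}} = \text{STL} + \text{BLK} \cdot \text{FMwt} \cdot (1 - 1.07 \cdot \text{DOR\%}) + \text{DRB} \cdot (1 - \text{FMwt})$$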

How do we read this equation? Let’s walk through it. A personal stop is when a player obtains a **steal**, **block**, or **defensive rebound**. Those are the three additive components.

However, blocks and defensive rebounds don’t necessarily create stops. Take, for instance, a made field goal, an And-1 foul, a missed free throw, and a defensive rebound. In this case, there is no stop on the possession. **This is where that DFG% comes in with FMwt above!**

The value of 1.07, while not in Oliver’s original work, is an adjusted value to account for the number of rebounds off of And-1 (and similar) free throws. In this case, for **blocks**, we have two components: the **blocks that result in forced misses** and the **blocks that result in made baskets**. The first part is obvious. The second part is nuanced as these are blocks that go out of bounds and stay in the offense’s possession or are offensive rebounds that result in points. We must subtract these out.

The third component, on **defensive rebounds**, is simply the remaining piece of the personal stop: we count all defensive rebounds and subtract out the ones where points were scored on the possession prior to the defensive rebound.

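Now, the second step in computing stops is **team stops**. These are computed in a rather straightforward, albeit lengthy, manner using the formula

$$\text{Stops}_{\text{team}} = \left[\frac{\text{Opp FGA} - \text{Opp FGM} - \text{Team BLK}}{\text{Team MP}} \cdot \text{FMwt} \cdot (1 - 1.07 \cdot \text{DOR\%}) + \frac{\text{Opp TOV} - \text{Team STL}}{\text{Team MP}}\right] \cdot \text{MP} + \frac{\text{PF}}{\text{Team PF}} \cdot 0.4 \cdot \text{Opp FTA} \cdot \left(1 - \frac{\text{Opp FT}}{\text{Opp FTA}}\right)^2$$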

We call this a **team stop** as these components focus more on the team’s element in gaining a defensive stop. For instance, the first component identifies all opponent non-blocked field goal misses and estimates how many will result in defensive rebounds with no made field goals prior on the possession.

The second component counts the number of non-stolen turnovers committed by the offense and, assuming a uniform distribution over time, estimates the number that should have occurred while a player was on the court.

The third component estimates the number of free throw situations that result in two misses given the personal fouls committed by a player.
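Pulling the pieces together, a minimal Python sketch of the stops calculation might look like the following. It follows the Kubatko breakdown of Oliver's formulas as published on Basketball-Reference; the function and argument names are illustrative choices, not from any library, and the guard clauses for empty denominators are an added assumption rather than part of the published method.

```python
def defensive_fg_pct(opp_fgm, opp_fga):
    """DFG%: opponent makes divided by opponent attempts."""
    return opp_fgm / opp_fga


def opp_off_reb_pct(opp_orb, team_drb):
    """DOR%: share of available rebounds grabbed by the offense."""
    return opp_orb / (opp_orb + team_drb)


def forced_miss_weight(dfg, dor):
    """FMwt: weight of possession-ending FGA outcomes versus continuing ones."""
    ending = dfg * (1 - dor)          # made, or would-be defensive rebound
    continuing = (1 - dfg) * dor      # missed and offensive rebounded
    if ending + continuing == 0:
        # Assumption: treat the degenerate box score as an error.
        raise ValueError("FMwt is undefined for this DFG% / DOR% pair")
    return ending / (ending + continuing)


def personal_stops(stl, blk, drb, fmwt, dor):
    """Steals plus weighted blocks plus weighted defensive rebounds."""
    return stl + blk * fmwt * (1 - 1.07 * dor) + drb * (1 - fmwt)


def team_stops(mp, team_mp, pf, team_pf, opp_fga, opp_fgm, team_blk,
               opp_tov, team_stl, opp_fta, opp_ft, fmwt, dor):
    """Player's share of unblocked misses, unstolen turnovers, and missed FTs."""
    unblocked_misses = (opp_fga - opp_fgm - team_blk) / team_mp
    unstolen_tov = (opp_tov - team_stl) / team_mp
    on_court = (unblocked_misses * fmwt * (1 - 1.07 * dor) + unstolen_tov) * mp
    if opp_fta == 0 or team_pf == 0:
        # Assumption: no free throws or no fouls means no free throw term.
        return on_court
    ft_miss_rate = 1 - opp_ft / opp_fta
    return on_court + (pf / team_pf) * 0.4 * opp_fta * ft_miss_rate ** 2


def total_stops(personal, team):
    """Total stops are the sum of the personal and team parts."""
    return personal + team
```

The minutes ratio `mp / team_mp` is what spreads the team-level events uniformly over time, so a player who is on the floor for an entire game with four teammates carries a factor of 0.2.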

Let’s see how these components tie together with the Bucks example from above.

In the play above between the Bucks and the Pistons, we saw that there was indeed one stop on the play. We’d like to give much of the credit to Brook Lopez, but how much credit does he, and his teammates, deserve? Let’s start naively and suppose the entire game lasts one possession for illustration purposes.

In this case, we compute **DOR%** to be **0** as there are no offensive rebounds and **DFG%** to be **0** as there are no made field goal attempts. This will cause stress in the computation of **FMwt** as the denominator will become **0*1 + 1*0 = 0**. As this is a box score estimate, we should require a complete box score for this play. So let’s go back and leverage the teams’ box score stats.

For this Pistons-Bucks game, the Pistons were 42-89 from the field for **.472**. The Pistons also secured 10 offensive rebounds out of 43 possible rebounds. Note that we are skirting the true rebound total as the actual NBA box score does not list team defensive rebounds. This gives us an estimated **.233 DOR%**. Now, we ascertain **FMwt** to be **.746**.

Since the entire team played this segment without breaks, for this particular play, we will have a factor of **0.2** on the unstolen turnovers. However, there are **zero unblocked FGA’s**, **zero unstolen turnovers**, and **zero personal fouls**. Therefore the team stops for this particular play is zero. This means that all contributions are personal driven.

For Brook Lopez, we recorded one block on the play. This translates to a personal stop value of **.5600**.

As Eric Bledsoe obtained the rebound, he also contributes significantly to the stop. In this case, Bledsoe’s personal stop value is **.254.**

For Wilson, Hill, and Middleton, they obtained no steals, blocks, or defensive rebounds on the play. In this case, they all come up Milhouse with a value of **.000**. What this ultimately means is that the credit for the stop comes out to be **.814**, slightly shy of the entire **one stop**.
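As a quick arithmetic check, the single-play values can be reproduced in a few lines of Python. This is only a sketch: it assumes the rounded team rates quoted above (.472 and .233) and the 1.07 adjustment from the Kubatko breakdown, and the variable names are illustrative.

```python
# Team-level rates for the game, rounded as quoted in the text.
dfg = 0.472   # Pistons field goal percentage (DFG%)
dor = 0.233   # Pistons offensive rebounding percentage (DOR%)

# Forced miss weight from the two rates.
fmwt = dfg * (1 - dor) / (dfg * (1 - dor) + (1 - dfg) * dor)

# Personal stops on the play: Lopez has one block, Bledsoe one defensive rebound.
lopez = 1 * fmwt * (1 - 1.07 * dor)
bledsoe = 1 * (1 - fmwt)
total_credit = lopez + bledsoe

print(round(fmwt, 3), round(lopez, 3), round(bledsoe, 3), round(total_credit, 3))
```

Rounded to three places, this recovers the .746 weight, the .560 and .254 personal stop values, and the .814 total credit.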

Now, granted, this is a **box score result**. Therefore, the game should be complete before we make the estimates. Applying this to one play is unfair to the analytic.

By using the box score, we are able to extract out the estimated number of stops in the game. In this case, we have the following:

In case it is too difficult to read, this suggests that **Giannis Antetokounmpo **obtained 5.898 personal stops with 3.726 team stops for a total of 9.624 stops; leading the team for the night. Brook Lopez, on the other hand obtained 3.204 personal stops with 3.592 team stops for a total of 6.796 stops; good for second best on the team.

Continuing in this manner:

- Giannis Antetokounmpo: **9.624**
- Brook Lopez: **6.796**
- Eric Bledsoe: **6.180**
- George Hill: **6.111**
- Khris Middleton: **5.046**
- Tony Snell: **3.733**
- Pat Connaughton: **3.041**
- Ersan Ilyasova: **2.997**
- DJ Wilson: **2.474**
- Christian Wood: **0.000**

This would suggest there were a total of 46.003 stops in the game. But how would we actually verify this?

The easiest way is to crawl through play-by-play. By doing this, we find that there are exactly 12 stops in the first quarter, 9 stops in the second quarter, 11 stops in the third quarter, and 8 stops in the fourth quarter for **40 stops in total**, identifying six over-estimated stops for the game. This means that despite the box score underestimating the number of stops for a single possession, the estimated number of stops is actually higher across the entire game. **This is not always the case**.
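The comparison between the box score estimate and the play-by-play count is easy to script. The sketch below simply hard-codes the per-player totals and quarter-by-quarter counts quoted above; the variable names are illustrative. Because the listed values are rounded to three places, their sum lands at 46.002, within rounding of the 46.003 quoted earlier.

```python
# Estimated total stops per player, as listed above (rounded to three places).
box_score_stops = {
    "Giannis Antetokounmpo": 9.624,
    "Brook Lopez": 6.796,
    "Eric Bledsoe": 6.180,
    "George Hill": 6.111,
    "Khris Middleton": 5.046,
    "Tony Snell": 3.733,
    "Pat Connaughton": 3.041,
    "Ersan Ilyasova": 2.997,
    "DJ Wilson": 2.474,
    "Christian Wood": 0.000,
}

# Stops counted by hand from play-by-play, one entry per quarter.
pbp_stops_by_quarter = [12, 9, 11, 8]

estimated = sum(box_score_stops.values())
observed = sum(pbp_stops_by_quarter)
overestimate = estimated - observed

print(f"estimated={estimated:.3f} observed={observed} diff={overestimate:+.3f}")
```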

This is actually expected as box score analysis is coarser than play-by-play analysis. However, if we shift our focus to play-by-play, what are some methods we can use to determine credit for stops?

One way is to count the number of stops throughout the course of the game, then fit the defensive statistics to the number of stops and perform a “regression” of sorts. This will give answers, but they will be quite volatile.

The next way is to simply assign credit to each player based on their stats. We can walk through each possession, and if a stop occurs, we can either **blindly set attribution to 0.2** per player (uniform credit) which will drastically undervalue real defensive stoppers, or we can **weight** defensive statistics. For instance, in the example above, Brook Lopez gets the block and Eric Bledsoe gets the rebound. Let’s credit them with **.5 each**.
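Both attribution rules are simple enough to sketch in Python. The helper names below are hypothetical, and splitting credit equally across recorded events is just one possible weighting, chosen here because it reproduces the .5 each split for the Lopez block and Bledsoe rebound.

```python
def uniform_credit(on_court):
    """Blind attribution: every defender on the floor gets an equal share."""
    share = 1 / len(on_court)
    return {player: share for player in on_court}


def stat_weighted_credit(on_court, events):
    """Split one stop equally across the recorded defensive events.

    `events` is a list of (player, stat) tuples logged on the possession.
    Falls back to uniform credit when nothing was recorded (an assumption).
    """
    if not events:
        return uniform_credit(on_court)
    credit = {player: 0.0 for player in on_court}
    share = 1 / len(events)
    for player, _stat in events:
        credit[player] += share
    return credit


bucks = ["Lopez", "Bledsoe", "Wilson", "Hill", "Middleton"]
play_events = [("Lopez", "BLK"), ("Bledsoe", "DREB")]

blind = uniform_credit(bucks)
weighted = stat_weighted_credit(bucks, play_events)
```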

But in doing this, we drastically underestimate the amount of contribution supplied by Wilson, Hill and Middleton. If we recall, Khris Middleton’s hold at the nail as Brown attempted to pull the defense, along with Wilson’s drop and Hill’s slip on the fight-through, stopped the primary option from occurring. If Middleton blindly follows Brown on the deep cut, Detroit sets itself up with a 1-on-2 with Galloway/Drummond on Lopez.

Furthermore, how do we credit Bledsoe’s gaffe that ultimately led to a pass to Robinson III for a layup? Should Lopez get more credit for the stop because of his read of the play and ensuing block? How do we put some of the onus on Robinson III for not taking a floater and instead crazy-braving himself into a 7′ tall shot blocker?

There’s only really one way: measuring the decision making process of defenders.

To understand how stops are created, we really need to dig deep into the X’s and O’s of a defense in response to an offense. Ultimately the game of basketball boils down to an offense making a series of decisions in an effort to force the defense to lose synchronization and open up regions of the court where there is a high probability to score. It’s a chess match where the offense primarily dictates the motion.

The defense, in response, can only implement counter-actions to force an offense to make poor decisions. In this vein we measure defensive contribution through the defender’s ability to move the offense into low probability areas of interest. This, by the way, is an open thread of research:

One way to start crediting stops is to look at Detroit’s early offense. Recall this was a DHO between Griffin and Galloway after a reversal and weak-side deep cut from Brown. Middleton’s hold on the nail along with the slip by Hill eliminated the driving lane, which may have been there in the past. Therefore, we can look back at all initializations of this early offense (across all roles) and see the various directions in which play proceeded. One way to do this is through **ghosting** to train the **average defensive response**. Then using the ghosting output, we can build a Markov model that estimates the decisions made by the offensive team. Using the ghosting/Markov model, we obtain a **probability distribution** on the actions. And we find out in this case, based on this year, the Pistons tend to score an effective 1.02 points off that action.

By the two actions from Hill and Middleton, Detroit’s expected points scored on the play dips to 0.83 points. **That’s a positive 0.19 differential.** If we remove Middleton, attach him as a tether to Brown by implementing a **Brown Position – Brown Velocity + Noise** model, and run the Markovian model, Detroit’s expected point value increases to 1.11 points. Therefore that **Middleton action may have seemed meaningless**, but it saved the Bucks potentially **0.28 points**.
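The two differentials are plain subtractions of expected point values, which can be written out as a tiny Python sketch. The expected point values are the ones quoted above; the function name is hypothetical, and the ghosting and Markov models that would actually produce these values are not reproduced here.

```python
def points_saved(epv_without_action, epv_with_action):
    """Expected points the defense saves by taking an action."""
    return epv_without_action - epv_with_action


typical_epv = 1.02          # Pistons' usual value off this action
actual_epv = 0.83           # after the Hill and Middleton actions
counterfactual_epv = 1.11   # simulated: the nail defender trails Brown instead

saved_vs_typical = points_saved(typical_epv, actual_epv)
saved_vs_counterfactual = points_saved(counterfactual_epv, actual_epv)

print(round(saved_vs_typical, 2), round(saved_vs_counterfactual, 2))
```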

The challenge then becomes “how do we integrate these components on defense?” For instance, Detroit’s expected point value actually increases with the blitz from Eric Bledsoe; from 0.92 points to 1.08 points. Fortunately, Lopez’s show and Robinson III’s extra step drops the value down to 0.98 before the shot, which is ultimately blocked.

If we integrate out the actions, we flatten out most of the work performed. Therefore, some form of localization between defenders needs to be identified. In the end, Lopez, Hill, and Middleton should get most credit for the play as they thwarted the primary option (Hill/Middleton) and then eliminated a gaffe on a back-turning blitz (Lopez).

And on a further note, the next question is whether the **template of the defensive scheme** is the real stopper in this situation as Middleton and Lopez play their roles correctly. How do we quantify this effect? How much credit does Budenholzer get for this? Does he deserve credit?

It’s definitely a real challenge. But if you can figure this out, I’ll see you at the next Sloan Conference presenting your work. For now, we rely on carefully thought out work by Dean Oliver, as missing six stops isn’t bad at all. The next game will be -3, the next 1. It’s all an approximate process holding a place for when we figure out how to better quantify the X’s and O’s.

For instance, how well does a player protect the ball? I pick this category because I’ve had a long belief that a turnover is as bad as a missed field goal attempt with a defensive rebound. They serve the same purpose as no points are scored while the ball falls back into the opponent’s possession. Due to this, my clunky version of computing **adjusted field goal percentage** back in 1997 would divide by FGA + TOV. I hadn’t thought of “points per possession” as a high school kid. Despite this philosophy, we have seen that all turnovers are not created equal, as loose ball turnovers can lead to fast breaks much more often than an offensive foul turnover, or a “kick-the-ball-20-rows-deep” turnover.

In our quest to break down turnovers, we found some much lesser known turnover types. In this post, we look at the distribution of turnovers, describe some of the lesser known types, and then take a look at a select few players with respect to their distributions of turnovers.

As of this morning (27 January 2019) there have been **21,280 turnovers**. That sounds like a lot; however, there have been a total of 734 games played, for an average of 29 turnovers a game. That breaks down to 14-15 turnovers per team per game.

The most common type of turnover is the **Bad Pass**. This type of turnover is a **live ball** turnover and has occurred 7570 times throughout the season. The second most common type of turnover is the **Lost Ball**, yet another **live ball** turnover. This occurred 3995 times during the season. This means that at least 11,565 of the 21,280 turnovers, **over half**, are live-ball turnovers that potentially turn into fast breaks for opponents.

After we see **54.34%** of turnovers become live ball turnovers, we then see a flurry of **dead ball** turnovers, such as the **Offensive Foul** (2790 times), **Bad Pass: Out-of-Bounds** (2357 times), **Traveling** (1479 times), and **Lost Ball: Out-of-Bounds** (1133 times). In total, these make up 7759 turnovers, resulting in **36.46%** of turnovers.

After these two collections of turnovers, we run into the **shot clock violation**, which has occurred 791 times over the course of the season, or approximately 1 per game. These turnovers occur when a team runs out of time on the shot clock before attempting a field goal that hits the rim. **Note:** a deflected pass off the rim does not reset the shot clock.
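The shares and per-game rates above follow directly from the counts. Here is a short Python sketch that recomputes them from the quoted totals; the grouping into live ball and dead ball dictionaries mirrors the text, and the variable names are illustrative.

```python
total_turnovers = 21_280
games_played = 734

live_ball = {"Bad Pass": 7570, "Lost Ball": 3995}
dead_ball = {
    "Offensive Foul": 2790,
    "Bad Pass: Out-of-Bounds": 2357,
    "Traveling": 1479,
    "Lost Ball: Out-of-Bounds": 1133,
}
shot_clock_violations = 791

live_total = sum(live_ball.values())
dead_total = sum(dead_ball.values())

live_pct = 100 * live_total / total_turnovers
dead_pct = 100 * dead_total / total_turnovers

per_game = total_turnovers / games_played
per_team_game = per_game / 2
shot_clock_per_game = shot_clock_violations / games_played

print(f"live: {live_total} ({live_pct:.2f}%), dead: {dead_total} ({dead_pct:.2f}%)")
print(f"{per_game:.1f} per game, {per_team_game:.1f} per team per game")
```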

Despite running through a total of seven types of turnovers, we still have at least another **seventeen types** of turnovers to monitor. Some are quite obvious, but rare: **offensive goaltending**, **backcourt violation**, **double dribble**, and **Kicked Ball**. However, there are a couple rather little known types of turnovers such as the **Illegal Assist**, the **Illegal Screen**, the **Punched Ball**, and the **No Turnover**.

Yes, the “No Turnover” Turnover.

Before we discuss the rare type turnovers, here is the distribution of turnovers as of this morning:

The “No Turnover” turnover occurs when possession of the basketball is lost prior to a field goal or free throw attempt **but the opponent does not gain possession of the ball**. That’s right, there’s a turnover category where the opponent does not gain possession of the ball. Let’s take a look at the nuance of this turnover type.

In the October 19th match-up between the New Orleans Pelicans and the Sacramento Kings, **Julius Randle** became one of the first players to pick up the **No Turnover** turnover. In this play, **Darius Miller** is guarding **Justin Jackson** on a drive to the basket that resulted in a miss. Just before Miller secures the rebound, referee **Sean Corbin** calls **Julius Randle** for a loose ball foul.

With the placement of the basketball, the fact that the Kings had given up possession with a missed field goal attempt, and the positioning and timing of the foul, the ball was deemed a **defensive rebound to the Pelicans** without securing the ball. This resulted in a **turnover** as a defensive rebound identifies transfer of possession to the Pelicans, despite the Pelicans never having possession of the ball.

In a similar play during a November 16, 2018 match-up between the Brooklyn Nets and the Washington Wizards, **Joe Harris** picked up a **No Turnover** turnover when committing a loose ball foul against Washington’s **Bradley Beal** after **D’Angelo Russell** attempted a field goal.

Again, the ball was ruled as an **offensive team rebound** as the foul occurs during the loose ball scramble, which results in two free throws for Washington. In this case, a new **chance** continues within the offensive possession, however no field goal or free throw attempt is credited before Washington gets a chance to shoot.

What sets this example apart from the one above is that in the Pelicans’ case the fouler was on defense, while in the Nets’ case the fouler was on offense. What this shows is that, for a loose ball rebound foul after a field goal attempt, it is common for play-by-play to mark a team rebound for the fouling party with a **No Turnover** turnover.

Loose ball fouls on field goal attempts are not the only kind of no turnover turnovers. In fact, there are situations where a No Turnover turnover occurs and no team loses possession. Consider the NBA’s rule book video example. In this case, the possession never really ceases for the red team, but only one turnover is listed.

In this case, Portland never gains possession of the ball, but a turnover is noted. This is listed as **2 possessions with one turnover** according to possession counting. Some may look at this as **one possession with no turnover** despite one being listed. Others may look at this as **three possessions with one phantom turnover**. Just keep this in mind as you look at team-to-team possession counting, as this will add in an extra possession and potentially shift an identifier of which team has the basketball, depending on your methodology of possession counting.

The illegal assist is a fun turnover to track. This turnover type identifies players that hang on the rim in an effort to use the rim to assist for a rebound. This has only occurred three times so far this season. But why not enjoy the beauty, and the potential hazards if you’re **Derrick Jones Jr.**?

Not to single out Jones Jr., **Reggie Jackson** and **Jerami Grant** are the other two culprits to pull this stunt this year.

There are other types of odd-ball turnovers, which also include one instance this season of **Excess Timeout**, which occurred during a Dallas Mavericks versus Oklahoma City Thunder game on December 31, 2018. In this game, at 6:43 in the 4th quarter with the Mavericks trailing, Steven Adams tipped in a missed Russell Westbrook field goal attempt. Rick Carlisle immediately calls a timeout, which he unfortunately does not have.

Despite having the time for a Thunder t-shirt toss game break to discuss things over with his team, Carlisle’s gaffe cost the Mavericks their ensuing possession, resulting in a Westbrook technical free throw.

If the ball is ever going to be turned over, ideally a team would prefer that the turnover be a dead ball situation, allowing the defense to reset and force the opposing team into a half-court possession. Looking at the turnovers across the league, we find that the following players have the highest rates of turnovers that result in live ball turnovers.

Notice who is missing from the Top 25 players? In fact, the **Atlanta Hawks** lead the league in live ball turnover percentage, which is one of the primary reasons they fall behind in games. That is, **533** of their **868** turnovers are live ball, resulting in potential fast breaks, for an astonishing **61.4%** of turnovers. Compare that to the Toronto Raptors’ 56%, Minnesota Timberwolves’ 51%, Golden State Warriors’ 54%, Brooklyn Nets’ 54%, and even the Chicago Bulls’ 54%, and you begin to see that they are well ahead of other teams when it comes to live ball turnover rate. The third place team on this list, the **Cleveland Cavaliers**, only sit at 57%. The second place team, the **Houston Rockets**, settle in at **59.96%**, but have committed fewer than **650 turnovers** compared to Atlanta’s 868.

Playing the analytics game of rates versus counts, that’s a differential of **three fewer live ball turnovers a game for Houston** when comparing the two teams and their rates.

On the flip side, by inverting the live-ball list, we obtain the Dead Ball Turnover “Specialists.” These players tend to kill the clock when turning over the ball. While a team would prefer to avoid turning the ball over, these players at least give their team a chance to set their defense up.

Notice that these players are primarily **post players**. This makes sense as their turnovers tend to be loose ball fouls, offensive fouls, and lost balls out of bounds. Some highlighted players are **Aaron Gordon** of Orlando, **Giannis Antetokounmpo** of Milwaukee, **PJ Tucker** of Houston, **Kris Dunn** of Chicago, and **Jayson Tatum** of Boston. These players all have significant “touch time” at the perimeter and drive to the basket, yet their turnovers tend to result in dead ball situations.

Since we are in the middle of a historic run, let’s take a look at James Harden of the Houston Rockets. According to Basketball Reference, Harden is 58-143 from between 3-and-10 feet, 22-42 from between 10-and-16 feet, 6-20 from long-range two’s, and 218-583 from beyond the arc. This leads Harden to have a sample expectation of **1.0482 points per non-rim FGA**. Recall that we come to this number by computing the **effective Field Goal percentage** over the regions of interest and multiplying this number by two.

In comparison, through January 25th the league has taken a total of 115,241 non-rim FGA. Of these, 37,174 came between 3-10 feet, with 14,817 of them converted. Similarly, the league is 8491-for-20,731 from 10-16 feet, 4942-for-12,321 from 16-to-3pt, and 15,963-for-45,015 from three-point range. This leads to a league average of **0.9058 points per non-rim FGA.**

Hitting Harden’s 788 non-rim attempts from the field, we see that Harden is a whopping **+112.2112 points over league average** on shooting attempts.

If we compare Harden to his MVP “nemesis” Russell Westbrook, we find that Westbrook’s numbers are 10-for-63 from 3-10 feet, 41-for-130 from 10-16 feet, 47-for-123 from 16′-to-3pt, and 46-for-189 from three-point range. This leads to an estimated expected **0.6614 points per non-rim FGA.** Yikes. This leads to a **-123.43 points over league average**. Pay attention to the negative in that statement. Read that as Westbrook is one of the most detrimental “shooters” in the league. This is consistent with Fromal’s analysis last season as Westbrook was second on the list for the 2017-18 NBA season.

If we turn to Klay Thompson, we find an entirely different story. This season, Thompson has a shot distribution of 25-for-63 from 3-10 feet, 58-for-134 from 10-16 feet, 104-for-218 from 16′-3pt, and 138-for-363 from beyond the arc. This leads to an estimated expected **1.0129 points per non-rim FGA.** Comparing Thompson’s efficiency and volume relative to the average shooter in the league, we find that Thompson is much like Harden in picking up a **+83.2876 points over league average**.
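For anyone who wants to rerun these numbers, here is a minimal Python sketch of the points per non-rim FGA and points over league average calculations, using the makes and attempts quoted above. The helper name and data layout are illustrative. The rates match the quoted values to four decimals; the points over average land within a few tenths of the quoted figures because this version carries unrounded rates through the multiplication.

```python
def points_per_fga(zones):
    """Return (points per attempt, attempts) for (makes, attempts, value) zones."""
    points = sum(makes * value for makes, _att, value in zones)
    attempts = sum(att for _makes, att, _value in zones)
    return points / attempts, attempts


# Zones in order: 3-10 ft, 10-16 ft, 16 ft to 3pt, three-point range.
league = [(14817, 37174, 2), (8491, 20731, 2), (4942, 12321, 2), (15963, 45015, 3)]

players = {
    "James Harden": [(58, 143, 2), (22, 42, 2), (6, 20, 2), (218, 583, 3)],
    "Russell Westbrook": [(10, 63, 2), (41, 130, 2), (47, 123, 2), (46, 189, 3)],
    "Klay Thompson": [(25, 63, 2), (58, 134, 2), (104, 218, 2), (138, 363, 3)],
}

league_rate, league_attempts = points_per_fga(league)

results = {}
for name, zones in players.items():
    rate, attempts = points_per_fga(zones)
    # Volume times the gap to league average gives points over average.
    results[name] = (rate, attempts, attempts * (rate - league_rate))

for name, (rate, attempts, over) in results.items():
    print(f"{name}: {rate:.4f} pts/FGA on {attempts} FGA, {over:+.1f} vs league")
```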

Note that we focus on volume of shots to separate out **shooters** from non-shooters who happened to have luck on their side.

And before we continue, we selected Klay Thompson instead of Stephen Curry for a very specific reason. For those who may be interested, Stephen Curry leads the league in points per non-rim attempt (at high volume) with a phenomenal **1.2187 points per non-rim FGA**. This leads to a yet-again league leading **+197.4402 points over league average** when considering his volume.

Given the three players above: James, Russ, and Klay, we have identified three different types of shooters:

Harden is the **playmaking scorer-shooter combo**. This type of player generates their own points and can tear apart a team from long range. This is the deadliest type of player in the league. Defenses have to make conscious decisions on whether to guard the drive, guard the pullup/stepback, whether to blitz/double and leave another shooter potentially open, or have to leave the shooter in off-ball situations in help defense.

If we think of the **scorer-shooter combo**, there are three levels of this player despite doing both. Harden is a **SCORER-shooter** while Curry, mentioned above, is more of a **scorer-SHOOTER**. Something we will touch on later.

Westbrook is the **playmaking scorer**. Westbrook is a high-usage player due to his ability to get to the rim and collapse defenses. Not known for his shooting touch, Westbrook shoots just enough, call it “Marcus Smart enough”, to make defenses think twice before giving him space at the perimeter. Westbrook generates offense more through his scoring abilities but will tend to lose games if forced to take all the big shots outside of 3-feet. Hence the reason for Paul George’s over-the-top strong emergence this season; reminding us of the Indiana days of PG13.

Klay Thompson is the **shooter**. This type of player is a pure shooter that can pick apart a team at any time they want. Sure, Thompson can generate points on his own, but he’s best utilized as an off-the-ball catch-and-shoot monster that can put up 20-30 points in a hurry. He is the perfect complement to a **playmaker** such as Stephen Curry or Russell Westbrook.

**Side Note: **If you are unsure of the difference between a shooter and scorer, feel free to have a discussion in the comments. This is a very important distinction that is made when discussing players around the league (and has been for well over a decade).

Now suppose we are interested in evaluating three players that are respective teammates to Russell Westbrook, James Harden, and Klay Thompson. Suppose these players are considered equivalent defensive players. And furthermore, to constrain the problem, suppose they play the same number of possessions as each other with their respective teammates, playing identical opponents, and have identical net ratings.

We’d like to ask, **which of these three smaller-fish players is most important to their offense?** And it’s here where the “missing-ness” of stats rears its ugly head. This one being the **missed FGA off a pass**, also known as the **potential assist**.

A **potential assist** is a situation where a ball-handler makes a pass to a player who takes a field goal attempt within the amount of time and effort required for earning an assist, called the **assist window**, had the field goal been converted. Tracking assists is easy. When a field goal is made, the play-by-play logs track down who the passer was, if there was a passer within the assist window. However, when a field goal is missed, the assist field is zeroed out as no assist was made. Tracking these potential assists is relatively easy; it just isn’t done.

Instead, we are forced to look at other methods for determining a potential assist. For instance, we can look at **tracking data** and surmise a **filtering algorithm** akin to extracting passes. But for assists does that actually work? Let’s look at what the league has to say about assists:

An assist is a pass that directly leads to a basket. This can be a pass to the low post that leads to a direct score, a long pass for a layup, a fast break pass to a teammate for a layup, and/or a pass that results in an open perimeter shot for a teammate. In basketball, an assist is awarded only if, in the judgement of the statistician, the last player’s pass contributed directly to a made basket. An assist can be awarded for a basket scored after the ball has been dribbled if the player’s pass led to the field goal being made.

Therefore, unlike passes, there is no distinct rule-based definition of what constitutes an assist. It is literally defined as a **subjective statistic**, which can be defined differently across different teams. Therefore, we cannot easily place a rule-based mechanism like we did in the past for passes, after all. Instead, we turn to the work of **machine learning**.

Ultimately, we need to know whether passing to James Harden, Russell Westbrook, or Klay Thompson is going to improve a teammate’s chances of receiving a reward such as an **assist** for a made basket or a bump in **points produced**, therefore increasing their **offensive rating**. By looking at the hard numbers above, if we all wanted to pad our stats then we’d all want to be Klay Thompson’s or James Harden’s teammate. Or would we?

In an effort to build a potential assist model, let’s apply a **supervised learning** technique to help introduce **labels** and **training** into our system. Fortunately, we have a sample of labels already gathered for us through the play-by-play assist. To start, we can walk through every **made field goal attempt** and split them into two classes: **assisted field goals** and **un-assisted field goals**. Using a “0/1” label as our **response variable**, we can employ some sort of model to identify the differences between certain **explanatory variables** such as dribbles taken, feet traveled, seconds between pass and shot, etc. The goal is to understand, for example, whether the passer **could** be credited with an assist if a player takes two dribbles after receiving the pass.

Immediately, to the novice user, a **logistic regression** model comes to mind since the response is binary. However, one issue that arises with logistic regression is that we must **assume that the log-odds are conditionally linear, with zero multicollinearity, across all the explanatory variables**. More importantly, this conditional model must satisfy the **exponential family assumptions** in the log-odds space, which, unfortunately, usually fails in basketball analytics.

Next, we could leverage a **neural network** to do our dirty work for us. And indeed we could. However, we have a better idea for teaching some neural networks in a future posting, and why not go crazy in learning something entirely different…

A fairly flexible methodology in classification is the **support vector machine (SVM)**. In practice, this is called a **separating hyperplane** algorithm that aims to take the explanatory variables and **split** the classes using hyperplanes until all classes are split into uniform regions. Let’s look at a really basic example.

Suppose we sample **1,000 points** within the unit square with a decision boundary decided by some **4th-order polynomial**. Anything below the polynomial is considered **class 1** while anything above the polynomial is considered **class 2**. Given the 1,000 samples, we can easily see the boundary:

To show we’re not hiding any cards up our sleeves, here’s the plotted decision boundary between the two classes:

And you can even try this at home:

```python
import random

import matplotlib.pyplot as plt
import numpy as np

x = [[], []]
y = np.array([])
cols = []
for i in range(1000):
    # Sample a point uniformly from the unit square
    p = random.random()
    q = random.random()
    # Height of the polynomial decision boundary at p
    boundary = .5 - (124./15.)*p + 44.*p*p - (1016./15.)*p*p*p + 32.*p*p*p*p
    if q < boundary:
        y = np.append(y, 0)  # below the curve
        cols.append('blue')
    else:
        y = np.append(y, 1)  # above the curve
        cols.append('green')
    x[0].append(p)
    x[1].append(q)

# Evaluate the boundary on a grid so we can draw it
dots = np.linspace(0, 1, 100)
bounds = np.zeros(100)
for i in range(100):
    p = dots[i]
    bounds[i] = .5 - (124./15.)*p + 44.*p*p - (1016./15.)*p*p*p + 32.*p*p*p*p

plt.plot(dots, bounds)
plt.scatter(x[0], x[1], c=cols)
plt.show()
```

Now, if we apply a Logistic Regression, we obtain the following results:

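```python
from sklearn.linear_model import LogisticRegression

# Stack the sampled coordinates into a (1000, 2) feature matrix
X = np.array(x).transpose()
clf = LogisticRegression(solver='lbfgs').fit(X, y)
yhat = clf.predict(X)
```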

And we find that we have a success rate of approximately 75% in correctly classifying the points! That’s actually not too good given we can easily see the boundary. This terrible result comes from the fact that this particular boundary problem and associated distribution requires a **curved exponential family** to improve on its boundary. That is, we’d have to develop a weighting scheme in order to satisfy the assumptions of the logistic regression. In two dimensions, this is rather straightforward. However, in multiple dimensions, we get into a lot of trouble as we cannot view the results.
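To put a number on that success rate yourself, here is a minimal self-contained sketch. It regenerates points with the same polynomial boundary as the sampling code above and scores the fit with scikit-learn's `accuracy_score`. The NumPy seed and the `boundary` helper are assumptions added here for repeatability, so the exact percentage will move a little with the random draw.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def boundary(p):
    # Same polynomial boundary as in the sampling code above
    return .5 - (124./15.)*p + 44.*p**2 - (1016./15.)*p**3 + 32.*p**4


rng = np.random.default_rng(2019)  # arbitrary seed, only for repeatability
X = rng.random((1000, 2))          # columns: p (horizontal), q (vertical)
y = (X[:, 1] >= boundary(X[:, 0])).astype(int)  # 0 below the curve, 1 above

clf = LogisticRegression(solver='lbfgs').fit(X, y)
accuracy = accuracy_score(y, clf.predict(X))
print(f"In-sample accuracy: {accuracy:.1%}")
```

Because a logistic regression on the raw coordinates can only draw a straight line through the square, no choice of coefficients can trace the wiggle in the boundary.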

A **support vector machine** will look for a collection of separating hyperplanes to partition the two classes. In the two-dimensional case, we will identify segments of straight lines that partition the data. If we assume a linear boundary, this will give us the best fitting “linear model”:

However, we don’t restrict ourselves to the linear model in SVMs. We actually employ what are called **kernels**, which give weight to each data point. When paired with the potential separating hyperplane and the observed classification label (assist or non-assist), we obtain a **“linear”** boundary as such:

The image on the left shows the “Logistic Regression” type model with a linear discriminant. The image on the right shows the learned “linear boundary” from SVMs. (Image from Elements of Statistical Learning)

If we apply this to our scheme, we find we obtain a much better classifier.

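```python
from sklearn import svm

# RBF-kernel SVM; gamma controls how local each point's influence is
clf2 = svm.SVC(kernel='rbf', gamma=10)
clf2.fit(X, y)
yhat2 = clf2.predict(X)
```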

Here we applied a **radial basis function** as a kernel and settled on a value of **10** for `gamma`, the smoothing parameter of the radial basis function. Selecting this parameter should be performed by **cross-validation**. In this case, the value of 10 from one-fold cross-validation gave us an average error of **0.03%**. Much better than the **25%** from logistic regression. **And this was on well-separated data.**
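For readers who want to try the cross-validation step, here is a minimal sketch using scikit-learn's `GridSearchCV`. The candidate grid, the 5-fold split, and the seed are illustrative assumptions and not the settings used for the numbers quoted above, so expect the scores to differ.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def boundary(p):
    # Same polynomial boundary as in the sampling code above
    return .5 - (124./15.)*p + 44.*p**2 - (1016./15.)*p**3 + 32.*p**4


rng = np.random.default_rng(2019)  # arbitrary seed, only for repeatability
X = rng.random((1000, 2))
y = (X[:, 1] >= boundary(X[:, 0])).astype(int)

# Candidate smoothing values (an illustrative grid)
param_grid = {"gamma": [0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_gamma = search.best_params_["gamma"]
best_score = search.best_score_
print(f"Best gamma: {best_gamma}, mean held-out accuracy: {best_score:.3f}")
```

Scoring on held-out folds, rather than on the training points themselves, is what keeps a very large `gamma` from looking better than it really is.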

Crediting an assist to a made field goal is not a well-separated distribution. There have been several instances where a play will be credited an assist for one player, but the same action may not be credited for another player. In these cases, this boils down to the differences in judgement between two different crediting statisticians. Using the assist crediting for converted field goals, we can train an SVM model to identify key features for determining an assist when a FG is made.

Many of these features need to be teased out of tracking data and, unfortunately, due to the exclusivity of the data, I cannot share the code or even the data itself. However, if you get your hands on tracking data, you can test out some of these features. Note that in these results, we will use the notation of **class zero** being **no assist on attempt** and **class one** being **assist on attempt**. Here are the primary features that yielded great results:

Yes, this is an obvious one. Passes are highly correlated to assists. And due to this we can immediately set field goal attempts where no passes were made to class zero. This is a well-separating feature and is by far the most dominant feature in determining assists.

The second feature that separates the classes well is the number of dribbles. And it’s also this one that starts to make situations a little mixed in the results. In fact, this season, there have been a couple of assists generated off of three or more dribbles after the pass. For the most part, it’s effectively **one or zero** dribbles. Due to this draw down, there’s some room for error in predicting an assist.

We can also measure the amount of time between receiving a pass and taking a field goal attempt. The significant range lies within the first 1.5 seconds of a shooter receiving the ball. A lot can happen in 1.5 seconds of action. Despite this, we do find a significant bulk of assists lie (softly) around this boundary. This feature is the third most significant feature and actually gets tangled up with the above feature and the next most significant feature.

Through film study, we notice that a player who “swings” their velocity impacts whether an assist gets credited. A “swing” in this case is when the player’s velocity vector swings from **going along the axis of a field goal attempt** into an **entirely different direction**. Just like a swing.

We use the axis notation as a player may be slowing in their direction towards the basket. And in fact, it’s not the player we measure, but rather **the basketball**. The example is given as a pass into the post. A player who catches the ball may be on the run, and hence their velocity vector is pointed at the basket.

In the case of a turn-around, the velocity vector will point away from the basket, but along the same arc. These tend to be credited as assists as well. However, if the player makes an extra move, then the pass may no longer be credited as an assist. For example, a player may stop and pump fake. Or the player may perform a cross-over or spin move. It’s at these points where the judgement begins to become mixed.

Despite the judgement, we see the velocity vector of the basketball start to become **orthogonal** to the direction of the basket, which indicates a **basketball move** is occurring and the assist is more than likely going to evaporate.

Therefore, a velocity swing combines the **cosine of the angle** between the player’s velocity vector and the direction of the basket with the cosine of the angle between the basketball’s velocity and the player’s velocity. **Note that this value is always between 0 and 1**. If we integrate the cosines over the time between reception and attempt (feature three), we obtain our total amount of **velocity swing**. Small values of these lead to assists.

Using these features and a leave-one-out cross-validation, we obtain a 98.77% recall rate of crediting an assist when a field goal attempt is made. Not too shabby! This means we will typically mess up 1-3 shots per game, as teams tend to shoot between 150 and 200 shots combined over the course of a game. We can live with this as, after all, assists are subjective to begin with.
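Since the tracking data cannot be shared, the following is only a sketch of how a leave-one-out recall figure like the one above can be computed. Everything about the data is synthetic: the two features (dribbles and touch time), the labeling rule loosely based on the one-dribble and 1.5-second observations, and the 5% label noise are all hypothetical stand-ins. The pass and velocity-swing features are left out entirely.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for two of the features discussed above
dribbles = rng.poisson(1.0, size=n)         # dribbles taken after the pass
touch_time = rng.exponential(1.2, size=n)   # seconds between catch and shot

# Hypothetical crediting rule with 5% disagreement between "statisticians"
clean = (dribbles <= 1) & (touch_time < 1.5)
flip = rng.random(n) < 0.05
assisted = np.where(flip, ~clean, clean).astype(int)

X = np.column_stack([dribbles, touch_time])
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))

# Each shot is predicted by a model trained on all of the other shots
pred = cross_val_predict(model, X, assisted, cv=LeaveOneOut())
recall = recall_score(assisted, pred)
print(f"Leave-one-out recall on synthetic data: {recall:.3f}")
```

Leave-one-out refits the model once per sample, so on a full season of shots a k-fold split is usually the more practical choice.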

Now recall that we used actual assists to train our SVM. Despite this, we never actually used the outcome of the shot as a feature in training. Therefore, a missed field goal attempt suffers the same fate as a made field goal attempt in the eyes of the assist. As a thought exercise, suppose we show a creditor 100 “made field goals” but cut off the video before each ball is released, tell the creditor that “yeah, it was made anyways,” and ask whether each play should be credited as an assist. If it then turns out all of these attempts were misses, it does not change the outcome of the experiment.

In this case, we apply the potential assist model to all of the games that James Harden, Russell Westbrook, and Klay Thompson have played. Due to the availability of the data, we only have games through January 16th, 2019. Despite this, we have the following results from our SVM:

And immediately we see the differences between these three candidates; and the reason why we selected these three players.

Immediately popping off the page is James Harden and his **7.57% of field goals being potentially assisted! **That’s absurd. Furthermore, when he is potentially assisted, Harden posts an effective field goal percentage of 0.6094, which leads to an estimated expected **1.2188 points per FGA. **Of course, the rim-attempts are tangled in here; so be cautious with the stats.

That said, we find that only **8.81% of Harden’s three point attempts are potentially assisted**. Again, a counter-intuitive game plan according to the catch-and-shoot trends in the league. In fact, Harden’s 3P% in potentially assisted attempts is **37.2%**, which is almost identical to his **pullup and stepback **three point game, which is at **37.5%**.

What this suggests is that we should up-weight a teammate’s assist total when they work with a high-usage player like Harden, due to the fact that Harden will make significant moves after receiving the ball. Being Harden’s teammate is a handicap when it comes to measuring true passing vision, as most passes will not end up in attempted shots. Therefore, a simulation mechanism needs to be in place for ascertaining the value of the pass.

On the other end of the spectrum, Thompson is a passer’s best friend. Here we see that Thompson is fairly high up in percentage with **67.69% of all his FGA being potentially assisted**. More staggeringly, **over 90% of Thompson’s three-point attempts are potentially assisted**. For all high volume shooters, this is the highest in the league (by far).

Much like Harden, Thompson’s efficiency barely changes depending on the three-point attempt; as he is a **37.8% **shooter in potential assist situations and slightly over **38% **in all other situations.

Westbrook is the passing teammate’s nightmare, in the sense that an assist is not likely to get credited if Westbrook shoots the ball. Despite this, since Westbrook is an MVP caliber player capable of making plays and winning games, the teammate needs to make the pass. With this in mind, we can up-weight this player’s assist totals much like Harden’s teammates, as they are making the passes and just not getting the results. In Harden’s case it’s an extra action that’s taken. In Westbrook’s case it’s just bad luck.

As a note, Westbrook shoots **27% on potentially assisted three point attempts** while dropping down to **23% ** on pullup and stepback attempts. In this case, we actually see a fairly significant improvement in percentages; regardless of the low percentage.

By leveraging a machine learning algorithm like a support vector machine, we are able to start developing models to help us understand difficult to measure quantities such as a potential assist. There are many more ideas we can pop out using this type of machine learning capability. For instance, a follow on question may be, **can we use extra features to identify designs of plays in-game that will CREATE potential assists?**

The short answer?

Yes.

- Player A: 31 points, 13 rebounds, 3 assists
- Player B: 20 points, 11 rebounds, 9 assists
- Player C: 20 points, 21 rebounds, 0 assists

Frequently, we ask **who was the better player** or **which player contributed the most to the game**. Unfortunately, most questions asked within a front office and coaching staff don’t revolve around the notion of who the best is. While it is indeed an important question, it’s usually obvious that Russell Westbrook and Paul George are the top scorers, while Steven Adams is dominant on the boards for the Oklahoma City Thunder. And that LeBron James is the most dominant player on the Los Angeles Lakers. There are already some great tools for evaluating the contribution of a player. By using Dean Oliver’s points produced model, we can even break down some of the components of points scored, assists, and rebounds into respective amounts of contribution to a win or a loss.

Many of the questions that are asked revolve around, **what style of play does this player have** or **if we take away / limit this component of a player, how are they going to respond?** And it’s these questions that start to pose difficulty.

In the example above, it’s not necessarily the points the player generates, but rather: **Which player is the rebounder? Which player generates their own attempts? If we can only play two of these players at a time, which players best complement each other?** And it’s this latter question we will focus a bit on through a real coaching problem I performed for a team. (**Note:** The subject of that analysis is considerably different than what I am about to present.)

For instance, if we were to limit Andre Drummond’s ability to gather offensive rebounds, we know we will frustrate and severely limit the Detroit Pistons offense. Through January 14th, Drummond had posted 218 offensive rebounds, which results in **at least 218 extra chances** to score. Over the course of 42 games, that’s a **potential expected 6 points a game extra** just on his rebounding alone. In fact, the second chance points are distributed as:

As we see, a total of 230 points have been scored this season following a Drummond offensive board. Keeping Drummond off the offensive glass therefore gives our team an added **5.47 points (current estimate) in score differential per game**.

This raises the question: if Drummond is off the glass, who is more likely to increase their rebounding? Ideally, it’s our team. Realistically, **someone will step up for the Pistons**. Now here lies our quest for understanding.

Our first guess is Blake Griffin. He gets about one offensive rebound a game and plays a perimeter style of play; however, he is Detroit’s second leading rebounder within the starting unit. And despite Griffin having 49 offensive rebounds, the Pistons average **less than one point per offensive rebound** of his, and that includes third-, fourth-, etc. chance points on top of second chances.

We can live with checking Blake Griffin and body if necessary. So what about the second best overall rebounder, Zaza Pachulia? Well if we place him side-by-side with Drummond, we expect to feast with small guard as neither Drummond nor Pachulia play the perimeter. Alternatively, if Pachulia subs in for Drummond, our work is done as the Pistons are now keeping Drummond off the offensive glass by putting him on the bench.

Ditto for Jon Leuer.

Now if we stick with Blake Griffin as the primary alternative crashing player, we need to begin to understand the **style** of Blake Griffin’s play. We can look at Griffin’s basic stats, his on/off numbers, and some of his advanced stats; but we still lack a method of comparing the player. We have seen several situations when Griffin has been on- or off- the court with Drummond. We can casually count the differentials in rebounding, points scored, etc. However, with Drummond being tethered by our defense; how many examples have we seen of that? How do we quantify these instances? Do we dare impose a set of **“business rules” **to define a player and vehemently argue their merits despite never statistically testing those rules? Or do we apply some form of **metric-based learning** to help understand how a player adjusts to the various roles they must play over the course of the season? I tend to select the latter.

So how do we begin to understand the different facets to Blake Griffin’s game? A simple methodology would suggest we simply apply **Euclidean distance** metrics and measure the difference between stat categories of Griffin and every other player in the league. This will provide some reasonable results. If we condition on all games where Drummond is limited on rebounds and compare Blake Griffin’s stat-lines across the league, we will indeed find some interesting results.
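To make the naive approach concrete, here is a minimal sketch that computes pairwise Euclidean distances between the three stat lines (Player A, B, and C) listed at the top of this post. Using only points, rebounds, and assists is a simplification for illustration; a real comparison would use many more categories.

```python
import numpy as np

names = ["Player A", "Player B", "Player C"]
# Columns: points, rebounds, assists
stat_lines = np.array([
    [31, 13, 3],
    [20, 11, 9],
    [20, 21, 0],
], dtype=float)

# Pairwise differences via broadcasting, then the usual straight-line distance
diffs = stat_lines[:, None, :] - stat_lines[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {distances[i, j]:.2f}")
# Player A vs Player B: 12.69
# Player A vs Player C: 13.93
# Player B vs Player C: 13.45
```

Note that this treats every stat as its own independent axis, so an 11-point gap in scoring counts exactly the same as an 11-rebound gap.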

For instance, if Drummond’s rebound total is lower than, say, 14 rebounds a game, we find that not only do Blake Griffin’s points go up (he averages upwards of 8 3PA per game in recent games) but he begins to match to players like Paul George, Stephen Curry, and James Harden. While Griffin is considered to be one of the best players in the league, he is not on that echelon this season. Unfortunately, this is what the Euclidean metrics will say. And if you bring this to a coaching staff, you’ll most likely get one of three reactions:

- You’re kidding me, right?
- What a pile of [expletive].
- Hey, thanks for this information, this is really insightful. (Not invited to the next team prep.)

So why did this weird player comparison happen? It’s primarily due to the stat-line not sitting in Euclidean space, but rather in a **manifold space**.

**Warning:** This is a condensed, liberal, walk-through of manifold learning. If intimidated, skip at your own risk!

The primary problem with the above Euclidean solution is that the Euclidean-based metrics assume that **rebounds, points scored, **and **other stats** are **independent…** Which we know is definitely not true. But, if you need an example…

Consider a single possession with a single field goal attempted and zero fouls. This is one of the most common possessions in the league. If this occurs, we can effectively fall in one of three categories: **made field goal**, **missed field goal with defensive rebound**, or **other**. In this case, other may be an offensive rebound with turnover or end of period.

Regardless, we either score points with no rebounds or we rebound with no points scored. Unfortunately, those Euclidean metrics assume that you can both score and rebound on the same field goal. **Aside:** which is another reason why you should never use linear regression on possessions to help make decisions. Ahem… but that’s another story.

This possession actually lies on a **manifold**! A manifold is a space of points where, given a point on the manifold, the **space looks locally Euclidean**. The best example I can give is the circle. In this case, consider two points (**p** and **q**) on the circle:

If we are interested in the **average point** between p and q and we compute the Euclidean distance average, we end up with a point **not on the circle at all!** That’s a huge problem. In basketball terms that suggests **a player is expected to score 1.2 points on zero field goal attempts**. Yes, it gets that bad.

Therefore, we need to learn the manifold on which the data sits and develop the metric. **Note:** Metric effectively means “distance measuring” function.

For the circle, this is simple **arc length**. For a small neighborhood of points about the point p, we see the circle is roughly flat. If you don’t agree, try this sub-example: you’re on the Earth with no hills or valleys. It looks flat to you. And no matter where you walk, you rotate about the Earth’s center despite the world always looking flat. That’s a manifold. The collection of your path walking between two far-away cities is no longer flat, but rather an arc. That’s your metric.
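The circle example is easy to verify numerically. The sketch below picks two points on the unit circle (the 30 and 150 degree angles are arbitrary choices), then compares the straight-line average against the midpoint measured along the arc.

```python
import numpy as np

# Two points on the unit circle, described by their angles
theta_p, theta_q = np.deg2rad(30.0), np.deg2rad(150.0)
p = np.array([np.cos(theta_p), np.sin(theta_p)])
q = np.array([np.cos(theta_q), np.sin(theta_q)])

# Euclidean view: average the coordinates, measure the chord
euclid_mid = (p + q) / 2
chord = np.linalg.norm(p - q)

# Manifold view: average the angles, measure along the arc
theta_mid = (theta_p + theta_q) / 2
arc_mid = np.array([np.cos(theta_mid), np.sin(theta_mid)])
arc = abs(theta_q - theta_p)  # arc length on a unit circle (angle in radians)

print(f"Radius of Euclidean midpoint: {np.linalg.norm(euclid_mid):.3f}")  # 0.500
print(f"Radius of arc midpoint:       {np.linalg.norm(arc_mid):.3f}")     # 1.000
print(f"Chord length {chord:.3f} vs arc length {arc:.3f}")                # 1.732 vs 2.094
```

The Euclidean midpoint lands halfway to the center of the circle, while the arc midpoint stays on the circle, which is exactly the difference between the two notions of "average".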

Intuitively, we all know this. But for basketball data, **what is the manifold?!?! **And this is where manifold learning comes into play.

Manifold learning techniques thrive on a very similar concept: Assume all the points are locally Euclidean and mathematically impose rules such that some points are viewed as “too far away” to be in the “locally flat” space. Many of these methods exist, such as ISOMAP, Locally Linear Embedding, and Self-Organizing Maps. A recently popular one is t-distributed Stochastic Neighbor Embedding, or t-SNE.

Despite the implementation being difficult to master, the idea is relatively straight-forward. We take a sequence of **n points. **This may be a set of stat-lines from every player in every game this season. Suppose 8 players play for every team in every game over a total of 650 games this season and we should have roughly 10,400 samples to help us understand the manifold that represents player interactions. Label these points **x_1, x_2, …, x_10400**. Each point is **p-dimensional, **where **p** is the number of statistics we consider in the model.

Note that we might use per 100 possessions to help us normalize for playing time. And we may adjacently attach on average point differential over the possessions played to help encapsulate garbage time (instead of throwing data out).

Given these roughly 10,400 samples we form the 10,400-by-10,400 matrix of **Euclidean distances** between each pair of points. From this, we compute the probability that two points are considered to be “local” to one another:

In t-SNE, we propose a new set of points, **y_1, y_2, …, y_n**, that are in a ridiculously small dimension, typically 2, onto which we “project” our data points **x_1, x_2, …, x_n**. We compute the distances between each of these projected points and assume they follow a **Cauchy distribution** (a t-distribution with one degree of freedom):

And we look at the **Kullback-Leibler divergence** between the projected points’ distribution and the “Euclidean distribution” we built using the matrix above. We then apply gradient descent to minimize the Kullback-Leibler divergence. In doing this, new “projected” points are proposed and the process continues until the Kullback-Leibler divergence converges to within some small error.
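For reference, the three quantities described above are usually written as follows in the standard t-SNE formulation. This is offered as an assumption about what the steps above refer to, not as a reproduction of the original figures; the bandwidth $\sigma_i$ is a per-point parameter set through the perplexity.

$$p_{j\mid i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}, \qquad p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}$$

$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}$$

$$\mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

The first line turns Euclidean distances into "local to one another" probabilities, the second is the heavy-tailed Cauchy kernel on the projected points, and the third is the quantity that gradient descent drives down.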

The resulting points are then **local neighborhoods** that represent the **original data set** (p-many possessional stats) in a **2-dimensional context**. And the cool part is… **local distances are effectively preserved.** This gives us a clustering approach to players…

Therefore giving us insight into the **style of play **for every player.

So let’s perform a simple example: one where we can envision the data. In this case, we consider the traditional triple-double stat-line of points, rebounds, and assists. If we sample every single one of these throughout the course of the season, we obtain over 11,000 samples and a horrific plot.

This is the simplest plot we could give. If we start to include more statistics, we can no longer plot the distribution, but rather have to start looking at conditionals. No thanks. This plot, unfortunately, gives us absolutely no indication of player styles.

We may be able to label some points and attempt to perform clustering, but the Euclidean distance problem from above will bite us yet again.

However, if we apply a t-SNE algorithm, we obtain the updated plot:

Immediately we see major clusters form with many minor clusters within the major areas. Looking at this plot, we could make the argument that there are between 4 and 8 major clusters. If we start to append names to the clusters, we start to find specific groupings.
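If you want to try this at home, here is a minimal sketch using scikit-learn's `TSNE`. The stat lines are synthetic: two made-up "styles" with hypothetical means and spreads stand in for the real game logs, and the perplexity of 30 is simply a common default rather than a tuned value.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic per-100-possession stat lines (points, rebounds, assists)
# for two hypothetical styles of play
scorers = rng.normal(loc=[35.0, 8.0, 9.0], scale=[5.0, 2.0, 2.5], size=(150, 3))
rebounders = rng.normal(loc=[14.0, 18.0, 2.0], scale=[4.0, 3.0, 1.0], size=(150, 3))
stat_lines = np.clip(np.vstack([scorers, rebounders]), 0.0, None)  # no negative stats

# Project the 3-dimensional stat lines down to 2 dimensions
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
embedding = tsne.fit_transform(stat_lines)

print(embedding.shape)  # (300, 2)
```

Each row of `embedding` is one game's stat line in the 2-dimensional plot, so coloring rows by player is all it takes to start marking where a given player's games fall.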

For instance, the upper right grouping is high scoring (talking 30-50 points per 100 possessions, in the high usage and high team usage region) with approximately 10 rebounds and 10 assists per 100 possessions. This is predominantly **James Harden** and **Anthony Davis** territory, with a few visits from **Stephen Curry** and one appearance from **Kyle Kuzma** on the fringe of this group. The latter is thanks to his highly efficient 37 point, 8 rebound, 3 assist game, with the last of these causing him to drift to the fringe.

Curiously, the **antipodal** player isn’t the 0-0-0 player, but rather the **rebounders / shot blockers** like Andre Drummond. Antipodal here means “on the opposite side of the manifold” or “as far away as possible.” These players are valuable, but their style is **significantly different**. And we know that is true. These players are our **DeAndre Jordan** and **Tristan Thompson** types of the league.

What this plot breaks down for us is the **style** of play. Therefore, players that fall near other players **share the same style for that particular game**. The next steps are to start identifying the styles of play. And seeing that I personally don’t want to spend more than 2 hours on this post tonight, I marked two of the minor groups.

So now we return to our problem. Given the t-SNE plot, we are able to mark every game of every player. In this case, we can mark Andre Drummond’s low rebounding games and identify where the other Detroit players fall.

Before we continue, we note that we left **Stanley Johnson** off this list only because he is nowhere close to any of the other players and is scattershot all over the map. His four major appearances in Drummond’s “low rebounding” games are near Reggie Jackson for one, near the center for another, up towards the mallet-looking portion, and slightly above Drummond during the same game.

What this informs us of is that Blake Griffin is most likely to be the rebounder if we begin to eliminate Drummond from the offensive glass. However, we have to keep track of Luke Kennard, as he is effectively the third option for rebounds. As such the strategy would be to rotate off Kennard to pick up Griffin, if possible, and be ready for Kennard to crash. This may lead to potential **low gravity** events when guarding Kennard to make him comfortable enough to stray out onto the three point line, allowing our team to better rebound a potential deep ball coming from Bullock or Jackson.

The good news is, we have insight into the styles. And with that, one final point to make: we performed all this analysis in an effort to determine the styles of players. Ultimately, this is a **qualitative** quantification, meaning that we leveraged analytics to get to a point where we needed to summarize the style. While we obtain “closeness” of styles, if we run this algorithm again, we may see styles change ever-so-slightly depending on the blending of styles. Therefore, we say **proceed with caution**, as this is a data science tool for exploration and uncovering new features and relationships within the data, through imposing some form of qualitative markings.

True shooting percentage (TS%) isn’t a new concept by any means. It’s been around for roughly 15 years, and maybe more to some savvy analysts, and has been discussed quite frequently over the years. Four years ago, Justin Willard of Nylon Calculus gave a nice introduction to TS%. For the uninitiated, the formula is given as

As a quick refresher, the idea of true shooting percentage is simple. We take half of the number of points scored by a player and divide it by the **number of possessions that result in a chance at scoring**. Some folks call this scoring possessions. Some folks call this scoring chances. Some folks call this true shooting attempts. It has many names and causes a little confusion between analysts from time to time. I personally prefer just calling it a **scoring attempt**.
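The description above translates directly into a few lines of code. This is a minimal sketch using the commonly cited form TS% = PTS / (2 × (FGA + 0.44 × FTA)); the function name, the keyword for the free throw coefficient, and the example stat line are all illustrative choices.

```python
def true_shooting_pct(points, fga, fta, ft_coefficient=0.44):
    """Half of the points scored, divided by the estimated scoring attempts."""
    # Scoring attempts: field goal attempts plus the share of free throws
    # assumed to use up a scoring chance
    scoring_attempts = fga + ft_coefficient * fta
    if scoring_attempts <= 0:
        raise ValueError("a player needs at least one scoring attempt")
    return points / (2.0 * scoring_attempts)


# Hypothetical stat line: 30 points on 20 FGA and 10 FTA
ts = true_shooting_pct(points=30, fga=20, fta=10)
print(f"TS% = {ts:.3f}")  # TS% = 0.615
```

The coefficient is left as a parameter so that a different estimate of how many free throws use up a scoring attempt can be swapped in without touching the rest of the calculation.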

As the analytic was developed several years ago, the limited ability to trawl through play-by-play made it fairly difficult for analysts to correctly count the number of scoring attempts. Through some detailed analysis performed during the same era, a value of **.44** was used to help approximate the number of possessions when using box score stats. The idea is that if we know the number of free throws, along with defensive rebounds, field goal attempts, and so forth, and we assume that made field goals terminate possessions (rendering made “And-1” free throws as non-possession ending), then roughly 44% of free throws potentially terminate possessions. This idea is rather straightforward: if every free throw came in a pair, exactly 50% would terminate possessions, so this suggests the remaining **six percent** is accounted for by And-1’s, free throws from missed three-point attempts, and technical, flagrant, clear path, or “Away from the Play” fouls. And the analysis was fairly spot on!

Two years ago, Matt Femrite of Nylon Calculus spent time showing that the coefficient of .44 had become outdated, suggesting that possessions were now being overestimated. As possessions were being over-estimated, it was theorized that the value of .44 was too high, indicating that a higher percentage of free throws were of the three-point variety, And-1’s and the other various types listed above. We took a look at the same phenomenon from an estimation standpoint and found Matt’s work to be corroborated with a proposed **.436** value for the 2016-17 NBA season.

While our focus was on possession counting, and inherently their impact on ratings, showing that ratings were now being **underestimated**, there was a domino effect on TS%. Meaning, that TS% is now being underestimated. Therefore a field goal today is worth less than a field goal yesterday because players are more effective at drawing fouls on higher-valued scoring attempts: particularly And-1’s and three point attempts.

This led us into digging deep into the distributions of free throws for each team. We found that by counting possessions explicitly, the changes in true shooting percentages would actually shuffle players around from the theoretical answers to an updated tactical answer. Great, but honestly, overkill and unnecessary unless we really wanted to squeeze out the extra point of edge in a game. And we can find 2-3 points somewhere else… **just by inducing a rotation**. But that’s another story for another time.

If we compare the analytically derived value of .436 to .44, then for every single free throw a player attempts, the denominator of TS% is theoretically affected by **.008**. The article above shows that the .436 is actually not uniform at all (the Nuggets and Jazz are the example) and instead we can use the offset from .436 to **give a proxy for understanding a player’s ability to attack and finish** and a **team’s scheme for attacking the rim and drawing fouls beyond the arc.** You can read about that above in the last link. Regardless, unless a player **takes a significant number of free throws relative to their field goal attempts**, the change in TS% is minuscule.

The reason for our reminiscing about the work performed on the “.44” in TS% is due to the fact that this is a **regressed** value, meaning that we are looking at a **distributional effect** of free throws on possessions and scoring attempts. Because of this distributional effect, small sample values are going to be relatively meaningless. Let’s consider this example:

**Example A: **Player A cuts through the lane and hammers home a dunk. What’s their TS%? This is simple: **2 points** goes in the numerator of TS%, while the denominator sees **1 FGA** added to **0.44 x 0 FTA**, which is **1**. Since we cut points in half and divide, we obtain a true shooting percentage of **one**.

**Example B: **Player B cuts through the lane and gets decapitated by a wild-armed center. Fortunately, Player B survives and hits both free throws. What’s their TS%? This is simple as well: **2 points **goes in the numerator of TS%, while the denominator sees **0 FGA** added to **0.44 x 2 FTA**, which is **0.88**. Since we cut points in half and divide, we obtain a true shooting percentage of **1.14**.

This shows the fundamental flaw in small-samples using a large-sample estimator. Realistically, **Example B** has one scoring attempt, not **.88**. Therefore the real true shooting percentage is **one**. Therefore, we should take TS% along with other stats, particularly, **scoring attempts**. A savvy analyst today already does this.
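The two examples can be checked with a few lines of Python. This is only a sketch of the box-score formula described above; the function name `true_shooting` and its `ft_coef` argument are illustrative choices, not part of any library.

```python
def true_shooting(points, fga, fta, ft_coef=0.44):
    """Box-score TS%: points / (2 * (FGA + ft_coef * FTA))."""
    scoring_attempts = fga + ft_coef * fta
    return points / (2.0 * scoring_attempts)

# Example A: one made dunk, no free throws.
ts_a = true_shooting(points=2, fga=1, fta=0)

# Example B: no field goal attempt, two made free throws.
ts_b = true_shooting(points=2, fga=0, fta=2)

# Example B again, but counting the single trip to the line as one scoring attempt.
ts_b_counted = 2 / (2.0 * 1)

print(round(ts_a, 2), round(ts_b, 2), round(ts_b_counted, 2))
```

The estimator and the explicit count agree on Example A and disagree on Example B, which is the small-sample problem in miniature.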

So now that you’ve made it this far, it’s time to tell you that this post is neither about the deficiencies of TS% nor how the “.44” is over-valued. There actually aren’t many deficiencies with TS%, as the grain of salt about small samples and fluctuation across teams and players has been well documented. Rather, today’s post is about the **distributional aspect of TS% and how we can begin using it to model effects of the game**.

Let’s begin with a reader-requested team: the New York Knicks.

As an example, let’s consider the 2018-19 New York Knicks. Through their first 40 games, the Knicks have settled into a 10-30 record. Some of this is due to their **league bleeding** (last place) **.528 TS%** and their second-to-last **.543 opponent effective Field Goal Percentage**. While their other offensive stats are top-half-of-the-league, there are some defensive deficiencies in defensive rebounding, and they are middling in the areas of opponent TOV% and opponent FTr. **Tim Hardaway Jr.** is the main catalyst of the team’s offense while Kristaps Porzingis rehabs from a torn ACL suffered towards the end of the 2017-18 season.

By extracting the different types of free throws, we can now write a scoring attempt as:

**FGA + (FTA – A1A – TA – 3A – APA – FA)/2 + 3A/3**

This will accurately count the number of scoring attempts generated by the shooter. But, as indicated in the previous section, the updated true shooting percentage, **TS%^**, barely budges by more than a percent for anyone. What’s more important here is that we have broken up the components of true shooting percentage into **semi-independent, measurable count processes**. Wahoo!
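As a rough sketch, the scoring attempt formula above translates directly into code. The abbreviations are not spelled out in the text, so the reading used here is an assumption: A1A as And-1 free throw attempts, TA as technical free throws, 3A as free throws from three-shot fouls, APA as away-from-the-play free throws, and FA as flagrant free throws. The stat line in the example is hypothetical.

```python
def scoring_attempts(fga, fta, a1a=0, ta=0, three_a=0, apa=0, fa=0):
    """FGA + (FTA - A1A - TA - 3A - APA - FA)/2 + 3A/3, as written in the text.

    Assumed meanings: a1a = And-1 FTA, ta = technical FTA, three_a = FTA from
    three-shot fouls, apa = away-from-the-play FTA, fa = flagrant FTA.
    """
    two_shot_fta = fta - a1a - ta - three_a - apa - fa
    return fga + two_shot_fta / 2.0 + three_a / 3.0

def true_shooting_counted(points, **counts):
    """TS% with explicitly counted scoring attempts in the denominator."""
    return points / (2.0 * scoring_attempts(**counts))

# Hypothetical stat line: 17 FGA, 8 FTA (1 And-1, 3 from one three-shot foul).
counted = scoring_attempts(fga=17, fta=8, a1a=1, three_a=3)
estimated = 17 + 0.44 * 8
print(counted, round(estimated, 2))
```

For this made-up line the explicit count is 20 scoring attempts, while the .44 estimate gives 20.52, which is the kind of small gap discussed above.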

Our ultimate goal is to build a model that identifies the variability of true shooting percentage, as well as provide a guideline for building a **regression model** to **identify the impact of actions on court **that affect TS%. We could be naive and suppose a Gaussian model, but we would have to admit we are ignoring that the Central Limit Theorem fails, a derivative result of another Nylon Calculus post about the stability of the three point attempt, this time by Darryl Blackport from 4 years ago.

Therefore, we need to identify the counting process associated with the components of the model. And, unfortunately, **Poisson ain’t** it. In fact, I use the term semi-independent as a surrogate for the fact that we assume **independence of the terms** even though there may be an argument that the terms are indeed not independent. The term measurable does not mean Lebesgue measure (if you don’t know what that means, it’s cool… we won’t talk about it here anyways), but rather that we can measure the counts using the counting measure. Yes, that’s a joke… but yes… that’s a true mathematical statement too.

All we are saying is that **FGA** and **non-FGA FTA** **independently occur** and that **we can** **count them.**

As we mentioned before, the counting process above is not Poisson. A Poisson distribution describes this process: **For a given period of time, if items arrive at random, independent times, each with a mean time of arrival (L), how many items will arrive before the period of time ends?** The collection of observations of such an experiment forms a **Poisson distribution**.

This sounds very much like how field goal attempts occur! We have a series of minutes played in a game and we suppose that all field goal attempts are independent. Therefore, the number of field goal attempts that arrive within the time window must follow a Poisson distribution! If you’re an analyst who gave out the exercise **Can you model the 3-PT% of every player in the** **league?** question for potential new hires, you’ve probably been inundated with this exact response. Unfortunately, while it’s a good first try (and you’ll even do well predicting **some**); you’re failing assumptions and (more specifically) the data science associated with the problem at hand.

If we were to look into developing a paper for Sloan, we would immediately look into the game-theoretic events associated with the types of shots taken. That is: how likely are we to attack the rim given the current situation, with the offensive capabilities handling the possession and the defensive abilities in movement? This type of analysis requires the aid of tracking data. Instead, we stay on task with play-by-play data and ask, **how do I model my response of the number of FGA?**

Let’s take a look at Tim Hardaway Jr. once again. Over the course of 37 games, Hardaway Jr. has averaged 16.7 FGA, 7.7 3PA, and 5.3 FTA per game. Respectively, the variances for each are 18.3 FGA^2, 6.4 3PA^2, and 14.5 FTA^2. We use the square notation to indicate the units for the variances. In these cases, none of the variances are the same as the means. However, there are only 37 samples. If we were to fit a Poisson distribution to FGA, we would actually obtain a relatively good fit.

We see that Hardaway Jr.’s distribution of FGA don’t necessarily satisfy a Poisson distribution, and appears to be over-dispersed indeed. Despite this, with the smaller sample size, are we able to do better? To better understand over- and under-dispersion, we can look at the **Conway-Maxwell Poisson model.**
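A quick way to see the dispersion is the ratio of variance to mean, which equals one for a Poisson count. The sketch below uses only the per-game means and variances quoted above; the helper name `dispersion_index` is an illustrative choice.

```python
# Per-game (mean, variance) for Tim Hardaway Jr., as quoted in the text.
per_game = {
    "FGA": (16.7, 18.3),
    "3PA": (7.7, 6.4),
    "FTA": (5.3, 14.5),
}

def dispersion_index(mean, variance):
    """Variance-to-mean ratio: 1 for Poisson, above 1 over-dispersed, below 1 under-dispersed."""
    return variance / mean

for stat, (mean, variance) in per_game.items():
    d = dispersion_index(mean, variance)
    if d > 1:
        label = "over-dispersed"
    elif d < 1:
        label = "under-dispersed"
    else:
        label = "equi-dispersed"
    print(f"{stat}: {d:.2f} ({label})")
```

By this measure FGA is only mildly over-dispersed, 3PA is slightly under-dispersed, and FTA is the one that departs sharply from a Poisson count.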

The Conway-Maxwell Poisson model is a generalized form of the Poisson model that allows us to estimate over- and under-dispersion through a new parameter, **nu**. The generalization is in the same vein as the generalization of the Gamma Distribution to obtain the Rayleigh or Weibull distributions. Here, the probability mass function of the Conway-Maxwell distribution is given by

**P(X = x) = lambda^x / ((x!)^nu Z(lambda, nu))**, where **Z(lambda, nu) = sum over j >= 0 of lambda^j / (j!)^nu**

This model looks very similar to the Poisson model, except that the normalizing constant isn’t a pretty exponential, **e^(-lambda)**. This is where we gain some added flexibility.

In the Poisson model, the value lambda represents the expected number of arrivals over a given period of time. In the Conway-Maxwell distribution, lambda no longer represents this value. Instead it turns into a **location-type** parameter, which helps “center” the distribution. Similarly, the parameter nu is a **scale-type** parameter, which helps “smooth” the distribution to give the distribution its shape. These interact together in a non-linear way, meaning a simple adjustment in lambda does not just shift the distribution left or right by that amount despite primarily controlling left and right movement; hence the “type” added. In fact, the expected value cannot be given in closed form other than the infinite sum:

**E[X] = sum over x >= 0 of x lambda^x / ((x!)^nu Z(lambda, nu))**, where **Z(lambda, nu)** is the normalizing constant.
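The mass function and its mean can be evaluated numerically by truncating the infinite sums. The following is a minimal sketch under that assumption: the cutoff `x_max=200` and the function names are illustrative choices, and the weights are handled in log space to avoid overflow.

```python
import math

def cmp_pmf(lam, nu, x_max=200):
    """Conway-Maxwell Poisson probabilities for x = 0..x_max (truncated sum)."""
    # log of the unnormalized weight lambda^x / (x!)^nu
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(x_max + 1)]
    shift = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)                              # truncated normalizing constant (rescaled)
    return [v / z for v in w]

def cmp_mean(lam, nu, x_max=200):
    """Expected value as the (truncated) sum of x * P(X = x)."""
    return sum(x * p for x, p in enumerate(cmp_pmf(lam, nu, x_max)))

# With nu = 1 the model collapses to the Poisson model, so the mean equals lambda.
print(cmp_mean(3.0, 1.0))
# With nu below 1 the distribution is over-dispersed and lambda is no longer the mean.
print(cmp_mean(1.7, 0.35))
```

The second call shows the non-linear interaction described above: a lambda of 1.7 with a nu of 0.35 produces a mean several times larger than 1.7.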

While we can compute the mean numerically, it is still a chore to estimate the two parameters given a data sample. The way we perform this task is to write out the **log-likelihood** of the distribution, take the partial derivatives and set them equal to zero. This leads us to solving the following equations for **lambda** and **nu**:

**E[X] = (1/n) sum of x_i** and **E[log(X!)] = (1/n) sum of log(x_i!)**

Don’t let those equations fool you, lambda and nu are tucked in the expected values; just use the expectation formula above with the appropriate values for each equation. Given these equations, we do not have a closed-form solution. Therefore, we must apply **Newton-Raphson** optimization. And once we do that, we can estimate Tim Hardaway Jr.’s FGA using our flexible distribution.
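One way to carry out this estimation is sketched below. It is not the code used for the fits in this post: it assumes the sums are truncated at `x_max=200`, works in the parameters (log lambda, nu), starts from the Poisson fit, and adds a step-halving safeguard to the Newton-Raphson update. The inputs are the two sample averages that the likelihood equations match: the mean of x and the mean of log(x!).

```python
import math

def cmp_stats(log_lam, nu, x_max=200):
    """log Z plus means, variances and covariance of X and log(X!) on 0..x_max."""
    log_fact = [math.lgamma(x + 1) for x in range(x_max + 1)]
    log_w = [x * log_lam - nu * lf for x, lf in enumerate(log_fact)]
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    p = [v / z for v in w]
    ex = sum(x * q for x, q in enumerate(p))
    el = sum(lf * q for lf, q in zip(log_fact, p))
    vxx = sum((x - ex) ** 2 * q for x, q in enumerate(p))
    vll = sum((lf - el) ** 2 * q for lf, q in zip(log_fact, p))
    vxl = sum((x - ex) * (lf - el) * q for (x, q), lf in zip(enumerate(p), log_fact))
    return shift + math.log(z), ex, el, vxx, vll, vxl

def fit_cmp(mean_x, mean_log_fact, max_iter=200, tol=1e-9):
    """Newton-Raphson on (log lambda, nu), started at the Poisson fit (nu = 1)."""
    a, nu = math.log(mean_x), 1.0

    def loglik(a_, nu_):
        # Average log-likelihood per observation, up to a constant.
        return a_ * mean_x - nu_ * mean_log_fact - cmp_stats(a_, nu_)[0]

    for _ in range(max_iter):
        _, ex, el, vxx, vll, vxl = cmp_stats(a, nu)
        g_a, g_nu = mean_x - ex, el - mean_log_fact   # the two likelihood equations
        if abs(g_a) + abs(g_nu) < tol:
            break
        # Newton direction: solve [[vxx, -vxl], [-vxl, vll]] d = g.
        det = vxx * vll - vxl ** 2
        d_a = (vll * g_a + vxl * g_nu) / det
        d_nu = (vxl * g_a + vxx * g_nu) / det
        step, base = 1.0, loglik(a, nu)
        # Halve the step while it would lower the log-likelihood.
        while step > 1e-8 and loglik(a + step * d_a, nu + step * d_nu) < base - 1e-12:
            step /= 2.0
        a, nu = a + step * d_a, nu + step * d_nu
    return math.exp(a), nu

# Sanity check: feed in the exact averages of a known model and recover its parameters.
_, target_mean, target_log_fact, _, _, _ = cmp_stats(math.log(1.7), 0.35)
lam_hat, nu_hat = fit_cmp(target_mean, target_log_fact)
print(round(lam_hat, 3), round(nu_hat, 3))
```

With real data, `mean_x` and `mean_log_fact` would be computed from the per-game counts. If the fitted nu lands near one, the Poisson model is effectively recovered.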

And we immediately see that a Poisson model is indeed preferred. The maximum likelihood estimates from the Newton-Raphson optimization scheme even favor the value **one** for nu; which gives us the Poisson model explicitly!

While the above model fits the Poisson distribution, this is in the **full unconditional model**, meaning that no outside factors affect the distribution of field goals. If we were to suggest that defensive variables affect field goal attempts, we would need to set up a **generalized linear model**, and the resulting conditional distribution may not be Poisson. However, let’s look at the other portion of the **scoring attempt** within TS%.

If we look at Tim Hardaway Jr.’s free throw attempts, we find that the distribution of FTA is considerably different from the distribution of FGA. We see that once again the Poisson fit isn’t the greatest, but this time it can be improved.

We see here that the distribution is indeed over-dispersed. In this case, we should definitely find a good fit using the Conway Maxwell Poisson distribution. And in this case, we find that lambda of approximately 1.7 and nu of approximately 0.35 help fit this distribution.

And it’s here that we see the fit of the Conway Maxwell Poisson model performs much better than the Poisson fit. And it’s this type of data that the majority of NBA players follow when it comes to FGA and FTA per **time period** over the course of the season. What we now find is that the scoring attempts for TS% can now be modeled as a **mixture of Conway-Maxwell Poisson models**.

What this allows us to do is the following:

- Understand the impact of player variation on TS% and start to log a well-fitting distribution for TS%.
  - As long as we develop **semi-independent** parts to scoring chances (which we did in our three-part breakdown of scoring attempts above), we can sum the distributions.
  - Logging the distribution gives us parameters, which change over time. This creates a helpful longitudinal study to monitor player learning.
- Develop a generalized linear model in an attempt to test components.
- Break away from Poisson modeling and build instead a flexible model that better represents the process we are interested in.

And it’s here where the fun begins and becomes challenging. We can now start to develop a distributional model for **quantity** in the now-traditional Quantity-and-Quality models exploited by Kirk Goldsberry. Therefore we can build a stronger model in predicting quantities of shots over a desired time period with an associated quality, when using a traditional logistic regression model.

At nearly 3,000 words, that’s a different story for a different day.

One of the biggest troubles I see when folks **“combine”** rankings is that they actually don’t know how to combine rankings. Typically, they add ranks and call it a day. In fact, one NBA team I worked with simply added ranks. When I convinced them to use statistical principles, the draft model cleaned up and it dropped specific players who are **already out of the league**.

In this article, we give insight on how to combine rankings without sacrificing the integrity of the analytics. And to do this, instead of spilling secret sauce on our NCAA rankings, we look at a few ranking algorithms for **NBA Players: **RAPM, RPM, Win Shares, BPM, and PIPM. But first… an example.

Let’s consider a simple voting exercise for Most Valuable Player. Suppose that three judges are allowed to submit an ordering of five players, previously agreed upon by the trio. Suppose for this past year they agreed to vote on **Russell Westbrook, LeBron James, James Harden, Stephen Curry, **and **Anthony Davis**. Let’s suppose the first two judges submit their rankings with identical ranks:

- James Harden
- LeBron James
- Anthony Davis
- Russell Westbrook
- Stephen Curry

However, the third judge despises James Harden and knows how the other two voters are going to rank their players. To combat this, the judge submits his ballot as

- LeBron James
- Anthony Davis
- Russell Westbrook
- James Harden
- Stephen Curry

So who wins the MVP Voting? Under MVP Voting Rules, James Harden finishes with 23 points while LeBron James finishes with 24 points. This means **LeBron James** wins the MVP race **despite losing the popular vote **and **not having majority vote while another player does**. This process is akin to the voting process called **Borda Counting**.

Borda counting is the process of adding ranks together. In the above example, the Borda counting solution would be

- LeBron James – 5
- James Harden – 6
- Anthony Davis – 8
- Russell Westbrook – 11
- Stephen Curry – 15

We see the one irrational judge gets to have his vote weighted more than the other two judges merely because he was able to game the rank aggregation methodology. While this example is cartoonish in nature, it’s not unfathomable to build a **cult-like** philosophy of **many voters** in an obviously minority position (with respect to voting), who are able to strategically down-vote candidates in an effort to push a candidate down.
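Both tallies in this example can be reproduced with a short script. This is a sketch: the 10-7-5-3-1 scale is the points scale used for NBA MVP ballots, the Borda tally adds ranks as described above, and the helper name `tally` is an illustrative choice.

```python
majority_ballot = ["James Harden", "LeBron James", "Anthony Davis",
                   "Russell Westbrook", "Stephen Curry"]
third_ballot = ["LeBron James", "Anthony Davis", "Russell Westbrook",
                "James Harden", "Stephen Curry"]
ballots = [majority_ballot, majority_ballot, third_ballot]

MVP_POINTS = [10, 7, 5, 3, 1]   # points for 1st through 5th place

def tally(ballots, score_for_place):
    """Sum a per-place score over all ballots for each player."""
    totals = {}
    for ballot in ballots:
        for place, player in enumerate(ballot):
            totals[player] = totals.get(player, 0) + score_for_place(place)
    return totals

mvp = tally(ballots, lambda place: MVP_POINTS[place])   # higher is better
borda = tally(ballots, lambda place: place + 1)         # summed ranks, lower is better

print(sorted(mvp.items(), key=lambda kv: -kv[1]))
print(sorted(borda.items(), key=lambda kv: kv[1]))
```

Under both schemes LeBron James finishes ahead of James Harden even though two of the three judges ranked Harden first.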

In Borda counting, the goal is to minimize the summed rank across multiple judge rankings (analytics). In doing this, we simply ignore the analytic in question and start by treating all analytics as equal. Since not all analytics are equal, we instead end up weighting **outliers as premier voters/judges/analytics**.

One question is how to identify the **error **associated with a ranking. This means, how do we measure the difference between two rankings? For the above example, we know that Judge 1 and Judge 2 have the exact same rankings. Hence the distance between their rankings should be **zero**. For Judge 3, what’s the distance between his ranking and the other two judges’? The way we compute this is by counting the **minimum number of pairwise shuffles** to obtain each others’ list.

For the example above, we can write Judge 1’s ordering as **ABCDE**. Similarly, Judge 3’s ordering is **BCDAE**. In this case, the difference between these lists is **three**. That’s the number of shuffles required to get **BCDAE** to become **ABCDE**:

- **BCDAE -> BCADE**
- **BCADE -> BACDE**
- **BACDE -> ABCDE**

By using this distance measure, we can prove this is indeed a metric on the space of all possible rankings. As a further exercise, we can show the furthest ranking from **ABCDE** is **EDCBA**. And that distance is **ten**.
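The minimum number of adjacent shuffles equals the number of pairs that the two rankings order differently, so the distance can be computed by counting those pairs. A minimal sketch, with `kendall_tau_distance` as an illustrative name:

```python
from itertools import combinations

def kendall_tau_distance(ranking_a, ranking_b):
    """Count the pairs of items that the two rankings put in opposite order."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # combinations() yields (x, y) with x ahead of y in ranking_a.
    return sum(1 for x, y in combinations(ranking_a, 2)
               if position_b[x] > position_b[y])

print(kendall_tau_distance("ABCDE", "ABCDE"))  # identical ballots
print(kendall_tau_distance("ABCDE", "BCDAE"))  # Judge 1 versus Judge 3
print(kendall_tau_distance("ABCDE", "EDCBA"))  # fully reversed ballot
```

The three calls reproduce the distances of zero, three, and ten quoted above.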

If we are to draw a probability distribution, we would see that not only do we have two-thirds of our mass on **ABCDE**, but we have the rest of the probability mass at a point a distance of three away, with **several other possible rankings **just 1-2 distances away. Due to this, the **Borda counting** solution is in a low-probability location with **BACDE**.

Instead of Borda counting, we can use the probability distribution above and look for the **maximum likelihood estimator**. In this case, we **do care about the voters** instead of the games the voters can play. In fact, our methodology **should have the majority vote winner be the winner**. We call such a result a **Condorcet ranking**. And the maximum likelihood estimator is the **Kemeny-Young Ranking**.

In Kemeny-Young, the goal is to find the ranking that best fits the probability distribution; that is, to identify the highest probability ranking given all pairwise combinations of items being ranked by judges. Let’s walk through the methodology using the MVP voting example above.

The first step of Kemeny-Young ranking is to look at all the pairwise comparisons given by the voters. Since we have five candidates, we will obtain **ten pairwise comparisons**. For Judge 1, we can write the **10-by-2 voting matrix as**

This matrix represents the ordering of **Harden-James-Davis-Westbrook-Curry**. The result for Judge 2 is identical. This leaves us with irrational Judge 3:

We see that Harden loses the first three rows, but everything else remains the same. The Kemeny-Young ranking methodology then looks at **adding** the **pairwise voting matrices**. When we do this, we obtain the **overall voting matrix:**

Immediately we see that if we take the maximum across each row, we obtain the **maximum likelihood estimator** for the ranking, which **matches Judge 1 and Judge 2’s votes**. We even get the majority vote winner winning the MVP!
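The pairwise bookkeeping can be sketched as follows. Each judge’s 10-by-2 matrix is stored here as a dictionary keyed by the pair of candidates, which is a representation chosen for illustration; the row order follows the Harden-James-Davis-Westbrook-Curry listing.

```python
from itertools import combinations

candidates = ["Harden", "James", "Davis", "Westbrook", "Curry"]
judge_1 = ["Harden", "James", "Davis", "Westbrook", "Curry"]
judge_2 = list(judge_1)
judge_3 = ["James", "Davis", "Westbrook", "Harden", "Curry"]

def voting_matrix(ballot):
    """One row per pair (a, b): [1, 0] if the ballot ranks a ahead of b, else [0, 1]."""
    place = {name: i for i, name in enumerate(ballot)}
    return {(a, b): [1, 0] if place[a] < place[b] else [0, 1]
            for a, b in combinations(candidates, 2)}

def add_matrices(matrices):
    """Add the pairwise voting matrices row by row."""
    total = {}
    for matrix in matrices:
        for pair, (first, second) in matrix.items():
            row = total.setdefault(pair, [0, 0])
            row[0] += first
            row[1] += second
    return total

overall = add_matrices([voting_matrix(b) for b in (judge_1, judge_2, judge_3)])

# Row maxima: the pairwise winner of each row (no ties with three judges).
winners = {pair: pair[0] if row[0] > row[1] else pair[1]
           for pair, row in overall.items()}

# Order the candidates by how many rows they win.
wins = {c: sum(1 for w in winners.values() if w == c) for c in candidates}
ranking = sorted(candidates, key=lambda c: -wins[c])
print(ranking)
```

Ordering by row wins is enough here because the pairwise winners are consistent with one another; it is not a general substitute for the full Kemeny-Young search.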

This process is a little harder than the example looks, as there may **not be a unique solution**. There may be situations where a circular argument exists. In this case, all the equivalent rankings are equal.
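For a handful of candidates, the search can be done by brute force, which also makes the non-uniqueness visible. The sketch below scores every permutation by the number of judge-level pairwise agreements and keeps all rankings that reach the top score. The function name `kemeny_young` is illustrative, and the approach is only practical for small candidate sets.

```python
from itertools import combinations, permutations

def kemeny_young(ballots):
    """Return (best score, every ranking that reaches it) by exhaustive search."""
    candidates = sorted(ballots[0])
    # prefer[(a, b)] = number of ballots that rank a ahead of b
    prefer = {}
    for ballot in ballots:
        for a, b in combinations(ballot, 2):
            prefer[(a, b)] = prefer.get((a, b), 0) + 1

    best_score, best = -1, []
    for ranking in permutations(candidates):
        score = sum(prefer.get(pair, 0) for pair in combinations(ranking, 2))
        if score > best_score:
            best_score, best = score, [ranking]
        elif score == best_score:
            best.append(ranking)
    return best_score, best

# MVP example in the letter encoding: A = Harden, B = James, C = Davis, ...
mvp_score, mvp_best = kemeny_young(["ABCDE", "ABCDE", "BCDAE"])
print(mvp_score, ["".join(r) for r in mvp_best])

# A circular argument: A beats B, B beats C, C beats A.
cycle_score, cycle_best = kemeny_young(["ABC", "BCA", "CAB"])
print(cycle_score, ["".join(r) for r in cycle_best])
```

The MVP example has a single best ranking, while the circular example returns three rankings with the same score.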

Let’s apply this to the 2018-19 single season numbers. One of the benefits of aggregating player analytics is that we are able to see how **robustly different the analytic **is when compared to its brethren. For this exercise, we take a look at **RPM**, **RAPM, PIPM**, **BPM,** and **Win Shares. **Using these five metrics, our goal is to identify the **TOP 10 **players in the league.

Since we can argue the merits of each of these analytics, let’s just assume that all are relatively blind, but well-intentioned, much like the **manatees from South Park**.

Real Plus-Minus is one of the “black-box” analytics used to help users identify an estimate for a player’s net differential per 100 possessions. Under its disclaimer, the measurement leverages teammates, opponents, and “additional factors.” Using this metric, the current top 10 players in the league are

- Paul George – 7.64
- James Harden – 7.52
- Anthony Davis – 7.20
- Nikola Jokic – 6.60
- Kyrie Irving – 5.82
- LeBron James – 5.50
- Stephen Curry – 5.14
- Kyle Lowry – 5.06
- Nikola Vucevic – 5.03
- Kevin Durant – 4.82

**RAPM: Regularized Adjusted Plus-Minus**

RAPM is a ridge regression that is applied on lineup data. It does not have a prior distribution, nor does it have an augmented box-score data set. It’s simply on-off net differential that leverages penalization to avoid variance inflation. A lot has been written on this here. Using Ryan Davis’ current listing, the Top-10 players under RAPM are given by

- Danny Green – 4.99
- Kevin Durant – 3.69
- Jrue Holiday – 3.29
- Maxi Kleber – 3.20
- Kyle Lowry – 2.88
- Paul George – 2.85
- Seth Curry – 2.76
- Giannis Antetokounmpo – 2.70
- Brook Lopez – 2.70
- Steven Adams – 2.59

Player Impact Plus-Minus is yet another plus-minus algorithm, developed by Jacob Goldstein, that leverages a box-score prior distribution with luck-adjustment on top of 15 years’ worth of RAPM data. The idea is to smooth RAPM estimates in an effort to develop a posterior distribution that can predict slightly better than RAPM and RPM. It gives a slightly different top 10 than the other algorithms, and passes the eye test just as well; so much so that many folks have started adopting it within the league.

Currently, the Top 10 players are given by

- Giannis Antetokounmpo – 6.08
- Kevin Durant – 5.72
- Paul George – 5.47
- Anthony Davis – 5.33
- Kyle Lowry – 4.62
- Stephen Curry – 4.61
- Joel Embiid – 4.54
- Kyrie Irving – 4.45
- Nikola Vucevic – 4.25
- Mike Conley – 4.06

Win Shares, as obtained from Basketball-Reference, is derived as a metric that mimics Bill James’ same-named algorithm within Major League Baseball. It follows the use of Dean Oliver’s Points Produced model and constructs a marginal offense per marginal points per win. The amount of marginal points produced by each player is their resulting contribution to the win. Hence, win shares.

It has gotten a bad rap around the league over the past few years, mainly for its inability to predict future values directly. It’s primarily used as a summarization tool, and even then it is discarded for points produced. Despite this, the listing is actually quite reasonable for a top 10:

- Anthony Davis – 5.80
- Kevin Durant – 5.60
- Rudy Gobert – 5.60
- Giannis Antetokounmpo – 5.50
- Paul George – 5.30
- LeBron James – 5.10
- James Harden – 5.00
- Damian Lillard – 5.00
- Clint Capela – 4.80
- Kawhi Leonard – 4.60

Box Plus-Minus is yet another plus-minus algorithm that attempts to apply prior distributions using box-score data, but instead of focusing on line-up based analysis, it focuses on rate-based analysis; and it shows with many early entries. Due to this, BPM fails eye tests and requires filtering, which we will perform to at least make BPM palatable. As a side note, every team I have ever worked with has dismissed Box Plus-Minus; even more so since the arrival of PIPM.

As a side note, BPM suffers many of the exact same problems as John Hollinger’s Player Efficiency Rating (PER). And we will place these lists side by side for your viewing pleasure.

One rule of thumb in measuring contribution of players is that if you have to filter, your metric is massively flawed and should never be trusted. But filter, we must… as even Hollinger’s metric has a **qualified** tab to click on.

So to appease the ghosts of analytics past, we filter to obtain at least a reasonable top 10 list. First, Box Plus-Minus filtered on minutes played:

- James Harden – 10.0
- Giannis Antetokounmpo – 9.2
- Nikola Jokic – 8.9
- Anthony Davis – 8.8
- LeBron James – 7.6
- Kyrie Irving – 7.5
- Stephen Curry – 6.6
- Rudy Gobert – 6.5
- Paul George – 6.5
- Russell Westbrook – 6.4

And then for Player Efficiency Rating:

- Anthony Davis – 29.66
- Giannis Antetokounmpo – 28.50
- James Harden – 28.34
- Boban Marjanovic – 28.11
- LeBron James – 26.73
- Kawhi Leonard – 26.56
- Kevin Durant – 26.37
- Stephen Curry – 26.18
- Nikola Vucevic – 25.89
- Jonas Valanciunas – 25.33

Sorry, Montrezl Harrell, we left you off the PER list.

Now that we are armed with six ranking analytics, we can apply Kemeny-Young to identify the consensus Top-10 Players. For Borda counting, we would take players and arbitrarily assign **’11’** values if they do not make the list. This makes no sense at all. Instead, we simply just don’t count them in the Kemeny-Young process and treat them as “ties.”

Across the six metrics, we have a total of **25 players across the Top 10!** Well, that’s not a good sign. In this case, we will be forced to look at permutations of 25 players across the league; which is a total of 1.55×10^25. Yeah, that’s a huge number… Nonetheless, we look at the 300 total pairwise comparisons made across the six analytics.

Now the difficult task is finding the right permutation that maximizes the score across all players. For instance, suppose our Top 10 ranking is

- Paul George (101)
- James Harden (84)
- Anthony Davis (106)
- Giannis Antetokounmpo (101)
- LeBron James (74)
- Stephen Curry (67)
- Kyrie Irving (49)
- Kevin Durant (79)
- Kyle Lowry (44)
- Nikola Vucevic (41)

Then the score for this grouping is 746. But is this the highest score possible? The answer is no. In fact, we find that Kevin Durant beat Kyrie Irving in 4 of 6 categories. By swapping Durant and Irving, we lose two points for having Irving beat Durant, but gain four points for Durant beating Irving; a total of **748 points!**
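The swap logic can be sketched as a simple local search: walk down the list and swap any adjacent pair whenever more metrics prefer the lower player. This is an illustrative sketch rather than the procedure used to produce the rankings in this post, and adjacent swaps alone are only guaranteed to reach a local optimum. Players missing from a list contribute no votes for that metric, so they are treated as ties.

```python
from itertools import combinations

def pairwise_counts(ballots):
    """prefer[(a, b)] = number of ballots that rank a ahead of b."""
    prefer = {}
    for ballot in ballots:
        for a, b in combinations(ballot, 2):
            prefer[(a, b)] = prefer.get((a, b), 0) + 1
    return prefer

def swap_game(start, prefer):
    """Swap adjacent players while doing so raises the total pairwise score."""
    ranking = list(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(ranking) - 1):
            upper, lower = ranking[i], ranking[i + 1]
            gain = prefer.get((lower, upper), 0) - prefer.get((upper, lower), 0)
            if gain > 0:
                ranking[i], ranking[i + 1] = lower, upper
                improved = True
    return ranking

# The Durant/Irving swap: Durant wins 4 of 6 metrics, so swapping nets +2 points.
head_to_head = {("Durant", "Irving"): 4, ("Irving", "Durant"): 2}
print(swap_game(["Irving", "Durant"], head_to_head))

# Starting from the Borda order of the MVP example (BACDE) recovers ABCDE.
mvp_prefer = pairwise_counts(["ABCDE", "ABCDE", "BCDAE"])
print("".join(swap_game("BACDE", mvp_prefer)))
```

Each swap strictly raises the score, so the loop always terminates.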

By playing this **really hard swap game**, we are able to identify our aggregated rankings of players:

- Anthony Davis
- Giannis Antetokounmpo
- Kevin Durant
- Paul George
- James Harden
- LeBron James
- Stephen Curry
- Kyle Lowry
- Kyrie Irving
- Nikola Vucevic

In fact, there are **two ties** in the system: Antetokounmpo and Durant are interchangeable, as are Kyle Lowry and Kyrie Irving. Despite this, we are able to obtain our aggregated ranking and we can start asking questions such as “how reliable are the metrics relative to other metrics?” To do this, we can look at the permutation difference we outlined above. For further reference, this is called the Kendall Tau Distance.

With respect to each metric, we can weight the voting matrix above by distributing weight relative to the **cumulative distribution functions** for each analytic. We’ve seen this before. We can also use the associated variation with each metric to identify how “reliable” the voting metric is. We’ve seen this before as well. There are many routes to go to build the permutation distribution and develop a maximum likelihood estimator. And all are much better than being lazy and applying Borda counting.

So armed with this knowledge, which of the six metrics would you trust? Or do we trust the aggregation instead…?


Let’s start with a simple example to break down what we mean.

To start, let’s consider a Minnesota Timberwolves initial offensive scheme from last season. For this scheme, we have a 4-out and 1-in initialization with a post player starting at the block. This player will then initialize a pick-and-roll to obtain the first looks for the offense.

If we are to look at the center of mass for this offense, it would be offset to the left elbow. The perimeter players would form an arc about the center of mass with the post player looming in the interior of the arc. This is much how a 4-out / 1-in offense would be centered. However, when the motion of the PnR occurs, the center of mass moves with the offense.

At this point, the center of mass lifts into the left wing. What’s more challenging is that the point guard and the post player **interchange **positions. We watch the guard become the interior of the offense with the post settling at the top of the arc; either staying put or rolling into the key. This is common for the 4-out offense as they are attempting to open driving lanes and finding mismatches on switches.

We all know who the point guard is, and who the center is. However, if we restrict ourselves to positions such as these, we immediately **correlate** the guard and the center as their positions overlap in tracking. It’s for this very reason that we need to **de-correlate** the tracking positions, either by increasing the number of iterations (that is, watching 400+ PnR plays) or by employing **kriging**. One method of pre-processing for kriging is **role-alignment.**

The process of role alignment originally comes from tracking in soccer, and was described in a paper submitted to ICDM back in 2014. The idea in soccer is intuitive and doesn’t translate to basketball properly; but its use in borrowing strength is a net positive, as we will discuss later.

The process is straightforward:

The first step is to take tracking data, frame by frame, and identify the centroid of the offensive players. This is the very first step we performed in the Minnesota Timberwolves example above, with each image being a frame in the data. This centroid will show how the team is distributed about their center of mass.
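As a small sketch of this first step, assume each frame is a list of five (x, y) offensive player locations; that data layout and the function name `center_frame` are assumptions made for illustration.

```python
def center_frame(points):
    """Return the centroid of a frame and the player locations relative to it."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    return (cx, cy), centered

# A made-up frame: four players on the perimeter and one in the middle.
frame = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
centroid, centered = center_frame(frame)
print(centroid)
print(centered)
```

Applying this to every frame removes the motion of the offense as a whole and leaves only how the players are arranged about their center of mass.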

Next, we arbitrarily set roles. We can use a dictionary order, or a player order. Either way, we start with an ordering of roles that makes sense. In our case, we use players. The idea is that we start with a role assignment and then walk through the roles asking the question, **“Does this role at this frame make sense with respect to this role across all frames?” **

This part is the crux of the algorithm. The idea is that we take the centered distribution of each role and walk through **every combination of positions for each frame**. We build a cost function to identify the **distance** of each **player location **from **each of the five** **roles**. This builds us a **cost matrix**. And then we apply **Munkres Linear Assignment** to identify the optimal role assignment for that frame. This is called an **updated role. **

After we walk through all frames in the segment, we repeat until convergence of roles.

So let’s see this in action…

In this example, we extract out a possession between the Utah Jazz and the Sacramento Kings on November 21, 2018. In this possession, Ricky Rubio brings up the ball after a made free throw from Marvin Bagley III, and runs through a PnR action with Rudy Gobert. The track paths look like so:

As we see the tracks tangling, there is a lot of correlation between players’ paths; requiring some form of strength borrowing. By applying the centering scheme for every frame, we find that the player track paths show a much different picture.

And we no longer see clean segments of tracking. We find that some players rotate, such as the blue path. We see red is actually segmented between two groupings. What this shows are a pair of player switches: one off a screen, another off a shallow switch.

Therefore, we step through the third part of the role alignment process and perform the linear assignment until convergence. So we do just that to obtain roles within the offense.

And here we see the clusters organize much more cleanly than in the original assignment. There are still some tricky interactions going on, but we clearly see a blue, black, red, and green. The purple action in this plot is primarily the red, blue, and green swapping in the original plot. Therefore, we see the purple segments continuing to try and rotate around the center of the offense before snapping back towards the lower-left-hand side of the offense.

Refitting the roles into the track paths, we obtain these segments:

Apologies for the colors, as the roles keep swapping due to first-in, last-out assignment from Python dictionaries (R does the same, don’t worry). We see how a basketball player breaks up as a sequence of different roles throughout the possession. In soccer, the original playing field for development, it is rare to see this many role swaps. However, we still borrow strength by now defining roles as opposed to individual players. The correlation now transfers to the role swapping locations, which is a smaller set of landmarks than two overlapping track paths.

The main crux of the algorithm is the iterative process. In this case, we bring in our role dictionary, **roles**, which is just an index list of tracks. At the iterative step, we compute the distribution, **Gaussian** in this case, as a variance one, mean-estimated distribution.

We then walk through each frame of data and compute the 5×5 cost matrix as a **Kullback-Leibler** divergence between the new point and all the points for all players. After the cost matrix is assigned, we can apply the Munkres linear assignment algorithm, which is available as a Python package (for example, `munkres`, or `scipy.optimize.linear_sum_assignment`).

We copy over **roles** using a temporary **newRoles** dictionary, and repeat the process. In this simple case, we cut down the iterative process to 10 iterations. For the Utah example, we required **23 iterations before convergence**…
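The iterative loop described above can be sketched roughly as follows. This is not the original code. It assumes unit-variance Gaussians, for which the Kullback-Leibler divergence between a point's distribution and a role's distribution reduces to half the squared distance between their means, and it swaps Munkres for a brute-force search over the 120 permutations. The names `align_roles`, `roles`, and `new_roles` mirror the text but are otherwise hypothetical.

```python
from itertools import permutations
import numpy as np

def align_roles(centered, max_iter=10):
    """centered: (n_frames, n_players, 2) centroid-centered locations.

    Returns roles of shape (n_frames, n_players), where roles[t, r] is the
    index of the player filling role r at frame t.
    """
    centered = np.asarray(centered, dtype=float)
    n_frames, n_players, _ = centered.shape
    perms = np.array(list(permutations(range(n_players))))
    cols = np.arange(n_players)
    frame_idx = np.arange(n_frames)[:, None]
    # Arbitrary starting point: role r is simply player r in every frame
    roles = np.tile(cols, (n_frames, 1))
    for _ in range(max_iter):
        # Mean location of each role across all frames (variance fixed at one)
        means = centered[frame_idx, roles].mean(axis=0)
        new_roles = np.empty_like(roles)
        for t in range(n_frames):
            diff = centered[t][:, None, :] - means[None, :, :]
            cost = 0.5 * (diff ** 2).sum(axis=2)  # cost[player, role]
            # Total cost of every permutation; perms[k, r] = player in role r
            totals = cost[perms, cols].sum(axis=1)
            new_roles[t] = perms[totals.argmin()]
        if np.array_equal(new_roles, roles):
            break  # converged: no role changed in this pass
        roles = new_roles
    return roles

# Toy possession: players 0 and 1 trade locations for the last 6 of 20 frames
spots = np.array([[-10, 0], [0, 10], [0, -10], [10, 5], [10, -5]], dtype=float)
toy = np.tile(spots, (20, 1, 1))
toy[14:, [0, 1]] = toy[14:, [1, 0]]
roles = align_roles(toy)
```

In the toy possession, role 0 stays attached to the same spot on the floor even though the player filling it changes at frame 14.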

Once we can get role alignment to work, the next steps are to leverage the borrowed-strength data for machine learning algorithms. By themselves, the role alignments are meaningless and interpretation is non-illuminating. As a data science tool, they are powerful. One of the most powerful algorithms on the market at the moment is **Ghosting**. That is, the application of **Long Short-Term Memory (LSTM)** neural networks in processing average motion between offensive and defensive players.

In the Ghosting framework, instead of laboring to positions, we apply role alignment to reduce the amount of learning required to **de-correlate** overlapping tracks. Instead, we learn roles and learn the role swapping. To aid in further de-correlation, blocks of data are created for **redundancy** and **for the ball position**. Breaking down this algorithm is almost elementary at this point, as role alignment is the long pole in the tent of the Ghosting process.

So how would you change role alignment? What would you build off this role alignment algorithm? There are many ways to borrow strength. This is just one, and it happens to be fairly effective. But it’s not the only way.


Get ready for some math on this fine Christmas Eve!

The RAPM process is simply a linear regression model with a weight placed on the square of the coefficients. The idea is simple: adjusted plus-minus is a poor tool that has bloated variances due to a non-invertible distribution of players. This non-inversion bloats the coefficients for each player and gives us a false representation of how players actually play. Again, this was clearly shown in the previous example.

To combat this bloat, we place a penalty weight on the square of the coefficient. This forces bloating coefficients to fall to zero, while hopefully forcing true coefficients to stay relatively the same. It’s almost a magical process, except that all we did was place a prior distribution on the multiple linear regression model (adjusted plus-minus model), and we let the design matrix take care of the rest!

Over the previous years, I’ve been asked over 100 times for this proof; and have even written it up on the walls of three… **three**… NBA Front Offices in an effort to identify what is really going on with RAPM. Here’s the process…

We immediately see that the mean of the posterior distribution is exactly the ridge regression solution. Coding this up directly or applying **sklearn.linear_model.Ridge**, we will obtain the **exact same coefficients**. The key takeaways are these: **we must prove that offensive ratings for a given stint must be a Gaussian distribution with identical variation across all stints**, and **the variance of the posterior distribution comes along for free!**
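For readers who want the argument written out, here is a sketch of the standard Bayesian derivation, under the assumptions that stint ratings are Gaussian with common variance and that the prior on the coefficients is a zero-mean Gaussian. The symbols $\sigma^2$ (stint rating variance) and $\tau^2$ (prior variance) are my own notation, not necessarily that of the original write-up.

```latex
\begin{aligned}
Y \mid \beta &\sim N\!\left(X\beta,\ \sigma^2 I\right), \qquad
\beta \sim N\!\left(0,\ \tau^2 I\right) \\[4pt]
p(\beta \mid Y) &\propto
\exp\!\left\{ -\frac{1}{2\sigma^2}\,(Y - X\beta)^T (Y - X\beta)
              -\frac{1}{2\tau^2}\,\beta^T \beta \right\} \\[4pt]
&\propto
\exp\!\left\{ -\frac{1}{2\sigma^2}
  \left[ \beta^T \left( X^T X + \lambda I \right) \beta
         - 2\,\beta^T X^T Y \right] \right\},
\qquad \lambda = \frac{\sigma^2}{\tau^2} \\[4pt]
\beta \mid Y &\sim
N\!\left( \left( X^T X + \lambda I \right)^{-1} X^T Y,\
          \sigma^2 \left( X^T X + \lambda I \right)^{-1} \right)
\end{aligned}
```

The posterior mean is the ridge estimator, and the posterior covariance is the variance that "comes along for free."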

In case you are worried about that last step, we applied a **completing the square** step. For a quick refresher, here’s that process:
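As a sketch of that refresher, the identity below is the usual matrix form of completing the square, assuming $A$ is symmetric and invertible. Here I take $A = X^T X + \lambda I$ and $b = X^T Y$, which is my own shorthand.

```latex
\beta^T A \beta - 2\, b^T \beta
  = \left( \beta - A^{-1} b \right)^T A \left( \beta - A^{-1} b \right)
    - b^T A^{-1} b
```

The last term does not depend on $\beta$, so it is absorbed into the normalizing constant, leaving a Gaussian kernel centered at $A^{-1} b$.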

There are several different ways to compute RAPM; there is no single true answer. Some folks will force predictive errors to be minimized, but this cannot theoretically happen thanks to the elbow-like distribution of errors. Similarly, by enforcing some scheme, over time, the weight becomes stale and must be recomputed. By recomputing, previous results become obsolete when compared to current results. But, for the sake of argument, we will treat this as a mere nuisance and ignore its existence.

Another way to end up with different RAPM scores is through the construction of the **design matrix**. This is the stint matrix obtained from possessions. Some old-school RAPM creators will not separate offense and defense and compute **RAPM**. Some new-school RAPM creators will separate offense and defense and compute **O-RAPM** and **D-RAPM**. The difference between the models is fairly strong, but the results are similar. It’s fairly intuitive to suggest that some players are more offensively inclined and some players are more defensively inclined. What’s striking is that **almost all RAPM creators sum the two values to obtain RAPM**. When they do this, they are assuming that the player has played an equal number of possessions on both offense and defense. Oops.

Regardless, we can look at the components of the RAPM process. Above, we see that there are **XtX**, **XtY**, and **lambda**. The matrix **X** is the **design matrix** with **N** rows of stints and **2p+1** columns that identify **p** players on the court. In our set-up, we place a **1.0** in the first column (the constant) and in a player’s column if the player is on offense. We also place a **-1.0** if the player is on defense. The first half of the matrix holds the offensive players while the second half holds the defensive players. Of course, we will do what most people do and erroneously add the two RAPM values. Sure…. why not?

The value **XtX** is the **adjacency matrix**. The diagonal is the number of possessions a player has played on offense or defense. The off-diagonal components identify the number of **interactions** between a player and their opponent. Teammates are on offense or defense together and have positive values. Opponents have negative values, as they are on offense-defense pairings.
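To make the layout concrete, here is a small sketch of the design matrix and **XtX** for a made-up league of four players playing 2-on-2. The helper `build_design_matrix` and the stint list are hypothetical. Note that each row here is one stint, so the diagonal counts rows rather than possessions.

```python
import numpy as np

def build_design_matrix(stints, n_players):
    """stints: list of (offense_ids, defense_ids), ids in range(n_players).

    Column 0 is the constant, columns 1..p are offensive indicators (+1.0),
    and columns p+1..2p are defensive indicators (-1.0).
    """
    X = np.zeros((len(stints), 2 * n_players + 1))
    X[:, 0] = 1.0
    for row, (offense, defense) in enumerate(stints):
        for pid in offense:
            X[row, 1 + pid] = 1.0
        for pid in defense:
            X[row, 1 + n_players + pid] = -1.0
    return X

# Toy data: players 0 and 1 against players 2 and 3, trading offense
stints = [([0, 1], [2, 3]), ([2, 3], [0, 1]), ([0, 1], [2, 3])]
X = build_design_matrix(stints, n_players=4)
XtX = X.T @ X
```

In this toy case, the diagonal entry for player 0 on offense is 2, the entry pairing offensive teammates 0 and 1 is +2, and the entry pairing player 0 on offense with player 2 on defense is -2.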

The value **XtY** is the **ADDITIVE RATING** across all stints. I emphasize additive ratings as we are adding ratings regardless of the number of possessions. As an example, suppose a stint has been played twice, with ratings of **200** and **100**. The resulting value in **XtY** is **150**. In truth, the rating is really **109.09**, as the two ratings are derived from one stint with **2 points over 1 possession** and another stint with **10 points over 10 possessions**, or 12 points over 11 possessions. As a flaw with RAPM, this is a commonly accepted atrocity whenever RAPM is computed. For this season, it happens **A LOT**.
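The arithmetic in that example can be checked with a few lines. The function names are hypothetical, and each stint is written as a `(points, possessions)` pair.

```python
def unweighted_rating(stints):
    """Average the per-stint ratings, ignoring how long each stint lasted."""
    ratings = [100.0 * points / possessions for points, possessions in stints]
    return sum(ratings) / len(ratings)

def possession_weighted_rating(stints):
    """Pool points and possessions first, then convert to a per-100 rating."""
    total_points = sum(points for points, _ in stints)
    total_possessions = sum(possessions for _, possessions in stints)
    return 100.0 * total_points / total_possessions

# One stint of 2 points over 1 possession, one of 10 points over 10 possessions
stints = [(2, 1), (10, 10)]
naive = unweighted_rating(stints)              # (200 + 100) / 2
pooled = possession_weighted_rating(stints)    # 100 * 12 / 11
```

The gap between `naive` and `pooled` is the distortion that comes from treating a one-possession stint the same as a ten-possession stint.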

**Edit Note: **If we introduce a diagonal matrix, **W**, with the number of possessions along the diagonal, we can rectify this additive rating problem. However, introducing this weighting may have unintended effects.

Finally, the value of **lambda** controls the betas. This value is really the variance of the stint ratings divided by the variance of the prior distribution. For most RAPM calculations, particularly on ESPN, RAPM was being produced with a lambda value of **2000**! This means the variance on the stints is 2000 times greater than the variance on the prior distribution. No rhyme or reason other than, “it passed an eye test” through a broad range using **cross-validation error**. In fact, for a single season, selecting any value between 500 and 5000 is perfectly acceptable, hence making the 2000 a subjective selection.

**Edit Note: **There was an argument brought forth from Joe Sill, the 2010 winner of the Sloan Paper competition for RAPM, and he explicitly indicates that he boiled it down to ~2222 for lambda based on a cross-validation error and point differential per 100 possessions argument. It’s fairly lengthy, and fair. However for the above set-up for O/D-RAPM, a range of 500 – 5000 is still too broad to state the same argument holds.

And in that presentation of RAPM on ESPN, they still threw out players with a minimum minutes threshold. But as lambda goes to zero, we obtain **adjusted plus-minus**. And as lambda goes to infinity, we obtain **all zeros** for everyone. This means everyone must find a **sweet-spot** for lambda to fall.
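The behavior at the two extremes of lambda can be seen with a direct closed-form sketch. It runs on random toy data, not real stints, and the function name `ridge_posterior` and the `sigma2` argument are my own. The penalty here is applied to every column, including any constant column, which corresponds to `sklearn.linear_model.Ridge(alpha=lam, fit_intercept=False)`.

```python
import numpy as np

def ridge_posterior(X, y, lam, sigma2=1.0):
    """Posterior mean and covariance of the coefficients under a ridge prior."""
    XtX = X.T @ X
    XtY = X.T @ y
    A = XtX + lam * np.eye(X.shape[1])
    mean = np.linalg.solve(A, XtY)    # ridge coefficients
    cov = sigma2 * np.linalg.inv(A)   # posterior covariance, given sigma2
    return mean, cov

# Random toy data purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=50)

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_small, _ = ridge_posterior(X, y, lam=1e-10)   # lambda near zero
beta_large, _ = ridge_posterior(X, y, lam=1e12)    # lambda very large
beta_mid, cov_mid = ridge_posterior(X, y, lam=5.0)
std_mid = np.sqrt(np.diag(cov_mid))                # per-coefficient std devs
```

With lambda near zero, the coefficients match ordinary least squares, the adjusted plus-minus analogue. With a huge lambda, they collapse toward zero. The square roots of the diagonal of the covariance are the standard deviations that should be reported alongside each coefficient.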

For our results, we compute stints as consecutive possessions played by a group of **ten** players. The way that I compute possessions is quite different from many other RAPM creators. For instance, the definition of possession is **not uniform across the league**. While my method of counting possessions matches the end-of-game results, distributing such possessions raises eyebrows. For instance, if a substitution is made during a free throw and an offensive rebound occurs with a putback score, instead of **double counting the possession**, I say the second unit has given up **2 points over zero possessions**, as they had an empty possession on which they yielded a score. Other folks will **double count possessions**. Others will **count half-possessions**. Either way you slice that possession, you induce an implicit bias in the direction of either unit. My implicit bias is that you should be penalized for not securing an offensive rebound despite being placed in the best position, given the rules of the league.

In a similar manner, I tack technical fouls onto current possessions. Many folks treat technical fouls as new possessions. Therefore, in games where three technical fouls occur on one possession, I count it as one possession, while another person will count it as **four possessions: three possessions with at most one point**. And if we look at the computation of **XtY** above, you see this will have grave effects on the resulting distributions. In fact, the biggest discrepancy **year after year** is that **Kevin Durant** benefits from technical fouls like no other. It happens again this year, as the technical foul discrepancy I impose drops his defensive RAPM by upwards of a **point per 100 possessions**. It’s crazy to see how minor possession definitions dramatically affect RAPM. But if you’ve been following along, we see exactly why.

Now, with these caveats out of the way, let’s look at a set-up…

Through December 24th, the **Golden State Warriors** and the **Milwaukee Bucks** have already played their two games this season. The starting units of the second game, Kevin Durant, Stephen Curry, Klay Thompson, Andre Iguodala, and Kevon Looney versus Giannis Antetokounmpo, Malcolm Brogdon, Eric Bledsoe, Khris Middleton, and Brook Lopez, played a whopping **two stints** against each other. In fact, this five-some for the Warriors has played in a mind-blowing **27 stints over 34 games**. That’s a starting lineup with less than one stint played together **per game…**

Regardless, we have the same problem indicated above. One stint is **short** while the other is a **starters’ stint**. The ratings? 66.67 for one and 100.00 for the other. Therefore, the resulting stint rating is, of course, **83.333**, when in reality it is much closer to **100**.

Excruciatingly, we must assume that these two values are enough to satisfy a **Gaussian assumption** and that the ratings do indeed form a Gaussian distribution with equal variance to all other stints. For grins, here’s the global distribution of all offensive ratings for this NBA season:

Crap…

Given how RAPM is clearly a Gaussian revisioning of a definitely non-Gaussian process, we can still compute the Top-50 RAPM players through December 24th.

If we compare this list to Ryan Davis’ Single Season Performers, we find there are some similarities. Of course, we are different thanks to possession counting, small samples, extreme confounding and the whole “PCA is rotationally invariant” thing… but the RAPM results are effectively the same.

Wait… effectively?

In our given profession of NBA analytics, if someone doesn’t report the associated standard deviation with their analytic result they are either lazy or being malicious. As a statistician, no one cares about the expected value; they care about the error associated with an expected value. It’s typically coined as **bias **and **variation**. We do the same in GPS… we don’t report the error estimate, but rather the **center of the error ellipse**. Which is not guaranteed to be the same.

In the computation process above, remember we obtained the variance “for free.” So let’s tack these on…

And there we have it… the **standard deviations are approximately 2 to 2.5 points per 100 possessions for each player. **So let’s see what this means for **Fred VanVleet** with respect to **Gary Clark**.

Since we are working with a Gaussian distribution, we can compute the test for comparison… we obtain a test-statistic of approximately 0.05; which has a ridiculously high p-value. This indicates the **difference between first and fiftieth is not discernible**. That’s right… being the top in RAPM is effectively meaningless from a statistical stand-point. And that’s the rub; RAPM is not an effective tool to significantly measure the impact of a player. It’s just a tool to rank guys and hope no one notices all the pitfalls along the way.
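A sketch of that kind of comparison is below. It treats the two coefficients as independent Gaussians, which ignores any posterior covariance between them, and the RAPM values and standard deviations passed in are hypothetical placeholders rather than the actual numbers for either player.

```python
from math import erf, sqrt

def compare_estimates(mean_a, sd_a, mean_b, sd_b):
    """Two-sided z-test for the difference of two Gaussian estimates.

    Assumes the two estimates are independent, which is a simplification.
    """
    z = (mean_a - mean_b) / sqrt(sd_a ** 2 + sd_b ** 2)
    # Standard normal CDF evaluated at |z|, written with the error function
    cdf = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    p_value = 2.0 * (1.0 - cdf)
    return z, p_value

# Hypothetical values: a small gap in RAPM with roughly 2 to 2.5 point std devs
z, p_value = compare_estimates(3.0, 2.2, 2.8, 2.4)
```

With standard deviations this large relative to the gap between players, the test statistic sits near zero and the p-value near one.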

And it’s this primary reason that **three-year RAPM** becomes popular. In this case, the error variation drives down a bit, but the same problems exist. In fact, the tails will start to separate, but the middle of the pack still looks the same. For one team (over the years), I showed them that players between 100 and 300 were nearly identical.

So armed with this knowledge, what would you do to minimize the impact of the assumption fallacies and the associated standard deviations? Regardless… now you know!

**Disclaimer: Our discussion of RAPM over the previous year has been focused on offensive and defensive versions of the original model developed by Joe Sill. For the work developed by Jerry Engelmann, he focuses more on single possession stints; whereas Joe focuses on net-differential stints. Due to this, Jerry does not require weights and Joe does. Similarly, Jerry is able to produce O/D-Ratings and Joe does not, explicitly. **

**In the work presented here, we focused on unweighted stints with O/D-ratings. By pushing in weights, we rectify a couple of addition problems but do not see much improvement on the confidence bounds. This is a function of the majority of stints lasting 3 or fewer possessions, causing us to lose the Gaussianity assumption.**

**This method of write-up is to avoid directly critiquing the work of Joe and Jerry; but alluding to potential issues when the models are tinkered with… such as possession counting, using single-seasons, or partitioning stints. Let alone, the biasedness of the results and the lack of interpretability of the coefficients; as they are indeed not points per 100 possessions for that respective player; but rather a biased estimate.**

**One takeaway I’d like to point out is that this methodology is a massive step forward from Adjusted Plus-Minus and is an important basis for further modeling; such as RPM and PIPM… and even some models I have developed directly for teams that are still in use today. However, understanding that coefficient confidence bounds are much more important than the estimates is key here. Especially if you are trying to use RAPM to help make a decision. **