One of the question marks coming into this season was the isolation tendency of players such as Jimmy Butler and Andrew Wiggins, particularly when it comes to spacing and shot creation. In this article, we break down the offensive schemes of the Timberwolves, their rotations, and the associated statistics indicating the quality of player interaction.

First, we take a look at the rotations of the Timberwolves. A **rotation** is defined as a period of time played by a particular set of five players. The collection of the first five players on the court is called the **starting rotation**. **Stability** is then defined as having rotations that typically last the longest on the court. Stability can be either a blessing or a curse for coaching staffs. If a team is stable, then the rotations that play lengthy periods of time are playing either because they are successful with limited fatigue **(solid rotations)**; or because the team is in dire straits and maintains a short bench **(stretching rotations)**. Similarly, unstable teams may either have several quality, yet interchangeable, players **(distributed rotations)**; or the team is platooning players in hopes of either gaining experience or finding players capable of earning minutes **(platooning)**.

To determine a team’s rotation, we take a look at their **common rotation**. A common rotation is defined as the rotation that typically plays during a given second of game time. For Minnesota, with 15 games played, we query the rotations that are on the court at each second of game time; at most, there are 15 distinct rotations at any second. The rotation that appears in the maximum number of games for that particular second is defined as the common rotation.
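As a sketch of this computation (the data structures here are hypothetical; the article does not specify an implementation), the common rotation at a given game-second is simply the most frequent lineup across games:

```python
from collections import Counter

def common_rotation(lineups_at_second):
    """Given the five-man lineup on the court at a fixed game-second for each
    game played (here, hypothetical frozensets of player names), return the
    lineup appearing in the most games, along with that game count."""
    lineup, games = Counter(lineups_at_second).most_common(1)[0]
    return lineup, games

# hypothetical example over three games at the same game-second
starters = frozenset({"Wiggins", "Butler", "Towns", "Teague", "Gibson"})
bench = frozenset({"Dieng", "Crawford", "Bjelica", "Muhammad", "Jones"})
lineup, games = common_rotation([starters, starters, bench])
```

Running this over all 2,880 seconds of a regulation game traces out a team's common-rotation schedule.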

The distribution of common rotations ranges between 12 at its minimum and 51 at its maximum. To put this in perspective, this means that rotations, on average, range between **56 seconds** and **4 minutes**. The average rotation lasts **91.78 seconds**.

From the distribution of the number of rotations, we see that teams are split into two camps: **stable** and **unstable**. If we look at the win percentages for each team, we start to see the separation into **platooning** and **solid rotations**.

We see that teams with a high number of rotations tend to be teams struggling to find optimal line-ups that can sustain a high level of performance. Similarly, teams with a low number of common rotations are teams with stable, high-performance offenses such as **Houston**, **Golden State**, and **Minnesota**.

The upper-right quadrant contains winning teams that have a high number of common rotations over their first 14-17 games. These teams? **Boston** (Hayward injury), **Milwaukee** (Bledsoe–Monroe trade), **Philadelphia** (limiting Embiid’s minutes), **Cleveland** (age, multiple solid players), **San Antonio** (age, multiple solid players).

In the lower-left corner of the plot, we find teams with stable rotations that nonetheless find themselves in losing situations. We actually see a downward trend, indicating that the more a team loses, the more likely it is to start platooning. This is the case with the **Los Angeles Lakers** and **Chicago Bulls**.

Despite having only 12 standard rotations, the Timberwolves have played a total of **52 different rotations** across 15 games. In comparison, the Boston Celtics have played **157 different rotations across 16 games!** What this shows is that the Timberwolves have maintained stability through player capability, injuries, and roster changes; and that Tom Thibodeau maintains a fairly predictable rotation schedule.

The primary rotation for Minnesota is **Andrew Wiggins**, **Jimmy Butler**, **Karl-Anthony Towns**, **Jeff Teague**, and **Taj Gibson**. Together, this unit has participated in 19,837 seconds of action. That is, this unit has played together for **330 minutes and 37 seconds**, which equates to **6 games, 42 minutes and 37 seconds** of action; the most of any unit in the entire league.

Comparing how the starting rotation stacks up, we find the unit has played far fewer offensive possessions than defensive possessions. These situations commonly occur when free-throw shooting becomes a requirement late in games and the offense-defense substitution pattern takes effect. Despite playing 28 fewer offensive possessions than defensive possessions, the starting unit maintains a **plus 37** in scoring. In effect, the starting rotation scores **1.19 points per possession** while holding opponents to **1.09 points per possession**. While this is not the best in the league, the differential over a high volume of minutes played is promising.
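The per-possession arithmetic can be sketched as follows. The possession and point totals below are hypothetical values back-solved to match the rates quoted above; they are not figures from the article:

```python
def per_possession(points, possessions):
    """Points scored (or allowed) per possession."""
    return points / possessions

# Hypothetical totals: roughly 675 offensive and 703 defensive possessions
# (28 fewer on offense), with 803 points scored and 766 allowed, reproduce
# the stated 1.19 / 1.09 rates and the +37 scoring margin.
off_ppp = per_possession(803, 675)
def_ppp = per_possession(766, 703)
margin = 803 - 766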

Thanks to the physical abilities of Towns and Gibson, this rotation also dominates the boards, **out-rebounding opponents by 49 rebounds in 28 fewer possessions**. This would not be eyebrow-raising if the Timberwolves were a terrible shooting team; **but they aren’t**. This rotation has **only four more misses than their opponents** on field goal attempts. This works out to a Timberwolves rebounding percentage of **54.19%**. Over a large number of field goal attempts, this is a wildly high rebounding differential over their opponents.
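The rebounding-share arithmetic is a one-liner; the rebound totals below are a hypothetical reconstruction consistent with the two figures the article states, not reported counts:

```python
def rebound_share(team_rebounds, opponent_rebounds):
    """Fraction of all rebounds collected by the team while this unit plays."""
    return team_rebounds / (team_rebounds + opponent_rebounds)

# Hypothetical totals: 317 vs. 268 rebounds reproduces both the stated
# +49 differential and the 54.19% rebounding share.
share = rebound_share(317, 268)
```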

The second most common rotation has played 3,908 seconds together. This rotation consists of **Gorgui Dieng**, **Jamal Crawford**, **Nemanja Bjelica**, **Shabazz Muhammad**, and **Tyus Jones**. This rotation is considered the **second string rotation**, as all five are bench players but log the most time on the court after the starting unit.

**Note:** The third rotation is a mixture of the starters and second string: **Andrew Wiggins, Gorgui Dieng, Jamal Crawford, Nemanja Bjelica**, and **Tyus Jones**. This rotation has played a total of 1,966 seconds.

We see that the second unit, much like the starting rotation, outscores opponents handily; outscoring opponents by 26 points over 6 extra possessions: **1.16 points per possession vs. 1.02 points per possession**.

While stability is maintained by both the starting and second-string rotations, it’s the transition between them that creates problems for the Timberwolves. For instance, the mixed rotation of **Jamal Crawford, Jeff Teague, Jimmy Butler, Karl-Anthony Towns,** and **Shabazz Muhammad** has been outscored **1.45 points per possession to their 0.96 points per possession**, albeit over only 24 offensive and 22 defensive possessions together. This equates to losing 1-3 points per game. Swapping **Andrew Wiggins** in for **Shabazz Muhammad** is no better; that unit loses an extra **0.25 points per possession**, costing the Timberwolves an average of a point a game.

As we look at the common rotation strategy employed by Thibodeau, we see the progression of scoring over the course of a game.

We see that the starting rotation typically starts the game, finishes the first half, starts the second half, and finishes the game. Their usual stretches are the first 9:05 of the first quarter, the final 5:47 of the first half, the first 8:30 of the second half, and the final 7:42 of the game. Needless to say, this is Thibodeau’s main unit in **every quarter**.

From the above plot, we also see that the Timberwolves are more of a second-half team. Their standard rotations are consistently outscored in the first half, even midway into the third quarter. Despite this, Minnesota turns on the jets and outscores opponents in the second half, when their standard rotations are on the court.

With **Jimmy Butler** and **Andrew Wiggins** as primary players, the Timberwolves have a pair of notorious isolation scorers. Similarly, with **Karl-Anthony Towns**, **Taj Gibson**, and **Shabazz Muhammad**, the Timberwolves also have a strong interior presence. To make these two components work, Minnesota requires deep threats to deter defenses from blitzing the (obvious) pick-and-roll offense. Those shooters are **Nemanja Bjelica**, **Jeff Teague**, and **Jamal Crawford**.

Despite this, the Timberwolves are not a “bombs away” type of team. Minnesota has launched only 342 three-point field goal attempts, connecting on 129 for a 37.7% rate. This is only 22.8 three-point field goal attempts per game, **the second-lowest total in the league**, ahead of only the **Sacramento Kings** (21.4 attempts per game).

What this indicates is that the Timberwolves play an almost entirely inside game.

If we take a look at the shot distribution of the Timberwolves, we find that the team clusters about the arc, as well as inside the paint. However, the Timberwolves also have one of the highest rates of mid-range jump shots in the league.

Jimmy Butler takes approximately 13 field goal attempts per game and scores roughly 11.2 points per game from the field.

As a wing player on offense, a hefty amount of Butler’s field goal attempts come from the 15-18 foot range, primarily from the wing positions. For a 40.0% field goal shooter, this is not a desirable shot profile. If we color-code makes and misses, we find that the majority of those misses come from 12-18 feet out.

The other premier wing scorer is Andrew Wiggins, who shoots roughly 45.5% from the field, accounting for 15.4 points of production from the field. Wiggins’ game is eerily similar to Butler’s: isolation plays, attacking the rim if possible, but pulling up from mid-range if contested.

Again, we see a slew of mid-range jump shots, with the majority being missed. While this bodes poorly for players like Wiggins and Butler, they have the added advantage of knowing that either Taj Gibson or Karl-Anthony Towns is underneath the rim, able to stalk offensive rebounds. Recall from above that this is to the tune of +20 offensive boards over their opponents. That is, **28.9% of rebounds during the Timberwolves’ offense end up back in Minnesota’s hands**.

That Karl-Anthony Towns is a better three-point shooter than Wiggins and Butler is more a testament to the under-development of Wiggins and Butler as outside scoring threats than to the development of Towns as a perimeter scorer. Despite his size, Towns has a high probability of taking a mid-range jump shot, as he attempted 35 over the first 15 games of the season. Compare this to his 54 attempts beyond the arc, and we find that over 40% of Towns’ field goal attempts come from outside the key. Both his field goal percentage from this range (33-for-89; .371) and his low propensity for obtaining rebounds from these positions on the court are desirable for defenses.

Given this, Towns scores approximately 17.3 points per game from the field on 14.7 field goal attempts, displaying decent efficiency.

Jeff Teague is the other primary scorer on the Timberwolves. Averaging roughly 11.5 points per game from the field, Teague plays in a very centralized location on the court. As the primary ball handler in the offense, Teague obtains most of his attempts from the top of the key and from penetration into the key.

Looking at his distribution of field goal attempts, there are only a handful of attempts outside of the 60-degree wedge from the basket. While Teague has better success from mid-range than Butler, Wiggins, and Towns, he finds himself with difficult shots in the paint, missing a majority of these attempts. Of his 113 field goal attempts from within the arc, a total of 45 are taken inside the paint but outside of the charge circle. Of these 45 attempts, Teague managed to convert only 14: **a conversion rate of 31.11%**. Why are these shots important? These attempts stem from the standard offensive pattern for Minnesota, which results in **floaters**.

The reason these jumpers are commonly taken in the mid-range is purely due to the offensive game plan. The standard offensive game plan is a low-motion screen-and-roll offense. This action will force a 2-on-2 game between the ball handler and the post inside the lane. As a direct result, Minnesota will either score in the post or obtain a mid-range jump shot. If both looks are well-guarded, then a pass to the perimeter opens up extra looks. However, the offense can be stagnant at times, as we shall soon see.

Minnesota attempts to play to their strengths of strong isolation wing scorers and a dominant low-post scorer. In an effort to create spacing, Thibodeau leans heavily on the pick-and-roll offense. To give an example, in their recent game against San Antonio, Minnesota ran 91 offensive possessions and ran the pick-and-roll on **74 of those 91 possessions**. The remaining 17 possessions included fast-break attempts and possessions that resulted in immediate fouls.

Minnesota creates spacing by remaining relatively stagnant on the perimeter while allowing their premier big man to pull the defense into the paint. Their initial offense will look like a four-out, one-in motion offense, but it is designed to place two bigs in the same short corner of the court.

This initial offense allows for the post to set a screen at the top of the key. Since the other three players are out at the perimeter, this creates a 2-on-2 within the key. At this point, a mid-range jumper is taken, a slip pass to the rolling big is given, or a kick out to the perimeter is initiated.

Let’s see it in real time.

In this clip, we see Towns screen for Butler. **LaMarcus Aldridge** and **Danny Green** cover the screen well, forcing a kick out to Towns as he is unable to roll. Picking up a one-on-one against Aldridge as Butler rolls out to clutter the left-hand wing, Towns drives to the hoop, forcing the entire Spurs defense to collapse into the lane. This leaves **Taj Gibson** open in the corner for an open three. While not a primary three-point option, Timberwolves bigs are trained to shoot the three. In this case, Gibson connects for the first basket of the game.

Here we find one of our first **wedge screens**, which are common in Thibodeau’s offense. Towns sets the screen for **Tyus Jones** from the left elbow. Setting a second screen, Towns frees **Nemanja Bjelica**. Bjelica hesitates on the perimeter attempt and kicks out to **Jamal Crawford**. This results in a sideline screen-and-roll which leads to a Towns 2-point basket.

This is probably the most sophisticated version of the Minnesota offense. Again, the shooters are planted in the corners; this time it’s Teague and Wiggins. Here, Towns and Bjelica set a dual staggered screen on Butler. Bjelica breaks off a secondary pin down on Teague in a twist action to free Teague for penetration. **Joffrey Lauvergne** picks up Teague, leaving Bjelica free to float to the extended elbow. Teague kicks out to Butler, who swings back to the open Bjelica for three.

The above possession starts with an overloaded right side, but quickly morphs back into the standard formation with Towns setting the screen on Butler. Teague and Gibson float to the corners as Wiggins creeps up along the sideline. With **Pau Gasol** and Danny Green reading the screen and roll, they entice Towns to become the long range shooter. Towns obliges and hits a low percentage basket from 20 feet out.

Here is a classic action from the Minnesota arsenal. In this case, the transition offense looks for a quick post up for Towns but doesn’t find it. Instead, Gibson and Teague look for the pick and roll at the top of the key. Teague penetrates and flips the floater at the free throw line. As usual, the basket does not fall.

Here, Minnesota breaks from standard formation to run a Warriors-style offense. Butler slips a faux screen and turns into a wheel cut underneath the basket, coming off a weak-side staggered pin down. Towns, in turn, sets the pick-and-roll. **Patty Mills** chases Teague over the top of the screen, forcing Aldridge into a 1-on-2 against Teague and Towns. The options here are to find Butler coming off the screen for a jumper or to take the mid-range attempt. Again, Teague takes the floater in the lane; this time for success.

Back to the patented screen-and-roll action, Teague is caught losing his dribble as **Patty Mills** and **Danny Green** collapse onto Taj Gibson. Teague skips the ball to Green’s man, Jimmy Butler, which results in an open look for three.

Once again with the standard formation, Gasol is forced to cover Tyus Jones. This allows Towns to slip freely down the lane for an uncontested dunk.

We see that the secondary unit is once again running more sophisticated plays, as they run a wheel screen with Tyus Jones. Gibson sets the screen on Crawford; however, Crawford pulls the pick-and-roll along the three-point line, allowing Danny Green to hedge the roll. Having to reset the offense with 4.9 seconds remaining on the shot clock, Minnesota goes into scramble mode, taking a difficult floater in the lane. As is often the case with Karl-Anthony Towns on the floor, the rebound is left unboxed and Towns slams home the offensive rebound.

Back to classic pattern with the primary offense on the floor. Towns comes to set the screen as little motion occurs on the perimeter. Butler takes the jumper at the elbow, misses, but manages to collect the long rebound and get fouled in the process.

Again out of standard formation, Gibson sets the screen for Teague. Teague penetrates and finds Gibson, who is fouled on the ensuing attempt.

Here is the second time we see the staggered dual pin down screen play. We start from standard formation, but the San Antonio defense responds by not letting Towns roll. Butler kicks out to Towns, who resets the offense, waiting for the wheel screen from Wiggins on the pin downs from Gibson and Butler. As this weakside motion goes on, Towns and Teague run a pair of screens, allowing Teague to drive baseline. Instead of kicking out to Butler, as Green hedges back for the potential kick-out, Teague takes the reverse lay-up attempt and misses.

Another pick and roll with Towns and Teague. Another series of no movement on the perimeter. Results in a Teague floater, but a foul on San Antonio.

Another screen and roll with a mid-range jumper from Butler. Another miss.

With a slight wrinkle with Gibson and Butler, the offense starts 12 seconds into their possession with a Towns on Teague screen. With a nice slip pass from Teague, Towns gets to the rim uncontested for another dunk.

With these 15 plays out of 94 possessions, we have provided some insight into the Minnesota offense. The reason the Minnesota offense works is the ability of the wings to penetrate and the posts to dominate the paint. Thibodeau’s offense is stagnant with little weak-side movement, and it shows in several of the plays above. Despite this, strong guard play (Teague was 7-of-13 from the field) and slick shooting from Towns (10-of-18, including 2-for-2 from beyond the arc) kept the Timberwolves ahead in this game. Contrary to their usual form, Minnesota shot 9-of-18 from three, giving them some leeway later in the game.

So far the offense has worked in Minnesota’s favor, as they are currently 10-5 through 15 games. But how could the offense improve to better leverage the stars on the Minnesota roster? And will it continue to work as the season wears on?


The most common model of this type is the **Bradley-Terry model**.

The basic form of the Bradley-Terry model focuses on pairwise match-ups between teams, dependent on location, and records whether the home team has come away with a victory. To model this, the **explanatory matrix**, **X**, is an **N-by-31** matrix of variables, where each row of the matrix represents a **game**. We use 31 variables to identify the 30 NBA teams, as well as an **intercept term** to correct for the global mean of the observed results.

With the idea that each row of the explanatory matrix is a game, we indicate the home team with a “+1” and the away team with a “-1” value. For the intercept, we set the last value in the row to “+1.” For instance, last night the **Boston Celtics** defeated the **Los Angeles Lakers** 107-96 in **Los Angeles, CA**. If we place each team in alphabetical order, the Celtics are the **second entry in the row**, after the Atlanta Hawks, while the Lakers are the **14th entry in the row**, after the Los Angeles Clippers. This means that the second entry in the row is “-1” as the Celtics are the visiting team, and the 14th entry is “+1” as the Lakers are the home team. The final entry is the value **one** to correct for the intercept.

This means that the row of the explanatory matrix is given by

**(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)**
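A minimal sketch of constructing such a row in Python (the `team_index` mapping here is a hypothetical fragment of the full alphabetical dictionary):

```python
def bradley_terry_row(home_team, away_team, team_index, n_teams=30):
    """One explanatory-matrix row: +1 for the home team, -1 for the away
    team, and a trailing +1 for the intercept."""
    row = [0] * (n_teams + 1)
    row[team_index[home_team]] = 1
    row[team_index[away_team]] = -1
    row[n_teams] = 1  # intercept column
    return row

# alphabetical indices for the two teams in the example (Atlanta would be 0)
team_index = {"Boston Celtics": 1, "Los Angeles Lakers": 13}
row = bradley_terry_row("Los Angeles Lakers", "Boston Celtics", team_index)
```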

Now, if we are able to write the result of the game in terms of a random variable conditioned on the teams playing and location played, a common question to ask is whether we can fit the response to a linear model. Why linear? Consider this fit:

Here, **Y_i** is the **response for Game i**. This can be whatever response we feel is right. Want to use the result of the game? Want **“+1” for a home win, “-1” for a road win?** Sure, go ahead. However, we must be careful of the **resulting distributional properties** after we make our decision. For instance, least squares **will not work properly** if we use “1” and “-1.”

The **beta values** are the **weights** given to each explanatory variable in predicting, or explaining, the response. In the Bradley-Terry model set-up, let’s apply this to the **Celtics-Lakers match-up**:

This means the linear model compares the Lakers and Celtics by looking at their coefficients. If the Celtics are the better team, then their coefficient will be **larger** than the Lakers’ coefficient; provided the **larger the response, the more likely a team wins the game**.

Using the explanatory matrix notation above, we immediately see that **beta_0** is the **league-average home-court advantage**.

The **epsilon term** is simply the additive error in the model. This means that when two teams play each other and one is favored, there is some associated error that can produce an unlikely response; this accounts for results like the Sacramento Kings defeating the Oklahoma City Thunder on November 7th.

This given model is not quite correct, because the response is not well understood. Persisting with usual least squares will lead us to predicting **real values** instead of **win-loss** outcomes, which are what we are ultimately after.

To account for this, enter **logistic regression**.

Logistic regression is a methodology for identifying a regression model for **binary response data**. If you are familiar with linear regression, then the following explanation can be skipped; jump down to the applications to NBA data. If you are unfamiliar, strap in; it’s going to be a mathematical ride.

The Bradley Terry model makes an assumption that each game played is an **independent Bernoulli trial**. This means that it’s a coin flip, where the coin is weighted by a function of the teams playing and where they are playing. The distribution function is no different than that of Colley’s initial **independent model** without using the **beta prior distribution**. The Bernoulli distribution is given by
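A reconstruction of the formula, where x is the response and p the home-team win probability:

```latex
\[
  f(x \mid p) = p^{x}(1 - p)^{1 - x}, \qquad x \in \{0, 1\}
\]
```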

Here, **p** is the probability that the **home team will win the game**. The value, **x**, is actually the **response**. Not to be confused with the values of **X** above. The response is either a **1** if the **home team wins** or **0** if the home team loses.

The Bernoulli distribution falls under a class of models called the **exponential family model**. In regression, if we are able to write the distribution of a model in an exponential family format, we are able to identify a **link function** that allows us to build a **linear model** to understand the relationship between the explanatory variables and the response. The exponential family format is given by
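A reconstruction of that format, matching the terms defined below:

```latex
\[
  f(x \mid \theta) = h(x) \exp\bigl(\theta\, T(x) - A(\theta)\bigr)
\]
```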

The value **h(x)** is the base (carrier) measure; this is not entirely necessary for our purposes. The value, **T(x)**, is called the **sufficient statistic**. This identifies a compact way to collapse (or aggregate) the data in a manner that the distribution associated with the parameter space is unchanged. In simple terms, this is a **data reduction statistic**.

The value **theta** is called the **natural parameter**. This value identifies the link between a linear model and the parameter space identified by the sufficient statistic. The value, **A(theta)**, is the **moment generating function** (more precisely, the cumulant generating function) associated with the distribution. If we take derivatives of this function, we obtain the **moments** of the associated distribution.

Let’s start by showing the Bernoulli distribution is indeed an exponential family model.
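One way to sketch the rewrite:

```latex
\[
  p^{x}(1-p)^{1-x}
  = \exp\bigl(x \log p + (1 - x)\log(1 - p)\bigr)
  = \exp\Bigl(x \log\tfrac{p}{1-p} - \bigl(-\log(1 - p)\bigr)\Bigr)
\]
```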

We find that **h(x) =1**, **T(x) = x**, **theta = log(p / (1-p))**, and **A(theta) = -log(1-p)**. The sufficient statistic shows that the data point itself contains all information about the parameter **p**.

This shows that the link is the natural log of the ratio of probability of success divided by the probability of failure for the home team. Think of this as the log of the **odds ratio** for a home team winning a game.

While we are here, let’s verify that the moment generating function indeed yields moments. The term –**log(1-p)** is not in terms of the natural parameter. So let’s first find that. We start with understanding what the natural parameter looks like:
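Inverting theta = log(p / (1 - p)) for p, a sketch:

```latex
\[
  p = \frac{e^{\theta}}{1 + e^{\theta}}, \qquad 1 - p = \frac{1}{1 + e^{\theta}}
\]
```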

We then substitute this in to the moment generating function to obtain a function of the natural parameter:
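Substituting, we can sketch the result as:

```latex
\[
  A(\theta) = -\log(1 - p) = -\log\frac{1}{1 + e^{\theta}} = \log\bigl(1 + e^{\theta}\bigr)
\]
```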

We needed that negative! Look at the exponential family model and notice that there is a negative lurking there. Here, we obtain it explicitly from rewriting the moment generating function in terms of the natural parameter! Now, let’s take the derivative with respect to the natural parameter.
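That derivative, sketched:

```latex
\[
  A'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = p
\]
```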

We can check the second derivative as well:
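A sketch of the second derivative:

```latex
\[
  A''(\theta) = \frac{e^{\theta}}{\bigl(1 + e^{\theta}\bigr)^{2}} = p(1 - p)
\]
```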

So we are indeed in business as these are the first moment and second central moment (variance) of the Bernoulli distribution!

Now that we have an exponential family distribution with identity sufficient statistic, we can apply the link function. This is merely setting the **response** of the **linear model **above to being the **link function**. Explicitly, we have that
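For game i with explanatory row x_i, a sketch of the resulting link equation:

```latex
\[
  \log\frac{p_i}{1 - p_i} = x_i^{\top}\beta
\]
```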

Note that since the probability of the home team winning the game is between 0 and 1, we have that the odds ratio is between 0 and infinity. Taking the natural logarithm, we obtain a value between negative infinity and positive infinity. We are in great shape for regression!

However, since we **do not know **p, we cannot just solve these equations for the number of games we have for the season. Instead, we look back at our exponential family for help. This process is called performing a **logistic regression**. The link function above, connecting **p** to **theta**, is called the **logistic link**.

To perform the logistic regression, we do exactly as we do in standard least squares regression. We look at “squared error distance” between the model and the results and minimize these errors. In linear regression, we assume the **Gaussian distribution**. When placed into the exponential family model, we get **squared error** loss. That’s not explicitly the case here.

Let’s look at how we do this in the Gaussian case. In the basic linear regression problem, we assume **homogeneity**. This means that the variances are constant and fixed. We still have to estimate them, but they are viewed as constants.

The **negative log-likelihood **of the exponential family identifies our loss function. In this case, for Gaussian distributions, we obtain
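A sketch of that reduction, written as two arrows:

```latex
\[
  -\log \prod_{i} \frac{1}{\sqrt{2\pi\sigma^{2}}}
      \exp\Bigl(-\frac{(y_i - x_i^{\top}\beta)^{2}}{2\sigma^{2}}\Bigr)
  \;\Rightarrow\;
  \sum_{i} \frac{(y_i - x_i^{\top}\beta)^{2}}{2\sigma^{2}} + \text{const}
  \;\Rightarrow\;
  \sum_{i} (y_i - x_i^{\top}\beta)^{2}
\]
```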

Voila! We have least squares sitting in front of us. To recount, the first arrow simply applies the negative logarithm to the exponential family distribution for the Gaussian. The second arrow just dusts off the constant terms that have no effect on the minimization procedure.

I will leave it to the reader to verify the exponential family form of the Gaussian, which results in the **identity link, theta = mu**. For now, let’s do the **exact same thing here for the Bernoulli distribution**.

Taking the negative log-likelihood for the Bernoulli distribution, we obtain
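A sketch of that negative log-likelihood and its derivative (the second arrow):

```latex
\[
  -\log L(\beta)
  = -\sum_{i}\Bigl[y_i\, x_i^{\top}\beta - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr)\Bigr]
  \;\Rightarrow\;
  -\sum_{i} x_i \Bigl(y_i - \frac{e^{x_i^{\top}\beta}}{1 + e^{x_i^{\top}\beta}}\Bigr)
\]
```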

In the Gaussian case, we simply take the derivative, set it equal to zero, and solve. In the Bernoulli case, we took the liberty of taking the derivative, which is the second arrow above. Setting this equal to zero and solving **does not work**. Thanks, logistic function…

Therefore, an iterative scheme must be adopted. The simplest one out there is **Newton’s Method**. Newton’s method is a calculus-based method that uses tangent lines to iteratively solve for a root (zero value). This inherently requires the functions we are optimizing to have **nice tangent (derivative) behavior**. To compute Newton’s method, we take the function in question with a **good starting point** and compute the tangent line of the function evaluated at that starting point. Where this tangent line **intersects the x-axis** gives us an update for where the zero most likely is.

Since our function we are attempting to solve the root for is the derivative, we need to take the second derivative of the negative log-likelihood function:
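A sketch of that second derivative, with W the diagonal matrix of Bernoulli variances p_i(1 - p_i):

```latex
\[
  \frac{\partial^{2}\bigl(-\log L(\beta)\bigr)}{\partial\beta\,\partial\beta^{\top}}
  = \sum_{i} x_i x_i^{\top}\, p_i(1 - p_i)
  = X^{\top} W X
\]
```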

The final line is the Hessian value. Piecing this together, we obtain Newton’s method for solving for the coefficients of Logistic Regression!
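A sketch of the resulting update:

```latex
\[
  \beta^{(t+1)} = \beta^{(t)} + \bigl(X^{\top} W X\bigr)^{-1} X^{\top}\bigl(y - p^{(t)}\bigr)
\]
```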

After choosing a good starting point, we run this until a desired convergence. The output is the **beta** vector of weights for each team. The larger the weight, the **higher associated probability** that team has of winning.

With the Celtics playing at the Lakers on November 8th, we can look at all games up until November 7th and compute the Logistic Regression using the Bradley-Terry formulation. In this case, we obtain the following rankings:

Here, we find that the Boston Celtics’ coefficient is **2.0004** while the Los Angeles Lakers’ coefficient is **0.4828**. Not shown above is the intercept, which is **0.2229**. Piecing this together, we take note that the Celtics are the visiting team. Therefore the linear model is given by **0.2229 – 2.0004 + 0.4828 = -1.2947**. Placing this into the **logistic link function**, we obtain a **Los Angeles Lakers probability of winning this game of 21.5058%**.
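This calculation can be checked directly; the coefficients below are the ones quoted above:

```python
import math

def logistic(eta):
    """Inverse of the logistic link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# home team = Lakers, visitor = Celtics; coefficients from the fitted model
eta = 0.2229 - 2.0004 + 0.4828   # intercept - Celtics + Lakers = -1.2947
p_home_win = logistic(eta)       # roughly 0.215: the Lakers' win probability
```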

Here, we must note that in certain cases we obtain outrageously unrealistic results. Take, for instance, a proposed game between the **Boston Celtics** and the **Atlanta Hawks** in **Boston**. Using the coefficients above, the natural parameter response is expected to be **0.2229 + 2.0004 + 1.2877 = 3.5110**. This leads to a **Boston Celtics probability of winning this game of 97.10%**.

While the Celtics are expected to win, why is this probability so absurdly high? This is in part due to the **large variation** associated with the model. The Bradley-Terry model imposes an **iteratively reweighted least squares** model where the weights are the associated Bernoulli variances for each game. This is identified immediately by placing the above gradient equation in terms of matrices.
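In matrix form, a sketch of the gradient and the resulting variance approximation:

```latex
\[
  \nabla \log L(\beta) = X^{\top}(y - p), \qquad
  \widehat{\operatorname{Var}}(\hat{\beta}) \approx \bigl(X^{\top} W X\bigr)^{-1}, \qquad
  W = \operatorname{diag}\bigl(p_i(1 - p_i)\bigr)
\]
```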

From this construction, we obtain a method for approximating the error associated with the estimation of each team’s weight, **beta_j**. In this case, we obtain a variance for each team of roughly **11,200,000,000,000,000**. This is the exact same problem we run into with **Adjusted Plus-Minus!**

This shows that while we identify a ranking, and while it is statistically correct, the associated variance effectively says that the ranking is simply that: a numerical ranking. Furthermore, this suggests, through Wald-type testing, that all teams are statistically indistinguishable.

Why did this variance inflation happen? First off, **all teams have not played each other**. This means that the **support** for the model is missing observations; the model is only fit for match-ups with at least one sample. If a match-up has not been seen, the **explanatory matrix cannot leverage the win-loss response to compare the capabilities of those two teams**. Information is lost due to the absence of observable information.

Second, the model assumes we have enough responses for each match-up in order to estimate the variance. One observation? The variance is “**zero**” in the estimable sense. In the model sense, this is effectively infinity, as no information about variability exists.

The explanatory matrix is effectively a schedule matrix. Therefore, for each match-up, we need to see multiple observations in order to adequately estimate variation of results within that schedule.

One way to correct for these issues is to follow the exact same methodology as in **adjusted plus-minus**: apply a **regularizer**. This will control variance inflation, but it also performs a singular-value-decomposition-type construction, effectively muting a team’s weight. This in turn makes that team a **baseline** for other teams to compare against.

In this case, we will be able to gain stable estimates, but at the cost of **interpretation**. In the model above, the value **e^beta** is the **odds ratio** for a team’s chances of winning. In the regularized setting, this is no longer the case.

Another way to correct is to play around with features. Try to use something other than schedule to build a Bradley-Terry model. Many folks over the years have attempted this. However, when going down this path, keep in mind the signal-to-noise problem that reared its ugly head above.

Finally, we leave you with some basic Python code to reproduce the results above.

First, we process the data. Assume files where each line is date, winner, score, loser, score. Then we simply open the file, read the lines, and hold them in memory. Similarly, we create an NBA team dictionary to use for indexing and to identify the number of games played.
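Since the original listing may not be reproduced here, a minimal sketch of this step (the CSV layout and the helper name `load_games` are illustrative assumptions):

```python
# Each line of the results file: date, winner, winner score, loser, loser score
# e.g. "2017-11-08,Boston Celtics,110,Los Angeles Lakers,99"

def load_games(lines):
    """Parse result lines into (winner, loser) pairs; count games per team."""
    games, games_played = [], {}
    for line in lines:
        _, winner, _, loser, _ = [field.strip() for field in line.split(",")]
        games.append((winner, loser))
        for team in (winner, loser):
            games_played[team] = games_played.get(team, 0) + 1
    # Team dictionary used for indexing into the explanatory matrix
    team_index = {team: i for i, team in enumerate(sorted(games_played))}
    return games, team_index, games_played

sample = ["2017-11-08,Boston Celtics,110,Los Angeles Lakers,99"]
games, team_index, counts = load_games(sample)
print(games, counts)
```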

Next, we populate the explanatory matrix and response matrix. This is just a simple sweep through the data file. We also perform some of the basic linear algebra functions that will be needed later; such as a matrix transpose, some multiplication, and rankings initialization.
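A sketch of the matrix population (using NumPy for brevity, where the original post implements the linear algebra by hand; the home/visitor sign convention mirrors the Celtics-Lakers example above):

```python
import numpy as np

def build_matrices(games, team_index):
    """One row per game: intercept, +1 for the home team, -1 for the visitor.

    The response is 1 when the home team won, 0 otherwise.
    """
    X = np.zeros((len(games), len(team_index) + 1))
    y = np.zeros(len(games))
    X[:, 0] = 1.0                            # intercept (home-court) column
    for row, (home, visitor, home_won) in enumerate(games):
        X[row, 1 + team_index[home]] = 1.0
        X[row, 1 + team_index[visitor]] = -1.0
        y[row] = 1.0 if home_won else 0.0
    return X, y

games = [("LAL", "BOS", False), ("BOS", "LAL", True), ("LAL", "BOS", True)]
team_index = {"BOS": 0, "LAL": 1}
X, y = build_matrices(games, team_index)
print(X.shape, list(y))   # (3, 3) [0.0, 1.0, 1.0]
```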

Next, we perform Newton’s method to identify a set of coefficients that in turn give us our team rankings.
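A self-contained sketch of the Newton iteration (again in NumPy; the original post hand-rolls the linear algebra). To keep this toy three-team league full rank, the third team serves as a baseline with its rating fixed at zero:

```python
import numpy as np

def bradley_terry_newton(X, y, iters=50):
    """Newton-Raphson for the logistic (Bradley-Terry) log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))    # win probabilities
        W = np.diag(mu * (1.0 - mu))            # Bernoulli variances (the weights)
        beta = beta + np.linalg.solve(X.T @ W @ X, X.T @ (y - mu))
    return beta

# Teams A, B, C; C is the baseline. Each row is a game from the winner's
# perspective (y = 1): A went 2-1 vs B, 2-1 vs C, and B went 2-1 vs C.
X = np.array([
    [ 1, -1], [ 1, -1], [-1,  1],   # A vs B
    [ 1,  0], [ 1,  0], [-1,  0],   # A vs C
    [ 0,  1], [ 0,  1], [ 0, -1],   # B vs C
], dtype=float)
y = np.ones(9)

beta = bradley_terry_newton(X, y)
print(beta)   # ratings of A and B relative to C; A ranks first
```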

The resulting rankings vector yields our team rankings. We can simply return this as a function for a later block or display accordingly.

Also note that as this is a 31-dimensional walk with a random initial start, each time we run Newton’s method we will get different (but effectively the same) results. To tighten this component, we either have to find a good warm-start location or utilize a different convergence criterion.

As of the morning of November 9th, 2017, there has been a total of 163 NBA games played. Applying Bradley-Terry to these games, we obtain the current rankings:

How would you build your own model?


This requires construction of a **Team Defensive Rating**, a **Defensive Points Per Scoring Possession**, and a **Stop Percentage**. In this article, we take a look at the construction of defensive rating. But more importantly, as it is a box score calculation, we look to see how it compares to truth by using play-by-play data.

The first calculation, **stop percentage**, attempts to identify the percentage of possessions that result in no points: **blocks, steals, defensive rebounds**. Since blocks do not necessarily end possessions, there must be some form of estimation to identify the percentage of blocks that result in termination of a possession.

Stops, as defined by **Dean Oliver**, are a two-part calculation. The first part is the **individual part**. This portion attempts to identify stops generated explicitly by the player through their **blocks**, **steals**, and **defensive rebounds**. The second part is the **team part**. This portion attempts to identify stops generated by the team while the player is on the court.

Individual stops are calculated as
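The equation graphic may not survive here, so a reconstruction of Oliver’s individual stops formula (in standard box-score notation, where DOR% is the opponents’ offensive rebounding percentage, DFG% their field goal percentage, and FMwt the “forced miss weight”):

```latex
\text{Stops}_{ind} = STL + BLK \cdot FM_{wt} \cdot (1 - 1.07 \cdot DOR\%) + DRB \cdot (1 - FM_{wt})
```

with

```latex
FM_{wt} = \frac{DFG\% \,(1 - DOR\%)}{DFG\% \,(1 - DOR\%) + (1 - DFG\%)\, DOR\%}, \qquad
DOR\% = \frac{Opp\,ORB}{Opp\,ORB + Team\,DRB}, \qquad
DFG\% = \frac{Opp\,FGM}{Opp\,FGA}
```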

Note that this is a three-part equation for **steals**, **blocks**, and **defensive rebounds**, in that order. Let’s break down this somewhat intimidating equation through each of these three parts.

The first part is **steals**. If a steal occurs, the possession ends. This is the primary reason the possession has ended.

The second part is **blocks**. In this case, a block does not necessarily end a possession. In fact, a block may also end a possession as a **defensive rebound** that may or may not be collected by the player who recorded the block. So how do we break down a block?

The first parentheses deals with rebounding relative to shooting attempts. This can actually be written down in terms of a tree diagram of **conditional probabilities**.

This term looks for only two of the four instances: **defensive rebounds when field goals are made** and **offensive rebounds when field goal attempts are missed**. The latter condition identifies possessions that **continue** after a missed field goal attempt. The former term of **defensive rebounds on made field goals** should never happen. Right? **Wrong**. They do happen, but require free throws.

The last term for blocks takes an opponent’s offensive rebounding percentage and increases it by **seven percent**. This percentage increase corrects for team rebounds. Therefore, one minus this corrected offensive rebounding percentage yields a **defensive rebounding percentage**. The blocks calculation thus identifies the **number of blocks that result in either defensive rebounds (second term) or continuation of play resulting in made field goals (first term)**.

The **defensive rebound** portion identifies rebounds when field goal attempts are missed. We again see the continuation-of-play-with-made-field-goals percentage from the blocks calculation. This time, we find the opposite values, which are missed field goals. Multiplying by the missed field goal percentage, relative to continuation and made FGs, we obtain the **number of missed field goals that terminate in defensive rebounds**.

Piecing these together, we have steals, field goals missed and defensively rebounded, blocks that are defensively rebounded, and blocks that eventually lead to baskets.

Next, we focus on the team contribution of stops when a player is in the game. It is given by the formula
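In case the formula graphic is missing, a reconstruction of Oliver’s team stops formula:

```latex
\text{Stops}_{team} = \left( \frac{Opp\,FGA - Opp\,FGM - Team\,BLK}{Team\,MP} \cdot FM_{wt} \cdot (1 - 1.07 \cdot DOR\%) + \frac{Opp\,TOV - Team\,STL}{Team\,MP} \right) \cdot MP + \frac{PF}{Team\,PF} \cdot 0.4 \cdot Opp\,FTA \cdot \left( 1 - \frac{Opp\,FTM}{Opp\,FTA} \right)^{2}
```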

Again there are three terms. The first term focuses on field goal attempts that are neither made nor blocked. Due to the inclusion of missed field goals and blocks, we have the same correction with eventually made field goals and defensive rebounds. As we look explicitly at missed field goals, the made field goals inclusion comes from possessions that **terminate on defensive rebounds** despite having a made field goal.

The second term focuses on turnovers that are not generated by steals. These are bad passes out-of-bounds, shot clock violations, traveling, double-dribbling, et cetera. The first and second terms are **scaled by minutes played**.

Since these are box score calculations, there is an assumption that a **uniform distribution of field goal attempts per second** is upheld. The scaling by minutes played leverages this uniform distribution.

The third and final term is the percentage of free throws off of fouls that result in zero points. The squared term is **two consecutive misses**. There is an assumption of **two free throws** on average, as one-free-throw possessions are either **continuation-of-possession free throws on made baskets** or **empty-possession technical fouls**. The value of **0.4** is the 15-year-old constant for the **percentage of free throws that are possession ending**. This value has since been updated to 0.44 in some cases, or estimated to a value near 0.43 through the use of play-by-play data.

This time instead of scaling by minutes, we scale by fouls.

Adding Individual and Team Stops, we obtain **stops** for when a player is in the game. This is the most complicated portion of identifying defensive rating. We can then calculate **stop percentage** for a player. This is given by
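Reconstructed here in case the graphic is missing, stop percentage is

```latex
Stop\% = \frac{\text{Stops} \cdot Opp\,MP}{Team\,Poss \cdot MP}
```

where Opp MP is the opponents’ total minutes (240 in a regulation game).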

This formula calculates the number of stops per possession and scales by the minutes played. Think of the formula as the following: Stops per minutes played for an individual divided by the possessions per minutes played by the team. This yields the estimated stops per possession when a player played.

Recall that possessions are a complex computation found in offensive ratings.

Defensive points per scoring possession is as it sounds. We compute the number of points scored and divide it by the number of possessions that terminate with points scored. This is also known as **chances** by other folks (thanks Seth :P). In this case, we have the estimated scoring possessions given by **field goals made** plus **free throws that result in at least one point**. The defensive points per scoring possession is given by
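Reconstructed here in case the graphic is missing:

```latex
DPpSP = \frac{Opp\,PTS}{Opp\,FGM + \left( 1 - \left( 1 - \frac{Opp\,FTM}{Opp\,FTA} \right)^{2} \right) \cdot Opp\,FTA \cdot 0.4}
```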

I use the term **DPpSP** as defensive points per scoring possession because it saves space on the formula graphic.

Team defensive rating is simple to compute. In this case, it is merely
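Reconstructed:

```latex
Team\,DRtg = 100 \cdot \frac{Opp\,PTS}{Team\,Poss}
```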

This identifies the points given up per 100 possessions.

We are finally able to calculate defensive rating. Recall the formula that started the article? If not, here it is…
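In case the graphic is missing, Oliver’s defensive rating formula is

```latex
DRtg = Team\,DRtg + 0.2 \cdot \left( 100 \cdot DPpSP \cdot (1 - Stop\%) - Team\,DRtg \right)
```

which rearranges to the 0.8/0.2 weighted form used in the worked example below.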

If we substitute in our short-hand terms, we obtain the exact same equation:

So we can work in reverse and use the above formulas to compute defensive rating. In order to compute this explicitly, we can simply identify these box score elements:

- Steals
- Blocks
- Defensive Rebounds
- Opponent Offensive Rebounds
- Team Defensive Rebounds
- Opponent Field Goals Made
- Opponent Field Goals Attempted
- Team Blocks
- Team Minutes Played
- Opponent Turnovers
- Team Steals
- Minutes Played
- Personal Fouls
- Team Personal Fouls
- Opponent Free Throw Attempts
- Opponent Free Throws Made

Let’s consider the **October 30, 2017** game between the **Philadelphia 76ers** and the **Houston Rockets**. This game resulted in a 115-107 victory for the Philadelphia 76ers. The box scores for the game are:

Let’s look at **Joel Embiid’s** defensive rating. Here, we will not count anything other than box score statistics. In this case, we have the following:

- 2 Steals
- 1 Block
- 7 Defensive Rebounds
- 10 Opponent Offensive Rebounds
- 41 Team Defensive Rebounds
- 33 Opponent Field Goals Made
- 83 Opponent Field Goals Attempted
- 4 Team Blocks
- 240 Team Minutes Played
- 15 Opponent Turnovers
- 10 Team Steals
- 24 Minutes Played
- 5 Personal Fouls
- 31 Team Personal Fouls
- 38 Opponent Free Throw Attempts
- 28 Opponent Free Throws Made

First, we will use the common possession estimator, **Possessions = FGA + 0.44FTA – OREB + TOV**. For the Houston Rockets, this is 83 + 0.44*38 – 10 + 15 = **104.72 possessions**. Of these 104.72 possessions, a total of 107 points were scored; resulting in **1.0218 points per possession**. This gives us a team defensive rating of **102.1772 points per 100 possessions**.

Computing the defensive points per scoring possession, we get **DPpSP = 107 / (33 + 0.4*(1 – (1 – 28/38)^2)*38)**. This results in **2.269478 points per scoring possession**.

In the game, **Joel Embiid **recorded 2 steals, 1 block, and 7 defensive rebounds. This results in 4.465804 stops in the game. Computing the team stops portion of stops, we obtain 3.323866 stops. Combining these we obtain **7.789670 stops**.

This results in a stop percentage of

That’s a stop percentage of **74.3857 percent**. Note that this does not mean that Embiid stops 74% of possessions. This is his contribution scaled by a factor of **five**, and it includes team effort. This is the rationale for the 80%-20% split in defensive rating.

We can now finally calculate the defensive rating for Embiid. We obtain **DRTG = 0.8*TDR + 0.2*100*(1-STOP%)*DPpSP = 0.8*102.1772 + 0.2*100*(1-0.743857)*2.269478 = 81.74176 + 11.626218 = 93.367978.**
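The entire chain can be verified numerically. This sketch implements the stop and rating formulas described above (following Oliver’s standard definitions) and reproduces the game’s numbers:

```python
# Joel Embiid box score inputs, PHI at HOU, October 30, 2017
stl, blk, drb, mp, pf = 2, 1, 7, 24, 5
team_drb, team_blk, team_mp, team_stl, team_pf = 41, 4, 240, 10, 31
opp_orb, opp_fgm, opp_fga = 10, 33, 83
opp_tov, opp_fta, opp_ftm, opp_pts = 15, 38, 28, 107

# Estimated opponent possessions: FGA + 0.44*FTA - OREB + TOV
poss = opp_fga + 0.44 * opp_fta - opp_orb + opp_tov          # 104.72
team_drtg = 100 * opp_pts / poss                             # 102.1772

# Opponent offensive rebounding percentage and forced-miss weight
dor = opp_orb / (opp_orb + team_drb)
dfg = opp_fgm / opp_fga
fmwt = dfg * (1 - dor) / (dfg * (1 - dor) + (1 - dfg) * dor)

# Individual stops: steals, blocks, defensive rebounds (in that order)
stops_ind = stl + blk * fmwt * (1 - 1.07 * dor) + drb * (1 - fmwt)

# Team stops while on the floor: unblocked misses, non-steal turnovers, FT misses
stops_team = ((opp_fga - opp_fgm - team_blk) / team_mp * fmwt * (1 - 1.07 * dor)
              + (opp_tov - team_stl) / team_mp) * mp \
             + pf / team_pf * 0.4 * opp_fta * (1 - opp_ftm / opp_fta) ** 2

stops = stops_ind + stops_team                               # ~7.7897
stop_pct = stops * team_mp / (poss * mp)                     # ~0.7439

# Defensive points per scoring possession and final defensive rating
dppsp = opp_pts / (opp_fgm + (1 - (1 - opp_ftm / opp_fta) ** 2) * 0.4 * opp_fta)
drtg = 0.8 * team_drtg + 0.2 * 100 * (1 - stop_pct) * dppsp
print(round(drtg, 2))   # 93.37
```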

This indicates that Joel Embiid obtained a **93.37 defensive rating**, which is significantly better than the **102.18 team defensive rating**. We interpret this as Embiid improving team defense by an estimated 9 points per 100 possessions. Explicitly for this game, this indicates that Embiid **saved the 76ers roughly 4.5 points against the Houston Rockets**.

Through the use of play-by-play data, we are able to walk through the entire game and see every action. The first order of business is to look at the number of possessions. Counting all possession-terminating actions, we obtain a total of 204 possessions. This results in **102 possessions each for the Rockets and 76ers**. Recall that the estimated count was 104.72 possessions.

Of the 204 total possessions, we find that Embiid participated in 105 total possessions. Of these 105 total possessions, Embiid played on 51 defensive possessions while playing in 54 offensive possessions.

Getting into foul trouble with five fouls, Embiid found himself playing in six spurts throughout the course of the game.

Embiid started the game, playing the first 18 possessions. During these 18 possessions, Embiid played 9 defensive possessions that resulted in 7 points for Houston. This amounted to 3 minutes and 43 seconds of playing time, an average of 12.39 seconds per possession.

Of the 9 defensive possessions, Embiid recorded only one defensive rebound, while the Rockets scored on three of the nine possessions. **Robert Covington** was the star of this stint, recording a defensive rebound and two steals over these nine possessions.

At the end of this stint, **Embiid’s defensive rating is 77.78.**

Coming back in to close out the first quarter and start the second quarter, Embiid participated in 23 possessions over 5 minutes and 27 seconds of playing time. This included 11 defensive possessions that resulted in 14 Houston Rockets points. The game’s pace slowed a little, resulting in 14.21 seconds per possession.

Of the 11 defensive possessions, Embiid recorded 1 defensive rebound and 1 steal, terminating two of Houston’s 11 offensive possessions. Of the 11 possessions, Houston converted on seven, two of which were 1-for-2 trips at the free throw line. On one of the scoring possessions, Embiid committed a foul that resulted in an extra free throw made. This was a particularly weak stint for the Sixers’ defense, as Houston left a couple of points at the line and still managed 1.27 points per possession.

At the end of this stint, **Embiid’s defensive rating is 105.00.**

For one last stint during the first half, Embiid participated in 9 total possessions, 4 of which were defensive possessions. During these four possessions, the Houston Rockets picked up 3 total points on a single three point attempt by **James Harden**.

Despite the one scoring possession in four attempts, Embiid had little to directly attribute to the stops. Rebounds by **Jerryd Bayless**, **Dario Saric**, and **Ben Simmons **terminated the other three possessions.

The average possession was 13.78 seconds over Embiid’s 2 minutes and 4 seconds of actions. At the end of this stint, **Embiid’s defensive rating is 100.00. **

Embiid started the second half for his fourth stint, which lasted 3 minutes and 45 seconds. During this time, Embiid played in a total of 14 possessions; 7 of which were on defense. During these seven defensive possessions, Embiid recorded nothing on the defensive end. Up to this point Embiid managed **two defensive rebounds and one steal**.

Houston managed to score only six points over the seven possessions, converting only three. On a fourth possession, Philadelphia was bailed out by two consecutive misses from the foul line by **Clint Capela**. The other three possessions were terminated by Covington (steal, rebound) and Simmons (rebound).

The average possession lasted 16.07 seconds. At the end of this stint, **Embiid’s defensive rating is 96.77. **

Embiid entered the game late in the third quarter for a short one minute and 36 seconds for a total of six possessions. With an average possession of 16 seconds per possession, Embiid played in three defensive possessions.

Houston converted on one of the three possessions, again bailing out the 76ers by missing both free throws after an Embiid foul. With the three points coming on another Harden three point attempt, the Rockets only mustered one point per possession during this stint.

At the end of this stint, **Embiid’s defensive rating is 97.06.**

Embiid closed out the game with a significant eight-minute-and-eleven-second stretch. This stretch witnessed 35 total possessions as play sped up thanks to late free throws. The average possession was 14.03 seconds. Of the 35 possessions, Embiid participated in 17 defensive possessions.

It was during this time that Embiid collected many of his stats. During this stretch Embiid picked up 5 defensive rebounds and one steal. Embiid also picked up his only block of the game. Unfortunately, Houston retained possession as the block went out of bounds.

Houston converted on only seven possessions; on one of them, Embiid sent Houston to the line for two free throws. Fortunately for Philadelphia, Houston failed to convert either free throw, leaving two points on the line. Due to this, Embiid’s defensive rating improved, finishing with 47 points allowed over 51 defensive possessions for a **defensive rating of 92.16 points per 100 possessions**.

Comparing this to the team defensive rating of **104.90**, we find that Embiid’s presence indicates an improvement in defense; however, his individual stats only seem to appear in his final stint. This implies that Embiid is not the sole reason for the defensive improvement. Looking at the actual statistics from when Embiid is on the court, **Robert Covington** turns out to be the premier defender. This in turn indicates that the combination of **Covington and Embiid** forms a solid defensive tandem for the 76ers.

While the 76ers’ actual defensive rating is **104.90**, we had an estimated team defensive rating of **102.17** from the Oliver equations above. That’s not a terrible estimate by any stretch. However, this discrepancy comes from the estimation of possessions.

In turn, Embiid’s defensive rating was estimated to be **93.37** points per 100 possessions. In reality, Embiid managed a **92.16 **points per 100 possessions.

This shows that the estimation process varies about the truth, and while it is a method for approximating points per 100 possessions, we are able to compute the actual defensive rating by performing play-by-play calculations.

The reason why the estimation process manages to miss by roughly 1-3 percent is due to many factors. First, possessions are estimated using coefficients that are not proper for the estimation process. Second, possession times are assumed to be **uniform**. This means that if 104 possessions are estimated for both teams, then every possession is estimated to be 13.85 seconds long. We see that this is not the case. Third, points per possession are assumed to be **uniformly distributed** over possessions. This again is not the case.

For the possessions issue, we have seen that possessions have been grossly over-estimated in the past. For the uniform distribution assumption, if we were able to obtain **thousands of possessions per game**, then we might have a chance to argue for uniformity. However, in small-sample games… yes, 100 defensive possessions is a small sample relative to the possible ways to terminate possessions… any deviation will violate the uniformity assumption. And this explicitly happens here, inducing these variations in points per 100 possessions.

Despite these flaws, if the user only manages to have box score data, we see that defensive rating is not egregious in estimation. Instead, it’s a carefully thought out process that leverages assumptions of uniformity to get close to the truth.

If we are to compare two players using defensive rating, we must perform a **test of hypotheses**. We cannot simply sort the players by defensive rating using Oliver’s defensive rating. This is because we are using an **estimation process** using the **uniform distribution**. Therefore, if a player has a defensive rating of 92.65 and another player has a defensive rating of 94.01; can we say that the first player is better? **Most likely not.**

Instead, we may say that the players are the same, as the uniformity assumptions may lead to large enough variances such that both scores are realistic for both players.


In this article, we take a look at Colley’s methodology and attempt to understand the associated statistics with the procedure.

First, we start simple: let us only look at one team and their resulting wins and losses over the course of the season. Currently, the **Atlanta Hawks** have completed eight games and have compiled a 1-7 record. The most basic model will consider each game as a **random draw** from a **Bernoulli Distribution**.

If you are unfamiliar, a Bernoulli distribution is a success/failure distribution for a given event. Here, an event for the Hawks is a game played. A win is considered a success, a loss is considered a failure. Then, we are interested in the **probability of success**, identified by the value **p**. The distribution for a Bernoulli random variable is given by:
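In symbols, the Bernoulli probability mass function is

```latex
f(x \mid p) = p^{x}\,(1 - p)^{1 - x}, \qquad x \in \{0, 1\}
```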

The value, **x**, is merely an indicator of whether the Hawks won their game or lost their game. If we now consider every game as an **independent Bernoulli trial**, then we simply identify the number of wins as the sum of each x-value.

Now you may pause for a moment and ask, “**doesn’t the probability of winning change from game to game?**” While the answer to this is **“YES!” **we are interested in only the most basic model. We will expand on this in a bit.

Now, if we are to sum these 0 (loss) and 1 (win) events over **N** games, we obtain a **Binomial distribution**. A binomial distribution counts the number of successes (wins) over the course of N trials (games). There are a couple of key assumptions here:

- Games have the same probability of success.
- Each game is independent. Meaning back-to-backs don’t influence capability.

The distribution barely changes. In this case it becomes
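For reference, the binomial probability mass function is

```latex
f(x \mid p) = \binom{N}{x}\, p^{x}\,(1 - p)^{N - x}, \qquad x \in \{0, 1, \ldots, N\}
```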

In this case, we made a slight change. The value **x** is no longer 0 or 1; but rather is the **number of wins in N games**. For the Hawks, this would be a value between 0 and 8.

A simple indicator of how well a team plays is to identify their probability of success. In reality, each team has their **true value**. In this set up, we don’t care about the opponents. Instead, we care only about the Hawks’ wins and losses. If we compute the **maximum likelihood estimator for p**, we are able to identify an estimated value of success for a team.

Maximum Likelihood is the process of taking observed data and computing the probabilistic model for the data **evaluated at the observed data**. This process is called building a **likelihood function**. We then maximize this function with respect to the parameter of interest. This will identify the **highest probability of p from observing the data**. In the general case, the likelihood function is the Binomial distribution. We then write the likelihood function as
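Viewed as a function of p at the observed number of wins x, this is

```latex
L(p \mid x) = \binom{N}{x}\, p^{x}\,(1 - p)^{N - x}
```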

One way to maximize this function is to take the derivative and set it equal to zero. We have to be careful and ensure that the second derivative is negative; or else we obtain a **minimum**… which is the **lowest probability of p from observing the data**. That would be bad. Here, we make a slight one-to-one transformation (the logarithm) to make our life easier before we differentiate. The resulting differentiation gives us
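Taking the logarithm and differentiating:

```latex
\frac{\partial}{\partial p} \log L(p \mid x) = \frac{x}{p} - \frac{N - x}{1 - p}
```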

If we set this to zero and solve, we get
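Solving yields

```latex
\hat{p} = \frac{x}{N}
```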

This is actually **win percentage**! This means that if we consider this ultra-simple set-up, the easiest way to compare teams is to look at a team’s win percentage and call it a day. However, there are a lot of issues with this.

One of the first issues we run into, **particularly for small samples**, is that the maximum likelihood estimator has **large variance** and tends to reflect only a **sample path** of the actual state. The combination of large variance and a single sample path can give us wildly varying estimates of the true probability of success. Instead, we might want to apply a **filtering technique** to control for spurious wins. To give an example: during the **Detroit Pistons versus Los Angeles Lakers** game on October 31, 2017, with the Lakers handily ahead, the announcers commented that since the **Detroit Pistons defeated the Golden State Warriors two days prior**, the **Los Angeles Lakers are better than the Golden State Warriors**. This flaw in transitive logic stems from the result above. If we isolate both games and apply straightforward win percentages, the Lakers are indeed “better” than the Warriors!

To remedy this, we apply a filtering technique. A common one is placement of a **prior distribution** on the parameter space. This is a **Bayesian procedure** that attempts to use prior information to temper observed information, as we saw above. One of the requirements for a prior is to have a **domain identical to the domain of the parameter space**. This means that if we put a prior on the probability of success, we obtain a value between zero and one.

The most common technique for the Binomial distribution is to place a **Beta distribution** prior on **p**. The formulation for a resulting **posterior distribution** is given as **(Likelihood x Prior) / marginal**. The marginal distribution is the **normalizing constant of the posterior distribution**. In this case, we have the posterior distribution to be:

Let’s break down this quantity and show that it’s not as scary. First, let’s factor out the terms that have nothing to do with the probability of success. These terms all cancel out! Next, let’s combine like terms. This leaves us with
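Collecting terms, the posterior kernel is

```latex
\pi(p \mid x) \propto p^{\alpha + x - 1}\,(1 - p)^{\beta + N - x - 1}
```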

You may recognize this. This is a **Beta distribution** but with the parameters updated by the observed data! The values of **alpha** and **beta** are used to **filter** the probability of success. The more important thing here is that if we compute the **expected value**, or **posterior mean**, we obtain an estimator for **p** given the data. In this case, the expected value is
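For a Beta(α + x, β + N − x) distribution, the posterior mean is

```latex
\mathbb{E}[\,p \mid x\,] = \frac{\alpha + x}{\alpha + \beta + N}
```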

This means the **win percentage** is stretched toward a competing value by the parameters alpha and beta. If we set **alpha = beta = 1**, we get the **Uniform distribution**! This means that every team is equal before the season begins, and as games are played, their resulting probability of winning moves away from uniform. Let’s plug in one for both alpha and beta:
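Substituting alpha = beta = 1 gives

```latex
\hat{p} = \frac{x + 1}{N + 2}
```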

If you made it this far, **congratulations!** This is the starting point for **Colley’s Method**. Colley’s Method assumes that every team’s games are **independent, unrelated to other teams, and unaffected by schedule**, and uses the **uniform prior** construction above to estimate the probability of success for each team.

What does this mean for the Atlanta Hawks? Well, while their 1-7 record would give them a 0.125 win percentage, their filtered win percentage is (1 + 1)/(8 + 2) = 0.2. Meaning that they are simply on a sample path slightly worse than their actual probability of success.

From this point, Colley makes some subtle changes in attempts to incorporate scheduling and team-versus-team interactions. Let’s dive in finally!

One of the first steps that Colley performs on this posterior mean is to rewrite the number of wins for a team. This is a straightforward, **but misleading**, process! Instead of following Colley’s footsteps exactly, let’s apply Colley’s ideas in the general sense.

First, we take the number of wins and write them as a **weighted sum of wins, losses, and games played**. This is given by
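Written out, with n = w + l the number of games played:

```latex
\begin{aligned}
w &= \gamma w + (1 - \gamma)\, w \\
  &= \gamma w + (1 - \gamma)\, w + (1 - \gamma)\, l - (1 - \gamma)\, l \\
  &= \gamma w - (1 - \gamma)\, l + (1 - \gamma)(w + l) \\
  &= \gamma w - (1 - \gamma)\, l + \sum_{i=1}^{n} (1 - \gamma)
\end{aligned}
```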

Let’s break this down. First, we split the number of wins into a **gamma** percentage of wins plus a **1-gamma **percentage of wins. We then add and subtract in a **1-gamma** percentage of losses. This is effectively adding zero and not changing the equation.

The third step collects like terms. The final term is **1-gamma** times the **number of games played**! Since the number of games played multiplied by the constant **1-gamma** is itself constant, we can rewrite this as a sum. This means **each game played by that team is identified by 1-gamma**.

This means the final expression is merely the weighted number of wins, the weighted number of losses, and the weighted **strength of schedule**. If we now return to Colley’s footsteps, Colley applies **gamma = 0.5**. In this case we obtain
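With gamma = 0.5 this reduces to

```latex
w = \frac{w - l}{2} + \sum_{i=1}^{n} \frac{1}{2}
```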

This is effectively the **win percentage given half weight** and the **strength of schedule, assumed uniform across all teams, given half weight**.

We can apply a different weighting scheme. Suppose I want to give only **ten percent to strength of schedule**. Then, all we need to do is set **gamma = 0.9**. This would give us
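With gamma = 0.9:

```latex
w = 0.9\, w - 0.1\, l + \sum_{i=1}^{n} 0.1
```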

The next step in Colley’s Method is to change the underlying assumption of uniformity. Ignoring the domain requirement, Colley replaces the **1/2** term in the strength of schedule with **r_i**, the **rating of the opponent in game i**. This leads to the equation for wins:
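Replacing each 1/2 in the schedule sum with the opponent’s rating, the win equation becomes

```latex
w = \frac{w - l}{2} + \sum_{i=1}^{n} r_i
```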

This is now a single equation with **q unknowns**, where q denotes the **number of unique teams played against** by, in this case, the Atlanta Hawks. In an effort to build a linear system of equations, allowing us to solve for these unknown ratings, we now consider this equation for **all teams**.

To understand how this works for all teams, let’s index every team by j. This means **j=0** is the **Atlanta Hawks**, **j=1** is the **Boston Celtics**, and so on. Then we obtain the system of equations identified as

There are a total of 30 of these equations, with 30 unknowns. The value **r_i^j** just means the **rating of the opponent in game i for team j**. If we rewrite this as a set of linear equations, we obtain the following:
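A reconstruction of the four lines walked through next, for team j with N_j games, w_j wins, and l_j losses:

```latex
\begin{aligned}
r_j &= \frac{1 + w_j}{2 + N_j} \\[4pt]
r_j &= \frac{1 + \frac{w_j - l_j}{2} + \sum_{i=1}^{N_j} r_i^{\,j}}{2 + N_j} \\[4pt]
(2 + N_j)\, r_j - \sum_{i=1}^{N_j} r_i^{\,j} &= 1 + \frac{w_j - l_j}{2} \\[4pt]
(2 + N_j)\, r_j - \sum_{i=1}^{30} n_{ij}\, r_i &= 1 + \frac{w_j - l_j}{2}
\end{aligned}
```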

Let’s walk through this. Line one is the original definition from the **Beta-Binomial posterior distribution**. Colley states that the posterior mean **(probability of success)** is indeed the **team ranking score**.

The second line substitutes in the rewritten line for wins; using the ignoring of the uniform assumption.

The third line rearranges the terms such that all the team rankings are on the left hand side, while the terms that do not contain the team rankings (constants) are on the right hand side.

Finally, the fourth line rearranges the sum to identify the number of times team j has played team i. Therefore we can take the sum over all teams as opposed to all games!

What this now gives us is the following: **for team j, we associate N+2 games**. This is the filtered number of games from the Beta-Binomial distribution. **For opponent i of team j, we associate -n_ij games**. This is (the negative of) the number of games played between team i and team j. **For team j, we associate the response 1 + (number of wins – number of losses)/2**. Think of this as a reflection of the win-loss ratio for the team.

Putting this all together we can obtain a **matrix representation** of this system of linear equations.

The matrix representation is given by
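In entries, with N_j the number of games for team j and n_ij the number of meetings between teams i and j:

```latex
C_{jj} = 2 + N_j, \qquad C_{ij} = -\,n_{ij} \ \ (i \neq j), \qquad b_j = 1 + \frac{w_j - l_j}{2}
```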

This is written in short form as **Cr = b**. Here, **C** is the **schedule matrix**. This portion identifies the strength of schedule for a given team. Each row is a team’s schedule, with the diagonal being the number of games played (plus two from the Beta filtering) and the off-diagonal being minus the number of games played against the opponent that the column represents.

The vector, **b**, is the number of wins minus the number of losses **(win differential)** divided by 2, plus the value of one from the Beta filtering. We think of each entry as an effective win measure for each team. To be clear, a .500 team results in 1 + 0/2 = 1, as the wins and losses cancel each other out.

Therefore the solution of rankings, **r**, is just the inverse of **C** times **b**. This is merely a strength-of-schedule adjustment on win percentage. There is no closed-form solution for the generalized matrix, **C**, and closed-form solutions for the NBA are excessively messy. Due to this, we cannot identify explicit bounds on the rankings.

Instead, let’s look at a couple of simple examples. These will help show some issues with the Colley ranking system.

Let’s start simple where we have four teams split evenly into two two-team conferences. Each conference plays a total of four games; therefore each team plays the same opponent all four times. The resulting Colley matrix system is given by

There are a total of 25 possible outcomes for the season. Since the season is completely partitioned, we can focus on one conference. In this case, we identify the five following scenarios:

- Team A wins all four games: **b_1 = 3, b_2 = -1**
- Team A wins three games: **b_1 = 2, b_2 = 0**
- Team A wins two games: **b_1 = 1, b_2 = 1**
- Team A wins one game: **b_1 = 0, b_2 = 2**
- Team A wins zero games: **b_1 = -1, b_2 = 3**

The other conference has the same breakdown.

Now, if we invert the schedule matrix, we see how much weight the schedule places on the win-loss records for each team. Here, the inverse of the schedule matrix is given by

What this shows is that there is a **clear conference bias**, as only in-conference games are weighted. We interpret that the schedule places **sixty percent weight** on the team’s win-loss record, while placing **forty percent weight** on the team’s lone opponent’s win-loss record. Therefore, if the two best teams are in the same conference, we could very well see them split games and finish 2-2. While in the other conference, one team is a complete cupcake, resulting in a 4-0 versus 0-4 record. Let’s mark these teams as **A, B, C,** and **D**, respectively.

This results in a **b** vector of **[1, 1, 3, -1]**. Multiplying by the inverse of the schedule matrix we obtain the rankings vector, **r**, as **[0.5, 0.5, 0.7, 0.3]**. A good sign is that we obtain all rankings between zero and one. **This is not always the case in the general situation**.
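We can verify this arithmetic with a few lines of NumPy. The schedule matrix below is a reconstruction from the description above (two isolated two-team conferences, four games per pairing); it is a sketch for checking the numbers, not the original code.

```python
import numpy as np

# Schedule matrix: diagonal is games played + 2 (Beta filtering),
# off-diagonal is -n_ij; the conferences never meet.
C = np.array([[ 6., -4.,  0.,  0.],
              [-4.,  6.,  0.,  0.],
              [ 0.,  0.,  6., -4.],
              [ 0.,  0., -4.,  6.]])

# b_i = 1 + (wins - losses)/2 for records 2-2, 2-2, 4-0, 0-4.
b = np.array([1., 1., 3., -1.])

r = np.linalg.solve(C, b)
print(r)  # [0.5 0.5 0.7 0.3]

# Weight on a team's own record versus its lone opponent's record:
Cinv = np.linalg.inv(C)
print(round(Cinv[0, 0] / (Cinv[0, 0] + Cinv[0, 1]), 6))  # 0.6
```

The inverse recovers the sixty/forty split described above, and the rankings match the cupcake scenario exactly.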

In this case, we find that Team C is ranked first despite never playing a difficult opponent. This is due to **conference biasing** through the scheduling (sampling frame). Therefore, let’s mix things up.

If we change the schedule slightly and have two games in conference and two games out of conference, we obtain the following scheduling matrix:

In this case, it is impossible to obtain two undefeated teams. Instead, we obtain more than 25 possible outcomes. While we still have fewer than 64 possible outcomes, 60 in total, we will not state all of the cases. Instead, we focus on the **strength of schedule impact**.

Inverting the schedule matrix, we obtain

If we consider the old problem of the two best teams in the same conference and they split their two games, but win out of conference, we obtain the following **b** vector: **[1, -1, 2, 2]**. This corresponds to a respective record set of: 2-2, 0-4, 3-1, 3-1. In this case, the rankings, **r**, are given by **[0.46, 0.21, 0.67, 0.67]**.
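Again, the numbers check out in NumPy. The schedule matrix here is my reconstruction from the description (two in-conference games, one game against each out-of-conference team).

```python
import numpy as np

# Diagonal: 4 games + 2; off-diagonals: -2 (conference partner)
# and -1 (each out-of-conference opponent).
C = np.array([[ 6., -2., -1., -1.],
              [-2.,  6., -1., -1.],
              [-1., -1.,  6., -2.],
              [-1., -1., -2.,  6.]])

# Records 2-2, 0-4, 3-1, 3-1 give b_i = 1 + (wins - losses)/2.
b = np.array([1., -1., 2., 2.])

r = np.linalg.solve(C, b)
print(np.round(r, 2))  # [0.46 0.21 0.67 0.67]
```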

This is the best scenario possible as we have a complete sampling frame of the entire league. We may have issues with ranking as Team A will be down weighted by Team B merely due to scheduling; however this is remedied by playing both Team C and Team D.

Finally, let’s look at another out of conference situation. In this case, we have three games in conference and one game out of conference. In this case, we can set the schedule matrix as

In this case, the inverse is given by

We see that there is now ranking weight introduced to teams from teams they have never played. For instance, Team A obtains **11.25% of their ranking from a team they never played**. If this team is poor, then Team A is penalized **every time that team plays**.

In this scenario, assume Team A is the best team in the league and every other team is mediocre. Then suppose we obtain a **b** vector of **[3, 0, 0, 1]**. This results in a record set of 4-0, 1-3, 1-3, 2-2. Team A’s ranking works out to 128/160 = 0.8. The overall rankings are **[0.80, 0.45, 0.30, 0.45]**.

What’s interesting here is that this is the first instance that the average of all rankings **is not 1/2**. This violates the expected requirement and is a result of the fractional sampling frame of the schedule.

Let’s verify this: Team A plays Team B three times. In the end, Team A is 3-0, Team B is 0-3. Team A play Team D while Team B plays Team C. This results in Team A 4-0, Team B 1-3, Team C 0-1, Team D 0-1. Team C plays Team D three times and this results in Team C 1-3 as Team D goes 2-2. This results in the **b **vector of **[3, 0, 0, 1]**.

This shows that, despite the assumed proof in Colley’s paper that the average ranking is 1/2, here is a counter-example suggesting otherwise.

If we have a larger schedule and large population of teams, we run into other major issues. For instance, let’s consider Week 14 of the 2011 NCAA Football schedule. In this case, the 2012 sampling frame (schedule matrix) yielded such a weighting that we obtained probabilities larger than one!

**Note: **Ranking is determined to be the **probability of success** as defined in equation 1 in Colley’s paper. **Therefore it must be between 0 and 1**.

This happens due to the fact that the uniform distribution is used as the **filter** and, later, the mean of the filter is thrown away in favor of a **general mean**. Since this general mean does not match the filtering distribution used, **any scheduling bias will pull this filtered probability outside of possible ranges.**

What this indicates is that Colley’s Method is **schedule/conference biased**, as witnessed by penalizing a second-best team due to scheduling **(Examples 1 and 2)**; that the assumption that the weighting will result in a global mean of 0.5 is also false **(Example 3)**; and that the ignored assumptions violate the definition of probability of success **(Thanks LSU…)**.

The reason this all occurs is that the original assumption is that **every team’s sample path of games is independent of every other team!** This was the very start of this article. Recall we focused on one team. This assumption is purely false and the correction should focus more on a **graph-based/network approach**.

With the weighted win-loss versus strength of schedule partition of wins, we obtain 30 independent equations. To impose the correlation between the teams played, the general values of **r** are injected without any pure rationale other than “it makes sense.” We have seen above that, despite conference biasing (which is an experimental design issue, not a Colley issue), we still require a **carefully thought out sampling frame**. In NCAA college football, this is not the case. For the NBA, we are fine, as teams play everyone in their conference 3-4 times, everyone in their division 4 times, and everyone outside of their conference twice. Despite the distributional assumptions being incorrect, we still obtain an interpretable ranking.

If you’d like to play around with your own Colley Matrix, here’s the associated Python code:
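Below is a minimal sketch of such a solver, built directly from the definitions above; the game-list format and the illustrative four-team season are my own assumptions, not the original snippet.

```python
import numpy as np

def colley_rankings(teams, games):
    """Solve Cr = b given a list of (winner, loser) game results."""
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = 2.0 * np.eye(n)   # the "+2" on the diagonal from the Beta filtering
    b = np.ones(n)        # the "+1" from the Beta filtering
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w, w] += 1.0; C[l, l] += 1.0   # total games on the diagonal
        C[w, l] -= 1.0; C[l, w] -= 1.0   # minus head-to-head games
        b[w] += 0.5; b[l] -= 0.5         # (wins - losses) / 2
    return dict(zip(teams, np.linalg.solve(C, b)))

# Illustrative four-team season reproducing Example 1: A and B split
# four games while C sweeps D.
games = [("A", "B"), ("B", "A")] * 2 + [("C", "D")] * 4
print(colley_rankings(["A", "B", "C", "D"], games))
```

Feeding in the full 130-game NBA log in this (team, team) form reproduces the rankings table referenced below.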

For the NBA, considering games through November 3rd, we have witnessed 130 total games played. The current records are given by:

The resulting Colley Rankings are given by

And Atlanta has since lost an 8th game since starting this article, falling to 1-8 and the bottom of the Colley Rankings.


The easiest way to capture these quantities is through the use of **kinematics**. Kinematics is the calculus-based representation of physical movement in the field of physics.

Consider a situation where a team collects the ball after a drive and score. In this example, let’s consider a **Russell Westbrook** drive to the basket against the **San Antonio Spurs**. Westbrook is able to convert the basket, with two crashing offensive rebounders. To generate spacing, the **Oklahoma City Thunder** spreads the court with a corner three position and a high wing, who is now sagging in after the converted score in an effort to check his man on **transition**. Let’s quickly illustrate this.

Here, Westbrook’s momentum carries him out of bounds as a post defender collects the basketball to quickly step out of bounds. The point guard for the Spurs, **Tony Parker**, slips between the defenders on a curl to get a pass and initiate a fast-break in attempts to catch the Thunder defense sleeping.

In an effort to do this, realizing the sagging wing, **Danny Green** sets the screen on the corner three player to free **Kawhi Leonard** for a fast-break. This leaves the sagging wing for Oklahoma City to play catch-up. The transition immediately turns into a 2-on-1 or a 1-on-**none** fast break depending on the **kinematics** **of the players on the court**.

The **primitive question** is this: Suppose that sagging wing is **Andre Roberson**. Is he able to catch **Kawhi Leonard**? The **secondary question** is this: How do we **adjust** the offensive and defensive scheme to **optimize **the team’s **defensive **or **offensive posture**?

The most important aspect of the above example comes down to **acceleration**. Acceleration is the quantification of a player’s ability to change their **velocity**. Similarly, velocity is the quantification of a player’s ability to change their **position**. Therefore, in effort to understand acceleration, we must understand a player’s velocity and, in turn, position.

A player’s **position** is their location on the court. This is simply the **x,y-coordinate** represented by some coordinate system. Typically, the **origin** of the basketball court is the “lower left-hand” corner of the court. If we use feet as our unit of measurement, we then have a rectangle with a maximum value of **50** for the **y-axis** and a maximum value of **94** for the **x-axis**.

We consider position as a **vector**. This means we represent position by its x,y-coordinates, where the tail of the vector is the origin and the head of the vector is the player.

**Velocity** is a **vector** quantity that captures the **speed** and **direction** of a player. It is viewed as the **derivative** of position. To be described through a derivative, the notion of **time** must be introduced. Here, time is the quantification of the sampling sequence between two points. In high school and college, we learned this as “taking two points and checking the **rate of change** as the time between those two points converges to **zero**.” Therefore, velocity is viewed as a rate of change with respect to time.

Let’s define the position vector as:

Here, **r_x** and **r_y **are the x- and y- coordinates of a player, respectively. Velocity is then defined as

This definition identifies the rates of change in both the x-direction (along the sideline) and the y-direction (along the baseline). While the definition may not yield much light, especially if you’re not familiar with calculus, we want to emphasize the point that velocity is a **vector** that looks at the **rate of change** of a player.

**Acceleration** is the quantification of changes in a player’s **velocity**. For instance, if a player is stationary and starts to move, their acceleration is positive. As they speed up, their acceleration continues to be **positive**. However, as the player reaches their physical apex in speed (or becomes tired), they begin to slow down. While their **velocity is positive**, they are in reality **slowing down** and therefore yielding **negative acceleration**. The equation is given by:

Therefore, acceleration is viewed as the second derivative of position with respect to time; as it is the first derivative of the velocity vector.

This means, ultimately, we are interested in acceleration of a player. Understanding acceleration, we immediately know how velocity changes, and in turn, understand the curvature of position. Unfortunately, we obtain only positions and need to identify a methodology for **estimating velocity and acceleration**.

As we incorporate time, we can now write the position vector as

Taking the derivative, we obtain the velocity vector

Similarly, the acceleration vector is given by

If we assume the goofiest motion on the court: **constant acceleration**, we can easily recover the equations for the derivatives. To understand this goofy motion, a positive constant acceleration means the **player gets faster and faster** as time plays out. Similarly, if acceleration is **zero**, the **player maintains this exact same velocity**.

Let’s define this constant acceleration as **a_x** for the x-direction. Here, we will look only at the x-coordinate; the y-coordinate is analogous.

The anti-derivative of **a_x** is given by

Here, we included a term of **v_x0**. Why does this term exist? We are interested in understanding the velocity of a player at a time, t. Taking the anti-derivative of acceleration only gives us the **change in velocity**; not the velocity. To obtain the velocity, we must use an **initial condition**. The initial condition, **v_x0**, is therefore the initial velocity of a player.

Continuing in this manner, the position is given by

These three equations: constant acceleration, anti-derivative of acceleration, and anti-derivative of velocity are the **kinematics equations**.

As acceleration is not constant, we can **condition** on time to obtain an **instantaneous constant**** acceleration** over the sampling period. For **SportVU** tracking data, this instantaneous sampling period is every **0.04 seconds**. Therefore, for unknown instantaneous acceleration, we look at **average acceleration** over the time period of interest. This is still a time-evolving acceleration that is non-constant, however it is viewed as a piece-wise polynomial of acceleration.

Plugging this in to our equations above, we obtain the **Kinematics Equations:**
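As a numerical sketch of the constant-acceleration case, consider a player with made-up initial conditions; the numbers below (initial position, velocity, and acceleration) are purely illustrative.

```python
# Constant-acceleration kinematics in the x-direction.
a_x  = 4.0   # ft/s^2, the assumed constant acceleration
v_x0 = 2.0   # ft/s, the initial velocity (initial condition)
x_0  = 0.0   # ft, the initial position (initial condition)

def velocity(t):
    # Anti-derivative of constant acceleration, plus v_x0.
    return v_x0 + a_x * t

def position(t):
    # Anti-derivative of velocity, plus x_0.
    return x_0 + v_x0 * t + 0.5 * a_x * t ** 2

print(velocity(2.0), position(2.0))  # 10.0 12.0
```

After two seconds, the player has covered twelve feet and is moving at ten feet per second, exactly as the closed-form equations dictate.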

Since the game has already played out, we can cheat and leverage all data points from the game. We will call this the **naive method**. Using this methodology, we can use **average velocity** at a point. For instance, we are interested in **Andre Roberson’s **acceleration in recovering **Kawhi Leonard** on a fast-break.

In this case, as the break starts at time **t**, we **do not want to measure the velocity at time t**. This seems odd to say, but calculating velocity with Roberson’s position as an endpoint **actually calculates a velocity prior to that position**.

Instead, we take time points **t+1 **and **t-1** and observe the positions, **x(t+1) **and **x(t-1)**. Since the time steps are small, 0.04 seconds, we can get away with this approach. The average velocity is then given by

**Note: **While we are using the t-1, t, and t+1 notation, these are simply **indices**. Here, t-1 = t_0, t = t_0 + 0.04, and t+1 = t_0 + 0.08.

In a similar fashion, we can compute average acceleration from position estimates. We can compute average acceleration in the same manner as computing the average velocity from position. In this case, we have

Therefore, we must require knowledge of the movement of Roberson over a five-point sequence in order to naively recover the acceleration at time, t.
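A sketch of the naive method: with samples every 0.04 seconds, a central difference over x(t-1) and x(t+1) gives average velocity, and differencing those velocity estimates touches the five-point window x(t-2), …, x(t+2). The trajectory below is synthetic, not real tracking data.

```python
import numpy as np

DT = 0.04  # SportVU sampling period, in seconds

def naive_velocity(x, t):
    # Average velocity at index t from the two surrounding positions.
    return (x[t + 1] - x[t - 1]) / (2 * DT)

def naive_acceleration(x, t):
    # Central difference of the velocity estimates; this touches the
    # five-point window x[t-2], ..., x[t+2].
    return (naive_velocity(x, t + 1) - naive_velocity(x, t - 1)) / (2 * DT)

# Synthetic one-dimensional track: constant 6 ft/s^2 from rest.
ts = np.arange(0, 1, DT)
x = 0.5 * 6.0 * ts ** 2
print(round(naive_acceleration(x, 10), 6))  # 6.0
```

On this quadratic track the central differences recover the true acceleration exactly; on real, noisy positions they would need the smoothing discussed below.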

In applying this simple methodology to the Roberson-Leonard attack, we would prefer to **learn** the **expected** acceleration curves of a player. Instead, in this case, we only have the single geo-tracks; therefore we can effectively only apply the naive method. Let’s start by isolating Roberson and Leonard on the transition:

We can identify the difference in acceleration changes between Roberson and Leonard by looking directly at the separation between the player dots. We see, visually, that Leonard has a higher acceleration. However, let’s actually apply the naive method.

Smoothing the acceleration using the naive method, we find that Leonard has a much higher acceleration, in part due to breaking first in the transition. We see that Roberson starts later, manages to close the gap, and decelerates at roughly the same rate as Leonard.

As we see, Leonard accelerates and maintains a higher velocity than Roberson, however Leonard’s initial position allows Roberson to close in around mid-court. With the expansion into further tracking data, we can gain better insight into understanding **terminal velocity** of a player. That is, the maximum speed a player can reach before decelerating. If Leonard’s is indeed higher, then he may get the extra step and find himself with a lead pass from Parker for a lay-up.

We saw that we can begin to model player movement by understanding their kinematics. While we used a naive method to estimate motion, ideally we would like to understand **instantaneous movement** as opposed to **look-back movement**. In this article, we used the future observations to look back at player movement. If we want to identify instantaneous movement, we should move to a more familiar model for motion: **Kalman filtering** (parametric hierarchical models) or **particle filtering** (nonparametric bootstrap hierarchical models).


In their 2014 Sloan Sports Analytics paper, Second Spectrum focuses on defining position and leveraging their definition of position to obtain a **probability of the rebound falling into their position**.

The quantity of hustle is defined as the percentage of **opportunities of obtaining a rebound**. This is slightly different than the positioning probability, as players are able to move and the closest player may not get the rebound. An **opportunity** is defined as the closest player to the basketball after the ball has fallen below the rim. Therefore, multiple opportunities exist for players, despite the basketball not falling within their bin.

Finally, the conversion probability is merely the proportion of rebounds obtained out of the total number of opportunities. Piecing this together, we obtain the formula

There are some assumptions that are made here. In this article, we break down these assumptions as the particular mathematical components are used. To be clear, these assumptions may be corrected for by Second Spectrum, but they are not explicitly identified in their paper. Here, we take a look at what to consider when we use similar mechanics.

So let’s begin…

If you’re not familiar with this statement, welcome! A common flaw in spatial analytics is that analysts sometimes think that the distance between two points is given by the Pythagorean theorem. Unfortunately, that is merely a mathematical representation of the physical process we are interested in. Let’s drive this point home.

Consider two players located **“exactly the same distance apart” **from a rebound opportunity. Let’s further suppose player one is **DeAndre Jordan** while the other player is **Slowy McSlowyton**. Is it fair to say they are both equally likely to obtain the rebound? If you said no, then you understand that distance is measured by time!

A more concrete example is found in **Global Positioning Systems**, or GPS. In GPS, your watch has no idea where you are on the Earth. Instead, it listens for a particular code from each GPS satellite it can hear. Each satellite, in turn, has its own code that it effectively repeats. Suppose this is **“I am satellite one! I am satellite one! I am…”** When the watch hears the GPS message, it uses an **almanac**, which is a lookup table of when the satellites should be saying their messages and where in space they should be. Your watch then merely **counts the time difference between the satellite barking its message and when the watch receives the message**. This time difference reverses the famous **D = RT** formula to recover our understood notion of distance. If we obtain four satellites, we can measure the three-dimensional distance from each satellite, as well as the time bias in our shoddy, cheap watch!

Rebounding opportunities are no different. Instead of measuring distances, we’d instead look at **reactive speeds of players**.

If we assume that all players are equal, then we can use a **Voronoi Tessellation**. This mathematical construct is a **partitioning algorithm** for a surface of interest. The partitioning is conditioned on a set of observed points and answers the question: **where are the regions of my surface that are closest to each point?**

To build a Voronoi Tessellation, we take each point and grow circles. If any two circles intersect, we obtain a **boundary** between the two points. This continues until the entire surface is covered. A Wikipedia gif file best illustrates this process:

If we apply this to a particular field goal attempt, we obtain a similar partitioning. Here, we only have ten players on the court and therefore will obtain 10 partitions. We are able to partition for each player; however, we will focus on team partitioning.

Note that we used a discretization to obtain the Voronoi regions on the court for each player. This serves a dual purpose. First, we are able to display the Voronoi Tessellation on the court to give a sense of the partitioning on the court. Using the **scipy.spatial** package, we can quickly obtain the necessary boundaries, but are unable to plot on top of a court easily.

Second, using the discretization, if we know where the field goal attempt was taken, we can leverage a **trained distribution of misses** and simply aggregate the probabilities of the rebound falling within each partition. This aggregated probability is called the **value of real estate at the time of the shot**.

To build a simple piece of code for the spatial partitioning, we can simply walk over a meshgrid on the court and compute distances. **This is for illustration only**. Computing the actual Voronoi Tessellation would use a different computation from the **scipy.spatial** package.
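A minimal sketch of that meshgrid illustration: assign each one-foot cell of the court to its nearest player. The ten player locations below are made up for illustration.

```python
import numpy as np

# Hypothetical (x, y) locations of the ten players at the attempt.
players = np.array([[10, 25], [20, 10], [20, 40], [30, 25], [45, 25],
                    [12, 20], [22, 14], [25, 35], [33, 22], [40, 30]])

# One-foot meshgrid over the 94-by-50 court.
xs, ys = np.meshgrid(np.arange(94), np.arange(50))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)

# Assign every cell to its closest player: discretized Voronoi regions.
dists = np.linalg.norm(grid[:, None, :] - players[None, :, :], axis=2)
regions = dists.argmin(axis=1).reshape(xs.shape)

print(regions.shape)  # (50, 94) grid of owning-player indices
```

Each cell of `regions` now holds the index of the player owning that real estate, ready to be colored over a court plot or multiplied against a trained miss distribution.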

In light of the personnel on the court, we can take a simple adjustment and train player movement around the basket. For instance, guards tend to be faster than posts and have a better chance of chasing rebounds down if the ball goes astray. Similarly, a post player may be quicker than another post player, allowing him to gain access to a region faster than another player.

Therefore we look into the **speed of players** as opposed to the **distance of players**. Much like GPS, distance is then measured in seconds; or, more analogously, by **which player will arrive at the location first**.

If we were to look at common speeds of players with respect to their regions, we have trained that **Anthony Davis tends to move at a rate of 5.999 feet per second within the lane during rebound attempts**. Similarly, **Brandon Ingram tends to move at a rate of 6.673 feet per second as a crasher**. We can then apply these trained values to obtain a more realistic positioning region.
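Replacing distance with arrival time turns the standard tessellation into a multiplicatively weighted one: each court location belongs to the player who can reach it first. A sketch, using the trained rates quoted above on two hypothetical rebounders:

```python
import numpy as np

def time_partition(players, speeds, court=(94, 50)):
    """Assign each one-foot court cell to the player who arrives first."""
    xs, ys = np.meshgrid(np.arange(court[0]), np.arange(court[1]))
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
    dists = np.linalg.norm(grid[:, None, :] - players[None, :, :], axis=2)
    arrival = dists / np.asarray(speeds)[None, :]  # seconds, not feet
    return arrival.argmin(axis=1).reshape(xs.shape)

# Two rebounders ten feet apart; the slower lane rate (5.999 ft/s)
# versus the faster crasher rate (6.673 ft/s) from the text.
players = np.array([[40.0, 25.0], [50.0, 25.0]])
regions = time_partition(players, speeds=[5.999, 6.673])
```

The midpoint between the two players, equidistant by the ruler, now belongs to the faster player: the boundary shifts toward the slower one.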

Given this simple update to building a Voronoi Tessellation, we can look into developing the positioning value in rebounding. Positioning is defined by two quantities: **initial rebounding position** and **terminal rebounding position**. Initial and terminal are defined relative to the field goal attempt.

As a player takes an attempt, other players may crash into the lane, effectively shrinking other players’ regions. Similarly, other players may **leak out** and effectively give more real estate to other players. The initial position is when the field goal is attempted, while the terminal position is when the ball is closest to the center of the rim. In this case, we obtain two different Voronoi Tessellations, which in turn yield an entirely different probability set.

A regression is then computed to measure the expected change in rebounding positioning probability. This will serve as the adjustments for players during a particular possession (such as quantifying crashing, boxing out capabilities of each player).

The second part of rebounding, according to Second Spectrum, is **hustle**. These situations are relatively self-explanatory and tend to be the argument against using player-defined Voronoi Tessellations. To show that this argument is not quite true, we first define hustle.

Hustle is defined as **opportunities created after initial positioning**. This would identify factors such as boxing out and crashing. An **opportunity** is defined as the number of times a rebound is available for the closest player when the ball is below ten feet. There can be multiple opportunities for each rebound as the first closest player may not secure the rebound. If a player is able to obtain an opportunity outside of their initial positioning, then they obtain a hustle value.

Again, a regression is performed. However this time instead of looking at terminal position versus initial position, we look at Opportunity Percentage versus the Position Value. This would indicate how likely a player is able to gain an opportunity **given their movement in positioning**.

A slight word of caution here is that a linear regression is being performed. While this may yield answers, this is easily a place to improve on **as much of the data is discrete**.

The final portion of rebounding is then **conversion**. Conversion is the process of obtaining a rebound when an opportunity exists. Therefore it is a percentage. As a word of caution, a player **can grab every rebound and still have a below 100% conversion rate**. Consider jockeying for position, where a player’s initial opportunity is out-hustled into a secondary opportunity for an opponent before the player gains a **third opportunity**. These types of rebounds happen when long-range attempts miss long; the initial rebounder misses the rebound, a struggle ensues as the second rebounder is unable to secure the ball, and the initial rebounder secures the rebound.

Since there is a high correlation between conversion and initial position, a regression is imposed. While this does not clean the correlation, it helps smooth the correlation with respect to the positioning.

The final rebounding value is then given as the formula

As location data becomes better understood, training each of the three components on **player tendency** will slowly be adopted. One simple correction is to stop thinking of distance as a ruler and instead think of distance as a **time**. This common engineering trick allows us to redefine quantities such as positioning.

Similarly, regressions are good when the **data satisfies the criteria for regression**. However, if this is not the case, alternative methods should be explored. For instance, here is a screen cap of one regression used by the Second Spectrum team for adjusting hustle.

The R-squared for this is small as the data piles on the left hand side of the plot and largely varying sparse data populates the right hand side of the regression. Either some form of **leverage correction **must be used, or an entirely different correlation capturing mechanism needs to be employed.

Any which way we look at this, the rebounding quantity is a valiant effort in capturing the idea of rebounding ability of players. However, we should take the probability with a grain of salt as many published nuances are not quite true; leading us to realize that a 60% rebounding chance may actually be anything from 35 to 85%.

If given the task, how would you develop a rebounding metric that captures player capability?


In this article, we mildly walk through the process of sketching to identify why it’s a fairly neat idea, where some of the pitfalls are, and how it can be applied. In this post, unfortunately, there will be no direct code for three reasons. First, the fun examples come from data that is not releasable to the public, and therefore releasing the code to process the data would give insight into the updated technology and data structure of Second Spectrum data. Second, my reconstruction of the code would poorly represent the actual work performed by both Miller and Bornn. While I am indeed capable of reconstructing the code, being the nuanced statistician that I am, I would take extra liberties as I see fit that may not align with the vision of the authors. Third, I am unable to go into details that are not publicly released through Miller and Bornn’s work, as I have an NDA in place (not connected to either Miller or Bornn) about discussing this particular topic in detail.

So why go into detail about sketching? **Because it’s cool**. And my personal background in differential geometry and spatio-temporal statistics urges me to identify to anyone interested in advanced statistical analysis in NBA data about this type of research.

In this article, we will break down the three components of Sketching: **segmentation**, **templating**, and **modeling**. Heavy focus will be given to the first of the three elements. Discussion will be given to the second and third of the three elements.

The goal of sketching is to take player tracks from motion data and develop a template that statistically identifies offensive motion in an NBA offense. To begin, we focus on the player tracks. A player track is a **trajectory**; that is, the path an object takes over a desired period of time. We can represent a trajectory as the set of points **x_1, …, x_t**, for **t** consecutive time steps. Let’s take a look at a particular play.

The illustrative play we look at comes from the Golden State Warriors. In this illustration, we are concerned with two components: the **Sideline Wing Twist** and the **Elbow Pin Down**. The Warriors will occasionally use the Wing Twist with **Steph Curry** and **Klay Thompson** in an effort to tangle defenders and either open a lane towards the basket or break for the three point line. A secondary **elbow pin down** screen typically comes from the top of the key, from either Zaza Pachulia or JaVale McGee, to pick up a wing scorer. The pin down will either free up a jump shot at the perimeter, or a drag-action screen into the lane. By combining these two actions, we can identify four different types of actions, dependent on the response of the defense.

In the diagram, we see the wing twist action. Stephen Curry (blue) feigns popping to the perimeter and doubles back to Klay Thompson (black) for a screen. Instead of screening, Thompson and Curry weave. As they weave, McGee steps in to show a potential **elevator doors** action to spring Thompson to the lane.

After Thompson breaks for the lane, Curry doubles back towards McGee, who takes the extra step to pin down Curry’s defender at the elbow for a perimeter look. After the pin down, McGee finds a completely open lane as Thompson receives the ball in the lane. The fourth action finds McGee nearly uncontested at the rim for two points.

From the illustration, we see that the collection of blue segments is the **trajectory** of Steph Curry’s offensive possession. Similarly, the collection of black segments is the **trajectory** of Klay Thompson’s offensive possession.

We can first start by animating the location data. In this process, we can extract out the points of interest, building a simple matrix of entries that contains **player, team, x-location, y-location, time**. By leveraging the **matplotlib.animation** package, we can build an animation updater that simply redraws **scatterplots** conditioned on time. Including some other basic analytics, we can build a player movement animation.
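A toy sketch of that animation loop, on synthetic stand-in data rather than the withheld tracking feed; the (player, team, x, y, time) column layout is an assumed format.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Assumed row format: player, team, x-location, y-location, time.
rng = np.random.default_rng(0)
frames = 50
moments = np.array([[p, p // 5, 47 + 10 * rng.standard_normal(),
                     25 + 8 * rng.standard_normal(), t]
                    for t in range(frames) for p in range(10)])

fig, ax = plt.subplots(figsize=(9.4, 5.0))
ax.set_xlim(0, 94)
ax.set_ylim(0, 50)
scat = ax.scatter([], [])

def update(t):
    # Redraw the scatterplot conditioned on the frame's time stamp.
    frame = moments[moments[:, 4] == t]
    scat.set_offsets(frame[:, 2:4])
    return scat,

anim = FuncAnimation(fig, update, frames=frames, interval=40, blit=True)
# plt.show(), or anim.save(...), to view the animation.
```

Drawing a court underneath the scatter (lines, arcs, key) is just additional static artists added to `ax` before the animation runs.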

In the animation, we see the twist action at the bottom right of the court. After the exchange, we see Curry double back and take on a pin down from McGee. As Thompson was double teamed and Curry’s defender tracked Curry to the perimeter, McGee was able to read the open spot on the court and break for an eventual lay-up.

Once we are able to visualize the play, the next step is to identify the trajectories of each player. A simple way to represent this is to plot the movement as a static plot. In doing this, unfortunately, we will get a gigantic mess on the court. Instead, we opt to animate this process while **holding the offensive possession trajectories** in memory.

I lovingly call this a **Snail Plot**. This is due to the trajectories being mapped out as the player moves around on the court. After enough motion, there will be an indistinguishable noodle soup. However, as the tracks play out, we can start to see each player’s motion in relation to every other player on the court.

Let’s take a look at JaVale McGee’s trajectory. In this case, McGee collects a made basket on defense and passes the ball in to Curry. He then heads down court, lingering at the top of the key, waiting for the play to develop and make his move. After the twist action occurs, McGee sets a quick screen and breaks for the basket.

We now have an isolated trajectory for McGee. How do we understand his spatio-temporal role within this offense?

The first step in the sketching process is to break apart segments of an offensive play. Recall, there are two primary actions that make up this Warriors’ play. The method used by Miller and Bornn is to use **low-velocity movements**. A low velocity movement is defined as approximately .25 seconds of less than 0.1 feet per second movement. In this case, seeing as we have JaVale McGee’s trajectory, we should take a look at his speed on the court.

We immediately see all the segments of McGee’s trajectory that break apart due to sustained low-velocity movements. If we look at the **half-court possession only**, we see that there are actually three segments to McGee’s motion. Each value on the x-axis represents a mere 0.04 seconds, which indicates that .25 seconds is roughly six contiguous time steps.
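A minimal sketch of that cut rule, under the assumptions above (0.04 seconds per step, a cut wherever speed stays below 0.1 feet per second for at least six contiguous steps); the toy trajectory is illustrative, not McGee's actual track:

```python
from math import hypot

def segment_trajectory(xy, dt=0.04, v_min=0.1, min_steps=6):
    """Split an (x, y) trajectory at sustained low-velocity stretches."""
    segments, current, slow_run = [], [xy[0]], 0
    for (x1, y1), (x2, y2) in zip(xy, xy[1:]):
        v = hypot(x2 - x1, y2 - y1) / dt   # feet per second
        if v < v_min:
            slow_run += 1
        else:
            if slow_run >= min_steps and len(current) > 1:
                segments.append(current)   # close segment at the sustained stop
                current = []
            slow_run = 0
        current.append((x2, y2))
    segments.append(current)
    return segments

# Two bursts of movement separated by seven stationary samples:
path = [(0.0, 0.0), (1.0, 0.0)] + [(2.0, 0.0)] * 7 + [(3.0, 0.0), (4.0, 0.0)]
print(len(segment_trajectory(path)))  # 2
```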

In this vein, the half court action starts at time step 200; this is eight seconds into the possession. From there, we see McGee remain relatively stationary; which is his presence at the perimeter.

The first burst of motion is the elevator door action with Curry during the weave, letting Thompson slip into the lane and drawing McGee’s defender. The large spike is the quick movement to pin Allan Crabbe onto Thompson’s back, forcing Maurice Harkless to chase Thompson.

After the screen, there is a small burst which indicates McGee’s flow into the elbow pin down. For a brief moment after the screen, McGee actually dips beneath 0.1 feet per second for **nine** time steps, forcing a cut point and leading him into the backdoor cut after reading the defense.

We then see the acceleration to the basket, the quick step out to the short corner, and then the burst for the basket on the Thompson miss and put-back. **Note: **Thompson was blocked. McGee picked up the missed attempt and made the uncontested lay-up.

What we have now, mathematically, are **three** segments of the sketch for JaVale McGee. Each segment has its own length. The first segment is approximately 25 time steps. The second segment is approximately 130 time steps. The third segment is approximately 120 time steps.

The second step of Miller and Bornn’s sketching methodology is construction of the **action template**, which is the hardest part of the sketching process. Here, their goal is to develop a bag-of-words approach to use LDA to identify **topics** associated with offensive mechanics. To build topics, we must have a mechanism for defining **words**; or a **vocabulary**.

Their methodology starts with a clustering technique by computing a sequence of 250 **Bezier curves** that serve as **cluster centers** for each of the potential clusters of segments of trajectories.

A Bezier curve is a piecewise curve constructed in a similar vein as splines or localized polynomials. The attractive properties of Bezier curves are that the resulting curve rests within the convex hull of its control points and that it can be defined in a recursive, efficient manner. Due to these two primary properties, Bezier curves are commonly found in animations and movements in the computer graphics world. **Note: **Tackling a discussion on Bezier curves here would add another 1,000 words (as this was a two-course lecture when I learned about these in graduate school). Feel free to dive in on Wikipedia. It’s actually not too bad on development; but light on detail.
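To make the two properties concrete, here is the recursive De Casteljau evaluation of a Bezier curve; every intermediate point is a convex combination of control points, which is why the curve never leaves their convex hull:

```python
def bezier_point(control, t):
    """Evaluate a Bezier curve at t in [0, 1] by repeatedly interpolating
    adjacent control points (De Casteljau's algorithm)."""
    pts = list(control)
    while len(pts) > 1:
        # each pass shrinks the list by one via linear interpolation
        pts = [((1 - t) * x1 + t * x2, (1 - t) * y1 + t * y2)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
    return pts[0]

# Quadratic curve from (0, 0) to (2, 0), pulled toward (1, 2):
print(bezier_point([(0, 0), (1, 2), (2, 0)], 0.5))  # (1.0, 1.0)
```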

However, finding these Bezier curves helps transform the segment data into a **vocabulary**, where the index of the Bezier curve serves as the **word** of the segment. This pretty slick idea allows us to now place a discrete distribution on the bag of words through the use of LDA!

It should be noted that the clustering algorithm here must have high amounts of repetition in order to gain enough signal to cluster. As noted in Miller and Bornn’s paper, they leveraged 190,000+ possessions. Even so, that is less than 75% of a season; and of that, a majority of players may only have a couple of actions that classify properly. Due to this, an assumption of **exchangeability** on player motion (Curry is the same as Thompson) may be sitting within this process.

The final part of the model is the use of Latent Dirichlet Allocation (LDA). This well-known model was introduced in 2003 by David Blei, Andrew Ng, and Michael Jordan. It is a **topic model** framework that attempts to map words to topics gleaned from documents. A common example is the use of words such as “tumor” and “benign” that map to the topic “cancer.” The goal of topic modeling is to identify a suite of topics from each document and identify a probabilistic distribution of words associated with the topics.

Here, we assume that documents are offensive possessions, topics are the **strategies**, and the Bezier curve segments are words. Therefore, the goal is to extract out **strategies** and identify the curve segments associated with them.
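As a toy version of this setup (standing in for the paper's model with scikit-learn's off-the-shelf LDA; the count matrix below is fabricated), each row is a possession and each column is a Bezier-curve index:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Rows are possessions ("documents"), columns are Bezier-curve cluster
# indices ("words"), entries are counts of each curve in the possession.
counts = np.array([
    [3, 0, 1, 0],   # possession dominated by curves 0 and 2
    [2, 1, 2, 0],
    [0, 4, 0, 2],   # possession dominated by curves 1 and 3
    [0, 3, 1, 3],
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # possession-by-strategy mixtures
print(doc_topics.shape)  # (4, 2): one strategy mixture per possession
```

Each row of `doc_topics` is a probability distribution over the two latent "strategies," which is the quantity the paper interprets.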

Given the topic modeling that can occur, Miller and Bornn took another step in understanding pairs of words within a strategy. That is, how two actions compare between two players within a strategy. This analysis would help shed light on interactions such as Hammer screens with a strong-side PnR and a back screen on the weak side. Similarly, this can look into a Warriors twist with an ensuing elbow pin down. I encourage you to head back to the top of this post and download the paper to read.


In this article, we focus on the spatial distributions of assists for teams and identify some simple tests for interactions of players, in an attempt to identify primary, secondary, and tertiary options for scoring plays given particular units on the court.

The definition of an assist is fairly complex, as it has changed over the years. Originally, an **assist** was defined as a pass that led **directly to a score**. This meant any moves by the shooter, something as simple as a dribble or even a pump fake, would automatically nullify the assist. Over the years the definition has relaxed: a steal and a pass to a leaking guard would not have counted as an assist if that guard took a single dribble, **but counts now**. In fact, in some leagues the stricter definition is still the norm. However, the relaxation is not as loose as you would think.

Assists are akin to errors in baseball. While the definition is fairly straightforward, the application of the definition is subjective. Here, we will be at the will of NBA stats and accept assists as-is; meaning the definition is applied uniformly across all games.

The first thing we look into are a team’s spatial distribution of assists. Let’s start by comparing a few teams: **Orlando Magic** (1820),** Houston Rockets** (2070),** Sacramento Kings** (1844),** New York Knicks** (1786)**, **and **Milwaukee Bucks** (1984).

We see from almost every distribution that teams have effectively adopted the **layups and threes** mentality, as mid-range jumpers are small samples whereas the perimeter and the rim are effectively blobs. The most successful team of this lot, the **Houston Rockets**, are the most egregious, with very few assisted field goals (only 82 **for the entire season and playoffs**) in the mid-range.

We do see some other curious effects, such as the **Milwaukee Bucks** having a more dominant right-hand distribution of mid-range assisted scoring situations. Similarly, the **Orlando Magic **have a hole in the left-hand short corner, indicating that any points scored in that position are not part of the **passing plan**. Or if it is part of the passing plan, points are not being scored there by the Magic.

Understanding the spatial distribution of assists can only go so far in understanding team interaction. Instead, we focus more on the tendencies of players. If we are primed with more information, we can make even further analyses of players.

Here we will focus on three primary data sets: play-by-play data, SportVU data, and Synergy data. We will primarily work off of play-by-play data, identify how to enhance data by using SportVU, and use Synergy for partitioning and cues.

Our case study will involve our first example: **Orlando Magic**.

The first partitioning of the data is to look at how the Orlando Magic units work together. The 2017 Orlando Magic had a total of 19 different roster players who logged at least one minute of NBA action. Choosing any five players from that roster yields 11,628 (19 choose 5) possible rotations. For the 2017 NBA season, the Magic played a total of **282** different rotations over the course of the year.
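As a sanity check on the combinatorics, the number of unordered five-player units that can be drawn from a 19-man roster is 19 choose 5:

```python
from math import comb

# 19 roster players taken 5 at a time, order irrelevant
print(comb(19, 5))  # 11628
```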

The rotation with the most assists is the **Aaron Gordon, Elfrid Payton, Evan Fournier, Nikola Vucevic**, and **Terrence Ross** rotation with 460 of the team’s 1820 assists. The second top rotation? **Aaron Gordon, Elfrid Payton, Evan Fournier, Nikola Vucevic**, and **Serge Ibaka** with 290 of the team’s 1820 assists. These are effectively the starters for the Magic after and before the Ross-Ibaka trade. While there are 56 games with Ibaka and 26 games with Ross, and Ross’ rotation has nearly 60% more assists than Ibaka’s rotation, this is due in part to roughly **sixteen missed games by Fournier** and a **twenty-game stretch where Payton and Vucevic were not starters**. This is evidenced by the third-highest assist total rotation being **Aaron Gordon, Bismack Biyombo, D.J. Augustin, Evan Fournier**, and **Serge Ibaka** with 189 assists; the fourth-highest assist total rotation of **Elfrid Payton, Evan Fournier, Jeff Green, Nikola Vucevic**, and **Serge Ibaka** with 163 assists; and the fifth-highest assist total rotation of **Elfrid Payton, Evan Fournier, Jeff Green, Nikola Vucevic**, and **Serge Ibaka** with 128 assists. In fact, here is the Python dump for every rotation with 20+ assists:

So let’s see where these assists go when the top unit is in.

Here, we see the Orlando Magic’s distribution of assists for their top rotation. The plot does not look much different than the team plot, with the majority of assists at the rim and several others outlining the three-point line. If we parse out quarter-by-quarter action, we see that the distributions change over time.

Here we see that the units are stabilized in the first and third quarters; meaning that they effectively follow the team distribution. However, in the second and fourth quarters, the offensive scheme dramatically changes to be a **left-hand dominant scoring team** with a significantly high probability of obtaining an assist beyond the arc. This indicates a **change in scoring philosophy between the first and second quarters**; **similar to the third into fourth quarters**.

Let’s quantify this difference. In the third quarter, this rotation obtains **72 assists within 6 feet of the basket**. Compare this to the **55 assists on three point field goals** and we see that this rotation is 1.31 times more likely to score at the rim off an assist than from beyond the arc.

In the fourth quarter, we see a different result. Instead we see 16 assists within 6 feet of the basket and 23 assisted three-point field goals. This effectively shows an inversion of game plan, as now this rotation is only 0.70 times as likely to score at the rim off an assist as from beyond the arc.

As a side note, there is one assist by this unit in overtime. It is a lay-up.

While we have seen that the top rotation takes a significant change in scoring situations between odd and even quarters, one may attempt to say **low sample size** and therefore we are **seeing faces in clouds** / false correlations. Well, to test this, we can look into comparing the two spatial distributions through the use of a commonly known **spatial test** known as **K-function tests**.

A **K-function** takes a particular spatial location, **s**, and counts the number of observations within a radius, **t**, of **s**. Think of this as building a circle of radius **t**, with **s** as the center, and counting the number of observations within that circle. The function is given by

**K(t) = (1/λ) E[ N(b(s, t)) ]**,

where **λ** is the intensity and **N(b(s, t))** is the number of observations in the circle **b(s, t)** of radius **t** centered at **s**.

This is the **expected number of observations** divided by an **intensity**. The intensity is merely the distribution of points **that should exist within the circular region of interest**. For example, if we assume a **Poisson distribution of assists**; that is, attempts scattered uniformly at random in the circle, we obtain

**K(t) = π t²**.

Recall that the uniform distribution is one divided by the area of the region of interest. If we expect an observation per region uniformly, we obtain **K(t) = 1 / ( 1 / (π t²) ) = π t²**, which is exactly the Poisson noise model above.

While we do not know the true intensity of the distribution of assists, we can estimate it. In order to approximate the K-function, we perform the calculation (brace yourselves)…

**K̂(t) = ( |D_s| / n ) · [ Σ_j Σ_{i ≠ j} 1( ||s_i − s_j|| ≤ t, d_j > t ) ] / [ Σ_j 1( d_j > t ) ]**

Let’s break this down…

The first fraction is merely the **one-over-intensity, **where |D_s| is the **area of the region of interest**. This will be the **NBA court**. This means the nasty sum is the **expected number of observations** **to appear in the circle with radius t**. The points, **s_j**, are the actual assist locations (observations). The values **d_j** are the **distances from s_j to the nearest boundary point in D_s**. The **indicator function, 1, **counts the number of observations**, s_j**, that are within the circle of **radius t **but farther than a distance of **t** from the boundary of the region of interest, **D_s**. Let’s illustrate this with all the parts:

The inclusion of **d_j > t** means that the point can be captured by a circle of radius **t** from **every location in the space of interest**. Dividing by this sum, we obtain an **estimated distribution of spatial points within a circle of radius t**.

Applying this to the top assist rotation for the Orlando Magic, we find that 381 of the 460 assists occur more than five feet from the out-of-bounds region. This number is relatively high as the basket is located **5.25 feet from the baseline**; therefore all dunks and lay-ups are included. This makes the denominator 381 in the formula for estimating a K-function. Of the 211,140 possible comparisons between all 460 spatial locations, we find that 18,456 pairwise spatial comparisons are within five feet.

This gives us a K-Function value of **248.6768**.

As a quick note, every dunk and lay-up will match every other dunk and lay-up at a distance of nearly **zero feet**. Hence, anything less than **5.25 feet** for a K-function should have a large total for rotations with a large number of assists.

Varying the spatial dependence value, **t**, we find the following K-Function plot.
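A sketch of this estimate in code, treating the court as a simple rectangle whose dimensions are parameters; the three sample points are made up, and a real run would sweep `t` over a grid to trace the plot:

```python
from math import hypot

def k_function(points, t, width=94.0, height=50.0):
    """Border-corrected K-function estimate over a rectangular court."""
    area = width * height

    def border_dist(p):
        # distance from a point to the nearest out-of-bounds line
        x, y = p
        return min(x, width - x, y, height - y)

    # keep only points at least t away from the boundary (d_j > t)
    interior = [p for p in points if border_dist(p) > t]
    if not interior:
        return 0.0
    pairs = sum(1 for p in interior for q in points
                if q is not p and hypot(p[0] - q[0], p[1] - q[1]) <= t)
    return area * pairs / (len(points) * len(interior))

pts = [(10.0, 10.0), (10.0, 11.0), (40.0, 25.0)]
print(round(k_function(pts, 2.0), 2))  # 1044.44
```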

How do we interpret this plot? Recall that the K-Function identifies the expected number of points within a circle of radius **t**. If this radius gets larger, the number of points should increase. This is exactly what we see in this plot.

Note that there is a big drop off at 5.25. This is the point where the dunks and lay-ups are close to the boundary. The steep drop off shows that **assisted field goals between 5.25 and roughly 8 feet are much less frequent than most other shots**. In fact, as we creep out to three point distance, 22 feet, the expected number of assists continue to climb.

Let’s understand this…

At 22 feet (three point line range), we have a K-Function value of 725.43478. This means that we expect 1,704 spatial combinations to be made within 22 feet of each other. This doesn’t mean that assists continue to climb at this rate, but as we start to include three-point-region shots relative to the **basket**, we see that more assists are added in.

Once we extend further out from the three point line, we see that number once again drop off and become unstable. This is due to few shots coming from 23+ feet out, and the boundary starting to cause an effect; as the furthest away from the boundary a player can be is 25 feet.

The first step in testing an offensive scheme for a particular rotation is to understand how to develop a test using the K-Function. The most basic test is a **test of uniformity**. Those who are familiar with a test of uniformity will know immediately that the **distribution of assisted field goals is NOT UNIFORM**. We can see this in the plot.

If the distribution were indeed uniform, then we would see roughly as many shots taken at the rim as we would at half court. In fact, we can perform an MCMC to generate a sample of data points from a uniform process and simply compare. Or… we can look at the K-Function.

Recall that uniform noise on the court is simply **shot noise**, or the **Poisson model **we highlighted above. If we plot the K-Function for the uniform model with our data model, we immediately see the disparity.

The statistical test is effectively a **likelihood ratio test** that considers the distribution of the **observed** K-Function and the **theoretical** Poisson distribution. When we take this ratio, however, we subtract the distance, **t**, to obtain an indication of **clustering** within the spatial data. This clustering, in turn, identifies that there is a particular preference of spatial actions performed by the offensive rotation. The uniformity test is given by

**L(t) = sqrt( K̂(t) / π ) − t**

Plugging in observed values of the K-Function, we obtain the L-Function curve (the set of test values for the K-Function with respect to uniformity).
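Given an estimated K-Function value, the test statistic itself is a one-liner; under complete spatial randomness K(t) = πt², so the value collapses to zero:

```python
from math import pi, sqrt

def l_function(k_value, t):
    """L-function: zero under complete spatial randomness,
    positive for clustering, negative for regularity."""
    return sqrt(k_value / pi) - t

# Feeding in the Poisson model K(t) = pi * t^2 recovers zero:
print(l_function(pi * 4.0 ** 2, 4.0))  # 0.0
```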

Let’s interpret this plot. Values of **zero** indicate **complete spatial randomness**. In this case, we actually see a value of zero at roughly 8 feet. However, without plotting the estimated confidence bounds (they linger closer to two until we arrive at roughly 15 feet, and then they “balloon” towards 5) we have significance **of non-randomness**. The value zero here indicates a change from **clustering** to **regularity**.

Any significantly positive values indicate clustering that occurs on the court. As we are under 5.25 feet, we naturally see the clustering at the basket. The faster the function heads back to zero, the more spaced out assists are (indicating randomness). However, as we head into ten or more feet, we obtain regularity. This means that we identify regions of activity such as three point attempts, shots from the elbows, and at the rim; however, the data is not tightly clustered in those locations.

Now that we understand the concept of a test, we can then focus on comparing two rotations; in particular, the top rotation during odd quarters and the top rotation during even quarters. In this vein, we can difference the two resulting K-Functions for the two rotations of interest. Let’s extrapolate on this:

**If two rotations have the same offensive schemes, they will have the same resulting K-Functions (within variation). **Similarly, **if two rotations have statistically differing K-Functions, then the offensive schemes CANNOT be the same**.

In this case, we are interested in that odd- and even-quarter change of offensive schemes for the Magic. K1 is then the K-Function for the fourth-quarter observations. Similarly, K2 is the K-Function for the third-quarter observations. Therefore the difference can be illustrated as follows:

Here, we see that by taking the difference, we are nowhere close to zero in effectively every location, **except for the perimeter**. What this indicates is that the fourth quarter unit indeed looks for more perimeter attempts on the pass. This doesn’t indicate that the shooters are effectively better in this time period. To investigate that, we need to look at per-possession and per-chance statistics.

That all said, we have developed a spatial methodology to statistically distinguish changes in offensive schemes either within rotation or across rotations. This type of analysis can help us clue in on particular patterns in offensive flavor or…

… develop statistical methodology to identify how offensive schemes react to defensive personnel. Note, in order to perform this partitioning, we are giving up **many** degrees of freedom and may quickly find ourselves in high-variance situations. More specifically, situations where **Simpson’s paradox may arise**.

If we have a Synergy database, we may be able to correlate assist distributions on the court with the type of offense, such as Pick-and-Roll plays or Catch-and-Shoot plays. Again, be wary of partitioning noise above the signal in these situations.

Another way we can enhance this data set is to develop a **passing **indicator. This would yield **passing locations** on the court. In these situations, we obtain **spokes** of data and can now look at how ball movement would work within an offensive scheme.

In fact, there are many things we can do now that we are armed with the ability to test spatial data directly. Feel free to try developing this test and see how other teams’ schemes play out. **Or more investigatively… which teams run similar styles of rotations…**


This means that our methodology needs to perform well in classifying Hall-of-Fame players. So how do we go about looking at the performance of such a methodology? In this article, we investigate a common machine learning technique and see how it applies to a given algorithm for classifying Hall-of-Fame players. This, in turn, allows us to understand the quality of our metric.

Here, we will use the Kidd Score as the classification metric: it is simple to understand and translates easily as an illustration. Also, we provide the Python code, because, you know… learning is fun!

Kidd Score is an analytic developed by Sixers Science (2017) that takes the square root of the product of assists per 75 possessions and rebounds per 75 possessions. This is a cross-product statistic that identifies the contributions of NBA players on off-ball possessions. High scores indicate that players are either successful in both assists and rebounds; or wildly successful in one of those particular categories.

As noted in a recent Sixer Science podcast (#9), the analytic performs relatively well in discriminating hall-of-fame players. Looking at the distribution of players with Kidd Scores above 7.0, many Hall-of-Fame players make that list.

Here, instead of looking at a **“Top N”** list and saying that holds weight, let’s actually identify how well this classifies players. Here are the Python commands for generating Kidd Scores from per 100 possessions data obtained from basketball reference.
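A hedged stand-in for that first chunk (the column names, file layout, and players here are illustrative, not the actual basketball-reference dump):

```python
import csv
import io

CURRENT_SEASON = 2017          # season of the analysis

# Stand-in for the per-100-possessions file; real column names may differ.
sample = io.StringIO(
    "Player,Year,AST100,TRB100\n"
    "Player A,2016,11.5,11.4\n"
    "Player B,2005,6.0,7.0\n"
)

CurrentKidds = set()           # players too recent to be HoF-eligible
rows = []
for row in csv.DictReader(sample):
    if int(row["Year"]) > CURRENT_SEASON - 5:
        CurrentKidds.add(row["Player"])   # played within the last five years
        continue
    rows.append(row)

# Also drop recent players' earlier seasons, as the article describes.
rows = [r for r in rows if r["Player"] not in CurrentKidds]
print([r["Player"] for r in rows])  # ['Player B']
```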

This chunk of code opens up the file, checks the year and then identifies players that have played within the last five years. These players cannot be in the Hall-of-Fame and we therefore eliminate these players from the system. Similarly, we hold a **CurrentKidds** list of players to be sure to remove them from seasons previous to 2013.

This will allow us to only look at players who have labels, **as non-eligible players cannot have a HoF label**.

This chunk of Python code operates in the same loop and picks apart Hall-of-Fame players from non-Hall-of-Fame players; each storing their Kidd Scores. Notice that I personally scaled the Kidd Scores by **percentage of games played**. This was used to clean the data as certain players, like **Danuel House**, post amazing Kidd Scores due to seeing only minutes of action and getting 1-2 assists and rebounds. So I got that noise out of here.

In every binary classification process, we become interested immediately in sensitivity and specificity. The reason for this care comes from the **hypothesis test** that we propose for classification. In a hypothesis test, we have what is called **type I error**, which is the “**bad error**” that arises when a set of assumptions the data **truly follows** is, upon observing the data, **rejected**. For shame. Similarly, from hypothesis tests we have **type II error**, which arises when the assumptions are indeed not true, but the data looks like it follows the assumptions **just enough** to say it follows the assumptions. This error isn’t as bad, but we’d like to correct for it.

How tests of hypotheses relate to machine learning classification is through the error process. **Sensitivity** is the rate of classifying the group of interest **correctly**. This term is also known as **recall** or the **true positive rate**.

Here, our hypothesis test states a relationship that identifies a piece of data as a Hall-of-Fame player. In the case of Kidd Score, this is a value above a score of **K**. In the Sixer Science podcast, they use the value of **K = 7**. Which shows some really good names!

Therefore, sensitivity identifies the **percentage of Hall-of-Fame players correctly classified**. This effectively mimics the Type-I error of a test. We have Hall-of-Famers. We built an analytic that classifies Hall-of-Famers. And sensitivity identifies how well we predicted Hall-of-Famers.

Similarly, **specificity** is the process of classifying a group of non-interest **correctly**. In the Kidd Score context, this means ensuring that players who are not in the Hall-of-Fame are not accidentally classified into the Hall-of-Fame. Whenever that error occurs, we have what is called a **false positive**. This in turn shows us that specificity identifies the **percentage of non-Hall-of-Fame players correctly classified**, while the false positive rate is the **percentage of non-Hall-of-Fame players incorrectly classified as being Hall-of-Fame players**. Think of this as akin to type-II error, as the methodology is attempting to control for Hall-of-Fame players, and the class for which the assumptions aren’t geared is being incorrectly classified.

Therefore, in a binary classification process, we are interested in **maximizing sensitivity** while **minimizing false-positive rates**. In finding this maximum and minimum, we change the threshold score until we find a happy medium. This value is then determined to be our **optimal classifier**.

Note that we didn’t state how well it classifies, nor did we mention that it is a good classifier. We merely stated, **for the given methodology**, it is the best we are going to get. And this leads us to a **Receiver Operating Characteristic (ROC) Curve.**

A ROC curve is a graphical tool that allows a data scientist to look at the quality of their classification procedure. It can also be used as a tool to help compare competing classification models. In this case, we will perform two classification procedures and compare them using ROC Curves.

Graphing a ROC curve is simple. The x-axis of a ROC curve is the **false-positive rate**. This value ranges between zero and one. A value of zero means that **no players outside of the Hall-of-Fame are classified as being in the Hall-of-Fame**. The y-axis of a ROC curve is the **sensitivity**. This value also ranges between zero and one. A value of zero means that **every Hall-of-Fame player is incorrectly classified**. This is a disaster.

To build a ROC curve, we start by looking at the threshold that discriminates the players. For Kidd Score, this is the score **K**. Here, we set a linear space of possible thresholds and walk through them. For example, if we consider a score of **zero**, we see that **every player is a Hall-of-Fame player!** This is bad news. Here, our **sensitivity** is perfect at 100% but our **false-positive rate** is an atrocious 100%. Therefore a score of **zero** gives us the upper-right-most point of the ROC curve: (1.0, 1.0).

Correspondingly, suppose we have a value of **K = 15**, a score no one surpasses. Then every player is seen as **not being in the Hall-of-Fame**. This results in a **sensitivity** of 0% and a **false-positive rate **of 0%. This is the lower left-hand corner of the plot.

Continuing in this process, we obtain a sequence of points drawing the remainder of the ROC curve. Now, we consider two competing methodologies using Kidd Score. The first methodology states that **any player with a Kidd Score over K is in the Hall-of-Fame**. The second methodology states that **any player with a career average Kidd Score over K is in the Hall-of-Fame**. Let’s break these two methodologies down.

The first method is to consider when a player has a single season above a score of **K**. In this case, if we consider **Nikola Jokic** from the **Denver Nuggets**, Jokic posted a single-season Kidd Score of **9.1217** last season (adjusted to 8.1205 when considering games played), and that followed his rookie season of **7.0795** (adjusted down to 6.9068 when considering games played). This indicates that Jokic spent one season as a Hall-of-Fame player, and the other not. If Jokic were eligible and **not in the Hall-of-Fame**, this would be a 1/2 contribution to the **false-positive rate**.

Here is the Python code for raw seasons:
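Since the raw-season approach amounts to a threshold walk, a compact sketch (with fabricated scores standing in for the real lists) looks like this:

```python
def roc_points(hof_scores, non_hof_scores, thresholds):
    """Return (false-positive rate, sensitivity) pairs for each threshold."""
    points = []
    for k in thresholds:
        tp = sum(s >= k for s in hof_scores)       # HoF seasons above K
        fp = sum(s >= k for s in non_hof_scores)   # non-HoF seasons above K
        sensitivity = tp / len(hof_scores)
        false_positive_rate = fp / len(non_hof_scores)
        points.append((false_positive_rate, sensitivity))
    return points

# Illustrative score lists only:
hof = [8.2, 7.5, 6.1, 5.9]
rest = [2.0, 3.1, 4.8, 7.2, 1.5]
print(roc_points(hof, rest, [0.0, 15.0]))  # [(1.0, 1.0), (0.0, 0.0)]
```

A threshold of zero reproduces the (1.0, 1.0) corner described above, and an unattainable threshold reproduces (0.0, 0.0).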

Before we draw the plot, let’s look at the ROC curve for average seasons.

In the average season case, we simply build the dictionaries, **KiddHOF** for Hall-of-Fame players and **KiddNoHOF** for non-Hall-of-Fame players. The keys are the player names and the values are arrays of Kidd Scores over their career. In this case, we again throw out players who are not eligible for the Hall-of-Fame. Before we build the ROC curve, we must take the average Kidd Score for each player. This is given by:
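The averaging step can be sketched as follows; `KiddHOF` and `KiddNoHOF` mirror the dictionaries described above, with made-up values:

```python
# name -> list of season Kidd Scores over a career (illustrative values)
KiddHOF = {"Player A": [7.0, 8.0, 9.0]}
KiddNoHOF = {"Player B": [3.0, 5.0]}

# collapse each career to its average score before thresholding
avg_hof = {name: sum(s) / len(s) for name, s in KiddHOF.items()}
avg_non = {name: sum(s) / len(s) for name, s in KiddNoHOF.items()}
print(avg_hof["Player A"], avg_non["Player B"])  # 8.0 4.0
```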

Then we perform the exact same **K** walk in building the sensitivity and false-positive rate for the average Kidd Scores. This is performed by:

Now we are ready to plot!

Plotting is simple in Python. We just use **matplotlib.pyplot** with the following code:
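A sketch of that plotting step; the coordinate lists below are placeholders for the computed false-positive rates and sensitivities, not real results:

```python
import matplotlib
matplotlib.use("Agg")          # headless backend for scripted runs
import matplotlib.pyplot as plt

# Placeholder (false-positive rate, sensitivity) curves:
raw_fpr, raw_sens = [1.0, 0.4, 0.0], [1.0, 0.7, 0.0]
avg_fpr, avg_sens = [1.0, 0.3, 0.0], [1.0, 0.85, 0.0]

fig, ax = plt.subplots()
ax.plot(raw_fpr, raw_sens, label="Raw season Kidd Score")
ax.plot(avg_fpr, avg_sens, label="Career average Kidd Score")
ax.plot([0, 1], [0, 1], "k--", label="Random chance")
ax.set_xlabel("False-positive rate")
ax.set_ylabel("Sensitivity")
ax.legend()
fig.savefig("roc_curve.png")
```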

This code yields the ROC curve of interest:

Here, we see some good things! First off, there is predictive power in both using raw seasons and using career averages. **To be clear**, these plots were much uglier (and worse) without the game smoothing. That said, the career average tends to be better than the raw season breakdown.

To understand what optimal value of **K** should be used, we look for the point on the curves that are closest to the upper left corner. This corner indicates that we have** zero false positives** while obtaining **perfect recall**. This is the Holy Grail spot on the ROC curve.

In order for a competing methodology to perform **“better”** than these methodologies, the corresponding ROC curve must **dominate** the above ROC curve. Domination here means for every point on one algorithm’s ROC curve, the dominating ROC curve **must be above**.

Using this definition, we see that the **average Kidd Score **approach is effectively better than the **raw season Kidd Score**.

Note that we used a hard threshold for separation. In most cases, there is an algorithmic formulation that requires **distributions of labels**. In those cases, **we must perform a cross-validation method to construct this ROC curve**. Since Kidd Score does not require this process, we are fine in continuing in this manner; **as there is no learned decision boundary!**

Finally, we take a look at how well Kidd Score discriminates Hall-of-Fame players. While it is significantly better than random chance, there is a lot of room for improvement. For instance, the optimal thresholding score in the raw season case is a Kidd Score of **4.7605**. This value manages to correctly classify **70.17%** of Hall-of-Famers. However, it also incorrectly classifies **34.01%** of non-Hall-of-Famers.

Similarly, looking at the career average scores, the optimal thresholding Kidd Score is **4.1761**. Using this cut-off, we have that **85.19% of Hall-of-Fame players being correctly classified!** However, this is at the cost of incorrectly classifying **30.30%** of players that are not in the Hall-of-Fame.

What this shows is that Kidd Score is good at predicting Hall-of-Fame players, but its false-positive rate still misses the mark a bit.

That said, this may be a function of **recency bias**: in order to be in the Hall-of-Fame, a player must be out of the league for five or more years, and after that they must be voted in. For instance, the namesake of the statistic, **Jason Kidd**, is currently not eligible for the Hall-of-Fame, as his last NBA season was in 2013. He becomes eligible starting this season.

Similarly, players take time to be voted in; **Mitch Richmond**, for example, waited roughly 12 years to be inducted. In recent times, players with less distinguished statistical careers have been included due to their impact on the game: **Drazen Petrovic** and **Sarunas Marciulionis** are two examples. This is not to deny that they were capable players worthy of being in the Hall-of-Fame; however, their respective statistics are quite a bit lower than those of most Hall-of-Fame players. This in turn indicates that a pure statistics-based approach (statistics meaning accumulated values in particular playing categories) may not be the best way to approach this classification process.

In fact, if we look at the seasons **when Hall-of-Fame players played**, we indeed see an incredible drop-off.

So how would you construct a classifier? In the linked post at the top, I used random forests but left out a couple of key variables, such as **international impact**. That would have separated players like **Drazen Petrovic** from players like **Alexander Volkov**.

Either way, in order for your method to be better, you must **dominate** the ROC curves above. If you don’t… you lose.


Kidd Score, named after **Jason Kidd**, takes a player’s assists per 75 possessions, **a**, and rebounds per 75 possessions, **r**, over the course of a season and computes the square root of their product:

**Kidd Score = √(a · r)**

To understand this analytic, let’s test out a few players.

LeBron James played in a total of 74 NBA games in the 2017 season and amassed 646 assists and 639 rebounds. According to stats.nba.com, per 75 possessions this translates to 8.625 assists and 8.55 rebounds. James’ associated Kidd Score is then **8.5874**.
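As a sketch, the computation can be scripted directly (the function name is mine, not from the article):

```python
from math import sqrt

def kidd_score(ast_per75, reb_per75):
    """Square root of the product of assists and rebounds per 75 possessions."""
    return sqrt(ast_per75 * reb_per75)

# LeBron James, 2017: 8.625 assists and 8.55 rebounds per 75 possessions
lebron = kidd_score(8.625, 8.55)  # ≈ 8.5874
```

The same call reproduces the other players' scores quoted below.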

Russell Westbrook became the first player since Oscar Robertson to record an average triple-double over a season, roughly **fifty-five years** after Robertson had achieved that feat. Westbrook managed to dole out 840 assists while nabbing 864 rebounds over the course of 81 games in 2017. This translates to 11.025 assists and 11.325 rebounds per 75 possessions, resulting in a Kidd Score of **11.1740**.

Demetrius Jackson, as opposed to Westbrook and James, is a limited-time (LT) player for the Boston Celtics who managed to play 17 minutes across five games in the 2017 NBA season. In that small amount of time, Jackson managed to pick up 3 assists and 4 rebounds, which is solid for a rookie seeing only 17 minutes of action. Due to this, his per-75-possessions stats are fairly strong: 6.6 assists and 8.775 rebounds. This results in a **Kidd Score** of **7.6101**.

While these three players are different with respect to usage rates and time played, they have one distinguishing characteristic in common: **they rebound at roughly the same rate as they pick up assists**. So let’s look at two players who don’t share that trait.

DeAndre Jordan is considered one of the best rebounders in the league. During the 2017 NBA season, Jordan grabbed 1,114 rebounds over 81 games. However, Jordan was not known for dishing out of the post; only managing 96 assists. This translates to 16.275 rebounds and 1.425 assists per 75 possessions. This results in a Kidd Score of **4.8158**.

Ricky Rubio, currently of the Utah Jazz, posted 682 assists and 305 rebounds over 75 games for the Minnesota Timberwolves in the 2017 NBA season. This translates to 10.5 assists and 4.725 rebounds per 75 possessions, resulting in a Kidd Score of **7.0436**.

So we see how this score works and interacts with all types of players. Now let’s take a deeper look into what this statistic is doing.

If you listened to the Sixers Science podcast on Kidd Score, you would have noticed that the word **metric** was bandied about. Before we begin, let’s lay down some terminology to understand what the Kidd Score is doing.

A **statistic** is a function of data. That’s it. Simple. A statistic can be meaningful or it can be absolute garbage. Regardless, a statistic is merely a function of the data. An **analytic** is an algorithm that applies algebra or calculus to obtain a **result**. Typically an analytic is developed to provide insight, but an analytic is merely an algorithmic framework. Analytics and statistics almost go hand-in-hand, as statistics are frequently the **output of an analytic**.

A **metric**, on the other hand, is a **standard of measurement**. This inherently identifies that a measurement can be made, which is a loaded statement, as measurements need to be consistent. That is, the distance between a point and itself **better be zero**. Similarly, the distance between two points **is the same regardless of the direction we measure**. And most importantly, **the shortest distance between two points is a straight line**. Note that a straight line is defined with respect to the space in which we measure: in Euclidean space it is indeed the straight line we all know and love; in spherical space, straight lines are **great circles**.

To test whether an analytic is a metric, we just have to check the three requirements of being a metric: **non-negativity / identity**, **symmetry**, and the **triangle inequality**.

Obviously, Kidd Score is non-negative as assists and rebounds are non-negative. The product of two non-negative values is non-negative. And, finally, the square root of a non-negative number is non-negative.

Now note that this “measurement” is effectively measuring the distance from zero by definition: the **farther** away from zero a player is, the **higher** the Kidd Score. A Kidd Score of **zero**, however, indicates only that the player **had 0 assists or 0 rebounds**, since the product vanishes whenever either factor does. This shows that **non-negativity** is satisfied, while **identity** holds only if we require both stats to be zero.

Next, we look at symmetry. Here, we must swap the distance from (0,0), **that is, no assists and no rebounds**, with the actual per-75-possession totals. This gives us negative differences within the square root; but since they are multiplied together, the negatives cancel out and symmetry is indeed upheld. Mathematically speaking, we have
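In symbols (a sketch, treating Kidd Score as the distance d(p, q) = √((p1 − q1)(p2 − q2)) between a player p = (a, r) and the origin):

```latex
d\big((0,0),(a,r)\big)
  = \sqrt{(0 - a)(0 - r)}
  = \sqrt{(-a)(-r)}
  = \sqrt{a \cdot r}
  = d\big((a,r),(0,0)\big)
```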

Finally, we check the triangle inequality. This requires that the direct distance between two points is never longer than a route through an intermediate point. In this case, we see the following:
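In symbols (a sketch: writing the squared cross-product distance as d(x, y)² = (x1 − y1)(x2 − y2) and routing through an intermediate point y):

```latex
(x_1 - z_1)(x_2 - z_2)
  = \big[(x_1 - y_1) + (y_1 - z_1)\big]\,\big[(x_2 - y_2) + (y_2 - z_2)\big]
  = \underbrace{(x_1 - y_1)(x_2 - y_2)}_{d(x,y)^2}
  + \underbrace{(y_1 - z_1)(y_2 - z_2)}_{d(y,z)^2}
  + (x_1 - y_1)(y_2 - z_2) + (y_1 - z_1)(x_2 - y_2)
```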

This is very promising, as we obtain the parts we need to see that Kidd Score satisfies the triangle inequality, with two extra, **hopefully** non-negative, terms. And this indeed satisfies the **triangle inequality if and only if a player always has more rebounds AND assists than their comparison**. Unfortunately, in the examples above, we see this is not true.

Therefore, Kidd Score is **not a metric**. That is totally OK, because it means it’s a **statistic**. And not just any statistic: it’s from a class of statistics called **cross-product statistics**.

To understand the concept of a cross-product statistic, we first look at the definition of the **cross-product**. The cross-product is a mathematical operation that identifies the **area of the parallelogram spanned by two vectors**. What this means is, if two vectors are orthogonal in two-dimensional space, then the cross-product gives the area of a rectangle. For self-containment: orthogonal means that the two vectors meet at a right angle.

Now, if the two vectors do not meet at a right angle, then we get a diamond-shaped image (a parallelogram). The cross-product still finds the area of this parallelogram, but it requires knowledge of the angle between the two vectors. This angle **describes the amount of correlation between the two vectors!** That is, the **cosine of the angle captures the correlation between the two variables within the statistic**.

To compute a cross-product, we adhere to some simple linear algebra. We consider the two vectors as being embedded in three-dimensional space, **a = (a1, a2, a3)** and **r = (r1, r2, r3)**, and form the determinant of the matrix whose rows are the unit vectors **(i, j, k)**, **a**, and **r**:

**a × r = det[ i, j, k ; a1, a2, a3 ; r1, r2, r3 ]**

Now, if we consider assists and rebounds, these are effectively two vectors that we can write in the following manner: **a = (a1, 0, 0)** and **r = (0, r2, 0)**. The way to think of this is that assists run along the x-axis and rebounds run along the y-axis. If we ensure that the two variables are **orthogonal**, then we can apply this statistical framework. We will look into this in a moment; for now, assume they are orthogonal.

Plugging the assist and rebound vectors into the cross-product, we obtain

**a × r = (a1 · r2) k**

This is merely the product of the number of assists and the number of rebounds! The value **k** tells us that we are projecting the product along the **normal vector** to assists and rebounds. We have effectively **recovered the Kidd Score**. Applying a square root to a set of positive numbers is a 1-to-1 transformation and is only needed to put the numbers on the same scale as the original variables of assists and rebounds. But that’s another story.
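A quick numerical check of that reduction; this is a sketch with a hand-rolled `cross` helper (mine, not the article's) applied to LeBron's per-75 numbers:

```python
def cross(u, v):
    """Cross product of two 3-vectors, via the cofactor expansion of the
    determinant with unit vectors i, j, k in the first row."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Assists along the x-axis, rebounds along the y-axis (orthogonal by construction),
# using LeBron James' 2017 per-75 numbers from earlier.
a = (8.625, 0.0, 0.0)
r = (0.0, 8.55, 0.0)

area = cross(a, r)  # only the k-component survives: a1 * r2 = 73.74375
```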

In fact, another formulation for the magnitude of the cross-product is given by

**‖a × r‖ = ‖a‖ · ‖r‖ · sin θ**

That is, the magnitude of the cross-product is the magnitude of the first vector, times the magnitude of the second vector, multiplied by the anti-correlation between the two vectors. Since we assumed rebounds and assists to be orthogonal, theta is 90 degrees and therefore the sine is **one**, whereas the **cosine** (the amount of correlation) is **zero**.

So let’s actually look into this correlation thing…

A quick way to look into the relationship between assists and rebounds for the 2017 NBA season is the scatter plot. From this, we can compute the sample correlation and perform a simple statistical test.

We see that there is some skew between obtaining assists and obtaining rebounds, somewhat indicating that the more assists a player gets, the less likely they are to obtain rebounds. However, note that the x-axis is **not at the same scale**: it runs up to 12, while the y-axis (rebounds) runs up to 20. Regardless, we see there is a downward trend, as the upper-right quadrant is sparse with players.

To be sure, we can compute a **correlation test** to identify whether we do indeed have correlation between assists and rebounds. In forming this statistic, we should at least address one hidden elephant in the room: **Danuel House (Washington)** tops the list of rebounders at 49.3 rebounds per 75 possessions. This is due in part to playing one minute of action and picking up one rebound. We will take a look at correlation **with** and **without** House just to show his effect on the distribution of this statistic.

The simple test for correlation is the Pearson t-test. Computing the correlation for the entire 2017 NBA season, we obtain a correlation coefficient of **-0.3207**. This suggests the downward trend that we noticed; however, is this within natural variation about zero?

The Pearson t-Test for correlation is a statistic that captures the distribution of the standardized correlation statistic centered at zero. Therefore, any correlation that falls to the tails of the statistic’s distribution will be seen as a low-probability event and therefore identified as not being from the distribution of a zero-centered correlation. This in turn states that the observed correlation is **significant**.

The formula for the standardized correlation statistic is given by

**t = r · √(n − 2) / √(1 − r²)**

where **t** is the test statistic, **n** is the number of players observed, and **r** is the sample correlation. In this case, we obtain a test statistic of **t = -7.2287**, which has a small p-value of **2.247×10^(-12)**, indicating that there is indeed a downward trend.
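As a sketch, the test statistic is easy to compute; the post never states the player count **n**, so the 458 below is a back-solved assumption that roughly reproduces the reported t:

```python
from math import sqrt

def pearson_t(r, n):
    """Standardized correlation statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)

# n = 458 is an assumption, not a figure from the article.
t = pearson_t(-0.3207, 458)  # ≈ -7.23, near the article's -7.2287
```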

So let’s remove House and see what happens. It turns out that deleting House actually **decreases the correlation** to **-0.3208**. This indicates that House barely has any effect on the correlation, despite being an outlier.

What this test has shown is that assists and rebounds are **not orthogonal** and therefore the Kidd Score biases players in the direction of information loss. We will touch on information loss in a moment.

The geometric interpretation of correlation is that it is the cosine of the angle in-between two variables. In fact, those familiar with **Principal Component Analysis** (PCA) understand how to transform data with high correlations into orthogonal variables; this in fact builds off the notion of cosine similarity between variables of interest by correcting through an eigenvector decomposition. (There’s much more to this, but it is outside the scope of this article.)

Therefore a cross-product statistic such as Kidd Score should incorporate the assist-rebound relationship through the anti-correlation term, **sin θ**. Note that this term is **exactly the denominator of the Pearson t-test!** It is seen as:

**sin θ = √(1 − r²)**

In this case, we have **r = -0.3207**, which yields a multiplier of **0.9472**. This indicates that assists and rebounds, while proven not to be orthogonal, are near-orthogonal. As we have a negative cosine term and a positive sine term, we obtain a **negative tangent** term, which is the Pearson t-test ratio without the **√(n − 2)** factor. For a negative tangent value, since assists are the reference axis (the cosine here dictates the direction of assists) and assists are seen as **negative valued**, the Kidd Score biases in the direction of **assists**, as they effectively absorb roughly 5% of rebounds, according to the anti-correlation statistic.

The simple correction? **Compute**

**adjusted Kidd Score = √(a · r) · sin θ = √(a · r) · √(1 − r²)**

In this case, we see the corresponding adjustments for the 2017 NBA season:

- LeBron James: 8.5874 -> **8.1338**
- Russell Westbrook: 11.1740 -> **10.5838**
- Demetrius Jackson: 7.6101 -> **7.2082**
- DeAndre Jordan: 4.8158 -> **4.5614**
- Ricky Rubio: 7.0436 -> **6.6716**
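The adjusted values above can be reproduced with a short sketch (the function name is mine; rho is the season-wide correlation from the t-test section):

```python
from math import sqrt

SEASON_RHO = -0.3207  # assist-rebound correlation for the 2017 season

def adjusted_kidd_score(ast_per75, reb_per75, rho=SEASON_RHO):
    """Kidd Score scaled by sin(theta) = sqrt(1 - rho^2),
    the anti-correlation between assists and rebounds."""
    return sqrt(ast_per75 * reb_per75) * sqrt(1.0 - rho * rho)

lebron = adjusted_kidd_score(8.625, 8.55)        # ≈ 8.1338
westbrook = adjusted_kidd_score(11.025, 11.325)  # ≈ 10.5838
```

Since the multiplier is a season-wide constant, the adjustment preserves player rankings within a season; its value is in comparisons across seasons with different correlations.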

This adjustment corrects for the correlation seen within the NBA season and allows us to compare across seasons.

At this point, there’s still an issue with using rates, as Demetrius Jackson currently sits in “HoF’er territory.” The common practice is to set a threshold and say “minimum number of possessions required,” though scaling by the number of possessions may also help rectify this. However, sitting at 2000+ words, I’ve already hit the goal of showing how to break out a cross-product statistic and build it with respect to the geometry of the analytic methodology imposed on the data.

If you haven’t already, I encourage you to check out their podcast, located here. The statistic captures many Hall-of-Fame players at the raw level, and they use it in attempts to predict seasons for players such as Ben Simmons. It’s a neat, intuitive statistic that captures two integral parts of the game; players who rate high tend to take away possessions from opposing teams while increasing their team’s point totals **without having to score themselves**.
