A common approach among NBA analysts is to develop a metric that quantifies a player's scoring ability from the position the player is in when taking a shot and from the proximity of the closest defender at that moment. Such metrics include kernel density plots of field goal percentages across the court, assist adjacency matrices, closest-defender measures, and even collision metrics of defenders on shooters for shot contests; the last is a metric I developed for a Western Conference team.
However, let’s take a step back from imposing mechanical models, such as modeling the movement of a player or representing the physics of a field goal attempt, and take a look at the interactions between three players: the shooter, the passer, and the shot defender. If we consider these three players together, we can naturally represent the data as a tensor.
What is a Tensor?
A tensor is a generalization of a matrix to an arbitrary number of indices. The number of indices is commonly called the number of “ways.” Hence a single element of a three-way tensor, T, can be written as T_ijk. Picture a library where books sit on rows of shelves, across columns of bookcases, along multiple aisles: the i index represents the row, the j index represents the column, and the k index represents the aisle. This looks like a cube of data!
We can continue with this idea and build N-way tensors by simply adding more indices.
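To make the library picture concrete, here is a small sketch in Python with NumPy (my choice for illustration; the analysis in this post was done in MATLAB). The sizes are arbitrary toy values. Note that NumPy counts indices from 0, while MATLAB counts from 1.

```python
import numpy as np

# Toy library: 3 rows of shelves (i), 4 bookcase columns (j), 2 aisles (k).
T = np.arange(24).reshape(3, 4, 2)

# A single element T_ijk, here row 1, column 2, aisle 0 (0-based in NumPy).
element = T[1, 2, 0]

# Adding one more index turns the cube into a 4-way tensor.
T4 = np.zeros((3, 4, 2, 5))

print(T.ndim, T.shape, T4.ndim)
```

The `ndim` attribute is the number of ways, and `shape` lists how many values each index can take.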
Placing NBA Data in a Tensor
For this data, we look at a sample of data points from the 2015-2016 NBA season, in which we identify the locations of shots, the passers that led to each shot, and the closest defender to the shot. We filter out situations where the shooter takes the ball a long distance, defined here as the shooter traveling further than six feet with the passer not credited with an assist. The resulting sample has 22,302 field goal attempts across 436 different NBA players.
The resulting tensor is a 4-way tensor. The four ways are passing, shooting, defense, and field goal result. The goal is to find the optimal match-ups. The results may seem counter-intuitive at first, but we will explain them.
We set the first index to be the passer. The index is the Player Identification Number. For instance, index number 353 is J.J. Hickson from the Denver Nuggets and Washington Wizards. Similarly, the second index is the shooter, while the third index is the defender. Their index values are the same as the passer index values. Finally, the fourth index is the field goal result: a miss or a make. Since indices in MATLAB are required to start from 1, a miss is labeled 1, while a make is labeled 2.
This means that we could have at most 8,303,773,472 interactions! But we have only 22,302 samples, and even fewer distinct interactions: only 21,311, or roughly 0.0002566 percent of all possible interactions. Hence, consider those analytics that claim to measure player tendency. If you are interested in their player vs. player comparisons, buyer beware: those analytics may not be statistically sound from a sampling standpoint.
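As a rough sketch of this indexing scheme, the Python snippet below stores the tensor sparsely, keeping only the cells that actually occur. The shot records are hypothetical, and storing a count of attempts in each cell is my assumption for illustration; the miss = 1, make = 2 labels follow the convention above.

```python
from collections import Counter

# Hypothetical shot records: (passer_id, shooter_id, defender_id, made).
shots = [
    (12, 40, 353, True),
    (12, 40, 353, False),
    (12, 40, 353, True),
    (7, 12, 88, False),
]

def build_sparse_tensor(records):
    """Count attempts per (passer, shooter, defender, result) cell.

    The result index is 1 for a miss and 2 for a make.
    """
    tensor = Counter()
    for passer, shooter, defender, made in records:
        result = 2 if made else 1
        tensor[(passer, shooter, defender, result)] += 1
    return tensor

tensor = build_sparse_tensor(shots)
print(len(shots), "samples ->", len(tensor), "distinct cells")
```

Repeated attempts with the same passer, shooter, defender, and result land in the same cell, which is one way the number of interactions can be smaller than the number of samples.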
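The sparsity figure can be checked with a few lines of Python. The three counts are the ones quoted above; reading "interactions" as distinct occupied cells is my interpretation.

```python
possible_cells = 8_303_773_472   # upper bound quoted in the text
samples = 22_302                 # field goal attempts in the sample
observed_cells = 21_311          # interactions that actually occur

# Share of the tensor that holds any data at all, as a percentage.
fill_percent = 100 * observed_cells / possible_cells

# Samples that fall into a cell already occupied by an earlier sample.
repeat_samples = samples - observed_cells

print(f"{fill_percent:.7f} percent filled, {repeat_samples} repeated samples")
```

Under that reading, fewer than a thousand attempts repeat an interaction that was already seen, so almost every player vs. player cell rests on a single shot.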
Tensor Decomposition: Measuring Relationships in Data
A common statistical procedure for estimating linear dependencies in data is the singular value decomposition (SVD). Its most common application is Principal Component Analysis. It is the underlying model for recommender systems, prior to collaborative filtering, and is a common technique used in hundreds of data science applications. In the tensor setting, we do not have the luxury of computing an SVD directly. Instead, there are decomposition methods called the Tucker decomposition and the CANDECOMP/PARAFAC (CP) decomposition. The former assumes the tensor can be expressed as a smaller core tensor multiplied by a factor matrix along each way. The latter assumes the tensor can be written as a weighted sum of rank-one terms, each of which is the outer product of one vector of one-way effects per way.
In both cases, the coefficients of the decomposition yield an ordering to the collection of decomposition bases. These coefficients will help identify the different “scenarios” of interest. Let’s apply a CP Decomposition with rank 31. That is, we find 31 sets of passer-shooter-defender-result groupings.
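For readers who want to see the mechanics, here is a minimal sketch of a CP fit in Python with NumPy using alternating least squares, a standard way to fit this model. This is not the code used for the results in this post, which does not depend on a particular solver; the function names `cp_als` and `cp_reconstruct` and the toy tensor are mine, and the toy uses rank 2 rather than 31 to stay small.

```python
import numpy as np

def cp_reconstruct(weights, factors):
    """Sum over r of weights[r] times the outer product of column r of each factor."""
    letters = "abcdefgh"[: len(factors)]
    spec = "r," + ",".join(l + "r" for l in letters) + "->" + letters
    return np.einsum(spec, weights, *factors)

def cp_als(X, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to an N-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    letters = "abcdefgh"[: X.ndim]           # one einsum letter per way
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(X.ndim):
            others = [m for m in range(X.ndim) if m != mode]
            # Contract X with every factor except the one being updated.
            spec = (letters + "," + ",".join(letters[m] + "r" for m in others)
                    + "->" + letters[mode] + "r")
            mttkrp = np.einsum(spec, X, *(factors[m] for m in others))
            # Elementwise product of the other factors' Gram matrices.
            gram = np.ones((rank, rank))
            for m in others:
                gram *= factors[m].T @ factors[m]
            # Least-squares update of this way's factor matrix.
            factors[mode] = mttkrp @ np.linalg.pinv(gram)
    # Pull the column norms out as coefficients and sort scenarios by weight.
    norms = [np.linalg.norm(f, axis=0) for f in factors]
    weights = np.prod(norms, axis=0)
    factors = [f / n for f, n in zip(factors, norms)]
    order = np.argsort(weights)[::-1]
    return weights[order], [f[:, order] for f in factors]

# Toy 4-way tensor (passer x shooter x defender x result) of exact rank 2.
rng = np.random.default_rng(1)
true_factors = [rng.random((dim, 2)) for dim in (4, 3, 5, 2)]
X = cp_reconstruct(np.ones(2), true_factors)

weights, factors = cp_als(X, rank=2)
rel_error = np.linalg.norm(X - cp_reconstruct(weights, factors)) / np.linalg.norm(X)
print(weights, rel_error)
```

Each column index plays the role of one scenario: column r of the four factor matrices gives the passer, shooter, defender, and result vectors for that scenario, and `weights[r]` is its coefficient.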
Application to 2015-2016 NBA Data
Applying the decomposition mentioned, we obtain the following distribution of coefficients.
I selected a rank-31 decomposition only because there are 30 teams, and I felt that one additional component might give the fit some extra freedom. In fact, if we change the rank of the decomposition, we may find a dramatic change in the values of the coefficients, but not a dramatic change in the collections themselves.
So let’s take a look at the first scenario, the one with the largest weight. We should have four vectors within our first collection. The first vector is the passing effect, the second vector is the shooting effect, the third vector is the defensive effect, and the fourth vector is the expected field goal percentage.
In this case, we immediately see clustering. In the top two plots for passers (left) and shooters (right), we see two teams primarily cluster. The left-hand cluster is the Orlando Magic, while the right-hand cluster is the Indiana Pacers. Recall that the Magic and Pacers were middle-of-the-road teams, finishing 11th and 7th, respectively, in their conference. This clustering is natural, as teammates tend to pass to each other and not to their opponents (hopefully).
The defensive panel is noisier, due to the fairly balanced NBA schedule for every team and the limited number of interactions. However, this scenario identifies Derrick Favors (Utah Jazz), Darrell Arthur (Denver Nuggets), Kyle Lowry (Toronto Raptors), Alec Burks (Utah Jazz), and Goran Dragic (Miami Heat) as the most influential defenders against these teams. The question is then: what is the influence?
From the fourth panel, we have the expected field goal percentage. Recalling that the left-hand side identifies misses, we find that these teams are expected to make just under 48.5% of their field goal attempts against these defenders. This means shooters are more likely than not to miss their shots against these primary defenders after receiving a pass from a teammate.
More specifically, a pass from Elfrid Payton to Nikola Vucevic or Evan Fournier will most likely end in a missed field goal attempt when the shot is defended by Derrick Favors, Darrell Arthur, Kyle Lowry, Alec Burks, or Goran Dragic.
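Reading a scenario like this one amounts to two small operations, sketched below in Python: pick the players with the largest loadings in a vector, and turn the two-element result vector into a percentage. All numbers and labels here are hypothetical placeholders, and dividing the make loading by the sum of the miss and make loadings is my assumption about how a percentage could be read off the fourth vector.

```python
import numpy as np

def top_k(loadings, labels, k=3):
    """Return the k labels with the largest loadings, largest first."""
    order = np.argsort(loadings)[::-1][:k]
    return [(labels[i], float(loadings[i])) for i in order]

def expected_fg_pct(result_vector):
    """Share of the result vector on 'make' (entry order: miss, make)."""
    miss, make = result_vector
    return make / (miss + make)

# Hypothetical defensive vector and result vector for one scenario.
defender_labels = ["defender_a", "defender_b", "defender_c", "defender_d"]
defender_loadings = np.array([0.02, 0.31, 0.05, 0.27])
result_vector = np.array([0.5155, 0.4845])

top_defenders = top_k(defender_loadings, defender_labels, k=2)
fg_pct = expected_fg_pct(result_vector)
print(top_defenders, round(100 * fg_pct, 2))
```

The same `top_k` call works on the passing and shooting vectors, which is how a scenario can be summarized as a few passers, a few shooters, and a few defenders.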
Best Scoring Option for a Team
If we look for the highest scoring option, scenario 25 yields a 53.36% chance of scoring on any type of play. So let’s break out this scenario.
Here, we see this is the Miami Heat offense. We also see that the 2015-2016 Miami Heat's ability to score off the pass ran primarily through Goran Dragic, Chris Bosh, and Dwyane Wade. In fact, these three players are three of the top four on the team in assists and points scored. Luol Deng is the one player left out of the top four, as he finished fifth on the team, behind Justise Winslow, who had the fourth strongest relationship.
Similarly, Hassan Whiteside is left out of the top four in scoring despite being the second leading scorer on the team. He does fall to fourth within the tensor realm, but is significantly lower than the top three players. In this case, this is due to Whiteside’s ability to score on offensive rebounds and on possessions where assists are not common, such as post-up moves and high-low movement.
The defensive players that impact scoring against the Miami Heat passing offense are Jose Calderon (Atlanta Hawks, Los Angeles Lakers), Paul George (Indiana Pacers), Bismack Biyombo (Toronto Raptors), Joe Johnson (Brooklyn Nets), and Bojan Bogdanovic (Brooklyn Nets). With the fourth vector showing the expected field goal percentage, the Miami Heat offense should aim its offensive schemes at these defenders for the highest opportunity to score.
Extending to Game Plan Methodology
Taking this tensor relationship to the next level, we can then look at the clusters for each team and isolate these defender relationships. For instance, we find that the Miami Heat cluster is best against particular players on the Raptors, Nets, Pacers, Hawks, and Lakers. These teams were selected due to the strength of the defensive coefficients in relation to the offensive pairing of two players.
We can then partition each team to find secondary and tertiary options on offense and defense. For instance, in scenario 25 (Miami Heat), the top option is Goran Dragic to Dwyane Wade. A secondary option may be Goran Dragic to Chris Bosh. A tertiary option may be Chris Bosh to Dwyane Wade.
Similarly, as we saw that Paul George is the biggest liability against the Heat, who is the second highest liability on the Pacers? George's value was 0.01655. By comparison, Monta Ellis is the second highest liability at 0.01204. We can then partition each defense to find relative defensive liabilities.
Finally, we take a look at clustering. In this case, we will look only at the shooting vectors. The passing vectors are almost identical, with a couple of players shifted around within the teams.
Here, we see that teams cluster in blocks. I attempted to color code each new rank collection of shooters to show how they come in blocks. There are a few overlapping blocks in this clustering, due to a few key factors. One factor is trades: some players played on multiple teams throughout the season.
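One way to rank primary, secondary, and tertiary options is sketched below in Python. It scores each passer-to-shooter pair by the product of the passer's and the shooter's loadings within a scenario; that scoring rule is my assumption for illustration, and the three players and their loadings are hypothetical.

```python
import numpy as np

# Hypothetical loadings for three teammates within one scenario.
players = ["guard", "wing", "big"]
passer_load = np.array([0.6, 0.1, 0.3])
shooter_load = np.array([0.2, 0.5, 0.3])

# Pair strength: rows are passers, columns are shooters.
pair_strength = np.outer(passer_load, shooter_load)
np.fill_diagonal(pair_strength, 0.0)   # a player does not pass to himself

# Sort all pairs from strongest to weakest and keep the top three.
flat_order = np.argsort(pair_strength, axis=None)[::-1]
rows, cols = np.unravel_index(flat_order, pair_strength.shape)
ranked = [(players[i], players[j], float(pair_strength[i, j]))
          for i, j in zip(rows, cols)][:3]

for passer, shooter, strength in ranked:
    print(f"{passer} -> {shooter}: {strength:.2f}")
```

In this toy, the same passer appears in the top two options and the third option starts from a different passer, which is the same pattern as the Dragic, Bosh, and Wade options above.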
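The partition by team can be sketched in a few lines of Python. The two Pacers values are the ones quoted above; the third row and the team abbreviations are hypothetical filler so that the filter has something to exclude.

```python
# (player, team, defensive value in scenario 25)
defender_values = [
    ("Paul George", "IND", 0.01655),        # quoted in the text
    ("Monta Ellis", "IND", 0.01204),        # quoted in the text
    ("placeholder defender", "XXX", 0.01),  # hypothetical filler row
]

def team_liabilities(rows, team):
    """Defenders on one team, sorted from largest to smallest value."""
    team_rows = [(name, value) for name, row_team, value in rows if row_team == team]
    return sorted(team_rows, key=lambda row: row[1], reverse=True)

pacers = team_liabilities(defender_values, "IND")
ratio = pacers[0][1] / pacers[1][1]
print(pacers, round(ratio, 2))
```

The ratio of the two values gives a relative measure within the team: George's value is a bit more than a third larger than Ellis's.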
Similarly, there are teams that have similar effects against the same defensive units. Scenario 1 is of this type, where the Pacers and Magic suffer similarly low scoring ability against Derrick Favors and company.
Even so, we do see the utility of tensor decomposition for estimating relationships from a mathematical point of view when statistical methodologies are too noisy or have their assumptions violated.
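The block structure can be illustrated with a small Python sketch: assign each shooter to the scenario where his loading is largest, and flag shooters who load heavily on more than one scenario. The loadings and the 0.5 cutoff are hypothetical, and assigning by largest loading is my simplification rather than the exact procedure behind the color coding.

```python
import numpy as np

# Hypothetical shooting loadings: rows are shooters, columns are scenarios.
shooters = ["s0", "s1", "s2", "s3", "s4"]
loadings = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.1],
    [0.1, 0.7, 0.0],
    [0.0, 0.6, 0.5],
    [0.0, 0.1, 0.9],
])

# Block membership: the scenario with the largest loading for each shooter.
block = loadings.argmax(axis=1)

# Overlap: shooters with a large loading in more than one scenario.
cutoff = 0.5
overlap = [name for name, row in zip(shooters, loadings) if np.sum(row >= cutoff) > 1]

print(block.tolist(), overlap)
```

A traded player or a pair of teams that struggle against the same defenders would show up like `s3` here, sitting in two blocks at once.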
So what do you think? Have data that can be shaped into a tensor analysis? Feel free to contact me if you’re interested in how tensors can help you.