# An Example in Kullback-Leibler Divergence

Consider the following table:

This is Kevin Durant’s percentage of field goal attempts, aggregated by distance, for the first two seasons of his career. The table gives some information, but does it really paint the picture of where Durant takes his shots? More importantly, can we make proper decisions about Kevin Durant’s style of play from it?

The short answer is, well… not really.

Commonly, we find that much of the analysis about player tendency and capability stops here. We talk about the distances at which a player takes their shots, then typically jump to effective field goal percentage and translate that into rudimentary calculations of expected point value per field goal attempt. Some analysts attempt to take this one step further and produce a shot quality metric, which actually doesn’t use the above information explicitly.

What happens if we produce another player with an almost identical table? Are these two players the same? Sure, we could run a Chi-Square Test to compare the players, but we may be rudely woken up to the fact that the two players are not the same. Let’s take a look at these two players:

Distribution of FGA

Can you guess the two players? They have very similar distributions. The Chi-Square test still flags them as significantly different, but that is mainly due to the failure of the Normal assumption for the small values in the table: the 14 versus 45 cell alone accounts for 73% of the test statistic. But who are these players?

On the left we have P.J. Tucker of the Houston Rockets. On the right, we have Brook Lopez of the Milwaukee Bucks. They are both three-ball-dominant shooters with a tendency to attack the rim. As Milwaukee has modeled their offense much like the Houston Rockets, it’s no surprise these two shooters appear to have the same distribution of field goal attempts. Or do they?
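Before moving on, the Chi-Square comparison above can be sketched in a few lines. The counts below are hypothetical (only the 14-versus-45 cell echoes the text; the rest are made up for illustration), and the point is simply to show how one small cell can dominate the test statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical FGA counts by distance bin (rows: two players, columns: bins).
# These are NOT the real Tucker/Lopez numbers; only the 14-vs-45 cell
# mirrors the figure quoted in the text.
counts = np.array([
    [120, 14, 10, 12, 300],
    [130, 45, 12, 15, 310],
])

chi2, p_value, dof, expected = chi2_contingency(counts, correction=False)

# Each cell contributes (observed - expected)^2 / expected to the statistic.
contrib = (counts - expected) ** 2 / expected

# Share of the total statistic coming from each distance bin.
share_by_bin = contrib.sum(axis=0) / chi2

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
print("share of statistic by bin:", np.round(share_by_bin, 3))
```

Looking at `share_by_bin` rather than only the p-value is what reveals when a "significant" difference is really one sparse cell talking.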

## Brook Lopez Shot Distribution

If we take a quick glance at Brook Lopez’s shot distribution, we find that he primarily takes attempts in the -45 degree to 45 degree range along the top of the key.

Field Goal Distribution for Brook Lopez through February 5th, 2019.

We see the ghost town of field goal attempts in the mid-range, as well as the string of short-range attempts that litter the key.

## PJ Tucker Shot Distribution

Comparing this to PJ Tucker, we obtain an entirely different story.

Field Goal Distribution for PJ Tucker through February 5th, 2019.

We see that almost all FGA occur in the corners. We also see the ghost town of mid-range attempts. The shots in the lane? More along the baseline than being a steady stream towards the free throw line.

It is clear that the distributions are no longer the same. But how do we measure their difference? One solution is to use shooting zones.

## Shooting Zones: One Step Better

A shooting zone is a region of the court that encapsulates field goal attempts at different locations on the court. It’s a step in the right direction as we can now differentiate between a corner three and a top-of-the-key three. Similarly, we are able to differentiate between a left-corner three versus a right-corner three.
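As a rough sketch of the idea, the function below assigns a shot location to a coarse zone and then tallies zone frequencies. The coordinate convention (feet, hoop at the origin, +y toward half court), the choice of which sign of x counts as "left", and the simplified boundaries are all assumptions made for illustration; they are not the NBA Stats zone definitions.

```python
import math
from collections import Counter

def shot_zone(x, y):
    """Assign a shot at (x, y), in feet from the hoop, to a coarse zone.

    Boundaries are simplified assumptions, not official zone definitions.
    """
    dist = math.hypot(x, y)
    if dist <= 4.0:
        return "restricted area"
    # Corner threes: the line runs 22 ft from the hoop along the sideline,
    # up to roughly 14 ft from the baseline (hoop sits about 5.25 ft off it).
    if abs(x) >= 22.0 and y <= 8.75:
        return "left corner three" if x < 0 else "right corner three"
    if dist >= 23.75:
        return "above-the-break three"
    return "two-point, non-restricted"

# Hypothetical shot locations, purely for illustration.
shots = [(0, 1), (-23, 0), (23, 2), (0, 25), (10, 10), (-23, 1)]
zone_counts = Counter(shot_zone(x, y) for x, y in shots)

for zone, n in zone_counts.most_common():
    print(f"{zone:>26s}: {n / len(shots):.2f}")
```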

Take for instance, Brook Lopez’s shot chart from NBA Stats. It’s a little misleading only due to the fact that they combine both frequency and efficiency. The colors indicate efficiency while the fractions indicate frequency. Here we see the high volume along the top-of-the-key zones.

NBA Stats Zone Distribution for Brook Lopez

We see the same misleading representation with PJ Tucker and again focus on the fractions.

NBA Stats Zone Distribution for PJ Tucker.

And we see a nearly “inverted” plot, as the majority of PJ Tucker’s three-point attempts are located in the corners.

While this “one step further” plot helps us, there’s still a ton of information left on the cutting room floor. For instance, Brook Lopez is a -45 to 45 degree shooter. The zonal plots do not capture that activity. Right elbow and left elbow are not differentiated, even though almost every player favors one over the other. A dunk is also valued as much as a hook shot according to the zone distributions.

There’s just a lot of information still being lost.

## Swap Over to Density Plots

We now take the next step further. Basketball shot charts have been around for decades. Kernel density plotting of basketball shot charts has, too. In 2001, I had to write code to feed a list of x,y-coordinates into a kernel density algorithm using a seemingly newfangled programming language called MATLAB (it wasn’t new and I wasn’t alone). And when the KDE revolution finally started to take hold in the media nearly a decade later, under the name heat maps, there were still significant flaws in some people’s designs. For instance, old plots would not include distance skewing such as a log-transform, which is required to show actual three-point effects in scoring. Yes, that is a post from four years ago, written as a knee-jerk response to poorly displayed ESPN shot charts at the time. That post shows the log-transform representation.

If we apply the density function formulation here, we can obtain KDE plots for both Lopez and Tucker.

Out-of-the-box kde estimate, using Python, for Brook Lopez. No transforms applied.

Out-of-the-box kde estimate, using Python, for PJ Tucker. No transforms applied.

Of course, we’d like to play with the bandwidth to make the charts “prettier.” This is simply an out-of-the-box method using Python. We of course use the jet color map option from Python, a MATLAB classic color map, to display the heat associated with a field goal attempt.
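For readers who want to reproduce an out-of-the-box estimate, here is a minimal sketch using `scipy.stats.gaussian_kde` with its default bandwidth. The shot coordinates are synthetic (a cluster at the rim plus an arc of threes between -45 and 45 degrees), and the helper `synthetic_shots` is hypothetical; this is not the data or code behind the plots above.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def synthetic_shots(n_rim, n_three, angle_low, angle_high):
    """Hypothetical shot generator: feet from the hoop, +y toward half court."""
    rim = rng.normal(loc=[0.0, 3.0], scale=1.5, size=(n_rim, 2))
    angles = np.deg2rad(rng.uniform(angle_low, angle_high, size=n_three))
    radius = rng.normal(24.5, 0.7, size=n_three)
    threes = np.column_stack([radius * np.sin(angles), radius * np.cos(angles)])
    return np.vstack([rim, threes])

shots = synthetic_shots(n_rim=150, n_three=250, angle_low=-45, angle_high=45)

# gaussian_kde expects data with shape (n_dims, n_points);
# the default bandwidth is Scott's rule, with no transforms applied.
kde = gaussian_kde(shots.T)

# Evaluate the density on a regular grid that comfortably covers the shots.
xs = np.linspace(-35, 35, 141)
ys = np.linspace(-15, 45, 121)
gx, gy = np.meshgrid(xs, ys)
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

# Sanity check: the density should integrate to roughly one over the grid.
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
print("approximate total mass:", density.sum() * cell_area)
```

The `density` array can be handed to any image or contour plotting routine; passing a different `bw_method` to `gaussian_kde` is how one would start playing with the bandwidth.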

We are immediately able to surgically identify locations of every field goal attempt by both players. And more importantly, we have a nonparametric approximate distribution for each shooter’s field goal attempts. And unlike the “second step further” plots that we skipped over with scatter (hexagon) plotting, we’re not solely dealing with empirical data points, which, by the way, are noisy to begin with.

And armed with this distributional knowledge, we can finally start to say something intelligent with shot chart data. Yes… there’s been negligible intelligence obtained thus far.

## Kullback-Leibler Divergence

Our discussion started by asking about the similarities between two players. While this is helpful in understanding where players are positioned, this is rarely the question that we would like to answer. In order to understand the question we really want to answer (and we haven’t asked just yet), we will tackle this thought exercise first in an effort to understand Kullback-Leibler Divergence.

Kullback-Leibler Divergence is a method for measuring the similarity between two distributions. Developed by Solomon Kullback and Richard Leibler and publicly released in 1951, KL-Divergence aims to identify the divergence of a probability distribution from a baseline distribution. That is, for a target distribution, P, we compare a competing distribution, Q, by computing the expected value, under P, of the log-ratio of the two densities:
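In standard notation, writing p and q for the densities of P and Q (symbols assumed here, since the text only names the distributions), the definition reads:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \int p(t)\,\log\frac{p(t)}{q(t)}\,dt
```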

Here, we used the one-dimensional notation; the two-dimensional notation is similar: just use a double integral with t := (x,y) and dt := dxdy. It’s obvious that if the two distributions are identical, then the integral is zero.

Also, with a little bit of work we can show that the KL-Divergence is non-negative. That is, the smallest possible value is zero (the distributions are equal) and the divergence is unbounded above. We obtain infinity when P places probability in a region where Q has none. Therefore, it is common to assume both distributions exist on the same support.
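The “little bit of work” is usually Jensen’s inequality applied to the concave logarithm. A sketch, writing p and q for the densities of P and Q (assumed symbols) and integrating over the region where p(t) > 0:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \int p(t)\,\log\frac{p(t)}{q(t)}\,dt

-D_{\mathrm{KL}}(P \,\|\, Q)
  = \int p(t)\,\log\frac{q(t)}{p(t)}\,dt
  \le \log \int p(t)\,\frac{q(t)}{p(t)}\,dt
  = \log \int q(t)\,dt
  \le \log 1 = 0
```

Equality holds throughout only when p(t) = q(t) almost everywhere, which is the "distributions are equal" case.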

### Very Brief History

The KL-Divergence is a technique that spawned from research performed at the National Security Agency. Richard Leibler, who would eventually become the Director of Mathematical Research, and Solomon Kullback, who then focused on COMSEC operations, developed the methodology while analyzing bit strings in relation to known coding algorithms. The aim was to identify shared information in an effort to exploit weaknesses shared between known crypto-algorithms and crypto-algorithms in the wild. Since its public release, KL-Divergence has been used extensively across many fields, and it is still considered one of the most important entropy measuring tools in cryptography and information theory.

### Application to Shot Charts

If we apply KL-Divergence to shot charts, we can immediately begin to compare the spatial representation of the two shooters’ tendencies. To do this, we must build a quadrature rule to estimate the integral from the KDEs. This is relatively straightforward using the scipy.integrate.dblquad function in Python, or crudely using the midpoint rule. Either way, the answers are similar. Just be sure to assign the shot charts to be numpy arrays.
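A minimal sketch of the crude midpoint-rule version follows. The function name `kl_midpoint`, the grid size, the integration window, and the small `eps` floor inside the logarithms are all assumptions made for illustration, and the two "players" are synthetic point clouds rather than real shot charts.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kl_midpoint(kde_p, kde_q, x_range, y_range, n=120, eps=1e-12):
    """Midpoint-rule estimate of KL(P || Q) for two 2-D density estimates."""
    x_edges = np.linspace(*x_range, n + 1)
    y_edges = np.linspace(*y_range, n + 1)
    # Midpoints of each grid cell.
    x_mid = 0.5 * (x_edges[:-1] + x_edges[1:])
    y_mid = 0.5 * (y_edges[:-1] + y_edges[1:])
    gx, gy = np.meshgrid(x_mid, y_mid)
    pts = np.vstack([gx.ravel(), gy.ravel()])
    p = kde_p(pts)
    q = kde_q(pts)
    cell_area = (x_edges[1] - x_edges[0]) * (y_edges[1] - y_edges[0])
    # eps is an assumed floor that keeps the logs finite where a KDE
    # underflows to zero; it also caps how large the estimate can get.
    integrand = p * (np.log(p + eps) - np.log(q + eps))
    return float(integrand.sum() * cell_area)

# Synthetic stand-ins for two shooters (hypothetical data, in feet).
rng = np.random.default_rng(1)
shots_a = rng.normal([0.0, 20.0], [6.0, 3.0], size=(300, 2))  # top of the key
shots_b = rng.normal([20.0, 2.0], [2.0, 3.0], size=(300, 2))  # one corner

kde_a = gaussian_kde(shots_a.T)
kde_b = gaussian_kde(shots_b.T)

window = dict(x_range=(-30, 35), y_range=(-15, 35))
kl_ab = kl_midpoint(kde_a, kde_b, **window)
kl_aa = kl_midpoint(kde_a, kde_a, **window)
print(f"KL(a || b) = {kl_ab:.4f}, KL(a || a) = {kl_aa:.4f}")
```

The same integrand could be passed to `scipy.integrate.dblquad` instead; the grid version is used here only because it evaluates each KDE once on a batch of points.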

For the case of Brook Lopez and PJ Tucker, we obtain a KL-Divergence of 0.0929. This is a relatively small KL-Divergence, but it could be smaller! Let’s compare this to Rudy Gobert of Utah. As Gobert rarely shoots three-point attempts, we expect a much larger KL-Divergence. In fact, the divergence of Gobert from Tucker is 47.5551!

Immediately, we gain an idea of the differentiation between the players’ shot location tendencies. In order to identify where players differ, all we need to do is look at the integration process, exactly like we did with the Chi-Square Test above! And it’s here that we see it’s the specific locations we mentioned above that differ between Lopez and Tucker.
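To illustrate what "looking at the integration process" means, here is a discrete analogue using zones instead of a continuous grid: each zone contributes one term to the sum, and ranking the terms shows where the divergence comes from. The zone names and shares are hypothetical, not real player data.

```python
import numpy as np
from scipy.special import rel_entr

zones = ["restricted area", "mid-range", "left corner three",
         "right corner three", "above-the-break three"]

# Hypothetical shares of field goal attempts per zone (each sums to 1).
p = np.array([0.25, 0.05, 0.30, 0.30, 0.10])  # corner-heavy shooter
q = np.array([0.30, 0.05, 0.03, 0.02, 0.60])  # top-of-the-key shooter

# rel_entr(p, q) returns the elementwise terms p * log(p / q);
# their sum is the discrete KL(P || Q).
terms = rel_entr(p, q)
kl_pq = terms.sum()

# Rank zones from largest to smallest contribution.
for zone, term in sorted(zip(zones, terms), key=lambda zt: -zt[1]):
    print(f"{zone:>24s}  {term:+.4f}")
print(f"{'total':>24s}  {kl_pq:+.4f}")
```

Note that individual terms can be negative (zones where Q shoots more than P) even though the total cannot be.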

### Interpreting KL-Divergence

Now that we know how to compute KL-Divergence, we need to understand what it is telling us. First, KL-Divergence is not a metric! A metric, by definition, is a measurement function that satisfies three conditions: symmetry, non-negativity with equality at zero, and the triangle inequality. KL-Divergence only satisfies the second condition. Due to this, we call it a divergence instead of a metric.
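The lack of symmetry is easy to see on a toy example. The helper `kl_discrete` and the two two-outcome distributions below are made up purely for illustration.

```python
import numpy as np

def kl_discrete(p, q):
    """KL(P || Q) for discrete distributions with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_discrete(p, q)  # about 0.511
kl_qp = kl_discrete(q, p)  # about 0.368

print(f"KL(P || Q) = {kl_pq:.4f}")
print(f"KL(Q || P) = {kl_qp:.4f}")
```

Both values are non-negative, but swapping the roles of the two distributions changes the answer, which is why the baseline has to be stated.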

Since the divergence is not symmetric, we must specify the baseline distribution. This distribution is Q. This seems counter-intuitive since the expectation is taken with respect to P. But there’s a simple explanation for this.

We think of Q as prior knowledge. Either a known cryptosystem in 1945, or a current player of interest. We then introduce a new observation: a new bit sequence or a new player. Now, given knowledge of the current player, how “alike” is the new player to the old? In order to understand the new player, we consider the new player as new information introduced to the old player. Therefore, the new player is a posterior distribution. If the posterior does not change, then the new player is exactly the same as the current (prior) player.

Therefore, the 0.0929 indicates how much PJ Tucker diverges from Brook Lopez in shooting frequency.

Now… that’s not so much the intelligence part. Let’s get to that.

## Modeling Player Decision Making

### In BLUE situations…

We can leverage the KL-Divergence in an effort to understand changes to offensive schemes and reactions to defensive maneuverings. The most explosive revelation leveraging KL-Divergence is measuring field goal attempts with respect to BLUE action. BLUE is the coverage in which perimeter defenders in PnR situations move to a seemingly unfavorable defensive position in an effort to divert the PnR into a favorable defensive match-up. This past year alone, BLUE situations on the left wing led to a KL-Divergence of 10.373 when compared to non-BLUE situations. That’s almost entirely generated by shots becoming left-wing / left-wing in BLUE situations versus right-side/at-rim from the middle-lane location in non-BLUE attempts.

### … and Perimeter Defense…

We can also begin to analyze changes in shot frequency, a bane for understanding perimeter defenders. Using the KL-Divergence, we can start measuring the changes in frequencies due to close-outs and quality perimeter defenders to help understand when teams are not taking the three they usually take. Granted, we cannot simply use defensive three-point shooting as a metric, and we certainly cannot use simple frequencies of shooting (there are too few attempts in a game). But we can build a distribution and measure the KL-Divergence, which helps borrow strength from nearby field goal locations and allows us to start asking which features lead to changes in KL-Divergence.

In doing this, for this given year, you’ll immediately start seeing the defensive differences in two former Spurs: Danny Green and Jonathon Simmons. One being significantly “better” at perimeter defense than the other.

### … to Personnel Used

Similarly, if an offense uses a PnR action that leads to a rim-running event, where are the field goal attempts likely going to be generated? If DeAndre Jordan is swapped with Enes Kanter, we will see a ridiculously different result. This indicates that the same action with different personnel yields different results. We can peel back the integral and see exactly where the spatial locations vary and understand how those locations impact the divergence.

Combining this knowledge with those players’ efficiencies, we start gaining insight into where we want to push the ball on defense. And, more importantly, how we might want to rotate on defense.

### But Note:

Remember, though, that a change in KL-Divergence does not mean good or bad. It simply means change. It’s not a target variable, but rather a methodology to quickly run through several iterations of teams and players, giving insight as to which players are similar in which situations, which teams are similar in certain situations, and even (if applied to the same team) how a team makes adjustments over the course of the game.

To gain insight into good or bad, we must then build the analytical model that identifies good and bad, be it an expected point value or some other win-shares type action.
