Visualizing NBA Summary Statistics Using Radar Plots

A simple way to visualize NBA offenses and defenses is through summary statistics, which show the average offensive and defensive game for any given team. For leagues such as MLB, the NFL, and the NBA, these averages give a representative glimpse of a team’s ability to play. In sports where the schedule does not cover the entire league well, such as NCAA basketball and football, we may not want to rely on averages.

In the NBA, the schedule covers the entire league, with every pair of teams meeting at least twice. Thus it makes sense to compare teams on statistics such as average points per game, average three-point field goals, and average number of rebounds. A common set of summary statistics is then:

  1. Field Goals Made
  2. Field Goals Attempted
  3. Three Point Field Goals Made
  4. Three Point Field Goals Attempted
  5. Free Throws Made
  6. Free Throws Attempted
  7. Rebounds
  8. Steals
  9. Assists
  10. Turnovers
  11. Blocks
  12. Points

It is common to display percentages for teams instead of counts, but percentages do not represent the number of attempts well. As a simple example, take a game consisting only of two-point field goal attempts: if one team shoots 20-for-40 (50 percent, 40 points) and the other shoots 21-for-50 (42 percent, 42 points), the team with the lower field goal percentage wins.
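The arithmetic behind this example can be checked with a few lines of Python. This is only a sketch; the helper name two_point_summary is ours and not part of any library.

```python
def two_point_summary(made, attempted):
    """Return (field goal percentage, points) when every attempt is a two-pointer."""
    return made / attempted, 2 * made

# The two teams from the example above
pct_a, pts_a = two_point_summary(20, 40)
pct_b, pts_b = two_point_summary(21, 50)

print(f"Team A: {pct_a:.0%} shooting, {pts_a} points")
print(f"Team B: {pct_b:.0%} shooting, {pts_b} points")
```

Team B shoots the lower percentage but scores more points, which is why counts are kept alongside (or instead of) percentages.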

One simple visualization is to place the numbers side by side. This results in a 30-by-12 table where the rows are teams and the columns are statistics. The problem with this display is that no single ordering works for every statistic: a team that leads the league in average points may not lead the league in average rebounds. Thus, the only way to sort teams is with an interactive table, which is what stats.nba.com uses. Unfortunately, each new sorting discards the previous one.

Another method of visualizing the data is through profile curves. In this case, we take a bar chart and draw a curve through the count for each statistic. This makes it easier to compare two teams without scanning all over a table or constantly re-sorting. As an example, let’s look at the Golden State Warriors’ statistics over the course of the 2015-16 NBA season. Here, the Warriors’ offense is in the team color yellow and their defense is in red.
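To see the sorting problem concretely, here is a small sketch using pandas. The team labels and numbers are made up for illustration and are not real NBA data.

```python
import pandas as pd

# Hypothetical per-game averages for three teams (values are invented)
table = pd.DataFrame({
    "Team": ["AAA", "BBB", "CCC"],
    "Points": [110.2, 104.5, 98.7],
    "Rebounds": [41.0, 46.3, 44.1],
})

# Each sort produces a different ordering of the same rows
by_points = table.sort_values("Points", ascending=False)
by_rebounds = table.sort_values("Rebounds", ascending=False)

print(by_points["Team"].tolist())
print(by_rebounds["Team"].tolist())
```

The team that tops one ordering sits at the bottom of the other, so a single static table cannot show both rankings at once.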

Profile curve for the Golden State Warriors over the 2015-16 NBA season. Offensive statistics are in yellow; defensive statistics are in red.

We can then compare the offense and defense by comparing the lines. To compare teams, we can layer their lines on the same graph. A team’s ability is reflected in the integral (area under the curve) of its profile curve: large positive values are good, while negative values (turnovers) count against the team.

A different way to compare team profiles is through radar plots or rose plots. Radar plots take the profile curves above and wrap them into a circular format (rose plots do the same). This gives a continuous look at the data, and instead of the area under the curve we look at the spread (the area enclosed in the plot) of the team statistics. For instance, the Warriors’ radar plot is given below.
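As a rough sketch of the area idea, the trapezoid rule can be applied to a profile curve with unit spacing between statistics. The values below are hypothetical, the function name profile_area is ours, and we assume turnovers are entered with a negative sign. The result depends on the order of the statistics and mixes different units, so it is only a coarse summary.

```python
import numpy as np

# Hypothetical per-game averages in a fixed order; turnovers are entered as negative
labels = ["FGM", "3PM", "FTM", "REB", "AST", "TOV"]
profile = np.array([40.0, 10.0, 18.0, 44.0, 25.0, -14.0])

def profile_area(values):
    """Trapezoid-rule area under a profile curve with unit spacing between points."""
    values = np.asarray(values, dtype=float)
    return float(np.sum((values[:-1] + values[1:]) / 2.0))

area = profile_area(profile)
print(area)
```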

Radar plot for the Golden State Warriors Offense (Green) and Defense (Red) for the 2015-16 NBA season.

Here, green denotes offense and red denotes defense. The values in the interior of the plot are the maximum values across all thirty teams’ offensive and defensive averages. To interpret this plot, we see that the Warriors make about 1.5 times as many three-point field goals, block about 1.67 times as many shots, and record about 1.33 times as many assists as the teams they play against. As the red is contained almost entirely inside the green, we expect the Warriors to win many more games than their opponents. This is indeed the case.

If we move to the other end of the spectrum and look at the Phoenix Suns, we see exactly what we expect: green primarily contained in red, indicating a losing record.
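Ratios like these come from dividing a team’s average by its opponents’ average for the same statistic. The sketch below uses hypothetical averages chosen only so that the ratios match the ones quoted; they are not the Warriors’ actual numbers.

```python
# Hypothetical per-game averages, picked only to reproduce the quoted ratios
offense = {"3PM": 12.0, "BLK": 6.0, "AST": 28.0}
defense = {"3PM": 8.0, "BLK": 3.6, "AST": 21.0}

# Offense-to-defense ratio for each statistic
ratios = {stat: offense[stat] / defense[stat] for stat in offense}

for stat, ratio in ratios.items():
    print(f"{stat}: {ratio:.2f} times the opponents' average")
```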

Radar plot for the Phoenix Suns Offense (Green) and Defense (Red) for the 2015-16 NBA season.
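The "green inside red" comparison can also be put into a number by computing the area enclosed by each curve. The sketch below assumes equally spaced spokes and values already scaled to a common range; the function name radar_area and the sample values are ours. Note that the area depends on the order of the statistics around the circle.

```python
import math

def radar_area(radii):
    """Area of a radar polygon whose spokes are equally spaced around the circle."""
    n = len(radii)
    wedge = math.sin(2 * math.pi / n)
    # Sum the triangles formed by each pair of neighboring spokes
    return 0.5 * wedge * sum(radii[k] * radii[(k + 1) % n] for k in range(n))

# Hypothetical scaled values in [0, 1] for a four-statistic plot
offense = [0.9, 0.8, 1.0, 0.7]
defense = [0.6, 0.5, 0.7, 0.6]

offense_area = radar_area(offense)
defense_area = radar_area(defense)
print(offense_area > defense_area)
```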

As the plots are all the same size, we can merely layer the curves to compare teams. Here, we take a look at all the teams set side by side.

In case the images are difficult to parse, we can put them into a slideshow presentation.


Python Code Implementation

To round out the article, we include some of the code needed to generate the plots. First, you will need a dataset in which every game is logged for each team, once as offense and once as defense.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

This header block imports pandas for its data frames, matplotlib for the plotting tools, and seaborn for advanced color schemes. Numerical Python (numpy) is imported for the array math used by the radar code later on.

df = pd.read_csv('teamOFF.txt')
df1 = pd.read_csv('teamDEF.txt')

# print(df.head(30))
# print(df.mean())

# Map the numeric team index used in the data files to team abbreviations
teams = {1:'ATL', 2:'BOS', 3:'CHA', 4:'CHI', 5:'CLE', 6:'DAL', 7:'DEN',
         8:'DET', 9:'GSW', 10:'HOU', 11:'IND', 12:'LAC', 13:'LAL', 14:'MEM',
         15:'MIA', 16:'MIL', 17:'MIN', 18:'BKN', 19:'NOP', 20:'NYK', 21:'OKC',
         22:'ORL', 23:'PHI', 24:'PHX', 25:'POR', 26:'SAC', 27:'SAS', 28:'TOR',
         29:'UTA', 30:'WAS'}

The pandas.read_csv() function reads a CSV file into a data frame. We have commented out a couple of simple data frame commands, head(30) and mean(). The head(30) call displays the first 30 rows of the data frame, and mean() displays the mean of each column. We also build a dictionary that maps team indices to team abbreviations (our data has teams indexed by number).
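For readers without the data files, here is a small sketch of the same calls on an in-memory CSV. The column names and values are hypothetical, since the layout of teamOFF.txt is not shown here.

```python
import io
import pandas as pd

# Hypothetical game log: one row per game, with Team given as a numeric index
csv_text = """Team,FGM,FGA,PTS
1,40,85,105
1,38,80,99
2,42,90,110
"""

games = pd.read_csv(io.StringIO(csv_text))

first_rows = games.head(2)    # first two rows of the data frame
column_means = games.mean()   # mean of every (numeric) column

print(first_rows)
print(column_means)
```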

d = df.shape[1]
d1 = df1.shape[1]

# Group the game logs by team; a scalar key lets get_group() take a plain integer
gp = df.groupby('Team')
gp1 = df1.groupby('Team')

# League-wide maximum of the team averages for each statistic.
# After groupby, Team becomes the index, so positions 0-11 are the 12 statistics.
off_max = gp.mean().max()
def_max = gp1.mean().max()

# One (low, high) pair per statistic; high is the larger of the
# offensive and defensive maxima
ranges = [(0.1, max(float(off_max.iloc[k]), float(def_max.iloc[k])))
          for k in range(12)]

# Column names of the 12 statistics (column 0 is Team)
variables = list(df)[1:13]
variables1 = list(df1)[1:13]

Next, we group by team to get 30 tables of offensive statistics (from df) and 30 tables of defensive statistics (from df1). Each is a games-by-statistics table for one team. We then build a ranges array holding, for each statistic, the maximum team average across both offense and defense. These serve as the maximum values in the resulting radar plot.

Finally, in this block of Python code, we extract the variable names. We do this for both data frames as a sanity check, in case the column names do not match somewhere.
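The grouping and range-building steps can be tried on a toy example. The sketch below uses two made-up teams and two statistics; the frame names off_games and def_games are ours.

```python
import pandas as pd

# Hypothetical game logs for two teams, offense and defense
off_games = pd.DataFrame({"Team": [1, 1, 2, 2],
                          "PTS": [100, 110, 90, 94],
                          "REB": [40, 44, 50, 48]})
def_games = pd.DataFrame({"Team": [1, 1, 2, 2],
                          "PTS": [95, 99, 108, 100],
                          "REB": [45, 43, 41, 39]})

# Per-team averages; Team becomes the index
off_means = off_games.groupby("Team").mean()
def_means = def_games.groupby("Team").mean()

# Largest team average per statistic across both offense and defense
upper = pd.concat([off_means, def_means]).max()
ranges = [(0.1, float(upper[name])) for name in off_means.columns]
print(ranges)
```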

for i in range(30):
    # All games for team i+1, on offense (a) and on defense (a1)
    a = gp.get_group(i+1)
    a1 = gp1.get_group(i+1)
    # Averages of the 12 statistics (position 0 is the Team column)
    a_mean = a.mean()
    a1_mean = a1.mean()
    data = tuple(float(a_mean.iloc[k]) for k in range(1, 13))
    data1 = tuple(float(a1_mean.iloc[k]) for k in range(1, 13))

    fig1 = plt.figure(i+1, figsize=(4,4))
    radar = ComplexRadar(fig1, variables, ranges, 'green')
    radar.plot(data, 'green')
    radar.plot(data1, 'red')
    # Proxy patches for the legend: team name, offense, defense
    p0 = mpatches.Rectangle((0,0),1,1,fc='white')
    p1 = mpatches.Rectangle((0,0),1,1,fc='green')
    p2 = mpatches.Rectangle((0,0),1,1,fc='red')
    plt.legend([p0,p1,p2],[teams[i+1],'offense','defense'], bbox_to_anchor=[0.1,1.1])
    radar.fill(data, alpha=0.2)
    saveFigName = teams[i+1]+'.png'
    fig1.savefig(saveFigName, bbox_inches='tight')

This block of code walks through each team and calculates its average offensive statistics (from df) and average defensive statistics (from df1). It then draws and saves the plot through the ComplexRadar class, listed next, which must be defined before the loop is run.

def _invert(x, limits):
    """inverts a value x on a scale from
    limits[0] to limits[1]"""
    return limits[1] - (x - limits[0])

def _scale_data(data, ranges):
    """scales data[1:] to ranges[0],
    inverts if the scale is reversed"""
    # Every value must lie inside its own axis range
    for d, (y1, y2) in zip(data[1:], ranges[1:]):
        assert (y1 <= d <= y2) or (y2 <= d <= y1)
    x1, x2 = ranges[0]
    d = data[0]
    if x1 > x2:
        d = _invert(d, (x1, x2))
        x1, x2 = x2, x1
    sdata = [d]
    for d, (y1, y2) in zip(data[1:], ranges[1:]):
        if y1 > y2:
            d = _invert(d, (y1, y2))
            y1, y2 = y2, y1
        sdata.append((d-y1) / (y2-y1)
                     * (x2 - x1) + x1)
    return sdata

class ComplexRadar():
    def __init__(self, fig, variables, ranges, keke,
                 n_ordinate_levels=13):
        # keke is a color name; it is accepted here but only used in plot()
        angles = np.arange(0, 360, 360./len(variables))

        # One polar axis per variable, all stacked on the same rectangle
        axes = [fig.add_axes([0.1,0.1,0.8,0.8], polar=True,
                             label="axes{}".format(i))
                for i in range(len(variables))]
        l, text = axes[0].set_thetagrids(angles,
                                         labels=variables)
        [txt.set_rotation(angle-90) for txt, angle
             in zip(text, angles)]
        for ax in axes[1:]:
            ax.patch.set_visible(False)
            ax.grid(False)
            ax.xaxis.set_visible(False)
        for i, ax in enumerate(axes):
            grid = np.linspace(*ranges[i],
                               num=n_ordinate_levels)
            gridlabel = ["{}".format(round(x,2))
                         for x in grid]
            if ranges[i][0] > ranges[i][1]:
                grid = grid[::-1] # hack to invert grid
                          # gridlabels aren't reversed
            # Blank every ring label except the middle ring,
            # which displays the maximum value of this axis
            top = gridlabel[-1]
            gridlabel = [""] * n_ordinate_levels
            gridlabel[n_ordinate_levels // 2] = top
            ax.set_rgrids(grid, labels=gridlabel,
                          angle=angles[i])
            #ax.spines["polar"].set_visible(False)
            ax.set_ylim(*ranges[i])
        # variables for plotting
        self.angle = np.deg2rad(np.r_[angles, angles[0]])
        self.ranges = ranges
        self.ax = axes[0]

    def plot(self, data, keke, *args, **kw):
        # Scale onto the first axis and close the curve by repeating the first point
        sdata = _scale_data(data, self.ranges)
        self.ax.plot(self.angle, np.r_[sdata, sdata[0]], *args, color=keke, **kw)

    def fill(self, data, *args, **kw):
        sdata = _scale_data(data, self.ranges)
        self.ax.fill(self.angle, np.r_[sdata, sdata[0]], *args, facecolor='#eeefff', **kw)

This radar class handles the ranges so that they run from min to max (inverting any that are reversed) and performs all the formatting and coloring. Once the loop over teams has finished, we display the images using plt.show().
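The core of the scaling step is easier to see in isolation. The sketch below is a stripped-down version of what _scale_data does, without the handling of reversed ranges; the function name rescale_to_first_axis and the sample numbers are ours.

```python
def rescale_to_first_axis(values, ranges):
    """Map each value from its own (low, high) range onto the first axis's range."""
    x1, x2 = ranges[0]
    scaled = [values[0]]  # the first value is already on the first axis
    for value, (y1, y2) in zip(values[1:], ranges[1:]):
        # Fraction of the way along its own axis, stretched to the first axis
        scaled.append((value - y1) / (y2 - y1) * (x2 - x1) + x1)
    return scaled

# Hypothetical ranges and values for three statistics
ranges = [(0.0, 100.0), (0.0, 50.0), (0.0, 10.0)]
values = [80.0, 25.0, 10.0]

scaled = rescale_to_first_axis(values, ranges)
print(scaled)
```

Because every statistic ends up on the first axis's scale, all twelve can be drawn as one closed curve on a single polar axis.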


So that’s all there is to building radar plots. The base radar code is available on Stack Exchange; we then updated it to fit the problem at hand, as we have done above. Try this out and let us know what you think!
