Hypothesis Testing: Is NBA Scoring Up This Year?

So far this season, the NBA has played 38 of the season’s 1,230 scheduled games. Each game produces two team scores, so these 38 games give us 76 scores in total. The observed 76 scores include some high results (139, 136, 134, 132, 122, and 120) as well as some low results (85, 78, 76, 75, and 71). Of the 76 scores, 39 (51.32%) reached triple digits, and 26 (34.21%) reached 110 or more points.

Comparing this to last year, the high score for the first 38 games of the 2014-15 NBA season was 127. Last year we saw 3 scores above 120 points, 6 more scores above 110 points, and exactly 50.00% of scores (38 total) at 100 points or more. On the other end of the spectrum, there were 7 sub-85 point scores; only two more than this year.

So the question is… is NBA scoring up this year?

The simplest way to test whether scores are up or down is to use the sample mean and sample variance. The sample mean is a measure of location that finds the “center” of the distribution of scores. In fact, the sample mean is exactly the single point that minimizes the sum of squared distances to the data. There are competing measures too, like the median, which minimizes the sum of absolute distances to the data.
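
As a small check of that claim, here is a minimal sketch in Python. It uses only the eleven high and low scores quoted above as an illustrative sample, not the full set of 76 scores, and the helper names are my own. It shows that moving away from the mean increases the sum of squared distances, and that moving away from the median increases the sum of absolute distances.

```python
from statistics import mean, median

# Illustrative sample only: the high and low scores quoted in the text,
# not the full set of 76 scores.
scores = [139, 136, 134, 132, 122, 120, 85, 78, 76, 75, 71]


def sum_squared(center, data):
    """Sum of squared distances from each data point to `center`."""
    return sum((x - center) ** 2 for x in data)


def sum_absolute(center, data):
    """Sum of absolute distances from each data point to `center`."""
    return sum(abs(x - center) for x in data)


m = mean(scores)
med = median(scores)

# Shift the candidate center away from the mean / median and compare.
for shift in (-5, -1, 0, 1, 5):
    print(
        shift,
        round(sum_squared(m + shift, scores), 1),
        sum_absolute(med + shift, scores),
    )
```

In both columns the smallest value appears at a shift of zero.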

From these simple measures, we find that the average points per game for an NBA team in the 2015-16 NBA season is 102.4605 points. For the 2014-15 NBA season, the average is 98.8553 points. So by this simple measure, scoring is up by about 3.6 points. Using the median, we currently have 102 points per game versus 99.5 points per game for the 2014-15 NBA season. Therefore, under the median measure, scoring is up 2.5 points.

Now, these are only measures of center. They tell us what the “center” of the distribution is doing: the middle of the pack, not how large swaths of teams are doing. To describe groups of teams alongside a measure of center, we also need the spread of the distribution of scores. For the sample mean, the natural companion is the sample standard deviation. For the sample median, it is the interquartile range.

The sample standard deviation for the 2015-16 NBA season is 13.5375 points per game, while the sample standard deviation for the 2014-15 NBA season is 11.1615 points per game. The sample standard deviation serves as a “measuring stick” for how far a sampled score is likely to fall from the mean of its distribution. We note here that we expect the scores to be distributed in a unimodal fashion, centered somewhere near the mean; this assumption is what makes the standard deviation a meaningful yardstick.

Using the standard deviations over the 76 scores from each season's first 38 games, we find that the 1.96-standard-error range for the mean of the 2015-16 NBA season is from 99.4169 points to 105.5041 points. For the 2014-15 NBA season, it is from 96.3459 points to 101.3647 points. There’s a specific reason we used the value of 1.96; we’ll get to that soon. In the interim, we note that the intervals overlap! The mean of the 2014-15 NBA season is barely outside of the 2015-16 NBA season interval, which hints that scoring is indeed up this season. But let’s take a look at the corresponding test of equality.
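
The quoted ranges can be reproduced from the summary statistics alone. The sketch below assumes each range is the mean plus or minus 1.96 times the standard deviation divided by the square root of 76 (the number of scores); the function name is mine, chosen for illustration.

```python
from math import sqrt


def mean_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (low, high) for sample_mean +/- z * sample_sd / sqrt(n)."""
    half_width = z * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


N_SCORES = 76  # two team scores for each of the 38 games

lo_1516, hi_1516 = mean_interval(102.4605, 13.5375, N_SCORES)
lo_1415, hi_1415 = mean_interval(98.8553, 11.1615, N_SCORES)

print(f"2015-16: {lo_1516:.4f} to {hi_1516:.4f}")
print(f"2014-15: {lo_1415:.4f} to {hi_1415:.4f}")

# The two ranges overlap, yet the 2014-15 mean sits just below the 2015-16 range.
print("ranges overlap:", lo_1516 < hi_1415)
print("2014-15 mean below 2015-16 range:", 98.8553 < lo_1516)
```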

A studentized t-test identifies whether two means are the same*. To perform this test, we take the two means and subtract them. Here we have mean(15-16) - mean(14-15) = 3.6053 points. If this number is significantly positive, that is, far away from zero, then the mean score of the 2015-16 NBA season is statistically larger than the mean score of the 2014-15 NBA season. To judge what counts as "far", we must standardize the difference so that it is measured on a common scale.

To standardize, we divide by the standard deviation of the difference of means. That sounds like a lot, but it’s not. We assume that scores from last year do not reflect scores from this year, and vice-versa. We then take the variances for each mean (21.0219 for this season; 14.2902 for last season) and add them: 35.3121. This then gives us a standard deviation of 5.9424. Hence the t-statistic is 0.6067. So… is this far enough away from zero?

To check this, we assumed that *the scores come from a normal distribution. Evaluating the normal distribution’s cumulative distribution function at this statistic, we obtain a value of 0.7280. Since we are testing whether the current scores indicate an increase compared to last year, we have a one-sided p-value of 1 - 0.7280 = 0.2720. This is a large p-value! It indicates that there is no statistically significant difference in scoring between the two seasons, even though the first-look measures suggest otherwise.
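
For readers who want to redo the arithmetic, here is a hedged sketch of the calculation from the summary statistics. It assumes the variance of each mean is s²/n with n = 76 scores, the same convention that reproduces the intervals quoted earlier, and it uses the normal approximation for the p-value as the text does. The function names are mine. Under that assumption the statistic comes out near 1.79 rather than 0.6067; the figures 21.0219 and 14.2902 in the text match s²/√76 (to rounding) rather than s²/76, so the numbers are worth re-checking against the full data. The last line confirms that a statistic of 0.6067 does map to the quoted p-value of 0.2720.

```python
from math import sqrt
from statistics import NormalDist


def welch_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of means divided by sqrt(sd_a^2/n_a + sd_b^2/n_b)."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def upper_tail_p(z):
    """One-sided p-value P(Z > z) under a standard normal approximation."""
    return 1.0 - NormalDist().cdf(z)


# Summary statistics quoted in the text; n = 76 scores per season (assumption).
t_stat = welch_t_from_summary(102.4605, 13.5375, 76, 98.8553, 11.1615, 76)
p_value = upper_tail_p(t_stat)

# The p-value implied by the statistic quoted in the text.
p_from_text = upper_tail_p(0.6067)

print(f"t from s^2/n convention: {t_stat:.4f}, one-sided p: {p_value:.4f}")
print(f"one-sided p for t = 0.6067: {p_from_text:.4f}")
```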

Is Our Data Really Gaussian?

The assumption we made was that our data can be modeled as a Gaussian (normal) distribution. A Gaussian distribution is a parametric distribution. That is, the probability model is defined through its parameters. Gaussian data is centered at the mean, and tails off in a very specific way: its shape is fully determined by the mean and variance, with zero skewness and a fixed kurtosis. A t-test is then a comparison of the means of two Gaussian distributions where the true variances are not known. So how do we know if our data is really Gaussian?

The first thing we can do is look at the histogram of the scores for each season. When we put the histograms side-by-side, we see that there really doesn’t seem to be a large difference in scores. In fact, the 2014-15 NBA season appears to be quite Gaussian, whereas the 2015-16 NBA season looks to be skewed, with some weight above the 130 point mark. So the question is whether or not the 2015-16 NBA season meets the Gaussian requirements for a t-test.

Histograms of the distribution of 76 scores from the first 38 games of the 2015-16 NBA season (Green) and the 2014-15 NBA season (Red).

To check for normality of data, we can take a look at a QQ-Plot (Quantile-Quantile plot) of the data. This plot takes the sample quantile of each data point and plots it against the corresponding theoretical quantile of the Gaussian distribution. If the points fall close to a straight line, then it is reasonable to suggest that the data follows a Gaussian distribution.

QQ-Plots for scores from the first 38 games of the 2014-15 NBA season (left) and the 2015-16 NBA season (right).
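
The coordinates behind a QQ-Plot are easy to compute by hand. The following is a minimal sketch, not the code used for the plots above: it assumes the plotting positions (i - 0.5)/n, which is one common convention among several, and it reuses the eleven high and low scores quoted at the top of the post purely as an illustrative sample.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical quantile, standardized sample value) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()
    points = []
    for i, value in enumerate(ordered, start=1):
        prob = (i - 0.5) / n  # plotting position (an assumed convention)
        theoretical = standard_normal.inv_cdf(prob)
        standardized = (value - mu) / sigma
        points.append((theoretical, standardized))
    return points


# Illustrative sample only, not the full set of 76 scores.
scores = [139, 136, 134, 132, 122, 120, 85, 78, 76, 75, 71]
pairs = qq_points(scores)

for theoretical, observed in pairs:
    print(f"{theoretical:7.3f}  {observed:7.3f}")
```

If the data were close to Gaussian, the two columns would track each other closely, which is what falling on a straight line means in the plot.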

The QQ-Plots look reasonably good. So how do we measure goodness-of-fit for these fits to the Gaussian distribution? To do this, we can look at a plethora of tests that use various aspects of the data to measure goodness-of-fit.

First there is the Kolmogorov-Smirnov goodness-of-fit test, which uses the maximum difference between the cumulative distribution function of the theoretical normal distribution and the empirical distribution function generated by the dataset. This test is quite conservative here: both p-values are above 0.5000, which is consistent with both sets of scores coming from a Gaussian distribution.
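
To make the "maximum difference" concrete, here is a rough sketch of the Kolmogorov-Smirnov statistic itself. It computes only the distance, not the p-value, and the sample and the Gaussian parameters below are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def ks_statistic(data, mu, sigma):
    """Largest gap between the empirical CDF and the Normal(mu, sigma) CDF."""
    dist = NormalDist(mu, sigma)
    ordered = sorted(data)
    n = len(ordered)
    d_max = 0.0
    for i, value in enumerate(ordered, start=1):
        cdf = dist.cdf(value)
        # The empirical CDF jumps from (i - 1)/n to i/n at this point,
        # so check the gap on both sides of the jump.
        d_max = max(d_max, i / n - cdf, cdf - (i - 1) / n)
    return d_max


# Hypothetical scores and hypothetical Gaussian parameters.
sample = [88, 95, 97, 101, 104, 110, 118]
distance = ks_statistic(sample, mu=100, sigma=10)
print(f"KS distance: {distance:.4f}")

# Sanity check: a single point at the mean sits half-way up the CDF.
single_point_distance = ks_statistic([0.0], mu=0, sigma=1)
print(single_point_distance)
```

A small distance means the empirical distribution hugs the theoretical curve; the test then asks how likely a distance at least that large would be if the data really were Gaussian.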

There is also the Lilliefors test, which adapts the Kolmogorov-Smirnov test to the case where the mean and variance of the Gaussian are estimated from the data, as opposed to requiring a fully specified Gaussian (or normalized data) as in the Kolmogorov-Smirnov test. Here, the p-values fall, but only to about 0.2500, still consistent with both sets of scores coming from a Gaussian distribution.

We can also perform a Chi-Square test, which measures cumulative squared deviations of standardized data from their relative points on the Gaussian distribution. Here we are no longer looking at a test based on a single, maximum point. Instead we are considering the entire data set, point by point. These tests are not as conservative and yield even smaller p-values of 0.12.

Similarly, the Anderson-Darling goodness-of-fit test is another data point comparison test that yields a 0.22 p-value for the 2014-15 scores and a 0.11 p-value for the 2015-16 scores.

In academic environments where data is simulated and theoretical distributions are used, p-values of 0.01, 0.05, and 0.10 are typically taken to be reasonable cut-offs. In applications that require intensive data collection processes, such as mechanical engineering, political polling, aeronautics, and printing presses, p-values are commonly thresholded at 0.20 or 0.30, depending on the application. So if we follow these common practice standards, it is reasonable to suggest that the 2015-16 NBA season data does not entirely follow a Gaussian distribution.

So the t-test could be argued away. Now what?

All we have done is identify that the data may not meet the assumptions of the parametric test we used. Instead, we can drop the parametric assumption and apply a nonparametric test. Nonparametric tests do not assume that the data follow any particular distribution, and they allow us to make several comparisons. However, one caveat is that these tests tend to be a little more conservative than their parametric counterparts.

To see how a nonparametric test works, we will compute a Wilcoxon-Mann-Whitney Rank-Sum Test. This is a neat little test that orders the two datasets and, for each data point in one set, counts how many data points in the other set it exceeds.

As a small example, assume that two groups of four kids run a distance. These are their times: (40,40,39,37) and (41,40,32,31). Which group is faster? For group one, we see that the data point 40 is greater than two points in the other group (32,31), and tied with another (40). A tie counts as one half, so this gives a count of 2.5. For the point 39, we see it’s larger than two points (32,31), and the same holds for 37. Hence the counts for group one are (2.5, 2.5, 2, 2). Their sum is 9. Repeating this for the other group, we get counts of (4, 3, 0, 0). Their sum is 7. We check we did this right by adding the two sums to get 16. If this is the same as the product of the two group sizes (4×4 is 16), then we counted every pair exactly once. The resulting p-value is 0.9143, which gives no evidence against the two running groups coming from the same distribution; and therefore there is no detectable speed difference between the two groups as a whole.
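
The counting in this example can be sketched in a few lines of Python. The function name is my own, and the sketch covers only the pairwise counts (with ties worth one half), not the p-value.

```python
def pairwise_wins(group, other):
    """Count pairs where a value in `group` exceeds a value in `other`.

    A tie counts as half a win.
    """
    total = 0.0
    for x in group:
        for y in other:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total


group_one = [40, 40, 39, 37]
group_two = [41, 40, 32, 31]

u_one = pairwise_wins(group_one, group_two)
u_two = pairwise_wins(group_two, group_one)

print(u_one, u_two)

# Every pair is counted once, so the sums add up to the product of group sizes.
print(u_one + u_two == len(group_one) * len(group_two))
```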

Using this test, we obtain a p-value of 0.0844 for the two NBA scoring distributions. Since nonparametric tests are fairly conservative, this is relatively strong evidence that the scoring distributions are not the same. This would then suggest that there is a slight increase in scoring in the 2015-16 NBA season so far.

So let’s recap our measures and tests:

  1. Sample Mean: Scoring is up this year.
  2. Sample Median: Scoring is up this year.
  3. Studentized t-Test: Scoring is the same as last year.
  4. Rank-Sum Test: Scoring is up this year.

It looks like scoring is indeed up so far this season. However, the season is only three percent complete and things can drastically change over the next six months. How do you feel about the scoring this season? If you happened to catch the 139-136 double overtime game between the Oklahoma City Thunder and the Orlando Magic, I hope you’re as excited for this season as I am.

