Curious Tale of 3's Versus 2's in the NBA - Squared Statistics: Understanding Basketball Analytics

Over the last decade, the gradual shift from offenses revolving around mid-range jump shots to offenses built on long-range three point attempts has become much more explicit, with teams such as the Houston Rockets coming off a 2017-18 campaign in which over 50% of their field goal attempts came from beyond the arc. The mathematics seems simple: “3 > 2” and “Shooting 40% from the mid-range is the same as shooting 26.67% from beyond the arc.” While we can’t argue with the first point, the second point is actually nuanced. Let’s examine the argument behind that second claim.

Effective Field Goal Percentage (eFG%)

Effective field goal percentage (eFG%) is an adjustment to traditional field goal percentage that accounts for the extra point on made three point attempts. The motivation is a straightforward counting argument:

Suppose a player shoots m-for-n from two point range and p-for-q from three point range. The player then has an FG% of (m+p)/(n+q) and an eFG% of (m+1.5p)/(n+q).

If we multiply FG% by 2(n+q), twice the number of attempts, we obtain 2(m+p) points, which is not the correct number of points: every made three is counted as a two. However, if we multiply eFG% by 2(n+q), we obtain 2(m + 1.5p) = 2m + 3p, which is indeed the correct number of points.

To illustrate, consider the Memphis Grizzlies’ 131-117 victory over the Atlanta Hawks on Friday, October 19, 2018. In this game, Jaren Jackson Jr. shot 8-12 from the field, including 2-4 from three, for 18 points on field goals. That is 6 made twos and 2 made threes on 12 attempts, so the eFG% formula above gives (6 + 1.5*2) / 12 = (8 + .5*2) / 12 = 9/12, or .75. Multiplying this by 2*12, we obtain 18 points.
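The eFG% bookkeeping can be checked in a few lines of Python. This is a minimal sketch using the m-for-n / p-for-q notation above; the helper names `fg_pct`, `efg_pct` and `points_from_pct` are hypothetical, and the example splits Jackson’s 8-12 line into 6-of-8 on twos and 2-of-4 on threes.

```python
def fg_pct(m, n, p, q):
    """Traditional FG%: all makes over all attempts (m-for-n twos, p-for-q threes)."""
    return (m + p) / (n + q)


def efg_pct(m, n, p, q):
    """Effective FG%: each made three counts as 1.5 made field goals."""
    return (m + 1.5 * p) / (n + q)


def points_from_pct(pct, attempts):
    """Points implied by treating every attempt as a two made at rate pct."""
    return 2 * pct * attempts


# Jaren Jackson Jr., Oct 19, 2018: 8-12 FG, including 2-4 from three.
m, n, p, q = 6, 8, 2, 4
attempts = n + q
print(efg_pct(m, n, p, q))                             # 0.75
print(points_from_pct(efg_pct(m, n, p, q), attempts))  # 18.0, the correct total
print(points_from_pct(fg_pct(m, n, p, q), attempts))   # about 16, undercounting the threes
```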

The reason we make adjustments such as eFG% is to better capture the game. It is indeed true that a three point field goal is worth 50% more than a two point field goal, so it makes sense to weight it as such. However, this raises a subtler question in the reverse direction: if a player shoots 40% from the three point line, how many points do we expect that player to score after N attempts?

Expected Point Values

This question leads us to what is called an expected point value, or EPV. EPV is a common term for models of points scored built on a probability distribution. It can be complex, such as the models presented at the Sloan Sports Analytics Conference, or primitive, such as looking at end-state results like shooting distributions and shot charts. Regardless, for a particular possession, we ask: what is the expected number of points scored? For the shooter who shoots 40% from the three point line, we expect 0.4 × 3 = 1.2 points per possession.

The faux pas here is that analysts tend to jump straight to moment matching: they suggest that if the shooter instead takes a two point attempt, they must shoot 60% to match the productivity of the three point shooter. This makes sense, as the expected point value for that field goal attempt is then 0.6 × 2 = 1.2 points per possession. Hence we have a tie.
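The moment-matching arithmetic is short enough to write out. A sketch with hypothetical helper names: EPV per attempt is make probability times shot value, so equal EPV ties a two point percentage to 1.5 times the three point percentage. Equivalently, the matching three point percentage is two-thirds of the two point percentage, which is where the “40% from mid-range equals 26.67% from three” comparison comes from.

```python
def epv(make_pct, shot_value):
    """Expected points per attempt for a shot worth shot_value made at make_pct."""
    return make_pct * shot_value


def matching_two_pct(three_pct):
    """Two-point make rate with the same EPV as a given three-point make rate."""
    return 3 * three_pct / 2


def matching_three_pct(two_pct):
    """Three-point make rate with the same EPV as a given two-point make rate."""
    return 2 * two_pct / 3


print(epv(0.40, 3), epv(0.60, 2))  # both about 1.2 points per attempt
print(matching_two_pct(0.40))      # about 0.60
print(matching_three_pct(0.40))    # about 0.2667
```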

The only problem is that a shooter can never score 1.2 points on a single shot: every attempt is worth either 0 or 2 points, or either 0 or 3 points. This is a major misconception. We call it a misconception because the remainder of this post will prove that, even when both shooters are held to 120 points per 100 possessions, their two scoring distributions identify one shooter as indeed better than the other… despite identical expected point values.
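One way to make the single-shot point concrete: each attempt is a two-outcome random variable, worth 0 or 2 for the 60% two point shooter and 0 or 3 for the 40% three point shooter. The sketch below (our illustration, using exact fractions; `shot_moments` is a hypothetical helper) shows the means agree at 6/5 points while the spreads do not. The variance is not a quantity this post relies on, but it is one reason the two scoring distributions can differ.

```python
from fractions import Fraction


def shot_moments(make_pct, shot_value):
    """Mean and variance of points from one attempt (shot_value w.p. make_pct, else 0)."""
    mean = make_pct * shot_value
    variance = make_pct * shot_value**2 - mean**2
    return mean, variance


two_mean, two_var = shot_moments(Fraction(3, 5), 2)      # 60% two point shooter
three_mean, three_var = shot_moments(Fraction(2, 5), 3)  # 40% three point shooter
print(two_mean, two_var)      # 6/5 24/25
print(three_mean, three_var)  # 6/5 54/25
```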

Statistical Set-Up

To start, let’s consider a team that only shoots three pointers, where each possession is independent of the last and the team is a stable shooter (the same make probability on every possession). We have the identical set-up for a team that only shoots two pointers. We then pit these teams against one another and impose that neither team turns the ball over, neither team gets offensive rebounds, and no fouls occur. We also impose that the teams have exactly the same number of possessions. And yes, we impose that the teams’ EPVs are identical.

Because the EPVs are equal, if the two point team shoots P percent, the three point team shoots 2P/3 percent, since 2 × P = 3 × (2P/3). For example, if a two point team shoots 60%, the three point team shoots 40%.

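From the statistical set-up above, each team’s number of made field goals follows its own Binomial distribution, with N trials and success probability equal to its field goal percentage. For simplicity, let’s set N = 100 possessions. The distributions for the number of field goals made by each team are given by

$$\Pr(\text{two point team makes } k) = \binom{100}{k} P^{k} (1-P)^{100-k}, \qquad \Pr(\text{three point team makes } k) = \binom{100}{k} \left(\frac{2P}{3}\right)^{k} \left(1-\frac{2P}{3}\right)^{100-k},$$

for k = 0, 1, …, 100.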

Now if these two teams play, we can keep track of the score. Let X be the number of two point field goals made and Y the number of three point field goals made. The final score after 100 possessions is then 2X – 3Y: 2X points for the two point team (Team X) against 3Y points for the three point team (Team Y).

Under this transformation, we are now able to start writing the probabilities of every score possible in the game. Let’s illustrate this with the three simplest scenarios.

One Possession Game

In a one possession game, we change N from 100 to 1. In this case, each team gets one shot at scoring. This results in only four possible outcomes for the final score 2X – 3Y:

0 – 0: Tie
0 – 3: Three Point Team Wins
2 – 0: Two Point Team Wins
2 – 3: Three Point Team Wins

In the first case, both teams miss their attempts. Team X (the two point team) misses with a 40% chance and Team Y (the three point team) misses with a 60% chance, so there is a .4 × .6 = 24% chance of a tie. Team X, on the other hand, has only one way to win the possession: make its shot and stop its opponent from scoring. With a 60% chance of a made two point field goal and a 60% chance of a missed three point field goal, Team X wins the possession with a 36% chance.

Unfortunately for Team X, Team Y has two ways of winning the possession (0 – 3 and 2 – 3). All Team Y needs to do is score on its one possession, which it does 40% of the time. If we’d prefer to carry out the full Binomial structure, we have .4*.4 + .4*.6 = .16 + .24 = .40. Any which way we do the math, Team Y (the three point team) is favored to win the possession. This is despite the expected point values being identical!
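The four one-possession outcomes can also be enumerated mechanically. A small sketch using exact fractions (our choice, to avoid floating-point noise), with Team X making twos 60% of the time and Team Y making threes 40% of the time:

```python
from fractions import Fraction
from itertools import product

P_TWO = Fraction(3, 5)    # Team X makes a two 60% of the time
P_THREE = Fraction(2, 5)  # Team Y makes a three 40% of the time

results = {"X wins": Fraction(0), "Y wins": Fraction(0), "tie": Fraction(0)}
for x_made, y_made in product([0, 1], repeat=2):
    # Probability of this make/miss combination for the two independent shots
    prob = (P_TWO if x_made else 1 - P_TWO) * (P_THREE if y_made else 1 - P_THREE)
    x_pts, y_pts = 2 * x_made, 3 * y_made
    if x_pts > y_pts:
        results["X wins"] += prob
    elif y_pts > x_pts:
        results["Y wins"] += prob
    else:
        results["tie"] += prob
    print(f"{x_pts} - {y_pts}: {float(prob):.2f}")

print({k: float(v) for k, v in results.items()})
# {'X wins': 0.36, 'Y wins': 0.4, 'tie': 0.24}
```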

What this tells us, at a cursory level, is that threes are better than twos in a single possession. But what about multiple possessions?

Two Possession Game

In the two possession game, we end up with NINE possible outcomes in the game:

0 – 0: Tie
0 – 3: Team Y wins
0 – 6: Team Y wins
2 – 0: Team X wins
2 – 3: Team Y wins
2 – 6: Team Y wins
4 – 0: Team X wins
4 – 3: Team X wins
4 – 6: Team Y wins

In this situation, we have one possibility for a tie, five possibilities for the three point team to win, and three possibilities for the two point team to win. Here we see once again that the three point team has more options to win. Despite this, the probabilities tell a different story.

Ties: Once again we only have one way to end up tied. The probability is smaller than in the one possession scenario, since both teams must now miss more attempts. We have .6^2 for the three point team to miss both attempts and .4^2 for the two point team to come away empty handed, so we end up with a .36 x .16 = .0576 chance of a tie. That’s only a 5.76% chance, drastically reduced from the 24% chance in the one possession scenario.

Twos: For the two point team to win, we have only three scenarios to work with. The first case is 2 – 0, which requires the two point team to make exactly one basket and the three point team to miss all of theirs. There are two different ways for the two point team to make exactly one basket: make the first or make the second, but not both. Therefore the probability of a 2 – 0 victory is 2*.4*.6*.6^2 = .1728. That’s a 17.28% chance of the score being 2 – 0 after two possessions each. Similarly, 4 – 0 has a .6^2*.6^2 = 12.96% chance, and 4 – 3 has a .6^2*(2*.4*.6) = 17.28% chance. In total, the two point team wins with a probability of .4752.

Doing the math, this means that the three point team (Team Y) has only a 1 - .0576 - .4752 = .4672 chance of winning. Over two possessions, then, taking two point attempts is more beneficial than taking three point attempts, making the two point attempts more valuable.
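The two-possession arithmetic can be reproduced term by term. A sketch in exact fractions; the variable names are ours and simply mirror the 0 – 0, 2 – 0, 4 – 0 and 4 – 3 cases above, with Team Y’s chance taken as the complement.

```python
from fractions import Fraction

p2, p3 = Fraction(3, 5), Fraction(2, 5)  # make rates: Team X (twos), Team Y (threes)
q2, q3 = 1 - p2, 1 - p3                  # corresponding miss rates

tie = q3**2 * q2**2              # 0 - 0: every attempt misses
x_2_0 = 2 * p2 * q2 * q3**2      # exactly one two (first or second), no threes
x_4_0 = p2**2 * q3**2            # both twos, no threes
x_4_3 = p2**2 * 2 * p3 * q3      # both twos, exactly one three
x_win = x_2_0 + x_4_0 + x_4_3
y_win = 1 - tie - x_win          # every remaining outcome goes to Team Y

print(float(tie), float(x_2_0), float(x_4_0), float(x_4_3))
print(float(x_win), float(y_win))
```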

Three Possession Game

If we break down one more possession, we can consider a three possession game, which results in 16 different outcomes. To save space, we will leave it as a homework exercise for you to prove the probability of each:

0 – 0: Tie (.013824 chance)
0 – 3: Team Y wins (.027648 chance)
0 – 6: Team Y wins (.018432 chance)
0 – 9: Team Y wins (.004096 chance)
2 – 0: Team X wins (.062208 chance)
2 – 3: Team Y wins (.124416 chance)
2 – 6: Team Y wins (.082944 chance)
2 – 9: Team Y wins (.018432 chance)
4 – 0: Team X wins (.093312 chance)
4 – 3: Team X wins (.186624 chance)
4 – 6: Team Y wins (.124416 chance)
4 – 9: Team Y wins (.027648 chance)
6 – 0: Team X wins (.046656 chance)
6 – 3: Team X wins (.093312 chance)
6 – 6: Tie (.062208 chance)
6 – 9: Team Y wins (.013824 chance)

Again we see Team Y with the upper hand in number of outcomes, with nine winning outcomes to Team X’s five; but once again they are on the short end of the stick, with a probability of winning of only .441856, a meager 44.1856%. Team X’s chances? .482112, or 48.2112%: four full percentage points more likely to win over a three possession game.
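For readers who would rather let a computer do the homework, here is a sketch that rebuilds the sixteen-row table from the two Binomial(3, ·) distributions. The names are ours, and exact fractions are used so the printed six-decimal values can be compared directly with the list above.

```python
from fractions import Fraction
from math import comb

N = 3
P_TWO, P_THREE = Fraction(3, 5), Fraction(2, 5)


def pmf(k, p):
    """Binomial(N, p) probability of exactly k makes."""
    return comb(N, k) * p**k * (1 - p) ** (N - k)


table = []
for x in range(N + 1):          # two point makes by Team X
    for y in range(N + 1):      # three point makes by Team Y
        score_x, score_y = 2 * x, 3 * y
        if score_x > score_y:
            winner = "Team X wins"
        elif score_y > score_x:
            winner = "Team Y wins"
        else:
            winner = "Tie"
        prob = pmf(x, P_TWO) * pmf(y, P_THREE)
        table.append((score_x, score_y, winner, prob))
        print(f"{score_x} - {score_y}: {winner} ({float(prob):.6f} chance)")
```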

More Possessions!

Now the argument goes that the more possessions we have, the closer the probabilities of winning after N possessions become. And that’s correct to a point. In fact, a common rule of thumb drawn from the age-old Central Limit Theorem says that with approximately 30 observations we can throw away the Binomial distribution and start using the trusty Gaussian distribution for estimating probabilities of winning. Let’s jump straight to 100 possessions. To start, let’s simulate 10,000 games:

[sourcecode language="python"]

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import math
import random

sn.set(color_codes=True)

def nCr(n, r):
    # Binomial coefficient "n choose r", kept exact with integer division
    f = math.factorial
    return f(n) // f(r) // f(n - r)

twowins = 0
threewins = 0
twoscores = []
threescores = []
numPoss = 100

for i in range(10000):      # simulate 10,000 games
    twoscore = 0
    threescore = 0
    for j in range(numPoss):
        r1 = random.random()
        r2 = random.random()
        if r1 < .60:        # Team X makes a two 60% of the time
            twoscore += 2.
        if r2 < .40:        # Team Y makes a three 40% of the time
            threescore += 3.
    if twoscore > threescore:
        twowins += 1.
    if twoscore < threescore:
        threewins += 1.
    twoscores.append(twoscore)
    threescores.append(threescore)

print("Team X (twos) win rate:", twowins / 10000.)
print("Team Y (threes) win rate:", threewins / 10000.)

sn.kdeplot(np.array(twoscores))
sn.kdeplot(np.array(threescores))
plt.title("Comparison of 100 possessions of 2's and 3's")
plt.xlabel("Points Scored")
plt.ylabel("Density")
plt.show()

[/sourcecode]

The result gives us the following plot:


Distributions of points scored by Team Y (Green) and Team X (Blue) over 100 possessions. Central Limit Theorem is trying to take hold.

We do see the Central Limit Theorem trying to take hold, as the average score hovers right around 120 for both teams. Despite 10,000 simulations, the three point distribution is still considerably shaky, while the two point distribution, apparently tighter and smoother, looks slightly shifted beyond 120 points, to the right of the three point distribution. While we may claim this is merely a sampling issue (i.e., run the simulation again and get a shaky graph biased in the other direction), we can instead compare the distributions directly using the Binomial distributions.
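One way to read the plot without more simulation is to compute the exact moments of each team’s 100-possession score with the standard Binomial formulas. In the sketch below (the helper name is hypothetical), both means come out at exactly 120, but Team X’s score has variance 96 against Team Y’s 216, and the Binomial skewness formula gives Team X’s score a slight negative skew and Team Y’s a slight positive one. That the two curves are therefore not mirror images despite a shared mean is our reading of the plot, not a claim made in this post.

```python
from math import sqrt

N = 100


def scaled_binomial_moments(n, p, value):
    """Mean, variance and skewness of value * Binomial(n, p)."""
    mean = value * n * p
    variance = value**2 * n * p * (1 - p)
    # Skewness does not change when the variable is multiplied by a positive constant.
    skewness = (1 - 2 * p) / sqrt(n * p * (1 - p))
    return mean, variance, skewness


two = scaled_binomial_moments(N, 0.6, 2)    # Team X's score, 2X
three = scaled_binomial_moments(N, 0.4, 3)  # Team Y's score, 3Y
print("two point team:   mean %.1f, variance %.1f, skewness %+.4f" % two)
print("three point team: mean %.1f, variance %.1f, skewness %+.4f" % three)
```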

To do this, we can write a script to compute the probabilities. All we need to do is properly write the mathematical equation for a two point team defeating a three point team in 100 possessions.

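$$\Pr(2X > 3Y) = \sum_{k=0}^{100} \sum_{\substack{l=0 \\ 3l < 2k}}^{100} \binom{100}{k}\binom{100}{l} (0.6)^{k} (0.4)^{100-k} (0.4)^{l} (0.6)^{100-l},$$

where k counts Team X’s made twos and l counts Team Y’s made threes.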

Warning! Math!

And we can turn this math into a couple lines of code:

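[sourcecode language="python"]

# Continues the script above (uses nCr and numPoss).
# k = made twos by Team X, l = made threes by Team Y.
prob = 0.     # P(Team Y wins), i.e. 3l > 2k
probtie = 0.  # P(tie), i.e. 3l == 2k

for k in range(numPoss+1):
    # start at l = ceil(2k/3); smaller l means Team X wins, which we get by complement
    for l in range((2*k + 2)//3, numPoss+1):
        part = nCr(numPoss,l)*nCr(numPoss,k)*.4**(numPoss-k+l)*.6**(numPoss-l+k)
        if 3*l > 2*k:
            prob += part
        else:
            probtie += part

print("Team X (twos) wins:", 1. - prob - probtie)
print("Team Y (threes) wins:", prob)
print("Tie:", probtie)

[/sourcecode]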

Running this code, we find that over 100 possessions the two point team is favored to win with a probability of .4907, compared to the three point team’s .4867. In fact, we won’t see these numbers converge to within .0001 until we get to hundreds of possessions, a near impossible feat in a single NBA game. Central Limit Theorem be damned.
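To see how far the Gaussian shortcut is from the exact answer, here is a sketch of the normal approximation to the final margin D = 2X − 3Y, which has mean 0 and variance 96 + 216 = 312. Since D only takes integer values, we apply a continuity correction; that correction and the helper names are our assumptions. Because the approximating normal curve is symmetric about 0, it hands both teams the same chance (roughly .489 each, with roughly .023 for a tie), whereas the exact Binomial calculation above splits them .4907 to .4867.

```python
from math import erf, sqrt

N = 100
P_TWO, P_THREE = 0.6, 0.4


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Margin D = 2X - 3Y, with X ~ Binomial(N, P_TWO) and Y ~ Binomial(N, P_THREE).
mean_d = 2 * N * P_TWO - 3 * N * P_THREE
var_d = 4 * N * P_TWO * (1 - P_TWO) + 9 * N * P_THREE * (1 - P_THREE)
sd_d = sqrt(var_d)

# Continuity correction: D >= 1 is a Team X win, D <= -1 a Team Y win, D == 0 a tie.
approx_x_win = 1 - normal_cdf((0.5 - mean_d) / sd_d)
approx_y_win = normal_cdf((-0.5 - mean_d) / sd_d)
approx_tie = 1 - approx_x_win - approx_y_win

print(f"Team X {approx_x_win:.4f}, tie {approx_tie:.4f}, Team Y {approx_y_win:.4f}")
```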

Running this for every number of possessions, we can see how the convergence plays out.

Comparison of the probability of winning for Team X (Blue Line) compared to Team Y (Red Line). Probability of a tie runs quite low (Green Line).
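A curve like the one in that comparison can be regenerated with a sketch along these lines: exact Binomial sums in floating point for each N from 1 to 100. The function name `win_probs` and the choice to print a handful of N values are ours; plotting the values stored in `curve` would give a figure of the kind described.

```python
from math import comb


def win_probs(n, p_two=0.6, p_three=0.4):
    """Exact (P[X wins], P[tie], P[Y wins]) for n possessions each, in floating point."""
    px = [comb(n, k) * p_two**k * (1 - p_two) ** (n - k) for k in range(n + 1)]
    py = [comb(n, k) * p_three**k * (1 - p_three) ** (n - k) for k in range(n + 1)]
    x_win = tie = y_win = 0.0
    for x in range(n + 1):          # made twos by Team X
        for y in range(n + 1):      # made threes by Team Y
            prob = px[x] * py[y]
            if 2 * x > 3 * y:
                x_win += prob
            elif 2 * x == 3 * y:
                tie += prob
            else:
                y_win += prob
    return x_win, tie, y_win


curve = {n: win_probs(n) for n in range(1, 101)}
for n in (1, 2, 3, 10, 50, 100):
    x_win, tie, y_win = curve[n]
    print(f"N={n:3d}  X {x_win:.4f}  tie {tie:.4f}  Y {y_win:.4f}")
```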

Combating Phenomenon: Two’s are “better” than Three’s But…

Performing the math above, we find explicitly that when field goals have equivalent expected point values, we end up favoring the team that takes two point field goals. However, there are situations where taking two point field goals is far worse than taking three point attempts. For example: being down ten points with four possessions remaining.

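In this situation, with each team getting four more possessions, the two point team is guaranteed to lose: four made twos yield only 8 points. The three point team, on the other hand, wins only by making all four threes (.4^4) while its opponent, the 60% two point team, misses all four of its attempts (.4^4), a .4^8 = .00065536 chance of winning the game. It’s not great, but it’s better than Team X’s chances!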
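The late-game numbers can be checked by brute force. This sketch assumes, as the .00065536 figure implies, that both teams get four more possessions and that the team holding the ten-point lead is the head-to-head opponent from the set-up (60% twos for Team X, 40% threes for Team Y); “winning” means finishing strictly ahead. The function `comeback_prob` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def pmf(n, k, p):
    """Exact Binomial(n, p) probability of exactly k makes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def comeback_prob(deficit, n, trail_p, trail_value, lead_p, lead_value):
    """Exact P(trailing team finishes strictly ahead) if each team gets n more shots."""
    total = Fraction(0)
    for trail_makes in range(n + 1):
        for lead_makes in range(n + 1):
            if trail_value * trail_makes - lead_value * lead_makes > deficit:
                total += pmf(n, trail_makes, trail_p) * pmf(n, lead_makes, lead_p)
    return total


# Team X (60% twos) down 10 to Team Y (40% threes), four possessions each.
x_comeback = comeback_prob(10, 4, Fraction(3, 5), 2, Fraction(2, 5), 3)
# Team Y (40% threes) down 10 to Team X (60% twos), four possessions each.
y_comeback = comeback_prob(10, 4, Fraction(2, 5), 3, Fraction(3, 5), 2)
print(x_comeback, y_comeback, float(y_comeback))
```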

Real Estate is Key: Elephants.

The elephant in the room that has not been discussed yet is the practicality of the statistical model. We started with the argument that 60% from two is equivalent to 40% from three, and then proved that this is not true, thanks to the Central Limit Theorem approximation breaking down and discreteness creeping in. The question remains: are teams really likely to shoot 50% better from two than from three (60% versus 40%) in games? The short answer is: not really.

In fact, a glance at any shot chart shows that teams tend to shoot sixty percent from within three feet of the hoop. Therefore, simply forcing teams to shoot from more than three feet out while maintaining a near-40% clip from three will almost guarantee victory for a team. And there’s one team that does that: the Golden State Warriors.


Basketball Reference Table: Displaying 2017-18 NBA Shooting Percentages Based on Distance.

It becomes quite interesting to compare the frequency and efficiency of each team. We could extend the Binomial model to account for the percentage of shots taken from each location, better representing each team’s probability of winning. Instead of going down this rabbit hole, we simply note that there’s effectively only a 14 square foot region where teams shoot 60%, while there is approximately a 250 square foot region where teams shoot upwards of 40%.

So while three point attempts are easier to come by, it must be noted that equal efficiency at equal frequency actually leaves the three point team at a deficit. This is quite an interesting concept when considering scoring strategy in the NBA: equal EPV does not imply equal probability of winning.
