A couple of years ago, Bill James published an article, ‘How Safe a College Basketball Lead’, on how safe a lead is given the point differential between two NCAA basketball teams. The article gives a method for estimating the number of seconds a lead is safe. The algorithm is as follows:
- Take the differential and subtract three points.
- If the team in the lead has the ball, add 0.5. Otherwise, subtract 0.5.
- Take this number and square it. This is the number of seconds a lead is safe.
Let’s look at an example. Suppose a team is up by seven points and the opponent has the ball. Then we calculate (7 – 3 – 0.5)*(7 – 3 – 0.5) = 3.5*3.5 = 12.25. This suggests that the seven-point lead is safe for only 12.25 seconds, which is not unreasonable. Remember, this is for determining a guaranteed win, not a probable one. According to Bill James, then, the team in front should never lose the lead over the next 12 seconds.
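The three steps above fit in a few lines of Python. This is only a minimal sketch: the function name `safe_lead_seconds` is ours, and returning zero when the adjusted lead is not positive is an assumption, since the steps as listed do not say what to do for leads under about three points.

```python
def safe_lead_seconds(lead: float, leader_has_ball: bool) -> float:
    """Seconds a lead is considered safe under the three steps above."""
    # Steps 1 and 2: subtract three points, then adjust for possession.
    adjusted = lead - 3 + (0.5 if leader_has_ball else -0.5)
    # Assumption: a non-positive adjusted lead is never safe. Squaring a
    # negative number would otherwise report a misleading positive time.
    if adjusted <= 0:
        return 0.0
    # Step 3: square the adjusted lead to get seconds.
    return adjusted ** 2


print(safe_lead_seconds(7, leader_has_ball=False))  # 12.25, the example above
print(safe_lead_seconds(7, leader_has_ball=True))   # 20.25
```

Possession is worth a full point of adjusted lead, so holding the ball while up seven stretches the safe window from 12.25 to 20.25 seconds.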
Let’s apply this formulation to all leads in the 2015-2016 NBA season and see how well this discriminant function works. We removed all instances where teams were tied (no lead) and all instances where the game had ended, since no team can come back after a game is over. This left over 61,000 possessions where a team had the lead.
Is It Really True?
Across the 60,000+ possessions, we found two instances where a team had a lead that was deemed safe and then lost that lead before the safe time had elapsed. These two games were the following:
- January 4th, 2016:
  - Oklahoma City Thunder up 15 over the Sacramento Kings; Kings Ball
    - 15 Point Lead = 132.25 seconds
    - Lead lost in 122 seconds.
    - Game in Oklahoma City.
    - First Quarter: Kings 16, Thunder 31 with 2:38 to go.
    - First Quarter: Kings 31, Thunder 31 with 0:36 to go. (122 seconds elapsed)
    - Kings won 116 – 104.
- April 13th, 2016:
  - Philadelphia 76ers up 21 over the Chicago Bulls; Bulls Ball
    - 21 Point Lead = 306.25 seconds
    - Lead lost in 278 seconds.
    - Game in Chicago.
    - Second Quarter: 76ers 60, Bulls 39 with 2:17 to go.
    - Third Quarter: 76ers 61, Bulls 61 with 9:39 to go. (278 seconds elapsed)
    - Bulls won 115 – 104.
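As a quick check on the arithmetic for these two games, here is a small sketch that converts the game-clock readings into elapsed seconds and compares them with the safe time. The helper `clock_to_seconds` and the variable names are ours; the only outside fact used is that an NBA quarter lasts 12 minutes. In both games the trailing team had the ball, so 0.5 is subtracted.

```python
QUARTER_SECONDS = 12 * 60  # an NBA quarter lasts 12 minutes


def clock_to_seconds(clock: str) -> int:
    """Convert a game-clock reading such as '2:38' into seconds remaining."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Kings at Thunder: both readings fall in the first quarter.
kings_elapsed = clock_to_seconds("2:38") - clock_to_seconds("0:36")

# 76ers at Bulls: the rest of the second quarter plus the part of the
# third quarter that had been played when the game was tied.
bulls_elapsed = clock_to_seconds("2:17") + (
    QUARTER_SECONDS - clock_to_seconds("9:39")
)

games = [("Kings-Thunder", 15, kings_elapsed), ("76ers-Bulls", 21, bulls_elapsed)]
for label, lead, elapsed in games:
    safe = (lead - 3 - 0.5) ** 2  # trailing team has the ball
    print(f"{label}: safe for {safe} s, lost in {elapsed} s, {safe - elapsed} s early")
```

The Thunder lost their lead about 10 seconds before the safe time ran out, and the 76ers lost theirs about 28 seconds early.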
Distribution of Time Needed to Make a Comeback
As we see, this discriminant is actually not too bad. So let’s take a look at a random sample of leads given up over the course of the NBA season. Since there are over 61,000 such events, the memory required to plot the entire season is a little large for a single processor. Instead, we plot ten percent of the scores at random. If either of the two games above is selected, its point will fall underneath the discriminant line. All other points should lie above the discriminant.

Ten percent sample of all lost leads in the 2015-2016 NBA season. The right side of the zero line shows leads lost by the visiting team; the left side shows leads lost by the home team. The vertical axis is time in seconds; the horizontal axis is point differential from the visiting team's perspective.
From the figure, we see the Philadelphia 76ers – Chicago Bulls game just below the discriminant line at the 21-point lead mark. Otherwise, the scores fill in the region above this curve exceptionally well. The blue vertical line is merely the zero line separating the home team and visiting team halves of the plot.
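The sampling and the above-or-below check can be sketched as follows. This is not the code behind the figure: the row layout, the names, and the seed are our assumptions, and apart from the two games discussed earlier every row is a made-up placeholder placed 30 seconds above the line.

```python
import random


def safe_seconds(visitor_diff: int, leader_has_ball: bool) -> float:
    """Safe time for a lead; the sign of visitor_diff only says who leads."""
    adjusted = abs(visitor_diff) - 3 + (0.5 if leader_has_ball else -0.5)
    return max(adjusted, 0.0) ** 2


# Rows: (visitor point differential, leader has ball?, seconds until lead lost).
# Only the first two rows come from the post (Thunder at home, 76ers visiting).
lost_leads = [(-15, False, 122.0), (21, False, 278.0)]
# Hypothetical placeholder rows, each sitting 30 seconds above the line.
lost_leads += [
    (diff, True, safe_seconds(diff, True) + 30.0)
    for diff in range(-20, 21)
    if diff != 0
]

rng = random.Random(2016)  # fixed seed so the sample is repeatable
sample = rng.sample(lost_leads, k=len(lost_leads) // 10)  # ten percent

# Points under the discriminant are leads lost while still "safe".
below_line = [row for row in lost_leads if row[2] < safe_seconds(row[0], row[1])]
print(len(sample), "rows sampled;", len(below_line), "rows below the line")
```

Checking the full list against the discriminant is cheap even when plotting is not, so the two exceptions can be found without relying on the sample happening to include them.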
Can We Do Better?
So now that we see the data, can we do better? The short answer is… not really. Missing on two leads out of more than 60,000, while not a guarantee, is still over 99.99% accurate. A safe lead is effectively guaranteed, barring a catastrophic sequence of events.
Let’s take a look at the Kings – Thunder game. DeMarcus Cousins started off the comeback by getting fouled by Steven Adams and converting both free throws. After an Enes Kanter miss, Cousins rebounded the ball and came down to hit a three-pointer. After another Thunder miss, by Cameron Payne, Cousins dunked off a Kosta Koufos assist through a Payne foul and converted the free throw, cutting the lead from 15 to 7.
A Dion Waiters turnover led to a Rajon Rondo lay-up and a Thunder timeout. After a trade of missed jumpers, Koufos was fouled by Kyle Singler, leading to a pair of made free throws. After another missed field goal attempt by Kanter, Marco Belinelli knocked down a three, erasing the lead.
Over this sequence of events, the Kings were 4-6 FG, 2-3 3PG, 5-5 FT, 2 OREB, 4 DREB, 0 TO; while the Thunder were 0-4 FG, 0-0 3PG, 0-0 FT, 0 REB, 1 TO.
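To see how the run adds up to exactly 15 points, here is a small tally of the Kings' scoring plays in the order described. The and-one free throw after the Cousins dunk is inferred rather than taken from a box score: the lead only drops from 15 to 7 at that point, and the Kings only finish 5-5 from the line, if that free throw went in.

```python
# (player, play, points) for each Kings score during the run, in order.
kings_run = [
    ("Cousins", "two free throws", 2),
    ("Cousins", "three-pointer", 3),
    ("Cousins", "dunk", 2),
    ("Cousins", "and-one free throw", 1),  # inferred from the 15-to-7 swing
    ("Rondo", "lay-up", 2),
    ("Koufos", "two free throws", 2),
    ("Belinelli", "three-pointer", 3),
]

lead = 15  # Thunder lead when the run started
lead_after_each_play = []
for player, play, points in kings_run:
    lead -= points
    lead_after_each_play.append(lead)
    print(f"{player:<10} {play:<20} Thunder lead: {lead}")

free_throw_points = sum(pts for _, play, pts in kings_run if "free throw" in play)
```

The Thunder went scoreless over the same stretch, so the lead simply shrinks by each Kings basket until it reaches zero.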
So the question is, how can we do better? One answer is to use a more advanced machine learning technique. For instance, we can use a multi-layer neural network trained with back-propagation to estimate a regression discriminant. This method learns essentially the same discriminant in a neighborhood of 0, but it becomes sharper for much larger leads, where the current method is conservative. However, we only recover an extra 15 seconds for 30+ point leads, which is why we said that we can’t really do much better.
Can We Turn This Into a Probability?
Heck yes we can. For the random sample, we effectively have a distribution of times until the lead is dissipated. Hence, we can ask the following question: With a lead of 10 points, what is the probability that we will have the lead after 120 seconds?
The lower bound given by the discriminant is (10 – 3 – 0.5)*(10 – 3 – 0.5) = 42.25 seconds when the trailing team has the ball. We find that a ten-point lead typically lasts more than 500 seconds (8 minutes, 20 seconds). We can use the ten-point column in the figure above to estimate the distribution and compute the probability accordingly. In this case, a team has a 97.4% chance of leading after two minutes with a ten-point lead.
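The counting step can be sketched like this: take the observed times until a ten-point lead was lost and compute the fraction that exceed the horizon. The function name and the durations below are made up for illustration, so the result will not match the 97.4% figure, which comes from the actual season data.

```python
def prob_lead_survives(durations, horizon):
    """Fraction of observed leads that lasted longer than `horizon` seconds."""
    if not durations:
        raise ValueError("need at least one observed lead duration")
    return sum(d > horizon for d in durations) / len(durations)


# Hypothetical times (in seconds) until a ten-point lead was lost.
# These are NOT the post's data; they only illustrate the calculation.
ten_point_durations = [90, 150, 300, 480, 520, 610, 700, 845, 1020, 1300]

print(prob_lead_survives(ten_point_durations, 120))  # 0.9 for this toy list
```

With real data the list would hold every sampled ten-point lead, and the same call with a different horizon answers the question for any time window.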
So how did Bill James do? The discriminant looks pretty nice, but as we have seen, it does not guarantee that the lead will be held that long. Then again, this process was developed for NCAA basketball, which has a longer shot clock than the NBA’s 24 seconds.
If you’d like a copy of the code, feel free to send a message!
I really enjoyed this post! I’m an amateur R user/professional NBA fan (ha) and I think it would be fun to see this code. Would it be possible for me to obtain a copy?
Sure thing! I’ll send you a sample data file and how to extract out the leads and plot in Python.
Send an e-mail over to jjacobs1@umbc.edu with the subject: NBA Leads Code and I’ll send the copy as soon as I get it!
Hi, great article, very interesting for me considering that I am a big NBA fan and have an interest in data science. I wanted to request if I could get a copy of the dataset to play around with?