Schedule Strengths: Who’s Going to Win the Super Bowl?

(Warning: Math Heavy)

The NFL has an adage: “Any given Sunday.”

This phrase typically means that during any week, for any given match-up between two teams, either team has a relatively high probability of winning the game. This translates into parity within the league: typically, most teams win between 6 and 10 games each season. The teams that are developed properly for the long haul (New England Patriots, Green Bay Packers, Pittsburgh Steelers) tend to sit on the upper end of the spectrum, while teams with sub-par front offices and an inability to consistently evaluate talent (Cleveland Browns, Oakland Raiders, Jacksonville Jaguars) tend to be near the bottom of the standings each year.

The other teams tend to have a yo-yo effect: some years are above average, with a possibility of making the playoffs; some years are below average. Teams fall into this yo-yo effect for several reasons, such as free agency, injuries, coaching adjustments, and pure luck of the draw in weather, referee assignments, or fan interaction.

In this article, we take a look at strength of schedule, as determined by the quality of opponents, and determine which teams benefited from their schedules. For instance, which 12-4 team was truly the best: the Denver Broncos, the New England Patriots, or the Cincinnati Bengals? According to the NFL's tie-breaking rules, the Denver Broncos were the best team. The tie-breaking procedures for determining the first through third seeds of the playoffs were as follows:

  1. Apply the division tie-breaker to eliminate all but the highest-ranked club from each division before proceeding. Check: all three teams are from different divisions.
  2. Head-to-head sweep. Only applicable if one of the three teams played the other two and won both games. Completed: the Broncos defeated the Patriots 30-24 (OT) and the Bengals 20-17 (OT), both games in Denver. Denver obtains the #1 seed.
  3. Head-to-head, if applicable. Not applicable: the Bengals did not play the Patriots.
  4. Best won-lost-tied percentage in games played within the conference. Patriots 9-3; Bengals 9-3. Tied.
  5. Best won-lost-tied percentage in common games, minimum of 4 games played. Patriots 4-1 (Bills x2, Steelers, Broncos, Texans); Bengals 2-3 (Steelers x2, Broncos, Texans, Bills). The Patriots obtain the #2 seed.

While the rules appear to be as close to fair as possible, the Patriots' schedule appeared slightly easier than the Bengals': the Bengals' opponents had a combined record of 122-134, compared to 121-135 for the Patriots' opponents. Furthermore, the Bengals had to endure the NFC West, compared to the Patriots' run through the pitiful NFC East. While a basic numbers argument can be made for the Bengals earning the 2 seed over the Patriots, one thing is clear: neither team made the Super Bowl. Finally, swapping the Patriots and Bengals changes nothing about the path to the Super Bowl, which still goes through Denver, where both teams lost to the Broncos during the regular season.
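In winning-percentage terms, the gap between the two opponent pools is a single game out of 256 (16 games on each schedule, times 16 games per opponent):

\frac{122}{256} \approx 0.477 \ \text{(Bengals' opponents)}, \qquad \frac{121}{256} \approx 0.473 \ \text{(Patriots' opponents)}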

On a similar note, the Carolina Panthers were consistently branded a weak and untested 15-1 team, primarily due to their schedule. The Panthers quickly began to silence the naysayers with a first-half drubbing of the Seattle Seahawks in the Divisional Round, followed by a blowout of the Arizona Cardinals in the NFC Championship. With a stellar defense and a well-grounded (no pun intended) read-option game, the Panthers are poised to put up a strong fight against the turbulent Denver Broncos in Super Bowl 50.

So the question is, who will win Super Bowl 50? To answer that, we take a look at how the teams played throughout the season, how their opponents played, and how the playoff seeds were set by the NFL rules. Would a Patriots-Bengals swap have mattered? Does it increase or decrease the odds of the Broncos defeating the Panthers in a little more than a week?

First, we take a look at the continuity correction method for building a probability of winning a game. If we consider n_w to be the number of wins for a team, n_l to be the number of losses for a team, and n_{t} = n_{w}+n_{l} to be the total number of games played, we are able to use some form of a Bernoulli, binomial, or beta type distribution to estimate the probability of winning a game, p.

An estimate for p is typically \hat{p} = \frac{n_w}{n_t}, which is just the common winning percentage. However, if a team wins its first game, it is seen as unbeatable: \hat{p} = 1.0000. We know this is usually not true. A standard fix, Laplace's rule of succession (often described as a continuity correction), adds one win and one loss to every team's record. This correction is given by equation 1:

\hat{p} = \frac{n_w + 1}{n_t + 2}.

This implies that a team that has never played a game is viewed as having a “fifty-fifty” chance of winning. As a team plays, its probability of winning a random game moves toward 1 or 0 in a conservative fashion.

Now that we have a long-established estimate for the probability of winning games, we look at a clever way of writing a very simple number, the number of wins, identified by Colley (2002). From Colley, we can write the number of wins as

n_w = \frac{n_w - n_l}{2} + \frac{n_t}{2}.

This is the first step in writing the number of wins in a clever way. The second step is to note that

\frac{n_t}{2} = \sum_{i=1}^{n_t} \frac{1}{2}.

This states that half the total number of games is the same as adding 1/2 a total of n_t times. As a concrete example: 5/2 = 1/2 + 1/2 + 1/2 + 1/2 + 1/2. Now note that 1/2 is just the continuity-corrected probability of winning a random game for a team that has not played any games.
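Both pieces so far fit in a few lines of Python. In this minimal sketch, the helper names p_hat and colley_wins are our own (hypothetical, not from Colley's paper or any library), and exact fractions keep the arithmetic visible. The second function checks the rewriting of n_w, where each 1/2 is equation 1 for a team with no games played:

```python
from fractions import Fraction

def p_hat(wins: int, games: int) -> Fraction:
    """Continuity-corrected win probability from equation 1: (n_w + 1) / (n_t + 2)."""
    if not 0 <= wins <= games:
        raise ValueError("need 0 <= wins <= games")
    return Fraction(wins + 1, games + 2)

def colley_wins(wins: int, losses: int) -> Fraction:
    """Rewrite n_w as (n_w - n_l)/2 plus one p_hat(0, 0) = 1/2 per game played."""
    games = wins + losses
    return Fraction(wins - losses, 2) + sum(p_hat(0, 0) for _ in range(games))

print(p_hat(0, 0))         # 1/2: no games played yet
print(p_hat(1, 1))         # 2/3: one win is not "unbeatable"
print(p_hat(12, 16))       # 13/18: a 12-4 season
print(colley_wins(12, 4))  # 12: (12 - 4)/2 + 16 * 1/2 = 4 + 8
```

The identity holds for any record, since the n_t halves add back exactly the games that the (n_w - n_l)/2 term leaves out.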

If we instead replace each 1/2 with the estimated probability that the opponent in that game wins a random game, the sum becomes what's called the strength of schedule, and the left-hand side becomes an effective number of wins, n_w^{\text{eff}}. Writing j_k for the opponent in game k and \hat{p}_{j_k} for that opponent's estimated probability of winning a random game, this gives equation 2:

n_w^{\text{eff}} = \frac{n_w - n_l}{2} + \sum_{k=1}^{n_t} \hat{p}_{j_k}.

Once a series of games has been played, we must then estimate these probabilities. This is a difficult task, as a team's estimated probability depends on its opponents' estimated probabilities, which in turn depend on their own opponents, and so on. So we write one equation per team, with the probabilities as unknowns, and solve the whole system at once using linear algebra.

To build the system of equations, we add one to each side of equation 2:

n_w^{\text{eff}} + 1 = 1 + \frac{n_w - n_l}{2} + \sum_{k=1}^{n_t} \hat{p}_{j_k}

and then multiply the left side by 1, written as \frac{n_t + 2}{n_t + 2}:

\left(n_t + 2\right) \frac{n_w^{\text{eff}} + 1}{n_t + 2} = 1 + \frac{n_w - n_l}{2} + \sum_{k=1}^{n_t} \hat{p}_{j_k}.

Note that \frac{n_w^{\text{eff}}+1}{n_t+2} is equation 1 applied to the effective number of wins: the schedule-adjusted probability of team i winning a random game, regardless of opponent, \hat{p}_i. Using the notation n_{w,i}, n_{l,i}, and n_{t,i} for the number of wins, losses, and games for team i, respectively, and moving the sum to the left-hand side, we can write the system of equations as:

\left(n_{t,i} + 2\right) \hat{p}_i - \sum_{k=1}^{n_{t,i}} \hat{p}_{j_k} = 1 + \frac{n_{w,i} - n_{l,i}}{2}

for i = 1, \dots, N, where N is the number of teams in the league and j_k now denotes team i's opponent in game k. This is a system of N equations in N unknowns.

That’s a lot of notation, and it may not even look like a system of equations. But let’s consider what it looks like for one team in the league. Grouping the sum by opponent, team i's equation starts from (n_{t,i} + 2)\hat{p}_i and, for each opponent j, subtracts n_{t,ij}\hat{p}_j, where n_{t,ij} is the number of games played between team i and team j and \hat{p}_j is team j's probability of winning a random game. Let’s think about this for a moment: moving that sum to the right-hand side and dividing by n_{t,i} + 2 shows that \hat{p}_i grows with the opponents' \hat{p}_j. A team's probability of winning a random game is therefore pulled down by a schedule of weak opponents, which have small \hat{p}_j, and pushed up by a schedule of strong opponents, which have large \hat{p}_j. Teams are expected to beat weaker teams and expected to lose against stronger teams, so results against strong opponents count for more.
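Reading team i's equation as an update rule makes the schedule adjustment easy to watch. With n_{t,ij} the number of games between teams i and j and \hat{p}_j team j's probability of winning a random game,

\hat{p}_i = \frac{1 + \frac{n_{w,i} - n_{l,i}}{2} + \sum_{j \ne i} n_{t,ij} \hat{p}_j}{n_{t,i} + 2},

so we can start every team at 1/2 and apply this update repeatedly (Jacobi iteration). The sketch below does this for a hypothetical three-team league, not from this article, in which A beats B, B beats C, and A beats C. It illustrates the equation; it is not presented as the solver used for the rankings here.

```python
# Jacobi iteration on the Colley equations for a hypothetical 3-team league.
# Games are (winner, loser) pairs; the teams and results are made up for illustration.
from collections import defaultdict

games = [("A", "B"), ("B", "C"), ("A", "C")]
teams = sorted({t for g in games for t in g})

wins = defaultdict(int)
losses = defaultdict(int)
meetings = defaultdict(int)  # meetings[(i, j)] = n_{t,ij}, games between i and j
for winner, loser in games:
    wins[winner] += 1
    losses[loser] += 1
    meetings[(winner, loser)] += 1
    meetings[(loser, winner)] += 1

p = {t: 0.5 for t in teams}  # every team starts at the fifty-fifty estimate
for _ in range(200):
    new_p = {}
    for i in teams:
        n_t = wins[i] + losses[i]
        schedule = sum(meetings[(i, j)] * p[j] for j in teams if j != i)
        new_p[i] = (1 + (wins[i] - losses[i]) / 2 + schedule) / (n_t + 2)
    p = new_p  # Jacobi: every team is updated from the previous sweep at once

for t in teams:
    print(t, round(p[t], 4))
```

The iteration settles at A = 0.7, B = 0.5, and C = 0.3. B is 1-1, and its win over C and loss to A balance out to exactly one half. Convergence is guaranteed here because each diagonal entry of the system (2 + n_{t,i}) outweighs the off-diagonal entries in its row.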

Collecting these equations for every team gives a matrix called the Colley matrix, C, which has the values c_{ii} = 2 + n_{t,i} down the diagonal for i = 1, \dots, N and the values c_{ij} = -n_{t,ij} off the diagonal (i \ne j). The vector d, which we call the average win differential, is the right-hand side of the system of equations: d_i = 1 + \frac{n_{w,i} - n_{l,i}}{2}. This is merely the number of games above .500 divided by 2, with the correction factor of 1 lingering. Written in matrix notation, the system of equations is then Cr = d, where r is the vector of unknown win probabilities \hat{p}_i: the vector we need to solve for!
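The construction of C and d takes only a few lines of Python. In this sketch, the helper name colley_system and the (winner, loser) input format are our own assumptions, and ties are ignored, as in the rest of this article:

```python
def colley_system(teams, games):
    """Build the Colley matrix C and right-hand side d from (winner, loser) pairs."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    C = [[0] * n for _ in range(n)]
    d = [1.0] * n  # the lingering correction factor of 1
    for k in range(n):
        C[k][k] = 2  # c_ii starts at 2 and gains 1 per game played
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w][w] += 1
        C[l][l] += 1
        C[w][l] -= 1  # c_ij = -n_{t,ij}: one more meeting between these teams
        C[l][w] -= 1
        d[w] += 0.5  # builds up (n_w - n_l) / 2
        d[l] -= 0.5
    return C, d

# Hypothetical league: A beats B, B beats C, A beats C.
C, d = colley_system(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
for row, rhs in zip(C, d):
    print(row, rhs)
```

For this league, each team has played 2 games, so every diagonal entry is 4, every pair has met once (-1), and d = [2.0, 1.0, 0.0].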

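Now, there’s a subtlety here, though it turns out to be harmless. Each diagonal entry, 2 + n_{t,i}, is larger than the sum of the magnitudes of the off-diagonal entries in its row, which is at most n_{t,i}. A symmetric matrix with a positive, strictly dominant diagonal is positive definite, so C is always invertible. This holds in Major League Baseball, the National Basketball Association, and the National Football League, and also early in a season or in a sparse schedule such as NCAA Division I football. The solution for r can therefore be found by matrix inversion of C or, more efficiently, by a Cholesky factorization; a pseudo-inverse is not needed.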
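Because every diagonal entry 2 + n_{t,i} exceeds the total magnitude of the off-diagonal entries in its row, C is symmetric positive definite, so a Cholesky factorization C = LL^T solves Cr = d without an explicit inverse. The pure-Python sketch below uses a helper, cholesky_solve, that is our own. In practice, a linear-algebra library would do this. The sketch solves the hypothetical three-team system in which A beats B and C, and B beats C:

```python
import math

def cholesky_solve(C, d):
    """Solve C r = d for a symmetric positive definite C via C = L L^T."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(C[i][i] - s)  # positive because C is positive definite
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    # Forward substitution: L y = d.
    y = [0.0] * n
    for i in range(n):
        y[i] = (d[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T r = y, where L^T[i][k] = L[k][i].
    r = [0.0] * n
    for i in reversed(range(n)):
        r[i] = (y[i] - sum(L[k][i] * r[k] for k in range(i + 1, n))) / L[i][i]
    return r

# Hypothetical league: A beats B, B beats C, A beats C.
C = [[4, -1, -1], [-1, 4, -1], [-1, -1, 4]]
d = [2.0, 1.0, 0.0]
print([round(x, 4) for x in cholesky_solve(C, d)])
```

The result is r = [0.7, 0.5, 0.3], the same ratings the update-rule view of the equations converges to.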

As a quick comparison, let’s take a look at two simple examples, starting with the Bay Valley Conference high school basketball standings:

School Name  Conf W-L  Conf Pct.  Conf PF  Conf PA  W-L    Pct.   PF    PA    Streak
Freedom      4-0       1.000      274      204      15-4   0.789  1207  1042  4 W
Pittsburg    3-1       0.750      229      215      13-7   0.650  1158  1054  1 W
Heritage     2-2       0.500      282      272      16-4   0.800  1508  1212  1 L
Liberty      2-2       0.500      267      285      15-5   0.750  1291  1135  1 L
Deer Valley  1-3       0.250      266      274      10-9   0.526  1235  1191  1 W
Antioch      0-4       0.000      203      271      4-16   0.200  1058  1211  10 L
(Conf = conference games only; PF/PA = points for/against.)

The conference record counts only games played among the 6 conference teams, located in Contra Costa County, California, in the cities of Oakley (Freedom), Pittsburg, Brentwood (Liberty, Heritage), and Antioch (Deer Valley, Antioch). As we can see, these standings are well ordered. When we look at the win matrix, we have

             Freedom  Pittsburg  Heritage  Liberty  Deer Valley  Antioch
Freedom      X        W          W         W        -            W
Pittsburg    L        X          W         -        W            W
Heritage     L        L          X         W        W            -
Liberty      L        -          L         X        W            W
Deer Valley  -        L          L         L        X            W
Antioch      L        L          -         L        L            X
(Each entry is the row team's result against the column team; "-" = not yet played.)

The Colley matrix then has c_{ii} = 6 for i = 1, \dots, 6 (the “X” positions in the win matrix), since each team has played 4 conference games, and c_{ij} = -1 for every place we see a “W” or “L” in the win matrix. Otherwise, the values are zero. The matrix C has full rank and is therefore invertible. The vector d is [3, 2, 1, 1, 0, -1]’. The resulting solution gives:
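Written out in the standings order (Freedom, Pittsburg, Heritage, Liberty, Deer Valley, Antioch), the system is shown below. The zeros are the three pairs that have not yet met: Freedom and Deer Valley, Pittsburg and Liberty, and Heritage and Antioch.

C = \begin{bmatrix} 6 & -1 & -1 & -1 & 0 & -1 \\ -1 & 6 & -1 & 0 & -1 & -1 \\ -1 & -1 & 6 & -1 & -1 & 0 \\ -1 & 0 & -1 & 6 & -1 & -1 \\ 0 & -1 & -1 & -1 & 6 & -1 \\ -1 & -1 & 0 & -1 & -1 & 6 \end{bmatrix}, \qquad d = \begin{bmatrix} 3 \\ 2 \\ 1 \\ 1 \\ 0 \\ -1 \end{bmatrix}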

Freedom 0.8125
Pittsburg 0.6458
Heritage 0.5417
Liberty 0.4792
Deer Valley 0.3125
Antioch 0.2083
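These numbers can be reproduced exactly. The sketch below, in plain Python, lists the twelve conference games implied by the standings and win matrix and solves Cr = d with exact fractions. Here, solve_exact is our own small Gauss-Jordan helper, not a library function; for larger leagues, a linear-algebra library would be the practical choice.

```python
from fractions import Fraction

def solve_exact(C, d):
    """Solve C r = d by Gauss-Jordan elimination with exact fractions (small systems only)."""
    n = len(C)
    A = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(C, d)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

teams = ["Freedom", "Pittsburg", "Heritage", "Liberty", "Deer Valley", "Antioch"]
# The twelve conference games so far, as (winner, loser) pairs.
games = [
    ("Freedom", "Pittsburg"), ("Freedom", "Heritage"), ("Freedom", "Liberty"),
    ("Freedom", "Antioch"), ("Pittsburg", "Heritage"), ("Pittsburg", "Deer Valley"),
    ("Pittsburg", "Antioch"), ("Heritage", "Liberty"), ("Heritage", "Deer Valley"),
    ("Liberty", "Deer Valley"), ("Liberty", "Antioch"), ("Deer Valley", "Antioch"),
]

index = {team: k for k, team in enumerate(teams)}
n = len(teams)
C = [[2 if i == j else 0 for j in range(n)] for i in range(n)]  # c_ii starts at 2
d = [Fraction(1)] * n                                           # d_i starts at 1
for winner, loser in games:
    w, l = index[winner], index[loser]
    C[w][w] += 1
    C[l][l] += 1
    C[w][l] -= 1
    C[l][w] -= 1
    d[w] += Fraction(1, 2)
    d[l] -= Fraction(1, 2)

ratings = solve_exact(C, d)
for team, r in zip(teams, ratings):
    print(f"{team:12s} {float(r):.4f}  ({r})")
```

The exact ratings are 13/16, 31/48, 13/24, 23/48, 5/16, and 5/24, which round to the four-decimal values listed above.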

This ranking is exactly as we expect for the well-ordered set-up. Let’s take a look at another high school conference in California.

School Name  Conf W-L  Conf Pct.  Conf PF  Conf PA  W-L    Pct.   PF    PA    Streak
Colony       7-0       1.000      522      312      18-6   0.750  1634  1264  2 W
Ontario      4-3       0.571      380      438      10-10  0.500  1107  1128  1 L
Alta Loma    3-4       0.429      374      409      8-15   0.348  1168  1304  3 L
Montclair    3-4       0.429      346      415      8-14   0.364  1220  1369  1 W
Don Lugo     3-4       0.429      434      405      6-16   0.273  1337  1270  2 W
Chaffey      1-6       0.143      346      423      4-17   0.190  1082  1284  4 L

The conference record again counts only games played among the 6 teams in the Mt. Baldy conference, located in San Bernardino County, California, in the cities of Ontario (Colony, Ontario, Chaffey), Alta Loma, Montclair, and Chino (Don Lugo). Here, the standings are a lot more muddled compared to the Bay Valley conference above. When we look at the win matrix, we have

             Colony  Ontario  Alta Loma  Montclair  Don Lugo  Chaffey
Colony       X       WW       W          WW         W         W
Ontario      LL      X        W          L          W         WW
Alta Loma    L       L        X          LW         LW        W
Montclair    LL      W        WL         X          W         L
Don Lugo     L       L        WL         L          X         WW
Chaffey      L       LL       L          W          LL        X

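The matrix C is populated as follows (rows and columns in standings order: Colony, Ontario, Alta Loma, Montclair, Don Lugo, Chaffey):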

9 -2 -1 -2 -1 -1
-2 9 -1 -1 -1 -2
-1 -1 9 -2 -2 -1
-2 -1 -2 9 -1 -1
-1 -1 -2 -1 9 -2
-1 -2 -1 -1 -2 9

The vector d is [4.5, 1.5, 0.5, 0.5, 0.5, -1.5]’. Since the matrix C has rank 6, it is invertible. This gives a solution of:

Colony 0.8540
Ontario 0.5602
Montclair 0.4794
Alta Loma 0.4398
Don Lugo 0.4188
Chaffey 0.2479
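The same exact solve can be run directly on the Mt. Baldy matrix C and vector d given above. This sketch is plain Python; solve_exact is our own helper, repeated so the block stands alone. It also checks that the ratings sum to N/2 = 3, which must hold whenever every game is played within the group, because each column of C then sums to 2 and the entries of d sum to N.

```python
from fractions import Fraction

def solve_exact(C, d):
    """Solve C r = d by Gauss-Jordan elimination with exact fractions (small systems only)."""
    n = len(C)
    A = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(C, d)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

teams = ["Colony", "Ontario", "Alta Loma", "Montclair", "Don Lugo", "Chaffey"]
# Colley matrix and right-hand side exactly as given above (standings order).
C = [
    [9, -2, -1, -2, -1, -1],
    [-2, 9, -1, -1, -1, -2],
    [-1, -1, 9, -2, -2, -1],
    [-2, -1, -2, 9, -1, -1],
    [-1, -1, -2, -1, 9, -2],
    [-1, -2, -1, -1, -2, 9],
]
d = [Fraction(9, 2), Fraction(3, 2), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), Fraction(-3, 2)]

ratings = solve_exact(C, d)
for team, r in sorted(zip(teams, ratings), key=lambda pair: pair[1], reverse=True):
    print(f"{team:10s} {float(r):.4f}")
# Every game is inside the conference, so the ratings must average exactly 1/2.
print("sum of ratings:", sum(ratings))
```

The exact ratings are Colony 2029/2376, Ontario 1331/2376, Montclair 1139/2376, Alta Loma 1045/2376, Don Lugo 995/2376, and Chaffey 589/2376. They sum to exactly 3 and keep Montclair ahead of Alta Loma.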

This suggests that although the Mt. Baldy standings place Alta Loma ahead of Montclair (both are 3-4), Montclair has proven to be the stronger team in Mt. Baldy conference play, judged only by wins and losses adjusted for strength of schedule.

Ranking the NFL Using the Colley Matrix

Now that we’ve seen how to use this common ranking method for separating out teams based on wins, losses, and strength of schedule, we are able to apply it to the NFL. In this case, the matrix is 32×32 and the rank of the matrix C is 32. The resulting rankings are:

Team             Ranking  AFC Playoffs  NFC Playoffs
Carolina         0.8260                 1st
Arizona          0.7892                 2nd
Denver           0.7442   1st
Cincinnati       0.7243   3rd
Minnesota        0.7007                 3rd
Kansas City      0.6868   5th
New England      0.6753   2nd
Green Bay        0.6643                 5th
Seattle          0.6518                 6th
Pittsburgh       0.6317   6th
New York Jets    0.5432
Houston          0.5229   4th
Washington       0.5115                 4th
Detroit          0.5017
St. Louis        0.4899
Buffalo          0.4771
Oakland          0.4735
Indianapolis     0.4711
Atlanta          0.4646
Chicago          0.4524
Philadelphia     0.4289
New Orleans      0.4247
San Francisco    0.3882
New York Giants  0.3688
Tampa Bay        0.3573
Baltimore        0.3557
Miami            0.3402
San Diego        0.3182
Jacksonville     0.2860
Dallas           0.2802
Cleveland        0.2610
Tennessee        0.1890
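A quick sanity check on this table, assuming (as the method here does) no ties: in a closed league, each column of C sums to 2 and the entries of d sum to N, so the Colley ratings always sum to N/2, an average of exactly 1/2. With 32 teams, the ratings above should add up to 16, up to rounding:

```python
# Colley ratings from the NFL table above, in table order (rounded to 4 decimals).
ratings = [
    0.8260, 0.7892, 0.7442, 0.7243, 0.7007, 0.6868, 0.6753, 0.6643,
    0.6518, 0.6317, 0.5432, 0.5229, 0.5115, 0.5017, 0.4899, 0.4771,
    0.4735, 0.4711, 0.4646, 0.4524, 0.4289, 0.4247, 0.3882, 0.3688,
    0.3573, 0.3557, 0.3402, 0.3182, 0.2860, 0.2802, 0.2610, 0.1890,
]
n_teams = len(ratings)
total = sum(ratings)
print(n_teams, round(total, 4), round(total / n_teams, 4))
```

The 32 ratings sum to about 16.0004, consistent with a 32-team league whose ratings average one half.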

Here, we see that the Carolina Panthers are indeed the top team in the NFL, according to the Colley ranking system. Furthermore, when the AFC and NFC playoff seeds are placed alongside the Colley rankings, we see that the New York Jets (11th) are the only team in the Colley top 12 left out of the playoffs, due to scheduling and the NFL's playoff seeding rules, while Washington (13th) made it in as the NFC's 4th seed.

Now, updating these rankings with the playoff results, we see that the Carolina Panthers and Denver Broncos move a little closer to each other, with Carolina still retaining the top spot. This indicates that the Carolina Panthers are expected to win the Super Bowl. Note that we said expected, not predicted. This is in part because the rankings model is an a posteriori model; that is, the model reflects only on the past.

For a predictive model, we may wish to break out a regression-type method, such as multi-logit regression, to reflect the updated probabilities. Doing this based solely on wins and losses indicates that the Panthers have a 56% chance of winning the Super Bowl. We are unable to produce an expected score, as this model does not take scoring into account.

To take scoring into account, we use a Poisson regression model, which, using many more factors, indicates that the Carolina Panthers should win the contest by a score of 34-20.

