In light of the continual chest-beating by the SEC about Ohio State’s schedule, we thought we would take a moment to look at common methods for ranking teams from a mathematical viewpoint and gain insight into the impact of schedules. While we are accustomed to playoffs and all the drama they bring, we find that the best team often does not win the championship. This is quite apparent in NCAA March Madness, the NFL Playoffs, and, as of late, the NCAA Football Playoff. The old adage that being good and being hot is all you need to win a championship typically rings true. The 2010-11 Green Bay Packers, the 2011-12 New York Giants, and every double-digit seed to crack the Elite Eight would agree: any team can win on any given night. However, in order to make these tournaments, some criteria must be met.
In the NFL, the criteria are simple: win your division or post one of the best records in the league. The schedule is built so that each team sees a broad cross-section of divisions across the league. With only 32 NFL teams, a 16-game schedule gives a good sense, based on record alone, of who the top teams are.
In NCAA Basketball, approximately 350 Division I teams vie for 68 tournament spots. With only 25-30 games per team, record is far less representative than in the NFL: an NCAA team may play only about eight percent of all other teams, while an NFL team plays over forty percent. There are 32 conferences in NCAA basketball, and only the conference tournament champions (plus the Ivy League regular-season champion) are guaranteed a spot in the tournament. Winning your conference’s regular-season title does not outright guarantee a spot in the NCAA tournament; that distinction is held for the NIT. This leaves 36 at-large spots to be vied for.
These 36 spots are hotly contested and heavily scrutinized each year. With the model explained here, we will show that 65 of our top 68 teams were granted spots in the NCAA tournament, just one pick off from ESPN’s Joe Lunardi.
In NCAA Football, each team plays about 13 games, roughly 9 of them in conference. With 128 NCAA FBS teams, that gives each team a chance to play about ten percent of the FBS. And, quite unlike the NCAA Basketball tournament, there aren’t 68 spots in the end-of-season tournament. There are four. Furthermore, those four spots are determined by rankings alone.
So how do we go about developing a rankings model? There are many approaches. The simplest is to count wins. If you win the most, then you’re the best, right? Not quite. A team may opt to move to a weaker conference and build a weak non-conference schedule to ensure 12-13 wins per season. The next refinement is to weight each win by the opponent’s overall record. This at least gives insight into the quality of the opponent, but it has its own drawbacks: why should a team be penalized for beating a team that had a bad day against the worst team in the league? Questions like this lead to identifying the other factors that contribute to ranking a team. We call this the data science of building a rankings model.
1. Variable Building: To Use or Not Use Scores?
One common method statisticians use is to take the scores of games. If Alabama beats the Poor Sisters of the Blind 72-0 and Louisiana State manages to win 72-3 against the same team a week later, how do we compare LSU and Alabama? Raw scores fail to account for several factors: backups may be getting playing time they would never see in an Alabama-LSU game, and both teams may be trying out plays they would never run in a close game. On top of that, we get into the 1990s problem of teams running up the score because it improved their status in the polls, an act commonly frowned upon in college sports.
Instead, many statisticians place a diminishing-returns model on scores. That is, they weight point differentials so that blowouts get less credence than the raw margin suggests. Winning by 69 or 72 says little more than winning by 35: either way, the opponent was blown out. One simple method for diminishing returns is to put the point differential on a logarithmic scale. For example, a win by 1 is seen as a win by 1. A win by 7 is seen as a win by 2.95. A win by 35 is seen as a win by 4.56. A win by 72 is seen as a win by 5.28. The “data sciency” part is choosing the scaling factor, that is, how strong a diminishing return the statistician wants to impose.
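The figures quoted above correspond to a simple 1 + ln(margin) transform. A minimal sketch in Python:

```python
import math

def scaled_margin(point_diff):
    """Compress a raw point differential with a logarithmic
    diminishing-returns curve: 1 + ln(margin)."""
    return 1 + math.log(point_diff)

for margin in (1, 7, 35, 72):
    print(margin, round(scaled_margin(margin), 2))
# 1 -> 1.0, 7 -> 2.95, 35 -> 4.56, 72 -> 5.28
```

Swapping the base of the logarithm, or multiplying the log term by a constant, is exactly the “scaling factor” decision described above.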
2. Variable Building: Who’s Actually Playing?
Many statisticians will also attempt to take playing time into account. As a premier example, the University of Wisconsin Badgers played Rutgers during the 2014-15 NCAA basketball season. Early in the game, star point guard Traevon Jackson went down with a foot injury, and sophomore guard Bronson Koenig was unprepared to handle point-guard duties on a moment’s notice. The Badgers fell to the Scarlet Knights while scrambling to adjust their game plan. This was only a minor hiccup, but in many models it dropped the Badgers’ end-of-season resume to a weak #1 seed or the top #2 seed.
This becomes another difficult variable to measure. Some statisticians will weight each game by an injury-impact score. One simple measurement is to compare the minutes actually played by the starting rotation over the first three-quarters of the game against the minutes that rotation would normally play over that span. The idea is that if the game is close, the top players are in; if there’s a lead, the top players are in extending it; if the team is behind, the top players are in. If a player is injured, they are not in. The final quarter’s worth of action is excluded to account for blowouts.
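One way such a weight could be computed, as a rough sketch (the function name and the minute totals below are hypothetical, not from any real box score):

```python
def injury_weight(starter_minutes, expected_minutes):
    """Ratio of starter minutes actually played to starter minutes
    expected over the first three-quarters of the game. 1.0 means the
    normal rotation was on the floor; lower values flag injuries or
    absences. Capped at 1.0 so heavy starter usage isn't a bonus."""
    return min(starter_minutes / expected_minutes, 1.0)

# Hypothetical game: starters logged 120 of an expected 135 minutes.
w = injury_weight(120, 135)
```

A game with a depleted rotation then contributes less to a team’s rating, which is the effect the Wisconsin-Rutgers example motivates.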
The extra work comes in deciding whether to use more than just the starters, and how much of each game to include. Every additional parameter must be tuned, and each one adds noise to the model estimate.
3. Variable Building: Does Location Matter?
Almost all statisticians use game location; the question is how to build that variable. Simple statisticians, myself included, use a plus-minus-zero scale: a home game is “+1”, a road game at your opponent’s is “-1”, and a neutral-site game is “0”. More sophisticated models measure the distances fans travel to games or, on a related scale, the distance the team travels to the game location. Either way, the evidence points to a real home-field advantage.
The problem then becomes: how do we weight travel? Will Hawai’i always have a huge advantage when opponents come to town? So far this season, Hawai’i is 1-0 at home after a 28-20 victory over Colorado and 0-1 on the road after a 38-0 defeat at Ohio State. But how does this get quantified? If using travel distance, is there a noticeable difference between 2,400 miles and 2,800 miles? Does the number of time zones crossed matter? Whatever variables the statistician uses, they have to defend their decisions.
Of course, the easiest way to defend those decisions is to look at the amount of variation explained. That is, we look at the outcomes of games; if we can find the right set of variables to predict those outcomes correctly, then we have a perfect model. For the uninitiated, statistics is the science of explaining variability in data. If by chance you find a perfect model, then you have most likely done something gravely wrong. In reality, we look for an optimal set of variables that explains variation as well as possible without over-fitting the model to the observed values.
Wait. Don’t Rankings Models Say Who’s Supposed to Win?
The short answer is no. Almost never. A rankings model is supposed to identify the order of teams from best to worst given the set of variables selected. Furthermore, the model produces an expected value. An expected value indicates who should win, based on your variables, more often than not. It should be well known that every game is a sample from a random process. If games weren’t viewed that way, then sports gambling would never exist, playoffs would never be held, and we wouldn’t be debating the best teams in history. A rankings model will never tell a person who will win, but rather says that, on average, one team has a better chance of winning.
Applications to the 2015 NCAA March Madness
For the 2015 NCAA Division I Basketball tournament, we took into consideration all 5,788 NCAA Division I basketball games among the 351 teams. For fairness, we removed any games involving Division II, Division III, or non-NCAA opponents. From this reduced set, we built three variables: game location (-1, 0, +1), diminishing returns on point differential, and the number of minutes played by starters, weighted by halftime. The response is either a 1 (win) or 0 (loss). Performing a weighted generalized linear model regression against this approximately 5,700-by-350 matrix, we obtained the following ranking scores:
Taking the 32 conference champions as given, we found that of the remaining 36 teams, we adequately predicted 33 to gain entry into the tournament. Our whiffs were Cincinnati, Mississippi, and Boise State. By comparison, Joe Lunardi missed on Indiana and UCLA, opting instead for Temple and Colorado State. The three teams we erroneously included were Colorado State, Illinois, and Miami (FL).
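The regression setup described above can be sketched on a toy schedule. Every team, game, and location below is made up purely for illustration; each game is a row with a +1/-1 pair marking the two teams plus the location variable, and we use scikit-learn’s regularized logistic regression as a stand-in for the weighted GLM:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy schedule: 4 teams. Columns 0-3 mark the matchup (+1 = first
# team, -1 = opponent); column 4 is location from the first team's
# perspective (+1 home, -1 road, 0 neutral). All games are invented.
teams = ["A", "B", "C", "D"]
X = np.array([
    [ 1, -1,  0,  0,  1],   # A beat B at home
    [ 1,  0, -1,  0, -1],   # A beat C on the road
    [ 0,  1, -1,  0,  0],   # B beat C on a neutral floor
    [ 0,  0,  1, -1,  1],   # C beat D at home
    [ 1,  0,  0, -1,  0],   # A beat D on a neutral floor
    [ 0, -1,  0,  1,  1],   # D hosted B and lost
])
y = np.array([1, 1, 1, 1, 1, 0])  # 1 = first team won

# sample_weight is where a diminished point differential or an
# injury weight would go; here every game counts equally.
model = LogisticRegression(fit_intercept=False)
model.fit(X, y, sample_weight=np.ones(len(y)))

# The fitted team coefficients serve as ranking scores.
ratings = dict(zip(teams, model.coef_[0][:4]))
print(sorted(ratings, key=ratings.get, reverse=True))
```

The L2 penalty in scikit-learn’s default settings also keeps the coefficients finite when a team is undefeated, which would otherwise separate the data perfectly.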
Applications to NCAA Division I Football Bowl Series
The 2014 NCAA FBS season was a curious case of how a team can win and still drop violently in the rankings. TCU held the third spot in the rankings before drubbing Iowa State 55-3 in its home finale, then fell to sixth. In fairness, TCU had lost to fifth-ranked Baylor 61-58 earlier in the year, its only loss of the season; Baylor, however, was dropped by a much weaker West Virginia team. The postseason backed up the case that TCU should have been ranked higher: TCU beat Ole Miss soundly while Baylor fell to Michigan State in its bowl game.
For our model, we took into account only game location and diminishing returns on scores. When doing this, we obtained the following results:
Here, our tournament would have been Florida State – Baylor and TCU – Alabama. We would have missed the rampage that Ohio State unleashed on Alabama and Oregon to win the FBS Championship. As noted above, that Ohio State team was riddled with injuries, and its loss to an unranked Virginia Tech did not help its cause in our model.
To illustrate the difficulty of determining weights: if we also take into account the number of snaps played by starting players, Ohio State’s score changes to 7049. This would place the Buckeyes fifth, with Oregon moving to fourth and Baylor dropping to sixth, a better indication of the actual playoff field.
This brings us back to where we started. Who is the best team in FBS right now? With the bickering coming from the SEC, we can apply a simple model and produce a preliminary set of rankings. With no history attached and straight “who beat whom” variables, we note that only 89 games have been played through tonight, yet we are tasked with ranking 128 teams. This raises a new question: how do we rank teams in that scenario?
There are many methods for that, such as sparse matrix completion, random forest proximity calculations, bringing in a different set of variables (historical data included), and Markov chain simulations. Using a pseudo-inverse to complete our calculations, we find that the best team in the land, based on 89 games, is indeed Ohio State. Second? Toledo (1-0 in FBS), who by the way just beat an SEC team (16-12 over Arkansas).
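A pseudo-inverse ranking of this flavor can be sketched in a few lines. The first two games below come straight from the text (Ohio State 38-0 over Hawai’i, Toledo 16-12 over Arkansas); the third game is entirely hypothetical, added only so the toy schedule is connected. With so few games the game matrix is rank-deficient, which is exactly why the Moore-Penrose pseudo-inverse is needed:

```python
import numpy as np

# Massey-style least squares: each row marks winner (+1) and loser
# (-1); the response is the margin of victory. Raw margins are used
# here, though a diminished margin would slot in the same way.
teams = ["Ohio State", "Toledo", "Arkansas", "Hawai'i"]
M = np.array([
    [1.0, 0.0,  0.0, -1.0],   # Ohio State over Hawai'i by 38
    [0.0, 1.0, -1.0,  0.0],   # Toledo over Arkansas by 4
    [1.0, 0.0, -1.0,  0.0],   # hypothetical game, margin 10
])
margins = np.array([38.0, 4.0, 10.0])

# pinv returns the minimum-norm least-squares solution, so the
# ratings are well defined even though M is not full rank.
ratings = np.linalg.pinv(M) @ margins
ranking = sorted(zip(teams, ratings), key=lambda t: -t[1])
print(ranking)
```

On this toy slate the minimum-norm ratings sum to zero, so a team’s score reads directly as points better (or worse) than average.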