With the new year coming up, we will be posting our NCAA rankings based on single-season, non-prior-induced metrics for predicting which teams should make the NCAA tournament. In a typical year, we get between 65 and 68 teams correct; last year was a bust at 64. As a side note, three of the four teams we missed were number one seeds in the NIT Tournament, so not all was lost. One reason we perform so well is that our ranking algorithms focus on multiple aspects: scheduling, strength reduction, diminishing returns on point differential, and trending. We treat each ranking algorithm as a random manatee: that is, a judge with some, but not much, idea of how teams should be ranked, much like the eye-test guys. We then use each random manatee as input into a probability distribution over orderings of items; in this case, teams.
One of the biggest troubles I see when folks “combine” rankings is that they don’t actually know how to combine rankings. Typically, they add ranks and call it a day. In fact, one NBA team I worked with simply added ranks. When I convinced them to use statistical principles, the draft model cleaned up and dropped specific players who are now out of the league.
In this article, we give insight into how to combine rankings without sacrificing the integrity of the analytics. To do this, instead of spilling the secret sauce behind our NCAA rankings, we look at a few ranking algorithms for NBA players: RAPM, RPM, Win Shares, BPM, and PIPM. But first… an example.
Simple Voting Exercise
Let’s consider a simple voting exercise for Most Valuable Player. Suppose that three judges are allowed to submit an ordering of five players, previously agreed upon by the trio. Suppose for this past year they agreed to vote on Russell Westbrook, LeBron James, James Harden, Stephen Curry, and Anthony Davis. Let’s suppose the first two judges submit their rankings with identical ranks:
- James Harden
- LeBron James
- Anthony Davis
- Russell Westbrook
- Stephen Curry
However, the third judge despises James Harden and knows how the other two voters are going to rank their players. To combat this, the judge submits his ballot as
- LeBron James
- Anthony Davis
- Russell Westbrook
- James Harden
- Stephen Curry
So who wins the MVP voting? Under MVP voting rules, James Harden finishes with 23 points while LeBron James finishes with 24 points. This means LeBron James wins the MVP race despite losing the popular vote, and despite another player (Harden) holding a majority of the first-place votes. This scoring process is akin to the voting method known as Borda counting.
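To make the tally concrete, here is a minimal sketch using the standard NBA MVP point weights of 10-7-5-3-1 for the five ballot spots:

```python
# Standard NBA MVP point weights for 1st through 5th place votes.
WEIGHTS = [10, 7, 5, 3, 1]

ballots = [
    ["Harden", "James", "Davis", "Westbrook", "Curry"],  # Judge 1
    ["Harden", "James", "Davis", "Westbrook", "Curry"],  # Judge 2
    ["James", "Davis", "Westbrook", "Harden", "Curry"],  # Judge 3 (strategic)
]

# Award each player the points for the spot they occupy on each ballot.
points = {}
for ballot in ballots:
    for place, player in enumerate(ballot):
        points[player] = points.get(player, 0) + WEIGHTS[place]

print(points)  # Harden: 23, James: 24 -- James wins despite losing the popular vote
```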
Borda Counting Can be Easily Gamed
And it’s not even a statistically viable methodology…
Borda counting is the process of adding ranks together. In the above example, the Borda counting solution would be
- LeBron James – 5
- James Harden – 6
- Anthony Davis – 8
- Russell Westbrook – 11
- Stephen Curry – 15
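The Borda totals above can be reproduced by summing each player’s rank position across the three ballots:

```python
ballots = [
    ["Harden", "James", "Davis", "Westbrook", "Curry"],  # Judge 1
    ["Harden", "James", "Davis", "Westbrook", "Curry"],  # Judge 2
    ["James", "Davis", "Westbrook", "Harden", "Curry"],  # Judge 3 (strategic)
]

# Borda counting: sum each player's rank position (1 = best); lowest total wins.
totals = {}
for ballot in ballots:
    for rank, player in enumerate(ballot, start=1):
        totals[player] = totals.get(player, 0) + rank

for player, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(player, total)
```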
We see the one irrational judge gets to have his vote weighted more heavily than the other two judges merely because he was able to game the rank-aggregation methodology. While this example is cartoonish in nature, it’s not unfathomable for a cult-like bloc of voters in an obvious minority position (with respect to voting) to strategically down-vote a candidate in an effort to push that candidate down the rankings.
In Borda counting, the goal is to minimize a player’s summed rank across multiple judge rankings (analytics). In doing this, we ignore the quality of the analytic in question and treat all analytics as equal. Since all analytics are not equal, Borda counting effectively rewards the outliers as though they were premier voters/judges/analytics.
One question is how to identify the error associated with a ranking. That is, how do we measure the difference between two rankings? For the above example, we know that Judge 1 and Judge 2 have the exact same rankings; hence the distance between their rankings should be zero. For Judge 3, what’s the distance between his ranking and the other two judges’? We compute this by counting the minimum number of adjacent pairwise swaps needed to turn one list into the other.
For the example above, we can write Judge 1’s ordering as ABCDE. Similarly, Judge 3’s ordering is BCDAE. In this case, the distance between these lists is three: that’s the number of swaps required for BCDAE to become ABCDE:
- BCDAE -> BCADE
- BCADE -> BACDE
- BACDE -> ABCDE
By using this distance measure, we can prove this is indeed a metric on the space of all possible rankings. As a further exercise, we can show the furthest ranking from ABCDE is EDCBA. And that distance is ten.
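This swap distance equals the number of pairs the two rankings order differently, so it can be computed without simulating the swaps; a minimal sketch:

```python
def kendall_tau(a, b):
    """Minimum number of adjacent swaps turning ranking b into ranking a,
    i.e. the number of pairs the two rankings order differently."""
    pos = {item: i for i, item in enumerate(a)}
    seq = [pos[item] for item in b]
    # Count inversions; the quadratic scan is fine at this scale.
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])

print(kendall_tau("ABCDE", "BCDAE"))  # 3
print(kendall_tau("ABCDE", "EDCBA"))  # 10
```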
If we draw the resulting probability distribution, we see that not only is two-thirds of our mass on ABCDE, but the rest of the probability mass sits at a point a distance of three away, with several other possible rankings just a distance of one or two away. Because of this, the Borda counting solution, BACDE, sits in a low-probability location.
Maximum Likelihood Estimation: Not Borda Counting
Instead of Borda counting, we can use the probability distribution above and look for the maximum likelihood estimator. In this case, we care about the voters’ preferences rather than the games voters can play. In fact, our methodology should make the majority-vote winner the overall winner; a ranking that respects this is called a Condorcet ranking. And the maximum likelihood estimator here is the Kemeny-Young ranking.
In Kemeny-Young, the goal is to find the ranking that best fits the probability distribution; that is, to identify the highest-probability ranking given all pairwise combinations of items being ranked by judges. Let’s walk through the methodology using the MVP voting example above.
Step One: Generate Pairwise Comparisons
The first step of Kemeny-Young ranking is to look at all the pairwise comparisons given by the voters. Since we have five candidates, we obtain ten pairwise comparisons. For Judge 1, we can write the 10-by-2 voting matrix (each row a winner/loser pair) as

- Harden > James
- Harden > Davis
- Harden > Westbrook
- Harden > Curry
- James > Davis
- James > Westbrook
- James > Curry
- Davis > Westbrook
- Davis > Curry
- Westbrook > Curry

This matrix represents the ordering Harden-James-Davis-Westbrook-Curry. The result for Judge 2 is identical. This leaves us with irrational Judge 3:

- James > Harden
- Davis > Harden
- Westbrook > Harden
- Harden > Curry
- James > Davis
- James > Westbrook
- James > Curry
- Davis > Westbrook
- Davis > Curry
- Westbrook > Curry
Step Two: Aggregate All Judges’ Votes
We see that Harden loses the first three rows, but everything else remains the same. The Kemeny-Young ranking methodology then adds the pairwise voting matrices together. When we do this, we obtain the overall voting matrix:

- Harden > James: 2, James > Harden: 1
- Harden > Davis: 2, Davis > Harden: 1
- Harden > Westbrook: 2, Westbrook > Harden: 1
- Harden > Curry: 3, Curry > Harden: 0
- James > Davis: 3, Davis > James: 0
- James > Westbrook: 3, Westbrook > James: 0
- James > Curry: 3, Curry > James: 0
- Davis > Westbrook: 3, Westbrook > Davis: 0
- Davis > Curry: 3, Curry > Davis: 0
- Westbrook > Curry: 3, Curry > Westbrook: 0
Step Three: Find the Maximal Ranking!
Immediately we see that if we take the maximum across each row (the majority direction of each pair), we obtain the maximum likelihood estimator for the ranking, which matches Judge 1 and Judge 2’s ballots. We even get the majority-vote winner winning the MVP!
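The three steps can be sketched end to end; with five candidates, brute force over all 120 permutations is cheap:

```python
from itertools import permutations

ballots = [
    ["Harden", "James", "Davis", "Westbrook", "Curry"],  # Judge 1
    ["Harden", "James", "Davis", "Westbrook", "Curry"],  # Judge 2
    ["James", "Davis", "Westbrook", "Harden", "Curry"],  # Judge 3 (strategic)
]

# Steps one and two: aggregate pairwise wins across all ballots.
wins = {}
for ballot in ballots:
    for i, a in enumerate(ballot):
        for b in ballot[i + 1:]:
            wins[(a, b)] = wins.get((a, b), 0) + 1

def score(order):
    # Number of judge-level pairwise agreements with this ordering.
    return sum(wins.get((a, b), 0)
               for i, a in enumerate(order) for b in order[i + 1:])

# Step three: the Kemeny-Young ranking maximizes the agreement score.
best = max(permutations(ballots[0]), key=score)
print(list(best), score(best))
```

The optimal ordering agrees with all 27 of the 30 judge-level pairwise votes it can; only the strategic judge’s three anti-Harden rows are left unsatisfied.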
This process is harder than the example makes it look, as there may not be a unique solution. There may be situations where a circular preference (a Condorcet cycle) exists. In that case, all rankings achieving the maximal score are treated as equivalent.
Application to Player Rankings!
Let’s apply this to the 2018-19 single-season numbers. One of the benefits of aggregating player analytics is that we can see how robustly different each analytic is compared to its brethren. For this exercise, we take a look at RPM, RAPM, PIPM, BPM, and Win Shares. Using these five metrics, our goal is to identify the top 10 players in the league.
Since we can argue the merits of each of these analytics, let’s just assume that all are relatively blind, but good intentioned, much like the manatees from South Park.
RPM: Brought to you by Engelmann and Ilardi
Real Plus-Minus is one of the “black-box” analytics used to help users identify an estimate for a player’s net differential per 100 possessions. Under its disclaimer, the measurement leverages teammates, opponents, and “additional factors.” Using this metric, the current top 10 players in the league are
- Paul George – 7.64
- James Harden – 7.52
- Anthony Davis – 7.20
- Nikola Jokic – 6.60
- Kyrie Irving – 5.82
- LeBron James – 5.50
- Stephen Curry – 5.14
- Kyle Lowry – 5.06
- Nikola Vucevic – 5.03
- Kevin Durant – 4.82
RAPM: Regularized Adjusted Plus-Minus
RAPM is a ridge regression applied to lineup data. It does not have a prior distribution, nor does it use an augmented box-score data set. It’s simply on-off net differential that leverages penalization to avoid variance inflation. A lot has been written on this here. Using Ryan Davis’ current listing, the top-10 players under RAPM are given by
- Danny Green – 4.99
- Kevin Durant – 3.69
- Jrue Holiday – 3.29
- Maxi Kleber – 3.20
- Kyle Lowry – 2.88
- Paul George – 2.85
- Seth Curry – 2.76
- Giannis Antetokounmpo – 2.70
- Brook Lopez – 2.70
- Steven Adams – 2.59
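As a sketch of the mechanics (not Ryan Davis’ actual pipeline), a RAPM-style estimate is a closed-form ridge solve on stint data; the tiny design matrix and net ratings below are made up for illustration:

```python
import numpy as np

# Toy stint data: rows are stints, columns are four players.
# +1 while the player is on the court for the home side, -1 for the away
# side, 0 while off the floor. y is each stint's net rating.
X = np.array([
    [ 1.0,  1.0, -1.0, -1.0],
    [ 1.0, -1.0,  1.0, -1.0],
    [-1.0,  1.0,  1.0, -1.0],
    [ 1.0, -1.0, -1.0,  1.0],
])
y = np.array([6.0, 2.0, -3.0, 4.0])

def rapm(X, y, lam):
    # Ridge solution (X'X + lam*I)^{-1} X'y; larger lam shrinks toward 0,
    # which is how RAPM avoids the variance inflation of plain on-off.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta = rapm(X, y, lam=10.0)
print(np.round(beta, 2))  # per-player RAPM-style estimates
```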
PIPM: Brought to you by Goldstein
Player Impact Plus-Minus is yet another plus-minus algorithm, developed by Jacob Goldstein, that leverages a box-score prior distribution with a luck adjustment on top of 15 years’ worth of RAPM data. The idea is to smooth RAPM estimates in an effort to develop a posterior distribution that predicts slightly better than RAPM and RPM. It gives a slightly different top 10 than the other algorithms and passes the eye test just as well; so much so that many folks have started adopting it within the league.
Currently, the Top 10 players are given by
- Giannis Antetokounmpo – 6.08
- Kevin Durant – 5.72
- Paul George – 5.47
- Anthony Davis – 5.33
- Kyle Lowry – 4.62
- Stephen Curry – 4.61
- Joel Embiid – 4.54
- Kyrie Irving – 4.45
- Nikola Vucevic – 4.25
- Mike Conley – 4.06
Win Shares: Kubatko Recreation of Bill James
Win Shares, as obtained from Basketball-Reference, is derived as a metric that mimics Bill James’ same-named algorithm in Major League Baseball. It follows Dean Oliver’s Points Produced model and divides a player’s marginal offense by the league’s marginal points per win. The marginal points produced by each player become that player’s contribution to wins. Alas, win shares.
It’s gotten a bad rap around the league over the past few years, mainly for its inability to directly predict future values. It’s primarily used as a summarization tool, and even then it is often discarded in favor of points produced. Despite this, the listing is actually quite reasonable for a top 10:
- Anthony Davis – 5.80
- Kevin Durant – 5.60
- Rudy Gobert – 5.60
- Giannis Antetokounmpo – 5.50
- Paul George – 5.30
- LeBron James – 5.10
- James Harden – 5.00
- Damian Lillard – 5.00
- Clint Capela – 4.80
- Kawhi Leonard – 4.60
Box Plus-Minus: PER. Wait…huh?
Box Plus-Minus is yet another plus-minus algorithm that attempts to apply prior distributions using box-score data, but instead of focusing on lineup-based analysis, it focuses on rate-based analysis; and it shows, with many low-minute players creeping into the rankings. Due to this, BPM fails eye tests and requires filtering, which we will perform to at least make BPM palatable. As a side note, every team I have ever worked with has dismissed Box Plus-Minus; even more so since the arrival of PIPM.
As a side note, BPM suffers many of the exact same problems as John Hollinger’s Player Efficiency Rating (PER). And we will place these lists side by side for your viewing pleasure.
One rule of thumb in measuring the contribution of players is that if you have to filter, your metric is massively flawed and should never be trusted. But filter we must, as even Hollinger’s metric has a qualified tab to click on.
So to appease the ghosts of analytics past, we filter to obtain at least a reasonable top 10 list. First, Box Plus-Minus filtered on minutes played:
- James Harden – 10.0
- Giannis Antetokounmpo – 9.2
- Nikola Jokic – 8.9
- Anthony Davis – 8.8
- LeBron James – 7.6
- Kyrie Irving – 7.5
- Stephen Curry – 6.6
- Rudy Gobert – 6.5
- Paul George – 6.5
- Russell Westbrook – 6.4
And then for Player Efficiency Rating:
- Anthony Davis – 29.66
- Giannis Antetokounmpo – 28.50
- James Harden – 28.34
- Boban Marjanovic – 28.11
- LeBron James – 26.73
- Kawhi Leonard – 26.56
- Kevin Durant – 26.37
- Stephen Curry – 26.18
- Nikola Vucevic – 25.89
- Jonas Valanciunas – 25.33
Sorry, Montrezl Harrell, we left you off the PER list.
Let’s Do Some Rank Aggregation!
Now that we are armed with six ranking analytics, we can apply Kemeny-Young to identify the consensus top-10 players. For Borda counting, we would have to arbitrarily assign a value of ’11’ to players who do not make a list. This makes no sense at all. Instead, we simply don’t count them in the Kemeny-Young process and treat those pairs as “ties.”
Across the six metrics, we have a total of 25 distinct players in the top 10s! Well, that’s not a good sign. In this case, we would be forced to look at permutations of 25 players, a total of 25! ≈ 1.55×10^25 orderings. Yeah, that’s a huge number… Nonetheless, we look at the 300 total pairwise comparisons (25 choose 2) made across the six analytics.
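Treating unranked players as ties just means those pairs generate no votes. A minimal sketch, with hypothetical shortened top-3 lists standing in for the six top-10s:

```python
from collections import Counter

# Hypothetical top-3 lists standing in for the six top-10s; a player
# absent from a list contributes no comparisons against anyone ("ties").
metric_lists = [
    ["George", "Harden", "Davis"],
    ["Green", "Durant", "Holiday"],
    ["Antetokounmpo", "Durant", "George"],
]

wins = Counter()
for ranking in metric_lists:
    for i, a in enumerate(ranking):
        for b in ranking[i + 1:]:
            wins[(a, b)] += 1  # a ranked above b by this metric

print(dict(wins))
```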
Now the difficult task is finding the right permutation that maximizes the score across all players. For instance, suppose our Top 10 ranking is
- Paul George (101)
- James Harden (84)
- Anthony Davis (106)
- Giannis Antetokounmpo (101)
- LeBron James (74)
- Stephen Curry (67)
- Kyrie Irving (49)
- Kevin Durant (79)
- Kyle Lowry (44)
- Nikola Vucevic (41)
Then the score for this grouping is 746. But is this the highest score possible? The answer is no. In fact, we find that Kevin Durant beat Kyrie Irving in 4 of 6 categories. By swapping Durant and Irving, we lose two points for having Irving beat Durant, but gain four points for Durant beating Irving; a total of 748 points!
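This swap game is a greedy hill climb over the pairwise-win matrix. A sketch, demonstrated on the small three-judge MVP example from earlier (the full 25-player data isn’t reproduced here):

```python
def kemeny_swap_search(start, wins):
    """Greedy hill climb: try pairwise swaps, keep any that raise the score."""
    order = list(start)

    def score(o):
        return sum(wins.get((a, b), 0)
                   for i, a in enumerate(o) for b in o[i + 1:])

    best, improved = score(order), True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]
                s = score(order)
                if s > best:
                    best, improved = s, True
                else:
                    order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best

# Pairwise wins from the three-judge MVP example.
ballots = [
    ["Harden", "James", "Davis", "Westbrook", "Curry"],
    ["Harden", "James", "Davis", "Westbrook", "Curry"],
    ["James", "Davis", "Westbrook", "Harden", "Curry"],
]
wins = {}
for ballot in ballots:
    for i, a in enumerate(ballot):
        for b in ballot[i + 1:]:
            wins[(a, b)] = wins.get((a, b), 0) + 1

order, best = kemeny_swap_search(reversed(ballots[0]), wins)
print(order, best)
```

On larger instances this greedy search can stall in a local optimum, so restarts from several starting orders are a common safeguard.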
By playing this really hard swap game, we are able to identify our aggregated rankings of players:
- Anthony Davis
- Giannis Antetokounmpo
- Kevin Durant
- Paul George
- James Harden
- LeBron James
- Stephen Curry
- Kyle Lowry
- Kyrie Irving
- Nikola Vucevic
In fact, there are two ties in the system: Antetokounmpo and Durant are interchangeable, as are Kyle Lowry and Kyrie Irving. Despite this, we obtain our aggregated ranking and can start asking questions such as “how reliable are the metrics relative to other metrics?” To do this, we can look at the permutation distance we outlined above. For further reference, this is called the Kendall tau distance.
Advances in Rank Aggregation
With respect to each metric, we can weight the voting matrix above by distributing weight relative to the cumulative distribution functions for each analytic. We’ve seen this before. We can also use the associated variation with each metric to identify how “reliable” the voting metric is. We’ve seen this before as well. There are many routes to go to build the permutation distribution and develop a maximum likelihood estimator. And all are much better than being lazy and applying Borda counting.
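As a sketch of the weighting idea, each metric’s pairwise votes can be scaled by a reliability weight before aggregation; the ballots and weights below are hypothetical, not derived from the metrics above:

```python
# Hypothetical reliability weights per metric; a more reliable metric's
# pairwise votes count for more in the aggregate voting matrix.
ballots = [
    (["Harden", "James", "Davis"], 1.0),   # e.g. a trusted metric
    (["James", "Davis", "Harden"], 0.25),  # e.g. a noisier metric
]

wins = {}
for ranking, weight in ballots:
    for i, a in enumerate(ranking):
        for b in ranking[i + 1:]:
            wins[(a, b)] = wins.get((a, b), 0.0) + weight

print(wins)
```

The maximum likelihood search then proceeds exactly as before, just over the weighted matrix.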
So armed with this knowledge, which of the six metrics would you trust? Or do we trust the aggregation instead…?