Consider, for a moment, being a General Manager for an NBA team faced with determining the number of years for a player contract. The problem seems simple: the team needs a certain skill set that a player possesses, and it would like to know how long that player will be able to contribute that skill set to the team. This type of problem is known as a player career arc problem. The most common phrasing is: “Are we able to forecast the contribution of a player over the next three years?”
There are several ways to attack this problem. We can apply regression-type models, time series methods, or even deep learning algorithms. Each method carries both strengths and weaknesses. Some of these strengths and weaknesses are data driven; others stem from model specification. For example, suppose we apply a regression method such as RAPM. Here, we suppose that our stint data is Gaussian, which it is not. This means the resulting coefficients are not quite Gaussian, even though RAPM theory suggests they are, and we are unable to apply a clean transition model to estimate the next year’s coefficients. While we can still do this and obtain reasonable estimates, they are reasonable only to an extent and may not be indicative of the truth.
In this post, we focus on a nonparametric attack and develop a Random Forest model to predict player career arcs. Once we present the methodology, we randomly grab a player and identify their career arc with respect to the NBA players that this player closely identifies with. To do this, we use a nice property of random forests: the proximity matrix.
The player selected for this exercise? Eric Snow.
Random Forests: Decision Trees with Bagging and Randomization
In order to understand what a random forest is, we first take a look at a decision tree. A decision tree is a multi-level partitioning algorithm that chops up our data such that players are clustered according to their traits. For example, suppose that we list players by their positions and use attributes such as height, weight, rebounds per 36 minutes, three point field goals attempted per 36 minutes, and jersey number. In this basic exercise, if we have a player that is 6’11, weighs 250 pounds, grabs 14 rebounds and attempts 0.3 threes per 36 minutes, and wears jersey number 52, chances are we have a center.
The decision tree may operate in the following manner. Suppose we select an attribute at random; say it is rebounds per 36 minutes. We then look for an optimal splitting point that separates the strong rebounders from the rest. This may well separate centers and power forwards from the wings and guards; however, we may not get strong separation among the post players. Similarly, Russell Westbrook may slip through and hang out with the post players for this past season. If we have labels, such as positions, to train on, we can use a special measure of separation such as the Gini Index or Maximum Entropy. If not, we look for an optimal split point that simply gains separation between a determined set of players.
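To make the split criterion concrete, here is a quick sketch of the Gini Index; lower impurity means a purer group of labels. The position labels below are toy examples, not real data:

```python
# Gini index of a group of labels: 1 minus the sum of squared class proportions.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["C", "C", "C", "C"]))     # pure group -> 0.0
print(gini(["C", "PF", "PG", "SG"]))  # maximally mixed -> 0.75
```

A split is scored by the impurity of the leaves it creates: the tree prefers splits whose leaves look like the first group, not the second.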
At the next level we select another attribute at random. This time, suppose jersey number is selected. Now there are two sets of players to look at from the previous level: low rebounders and high rebounders. We again split up each group of players into a pair of subgroups by some measure and continue this process.
At each level, a collection of players is called a node. The separation of the group of players at each node is called a split, with the players being distributed into leaves. Trees do not have to have only two leaves at each level; we chose two in our example for easy illustration.
For a decision tree, the root is the collection of all players. Each level is a depth of the tree, consisting of the leaves from the previous level. Each of the leaves at the end of a tree is a terminal leaf. A nice illustration given by Dr. Mohammad Noor Abdul Hamid identifies splitting on Gender and Height to partition people.
Particularly notice that for the split on height, the splits do not have to be the same for each leaf node. That is, female heights are split differently than male heights.
Typically trees have low bias, but they tend to have high variance associated with prediction and therefore terrible predictive power.
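As a toy sketch of the positions example above, we can grow a single decision tree with scikit-learn. Every number below is made up for illustration, not real player data:

```python
from sklearn.tree import DecisionTreeClassifier

# columns: height (in), weight (lb), REB/36, 3PA/36 -- all invented values
X = [
    [83, 250, 14.0, 0.3],  # center-like profile
    [75, 190,  3.5, 6.0],  # guard-like profile
    [79, 220,  7.0, 3.0],  # wing-like profile
    [84, 245, 12.0, 0.5],  # center-like profile
    [74, 185,  3.0, 5.5],  # guard-like profile
]
y = ["C", "PG", "SF", "C", "PG"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(tree.predict([[83, 250, 14.0, 0.3]]))  # the 6'11", 250 lb rebounder
```

On this tiny training set the tree labels the big rebounder a center, exactly the intuition from the example.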
To move from a decision tree to a random forest, we introduce (quickly) the idea of bagging. Bootstrap Aggregation, or bagging, is a technique to help understand the accuracy of a prediction for a given learning process. The process is relatively simple: First, we take a bootstrap sample from our training set. This is a sample with replacement. Next, we fit our model of interest to the bootstrap sample. Call this model number 1.
We repeat this process of taking a bootstrap sample and fitting our model to obtain model number 2. In this case, model number 2 will be similar to model number 1; however the prediction for a particular input, x, may be different for both models. This is what helps us understand the variability associated with a model without having to rely heavily on distributional assumptions.
We continue this process until we have obtained B many models. Taking the average predictions of these B models yields our bagging estimate for that particular input. Not only do we assess variance, we are able to reduce variance by using the bagging estimate as the mean. If our modeling process is the decision tree model above, this will help us reduce the high variability associated with trees. Combining this idea with decision trees above, we obtain what we call random forests.
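The bagging loop just described can be sketched in a few lines of Python. The data here is synthetic (a noisy sine curve), and B = 50 shallow regression trees stand in for our model of interest:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=200)

B = 50
x_new = [[5.0]]
preds = []
for b in range(B):
    Xb, yb = resample(X, y, random_state=b)              # bootstrap sample
    model = DecisionTreeRegressor(max_depth=4).fit(Xb, yb)  # model number b
    preds.append(model.predict(x_new)[0])

bagged = np.mean(preds)   # the bagging estimate at x = 5
spread = np.std(preds)    # variability across the B models
print(f"bagged prediction: {bagged:.3f} (+/- {spread:.3f})")
```

The spread across the B predictions is exactly the model variability we could not assess from a single fit.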
Trees are one of the best models when it comes to capturing complex interactions within data. Unfortunately, noise is usually too high to make a good prediction and the tree ultimately becomes an explanatory tool. Instead, by introducing the concept of bagging to decision trees, we are able to help reduce the noise associated with fitting (also called growing) a tree to data.
A random forest is then a collection of decision trees obtained through bootstrap sampling, with each node split on an attribute chosen from a random subset. This randomization is key to ensuring that the bootstrapped trees are as uncorrelated as possible. The process is relatively straightforward:
- For Each Tree:
- Draw a bootstrap sample of the data.
- Grow a decision tree on the bootstrap sample. At each node:
- Select a subset of attributes at random.
- Find the best attribute to split on, using the Gini Index or Entropy.
- Split the node into two leaf nodes.
- Repeat B times to obtain B trees.
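The steps above are what scikit-learn's RandomForestClassifier implements, so a minimal sketch looks like this (the data is synthetic, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # B trees
    max_features="sqrt",  # random attribute subset considered at each node
    criterion="gini",     # split criterion
    bootstrap=True,       # draw a bootstrap sample per tree
    random_state=0,
).fit(X, y)

print(len(forest.estimators_))  # -> 100 fitted trees
```

Each entry of `forest.estimators_` is one bootstrapped decision tree, exactly the B trees of the recipe.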
Each resulting tree then identifies how to split up the data. If there are labels for each observation, we take the majority label in a terminal leaf as the value of that leaf.
In our player example above, our labels are positions: PG, SG, SF, PF, C. Suppose we make five attribute selections each time. For the first tree, they may be Rebounds, Jersey, Threes, Rebounds, and Height. Notice that rebounds was selected twice! That’s alright. There are then 32 terminal leaves that contain all the positions. Suppose the first terminal leaf contains 16 players labeled PG, PG, PG, PG, PG, PG, PG, PG, SG, SG, SG, SG, SG, SF, SF, PF. Then the terminal leaf is marked as PG.
We can, in fact, continue the splitting process until every leaf contains a single label. Either way, for a new player, if we take their attributes and drop them into each of the B trees, we obtain B labels associated with that player. For classification, the majority label is the predicted label of the player. For regression, we simply average the outputs. So how do we compare players?
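For the regression case, we can verify by hand that the forest's prediction is just the average of the B tree outputs; again, the data below is synthetic:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

x_new = X[:1]
# one prediction per tree, then their mean
per_tree = [t.predict(x_new)[0] for t in forest.estimators_]
print(np.isclose(np.mean(per_tree), forest.predict(x_new)[0]))  # -> True
```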
A proximity matrix is a player-by-player comparison that counts how similar (or proximal) two players are. In this case, we define proximity as the number of terminal leaves shared between two players. If we consider every NBA player that started after 1980, we leverage all 2630 players to obtain a 2630×2630 matrix. The (i,j) entry of this matrix identifies how close two players are according to the model. In this case, how many terminal leaves are shared.
If we perform the redundant exercise of training on all players and then dropping each player back through the forest, the diagonal of this matrix will be exactly the number of trees. If two players are as opposite as humanly possible, then the value at their row and column intersection is zero, meaning they are not close at all.
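A sketch of how such a proximity matrix can be computed: scikit-learn forests expose an `apply()` method that returns the terminal-leaf index of each sample in each tree, so counting shared leaves is a matter of comparing those indices. Synthetic data stands in for the player stats here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=50, n_features=8, random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

leaves = forest.apply(X)  # shape (n_players, n_trees): leaf index per tree
n = leaves.shape[0]
proximity = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        # proximity = number of trees where players i and j share a leaf
        proximity[i, j] = np.sum(leaves[i] == leaves[j])

# a player always shares every leaf with themselves,
# so the diagonal equals the number of trees
print(proximity[0, 0])  # -> 100
```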
Applying Random Forests to Forecast Players
For our simple exercise, we collected all summary statistics for every NBA player that started their career after 1979. Sorry, Kareem, you’re not included. However, Magic Johnson and Larry Bird are included! The attributes we used were:
Age, GP, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, 2P, 2PA, 2P%, EFG%, FT, FTA, FT%, OREB, DREB, REB, AST, STL, BLK, TO, PF, PTS
We then took each season and split players off by years played in the league. This helped us redefine the 38 seasons’ worth of data into 21 years’ worth of data. Each year represented a year played by a player. For instance, Year 1 is the collection of rookie seasons between 1980 and 2017. The final year? That’s Year 21, which only includes Kevin Willis’ and Kevin Garnett’s final seasons in the league: 2007 and 2016, respectively.
Once we obtain the 21 files, we apply a random forest to each file, growing 1000 trees per file. This means we have 21,000 trees over the 21 years!
Next, we consider a player of interest. Suppose they have played three years in the league. We take their first year and drop the attributes for their rookie season into the Year 1 random forest. This yields a proximity matrix for that player.
Repeat this for years two and three, and we obtain two more proximity matrices. We then add the three proximities together to get an idea of closeness between the player of interest and the other players in the league.
This way, if a player experiences an uptick in their career, they may match weaker players early on but stronger players in their third year. The proximities will capture this. Using the proximities as weights, we then take Year 4 values for the matched players and compute the predicted stat line for the player of interest!
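A minimal sketch of this proximity-weighted forecast, with made-up proximities and made-up Year 4 stat lines for three hypothetical comparables:

```python
import numpy as np

# summed Year 1-3 proximities to three hypothetical comparables (invented)
proximities = np.array([83.0, 46.0, 31.0])

# each comparable's Year 4 stat line: [PTS, AST, REB] per game (invented)
year4_stats = np.array([
    [11.5, 6.8, 2.9],
    [ 9.2, 5.4, 2.1],
    [12.8, 7.3, 3.0],
])

# normalize proximities into weights, then take the weighted average
weights = proximities / proximities.sum()
forecast = weights @ year4_stats
print(np.round(forecast, 2))
```

Players with higher proximity pull the forecast harder toward their own Year 4 line, which is exactly the behavior we want.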
Similarly, if a player that the player of interest matches to is no longer in the league, we mark an indicator to identify a probability that the player of interest will be out of the league.
Note that half of the players who started after 1979 were out of the league by their fifth year. It would be of interest to identify this probability as we trudge along trying to predict future years.
Case Study: Eric Snow
Of the 2630 players in the league, we performed a random selection and obtained Eric Snow as our case study. Eric Snow had a curious career after coming out of Michigan State in 1995. Snow was picked up by Seattle and used sparingly in his first three seasons, at roughly an 11-minute-per-game rate. After a trade to Philadelphia in 1998, Snow became more of a presence on the court alongside Allen Iverson, dramatically improving his scoring from 3 points a game to 12 points a game despite only tripling his minutes.
The question is, can we predict his 2004 NBA season using this random forest methodology?
If we apply traditional time series techniques, we would expect his numbers to increase, as they have over the final five year period. In this case, we apply the random forest methodology in hopes of finding players similar to Eric Snow and using their future years to predict Snow’s progression in the league.
Who are some players that Snow matches to? Here are some proximity scores:
- Milt Palacio (83 matches)
- Spud Webb (46 matches)
- Antonio Daniels (31 matches)
- Doc Rivers (29 matches)
- Randy Brown (23 matches)
- Scott Skiles (19 matches)
- Kevin Johnson (12 matches)
- Malik Sealy (12 matches)
- Bill Hanzlik (11 matches)
That’s quite a cast of characters! Note that these are not all of the top matching players. There is a total of 380 players that match to Eric Snow! Of those players, how many managed to play a ninth year? Only 186. That’s slightly under half. Therefore, we say that Eric Snow has roughly a 49% chance of being in the league for a ninth year. Pressing on, that percentage drops to 42% and 37% for a 10th and 11th year, respectively. Snow managed to play 13 seasons in the NBA.
Predicting the 9th season…
Now Eric Snow moves on to his ninth season. Here, we use the proximity weights obtained from players like Spud Webb and Doc Rivers. This will give us a free-flow estimate of stats. Since coaches control the actual games played, we adjust accordingly.
In this case, the Philadelphia 76ers coaching tandem of Randy Ayers and Chris Ford oversaw Snow playing in all 82 games. Here are the true stats compared to the predicted stats. NOTE: we deleted all of Eric Snow’s seasons beyond season 8, so we could not train on Snow’s seasons 9 through 13 to predict season 9. We do maintain statistical integrity!
Here we see that we miss quite a bit on starts and assists; however, we manage to nail down items such as field goals, three-point attempts, free throw percentage, rebounds, and particularly points scored.
What this helps show is that players can be approximated relatively well through their proximity to other players in the league.
By the way… Snow was also predicted to be 30.37 years old. He was indeed 30 years old for this season.
We Did Really Well! But Wait…
However, note that we can only compare players given the data used! This is a very important caveat.
For instance, if a player is injured, they may have poor stats for that given season. Want an example? Look at Marc Gasol from a couple years ago with his broken foot. In this case, we may want to impose a new attribute such as days out with injury.
Similarly, we used totals. Totals aren’t the greatest statistics to use. Instead, we may wish to manufacture new attributes such as coaching type, number of possessions played, or strength of schedule of opponent. We may even want to change the entire variable set-up and use per possession type stats.
We just have to remember that the quality of output is indicative of the quality of attributes used. How does this old adage go…? Garbage in, garbage out?
Also note that we cannot perform this procedure on rookies, because we don’t use any pre-NBA data. In that case, we would need features that represent all players coming into the league, those who have already entered it, and even those who attempted to enter, in order to compute a proximity for eligible players.
Let’s See What Happens…
To test, let’s take a look at another random player. This time, Dwight Powell (Dallas Mavericks). Powell is heading into his fourth year this season and we are interested in his proposed stats. In this case, we have predicted the following for Powell:
Here, we expect Powell to get roughly the same amount of minutes, but distributed over more games. Due to this, we expect his shooting to decrease, as well as his rebounding and steals. However, we expect his passing to improve, along with his ability to get to the line.
Let’s see how this plays out!