Introduction to Pandas Using Play-By-Play

By popular demand, attached are the basic course materials that I developed for a sports analytics course taught at UW-Madison. The goal is simple: introduce Pandas and show how column manipulations, groupings, and report building can be accomplished. This was a working document at the time and has not been updated since the course.…
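To give a flavor of the three tasks named above, here is a minimal sketch on a made-up play-by-play table. The column names (team, player, event, shot_value) and the rows are assumptions for illustration only; they are not taken from the course materials.

```python
import pandas as pd

# Hypothetical play-by-play rows; names and values are invented for illustration.
plays = pd.DataFrame({
    "team": ["MIL", "MIL", "CHI", "CHI", "MIL", "CHI"],
    "player": ["A", "B", "C", "C", "A", "D"],
    "event": ["made_shot", "missed_shot", "made_shot",
              "made_shot", "made_shot", "missed_shot"],
    "shot_value": [2, 3, 3, 2, 3, 2],
})

# Column manipulation: flag made shots, then credit points only on makes.
plays["made"] = plays["event"].eq("made_shot")
plays["points"] = plays["shot_value"].where(plays["made"], 0)

# Grouping and report building: one row per player, sorted by points.
report = (
    plays.groupby(["team", "player"], as_index=False)
         .agg(attempts=("event", "size"),
              makes=("made", "sum"),
              points=("points", "sum"))
         .sort_values("points", ascending=False)
         .reset_index(drop=True)
)
print(report)
```

Named aggregation keeps the report's column names explicit, which tends to make the final table easier to read than renaming columns after the fact.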

Stochastic Tracking

In the era of tracking data, a need for a new style of analysis has emerged. Long gone are the regularized regression models and the simple counting techniques. Instead, we need to leverage shot-noise distributed systems such as Dan Cervone’s competing risks model, Matthias Kempe’s self-organizing maps, or Peter Carr’s imitation learning. The list is…

Making Blocks Count

When we measure the defensive impact of a player, the first arguments we typically make are the number of blocks and steals that player has recorded. We celebrate players like Dikembe Mutombo and Maurice Cheeks for their prowess in obtaining blocks (2nd all time) and steals (5th all time), respectively. In the latter case, a…

Statistics of Colley’s Ranking Methodology

In 2002, Wes Colley (Princeton) developed a methodology that became a part of College Football BCS rankings lore: The Colley Method. In his original paper, Colley claims that his method is “bias free” for estimating the ranking of a team given a particular schedule. The resulting value for each team is identified as a ranking…
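For readers who want to see the mechanics, here is a minimal sketch of the Colley matrix system as it is commonly stated: solve C r = b, where C has 2 plus games played on the diagonal, minus the number of head-to-head meetings off the diagonal, and b_i = 1 + (wins_i - losses_i) / 2. The teams and results below are invented for illustration, and the function name colley_ratings is hypothetical.

```python
import numpy as np

def colley_ratings(teams, games):
    """Solve the Colley system C r = b for a list of (winner, loser) games."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = 2.0 * np.eye(n)   # diagonal starts at 2
    b = np.ones(n)        # right-hand side starts at 1
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w, w] += 1      # one more game played by each team
        C[l, l] += 1
        C[w, l] -= 1      # one more meeting between the pair
        C[l, w] -= 1
        b[w] += 0.5       # accumulates (wins - losses) / 2
        b[l] -= 0.5
    r = np.linalg.solve(C, b)
    return dict(zip(teams, r))

# Made-up three-team round robin: A beats B, A beats C, B beats C.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
print(ratings)
```

In this toy schedule every team plays the same opponents the same number of times, so the ratings simply order the teams by record; schedule strength only starts to matter when schedules differ.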