By popular demand, attached are the basic course materials that I developed for a sports analytics course taught at UW-Madison. The goal is simple: introduce Pandas and show how column manipulations, groupings, and report building can be accomplished.
This was a working document at the time and has not been updated since the course. That said, it breaks play-by-play data down step by step to build a box score and to look at 5-man lineups.
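As a minimal sketch of that workflow (not the course document itself), the snippet below builds a tiny box score and a per-lineup point total from a made-up play-by-play table. The column names `player`, `event`, `points`, and `lineup`, along with the event labels, are assumptions for illustration and will likely differ from the actual CSV.

```python
import pandas as pd

# Hypothetical play-by-play rows; the schema here is assumed, not the course file's.
pbp = pd.DataFrame({
    "player": ["A", "B", "A", "C", "B", "A"],
    "event": ["shot_made", "shot_missed", "rebound",
              "shot_made", "shot_made", "shot_missed"],
    "points": [2, 0, 0, 3, 2, 0],
    "lineup": ["A-B-C-D-E"] * 3 + ["A-B-C-D-F"] * 3,
})

# Column manipulation: turn event labels into 0/1 counting columns.
pbp["fga"] = pbp["event"].isin(["shot_made", "shot_missed"]).astype(int)
pbp["fgm"] = (pbp["event"] == "shot_made").astype(int)
pbp["reb"] = (pbp["event"] == "rebound").astype(int)

# Grouping: one row per player gives a bare-bones box score.
box_score = pbp.groupby("player")[["points", "fgm", "fga", "reb"]].sum()

# The same grouping keyed by five-man lineup instead of player.
lineup_points = pbp.groupby("lineup")["points"].sum()

# Report building: print the two summary tables.
print(box_score)
print(lineup_points)
```

Storing the lineup as a single string key keeps the grouping simple; real lineup tracking also has to handle substitutions, which this sketch ignores.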
By the way, this document is the precursor to developing 5-on-5 stats such as this Minnesota Timberwolves one from the middle of the 2017-18 NBA season:
5 thoughts on “Introduction to Pandas Using Play-By-Play”
Thanks for the post! Where can I find the csv file, say, for experimentation?
You can find a sample file here: http://s000.tinyupload.com/?file_id=29814271944843243423
Great! Thank you.
I’m interested in doing something similar in terms of calculating net rating and statistics for five-man lineup combinations in the NCAA. Did you generate the CSV file (besides the tracking data) by reading the play-by-play data into Python and running code to create the CSV?
No, it’s a generated CSV that is delivered to me as content. For NCAA content, you may be able to track down Will Schreefer on Twitter.
I’d be very careful about NCAA data, however: lineups are not done correctly, and even Will can only clean up so much.