Now that we have looked at the probability of making the World Series for each MLB playoff team, let’s take a look at the optimal path for navigating the playoffs. For this post, we still assume the continuity correction model from the previous posts for making the playoffs and making the World Series. One of the major problems with predicting the World Series champion using a season-based match-up model is that the teams that meet in the World Series may never have played each other that season. In fact, that is exactly what happens in our prediction model. So let’s break down each series based on the reproducible model and see how we are doing as the fourth day of the postseason comes to a close.
Wild Card Game: Houston Astros (86 – 76) vs. New York Yankees (87 – 75)
The New York Yankees limped into this game after a beating at the hands of the Baltimore Orioles, nearly dropping the top wild-card spot to a rejuvenated Houston Astros squad. However, our simple model does not factor this into its determination of a winner. Instead, we see that the Astros won the season series against New York 4-3. Based on a continuity-corrected model, this suggests that the Astros had a 54.5% chance of defeating the Yankees. Since this is a 1-game playoff, the Astros were favored to win the series.
Predicted Winner: Houston Astros 1 – 0
Actual Winner: Houston Astros 1 – 0
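The per-game percentages in this post (54.5% from a 4-3 series, 56.5% from 11-8, 60.0% from 4-2) are all consistent with adding two wins and two losses to the season series before taking the ratio. That reading is inferred from the numbers rather than stated here, so treat the following as a minimal sketch under that assumption; the function name and the pad parameter are ours.

```python
def corrected_win_prob(wins: int, losses: int, pad: int = 2) -> float:
    """Per-game win probability with `pad` pseudo-wins and pseudo-losses added.

    Assumption: pad=2 is inferred from the percentages quoted in the post;
    the post itself does not spell out the correction.
    """
    return (wins + pad) / (wins + losses + 2 * pad)


print(f"{corrected_win_prob(4, 3):.1%}")   # Astros over Yankees, 4-3 season series
print(f"{corrected_win_prob(11, 8):.1%}")  # Cubs over Pirates, 11-8 season series
print(f"{corrected_win_prob(4, 2):.1%}")   # a 4-of-6 season series
```

A side effect of the padding is that short season series are pulled toward 50%, so a 4-3 edge never looks as decisive as the raw 57.1% would suggest.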
Wild Card Game: Pittsburgh Pirates (98 – 64) vs. Chicago Cubs (97 – 65)
This 1-game playoff was a showdown between two of the best teams in all of baseball. However, the Chicago Cubs were heavily favored over the Pittsburgh Pirates due to a surging Jake Arrieta, a potential NL Cy Young award winner. Our model didn’t care about this either: the Cubs were 11-8 in the season series against the Pirates, yielding a 56.5% chance of defeating the Pirates.
Predicted Winner: Chicago Cubs 1 – 0
Actual Winner: Chicago Cubs 1 – 0
Division Series: Houston Astros (86 – 76) vs. Kansas City Royals (95 – 67)
Kansas City surged into the World Series last season, and the Houston Astros seem poised to do the same. The Royals, however, held the best record in the American League despite a rocky second half, having won 60% of their games through the first half of the season. The season series saw the Astros take 4 of 6 meetings against the Royals. This gives the Astros a 60.0% chance of winning each game. Since this is a best-of-5 series, we break down the odds for every way the series can end.
Royals sweep: The Royals win three straight games, each at 40%, which gives (.4^3) = 6.4% chance of occurring.
Royals win 3-1: In order for this to happen, the Astros must win exactly one of the first three games. This is a three-choose-one event, which is 3. So the chance of the Royals winning 3-1 is 3*(.4^3)*(.6^1) = 11.52%.
Royals win 3-2: Here the Astros must win two of the first four games. This is a four-choose-two event, which is the number of ways we can pick two items from four items. In this case we get 6, leading to a probability of 6*(.4^3)*(.6^2) = 13.824%. Combining all the ways the Royals can win the series, the Royals have a 31.744% chance of winning the series.
Astros win 3-2: 20.7360% chance
Astros win 3-1: 25.92% chance
Astros sweep: 21.6% chance
Predicted Winner: Houston Astros 3 – 1
Current Status: Series Tied 1 – 1
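The six outcomes above can be generated in one pass. This is a minimal sketch, assuming independent games with a fixed per-game probability; the function name series_outcomes is hypothetical and not part of any library.

```python
from math import comb


def series_outcomes(p: float, wins_needed: int = 3) -> dict:
    """Map (team_wins, opponent_wins) -> probability for a first-to-`wins_needed` series.

    `p` is the team's per-game win probability; games are assumed independent.
    """
    q = 1.0 - p
    outcomes = {}
    for losses in range(wins_needed):
        # The series winner takes the final game, so the loser's wins can fall
        # anywhere among the earlier (wins_needed - 1 + losses) games.
        ways = comb(wins_needed - 1 + losses, losses)
        outcomes[(wins_needed, losses)] = ways * p**wins_needed * q**losses
        outcomes[(losses, wins_needed)] = ways * q**wins_needed * p**losses
    return outcomes


astros = series_outcomes(0.6)  # Astros vs. Royals, best of five
for (a, r), prob in sorted(astros.items(), key=lambda kv: -kv[1]):
    print(f"Astros {a} - Royals {r}: {prob:.3%}")
```

Sorting by probability puts the single most likely result first, which is how the predicted 3 – 1 Astros win is read off the list.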
Division Series: Toronto Blue Jays (93 – 69) vs. Texas Rangers (88 – 74)
With the Rangers going for broke by picking up Cole Hamels, a late-season surge saw them jump from third to first and hang on for dear life to win the American League West. Similarly, the Toronto Blue Jays rode a strong second half behind David Price and Josh Donaldson to take the American League East crown. The season series belonged to the Blue Jays, who took 4 of 6 games, so their 60.0% per-game probability and series odds match those of the Astros in the Astros-Royals series.
Predicted Winner: Toronto Blue Jays 3 – 1
Current Series: Texas Rangers lead 2 – 0
Division Series: Chicago Cubs (97 – 65) vs. St. Louis Cardinals (100 – 62)
In another National League Central match-up, the St. Louis Cardinals held a slight 11-8 advantage over the Cubs in the season series. This gives the Cardinals a 56.5% chance of winning each game. Using the same best-of-5 setup as described in the Royals – Astros match-up, we obtain the probabilities for each possible series scenario.
Cubs win: either by sweep (8.2313%), by taking three of four (13.9520%), or by winning game 5 (15.7658%) for a 37.9491% chance of making the NL Championship Series.
Cardinals win: either by sweep (18.0362%), by taking three of four (23.5373%), or by winning game 5 (20.4774%) for a 62.0509% chance of making the NLCS.
Predicted Winner: St. Louis Cardinals 3 – 1
Current Series: St. Louis Cardinals lead 1 – 0
Division Series: Los Angeles Dodgers (92 – 70) vs. New York Mets (90 – 72)
The Los Angeles Dodgers and New York Mets are similar teams with strong pitching and decent hitting. Both teams quietly put together a strong season, each having to fend off and eliminate their competition late in September: the Mets over the Washington Nationals with a key three-game sweep; the Dodgers over the San Francisco Giants by taking 3 of 4. This season, the Mets took 4 of 7 meetings, for a continuity-corrected 54.5% chance of winning each game.
Dodgers win: either by sweep (9.4196%), by taking three of four (15.4011%), or by winning game 5 (16.7872%) for a 41.608% chance of making the NLCS.
Mets win: either by sweep (16.1879%), by taking three of four (22.0964%), or by winning game 5 (20.1078%) for a 58.392% chance of making the NLCS.
Predicted Winner: New York Mets 3 – 2
Current Series: New York Mets lead 1 – 0
American League Championship Series: Houston Astros (86 – 76) vs. Toronto Blue Jays (93 – 69)
Had the Texas Rangers made it to the ALCS, we’d be discussing a Rangers 4-1 series win. Instead, Houston took four of seven meetings from Toronto for a 54.5% chance of winning each game. There are eight possible ways for the series to end, with either team winning in 4, 5, 6, or 7 games. In each case, we use a binomial coefficient to count the ways the loser’s wins can be arranged among the games before the clincher. For a sweep, we have three-choose-zero, which is only 1. For a five-game series, four-choose-one gives 4 different ways for the loser to win one game. For a six-game series, five-choose-two gives 10 different ways for the loser to win two games. For a seven-game series, six-choose-three gives 20 different ways for the loser to win three games.
Astros win: by sweep (8.8224%), in five games (16.0567%), in six games (18.2645%), or in seven games (16.6207%).
Blue Jays win: by sweep (4.2859%), in five games (9.3433%), in six games (12.7303%), and in seven games (13.8760%).
Predicted Winner: Houston Astros 4 – 2
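All eight ALCS outcomes follow one closed form. As a compact restatement of the counting argument, with p the series winner’s per-game win probability, q = 1 - p, and k the number of games the series loser wins (this notation is ours, introduced only for illustration):

```latex
P(\text{win } 4\text{--}k) \;=\; \binom{3+k}{k}\, p^{4}\, q^{k},
\qquad k = 0, 1, 2, 3, \qquad q = 1 - p .
```

For example, an Astros win in six games has k = 2 and p = 0.545, so the probability is 10 * (.545^4) * (.455^2), or about 18.26%. Swapping p and q gives the matching Blue Jays outcome.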
National League Championship Series: St. Louis Cardinals (100 – 62) vs. New York Mets (90-72)
Here, the St. Louis Cardinals hold a 4-3 edge over the New York Mets in the season series, so the predicted probabilities mirror those of the ALCS.
Predicted Winner: St. Louis Cardinals 4 – 2
Check A Different Model…
Let’s take a quick moment to see what’s been going on. Here we have used historical data to completely predict the MLB playoffs. While these predictions take past results into account, the MLB playoffs are an entirely different set of circumstances. Games are played out a little differently, with little thought given to games a month down the road, since there are no games in mid-November.
Also, games in early April only affect the postseason by identifying who gets in. The games in September are a better barometer of which team has a better chance of winning. Thus, if we impose an exponential decay on games, so that games in September carry a weight of 1 while games in April carry a weight of 0.167, we can employ a logistic regression to predict winning percentages for each team.
By applying this type of model, we have this breakdown for the playoffs:
Houston Astros defeat the New York Yankees 1 – 0
Chicago Cubs defeat the Pittsburgh Pirates 1 – 0
Kansas City Royals defeat the Houston Astros 3 – 2
Texas Rangers defeat the Toronto Blue Jays 3 – 1
St. Louis Cardinals defeat the Chicago Cubs 3 – 2
New York Mets defeat the Los Angeles Dodgers 3 – 2
Texas Rangers defeat Kansas City Royals 4 – 2
St. Louis Cardinals defeat New York Mets 4 – 2
St. Louis Cardinals defeat Texas Rangers 4 – 1
OK, Let’s Go Back to our Model. And the winner is…
St. Louis Cardinals: 4 – 1 (By Logistic Regression)
That’s right, the Cardinals are predicted to win the World Series in five games. Since the only meeting between these two teams was a single 5-2 preseason game, we could not simply predict this series from that one game. Instead, we take all 2,429 regular season games (the Twins and Tigers did not complete a final game in their series) and encode each game as a row with “1” for the home team, “-1” for the away team, and “0” for every other team.
We apply an exponential decay based on when each game was played. If the game is in April, multiply by 0.167. If the game is in May, multiply by 0.333; and so on. The response vector (win/loss) is “1” if the home team wins, or “0” if the visiting team wins. A logistic regression is a generalized linear model that works with the odds, the ratio of the probability of “1” (home team win) to the probability of “0” (visiting team win). Taking the logarithm of these odds (the logit function), we get a value between -Infinity and Infinity. After fitting a regression on that scale, we can reverse the transformation to obtain a predicted probability of a win for each game.
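To make the encoding concrete, here is a minimal sketch of a time-weighted logistic regression on a made-up three-team schedule. Several pieces are assumptions rather than details from this post: the games are invented, the monthly weights beyond April and May are extrapolated as month/6 (which steps up evenly by one-sixth per month), the weight is applied to each game’s contribution to the likelihood rather than to the design row, there is no intercept for home-field advantage, and the fit uses plain gradient ascent instead of a statistics package.

```python
import math

# Hypothetical toy schedule: (month, home, away, home_won). Not real results.
GAMES = [
    (4, "A", "B", 1), (4, "B", "C", 1), (5, "C", "A", 1),
    (6, "A", "C", 1), (7, "B", "A", 0), (8, "C", "B", 0),
    (9, "A", "B", 0), (9, "B", "C", 1), (9, "C", "A", 0),
]
TEAMS = sorted({t for _, h, a, _ in GAMES for t in (h, a)})


def month_weight(month: int) -> float:
    # Assumption: April -> 1/6, May -> 2/6, ..., September -> 6/6. This matches
    # the 0.167 and 0.333 quoted in the post; the later months are a guess.
    return (month - 3) / 6.0


def sigmoid(z: float) -> float:
    """Inverse of the logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(games, steps: int = 2000, lr: float = 0.1) -> dict:
    """Weighted logistic regression by plain gradient ascent (no intercept)."""
    beta = {t: 0.0 for t in TEAMS}
    for _ in range(steps):
        grad = {t: 0.0 for t in TEAMS}
        for month, home, away, y in games:
            # Design row: +1 for the home team, -1 for the away team, 0 elsewhere,
            # so the linear predictor is beta[home] - beta[away].
            p_home = sigmoid(beta[home] - beta[away])
            resid = month_weight(month) * (y - p_home)
            grad[home] += resid
            grad[away] -= resid
        for t in TEAMS:
            beta[t] += lr * grad[t]
    return beta


beta = fit(GAMES)
print({t: round(b, 3) for t, b in beta.items()})
print("P(A beats B with A at home):", round(sigmoid(beta["A"] - beta["B"]), 3))
```

Only differences between team coefficients matter in this encoding, which is why two teams that never met can still be compared: each is rated through the opponents it did play.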
In our setup, the St. Louis Cardinals have a 63.7% chance of winning each game against the Houston Astros. While this suggests a 4-2 series win, we are above 60%, which would give us an expected 3-2 series lead. Instead, averaging over a simulation of outcomes gives a typical 4 – 1.8 series finish. This pushed us down to a 4 – 1 outcome.
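A series simulation of this kind can be sketched in a few lines. This is an illustration only, assuming independent games at a fixed 63.7%; the averaging rule behind the 4 – 1.8 figure is not spelled out in this post, so the sketch will not necessarily reproduce that number. The function name and the seed are ours.

```python
import random


def simulate_series(p: float, wins_needed: int = 4, rng=random) -> tuple:
    """Play one series game by game; return (favorite_wins, opponent_wins)."""
    fav = opp = 0
    while fav < wins_needed and opp < wins_needed:
        if rng.random() < p:
            fav += 1
        else:
            opp += 1
    return fav, opp


rng = random.Random(2015)  # fixed seed so the run is repeatable
results = [simulate_series(0.637, rng=rng) for _ in range(100_000)]
fav_wins = [r for r in results if r[0] == 4]

print("P(favorite wins series):", len(fav_wins) / len(results))
print("average opponent wins when the favorite takes the series:",
      sum(o for _, o in fav_wins) / len(fav_wins))
```

Tallying the simulated results by final score, rather than averaging them, would show which single outcome is the most common one.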
Let’s see how both models work out!