I’ve become somewhat obsessed with these playoff predictors that keep popping up. Or perhaps I’ve become obsessed with the idea of trying to predict a team’s chances of making the playoffs during the season. I think a good model has a lot of use in major league front offices (or perhaps in message board rants as well). You could have a reference point to decide when you’re officially “done” or when you’re right on the brink of making the playoffs. You could theoretically add a player to your team and see how he affects your chances. I think it has a lot of real-life use in decision making. Let’s take a look at some different models and see what they think about the Padres’ chances.
A simple model like Sports Club Stats takes the remaining games and assumes each team has an equal chance of winning each game. The simulation is run “millions of times”. Essentially, then, this model uses in-season record alone to predict postseason chances. Here’s how the NL West looks:
Padres: 59.3%
Dodgers: 50.3%
DBacks: 45.3%
Rockies: 6.5%
Giants: 1.8%
The teams are in order of their current record which of course makes sense by this system. What they’re doing is putting a number on it and incorporating the whole league (the wild card) into the percentages. It’s a simple method, but it’s a first step. By the way, when I say simple, I surely don’t mean to discredit the people who put this stuff together. It’s stuff I’d have no clue how to do.
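To make the coin-flip idea concrete, here is a minimal sketch of that kind of simulation. It is not Sports Club Stats’ actual code. The team names and records are made up for illustration, every remaining game is treated as an independent 50/50 flip (ignoring who plays whom), and it only counts division titles, not the wild card.

```python
import random


def division_odds(records, n_sims=10_000, seed=0):
    """Estimate each team's chance of finishing first in its division.

    records maps a team name to (current_wins, games_remaining).
    Every remaining game is an independent coin flip, which is the
    simplifying assumption described above; real schedules are ignored.
    """
    rng = random.Random(seed)
    titles = {team: 0.0 for team in records}
    for _ in range(n_sims):
        finals = {}
        for team, (wins, remaining) in records.items():
            # Count how many of the remaining games come up "win".
            extra = sum(rng.random() < 0.5 for _ in range(remaining))
            finals[team] = wins + extra
        best = max(finals.values())
        leaders = [t for t, w in finals.items() if w == best]
        # Split the title evenly when teams tie for first.
        for t in leaders:
            titles[t] += 1 / len(leaders)
    return {team: titles[team] / n_sims for team in records}


# Hypothetical three-team division, not the real NL West standings.
example = {"A": (40, 90), "B": (38, 90), "C": (30, 90)}
odds = division_odds(example)
for team, p in odds.items():
    print(f"{team}: {p:.1%}")
```

Because every team is treated as a .500 club from here on out, the only thing separating them is the record they have already banked, which is why the output always lines up with the current standings.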
Coolstandings takes a slightly more complicated approach. They simulate the season a million times and use a variation of Bill James’ Pythagorean formula to predict how each team will play in the future. So there is no more coin flip or .500 assumption for everyone. They’re saying that run differential is a better predictor of future success than just assuming everyone is equal, and they’re probably right. Here’s how the NL West stacks up in terms of chance of reaching the playoffs:
Padres: 89%
Dodgers: 52.1%
DBacks: 13.1%
Giants: 2.3%
Rockies: 1.2%
I think this is probably a good method, but I’m not so sure about it yet. I’m wondering if it’s accurate to predict based on run differential (or performance) over a given sample or if some regression to the mean component needs to be added in. A lot of my thoughts on this topic came from a great discussion on The Book Blog. In the thread, MGL suggests using preseason player projections (adjusted for current performance) and adjusting for playing time. So you’re forgetting the team (and their current performance) and using the actual players on the team at that time to predict the future. It makes perfect sense in theory. And it also covers the regression to the mean component because a big chunk of your projection is going to be based on each individual player’s preseason projection (like PECOTA for example). And this is where I think Coolstandings might be able to improve. They don’t involve any regression to the mean. (EDIT: correction…yes, they do regress to the mean) A team that has scored 180 runs and allowed 100 over a given sample is very likely not going to keep on playing like that (unless that is their true talent level, which is of course very unlikely), but Coolstandings makes no adjustments for this.
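For reference, here is the classic form of the Pythagorean formula applied to that 180-scored, 100-allowed example. This is a sketch using the textbook exponent of 2. Coolstandings uses its own variation, which I’m not trying to reproduce here.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Bill James Pythagorean expectation.

    The exponent of 2 is the textbook value and is an assumption here;
    it is not necessarily what any particular site uses.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


pct = pythagorean_win_pct(180, 100)
print(round(pct, 3))  # 0.764
```

A .764 winning percentage works out to roughly 124 wins over 162 games, which shows why taking an extreme run differential at face value is a problem.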
Now, along come Baseball Prospectus’ prediction models. Their basic one simulates the season a million times and uses Equivalent Runs scored and allowed to forecast the rest of the season. They do regress this some amount to the mean (.500), but they don’t say how much. The Padres’ third-order winning percentage is .561. BP’s expected winning percentage the rest of the way for SD is .546. That closes .015 of the .061 gap to .500, so it looks like they’re regressing about 25% back towards .500. I have no idea if that’s enough at this point in the season. Here’s what comes out, though:
Padres: 70.9%
Dodgers: 66.2%
DBacks: 11.6%
Rockies: 2.3%
Giants: 1.9%
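The “about 25%” estimate is just back-of-the-envelope arithmetic on the two Padres numbers above, and it can be sketched like this. The function names are mine, and BP doesn’t publish its actual regression amount, so this only recovers what those two numbers imply.

```python
def implied_regression(observed, forecast, mean=0.500):
    """Fraction of the gap between observed and mean that the forecast closes."""
    return (observed - forecast) / (observed - mean)


def regress(observed, weight, target=0.500):
    """Pull an observed winning percentage toward a target by the given weight."""
    return observed - weight * (observed - target)


# Padres: .561 third-order winning percentage, .546 expected the rest of the way.
frac = implied_regression(0.561, 0.546)
print(round(frac, 3))  # 0.246

# Going the other way: a 25% pull toward .500 lands back near .546.
print(round(regress(0.561, 0.25), 3))  # 0.546
```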
You can see that, like Coolstandings, they pretty much kill the DBacks because of their poor run differential. Arizona has been outscored by 17 runs but is 9 games over .500. That’s unbelievable. It’s even more unbelievable when you consider the Padres have outscored their opponents by 77 runs and are only 1.5 games ahead of Arizona. BP’s next version regresses toward each team’s preseason PECOTA projection instead of the mean of .500. This helps the Padres because they were projected at a .530 winning percentage while the Dodgers were at .493 by PECOTA. Arizona was at .543 but they are getting hurt by their run differential so far. What do we get?
Padres: 71.7%
Dodgers: 61.8%
DBacks: 13.1%
Rockies: 1.8%
Giants: 1.5%
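Here is a rough sketch of why the regression target matters, using the Padres’ .561 third-order winning percentage and their .530 PECOTA projection. The 25% weight is my own guess, backed out from the basic BP numbers, and BP may well use a different amount in this version.

```python
def regress(observed, weight, target):
    """Pull an observed winning percentage toward a target by the given weight."""
    return observed - weight * (observed - target)


PADRES_THIRD_ORDER = 0.561
PADRES_PECOTA = 0.530
ASSUMED_WEIGHT = 0.25  # a guess, not a published BP figure

toward_500 = regress(PADRES_THIRD_ORDER, ASSUMED_WEIGHT, 0.500)
toward_pecota = regress(PADRES_THIRD_ORDER, ASSUMED_WEIGHT, PADRES_PECOTA)

print(round(toward_500, 3))     # 0.546
print(round(toward_pecota, 3))  # 0.553
```

A team projected above .500 in the preseason gets pulled down less under this version, and a team projected below .500 (like the Dodgers at .493) gets pulled down a little more.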
BP also has an ELO adjusted version. I won’t get into that here because I don’t fully understand it (who am I kidding…I don’t fully understand any of this stuff). We’ve gone through some of the models out there right now…from a simple .500 prediction, to a prediction based on pythag record, to a prediction that incorporates pythag and regresses. I don’t think we’ve reached the optimal model yet. It’s probably something along the lines of what MGL described in that thread I linked to earlier. Take individual players, regress their individual stats back toward their preseason projections, then account for playing time, injuries, trades, etc. I wonder if any teams are doing something like that. How valuable would it be to know that adding Adam Dunn will change your playoff chances from 48.6% to 58.9%? I’d imagine teams would like to know. Or what if you know your chances only sit at 8.0%? Adding Dunn is great, but not worth it in season as you’d still be at 11% or whatever. It’s interesting stuff I think.
No matter how you cut it, the Padres are sitting pretty right now. Here’s their main competition in the NL West:
By record: Obviously Arizona and LA
By Run Differential: LA
By Preseason Projection: Arizona
Overall: LA
I hesitate to leave out Arizona, though, as they were the preseason favorite. I’m still not sure if BP is applying enough regression to take that into account and look past a good portion of the current performance (in run differential). Does it really make sense that the preseason favorite is sitting there near the top of the division, yet is basically given no shot by the prediction models? Well, at this point, I’m not sure. It’s an area I’m sorta interested in. What do you guys think?