It’s been a little while since I’ve written about the PITCHf/x analysis that’s out there, but it continues to fascinate me. There’s a ton of great stuff going on with it right now.

John Walsh wrote another great article for THT entitled “Pitch Identification Tutorial”. It’s a good overview of what you can do with this stuff. He takes a look at a bunch of pitchers, including Cla Meredith and Trevor Hoffman:

“Meredith is an exception to sidewinders throwing only two pitches: you can see his change-up thrown with the same movement as his fastball, but 10 mph slower.”


“Hoffman’s celebrated change-up is the slowest (under 80 mph), although his fastball is also slower than the others. What really counts is the difference in speed between the fasty and the change. It would be interesting to go through the data and see who has the largest difference.”
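For anyone who wants to chase Walsh’s question, here is a rough Python sketch of how the fastball-to-change-up gap could be computed per pitcher. The pitcher names, the "FB"/"CH" labels, and every speed below are made up for illustration; none of it is real PITCHf/x data, and it assumes the pitches have already been identified.

```python
from statistics import mean

# Hypothetical sample: (pitcher, pitch_type, speed_mph). The names and speeds
# are invented for illustration and are not real PITCHf/x measurements.
PITCHES = [
    ("Pitcher A", "FB", 91.0),
    ("Pitcher A", "FB", 90.0),
    ("Pitcher A", "CH", 80.5),
    ("Pitcher B", "FB", 86.0),
    ("Pitcher B", "CH", 74.0),
    ("Pitcher B", "CH", 73.0),
]


def speed_gaps(pitches):
    """Return {pitcher: average fastball speed minus average change-up speed}."""
    by_pitcher = {}
    for pitcher, pitch_type, speed in pitches:
        by_pitcher.setdefault(pitcher, {}).setdefault(pitch_type, []).append(speed)

    gaps = {}
    for pitcher, speeds in by_pitcher.items():
        # Skip pitchers missing either pitch; no gap can be computed for them.
        if "FB" in speeds and "CH" in speeds:
            gaps[pitcher] = mean(speeds["FB"]) - mean(speeds["CH"])
    return gaps


gaps = speed_gaps(PITCHES)
largest = max(gaps, key=gaps.get)  # pitcher with the biggest speed difference
print(largest, round(gaps[largest], 1))
```

With real data, the hard part is the pitch identification itself, which is what the rest of Walsh’s article is about.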

At the bottom of the article, make sure to check out the PDF file, which has charts of about 150 pitchers. They are plotted in what has become the standard format, with horizontal break on the x-axis, vertical break on the y-axis, and speed indicated by the color of the dots. Awesome stuff.
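That chart layout is simple enough to sketch in Python. This is only a guess at the general idea, not how Walsh built his charts: the speed cutoffs and color names are arbitrary assumptions, and `plot_pitches` needs the third-party matplotlib library installed.

```python
def speed_to_color(speed_mph):
    """Map a pitch speed to a color bucket (the cutoffs are illustrative assumptions)."""
    if speed_mph >= 90:
        return "red"
    if speed_mph >= 85:
        return "yellow"
    if speed_mph >= 80:
        return "green"
    return "blue"


def plot_pitches(pitches):
    """Scatter horizontal vs. vertical break, with each dot colored by speed.

    `pitches` is a list of (horizontal_break, vertical_break, speed_mph) tuples.
    matplotlib is imported here so that speed_to_color works without it.
    """
    import matplotlib.pyplot as plt

    xs = [p[0] for p in pitches]
    ys = [p[1] for p in pitches]
    colors = [speed_to_color(p[2]) for p in pitches]

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, c=colors)
    ax.set_xlabel("Horizontal break")
    ax.set_ylabel("Vertical break")
    return fig
```

Calling `plot_pitches(rows).savefig("pitcher.png")` on a list of pitch tuples would write the chart to a file.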

For some further discussion on this article check out Ballhype and The Book Blog.

Over at Fast Balls, Mike Fast has been doing a great job of both tracking the PITCHf/x articles and providing some amazing content of his own. About a week ago, he published “Mad Dox mishmash”, an article looking at Maddux’s repertoire. Mike’s been using pitch speed versus spin direction to classify pitches. He finds that Maddux uses a few curves and sliders, but predominantly goes with two-seamers, cutters, and changeups. Check out the whole article (and the others …).

Also, another great blog I’ve discovered is Josh Kalk’s “from small ball to the long ball”. Using a clustering algorithm, he has made player cards, similar to what Walsh did. Along with this, he’s done a lot of work on correcting the PITCHf/x data. After all, the system is not going to be perfect from park to park, and there are even going to be some problems within each park (such as moving the cameras around, random errors, etc.). A lot of it is a bit over my head, but it’s great stuff that somebody needs to do.
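To give a feel for what a clustering algorithm does with pitch data, here is a bare-bones k-means sketch in Python. Nothing above says which algorithm Kalk actually uses, so k-means is purely a stand-in, and the six sample pitches (speed, horizontal break, vertical break) are invented for illustration.

```python
import math


def kmeans(points, centers, iterations=20):
    """Plain k-means on tuples of numbers, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # nothing moved, so the clusters have settled
        centers = new_centers
    return centers, assignments


# Hypothetical pitches: (speed_mph, horizontal_break, vertical_break).
pitches = [(92, -6, 9), (91, -5, 10), (93, -7, 9), (84, 3, 1), (85, 2, 2), (83, 4, 0)]
centers, assignments = kmeans(pitches, centers=[pitches[0], pitches[3]])
print(assignments)
```

In this toy example the two groups are far apart, so the starting centers hardly matter. With real data, speed and break sit on different scales and would probably need rescaling before the distances mean much.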

Finally, k4arros, a poster on the ESPN Padres board, helped me figure out some basic things with Excel. I don’t know what I was thinking before, but I couldn’t get the graphs down how I wanted them (horizontal vs. vertical with speed indicated by color). Now, I think I’ve got it. Here’s Peavy again, with my same database (just from August 2nd with about 3 starts missing … 701 pitches). Horizontal break is on the x, vertical is on the y, and speed is indicated by color ….


Again, I think his pitches are relatively easy to identify. You’ve got the fastballs at 90+ mph. Two groups of sliders from 80 to 90 (the yellow and green dots). You can see that the slower slider breaks more than the faster one. I believe this is Peavy’s slurve, actually. Then you’ve got a few curves and changes in the 70s. Peavy, for the most part, works with fastballs and sliders (slurves).
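Those speed bands translate directly into a crude classifier. The Python sketch below uses only the cutoffs mentioned above (90+, 80 to 90, and the 70s) and ignores break entirely, so it can’t separate the slurve from the harder slider or a curve from a change. The sample speeds are made up, not taken from the 701-pitch database.

```python
def classify_pitch(speed_mph):
    """Label a pitch by speed alone, using the rough bands described above."""
    if speed_mph >= 90:
        return "fastball"
    if speed_mph >= 80:
        return "slider"  # includes the slower, bigger-breaking slurve
    if speed_mph >= 70:
        return "curve/change"
    return "unknown"


def tally(speeds):
    """Count how many pitches fall under each label."""
    counts = {}
    for speed in speeds:
        label = classify_pitch(speed)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical speeds in mph, invented for illustration.
sample = [93.1, 91.4, 86.0, 82.5, 90.0, 77.3, 84.9]
print(tally(sample))
```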

The next step for me is obviously to get some more data (and work with some different pitchers ; ). I’ve still got to figure out how to set up a database and pull lots of data from the MLB site. I know there are a couple of tutorials out there, but that stuff is like a different language to me … progress will probably be slow.

Of course, you don’t need me! The real analysts out there are providing unbelievable stuff. I think there are going to be some real breakthroughs soon (adding to the ones already achieved), as long as this data is still made available. Too many smart people are getting their hands on it and it’s really fun to sit back and watch.