A titleless post

February 12th, 2008  |  Published in Sabermetrics, baseball, blogging  |  2 Comments

Couple of scheduling notes to start off. We’ll start on #4 in the prospect list on Wednesday (vote for #3 here). The final part in the projection series (relievers) will probably go up Thursday. I’ve got a couple of ideas for some things beyond that, plus whatever else pops into my head.

Anyway, I’ve been trying to post every day of late. It isn’t particularly difficult, but some help wouldn’t hurt either. I notice some people who comment here or in other places don’t have their own blogs (it’s as easy as a couple of clicks, if you didn’t know). So if anybody wants an outlet to write something, I’d be willing to post it. Now it certainly isn’t anything special to post something on this blog, but I figured there’s a little bit of a built-in audience and maybe someone would be interested. If you like doing it and I like what you’re writing, maybe we could turn it into a weekly column/regular contributor type of thing. That’s all up to you, of course, but I’m just opening the door to anyone interested. Mention something in the comments or contact me via email (the contact form up top) and we can discuss some things if you’re interested.

Now, some interesting links on a Tuesday:

I am truly amazed at how many people have great blogs and are able to do some great work, whether that’s in writing, sabermetrics, research, analysis, humor, or a number of other things. We know the Padre ones like Ducksnorts, Gaslampball, and numerous others, but it seems like every team has at least one or two must-read blogs (at least for me). They’re fun stuff, and when you put off your homework until like 3 am every night, they can come in handy as well.

One of them is Detroit Tigers Weblog and here’s the latest from Billfer. That’s a detailed look at Dontrelle Willis’ repertoire using the PITCHf/x data. He uses something called K-means clustering (which I believe is available as an add-on in Excel) to group Willis’ different pitches. As I understand it, this type of clustering takes a set of parameters for each pitch (in this case three: speed, vertical movement, and horizontal movement) and groups the pitches based on similarities in these parameters. Although it certainly won’t classify every pitch correctly, if you have enough data it allows you to analyze each type of pitch that a pitcher throws. Then he goes on to look at location, swing/contact %’s, pitches by count, and a bunch of other things. It’s pretty awesome stuff.
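To make the clustering idea a little more concrete, here is a rough sketch of plain K-means in Python. To be clear, this is not Billfer's analysis or his data: the three "pitch types" are made-up numbers for a hypothetical pitcher, the function names are my own, and the farthest-point starting rule is just one simple way to pick initial cluster centers. Note that you have to tell it the number of clusters (`k`) up front.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two pitches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Start with the first pitch, then keep adding the pitch farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each pitch joins its nearest cluster center.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: each center moves to the average of its pitches.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

# Made-up pitch types: (speed in mph, vertical movement, horizontal movement in inches).
rng = random.Random(2008)
true_centers = [(91, 9, -7), (80, 0, 4), (84, 4, -10)]
pitches = [tuple(v + rng.uniform(-1, 1) for v in center)
           for center in true_centers for _ in range(40)]

centroids, labels = kmeans(pitches, k=3)
for c, center in enumerate(centroids):
    print(c, labels.count(c), [round(v, 1) for v in center])
```

The fake pitch types here are far apart, so the grouping is easy. Real PITCHf/x data is much messier, and speed and movement are on different scales, so some rescaling would probably be needed before trusting the clusters.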

A poster at Lookout Landing has some great stuff on probability and uncertainty in projections. Basically, all a projection system will give you is the most likely point on a team’s probability density function. The projection could be right in a sense, but still be off by a few games when all is said and done. Take the coin flip example. You know that a fair coin is going to be very close to 50% heads and tails if you go on forever and ever, but in 100 flips it may end up 55% tails. Does that mean your original projection of 50/50 is wrong? No, it just means there weren’t enough flips to get to the true level. I’m certainly not a math expert as we all know, so if I explain something wrong, tell me about it.
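The coin flip example can be put into numbers with a few lines of Python. This is only a sketch, and treating a team's games as independent flips of the same coin is my simplifying assumption, not something from the Lookout Landing post. The function names are mine as well.

```python
import math

def binomial_sd(n, p=0.5):
    """Standard deviation of the number of successes in n independent trials."""
    return math.sqrt(n * p * (1 - p))

def prob_at_least(n, k, p=0.5):
    """Exact probability of k or more successes in n independent trials."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# How often does a fair coin come up tails 55 or more times in 100 flips?
p_55_tails = prob_at_least(100, 55)

# Spread in wins for a true .500 team over 162 games, treated as coin flips.
sd_wins = binomial_sd(162)

print(round(p_55_tails, 3), round(sd_wins, 1))
```

Under those assumptions, 55 or more tails in 100 flips happens a bit less than one time in five, and a true .500 team has a standard deviation of a little over six wins across a full season. So a projection missing by a few games doesn't say much by itself.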

Skyking is running through different players’ career values and graphing them. As you can see, Willie Mays just towers over the other guys. Ozzie Smith’s total career value is similar to that of guys like Lou Whitaker and Alan Trammell.

There is plenty of debate on the Mariners’ blogosphere about the state of our bitter rival as well as the Erik Bedard trade. Much of it stems from a projection of the team going into next year and the idea that they’re not really an 88 win team (or that they weren’t last year) because they significantly outperformed their Pythagorean record. However, SABRMatt chimes in with his PythagenMatt formula and it says that they were an 85 win team (which is about 5 wins better than their pythagenpat estimate), which would certainly impact future projections of the team. Although, as an aside, I think the best way to do projections is to simply project every player for that season and do it that way (i.e., their last season’s record or estimated record should not really matter in a future projection). Anyway, Matt’s formula calculates pythagenpat for each game, thus reducing the effect of blowouts. It does better than pythagenpat overall at correlating with actual winning percentage, but I’m not sure that’s the only point of a Pythagorean estimation.
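For anyone who hasn't seen the formulas, here is a quick sketch of the classic Pythagorean estimate and pythagenpat on season totals. The 0.287 constant is the value I've usually seen quoted for pythagenpat (some versions use a slightly different number), and the run totals are made up for illustration, not any real team's. I'm not attempting PythagenMatt here since I don't want to guess at Matt's exact formula.

```python
def pythagorean(rs, ra, exponent=2.0):
    """Classic Pythagorean winning percentage from runs scored and runs allowed."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def pythagenpat(rs, ra, games, power=0.287):
    """Pythagorean estimate whose exponent floats with the run environment.

    The exponent is (runs per game, both teams combined) raised to `power`.
    The default of 0.287 is a commonly quoted constant, used here as an assumption.
    """
    exponent = ((rs + ra) / games) ** power
    return pythagorean(rs, ra, exponent)

# Hypothetical team: 800 runs scored, 820 allowed, 162 games.
rs, ra, games = 800, 820, 162
classic_wins = games * pythagorean(rs, ra)
pat_wins = games * pythagenpat(rs, ra, games)

print(round(classic_wins, 1), round(pat_wins, 1))
```

For this made-up team both estimates come out at roughly 79 wins. The two versions only separate noticeably in very high or very low run environments.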

As Guy once joked on the Book Blog, you could give a team 1 if they score more runs than they allow in a game, 0 if they don’t, and get an extremely close correlation with the team’s actual record. The point of the Pythagorean stuff is, I think, to project a team’s future performance (or better estimate their “true” record). That is, if they score 30 runs in a game, they should be given full credit for that. Generally, great teams are the ones that blow teams out the most and there’s no reason to punish them for that. Of course, perhaps there is … but I’m not sure there’s evidence that this method is better. Perhaps if someone looked at pythagenpat vs. PythagenMatt in actually predicting future records (rather than correlating with past ones) it would shed some more light on the issue. That is, take a look at teams at the 81 game mark and see what does a better job of predicting their next 81 games, using multiple years for the test, or something like that.
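Here is roughly what that first-half vs. second-half test could look like in Python. It's only a sketch under my own assumptions: each season is a list of (runs scored, runs allowed) game scores, the names are mine, and the two tiny four-game "seasons" are invented just so the code runs. The two estimators compared are season-total pythagenpat (with the commonly quoted 0.287 constant) and Guy's 1-or-0 method, which is just actual winning percentage. A per-game estimator like PythagenMatt could be passed in the same way once you have its formula.

```python
def win_pct(games):
    """Guy's 1-or-0 method: the share of games in which the team outscored its opponent."""
    return sum(1 for rs, ra in games if rs > ra) / len(games)

def pythagenpat(games, power=0.287):
    """Pythagenpat on totals; `power` is a commonly quoted constant (an assumption)."""
    rs = sum(g[0] for g in games)
    ra = sum(g[1] for g in games)
    exponent = ((rs + ra) / len(games)) ** power
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def split_half_error(seasons, estimator):
    """Average absolute miss when a first-half estimate predicts second-half win pct."""
    errors = []
    for games in seasons:
        half = len(games) // 2
        predicted = estimator(games[:half])
        actual = win_pct(games[half:])
        errors.append(abs(predicted - actual))
    return sum(errors) / len(errors)

# Invented four-game "seasons" of (runs scored, runs allowed), for illustration only.
seasons = [
    [(12, 2), (3, 4), (5, 3), (6, 1)],
    [(1, 9), (4, 3), (2, 5), (3, 6)],
]

pat_err = split_half_error(seasons, pythagenpat)
wp_err = split_half_error(seasons, win_pct)
print(round(pat_err, 3), round(wp_err, 3))
```

The toy data is rigged so that run differential looks smarter than win-loss record, which proves nothing. The real test would feed in actual game logs at the 81 game mark across many seasons.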

Finally, Dan Fox has a post comparing his simple projection method with actual performance. The idea is that something like this could be used to identify or at least point toward possible steroid users. More interestingly, at least to me, it could possibly be used to see what kind of players outperform their projections on a consistent basis, what’s wrong with the projection methods, and things like that.

Responses

  1. billfer says:

    February 12th, 2008 at 6:09 am

    Thanks for the link! For the K-means clustering I used Minitab. I wasn’t aware of an Excel add-on but will certainly check it out. If you’re interested in Minitab, they have a full version 30 day trial (which is what I used). The only downside to K-means clustering is that you have to know the number of clusters before doing the analysis. With Willis I had a couple of scouting reports to work with.

  2. MB says:

    February 12th, 2008 at 2:34 pm

    Thanks a lot for stopping by, Billfer … really enjoy your blog.

    Yeah, I believe you can add it on with the CD that you got with Excel. I forget where I was reading about it, but some of the PITCHf/x guys were discussing it. Thanks for the heads up on Minitab. I may check that out.

    Anyway, I thought your article was a great presentation of what this data can give you.
