Tuesday 12 May 2009

I Don't Like Cricket

It's very frustrating trying to do sabermetric-style analysis for cricket. Assembling statistics in a form suitable for a study takes far more effort than it does in baseball. CricInfo's Statsguru is a major step forward from the situation that confronted me on the old site during 2003-4, but a recent experience was still very instructive.

I wanted to do some projections for the 1968 American League season. It was quite easy to go to the Baseball Prospectus web site and download some stats to help me find the players I wanted to project: a simple cut-and-paste job into a .csv file. The Lahman Database then makes it a simple matter to gather stat lines for individual players. By contrast, a cricket study involves a Statsguru query followed by the laborious task of copying and pasting scorecards, then editing the data into a form suitable for the research. I can gradually build up a database with the information arranged to suit me, but it really is a tedious prospect.
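To give a feel for how little work the baseball side takes, here is a minimal sketch of pulling stat lines for one league-season out of a Lahman-style Batting table. The column names (playerID, yearID, stint, lgID, AB, H, HR) follow the Lahman Batting table; the rows, the player IDs and the `stat_lines` helper are invented for illustration and are not taken from the real database.

```python
import csv
import io
from collections import defaultdict

# Made-up rows laid out like a subset of the Lahman Batting table.
# Player IDs and numbers are invented for illustration only.
SAMPLE_BATTING_CSV = """playerID,yearID,stint,teamID,lgID,AB,H,HR
examp01,1967,1,BOS,AL,500,150,20
examp01,1968,1,BOS,AL,300,90,10
examp01,1968,2,DET,AL,200,50,5
other01,1968,1,SLN,NL,400,100,8
"""


def stat_lines(batting_file, year, league):
    """Sum AB, H and HR per player for one league-season, combining stints."""
    totals = defaultdict(lambda: {"AB": 0, "H": 0, "HR": 0})
    for row in csv.DictReader(batting_file):
        if int(row["yearID"]) == year and row["lgID"] == league:
            for stat in ("AB", "H", "HR"):
                totals[row["playerID"]][stat] += int(row[stat])
    return dict(totals)


# In real use, batting_file would be an open handle on the Batting CSV file.
lines = stat_lines(io.StringIO(SAMPLE_BATTING_CSV), 1968, "AL")
for player, line in sorted(lines.items()):
    print(player, line)
```

Nothing comparable exists for cricket until the scorecards have been copied, pasted and cleaned by hand.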

Just an excuse, really, for my long silence. But when are we going to get a cricket version of Retrosheet? If someone with some money would come forward, I could make a start. Until that happens, performance analysis in cricket is going to remain in the Dark Ages, relative to baseball.

2 comments:

  1. I meant to upload such a thing a few months ago, and then I got lazy and didn't bother. I'll see if I can put something together soon. It won't be anywhere near as professional as Retrosheet though.

  2. You stole the words right out of my mouth. I've long been frustrated by the lack of easy-to-use, structured cricket data. Another drawback of Statsguru and scorecards is that you can only get match or innings-level data, not ball-by-ball data (i.e., you can't answer the question, "What is Hayden's strike rate the ball after he hits a four?").

    I've made a first attempt at a structured, ball-by-ball data repository on my site at http://data.againstthespin.com. If you're handy with programming, there's also some Python code on the site to manipulate those files. To start with, it contains only recent Twenty20 matches. I also have further tools to parse Cricinfo scorecards that I haven't made publicly available -- drop me a line at aneesh (at) againstthespin.com.

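The ball-by-ball question raised in the second comment (a batsman's strike rate on the ball after he hits a four) is easy to answer once the data is structured. Here is a rough sketch, assuming a hypothetical record layout of (batsman, runs off the bat) in delivery order. That layout, the `strike_rate_after_four` helper and the sample balls are made up for illustration and do not describe the format of any real repository. It also simplifies: extras and innings breaks are ignored, and any four runs off the bat count as a four.

```python
def strike_rate_after_four(balls, batsman):
    """Runs per 100 balls on the batsman's next ball faced after scoring four.

    `balls` is an iterable of (striker, runs_off_bat) pairs in delivery order.
    This record layout is an assumption, not a real data format.
    Returns None if the batsman never faced a ball straight after a four.
    """
    runs = faced = 0
    previous_was_four = False
    for striker, scored in balls:
        if striker != batsman:
            continue  # balls faced by other batsmen do not break the sequence
        if previous_was_four:
            runs += scored
            faced += 1
        previous_was_four = (scored == 4)
    return 100.0 * runs / faced if faced else None


# Invented deliveries for two placeholder batsmen.
sample = [
    ("batsman_a", 4), ("batsman_a", 1), ("batsman_b", 0),
    ("batsman_a", 4), ("batsman_b", 6), ("batsman_a", 4),
    ("batsman_a", 0),
]
rate = strike_rate_after_four(sample, "batsman_a")
print(rate)
```

With match or innings-level scorecards alone, there is no way to reconstruct the delivery order this relies on.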