Finding Undervalued Pitchers: Fun with Correl!

My buddy Mike Salfino, who writes about baseball (both real and fantasy) at the Wall Street Journal and Yahoo!, loves using (K-BB)/IP to evaluate pitchers and to find those delightfully undervalued arms that help teams (both real and fantasy) win.

I like (K-BB)/IP, too, but in our conversations about it I haven’t been fully persuaded of its predictive ability. Rank pitchers by the stat and you get good arms, for sure, but does that tell you anything new?

For instance, the Top 5 in 2013 (100 IP or more) were Yu Darvish, Matt Harvey, Max Scherzer, Cliff Lee and Chris Sale.
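To make the metric concrete, here is a minimal Python sketch of how (K-BB)/IP, and its per-batter cousin K%-BB%, could be computed and ranked with an innings cutoff. The pitcher names and stat lines are hypothetical placeholders, not real 2013 numbers, and the helper names (`ip_to_innings`, `k_minus_bb_per_ip` and so on) are my own for illustration. One wrinkle worth handling: box scores write innings as whole innings plus outs, so 100.2 IP means 100 and two-thirds innings, not 100.2.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    name: str
    ip: str   # innings in box-score notation, e.g. "100.2" = 100 2/3 innings
    k: int    # strikeouts
    bb: int   # walks
    tbf: int  # total batters faced


def ip_to_innings(ip: str) -> float:
    """Convert box-score IP ("100.2") to true innings (100 + 2/3)."""
    whole, _, outs = ip.partition(".")
    outs_int = int(outs) if outs else 0
    if outs_int not in (0, 1, 2):
        raise ValueError(f"invalid IP notation: {ip}")
    return int(whole) + outs_int / 3


def k_minus_bb_per_ip(p: PitcherLine) -> float:
    """(K - BB) / IP: strikeouts minus walks per inning pitched."""
    return (p.k - p.bb) / ip_to_innings(p.ip)


def k_pct_minus_bb_pct(p: PitcherLine) -> float:
    """K% - BB%: the same idea, measured per batter faced instead of per inning."""
    return (p.k - p.bb) / p.tbf


def top_by_k_minus_bb(pitchers, min_ip=100.0, n=5):
    """Rank pitchers at or above an innings cutoff by (K - BB) / IP, best first."""
    eligible = [p for p in pitchers if ip_to_innings(p.ip) >= min_ip]
    return sorted(eligible, key=k_minus_bb_per_ip, reverse=True)[:n]


# Hypothetical stat lines, for illustration only (not real pitchers).
pitchers = [
    PitcherLine("Pitcher A", "200.0", 240, 60, 820),
    PitcherLine("Pitcher B", "52.0", 65, 15, 212),
    PitcherLine("Pitcher C", "180.1", 190, 40, 740),
]

for p in top_by_k_minus_bb(pitchers, min_ip=100):
    print(p.name, round(k_minus_bb_per_ip(p), 3))
```

Dropping `min_ip` to 50 lets the 52-inning Pitcher B jump to the top of the list, the same effect Mike’s 50 IP cutoff had on Salazar.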

If you read Mike’s story, linked above, you’ll see he pitches this a different way. His IP cutoff is 50 IP, which lands the 2013 version of Danny Salazar at No. 1. Unfortunately for Mike, this meant he spent too much time pumping up Salazar for 2014. That didn’t work out so well: Salazar ended up costing $17 (no one overlooked his 2013 cup of coffee), and even though he finished the year fairly strongly after returning from the minors, he earned -$4.

But Mike can also rightfully point out that he used the method to identify Corey Kluber, one of last year’s biggest successes. Kluber cost $12 and earned $31. Ka-ching!

So, is this metric really useful for finding undervalued pitchers? Is any stat? These questions gave me the idea of testing all the various stats we have for starting pitchers to see whether any are truly predictive from year to year.

This story represents phase 1, since it covers only 2013 stats compared with 2014 results. That’s a fair number of pitchers (133 in all, each with 100 IP or more), but it’s really an N of 1 when it comes to judging the capabilities of the stats, because it’s a single pair of seasons.

The tool I used is the statistical function of correlation (CORREL, in spreadsheet terms), which measures how closely two lists of numbers move together. If one list rises in exact proportion to the other (a perfectly straight-line relationship), the correlation is 1. If there’s no linear relationship between the lists, the correlation is 0. If one list falls in exact proportion as the other rises, the correlation is -1.
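For anyone curious what CORREL is doing under the hood, here is a minimal Python sketch of the Pearson correlation coefficient, which is what spreadsheet CORREL functions compute. It’s written from scratch for clarity; the function name `pearson` is my own. The third example shows that two lists in the same order still fall short of 1 when the gaps between values don’t match, and the fourth shows that a real but non-linear relationship can score 0.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))  # co-movement around the means
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(pearson([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
print(pearson([1, 2, 3, 4], [1, 4, 9, 16]))     # about 0.98: same order, uneven gaps
print(pearson([1, 2, 3], [1, 0, 1]))            # 0.0: related, but not linearly
```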

I compared each metric with three 2014 outcomes: Cost, Earnings and Profit/Loss. I did this using both the raw stats and each pitcher’s rank within this cohort, and the rank comparisons were the clearest.
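Correlating ranks instead of raw numbers is what statisticians call Spearman’s rank correlation: rank each list, then run the ordinary correlation on the ranks. Here is a minimal, self-contained Python sketch. Two assumptions to flag: I treat Profit/Loss as 2014 earnings minus 2014 cost, and I rank every list so that 1 is best (lowest ERA, most expensive price), since the article doesn’t spell out how its ranks were oriented. The function names and the four-pitcher example are hypothetical.

```python
import math


def average_ranks(values, higher_is_better=True):
    """Rank values 1..n (1 = best), giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation (repeated here so the block runs on its own)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def rank_correlation(stat, outcome, stat_higher_is_better=True):
    """Spearman's rho: Pearson correlation of the two rank lists."""
    return pearson(average_ranks(stat, stat_higher_is_better),
                   average_ranks(outcome, higher_is_better=True))


def profit(cost, earnings):
    """Profit/Loss in auction dollars (assumed: earnings minus cost)."""
    return earnings - cost


# Hypothetical example: the lowest ERAs line up exactly with the highest prices.
era = [2.10, 3.50, 4.80, 3.00]
cost = [30, 12, 2, 20]
print(rank_correlation(era, cost, stat_higher_is_better=False))  # 1.0
```

Using the numbers above, Kluber’s `profit(12, 31)` comes out to 19 and Salazar’s `profit(17, -4)` to -21.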

Compared with a pitcher’s 2014 cost, the stats with the highest correlations (in order, all above .5) were: WAR (.689), ERA (.684), tERA (.658), WHIP (.653), FIP (.652), BA (.631), K%-BB% (.621), (K-BB)/IP (.584), SIERA (.564), K% (.558) and xFIP (.535).

This comparison shows that the pitchers who cost the most in 2014 were the ones who pitched well in 2013. Not surprising, really, when you think about it. Other high scorers, just below .5, were HR/9 (.49) and K/9 (.47).

These measures helped us figure out who the most expensive pitchers would be, and which pitchers we shouldn’t pay for, but did they help us find profitable pitchers?

No. WAR may have had the highest correlation with price, but compared with profits it scored -.002. No correlation at all.

ERA versus profit was -.154. K% was bad, at -.199. The worst was BA, at -.322.

It isn’t surprising, once you think about it, that most of these stats correlate negatively with profit. Because they correlate so strongly with price, what owners pay already reflects them, so whatever profit or loss a pitcher produces comes mostly from things the stats didn’t capture. But one kind of stat did correlate somewhat with profits: walk rate. BB% (.332) and BB/9 (.312) rated highest, while K/BB scored .186. Walks matter!
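The arithmetic behind that intuition is short. Assuming Profit/Loss is simply earnings minus cost (my reading of the spreadsheet, not something stated above), covariance splits cleanly for any stat X, and the correlation with profit inherits that split:

```latex
\operatorname{Cov}(X, \text{Profit}) = \operatorname{Cov}(X, \text{Earnings}) - \operatorname{Cov}(X, \text{Cost})

\rho(X, \text{Profit}) = \frac{\operatorname{Cov}(X, \text{Earnings}) - \operatorname{Cov}(X, \text{Cost})}{\sigma_X \, \sigma_{\text{Profit}}}
```

So a stat that already tracks cost closely can only show a positive correlation with profit if it tracks earnings even more closely, in covariance terms. WAR’s near-zero profit correlation (-.002) says those two terms roughly cancelled out. This holds exactly for raw dollar figures; with ranks it’s only a rough guide, since a profit rank isn’t the difference between the earnings and cost ranks.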

Walk rates aren’t nearly as strongly correlated with price, but they do appear somewhat indicative of profits. And isn’t that what we’re going for?

These results got me thinking about whether it was really helpful to lump all the starting pitchers together. There’s a big difference in the potential profitability of Clayton Kershaw, for instance, and Lucas Harrell. To test this I broke my 133 pitchers into two groups.

The first consisted of pitchers who qualified for the ERA title in 2013; 74 of them pitched in 2014.

Within this cohort, the stats most highly correlated with price were K%-BB% (.834), SIERA (.826) and, yes, (K-BB)/IP (.822).

And once again, BB/9 (.253) and BB% (.244) scored highest in correlation to profits.

The other group was the 58 pitchers with between 100 and 162 innings pitched.
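Splitting the sample is straightforward. Here’s a minimal Python sketch, assuming the ERA-title threshold of one inning per team game (162 IP over a full season) and treating exactly 162 IP as qualified; the `Pitcher` records and their innings totals are hypothetical.

```python
from typing import NamedTuple


class Pitcher(NamedTuple):
    name: str
    ip_2013: float  # true innings, already converted from box-score notation


QUALIFY_IP = 162.0  # ERA-title threshold for a 162-game season (assumed boundary)
MIN_IP = 100.0      # the article's cutoff for the whole sample


def split_cohorts(pitchers):
    """Split the sample into ERA-title qualifiers and 100-to-162-inning pitchers."""
    sample = [p for p in pitchers if p.ip_2013 >= MIN_IP]
    qualified = [p for p in sample if p.ip_2013 >= QUALIFY_IP]
    partial = [p for p in sample if p.ip_2013 < QUALIFY_IP]
    return qualified, partial


# Hypothetical pitchers, for illustration only.
pitchers = [Pitcher("A", 236.0), Pitcher("B", 162.0), Pitcher("C", 131.0), Pitcher("D", 88.0)]
qualified, partial = split_cohorts(pitchers)
print([p.name for p in qualified], [p.name for p in partial])  # ['A', 'B'] ['C']
```

Each cohort then gets its own correlation run; Pitcher D falls below the 100 IP floor and drops out of both.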

The results were similar, though the correlations were lower; this was a less predictable group, and it showed. One curious difference, however, involved xFIP, the version of FIP that replaces a pitcher’s actual home runs allowed with an estimate based on his fly balls and the league-average home-run-per-fly-ball rate.
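For reference, here is a Python sketch of FIP and xFIP following the formulas FanGraphs publishes: FIP weights home runs, walks plus hit batters, and strikeouts per inning, then adds a season-specific constant, and xFIP swaps actual home runs for fly balls times the league HR/FB rate. The constant (3.10) and league HR/FB rate (10%) used below are placeholders, not official values for any season, and the stat line is hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching: (13*HR + 3*(BB+HBP) - 2*K) / IP + constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, lg_hr_per_fb, bb, hbp, k, ip, constant):
    """xFIP: FIP with home runs replaced by fly balls * league-average HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher: 25 HR on 200 fly balls (12.5% HR/FB), placeholder 10% league rate.
print(round(fip(25, 50, 5, 180, 190.0, 3.10), 2))          # 3.78
print(round(xfip(200, 0.10, 50, 5, 180, 190.0, 3.10), 2))  # 3.44
```

Because this pitcher allowed homers at a higher rate than the league, xFIP treats part of that as bad luck and comes in lower than FIP.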

In the group of qualifying pitchers, xFIP had a correlation to cost of .737 and a correlation to profit of -.010.

In the group of less illustrious pitchers, however, xFIP’s correlation to price was only .488, while its correlation to profits was .219, nearly as high as BB%’s (.267). SIERA, too, had a .199 correlation to profits in this group.

I hesitate to make too much of this just yet. I’m going to have to glom together a few more years of data to see whether this holds up.

As for (K-BB)/IP, it’s useful the way any filter is useful: the goal is to locate good pitchers whose ERAs have so far been bad. If they’re cheap, they’re worth a shot, but they’re far from a sure thing.
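That filter (good skills, bad ERA, cheap price) can be sketched in a few lines of Python. The three thresholds are illustrative defaults I picked, not values tested in this study, and the `Candidate` records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    k_minus_bb_per_ip: float  # prior-season (K - BB) / IP
    era: float                # prior-season ERA
    cost: int                 # auction price in dollars


def buy_low_targets(pool, min_kbb=0.6, min_era=4.00, max_cost=10):
    """Keep pitchers with strong (K-BB)/IP, a poor ERA and a low price.

    Thresholds are illustrative assumptions, not tested values.
    """
    hits = [p for p in pool
            if p.k_minus_bb_per_ip >= min_kbb and p.era >= min_era and p.cost <= max_cost]
    return sorted(hits, key=lambda p: p.k_minus_bb_per_ip, reverse=True)  # best skills first


# Hypothetical pool, for illustration only.
pool = [
    Candidate("Pitcher W", 0.75, 4.45, 6),
    Candidate("Pitcher X", 0.80, 2.60, 25),  # good skills, but ERA already good and pricey
    Candidate("Pitcher Y", 0.40, 4.90, 1),   # cheap with a bad ERA, but weak skills
    Candidate("Pitcher Z", 0.65, 4.10, 9),
]
print([p.name for p in buy_low_targets(pool)])  # ['Pitcher W', 'Pitcher Z']
```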

I do think it’s safe to say that BB/9 and BB% are where to look for potential profits, at least in part because those are the stats fantasy owners pay the least for.

Here is the spreadsheet, if you want to check my work. Please.

Download (XLSX, 303KB)