Denis Poroy - Getty Images
10 months ago: SAN DIEGO, CA - JULY 29: Seth Smith #7 of the Colorado Rockies connects for a single in the second inning of a baseball game against the San Diego Padres at Petco Park on July 29, 2011 in San Diego, California. (Photo by Denis Poroy/Getty Images)
There's going to be a lot of talk about Seth Smith's splits in the coming months. Home/away, righty/lefty, all of it. This post is a friendly reminder of what splits can do...and what they can't. There are two big problems with using splits, and they both stem from sabermetrics' favorite "you can't do that" refrain: sample size issues.
Let me start with an example. If a player hits 3 singles over 10 plate appearances in two games, does that mean that his true talent batting average is .300? No one would possibly suggest it. What if he batted .300 over 50 plate appearances? 100? 5000? Where is the sample size line where we can be reasonably confident in our assessment of his batting ability?
Let's say that John Q. Baseballer has an OBP of .350 in 250 career plate appearances. How close to his true skill level is that .350 figure? It turns out that we can actually calculate the uncertainty in the OBP figure by using a formula derived from the distribution of OBP talent in baseball. Over 250 plate appearances, the uncertainty in John's OBP comes out to 0.030. For the math-inclined, this uncertainty is one standard deviation, but to put it more simply, we can say that John Q. Baseballer's true OBP has a 68% chance of being within 0.030 of .350 (in other words, it lies between .320 and .380). If we double this uncertainty interval to 0.060, John's true OBP now has a 95% chance of being within the interval (between .290 and .410).
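For anyone who wants to check those numbers, here is a minimal Python sketch. It assumes the uncertainty formula is the binomial-style sqrt(OBP × (1 − OBP) / PA); the function name `obp_sigma` is just an illustrative choice, not anything official.

```python
import math

def obp_sigma(obp: float, pa: int) -> float:
    """One-standard-deviation uncertainty of an observed OBP.

    Assumes the binomial-style formula sqrt(OBP * (1 - OBP) / PA).
    """
    return math.sqrt(obp * (1.0 - obp) / pa)

obp, pa = 0.350, 250
sigma = obp_sigma(obp, pa)

# Roughly 68% of the time the true value sits within one sigma,
# and roughly 95% of the time within two sigma.
low68, high68 = obp - sigma, obp + sigma
low95, high95 = obp - 2 * sigma, obp + 2 * sigma

print(f"sigma     = {sigma:.3f}")
print(f"68% range = {low68:.3f} to {high68:.3f}")
print(f"95% range = {low95:.3f} to {high95:.3f}")
```

Running it gives a sigma of about 0.030 for John Q. Baseballer, matching the ranges quoted above.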
Clearly, 250 plate appearances don't tell us much. But if we quadruple the number of plate appearances to 1000 (about two seasons' worth), the uncertainty is cut in half to 0.015, which is a far more useful estimate of his talent. See the big problem here? Quadrupling the sample size only cuts the uncertainty in half. To quote Tom Tango in The Book, "if you want twice as accurate a measurement, you need four times as much data". So splitting a player's body of work into small splits results in uncertainties that grow into huge analytical problems. But that's just the beginning.
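The square-root relationship is easy to see with a quick sketch. This again assumes the sqrt(OBP × (1 − OBP) / PA) formula, and the 4000 PA row is only there to show the pattern continuing.

```python
import math

def obp_sigma(obp: float, pa: int) -> float:
    """One-standard-deviation uncertainty of an observed OBP (assumed formula)."""
    return math.sqrt(obp * (1.0 - obp) / pa)

base = obp_sigma(0.350, 250)

# Each step quadruples the sample; the uncertainty only halves each time.
for pa in (250, 1000, 4000):
    sigma = obp_sigma(0.350, pa)
    print(f"{pa:>5} PA: sigma = {sigma:.4f} ({sigma / base:.2f}x the 250 PA value)")
```

Because PA sits under a square root, cutting a 1000 PA career into four 250 PA splits doubles the fuzz on each piece.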
The second problem arises from the first. Let's go back to John Q. Baseballer.
Home: .370 OBP
Away: .330 OBP
Let's assume he has two full seasons under his belt with 1000 total PAs, half at home and half away. That seems like it should be a big enough sample to conclude that he has giant home/away splits, right? Wrong. Let me cite those splits again, this time with the appropriate calculated uncertainties included.
Home: .370 ± .022 OBP
Away: .330 ± .021 OBP
That certainly changes things, doesn't it? Remember, this means that there's only a 68% chance that his true OBP at home is between .348 and .392. That's the difference between a league average player and an MVP. And there's a 32% chance that it's outside that range! In nearly every use of splits you'll see in baseball commentary, you have not just one fuzzy number, but two, and they both have uncertainty intervals that overlap. With 1000 PAs split into two equal groups, there's a 3% chance that John Q. Baseballer is actually better on the road than he is at home! It's a small chance, to be sure, but there's a much greater chance that he has a home/road split that's smaller than average. We just don't know.
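Here is a rough way to play with those home/away numbers in Python. It is only a sketch under some loud assumptions: the same sqrt(OBP × (1 − OBP) / PA) formula, 500 PAs in each split, independent errors, and a plain normal approximation with no prior knowledge about typical home/road splits. The helper names are made up for illustration.

```python
import math

def obp_sigma(obp: float, pa: int) -> float:
    """One-standard-deviation uncertainty of an observed OBP (assumed formula)."""
    return math.sqrt(obp * (1.0 - obp) / pa)

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, built from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

home_obp, away_obp, pa_each = 0.370, 0.330, 500
home_sigma = obp_sigma(home_obp, pa_each)
away_sigma = obp_sigma(away_obp, pa_each)

# Assumption: the two splits are independent, so their variances add.
diff = home_obp - away_obp
diff_sigma = math.hypot(home_sigma, away_sigma)

# Chance that the true road OBP is actually higher than the true home OBP.
p_road_better = normal_cdf(-diff / diff_sigma)

print(f"home sigma = {home_sigma:.3f}, away sigma = {away_sigma:.3f}")
print(f"gap = {diff:.3f} +/- {diff_sigma:.3f}")
print(f"P(road better) ~ {p_road_better:.1%}")
```

The per-split sigmas match the .022 and .021 above. Under these particular assumptions the chance of a reversed split comes out somewhat higher than the 3% quoted, which suggests that figure came from a different calculation; either way, the split is a lot fuzzier than the raw numbers make it look.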
But enough about hypotheticals. What about Seth Smith?

Seth Smith is widely regarded as a guy who can mash RHP, but flails against LHP. His numbers make for a perfect illustration.
vs RHP: .377 wOBA
vs LHP: .262 wOBA
That's the difference between Evan Longoria and Jeff Mathis at his "peak". It's absolutely nuts. But are we certain that Seth Smith's true platoon split is that large? Just like John Q. Baseballer, I'm going to cite his splits again, this time with uncertainties. In his career, Smith has had 1209 PAs against RHP, and only 239 PAs against LHP.
vs RHP: .377 ± .015 wOBA
vs LHP: .262 ± .030 wOBA
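A quick Python sketch to check those two uncertainties, assuming the wOBA version of the formula, sqrt(wOBA × (1.1 − wOBA) / PA). The last few lines, which treat the two splits as independent and express the gap in sigmas, are my own illustrative addition.

```python
import math

def woba_sigma(woba: float, pa: int) -> float:
    """One-standard-deviation uncertainty of an observed wOBA (assumed formula)."""
    return math.sqrt(woba * (1.1 - woba) / pa)

# Career splits quoted above: (wOBA, plate appearances)
splits = {"vs RHP": (0.377, 1209), "vs LHP": (0.262, 239)}

sigmas = {}
for label, (woba, pa) in splits.items():
    sigmas[label] = woba_sigma(woba, pa)
    print(f"{label}: {woba:.3f} +/- {sigmas[label]:.3f} wOBA over {pa} PA")

# Assumption: independent errors, so the uncertainty of the gap is the
# root-sum-of-squares of the two individual uncertainties.
gap = splits["vs RHP"][0] - splits["vs LHP"][0]
gap_sigma = math.hypot(sigmas["vs RHP"], sigmas["vs LHP"])
print(f"gap = {gap:.3f}, about {gap / gap_sigma:.1f} sigma from zero")
```

Note how the LHP split, with a fifth of the plate appearances, carries twice the fuzz.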
Now it's pretty obvious—we can say with an extremely high level of confidence that Seth Smith is much better against RHP than LHP, even after assuming a hefty dose of "number fuzz". But there's another tool in the sabermetrics toolbelt called regression to the mean, which allows us to refine our estimate a bit. We have one more piece of information to use—Seth Smith is a left-handed baseball player, and left-handed baseball players generally have a certain average platoon split. In essence, we can "fill in" the uncertainty with league average plate appearances to give a sharper estimate of Smith's true talents. I'm going to skip the math, as there's no way I could possibly make it readable, but after regressing to the average LH batter's platoon split, Seth's splits cited above look more like:
vs RHP: .359 wOBA
vs LHP: .292 wOBA
Well, that's not so bad at all. It's certainly a larger split than normal, but it's not the Jeff Mathisian debacle that his non-regressed splits suggest. If Oakland plays Seth Smith as a full-time player, assuming that LHP make up 25% of the pitchers in the league, we can expect a .342 wOBA. Not too shabby.
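The full-time estimate is just a weighted average of the two regressed splits. A tiny sketch, where `blended_woba` is an illustrative name and the 25% LHP share is the assumption stated above:

```python
def blended_woba(vs_rhp: float, vs_lhp: float, lhp_share: float) -> float:
    """Expected overall wOBA, weighting each split by how often it is faced."""
    return (1.0 - lhp_share) * vs_rhp + lhp_share * vs_lhp

# Regressed splits from above, with lefties making up 25% of opposing pitchers.
full_time = blended_woba(vs_rhp=0.359, vs_lhp=0.292, lhp_share=0.25)
print(f"expected full-time wOBA: {full_time:.3f}")
```

Changing `lhp_share` shows how much a platoon would help: at 0% lefties the estimate is simply the .359 figure.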
11 recs | 133 comments
Excellent article, dan. Thanks
This should also be remembered when discussing players like Chris Carter. His ABs are such an SSS that there is no way they give us an accurate account of his ability. Kind of like basing Willie Mays’ career on his first month in baseball. He was every bit as bad as Carter (not that I’m expecting Chris to ever be that good ☺). This season should be a real eye-opener as to the talent we actually have in the system.
Tutu-late - January 18, 2012
And Allen would be the other side of the SSS problem...
The challenge in projecting prospects is that they’re not aiming at a stationary target: minor league pitchers are substantially worse than major league pitchers, so you both have to assess the statistical “true talent” reflected in the SSS of a year at AA, and at the same time make some kind of guess as to how that hitter will do facing a materially different challenge — pitchers who throw harder, with better control, and throw a greater variety of pitches with more movement.
Another way to put it is this: this statistical analysis deals with players as fixed values, not as players who may or may not be able to improve over time. That’s why youth is so important for prospects (and why Mays’ struggles as a 20-year-old rookie, IIRC, are less of a problem than Carter’s as a 24 or 25-year old).
Nick - January 18, 2012
Yes, I can see what you are getting at.
I was more interested in the effect of the SSS. Carter seems to be notorious as a slow starter as he goes to a new level. Be it jitters or whatever, he (and all prospects, for that matter) needs to be given the time necessary before we conclude he is a failure.
Tutu-late - January 18, 2012
That's a great point, Nick
“How you can hit if you usually face Josh Rupe and Vin Mazzaro” is a lot different from how you can hit if you face “lots of major league pitchers”.
Nico - January 18, 2012
The important part is to give them more than 33 ABs against "lots of major league pitchers" before throwing them away.
Tutu-late - January 18, 2012
This is why I think the A's are a rudderless ship.
Carter was our big power threat in the minors and we jerked him around.
OldhamA - January 19, 2012
Cowgill is an example of both problems
First, it’s very, very, very unlikely that his 2011 numbers represent his true talent level facing AAA pitchers. Second, even assuming last year’s stats are an outlier, we still don’t know how well he’ll deal with facing lots of major league pitchers.
Nick - January 18, 2012
That's the big big problem with the "throw five guys at the wall and see who sticks" approach.
We have no idea who’s actually sticky and who’s not!
danmerqury - January 18, 2012
Well, we do about Matt Carson.
Nico - January 18, 2012
Yeah, he didn't stick...
Seb - January 18, 2012
What’s the population of interest? Data over the course of a season or a career?
Reg - January 18, 2012 via mobile
The uncertainty formulas are Tango's.
And he derives them from the population standard deviation of the stat in question. I think (I’m not exactly sure) that he used a block of 4-5 seasons or so.
danmerqury - January 18, 2012
Oh, and in case you're curious:
σ(OBP) = sqrt((OBP*(1-OBP))/PA)
σ(wOBA) = sqrt((wOBA*(1.1-wOBA))/PA)
danmerqury - January 18, 2012
Does this count as offensive language?
A'sFanDFW - January 18, 2012
Offensive?
please don’t sqrt on my woba or my pa
Stew's Crew - January 18, 2012
Great analysis dan.
So where does Smith profile in the lineup? I remember reading something about lineup optimization and the need to put your best guy in the 4 spot. I’m trying to avoid over excitement from our brand new toy, but it looks like Smith is a pretty solid offensive player.
doolallynastics - January 18, 2012 via Android app
In a nutshell,
you want to organize your players into a few tiers. The best sit at 1, 2, and 4. The second best should be placed at 3 and 5. The rest, in order of value at 6, 7, 8, and 9. Conventional wisdom is correct about the 4 spot though, power is worth the most at the 4 spot.
danmerqury - January 18, 2012
traditionally
“baseball people” put their best hitter in the 3 hole..yet you place 1 and 2 hitters in tier 1 excluding the 3 hitter hole…empirical evidence or articles to support this plz?
oakwin2004 - January 18, 2012
Tom Tango tackles this in the book
Here’s a quick, easy to read breakdown of why
sc00by - January 18, 2012
That's chapter 5 of "The Book", by Tango et al
Here is a discussion of it at BtBS.
A point that is often omitted in lineup discussions is that The Book also concludes the order doesn’t actually make much difference in the larger scheme of things.
iglew - January 18, 2012
thanks!!
without reading up on the topic or checking stats as a coach I generally like to put my best hitters in the 3 and 4 spots with the thinking if i can get my 3 hitter up in the 1st inning thats ideal….then hopefully that situation reoccurs throughout the game
oakwin2004 - January 18, 2012
correction
getting the 3 hitter up with 2 on is ideal…
oakwin2004 - January 18, 2012
correction
I thought having 3 runners on is ideal
Stew's Crew - January 18, 2012
umm
for the 3 hitter? two runners on for the 3hitter is ideal…
oakwin2004 - January 18, 2012
:)
bat around, 12th batter in the inning is the #3 hitter
smiles…
Stew's Crew - January 18, 2012
as a coach
thats a wet dream scenario
oakwin2004 - January 18, 2012
don’t sqrt on your woba...
the_rozeboom - January 18, 2012
although
more often the 3 hitter comes up with the bases empty lol
oakwin2004 - January 18, 2012
which is why you bat your best hitter 2 or 4
Billy Frijoles - January 18, 2012
Right
The overall benefit of a perfectly tweaked lineup is extremely minimal. But I figure this team will need everything it can get for 2012.
doolallynastics - January 18, 2012 via Android app
cue the melvin "mad scientist" references
Billy Frijoles - January 18, 2012
Illustrated by a photo of him getting a left-handed hit at Petco!
Danny always bringing the good graphics with the knowledge.
stormtown - January 18, 2012
I think this is why Beane got him
Granted he’ll be hitting at the Coliseum and in recent seasons, we’ve seen very few, if any, players come to the Coliseum as their new home ballpark and be able to hit well. Or perhaps our equipment manager is making their uniforms too tight so as soon as they put on the A’s uniform, they literally CAN’T hit.
Regardless, I think Billy is going to flip him because I still don’t see the A’s anywhere near the other teams in the division especially if the Rangers get Prince Fielder and Yu Darvish. http://dallas.sbnation.com/texas-rangers/2012/1/17/2713845/prince-fielder-rumors-texas-rangers-yu-darvish-scott-boras So get some interesting trade chips and hopefully down the road you can get even more young talent for that mythical San Jose ballpark.
Tyler Bleszinski - January 18, 2012
Is it just me
or does it seem ridiculous for the Rangers to pop $100MM on a Japanese pitcher when they could get Prince Fielder for the same price?
I know the Rangers rotation isn’t very good, but, as the A’s know, offense beats defense.
ru155 - January 18, 2012
Beane mentioned that Smith has power
but I have trouble saying that about a hitter who only hit 15, and who played half his games at Coors Field.
OTOH the coliseum shouldn’t pose much of a problem for a hitter who tends toward hitting line drives and to all fields. Coco said it well, that hitting at the coliseum “plays true”. It’s no more or less a “pitchers’ park” than other west coast MLB parks. A true power hitter has no trouble hitting it out of the coliseum, as we most recently saw with Willingham.
OaklandSi - January 18, 2012
It is all relative... haha
dwishinsky - January 18, 2012
Good article
A few good things and one mistake. But I like that you understand the concept of “SSS” without dismissing stats instead of normalizing it using standard deviation.
The one correction in the article is regression to the mean. Regression to the mean is not applicable here as the mean might not apply to him. He may be a bad hitter vs LHP, in which case he won’t “regress” to the mean. If his true ability is higher, then he will “regress” to his mean, but that mean hasn’t been established, according to the stats at least. Players don’t necessarily regress to league average on stats that are within their control, they regress to their mean. There are some stats that would correlate higher to regressing to the mean (BABIP for example) but those are different.
closetasfan - January 18, 2012
OK, and how do you propose we determine "his mean"?
WaddellCanseco - January 18, 2012
This is the key
If you have reason to believe you know better than “the mean” then by all means use it, but…how do you “know”? I mean I’m left-handed and if I stood in against LHPs, after about 10 ABs (I’d be 0 for 10) you could reasonably regress me to a lower mean than average. But even bad MLB hitters aren’t me. So…tough to know when it’s appropriate to say “I know his mean, your honor, and it’s not the usual one!” and when you’d be better off just going with the usual mean.
Nico - January 18, 2012
That's exactly the point.
We don’t know his mean. So until we do, we have to assume that he has an average LH batter’s platoon split. Regressing to that mean is the correct thing to do.
danmerqury - January 18, 2012
Do we, though?
After “half a sufficient sample” (say, 1,000 PAs), which is a lot more than “no sample,” do we not have enough data to say “OK he may not be as bad as he’s been so far, but his mean is probably lower than average”? I don’t know; I’m really just asking the question.
Nico - January 18, 2012
That's exactly what regression to the mean is meant for.
It turns out that OBP becomes halfway reliable at 500 PAs. So let’s use that. If a guy has 500 PAs and his OBP is .290, we’d regress to the mean (.320) exactly halfway. So we’d use his 500 PAs of .290 OBP and add in 500 more PAs of league average .320 OBP to get .305. That .305 is the most probable estimate of where his true “mean” OBP is. The more we know about a guy, the less we regress, and vice versa.
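A rough Python sketch of that "pad with league average" idea, for anyone who wants to try other sample sizes. The function name and the `halfway_pa` parameter are illustrative; the 500 PA halfway point and the .320 league mean are just the numbers used in this example, and the 100 PA and 2000 PA cases are hypothetical.

```python
def regress_to_mean(observed: float, pa: int, league_mean: float,
                    halfway_pa: int = 500) -> float:
    """Blend an observed rate with league average.

    Adds `halfway_pa` plate appearances of league-average performance to the
    player's actual sample, so a sample of exactly `halfway_pa` PAs is
    regressed halfway to the mean.
    """
    return (observed * pa + league_mean * halfway_pa) / (pa + halfway_pa)

# The example above: 500 PAs of .290 OBP against a .320 league mean.
print(f"{regress_to_mean(0.290, 500, 0.320):.3f}")

# Smaller samples get pulled harder toward the mean, larger ones less so.
for pa in (100, 2000):
    print(f"{pa:>5} PA of .290 -> {regress_to_mean(0.290, pa, 0.320):.3f}")
```

The more real PAs a guy has, the less those 500 league-average PAs matter.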
danmerqury - January 18, 2012
all right, this is hard to explain
but if we’re trying to argue that his splits aren’t bad, then you can’t regress to the mean before you compare and say he’s likely going to have so and so stats if you normalize so that’s Ok, when you have stats indicating otherwise. You can say there’s a so and so possibility that its not so bad based on standard deviations away from the mean based on the sample size. Saying he’ll regress to the mean on platoon splits I think is a stretch for this argument so far though. And in some senses we’re talking about different arguments anyways.
But in general, you can normalize lucky and unlucky to the mean, but you can’t normalize good or bad to the mean. In the argument above, he can’t adjust WOBA up vs LH and down vs RH. Though, if you say his WOBA is xxx, then he’s % likely to have something close to those WOBA is OK, but you have evidence that his split is not normal.
Anyways I hope he can be a source of power, vs RH and LH. Sign.
closetasfan - January 18, 2012
Sorry, but you're wrong here.
I’m not trying to argue anything at all. I’m just putting his splits in the proper context. I have no agenda. I’m just showing the noise inherent in these measurements.
And yes, you can normalize good, bad, lucky, or unlucky to the mean. You say in your original post that you can regress BABIP to the mean, but not wOBA, and that’s false. There’s no fundamental difference between BABIP and wOBA. Now, the amount of regression needed is wildly different between the two stats, since one is mostly driven by luck and the other skill, but the same process applies completely equally between the two.
Yes, we have evidence that his split is not normal. That’s why even after the proper amount of regression, his splits are still significantly greater than the average LH platoon split. Regression just shows the most probable number, based on two things: his current splits, and the average LH platoon split for a major league baseball player. Since we can’t trust the former completely, we have to use the latter as a crutch until the former can be trusted.
danmerqury - January 18, 2012
Here's what I wonder (and I don't know, I just wonder)
Let’s say you have two hitters like Seth Smith, whose numbers vs. LHPs are both significantly below the mean. Let’s say each has an identical career line of .200/.250/.350 in 1,000 PAs and so if you just saw them on paper you would regress each halfway to the mean.
But let’s say one of them fails the “eyeball” or “evaluate why” test; perhaps you see they can’t recognize the slider time after time and they just can’t seem to make any adjustment even if they “know” what the problem is. Is it fair, in that situation, to regress that guy less than the other guy?
Nico - January 18, 2012
Oh, and here's another question for DM
(and I don’t have any idea, or supposition, about the answer to this one): Is it non-linear in that the farther someone’s stats deviate from the mean, the more likely it won’t regress as far to the mean? In other words, if Smith’s numbers against LHPs were really really bad (.140/.200/.220) would that suggest “No he really is bad” more, proportionally, than if he were a bit below the mean, say .230/.310/.360)?
Seems intuitive to me that if a guy is “a bit below the mean,” the chance of him regressing significantly towards the mean are really good, whereas if a guy is way below the mean, once the sample gets decent you’re more likely to be looking at someone who “just can’t”.
Nico - January 18, 2012
talking academically about stats
.140/.200/.220 is statistically more significant. The standard deviation would probably be higher too because its so far from the mean, but there would be less probability that he is closer to average.
Baseball uses statistics in an unacademic way. SSS can’t be thrown out in academic stats. The standard deviation is just higher and may or may not be statistically significant. Also, regressing to the mean is a bit of a statistical anomaly too. They should perform to their mean going forward, but might not gather to their mean. Though I guess I’m being an academic nitpicker. Although baseball statheads get it mostly right, there could be a bit more academic rigor towards their analysis.
closetasfan - January 18, 2012
Well, wait a sec.
There’s a difference between how sabermetricians use numbers and how they try to relate these concepts to a wider range of people. No, technically speaking, you can’t throw out any number. But it’s far easier to tell people to throw out 10 PA samples than to tell them “hey, for a 10 PA sample of .500 OBP, you need to grab a calculator and regress it 99.942% towards the mean”.
danmerqury - January 18, 2012
Yeah, it’s not fun. I have to do it a ton in April when some guy starts off 10-for-20 with 3 homers and people get upset that it slightly moves the projection needle.
D.Szymborski - January 19, 2012
If we're talking the same amount of PAs?
Then yes, it’s linear. Obviously, someone who bats .140 has a smaller chance of being good than someone who bats .230 in the same amount of PAs. But yes, it’s linear. You’d regress each proportionally based on how far they are from the mean.
danmerqury - January 18, 2012
I mean should you "regress each halfway from where they are to the mean,"
or do you start relying more on the “suckitude” of something, and project it to regress less towards the mean, as it gets farther from the norm?
Someone who bats .230 is in the heart of that bell curve, whereas someone who bats .140 is in the “more than an SD” part. And since bell curves aren’t linear, I wonder if “expected regression” might not be either.
Nico - January 18, 2012
You're more likely to regress to the mean
if you’re further from it than if you’re near to it.
But I’m using the term in the normal statistical sense, which is rather different from how it is generally used in sabermetric progressions. (And also different from the way it is erroneously used in some discussions, ie, that when a sample gets larger the result becomes more accurate. That is true, but it’s not regression.)
iglew - January 18, 2012
That's true
When your batting average is way low, a couple hits will jump it a lot whereas if it’s “average” then a couple hits won’t do much to change it.
What I’m referring to is the idea that perhaps “outliers” are more likely to be “truer results” in the way that pitchers hitting is not a “bad luck with BABIP!” phenomenon. They really just suck at hitting as a group.
Nico - January 18, 2012
That's something entirely different.
Take scouting and stats into evaluations, sure, but purely from a statistical point of view, mixing the two sounds like a bad idea. Take both separately as their own evaluation of a player. The regressed stats say X, scouts say Y. Don’t use Y to make X.
danmerqury - January 18, 2012
I'm thinking more using scouts to say,
“That’s regression…However, that isn’t regression — that’s suck!”
Nico - January 18, 2012
Very Nice Work Dan
Something to keep in mind is significance testing. One approach is to compare the difference of two means with the largest standard error of the two measurements. The standard error is defined as the standard deviation of the measurements divided by the square root of the number of trials. If the difference between the two means is larger than the largest standard error then the difference between the two measurements is considered to be at least somewhat significant. At some point a guess is made as to how significant the difference actually is. Unfortunately, that is not a simple estimate to make.
Ran - January 18, 2012
Oops, I missed this reply when I made my own below, so I’m totally wrong about the sabermetric proponent part.
On the merits, though: I’m not sure that, if I were a GM, I’d want just X and Y. I would want stats unbiased by scouting. I’d also love scouting unbiased by stats (impossible as that might be to achieve — could I order my scouting department to never look at a number?). But I think I would also strive, not instead of but in addition, for a system that actually combines those, even if that system involves subjectivity.
Maybe it’s as simple as using the scouting to pick the spot on the distribution of projected outcomes for a given player, or maybe it’s more complicated than that, but I tend to think the ultimate value is by literally blending the two ways of viewing a player.
doctawojo - January 18, 2012
I don’t think any sabermetric proponent worth their salt would object to this, and it is likely a model of what teams already do, in more or less systematic or sophisticated ways. (Though if there is any sabermetric proponent who would like to contradict me on the first point, do speak up.)
doctawojo - January 18, 2012
I love this quote
I love the reality check from Billy.
Tyler Bleszinski - January 18, 2012
So, who's it going to be?
I know he’s not done offensively.
baseballgirl - January 18, 2012
JD Drew maybe?
Geronimo Berroa - January 18, 2012
I'm thinking it will be a righty
Given the outfield right now.
Tyler Bleszinski - January 18, 2012
COJACK
BWH - January 18, 2012
Oh, I hope not...
He makes me have Valley Fever.
the_rozeboom - January 18, 2012
Digging deeper into the quote
“I don’t think” – Well, that’s not promising. Some of us have suspected this regarding the Smith trade. Good to hear an executive being honest.
“We had a pretty good Hill” – Billy is lamenting the loss of Jonah Hill, who was picked up by the Dragons of How To Train Your in the Nippon Japanese league.
“to climb offensively” – Sign that we’re bringing free agent Milton Bradley back to Oakland? Maybe as a pinch taunter?
“At least We made some progress” – Don’t know this We fellow, maybe a Korean prospect? Good to hear that at least one minor leaguer is making positive strides.
“We still have a couple Weeks left” – Did I miss us signing another of the numerous Weeks brothers? Good to hear Beane acknowledge that we have exactly zero players not named Weeks who can actually hit.
“We’ll see where IT goes” – Excellent. Sounds like we’re still striving to be on the forefront of baseball technology. Maybe this is a hint a HitFX system is coming to the O.Co?
Yep, lots of reality checks in there. Thanks for posting.
Ciderbeck - January 19, 2012
This is awesome!
And I still love my boyfriend, Seth Smith. And I wonder if maybe Billy puts the same stock in his splits; he has definitely said that Smith will not platoon. People who have watched Seth Smith for fantasy reasons ;-) have said that they think he has received a bum playing deal, and are curious to see what he could do as a regular player, against all pitchers.
I am too. I wonder if we’ll get to find out.
baseballgirl - January 18, 2012
I'm glad you're still writing, DanBot
Rec’d.
darooster - January 18, 2012
agreed
your articles are definitely appreciated
rhymeswithelephant - January 18, 2012
thirded.
67MARQUEZ - January 18, 2012
Fourthed.
And fifthed by Cindi, though she always says, “I don’t get it.”
Nico - January 18, 2012
A fifth.
Imbibed.
LoneStranger - January 18, 2012
Excellent piece, particularly the opening gambit about sample size.
It illuminates the problem of looking at home/road and lefty/righty splits in isolation and forming conclusive opinions on a player’s capabilities! Sweet!
TheQuestforMerlin - January 18, 2012
Nice explanation, Dan, but is a .342 wOBA really good enough for an average defending LF in Colorado?
Don’t we need to park adjust to move the wOBA from Colorado to Oakland? Isn’t that roughly a league average hitter once you park-adjust? Isn’t a league average hitter and league average LF something like a 1.5-2.0 WAR player?
Not only am I not seeing how that’s worth two probably crappy but potentially adequate starting pitchers, but I’m also not seeing how that’s worth much in trade.
Of course we can’t undo the trade, but now that we’re stuck with him, would it not be better to platoon him, not only to win more games, but also to boost his trade value?
WaddellCanseco - January 18, 2012
Platooning would decrease his value, because platoon players have limited value.
The only way to increase Smith’s value is to play him every day, and if he shows that he can hit lefties, then his value goes up. Otherwise, he’s worth two middling AAAA pitchers, one who’s smoke and mirrors, the other a reclamation project.
But that’s only if you are looking at Smith in terms of a flip, or deadline trade value.
I genuinely believe that Beane is primarily looking at Smith as fulfilling an immediate need – offense, and an mlb caliber COF.
jeff-athletic - January 18, 2012
If you genuinely want to win games, why not platoon him with someone who can do better
than 30% worse than league average (see Jonathan Herrera, 2009-2011)?
WaddellCanseco - January 18, 2012
It's still a bump up from league average.
Something like a wRC+ of 105 or 106 or something. That said, yeah, not great for a LFer.
danmerqury - January 18, 2012
That would have been true in 2011, but his PA included 2009-2010 as well. For example
Iannetta managed a 99 wRC+ with a .339 wOBA from 2009-2011 and Dexter Fowler a 101 wRC+ with a .342 wOBA from 2009-2011. That’s where I got the “approximately league average”.
WaddellCanseco - January 18, 2012
Hmm.
Ah, I always forget about the declining offense in recent years. You’re probably right, then.
danmerqury - January 18, 2012
to your point, Smith was anointed the most average MLB OF by FanGraphs
Billy Frijoles - January 18, 2012
The more and more I've been looking at it, the more I'm liking the trade
Smith has great OBP and very good power potential, and the article shows that his lefty/righty splits are not a big concern. Plus, Coors is not the softball power park it used to be pre-humidor. Balls do carry more due to the thin air, but Coors has huge dimensions. And Smith seems to have the kind of size/strength to be able to hit it out in any park. If Smith is an everyday, non-platoon player, he could very well be a 20 HR guy, with a decent BA and high OBP. Plus he’s known as a clutch hitter.
And while I find both Moscoso and Outman to be decent and useful, neither one is great. Both are fly ball pitchers, both have “decent” stuff. And the A’s have plenty of arms. Plus this trade shows that Beane is pretty confident with Peacock, Parker, Milone, and even Gray.
jeff-athletic - January 18, 2012
My take on the outfield
Beane is going with a platoon situation. Against righties it will be Coco, Smith, and Reddick. Vs. lefties you will see Reddick and Smith sit. Cowgill will play against lefties. So Taylor, Carter, and Allen need to step up and hit lefties to make the team. I have a feeling Gomes is next on Beane’s list. He is a lefty killer and falls into a platoon system. This will leave 6 outfielders playing for ABs. The rest of the team will round out with Weeks, Barton/Carter/Allen, Pennington, Sizemore, Suzuki, Rosales, and a backup catcher. Would leave the A’s with 12 regular players. Carter most likely DH.
Arcman - January 18, 2012
I think this could be the year Cardenas replaces Rosales
and that CoJack is a more likely target than Gomes.
Vs. RHP: Smith-Crisp-Reddick
Vs. LHP: Jackson-Crisp-Cowgill
Barton, Weeks, Pennington, Sizemore, Suzuki in the infield.
BWH - January 18, 2012
Always forget about cardenas
Not sure if he is good enough to play SS but could be an option in case of injury.
Arcman - January 18, 2012
I think if he were good enough to play SS, he'd be in the majors already
Nick - January 18, 2012
I'm sure he could play there in a pinch
I mean it’s not like Rosales is a great defender at SS. Plus Cardenas has better minor league numbers. It’s time for him.
BWH - January 18, 2012
Great work as always Dan
I do feel better about the right/left splits. I still don’t feel good about him hitting in the Coli.
I’ve heard he’s more of a gap-type hitter. Do you know if his home/road splits look better than the average Colorado player’s, worse, or average? Or is it even too small a sample to tell?
OnlybuyBeaneJerseys - January 18, 2012
Astros signed their right fielder
Jack Customer. No way Oakland is getting the top draft pick.
gambler - January 18, 2012 via mobile
He just can't get out of the AL West.
Nico - January 18, 2012
So, Jack Cust and Carlos Lee manning the outfield corners?
Is that the yes triples defense?
OkayJay81 - January 18, 2012
Brett Wallace will be in CF.
WaddellCanseco - January 18, 2012
I like that concept!
Nico - January 18, 2012
Jack Cust
Stupid smart phone
gambler - January 18, 2012 via mobile
It was funnier the other way
WaddellCanseco - January 18, 2012
I like...
the way the OF is shaping out, but still have deep concerns about the long term plan for Carter and Taylor.
Are these signings and trades Beane’s way of saying he’s given up on them?
If so, then dang those are some serious losses to cut, considering what we gave up to get Carter and Taylor…
DoomandGloom - January 18, 2012
Carter was basically free. Just Anderson and Gonzalez was plenty for Haren, who in
turn was plenty for Mulder. Taylor cost basically Gonzalez minus a shot at competing in 2009, the latter of which turned out to be almost worthless, so that’s massive.
WaddellCanseco - January 18, 2012
Don't forget that Taylor cost Wallace, as well,
who is also kind of massive.
Nick - January 18, 2012
Indeed
WaddellCanseco - January 18, 2012
Regardless
You like to see the players that were hyped as a commodity in a trade come to fruition. With the uncertainty of our future, I wanted to think Carter and Taylor would amount to something… I guess I just feel like a little kid who didn’t get what he wanted for Christmas…except that’s every damn Christmas when you root for the A’s!
DoomandGloom - January 18, 2012
Carter is the one I have the most faith in
There are still a number of roster spots up for grabs, I hope he earns one
Billy Frijoles - January 18, 2012
Funny I was trying to figure out
where the joke was in Jack Customer. I thought I was missing something. lulz
Tyler Bleszinski - January 18, 2012
I know, right?
I thought it was quite the clever nickname.
EddieVegas_NRAF - January 18, 2012
Yeah, I thought it was a new one he picked up in Seattle...
the_rozeboom - January 18, 2012
Quirin!
So you found another one to go along with Jake and Emil, huh?
iglew - January 18, 2012
Rangers sign Yu Darvish
Details to come, 5 year deal w/ 6th year player option on top of the $50MM posting fee.
Apparently they’re out on Fielder due to the signing according to MLBTR
I suppose that’s good for us – Rangers have one less bat than the Angels.
ru155 - January 18, 2012
for that price i would rather have prince
Billy Frijoles - January 18, 2012
Seth Smith
Where does he bat in the lineup? And what does our lineup look like now?
2B Weeks
CF Coco
LF Cowgill
RF Smith
1B Allen/Barton
C Suzuki
DH Carter/Kila
3B Sizemore
SS Pennington
DoomandGloom - January 18, 2012
3B
Does anyone think Adrian Cardenas is worth a look there? Is there anyone else besides Sizemore and Rosales that are going to get a look at 3B during ST?
Stephen Parker?
DoomandGloom - January 18, 2012
Did Cardenas even play 3B last year?
Wasn’t he pretty exclusively in LF or DH? I’m pretty sure the 5 games or whatever Cardenas may have played at 3B is pretty indicative that management feels that 3B is not his spot on the field.
Also, Sizemore was like the 2nd best hitter on the team last year. No reason he shouldn’t be at 3B in 2012.
sc00by - January 18, 2012
Exactly.
Billy Frijoles - January 18, 2012
Reddick?
Menechino_Incarnate - January 18, 2012
Yes, I think Cowgill has to go to the bench or minors, which makes this all the more puzzling
WaddellCanseco - January 18, 2012
Excellent article...
I learned something new, which is awesome.
RickeySteals - January 18, 2012
Fielder
so now we sign Fielder and then ….
Ah, it’s fun to dream.
Stew's Crew - January 18, 2012
Excellent stuff about sample sizes and the dangers of looking at splits
Is there anywhere to see what Smith’s platoon splits looked like in the minor leagues? It seems like he would have a lot more AB’s against lefthanders there, which could shed some more light on any potential weaknesses.
OkayJay81 - January 18, 2012
Excellent point
Course theoretically the competition in the minors is much weaker but still it could give us some insight.
Tyler Bleszinski - January 18, 2012
http://mlsplits.drivelinebaseball.com/mlsplits/playerinfo/452234
They’re not good. His OBPs were generally fine, but his power against lefties was pitiful all the way up the chain.
Clay Davenport also provides them:
http://www.claydavenport.com/ht/SMITH19820930A.shtml
doctawojo - January 18, 2012
I was going to note this as a more general point
I notice that with most players (e.g., Cust, DeJesus), the big dropoff in the LH/LHP matchups seems to be in slugging. Oftentimes, a player might maintain an ok OBP but see a big drop in slugging.
Nico - January 18, 2012
isn't that just because of what the stat is measuring?
OBP measures a hitter’s pitch selection – what to hit.
BA and SLG measure how the hitter hits it.
A hitter’s OBP split shouldn’t vary as much as BA and SLG because the hitter can pick out the pitch type independent of which hand the pitcher is throwing from. But applying the mechanics of the hit will require different skills depending on which hand the pitcher is throwing from, such as how fast a hitter can turn on an inside pitch curving away vs. curving in (which would partially be a function of which hand the ball was thrown from).
rollierollieOxenfree - January 19, 2012 via iPhone app
I'm not sure this is always true:
Case in point: Crosby and Kouz’s inability to pick out the slider was a “RHP” phenomenon. The slider is a whole different pitch when you reverse everything.
Nico - January 19, 2012
This is amusing to me, since the slider is actually the only pitch that moves generally in the same way between RHP and LHP.
Obviously, to the batter, sliders look wildly different from opposite sides, and I agree with your point. It’s just kind of amusing. Sliders generally don’t spin much.
danmerqury - January 19, 2012
Sorry, that's what I mean:
That they look wildly different to a hitter when received from opposite sides, not that they are different pitches.
Nico - January 19, 2012
No, I know.
And I totally agree. It’s just kinda funny to me.
danmerqury - January 19, 2012
Incorporating minor-league splits as a way to tackle small sample sizes
seems to be a better solution than regressing to league-average, especially when all the evidence available indicates said batter’s true talent™ vs. same-handed pitchers is NOT league average.
Qwerty75 - January 18, 2012
The thing I want to know is how splits typically change over time/level. If we don’t know that, then we don’t really know how to incorporate the minor-league data. I’m not sure if treating a split from five years ago at AA (or whatever) the same way as a split from five years ago at the major-league level is appropriate. I don’t know that it’s not, either, but I’d want to know before I started incorporating that data.
doctawojo - January 19, 2012
Yes, minor-league data wouldn't be handled the same way as major-league data,
but I would think it would be useful in compensating for limited major-league data. As you said, putting minor-league data in its proper context within the population of batters is important and is needed before it can be used alongside major-league data. I’m not a numbers guy, but if the data exists for a broad enough set of minor-league batters, I’d think the analysis could be done.
Qwerty75 - January 19, 2012
So long, and thanks for all the fish.
jeepers - January 18, 2012
Some thoughts on the regression methods
First, could you post the league average LH batter wOBA splits? I can’t find a good database to get at the particular nugget of information.
I’m surprised how big a tumble Smith’s vs. RH wOBA took (from .377 to .359) given that there’s a fairly large sample size already. How many PAs were needed to fill in the, what was that eloquent term, oh yes, “fuzzy spots” to get to your desired confidence level?
The other really useful piece of data here would be the league’s standard deviation for wOBA. I’m trying to get a sense of just how bad Smith is compared to the rest of the league. The large upward regression certainly shows that Smith’s an outlier. To get a better sense of that I looked at 2011 wOBA; a mark of .262 vs. LHP would be the 12th worst in MLB. Here’s hoping he beats those 12 guys! C’mon math!
Loved this post. SSS strikes again.
Fun fact: 13th worst wOBA vs. LHP in 2011, Coco Crisp (well below his career norm).
Ciderbeck - January 18, 2012
Pardon for going against the grain of comments,
but I’m not sure what this analysis tells us that adds to what you would judge by looking at the raw splits. So we can ballpark Smith around a .292 wOBA vs. LHP (if we assume he’s actually league-average against lefties and not as bad as the small-ish sample indicates) instead of .262. Does that lend any more weight to his case for being an everyday player? You really want to see him up against lefties instead of Cowgill, Taylor, or your minor league contract signee to be named (dare I say Conor Jackson?)? Sure, he may not be as bad as he has looked against LHP so far, but it’s likely that he sucks the big one against them, and I’m not sure the due diligence of running the numbers changes anything.
Note that I’m not taking issue with how the analysis was done or its premises, just the ultimate value of the data provided. From the warnings in the front-paged section, you would think that the ultimate conclusions of the data-crunching would tell us something that we wouldn’t have known in its absence.
Qwerty75 - January 18, 2012
What does this analysis add?
Really, not a whole lot. But the purpose of this article is pretty clearly stated at the beginning.
This is really just a nice reminder, or even an educational piece for those who may not be aware of sample size problems.
sc00by - January 19, 2012
For what it’s worth, regression to the mean doesn’t assume that Smith is actually league-average against lefties. If that were the assumption, then we wouldn’t incorporate any of his past performance data at all — we’d just project him for a league-average wOBA vs. LHP and be done with it.
(I agree with sc00by as to the answer to your main question, though. I don’t take Dan’s point to be that Seth Smith is going to be awesome, or even playable against lefties. It’s just an illustration of how we shouldn’t take splits data in relatively small samples at face value.)
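To put that in concrete terms, here is a rough sketch of how regression to the mean blends a player's own split with the league figure. It is only an illustration, not necessarily the exact method Dan used: the function name `regress_to_mean`, the 400 PA sample, the .310 league mark, and the 1000 PA stabilizer are all made-up assumptions. Only the .262 raw split comes from the discussion above.

```python
def regress_to_mean(observed: float, pa: float,
                    population_mean: float, stabilizer_pa: float) -> float:
    """Shrink an observed rate toward a population mean.

    The observed rate counts for `pa` plate appearances, and the
    population mean counts for `stabilizer_pa` extra plate appearances
    of average performance. Both weights here are hypothetical inputs.
    """
    if pa < 0 or stabilizer_pa < 0 or pa + stabilizer_pa == 0:
        raise ValueError("need non-negative weights that sum to more than zero")
    return (observed * pa + population_mean * stabilizer_pa) / (pa + stabilizer_pa)


# Hypothetical inputs: only the .262 raw split appears in the thread.
raw_woba_vs_lhp = 0.262    # observed split
sample_pa = 400            # assumed PA vs. LHP
league_lh_vs_lhp = 0.310   # assumed league mark for LH batters vs. LHP
stabilizer = 1000          # assumed amount of league-average PA to add

estimate = regress_to_mean(raw_woba_vs_lhp, sample_pa,
                           league_lh_vs_lhp, stabilizer)
print(f"regressed wOBA vs. LHP: {estimate:.3f}")
```

With a stabilizer of zero you get the raw split back, and with zero plate appearances you get the league mark. Anything in between is a blend, which is why the player's own record still counts.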
doctawojo - January 19, 2012
Thanks for the clarification.
The regression to league-average LH batters vs. LHP is done because that is what is being measured by the split. Makes sense now.
Still not any more confident about Smith supposedly playing every day, though.
Qwerty75 - January 19, 2012
Is John Q. Baseballer going to be added to the 40 man roster?
BalcoBomber - January 18, 2012
This is great
Laid out very simply and precisely. Thank you.
On a tangential topic, I just want to share one thing that drives me nuts: when a writer citing stats says something like, “He plays at Coors but he hit better on the road this year, so the park clearly isn’t helping him.” Completely incorrect. The park certainly is still helping, but the sample size is small enough that you can’t tell from the splits. Take the exact same performance (in terms of batted balls) and put it in a different park, and he would have hit even worse.
I’d be willing to bet that if you take any player with a reverse home/road split involving Coors and regress his performance properly, he will cease to have a reverse split.
laserbeams - January 19, 2012