In last week’s post, I discussed what statistical inference actually tells you and how boring and cumbersome it is to talk about accurately, so analysts often shorten the conversation so they can actually talk with real people about something interesting. Today we take a slightly different tack regarding what exactly we know. Our example for this week is Minnesota Vikings quarterback Christian Ponder.
If you’ve been reading for a while, you know that I was actually a fan of Ponder for a long time. Or, at the very least, I didn’t hate him with every fiber of my being like every other Vikings fan seemed to. I called him “not the problem in Minnesota,” pointing instead to the largely ineffective receiving corps. Before the season started, I was talking to my neighbor and said that Ponder is not the problem. We had a long and somewhat loud conversation about how I had to be wrong about him because everyone was giving up on Christian Ponder. Even Paul Allen – the radio play-by-play announcer for the Vikings, a guy who has never in his life given up on anyone in a purple jersey – had given up on Christian Ponder. When I insisted that Ponder wasn’t the problem, my neighbor ended the conversation by saying, “You’re the only guy I know saying nice things about Ponder. Is he your cousin or something?” At the time the comment made me laugh. Then the Thursday night game against the Packers happened. I had to think more about this and examine what I know and what I don’t know about Christian Ponder in particular and the game of football in general.
So why was I so adamant that Ponder wasn’t the problem? Because, for all his faults, Ponder has one singular but important ability. He is rather accurate for an NFL quarterback. He’s not superstar Peyton Manning accurate, but he can get a football into a receiver’s hands slightly better than the average NFL quarterback. And why do I care so much about accuracy and nothing else? Because it’s the only quarterback ability I’ve found at the NFL level that will predict useful outcomes. Nothing else comes back predictive. Not a quantification of arm strength, not Wonderlic scores, nothing at the combine. Nothing but accuracy predicts NFL-level outcomes.
And now we have another trap that analysts can fall into, a trap that is particularly present and meaningful for the NFL. I can’t find a predictive effect of my in-house metric that I think measures arm strength. (Let’s ignore the measurement question of “how do we know this thing is really arm strength” for now. It’s important but not where we’re going here.) So I don’t find this effect. There are a couple of possible reasons why. The first possibility is the one that brings the page views and the loud conversations: that arm strength isn’t an important thing. However, another interpretation is that the lack of data at the NFL level makes finding the effect of arm strength insanely difficult.
Think about it like this. Imagine I told you that there was gold to be found in the body of water closest to you. To me that body of water is a river, so for the rest of this example I’ll be talking about a river. But maybe for you it’s a lake or an ocean or your friend’s bathtub. Whatever. You want to find this gold because you think having gold would be better than not having gold. So you go out and buy all the equipment necessary to pan for gold. You get the sorter pieces and the dirt sucker and everything else and you go stand in the river for a few hours and try to find this gold. Now, if you stood in the same spot panning for gold for four hours and didn’t find gold, would it be reasonable for anyone to assume that I’m wrong and that there is no gold in the river?
No, it would be ridiculous to say that. Maybe you were panning in the wrong spot. Maybe the screen you were using was too big and all the gold was little and slipping through. There could be many reasons why you didn’t find gold in the river.
Analytical findings are like gold. Just because you don’t find one doesn’t mean it isn’t there. Our ability to detect an effect that really exists is called “statistical power,” and in the NFL it’s a huge problem. Power generally increases the more data we have. Think of it like this – more data makes the mesh of our gold panning screens finer. It allows us to find ever smaller nuggets of gold. In the NFL, the data is very sparse. There are only 32 teams playing 16 games each with maybe 30 passing attempts in each game. This pales in comparison to basketball’s 82 games and baseball’s 162. Compared to other sports, an effect in the NFL has to be fairly large before our screens will catch it. There is so little data coming from the NFL that it’s possible an arm-strength effect exists but there just isn’t enough data to find it.
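If you want to see the gold panning screen in action, here is a minimal simulation sketch. It is not my actual quarterback model. It assumes a made-up metric that truly correlates with an outcome at 0.2 (an illustrative number I picked, not an estimate), then counts how often a standard correlation test catches that effect. The sample size of 32 stands in for one starter per NFL team, and 120 is a hypothetical stand-in for a larger college pool. The function names `pearson_r` and `simulated_power` are mine, invented for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_power(n, true_r, trials=2000, seed=0):
    """Share of simulated samples of size n in which a real correlation
    of true_r is flagged as significant (two-sided, roughly 5% level).

    Uses the Fisher z-transform as an approximate test; this is an
    illustrative choice, not the test from any particular model.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% cutoff for a standard normal
    detections = 0
    for _ in range(trials):
        # Draw a metric x and an outcome y whose true correlation is true_r.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
              for x in xs]
        r = pearson_r(xs, ys)
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        z = math.atanh(r) * math.sqrt(n - 3)
        if abs(z) > z_crit:
            detections += 1
    return detections / trials


if __name__ == "__main__":
    # Hypothetical sample sizes: 32 (NFL-sized) vs 120 (a bigger pool).
    for n in (32, 120):
        print(n, simulated_power(n, true_r=0.2))
```

The effect is exactly the same size in both runs. The only thing that changes is how much data we pan through, and the larger sample catches the effect far more often. Setting `true_r=0.0` shows the other side: with no gold in the river, the test should only "find" something about 5% of the time.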
So, after the Thursday night Ponder debacle, I went on a quest for more power. And in football, if you want more statistical power you need to look at the college level. With many, many more teams, we suddenly have a lot more power in our data set. I spent most of my summer calculating the same arm-strength metric for every NCAA FBS-level quarterback, and I ran the same model to see if arm strength, along with accuracy, can predict useful quarterback outcomes. Lo and behold, it does (said the amazed analyst and no one else). Ponder fares very well on accuracy, but he suffers horribly on arm strength. With this lesson learned, it’s time to quit dying trying to take the Ponder hill. Ponder is a problem for the Vikings offense. One of many, many problems.