Using ICCs to Calculate the Effect of the Quarterback

Categories: NCAA FBS, Statistics
Published on: April 8, 2015

Last time I discussed the importance of correctly modeling the game you are interested in if you want to address the problem of data analysis in football. If you are new to the blog, I would suggest reading that post before reading this one. It will give you a good overview of how the analytics are built around here.

For everyone else, a quick refresher. We’re assuming that the correct model of a football offense is shown below.

Basic Offense Model

Yards gained on the field begin as a play called by the offensive play caller. They then filter down to the quarterback execution which then filters down to the wide receiver execution. Obviously there are important aspects of the offense we don’t model here, most notably the offensive line and any interactions between the stated roles of the model, but those bridges are a substantial distance down the road.

Now that we have our model, we need two things: 1) a question to answer and 2) an analytical tool that can take our question, address the realities of the model, and give us back a number for us to interpret.

The Question

I want to answer one of Bill Connelly’s 45 Reasons to Care about College Football Analytics. These are a set of questions Bill created to drive interest in analyzing college football data. Specifically, I want to begin to address question #5, quantifying how important the quarterback is to the offense. Believe it or not, we can address this question with already existing tools. The first bit of technology we need is the Better Box Score I detailed last week. Here is an example of a Better Box Score.

Better Box Score

So our question is: how important is the quarterback to the offense? We have our basic model up top and the Better Box Score to help us. Now all we need is the tool. To answer this question we will use Intraclass Correlations (ICCs). ICCs are analytical tools designed to understand similarity across sub-sets of a group. Similarity among group members can be interpreted as the effect of a common factor higher up the hierarchy – in this case the quarterback.

For example, below you see a scatterplot of the four individuals who a) played quarterback for Utah State in 2014 and b) targeted two different receivers at least five times over the course of the season. Utah State presents a nice example of this as they had so many quarterbacks and no single quarterback ran away with the team’s attempts. Receptions are on the X axis and Targets are on the Y. Each mark represents a different pass receiver targeted by that quarterback, but the same receiver could be marked on this graph multiple times if they were targeted by multiple quarterbacks. Remember that’s not a problem for us because our model says a pass exists as a connection between quarterback and receiver, not as individual performances. Notice the general patterns in the subsets.

 4-8-2015 - Scatterplot - No Circles

First, Craig Harrison is markedly different from the rest of the quarterbacks on this list. His points are concentrated in the lower right hand corner. The second thing that should be noted is how tightly clustered Darell Garretson’s completion percentage is whereas Kent Myers is more spread out. This may be easier to see if we draw ellipses around each sub-set.

 4-8-2015 - Scatterplot - With Circles

See how Darell Garretson’s ellipse is much more squished compared to Kent Myers’s? That means the receivers targeted by Darell Garretson are more similar to one another than the receivers targeted by Kent Myers are. Likely, this indicates an effect of Darell Garretson getting more consistent performances out of the pass receivers. (Note that more consistent does not necessarily mean better. One could be throwing balls into the dirt on every pass play and still be consistent.)

But we want more than just looking at graphs and guessing if they mean anything. We want to quantify if those circles are meaningfully different from one another. The ICC is a good tool to use here as it returns both a null hypothesis significance test and an effect size of what percentage of variance in a statistic is attributable to a particular “focal person,” in this case the quarterback.

I will be using the terms focal person and partner repeatedly throughout the explanation, so let’s define those terms. A “focal person” is any entity on the higher end of a two-level hierarchy and the “partner” is on the lower level of the hierarchy. So in the hierarchy of our model, the quarterback would be the “focal person” and the receivers would be the “partners.” Note that, in this case, higher up on the hierarchy does not mean “better.” It just means that, when we model the actual game, multiple wide receivers are paired with a single quarterback.

The formula I will use for the ICC is a bit different than what you might find in other sources. Psychologists, most notably David Kenny, have evolved the formula of the ICC so it better matches the questions we care about. The formula we use can be interpreted as an assessment of the similarity of the members of a sub-group. It will tell us whether or not receivers targeted by a particular quarterback are more similar to one another than they are to other random points in our data set. Therefore, using the following formula we can assess the percentage of variance explained by having the ball thrown to you by a particular quarterback.

Formula - ICC

  Where k’ = either the number of partners if group size is fixed or, if group size is variable as it is in our case,

Formula - K prime
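
For readers who cannot see the formula images, here is a sketch of the standard one-way ANOVA form of the intraclass correlation with unequal group sizes. This is my assumption about what the images showed, not a transcription of them. The symbols g (number of focal persons), n_j (number of partners for focal person j) and N (total number of partners) are introduced here for illustration.

```latex
\mathrm{ICC} = \frac{MS_{between} - MS_{within}}{MS_{between} + (k' - 1)\, MS_{within}}
\qquad
k' = \frac{N^{2} - \sum_{j=1}^{g} n_j^{2}}{N\,(g - 1)},
\quad N = \sum_{j=1}^{g} n_j
```

When every focal person has the same number of partners k, the expression for k’ reduces to k, which matches the fixed group size case described above.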

  To calculate our ICCs we first need to choose a dependent variable. I will focus on yards gained (rather than the completion percentage that I showed above). As a teaching example, let’s first calculate the ICC for our Utah State quarterbacks. Here are the data that I have that we’ll be using.

BetterBoxScore - Utah State - 2014

To calculate the ICC, we first run a univariate ANOVA on yards with Quarterback (the focal person) as the independent variable. This returns our Between-subjects and within-subjects variance. In this case those numbers are

Formula - MSbetween MSwithin for Utah State

Plugging those numbers into our formula above, along with calculating k’, gives us an ICC of

Formula - ICC for Utah State

This means that on Utah State during the 2014 season, 18.7% of the variance in passing yards gained can be attributed to the quarterback. Now let’s do this same thing for the entire league, but we have one final wrinkle to overcome, the fact that we have nested hierarchies – receivers within quarterbacks within teams.
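
The calculation just described (a one-way ANOVA with the focal person as the grouping factor, then the ICC from the two mean squares and k’) can be sketched in a few lines of Python. This is a minimal sketch assuming the standard one-way ANOVA estimator. The function name `icc_one_way` and the sample numbers are hypothetical and are not the Utah State data.

```python
from typing import Dict, List


def icc_one_way(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Intraclass correlation from a one-way ANOVA with unequal group sizes.

    `groups` maps each focal person (e.g. a quarterback) to the scores of
    their partners (e.g. yards gained by each receiver they targeted).
    """
    sizes = [len(scores) for scores in groups.values()]
    g = len(groups)          # number of focal persons
    n_total = sum(sizes)     # total number of partners
    if g < 2 or min(sizes) < 1 or n_total <= g:
        raise ValueError("need at least two groups and more partners than groups")

    grand_mean = sum(sum(scores) for scores in groups.values()) / n_total
    means = {name: sum(scores) / len(scores) for name, scores in groups.items()}

    # Sums of squares between and within focal persons.
    ss_between = sum(len(scores) * (means[name] - grand_mean) ** 2
                     for name, scores in groups.items())
    ss_within = sum((x - means[name]) ** 2
                    for name, scores in groups.items() for x in scores)

    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)

    # k' adjusts for variable group size; it equals k when all groups have k members.
    k_prime = (n_total ** 2 - sum(n * n for n in sizes)) / (n_total * (g - 1))

    denominator = ms_between + (k_prime - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    icc = (ms_between - ms_within) / denominator

    return {"ms_between": ms_between, "ms_within": ms_within,
            "k_prime": k_prime, "icc": icc}


# Made-up numbers purely for illustration (not real quarterbacks or yards).
example = icc_one_way({"QB A": [1.0, 3.0], "QB B": [5.0, 7.0]})
unequal = icc_one_way({"QB A": [1.0, 3.0], "QB B": [5.0, 7.0, 9.0]})
```

With a real Better Box Score, each list would hold one value per quarterback-receiver connection, and the returned `icc` would be read as the share of variance attributable to the focal person.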

To tease all this nonsense apart, we’re going to start at the very top of our hierarchy. Team will be our focal person and quarterback-receiver connections will be the partners. We need to enter more than one season’s worth of data into this analysis because we need to be sure that every team has at least two of the next level down in the hierarchy, in other words quarterbacks. Because of the NCAA’s eligibility rules, this means we need to have at least six seasons of data to guarantee this criterion is met for every single team. So we have data from 2009-2014 in the data set.


Calculating out the Mean Square (MS) between and MS within (a.k.a. between and within groups variance respectively) gives us the following.

Formula - ICC for teams

So 3.1% of the variance in yards can be attributed to the team. This would be anything that is common among all receivers and quarterbacks, so things like the offensive system, facilities, average offensive line ability, average relative defense strength played against, etc.


Now we run the same analysis on the same data but now we change the focal person from team to quarterbacks. Running this analysis gets us the following result.

Formula - ICC for teams and qbs

This result tells us that 6.4% of the variance is attributable to…what? It’s not directly true that this result explains everything about the quarterback only. Instead it says 6.4% of the variance is attributable to everything that is held in common among the partners, which would be quarterbacks but would also include play callers, facilities, etc. So, we need to do a simple subtraction here to get a pure quarterback metric.

Formula - ICC for qbs

And there’s our answer. Quarterbacks in NCAA FBS football have 3.3% of the variance in passing yards attributed directly to them. I also find it very interesting that knowing who the quarterback on a team is will explain almost exactly as much of the variance in passing yards gained as knowing who the play caller is.
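
The subtraction step can be written out explicitly. This is a small sketch using only the two percentages reported above. The function name `unique_share` is hypothetical, and it assumes the team-level share is fully contained in the quarterback-level share, as the nested model implies.

```python
def unique_share(lower_level_pct: float, higher_level_pct: float) -> float:
    """Variance share unique to the lower level of a nested hierarchy.

    The lower-level ICC (quarterback) still contains everything shared at the
    higher level (team), so the higher-level share is subtracted out.
    """
    if lower_level_pct < higher_level_pct:
        raise ValueError("lower-level share should not be smaller than the higher-level share")
    return round(lower_level_pct - higher_level_pct, 1)


icc_team_pct = 3.1          # focal person = team (proxy for the play caller)
icc_team_and_qb_pct = 6.4   # focal person = quarterback, team effects included

icc_qb_only_pct = unique_share(icc_team_and_qb_pct, icc_team_pct)
```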

Building a Better Box Score

What is the most common line present in any analytics-based article/post/discussion of the game of football? My answer is some version of “Analytics in football are more difficult/impossible/can’t be done because of how interactive and interdependent the members of a football team are compared to other sports.” Sometimes that basic line is flavored differently, depending on the particular tone of the piece, but that line always seems to be there. It’s a seemingly necessary caveat of the genre. “Of course we can’t know everything about Quarterback X because so much of being a quarterback depends on [insert whatever point one wants to make about interactivity on football teams].”

Why do we simply talk about that problem? Why do we constantly talk about the interdependence problem but never fix it? Can it actually be fixed and what would such a solution look like?

The Football Box score – A Model by Ralph Wiggum

To start, I’d like to spend some time developing the idea of why dealing with the interdependence problem is so difficult in football. After all, other industries deal with a similar problem. The entire job of management is interactive and interdependent by nature, yet we can still figure out who the good managers and the poor managers are. Why do we have such a difficult time in football? I argue it’s because of the modern football box score, a ubiquitous, pervasive summary of the events on a football field that horribly misrepresents the realities of the game.

A good box score acts like a description of the events of a game, the same way that the model of the solar system you built in middle school acts as a basic description of how the solar system works, how it is set up, and how each component generally relates to the others. Your solar system model was also simplified. You didn’t put all of Jupiter’s 67 moons in your model. You probably didn’t put in the asteroid belt or the daily spin of each planet to an accurate degree (kudos to you if you did), and that’s okay. The purpose of any model is to get the major elements of the system correct while simplifying or eliminating the less important elements. And this is where the box score of a football game falls down dramatically. If a football box score were a model of the solar system, it would be a model created by Ralph Wiggum. This is true to some extent of the entire box score, but I’m going to focus mostly on the passing statistics element as it is the worst offender.

The passing yards box score for Florida State in their blowout loss to Oregon in the first round of the college football playoffs looks something like this.

Passing Yards

Jameis Winston, 29/45, 348 yards, 1 TD, 1 INT

Sean Maguire, 0/3, 0 yards, 0 TD, 0 INT

Receiving Yards

Travis Rudolph, 6 rec., 96 yards, 1 TD

Jesus Wilson, 5 rec., 72 yards, 0 TD

Karlos Williams, 5 rec., 59 yards, 0 TD

Rashad Greene, 6 rec., 69 yards, 0 TD

Dalvin Cook, 3 rec., 24 yards, 0 TD

Ermon Lane, 2 rec., 22 yards, 0 TD

Freddie Stevenson, 1 rec., 12 yards, 0 TD


There are lots of problems with how this data is presented, but let’s focus on two.

Problem #1 – Data Redundancy

The problem that upsets me the most about a football box score is needless redundancy. Every single yard gained by a forward pass is counted twice – once for the quarterback and once for the receiver. This is a problem, a big problem, because it means the data presented here do not reflect what actually happens in a football game. You don’t complete a forward pass and then immediately mark off the gained yardage again. However, in a football box score, because we call the events different things – completions vs. receptions and passing vs. receiving yards – suddenly it becomes okay to double count every single event in the passing game except attempts and interceptions. But they are the exact same event. Jameis Winston’s completion is Rashad Greene’s reception. One cannot happen without the other. Winston earns passing yards and Greene earns receiving yards on the exact same yards gained. We’re not modeling what actually happened in the game. We’re modeling a way to give credit in the most individualized way possible. However, as every single football analytics article will tell you, football is an interactive game. If the game is, in reality, interactive, why do we assign credit for the events in this individualized manner?

Problem #2 – Loss of Information

I think it’s rather ironic that a football box score has so much information redundancy, but it explicitly removes an important piece of information that would allow us to complete some very important analyses on team performance in football.

As an example of the loss of information, let’s look at a different example, this time from Oregon’s Week 1 win over the University of South Dakota. Early season games are useful for this example because it is very likely that the backup of the “high power” football program will spend a great deal of time in the game.

Passing Yards

Marcus Mariota, 14/20, 267 yards, 3 TDs, 0 INTs

Jeff Lockie, 11/12, 113 yards, 1 TD, 0 INTs

Receiving Yards

Byron Marshall, 8 rec., 138 yards, 2 TDs

Darren Carrington, 4 rec., 68 yards, 0 TDs

Dwayne Stanford, 1 rec., 62 yards, 1 TD

Pharaoh Brown, 2 rec., 32 yards, 1 TD

Johnny Mundt, 2 rec., 29 yards, 0 TDs

Keanon Lowe, 1 rec., 18 yards, 0 TDs

Royce Freeman, 1 rec., 11 yards, 0 TDs

Thomas Tyner, 3 rec., 8 yards, 0 TDs

Charles Nelson, 1 rec., 8 yards, 0 TDs

Devon Allen, 1 rec., 5 yards, 0 TDs

Johnathan Loyd, 1 rec., 1 yards, 0 TDs


Here we have two quarterbacks that completed a similar number of passes over the course of the game, but Marcus Mariota gained more than double the passing yards compared to Jeff Lockie. How was that accomplished? Which receivers gained all those yards for Marcus Mariota? Who caught those passes from Mariota to gain so many passing yards for him? Was it because Dwayne Stanford caught one long pass for a touchdown from Mariota? Or did Jeff Lockie complete that pass and only got 51 yards from the other 10 passes he completed? Who was the intended receiver on the six attempts that Mariota did not complete? How many times was any receiver an intended receiver, but did not complete the catch? Did Mariota target the same receiver over and over with no results? Or are his unsuccessful attempts scattered all over the place? We don’t know the answer to any of these questions from the box score description of this particular game. Answering these questions is of critical importance to a better understanding of the game of football.
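
Both problems can be checked directly against the Oregon box score above. The sketch below uses only the numbers listed in this post; the variable names are mine. The passing and receiving tables add up to the same totals, so the second table carries no new yardage, and nothing in either table links a receiver’s line to a particular quarterback.

```python
# (completions, attempts, yards) per quarterback, from the box score above.
passing = {
    "Marcus Mariota": (14, 20, 267),
    "Jeff Lockie": (11, 12, 113),
}

# (receptions, yards) per receiver, from the box score above.
receiving = {
    "Byron Marshall": (8, 138),
    "Darren Carrington": (4, 68),
    "Dwayne Stanford": (1, 62),
    "Pharaoh Brown": (2, 32),
    "Johnny Mundt": (2, 29),
    "Keanon Lowe": (1, 18),
    "Royce Freeman": (1, 11),
    "Thomas Tyner": (3, 8),
    "Charles Nelson": (1, 8),
    "Devon Allen": (1, 5),
    "Johnathan Loyd": (1, 1),
}

total_completions = sum(comp for comp, _, _ in passing.values())
total_pass_yards = sum(yards for _, _, yards in passing.values())
total_receptions = sum(rec for rec, _ in receiving.values())
total_rec_yards = sum(yards for _, yards in receiving.values())

# Problem #1: every completion and every yard appears in both tables.
redundant = (total_completions == total_receptions
             and total_pass_yards == total_rec_yards)

# Problem #2: incompletions have no receiver attached at all.
unattributed_targets = sum(att - comp for comp, att, _ in passing.values())
```

The totals tell us the tables are two views of the same events, but there is no way to recover from them which quarterback threw to which receiver.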

Our first step to scientifically understanding football is to build a better box score. The question is, what would we want the box score to represent?

Begin with a Model

Before we can build a tool to aggregate data, we need to have a decent idea of what data we want and why we want it. We need to start with a theory of how a football offense works. This is our “model of the solar system” as it were. What things in a football offense are important and what things should we avoid for right now? Here is a figure of my current theory of a football offense.

Basic Model of a Football Passing Offense



This very basic theory says that a football offense begins with the offensive play caller. The play call made then filters down to whichever quarterback is currently in the game, and the ability of the quarterback then filters down to the receivers. Note that we could also include pass catching tight ends and running backs in this model.

Now, I’ve simplified the passing offense to a great extent in this model. Most notably, I’ve removed the impact of the offensive line here. That is by design, but not because I think that the offensive line is unimportant. Instead, I think the offensive line is like adding in the daily spin of the planets to your solar system model. Adding it in becomes incredibly complex and will probably take some specialized data that is not publicly available. For right now, we will keep the offensive line out of our model. Anything else that’s left out of this model we will consider as having such a limited effect that it won’t change our understanding to a large enough extent that we need to account for it.

A Better Box Score

Now that we have our model, we can adjust our box score so that it reflects the important elements inherent in our model. The most important element of the theory says that we must count passes as a single event and account for the coach that called the play, the quarterback that threw the pass, and the receiver that caught it. Such a thing isn’t terribly difficult to create. You can see an example of one below. In this example, I am allowing team to stand in as a proxy for the play caller.
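
One way to picture such a box score as data is a table with one row per team, quarterback and receiver connection, built from individual pass events. The sketch below is only an illustration under my own assumptions. The names `PassEvent`, `Connection` and `better_box_score` are hypothetical, and the sample events are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PassEvent:
    """A single pass attempt, counted once."""
    team: str          # stands in as a proxy for the play caller
    quarterback: str
    receiver: str      # the intended receiver, even on an incompletion
    complete: bool
    yards: int


@dataclass
class Connection:
    """Running totals for one team / quarterback / receiver connection."""
    targets: int = 0
    receptions: int = 0
    yards: int = 0


def better_box_score(events: List[PassEvent]) -> Dict[Tuple[str, str, str], Connection]:
    table: Dict[Tuple[str, str, str], Connection] = defaultdict(Connection)
    for event in events:
        row = table[(event.team, event.quarterback, event.receiver)]
        row.targets += 1
        if event.complete:
            row.receptions += 1
            row.yards += event.yards
    return dict(table)


# Made-up events purely for illustration.
events = [
    PassEvent("Team X", "QB 1", "WR A", True, 12),
    PassEvent("Team X", "QB 1", "WR A", False, 0),
    PassEvent("Team X", "QB 1", "WR B", True, 30),
    PassEvent("Team X", "QB 2", "WR A", True, 7),
]
box = better_box_score(events)
```

Because each pass is stored once with all three roles attached, yards are never double counted and incompletions keep their intended receiver.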



Now all we need is a way to analyze data that fits our theoretical understanding and the data that our better box score is collecting.  Fortunately, such a way exists. Next week.

Are You My Mother?

Categories: General Info
Published on: March 25, 2015

One of my daughter’s favorite books is Are You My Mother? by P.D. Eastman. In the book, a baby bird hatches while its mother is away finding food. The bird decides to leave the nest in search of its mother, all along not being sure exactly who his mother is or what she looks like. He has several ideas, attempting to connect with a kitten, a dog, a cow, and an airplane until he is finally returned to his nest by a hydraulic digger (it’s a very cute story) just before the mother returns to the nest with a worm.

For a while, I’ve felt like my writing on this blog has been akin to that bird and its quest. I feel like I’m trying to connect with different styles of writing, but those styles simply are not me. I’ve tried being irreverent, I’ve tried touting my knowledge without discussing methods, but the words ring hollow in my ears. Last week gave me the opportunity to actually sit and think for a few minutes (but only a few) about what I value and what I really want this blog to be.

I thought about the times I felt valued and like I made some small contribution to a discussion on the internet. The last time that happened was the night everyone was talking about #TheDress. Half the people in my department are vision researchers, so I’ve been subjected to a lot of vision presentations in my time. I had just finished teaching the section on luminance perception in my Intro class and I had something to say. I didn’t say much, but people seemed to find value in it and it felt very good.

I read Bill Connelly’s article about the important, unanswered questions in college football. It’s a good piece designed to spark a search for answers. The interesting thing for me is that the answers to two of the first five questions are sitting on my hard drive right now. Understanding how to answer them isn’t particularly complicated or inventive. It just takes finding the right tools. Tools I know a great deal about due to 10 years of studying interacting, interdependent groups and teams. Tools most other people don’t have because hardly anybody spends 10 years studying interacting teams unless one already has tenure (not me) or gets phenomenally lucky in employment (there I am).

I also thought about times when I didn’t feel right about my whole enterprise of data analysis of football, that being the Patriots deflated ball controversy. That was a weird time to be interested in data analysis of football because someone that isn’t particularly interested in reproducible, publicly available analysis attempted to answer a question. That question wasn’t answered particularly well, and the whole thing really spiraled out of control. However, it did get me thinking of the notion of credibility and how truly important credibility is to me. The guy that answered the question initially isn’t particularly credible, but then I had to confront the question of how much outward facing credibility I have given that we both hide how we arrive at the conclusions we do.

I thought about my academic area as a whole, and how much of a credibility hit it’s taken lately. I’m a social psychologist. You might not be aware of this, but for the last 3-4 years, social psychology has taken something of a walk off the credibility cliff in academic circles. Reports of outright falsification of data[link], poor data management, and “massaging” of results have become too numerous to mention them all. Frankly, it’s been very depressing to be a social psychologist these last few years. I’ve survived by teaching, but I haven’t been able to get excited about research the way I used to. I have a dissertation that was nominated for (but didn’t win) an international award just sitting, unpublished, because I can’t force myself to get it out the door. I’d never really connected my publication struggles to credibility before, but upon reflection I think that may be a major cause, if not the primary one.

So here is where we’re at.

  • Scientific credibility is important to me, both personally and for the field of social psychology and the sub-specialization of interacting groups that I work in
  • I need to do research to keep my day job
  • I am struggling to do research because I have issues with the credibility of certain elements of my field
  • In my own internet space, I am not acting in a way that brings about scientific credibility either to myself or to my chosen field

This is a contradiction I cannot have, and the solution is to change the approach. I’m going to continue publishing here in an attempt to rebuild my own sense of credibility. Hopefully, this will rebuild my taste for research in my day job with the ultimate goal of gaining some job security for once in my life. Starting next week, I’m opening up the kitchen [link] to show everyone how I’m building everything I’ve built here. I’ll be discussing science as it relates to interactive, interdependent teams in the context of professional football. Hopefully, you’ll be able to apply some of what I say to contexts beyond football. I’m going to be spending a lot of time covering techniques and methods used to study interacting teams as they are reasonably uncommon methods of data analysis. I’ll be talking in a very scientific, academic tone with how I approach these concepts as that’s what I am.

This has been my realization. I am not a scout or a journalist. I am not a gambler or a tout, or an information broker. I am a scientist.  Come back next week and we’ll learn something together.

It’s Manziel Time

Published on: December 9, 2014

It’s Manziel time in Cleveland. I’ve discussed what I think of Cleveland in other posts. The short story is that I think the decisions of the Cleveland Browns front office resemble those of a robot with a combination of dissociative identity disorder and schizophrenia. Once again we get to examine an important decision in Cleveland: the benching of Brian Hoyer in favor of Johnny Manziel. First, the facts.

Fact #1: Brian Hoyer has not been good this year

Not been good is putting it very mildly. Worst in the NFL is more correct. No quarterback has been allowed to be as inaccurate – 9.4% below league average – as Hoyer and still throw so many passes – 387 attempts. His inaccuracy is on par with Zach Mettenberger, Mike Glennon, Drew Stanton, and Ryan Mallett, all quarterbacks that began the year as back-ups and are unlikely to find permanent starting jobs any time soon. All in all, I don’t think much of Hoyer’s individual performance this year.

Fact #2: It is December and the Cleveland Browns have a winning record

At the time of this writing, Cleveland has a 7-6 record in the winningest division in football. By my quick reckoning, this has happened four other times in the last twenty years. Why does it matter if Brian Hoyer has been the worst quarterback in the league up to this point? You have won seven games with the worst quarterback, something must be working. Cleveland’s pass defense is coming up big week after week, they’ve got a group of receivers that showed they can spar with the best of them, and their running game has shown signs of life throughout the season. You got to seven wins doing something right. Why are you throwing that away now?

Fact #3: We don’t know what Manziel will do over three games

No one has any idea what Johnny Manziel will do during the final three games of the season. I have a prediction of Manziel’s quarterback rating of 76.6, but that’s over four years not three games. What is the purpose of upsetting everything that the Browns have built over the course of the season on the scant hope that Manziel can help you out? No one has any algorithm that says Manziel will be any better or worse than Hoyer over these final three games. And it is very possible that Manziel could be worse.

Fact #4: Coaching tenure in Cleveland

Perhaps the number that actually explains what is going on here. The average tenure over the last three head coaches in Cleveland has been a glorious 1.67 years. Mike Pettine likely doesn’t want to lower that average even further than it already is. He knows that sticking with Hoyer will likely get him a .500 team. Perhaps he sees the writing on the wall and knows that .500 won’t be enough to save his job. I don’t know that is the case, but I wouldn’t put it past the Cleveland management.


Analytics in Washington

Published on: December 2, 2014

Tony Kornheiser made some interesting statements on the radio the other day regarding what the organization in Washington D.C. should do about their terrible football team. You can read a more in depth piece about those comments here, but the main thrust of the situation was that Washington should start looking at analytics to improve their player selection process.

On the surface, I agree with this position. I am a firm believer in using useful mathematics to improve decision making processes. I started this blog in an attempt to inform people that we can develop individual player analytics in football and predict something about team performance from them.

However, in my humble opinion, Washington will not be the place that the football analytic revolution begins. Mostly I think that because of the actions of the team’s owner, Daniel Snyder. Before I say what I’m about to say, I want you to know that I have no intimate knowledge of Daniel Snyder. I’ve never even met the man. But I can observe his behavior and his public facing behavior leads me to believe something very important about Daniel Snyder and how he may see the world.

It is my impression that Daniel Snyder loves certainty. His behavior seems to follow a profile of X will do Y, Y will do Z, we want Z so therefore let’s go get X. He made a tremendous amount of money entirely on this principle. He had a small company that wanted to do something slightly different, he would go out and acquire another company that did that specific thing, and incorporate it into the original company’s machinery. He’s made his entire living on being able to understand the needs of his organization and then trusting that the assets he spends a tremendous amount of money to acquire will become worth more than what he originally paid for them.

Don’t get me wrong, Snyder’s certainty has served him well in the contexts he was in when he made his billions. Certainty can be a good thing for business leaders, mostly because it allows them to remain leaders. Humans are really bad at differentiating confidence from competence. Projecting certainty creates an environment where people will follow you. So, in some contexts, having lots of certainty makes a lot of sense. The only problem is, football is not one of those contexts.

It’s like my colleague who had difficulty driving on ice. She was originally from California and went to college in Arizona, so she never had to learn about what driving on icy roads is like. In addition, the person who taught her how to drive was a stock car driver. She was taught that to make the most efficient turns, you steer into the bottom of the turn, accelerate quickly through the bottom of the turn, and then steer out on the high side of the corner. And that works well for getting through turns quickly and efficiently. In addition, living in California and Arizona means you never have to confront the fundamental assumption that driving in such a way rests on – traction. If there’s no traction – like say when you’re driving on ice – one must drive in a completely different, opposite way. Accelerating through the bottom of a turn is a really good way to spin your wheels and wipe out. Instead, you have to always ensure that changes in direction never coincide with changes in speed. You can do one, but not the other. In my mind this is what Snyder is doing. He is taking a method that has always worked in the past, applying it to a different context, not recognizing that the underlying reality is different, and wiping out on the ice that is the process of building an NFL team.

You can see this in the future draft capital he gave away to move up to the #2 pick in the 2012 draft to get RG3. Generally speaking, it is a really bad idea to give away future draft picks to move up.  But such a strategy does make sense in a particular light, the light of certainty. If you absolutely feel like you know that this one particular player is going to work out, then it makes every bit of sense to act as Snyder did in 2012. Unfortunately, in football, having such certainty is disconnected from reality. The dirty little secret about player evaluation in football is that nobody knows who’s going to be good or bad. There are too many things to take into account. The amount of error in prediction is so astounding that no human brain can comprehend it. The best mathematical model I can create was accounting for about 15% of what makes a good NFL quarterback at last check. The reality of the NFL says that the way to create a winning team is to stockpile draft picks, evaluate everyone as if they were all drafted in the same round, and repeatedly draft multiple players at the same position (i.e. at least try out a new quarterback every single year).

The mindset you need to build a football team analytically is a mindset of uncertainty. You must accept the general premise that no one knows anything about anyone, the best models will get you 15% of the way to where you need to be, and you need to put yourself in a position to make luck work for you. I do not believe an individual like Snyder – a self-made billionaire used to projecting certainty from a leadership position – would value these qualities. There’s already the story of the economist that Washington hired to do analytic research for them in 2006 who quit after seven weeks of being marginalized in the organization.

The analytic revolution in football is coming quietly. The teams that end up doing analytics very well are not going to make a big splash about it. The Seahawks and Packers come to mind as teams that, I believe, are on the forefront of the football analytics movement but are not saying a public word about it. Washington is simply not the place where people will be quiet about a new idea. And, ultimately, talking up a new analytics department or bringing in some fresh-faced savior with their fancy mathematical model while demanding mechanistic links between actions and outcome will result in utter failure of the analytics process. If Washington brought someone like me into the organization, I feel like Snyder would demand I hit the gas at the bottom of the turn. And while I might not be certain about much in football, I am certain about this. Either I’d have to jump from the car or we’d both wipe out together.

Wide Receivers are a Pain to Evaluate: Part I

My original plan for a post this week was to talk about college football wide receivers and who I think is having the most productive season in college football right now.  But I ran into a problem.  The problem is that wide receivers are a giant pain to evaluate.  In fact, it gets incredibly frustrating to evaluate and project the performance of wide receivers.

Dependent Variables

The first question is what to use as an evaluation metric.  Before you can begin to predict “performance” in any useful way, one has to settle on what “performance” means.  Do you want yards, touchdowns, yards per reception, yards per target, touchdowns per target, fantasy points, what?

The question of dependent variable is crucial to understanding every analysis that comes afterward.  All the conclusions drawn from analyses done will only be relevant to the dependent variable that one chooses.  Therefore, it is crucial that one choose the “right” dependent variable.  Sadly, there is little consensus regarding what the right variable is to use to evaluate wide receivers.  So you’re stuck having to simply pick one.  I pick a witches-brewed version of yards per target. It works for me, but it may not work for you.

Depth of Target

Once you’ve “solved” your dependent variable problem, you run headlong into another one.  Generally, whatever DV you choose is somehow correlated with depth of target.  Wide receivers that get thrown to farther down the field rack up more yards, generally get more touchdowns (that one is a bit tenuous, but I digress), have more yards per reception, and more yards per target.  So now you’re stuck with a problem of understanding how the wide receiver fits into the offensive system regarding average depth per target.  Does this receiver have low yards per target because they are not a particularly good receiver or because they are consistently being asked to be a “chain-mover” out of the slot?  This calculation is impossible in some circumstances and tricky even with witches-brewed data.

Small Effect Size

You know what actually predicts production at receiver? Targets. End. This is a graphic I made showing the relationship between targets and yards in college football.

Targets account for around 75-80% of the variance in yards, which means that there isn’t much variance left for differences in ability to do any work.  You could have a pretty decent receiver buried on a roster and they won’t look like much at all *cough cough Jarius Wright cough cough*.  And the reverse is also true.  A relatively poor receiver could get a lot of targets and look like a golden god.
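For anyone who wants to see what "accounts for X% of the variance" means mechanically, here is a minimal sketch. It is not my actual data or model. The receivers are simulated, and every number in `simulate_receivers` (the target range, the yards-per-target ability, the noise) is a made-up assumption chosen only so that targets explain most of the variance in yards.

```python
import random


def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # For a one-predictor linear fit, R^2 is the squared correlation.
    return (sxy * sxy) / (sxx * syy)


def simulate_receivers(n_receivers=500, seed=1):
    """Build hypothetical (targets, yards) seasons. All parameters are assumptions."""
    rng = random.Random(seed)
    targets, yards = [], []
    for _ in range(n_receivers):
        t = rng.randint(5, 120)          # hypothetical season target count
        ability = rng.gauss(8.0, 1.5)    # hypothetical "true" yards per target
        noise = rng.gauss(0, 60)         # everything else (scheme, luck, ...)
        targets.append(t)
        yards.append(max(0.0, t * ability + noise))
    return targets, yards


if __name__ == "__main__":
    targets, yards = simulate_receivers()
    print(f"Share of variance in yards explained by targets: {r_squared(targets, yards):.2f}")
```

Even though every simulated receiver has a real difference in ability baked in, the raw yardage totals are dominated by how often each receiver gets thrown to. That is the frustration in a nutshell.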

So, I wanted to post about wide receivers.  I ended up getting frustrated at the position and writing about my frustration.

Quarterback Carousel

First off, happy Veterans Day, Grandpas. I miss you both.

Now to football. The quarterback carousel continues to spin. Everyone wants to know about how their team will do now that the new quarterback is under center. I’m going to look at three teams this week, the Eagles, Cardinals, and Texans and discuss the probable futures of each team.


I am not worried one bit about the Eagles. I’m writing this after the Monday night where Sanchez went crazy, but don’t think I’m overreacting to one game here. Sanchez currently has a crazy high completion percentage for him. I’m fully expecting him to regress. Mark Sanchez has consistently been 4-6% below league average in terms of completion percentage depending on the year. We shouldn’t expect that he suddenly learned some profound bit of information about how to complete more passes. We should expect that he will return to his 4-6% below league average completion percentage over the course of the rest of the time he’s a starter in Philadelphia. But, you know who else was about 4-6% below league average completion percentage? Nick Foles. Honestly, I don’t see Sanchez being a detriment to the Eagles offense. I think Chip Kelly has a plan and that he is nothing if not adaptable. Philly will get through this leaning on their defense and their receivers.


The Cardinals are going to have more of a problem. The drop from Palmer to Stanton is going to be a much bigger drop than the drop from Foles to Sanchez. Stanton is consistently 3-4% poorer in completion percentage and he’s also not as effective throwing the ball down field. Palmer is generally league average in down field throws, but Stanton is more like bottom of the league in that category. Couple that with the fact that the Cardinals have really been punching above their weight up to this point in the season and you have a situation ripe for regression. The Cardinals should be very very worried.


The Houston Texans are the wild card. I did not see a quarterback switch coming for them. I’m not saying that Fitzpatrick is great. I know I predicted him to be a wildcard to lead the league in passing yards this year, but that prediction wasn’t as much based on him as it was everyone else around him. I believe I called him “serviceable” at the beginning of the year. And I still stand behind that assessment. He’s a little below average in completion percentage, but in an offense that’s more about throwing downfield than the average team we would expect that. Hopkins seems to be having a reasonable season and the team has already won twice as many games as it did last season.

Which is why I was very surprised to see that they’ll be going with Ryan Mallett for the foreseeable future. What exactly can Mallett offer you that Fitzpatrick can’t? Mallett has never started an NFL game and has thrown a total of 4 passes in an NFL game, one of which was intercepted. What can we even know about him?

Actually, we can know something about him since he played his college career recently enough to be part of my data set. I’ve even got a prediction about his career passer rating after four years in the banner up top (2011 draft class). The prediction in the banner is for a passer rating around 65. However, that prediction was based on a model I call “Mk. I” (Everyone seems to name their models and I’m an Iron Man fan). That model worked, but was based on Linear Regression and a data set that wasn’t as expertly cleaned as it could be.

Here’s what we learn about Ryan Mallett. I have a measure of college arm strength that helps differentiate quarterbacks. Mallett has the fourth highest score on that metric in a dataset that goes back to 2007. The three above him are Robert Griffin III, Andrew Luck, and Russell Wilson. However, arm strength is the icing on the cake of an effective quarterback, not the cake itself. The cake of effective quarterbacking is accuracy and in that category, Mallett falls woefully short compared to the three other quarterbacks mentioned. When Mallett actually completes a pass, it goes for a long long ways. But he has tremendous trouble actually completing those passes. Basically, I see Mallett as Zach Mettenberger amplified. He’s got a cannon arm, but no ability to control it. The “Mk. III” model predicts his passer rating to be somewhere around 71. I think Fitzpatrick might be able to do a slight bit better.

What Exactly Do We Know Part III: The Nightmare Edition

Categories: Decision Making, NFL
Tags: No Tags
Comments: No Comments
Published on: October 28, 2014

This will be the final installment in my “What Exactly Do We Know” series. I think I’ve beat this horse enough that it’s about to fall over. But I need to talk about one final aspect of statistical reasoning and knowledge derived from data analysis. And this one is the creeping horror that should keep us all up at night. At the very least, this would keep me up at night if I were advising an actual NFL team. I’m going to begin explaining the horror by having you imagine a job interview.

Actually, I’m going to ask you to imagine the lack of a job interview. How would you feel if you knew you were a top three candidate for a job, but the company called you one day saying “We don’t do interviews. We’ve already made our selection and we selected someone else because they had a higher college GPA.” When I ask my students to imagine this scenario, they say they’d be annoyed. They talk about how a particular score on a test doesn’t define them and if only they could get an interview they could prove their abilities and their worth.

However, if you’re trying to assess abilities and skills, evaluating on college GPA is actually the best way to get the skills and abilities you’re interested in. In fact, trying to assess abilities and skills with an unstructured conversation is one of the best ways to introduce unintended and significant bias into your decision making process. Most large, modern organizations don’t even use a conversation-style interview to assess skills anymore. Conversation-style interviews are done to only answer whether the person interviewing you could stand working with you for a day. But I digress.

I bring up job interviews because they are a fascinating point of the employee selection process. If used in the old let’s-chat-for-20-minutes way, the interviewer is unlikely to see the person’s worth with any form of accuracy. Which brings us back to football.

Football is an amazingly interesting game because of how interdependent all the action is. However, the interdependence leaves us with a fundamental problem. Can someone looking from outside the situation truly see what is actually happening in a quarterback-wide receiver connection? I looked in the published academic literature and couldn’t find the study that directly answers that question. I’m running the study in my lab right now, but I won’t have an answer for you for a long while. I haven’t looked at the data yet, but the tangentially related studies all seem to indicate that the answer is “No, we can’t see who is responsible for what when looking from the outside.” And, assuming I’m right, how then can we trust the opinion of any talent evaluator that doesn’t attempt to systematically control for such biases? Can even the most relevant talent evaluators, namely those that make personnel decisions for NFL teams, be trusted to make the right evaluation?

My top quarterback prospects from the 2014 draft were Nathan Scheelhaase and Keith Price. At the moment, neither of these players is on an NFL roster. The internet currently does not record what Scheelhaase is up to, but Price is a quarterback – a backup for the Saskatchewan Roughriders in the Canadian Football League. So…what are we to make of this?


Let’s say I’m right and we can’t trust talent evaluators that don’t use data to control the biases. This means that I can put out a list of prospects, and those prospects can go out into the league and get evaluated. In the case of Price, he was evaluated by two of the best in the business – the Seahawks and Patriots. Neither team desired his services, which is how he ended up in Canada. But according to the theory we just laid out, the fact that he didn’t get picked up doesn’t mean anything. We already believe that the evaluators can’t control a human bias. Hopefully you understand that this is a very advantageous rhetorical position to be in. How can you convince me that I’m wrong? What evidence would I accept if I won’t accept the pre-season evaluation of two teams who I have already stated I believe are two of the best in the league in evaluating talent?

The only evidence that the model will currently accept is on-field, regular season outcomes. And not only that, but I would need a lot of attempts to actually consider the notion I’m wrong. Which is why I would lose sleep if I worked for an NFL team. I’d have to trust the wings of Bayesian statistics to a degree I’ve never had to in the past. How terrifying would that be?
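To make "I would need a lot of attempts" concrete, here is a minimal sketch of Bayesian updating on a completion rate using a Beta prior. This is a textbook Beta-Binomial update, not my actual model, and every number in it (the prior and the attempt counts) is a hypothetical chosen for illustration.

```python
import math


def update_beta(prior_a, prior_b, completions, attempts):
    """Return the Beta posterior after observing completions out of attempts."""
    return prior_a + completions, prior_b + (attempts - completions)


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical prior: the model believes in a 60% passer about as
    # strongly as if it had already watched 100 attempts.
    prior = (60, 40)
    # Hypothetical evidence: the quarterback completes half his passes.
    for attempts in (20, 400):
        a, b = update_beta(*prior, completions=attempts // 2, attempts=attempts)
        mean, sd = beta_mean_sd(a, b)
        print(f"after {attempts} attempts: belief = {mean:.3f} +/- {sd:.3f}")
```

With a prior like this, a couple dozen bad attempts barely move the belief. It takes hundreds of on-field attempts before the data overwhelms what the model thought going in, which is exactly the part that would keep me up at night.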

What Exactly Do You Know, Part II: The “Is He Your Cousin or Something?” Edition

Comments: No Comments
Published on: October 21, 2014

In last week’s post, I discussed the concept of what statistical inference actually tells you and how it’s boring and cumbersome to talk about it accurately, so analysts often shorten the conversation so they can actually talk with real people about something interesting. Today we take a slightly different tack regarding what exactly we know. Our example for this week is Minnesota Vikings quarterback Christian Ponder.


If you’ve been reading for a while, you know that I was actually a fan of Ponder for a long time. Or, at the very least, I didn’t hate him with every fiber of my being like every other Vikings fan seemed to. I called him “not the problem in Minnesota,” instead pointing to the largely ineffective receiving corps. I was talking to my neighbor before the season started. I said that Ponder is not the problem. We had this long and somewhat loud conversation about how I have to be wrong about him because everyone was giving up on Christian Ponder. Even Paul Allen, the radio play-by-play announcer for the Vikings and a guy who has never in his life given up on anyone in a purple jersey, had given up on Christian Ponder. When I insisted that Ponder wasn’t the problem, my neighbor ended the conversation by saying, “You’re the only guy I know saying nice things about Ponder. Is he your cousin or something?” At the time the comment made me laugh. Then the Thursday night game against the Packers happened. I had to think more about this and examine what I know and what I don’t know about Christian Ponder in particular and the game of football in general.

So why was I so adamant that Ponder wasn’t the problem? Because, for all his faults, Ponder has one singular but important ability. He is rather accurate for an NFL quarterback. He’s not super-star Peyton Manning accurate, but he can get a football into a receiver’s hands slightly better than the average NFL quarterback. And why do I care so much about accuracy and nothing else? Because it’s the only quarterback ability I’ve found at the NFL level that will predict useful outcomes. Nothing else comes back predictive. Not a quantification of arm-strength, not Wonderlic scores, nothing at the combine, nothing but accuracy predicts NFL level outcomes.

And now we have another trap that analysts can fall into, a trap that is particularly present and meaningful for the NFL. I can’t find a predictive effect of my in-house metric that I think measures arm strength (let’s ignore the measurement point of “how do we know this thing is really arm strength” for now. It’s important but not where we’re going here). So I don’t find this effect. There are a couple possibilities why. The first possibility is the one that brings the page views and the loud conversations – that Arm Strength isn’t an important thing. However, another interpretation is that the lack of data at the NFL level makes finding the effect of arm strength insanely difficult.

Think about it like this. Imagine I told you that there was gold to be found in the body of water closest to you. To me that body of water is a river, so for the rest of this example I’ll be talking about a river. But maybe for you it’s a lake or an ocean or your friend’s bathtub. Whatever. You want to find this gold because you think having gold would be better than not having gold. So you go out and buy all the equipment necessary to pan for gold. You get the sorter pieces and the dirt sucker and everything else and you go stand in the river for a few hours and try to find this gold. Now, if you stood in the same spot panning for gold for four hours and didn’t find gold, would it be reasonable for anyone to assume that I’m wrong and that there is no gold in the river?


Crude Drawing of Where Gold is in a Fictional River

No, it would be ridiculous to say that. Maybe you were panning in the wrong spot. Maybe the screen you were using was too big and all the gold was little and slipping through. There could be many reasons why you didn’t find gold in the river.

Analytical findings are like gold. Just because you don’t find one, doesn’t mean that they aren’t there. This is a concept called “statistical power” and in the NFL it’s a huge problem. Our ability to find effects generally increases the more data we have. Think of it like this – more data makes our gold panning screens smaller. It allows us to find ever smaller nuggets of gold. In the NFL, the data is very sparse. There are only 32 teams playing 16 games each with maybe 30 passing attempts in each game. This pales in comparison to basketball’s 82 games and baseball’s 162. Compared to other sports, an effect in the NFL has to be fairly large before our screens will catch it. There is so little data coming from the NFL that it’s possible an arm-strength effect exists but there just isn’t enough data to find it.
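If you want to watch the gold-panning screen shrink, here is a minimal simulation sketch of statistical power. It is not my arm-strength model. It assumes a small true correlation of 0.2 between a predictor and an outcome (a made-up effect size), draws samples of a given size, and counts how often a standard Fisher z-test flags the effect at the usual 5% level. The sample sizes of 32 and 500 are also just illustrative assumptions.

```python
import math
import random


def sample_correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def estimated_power(n, true_r=0.2, trials=1000, seed=0):
    """Fraction of simulated samples of size n in which a real effect is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # Build ys so that the population correlation with xs equals true_r.
        ys = [true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for x in xs]
        r = sample_correlation(xs, ys)
        # Fisher z-test of "no correlation" at the two-sided 5% level.
        z = math.atanh(r) * math.sqrt(n - 3)
        if abs(z) > 1.96:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    for n in (32, 500):
        print(f"n = {n}: effect detected in {estimated_power(n):.0%} of samples")
```

The effect is exactly the same size in both cases. The only thing that changes is how much river you get to pan, and with the small sample the gold slips through the screen most of the time.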

So, after the Thursday night Ponder debacle, I went on a quest for more power. And in football, if you want more statistical power you need to look at the college level. With many many more teams we suddenly have a lot more power in our data set. I spent most of my summer calculating the same arm-strength metric for every NCAA FBS level quarterback and I ran the same model to see if arm-strength, along with accuracy, can predict useful quarterback outcomes. Lo and behold, it does (said the amazed analyst and no one else). Ponder fares very well on accuracy, but he suffers horribly on arm-strength. With this lesson learned, it’s time to quit dying trying to take the Ponder hill. Ponder is a problem for the Vikings offense. One of many, many problems.

What Exactly Do You Know?

Categories: General Info, Statistics
Comments: 1 Comment
Published on: October 14, 2014

(This post was inspired, in part, by a post Matt Waldman posted on his website a few days ago. If you haven’t already, go check out his site. He does an amazing job over there. The post in question is titled “Deny Emotion and You Only See a Fraction of the Game.”)

The establishment in sports has, for a while now, been telling outsider-sports-analytics types that one of the main barriers to widespread acceptance of analytics rests on the ability of the quants to communicate with the non-quants in ways that the non-quants can understand. I’ve covered the problems of communicating information both to quants and to non-quants in the past. But Matt’s post about emotion made me realize something more.

His general point was about the role emotion plays in sports performance. Specifically, he was contemplating whether or not momentum exists in sports. He cites conversations with analytics experts on the subject of momentum. If you’re familiar with the argument, most analytics experts conclude momentum does not exist. This is a fairly standard finding across many sports – basketball’s hot hand being another classic example. Matt’s reaction to that conclusion is also fairly standard among non-analysts. His argument is that emotion, and the effects of emotion on situations in sports are obviously occurring. Anyone who denies that is missing a huge portion of the signal inherent in the game. At one point he suggests that analysts have possibly never put themselves in physically dangerous situations and felt the impact of emotion first hand.

So, two things with that summation, both having to do with communication. First, assuming experiences that someone has or hasn’t had is a raw nerve for me. The stereotype of the milquetoast academic who simulates experiences rather than having actual experiences looms large over me. I’m guessing it has something to do with my father telling me to put down the video games and go spend some time working on my grandfather’s farm. But, I know the comment wasn’t directed at me and it wasn’t malicious anyway, so let’s put aside any irritation that might shut our brains off. In fact, we need to keep our brains on if we’re going to truly examine the point Matt is trying to get at.

Analysts (myself included) are often guilty of a particular linguistic shorthand. Our job is to find predictive effects. Does changing the way a request is phrased reliably change donations to charity? If I know about your height, can I make an educated guess about your weight? That sort of thing. We can become so familiar, so practiced in that job that, when we talk to other people, we tend to shorten the description of what we’re talking about. When my neighbor asks me what I know about momentum in sports, I say “I’m trying to find out if momentum exists” and he gets excited and engages with the conversation. The problem for analysts, though, is that this gets the conversation off on a disingenuous foot. Because we’re not truly trying to find out if momentum exists. We’re trying to find out if the predictive effect of momentum on some other variable, like scoring, exists. But we repeat the phrasing about studying the existence of momentum so much that we can forget that other people see that collection of words as having a different meaning.

And when we analyze the results in sports, the results are pretty clear. The effect of momentum on any variable we look at is unpredictable. The same is true in the academic literature as well. We cannot predict the effect of emotion on motivation with any reliability. So, I agree that we should probably stop saying that momentum doesn’t exist. The subjective experience of the emotion of the game is a real thing that people feel.  But I will hold fast to the notion that predicting what will initiate a change in momentum and how a change in momentum will impact athletic performance is an unpredictable enterprise.
