Using ICCs to Calculate the Effect of the Quarterback

Categories: NCAA FBS, Statistics
Published on: April 8, 2015

Last time I discussed the importance of correctly modeling the game you are interested in if you want to address the problem of data analysis in football. If you are new to the blog, I would suggest reading that post before reading this one. It will give you a good overview of how the analytics are built around here.

For everyone else, a quick refresher. We’re assuming that the correct model of a football offense is shown below.

BasicOffenseModel

Yards gained on the field begin as a play called by the offensive play caller. They then filter down to the quarterback’s execution, which in turn filters down to the wide receiver’s execution. Obviously there are important aspects of the offense we don’t model here, most notably the offensive line and any interactions between the stated roles of the model, but those bridges are a substantial distance down the road.

Now that we have our model, we need two things: 1) a question to answer and 2) an analytical tool that can take our question, address the realities of the model, and give us back a number to interpret.

The Question

I want to answer one of Bill Connelly’s 45 Reasons to Care about College Football Analytics. These are a set of questions Bill created to drive interest in analyzing college football data. Specifically, I want to begin to address question #5, quantifying how important the quarterback is to the offense. Believe it or not, we can address this question with already existing tools. The first bit of technology we need is the Better Box Score I detailed last week. Here is an example of a Better Box Score.

BetterBoxScore

So our question is: how important is the quarterback to the offense? We have our basic model up top and the Better Box Score to help us. Now all we need is the tool. To answer this question we will use Intraclass Correlations (ICCs). ICCs are analytical tools designed to measure similarity within sub-sets of a group. Similarity among group members can be interpreted as the effect of a common factor higher up the hierarchy, in this case the quarterback.

For example, below you see a scatterplot of the four individuals who a) played quarterback for Utah State in 2014 and b) targeted at least two different receivers at least five times each over the course of the season. Utah State makes a nice example because they had so many quarterbacks and no single quarterback ran away with the team’s attempts. Receptions are on the X axis and Targets are on the Y. Each mark represents a different pass receiver targeted by that quarterback, but the same receiver could be marked on this graph multiple times if he was targeted by multiple quarterbacks. Remember, that’s not a problem for us because our model says a pass exists as a connection between quarterback and receiver, not as individual performances. Notice the general patterns in the subsets.

 4-8-2015 - Scatterplot - No Circles

First, Craig Harrison is markedly different from the rest of the quarterbacks on this list. His points are concentrated in the lower right hand corner. The second thing to note is how tightly clustered Darell Garretson’s points are, whereas Kent Myers’s are more spread out. This may be easier to see if we draw ellipses around each sub-set.

 4-8-2015 - Scatterplot - With Circles

See how Darell Garretson’s ellipse is much more squished compared to Kent Myers’s? That means the receivers targeted by Darell Garretson are more similar to one another than the receivers targeted by Kent Myers are. Likely, this indicates an effect of Darell Garretson getting more consistent performances out of his pass receivers. Note that more consistent does not necessarily mean better: one could throw balls into the dirt on every pass play and still be consistent.

But we want more than just looking at graphs and guessing if they mean anything. We want to quantify if those circles are meaningfully different from one another. The ICC is a good tool to use here as it returns both a null hypothesis significance test and an effect size of what percentage of variance in a statistic is attributable to a particular “focal person,” in this case the quarterback.

I will be using the terms focal person and partner repeatedly throughout the explanation, so let’s define those terms. A “focal person” is any entity on the higher end of a two-level hierarchy and the “partner” is on the lower level of the hierarchy. So in the hierarchy of our model, the quarterback would be the “focal person” and the receivers would be the “partners.” Note that, in this case, higher up on the hierarchy does not mean “better.” It just means that, when we model the actual game, multiple wide receivers are paired with a single quarterback.

The formula I will use for the ICC is a bit different than what you might find in other sources. Psychologists, most notably David Kenny, have evolved the formula of the ICC so it better matches the questions we care about. The formula we use can be interpreted as an assessment of the similarity of the members of a sub-group. It will tell us whether or not receivers targeted by a particular quarterback are more similar to one another than they are to other random points in our data set. Therefore, using the following formula we can assess the percentage of variance explained by having the ball thrown to you by a particular quarterback.

Formula - ICC

Where k’ is either the number of partners per group, if group size is fixed, or, if group size is variable as it is in our case, the following:

Formula - K prime

To calculate our ICCs we first need to choose a dependent variable. I will focus on yards gained (rather than the completion percentage that I showed above). As a teaching example, let’s first calculate the ICC for our Utah State quarterbacks. Here are the data we’ll be using.

BetterBoxScore - Utah State - 2014

To calculate the ICC, we first run a univariate ANOVA on yards with quarterback (the focal person) as the independent variable. This returns our between-subjects and within-subjects variance, reported as mean squares. In this case those numbers are:

Formula - MSbetween MSwithin for Utah State

Plugging those numbers into our formula above, along with our calculated k’, gives us an ICC of:

Formula - ICC for Utah State

This means that on Utah State during the 2014 season, 18.7% of the variance in passing yards gained can be attributed to the quarterback. Now let’s do the same thing for the entire league. We have one final wrinkle to overcome first: our hierarchies are nested, with receivers within quarterbacks within teams.
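For readers who want to try the single-team calculation themselves, here is a minimal sketch in Python. The formula images did not survive on this page, so the code assumes the common ANOVA-based form ICC = (MSbetween - MSwithin) / (MSbetween + (k’ - 1) * MSwithin), with k’ = (N^2 - sum of squared group sizes) / (N * (number of groups - 1)) for variable group sizes. The function name `icc_one_way` and the toy yardage numbers are made up for illustration; they are not the Utah State data.

```python
from collections import defaultdict


def icc_one_way(values, groups):
    """ANOVA-based ICC for groups of unequal size (assumed form, see lead-in).

    values: one number per partner (e.g. yards for a quarterback-receiver pair)
    groups: the focal person for each value (e.g. the quarterback's name)
    Returns (icc, ms_between, ms_within, k_prime).
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    n_total = len(values)
    n_groups = len(by_group)
    grand_mean = sum(values) / n_total

    # Sums of squares from a one-way ANOVA with the focal person as the factor
    ss_between = 0.0
    ss_within = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in members)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Adjusted group size k' for variable group sizes
    sum_sq_sizes = sum(len(members) ** 2 for members in by_group.values())
    k_prime = (n_total ** 2 - sum_sq_sizes) / (n_total * (n_groups - 1))

    icc = (ms_between - ms_within) / (ms_between + (k_prime - 1) * ms_within)
    return icc, ms_between, ms_within, k_prime


# Toy data (hypothetical): two quarterbacks with three and two receivers
toy_yards = [10, 12, 14, 30, 34]
toy_qbs = ["QB A", "QB A", "QB A", "QB B", "QB B"]
icc, ms_b, ms_w, k_prime = icc_one_way(toy_yards, toy_qbs)
print(f"MSbetween={ms_b:.2f}  MSwithin={ms_w:.2f}  k'={k_prime:.2f}  ICC={icc:.3f}")
```

With real data you would pass one yardage value per quarterback-receiver connection and the quarterback’s name as the group label.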

To tease all this nonsense apart, we’re going to start at the very top of our hierarchy. Team will be our focal person and quarterback-receiver connections will be the partners. We need to enter more than one season’s worth of data into this analysis because we need to be sure that every team has at least two of the next level down in the hierarchy, in other words quarterbacks. Because of the NCAA’s eligibility rules, we need at least six seasons of data to guarantee this criterion is met for every single team. So we have data from 2009-2014 in the data set.


Calculating out the Mean Square (MS) between and MS within (a.k.a. between and within groups variance respectively) gives us the following.

Formula - ICC for teams

So 3.1% of the variance in yards can be attributed to the team. This would be anything that is common among all receivers and quarterbacks, so things like the offensive system, facilities, average offensive line ability, average relative defense strength played against, etc.


Now we run the same analysis on the same data, but this time we change the focal person from team to quarterback. Running this analysis gets us the following result.

Formula - ICC for teams and qbs

This result tells us that 6.4% of the variance is attributable to…what? It’s not strictly true that this result reflects the quarterback alone. Instead it says 6.4% of the variance is attributable to everything that is held in common among the partners, which includes the quarterback but also play callers, facilities, etc. So, we need to do a simple subtraction here to get a pure quarterback metric.

Formula - ICC for qbs

And there’s our answer. Quarterbacks in NCAA FBS football have 3.3% of the variance in passing yards attributed directly to them. I also find it very interesting that knowing who the quarterback on a team is will explain almost exactly as much of the variance in passing yards gained as knowing who the play caller is.
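The subtraction step can be written out as a short sketch. The two ICC values are the ones reported above; the variable and function names are hypothetical and only there for illustration.

```python
# ICC values as reported in the post (proportions of variance in yards)
icc_team = 0.031         # focal person = team
icc_team_and_qb = 0.064  # focal person = quarterback, still includes team-level effects


def quarterback_only_share(icc_with_qb, icc_team_level):
    """Remove the team-level share from the quarterback-level share."""
    return icc_with_qb - icc_team_level


qb_share = quarterback_only_share(icc_team_and_qb, icc_team)
print(f"Quarterback-only share of variance: {qb_share:.1%}")
```

This treats the team-level share as fully contained in the quarterback-level share, which is the assumption the subtraction in the post relies on.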

2014 Passing Yardage Predictions – Part II

Categories: Fantasy, NFL, Statistics
Published on: August 19, 2014

Welcome back everyone. I took last week off because some very important things were and still are happening in our country. I couldn’t bring myself to talk about a game overlaid on top of another game. It just seemed a little disconnected from the world at large. I think it’s important that we all stay reminded of the events happening in Ferguson, MO. That being said, it’s time to build an audience and nothing builds an audience like new content.

This week I finished a full set of projections for yardage totals of quarterbacks, wide receivers, and tight ends. This new model makes two major corrections compared to the one I posed a couple weeks ago. First, it corrects for 2013 injuries. You’ll note that Julio Jones has a much higher predicted yardage total in this model compared to the previous one. Second, it corrects for changes to the offensive system. I’ll get to why this is important in a minute. For ease of viewing, I’ve added a new page to the banner so you can easily check these tables whenever you need to. Remember that these yardage totals assume the player in question plays all 16 games and any coaches that have changed jobs do not radically alter the schemes they’ve used in the past. I also want to throw out a big thank you to Jeff over at for the offensive coordinator history spreadsheet that made all these projections possible.

One very important caveat before we begin. I don’t have any historical data to check these predictions against. Jeff’s data on targets doesn’t go back far enough for me to do any historical checking on how accurate this model tends to be. So, I have no idea about the uncertainty inherent in this model. We’ll all be learning this together as the season goes on. After the season is over, we’ll check them together. Isn’t science fun?

Top Projected Wide Receivers – Receiving Yards

My first list of projections is for wide receivers, and that list doesn’t come with a lot of surprises. You’ve got your Andre Johnsons, your Dez Bryants, your Brandon Marshalls and your DeSean Jacksons at the top. I don’t really see a surprise on that list until I see Josh Gordon predicted at less than 1,000 yards, assuming he plays all 16 games. And even that is understandable given Cleveland’s quarterback situation. I’ll keep the list updated as depth charts change and injuries occur.

Top Projected Tight Ends – Receiving Yards

Once again, a lot of ho-hum on this list. Jimmy Graham will lead the league in tight end receiving yards, a Detroit Lion will follow him because Detroit will still throw the ball all over the place and defenses will try to lock down Calvin Johnson, blah-blah-blah. You’ll see Levine Toilolo third on that list, but I’m not sure I buy that specific prediction. The model is assuming that Toilolo will step in and take all of Tony Gonzalez’s targets which my human brain tells me isn’t going to happen. I have left that prediction as the model reports it for accuracy’s sake, but on that one, I think we have some justification to adjust it down a bit.

Top Projected Quarterbacks – Passing Yards

I went back and checked the results I’m about to tell you three different times. As I was doing that, I anthropomorphized the mathematical equation and called it a “little dickens” for trying to trick me. But there was no mistake. The inputs I fed into the model were all correct. Furthermore, all the other top five quarterbacks make perfect sense. Most of us expect Carson Palmer, Drew Brees, Tony Romo, and Peyton Manning to have high yardage totals at the end of the season. But I didn’t expect the guy at #1 by a long shot. And so, without further ado, your projected 2014 NFL leader in passing yards – edging Peyton Manning by 98 yards – is…Houston’s Ryan Fitzpatrick.

You’d call a mathematical equation a “little dickens” too if it tried to trick you with such nonsense. After I saw it I looked up the prop bet odds on Ryan Fitzpatrick leading the NFL in passing yards and found that it’s such a ludicrous notion that Vegas isn’t giving action on such a proposition. It seems insane, but let’s keep an open mind and consider this for a second.

Once you think about it, there are several reasons why it makes sense that Ryan Fitzpatrick could lead the league in passing yards this year. First, we know something about what Bill O’Brien likes to do on offense. We know he likes to throw the football and his system is very effective at gaining yards through the air. Any system that makes Matt McGloin look that good has got to have something going for it. We also know that O’Brien provides a lot of opportunities to his best receivers and seems to be able to adapt the passing game around what he has. Second, Houston has the best receiving corps you will find outside of Denver or Chicago. From top to bottom, the wide receivers in Houston know how to get open and know how to get yards after the catch. This will be a second huge bonus to Fitzpatrick’s passing yards. Third, nobody really knows what the status of Arian Foster is. We know he’s busy trying to be the best teammate he can be, but can he still be the productive running back he once was? I have my doubts. And finally, I don’t want to count out the man himself. Fitzpatrick is a serviceable quarterback. He’s not going to take a team on his back or anything, but he’s not horrific either. There’s a reason he’s stuck around in the NFL so long.

So there you go. Lots of fairly boring expectations for receiving yards and one super out-of-left field prediction. Let the season begin!

Quarterbacks to Watch – Mid-Season Edition

Categories: NCAA FBS, NFL Draft, Statistics
Published on: October 24, 2013

We’re a little more than halfway through the college football season, so it’s time to share which quarterbacks have bubbled to the surface of my spreadsheet this season.

You can see the numbers for the 2013 season so far here, so I will focus on quarterbacks who could be draft eligible and are the most likely to succeed at the next level.  Also, I will avoid any quarterbacks I already talked about in my season preview.  Which leaves us with a rather short list.  I was going to at least be able to talk about two quarterbacks, but then this happened.

(AP Photo/The Herald Journal, Eli Lucero)

I was late to the Chuckie Keeton party anyway, so that name is probably not a shock to anyone.   I’m certainly on the Keeton bandwagon, assuming he can come back from injury and be as good as he’s shown in the past.  So let’s talk about a deep sleeper prospect who was just outside my top players to start the season and is having another solid year.

Troy’s Corey Robinson

Quarterback Corey Robinson #6 of the Troy University Trojans throws a pass during the game against the Ohio University Bobcats during the R&L Carriers New Orleans Bowl at the Louisiana Superdome on December 18, 2010 in New Orleans, Louisiana.
(Chris Graythen/Getty Images North America)

Robinson has everything you want to see statistically from a prospective quarterback.  He is a four year starter, has over 1500 career attempts and is very accurate with his passes.  He’s never had a season with less than a 62% completion percentage.  He’s going to get razzed for having too many interceptions, but that’s not something the data is too terribly concerned with.

As of Saturday, my calculations predict he would have an NFL quarterback rating of right around 76 after four years in the league, earning him a solid backup status on many teams and a starting job on some teams that need help at quarterback.

So, if your team is in the market for a backup quarterback and doesn’t want to spend a draft pick to get him, you have a pretty good option in Corey Robinson.

New Data – 2012 Draft Class – Quarterbacks

Categories: NCAA FBS, NFL Draft, Statistics
Published on: February 1, 2013

Hi Everyone,

I’ve added some additional data from the 2012 draft class to give a better sense of how well Completions Away from Average helps predict performance.  I’m not willing to call these true predictions, since I didn’t make them public until after the season started.  However, the numbers are what they are, so it’s not like they are going to change.

A couple things to note.

#1)  RGIII is rated 4th.  This is obviously too low for the performance he had during the 2012 NFL season.  Any good model is going to have some margin of error in its predictions, but in this case there is a specific reason: injury.  RGIII didn’t get to play much in the 2009 season, which really set him back in the ratings.  These predictions are based on career ratings, and to have career ratings you need to be on the field.  Completions Away from Average has nothing useful to predict if the player isn’t on the field.

#2)  I made a rather big deal about Kellen Moore being passed over on draft day when I first started this blog, but now he’s nowhere to be found.  That one is entirely on me.  I started talking before I had all the data analyzed.  What happened was that I only looked at the data from the 2011 college season rather than waiting until I had all the information from each player’s entire career.  Kellen Moore had a very good 2011 season, but didn’t have very good years prior to that, leaving the total at roughly average.

#3)  I’m still quite proud of Russell Wilson being #2 on this list.  I hope my 2013 #2 (Ryan Aplin) comes through like Wilson did.  They are both very similar quarterbacks.

2013 NFL Draft – Comparing Projections: Smith, Barkley, Glennon, Wilson, & Bray

I thought it might be fun to put down some comparisons between my projections and some quarterbacks that seem to be the “consensus” best picks in the 2013 draft.  I’ve already said who I think are the top 5 draft eligible quarterbacks in a previous post.  But let’s see what my math says about the quarterbacks that I see a lot of others putting at the top of their boards.  Taking these in no particular order.

Geno Smith – West Virginia

2012 CAA: 0.71

Career CAA: 3.78

I won’t harp on this too much because I’ve already done that in other places.  My assessment of Geno Smith is that he’s an average FBS quarterback embedded in an elite offensive system.  He’s not likely to work out as an NFL prospect.

Matt Barkley – USC

2012 CAA: -11.33

Career CAA: 11.64

I was actually a little too hard on Matt Barkley in a previous post.  The old post was based on an analysis that only included the 2012 season, which everyone seems to agree has been an unmitigated disaster for Barkley.  However, he did show flashes in 2010 and 2011 that would warrant someone looking at him as a draft pick.  The thing is, the 2012 season sets him so far back that taking him as a first day pick would be far too risky.  For those interested, the player in my database whose numbers are closest to Matt Barkley is Blaine Gabbert.  Linking those two players together added a fun bit of irony when I started seeing Barkley going to the Jaguars in some mock drafts.  (Update:  I don’t know what I was looking at when I wrote that last statement, but it wasn’t my own data.)  Barkley probably won’t tank horribly, but he’s also unlikely to be a high quality starter in the NFL.

Mike Glennon – NC State

2012 CAA: -18.55

Career CAA: -26.63

Smith and Barkley have really seen their draft stock dropping.  And their loss has been Mike Glennon’s gain.  Glennon is the current flavor of the month for many draft predictors, including Mel Kiper Jr.  But we’ve gone from average (Smith) and workable (Barkley) to a terrible decision.  Mike Glennon is not a good quarterback.  He is not an average quarterback.  There literally won’t be a worse quarterback prospect in the 2013 draft than Mike Glennon.  But he looks the part.  He’s got the height and the frame (whatever that means) and the arm strength (apparently) and the few passes he does complete are crazy exciting plays down the field.  And that excitement pulls your attention away from the every-day, run-of-the-mill passes that he can’t complete.  Things will not go well for a team that starts Glennon.

Tyler Wilson – Arkansas

2012 CAA: 8.69

Career CAA: 19.15

The best of the bunch is Tyler Wilson.  This doesn’t mean that he’s the best quarterback in the draft (see here for that list), but he is the best of the players getting a lot of play at the top of draft boards.  Comparing him to a player last year, his numbers are closest to Kirk Cousins.  And Cousins had some success the little bit he played, especially given that it was under one of the best play callers in the business.  I wouldn’t call him a franchise changing quarterback by any means, but he does have skills.  Given the right offensive coordinator and coaches, he could have some success in the NFL.

Tyler Bray – Tennessee

2012 CAA: 0.74

Career CAA: -16.60

I can’t take credit for this sentiment on Tyler Bray, but I also can’t quote the source because it was on Twitter several weeks ago and now I can’t find it.  Anyway, the Twitterperson said that some team would reach for Tyler Bray and spend the next 2-4 years banging their head against a wall.  I couldn’t have said it any better myself.  (Note:  If you are the source of said quote, let me know and I’ll update the post accordingly)

I think we’ll stop there for now.  I’m planning on updating the numbers page to have a complete list of the Career CAA for all draft-eligible quarterbacks.  I’m waiting until the declaration deadline to put it up, but you should see that soon.
