Building a Better Box Score

What is the most common line present in any analytics-based article/post/discussion of the game of football? My answer is some version of “Analytics in football are more difficult/impossible/can’t be done because of how interactive and interdependent the members of a football team are compared to other sports.” Sometimes that basic line is flavored differently, depending on the particular tone of the piece, but that line always seems to be there. It’s a seemingly necessary caveat of the genre. “Of course we can’t know everything about Quarterback X because so much of being a quarterback depends on [insert whatever point one wants to make about interactivity on football teams].”

Why do we constantly talk about the interdependence problem but never fix it? Can it actually be fixed, and what would such a solution look like?

The Football Box score – A Model by Ralph Wiggum

To start, I’d like to spend some time developing the idea of why dealing with the interdependence problem is so difficult in football. After all, other industries deal with a similar problem. The entire job of management is interactive and interdependent by nature, yet we can still figure out who the good managers and the poor managers are. Why do we have such a difficult time in football? I argue it’s because of the modern football box score, a ubiquitous, pervasive summary of the events on a football field that horribly misrepresents the realities of the game.

A good box score acts like a description of the events of a game, the same way that the model of the solar system you built in middle school acts as a basic description of how the solar system works, how it is set up, and how each component generally relates to the others. Your solar system model was also simplified. You didn’t put all of Jupiter’s 67 moons in your model. You probably didn’t put in the asteroid belt or the daily spin of each planet to an accurate degree (kudos to you if you did), and that’s okay. The purpose of any model is to get the major elements of the system correct while simplifying or eliminating the less important elements. And this is where the box score of a football game falls down dramatically. If a football box score were a model of the solar system, it would be a model created by Ralph Wiggum. This is true to some extent of the entire box score, but I’m going to focus mostly on the passing statistics element as it is the worst offender.

The passing yards box score for Florida State in their blowout loss to Oregon in the first round of the college football playoffs looks something like this.

Passing Yards

Jameis Winston, 29/45, 348 yards, 1 TD, 1 INT

Sean Maguire, 0/3, 0 yards, 0 TD, 0 INT

Receiving Yards

Travis Rudolph, 6 rec., 96 yards, 1 TD

Jesus Wilson, 5 rec., 72 yards, 0 TD

Karlos Williams, 5 rec., 59 yards, 0 TD

Rashad Greene, 6 rec., 69 yards, 0 TD

Dalvin Cook, 3 rec., 24 yards, 0 TD

Ermon Lane, 2 rec., 22 yards, 0 TD

Freddie Stevenson, 1 rec., 12 yards, 0 TD


There are lots of problems with how this data is presented, but let’s focus on two.

Problem 1 – Data Redundancy

The problem that upsets me the most about a football box score is needless redundancy. Every single yard gained by a forward pass is counted twice – once for the quarterback and once for the receiver. This is a problem, a big problem, because it means the data presented here do not reflect what actually happens in a football game. You don’t complete a forward pass and then immediately mark off the gained yardage again. However, in a football box score, because we call the events different things – completions vs. receptions and passing vs. receiving yards – suddenly it becomes okay to double count every single event in the passing game except attempts and interceptions. But they are the exact same event. Jameis Winston’s completion is Rashad Greene’s reception. One cannot happen without the other. Winston earns passing yards and Greene earns receiving yards on the exact same yards gained. We’re not modeling what actually happened in the game. We’re modeling a way to give credit in the most individualized way possible. However, as every single football analytics article will tell you, football is an interactive game. If the game is, in reality, interactive, why do we assign credit for the events in this individualized manner?

Problem 2 – Loss of Information

I think it’s rather ironic that a football box score has so much information redundancy, but it explicitly removes an important piece of information that would allow us to complete some very important analyses on team performance in football.

To illustrate the loss of information, let’s look at a different game, this time Oregon’s Week 1 win over the University of South Dakota. Early season games are useful here because the backup at a “high power” football program is very likely to spend a great deal of time in the game.

Passing Yards

Marcus Mariota, 14/20, 267 yards, 3 TDs, 0 INTs

Jeff Lockie, 11/12, 113 yards, 1 TD, 0 INTs

Receiving Yards

Byron Marshall, 8 rec., 138 yards, 2 TDs

Darren Carrington, 4 rec., 68 yards, 0 TDs

Dwayne Stanford, 1 rec., 62 yards, 1 TD

Pharaoh Brown, 2 rec., 32 yards, 1 TD

Johnny Mundt, 2 rec., 29 yards, 0 TDs

Keanon Lowe, 1 rec., 18 yards, 0 TDs

Royce Freeman, 1 rec., 11 yards, 0 TDs

Thomas Tyner, 3 rec., 8 yards, 0 TDs

Charles Nelson, 1 rec., 8 yards, 0 TDs

Devon Allen, 1 rec., 5 yards, 0 TDs

Johnathan Loyd, 1 rec., 1 yard, 0 TDs


Here we have two quarterbacks who completed a similar number of passes over the course of the game, but Marcus Mariota gained more than double the passing yards of Jeff Lockie. How was that accomplished? Which receivers caught the passes that gained all those yards for Mariota? Was it because Dwayne Stanford caught one long pass for a touchdown from Mariota? Or did Jeff Lockie complete that pass and get only 51 yards from the other 10 passes he completed? Who was the intended receiver on the six attempts that Mariota did not complete? How many times was each receiver the intended target without making the catch? Did Mariota target the same receiver over and over with no results? Or are his unsuccessful attempts scattered all over the place? We don’t know the answer to any of these questions from the box score description of this particular game. Answering these questions is of critical importance to a better understanding of the game of football.
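Both problems can be checked with a few lines of arithmetic on the Oregon box score above. The following is only a rough sketch in Python; the numbers are copied from the tables, while the variable names and layout are my own choices for illustration. It shows that the passing and receiving tables add up to the same totals (the double counting), and that the Stanford question cannot be settled from those totals alone.

```python
# Oregon vs. South Dakota, copied from the box score above.
# Each passer: (completions, attempts, yards, touchdowns)
passing = {
    "Marcus Mariota": (14, 20, 267, 3),
    "Jeff Lockie": (11, 12, 113, 1),
}
# Each receiver: (receptions, yards, touchdowns)
receiving = {
    "Byron Marshall": (8, 138, 2),
    "Darren Carrington": (4, 68, 0),
    "Dwayne Stanford": (1, 62, 1),
    "Pharaoh Brown": (2, 32, 1),
    "Johnny Mundt": (2, 29, 0),
    "Keanon Lowe": (1, 18, 0),
    "Royce Freeman": (1, 11, 0),
    "Thomas Tyner": (3, 8, 0),
    "Charles Nelson": (1, 8, 0),
    "Devon Allen": (1, 5, 0),
    "Johnathan Loyd": (1, 1, 0),
}

# Problem 1: the two tables describe the same events twice.
completions = sum(comp for comp, _, _, _ in passing.values())
receptions = sum(rec for rec, _, _ in receiving.values())
passing_yards = sum(yds for _, _, yds, _ in passing.values())
receiving_yards = sum(yds for _, yds, _ in receiving.values())
print(completions, receptions)          # same count of events
print(passing_yards, receiving_yards)   # same yards, listed twice

# Problem 2: the totals cannot say who threw Stanford's 62-yard catch.
# Either assignment is consistent with the box score.
stanford_yards = receiving["Dwayne Stanford"][1]
lockie_rest = passing["Jeff Lockie"][2] - stanford_yards
mariota_rest = passing["Marcus Mariota"][2] - stanford_yards
print(lockie_rest)    # Lockie's yards on his other 10 completions, if it was his
print(mariota_rest)   # Mariota's yards on his other 13 completions, if it was his
```

The sums agree because every completion is entered once for the quarterback and once for the receiver, and nothing in either table links a particular throw to a particular catch.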

Our first step to scientifically understanding football is to build a better box score. The question is, what would we want the box score to represent?

Begin with a Model

Before we can build a tool to aggregate data, we need to have a decent idea of what data we want and why we want it. We need to start with a theory of how a football offense works. This is our “model of the solar system” as it were. What things in a football offense are important and what things should we avoid for right now? Here is a figure of my current theory of a football offense.

Basic Model of a Football Passing Offense



This very basic theory says that a football offense begins with the offensive play caller. The play call made then filters down to whichever quarterback is currently in the game, and the ability of the quarterback then filters down to the receivers. Note that we could also include pass catching tight ends and running backs in this model.

Now, I’ve simplified the passing offense to a great extent in this model. Most notably, I’ve removed the impact of the offensive line here. That is by design, but not because I think that the offensive line is unimportant. Instead, I think the offensive line is like adding in the daily spin of the planets to your solar system model. Adding it in becomes incredibly complex and will probably take some specialized data that is not publicly available. For right now, we will keep the offensive line out of our model. Anything else that’s left out of this model we will consider as having such a limited effect that it won’t change our understanding to a large enough extent that we need to account for it.

A Better Box Score

Now that we have our model, we can adjust our box score so that it reflects the important elements inherent in our model. The most important element of the theory says that we must count passes as a single event and account for the coach that called the play, the quarterback that threw the pass, and the receiver that caught it. Such a thing isn’t terribly difficult to create. You can see an example of one below. In this example, I am allowing team to stand in as a proxy for the play caller.
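To make the single-event idea concrete, here is a minimal sketch in Python of what one row per pass could look like. This is not the original figure. The `PassEvent` fields, the `summarize` helper, and every play listed are hypothetical and made up purely for illustration; the only things taken from the model are that each pass is recorded once and carries the team (as a proxy for the play caller), the quarterback, and the receiver.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PassEvent:
    """One pass attempt, recorded exactly once (hypothetical layout)."""
    team: str          # stands in for the play caller
    quarterback: str
    target: str        # intended receiver, kept even on incompletions
    complete: bool
    yards: int = 0
    touchdown: bool = False


# Made-up plays, not taken from any real game.
events = [
    PassEvent("Team A", "QB 1", "WR X", True, 12),
    PassEvent("Team A", "QB 1", "WR Y", False),
    PassEvent("Team A", "QB 1", "WR X", True, 30, touchdown=True),
    PassEvent("Team A", "QB 2", "WR X", True, 5),
    PassEvent("Team A", "QB 2", "WR Y", True, 8),
]


def summarize(events, *keys):
    """Total targets, completions, and yards for each combination of keys."""
    totals = defaultdict(lambda: {"targets": 0, "completions": 0, "yards": 0})
    for event in events:
        group = tuple(getattr(event, key) for key in keys)
        totals[group]["targets"] += 1
        totals[group]["completions"] += int(event.complete)
        totals[group]["yards"] += event.yards
    return dict(totals)


# The same events, viewed three ways without counting any yard twice.
by_pair = summarize(events, "quarterback", "target")
by_quarterback = summarize(events, "quarterback")
by_receiver = summarize(events, "target")

for pair, row in by_pair.items():
    print(pair, row)
```

Because the traditional passing and receiving lines are just two different groupings of these rows, they can still be produced on demand, but the quarterback-to-receiver pairing and the intended target on incompletions are no longer thrown away.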



Now all we need is a way to analyze data that fits our theoretical understanding and the data that our better box score is collecting.  Fortunately, such a way exists. Next week.

Are You My Mother?

Categories: General Info
Published on: March 25, 2015

One of my daughter’s favorite books is Are You My Mother? by P.D. Eastman. In the book, a baby bird hatches while his mother is away finding food. The bird decides to leave the nest in search of his mother, though he isn’t sure exactly who his mother is or what she looks like. He has several ideas, attempting to connect with a kitten, a dog, a cow, and an airplane until he is finally returned to his nest by a hydraulic digger (it’s a very cute story) just before the mother returns to the nest with a worm.

For a while, I’ve felt like my writing on this blog has been akin to that bird and its quest. I feel like I’m trying to connect with different styles of writing, but those styles simply are not me. I’ve tried being irreverent, I’ve tried touting my knowledge without discussing methods, but the words ring hollow in my ears. Last week gave me the opportunity to actually sit and think for a few minutes (but only a few) about what I value and what I really want this blog to be.

I thought about the times I felt valued and like I made some small contribution to a discussion on the internet. The last time that happened was the night everyone was talking about #TheDress. Half the people in my department are vision researchers, so I’ve been subjected to a lot of vision presentations in my time. I had just finished teaching the section on luminance perception in my Intro class and I had something to say. I didn’t say much, but people seemed to find value in it and it felt very good.

I read Bill Connelly’s article about the important, unanswered questions in college football. It’s a good piece designed to spark a search for answers. The interesting thing for me is that the answers to two of the first five questions are sitting on my hard drive right now. Understanding how to answer them isn’t particularly complicated or inventive. It just takes finding the right tools. Tools I know a great deal about due to 10 years of studying interacting, interdependent groups and teams. Tools most other people don’t have because hardly anybody spends 10 years studying interacting teams unless one already has tenure (not me) or gets phenomenally lucky in employment (there I am).

I also thought about a time when I didn’t feel right about my whole enterprise of data analysis of football: the Patriots deflated ball controversy. That was a weird time to be interested in data analysis of football because someone who isn’t particularly interested in reproducible, publicly available analysis attempted to answer a question. That question wasn’t answered particularly well, and the whole thing really spiraled out of control. However, it did get me thinking about the notion of credibility and how truly important credibility is to me. The guy who answered the question initially isn’t particularly credible, but then I had to confront the question of how much outward facing credibility I have, given that we both hide how we arrive at the conclusions we do.

I thought about my academic area as a whole, and how much of a credibility hit it’s taken lately. I’m a social psychologist. You might not be aware of this, but for the last 3-4 years, social psychology has taken something of a walk off the credibility cliff in academic circles. Reports of outright falsification of data, poor data management, and “massaging” of results have become too numerous to list. Frankly, it’s been very depressing to be a social psychologist these last few years. I’ve survived by teaching, but I haven’t been able to get excited about research the way I used to. I have a dissertation that was nominated for (but didn’t win) an international award just sitting, unpublished, because I can’t force myself to get it out the door. I’d never really connected my publication struggles to credibility before, but upon reflection I think that may be a major cause, if not the primary one.

So here is where we’re at.

  • Scientific credibility is important to me, both personally and for the field of social psychology and the sub-specialization of interacting groups that I work in
  • I need to do research to keep my day job
  • I am struggling to do research because I have issues with the credibility of certain elements of my field
  • In my own internet space, I am not acting in a way that brings about scientific credibility either to myself or to my chosen field

This is a contradiction I cannot have, and the solution is to change the approach. I’m going to continue publishing here in an attempt to rebuild my own sense of credibility. Hopefully, this will rebuild my taste for research in my day job with the ultimate goal of gaining some job security for once in my life. Starting next week, I’m opening up the kitchen to show everyone how I’m building everything I’ve built here. I’ll be discussing science as it relates to interactive, interdependent teams in the context of professional football. Hopefully, you’ll be able to apply some of what I say to contexts beyond football. I’m going to be spending a lot of time covering techniques and methods used to study interacting teams, as they are reasonably uncommon methods of data analysis. I’ll be approaching these concepts in a very scientific, academic tone, because that’s who I am.

This has been my realization. I am not a scout or a journalist. I am not a gambler or a tout, or an information broker. I am a scientist.  Come back next week and we’ll learn something together.

What Exactly Do You Know?

Categories: General Info, Statistics
Published on: October 14, 2014

(This post was inspired, in part, by a post Matt Waldman published on his website a few days ago. If you haven’t already, go check out his site. He does an amazing job over there. The post in question is titled “Deny Emotion and You Only See a Fraction of the Game.”)

The establishment in sports has, for a while now, been telling outsider-sports-analytics types that one of the main barriers to widespread acceptance of analytics rests on the ability of the quants to communicate with the non-quants in ways that the non-quants can understand. I’ve covered the problems of communicating information both to quants and to non-quants in the past. But Matt’s post about emotion made me realize something more.

His general point was about the role emotion plays in sports performance. Specifically, he was contemplating whether or not momentum exists in sports. He cites conversations with analytics experts on the subject of momentum. If you’re familiar with the argument, most analytics experts conclude momentum does not exist. This is a fairly standard finding across many sports – basketball’s hot hand being another classic example. Matt’s reaction to that conclusion is also fairly standard among non-analysts. His argument is that emotion, and the effects of emotion on situations in sports, are obviously occurring. Anyone who denies that is missing a huge portion of the signal inherent in the game. At one point he suggests that analysts have possibly never put themselves in physically dangerous situations and felt the impact of emotion first hand.

So, two things with that summation, both having to do with communication. First, assuming experiences that someone has or hasn’t had is a raw nerve for me. The stereotype of the milquetoast academic who simulates experiences rather than having actual experiences looms large over me. I’m guessing it has something to do with my father telling me to put down the video games and go spend some time working on my grandfather’s farm. But, I know the comment wasn’t directed at me and it wasn’t malicious anyway, so let’s put aside any irritation that might shut our brains off. In fact, we need to keep our brains on if we’re going to truly examine the point Matt is trying to get at.

Analysts (myself included) are often guilty of a particular linguistic shorthand. Our job is to find predictive effects. Does changing the way a request is phrased reliably change donations to charity? If I know about your height, can I make an educated guess about your weight? That sort of thing. We can become so familiar, so practiced in that job that, when we talk to other people, we tend to shorten the description of what we’re talking about. When my neighbor asks me what I know about momentum in sports, I say “I’m trying to find out if momentum exists” and he gets excited and engages with the conversation. The problem for analysts, though, is that this gets the conversation off on a disingenuous foot. Because we’re not truly trying to find out if momentum exists. We’re trying to find out if the predictive effect of momentum on some other variable, like scoring, exists. But we repeat the phrasing about studying the existence of momentum so much that we can forget that other people see that collection of words as having a different meaning.

And when we analyze the results in sports, the results are pretty clear. The effect of momentum on any variable we look at is unpredictable. The same is true in the academic literature as well. We cannot predict the effect of emotion on motivation with any reliability. So, I agree that we should probably stop saying that momentum doesn’t exist. The subjective experience of the emotion of the game is a real thing that people feel.  But I will hold fast to the notion that predicting what will initiate a change in momentum, and how that change will impact athletic performance, is an unreliable enterprise.

Beholden to Talented Shitheads: Why We Need Analytics

Categories: General Info, NFL
Published on: September 9, 2014

I hope everyone is enjoying the new football season. I’m glad to see the Vikings are 1-0 and the defense looked good, although, it was against the Rams so I’m not sure that means all that much.

I don’t have much to talk about in the way of numbers today. We’ve got one week’s worth of NFL data, which will tell us largely nothing about how the rest of the season will play out, and we’ve got two weeks of college football data, which will tell us something so minor that we probably shouldn’t bother right now.

Instead, I thought I would talk about one of the more important social issues surrounding football right now. I want to talk about Ray Rice and, specifically, what Ray Rice shows us about the importance of adopting analytic strategies for selecting members of organizations.

Many people think that businesses use analytic strategies like skill testing and personality testing because the tests tell you which individual is the most talented, most productive, most useful potential employee and the business then selects the person who comes out on top of the most important tests. And if you think that, you’d be sort of right about how the process works, but you’d also be sort of wrong.

Most businesses that use analytic strategies use their tests not to find a single individual, but instead to narrow the pool of possible individuals. Tests are used to cull the group, but they generally aren’t used to make a final decision. High scores are necessary to land the job, but they aren’t sufficient. Once the tests identify the proper pool of applicants comes the next, and most vital, question an interviewing team can ask: “Can we all work with this person?” Fit within the work culture and ability to get along with co-workers are critical to building a functional organization. Any business using this strategy needs to be very careful that its answers to whether it can work with different people are not biased in ways that violate Civil Rights laws or any moral principles that the company holds to, but in general that’s how companies use tests to select employees. Test them all and generate a pool, then select not solely on high scores but on more human elements.

That’s the first way analytics helps you build your organization. You can be sure of selecting talented people that are actually the kind of people you want to work with. And that could be important if you’re trying to build a football team. Many coaches seem to have very high minded policies about avoiding players with domestic violence histories. And while they seem to stick to those principles to greater or lesser degrees depending on the talent of the player in question, we can at least see how this would work. If your analytic strategy returns two players as equally likely to succeed and one of them has a history of domestic violence, you probably go with the other one. But that’s not why NFL teams need to quickly adopt analytics.

Using analytics to select employees is critical when one of your talented and valuable employees makes a mistake so horrendous, so unspeakable that it makes you rethink whether or not you would be able to work with that person ever again. Enter our connection to Ray Rice.

What Ray Rice did was unspeakable. But how the Ravens and the NFL responded to the situation is just as unspeakable. And while I can’t speculate on what was going through Rice’s head when he committed his act, I have been associated with enough employee selection meetings to have a guess at what the Ravens were thinking prior to cutting him.

The Ravens, and all NFL teams, are in an industry where talent is incredibly difficult to identify. Highly trained NFL scouts get evaluations of talent wrong every season. It’s a terrible job to try to be good at because almost no one truly knows what it takes to be a great football player. If the organization can’t reliably identify talent, it becomes very guarded about the talent that has fallen into its lap. And when organizations have limited confidence in their ability to find new talent, they are more willing to forgive egregious actions from the talent they actually have. In essence, organizations can become beholden to talented shitheads.

Selecting players using analytic strategies can break that cycle. When a talented member of the organization moves into territory that the rest of the organization can’t follow, it is a simple matter to separate from that person, regenerate a new pool of potential applicants, and begin the selection process all over again. We don’t have to run our rationalizer ragged trying to find reasons why Action X might be morally repugnant, but doesn’t justify removal of the person from the organization. Instead, the incentives for talented individuals to act like shitheads evaporate. The organization can afford to be less risk-averse when problems with talented players emerge. If the Ravens had a large scale analytics-based selection process they could have cut Rice in February and found two or three shiny new running backs. Instead, we have the nonsense we all saw this week. Honestly, I fail to see how the status quo is better.

Spaghetti & Advanced Analytics

Categories: General Info
Published on: March 12, 2014

For being a German-Russian/Norwegian from North Dakota, I make one tasty spaghetti.  Cooking is one of the few hobbies that I get to indulge with my academic career as it’s difficult to claim that one is too busy to eat.  If you come to my department and ask my co-workers, they will tell you that they’ve all heard about my fabulous spaghetti.  I’m getting the feeling that they’re all getting a little tired of hearing about it without getting to eat any of it.

Now imagine I invited you over to my house for spaghetti.  Or rather, that I told you I would make my spaghetti for you if you would be willing to pay for all the ingredients that go into it.  You’ve heard me boast about my spaghetti.  You know that I talk about the time and care that goes into my spaghetti.  You decide that you are so intrigued that you’ll put down some cash to finally get the privilege of eating this wonderful spaghetti.  You might even start imagining what you’re going to get on the way over.  What could be in this oh-so-hyped spaghetti?  Is it a secret, homemade sauce?  Basil-infused deep-fried meatballs? (Sidebar:  mmmmm…basil-infused deep-fried meatballs)  You won’t know until the meal is prepared, but you’re thinking it must be something good.  Finally, the spaghetti is ready and I drop this in front of you. What would your reaction be?

At best, you’re justifiably annoyed.  At worst, you would curse my name and flee the house, using very forceful language about how much you’d rather be at Olive Garden right now.  You spent good money on something that a college sophomore creates every other day.  Beyond just being upset, you want to make sure that you never get tricked like that again.  What’s your solution?  How are you going to make sure you don’t get bamboozled into buying buttered noodles from a fast talker again?  The solution you propose is important because it calls into question the nature of proprietary knowledge and could ultimately dictate the ability of restaurateurs to make a profit on their business.  Do you want me to open the kitchen next time and show you how I made the spaghetti?  Do you simply not trust anything I say ever again, even after I’ve learned that buttered noodles are not what the rest of the world would call “tasty spaghetti?”

But I’m not a restaurateur (even though that spaghetti is really good, you guys).  I am an academic.  In my professional life, I don’t have a restaurant, but I do have research ideas that could get funded and pay my salary.  I may not have spaghetti, but I do have a model that predicts professional level success from college level inputs.  But I didn’t create the model to be exclusively about football.  It could apply to any interdependent situation.  Football just happens to be an interesting place for me to apply certain ideas.  The fact that the “field” of “sport analytics” is a growing area of interest is also nice because it makes it easier for me to spread interest in my research.  The more people that are interested in my research, the more valuable my ideas are, which means I could eat and pay my mortgage thanks to said ideas.  However, sports analytics as a field has a problem.

About 10 days ago, about 2,000 sports executives, academics, and analytically minded people all took a long and expensive trip to the Sloan Sports Analytics Conference in Boston.  They went there, by many accounts, for the express purpose of not talking to one another.  Sure, conversation was had, jokes were made, papers were presented, and libations were served but by all the accounts I’ve read or listened to nobody officially talked “spaghetti.”  Which is ultimately the problem that sports analytics is developing.

In a competitive environment, having knowledge that no one else has confers an advantage.  Sports teams have picked up on this notion and begun to make their data, their data analytic tools, and their methodologies private and proprietary.  The message to people like me is clear:  Unless the process is secret, it isn’t worth much.  This is a problem for me as I would very much like to use my brain to get money to eat.

So here I sit, not quite sure where to go.  I have no incentive to give anyone details about the methodology, but won’t gain any credibility until I do.  This blog hasn’t particularly taken off in popularity.  I’ve tried to publish my methodology in academic journals but have gotten three rejections, two of which were “Well, we think this is neat, but it’s not really a fit for our journal.” And I can’t exactly take it to a football team because 1) only secret knowledge is valuable and 2) NFL teams are not particularly incentivized to adopt useful advanced statistics in the first place.  Given this environment, exactly what is a quant to do?

The Analyst has No Clothes

Categories: General Info, Statistics
Published on: September 7, 2013

I follow a lot of scouts on Twitter.  Mostly because the nonsense they spout makes me angry and I use that anger as motivation to write.  Once in a great while, though, you find a scout that does things the right way.  Or at least the way you would do things if you had the wherewithal to actually want to do that job.  Matt Waldman is in this latter group.  When Matt talks about his process, he makes me believe he’s got something valid.  He says all the right things and avoids saying the wrong things about how he goes about his craft.  You can tell there is something important going on under the hood.  Also, he’s a hell of a writer.  I have spent days studying how he constructs such compelling sentences.

The point is I respect the dude’s work, which is why I was a little disappointed to read this article on his blog about the process of scouting wide receivers.  It’s not the wide receiver scouting part that bothers me.  It’s the part when he talks about why he is not a fan of “analytics.”

I believe analytics have value, but the grading of wide receivers based heavily on speed, vertical skill, and production is an ambitious, but misguided idea. Further the application is the torturing of data to fit it into a preconceived idea and making it sound objective and scientific due to the use of quantitative data.

That quote was incredibly depressing to read.  Mostly because the reader can so easily tell what the word “analytics” means to an intelligent, quality focused scout.  The context around the word is dripping with disdain toward the self-serving, self-interested analyst.  It seems as though the people doing “analytics” that this author has met are more interested in notoriety and getting paid than delivering an accurate answer.  He goes on to make this point.

I’m trying to do the same from a different vantage point. The more I watch wide receivers, the less I care about 40 times, vertical results, or broad jumps. Once a player meets the acceptable baselines for physical skills, the rest is about hands, technique, understanding defenses, consistency, and the capacity to improve.

I liked Kenbrell Thompkins, Marlon Brown, Austin Collie, (retired) Steve Smith, several other receivers lacking the headlining “analytical” formulas that use a variety of physical measurements and production to find “viable” prospects. What these players share is some evidence of “craft”. They weren’t perfect technicians at the college level or early in their NFL careers, but you could see evidence of a meticulous attention to detail that continued to get better.

Take a look at that second paragraph.  He talks about headlining analytical formulas in reference to physical measurements like 40 times, vertical jump, and broad jump results.  Here is the heart of the issue.  Several places doing respectable analysis (pdf here) have tested whether or not things like 40 times, vertical jumps, and broad jumps predict wide receiver production.  That sort of test is the exact thing that analytics can bring to the table.  Statistical analysis of 40 times, vertical jump, and broad jump results will tell you very clearly if the number is in any way meaningful.  And the answer that comes back repeatedly is the answer Matt has already arrived at.  They’re not useful.  Anyone who thinks they can predict who will be a quality wide receiver based on a 40 time is wasting their breath and your time.  So are there people out there really running around building predictive formulas on 40 times?  If there are, those people should not be listened to.  Furthermore, the idea that such people exist makes me feel like a biker gang member who sees a non-member wearing his club’s rocker.

There is a right way and a wrong way to do statistical analysis.  Knowing the right way is not a trivial thing that you can just dive into without training.  Somewhere, you need to learn the correct way to do it. There are lessons to learn and dues to pay and, to hear Matt talk about his experiences, there are people walking around pretending to have the cachet who simply don’t have a clue.

You can see this when you read the ESPN story about the Jacksonville Jaguars “analytics” department.  From my perspective, anyone with a brain should have been able to shred those conclusions and recognize how ridiculous they actually were.  Thankfully, someone at ESPN has both a brain and the ability to write and did it for us.  It should not have taken someone in the press to recognize how terrible that analysis was.  The basic premise of any good statistical analysis starts with the notion that the analyst is wrong.  It is then the analyst’s responsibility to work through every other possibility to find the holes.  And once you reach a point where you can’t see the holes in your own work, you give it to someone else to find holes you can’t see.

Given what I’ve seen when I hear NFL people discussing the advice “analytic” people have given them, it’s no wonder that analytics is having trouble gaining respect in NFL circles.  It seems there are a bunch of people talking to NFL decision makers whose analytic methods should be severely questioned.  If what the Jaguars and some other NFL teams are doing with numbers is considered “analytics,” I’m not sure I want to be associated with that term.

Fargo, Roger Maris, and PEDs

Categories: General Info
Published on: August 7, 2013
Photo Credit: Newman Outdoor Advertising

I make my home on the winding banks of the Red River of the North in Fargo, North Dakota. Yes, really, and yes we sometimes do talk like that but please don’t try to imitate the accent.  You won’t get the rhythm right.  Trust me on that.

Fargo is largely defined by its location in the natural world.  Every now and again, I travel outside the state.  When I tell people I meet I’m from Fargo, their first question has never not been about the weather.  The extremes of weather around here seem to be what everyone fixates on, both outsiders and residents.  Every year, the hottest day of the summer will have a high close to 100°F and the coldest day of the winter will have a low of -30°F.  Fargo, N.D. has been voted “America’s Toughest Weather City,” largely because residents of Fargo go to the voting websites and pump up the numbers as a source of civic pride.  But while the weather around here is outwardly treated lightly, it is always on people’s minds.  In Eat, Pray, Love, Elizabeth Gilbert says that every city has a word that every resident carries with them.  The word defines the background mindset of everyone you meet there.  In the book, Rome’s word is “sex.” I would argue that Fargo’s word is “winter.” Winter is very serious business around here.  I’m not even going to spell “serious business” in that weird, internety way because that would detract from the seriousness.  If you’re not paying attention to the weather on a daily basis, you could die.  Every year people lose fingers and toes to frostbite, homeless people unlucky enough to not make it into shelters freeze to death, and homes, photographs, memories, and people are lost to spring floods.

Now, if this description has made you want to move to Fargo and embrace this wintery existence, let me tell you one more thing about the people we respect and admire.  Fargo is the hometown of Roger Maris and embraces Roger Maris like no other hometown celebrity.  A wing of the local hospital is named after Roger Maris, and a yearly celebrity golf tournament which funds the hospital wing is held in his name.  When we moved into our house, the neighborhood welcoming committee introduced my north side neighbor as “Don, who played city-league baseball with Roger Maris,” a description that had some serious cachet behind it.  The local mall has a museum dedicated to Roger Maris and the 61 home runs he hit during the 1961 season.  Think about that for a second.  An organization that makes money by renting space to retailers has decided to take some of their space and not let a retailer rent it.  Instead, that space is given to honor Roger Maris.

Around the city, you will get a very clear sense of where Fargoans place Roger Maris in baseball history.  Many are unwilling to accept that Roger Maris’s single-season home run record has actually fallen.  At the local sporting goods store, you can buy a shirt that reads “The Record is Still 61.”  The local company that supplies billboard space has posted billboards around town with the words “Baseball’s Legitimate Home Run King” and Roger Maris’s picture printed on them (see picture).  The sign company donated these spaces.  Think about that!  An organization that makes money leasing space to advertisers built a sign space and then won’t let advertisers purchase that space.  It is more important to them that everyone within the city limits of Fargo know in no uncertain terms who “legitimately” holds that record.

Which brings us to the performance-enhancing drug scandal currently rippling through Major League Baseball.  Given where I live, I think it’s fair to say that I’ve been indoctrinated to hate users of PEDs.  When I hear about superstars being suspended for PED use, I have an immediate, emotional reaction.  I’m ready to put on my “The Record is Still 61” shirt, join the mob, and protest in a vocal but respectful manner (I am from Fargo after all) for the total removal of any record that these individuals ever played the game.  I want their names blasted off the temple walls like some usurped pharaoh of Ancient Egypt.  I refuse to even speak the name of the current 3rd baseman of the New York Yankees and if I ever attend a Twins-Yankees game in which he plays, I promise you that I will turn my back to the field when he bats.

I recognize that this reaction is entirely emotional.  When I stop to examine exactly why I have such a strong and frankly irrational reaction to the situation, I’m at a loss.  What is the difference between what we have classified as a performance enhancing drug and the chemicals we have classified as legal?  Why am I so angry at the ineffable Yankee(s), Giant, and Brewer but would really be okay seeing Pete Rose in the Hall of Fame?  What makes the PED crime greater to me than gambling?  Because let’s be clear about this.  It’s the PED use that triggers the reaction.  I see talking heads say they are outraged about the fact that “…he looked me in the eye and said ‘I’ve never done this’ and then this evidence comes to light.”  Blah blah blah.  I don’t care about the fact that they all lied about it.  Of course they all lied about it.  And excepting the case of denial, stupidity, or delusions we all knew they were lying.  Lying isn’t the outrage to me.  Using the drug is.

When I hear about PEDs being used in sports, it makes me question why we play the game at all.  It seems like a glorification of outcome over process.  I feel as though there is a “right” way to achieve one’s accomplishments and that the use of PEDs falls outside the scope of that right way.  Again, I can’t explain this reaction logically.  I can’t tell you why a shot of cortisone seems okay to me while a shot of testosterone does not.  All I can tell you is that the game of baseball now feels cheaper and more hollow to me.  Roger Maris can’t get a little love, but we all adore that guy?  What’s more, this feeling isn’t going away.

Fargo is a lot like Roger Maris in a lot of ways.  Outwardly seeming cold, aloof, and having a little too much pride in things that some people think one shouldn’t have pride in.  In Fargo’s case, we proudly describe a week of sub-zero highs as “normal January weather.”  In Maris’s case, he broke the single season home run record, but in a longer season.  At the time, people questioned the legitimacy of his record.  But inwardly, we see it a little differently.  I take pride in my state’s weather because it changes my approach to life.  There is no wasting a beautiful summer day around here because you don’t get many of them.  Roger Maris, to me, stands as a symbol of what a man from my section of the world can accomplish.  What you can do when you wake up every day and simply go about your business.  And I don’t see using PEDs as “going about your business.”  It may not be logical or reasoned or rational, but to me, the record is still 61.

First Round Thoughts

Only five players that I have a number attached to were drafted in the first round.  This will be short…until we talk Vikings.

Pick #8:  Tavon Austin

The first wide receiver off the board is also my top rated wide receiver, for whatever that’s worth.  Not much to say here other than I think the Rams made the right choice.

Pick #16:  E.J. Manuel

The first quarterback taken in the draft goes to the Bills.  I like the pick in that I think he will do well in the NFL.  Our handy-dandy math equation tells us to expect Manuel to have a 74.79 NFL Passer Rating by the time his rookie contract is over.  I think the Bills did well here.  They avoided the Geno Smith hype and got an interesting playmaker.  It will be interesting to see if they transition to a read-option offense or if they use E.J. as a more traditional quarterback.  I think they could do either.

Pick #21:  Tyler Eifert

I am not a fan of Tyler Eifert, at least as far as pass catching ability goes.  I think I might be alone in that opinion, but I’ll follow the numbers.

Pick #27:  DeAndre Hopkins

I have my wide receivers sorted by Completions Away from Average (CAA).  If you look at that list, you see Hopkins somewhere near the middle of the pack.  But I think that number doesn’t capture his true value.  One of the reasons I really hesitated about posting my wide receiver rankings is that the model really loves slot receivers.  This makes sense for two reasons.  First, slot receivers tend to be matched up against the 3rd or 4th best defensive back in the secondary.  They are working against the less talented, relatively speaking, members of the secondary.  Second, slot receivers typically run shorter routes than the guys on the outside do, so the throws are, relatively speaking, easier.  Combine those two together, shorter routes against the less talented defensive backs, and you get a combination that loves completions.  So if you look down the list, a lot of slot receivers are at the top.

On the flip side, outside, deep receivers are punished by the model.  The passes are longer and more difficult to complete and are against higher quality defenders.  Given that, it is quite good for a receiver like Hopkins to get to #27.

Pick #29:  Cordarrelle Patterson

Here we go.  Now we’re going to get wordy and mildly upset.  I want to open by saying that I’m not upset with the outcome of this pick.  Cordarrelle Patterson may turn out to be an excellent wide receiver.  Or he may not.  Either way, that’s not what I’m upset about.  This isn’t about Cordarrelle Patterson being the “right pick” or not.  The biggest issue Vikings fans like myself should have with this pick is the process that brought it about.

I’ve read this argument in many other places (see Wages of Wins, Sloan Conference, etc. etc. etc) but it bears repeating here.  Sports are very outcome focused.  We talk endlessly about a particular player being the right fit or a can’t miss prospect, but that is the wrong way to think about the draft.  We should not be concerned about outcomes.  We should be concerned about the processes that are generating our outcomes.  Processes are what get you sustained success.  Outcomes get first downs and touchdowns.  Processes build dynasties.  I don’t like the process that the Vikings used to make the 29th pick because it doesn’t reflect the reality of the draft.

What realities do I mean?  As economist Cade Massey says, the draft is dominated by randomness.  Predicting NFL performance from college data is remarkably difficult.  Even my own model only predicts 11% of the variance in NFL Passer Rating.  That’s very far away from lights out, sure thing picking.

Error Bars for my Quarterback Predictions

And even if you get it right, there is a chance your guy could get injured and all your perfect forecasting could be for naught.  Massey goes on to recommend that in an environment dominated by tremendous uncertainty, your best option is to use as many selections as possible.  This is where the Vikings fell down horribly.

The teams that use the draft properly are teams that have accepted the random nature of the draft and act accordingly.  What you absolutely do not do is sacrifice three picks to move back into the first round.  There is absolutely no guarantee that this pick will work out well.  There is no guarantee that any pick will work out well.  It is a massive roll of the dice to give up all those picks and absolutely the wrong process to use in the draft.  The Patriots, our trade partner, are a team that gets the process right.  They accept that the draft is largely random, move back in the draft, add additional picks and wait for some of them to work out well.

That’s my breakdown of the first round.  Here’s hoping the Vikings hit on the selections they have remaining.  I think they gave up far too much to make a third first round pick.

Tweaking the Model

Categories: General Info, Statistics
Published on: April 14, 2013

Apologies for the unexpected hiatus last week.  I had to fly to Seattle for a wedding.  But now I’m back in the Frozen Tundra and ready to tell you about some stats.

I’ve been adjusting the statistical methodology I use in the last couple weeks.  Nothing has changed about how I calculate the Completions Away from Average metric, but I have been changing how I use that metric to predict NFL success.  Be advised that this is another post that is heavy on technical details.  If you are not a technical reader, the take-home point of this post is that the changes will make my predictions better in the long term, but add more uncertainty in the short-term.

Change #1:  Dependent Variable

The DV I’ve been using for predicting is NFL Passer Rating after three years in the league.  I chose three years arbitrarily.  I thought, “Well…most draft prospects have a decent chance of playing after three years in the league, and we don’t want to get too far away in time from college because we can’t account for growth and coaching, so let’s use three years.”  Arbitrarily isn’t the best reason to choose a DV, so let’s use something with a little more meaning.  I’m going to stick with Passer Rating as it’s a reasonably good per-attempt statistic.  The per-attempt part is what I really care about.  Yes, there are problems with it, but I think the good outweighs the bad.

But choosing three years as an arbitrary time-period doesn’t make much sense.  We don’t have to change this much to make it meaningful, but we should change it.  Rather than Passer Rating after three years in the league, I went with Passer Rating after four years in the league.  Four years is much more meaningful because it is the length of all rookie contracts under the new CBA.  Thus, I’m predicting what a player is most likely to do during his rookie contract.  I think that is much more meaningful than what I was doing previously.

Change #2:  Prediction Model

Changing the prediction model solves two problems, which is handy.  When I have two problems, I like being able to solve them both at the same time.  So what are the two problems that need solving?

Problem #1:  Adding Data

The typical method of designing a statistical model is you have some theory, you collect data in a way to test that theory, and you use the data you collected to create a mathematical formula that minimizes the errors between your theory and the data.  In most cases, my statistical model included, the mathematical formula is fairly simple.  We draw a scatterplot of our data, and draw a straight line through the cloud of data points.  Then all we need is our 8th grade math skills to solve for the important values of that line, y = mx + b, where m is the slope of the line and b is where the line crosses the vertical axis (a.k.a. intercept).  We also know x, in this case Career CAA, for each individual player, so we just solve for y, in this case Passer Rating after four years in the league, and we have our prediction.  This process is called regression modeling.
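
For readers who like to see the arithmetic, here is a minimal sketch of that straight-line fit in Python.  The function name `fit_line` and the data are made up for illustration; they are not my actual player numbers, and the real model has more going on than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: where the fitted line crosses the vertical axis.
    b = mean_y - m * mean_x
    return m, b


# Made-up illustration data (hypothetical Career CAA and passer ratings).
caa = [-20.0, -5.0, 0.0, 10.0, 25.0]
rating = [66.0, 69.0, 70.0, 72.0, 75.0]

m, b = fit_line(caa, rating)
prediction = m * 15.0 + b  # predicted rating for a hypothetical CAA of 15
print(m, b, prediction)
```

Once m and b are known, a prediction for any new player is just one multiplication and one addition.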

But then what?  New players will always be entering the NFL.  Also, some of the players that I have in the model haven’t been in the league for four years, so their data will change.  But that’s not a problem, right?  We just keep collecting the data and everything should get even better, right?

What I’m about to say might surprise some of you that are not familiar with traditional methods of statistical modeling.  Continually adding data and using traditional regression techniques will actually make your predictions worse.  Regression models hate adding new data.  They want to take the information you give them the first time and do the best they can with it.  When you take the same model, feed in the old data plus some new data, the model is more likely to follow blind alleys and tell you that unimportant things might be useful predictors.  So the realities of the NFL make traditional modeling techniques less useful.

Problem #2:  Distribution of the DV

Traditional regression assumes a normally distributed dependent variable.  For most things, this is fine.  For Passer Rating this is not fine as passer rating isn’t normally distributed.  In a normal distribution, most cases are in the middle with a few cases on the high and low ends.  However, a per-attempt statistic like passer rating isn’t normally distributed because some quarterbacks won’t have many attempts.  For example, let’s look at Ryan Mallett this year.  He had 4 attempts this season, one of which was intercepted.  Now, the passer rating metric just takes what it has and extrapolates from that information.  The number that is Ryan Mallett’s 2012 passer rating assumes he would throw an interception every 4 attempts, which we humans know is not true.  If he was on the field and made 100 attempts, we wouldn’t expect him to throw 25 interceptions.  To our brain, that’s ridiculous.  But the math equation doesn’t know this.  It’s just taking what we gave it and spitting out a result.  So Ryan Mallett’s passer rating is currently 5.  Some players, like Mallett, have really poor numbers.  Other players, like Kirk Cousins, have really good numbers. We shouldn’t believe either number just yet because they haven’t had enough attempts to get a good picture of what they can do when they are on the field.  This is the nature of the beast in the NFL; some quarterbacks have passer ratings that are extreme compared to others.  But regression models don’t like extreme values.  Regression models like everything to be nice and normal.  Extreme values confuse the model and exert undue influence on the final result.
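
To see how a tiny sample produces a number like that, here is a sketch of the standard NFL passer rating formula in Python.  Only the 4 attempts and 1 interception come from the text above; the 1 completion and 17 yards are an assumed stat line used for illustration, and the function name is my own.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating; each of the four components is clamped to [0, 2.375]."""
    def clamp(value):
        return max(0.0, min(value, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)       # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)            # yards per attempt
    c = clamp((touchdowns / attempts) * 20)             # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)  # interception rate
    return (a + b + c + d) / 6 * 100


# Assumed stat line: 1 completion in 4 attempts, 17 yards, 0 TD, 1 INT.
tiny_sample = passer_rating(completions=1, attempts=4, yards=17,
                            touchdowns=0, interceptions=1)
print(round(tiny_sample, 1))  # roughly 5.2
```

With only four attempts, a single interception zeroes out the interception component entirely, which is how a rating ends up that far from the pack.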

Solution:  Bayesian Robust Linear Regression

I won’t say too much about the Bayesian portion of this.  When I start talking Bayesian methods in class, my students tend to glaze over like fully fed zombies.  Baaaaaayes.  Regardless, the Bayesian portion solves Problem #1.  Bayesian analyses are built to accept new data in ways that traditional regression models are not.

The Robust Linear Regression part is the really cool part.  In this analysis, we can change the assumption about the distribution of our DV.  We can, for example, assume our values are distributed as a t-distribution, which means most of the values are in the middle, but extreme values are expected and dealt with easily.  So, let’s revise the predictions with this analysis.
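
Here is a rough sketch of why the t-distribution helps.  To be clear, this is not the Bayesian model I actually ran: it is a simplified, non-Bayesian stand-in that fixes the degrees of freedom at an assumed value (`nu=3.0`) and fits the line by repeatedly down-weighting points with large residuals.  The function names and the data are hypothetical.

```python
def weighted_line(xs, ys, ws):
    """Weighted least squares fit of y = m*x + b."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def robust_line(xs, ys, nu=3.0, iterations=50):
    """Fit a line assuming t-distributed errors with fixed degrees of freedom nu."""
    n = len(xs)
    ws = [1.0] * n  # the first pass is an ordinary least squares fit
    for _ in range(iterations):
        m, b = weighted_line(xs, ys, ws)
        residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
        # Scale estimate; the floor guards against division by zero.
        scale2 = max(sum(w * r * r for w, r in zip(ws, residuals)) / n, 1e-12)
        # Points far from the line get small weights, so they pull less.
        ws = [(nu + 1.0) / (nu + r * r / scale2) for r in residuals]
    return m, b


# Made-up data: roughly rating = 70 + 0.2 * CAA, plus one extreme value of 5.
xs = [-20.0, -15.0, -10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
ys = [66.5, 66.5, 68.5, 68.5, 70.5, 70.5, 72.5, 72.5, 5.0, 74.5]

ols_m, ols_b = weighted_line(xs, ys, [1.0] * len(xs))
robust_m, robust_b = robust_line(xs, ys)
print(ols_m, robust_m)
```

In this toy example the single extreme value drags the ordinary slope negative, while the t-based fit stays close to the slope the rest of the points agree on.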

New Model

We will set some very general prior predictions before we run the analysis.  We will assume that our DV is distributed as a t-distribution.  We will also assume that possible intercepts and slopes are distributed normally.  The analysis will determine the degrees of freedom by estimating a value called tau.  Tau is conceptually similar to, but not quite the same as, degrees of freedom.  It’s a measure of how fat the tails are.  The closer to 0 this is, the better this analysis will do compared to traditional techniques.  If tau is > 30, we shouldn’t see much difference.  We assume possible values of tau follow a gamma distribution beginning at 1.

So what do we find?  Note, we’re using 4-year NFL passer rating as our DV rather than 3-year as in past analyses.


The first question is if we made a wise choice in assuming our dependent variable is t-distributed rather than normally distributed.  Our most credible estimate of tau from these data is 2.56.  Remember, the closer to 30 this is, the less this matters.  Since the most credible estimate is much, much less than 30, we should improve our predictions substantially with this procedure.

Equation of the Line

We’re still trying to find a line that best predicts our outcomes.  Bayesian regression is a little different in that it gives us lots of different lines that are all credible given the data we have.

In the initial estimates, we have a small sadness.  0 remains a credible slope.  This is what I meant above about increasing uncertainty in the short term.  There is a chance that everything I’ve been talking about is not true and this CAA metric isn’t worth anything.  The nice thing about Bayesian analyses is that they put a probability on this chance.  Given the data we have, there is about a 6% chance that I’ve been blowing smoke at you this entire time.  I’m willing to keep going with that, but you can make your own decisions regarding that.

To create the actual equation, I took the most credible intercept and the most credible slope and stuck them in the same equation.  Note that I’m still new to Bayesian estimation techniques and I’m not quite sure if this is kosher or not.  If I find that it’s not, I will revise the model.

New Equation for Prediction

4-Year NFL Passer Rating = 70.5 + 0.143 * (Career NCAA CAA)
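
If you want to plug numbers in yourself, the equation is a one-liner.  The sketch below uses the intercept and slope from the equation above; the function name and the CAA value of 30 are hypothetical, chosen only to show the arithmetic.

```python
INTERCEPT = 70.5  # most credible intercept from the equation above
SLOPE = 0.143     # most credible slope from the equation above


def predicted_rating(career_caa):
    """Predicted 4-year NFL passer rating from career NCAA CAA."""
    return INTERCEPT + SLOPE * career_caa


# Hypothetical prospect with a career CAA of 30.
example = predicted_rating(30.0)
print(round(example, 2))  # 74.79
```

Because the slope is positive, a higher CAA always gives a higher predicted rating, which is why the relative rankings stay the same.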

I’ve added a new column to my predictions for the 2013 draft class.  Note that this doesn’t change their relative rankings any, just the estimate of their NFL passer rating four years from now.

Correction to Previous Post

Categories: General Info
Published on: March 12, 2013

I mentioned last time that I was going to run a full ANOVA on that post’s data, including some nice confidence intervals and post-hoc tests.  Well, I did that and made these graphs.

This one for Career Receptions

This one for Career Yards

This one for Career Touchdowns

I looked at these graphs and noticed something hinky.  The confidence intervals for rounds 5, 6, and 7 overlap.  That can’t be if I have a significant ANOVA like I reported in the last post.

So I spent some time checking over my calculations and noticed an error.  My sum of squares calculation was pointed at the group standard deviation instead of the group mean.
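
For anyone checking their own work, here is a minimal sketch of a one-way ANOVA in Python showing where the group means belong in the sums of squares.  I’m not saying this is the exact spreadsheet calculation that went wrong; the function name and the toy data are made up.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between groups: each group MEAN compared with the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within groups: each value compared with its own group MEAN.
        ss_within += sum((v - group_mean) ** 2 for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Toy data: three groups of three values each.
ss_b, ss_w, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(ss_b, ss_w, f_stat)  # 6.0 6.0 3.0
```

Swap a standard deviation in where a mean belongs and the F statistic can come out looking significant when it isn’t.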

Upshot of the whole thing?  The points at the end of the last post about the 7th round being better than the 6th are incorrect.  Instead, we see our expected pattern, a J-shaped distribution consistent with random selection of talent coupled with self-fulfilling prophecy associated with draft round.  Once we control for playing time, we should see no differences in performance across rounds, which we can begin to evaluate using Yards per Game Started.
