Fancy Stats: What the bleep do we know?

If you've been watching The Fancy Stats Wars for the past 12 months, you'll know that they are over. The Fancy Stats have won and they're here to stay. And I, for one, welcome our new mathematical overlords. The stats revolution has completely changed the way people watch the game (ironically) and the way players are evaluated. It's changed the way I do my playoff predictions, and most importantly, it's changed the way I argue with people at the bar.

It used to be that a "Who Is Better?" discussion would come down to subjective arguments, or at best goals, assists, and other things that are easily skewed by linemates or environmental factors. No longer! Now I can just point to the CorsiRel% and say, "See, look how much better Karlsson makes his team compared to PK Subban!" and the poor Habs fan I'm talking to inevitably has no choice but to go back to his beer, soundly defeated. I get to end the discussion because I am right, and I've proved it.

However, lately something has started to bother me, and it all started a few weeks ago. I was talking to a friend of mine and he asked me, "What do you think of Corsi?" I said, "I think it's good. I think it's very useful as a tool for player comparison." He responded: "So you think Jake Muzzin is a better player than Drew Doughty, then?" I thought, and said, "Well, I wouldn't go that far. Corsi's useful, but no one is suggesting it's the be-all-and-end-all." This is a conversation that's played out a thousand times over the past two years on various corners of the internet, Twitter, and even the radio. Basically, anyone who says "Hey, this isn't consistent!" is accused of missing the point. "This isn't perfect!" people say. "Nobody's saying this is a magic bullet, but you'd be a fool to ignore it." And to be fair, that appears to be true.

However, I've suddenly found myself aligning with the Fancy Stat skeptics over the past few months, and here's why:

"Why does Stat W say Player X is better than Player Y?" is the only question worth asking!

Seriously, why did Mark Giordano have a better 5v5 Corsi% and 5v5 CorsiRel% than Sidney Crosby last year? Is Kimmo Timonen better than Erik Karlsson because Corsi says so? Ok, the answer is no, but why not? It's the sort of thing that shouldn't be dismissed, because when you're trying to create a model, things that don't fit are important. They show that there's still work to be done.
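For anyone who wants to see what's actually being compared in those questions, here is a minimal sketch of the two numbers as they are commonly defined: Corsi% as the share of 5v5 shot attempts that go a team's way while a player is on the ice, and CorsiRel% as that share minus the team's share while he's on the bench. The class name, function names, and every number below are made up for illustration, and individual stats sites may compute their versions somewhat differently.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    """5v5 shot attempts (shots on goal, misses, and blocks) for and against."""
    attempts_for: int
    attempts_against: int


def corsi_pct(s: ShotAttempts) -> float:
    """Share of all shot attempts that went in the team's favour, in percent."""
    total = s.attempts_for + s.attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * s.attempts_for / total


def corsi_rel_pct(on_ice: ShotAttempts, off_ice: ShotAttempts) -> float:
    """On-ice Corsi% minus the team's Corsi% with the player off the ice."""
    return corsi_pct(on_ice) - corsi_pct(off_ice)


# Hypothetical player: these counts are invented, not real NHL data.
on_ice = ShotAttempts(attempts_for=550, attempts_against=450)
off_ice = ShotAttempts(attempts_for=1900, attempts_against=2100)

print(f"Corsi%:    {corsi_pct(on_ice):.1f}")
print(f"CorsiRel%: {corsi_rel_pct(on_ice, off_ice):+.1f}")
```

Notice what isn't in there: linemates, zone starts, quality of competition. The arithmetic is simple, and that's part of why the oddball results deserve an explanation rather than a shrug.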

Believe me, I know Tyler Dellow is sitting in front of a computer somewhere working on his Unified Theory of Hockey Stats, and that guy is so smart and dedicated I have no doubt he'll succeed. However, right now what we have available to us is limited, and no one seems to know where the limitations actually are. Not only that, but very few people seem willing to talk about it, even though I would argue that it's the most important discussion that can happen right now.

Statistics is, at its core, the application of math to things with variance, and that makes it good for hockey. Hockey has so much variance it's like "Yo dawg, I heard you like variance, so I put some variance in your variance." Playoff series results can be chalked up to variance. Entire seasons can be explained by variance. I don't expect a stat to be perfect. Fenwick Close isn't perfect, but it's still the best predictor of playoff series we have right now. I've seen the graphs. Taking more shots is better than taking fewer shots. This case is closed. Signal, noise. Sunrise, sunset.

And yet, weird things crop up when you start applying macro stats at a micro level. Maybe video tracking will make a WAR for Hockey possible, but it's clear we're not there yet. As it stands, Corsi doesn't allow for direct player comparison without a lot of uncertainty and a need for context, and what context is most important is entirely dependent on who you ask. Is it zone start %? Is it Quality of Competition? Is it relative shot quality? How important are zone entries really? These are questions whose answers do not have consensus. This is why I tend to be irked when I see stats writers say things like "Stats don't lie". I will grant you that pure numbers are objective, but the interpretation of the usefulness of those numbers is about as subjective as you can get. Hell, people have been arguing over how to use and interpret statistics forever. Maybe stats don't lie, but people can rarely agree on what language they're speaking. Even a statement as "obvious" as "Plus-Minus is a bad stat and Corsi is a good stat" is still a subjective assessment.

I discussed this with another friend who has been coaching hockey for years. The conversation turned towards stats, and I said "Plus-Minus is not a good tool for player evaluation". His response surprised me. He said "I love plus-minus because it's easy. I can take that to a player and say 'Right now, I'm giving up more goals with you on the ice than I am with you off it.' and they get the message that they need to up their game." A coach who prefers Corsi would say that exact same phrase with the word "goals" replaced by the words "shot attempts". Corsi has a larger sample size than plus-minus, because shot attempts happen far more often than goals, and that's what makes it a better stat, but they're both affected by factors that are outside the player's control.
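To put a rough number on the sample size point, here is a back-of-the-envelope sketch. It treats every goal or shot attempt as an independent coin flip that goes for or against the player's team, which is a big simplifying assumption and not how hockey actually works, and the event counts are invented for illustration. Under that assumption, the noise in an observed share is the usual binomial standard error, sqrt(p(1 - p) / n).

```python
import math


def share_standard_error(share: float, n_events: int) -> float:
    """Binomial standard error of an observed share over n_events.

    Assumes each event is an independent trial with the same probability,
    which is a simplification made only for this illustration.
    """
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return math.sqrt(share * (1.0 - share) / n_events)


# Hypothetical season totals for one player at 5v5 (invented numbers).
goal_events = 60            # goals for + goals against while on the ice
shot_attempt_events = 1200  # shot attempts for + against while on the ice
true_share = 0.5            # assume a perfectly break-even player

se_goals = share_standard_error(true_share, goal_events)
se_shots = share_standard_error(true_share, shot_attempt_events)

print(f"noise in goal share:         +/- {100 * se_goals:.1f} points")
print(f"noise in shot attempt share: +/- {100 * se_shots:.1f} points")
print(f"goal share is {se_goals / se_shots:.1f}x noisier")
```

With twenty times as many events, the shot attempt share is about sqrt(20) times less noisy in this toy model. What it can't tell you is whether the thing being measured so precisely is the thing that matters, which is the coach's point.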

Check out this recent exchange between Bobby Ryan and Sens Twitter's Own Senstats. Bobby Ryan's One True Stats God is shot quality, and you know what, he's kinda got a point. It's not surprising that a guy known as a sniper thinks taking a high quality shot is more important than taking any shot. It doesn't make him unenlightened, it just makes him a guy who focuses on a different part of the game than other people. Maybe shot quality and goaltending are difficult to sustain over the long term, but that doesn't mean they're not important.

So, if you're the sort of person who writes a lot about stats online, here are some things I am interested in:

  1. How important is this really? Seriously, we've got more stats than you can shake a stick at right now. What's so important about *this* one?
  2. What are your subjective biases? What do you think is important even if you can't prove it? Why do you think it's important?
  3. If you had access to God's Hockey Data, what would you look for and how would you do it?
  4. Counter-intuitive insights: What's something the data shows that you wouldn't necessarily expect to hold true?

Let's stop framing stats discussions as a battle between the Enlightened and the Neophytes. You can appreciate #FancyStats and still be skeptical of some of the conclusions they point to. Just because something is true at a large scale doesn't mean it's the most important factor in individual cases. This is the nature of the statistical beast.

Now for God's sake, get out there and be insightful!

This FanPost was written by a member of the Silver Seven community, and does not necessarily reflect the beliefs or opinions of the site managers, editors, or Sports Blogs Nation, Inc.