

What the Bleep Do We Know About Fancy Stats? Lots, Actually.


A defense of stats without ever using stats

Patrick Wiercioch celebrates an extremely high quality shot
USA TODAY Sports

It was bound to happen if things kept going the way they were for the Ottawa Senators and their long-suffering fans. Sooner or later, someone was going to get worked up about #fancystats. By many measures, the team is off to a good start this season: a 6-3-2 record, a +4 goal differential and an unexpected bout of optimism. Unlike the lead-in to last year's ultimately forgettable campaign, this year most neutral observers had the Sens finishing well outside the play-off picture. So one couldn't really fault the fanbase for not entirely appreciating the stats nerds raining on their parade. The feelings of many were summed up by none other than Bobby Ryan a few days ago:

Poor Manny.

There's plenty to unpack in what Ryan's saying, and a big chunk of it was articulately conveyed by Hockey Twitter celebrity Luke Peristy in a post yesterday. Luke raises valid questions, and he's encouraging the kind of debate that's healthy for the hockey community. How do we evaluate our statistical tools, and what are the limits of our knowledge? These are questions well worth asking.

The first critique, the most obvious, and the one that will always come up when a team's record is outstripping its underlying possession metrics is shot quality. How does Corsi account for quality? A shot from the wall isn't as valuable as a shot from the middle of the ice, after all. Shot quality can decide single games at times, but even then it is less influential than you might be inclined to believe. Writers across the hockey blogosphere have taken to tracking scoring chances by hand, and most use a variant of the "home plate" tracking system. From an old Copper n Blue post by Derek Zona:

A scoring chance is defined as a clear play directed toward the opposing net from a dangerous scoring area - loosely defined as the top of the circle in and inside the faceoff dots (nicknamed the Home Plate), though sometimes slightly more generous than that depending on the amount of immediately-preceding puck movement or screens in front of the net. Blocked shots are generally not included but missed shots are.
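The home-plate definition is essentially a geometric test plus a rule about which shot types count. The sketch below assumes a hypothetical coordinate frame in feet: the attacked net is centered at x = 89, y = 0, the offensive-zone faceoff dots sit at x = 69, y = ±22, the circles have a 15 ft radius (so their tops are at x = 54), and the posts are at y = ±3. These roughly follow standard NHL rink dimensions, but the polygon is an approximation of Zona's wording, not his tracking code, and it cannot capture the "slightly more generous" judgment calls about puck movement and screens.

```python
from dataclasses import dataclass

# Assumed coordinate frame (feet): attacked net centered at (89, 0),
# offensive-zone faceoff dots at (69, +/-22), circle tops at x = 54,
# goal posts at y = +/-3. The polygon approximates the "home plate".
HOME_PLATE = [
    (54.0, -22.0),  # circle top, negative-y side
    (54.0, 22.0),   # circle top, positive-y side
    (69.0, 22.0),   # faceoff dot, positive-y side
    (89.0, 3.0),    # goal post, positive-y side
    (89.0, -3.0),   # goal post, negative-y side
    (69.0, -22.0),  # faceoff dot, negative-y side
]


@dataclass
class ShotAttempt:
    x: float
    y: float
    kind: str  # "goal", "saved", "missed" or "blocked"


def in_polygon(x, y, poly):
    """Ray-casting test: count crossings of a ray cast from (x, y) toward +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_scoring_chance(shot):
    """Blocked shots are excluded; missed shots still count."""
    if shot.kind == "blocked":
        return False
    return in_polygon(shot.x, shot.y, HOME_PLATE)
```

For example, a missed shot from the high slot at (60, 0) counts, while the same location with a blocked shot does not, and a shot from low along the boards at (80, 20) falls outside the polygon.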

What's been subsequently documented by the likes of Eric Tulsky is the very strong correlation between Fenwick differential and scoring chance differential. Here I will dive into my somewhat subjective opinion, because I think it's important to try to get at the why: I believe shot attempt differential is so strongly correlated with scoring chance differential because there is an incredible level of parity among players. The difference between Mark Stone and, say, Max Pacioretty taking a shot from the slot is slim to none. The finishing skills of the vast majority of NHL players are remarkably similar. That is not to say there aren't a few outliers at each end (Erik Condra vs. Steven Stamkos, say), but shot quality is less important than shot quantity precisely because the margins between the vast majority of players are so thin.
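For readers who want to check a relationship like this against their own tracking, here is a minimal sketch. Corsi counts every shot attempt (goals, saved shots, missed shots and blocked shots); Fenwick drops the blocked ones. The correlation is a plain Pearson coefficient computed by hand (Python 3.10+ also ships `statistics.correlation`). The function names are illustrative, and no real team data is included here.

```python
from statistics import mean


def corsi(goals, saved, missed, blocked):
    """Corsi: every shot attempt directed at the net."""
    return goals + saved + missed + blocked


def fenwick(goals, saved, missed):
    """Fenwick: unblocked shot attempts (Corsi minus blocked shots)."""
    return goals + saved + missed


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

To use it, build one Fenwick differential (Fenwick for minus Fenwick against) and one hand-tracked scoring chance differential per team or per game, then pass the two lists to `pearson`. A value near 1 means the two differentials move together.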

(It's important to note that the phenomenon I am describing is almost certainly true only of NHL hockey, and even then probably only for the last 12-15 years. You can make a very convincing case that shot quality mattered a lot more in the 1980s, because there was a much greater range of talent playing in the league at the time. Today's average NHL player is uniformly, exceptionally good, and that just was not the case even 25 years ago.)

Further, I believe scoring chances are very closely correlated with shot differential because all teams try to play for shots from the high-scoring areas and thus exhibit similar shot location patterns. Some teams will suppress quality at a greater rate than their raw differential suggests (Florida, for example, has long been a goalie haven for that reason), but the vast majority fall close together. Again, it's not that shot quality doesn't exist, but rather that it carries relatively little weight. Even in a single game, if Team A were outshot by a margin as slim as 30-25, I would take an even-money bet that they lost the scoring chance battle, because no team gets 30 shots from the wall while allowing 25 from the slot. This is the key concept I want to highlight: parity, across players and across teams, is much, much higher than one would intuitively believe. If a team could generate nothing but slot shots for Steven Stamkos, that would be a huge advantage, but in the recent past essentially no team has been able to do that.
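The 30-25 bet comes down to simple arithmetic. If scoring chances are shots multiplied by the share of shots that are chances (a simplifying assumption; the "chance rate" here is a hypothetical per-team number, not something the post measures), then the outshot team has to convert a noticeably larger share of its shots into chances just to break even. A small sketch:

```python
def chance_rate_ratio_needed(shots_for, shots_against):
    """Minimum ratio of our chances-per-shot to the opponent's needed just
    to tie the scoring-chance count.

    Assumes chances = shots * chance_rate, so a tie requires
    shots_for * rate_for == shots_against * rate_against.
    """
    if shots_for <= 0:
        raise ValueError("shots_for must be positive")
    return shots_against / shots_for


def chance_differential(shots_for, rate_for, shots_against, rate_against):
    """Expected scoring-chance differential from shot volume and chance rate."""
    return shots_for * rate_for - shots_against * rate_against
```

With 25 shots against 30, `chance_rate_ratio_needed(25, 30)` is 1.2: the outshot team needs 20% more of its shots to be scoring chances just to pull even. If both teams shared an arbitrary illustrative rate of 0.4 chances per shot, `chance_differential(25, 0.4, 30, 0.4)` would come out to about -2.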

So, what to make of individual Corsi numbers? That's a different, but equally important, question. Most reasonable #fancystats advocates would suggest you read a player's individual Corsi numbers as a record of what took place while he was on the ice, not as a standalone evaluation of his play. Especially in single-game samples, it's very possible for a number of events to take place that are outside of your control, given the nine other skaters on the ice with you. So I think it's fair to say something like "the Sens sure got whooped while Neil was on the ice tonight, -10 Corsi differential," but it's probably unfair to say "Chris Neil sure sucked tonight, -10 Corsi differential" unless you have a whole lot more context to add to that statement. That being said, an individual player's Corsi number tells us a lot about the player very quickly. If the objective of the game is to win by outscoring the other team, and you accept that you are more likely to score more goals by generating more shots and scoring chances, then it's important that players post a positive shot attempt differential over the medium-to-long term. One should absolutely contextualize the numbers as much as possible with things like quality of teammates, quality of competition, zone deployment, etc., but I see that context as adding richness to the analysis rather than discrediting Corsi as a number.
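The "record of what took place while he was on the ice" reading is easy to see in code. This minimal sketch assumes a hypothetical event format in which each shot attempt records the team that took it and the set of players on the ice; the player is credited with every attempt either way, no matter who actually shot.

```python
from dataclasses import dataclass


@dataclass
class AttemptEvent:
    shooting_team: str   # team credited with the shot attempt
    on_ice: frozenset    # names of players on the ice (both teams)


def on_ice_corsi(events, player, team):
    """Return (CF, CA, differential, CF%) for attempts while `player` was on ice.

    Every attempt counts while the player is on the ice, regardless of who
    shot it, which is why this describes the shifts rather than rating the
    player in isolation. CF% is None if no attempts occurred.
    """
    cf = ca = 0
    for ev in events:
        if player not in ev.on_ice:
            continue
        if ev.shooting_team == team:
            cf += 1
        else:
            ca += 1
    total = cf + ca
    cf_pct = 100.0 * cf / total if total else None
    return cf, ca, cf - ca, cf_pct
```

Adding context such as teammates, competition and zone starts means enriching these same events with more fields, not replacing the count.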

This isn't to say there isn't a great deal left to learn, because we are truly only scratching the surface. My preferred analogy is a pyramid, with each more granular metric building off what already exists. At the top of the pyramid are wins, which are fueled by goals, and goals are fueled by shot attempts. We know shot attempts are important, but, truthfully, we still don't know for certain what the biggest factors are in creating a shot attempt differential. There has been some early, very interesting work on zone entries which might suggest that gaining the line with control is a big part of driving a positive shot differential. To me, the next frontier is figuring out the "why" of Corsi, and micro stats like these could very well be the way to do it. But at the end of the day, there are really only two ways to score more goals than the other team: generate higher-quality chances or generate more chances. That's it. In the NHL in 2014, the relevant metric is volume, and we should measure teams and players on their ability to generate that volume. Corsi does just that.
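The "only two ways" claim can be written as an exact split of expected goal differential. The sketch assumes goals are approximated as attempts times a goals-per-attempt rate; the rates are hypothetical inputs rather than measured values. It uses the identity a·r − b·s = (a − b)·s + a·(r − s), where a and b are attempts for and against and r and s are the two conversion rates.

```python
def expected_goals(attempts, goals_per_attempt):
    """Approximate goals as shot volume times a conversion rate."""
    return attempts * goals_per_attempt


def goal_diff_split(att_for, rate_for, att_against, rate_against):
    """Split expected goal differential into (volume, quality) terms.

    volume:  edge from taking more attempts, at the opponent's rate
    quality: edge from converting our own attempts at a higher rate
    The two terms always sum to
    expected_goals(att_for, rate_for) - expected_goals(att_against, rate_against).
    """
    volume = (att_for - att_against) * rate_against
    quality = att_for * (rate_for - rate_against)
    return volume, quality
```

When the two rates are nearly equal, as the parity argument suggests, the quality term shrinks toward zero and the volume term carries almost all of the differential: `goal_diff_split(60, 0.05, 50, 0.05)` puts about 0.5 expected goals in the volume term and none in the quality term.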