I was checking on the NHL standings this morning. Now, the Blues weren't expected to have a great year, so when they started 4-1-0 everyone got excited. Now we're 6-8-2 and only 1 pt ahead of Florida for the worst record in the league.
Anyway, as I was looking at the standings I got curious about overtime losses and how frequently they occur.
Background: in the NHL, tied games used to go to a five-minute overtime. If anyone scored, they instantly won the game. Otherwise, it ended in a tie. Since a win got you 2 pts, a loss 0 pts, and a tie 1 pt, both teams would essentially hedge their bets and more or less run out the clock to keep their hard-earned point. Among its many initiatives to make the game more interesting, the NHL now gives 2 pts to a team that wins in OT but still gives 1 pt to a team that loses in OT, to encourage actually "trying" to win, and has also added a post-OT shootout so that no game can end in a tie.
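To make the points system concrete, here is a minimal sketch of the arithmetic. The function names are mine, and I'm assuming records are written W-L-OTL (or W-L-T under the old rules), as in the standings.

```python
def points_old_rules(wins: int, losses: int, ties: int) -> int:
    """Old system: 2 pts per win, 1 pt per tie, 0 pts per loss."""
    return 2 * wins + 1 * ties + 0 * losses


def points_current_rules(wins: int, losses: int, ot_losses: int) -> int:
    """Current system: 2 pts per win (however it was won),
    1 pt per OT loss, 0 pts per regulation loss."""
    return 2 * wins + 1 * ot_losses + 0 * losses


# The Blues' 6-8-2 record from the top of the post.
blues_points = points_current_rules(wins=6, losses=8, ot_losses=2)
print(blues_points)
```

The two formulas look identical; the difference is that the third column used to be shared by both teams in a tied game, and now only the OT loser gets it while the winner gets a full W.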
So I was looking at the standings, wondering if anything made some teams more likely to rack up overtime losses (OTLs) than others, and if I could find it without looking very hard for data. I thought it was reasonable that teams with about the same number of wins as losses would simply go to OT more often, and thus lose in OT more often. So I took all the data from the three years this rule has been in effect and regressed OTL against the absolute value of (W-L).
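For anyone who wants to try this at home, here is a rough sketch of the setup in plain Python. It is not the exact tool I used; `fit_line`, `closeness_predictor`, and the sample records are made up for illustration, and it reports only the slope, intercept, and R-sq (a p-value would need a stats package).

```python
from math import fsum


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def closeness_predictor(records):
    """records is a list of (W, L, OTL) tuples; returns |W - L| per team."""
    return [abs(w - l) for w, l, _otl in records]


# Made-up records, NOT real standings: one (W, L, OTL) tuple per team-season.
sample_records = [(50, 25, 7), (41, 33, 8), (30, 42, 10), (38, 35, 9)]

xs = closeness_predictor(sample_records)
ys = [otl for _w, _l, otl in sample_records]
slope, intercept, r_squared = fit_line(xs, ys)
print(slope, intercept, r_squared)
```

Dropping the `abs` in `closeness_predictor` gives the signed W-L version of the same regression.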

Nothing says it more clearly than this picture. There is no correlation. The p-value is 0.954. R-sq, the measure of how much of the variability in one variable is explained by the other, is 0.00003. So teams around .500 have the same number of OTLs as teams at the bottom or top of the standings.
Now here's where I got perplexed. Just for kicks I ran the same regression, just without the absolute value. So now the question is whether better teams have more OTLs or fewer OTLs.

Clearly, there is a relationship here. It turns out that the p-value is 0.04, so it is statistically significant, though the r-sq is still only 0.05, so W-L explains very little of the variation in OTLs. The equation is OTL = 9.65 - 0.04*(W-L). So better teams have fewer OTLs.
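To get a feel for how small that slope is, here is a quick sketch that just plugs numbers into the fitted equation. The function name is mine; the coefficients are the ones quoted above.

```python
def predicted_otl(win_loss_diff: float) -> float:
    """Fitted line from the post: OTL = 9.65 - 0.04 * (W - L)."""
    return 9.65 - 0.04 * win_loss_diff


# A .500 team versus a team 25 games over .500.
for diff in (0, 25):
    print(diff, round(predicted_otl(diff), 2))
```

A 25-game swing in W-L moves the predicted OTL count by about one game.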
Unfortunately, it took me until I was preparing this last graph to realize that of course they do! Because W means wins but L means regulation losses. Or put another way, "OTWs" are included in W.
So say every team played 100 games with 45 regulation wins, 45 regulation losses, and 10 going to OT. Half the teams go 7-3 for the year in OT and half the teams go 3-7. So half the teams have a record of 52-45-3 and half the teams have a record of 48-45-7.
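So, plotting OTL against (W-L), the two data points are (7,3) and (3,7), and the line through them has a slope of -1. In other words, even when I cooked the data so that every team was equally good in regulation, the regression still says "better" teams have fewer OTLs, so my model can't be used.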
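Here is the cooked example worked through as a small sketch. The constant and function names are mine; the numbers are the ones from the thought experiment above (45 regulation wins, 45 regulation losses, 10 OT games).

```python
REG_WINS, REG_LOSSES, OT_GAMES = 45, 45, 10


def season_record(ot_wins: int):
    """Return (W, L, OTL) for a team that wins ot_wins of its 10 OT games.
    OT wins are folded into W, exactly as the standings do."""
    ot_losses = OT_GAMES - ot_wins
    return REG_WINS + ot_wins, REG_LOSSES, ot_losses


def data_point(ot_wins: int):
    """Return the (W - L, OTL) pair the regression would see."""
    wins, losses, ot_losses = season_record(ot_wins)
    return wins - losses, ot_losses


lucky = data_point(7)    # team that went 7-3 in OT
unlucky = data_point(3)  # team that went 3-7 in OT

# Slope of the line through the two points.
slope = (unlucky[1] - lucky[1]) / (unlucky[0] - lucky[0])
print(lucky, unlucky, slope)
```

In this setup W-L is just the number of OT wins, and OTL is 10 minus that, so W-L and OTL always add up to 10. The negative slope is baked into the bookkeeping and says nothing about team quality.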
So I screwed up and put a confounding variable in my model. Oh well. If I figure out an easy fix I'll update, but at this point I'm not scrapping the whole post.
re: that article: I hate women. Except you, Carly. And Tina Fey.