So why include them at all? I think PWR and KRACH are really good places to start when evaluating teams. In my mind there are three important factors when judging a team: their record (i.e. whether they won or lost), the context of those wins and losses within strength of schedule, and how they won or lost. KRACH and PWR address the first two (I think, more on that later), but not the third.

(Also, it’s important to remember that every single formula, metric, and algorithm will contain bias because at the end of the line there is still a human being creating them who is biased in terms of which inputs they think matter and which formulae are useful in manipulating the data.)

But to the title…

1. No one has ever given me an answer to what I think is a very important question about PWR and KRACH.

We know from the pandemic, and from the early part of any season, that PWR and KRACH need two things to be accurate: a large enough sample size, and enough out of conference play for the ratings of teams from different conferences to be meaningful. Early in the season, and during the 2020-2021 season, which had one (I think) out of conference series, KRACH and PWR were just noise.

Which raises the question: how large a sample of out of conference games is necessary to have a high degree of confidence in KRACH’s or PWR’s accuracy? For a given team, for a given conference, and across the body of college hockey as a whole?

And as a follow up, how close are the teams that play fewer out of conference games, whether due to more conference games (the WCHA, Hockey East), starting late (the Ivy League schools), or simply not wanting to (NEWHA), to the point where their PWR/KRACH standing becomes less accurate?

This is especially important to parse for the WCHA because 1. they usually have the top teams, and 2. their schedules are even more insular because most of them play the same out of conference opponent, Lindenwood.

If a team already has a small OOC schedule and they cancel a series, or they simply don’t schedule as many games, where is the point where we start to lose trust in their PWR standing? THIS METRIC DECIDES WHO GOES TO THE NCAA TOURNAMENT, WHY DOES NO ONE WANT TO ANSWER THIS??

There is probably more of a sliding scale than a specific point where the rankings go from useful to not useful. If teams play a million out of conference games, we are very confident in PWR or KRACH ranking teams against others out of conference. If teams play 20 out of conference games total across all of college hockey, where are we? 60% confident? 80%? If PWR is going to decide who makes the NCAA tournament, there should be an answer to this question.
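One very rough way to put numbers on that sliding scale is the ordinary margin of error on a win percentage. This is only a sketch under loud assumptions: it treats every out of conference game as an independent coin flip with the same win probability, and it is not how PWR or KRACH measure their own uncertainty (as far as I know, they don't). The function name `win_pct_margin` and its parameters are made up for illustration.

```python
import math

def win_pct_margin(n_games, p=0.5, z=1.96):
    """Approximate 95% margin of error on a win percentage over n_games.

    Assumes independent games with the same win probability p, which is
    a simplification and not part of PWR or KRACH.
    """
    return z * math.sqrt(p * (1 - p) / n_games)

# How much could one conference's record against another be off by chance?
for n in (20, 50, 100, 200):
    print(f"{n:>3} OOC games: +/- {win_pct_margin(n):.3f}")
```

Under those assumptions, 20 games leaves a margin of roughly 22 percentage points on the win percentage, and it takes about 100 games to get that down to around 10. It doesn't answer the question for PWR itself, but it gives a sense of how slowly confidence improves as games are added.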

2. Post and St. Michael’s this Season

Post and St. Michael’s have split with one another, and St. Michael’s has a draw against Franklin Pierce, making Post 1-8-0 and St. Michael’s 1-8-1. Yet PWR and KRACH both rate Post above St. Michael’s, meaning that Post is being rewarded for losing simply because they have had better teams on the schedule. Remember, PWR and KRACH only care about wins and losses. Post could have lost all 8 games 10-0 and St. Michael’s could have lost theirs 1-0, and Post would still be higher.

So PWR and KRACH can be manipulated by schedule.
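To see the schedule effect in isolation, here is a minimal sketch of a KRACH-style rating, assuming the usual description of KRACH as a Bradley-Terry fit where each team's rating satisfies rating = wins / sum(games vs opponent / (rating + opponent rating)). The league below is entirely made up (teams "A", "B", "Strong", "Weak"), the function name `krach_like` is hypothetical, and the scaling to an average of 100 is just for readability, not the official KRACH normalization.

```python
from collections import defaultdict

def krach_like(results, iterations=1000):
    """Fit Bradley-Terry ratings from (team_a, team_b, points_a, points_b).

    A win is entered as 1 and 0, a tie as 0.5 and 0.5. Scores never
    enter the calculation, only who won.
    """
    points = defaultdict(float)                      # win "points" per team
    played = defaultdict(lambda: defaultdict(int))   # games between each pair
    for a, b, pa, pb in results:
        points[a] += pa
        points[b] += pb
        played[a][b] += 1
        played[b][a] += 1

    ratings = {team: 1.0 for team in played}
    for _ in range(iterations):
        new = {}
        for team in played:
            denom = sum(n / (ratings[team] + ratings[opp])
                        for opp, n in played[team].items())
            new[team] = points[team] / denom
        # Rescale so the average rating is 100 (arbitrary choice).
        scale = 100 * len(new) / sum(new.values())
        ratings = {team: r * scale for team, r in new.items()}
    return ratings

# Hypothetical league: A and B split, and both finish 2-4.
# A's other games are against Strong, B's are against Weak.
results = (
    [("A", "B", 1, 0), ("A", "B", 0, 1)]
    + [("A", "Strong", 1, 0)] + [("A", "Strong", 0, 1)] * 3
    + [("B", "Weak", 1, 0)] + [("B", "Weak", 0, 1)] * 3
    + [("Strong", "Weak", 1, 0)] * 3 + [("Strong", "Weak", 0, 1)]
)
ratings = krach_like(results)
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team:<6} {rating:7.1f}")
```

In this made-up league A and B have identical records and split head to head, yet A comes out ahead of B purely because its losses came against the stronger opponent. One caveat: this kind of fit breaks down for a team with no wins or ties at all, since its rating collapses to zero.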

This one is admittedly less of an issue; we’re talking about #41 and #42 here. If this ever decided which teams made the NCAA tournament, it would be between teams that had still won a bunch of games and put together a decent enough resume. And this will presumably even out once the full conference schedule is complete.

What is the answer then?

I have long been a critic of deciding teams’ relative strengths solely by wins and losses. That said, the more games that are played, the more I trust wins and losses (contextualized within strength of schedule) to accurately rate teams. PWR also takes into account head to head matchups between teams, which I think is also important.

After that, I think there needs to be a clear explanation of how much out of conference play is needed for PWR to be highly accurate. If that can’t be done, or if the number is such that a lot of teams are near the threshold season to season, I think there needs to be a goal differential metric (also contextualized within strength of schedule, essentially GRaNT).

This mirrors how teams are ranked within their conferences and which tiebreakers are used in the event of ties.

  1. Wins and losses (teams usually play the same conference schedule aside from not being able to play themselves, so no need to contextualize)
  2. Head to head wins or losses (already accounted for by PWR)
  3. Goal differential

Alternatively, many conferences use record against some number of the top teams before or instead of goal differential. I think this is not as good because then you have to decide how many top teams to include. Is it 10? 11? 15? You also have to account for teams playing different numbers of games against that group. Taking the PWR or KRACH formula and applying it to goals for and against instead of wins and losses seems simpler, less biased, and more accurate.
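As a sketch of what that could look like, the snippet below runs the same Bradley-Terry style fit twice on a tiny made-up schedule: once crediting wins, once crediting goals. This is one possible reading of "apply the formula to goals" and is not necessarily how GRaNT works; the teams "X", "Y", "Z", the scores, and the function names are all hypothetical.

```python
from collections import defaultdict

def rate(games, iterations=500):
    """Bradley-Terry style fit on (team_a, team_b, credit_a, credit_b).

    Credits can be wins (1 and 0) or goals scored by each side. Assumes
    every team earns some credit; an all-zero schedule would fail here.
    """
    credit = defaultdict(float)
    volume = defaultdict(lambda: defaultdict(float))  # total credit per pairing
    for a, b, ca, cb in games:
        credit[a] += ca
        credit[b] += cb
        volume[a][b] += ca + cb
        volume[b][a] += ca + cb

    ratings = {team: 1.0 for team in volume}
    for _ in range(iterations):
        new = {
            team: credit[team] / sum(v / (ratings[team] + ratings[opp])
                                     for opp, v in volume[team].items())
            for team in volume
        }
        scale = 100 * len(new) / sum(new.values())  # average of 100, arbitrary
        ratings = {team: r * scale for team, r in new.items()}
    return ratings

def win_credit(goals_a, goals_b):
    """Turn a score into win credits: 1/0 for a win, 0.5 each for a tie."""
    if goals_a == goals_b:
        return 0.5, 0.5
    return (1.0, 0.0) if goals_a > goals_b else (0.0, 1.0)

# Hypothetical three-team cycle: everyone finishes 1-1.
scores = [("X", "Y", 5, 0), ("Y", "Z", 2, 1), ("Z", "X", 2, 1)]

by_wins = rate([(a, b, *win_credit(ga, gb)) for a, b, ga, gb in scores])
by_goals = rate(scores)
print("wins: ", {t: round(r, 1) for t, r in by_wins.items()})
print("goals:", {t: round(r, 1) for t, r in by_goals.items()})
```

On wins alone the three teams are indistinguishable, since each went 1-1 in a perfect cycle. On goals, X separates from Z and Y because of the 5-0 result, which is exactly the "how they won or lost" information that wins and losses throw away.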