The Massey Ratings and Women’s Football

For the past several years, the Women’s Football Alliance (WFA) has relied on Massey Ratings for guidance in determining playoff berths and the playoff seedings that decide home field advantage. To help clarify an often unclear process, I have taken it upon myself the past two seasons to provide weekly Massey updates, so that women’s football players and fans can see where their favorite team fits in the playoff picture at any given time. Because I provide these weekly updates, I often receive questions (and complaints) about the Massey Ratings.

Not surprisingly, every time a team doesn’t get the Massey ranking its fans feel it deserves, a few outspoken critics come out of the woodwork to blast the system. It never fails. I’m not opposed to such criticisms in principle, but what I do object to is that most of the time, these complaints are rooted in ignorance and a lack of understanding of what the Massey Ratings do (and do not) represent. Surprisingly, while I have commented extensively about the Massey Ratings on social media, after looking through my files I don’t think I have ever tackled the topic in an article before.

So here it is…your guide to the Massey Ratings in women’s football. To be clear, I am a huge, huge proponent of the Massey Ratings and their use to help seed women’s football teams in a postseason format. After reading this article, I think you’ll better understand why.

An Introduction to the Massey Ratings

The Massey Ratings were developed by Kenneth Massey, a sports statistician who graduated from Virginia Tech. Massey applies his rating system to a very large number of sports leagues, men’s and women’s, college and pro. On his site, click the “Women” tab in the drop-down menu and see just how many women’s leagues this guy covers. It’s pretty astounding, actually.

Massey Ratings have been applied to women’s football since 2003. You can see his final annual ratings for the WFA here and for the IWFL here. (And, just for kicks, the NWFA here and the WPFL here.) Just click on any year to see his full end-of-year ratings for that league that season.

The WFA is the only league that currently utilizes these Massey Ratings for playoff purposes. For one thing, you have to have a certain number of teams and data points (games) for Massey to be able to run its calculations, and I don’t believe the WSFL/USWFL has ever had enough teams and interactions to be listed by Massey.

The IWFL has steadfastly refused to use Massey for years now, although it actually used Massey to guide its playoff seedings many years ago. Whenever Massey comes up, someone inevitably brings up a long-since-debunked rumor that the Supergroup left the IWFL in 2010 over the league’s use of the Massey Ratings. Believe me, the Supergroup had many reasons to leave the IWFL (hello, Kezia!), but the IWFL’s use of the Massey Ratings wasn’t a big factor…or they wouldn’t have made sure that the new league they left for uses them now as well.

Anyway, when you peruse these annual ratings, the first thing that should jump out at you is how remarkably accurate they are. This isn’t a fly-by-night system…Massey has been putting out rankings on women’s football for years, and his track record speaks for itself.

There are others who have tried to put together statistical women’s football ranking systems (*cough* Zermelo *cough*) that have yielded woefully, stunningly bad results over the years. In many cases, the people compiling them are, with all due respect, women’s football fans and not statisticians. Many women’s football fans still publicize and promote horribly flawed ranking systems, which can give all statistical ranking systems a bad name. (Here’s a tip: if your football rankings are considered “EZ”, they might be too easy. And hosting them on a free Wix website isn’t a good sign, either.)

Massey, on the other hand, is highly regarded in this field. The best part about Massey is that I (and you) can see the final results his system has produced over years and years. Other ranking systems like Zermelo are so bad, and so rightly embarrassed by the results they have produced over the years (like once rating the Pittsburgh Force ahead of the Chicago Force), that they don’t host historical results on their site for all to see…the bad outcomes get hidden and washed away ASAP. There will always be disagreement with any ranking system, but Massey seems to get it about as “right”, year after year, as any ranking can, and it stands behind its results by keeping them accessible years after the season is over.

When people don’t agree with the Massey Ratings, they usually come after me because I am such an ardent supporter of their use, conceptually, by women’s football leagues. Over the long run, Massey has proven that it gets it right almost all of the time. Again, you can look at the historical rankings of WFA, IWFL, NWFA, and WPFL teams above for yourself, and if you do, you’ll quickly realize that Massey is very, very good at compiling rankings that accurately reflect the competitiveness of league teams in a given year. That’s why, of all the ranking systems out there, Massey is the one I highly recommend.

But just because I support the Massey Ratings doesn’t mean I agree every single time with where every single team is placed. When you think about it, unless I were subjectively ranking the teams myself every single week, I would have a few disagreements with any external ranking system. That’s the beauty and subjective nature of sports.

What I really love about the Massey Ratings is that even on the few occasions when I don’t agree with its rankings, I can always understand why it’s rating teams the way it is. It’s like having a rational, adult debate with someone: you don’t agree with them, but you see their point. Once you understand what Massey takes into consideration in rating teams (and, just as importantly, what it doesn’t), you will reach a place of enlightenment in which you can disagree with a particular Massey rating yet still understand its reasoning and see the solid basis behind it.

Winning Isn’t Everything…and It’s Not The Only Thing

How does Massey work? Well, I was a statistics major in college, so I’m fortunate enough to be able to keep up with more of this than most. His system is proprietary, however, so no one but Ken Massey knows exactly how it works. But he gives you enough of a taste to get the flavor of it, anyway. Here’s a primer on Massey’s methodology.

For those of you who aren’t interested in such complexities, however, let me provide a Cliffs Notes version. Massey ranks teams primarily using five criteria: result (win or loss), strength of schedule, margin of victory, location of game, and date of game, with the first three being most important.

Winning is the most important thing, obviously…the more you win, the higher you will rank. A one-point win is much, much better than a one-point loss.

However, and this is the first thing that usually throws people, winning and losing isn’t the sole criterion. After all, if it were, there would be no need for the Massey Ratings at all…just rank teams based on their straight records and be done with it. And yet, every single year, someone is outraged when a 1-3 team is ranked above a 3-1 team in Massey. Seriously, every single year. It never fails.

What makes Massey a much better system is that it takes into account strength of schedule to rank teams. As the Massey site explains it: “[Massey’s] goal is to account for the differences in schedule. When there is a large disparity in schedule strength, win-loss records lose their significance…It is necessary to achieve a reasonable balance between rewarding teams for wins, convincing wins, and playing a tough schedule. This issue is difficult to resolve, and rating systems exist that are based on each of the extremes.”

Read that statement about win-loss records losing their significance again, because they do, and disparities in schedule strength are wide in women’s football. Look, we all know teams in women’s football with inflated win/loss records, because they played a cake schedule (either due to geographic factors or due to lobbying for such opponents). And we similarly know teams that play really hard schedules whose win/loss records aren’t reflective of how truly competitive they are. This is a huge, huge problem in women’s football, and it’s the reason why using straight records to determine playoff berths and seedings is a very bad idea, especially in larger leagues.
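To make the schedule-strength idea concrete, here is a minimal sketch of the least-squares approach Massey described in his early, publicly available work (often called “Massey’s method” in the ranking literature). His current system is proprietary and also dampens blowouts, weights recent games, and so on, so treat this as an illustration of the principle rather than his actual formula. The league, the team names (S1, Tough, Cake, W1, and so on), and the scores are all hypothetical, chosen so that every margin is perfectly consistent and the answer is easy to check by hand.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's lists are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares_ratings(teams, games):
    """Ratings such that, for every game, r[a] - r[b] is as close as possible
    (in the least-squares sense) to a's scoring margin over b.

    games: list of (team_a, team_b, points_a, points_b).
    """
    n = len(teams)
    idx = {t: i for i, t in enumerate(teams)}
    # Normal equations M r = p: games played on the diagonal, minus the
    # number of meetings off the diagonal; net point differential in p.
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for a, b, pa, pb in games:
        i, j = idx[a], idx[b]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += pa - pb
        p[j] += pb - pa
    # M is singular (adding a constant to every rating changes nothing),
    # so swap the last equation for "ratings sum to zero".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


def record(team, games):
    """Straight win-loss record, which is all a 'straight records' system sees."""
    wins = losses = 0
    for a, b, pa, pb in games:
        if team in (a, b):
            mine, theirs = (pa, pb) if team == a else (pb, pa)
            wins += mine > theirs
            losses += mine < theirs
    return wins, losses


# Hypothetical league: "Tough" plays three strong teams, "Cake" plays three weak ones.
TEAMS = ["S1", "S2", "S3", "Tough", "Cake", "W1", "W2", "W3"]
GAMES = [
    ("S1", "Tough", 35, 20), ("S2", "Tough", 30, 20), ("S3", "Tough", 25, 20),
    ("Tough", "W1", 21, 6),
    ("Cake", "W1", 20, 10), ("Cake", "W2", 22, 7), ("Cake", "W3", 25, 0),
    ("S3", "Cake", 24, 14),
]

ratings = least_squares_ratings(TEAMS, GAMES)
for team in sorted(TEAMS, key=ratings.get, reverse=True):
    w, l = record(team, GAMES)
    print(f"{team:6s} {w}-{l}  rating {ratings[team]:+.1f}")
```

In this toy league, Tough (1-3) rates about five points above Cake (3-1): Tough’s losses all came against teams rated above it, while Cake’s wins all came against teams rated below it. That is exactly the “1-3 team ranked above a 3-1 team” situation that draws outrage every year.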

The Ramifications of Relying on Straight Records

Imagine for a moment if straight win/loss records were used to determine playoff berths and seedings in a league like the WFA. A WFA Tier I team such as Pittsburgh would have no incentive whatsoever to play a fellow Tier I team; they’d simply load up their schedule with Tier II and Tier III teams, destroy them, and earn a perfect 8-0 record.

The D.C. Divas and Boston Renegades have a fierce home-and-home rivalry that dates back to 2010. But if, under our hypothetical scenario, Pittsburgh were virtually guaranteed to cruise to an 8-0 record every year under a system where only straight record matters for playoff purposes, why would either Boston or D.C. want to continue that rivalry? If the two teams split, they’d both wind up with a 7-1 record at best and below Pittsburgh in the standings…all because they dared to play each other twice rather than go with a less-challenging schedule.

Long story short, D.C. and Boston would refuse to play each other to give themselves a fair shot at the same 8-0 record as Pittsburgh and would follow their lead in loading up on nothing but Tier II and Tier III opposition. Chicago would follow suit, and very quickly, Chicago, D.C., Boston, and Pittsburgh would commit to never playing each other in the regular season ever again.

(And yes, I know that theoretically the WFA as a league sets every team’s schedule of opponents for them. But if you don’t think teams can pressure the league into giving them a favorable schedule, particularly four Blue Blood, Tier I teams, you’re living in a fantasy land. No Tier I team would willingly play a more difficult schedule than another Tier I team if playoff berths and seedings were determined by straight records. That’s the point.)

Now, under the above hypothetical scenario…how damaging is that for women’s football? D.C.-Chicago, D.C.-Boston, D.C.-Pittsburgh, Chicago-Boston, Pittsburgh-Boston…these are the games that fans want to see, the games that truly promote our sport, the games that we can show to outsiders and skeptics of women’s football and say, “Look, the level of competition in this sport is legit.” Why would we want to incentivize teams to not play those games? Why? That’s pure madness.

The bottom line is that using straight records to determine playoff berths and seedings incentivizes teams to play a weak schedule and avoid games against good teams, and that’s counter-productive to the sport as a whole. As a sport, we should be doing the exact opposite: rewarding teams and providing incentives for them to play tough schedules against tough teams, giving them credit for stepping up to a challenge and convincing high-level teams that they should be regularly playing each other.

The Massey Ratings provide exactly that incentive. Rather than simply ranking teams by straight record – which would be “EZ” but misleading – Massey takes strength of schedule into consideration when ranking teams. It rewards teams that play an aggressive schedule of outstanding opponents, and it conversely punishes those that do not.

To be fair, straight records actually work fine for smaller women’s football leagues like the USWFL. In those leagues, there is usually plenty of direct interaction between all member teams in any given geographic region, so strength of schedule doesn’t vary that much between teams. Furthermore, Massey needs a certain number of teams and games to create worthwhile ratings, so many smaller leagues often don’t have enough data to be listed in Massey anyway.

Ironically enough, this now also applies to the IWFL, which has only 15 viable teams competing in its 2017 season. For years the IWFL relied on straight records to determine playoff berths and seedings, even when it had far too many teams to be using that system. The IWFL’s refusal to use an impartial source to guide such decisions caused annual nightmares and an atmosphere where things were decided subjectively at the whim of the league’s leaders. Frederick and Disney abused that power and developed a reputation for subjectively rewarding some teams for loyalty and punishing others for disloyalty. It became outrageous, and they refused for years on end to listen to reason or to adhere to an impartial system for making playoff selections.

The end result is that teams have fled the IWFL in droves. In fact, the IWFL is so small this season that Massey doesn’t have enough data to create rankings for the league yet and may not be able to do so at all this season, which would be the first time in 15 years that the IWFL is too small to be tracked by Massey. It appears that after years of refusing to use any impartial source to guide their playoff berths and seeds, the IWFL has now shrunk to a size that fits their station. Funny how that works.

Anyway, here is a compelling link that outlines some of Massey’s considerations when calculating a team’s strength of schedule. I only include it because it’s a very interesting read, especially for those WFA teams who are wondering why their strength of schedule is what it is.

Basically, “strength of schedule” has a different definition for every team, based on how good that team is. A good team is deemed to have a hard schedule if it must play other good teams, while a bad team is considered to have a hard schedule if it does not play any other bad teams. Games between equally matched teams are more influential to a team’s overall rating. For example, the #1 team’s schedule is considered harder if it plays #2 and #40 than if it plays #20 and #21, because it’s much more likely to take a loss against the former schedule than the latter.
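One way to see why “strength of schedule” depends on who is asking is to measure a schedule by how many losses a team of a given rating should expect against it. The sketch below assumes a hypothetical logistic win-probability curve (with a scale of 5 rating points) and hypothetical ratings in which rank #k is rated 100 - k. Neither comes from Massey, but together they reproduce both examples in the paragraph above.

```python
import math


def win_prob(r_team, r_opp, scale=5.0):
    """Hypothetical logistic model: chance a team rated r_team beats r_opp."""
    return 1.0 / (1.0 + math.exp(-(r_team - r_opp) / scale))


def expected_losses(r_team, opponent_ratings, scale=5.0):
    """Schedule difficulty measured *for this team*: the losses it should expect."""
    return sum(1.0 - win_prob(r_team, r, scale) for r in opponent_ratings)


def rating_of(rank):
    """Hypothetical ratings: #1 is 99, #2 is 98, ..., #40 is 60."""
    return 100 - rank


top = rating_of(1)
vs_2_and_40 = expected_losses(top, [rating_of(2), rating_of(40)])
vs_20_and_21 = expected_losses(top, [rating_of(20), rating_of(21)])
print(round(vs_2_and_40, 2), round(vs_20_and_21, 2))  # 0.45 0.04

# For a weak team (#60), avoiding other bad teams is what makes a schedule hard.
weak = rating_of(60)
print(expected_losses(weak, [rating_of(58), rating_of(62)]))  # about 1 loss
print(expected_losses(weak, [rating_of(20), rating_of(40)]))  # close to 2 losses
```

Under these assumptions the #1 team should expect roughly ten times as many losses against #2 and #40 as against #20 and #21, almost all of it from the single game against #2.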

Using the Massey Ratings to help determine playoff berths and seedings encourages teams to schedule tough opponents and give fans the matchups they want to see. Using straight records encourages “race to the bottom” scheduling, while a system like Massey incentivizes teams to “reach for the top” when scheduling opponents.

Other Massey Factors

The other big consideration in the Massey Ratings is margin of victory, but it’s important to understand what that entails. The most important detail is that margin of victory becomes exponentially less and less of a factor the larger the margin becomes. Winning by 15 points instead of five gives a team much more of a margin of victory bump than winning by 75 as opposed to 65.

The impact of margin of victory fades out very quickly after 30 points, and by the time you get to about 40 points, margin of victory is basically irrelevant. Winning by 40 and winning by 100 are treated roughly the same in Massey. And winning big over an inferior opponent is nearly disregarded, unlike winning big over a quality opponent.

Here’s how Massey explains it: “[Massey] does consider scoring margin, but its effect is diminished as the game becomes a blowout. The net effect is that there is no incentive to run up the score. However, a ‘comfortable’ margin (say ten points) is preferred to a narrow margin (say three points)…in summary, winning games against quality competition overshadows blowout scores against inferior opponents.”
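Massey doesn’t publish the exact curve he uses, but the diminishing-returns idea is easy to picture. Here is a hypothetical transform (an exponential approach to a ceiling of 40 “effective” points, with both constants invented for illustration) that behaves the way the paragraphs above describe: the jump from 5 to 15 points is worth a lot, while the jump from 65 to 75, or even from 40 to 100, is worth almost nothing.

```python
import math


def effective_margin(margin, ceiling=40.0, k=10.0):
    """Hypothetical diminishing-returns version of a raw scoring margin.

    Rises quickly for small margins and flattens out toward `ceiling`;
    losses are the mirror image of wins.
    """
    sign = 1.0 if margin >= 0 else -1.0
    return sign * ceiling * (1.0 - math.exp(-abs(margin) / k))


for raw in (3, 5, 10, 15, 30, 40, 65, 75, 100):
    print(f"win by {raw:3d} -> {effective_margin(raw):5.1f} effective points")
```

With these made-up constants, going from a 5-point to a 15-point win adds about 15 effective points, while going from 65 to 75 adds less than a tenth of one, so there is no reason to run up the score.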

There are some teams that incorrectly believe blowing out their opponents will help them in the Massey Ratings. By and large, that doesn’t work. Those teams are usually ranked highly anyway (because they’re good enough to blow other teams out), but Massey has done a good job historically of not rewarding them for that specifically.

The two other factors Massey takes into account, albeit as minor considerations, are the location and date of the game. Each team gets a “home advantage” rating based on how they perform at home as opposed to on the road, and teams that win, perform well, or schedule games in tough road environments get a small bump based on that game’s location. Finally, games late in the season count slightly more than games early in the season; every game counts, but teams playing well late in the year get a slight bump.
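Here is a rough sketch of how those two minor adjustments could work. Everything in it is an assumption for illustration: a single fixed home edge of 3 points (Massey actually estimates a home advantage for each team) and a late-season weight that grows linearly from 1.0 in Week 1 to 1.25 in the final week.

```python
def neutral_margin(margin, venue, home_edge=3.0):
    """Remove a hypothetical, fixed home-field edge from a scoring margin.

    venue is "home", "away", or "neutral" from this team's point of view.
    """
    if venue == "home":
        return margin - home_edge
    if venue == "away":
        return margin + home_edge
    return float(margin)


def recency_weight(week, season_weeks, late_bonus=0.25):
    """Hypothetical weight: Week 1 counts 1.0; the final week counts 1.0 + late_bonus."""
    if season_weeks <= 1:
        return 1.0
    return 1.0 + late_bonus * (week - 1) / (season_weeks - 1)


def weighted_performance(games, season_weeks):
    """Recency-weighted average of venue-neutral margins.

    games: list of (week, margin, venue) for one team.
    """
    weights = [recency_weight(wk, season_weeks) for wk, _, _ in games]
    margins = [neutral_margin(m, v) for _, m, v in games]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)


# The same 10-point win is worth a bit more on the road than at home...
print(neutral_margin(10, "away"), neutral_margin(10, "home"))  # 13.0 7.0
# ...and a late-season game carries a bit more weight than an early one.
print(recency_weight(1, 8), recency_weight(8, 8))  # 1.0 1.25
```

Both effects are deliberately small next to the result itself, which matches the point that location and date are much smaller considerations than record, schedule, and margin.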

Location and date of the game are both factors in the Massey calculation, but they are much smaller considerations than win/loss record, schedule strength, and margin of victory. Those are the three big ones.

The Past Matters…Until It Doesn’t

There is one other factor Massey uses to calculate its early rankings: results from previous seasons. But it’s a disappearing factor, and that needs some explanation.

When I first started following the Massey Ratings for women’s football teams, Massey wouldn’t even come out with its first ratings of the year until about halfway through the season. Even in 2016, Massey didn’t come out with its women’s football ratings until Week 3 of the WFA season. The reason was that until Week 3 of the season, Massey didn’t have enough data points to put together a coherent ranking of teams…it needed several games to be played before it could start evaluating the relative strengths of teams.

Beginning in 2017 (and thanks to the critical mass of teams in the WFA this year), Massey started rating teams from the very beginning of the season for the first time. Massey came out with 2017 preseason ratings, and it released the Massey Ratings beginning in Week 1 of the 2017 season. How, you may ask, did it do this?

Well, Massey compiled a preseason rating for the 2017 WFA season for every team, based on their past performance in previous seasons. As 2017 game results began to roll in, Massey started to give that “previous season performance” factor less and less weight and make its ratings more and more dependent upon current season results. After a few data points had been obtained for each team, the “previous season performance” consideration dropped out of the formula completely.

Here is how Massey explains this factor: “Preseason ratings are typically derived as a weighted average of previous years’ final ratings. As the current season progresses, their effect gets damped out completely. The only purpose preseason ratings serve is to provide a reasonable starting point for the computer. Mathematically, they guarantee a unique solution to the equations early in the season when not enough data is available yet.”
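The quote describes two steps: build a preseason rating as a weighted average of previous years’ final ratings, then let its influence fade to nothing as real games come in. Here is a minimal sketch, assuming hypothetical weights of 0.6/0.3/0.1 for the last three seasons and a linear fade that reaches zero after four games. Massey publishes neither number, and the quote suggests his system builds the prior into the rating equations themselves rather than blending it in afterward, so this is only an illustration of the shape of the idea.

```python
def preseason_rating(previous_finals, weights=(0.6, 0.3, 0.1)):
    """Weighted average of previous seasons' final ratings, most recent first."""
    pairs = list(zip(previous_finals, weights))  # uses as many seasons as exist
    if not pairs:
        return 0.0  # hypothetical choice: a brand-new team starts at league average
    total = sum(w for _, w in pairs)
    return sum(r * w for r, w in pairs) / total


def blended_rating(preseason, current, games_played, fade_games=4):
    """Prior weight falls linearly from 1 to 0 over `fade_games` games."""
    prior_weight = max(0.0, 1.0 - games_played / fade_games)
    return prior_weight * preseason + (1.0 - prior_weight) * current


prior = preseason_rating([12.0, 15.0, 9.0])  # hypothetical last three final ratings
for games in range(6):
    # Hypothetical rating of 4.0 based on this season's games alone.
    print(games, round(blended_rating(prior, 4.0, games), 2))
```

Once `games_played` reaches `fade_games`, the previous seasons contribute nothing, which is the “damped out completely” behavior described above.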

The upshot is that by this point in the season, what a team did in previous seasons has no bearing on their Massey Rating for this season. Early in the season, before a team has played a relevant number of games, Massey takes into consideration how they performed in recent years to figure out how to rank them. But by the time a team has played a few games, how they performed in prior seasons ceases to be a relevant factor in Massey’s calculations.

It’s a reasonable way for Massey to provide meaningful rankings early in the season, rather than simply waiting until about halfway through the season to release its first rankings, as it did in the past. But the key takeaway from this discussion is that by the time your team has played a few games this year, how you performed in seasons past ceases to matter to Massey. Every team is evaluated solely on this year’s merits.

This is important, because you often hear teams argue that because of their success in previous seasons, they should automatically be given the benefit of the doubt this year. Massey disagrees, and I disagree, too. Every year is a new year, and while most teams don’t change that much from year to year, every team has to prove itself all over again every season.

When ranking teams in 2017, you can’t point to your achievements in 2016 or 2015 or 2004…and I’m saying that as someone who works in the front office of the back-to-back WFA national champions in 2015 and 2016, so you know I mean it. Even my beloved D.C. Divas can’t expect an inherent advantage in 2017 just because we’re “defending” this or “reigning” that. A team needs to be evaluated on its own merits in terms of winning and losing and who they played this year…and if Massey thinks that the Cleveland Fusion have a better resume than the D.C. Divas so far this season, then so be it. No one is entitled to anything…you have to earn it.

What Doesn’t Matter To Massey?

Now that we’ve gone over what Massey does factor in, let’s quickly review a few things Massey does not factor in. First of all, as we just said, after a few games have been played in 2017 and a team has established a track record of achievement for the current season, their performances in previous seasons fly out the window. They cease to matter, as it should be.

Second, you’ll often see a team argue, “But we’re the most talented team in the league!” That may be true…but it’s also subjective. Massey is only interested in objective data and in how all that talent manifests itself in game results against good opponents. Even if it manifests itself in blowout scores over inferior opponents, Massey isn’t going to reward a team for that, because it wants to challenge you to challenge yourself with good opposition. Massey doesn’t care about injuries, roster changes, or even the weather…unless and until they show up in the game outcomes.

Third, the WFA now classifies teams into three “tiers” – Tier I, Tier II, and Tier III. While those designations matter with respect to the WFA’s postseason format, Massey doesn’t acknowledge them in any way. Massey knows that quite often a great Tier II team is better than an average or bad Tier I team, and that a great Tier III team is often better than an average to bad Tier II team. Massey completely ignores tiers and ranks WFA teams based on their merits, regardless of tier.

This is important because since the WFA debuted the tier system last year, you now hear some fans argue, “Our schedule is tough because we played X number of Tier I (or Tier II) teams!” It’s a faulty argument…you don’t get more credit for beating a bad Tier I team than you would for beating a good Tier II team, just because the former team was given an arbitrary “Tier I” label that has nothing to do with their on-field strength. It’s common for teams desperately trying to spin their strength of schedule favorably to boast about the number of bad Tier I teams they have beaten. But Massey’s going to rate a WFA team as strong or weak without any consideration to its tier designation.

Fourth, while it is often used for this purpose, Massey is not designed to predict what team is going to win in the future…its job is to tell you who has the best resume of achievement so far in the current season. Sometimes you’ll see a team ranked at spot #X and a team ranked one spot below them in spot #Y, and you’ll firmly believe that the team ranked #Y could beat the team ranked #X. And that may very well be true. But what Massey is arguing is that at this point in time, the team ranked #X has a better past resume of success this season than the team ranked in spot #Y. If you look at what each team has accomplished in a given season and take into consideration win/loss record, SOS, and margin of victory, you can usually see why Massey feels that way.

On that note, no matter what the ranking system is, there will be occasions where a lower-ranked team defeats a higher-ranked team. That’s inevitable, and when it happens, it doesn’t mean that Massey is flawed in any way. Just because one team has a better resume so far this season than another doesn’t mean it will automatically beat that team in the future. Maybe the higher-ranked team was overrated, and maybe the lower-ranked team was underrated, which can happen (especially early in the season) with any ranking system. Or maybe it was a genuine upset, where a weaker team just had a better day than a stronger one, which happens frequently in sports.

Regardless, every single year, a lower-ranked team will beat a higher-ranked team in Massey, and some critic will try to use that as proof that Massey is fundamentally flawed. Seriously, every single year. It never fails. But again, that will happen all the time with any ranking system, anywhere. (If it never happened, there’d be no need to play the games at all, right? Just give the win each week to the higher-rated team in your perfect ranking system.)

When a lower-ranked team in Massey beats a higher-ranked team, all that means is that a team with a heretofore better resume was beaten by a team with a heretofore worse one. In most cases (but not all), that win is enough to give the winning team a better resume than the losing team the following week, as Massey constantly adjusts its rankings as new data rolls in. Every week is a new week (just like every year is a new year), and past performance is not indicative of future results.

How SOS Dropped The Dallas Elite

I’m now going to roll this discussion right into an example of the most recent Massey “controversy”. In the latest ratings, the Dallas Elite were dropped to the #3 seed in the WFA’s Western Conference, which was met with outrage from some folks who feel that Dallas deserves to be ranked higher. Before I get started on that, however, I need to state something for the record.

There are some people out there who undoubtedly already think (or will think, after reading this) that I have some vendetta against the Dallas Elite. I know that any time I offer commentary of this nature, some people think I’m “attacking” their favorite organization. There are probably fans of every single women’s football team who feel I somehow don’t give their team enough respect.

Truth be told, I love Dallas. They’re an outstanding franchise and they have a remarkable team, year in and year out. I have had the good fortune to see them play twice in person, and trust me, they’re a dynamite team. They’re one of the true Blue Bloods of this sport, and I respect the heck out of them, their coaches, and their players. That’s the honest-to-goodness truth. I have a ton of respect for the Dallas Elite, and that’s true regardless of whether that respect is reciprocated back toward me or not.

As I mentioned, Dallas went from being seeded #1 in the Western Conference to #3 in the Western Conference this week, a move that left many in Dallas in an uproar. Massey’s rationale for docking Dallas in the rankings is astonishingly simple – it hates Dallas’ strength of schedule.

Going into last Saturday, Dallas had played three games against the Austin Outlaws, Houston Power, and Arlington Impact. Massey rated Arlington and Austin as middling, decent teams, but it considered Houston to be outright bad. So that meant that Dallas had played two middling teams and one bad one…in Massey’s estimation, of course. (I want to be clear that I’m not calling these teams bad, although there’s certainly reason to argue their performances thus far this year have been less than stellar. But again, whether I personally agree or disagree with Massey’s evaluation of Austin, Houston, and Arlington…I can understand its point.)

Well, last weekend, Dallas stomped the Houston Power, 70-0…and dropped two seeds in the Western Conference race. On the surface, I understand the frustration of the Dallas partisans, but let’s take a closer look.

With a second game against the Houston Power, Dallas’ 2017 schedule now features what Massey considers two middling teams and two bad teams. But it gets worse: the Austin Outlaws traveled to play the Arkansas Wildcats (another team Massey isn’t overly impressed by) and Austin was blanked, 26-0. As such, Massey changed its mind on Austin, a team it felt was middling last week, and now believes they are merely bad, too.

So coming into Saturday, Massey felt Dallas had played two middling teams (Arlington and Austin) and one bad one (Houston), but after Saturday, Massey felt Dallas had played one middling team (Arlington) and three bad ones (Austin and Houston twice). Dallas’ strength of schedule now ranks 47th among 65 WFA teams, the worst of any team ranked in the Top 20.
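The detail worth noticing is that strength of schedule is recomputed from opponents’ current ratings every week, so Austin’s loss to Arkansas hurt Dallas even though Dallas wasn’t on the field for it. A toy version with placeholder ratings (0 for “middling” and -10 for “bad”; these numbers are invented for illustration and are not Massey’s) shows both effects: adding a second game against a bad team, and re-rating a past opponent downward.

```python
from statistics import mean

# Placeholder values for illustration only, NOT Massey's actual ratings.
MIDDLING, BAD = 0.0, -10.0


def schedule_strength(opponents, current_ratings):
    """Average of opponents' current ratings, recomputed every week."""
    return mean(current_ratings[team] for team in opponents)


before = schedule_strength(
    ["Austin", "Houston", "Arlington"],
    {"Arlington": MIDDLING, "Austin": MIDDLING, "Houston": BAD},
)
# Second game vs. Houston, with Austin still rated middling:
houston_again = schedule_strength(
    ["Austin", "Houston", "Arlington", "Houston"],
    {"Arlington": MIDDLING, "Austin": MIDDLING, "Houston": BAD},
)
# Same four games after Austin's loss drops it to "bad":
after = schedule_strength(
    ["Austin", "Houston", "Arlington", "Houston"],
    {"Arlington": MIDDLING, "Austin": BAD, "Houston": BAD},
)
print(round(before, 2), houston_again, after)  # -3.33 -5.0 -7.5
```

Each step pushes the average down on its own; together they explain how a 70-0 win can coincide with a drop in the standings.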

Now, Dallas did beat Houston, 70-0…but remember, Massey isn’t overly impressed with that extreme of a margin of victory, particularly over a team as weak as the Houston Power. Dallas’ only saving grace, schedule-wise, is a win over the Arlington Impact, who are currently ranked #32 overall in the WFA. Which is good, but not great, especially if that’s your best win.

(I’ve also heard folks contend that Arlington’s 72-0 victory over the “reigning Tier III champion” Acadiana Zydeco last weekend is a sign of Arlington’s strength. Again, Massey isn’t fooled. Acadiana’s 2016 performance and WFA3 championship has nothing to do with how good they are this year, and with all due respect to the Zydeco, they’re really, really struggling this year. The Zydeco are 0-4 against one of the worst schedules in the league and are currently ranked #62 – fourth from last – in the 65-team WFA. Arlington was expected to win decisively over a team like that.)

Long story short, Dallas may #BeElite, but their schedule is not. There’s no getting around it. We can briefly discuss why it’s not good, but let’s first come to an agreement that it isn’t.

All Too Familiar

It’s pretty clear that the Dallas Elite’s poor schedule is killing them in the Massey Ratings. And while their schedule is the issue, the way their roster is constructed is keeping them from finding a quick solution. I know I’m poking a hornet’s nest here, but we can’t have an honest discussion about the Dallas situation without addressing it.

This offseason, the Dallas Elite picked up Rasan Gore and Jamie Fornal from the Chicago Force; the year before that, they picked up Olivia Griswold from the Pittsburgh Passion. Dallas picked up a couple of players this offseason from the D.C. Divas, too. Now, I want to be very, very clear: I’m not accusing Dallas of anything improper here. I don’t think player movement between Blue Blood teams is all that beneficial to the sport, but that’s a separate issue, and there’s certainly nothing illegal about Dallas picking up players from other good teams.

However, the issue at hand is that Dallas needs to play high-quality opponents to have a shot at the Massey Rating they deserve. Obviously, Dallas is not averse to playing such opponents and even traveling to do so, as they proved by going out to D.C. for a regular season game last year. But it takes two to tango…you can’t force other great teams to play you.

While I give credit to Dallas for upgrading their roster with the acquisitions of Gore and Fornal, I can imagine that Chicago wouldn’t be too keen on playing a regular season game against their former stars. By the same token, do you really think Pittsburgh wants to go up against Dallas with Griswold on the other sideline?

And then there’s D.C., who played Dallas in the regular season last year. Not only did Dallas pick up a couple of players who were with the Divas last season, but there’s also that little matter of a certain “agent to the stars” baselessly impugning the credibility of the Divas’ 2016 national championship victory over Dallas. Dallas eventually put an end to that circus, but only after it lasted far, far longer than it needed to.

(I never actually addressed that whole brouhaha here, and I certainly don’t plan to now. At the end of the day, the D.C. Divas will forever remain the first back-to-back champs of the WFA. No one can take that away, and I’ve never been prouder to be a part of this organization. Carry on.)

Back to the subject at hand: I want to once again be very clear that I’m not accusing the Dallas Elite of anything improper with all these player movements. But when players from Chicago, Pittsburgh, and D.C. find themselves on the Elite roster, it’s no surprise when those teams lose interest in playing Dallas in a regular season game and going up against their former players, especially considering the travel distances involved.

Now, I’m sure Dallas could travel to Minnesota or Kansas City or even to the West Coast for a regular season game…but it might be cheaper for them to just hold serve in their region in the regular season and cross their fingers that they get a better playoff seed than those teams, despite their schedule. If it turns out that Dallas needs to travel for the playoffs, it’s no worse from a financial perspective than the alternative, which would have likely required them to travel in the regular season anyway. It’s kind of a wash.

I’ve heard some Dallas fans lament the high cost of playoff travel and the expensive prospect of possibly having to go to Minnesota for a playoff game. But let me remind everyone: Dallas plays only three regular season road games, and they are in Arlington, Austin, and Houston. Houston was probably the longest road trip at, what, four hours?

The point is that Dallas’ regular season travel budget is pretty light…I’d be surprised if they’ve had to shell out for a single hotel, flight, or charter bus in the entire regular season. So if they get hit with some playoff travel expenses, they’re not going to get a lot of sympathy on that front…they should be banking the money they saved on their virtually non-existent regular season travel.

(Ideally, we could solve Dallas’ strength of schedule woes in the future by finding a Blue Blood team out there willing to play the Elite on a regular basis. Is there any Blue Blood out there who is dying to play a few high-profile opponents, maybe because they’re trapped in a collapsing IWFL at the moment? I see great potential someday soon for a Utah-Dallas rivalry…at least until a bunch of ex-Falconz players start popping up on Dallas’ roster. I’m only kidding. Kind of.)

How The West Will Be Won

Let’s not forget, too, that it’s still early in the season. Dallas was a #1 seed in the West last week; this week they are the #3 seed. Who knows where they’ll be next week, and certainly where they’ll be after Week 10 is still very much an open question.

There’s a lot of season still to be played, but that’s actually not all that true for Dallas, at least as far as the regular season is concerned. The bad news for the Elite is that I don’t see their schedule strength getting much better. They only have two regular season games remaining: against Austin and Arlington, who, as we’ve covered, aren’t highly regarded by Massey this year. Dallas is expected to maul those teams, and I think everyone in the sport would be surprised if it turned out otherwise.

The best analogy I can give is that the Dallas Elite are like a golfer who is already in the clubhouse on Sunday after shooting six under par. Dallas’ problem is that there are two golfers out there – Central Cal and the Minnesota Vixen – who are still on the course at nine under and six under par, respectively. Dallas can do little to improve their standing at this point, given their schedule…all they can really do is kick back in the clubhouse and hope that the course swallows up their top two Western playoff competitors.

And that could happen. The Minnesota Vixen play the Chicago Force this week, and if the Force do to Minnesota what they have done to many, many a team throughout their history, Dallas is certainly within striking distance of the #2 seed in the West.

The Elite have much farther to go to catch Central Cal, but you know what? Central Cal deserves a little bit of respect. One of the most head-turning results of last weekend was that Central Cal walked into San Diego and handed the Surge their first home loss – ever – in seven years. And it’s not like it was a squeaker…Central Cal won, 44-7.

I have a ton of respect for Dallas, I really do, but Central Cal has a pretty darn strong resume at the moment, too. And I dare say it’s stronger than Dallas’ right now. Which is not to say that Dallas wouldn’t mop the floor with Central Cal if they played them in the playoffs…maybe they would, maybe they wouldn’t. Personally, I think it would be a great Western Conference final, if that’s what we eventually get, but there are a lot of teams that will have something to say about that before we get there.

Again, it’s early. Right now, Dallas is a #3 seed, but they could be a #2 seed as early as next week, and there’s still a lot of season to be played. If Dallas winds up as the #2 in the West with the prospect of playing the Western Conference final on the road…well, I don’t think playing their first playoff road game in their franchise’s existence will kill them. It’s certainly not worth a Massey mutiny. (Come on, there are a few Boston Renegades players who chuckled at that. Or groaned. Either way, it’s a reaction.)

One final note on the Dallas Elite situation…I guarantee you that some folks in Dallas (who would never admit this publicly) are secretly happy about the whole deal, after watching the Elite obliterate teams 60-0 every week. Massey might give Dallas the greatest motivational gift they could ever ask for: if it’s hard to get your team fired up and focused when rolling over inferior foes every week, there’s nothing like a low Massey ranking to give the ladies a chip on their shoulder. Not that they needed it, of course, but a good coaching staff will know how to use that to its advantage.

Is There A Better Way?

As you can see, while people may not always agree with the Massey Ratings, they serve a vital purpose in women’s football. Some people in disagreement with Massey have ignorantly suggested that the Massey Ratings are “fixed”…like Ken Massey is rubbing his hands together in Tennessee plotting against certain women’s football teams.

The Massey website has some good advice for those people:

“The implementation of a computer rating algorithm is completely objective. So if the computer gives your team a bad (or good) rating, it shouldn’t be taken personally. You have the right to disagree with the computer, but more than likely this is evidence of your own subjectivity. I do not meddle with the algorithm to “fix” the ratings. The model defines certain criteria that determine a team’s rating, and the results are published on this website without any human intervention.

Computer ratings have two main advantages. One, they can deal with an enormous amount of data (hundreds of teams and thousands of games). Second, they can analyze objectively – every team is treated the same.

This latter property is often a two-edged sword because it can cause disagreement with public opinion, which is stoked by the media. The public demands an objective system that plays no favorites and doesn’t encourage a team to run up the score. However, they don’t always agree with the inevitable consequences of such a system.

True, insufficient data can produce abnormal and flawed results. However, computers have no ego, so a good model will correct itself and provide remarkable insights long before a human will become aware of (and admit) his mistake. In general, it doesn’t take long for computer ratings to overtake a human poll in terms of accuracy and fairness.”

Along those lines, Coach Bleep brought up an excellent point, as he often does, in an online discussion. If you’re not going to use Massey to help decide playoff berths and seedings, what should you use? We already discussed how using straight records provides perverse incentives with respect to schedule strength, so what else is there?

Someone proposed using a human poll, specifically a Coaches Poll, as guidance in the matter. Honestly, I would love to see a weekly Coaches Poll, just like college football has. I think that’d be a great thing for promoting the sport.

Using such a poll for determining playoff berths and seedings, however, would be one of the worst ideas I have ever heard. (It’s not as bad as John Spatz’ idea several years back of leagues forcing teams to fit under an experience “salary cap”, where elite teams would be forced to kick veteran players off of their roster in the hope that the vets would relocate to lesser local teams and create parity. It’s not as bad an idea as that. But it’s close.)

I don’t even know where to begin enumerating the dozens and dozens of ways such a poll would be subject to manipulation and outright rigging by league coaches. First, let’s say Boston and Chicago are neck-and-neck for the top seed in the East, and the coaches get to decide. Obviously Boston is going to put themselves at #1…but why would they put Chicago at #2? Why not put them at #4? Or #10? Anything to kill their ranking?
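To put rough numbers on that, here is a sketch of a points-based poll. The scoring format is an assumption, since no actual poll format is on the table: a Borda-style count in which a first-place vote on a four-team ballot earns 4 points and last place earns 1. The five ballots are invented for illustration, and `borda_totals` is a hypothetical helper.

```python
# Hypothetical poll format: Borda-style count, where on an n-team ballot
# 1st place earns n points and last place earns 1. Ballots are invented.

def borda_totals(ballots):
    """Sum points for each team across all ranked ballots."""
    totals = {}
    for ballot in ballots:
        n = len(ballot)
        for place, team in enumerate(ballot):
            totals[team] = totals.get(team, 0) + (n - place)
    return totals


honest = (
    [["Chicago", "Boston", "D.C.", "Pittsburgh"]] * 3
    + [["Boston", "Chicago", "D.C.", "Pittsburgh"]] * 2
)
# Identical, except one Boston-first voter drops Chicago to last place.
buried = honest[:4] + [["Boston", "D.C.", "Pittsburgh", "Chicago"]]

print(borda_totals(honest))  # Chicago 18, Boston 17
print(borda_totals(buried))  # Boston 17, Chicago 16
```

With honest ballots Chicago edges Boston 18-17. One ballot that drops Chicago from second to last flips the order to Boston 17, Chicago 16, without Boston gaining a single point.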

People might say, “Come on…no coach would kill their credibility by doing that!” This is about playoff seeding, with thousands of dollars hanging in the balance. I’ve seen teams try to run up huge scores on inferior opponents, keeping their starters in and throwing bombs up by 60 points for the slightest Massey edge…you don’t think they’d try to manipulate the Coaches Poll with that much at stake?

So maybe we don’t let Boston and Chicago vote on #1 or #2 then. (I don’t know how you’d decide they’re the two favorites and exclude them, but let’s go with it anyway.) Well, D.C. still gets a vote. What if they’re in a close battle with Pittsburgh for #3? You don’t think Chicago would reach out and say, “Hey, if you vote us for #1, we can vote you for #3.” Don’t think that’s possible?

Meanwhile, let’s say (hypothetically) that Boston is more of a running team, while Chicago is a wide open passing team. Now let’s say you’re a Western Conference contender with a strong run defense but weak pass D. Clearly, Boston is a better matchup for you in a possible national championship game. Is there any question Boston’s going to get your vote for the #1 seed in the East?

Let’s not even discuss the logistics of the thing. Who’s voting? Who’s tabulating the votes? And if we can’t get teams to upload their game film and submit their player statistics to the league in a timely fashion, what in the world makes anyone think they’d fill out a Coaches Poll in a reasonable timeframe?

And let’s not forget…if the Massey Ratings say that Dallas is the #3 seed in the Western Conference, at least folks in Dallas are just mad at a dispassionate computer. If a Coaches Poll put Dallas at #3 in the West, Dallas fans would be furious with – oh, I don’t know – every other coach in the league. That’d be great for league morale. If there are accusations of a computer system being “fixed”, can you imagine the conspiracy theories that would be floating around right now if the decision were in the hands of rival coaches?

The Massey Ratings are the best possible way to fairly, impartially determine playoff berths and seedings. Point blank.

Massey’s Biggest Flaw

Is Massey perfect? No, of course not. No system is. And as I mentioned before, Massey often provides outcomes that I don’t agree with, but at least I usually understand its logic. Massey also is unable to consider the results of interleague contests in its calculations, which can penalize a team for scheduling a tough interleague matchup.

But Massey’s biggest flaw – by far – lies in how it deals with forfeits. Massey’s treatment of forfeits is astonishingly simple: Forfeited results are not factored into the computer ratings in any way. That’s it.

If a team forfeits, it’s like the game was never played. (Because it wasn’t.) No one is penalized, no one is rewarded, and although Massey catalogs and lists forfeits on its site, they play absolutely no role in the rating calculations. They are completely ignored.

One thing to note about forfeits is that I always score them 1-0, because a game literally cannot end with that score in competitive play. (Using American rules…which even the Montreal Blitz do!) However forfeits are scored, though, Massey just ignores them.

Then there’s the question of what you do when a team starts to play a game and then forfeits, which has happened more often this year than in any year I can recall. Is that a forfeit, or isn’t it? Personally, I hate – hate – the policy of giving an artificial “40-0” win to a team for a forfeit of a game in progress. Why is it that if you show up, play a couple snaps, and then the opponent forfeits, you win 40-0, but if they didn’t play those couple snaps, it’s 1-0 (or 2-0, or 6-0, or whatever)? Makes no sense.

My policy of scoring games – and what I feel should be the WFA’s rule – is that if a team cannot make it to halftime, it’s a 1-0 forfeit, even if there was some game played. At halftime or beyond, it’s an official game, with the final score at the conclusion of play standing. That’s how baseball does it; you have to play five innings, or four and a half if the home team is leading. It makes perfect sense here too.
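Because the halftime rule is mechanical, it can be written down as a small function. This is a minimal sketch of the policy proposed above, not the WFA’s actual rule; the function name `score_stopped_game` and tracking progress by completed quarters are assumptions. Like the policy as stated, it lets the score stand once halftime is reached and does not address a forfeiting team that happens to be ahead.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    home_score: int
    away_score: int
    is_forfeit: bool  # forfeits would then be skipped by a Massey-style rating


def score_stopped_game(home_score: int, away_score: int,
                       quarters_completed: int,
                       forfeiting_side: Optional[str] = None) -> Result:
    """Apply the proposed halftime rule (hypothetical helper).

    forfeiting_side is "home", "away", or None when the game was completed.
    """
    if forfeiting_side not in (None, "home", "away"):
        raise ValueError("forfeiting_side must be 'home', 'away', or None")
    if forfeiting_side is not None and quarters_completed < 2:
        # Halftime never reached: a 1-0 forfeit, whatever the scoreboard said.
        return Result(0, 1, True) if forfeiting_side == "home" else Result(1, 0, True)
    # Halftime reached (or no forfeit at all): the score at the stoppage stands.
    return Result(home_score, away_score, False)


print(score_stopped_game(14, 0, quarters_completed=1, forfeiting_side="away"))
```

Under this sketch, a game called at 14-0 early in the second quarter is recorded as a 1-0 forfeit, while a game called at 28-7 after halftime stays 28-7 and counts as an official result.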

A quarter or a quarter and a half does not a full game make, and it’s problematic to consider it one. If it’s a 14-0 game early in the second quarter and the game is called, it’s unfair to the winning team to be considered by Massey as having “only” won by 14…they would have won by a much larger margin if they had been given more time. But it’s strange to presume it would automatically have been 40-0, any more than you can presume it would have been 30-0 or 100-0. On the other hand, if you haven’t built a sizeable lead by halftime over a team on the verge of forfeiting, then that one’s on you.

How Massey Handles Randall Stevens Teams

Massey’s treatment of forfeits leads to several outcomes that often confuse fans. For instance, 67 teams made the WFA schedule and were submitted to Massey for ratings, but just before kickoff of the season, two of those teams – the Nebraska Stampede and the Central Florida Shine – announced they were forfeiting out their seasons. (I like to call these teams “Randall Stevens” teams…if you’ve seen The Shawshank Redemption, you might get the reference. These teams don’t exist…except on paper.)

Anyway, Nebraska and Central Florida were already submitted to Massey for ranking purposes, so they continue to be listed in the Massey Ratings. Fans might expect these two teams to drop straight to the bottom of the rankings, because they are “losing” (forfeiting) every game. But that’s not what happens. Remember, Massey ignores forfeits…so in Massey’s eyes, Nebraska and Central Florida aren’t winless, they just haven’t yet played a game. They are both literally 0-0 in Massey’s eyes.
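The 0-0 effect follows directly from dropping forfeits before counting anything. The sketch below uses a hypothetical three-team schedule (team names invented, with “Randall Stevens” standing in for a paper team) and computes records twice: once the way official standings would, and once with forfeits skipped, which is how Massey treats them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    winner: str
    loser: str
    forfeit: bool = False


def records(teams, games, include_forfeits):
    """Return {team: (wins, losses)}. With include_forfeits=False, forfeited
    games count for neither side, as if they were never scheduled."""
    wl = {t: [0, 0] for t in teams}
    for g in games:
        if g.forfeit and not include_forfeits:
            continue
        wl[g.winner][0] += 1
        wl[g.loser][1] += 1
    return {t: tuple(v) for t, v in wl.items()}


# Hypothetical schedule: the paper team forfeits both of its games.
TEAMS = ["Randall Stevens", "Team A", "Team B"]
GAMES = [
    Game("Team A", "Randall Stevens", forfeit=True),
    Game("Team B", "Randall Stevens", forfeit=True),
    Game("Team A", "Team B"),
]

print(records(TEAMS, GAMES, include_forfeits=True)["Randall Stevens"])   # (0, 2)
print(records(TEAMS, GAMES, include_forfeits=False)["Randall Stevens"])  # (0, 0)
```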

(By the way, Nebraska’s 0-0 is ranked higher than Central Florida’s 0-0. I’m fairly certain this is because, again, teams’ previous season results are used to help rank them in the preseason, until there are enough data points from this season to provide an accurate rating. Right now there are no 2017 data points for Nebraska or Central Florida, so Massey is rating 0-0 Nebraska – who has played in prior seasons – ahead of a 0-0 expansion Central Florida squad.)

Obviously, with a 0-0 record, neither Nebraska (currently #49) nor Central Florida (currently #63, or fifth from last) is all that highly ranked. Still, it really bothers teams ranked below Nebraska and Central Florida that they’re behind a team (or two) that is forfeiting out the year. But again, because of how Massey handles forfeits, it doesn’t know these teams have announced they are forfeiting out the year…it just thinks they, for some reason, haven’t started their season yet.

So Nebraska and Central Florida don’t drop straight to the bottom; instead, they hover around the arithmetic mean. They aren’t seen as total losers, just teams that haven’t started their year yet and, as such, have no 2017 track record. But since we all know they are forfeiting out the year and not part of the WFA’s 2017 season, they can be completely ignored. They, like the forfeits they’re made up of, don’t really exist…except on paper.

(This should go without saying, but I’ll say it anyway…teams that are forfeiting out the year can’t make the WFA playoffs. Of course, that presumes that a 0-0 team would be in a position to make the playoffs at the end of the year, which I would find hard to believe. But if it did happen, they would forfeit the spot, and it would go to the next team in line. Can’t believe I have to state that, but there you go.)

Another thing that bothers fans is that teams often forfeit and move up in the rankings the following week. Why does that happen? Because in Massey’s eyes, teams that forfeit aren’t penalized…they aren’t rewarded either, but they aren’t penalized. It’s just like they had a bye week. And teams often move up or down in Massey during their bye weeks, as their strength of schedule is adjusted on a weekly basis as new game results roll in.
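To see how a team can move without playing, here is a sketch of the textbook least-squares version of the Massey method: each game says “winner’s rating minus loser’s rating should equal the margin,” and the ratings are the best fit to all those equations at once. This is an assumption-laden illustration only; the published Massey Ratings use a more elaborate model. The teams, scores, and helper names are hypothetical, and the method needs every team connected through played games, so a 0-0 team cannot be rated this way.

```python
# Sketch of the textbook least-squares "Massey method". Assumption: an
# illustration only, not the model behind the published Massey Ratings.

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def massey_ratings(teams, games):
    """games: (winner, loser, margin) tuples. Teams must all be connected
    through played games, or there is no unique answer."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    m = [[0.0] * n for _ in range(n)]  # games played and head-to-head counts
    p = [0.0] * n                       # cumulative point differential
    for winner, loser, margin in games:
        i, j = idx[winner], idx[loser]
        m[i][i] += 1
        m[j][j] += 1
        m[i][j] -= 1
        m[j][i] -= 1
        p[i] += margin
        p[j] -= margin
    # Ratings are only defined up to a constant, so replace the last
    # equation with "all ratings sum to zero".
    m[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(m, p)))


teams = ["A", "B", "C", "D"]
week1 = [("A", "B", 20), ("B", "C", 5), ("C", "D", 10)]
before = massey_ratings(teams, week1)
# Week 2: A has a bye, but D upsets B by 15.
after = massey_ratings(teams, week1 + [("D", "B", 15)])
print(round(before["A"], 2), round(after["A"], 2))
```

Team A never plays in week 2, yet its rating falls from 20 to 12.5, because the team it beat (B) now looks much weaker. The same mechanism can push a bye-week (or forfeit-week) team up when its past opponents win.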

By the way, in years of observing the Massey Ratings, I haven’t found that an opponent’s forfeit hurts a strong team all that much. Usually the team that forfeited was not highly rated, and since Massey considers the game to be unplayed, the strong team now appears to have a better overall strength of schedule. It costs them an official win, but from the stronger team’s perspective, forfeits rarely hurt them.
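The strength-of-schedule effect is easiest to see with a deliberately crude proxy: the plain average of opponents’ ratings. Massey’s actual strength of schedule is computed differently, so treat this as an illustration with made-up ratings and a hypothetical helper. It also shows the flip side: removing a strong opponent drags the average down, which is why forfeiting against a good team costs you SOS.

```python
from statistics import mean

# Made-up ratings for three hypothetical opponents.
RATINGS = {"Strong": 30.0, "Middle": 10.0, "Weak": -20.0}
SCHEDULE = ["Strong", "Middle", "Weak"]


def average_opponent_rating(opponents, ratings=RATINGS):
    """Crude SOS proxy: the mean rating of the opponents actually played.
    Forfeited games are simply left out of `opponents`."""
    return mean(ratings[o] for o in opponents)


full = average_opponent_rating(SCHEDULE)
weak_forfeits = average_opponent_rating([o for o in SCHEDULE if o != "Weak"])
strong_forfeits = average_opponent_rating([o for o in SCHEDULE if o != "Strong"])
print(round(full, 2), weak_forfeits, strong_forfeits)
```

The average is about 6.67 with all three games played, rises to 20.0 if the weak opponent forfeits, and falls to -5.0 if the strong opponent is the one taken off the schedule.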

Strategic Forfeits

The bigger question is, “Is there a strategic advantage to forfeiting a game?” And the answer to that is, “Probably not…but maybe.” The idea of a strategic forfeit has been floated for years. I recall having a pretty enlightening discussion about it several years ago on Dion Lee’s board, before he shut it down and killed off several years’ worth of interesting women’s football conversations.

Anyway, back in 2014, I actually discussed the possibilities of a strategic forfeit with Militia Cheerleader as the Boston Militia were clinging to the top seed in the Eastern Conference. The D.C. Divas were not far behind and set to host Boston in the 2014 regular season finale, but Militia Cheerleader was trying to ascertain if a strategic forfeit would actually lock up the top seed for the Militia.

We never did come to a definitive answer, and ultimately, we were both aware that our conversation was largely academic. We both knew that the people involved in the Boston organization are far too competitive to ever actually consider doing that, anyway. (Boston came to D.C., won by a single point, got home field advantage throughout the playoffs, and stormed to the national title.)

Whether or not a forfeit could help a team in Massey is usually unclear at best. People usually suggest forfeiting against a strong opponent, but since Massey factors strength of schedule heavily into its calculations, taking a strong opponent off of your schedule kills your SOS and seems counterproductive. On the flip side, no one wants to give up the possibility of an easy win against a weaker opponent.

So I don’t know. I think forfeiting against a strong opponent could yield a strategic advantage under the right circumstances…especially if you know you would otherwise get killed if you played the game. But even then, you lose the SOS benefit of having that opponent on your schedule. Might still be worth it, though. The problem, of course, is that you’re presuming you would get blown out, rather than surprising yourself by actually playing a competitive game.

I heard one coach jokingly suggest that maybe they could forfeit the rest of their games and still make the playoffs. Well, perhaps. But in all seriousness, there are three obvious problems with that strategy. One, I’m not certain that it would work, and heaven help you if you tried it and then came up just short of the playoffs. Two, you might have been able to make the playoffs anyway by actually playing your games, which means you deprived your team of competition for no reason. And three, how are you supposed to recruit players for next year if you start massively forfeiting games this year? That kind of defeatist attitude isn’t conducive to recruiting players in the long run. Playing games is the whole reason they pay money and spend months training…you don’t deprive them of that just to try to game the Massey Ratings.

With all that said, forfeits are one of the very few areas where I’d like to see the WFA overrule Massey for its playoff berths and seedings. It’s too late to do anything about it this year, but for future years, I’d propose a rule like this:

Playoff berths and seedings are determined by Massey Ratings, except:
a) Teams that forfeit one game are automatically dropped one seed below where their Massey Rating would otherwise place them. If such a team held the last playoff seed, it is replaced by the highest-finishing team that did not forfeit, and
b) Teams that forfeit two games are automatically ineligible for the playoffs.

That would not only remove any incentive at all for a strategic forfeit but would also ensure that any team making the playoffs has enough players on the roster to actually finish the playoffs.
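Because the proposed rule is purely mechanical, it is easy to sketch. This is only an illustration of the proposal above, not an existing WFA procedure; `apply_forfeit_rule` and the team names are hypothetical. The rule does not say what happens when two one-forfeit teams sit next to each other, so processing drops from the bottom of the order up is an assumption, chosen so that adjacent drops don’t undo each other.

```python
def apply_forfeit_rule(massey_order, forfeits, berths):
    """Seed a playoff field under the proposed forfeit rule (hypothetical).

    massey_order: team names, best Massey rating first.
    forfeits: {team: number of forfeits}; teams not listed have none.
    berths: number of playoff spots.
    """
    # Rule (b): two or more forfeits means no playoff eligibility.
    seeds = [t for t in massey_order if forfeits.get(t, 0) < 2]
    # Rule (a): each one-forfeit team drops one spot. Working bottom-up
    # keeps adjacent one-forfeit teams from undoing each other's drops.
    for team in reversed(list(seeds)):
        if forfeits.get(team, 0) == 1:
            i = seeds.index(team)
            if i + 1 < len(seeds):
                seeds[i], seeds[i + 1] = seeds[i + 1], seeds[i]
    return seeds[:berths]


print(apply_forfeit_rule(["A", "B", "C", "D", "E"], {"C": 1}, berths=3))
```

Here Team C would have held the last of three seeds, but its single forfeit drops it below Team D, so the field becomes A, B, D.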

Your Guide to the Massey Ratings

That’s a brief (or not so brief) overview of the Massey Ratings. Here are the main takeaways:

Using straight records to rank women’s football teams encourages “race to the bottom” scheduling that hurts the sport as a whole.
Because schedules vary so widely in women’s football, strength of schedule has to be factored heavily when ranking teams in larger leagues.
The Massey Ratings provide a system for doing this and have a decade-plus track record of rating teams in women’s football leagues in a very accurate manner.
The Massey Ratings are an impartial way of determining playoff berths and seedings, allowing these decisions (which have thousands of dollars at stake) to be made in an objective manner, free of human bias, favoritism, and interference.
The biggest area where Massey Ratings could be improved at the league level is by assessing postseason penalties for forfeiting games, removing any possibility of gaining an advantage by a strategic forfeit.

Look, I understand that no system of ranking teams is going to be without controversy. But women’s football has unique challenges that require unique solutions, and the sport is fortunate to have the Massey Ratings around to provide clarity and create the most equitable, impartial postseason setup possible. While occasionally criticized by some within the sport, the Massey Ratings serve a vital role in incentivizing the high-level games women’s football needs to grow, and that’s something every women’s football fan can get behind.