
The Rankings Farce

Colin Diver | April 6, 2022

On July 1, 2002, I became president of Reed College in Portland, Ore. As I began to fill the shelves in my office with mementos from my previous life as a law-school dean, I could feel the weight already lifting from my shoulders. “I’m no longer subject to the tyranny of college rankings,” I thought. “I don’t need to worry about some news magazine telling me what to do.”

Seven years before my arrival at Reed, my predecessor, Steven S. Koblik, decreed that Reed would no longer cooperate with the annual U.S. News Best Colleges rankings. As a practical matter, this meant that college staff members would no longer have to invest hours in filling out the magazine’s annual surveys and questionnaires. Most importantly, it signaled that Reed would no longer be complicit in an enterprise it viewed as antithetical to its core values. And it would no longer be tempted to distort those values to satisfy dubious standards of excellence.

The fact that Reed had taken this rebellious stance was one of many features that attracted me to apply for its presidency. I took it to be a statement that Reed viewed education as a path to a genuinely fulfilling life, not just a ticket to a high-paying job. The college defined its goal as imparting learning, not just conferring credentials. It measured itself by internal standards of academic integrity, not just external applause.

There is a growing cottage industry of college evaluators, many spurred by the commercial success of U.S. News. I call it the “rankocracy” — a group of self-appointed, mostly profit-seeking journalists who claim for themselves the role of arbiters of educational excellence in our society. It wasn’t just the U.S. News rankings that were incompatible with Reed’s values. Virtually the whole enterprise of listing institutions in an ordinal hierarchy of quality involves faux precision, dubious methodologies, and blaring best-college headlines. To make matters worse, the entire structure rests on mostly unaudited, self-reported information of questionable reliability. In recent months, for example, the data supporting Columbia’s second-place U.S. News ranking have been questioned, the University of Southern California’s School of Education has discovered a “history of inaccuracies” in its rankings data, and Bloomberg’s business-school rankings have been examined for perceived anomalies.


Maintaining Reed’s stance turned out to be more of a challenge than I had realized. Refusing to play the game didn’t protect us from being included in the standings. U.S. News and its coterie of fellow rankocrats just went ahead and graded the college anyway, based on whatever data they could scrape up and whatever “expert” opinions they could sample. Every once in a while, when I saw that U.S. News had once again assigned us a lower number, I would feel those old competitive juices flowing. In moments like that, I had to take a deep breath or go for a walk. And throw the magazine into the trash.

I came by my rankings aversion honestly. In 1989, I became the dean of the University of Pennsylvania’s law school. The next year, U.S. News began to publish annual rankings of law schools. Over the next nine years of my deanship, its numerical pronouncements hovered over my head like a black cloud. During those years, for reasons that remained a complete mystery to me, Penn Law’s national position would oscillate somewhere between seventh and 12th. Each upward movement would be a cause for momentary exultation; each downward movement, a cause for distress.

My admissions dean reported that prospective applicants were keenly attuned to every fluctuation in the annual pecking order. So were my alumni. If we dropped from eighth to 10th, alumni would ask what went wrong. If we moved up to seventh, they would ask why we weren’t in the top five. Each year, Penn’s president would proudly present to the Board of Trustees a list of the university’s schools whose ranking numbers had improved. (She’d make no mention of those whose numbers had slipped.)

During that time, I also served as a trustee of my undergraduate alma mater, Amherst College. By the standards of the rest of the world, U.S. News treated Amherst very kindly, almost always placing it in the top two liberal-arts colleges in the nation. Amherst was far too genteel to boast publicly. But the topic often arose at the fall meeting of the Board of Trustees, right after the release of the latest U.S. News Best Colleges edition. If Amherst came in second, someone would always ask, “Why is Williams College ahead of us again?” I came to understand that, in the world of college rankings, everyone feels resentment, frustration, and anxiety. Everyone thinks they are being treated unfairly, except during those fleeting moments when they sit at the top of the sand pile.

Like many members of my generation, my education in gourmet cooking began by watching Julia Child’s syndicated TV show, The French Chef. Judging by my occasional attempts at haute cuisine, I was not a very good student. But I do remember one important lesson: When you combine a lot of ingredients into a stew, you want to bring out the flavor of each one. You should still be able to taste the bacon and the porcini mushrooms in the beef bourguignon.


The art of composing a college ranking is like preparing a stew. You select a group of ingredients, measure each one carefully, combine them in a strict sequence, stir, cook, and serve. If you do it just right, you might end up with a delicious, classic French dish. If you do it badly, you end up with gruel.

The rankings of U.S. News and its followers typically produce gruel. A careful look at the “recipes” for preparing these rankings shows why.

To create its 2022 listings of national universities, for example, U.S. News combined 17 different ingredients, grouped under nine headings (graduation and retention rates, social mobility, graduation-rate performance, undergraduate academic reputation, faculty resources, student selectivity, financial resources, alumni giving, and graduate indebtedness) to produce an overall score for each ranked college, on a scale of one to 100. The data fed into this recipe derive from replies to the magazine’s annual statistical questionnaires and peer-evaluation surveys. Most of the quantitative information is also available from the U.S. Department of Education, but some (such as class-size data or the alumni-giving rate) is not. Since there is often a time lag in federal reports, U.S. News takes some pride in publishing data that are, in at least some instances, more current.
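
To make the mechanics of that recipe concrete, here is a minimal sketch of the underlying computation: a weighted sum of normalized metrics, rescaled to a 100-point score. The metric names, values, and weights below are illustrative stand-ins, not U.S. News’s actual ingredients or weights.

```python
# A minimal sketch of the basic recipe: a weighted sum of metrics, rescaled
# to a 100-point scale. The metric names, values, and weights below are
# illustrative stand-ins, not U.S. News's actual ingredients or weights.

# Each metric is assumed to be pre-normalized to a 0-1 scale,
# where 1 is the best value observed among ranked colleges.
metrics = {
    "graduation_rate":     0.92,
    "retention_rate":      0.97,
    "peer_reputation":     0.78,
    "faculty_resources":   0.66,
    "financial_resources": 0.71,
}

# Hypothetical weights; they sum to 1.0.
weights = {
    "graduation_rate":     0.30,
    "retention_rate":      0.10,
    "peer_reputation":     0.20,
    "faculty_resources":   0.20,
    "financial_resources": 0.20,
}

overall_score = 100 * sum(weights[k] * metrics[k] for k in metrics)
print(round(overall_score, 1))  # one number on the 100-point scale, ready to be ranked
```

Every comprehensive ranking discussed in this essay is, at bottom, a variation on this single computation: choose the ingredients, choose the weights, add them up, and sort.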

In a practice begun back in 1997, U.S. News adjusts some of the metrics in its formula in an attempt to measure institutional value added. These calculations use proprietary algorithms to estimate the extent to which an institution’s performance on a particular criterion is higher or lower than one might expect, given the distinctive characteristics of the institution and its student body. For example, in addition to calibrating the raw overall graduation rate, U.S. News also includes something called “graduation-rate performance,” to reward institutions, such as Berea College, that achieve a higher graduation rate than might be expected, given the academic preparation of their students.
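
The arithmetic behind that value-added adjustment can be sketched in a few lines: predict a graduation rate from entering-student preparation, then credit the college for the gap between actual and predicted. All figures below are invented, and a one-variable linear fit stands in for the magazine’s proprietary model.

```python
# "Graduation-rate performance": predict a graduation rate from a proxy for
# entering-student preparation, then score the college on how far its actual
# rate exceeds the prediction. All figures are invented.
import numpy as np

avg_sat = np.array([1050, 1150, 1250, 1350, 1450], dtype=float)  # entering-class averages
grad_rate = np.array([0.62, 0.74, 0.80, 0.88, 0.94])             # six-year graduation rates

slope, intercept = np.polyfit(avg_sat, grad_rate, 1)  # expected rate, given preparation
expected = intercept + slope * avg_sat
value_added = grad_rate - expected                    # positive = beating expectations

for sat, actual, exp, va in zip(avg_sat, grad_rate, expected, value_added):
    print(f"SAT {sat:.0f}: actual {actual:.2f}, expected {exp:.2f}, value added {va:+.3f}")
```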

[Illustration by Tyler Comrie for The Chronicle: a gold No. 1 badge crossed out]

Other comprehensive rankings have used formulas that are broadly similar to those used by U.S. News. For its 2022 edition, the Wall Street Journal/Times Higher Education rankings employed 15 measures, grouped under four headings (resources, engagement, outcomes, and environment). Some of its factors (graduation rate, for instance) are also used by U.S. News. Several others, such as various survey-based ratings of student engagement and postgraduate salaries, are distinctive to that ranking. Washington Monthly divides its rankings into three portions, each comprising many factors, while Forbes uses well over a dozen (including an institution’s alumni representation in the Forbes “30 Under 30” list). Niche, a platform that both recruits for colleges and helps parents and students find the right institutions, surely wins the prize for formulaic complexity, by somehow managing to incorporate over 100 ingredients (via Bayesian methods and “z-scores”) into a single ordinal list of 821 best colleges.
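
For readers wondering what “z-scores” mean in this context: standardizing each indicator to a mean of zero and a standard deviation of one lets a ranker average together measures recorded on wildly different scales, such as dollars and ratios. The sketch below illustrates only that standardization step, with invented figures; Niche’s actual formula is proprietary.

```python
# Standardizing indicators as z-scores (mean 0, standard deviation 1) so that
# measures on different scales can be averaged. All figures are invented.
import numpy as np

spending_per_student = np.array([18_000, 25_000, 40_000, 95_000], dtype=float)  # dollars
student_faculty_ratio = np.array([18, 14, 10, 6], dtype=float)                  # lower is better

def z_score(x):
    return (x - x.mean()) / x.std()

# Flip the sign of "lower is better" indicators, then combine with equal weights.
composite = 0.5 * z_score(spending_per_student) - 0.5 * z_score(student_faculty_ratio)
print(np.round(composite, 2))  # higher composite = "better" on these two measures
```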

Taken individually, most of the factors are plausibly relevant to an evaluation of colleges. But one can readily see that any process purporting to produce a single comprehensive ranking of best colleges rests on a very shaky foundation.

Problem No. 1: Selection of Variables

How do rankocrats decide what to include or leave out in their formulas? What we call a “college education” has literally hundreds of dimensions that could potentially be examined. While there is widespread agreement about the general purposes of higher education, when it comes to rankings, that consensus quickly dissolves into argument.

Why, for example, does U.S. News look at spending per student, but not endowment per student? Why does it measure faculty salaries but not faculty research output? Why does it calculate graduation rate but not postgraduate earnings? Why do some rankings systems include racial and ethnic diversity, while most ignore it? Indeed, why do some formulas use just a handful of variables, while others incorporate dozens or even hundreds? At best, the rankers give vague replies to such questions, offering no supporting evidence for their preferred variables. Very rarely do they explain why they have left out others, including those that their competitors use.

Problem No. 2: Assigning Weights to Variables

Equally arbitrary is the process of determining what weights to assign to the variables. The pseudoscientific precision of the mathematical formulas used in the most popular rankings is really quite comical. For 2022, U.S. News decreed that the six-year graduation-rate factor was worth precisely 17.6 percent in its overall formula, and the freshman-to-sophomore-year retention rate, exactly 4.4 percent. Washington Monthly somehow divined that its Pell graduation-gap measure (comparing the graduation rate of lower-income Pell Grant recipients with non-Pell recipients) factored in at 5.55 percent of its overall rating, while a college’s number of Pell students receiving bachelor’s degrees deserved a measly 2.8 percent.
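
Why do those decimal points matter so much? Because with the same underlying data, two equally plausible weighting schemes can crown different winners. The sketch below makes the point with three hypothetical colleges and invented weights.

```python
# Three hypothetical colleges, scored under two weighting schemes that each
# sound defensible. The metrics, values, and weights are all invented.
colleges = {
    "College A": {"graduation": 0.95, "pell_success": 0.50, "spending": 0.55},
    "College B": {"graduation": 0.75, "pell_success": 0.95, "spending": 0.60},
    "College C": {"graduation": 0.80, "pell_success": 0.65, "spending": 0.70},
}

scheme_1 = {"graduation": 0.60, "pell_success": 0.10, "spending": 0.30}
scheme_2 = {"graduation": 0.20, "pell_success": 0.50, "spending": 0.30}

for label, weights in [("scheme 1", scheme_1), ("scheme 2", scheme_2)]:
    scores = {name: sum(weights[k] * vals[k] for k in weights)
              for name, vals in colleges.items()}
    order = sorted(scores, key=scores.get, reverse=True)
    print(label, "->", order)  # scheme 1 puts College A first; scheme 2 puts College B first
```

Neither ordering is wrong; the choice of weights simply is the ranking.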

U.S. News has long been well aware of the arbitrariness of the weights assigned to variables used in its formulas. In 1997, it commissioned a study to evaluate its methodology. According to Alvin P. Sanoff, managing editor of the rankings at that time, its consultant concluded: “The weight used to combine the various measures into an overall ranking lacks any defensible empirical or theoretical basis.” The magazine evidently just shrugged its shoulders and kept right on using its “indefensible” weighting scheme. As have all the other formulaic rankers, one strongly suspects.

Problem No. 3: Overlap Among Variables

A third problem is the degree of overlap among variables — a condition statisticians call “multicollinearity.” In statistical terms, the ranking formulas purport to use several independent variables (such as SAT scores, graduation rate, class size, and spending per student) to predict a single dependent variable (numerical rank). It turns out, however, that most of the so-called independent variables are, in fact, dependent on each other. A 2001 analysis found “pervasive” multicollinearity in the formula then used by U.S. News, with many pairs of variables overlapping by over 70 percent. For example, a college’s average SAT score (for its entering students) and its graduation rate were almost perfectly correlated.

Why is this a problem? When factors such as SAT scores and graduation rates are collinear, the true impact of either one on colleges’ overall rankings can be quite different from the weighting percentage nominally assigned by the formula. For example, the 2001 study found that an institution’s average SAT score actually explained about 12 percent of its ranking, even though the U.S. News formula nominally assigned that factor a weight of only 6 percent. The SAT statistic had this outsized influence because it directly, and strongly, affected seven of the 14 other variables. For this reason, Robert Zemsky and Susan Shaman argued quite persuasively in their 2017 book that it takes only a tiny handful of variables to explain almost all of the differences in the U.S. News rankings. In other words, many of the factors so carefully measured and prominently featured by the magazine are just window dressing.
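
A small simulation makes the mechanism visible. In the sketch below, which uses invented weights and simulated data rather than the 2001 study’s, an “SAT” factor carries a nominal weight of only 6 percent; once it is allowed to correlate strongly with the graduation-rate factor, it accounts for a far larger share of the variation in the final composite.

```python
# A toy demonstration of collinearity inflating a factor's real influence.
# Nominal weights: SAT 6%, graduation rate 18%, everything else 76%.
# All data are simulated; this does not reproduce the 2001 study.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # simulated colleges

def sat_share_of_variance(rho):
    """Fraction of the composite's variance that the SAT factor alone
    accounts for, when SAT and graduation rate are correlated by rho."""
    sat = rng.standard_normal(n)
    grad = rho * sat + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    other = rng.standard_normal(n)  # all remaining factors, lumped together
    composite = 0.06 * sat + 0.18 * grad + 0.76 * other
    return np.corrcoef(sat, composite)[0, 1] ** 2

print(f"SAT's share if the factors were independent: {sat_share_of_variance(0.0):.1%}")
print(f"SAT's share when it tracks graduation rate:  {sat_share_of_variance(0.95):.1%}")
```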

Furthermore, most of the criteria explicitly used by U.S. News (and, by extension, most of the other comprehensive rankers) turn out to be heavily dependent on an unidentified background element: institutional wealth. This should be intuitively obvious for the faculty-resources and financial-resources measures. As studies have repeatedly shown, however, the degree of institutional wealth also corresponds directly with the level of entering students’ SAT scores, freshman retention rates, graduation rates, alumni giving, and even peer reputation. A ranking that gives separate weights to each of those factors ends up largely measuring the same thing.

Problem No. 4: The Salience of Numbers

A further problem with the rankocrats’ systems is the outsized impact exerted by the numerical scores that those systems produce. Scholars call this quality “salience” — that is, the tendency of one measure to dominate all the others, simply because of its greater visibility. Taking an example from the 2022 U.S. News edition, we can ask whether the University of California at Berkeley (ranked 22nd among national universities) is really better than its neighbor to the south, the University of Southern California (27th). These two numbers said yes. Yet, when you look at the underlying data (to say nothing of all the qualitative factors ignored by the formula), the only plausible conclusion is that the two colleges, while very different, were equivalent in overall quality. Those colleges’ total scores on U.S. News’s magic 100-point scorecard (82 and 79, respectively) were also almost identical. Berkeley seemed to be superior on some measures (peer evaluation and student excellence), and USC on others (faculty resources and financial resources). Yet there it was, in neon lights: No. 22 versus No. 27 in rank.

As one moves further down the ladder, the numerical differences among the colleges — and surely the real quality differences — shrink to the vanishing point. Ursinus and Hendrix Colleges, two very fine small liberal-arts colleges, received overall raw scores of 58 and 55 from U.S. News. Yet Ursinus was ranked 85th (in a tie) among national liberal-arts colleges, and Hendrix 98th (also in a tie). The notion that, in this case, a student should choose Ursinus over Hendrix simply because of these numerical differences is ludicrous. But, as many scholars have documented, rankings numbers speak loudly, often drowning out other, more edifying ways of assessing an institution’s strengths and weaknesses.
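
The mechanics behind that gap are simple: under competition-style ranking, a college’s rank is one plus the number of colleges with strictly higher raw scores, so when scores bunch tightly, a trivial score difference becomes an eye-catching rank difference. The toy example below uses an invented field of colleges; only the scores of 58 and 55 echo the case above.

```python
# Competition-style ranking: a college's rank is 1 plus the number of
# colleges with strictly higher raw scores. All scores here are invented.
raw_scores = [58, 57, 57, 57, 56, 56, 56, 56, 56, 55, 55, 54]

def rank(score, field):
    return 1 + sum(s > score for s in field)

print(rank(58, raw_scores))  # 1  -- first place, on a 3-point lead
print(rank(55, raw_scores))  # 10 -- despite a nearly identical raw score
```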

In a 2007 study of the enrollment decisions made by high-achieving students who attended Colgate University between 1995 and 2004, Amanda Griffith and Kevin Rask noted that over half of the surveyed students chose Colgate merely because it was ranked higher than the other colleges to which they were admitted. This deciding factor, they observed, was independent of other measures of academic quality, such as student/faculty ratio or expenditures per student. A 2013 investigation examined the impact of a 1995 decision by U.S. News to increase the number of institutions that were ordinally ranked. Before 1995, colleges whose raw scores placed them between 26th and 50th in its formula were merely listed alphabetically in a “second tier.” The researchers found that when the magazine began assigning a specific number to those additional institutions, they experienced a statistically significant increase in applications, wholly independent of any changes in the underlying quantitative measures of their academic quality.

Problem No. 5: Fiddling With the Formula

Compounding the inherent arbitrariness of the rankings’ methodology, the rankocrats keep changing it, rendering comparisons from one year to the next essentially meaningless. Ever since 1983, U.S. News has made repeated alterations in the variables used in its formula, the weights assigned to those factors, the procedures for measuring them, and the number of colleges listed.

Why does U.S. News keep changing its recipe? Many observers accuse the publisher of instituting changes just for the purpose of shaking things up, to generate enough drama to keep readers coming back year after year. Its editors firmly deny that charge. Instead, they typically give rather vacuous explanations for the changes, often citing “expert” opinion. But, unlike academic experts, the magazine’s editors don’t cite the results of peer-reviewed studies to substantiate their assertions.

In fact, it’s not difficult to guess the reasons for at least some of the changes. One can readily explain several adjustments — for example, the belated inclusion of social mobility and college affordability — as responses to widespread criticism of the formula’s blatant wealth bias. Other revisions reflect efforts to discourage cheating. U.S. News has been engaged in an ongoing Whac-a-Mole exercise with institutions bent on gaming its system. Find a loophole, close it. Find another loophole, close that one. Ad infinitum.

Additional alterations may have been made to avoid the embarrassment of implausible results. In the magazine’s first ranking of law schools, Yale finished first, and Harvard wound up an ignominious fifth. That implausibility was quickly corrected by subsequent rankings formulas. Until quite recently, it’s been Yale (first) and Harvard (second) at the top. A more celebrated example involves the ranking of the undergraduate program at the California Institute of Technology. In 1999, the U.S. News statisticians made an obscure change in the way the magazine plugged spending per student into its overall score computation. As a result, Caltech (which spends much more per student than its peers) vaulted from ninth place in 1999 to first place in 2000. Oops! Soon Caltech settled back to its “proper” position in the pecking order, below the perennial top dogs.

The Caltech episode illustrates a related problem: buyer’s remorse. Since a college’s numerical position in the hierarchy can bounce around from year to year, often for reasons that bear no relation to changes in its underlying quality, applicants who rely on those numbers to make college choices can get unpleasant surprises. Imagine an applicant who, in 2000, chose Caltech because it was ranked first in U.S. News, in preference to, say, Princeton (then fourth). A year later, that person wakes up to discover that the two institutions have traded places. By graduation time, Princeton is still first, while Caltech has sunk to eighth.

Problem No. 6: One Size Doesn’t Fit All; The ‘Best College’ Illusion

Just as there is no single best stew, there can be no single best college. It takes real chutzpah to claim that a formula comprising arbitrarily chosen factors and weights, which keep changing from year to year, can produce a single, all-purpose measure of institutional quality. Of course, all of the rankocrats concede this fact and take pains to advise readers to use their numerical listings only as a starting point in the search, not as an absolute method for making decisions. In service to that advice, most publications offer numerous single-dimension assessments in addition to their comprehensive best-colleges lists. And many of them supply tools to help prospective applicants construct even more-personalized intercollege matchups. (Usually for a fee, of course.)

And yet all of the rankers use their best-colleges lists as public-relations bait to hook their audiences. By the time curious readers get to the underlying information and the specialized rankings, they have been told by a seemingly authoritative organization what the correct ordering of colleges is, from best to worst. The unstated message comes through loud and clear: “Berkeley is better than USC. Ignore that relative assessment at your peril.”

What we have, in sum, is a group of popular rankings that simplify the complexity of evaluating a college’s performance by arbitrarily selecting a collection of measures, many of which overlap substantially, and then assigning equally arbitrary weights in order to purée them together into a single offering. The result is a tasteless mush.

This essay is adapted from the author’s forthcoming book, Breaking Ranks: How the Rankings Industry Rules Higher Education and What to Do About It (Johns Hopkins University Press).