Clapping and Counting: Assessment’s (Very Brief) Past

Academia tends to have a fairly narrow view of its own administrative history, especially when it comes to the strange, epiphenomenal genres of assessment that underwrite it. The genres of faculty assessment (applicant dossiers, reader's reports, letters of recommendation) all have an aura of deep entrenchment about them. Like most of the trappings of bureaucratic being, these sorts of things feel like they've been around forever.

But they haven't. A rather bizarre example: well into the late 17th century, faculty promotions at the University of Königsberg were made by way of a kind of academic clap-o-meter. Applicants landed coveted chair positions based largely on the amount and intensity of the applause their lectures generated.[1] Some of this ovation was of the figurative rather than the actual sort: disputation-dissertations (that is, publications) and invitations to scholarly societies counted too, but only as outward signs of diligence, not explicit markers of accomplishment. In general, academic success in Baroque Germany was much more a matter of social charisma, renown, and, well, clapping than anything like a publication history or "contribution to the field."

It wasn't until the mid-18th century, with the rising professionalization of the research university, that reference letters began mentioning in more detail a candidate's dissertation or specific publications. (In the case of Königsberg, the shift from clapping to counting publications was largely owing to the 1749 Prussian decree mandating publication as a requirement for promotion.) In other words, as the goal of higher education became more and more about producing professional scholars, and as notions of "professional" became allied with a specific number of publications (in Prussia, it was three), methods of assessment became tied much more to brute counting than to the airier impressions of a crowded room and thundering applause.[2]

The history of editorial peer review's assessment metrics tracks closely with the evolution of the recommendation letter. Again, it was the mid-18th century that saw the dawning of something like a bona fide genre of peer review.[3] The Royal Society of London's formation of a "Committee on Papers" in 1752 is typically taken to mark the start of the practice of external assessment of journal content, though, as Kathleen Fitzpatrick points out, this received history is complicated by "the existence of at least one earlier instance of formalized peer review in a scientific journal: the Royal Society of Edinburgh seems to have had such a system in place as early as 1731".[4]

Prior to this, review was less about concerted efforts to assess a piece of writing for its scholarly merits and more about state-sponsored censorship: acceptability based not on academic rigor but on social suitability.[5] Outside of censors and officialdom, the enterprise of review mostly took the form of scholarly society membership meetings, public lectures and discussions, and the casual exchange of letters. It was an uncodified and conversational process, often done within physical earshot of those being reviewed. In other words, much like the recommendation letter, this early form of peer review was invested in gauging social reception rather than any kind of objective content value.

But why this very brief jaunt through the prehistory of academic assessment?

For one, it highlights the fact that review methods have a long past rooted in the wider circle of public reception. We tend to think of assessment genres like the letter of recommendation and the reader's report as closed-circuit discourses. If circulation occurs, it occurs in a tight loop between editor and reviewer, or search committee and referee. Likewise, we tend to think of the methodology of assessment as content-based: reviewing the article, not its public performance, or reviewing the person-as-producer, not as social being. But it was not always thus. And, in fact, there are relatively recent examples that show how new the idea of this sort of blind external review really is. "Science and The Journal of the American Medical Association," Fitzpatrick reminds us, "did not vet manuscripts through outside reviewers until the 1940s."[6]

For another, history shows faculty assessment to be a dynamic genre closely shadowing (and reflecting) large-scale developments in academic culture and scholarship. As scholarship moved from the public lectern to the dissertation and the royal society journal, assessment methods also morphed. And as counting and the ratification of the scientific method became essential to state and academic discourse, so too did letters of recommendation and review become focused on measures like publication count and objective appraisals of hypothesis, method, and proof.[7]

Beyond Reaction: Peer Review’s (Slightly Briefer) Present

All of which is to say: there has never been a bedrock code of conduct for how peer review ought to function. If anything, the lesson from the past is that assessment functions best when we treat it less like a received script and more like a fluid genre of scholarship, itself in need of constant critique and continual updating.

This is the point Jonathan Eisen makes in a post titled "Stop deifying 'peer review' of journal publications." Eisen is reacting to a specific microbiological episode of a few years ago in which NASA scientists (who claimed in a Science paper to have discovered an arsenic-munching microbe) responded to public critique by retreating behind the bulwarks (the "sacred boundary") of disciplinary peer review. What Eisen points out, though, is that consecrating narrow versions of journal peer review, especially as a protection against valid multi-channel criticism, only helps to signal the brittleness of the practice. Rather, Eisen declares:

Peer review should be—and in fact with most scientists is—continuous. It should happen before, during and after the "peer review" that happens for a publication. Peer review happens at conferences – in hallways – in lab meetings – on the phone – on skype – on twitter – at arXiv – in the shower – in classes – in letters – and so on. Scientific findings need to be constantly evaluated – tested – reworked – critiqued – written about – discussed – blogged – tweeted – taught – made into art – presented to the public – turned inside out – and so on.

Ultimately, peer review's fluid past is a way of reframing the notion of peer review's seemingly revolutionary future. That is, if there's a revolution afoot, it's one that is completely in keeping with a fairly steady rate of change throughout its not-so-entrenched history. Developments like the pre-print archive (notable examples include arXiv and bioRxiv) or the emergence of mega-journals that emphasize open and participatory review, like PeerJ and the Open Library of Humanities, are not (or not simply) radical reactions to a failed review enterprise. They are instead permutations, bellwethers of the increasingly open and collaborative ways good scholarship gets done and wants to be counted.

Speaking of counting, it's crucial to at least mention in passing (and in closing) the continued importance of publication metrics, specifically the activity that's come to be known as bibliometrics (as well as altmetrics and article-level metrics). To borrow the logic of Clay Shirky's gracefully aging Web 2.0 bon mot ("It's not information overload. It's filter failure"), the challenge for publications that engage open or otherwise alternative forms of peer review (and even those that don't) is not necessarily legitimization per se, but how these publications plan to be counted. As we've seen, metrics (the act of finding viable proxies for scholarly quality) have always been a challenge, whether we're talking about the h-index or an impassioned slow-clap. In the case of changing peer review practices, the trend is decidedly toward more, not less, review. But, as Eisen discusses above, these new modes of review happen throughout the research process, and they happen in a disaggregated welter of places: pre-prints, data repositories, blogs, Twitter, news media, reference managers like Zotero and Mendeley, not to mention the journal itself. The explosion of review and review venues means that the pressure is on to find new ways of translating these kinds of assessments (back) into a coherent genre. And luckily there are significant efforts underway to do just this with tools and services like ImpactStory, Altmetric, and PLOS's Article-Level Metrics.

In the end, something in this socially-minded genre of open review and its attendant assessment tools feels like a healthy merger of the 17th-century German clap-o-meter and the content-based counting brought on by the Enlightenment university. The difference is that we now have much wider (and deeper) venues where review can exist and much more nuanced tools for gauging and aggregating these assessments. The challenge, as always, is to ensure that we continue to watch the watchers: that we not let any one mode of assessment dominate the field, and that we respect openness not only as an ethics of scholarship but also as an ethics of evaluation and counting.

