Untangling biases and nuances in double-blind peer review at scale

It’s important to get reviewing right, and remove as many biases as we can. We had a discussion about how to do this in COLING, presented in this blog post in interview format. The participants are the program co-chairs, Emily M. Bender and Leon Derczynski.

LD: How do you feel about blindness in the review process? It could be great for us to have blindness in a few regards. I’ll start with the most important to me. First, reviewers do not see author identities. Next, reviewers do not see each other’s identities. Most people would adjust their own review to align with e.g. Chris Manning’s (sounds terribly boring for him if this happens!). Third, area chairs do not see author identities. Finally, area chairs do not see reviewer identities in connection to their reviews, or a paper. But I don’t know how much of this is possible within the confines of conference management. The last seems the most risky; but reviewer identities being hidden from each other seems like a no-brainer. What do you think?

Reviewers blind from each other

EMB: It looks like we have a healthy difference of opinion here 🙂 Absolutely, reviewers should not see author identities. With them not seeing each other’s identities, I disagree. I think the inter-reviewer discussion tends to go better if people know who they are talking to. Perhaps we can get the software to track the score changes and ask the ACs to be on guard for bigwigs dragging others to their opinions?

LD: Alright, we can try that; but after reading that report from URoch, how would you expect PhD students/postdocs/asst profs to have reacted around a review of Florian Jaeger’s, if they’d had or intended to have any connection with his lab? On the other side, I hear a lot from people unwilling to go against big names, because they’ll look silly. So my perception of this is that discussion goes worse when people know who they’re contradicting—though reviews might end up being more civil, too. I still think big names distort reviews here despite getting reviewing wrong just as often as the small names, so having reviewers know who each other are makes for less fair reviewing.

EMB: I wonder to what extent we’ll have ‘big names’ among our reviewers. I wonder if we can get the best of both worlds though by revealing all reviewers names to each other only after the decisions are out. So people will be on good behavior in the discussions (and reviews) knowing that they’ll be associated with their remarks eventually, but won’t be swayed by big names during the process?

LD: Yes, let’s do this. OK, what about hiding authors from area chairs?

Authors and ACs

EMB: I think hiding author identities from ACs is a good idea, but we still need to handle conflicts-of-interest somehow. And the cases where reviewers think that the authors should be citing X previous work when X is actually the author’s. Maybe we can have some of the small team of “roving” ACs doing that work? I’m not sure how they can handle all COI checking though.

LD: Ah, that’s tough. I don’t know too much about how the COI process typically works from the AC side, so I can’t comment here. If we agree on the intention—that author identities should ideally be hidden from ACs—we can make the problem better-defined and share it with the community, so some development happens.

EMB: Right. Having ACs be blind to authors is also being discussed in other places in the field, so we might be able to follow in their footsteps.

Reviewers and ACs

LD: So how about reviewer identities being hidden from ACs?

EMB: I disagree again about area chairs not seeing reviewer identities next to their reviews. While a paper should be evaluated solely on its merits, I don’t think we can rely on the reviewers to get absolutely everything into their reviews. And so having the AC know who’s writing which review can provide helpful context.

LD: I suppose we are choosing ACs we hope will be strong and authoritative about their domain. Do you agree there’s a risk of a bias here? I’m not convinced that knowing a reviewer’s identity helps so much—all humans make mistakes with great reliability (else annotation would be easier), and so what we really see is random effect magnification/minimization depending on the AC’s knowledge of a particular reviewer, where a given review’s quality varies on its own.

EMB: True, but/and it’s even more complex: The AC can only directly detect some aspects of review quality (is it thorough? helpful?) but doesn’t necessarily have the ability to tell whether it’s accurate. Also—how are the ACs supposed to do the allocation of reviewers to papers, and do things like make sure those with more linguistic expertise are evenly distributed, if they don’t know who the reviewers are?

LD: My concern is that ACs will have bias about which reviewers are “reliable” (and anyway, no reviewer is 100% reliable). However, in the interest of simplicity: we’ve already taken steps to ensure that we have a varied, balanced AC pool this iteration, which I hope will reduce the effect of AC:reviewer bias when compared to conferences with mostly static AC pools. And the problem of allocating reviews to papers remains unsettled.

EMB: Right. Maybe we’re making enough changes this year?

LD: Right.

Resource papers

LD: An addendum: this kind of blindness may prove impossible for resource-type papers, where author anonymity may become an optionally relaxable constraint.

EMB: Well, I think people should at least go through the motions.

LD: Sure—this makes life easier, too. As long as authors aren’t torn apart during review because someone can guess the authors behind a resource.

EMB: Good point. I’ll make a note in our draft AC duties document.

Reviewing style

LD: I want to bring up review style, as well. To nudge reviewers towards good reviewing style, I’d like reviewers to have the option of signing their reviews, with signatures available to authors at notification only. The reviewer identity would not be attached to a specific review, but rather general, in the form “Reviewers of this paper included: Natalie Schluter.” We know adversarial reviewing drops when reviewer identity is known, and I’d love to see CS, a discipline known for nasty reviews, begin to move in a positive direction. Indeed, as PC co-chairs of a CS-related conference, I feel we in particular have a duty to address this problem. My hope is that I can write a script to add this information, if we do it.

EMB: If the reviewers are opting in, perhaps it makes more sense for them to claim their own reviews. If I think one of my co-reviewers was a jerk, I would be less inclined to put my name to the group of reviews.

LD: That’s an interesting point. Nevertheless I’d like us to make progress on this front. In some time-rich utopia it might make sense to have the reviewers all agree whether or not to sign all three, and only have their identities revealed to each other after that—but we don’t have time. How about, reviews may be signed, but only at the point notifications are sent out? This prevents reviewers knowing who each other is, and lets those who want to hide, do so—as well as protecting us all from the collateral damage that results from jerk reviewers.

This could work with a checkbox (“Sign my review with my name in the final author notification”), and the rest’s scripted in Softconf.

EMB: So how about option to sign for author’s view (the checkbox) + all reviewers revealed to each other once the decisions are done?

LD: Good, let’s do that. Reviewer identities are hidden from each other during the process, and revealed later; and reviewers have the option to sign their review via a checkbox in Softconf.

EMB: Great.
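
As a rough illustration of the rule we settled on, here is a minimal sketch of who could see a reviewer's name at each stage. It is not Softconf code and does not use any Softconf API; the names `Viewer`, `Review`, `sign_for_authors`, `can_see_reviewer_name` and `signatures_for_authors` are hypothetical, and it assumes that area chairs see reviewer identities throughout, since we left that part unchanged.

```python
from dataclasses import dataclass
from enum import Enum


class Viewer(Enum):
    AUTHOR = "author"
    CO_REVIEWER = "co_reviewer"
    AREA_CHAIR = "area_chair"


@dataclass(frozen=True)
class Review:
    reviewer: str            # reviewer's name
    sign_for_authors: bool   # state of the (hypothetical) opt-in checkbox


def can_see_reviewer_name(review, viewer, decisions_sent):
    """Return True if `viewer` may see who wrote `review`."""
    if viewer is Viewer.AREA_CHAIR:
        # Assumption: ACs know reviewer identities throughout the process.
        return True
    if not decisions_sent:
        # Hidden from co-reviewers and authors while reviewing is under way.
        return False
    if viewer is Viewer.CO_REVIEWER:
        # Reviewers are revealed to each other once decisions are out.
        return True
    # Authors see a name only if the reviewer ticked the checkbox.
    return review.sign_for_authors


def signatures_for_authors(reviews, decisions_sent):
    """Names to list in the author notification, not tied to specific reviews."""
    return sorted(
        r.reviewer
        for r in reviews
        if can_see_reviewer_name(r, Viewer.AUTHOR, decisions_sent)
    )
```

With two reviews where only the first reviewer opted in, `signatures_for_authors` returns an empty list before decisions are sent and only the first reviewer's name afterwards. Sorting the names is one way to avoid linking a signature to a particular review by its position.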

Questions

What do you think? What would you change about the double-blind process?

17 thoughts on “Untangling biases and nuances in double-blind peer review at scale”

  1. Thanks for the thoughtful and transparent discussion! Regarding signed reviews: what exactly is the incentive for a reviewer to opt in? I could imagine that reviewers who are senior would want to disclose their identity to make it seem more authoritative—in the worst case it could discourage them from putting in the effort to write a thorough review! I could also imagine that non-senior reviewers would only opt in if the review was positive. If a paper has N positive reviews and N signatures, it will often be easy to guess who wrote the positive reviews, but I don’t know if that is valuable to the authors in any way.

    This also interacts with the question of whether the program committee is published. If a paper is about a niche topic that only a few PC members have expertise with, authors can presume that they reviewed the paper, and if positive reviewers have opted in, the negative reviewers can be inferred.

    • Hi Nathan, good questions. If you’ve written a review you’re proud of, or content to stand by, you can now sign it; that’s all. Conversely, if you’re not proud of a review to the point of not wanting to sign it, maybe the review should be re-thought. It’s possible to think of situations where one is proud of one’s review but would rather withhold their identity – that’s why the process is ostensibly double-blind, after all – and we don’t want to police that, so the setup is self-regulated.

      Yes, it’s possible to guess reviewer identity, though the last data I saw on this showed that reviewers greatly overestimated their real success in doing so. And of course, reducing the search space of potential reviewer identities when trying to work out who did a bad (i.e. aggressive or superficial) job is not completely awful.

      Reflecting on these points, I guess there’s also the chance that the identity opt-in works as what Barbara Plank might term “serendipitous data”. We may find that those reviewers who opt-in write longer reviews, have more sway over a paper’s final outcome, for example; it’d be interesting to see how well this corresponds to reviewer quality.

      • Thank you for your comments — I think one thing we could be clearer about is what is signed when, and for whom. The proposal in the blog post (at least my understanding!) was that the reviews would be anonymous to the author & to the other reviewers until the decisions were made (but known to the AC). At the end of the process (post-decision), reviewer names would be revealed to co-reviewers, and, if a reviewer so chose, to the author.

  2. Some thoughts:

    Reviewers anonymous to one another. I don’t know if it matters so much. As a reviewer, I always read the other reviewers’ comments to see if I made an error or missed something. But I usually don’t initiate discussions, and when I engage in discussions, I often intend my comments to be more for the area chairs than for the other reviewers. In general, I think there’s too much desire in our community to get reviewers to agree with each other. Difference of opinion is legitimate; I recall at least one case as area chair where reviewers had genuine disagreements about the merits of a paper, and there was no point in continuing the discussion among them. So yes, read other reviews as a check, but don’t try too hard to influence or be influenced by other reviewers. When a paper is controversial, it should be left as a judgment call for the area chairs and program chairs.

    Authors anonymous to area chairs. I think it’s a good idea: as area chair I don’t think I paid much attention to the identities of authors, and the only use I had for this information was in identifying conflicts of interest. But even those I typically wasn’t able to identify myself; rather, a reviewer would suspect that a particular paper posed a conflict of interest, and then I’d check the authors to verify if this was indeed the case. So perhaps authors can be hidden from ACs by default, but could be revealed if needed.

    Reviewers anonymous to area chairs. I think this is not workable. First of all, review assignment would be very difficult, and reviewers would end up getting papers that are not as well matched. Also, communication among reviewers and ACs often goes through side channels (for example when there’s an issue with START), breaking anonymity. But most importantly, I think knowing the identity of the reviewer helps the AC contextualize the review, and thus works as a check against reviewer bias.

    Signing reviews. It would be interesting to hear why CS is characterized as “a discipline known for nasty reviews”, and why the program chairs feel that there’s a problem that needs addressing. I don’t feel that I’m on the receiving end of a lot of nastiness; when I write reviews I try to find the positive aspects of a paper, but I’m also direct when identifying problems, and I hope this directness is not perceived as nasty. What bothers me a lot more than perceived nastiness is when I receive a review that appears unprofessional, especially if I get the impression that a reviewer did not bother to read or think deeply about my paper.

    At any rate, I see limited value in revealing reviewers’ names to the authors; I guess I can’t imagine what authors would do with this information. What would be more interesting is the (obligatory) publication of names of accepting referees. Of course I can’t take credit for this idea: it was put forward by Geoffrey K. Pullum, Stalking the perfect journal, Natural Language and Linguistic Theory 2(2):261–267, 1984. I think it’s worth a try.

    • Thanks for your thoughts. I agree that convergence isn’t a must at all. And it’s bound to be easier to avoid being over-influenced by unconscious biases if you don’t even know who the other reviewers are.

      Almost all the data I have about CS reviewers being rough is anecdotal, but there were a few threads of this in CACM and around NSF reviewing; (a) https://cacm.acm.org/blogs/blog-cacm/123611-the-nastiness-problem-in-computer-science/fulltext and (b) https://cacm.acm.org/blogs/blog-cacm/134743-yes-computer-scientists-are-hypercritical/fulltext . Evidence indicates that signed reviews tend to be more thorough and have less aggressive language – hence signing’s being mooted as a measure here. It’s not driven by openness, but by quality, because while the former is nice, the latter is paramount to the paper curation process.

      That’s an interesting idea from Geoffrey Pullum – thanks for the reference. I note that review quality is the very first argument made there for having reviews signed. Indeed, it’s interesting to see how much progress there has been in the 33 years since this article raised these issues.

      • Does the evidence you cited come from opt-in signed reviewing? If it wasn’t opt-in I would expect that reviewers might feel compelled to put more effort in because they know they can’t hide.

        One incentive to consider: Promise to recognize excellent reviewers, and stipulate that reviewers must sign at least one review in order to qualify.

        Another experiment that would be interesting (not sure if it’s been done): ask authors when they submit reviews to self-assess thoroughness. (This is not necessarily the same thing as confidence: one could have low confidence because a paper is not within one’s area of expertise, yet still read it carefully and give thoughtful feedback.) I would guess that a) stressed reviewers are aware that their reviews aren’t particularly thorough, and b) knowing that self-assessed thoroughness will be taken into account by ACs will incentivize some reviewers to put in more effort.

        • The main reason for keeping reviewers anonymous is to allow them to be critical without fear of repercussion. And because each reviewer gets a tiny sample of papers to review, there’s a non-negligible chance that they get a batch of not great papers. Requiring reviewers to sign one review in order to get recognized creates a perverse incentive to give a positive review to at least one paper in the batch even when none deserve it — which is the exact opposite of encouraging quality reviews.

          I think Pullum’s point about publishing the names of accepting referees is to put some pressure against giving favorable reviews to shoddy work. This might be less of a problem at a conference like COLING, where historical acceptance rates are fairly low to begin with, and the problem is more rejection of good work than acceptance of bad work.

          I agree that reviewer stress and time compression is a big problem (I definitely suffer from it), and I think the only solution is less reviewing. This requires a community-wide effort and cannot be handled on a conference-by-conference basis. Unfortunately the trend is for each conference to place more and more burdens on reviewers, so a person’s only recourse is to opt out of program committees.

  3. I strongly agree with Ron’s comments on discussion among reviewers. It should be for elaboration and clarification, to help the area chair make the best decision, and not for reaching consensus. My experience with the ‘reaching consensus’ approach is that consensus is often reached when one side gives up for one reason or another (and often not necessarily the right one).
    As for signed reviews, in a way similar to Pullum’s argument, I can think of one scenario where signing will definitely help the quality of reviews: when a reviewer is making an innovative and original contribution to the research. A signed review of an accepted paper in this context will allow the idea to be properly acknowledged and could even lead to productive collaboration between the authors and the reviewer. This option may be crucial in persuading the reviewer to make truly helpful comments stimulated in the process of reviewing (instead of hiding them for his/her next paper).
    Lastly, why is the anonymity of submission assumed? In the time of arXiv, with so much information available online, anonymity is pretty much impossible to maintain. Strictly speaking, any submission that is already deposited in arXiv or available in some form online is no longer anonymous. Should we reject all these papers? Doesn’t this anonymity only protect potential plagiarism and double submissions, as the reviewer cannot double-check? Doesn’t this anonymity also facilitate the anecdotal claims that reviewers stole ideas from reviewed papers? [At least, if the reviewer likes an idea of a paper that is not accepted and uses it later, s/he is obliged to cite/credit the authors if s/he knows the names.] And don’t we all agree that journal reviews generally have higher quality, and perhaps this is because many of them are single-blind?

    • Why is the anonymity of submission assumed? Well, one credits the reviewer with some scruples, and trusts they are content to regulate themself well by writing the review without peeking. I know I don’t go looking up the paper on arXiv when I get it to review – neither the inclination nor time to do it. Additionally, the reported unconscious bias has been that papers from known names/labs are more likely to get positive reviews, without an opposing effect. So we’re not guaranteed trouble if people spend time stalking arXiv to look up authors; and it’s not a complete disaster if the reviewer happens to have stumbled on the paper before the review process, because they will – being a good honest reviewer – either have helpful feedback, or realize they might not write a fair review due to their privileged knowledge, and declare a COI during paper allocation.

      I don’t think that single-blind is a causal factor of better quality in journal reviews. That’s an interesting notion. Is there anything more to read on that?

      • I am afraid I find the argument for anonymity circular. If ultimately we have to rely on the scruples of the reviewers, then it does not matter what system we adopt. I personally have more trust in the system when the authors’ names are not hidden. This means that when a biased act happens, the reviewers have to commit it knowing the names of the authors. It is typically considered much harder to commit a biased act when the victims are known. In addition, a bias committed when the reviewer knows/surmises the identity of the author, or is simply territorial and tries to block all competitors, is much easier to prove when the review is not blind. Anonymity in the face of arXiv and other online data creates many more opportunities for reviewers to err and damage trust in the system. We can trust the scrupulous reviewers, but can we assure that 100% of reviewers are scrupulous? Can we trust the avid posters on arXiv etc. to hold the same standard under anonymity when it damages their chances? I am afraid I see the anonymity exemption for arXiv etc., while all papers there are fully citable, as an oxymoron and greatly damaging to trust in the process.

        • Perhaps I should make my position clearer. Previously, I was quite open to all possible formats of reviewing and considered that each journal/conference could choose what it considers to be the most appropriate, as long as it is clearly stated and carefully implemented. However, given the popularity of arXiv and many academic social media, I now see the anonymity of submission as an untenable position unless we want to disqualify all these ‘previously published’ papers. It seems that we don’t. Hence I think the only viable solution is to move to non-anonymous submission, which will in fact save the organizers a lot of aggravation.

        • Thank you for engaging in discussion with us!

          I strongly disagree with one of your premises, though — while there are overt/conscious acts of bias, there is also unconscious bias (which tends to favor well-known researchers, well-known labs, and people in dominant demographics). While we can, at least to a certain extent, rely on the scruples of reviewers to behave accordingly if we ask them not to go looking for preprints on arXiv or elsewhere, even the most well-meaning reviewers can’t effectively account for unconscious bias.

  4. Very nice discussion, thanks! I agree with many things that have been said here and in the comments. Some points I don’t think have been raised.

    Re reviewers blind from each other: I’ve heard from several people (at least four) that their willingness to disagree with a powerful reviewer (e.g. someone senior who might review their grant proposals at some point) is often quite close to zero when they know the powerful reviewer will know who they are. FWIW, all those who have mentioned this to me come from underrepresented/historically-excluded populations in the NLP/CL/ML community.

    Re signing reviews: I’m not entirely sure what the motivation is here. It’s true I’m more likely to sign my review if I’m confident in it (and probably if it’s positive [and probably also if I’m in some position of authority]) but the general argument seems to get the causality backwards here. In general I’m not a huge fan of “opt in” things because people will naturally opt in iff it benefits them in some way, and I don’t think benefiting reviewers is what you’re trying to solve here.

    One related question, not exactly on blindness, is the question of whether reviews are made public for accepted papers. I personally think this is a really nice practice.

    • If the author was diligent in addressing reviewer comments, then parts of the review will be irrelevant in conjunction with the published paper. What’s the point of publishing those? Is there value to the (short) editing history of the paper?

      Maybe edited reviews can be useful, but this puts yet another burden on reviewers.

      • I’m curious about the answers here, too! Maybe it’s useful for the community to see what the reviewers thought the merits of the paper were/what they were skeptical about?

    • Thanks, Hal. I hadn’t yet encountered the motivation you cite for keeping reviewers anonymous to each other. (I’ve always been personally irked when that happens, because I like to know who I’m talking to!) But that is a really important angle to consider.

      I imagine that our proposal (reviewers’ names revealed to each other at the end) won’t help in this case, because someone who feels that way will likely worry that the person they’re disagreeing with is powerful. Would you agree?

      Re benefitting reviewers — in a sense we are trying to figure out ways to benefit reviewers, to entice better reviewing/more effort out of them, because they get ‘more’ for it. But I can see that maybe this isn’t an effective move in that direction.

  5. Yes, indeed a very comprehensive discussion on whether reviewers’ identities should be disclosed. It seems to me that the differing perspectives very often depend on implementation and on the give and take of benefits. There is, however, less discussion of the blindness of submissions. This, in my view, is currently a very serious threat to the future of our field. Let’s start with a very simple principle of academic ethics that I hope most of us can agree on:

    A citable paper is not anonymous.

    Based on this, the logical conclusion is that anonymous review is incompatible with acceptance of papers posted on arXiv, other social media, etc. Please think logically; making the wrong decision on this will seriously endanger CL’s reputation as a field.
