Workshop review process for ACL, COLING, EMNLP, and NAACL 2018

This guest post by the workshop chairs describes the process by which workshops were reviewed for COLING and the other major conferences in 2018 and how they were allocated.

For approximately the last 10 years, ACL, COLING, EMNLP, and NAACL have issued a joint call for workshops. While this adds effort and coordination for the conference organizers, it lets workshop organizers focus on putting together a strong program and helps to ensure a balanced set of offerings for attendees across the major conferences each year. Workshop proposals are submitted early in the year and specify which conference(s) they prefer or require. A committee composed of the workshop chairs of each conference then reviews the proposals, decides which to accept, and assigns each accepted workshop to a venue. This blog post explains how the process worked in 2018; it largely followed the guidance on the ACL wiki.

We began by gathering the workshop chairs in August 2017. At that time, workshop chairs from ACL (Brendan O’Connor, Eva Maria Vecchi), COLING (Tim Baldwin, Yoav Goldberg, Jing Jiang), and NAACL (Marie Meteer, Jason Williams) had been appointed, but EMNLP (which occurs last of the 4 events in 2018) had not. This group drafted the call for workshops, largely following previous calls.

The call was issued on August 31, 2017, and specified a due date of October 22, 2017. During those months, the workshop chairs from EMNLP were appointed (Marieke van Erp, Vincent Ng) and joined the committee, which now consisted of 9 people. We received a total of 58 workshop proposals.

We went into the review process with the following goals:

  • Ensure a high-quality workshop program across the conferences
  • Ensure that the topics are relevant to the research community
  • Avoid having topically very similar workshops at the same conference
  • For placing workshops in conferences, follow proposers’ preferences wherever possible, diverging only in cases of space limitations and/or substantial topical overlap

In addition to quality and relevance, it is worth noting here that space is an important consideration for workshops. Each conference has a fixed set of meeting rooms available for workshops, and the sizes of those rooms vary widely, with the smallest room holding 44 people, and the largest holding 500. We therefore made considerable effort to estimate the expected attendance at workshops (explained more below).

We started by having each proposal reviewed by 2 members of the committee, with most committee members reviewing around 15 proposals. To aid in the review process, we attempted to first categorize the workshop proposals, to help align proposals with areas of expertise on the committee. This categorization proved quite difficult because many proposals intentionally spanned several disciplines, but it did help identify proposals that were similar.

Our review form included the following questions:

  • Relevance: Is the topic of this workshop interesting for the NLP community?
  • Originality: Is the topic of this workshop original? (“no” not necessarily a bad thing)
  • Variety: Does the topic of this workshop add to the diversity of topics discussed in the NLP community? (“no” not necessarily a bad thing)
  • Quality of organizing team: Will the organisers be able to run a successful workshop?
  • Quality of program committee: Have the organisers drawn together a high-quality PC?
  • Quality of invited speakers (if any): Have high-quality, appropriate invited speaker(s) been identified by the organisers?
  • Quality of proposal: Is the topic of the workshop motivated and clearly explained?
  • Coherence: Is the topic of the workshop coherent?
  • Size (smaller size not necessarily a bad thing):
    • Number of previous attendees: Is there an indication of previous numbers of workshop attendees, and if so, what is that number?
    • Number of previous submissions: Is there an indication of previous numbers of submissions, and if so, what is that number?
    • Projected number of attendees: Is there an indication of projected numbers of workshop attendees, and if so, what is that number?
  • Recommendation: Final recommendation
  • Text comments to provide to proposers
  • Text comments for internal committee use

As was done last year, we also surveyed ACL members to seek input on which workshops people were likely to attend. We felt this survey would be useful in two respects. First, it gave us some additional signal on the relative attendance at each workshop (in addition to workshop organizers’ estimates), which helps assign workshops to appropriately sized rooms. Second, it gave us a rough signal about the interest level from the community. We recognized that results from this type of survey are almost certainly biased, and kept this in mind when interpreting them.

Before considering the bulk of the 58 submissions, we note that there are a handful of large, long-standing workshops which the ACL organization agrees to pre-admit, including *SEM, WMT, CoNLL, and SemEval. These were all placed at their first-choice venue.

We then dug into our main responsibility of making accept/reject and placement decisions for the bulk of proposals. In making these decisions, we took into account proposal preferences, our reviews, available space, and results from the survey. Although we operated as a joint committee, ultimately the workshop chairs for each conference took responsibility for workshops accepted to their conference.

We first examined space. These 4 conferences in 2018 each had between 8 and 14 rooms available over 2 days, with room capacities ranging from 40 to 500 people. The total space available nearly matched the number of proposals. Specifically, had all proposals been accepted, there would have been enough space for all but 3 proposals to be at their first-choice venue, and for the remaining 3 to be at their second choice.
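To make the space check concrete, here is a minimal sketch of how one might test whether a set of proposals fits their preferred venues. It is an illustration only, not the procedure the committee used: the function name `place_workshops`, the greedy first-come order, the idea of counting each venue's capacity as a number of workshop slots, and all of the toy data are assumptions.

```python
from collections import Counter


def place_workshops(preferences, slots):
    """Greedy placement: try each proposal's venues in preference order.

    preferences: dict mapping proposal name -> ordered list of venues
    slots: dict mapping venue -> number of workshop slots available
    Returns a dict mapping proposal name -> (venue, choice rank), or None
    when none of the proposal's listed venues has a free slot.
    """
    remaining = dict(slots)  # copy so the caller's dict is not mutated
    placement = {}
    for proposal, venues in preferences.items():
        placement[proposal] = None
        for rank, venue in enumerate(venues, start=1):
            if remaining.get(venue, 0) > 0:
                remaining[venue] -= 1
                placement[proposal] = (venue, rank)
                break
    return placement


# Hypothetical toy data, not the real 2018 proposals or room counts.
slots = {"ACL": 1, "COLING": 2}
preferences = {
    "ws_a": ["ACL", "COLING"],
    "ws_b": ["ACL", "COLING"],
    "ws_c": ["COLING"],
}

placement = place_workshops(preferences, slots)
# Count how many proposals landed on their 1st, 2nd, ... choice.
rank_counts = Counter(p[1] for p in placement.values() if p is not None)
print(placement)
print(rank_counts)
```

A greedy pass like this depends on the order in which proposals are considered, and it ignores room size and topical overlap, both of which mattered in the actual decisions.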

Considering the reviews, the 2 reviews per proposal were very low-variance: about ⅔ of the final recommendations were identical, and the remaining ⅓ differed by 1 point on a 4-point scale. Overall, we were very impressed by the quality of the proposals, which covered a broad range of topics with strong organizing committees, reviewers, and invited speakers. None of the reviewers recommended 1 (clear reject) for any proposal. Further, the survey results for most borderline proposals showed reasonable interest from the community.
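The agreement figures above can be computed with a few lines of code. The sketch below is only an illustration: the function name `agreement_summary` is made up for this example, and the score pairs are hypothetical, chosen to mimic the pattern described rather than taken from the actual reviews.

```python
def agreement_summary(score_pairs):
    """Summarize agreement between two reviewers' recommendations.

    score_pairs: list of (score_1, score_2) tuples on the 4-point scale.
    Returns (fraction identical, fraction differing by exactly 1, max gap).
    """
    if not score_pairs:
        raise ValueError("need at least one pair of scores")
    gaps = [abs(a - b) for a, b in score_pairs]
    n = len(gaps)
    identical = sum(g == 0 for g in gaps) / n
    off_by_one = sum(g == 1 for g in gaps) / n
    return identical, off_by_one, max(gaps)


# Hypothetical recommendations for six proposals (not the real review data).
pairs = [(4, 4), (3, 3), (3, 4), (2, 2), (4, 3), (3, 3)]
identical, off_by_one, max_gap = agreement_summary(pairs)
print(f"identical: {identical:.2f}, off by one: {off_by_one:.2f}, max gap: {max_gap}")
```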

We also considered topicality. Here we found 5 pairs of workshops that requested the same conference as their first choice and were topically very similar. In four of the pairs, we assigned one workshop to its second-choice conference. In the final pair, in light of all the factors listed above, one workshop was rejected.

In summary, of the 58 proposals, 53 workshops were accepted to their first-choice conference; 4 were accepted to their second-choice conference; and 1 was rejected.

For the general chairs of *ACL conferences next year, we would definitely recommend continuing to organize a similarly large number of workshop rooms. For workshop chairs, we stress that reviewing and selecting workshops is qualitatively different from reviewing and selecting papers; for this reason, we recommend reviewing the proposals among the committee rather than recruiting reviewers (as the previous year's workshop chairs also pointed out). We would also suggest that workshop chairs consider using a structured form for workshop submissions, since a fair amount of manual effort was required to extract structured data from each proposal document.


For ACL:
Brendan O’Connor, University of Massachusetts Amherst
Eva Maria Vecchi, University of Cambridge

For COLING:
Tim Baldwin, University of Melbourne
Yoav Goldberg, Bar Ilan University
Jing Jiang, Singapore Management University

For NAACL:
Marie Meteer, Brandeis University
Jason Williams, Microsoft Research

For EMNLP:
Marieke van Erp, KNAW Humanities Cluster
Vincent Ng, University of Texas at Dallas

