As we stated at the outset, one of our goals for COLING 2018 has been “to create a program of high quality papers which represent diverse approaches to and applications of computational linguistics written and presented by researchers from throughout our international community”. One aspect of the COLING 2018 review process that we designed with this goal in mind was the enumeration of six different paper types, each with its own tailored review form. We first proposed an initial set of five paper types, and then added a sixth and revised the review forms in light of community input. The final set of paper types and review forms can be found here. In this blog post, we report back on this aspect of COLING 2018, both quantitatively and qualitatively.
Submission and acceptance statistics
The first challenge was to recruit papers of the less common types. Most papers published at NLP venues fit either our “NLP Engineering Experiment” or our “Resources” paper type. The table below shows how many papers of each type were submitted, withdrawn, and accepted, as well as the acceptance rate per paper type. (The “withdrawn” numbers are included because withdrawn papers are excluded from the denominator of the acceptance rate, as discussed here.)
Not surprisingly, the “NLP Engineering Experiment” paper type accounted for more than half of the submissions, but we are pleased that the other paper types are also represented. We hope that if this strategy is taken up in future COLINGs (or other venues), it will continue to gain traction and the minority paper types will become more popular.
These statistics all represent the paper type chosen by the authors at submission time, not necessarily how we would have classified the papers. More discussion on this point below.
Author survey on paper types
As described in our post on our author survey, the feedback on the paper types from authors was fairly positive:
We wanted to find out if people were aware of the paper types (since this is relatively unusual in our field) before submitting their papers, and if so, how they found out. Most—349 (80.4%)—were aware of the paper types ahead of time. Of these, the vast majority (93.4%) found out about the paper types via the Call for Papers. Otherwise, people found out because someone else told them (7.4%), via our Twitter or Facebook feeds (6.0%), or via our blog (3.7%).
We also asked whether it was clear to authors which paper type was appropriate for their paper and whether they think paper types are a good idea. The answers in both cases were strongly positive: 78.8% said it was clear and 91.0% said it was a good idea. (Interestingly, 74 people who said it wasn’t clear which paper type was a good fit for their paper nonetheless said paper types were a good idea, and 21 people who thought it was clear which type fit nonetheless said they weren’t a good idea.)
What we can’t know from that survey is whether, or to what extent, we failed to reach people who would have submitted, say, a survey paper or reproduction paper, had they only known we were specifically soliciting them.
Reviewer survey on paper types
We also carried out a survey of our reviewers. This one went out with more of a delay (on 25 May, though reviews were due 10 April), and as some respondents pointed out, we might have gotten more accurate answers had we asked sooner. But there was plenty else to worry about in the interim! The response rate was also relatively low: only 128 of our 1200+ reviewers answered the survey. With those caveats, here are some results. (No question was required, so the answers don’t sum to 100%.)
- We asked: “Did you feel like the authors chose the appropriate paper type for their papers?” 69.5% chose “Yes, all of them”, 26.6% “Only some of them”, and 0.8% (just one respondent), “No, none of them.”
- We asked: “For papers that were assigned to what you thought was the correct paper type, did you feel that the review form questions helped you evaluate papers of that type?” 29.7% chose “Yes, better than usual for conferences/better than expected”, 57% “Yes, about as usual/about as expected”, 6.3% “No, worse than usual/worse than expected”, and 1.6% “No, the review forms were poorly designed”.
- We asked: “For papers that were assigned to what you thought was an incorrect paper type, how problematic was the mismatch?” 36.7% chose the first option, 21.9% “Not so bad, even the numerical questions were still somewhat relevant”, and no one chose “Pretty bad, I could only say useful things in the comments” or “Terrible, I felt like I couldn’t fairly evaluate the paper.” (58.6% chose “other”, but this was mostly people who didn’t have any mismatches.)
Our takeaway is that, at least for the reviewers who responded to the survey, the differentiated review forms for different paper types were on balance a plus—that is, they helped more than they hurt.
How to handle papers submitted under the wrong type?
Some misclassified papers were easy to spot. We turned them up early in the process while browsing the non-NLP-engineering-experiment paper types (since we were interested to see what was coming in). Similarly, ACs and reviewers noted many obvious type mismatches. However, we decided against reclassifying papers. The primary reason is that, while some papers were clearly mistyped, many others were not such clear cases, and it would have been impossible to go through all the papers, reconsider their types, and do so consistently. Furthermore, the point of the paper types was to let authors choose which questions reviewers would be answering about their papers. Second-guessing that choice seemed unfair and non-transparent.
Perhaps the most common clear cases of mistyped papers were papers we considered NLP engineering experiment (NLPEE) papers that were submitted as computationally aided linguistic analysis (CALA) papers. We have a few hypotheses about why that might have happened, not mutually exclusive:
(1) Design factors. CALA was listed first on the paper types page; people read it, thought it matched, and looked no further. (Though in the dropdown menu for this question in the submission form on START, NLPEE is first.)
(2) Terminological prejudice. People were put off by “engineering” in the name of NLPEE. We’ve definitely heard some objections to that term from colleagues who take “engineering” to be derogatory. But we do not see it that way at all! Engineering research is research. Indeed, a lack of attention to good engineering in our computational experiments makes the science suffer. Furthermore, research contributions focused on building something and then testing how well it works seem to us to be well characterized by the term “engineering experiment”. It’s worth noting that we did struggle to come up with a name for this paper type, in large part because it is so ubiquitous, but we couldn’t very well call it “typical NLP paper”. In our discussions, Leon proposed a name involving the word “empirical”, but Emily objected strongly to that: linguistic analysis papers that investigate patterns in language use to better understand linguistic structure or language behavior are very much empirical too.
(3) Interdisciplinary misunderstanding. Perhaps people working on NLPEE-type work from more of an ML background don’t understand the term “linguistic analysis” or “linguistic phenomenon” as we intended it. The CALA paper type was described as follows:
The focus of this paper type is new linguistic insight. It might take the form of an empirical study of some linguistic phenomenon, or of a theoretical result about a linguistically-relevant formal system.
It’s entirely possible that someone without training in linguistics would not know what terms like “linguistic phenomenon” or “formal system” denote for linguists. This speaks to the need for more interdisciplinary communication in our field, and we hope that COLING 2018 will continue the COLING tradition of providing such a venue!