So far, there have been many things to measure of our review process at COLING. Here are a few.
Firstly, it’s interesting to see how many reviewers recommend the authors cite them. We can’t evaluate how appropriate this is, but it happened in 68 out of 2806 reviews (2.4%).
Best paper nominations are quite rare in general. This gives very little signal for the best paper committee to work with. To gain more information, in addition to asking whether a paper warranted further recognition, we asked reviewers to say if a given paper was the best out of those they had reviewed. This worked well for 747 reviewers, but 274 reviewers (26.8%) said no paper they reviewed was the best of their reviewing allocation.
Mean scores and confidence can be broken down by type, as follows.
|Computationally-aided linguistic analysis||2.85||3.42|
|NLP engineering experiment paper||2.86||3.51|
We can see that reviewers were least confident with position papers, and were both most confident and most pleased with survey papers—though reproduction papers came in a close second in regard to mean score. This fits the general expectation that position papers are hard to evaluate.
The overall distribution of scores follows.