Q&A best practices: Introduce yourself

At COLING 2018, we’ll be asking question askers in the Q&A sessions to introduce themselves (name and affiliation) before asking their questions, because we’d like to see this practice spread as a norm in the community. When our field was smaller, it may have been the case that most everyone knew everyone else on sight and could recognize each other’s voices. That’s surely not true now!

We want to emphasize that this advice is for everyone, regardless of whether you expect most people to know who you are. The speaker whose paper you’re asking a question about might well have heard of you, but not recognize you on sight. And the speaker might appreciate the chance to follow up with you later! Likewise, people in the audience appreciate knowing who is asking questions. Even if you’re pretty sure everyone in the audience knows who you are, it’s still important: perhaps not everyone can see you. Furthermore, if the more well-known speakers adopt this practice, it makes it more comfortable for less-established scholars to do so.

Presenting your academic work at a conference – applicable tips and advice

by Nanna Inie, researcher and practitioner in Digital Design at Aarhus University.

Everybody likes a good conference presentation. It is your chance to catch the attention of a room full of peers who might cite your work and help spread it for you. Whether you are giving a talk or presenting a poster, you have the opportunity to spark the interest of those conference attendees who happened to be there only because you were co-located with a talk or poster they actually came to see.

Perhaps even more importantly, you also take on the risk of boring the living hell out of those attendees who chose to give you a little of their most valuable resources: their time and attention. That is something to be respectful of.

In this post I would like to share some applicable tips for presentations – both talks and posters. While academic communication is certainly a distinct genre with its own merits, and conference talks should not aim to be TED talks or advertising campaigns, there are some key rhetorical and visual strategies that would improve most academic presentations. After all, your goal is to convince your audience that your work is trustworthy, thorough, and above all interesting, so anything that makes that conclusion easier for them is a win for you.

PRESENTING YOUR WORK WITH TALKS AND SLIDES
Structure is everything. The general rule is to keep the introduction to 10% of your speech, the content to 80%, and the conclusion to 10%. The next general rule is to relate everything you include in your speech to one single purpose. That does not mean you cannot tell stories about your data or your experiments – but only if they contribute to exemplifying the point you are trying to make.
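To make the 10/80/10 rule concrete, here is a quick back-of-the-envelope calculation; the 15-minute slot is just an assumed example length, not a figure from this post:

```python
# Budget a talk using the 10/80/10 rule of thumb:
# introduction 10%, content 80%, conclusion 10%.
def budget(total_minutes, split=(0.10, 0.80, 0.10)):
    """Return minutes per talk section for a given total length."""
    parts = ("introduction", "content", "conclusion")
    return {part: total_minutes * share for part, share in zip(parts, split)}

# For an assumed 15-minute conference slot:
print(budget(15))  # {'introduction': 1.5, 'content': 12.0, 'conclusion': 1.5}
```

In other words, for a typical conference slot you get only a minute or two of introduction before the content must start.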

As an academic, you are lucky enough to have already written the content you are trying to communicate – but your presentation should not just be a summary of your paper. It should be better – more appetizing – because you have the time to focus on the really juicy parts of your results. Unless related work is literally a cornerstone of your contribution, ask whether it is necessary for your point. Ask of every single piece of information you put in your talk: would the talk suffer if I took this out? What does it contribute to the purpose? And is there a way I could amplify its contribution to the point – even if that means repeating to your audience why you are including this information?

Your first step is to decide which one core message you would like the audience to leave your talk with. If they forget everything else, what is the one sentence you want them to remember (apart from “cite my work”)? It’s probably in your conclusion somewhere, but you might find it in the discussion, the results, or even your research question. Once you have decided what the main purpose of your talk is, you can start building the talk around it. Here are 5 applicable tips for how to get your message across in a confident, convincing way:

1. Use crescendo. Keep the audience’s attention throughout your speech by building to a climax, rather than peaking too soon. Conference talks are quite short, and this works to your advantage: the audience barely has time to get bored. If you pique their interest early, it barely has time to fade before your point is made (NB: starting off by presenting related work is not piquing your audience’s interest!). Once you have made your point, end quickly. For a short example of a talk that does this brilliantly, I refer you to the TED talk “How to start a movement” by Derek Sivers (https://www.ted.com/talks/derek_sivers_how_to_start_a_movement). In it, Sivers manages to tell his story, exemplified by a video recording playing in real time, and it works perfectly as a crescendo, building the audience’s curiosity about “What on earth is this going to lead to?”. Once the video has ended, he recaps his points using no slides, ending quickly thereafter.

2. Pick a narrative structure. This will make your speech more memorable to your audience. Here are 3 of the most common speech narratives (there are many forms, but these are applicable to most academic presentations):

The Tower Structure: This is how many academic talks are structured. You use bits and pieces of information (each interesting to the audience) to build your argument. Once the tower is complete, you can show the audience the power of the argument as a whole.

Mystery Structure: The mystery structure is about presenting a problem or question to your audience that they are desperate to know the answer to. You want to keep them in the dark throughout your talk for this structure, not revealing the answer until the very end. You might present hints and clues to the solution along the way, including the audience in your journey towards your crucial message.

Ping Pong Structure: The ping pong structure is ideal if you expect the audience to contradict your point. In this structure you present both sides of the argument, one after another, in such a way that the audience can follow both sides and stay curious about which side ultimately wins.

3. Consider adding pauses rather than additional information. Pauses are as powerful as white space on posters. You may have noticed that almost all TED talks that use slides work with “break slides” – deliberately empty slides that leave the screen black and force the audience’s attention back to you. If you are building your argument and want to add a little extra weight to a sentence, consider a 4–5 second pause after the sentence (5 seconds can feel like an eternity when you are presenting, but for the audience it is just enough time to let the argument sink in) – perhaps even repeat the argument in a slightly different way after your pause. You might also consider leaving the screen without slides for the introduction of your talk, to make sure you have the audience’s undivided attention, or leaving some slides black when you want to explain something technical that needs the audience’s concentration.

4. Support with slides, don’t explain. That brings us to a little discussion about slides. Slides can be fantastic and they can be awful. They can support and they can distract, depending on how they are used. You have probably heard the advice to keep your slides to bullet points only. It’s still true, but it doesn’t necessarily solve anything – as it turns out, bullets can be just as long and text-heavy as normal sentences. Do you actually need slides to present your work? I would posit that you don’t – unless you have visual material that directly underpins your crucial point: models, images, graphs. Other than that, slides should not really be necessary. If you would like to use slides, use them to support the audience, not to support you – that’s what your notes are for. There is nothing wrong with writing your key message on a slide so your peers can photograph and tweet it, but then think about what your peers would consider worth tweeting. And make it fun – don’t be afraid to add moving images or videos (check the sound, though), especially of your data. It is always fun to see the data – even if your data is a program, consider doing a screencast of it running and adding that as a silent video in the background while you explain what is cool about it.

5. Stay enthusiastic. If you do not think your results are fun and interesting, chances are your audience won’t either. Try your very best to identify exactly what made you interested in this problem originally, and convey that to your audience. Sometimes that involves explaining how your results might be used in the future – I have sometimes taken the liberty of adding slides that were just called “Imagine a thing that …” and used them to explain how my results could be transformed into systems that would change the world. Sometimes these were illustrated by less-than-perfect stick figures – but the point is not to show off as a designer, but to show the audience that I really, really want to tell them how fantastic this system could be – even if I couldn’t draw it very well.

HOW TO PRESENT WITH POSTERS
The number one mistake in academic posters is cramming too much information onto one piece of paper. Depending on how your (published) paper is written, your poster often does not need more information than is in the conclusion section:

  1. What is the research question(s)?
  2. How have you engaged with your data (type of study, experiment, etc.)?
  3. What are your results?
  4. What are the implications of your results?
  5. Authors, affiliation, funding.

That’s it. Most of these should be explainable in 1–3 bullet points of one or two sentences each, not much more. If your poster is based on a paper, you can print out copies of the paper and keep those next to the poster, so that interested parties can grab one.

Layout
Readers of left-to-right writing systems read from left to right and top to bottom, and that is how you should help them digest the information you would like to present. Clearly defined boxes make it easier for the eye to navigate the poster, especially in a horizontal (landscape) layout. Boxes do not have to be defined by borders (in fact, unless you know what you are doing, I highly advise against using borders), but can be created by background colors or even white space. When you design posters, white space – or negative space – should be your new best friend: at least 40% of the poster should be free of text. This will bring the text that did make the cut much more into focus.

Consider highlighting your most interesting points with a colored box and white text, as for the title and conclusion below (notice how the other text seems “boxed” by white space but without actually having borders):

In my opinion, if you are not using a proper layout program (like Adobe InDesign), don’t be afraid to use tables to help you get the alignment right. It is much better to keep the layout conservative and maintain proper alignment than to experiment too much and end up with boxes that are a couple of centimeters off because PowerPoint just wants to watch the world burn. In the image above, I created one big table with three vertical columns, merged the two top-left cells and, most importantly, set the cell padding (the space between the cell wall and the cell content) to be large, thus automatically getting that nice white space.

Finally, don’t underestimate the power of an attention-grabbing poster. In the poster below, the concept of “long tail” was translated into a lemur, giving an opportunity to use a really big picture of a cute animal. Cheap trick as it might be, it actually does attract people’s attention, and if you are hanging out by your poster for a couple of hours anyway, you might as well not do it alone.

Other techniques can make a poster attractive. The two below are made of wood: the first is interactive – you turn round sections of it to reveal information – and the second can be rearranged, with pieces connected by velcro. In both cases the novel medium is enticing in itself.

Graphic resources
If you wish to upgrade the visuals of your poster, I highly recommend looking into three elements: images, icons, and fonts. I’ve added three resources for these that are free and easy to use:

Photos: https://www.pexels.com/
Gorgeous, free stock photos that will make your slides or posters so much more appealing

Icons: Looking for simpler illustrations? https://www.flaticon.com/categories is your friend!

Fonts: Sans-serif generally looks better on screens and for headlines. https://www.fontsquirrel.com/ has many beautiful, free fonts – if you just need something that works quickly, you can’t go too wrong with Open Sans, Helvetica Neue or Rubik – bold for headlines and light or regular for body text.

Best of luck for your presentation!

 

Nanna Inie is finishing up a PhD in Digital Design at Aarhus University. As well as founding the largest TEDx event in Denmark, Nanna has won an audience favorite poster award at CHI during a stay at the UCSD Design Lab, and owns a video production company.

PC chairs report back: Paper types and the selection process

As we stated at the outset, one of our goals for COLING 2018 has been “to create a program of high quality papers which represent diverse approaches to and applications of computational linguistics written and presented by researchers from throughout our international community”. One aspect of the COLING 2018 review process that we designed with this goal in mind was the enumeration of six different paper types, each with its own tailored review form. We first proposed an initial set of five paper types, and then added a sixth and revised the review forms in light of community input. The final set of paper types and review forms can be found here. In this blog post, we report back on this aspect of COLING 2018, both quantitatively and qualitatively.

Submission and acceptance statistics

The first challenge was to recruit papers of the less common paper types. Most papers published at NLP venues fit either our “NLP Engineering Experiment” or our “Resources” paper type. The table below shows how many papers of each type were submitted, withdrawn, and accepted, as well as the acceptance rate per paper type. (The “withdrawn” number is included because withdrawn papers are excluded from the denominator of the acceptance rate, as discussed here.)

Type          Submitted  Withdrawn  Accepted  Acceptance rate
NLPEE               657         85       217           37.94%
CALA                163         28        45           33.33%
Resource            106          7        32           32.32%
Reproduction         35          0        17           48.57%
Position             31          6         8           32.00%
Survey               25          3        12           54.55%
Overall            1017        129       331           37.27%
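As a sanity check, the acceptance rates in the table can be recomputed directly from the submitted/withdrawn/accepted counts, with withdrawn papers excluded from the denominator; a minimal sketch:

```python
# Recompute per-type acceptance rates from the table above, excluding
# withdrawn papers from the denominator.
counts = {
    # type: (submitted, withdrawn, accepted)
    "NLPEE":        (657, 85, 217),
    "CALA":         (163, 28, 45),
    "Resource":     (106, 7, 32),
    "Reproduction": (35, 0, 17),
    "Position":     (31, 6, 8),
    "Survey":       (25, 3, 12),
}

def acceptance_rate(submitted, withdrawn, accepted):
    """Acceptance rate over non-withdrawn submissions, in percent."""
    return 100 * accepted / (submitted - withdrawn)

for paper_type, (sub, wd, acc) in counts.items():
    print(f"{paper_type:12s} {acceptance_rate(sub, wd, acc):6.2f}%")

# The overall figures are the column sums: 1017 / 129 / 331.
totals = [sum(col) for col in zip(*counts.values())]
print(f"{'Overall':12s} {acceptance_rate(*totals):6.2f}%")
```

Running this reproduces every rate in the table, including the overall 37.27%.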

Not surprisingly, the “NLP Engineering Experiment” paper type accounted for more than half of the submissions, but we are pleased that the other paper types are also represented. We hope that if this strategy is taken up in future COLINGs (or other venues), it will continue to gain traction and the minority paper types will become more popular.

These statistics all represent the paper type chosen by the authors at submission time, not necessarily how we would have classified the papers.  More discussion on this point below.

Author survey on paper types

As described in our post on our author survey, the feedback on the paper types from authors was fairly positive:

We wanted to find out if people were aware of the paper types (since this is relatively unusual in our field) before submitting their papers, and if so, how they found out. Most—349 (80.4%)—were aware of the paper types ahead of time.  Of these, the vast majority (93.4%) found out about the paper types via the Call for Papers. Otherwise, people found out because someone else told them (7.4%), via our Twitter or Facebook feeds (6.0%), or via our blog (3.7%).
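The raw counts behind these percentages can be roughly back-calculated (this is my own arithmetic from the figures above, not data taken from the survey itself):

```python
# Back-calculate approximate respondent counts from the reported percentages.
aware = 349                    # authors who were aware of the paper types
total = round(aware / 0.804)   # they were 80.4% of all respondents
via_cfp = round(aware * 0.934) # 93.4% of the aware found out via the CfP
print(total, via_cfp)  # 434 326
```

So roughly 434 authors answered this question, of whom about 326 learned of the paper types from the Call for Papers.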

We also asked whether it was clear to authors which paper type was appropriate for their paper, and whether they think paper types are a good idea. The answers in both cases were pretty strongly positive: 78.8% said it was clear and 91.0% said it was a good idea. (Interestingly, 74 people who said it wasn’t clear which paper type was a good fit for theirs nonetheless said paper types were a good idea, and 21 people who thought it was clear which paper type fit nonetheless said they weren’t.)

Not knowable from that survey is whether/to what extent we failed to reach people who would have submitted e.g. a survey paper or reproduction paper, had they only known we were specifically soliciting them.

Reviewer survey on paper types

We also carried out a survey of our reviewers. This one was sent with more of a delay (on 25 May, though reviews were due 10 April), and as some survey respondents pointed out, we may have gotten more accurate answers had we asked sooner. But there was plenty else we were worrying about in the interim! The response rate was also relatively low: only 128 of our 1200+ reviewers answered the survey. With those caveats, here are some results. (No question was required, so the answers don’t sum to 100%.)

  • We asked: “Did you feel like the authors chose the appropriate paper type for their papers?” 69.5% chose “Yes, all of them”, 26.6% “Only some of them”, and 0.8% (just one respondent), “No, none of them.”
  • We asked: “For papers that were assigned to what you thought was the correct paper type, did you feel that the review form questions helped you evaluate papers of that type?” 29.7% chose “Yes, better than usual for conferences/better than expected”, 57% “Yes, about as usual/about as expected”, 6.3% “No, worse than usual/worse than expected”, 1.6% “No, the review forms were poorly designed”
  • We asked: “For papers that were assigned to what you thought was an incorrect paper type, how problematic was the mismatch?” 36.7% chose the first option, 21.9% “Not so bad, even the numerical questions were still somewhat relevant”, and no one chose “Pretty bad, I could only say useful things in the comments” or “Terrible, I felt like I couldn’t fairly evaluate the paper.” (58.6% chose “other”, but this was mostly people who didn’t have any mismatches.)

Our takeaway is that, at least for the reviewers who responded to the survey, the differentiated review forms for different paper types were on balance a plus—that is, they helped more than they hurt.

How to handle papers submitted under the wrong type?

Some misclassified papers were easy to spot. We turned them up early in the process while browsing the non-NLP-engineering-experiment paper types (since we were interested to see what was coming in). Similarly, ACs and reviewers noted many cases of obvious type mismatches. However, we decided against reclassifying papers. The primary reason is that, although there were some clear cases of mistyped papers, many others would not be so clear; it would have been impossible to go through all the papers, consider reassigning their types, and do so consistently. Furthermore, the point of the paper types was to let authors choose which questions reviewers would be answering about their papers. Second-guessing that choice seemed unfair and non-transparent.

Perhaps the most common clear cases of mistyped papers were papers we considered NLP engineering experiment (NLPEE) papers that were submitted as computationally aided linguistic analysis (CALA) papers. We have a few hypotheses about why that might have happened, not mutually exclusive:

(1) Design factors. CALA was listed first on the paper types page; people read it, thought it matched, and looked no further. (Though in the dropdown menu for this question in the submission form on START, NLPEE is first.)

(2) Terminological prejudice. People were put off by “engineering” in the name of NLPEE. We’ve definitely heard some objections to that term from colleagues who take “engineering” to be a derogatory term. But we do not see it that way at all! Engineering research is research. Indeed, a lack of attention to good engineering in our computational experiments makes the science suffer. Furthermore, research contributions focused on building something and then testing how well it works seem to us to be well characterized by the term “engineering experiment”. It’s worth noting that we did struggle to come up with a name for this paper type, in large part because it is so ubiquitous but we couldn’t very well call it “typical NLP paper”. In our discussions, Leon proposed a name involving the word “empirical”, but Emily objected strongly to that: linguistic analysis papers that investigate patterns in language use to better understand linguistic structure or language behavior are very much empirical too.

(3) Interdisciplinary misunderstanding. Perhaps people working on NLPEE-type work from more of an ML background don’t understand the term “linguistic analysis” or “linguistic phenomenon” as we intended it. The CALA paper type was described as follows:

The focus of this paper type is new linguistic insight. It might take the form of an empirical study of some linguistic phenomenon, or of a theoretical result about a linguistically-relevant formal system.

It’s entirely possible that someone without training in linguistics would not know what terms like “linguistic phenomenon” or “formal system” denote for linguists. This speaks to the need for more interdisciplinary communication in our field, and we hope that COLING 2018 will continue the COLING tradition of providing such a venue!

 

COLING 2018 Accepted Papers

Here is the list of papers accepted at COLING 2018, to appear in Santa Fe. This list was delayed until the best paper process was completed, to make sure that these awards were selected without committee members being able to know the identity of paper authors.

Congratulations to all authors of accepted papers; we look forward to seeing you in New Mexico!

  • A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation – Surafel Melaku Lakew, Mauro Cettolo and Marcello Federico.
  • A Computational Model for the Linguistic Notion of Morphological Paradigm – Miikka Silfverberg, Ling Liu and Mans Hulden.
  • A Knowledge-Augmented Neural Network Model for Implicit Discourse Relation Classification – Yudai Kishimoto, Yugo Murawaki and Sadao Kurohashi.
  • A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis – Yicheng Zou, Tao Gui, Qi Zhang and Xuanjing Huang.
  • A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task – Qian Li, Ziwei Li, Jin-Mao Wei, Yanhui Gu, Adam Jatowt and Zhenglu Yang.
  • A New Approach to Animacy Detection – Labiba Jahan, Geeticka Chauhan and Mark Finlayson.
  • A New Concept of Deep Reinforcement Learning based Augmented General Tagging System – Yu Wang, Abhishek Patel and Hongxia Jin.
  • A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis – Shuqin Gu, Lipeng Zhang, Yuexian Hou and Yin Song.
  • A Practical Incremental Learning Framework For Sparse Entity Extraction – Hussein Al-Olimat, Steven Gustafson, Jason Mackay, Krishnaprasad Thirunarayan and Amit Sheth.
  • A Prospective-Performance Network to Alleviate Myopia in Beam Search for Response Generation – Zongsheng Wang, Yunzhi Bai, Bowen Wu, Zhen Xu, Zhuoran Wang and Baoxun Wang.
  • A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators – Zhihao Fan, Zhongyu Wei, Siyuan Wang, Yang Liu and Xuanjing Huang.
  • A Retrospective Analysis of the Fake News Challenge Stance-Detection Task – Andreas Hanselowski, Avinesh PVS, Benjamin Schiller, Felix Caspelherr, Debanjan Chaudhuri, Christian M. Meyer and Iryna Gurevych.
  • Ab Initio: Automatic Latin Proto-word Reconstruction – Alina Maria Ciobanu and Liviu P. Dinu.
  • Abstract Meaning Representation for Multi-Document Summarization – Kexin Liao, Logan Lebanoff and Fei Liu.
  • Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion – Mir Tafseer Nayeem, Tanvir Ahmed Fuad and Yllias Chali.
  • Adopting the Word-Pair-Dependency-Triplets with Individual Comparison for Natural Language Inference – Qianlong Du, Chengqing Zong and Keh-Yih Su.
  • Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems – Van-Khanh Tran and Le-Minh Nguyen.
  • Adversarial Multi-lingual Neural Relation Extraction – Xiaozhi Wang, Xu Han, Yankai Lin, Zhiyuan Liu and Maosong Sun.
  • Aff2Vec: Affect–Enriched Distributional Word Representations – Sopan Khosla, Niyati Chhaya and Kushal Chawla.
  • All-in-one: Multi-task Learning for Rumour Verification – Elena Kochkina, Maria Liakata and Arkaitz Zubiaga.
  • An Attribute Enhanced Domain Adaptive Model for Cold-Start Spam Review Detection – Zhenni You, Tieyun Qian and Bing Liu.
  • An Empirical Study on Fine-Grained Named Entity Recognition – Khai Mai, Thai-Hoang Pham, Minh Trung Nguyen, Nguyen Tuan Duc, Danushka Bollegala, Ryohei Sasano and Satoshi Sekine.
  • An Exploration of Three Lightly-supervised Representation Learning Approaches for Named Entity Classification – Ajay Nagesh and Mihai Surdeanu.
  • AnlamVer: Semantic Model Evaluation Dataset for Turkish – Word Similarity and Relatedness – Gökhan Ercan and Olcay Taner Yıldız.
  • Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension – Mao Nakanishi, Tetsunori Kobayashi and Yoshihiko Hayashi.
  • Ask No More: Deciding when to guess in referential visual dialogue – Ravi Shekhar, Tim Baumgärtner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi and Raquel Fernández.
  • Aspect and Sentiment Aware Abstractive Review Summarization – Min Yang, Qiang Qu, Ying Shen, Qiao Liu, Wei Zhao and Jia Zhu.
  • Aspect-based summarization of pros and cons in unstructured product reviews – Florian Kunneman, Sander Wubben, Antal van den Bosch and Emiel Krahmer.
  • Assessing Composition in Sentence Vector Representations – Allyson Ettinger, Ahmed Elgohary, Colin Phillips and Philip Resnik.
  • Attending Sentences to detect Satirical Fake News – Sohan De Sarkar, Fan Yang and Arjun Mukherjee.
  • Authorless Topic Models: Biasing Models Away from Known Structure – Laure Thompson and David Mimno.
  • Authorship Attribution By Consensus Among Multiple Features – Jagadeesh Patchala and Raj Bhatnagar.
  • Automated Fact Checking: Task Formulations, Methods and Future Directions – James Thorne and Andreas Vlachos.
  • Automated Scoring: Beyond Natural Language Processing – Nitin Madnani and Aoife Cahill.
  • Automatic Detection of Fake News – Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre and Rada Mihalcea.
  • Bridge Video and Text with Cascade Syntactic Structure – Guolong Wang, Zheng Qin, Kaiping Xu, Kai Huang and Shuxiong Ye.
  • Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis – Andrew Moore and Paul Rayson.
  • Can Rumour Stance Alone Predict Veracity? – Sebastian Dungs, Ahmet Aker, Norbert Fuhr and Kalina Bontcheva.
  • CASCADE: Contextual Sarcasm Detection in Online Discussion Forums – Devamanyu Hazarika, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann and Rada Mihalcea.
  • Challenges and Opportunities of Applying Natural Language Processing in Business Process Management – Han Van der Aa, Josep Carmona, Henrik Leopold, Jan Mendling and Lluís Padró.
  • Challenges of language technologies for the indigenous languages of the Americas – Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra and Ivan Meza-Ruiz.
  • Context-Sensitive Generation of Open-Domain Conversational Responses – Wei-Nan Zhang, Yiming Cui, Yifa Wang, Qingfu Zhu, Lingzhi Li, Lianqiang Zhou and Ting Liu.
  • Contextual String Embeddings for Sequence Labeling – Alan Akbik, Duncan Blythe and Roland Vollgraf.
  • Cooperative Denoising for Distantly Supervised Relation Extraction – Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan and Ying Shen.
  • Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need! – Steffen Eger, Johannes Daxenberger, Christian Stab and Iryna Gurevych.
  • Deep Enhanced Representation for Implicit Discourse Relation Recognition – Hongxiao Bai and Hai Zhao.
  • Dependent Gated Reading for Cloze-Style Question Answering – Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi and Prasad Tadepalli.
  • Design Challenges and Misconceptions in Neural Sequence Labeling – Jie Yang, Shuailong Liang and Yue Zhang.
  • Design Challenges in Named Entity Transliteration – Yuval Merhav and Stephen Ash.
  • Dialogue-act-driven Conversation Model : An Experimental Study – Harshit Kumar, Arvind Agarwal and Sachindra Joshi.
  • Distance-Free Modeling of Multi-Predicate Interactions in End-to-End Japanese Predicate-Argument Structure Analysis – Yuichiroh Matsubayashi and Kentaro Inui.
  • Distinguishing affixoid formations from compounds – Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert.
  • Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data? – Yi Zhang, Xu Sun, Shuming Ma, Yang Yang and Xuancheng Ren.
  • Dynamic Multi-Level Multi-Task Learning for Sentence Simplification – Han Guo, Ramakanth Pasunuru and Mohit Bansal.
  • Effective Attention Modeling for Aspect-Level Sentiment Classification – Ruidan He, Wee Sun Lee, Hwee Tou Ng and Daniel Dahlmeier.
  • Embedding Words as Distributions with a Bayesian Skip-gram Model – Arthur Bražinskas, Serhii Havrylov and Ivan Titov.
  • Emotion Detection and Classification in a Multigenre Corpus with Joint Multi-Task Deep Learning – Shabnam Tafreshi and Mona Diab.
  • Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level – Sven Buechel and Udo Hahn.
  • Employing Text Matching Network to Recognise Nuclearity in Chinese Discourse – Sheng Xu, Peifeng Li, Guodong Zhou and Qiaoming Zhu.
  • Enhanced Aspect Level Sentiment Classification with Auxiliary Memory – Peisong Zhu and Tieyun Qian.
  • Enhancing Sentence Embedding with Generalized Pooling – Qian Chen, Zhen-Hua Ling and Xiaodan Zhu.
  • Exploiting Structure in Representation of Named Entities using Active Learning – Nikita Bhutani, Kun Qian, Yunyao Li, H. V. Jagadish, Mauricio Hernandez and Mitesh Vasa.
  • Exploiting Syntactic Structures for Humor Recognition – Lizhen Liu, Donghai Zhang and Wei Song.
  • Exploratory Neural Relation Classification for Domain Knowledge Acquisition – Yan Fan, Chengyu Wang and Xiaofeng He.
  • Exploring the Influence of Spelling Errors on Lexical Variation Measures – Ryo Nagata, Taisei Sato and Hiroya Takamura.
  • Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media – Isabel Cachola, Eric Holgate, Daniel Preoţiuc-Pietro and Junyi Jessy Li.
  • Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation – Francis Grégoire and Philippe Langlais.
  • Extractive Headline Generation Based on Learning to Rank for Community Question Answering – Tatsuru Higurashi, Hayato Kobayashi, Takeshi Masuyama and Kazuma Murao.
  • Folksonomication: Predicting Tags for Movies from Plot Synopses using Emotion Flow Encoded Neural Network – Sudipta Kar, Suraj Maharjan and Thamar Solorio.
  • From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources – Ilia Kuznetsov and Iryna Gurevych.
  • Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model – Shaohui Kuang and Deyi Xiong.
  • GenSense: A Generalized Sense Retrofitting Model – Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Yow-Ting Shiue and Hsin-Hsi Chen.
  • Graphene: Semantically-Linked Propositions in Open Information Extraction – Matthias Cetto, Christina Niklaus, André Freitas and Siegfried Handschuh.
  • Grounded Textual Entailment – Hoa Vu, Claudio Greco, Aliia Erofeeva, Somayeh Jafaritazehjan, Guido Linders, Marc Tanti, Alberto Testoni, Raffaella Bernardi and Albert Gatt.
  • How emotional are you? Neural Architectures for Emotion Intensity Prediction in Microblogs – Devang Kulshreshtha, Pranav Goel and Anil Kumar Singh.
  • Hybrid Attention based Multimodal Network for Spoken Language Classification – Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li and Ivan Marsic.
  • Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning – Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang and Xiangang Li.
  • Improving Neural Machine Translation by Incorporating Hierarchical Subword Features – Makoto Morishita, Jun Suzuki and Masaaki Nagata.
  • Integrating Question Classification and Deep Learning for improved Answer Selection – Harish Tayyar Madabushi, Mark Lee and John Barnden.
  • Investigating Productive and Receptive Knowledge: A Profile for Second Language Learning – Leonardo Zilio, Rodrigo Wilkens and Cédrick Fairon.
  • Joint Modeling of Structure Identification and Nuclearity Recognition in Macro Chinese Discourse Treebank – Xiaomin Chu, Feng Jiang, Yi Zhou, Guodong Zhou and Qiaoming Zhu.
  • Knowledge as A Bridge: Improving Cross-domain Answer Selection with External Knowledge – Yang Deng, Ying Shen, Min Yang, Yaliang Li, Nan Du, Wei Fan and Kai Lei.
  • Learning Features from Co-occurrences: A Theoretical Analysis – Yanpeng Li.
  • Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types – Paul Felt, Eric Ringger, Kevin Seppi and Jordan Boyd-Graber.
  • Learning Sentiment Composition from Sentiment Lexicons – Orith Toledo-Ronen, Roy Bar-Haim, Alon Halfon, Charles Jochim, Amir Menczel, Ranit Aharonov and Noam Slonim.
  • Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction – Junwen Duan, Yue Zhang, Xiao Ding, Ching-Yun Chang and Ting Liu.
  • Learning to Generate Word Representations using Subword Information – Yeachan Kim, Kang-Min Kim, Ji-Min Lee and SangKeun Lee.
  • Learning Word Meta-Embeddings by Autoencoding – Danushka Bollegala and Cong Bao.
  • Low-resource Cross-lingual Event Type Detection via Distant Supervision with Minimal Effort – Aldrian Obaja Muis, Naoki Otani, Nidhi Vyas, Ruochen Xu, Yiming Yang, Teruko Mitamura and Eduard Hovy.
  • Lyrics Segmentation: Textual Macrostructure Detection using Convolutions – Michael Fell, Yaroslav Nechaev, Elena Cabrio and Fabien Gandon.
  • Measuring the Diversity of Automatic Image Descriptions – Emiel van Miltenburg, Desmond Elliott and Piek Vossen.
  • Model-Free Context-Aware Word Composition – Bo An, Xianpei Han and Le Sun.
  • Modeling Coherence for Neural Machine Translation with Dynamic and Topic Caches – Shaohui Kuang, Deyi Xiong, Weihua Luo and Guodong Zhou.
  • Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering – Daniil Sorokin and Iryna Gurevych.
  • Modeling with Recurrent Neural Networks for Open Vocabulary Slots – Jun-Seong Kim, Junghoe Kim, SeungUn Park, Kwangyong Lee and Yoonju Lee.
  • Multilevel Heuristics for Rationale-Based Entity Relation Classification in Sentences – Shiou Tian Hsu, Mandar Chaudhary and Nagiza Samatova.
  • Multilingual Neural Machine Translation with Task-Specific Attention – Graeme Blackwood, Miguel Ballesteros and Todd Ward.
  • Multimodal Grounding for Language Processing – Lisa Beinborn, Teresa Botschen and Iryna Gurevych.
  • Neural Activation Semantic Models: Computational lexical semantic models of localized neural activations – Nikos Athanasiou, Elias Iosif and Alexandros Potamianos.
  • Neural Collective Entity Linking – Yixin Cao, Lei Hou, Juanzi Li and Zhiyuan Liu.
  • Neural Machine Translation with Decoding History Enhanced Attention – Mingxuan Wang.
  • Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering – Wuwei Lan and Wei Xu.
  • Neural Relation Classification with Text Descriptions – Feiliang Ren, Di Zhou, Zhihui Liu, Yongcheng Li, Rongsheng Zhao, Yongkang Liu and Xiaobo Liang.
  • Neural Transition-based String Transduction for Limited-Resource Setting in Morphology – Peter Makarov and Simon Clematide.
  • Novelty Goes Deep. A Deep Neural Solution To Document Level Novelty Detection – Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, Srinivasa Satya Sameer Kumar Chivukula and George Tsatsaronis.
  • On Adversarial Examples for Character-Level Neural Machine Translation – Javid Ebrahimi, Daniel Lowd and Dejing Dou.
  • One-shot Learning for Question-Answering in Gaokao History Challenge – Zhuosheng Zhang and Hai Zhao.
  • Open Information Extraction from Conjunctive Sentences – Swarnadeep Saha and Mausam.
  • Open Information Extraction on Scientific Text: An Evaluation – Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr.
  • Pattern-revising Enhanced Simple Question Answering over Knowledge Bases – Yanchao Hao, Hao Liu, Shizhu He, Kang Liu and Jun Zhao.
  • Personalized Text Retrieval for Learners of Chinese as a Foreign Language – Chak Yan Yeung and John Lee.
  • Predicting Stances from Social Media Posts using Factorization Machines – Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki and Kentaro Inui.
  • Punctuation as Native Language Interference – Ilia Markov, Vivi Nastase and Carlo Strapparava.
  • Quantifying training challenges of dependency parsers – Lauriane Aufrant, Guillaume Wisniewski and François Yvon.
  • Recognizing Humour using Word Associations and Humour Anchor Extraction – Andrew Cattle and Xiaojuan Ma.
  • Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs – Wenpeng Yin, Yadollah Yaghoobzadeh and Hinrich Schütze.
  • Relation Induction in Word Embeddings Revisited – Zied Bouraoui, Shoaib Jameel and Steven Schockaert.
  • Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from Modern Hebrew – Adam Amram, Anat Ben-David and Reut Tsarfaty.
  • Rethinking the Agreement in Human Evaluation Tasks – Jacopo Amidei, Paul Piwek and Alistair Willis.
  • RNN Simulations of Grammaticality Judgments on Long-distance Dependencies – Shammur Absar Chowdhury and Roberto Zamparelli.
  • Self-Normalization Properties of Language Modeling – Jacob Goldberger and Oren Melamud.
  • Semi-Supervised Disfluency Detection – Feng Wang, Zhen Yang, Wei Chen, Shuang Xu, Bo Xu and Qianqian Dong.
  • Semi-Supervised Lexicon Learning for Wide-Coverage Semantic Parsing – Bo Chen, Bo An, Le Sun and Xianpei Han.
  • Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding – Yutai Hou, Yijia Liu, Wanxiang Che and Ting Liu.
  • SGM: Sequence Generation Model for Multi-label Classification – Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang.
  • Simple Algorithms For Sentiment Analysis On Sentiment Rich, Data Poor Domains. – Prathusha K Sarma and William Sethares.
  • Sprucing up the trees – Error detection in treebanks – Ines Rehbein and Josef Ruppenhofer.
  • Stress Test Evaluation for Natural Language Inference – Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose and Graham Neubig.
  • Structure-Infused Copy Mechanisms for Abstractive Summarization – Kaiqiang Song, Lin Zhao and Fei Liu.
  • Structured Dialogue Policy with Graph Neural Networks – Lu Chen, Bowen Tan, Sishan Long and Kai Yu.
  • Subword-augmented Embedding for Cloze Reading Comprehension – Zhuosheng Zhang, Yafang Huang and Hai Zhao.
  • Systematic Study of Long Tail Phenomena in Entity Linking – Filip Ilievski, Piek Vossen and Stefan Schlobach.
  • The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities – Marco Del Tredici and Raquel Fernández.
  • They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking – Ethan Zhou and Jinho D. Choi.
  • Topic or Style? Exploring the Most Useful Features for Authorship Attribution – Yunita Sari, Mark Stevenson and Andreas Vlachos.
  • Towards a unified framework for bilingual terminology extraction of single-word and multi-word terms – Jingshu Liu, Emmanuel Morin and Peña Saldarriaga.
  • Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies – Taraka Rama and Søren Wichmann.
  • Transition-based Neural RST Parsing with Implicit Syntax Features – Nan Yu, Meishan Zhang and Guohong Fu.
  • Treat us like the sequences we are: Prepositional Paraphrasing of Noun Compounds using LSTM – Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya and Girish Palshikar.
  • Triad-based Neural Network for Coreference Resolution – Yuanliang Meng and Anna Rumshisky.
  • Two Local Models for Neural Constituent Parsing – Zhiyang Teng and Yue Zhang.
  • Unsupervised Morphology Learning with Statistical Paradigms – Hongzhi Xu, Mitchell Marcus, Charles Yang and Lyle Ungar.
  • Using J-K-fold Cross Validation To Reduce Variance When Tuning NLP Models – Henry Moss, David Leslie and Paul Rayson.
  • Variational Attention for Sequence-to-Sequence Models – Hareesh Bahuleyan, Lili Mou, Olga Vechtomova and Pascal Poupart.
  • What represents “style” in authorship attribution? – Kalaivani Sundararajan and Damon Woodard.
  • Who is Killed by Police: Introducing Supervised Attention for Hierarchical LSTMs – Minh Nguyen and Thien Nguyen.
  • Word-Level Loss Extensions for Neural Temporal Relation Classification – Artuur Leeuwenberg and Marie-Francine Moens.
  • Zero Pronoun Resolution with Attention-based Neural Network – Qingyu Yin, Yu Zhang, Wei-Nan Zhang, Ting Liu and William Yang Wang.
  • A Dataset for Building Code-Mixed Goal Oriented Conversation Systems – Suman Banerjee, Nikita Moghe, Siddhartha Arora and Mitesh M. Khapra.
  • A Deep Dive into Word Sense Disambiguation with LSTM – Minh Le, Marten Postma, Jacopo Urbani and Piek Vossen.
  • A Full End-to-End Semantic Role Labeler, Syntactic-agnostic Over Syntactic-aware? – Jiaxun Cai, Shexia He, Zuchao Li and Hai Zhao.
  • A LSTM Approach with Sub-Word Embeddings for Mongolian Phrase Break Prediction – Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang and Yonghe Wang.
  • A Neural Question Answering Model Based on Semi-Structured Tables – Hao Wang, Xiaodong Zhang, Shuming Ma, Xu Sun, Houfeng Wang and Mengxiang Wang.
  • A Nontrivial Sentence Corpus for the Task of Sentence Readability Assessment in Portuguese – Sidney Evaldo Leal, Magali Sanches Duran and Sandra Maria Aluísio.
  • A Pseudo Label based Dataless Naive Bayes Algorithm for Text Classification with Seed Words – Ximing Li and Bo Yang.
  • A Reassessment of Reference-Based Grammatical Error Correction Metrics – Shamil Chollampatt and Hwee Tou Ng.
  • A review of Spanish corpora annotated with negation – Salud María Jiménez-Zafra, Roser Morante, Maite Martin and L. Alfonso Urena Lopez.
  • A Review on Deep Learning Techniques Applied to Answer Selection – Tuan Manh Lai, Trung Bui and Sheng Li.
  • A Survey of Domain Adaptation for Neural Machine Translation – Chenhui Chu and Rui Wang.
  • A Survey on Open Information Extraction – Christina Niklaus, Matthias Cetto, André Freitas and Siegfried Handschuh.
  • A Survey on Recent Advances in Named Entity Recognition from Deep Learning models – Vikas Yadav and Steven Bethard.
  • Adaptive Learning of Local Semantic and Global Structure Representations for Text Classification – Jianyu Zhao, Zhiqiang Zhan, Qichuan Yang, Yang Zhang, Changjian Hu, Zhensheng Li, Liuxin Zhang and Zhiqiang He.
  • Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text – Junjie Xing, Kenny Zhu and Shaodian Zhang.
  • Adaptive Weighting for Neural Machine Translation – Yachao Li, Junhui Li and Min Zhang.
  • Addressee and Response Selection for Multilingual Conversation – Motoki Sato, Hiroki Ouchi and Yuta Tsuboi.
  • Adversarial Feature Adaptation for Cross-lingual Relation Classification – Bowei Zou, Zengzhuang Xu, Yu Hong and Guodong Zhou.
  • AMR Beyond the Sentence: the Multi-sentence AMR corpus – Tim O’Gorman, Michael Regan, Kira Griffitt, Martha Palmer, Ulf Hermjakob and Kevin Knight.
  • An Analysis of Annotated Corpora for Emotion Classification in Text – Laura Ana Maria Bostan and Roman Klinger.
  • An Empirical Investigation of Error Types in Vietnamese Parsing – Quy Nguyen, Yusuke Miyao, Hiroshi Noji and Nhung Nguyen.
  • An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization – Gongbo Tang, Fabienne Cap, Eva Pettersson and Joakim Nivre.
  • An Interpretable Reasoning Network for Multi-Relation Question Answering – Mantong Zhou, Minlie Huang and Xiaoyan Zhu.
  • An Operation Network for Abstractive Sentence Compression – Naitong Yu, Jie Zhang, Minlie Huang and Xiaoyan Zhu.
  • Ant Colony System for Multi-Document Summarization – Asma Al-Saleh and Mohamed El Bachir Menai.
  • Argumentation Synthesis following Rhetorical Strategies – Henning Wachsmuth, Manfred Stede, Roxanne El Baff, Khalid Al Khatib, Maria Skeppstedt and Benno Stein.
  • Arguments and Adjuncts in Universal Dependencies – Adam Przepiórkowski and Agnieszka Patejuk.
  • Arrows are the Verbs of Diagrams – Malihe Alikhani and Matthew Stone.
  • Assessing Quality Estimation Models for Sentence-Level Prediction – Hoang Cuong and Jia Xu.
  • Attributed and Predictive Entity Embedding for Fine-Grained Entity Typing in Knowledge Bases – Hailong Jin, Lei Hou, Juanzi Li and Tiansi Dong.
  • Author Profiling for Abuse Detection – Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis and Ekaterina Shutova.
  • Authorship Identification for Literary Book Recommendations – Haifa Alharthi, Diana Inkpen and Stan Szpakowicz.
  • Automatic Assessment of Conceptual Text Complexity Using Knowledge Graphs – Sanja Štajner and Ioana Hulpus.
  • Automatically Creating a Lexicon of Verbal Polarity Shifters: Mono- and Cross-lingual Methods for German – Marc Schulder, Michael Wiegand and Josef Ruppenhofer.
  • Automatically Extracting Qualia Relations for the Rich Event Ontology – Ghazaleh Kazeminejad, Claire Bonial, Susan Windisch Brown and Martha Palmer.
  • Bridging resolution: Task definition, corpus resources and rule-based experiments – Ina Roesiger, Arndt Riester and Jonas Kuhn.
  • Butterfly Effects in Frame Semantic Parsing: impact of data processing on model ranking – Alexandre Kabbach, Corentin Ribeyre and Aurélie Herbelot.
  • Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy – Deepak Gupta, Rajkumar Pujari, Asif Ekbal, Pushpak Bhattacharyya, Anutosh Maitra, Tom Jain and Shubhashis Sengupta.
  • Character-Level Feature Extraction with Densely Connected Networks – Chanhee Lee, Young-Bum Kim, Dongyub Lee and Heuiseok Lim.
  • Clausal Modifiers in the Grammar Matrix – Kristen Howell and Olga Zamaraeva.
  • Combining Information-Weighted Sequence Alignment and Sound Correspondence Models for Improved Cognate Detection – Johannes Dellert.
  • Convolutional Neural Network for Universal Sentence Embeddings – Xiaoqi Jiao, Fang Wang and Dan Feng.
  • Corpus-based Content Construction – Balaji Vasan Srinivasan, Pranav Maneriker, Kundan Krishna and Natwar Modani.
  • Correcting Chinese Word Usage Errors for Learning Chinese as a Second Language – Yow-Ting Shiue, Hen-Hsen Huang and Hsin-Hsi Chen.
  • Cross-lingual Knowledge Projection Using Machine Translation and Target-side Knowledge Base Completion – Naoki Otani, Hirokazu Kiyomaru, Daisuke Kawahara and Sadao Kurohashi.
  • Cross-media User Profiling with Joint Textual and Social User Embedding – Jingjing Wang, Shoushan Li, Mingqi Jiang, Hanqian Wu and Guodong Zhou.
  • Crowdsourcing a Large Corpus of Clickbait on Twitter – Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen and Benno Stein.
  • Deconvolution-Based Global Decoding for Neural Machine Translation – Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su and Qi Su.
  • Deep Neural Networks at the Service of Multilingual Parallel Sentence Extraction – Ahmad Aghaebrahimian.
  • deepQuest: A Framework for Neural-based Quality Estimation – Julia Ive, Frédéric Blain and Lucia Specia.
  • Diachronic word embeddings and semantic shifts: a survey – Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski and Erik Velldal.
  • DIDEC: The Dutch Image Description and Eye-tracking Corpus – Emiel van Miltenburg, Ákos Kádár, Ruud Koolen and Emiel Krahmer.
  • Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning – Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He and Min Zhang.
  • Document-level Multi-aspect Sentiment Classification by Jointly Modeling Users, Aspects, and Overall Ratings – Junjie Li, Haitong Yang and Chengqing Zong.
  • Double Path Networks for Sequence to Sequence Learning – Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao QIN and Tie-Yan Liu.
  • Dynamic Feature Selection with Attention in Incremental Parsing – Ryosuke Kohita, Hiroshi Noji and Yuji Matsumoto.
  • Embedding WordNet Knowledge for Textual Entailment – Yunshi Lan and Jing Jiang.
  • Encoding Sentiment Information into Word Vectors for Sentiment Analysis – Zhe Ye, Fang Li and Timothy Baldwin.
  • Enhancing General Sentiment Lexicons for Domain-Specific Use – Tim Kreutz and Walter Daelemans.
  • Enriching Word Embeddings with Domain Knowledge for Readability Assessment – Zhiwei Jiang, Qing Gu, Yafeng Yin and Daoxu Chen.
  • Ensure the Correctness of the Summary: Incorporate Entailment Knowledge into Abstractive Sentence Summarization – Haoran Li, Junnan Zhu, Jiajun Zhang and Chengqing Zong.
  • Evaluating the text quality, human likeness and tailoring component of PASS: A Dutch data-to-text system for soccer – Chris van der Lee, Bart Verduijn, Emiel Krahmer and Sander Wubben.
  • Evaluation of Unsupervised Compositional Representations – Hanan Aldarmaki and Mona Diab.
  • Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia – Michael Azmy, Peng Shi, Ihab Ilyas and Jimmy Lin.
  • Fast and Accurate Reordering with ITG Transition RNN – Hao Zhang, Axel Ng and Richard Sproat.
  • Few-Shot Charge Prediction with Discriminative Legal Attributes – Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu and Maosong Sun.
  • Fine-Grained Arabic Dialect Identification – Mohammad Salameh and Houda Bouamor.
  • Generating Reasonable and Diversified Story Ending Using Sequence to Sequence Model with Adversarial Training – Zhongyang Li, Xiao Ding and Ting Liu.
  • Generic refinement of expressive grammar formalisms with an application to discontinuous constituent parsing – Kilian Gebhardt.
  • Genre Identification and the Compositional Effect of Genre in Literature – Joseph Worsham and Jugal Kalita.
  • Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions – Lori Moon, Christos Christodoulopoulos, Cynthia Fisher, Sandra Franco and Dan Roth.
  • Graph Based Decoding for Event Sequencing and Coreference Resolution – Zhengzhong Liu, Teruko Mitamura and Eduard Hovy.
  • HL-EncDec: A Hybrid-Level Encoder-Decoder for Neural Response Generation – Sixing Wu, Dawei Zhang, Ying Li, Xing Xie and Zhonghai Wu.
  • How Predictable is Your State? Leveraging Lexical and Contextual Information for Predicting Legislative Floor Action at the State Level – Vladimir Eidelman, Anastassia Kornilova and Daniel Argyle.
  • Identifying Emergent Research Trends by Key Authors and Phrases – Shenhao Jiang, Animesh Prasad, Min-Yen Kan and Kazunari Sugiyama.
  • If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions – Caroline Pasquer, Agata Savary, Carlos Ramisch and Jean-Yves Antoine.
  • Improving Feature Extraction for Pathology Reports with Precise Negation Scope Detection – Olga Zamaraeva, Kristen Howell and Adam Rhine.
  • Improving Named Entity Recognition by Jointly Learning to Disambiguate Morphological Tags – Onur Gungor, Suzan Uskudarli and Tunga Gungor.
  • Incorporating Argument-Level Interactions for Persuasion Comments Evaluation using Co-attention Model – Lu Ji, Zhongyu Wei, Xiangkun Hu, Yang Liu, Qi Zhang and Xuanjing Huang.
  • Incorporating Deep Visual Features into Multiobjective based Multi-view Search Result Clustering – Sayantan Mitra, Mohammed Hasanuzzaman and Sriparna Saha.
  • Incorporating Image Matching Into Knowledge Acquisition for Event-Oriented Relation Recognition – Yu Hong, Yang Xu, Huibin Ruan, Bowei Zou, Jianmin Yao and Guodong Zhou.
  • Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model – Poorya Zaremoodi and Gholamreza Haffari.
  • Incremental Natural Language Processing: Challenges, Strategies, and Evaluation – Arne Köhn.
  • Indigenous language technologies in Canada: Assessment, challenges, and successes – Patrick Littell, Anna Kazantseva, Roland Kuhn, Aidan Pine, Antti Arppe, Christopher Cox and Marie-Odile Junker.
  • Information Aggregation via Dynamic Routing for Sequence Encoding – Jingjing Gong, Xipeng Qiu, Shaojing Wang and Xuanjing Huang.
  • Integrating Tree Structures and Graph Structures with Neural Networks to Classify Discussion Discourse Acts – Yasuhide Miura, Ryuji Kano, Motoki Taniguchi, Tomoki Taniguchi, Shotaro Misawa and Tomoko Ohkuma.
  • Interaction-Aware Topic Model for Microblog Conversations through Network Embedding and User Attention – Ruifang He, Xuefei Zhang, Di Jin, Longbiao Wang, Jianwu Dang and Xiangang Li.
  • Interpretation of Implicit Conditions in Database Search Dialogues – Shunya Fukunaga, Hitoshi Nishikawa, Takenobu Tokunaga, Hikaru Yokono and Tetsuro Takahashi.
  • Investigating the Working of Text Classifiers – Devendra Sachan, Manzil Zaheer and Ruslan Salakhutdinov.
  • iParaphrasing: Extracting Visually Grounded Paraphrases via an Image – Chenhui Chu, Mayu Otani and Yuta Nakashima.
  • ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents – Stefano Mezza, Alessandra Cervone, Evgeny Stepanov, Giuliano Tortoreto and Giuseppe Riccardi.
  • Joint Learning from Labeled and Unlabeled Data for Information Retrieval – Bo Li, Ping Cheng and Le Jia.
  • Joint Neural Entity Disambiguation with Output Space Search – Hamed Shahbazi, Xiaoli Fern, Reza Ghaeini, Chao Ma, Rasha Mohammad Obeidat and Prasad Tadepalli.
  • JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features – Hongru Liang, Haozheng Wang, Jun Wang, Shaodi You, Zhe Sun, Jin-Mao Wei and Zhenglu Yang.
  • Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection – Erik-Lân Do Dinh, Steffen Eger and Iryna Gurevych.
  • LCQMC: A Large-scale Chinese Question Matching Corpus – Xin Liu, Qingcai Chen, Chong Deng, Huajun Zeng, Jing Chen, Dongfang Li and Buzhou Tang.
  • Learning Emotion-enriched Word Representations – Ameeta Agrawal, Aijun An and Manos Papagelis.
  • Learning Multilingual Topics from Incomparable Corpora – Shudong Hao and Michael J. Paul.
  • Learning Semantic Sentence Embeddings using Sequential Pair-wise Discriminator – Badri Narayana Patro, Vinod Kumar Kurmi, Sandeep Kumar and Vinay Namboodiri.
  • Learning to Progressively Recognize New Named Entities with Sequence to Sequence Model – Lingzhen Chen and Alessandro Moschitti.
  • Learning to Search in Long Documents Using Document Structure – Mor Geva and Jonathan Berant.
  • Learning Visually-Grounded Semantics from Contrastive Adversarial Samples – Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang and Jian Sun.
  • Learning What to Share: Leaky Multi-Task Network for Text Classification – Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang and Yaohui Jin.
  • Learning with Noise-Contrastive Estimation: Easing training by learning to scale – Matthieu Labeau and Alexandre Allauzen.
  • Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora – Amir Hazem and Emmanuel Morin.
  • Lexi: A tool for adaptive, personalized text simplification – Joachim Bingel, Gustavo Paetzold and Anders Søgaard.
  • Local String Transduction as Sequence Labeling – Joana Ribeiro, Shashi Narayan, Shay B. Cohen and Xavier Carreras.
  • Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models – Hussein Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin and Amit Sheth.
  • MCDTB: A Macro-level Chinese Discourse TreeBank – Feng Jiang, Sheng Xu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu and Guodong Zhou.
  • MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation – Meng Zou, Xihan Li, Haokun Liu and Zhihong Deng.
  • Modeling Multi-turn Conversation with Deep Utterance Aggregation – Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu and Hai Zhao.
  • Modeling the Readability of German Targeting Adults and Children: An empirically broad analysis and its cross-corpus validation – Zarah Weiß and Detmar Meurers.
  • Multi-layer Representation Fusion for Neural Machine Translation – Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li and Jingbo Zhu.
  • Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension – Liang Wang, Sujian Li, Wei Zhao, Kewei Shen, Meng Sun, Ruoyu Jia and Jingming Liu.
  • Multi-Source Multi-Class Fake News Detection – Hamid Karimi, Proteek Roy, Sari Saba-Sadiya and Jiliang Tang.
  • Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling – Ryo Masumura, Tomohiro Tanaka, Ryuichiro Higashinaka, Hirokazu Masataki and Yushi Aono.
  • Multi-task dialog act and sentiment recognition on Mastodon – Christophe Cerisara, Somayeh Jafaritazehjani, Adedayo Oluokun and Hoa T. Le.
  • Multi-Task Learning for Sequence Tagging: An Empirical Study – Soravit Changpinyo, Hexiang Hu and Fei Sha.
  • Multi-Task Neural Models for Translating Between Styles Within and Across Languages – Xing Niu, Sudha Rao and Marine Carpuat.
  • Narrative Schema Stability in News Text – Dan Simonson and Anthony Davis.
  • Natural Language Interface for Databases Using a Dual-Encoder Model – Ionel Alexandru Hosu, Radu Cristian Alexandru Iacob, Florin Brad, Stefan Ruseti and Traian Rebedea.
  • Neural Machine Translation Incorporating Named Entity – Arata Ugawa, Akihiro Tamura, Takashi Ninomiya, Hiroya Takamura and Manabu Okumura.
  • Neural Math Word Problem Solver with Reinforcement Learning – Danqing Huang, Jing Liu, Chin-Yew Lin and Jian Yin.
  • NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager – Idris Yusupov and Yurii Kuratov.
  • One vs. Many QA Matching with both Word-level and Sentence-level Attention Network – Lu Wang, Shoushan Li, Changlong Sun, Luo Si, Xiaozhong Liu, Min Zhang and Guodong Zhou.
  • Open-Domain Event Detection using Distant Supervision – Jun Araki and Teruko Mitamura.
  • Par4Sim — Adaptive Paraphrasing for Text Simplification – Seid Muhie Yimam and Chris Biemann.
  • Parallel Corpora for bi-lingual English-Ethiopian Languages Statistical Machine Translation – Michael Melese, Solomon Teferra Abate, Martha Yifiru Tachbelie, Million Meshesha, Wondwossen Mulugeta, Yaregal Assibie, Solomon Atinafu, Binyam Ephrem, Tewodros Abebe, Hafte Abera, Amanuel Lemma, Tsegaye Andargie, Seifedin Shifaw and Wondimagegnhue Tsegaye.
  • Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource – Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto and David Chiang.
  • Personalizing Lexical Simplification – John Lee and Chak Yan Yeung.
  • Pluralizing Nouns across Agglutinating Bantu Languages – Joan Byamugisha, C. Maria Keet and Brian DeRenzi.
  • Point Precisely: Towards Ensuring the Precision of Data in Generated Texts Using Delayed Copy Mechanism – Liunian Li and Xiaojun Wan.
  • Projecting Embeddings for Domain Adaption: Joint Modeling of Sentiment Analysis in Diverse Domains – Jeremy Barnes, Roman Klinger and Sabine Schulte im Walde.
  • Reading Comprehension with Graph-based Temporal-Casual Reasoning – Yawei Sun, Gong Cheng and Yuzhong Qu.
  • Real-time Change Point Detection using On-line Topic Models – Yunli Wang and Cyril Goutte.
  • Refining Source Representations with Relation Networks for Neural Machine Translation – Wen Zhang, Jiawei Hu, Yang Feng and Qun Liu.
  • Representation Learning of Entities and Documents from Knowledge Base Descriptions – Ikuya Yamada, Hiroyuki Shindo and Yoshiyasu Takefuji.
  • Reproducing and Regularizing the SCRN Model – Olzhas Kabdolov, Zhenisbek Assylbekov and Rustem Takhanov.
  • Responding E-commerce Product Questions via Exploiting QA Collections and Reviews – Qian Yu, Wai Lam and Zihao Wang.
  • ReSyf: a French lexicon with ranked synonyms – Mokhtar Boumedyen Billami, Thomas François and Nuria Gala.
  • Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations – Ben Lengerich, Andrew Maas and Christopher Potts.
  • Revisiting the Hierarchical Multiscale LSTM – Ákos Kádár, Marc-Alexandre Côté, Grzegorz Chrupała and Afra Alishahi.
  • Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging – Andrew Matteson, Chanhee Lee, Youngbum Kim and Heuiseok Lim.
  • Robust Lexical Features for Improved Neural Network Named-Entity Recognition – Abbas Ghaddar and Phillippe Langlais.
  • RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian – Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas and Alex Gribov.
  • Scoring and Classifying Implicit Positive Interpretations: A Challenge of Class Imbalance – Chantal van Son, Roser Morante, Lora Aroyo and Piek Vossen.
  • Semantic Parsing for Technical Support Questions – Abhirut Gupta, Anupama Ray, Gargi Dasgupta, Gautam Singh, Pooja Aggarwal and Prateeti Mohapatra.
  • Sensitivity to Input Order: Evaluation of an Incremental and Memory-Limited Bayesian Cross-Situational Word Learning Model – Sepideh Sadeghi and Matthias Scheutz.
  • Sentence Weighting for Neural Machine Translation Domain Adaptation – Shiqi Zhang and Deyi Xiong.
  • Seq2seq Dependency Parsing – Zuchao Li, Jiaxun Cai, Shexia He and Hai Zhao.
  • Sequence-to-Sequence Learning for Task-oriented Dialogue with Dialogue State Representation – Haoyang Wen, Yijia Liu, Wanxiang Che, Libo Qin and Ting Liu.
  • SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors – Luis Espinosa Anke and Steven Schockaert.
  • Simple Neologism Based Domain Independent Models to Predict Year of Authorship – Vivek Kulkarni, Yingtao Tian, Parth Dandiwala and Steve Skiena.
  • Sliced Recurrent Neural Networks – Zeping Yu and Gongshen Liu.
  • SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions – Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney and Nazli Goharian.
  • Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language – He Bai, Yu Zhou, Jiajun Zhang, Liang Zhao, Mei-Yuh Hwang and Chengqing Zong.
  • Stance Detection with Hierarchical Attention Network – Qingying Sun, Zhongqing Wang, Qiaoming Zhu and Guodong Zhou.
  • Structured Representation Learning for Online Debate Stance Prediction – Chang Li, Aldo Porco and Dan Goldwasser.
  • Style Detection for Free Verse Poetry from Text and Speech – Timo Baumann, Hussein Hussein and Burkhard Meyer-Sickendiek.
  • Style Obfuscation by Invariance – Chris Emmery, Enrique Manjavacas Arevalo and Grzegorz Chrupała.
  • Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings – Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong and Fang Chen.
  • Synonymy in Bilingual Context: The CzEngClass Lexicon – Zdenka Uresova, Eva Fucikova, Eva Hajicova and Jan Hajic.
  • Tailoring Neural Architectures for Translating from Morphologically Rich Languages – Peyman Passban, Andy Way and Qun Liu.
  • Task-oriented Word Embedding for Text Classification – Qian Liu, Heyan Huang, Yang Gao, Xiaochi Wei, Yuxin Tian and Luyang Liu.
  • The APVA-TURBO Approach To Question Answering in Knowledge Base – Yue Wang, Richong Zhang, Cheng Xu and Yongyi Mao.
  • Toward Better Loanword Identification in Uyghur Using Cross-lingual Word Embeddings – Chenggang Mi, Yating Yang, Lei Wang, Xi Zhou and Tonghai Jiang.
  • Towards a Language for Natural Language Treebank Transductions – Carlos A. Prolo.
  • Towards an argumentative content search engine using weak supervision – Ran Levy, Ben Bogin, Shai Gretz, Ranit Aharonov and Noam Slonim.
  • Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources – Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou and Christian Viard-Gaudin.
  • Transfer Learning for Entity Recognition of Novel Classes – Juan Diego Rodriguez, Adam Caldwell and Alexander Liu.
  • Twitter corpus of Resource-Scarce Languages for Sentiment Analysis and Multilingual Emoji Prediction – Nurendra Choudhary, Rajat Singh, Vijjini Anvesh Rao and Manish Shrivastava.
  • Urdu Word Segmentation using Conditional Random Fields (CRFs) – Haris Bin Zia, Agha Ali Raza and Awais Athar.
  • User-Level Race and Ethnicity Predictors from Twitter Text – Daniel Preoţiuc-Pietro and Lyle Ungar.
  • Using Formulaic Expressions in Writing Assistance Systems – Kenichi Iwatsuki and Akiko Aizawa.
  • Using Word Embeddings for Unsupervised Acronym Disambiguation – Jean Charbonnier and Christian Wartena.
  • Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps – Nobuyuki Shimizu, Na Rong and Takashi Miyazaki.
  • Vocabulary Tailored Summary Generation – Kundan Krishna, Aniket Murhekar, Saumitra Sharma and Balaji Vasan Srinivasan.
  • What’s in Your Embedding, And How It Predicts Task Performance – Anna Rogers, Shashwath Hosur Ananthakrishna and Anna Rumshisky.
  • Who Feels What and Why? Annotation of a Literature Corpus with Semantic Roles of Emotions – Evgeny Kim and Roman Klinger.
  • Why does PairDiff work? – A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection – Huda Hakami, Kohei Hayashi and Danushka Bollegala.
  • WikiRef: Wikilinks as a route to recommending appropriate references for scientific Wikipedia pages – Abhik Jana, Pranjal Kanojiya, Pawan Goyal and Animesh Mukherjee.
  • Word Sense Disambiguation Based on Word Similarity Calculation Using Word Vector Representation from a Knowledge-based Graph – Dongsuk O, Sunjae Kwon, Kyungsun Kim and Youngjoong Ko.

COLING 2018 Best papers

There are multiple categories of award at COLING 2018, as we laid out in an earlier blog post. We received 44 nominations for best papers over ten categories, and conferred best paper awards in the categories as follows:

  • Best error analysis: SGM: Sequence Generation Model for Multi-label Classification, by Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang.
  • Best evaluation: SGM: Sequence Generation Model for Multi-label Classification, by Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang.
  • Best linguistic analysis: Distinguishing affixoid formations from compounds, by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert
  • Best NLP engineering experiment: Authorless Topic Models: Biasing Models Away from Known Structure, by Laure Thompson and David Mimno
  • Best position paper: Arguments and Adjuncts in Universal Dependencies, by Adam Przepiórkowski and Agnieszka Patejuk
  • Best reproduction paper: Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering, by Wuwei Lan and Wei Xu
  • Best resource paper: AnlamVer: Semantic Model Evaluation Dataset for Turkish – Word Similarity and Relatedness, by Gökhan Ercan and Olcay Taner Yıldız
  • Best survey paper: A Survey on Open Information Extraction, by Christina Niklaus, Matthias Cetto, André Freitas and Siegfried Handschuh
  • Most reproducible: Design Challenges and Misconceptions in Neural Sequence Labeling, by Jie Yang, Shuailong Liang and Yue Zhang

Note that, as announced last year, in the interest of open science and reproducibility, COLING 2018 did not confer best paper awards on papers whose code or resources could not be made publicly available by camera-ready time. This means you can ask the best paper authors for associated data and programs right now, and they should be able to provide you with a link.

In addition, we would like to note the following papers as “Area Chair Favorites”, which were nominated by reviewers and recognised as excellent by chairs.

  • Visual Question Answering Dataset for Bilingual Image Understanding: A study of cross-lingual transfer using attention maps. Nobuyuki Shimizu, Na Rong and Takashi Miyazaki
  • Using J-K-fold Cross Validation To Reduce Variance When Tuning NLP Models. Henry Moss, David Leslie and Paul Rayson
  • Measuring the Diversity of Automatic Image Descriptions. Emiel van Miltenburg, Desmond Elliott and Piek Vossen
  • Reading Comprehension with Graph-based Temporal-Causal Reasoning. Yawei Sun, Gong Cheng and Yuzhong Qu
  • Diachronic word embeddings and semantic shifts: a survey. Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski and Erik Velldal
  • Transfer Learning for Entity Recognition of Novel Classes. Juan Diego Rodriguez, Adam Caldwell and Alexander Liu
  • Joint Modeling of Structure Identification and Nuclearity Recognition in Macro Chinese Discourse Treebank. Xiaomin Chu, Feng Jiang, Yi Zhou, Guodong Zhou and Qiaoming Zhu
  • Unsupervised Morphology Learning with Statistical Paradigms. Hongzhi Xu, Mitchell Marcus, Charles Yang and Lyle Ungar
  • Challenges of language technologies for the Americas indigenous languages. Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra and Ivan Meza-Ruiz
  • A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis. Yicheng Zou, Tao Gui, Qi Zhang and Xuanjing Huang
  • From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources. Ilia Kuznetsov and Iryna Gurevych
  • The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities. Marco Del Tredici and Raquel Fernández
  • Relation Induction in Word Embeddings Revisited. Zied Bouraoui, Shoaib Jameel and Steven Schockaert
  • Learning with Noise-Contrastive Estimation: Easing training by learning to scale. Matthieu Labeau and Alexandre Allauzen
  • Stress Test Evaluation for Natural Language Inference. Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose and Graham Neubig
  • Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs. Wenpeng Yin, Yadollah Yaghoobzadeh and Hinrich Schütze
  • SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions. Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney and Nazli Goharian
  • Automatically Extracting Qualia Relations for the Rich Event Ontology. Ghazaleh Kazeminejad, Claire Bonial, Susan Windisch Brown and Martha Palmer
  • What represents “style” in authorship attribution?. Kalaivani Sundararajan and Damon Woodard
  • SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors. Luis Espinosa Anke and Steven Schockaert
  • GenSense: A Generalized Sense Retrofitting Model. Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Yow-Ting Shiue and Hsin-Hsi Chen
  • A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task. Qian Li, Ziwei Li, Jin-Mao Wei, Yanhui Gu, Adam Jatowt and Zhenglu Yang
  • Abstract Meaning Representation for Multi-Document Summarization. Kexin Liao, Logan Lebanoff and Fei Liu
  • Cooperative Denoising for Distantly Supervised Relation Extraction. Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan and Ying Shen
  • Dialogue Act Driven Conversation Model: An Experimental Study. Harshit Kumar, Arvind Agarwal and Sachindra Joshi
  • Dynamic Multi-Level, Multi-Task Learning for Sentence Simplification. Han Guo, Ramakanth Pasunuru and Mohit Bansal
  • A Knowledge-Augmented Neural Network Model for Implicit Discourse Relation Classification. Yudai Kishimoto, Yugo Murawaki and Sadao Kurohashi
  • Abstractive Multi-Document Summarization using Paraphrastic Sentence Fusion. Mir Tafseer Nayeem, Tanvir Ahmed Fuad and Yllias Chali
  • They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking. Ethan Zhou and Jinho D. Choi
  • A Comparison of Transformer and Recurrent Neural Networks on Multilingual NMT. Surafel Melaku Lakew, Mauro Cettolo and Marcello Federico
  • Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media. Isabel Cachola, Eric Holgate, Daniel Preoţiuc-Pietro and Junyi Jessy Li
  • On Adversarial Examples for Character-Level Neural Machine Translation. Javid Ebrahimi, Daniel Lowd and Dejing Dou
  • Neural Transition-based String Transduction for Limited-Resource Setting in Morphology. Peter Makarov and Simon Clematide
  • Structured Dialogue Policy with Graph Neural Networks. Lu Chen, Bowen Tan, Sishan Long and Kai Yu

We would like to recognise with exceptional thanks our best paper committee.

Acceptance rate

As we noted in a previous post, the acceptance rate is an important metric of competitiveness for authors with accepted papers.

… for individual researchers, especially those employed in or hoping to be employed in academia, acceptance of papers to COLING and similar venues is very important for job prospects/promotion/etc. Furthermore, it isn’t simply a matter of publishing in peer-reviewed venues, but in high-prestige, competitive venues. Where the validation view of peer review would view it as binary question (does this paper make a validatable contribution or not?), the prestige view instead speaks to ranking—where we end up with best papers, strong papers, borderline papers that get in, borderline papers that don’t get in, and papers that were easy to decide to reject. (And, for full disclosure, it is in the interest of a conference to strive to become and maintain status as a high-prestige, competitive venue.)

Not surprisingly, we’ve received several requests for the acceptance rate for COLING 2018. It turns out that determining that number is not straightforward. We initially had 1017 submissions, but some of those (129) were withdrawn, either early in the process (the authors never in fact completed the paper) or later, usually in light of acceptance at another venue, per the COLING 2018 dual submission policy. The denominator for our acceptance rate excludes these papers as it hardly seems fair to include papers that either weren’t reviewed, or were withdrawn because they were accepted elsewhere. Conversely, we decided to include the papers desk rejected (n=33) in the denominator.

With a total of 332 papers accepted for publication, that gives an acceptance rate of 37.4%.
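The arithmetic behind that figure, using the numbers reported above, can be checked in a couple of lines:

```python
# Acceptance-rate arithmetic, using the figures reported in this post.
submissions = 1017    # initial submissions
withdrawn = 129       # withdrawn or never completed; excluded from the denominator
desk_rejected = 33    # desk rejects; deliberately kept in the denominator
accepted = 332        # papers accepted for publication

denominator = submissions - withdrawn
rate = accepted / denominator
print(f"{accepted}/{denominator} = {rate:.1%}")  # 332/888 = 37.4%
```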

PC chairs report back: On the effectiveness of author response

The utility of the author response part of the conference review process is hotly debated. At COLING 2018, we decided to have the author response be addressed only to the area chairs (and PC co-chairs), and not the reviewers. The purpose of this blog post is to report back on our experience with this model (largely positive, from the PC perspective!) and also to share with the community what we have learned, inhabiting this role, about what makes an effective author response.

For background, here is a description of the decision making process at the PC level. Keep in mind that COLING 2018 received 1017 submissions, of which 880 were still ‘active’ at the point of these decisions.  (The difference is a combination of desk rejects and papers withdrawn, the latter mostly in light of acceptance to other venues with earlier notifications.)

Outline of our process

Final accept/reject decisions for COLING 2018 were made as follows:

We asked the ACs for each area to provide a ranking of the papers in their area and to indicate recommendations of accept, maybe accept, maybe reject, or reject. We specifically instructed the ACs not to use the reviewer scores to sort the papers, but rather to come to their own ranking based on their judgment, given the reviews, the discussion among reviewers, the author responses, and (where necessary) reading the papers.

Our role as PCs was to turn those recommendations into decisions. To do so, we first looked at each area’s report and determined which papers had clear recommendations and which were borderline.  For the former, we went with the AC recommendations directly. The borderline cases were either papers that the ACs marked as ‘maybe accept’ or ‘maybe reject’, or, for areas that only used ‘accept’ and ‘reject’, the last two ‘accept’ papers and the first two ‘reject’ papers in the ACs’ ranking. This gave us a bit over 200 papers to consider.

We divided the areas into two sets, one for each of us. (We were careful at this point to put the areas containing papers with which one of us had COIs into the other PC’s stack.) Area by area, we looked at the borderline papers, considering the reviews, the reviewer discussion (if any), the author response, comments from the ACs, and sometimes the papers (to clarify particular points; we didn’t read the papers in full). Although the PC role on START allows us to see the authors of all submissions, we worked out ways to look at all the information we needed to do this without seeing the author names (or institutions, etc).

Of the 200 or so papers we looked at, there were 23 for which we wanted to have further discussion. This was done over Skype, despite the 9 hour time difference! These papers were evenly distributed between Emily’s and Leon’s areas, but clustered towards the start of each of our respective stacks; our analysis is that as we worked our way through the process, we each gained a better sense of how to make the decisions and found less uncertainty. (Discussion of COI papers was done with the General Chair, Pierre Isabelle, not the other PC, per our COI policy.)

As a final step to verify data entry (to make sure that what was entered in START actually matched our intentions), we went through and looked at both the accepted papers with the lowest reviewer scores and the rejected papers with the highest reviewer scores: 98 papers with an average score of 3 or higher were rejected, and 27 papers with an average score lower than 3 were accepted. (Remember, it's not just about the numbers!) For each of these, we went back to our notes to check that the right information was entered (it was), and in so doing we found that, for the majority of the papers accepted despite low reviewer scores (and correspondingly harsh reviews), our notes reflected effective author responses. This is furthermore consistent with our subjective sense that the author responses really did make a difference for the difficult decisions, that is, the papers we were looking at.

What makes an effective author response?

The effective author responses all had certain characteristics in common. They were written in a tone that was respectful, calm and confident (but not arrogant). They had specific answers to reviewers’ specific questions or specific replies to reviewers’ criticisms. For example, if a reviewer pointed out that a paper failed to discuss important related work, an effective author response would either acknowledge the omission and indicate that it will be addressed in the final version, or clearly state why the indicated paper isn’t in fact relevant. Effective author responses to reviewer questions about points that aren’t clear were short and to the point (and specific). This gave us confidence that the answers would be incorporated in the final version. In many cases, authors related the results of experiments they hadn’t had space for, or ran the analyses during the response period; this is much more effective than an ephemeral promise to add the content. Author responses could also be effective in indicating that reviewers misunderstood key points of the paper or the background into which it fits, but only if they were written in the calm, confident tone mentioned above.

Many effective author responses also expressed gratitude for the reviewers’ feedback. This was nice to see, but it wasn’t a problem when it wasn’t there.

What makes an ineffective author response?

Ineffective author responses, on the other hand, seemed to be written from a place of anger. We understand where authors are coming from when this happens! Reviews, especially negative reviews, can sting. But an author response that comes across as angry, condescending, or combative is not effective at persuading the ACs & PCs that the reviewers have things the wrong way around, nor does it provide good evidence that the paper will be improved for the camera-ready version.

Best practices for writing author responses

Here we try to distill our experience of reading the author responses for ~200 papers (not all papers had them, but most did) into some helpful tips.

For conference organizers

We definitely recommend setting up an author response process, but having the author responses go to the ACs (and PCs) only, not the reviewers.  Two ways to improve on what we did:

  • Clarify the word count constraints better than we did. We asked for no more than 400 words total, but the way START enforced that was no more than 400 words per review (since there were separate author response boxes for each review).
  • Don’t make the mistake we made of directing authors who wanted to submit a late author response to their ACs; in the very small number of cases where that happened, it compromised the anonymity of authors to ACs.

For authors

  • Read the reviews and write the angry version. Then set it aside and write a calmer one.
  • If you can, show your author response to someone who will read it for you and let you know where it sounds angry/arrogant/petty.
  • Try starting with “Thank you for the helpful feedback”—this isn’t necessary, and you can edit it out afterwards for space, but it might help you get off on the right foot regarding tone.
  • Don’t play the reviewers off each other (“R1 says this paper is hard to read, but that’s clearly wrong, because R2 said it was easy to follow.”) Rest assured that the ACs will read all of the reviews; they’ll have seen R2’s comments too.
  • Similarly, don’t feel obliged to reply to everything in the reviews. General negative comments (e.g. “I found this paper hard to read”) don’t require a response and there probably isn’t a response that would be helpful. Either the paper really is unclear or the reviewer doesn’t have sufficient background / didn’t leave enough time to read the paper carefully. Which scenario this is will likely be evident from the rest of the reviews and the author response.
  • Don’t promise the moon and the stars in the final version. It’s hard to accept a borderline paper based on promises alone.
  • Do indicate specific answers to key questions, in a way that is obviously easily incorporated in the final version. (And in that case it’s fine to say “We will add clarification along these lines”, or similar.)
  • Do concisely demonstrate mastery of the area, if reviewers probe issues you have considered during your research and you have the answers to hand.
  • Don’t play games with the word count. We saw two author responses where the authors got around the software’s restriction to 400 words (per box!) by_joining_whole_sentences_with_underscores. This does not make a good impression.

Ultimately, even a calm and confident author response doesn’t necessarily push a paper on the borderline over into accept. Sometimes the paper just isn’t ready and it’s not reasonable to try to fix what needs fixing or add what needs adding for the final version. Nonetheless, we found that the above patterns do make author responses more effective, and so we wanted to share them.

COLING schedule construction: Next steps

We are proud to have sent out the acceptance notifications for COLING 2018 ahead of schedule! But, our work as chairs is not done. Here are our next steps:

Program construction

We have prepared a schedule “frame”, with plenary sessions (opening, keynotes, best papers, closing), parallel sessions (talks and posters), all fit in around coffee breaks, lunch and the excursions. Our task now is to group the accepted papers into coherent talk and poster sessions. In doing so, we will consider:

  • Author preferences (as indicated in START)
  • Area chair recommendations
  • Thematic coherence of sessions
  • Suitability of each topic for each format

Our goal is to have the program constructed by June 13. That timing is partially dependent on the best paper award process, outlined below.

Planning ahead

In an event of this size, it is inevitable that some number of presenters may be unable to attend at the last minute. In that case, we hope that speakers will be able to arrange to present remotely (per the inclusion policy).  If that is not possible, and an oral presentation is being pulled, we will seek to replace it with the most thematically similar poster available.

Best paper awards

We have 10 award categories, the 9 listed in our previous post on this topic, plus ‘Best error analysis’, which we really should have thought of initially! We have 11 scholars who have agreed to be on this committee. And we have 41 papers which have been nominated, each for one of the specific awards.

We will shortly be creating subcommittees of the best paper committee to consider each award. Each award will be considered by two committee members and most committee members will be working on two award types. The exception is the “Best NLP engineering experiment” award, as that award type has the most nominations (being the most common paper type among our submissions). The committee members working on that type will focus only on it. We are open to the possibility that some awards may go unallocated (if this is warranted) and also that a paper may end up with a different award than the one it was nominated for.

Timeline

May 17: Nominated papers to best paper committee
June 1: Each subcommittee reports to the whole BPC with their nomination and a handful of alternates; the BPC then discusses results
June 8: The committee confirms up to ten best paper awards for nomination to the PC co-chairs
June 13: Best papers confirmed, and authors notified

Anonymity

In order to preserve anonymity in the best paper award selection process, we will not post the list of accepted papers until the selection is done. Individual authors are of course free at this point to post their own information, but we trust our best paper committee won’t go hunting for it.

Availability

As mentioned in our requirements post, only papers that have made the resources/code publicly available by camera ready time will be considered for best paper awards; those that rely on code or data, but haven’t made it available, will be taken out of the running.

Best paper committee

Our responsive, expert committee members are:

Publication preparation

Going from drafts to papers in proceedings is a massive undertaking, for you and for us. Our hard-working publication chairs, Xiaodan Zhu and Zhiyuan Liu, are directing and supporting the process of getting hundreds of main-conference papers (and later, hundreds more workshop papers) into a form where they can be easily and freely downloaded by anyone. This collection of published papers is a huge part of the output of COLING. Creating them involves getting the proceedings to compile properly, which, as you may know from experience, is tough enough for a single paper, let alone 300+ in one volume. So please support them in this critical, painstaking work by getting your paper as tight and well-formatted as possible.

A window into the decision process

We are aware that the decision process for a large conference like COLING can be quite opaque from the point of view of authors, especially those who have not served in the role of AC or PC in the past. In this post, we aim to demystify a bit what we are doing (and why it takes so long from submission to decision!). As always, our belief is that more transparency leads to a better process—as we are committed to doing what we lay out, and what we lay out should be justified in this writing—and to a better understanding of the outcomes.

Timeline

Many of our authors are probably aware that reviews were due on April 10, and reviews are seen as the primary determinant of acceptance, so you might well wonder why you won’t be hearing about acceptance decisions until May 17. What could possibly take so long?

We (the PC co-chairs) met in Seattle last July to lay out a detailed timeline, making sure to build in time for careful decision making and also to allow for buffers to handle the near-certainty that some things would go wrong.  The portion between April 10 and May 17 looks like this:

April 10 Reviews due
April 11 ACs request reviewer discussion, chase missing reviews
April 15 Reviewer discussion ends
April 16 ACs request fixes to problematic reviews (too short, inappropriate tone)
April 19 Deadline for reviews to be updated based on AC feedback
April 20 Reviews available to authors; author response begins
April 25 Author response ends
April 26 AC discussion starts
May 3 Reviewer identities revealed to co-reviewers
May 4 AC recommendations due to PC co-chairs
May 16 Signatures revealed for signed reviews
May 17 Acceptance notifications

As you can see, the time between the initial deadline for reviews and the final acceptance notification is largely dedicated to two things: making sure all reviews are present and appropriate, and leaving time for thoughtful consideration by both ACs and PC co-chairs in the decision making process.

Of course, not everything goes according to plan. As of April 25, we still have a handful of missing or incomplete reviews. In many of these cases, ACs (including our Special Circumstances ACs) are stepping in to provide the missing reviews. That this can be done blind is another benefit of keeping author identity from the ACs! (It’s not quite double blind, as authors can probably work out who the ACs are for their track, but that direction is less critical in this case.)

How did we end up with missing reviews? In some cases, this was not the fault of the reviewers at all. There were a handful of cases where START had the wrong email addresses for committee members, and we only discovered this when the ACs emailed the committee members from outside START—only to discover they hadn’t received their assignments! In other cases, committee members agreed to review and submitted bids and then didn’t turn in their reviews. While we absolutely understand that things come up, in the case that someone can’t complete their reviewing assignment, the best course of action in terms of minimizing impact on others (authors, other reviewers asked to step in, and the ACs/PCs managing the process) is just to communicate this fact as soon as possible.

Instructions to ACs

In our very first post to this PC blog we laid out our goals for the COLING 2018 program:

Our goals for COLING 2018 are (1) to create a program of high quality papers which represent diverse approaches to and applications of computational linguistics written and presented by researchers from throughout our international community; (2) to facilitate thoughtful reviewing which is both informative to ACs (and to us as PC co-chairs) and helpful to authors; and (3) to ensure that the results published at COLING 2018 are as reproducible as possible.

The process by which reviews are turned into acceptance decisions is a key part of the first of those goals (but not the only part—recruiting a strong, diverse pool of submissions was a key first step, as well as the design of the review process). Accordingly, these are the directions we have given to ACs, as they consider each paper in their area:

Please, please do not simply rank papers by overall score. Three reviewers is just not enough to get a reliable estimate of a paper’s quality. Maybe one reviewer didn’t read the paper, another one didn’t understand it and reacted poorly, and a final reviewer always gives negative scores; maybe one reviewer has warped priorities and another doesn’t know the area as well. There’s too much individual variance for a tiny number of reviewers (i.e. 3) to precisely judge a paper.

In fact, don’t even sort papers like this to start out with; glancing at that list will unconsciously bias perception of the papers and that’ll mean poor decisions. Save yourself: don’t let knowledge of that ranking make a nuanced review go unread.

However, as an area chair, you know your area well, and have good ideas of the technical merits of individual works in that area. You should understand the technical content when needed and be able to judge the reviews’ quality for yourself. Once the scores are in, you’ll also have a good idea of which reviewers generally grade low (or high).

Try to order the papers in such a way that the ones you like most are at the top, the ones that shouldn’t appear are at the bottom, and each paper is preferable to the one below it. You can split this work with your co-AC as you prefer; some will take half the papers each and then merge, but if you do this, it’s important to realise that the split won’t be perfect: you won’t be able to interleave the resulting rankings one-by-one. In any event, both you and your co-AC must explicitly agree on the final ranking.

Use the reviews and author feedback as the evidence for the ranking, and be sure and confident about every decision. If you’re not yet confident, there are a few options: ask the reviewers to clarify, or to examine a point; ask your co-AC for their opinion; find another reviewer for an extra opinion, if this can be done quickly; or ask us to send over resources.

Once you have an ordering, think about which of that set you’d recommend for acceptance, and send us the rankings along with your recommendations. You should also write a short report on your area, covering the process and the trends you saw there. Between you and your co-AC, this should be around 100-500 words.

As you can see, we are emphasizing holistic understanding of the merits of each paper, and de-emphasizing the numerical scores. Which brings up the obvious question: Why not rely on the scores?

It’s not just about the scores

Scoring is far too unreliable to be used as an acceptance recommendation on its own. We have only three reviewers per paper, each biased in their own way. You won’t get good statistics from a sample of three, and we don’t expect to. This isn’t the reviewers’ fault; it’s just plain statistics. Rather, each review has to be considered on its own terms: the reviewer’s overall bias, their expertise, and how well they understood the paper.
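As a toy illustration of that statistical point (our own sketch, not part of the post; the quality values and noise level are invented), we can simulate three noisy 1-5 review scores per paper and count how often the genuinely better of two papers ends up with the lower average:

```python
import random

random.seed(0)  # reproducible toy example

def observed_avg(true_quality, n_reviews=3, noise_sd=0.8):
    """Average of n noisy 1-5 scores around a paper's (hypothetical) true quality."""
    scores = [min(5.0, max(1.0, true_quality + random.gauss(0, noise_sd)))
              for _ in range(n_reviews)]
    return sum(scores) / n_reviews

trials, flips = 10_000, 0
for _ in range(trials):
    # Paper A (3.6) is genuinely better than paper B (3.2)...
    if observed_avg(3.6) < observed_avg(3.2):
        flips += 1  # ...but the 3-review average ranked B above A

print(f"misranked in {flips / trials:.0%} of trials")
```

Under these made-up assumptions the weaker paper wins on average score a substantial fraction of the time, which is exactly why sorting by score is discouraged.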

So, in the words of Jason Eisner, from his fantastic “How to Serve as Program Chair of a Conference” guide:

How not to do it: Please, please, please don’t just sort the papers by the 3 reviewers’ average overall recommendation! There is too much variance in these scores for n=3 to be a large enough sample. Maybe reviewer #1 tends to give high scores to everyone, reviewer #2 has warped priorities, and reviewer #3 barely read the paper or barely knows the area. Whereas another paper drew a different set of 3 reviewers.

How still not to do it: Even as a first step, don’t sort the papers by average recommendation. Trust me — this noisy and uncalibrated ranking isn’t even a good way to triage the papers into likely accepts, likely rejects, and borderline papers that deserve a closer look. Don’t risk letting it subtly influence the final decisions, or letting it doom some actual, nuanced reviews to go unread.

What I told myself: When you’re working with several hundred papers, a single paper with an average score of 3.8 may seem to merit only a shrug and a coin flip. But a single false negative might harm a poor student’s confidence, delay her progress to her next project, or undermine her advisor’s grant proposal or promotion case. Conversely, a single false positive wastes the time of quite a lot of people in your audience.

Doing this step fairly, then, for the 872 papers that remain undecided requires considerable effort.

The dual role of peer reviewed conferences

As we (the PC co-chairs) work to oversee this process and then construct a final program out of AC recommendations, we are mindful of the dual role that a full-paper peer-review conference like COLING 2018 is playing.

On the one hand, peer review is meant to be an integral part of the process of doing science. If something is published in a peer-reviewed venue, that is an indication that it has been read critically by a set of reviewers and found to make a worthwhile contribution to the field of inquiry. This doesn't ensure that it is correct, or even that most people up-to-date with the field would find it reliable, but it is an indication of scientific value. (This is all the more difficult in interdisciplinary fields, as we discussed in an earlier blog post.) This aspect of peer review fits well with the interests of the conference audience as stakeholders: the audience benefits from having vetted papers curated for them at the event.

On the other hand, for individual researchers, especially those employed in or hoping to be employed in academia, acceptance of papers to COLING and similar venues is very important for job prospects/promotion/etc. Furthermore, it isn't simply a matter of publishing in peer-reviewed venues, but in high-prestige, competitive venues. Where the validation view of peer review treats it as a binary question (does this paper make a valid contribution or not?), the prestige view instead speaks to ranking—where we end up with best papers, strong papers, borderline papers that get in, borderline papers that don't get in, and papers that were easy to decide to reject. (And, for full disclosure, it is in the interest of a conference to strive to attain and maintain status as a high-prestige, competitive venue.)

While understanding our role in the validation aspect of peer review, we are indeed viewing it as a ranking rather than a binary process, for several reasons. First, reviewers are human, and it is simply not the case that any group of 3-5 humans can definitively decide whether any given paper (roughly in their field) is 'valid' or 'invalid' as a scientific contribution. Second, even if we did have a perfect oracle for validity, it's not the case that the number of available spots in a given conference will perfectly match the number of 'valid' papers among the submissions. When there are more worthy papers than spots, decisions have to be made somehow—and we believe that somehow should include both measures of degree of interest in the paper and overall diversity of approaches and topics in the program. (Conversely, we will not be aiming to 'fill up' a certain number of spots just because we have them.) Finally, we work with the understanding that COLING is not the only conference available, and that authors whose work is not accepted to COLING will in most cases be able to improve the presentation and/or underlying methodology and submit to another conference.

That ranking is ultimately binarized into accept/reject (modulo best paper awards) and we understand (and have our own personal experiences with!) the way that a paper rejection can seem to convey: ‘this research is not valid/not worthy.’ Or alternatively, that authors with relatively high headline scores on a paper that is nonetheless rejected might feel that the ‘true’ or ‘correct’ result for their paper was overridden by the ACs or PC. But we hope that this blog post will help to dispel those notions by providing a broader view of the process.

PC process once we have the AC reports

Once the ACs provide us with their rankings and reports, on May 4, we (PC co-chairs) will have the task of building from them a (nearly) complete conference program—the one outstanding piece will be the selection of best papers from among the accepted papers. Ahead of time, we have blocked out a ‘frame’ for the overall program so we have upper limits on how many oral presentations and poster presentations we can accept.

As a first step, we will look to see how the total acceptance recommendations of the ACs compare to the total number of spots available. However, it is not our role to simply accept the ACs' recommendations, but rather to review them and ensure that the decisions as a whole are consistent (to the extent feasible, given that the whole process is noisy) and that the resulting program meets our goals of diversity in regard to topics and approaches (again, to the extent feasible, given the submission pool). We have also asked ACs to recommend a mode of presentation (oral, poster), with the understanding that oral presentations are not 'better papers' than posters, but rather that some topics are more likely to be successful in each mode of presentation.

Though the author identities have been hidden from ACs, they haven’t been hidden from us. Nonetheless, as we work with the AC reports, we will have paper numbers & titles (but not author lists) to work from and will not go out of our way to associate author identities. Furthermore, the final accept/reject decisions for any papers that either of us have a COI with will be handled by the other PC co-chair together with the conference GC.

Review statistics

So far, our review process at COLING has given us many things to measure. Here are a few.

Firstly, it's interesting to see how many reviewers recommended that the authors cite the reviewer's own work. We can't evaluate how appropriate these requests were, but they appeared in 68 out of 2806 reviews (2.4%).

Best paper nominations are quite rare in general, which gives the best paper committee very little signal to work with. To gain more information, in addition to asking whether a paper warranted further recognition, we asked reviewers to say whether a given paper was the best of those they had reviewed. This worked well for 747 reviewers, but 274 reviewers (26.8%) said that no paper in their reviewing allocation was the best.
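As a quick sanity check on those figures (assuming the 747 and 274 counts together cover every reviewer who answered the best-paper question), the reported percentages can be recomputed directly:

```python
# Share of reviews recommending that the authors cite the reviewer.
self_citation_rate = 68 / 2806
print(f"{self_citation_rate:.1%}")  # 2.4%

# Share of reviewers who said no paper in their allocation was the best.
no_best_rate = 274 / (747 + 274)
print(f"{no_best_rate:.1%}")  # 26.8%
```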

Mean scores and confidence can be broken down by paper type, as follows.

Paper type                                  Mean score   Mean confidence
Computationally-aided linguistic analysis      2.85           3.42
NLP engineering experiment paper               2.86           3.51
Position paper                                 2.41           3.36
Reproduction paper                             2.92           3.54
Resource paper                                 2.76           3.50
Survey paper                                   2.93           3.58

We can see that reviewers were least confident with position papers, and were both most confident and most pleased with survey papers—though reproduction papers came in a close second in regard to mean score. This fits the general expectation that position papers are hard to evaluate.

The overall distribution of scores follows.