COLING 2018 Accepted Papers

Here is the list of papers accepted at COLING 2018, to appear in Santa Fe. We withheld this list until the best paper selection was complete, so that committee members could choose the awards without knowing the identities of the paper authors.

Congratulations to all authors of accepted papers; we look forward to seeing you in New Mexico!

  • A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation – Surafel Melaku Lakew, Mauro Cettolo and Marcello Federico.
  • A Computational Model for the Linguistic Notion of Morphological Paradigm – Miikka Silfverberg, Ling Liu and Mans Hulden.
  • A Knowledge-Augmented Neural Network Model for Implicit Discourse Relation Classification – Yudai Kishimoto, Yugo Murawaki and Sadao Kurohashi.
  • A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis – Yicheng Zou, Tao Gui, Qi Zhang and Xuanjing Huang.
  • A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task – Qian Li, Ziwei Li, Jin-Mao Wei, Yanhui Gu, Adam Jatowt and Zhenglu Yang.
  • A New Approach to Animacy Detection – Labiba Jahan, Geeticka Chauhan and Mark Finlayson.
  • A New Concept of Deep Reinforcement Learning based Augmented General Tagging System – Yu Wang, Abhishek Patel and Hongxia Jin.
  • A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis – Shuqin Gu, Lipeng Zhang, Yuexian Hou and Yin Song.
  • A Practical Incremental Learning Framework For Sparse Entity Extraction – Hussein Al-Olimat, Steven Gustafson, Jason Mackay, Krishnaprasad Thirunarayan and Amit Sheth.
  • A Prospective-Performance Network to Alleviate Myopia in Beam Search for Response Generation – Zongsheng Wang, Yunzhi Bai, Bowen Wu, Zhen Xu, Zhuoran Wang and Baoxun Wang.
  • A Reinforcement Learning Framework for Natural Question Generation using Bi-discriminators – Zhihao Fan, Zhongyu Wei, Siyuan Wang, Yang Liu and Xuanjing Huang.
  • A Retrospective Analysis of the Fake News Challenge Stance-Detection Task – Andreas Hanselowski, Avinesh PVS, Benjamin Schiller, Felix Caspelherr, Debanjan Chaudhuri, Christian M. Meyer and Iryna Gurevych.
  • Ab Initio: Automatic Latin Proto-word Reconstruction – Alina Maria Ciobanu and Liviu P. Dinu.
  • Abstract Meaning Representation for Multi-Document Summarization – Kexin Liao, Logan Lebanoff and Fei Liu.
  • Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion – Mir Tafseer Nayeem, Tanvir Ahmed Fuad and Yllias Chali.
  • Adopting the Word-Pair-Dependency-Triplets with Individual Comparison for Natural Language Inference – Qianlong Du, Chengqing Zong and Keh-Yih Su.
  • Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems – Van-Khanh Tran and Le-Minh Nguyen.
  • Adversarial Multi-lingual Neural Relation Extraction – Xiaozhi Wang, Xu Han, Yankai Lin, Zhiyuan Liu and Maosong Sun.
  • Aff2Vec: Affect–Enriched Distributional Word Representations – Sopan Khosla, Niyati Chhaya and Kushal Chawla.
  • All-in-one: Multi-task Learning for Rumour Verification – Elena Kochkina, Maria Liakata and Arkaitz Zubiaga.
  • An Attribute Enhanced Domain Adaptive Model for Cold-Start Spam Review Detection – Zhenni You, Tieyun Qian and Bing Liu.
  • An Empirical Study on Fine-Grained Named Entity Recognition – Khai Mai, Thai-Hoang Pham, Minh Trung Nguyen, Nguyen Tuan Duc, Danushka Bollegala, Ryohei Sasano and Satoshi Sekine.
  • An Exploration of Three Lightly-supervised Representation Learning Approaches for Named Entity Classification – Ajay Nagesh and Mihai Surdeanu.
  • AnlamVer: Semantic Model Evaluation Dataset for Turkish – Word Similarity and Relatedness – Gökhan Ercan and Olcay Taner Yıldız.
  • Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension – Mao Nakanishi, Tetsunori Kobayashi and Yoshihiko Hayashi.
  • Ask No More: Deciding when to guess in referential visual dialogue – Ravi Shekhar, Tim Baumgärtner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi and Raquel Fernández.
  • Aspect and Sentiment Aware Abstractive Review Summarization – Min Yang, Qiang Qu, Ying Shen, Qiao Liu, Wei Zhao and Jia Zhu.
  • Aspect-based summarization of pros and cons in unstructured product reviews – Florian Kunneman, Sander Wubben, Antal van den Bosch and Emiel Krahmer.
  • Assessing Composition in Sentence Vector Representations – Allyson Ettinger, Ahmed Elgohary, Colin Phillips and Philip Resnik.
  • Attending Sentences to detect Satirical Fake News – Sohan De Sarkar, Fan Yang and Arjun Mukherjee.
  • Authorless Topic Models: Biasing Models Away from Known Structure – Laure Thompson and David Mimno.
  • Authorship Attribution By Consensus Among Multiple Features – Jagadeesh Patchala and Raj Bhatnagar.
  • Automated Fact Checking: Task Formulations, Methods and Future Directions – James Thorne and Andreas Vlachos.
  • Automated Scoring: Beyond Natural Language Processing – Nitin Madnani and Aoife Cahill.
  • Automatic Detection of Fake News – Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre and Rada Mihalcea.
  • Bridge Video and Text with Cascade Syntactic Structure – Guolong Wang, Zheng Qin, Kaiping Xu, Kai Huang and Shuxiong Ye.
  • Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis – Andrew Moore and Paul Rayson.
  • Can Rumour Stance Alone Predict Veracity? – Sebastian Dungs, Ahmet Aker, Norbert Fuhr and Kalina Bontcheva.
  • CASCADE: Contextual Sarcasm Detection in Online Discussion Forums – Devamanyu Hazarika, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann and Rada Mihalcea.
  • Challenges and Opportunities of Applying Natural Language Processing in Business Process Management – Han Van der Aa, Josep Carmona, Henrik Leopold, Jan Mendling and Lluís Padró.
  • Challenges of language technologies for the indigenous languages of the Americas – Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra and Ivan Meza-Ruiz.
  • Context-Sensitive Generation of Open-Domain Conversational Responses – Wei-Nan Zhang, Yiming Cui, Yifa Wang, Qingfu Zhu, Lingzhi Li, Lianqiang Zhou and Ting Liu.
  • Contextual String Embeddings for Sequence Labeling – Alan Akbik, Duncan Blythe and Roland Vollgraf.
  • Cooperative Denoising for Distantly Supervised Relation Extraction – Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan and Ying Shen.
  • Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need! – Steffen Eger, Johannes Daxenberger, Christian Stab and Iryna Gurevych.
  • Deep Enhanced Representation for Implicit Discourse Relation Recognition – Hongxiao Bai and Hai Zhao.
  • Dependent Gated Reading for Cloze-Style Question Answering – Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi and Prasad Tadepalli.
  • Design Challenges and Misconceptions in Neural Sequence Labeling – Jie Yang, Shuailong Liang and Yue Zhang.
  • Design Challenges in Named Entity Transliteration – Yuval Merhav and Stephen Ash.
  • Dialogue-act-driven Conversation Model: An Experimental Study – Harshit Kumar, Arvind Agarwal and Sachindra Joshi.
  • Distance-Free Modeling of Multi-Predicate Interactions in End-to-End Japanese Predicate-Argument Structure Analysis – Yuichiroh Matsubayashi and Kentaro Inui.
  • Distinguishing affixoid formations from compounds – Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert.
  • Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data? – Yi Zhang, Xu Sun, Shuming Ma, Yang Yang and Xuancheng Ren.
  • Dynamic Multi-Level Multi-Task Learning for Sentence Simplification – Han Guo, Ramakanth Pasunuru and Mohit Bansal.
  • Effective Attention Modeling for Aspect-Level Sentiment Classification – Ruidan He, Wee Sun Lee, Hwee Tou Ng and Daniel Dahlmeier.
  • Embedding Words as Distributions with a Bayesian Skip-gram Model – Arthur Bražinskas, Serhii Havrylov and Ivan Titov.
  • Emotion Detection and Classification in a Multigenre Corpus with Joint Multi-Task Deep Learning – Shabnam Tafreshi and Mona Diab.
  • Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level – Sven Buechel and Udo Hahn.
  • Employing Text Matching Network to Recognise Nuclearity in Chinese Discourse – Sheng Xu, Peifeng Li, Guodong Zhou and Qiaoming Zhu.
  • Enhanced Aspect Level Sentiment Classification with Auxiliary Memory – Peisong Zhu and Tieyun Qian.
  • Enhancing Sentence Embedding with Generalized Pooling – Qian Chen, Zhen-Hua Ling and Xiaodan Zhu.
  • Exploiting Structure in Representation of Named Entities using Active Learning – Nikita Bhutani, Kun Qian, Yunyao Li, H. V. Jagadish, Mauricio Hernandez and Mitesh Vasa.
  • Exploiting Syntactic Structures for Humor Recognition – Lizhen Liu, Donghai Zhang and Wei Song.
  • Exploratory Neural Relation Classification for Domain Knowledge Acquisition – Yan Fan, Chengyu Wang and Xiaofeng He.
  • Exploring the Influence of Spelling Errors on Lexical Variation Measures – Ryo Nagata, Taisei Sato and Hiroya Takamura.
  • Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media – Isabel Cachola, Eric Holgate, Daniel Preoţiuc-Pietro and Junyi Jessy Li.
  • Extracting Parallel Sentences with Bidirectional Recurrent Neural Networks to Improve Machine Translation – Francis Grégoire and Philippe Langlais.
  • Extractive Headline Generation Based on Learning to Rank for Community Question Answering – Tatsuru Higurashi, Hayato Kobayashi, Takeshi Masuyama and Kazuma Murao.
  • Folksonomication: Predicting Tags for Movies from Plot Synopses using Emotion Flow Encoded Neural Network – Sudipta Kar, Suraj Maharjan and Thamar Solorio.
  • From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources – Ilia Kuznetsov and Iryna Gurevych.
  • Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model – Shaohui Kuang and Deyi Xiong.
  • GenSense: A Generalized Sense Retrofitting Model – Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Yow-Ting Shiue and Hsin-Hsi Chen.
  • Graphene: Semantically-Linked Propositions in Open Information Extraction – Matthias Cetto, Christina Niklaus, André Freitas and Siegfried Handschuh.
  • Grounded Textual Entailment – Hoa Vu, Claudio Greco, Aliia Erofeeva, Somayeh Jafaritazehjan, Guido Linders, Marc Tanti, Alberto Testoni, Raffaella Bernardi and Albert Gatt.
  • How emotional are you? Neural Architectures for Emotion Intensity Prediction in Microblogs – Devang Kulshreshtha, Pranav Goel and Anil Kumar Singh.
  • Hybrid Attention based Multimodal Network for Spoken Language Classification – Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li and Ivan Marsic.
  • Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning – Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang and Xiangang Li.
  • Improving Neural Machine Translation by Incorporating Hierarchical Subword Features – Makoto Morishita, Jun Suzuki and Masaaki Nagata.
  • Integrating Question Classification and Deep Learning for improved Answer Selection – Harish Tayyar Madabushi, Mark Lee and John Barnden.
  • Investigating Productive and Receptive Knowledge: A Profile for Second Language Learning – Leonardo Zilio, Rodrigo Wilkens and Cédrick Fairon.
  • Joint Modeling of Structure Identification and Nuclearity Recognition in Macro Chinese Discourse Treebank – Xiaomin Chu, Feng Jiang, Yi Zhou, Guodong Zhou and Qiaoming Zhu.
  • Knowledge as A Bridge: Improving Cross-domain Answer Selection with External Knowledge – Yang Deng, Ying Shen, Min Yang, Yaliang Li, Nan Du, Wei Fan and Kai Lei.
  • Learning Features from Co-occurrences: A Theoretical Analysis – Yanpeng Li.
  • Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types – Paul Felt, Eric Ringger, Kevin Seppi and Jordan Boyd-Graber.
  • Learning Sentiment Composition from Sentiment Lexicons – Orith Toledo-Ronen, Roy Bar-Haim, Alon Halfon, Charles Jochim, Amir Menczel, Ranit Aharonov and Noam Slonim.
  • Learning Target-Specific Representations of Financial News Documents For Cumulative Abnormal Return Prediction – Junwen Duan, Yue Zhang, Xiao Ding, Ching-Yun Chang and Ting Liu.
  • Learning to Generate Word Representations using Subword Information – Yeachan Kim, Kang-Min Kim, Ji-Min Lee and SangKeun Lee.
  • Learning Word Meta-Embeddings by Autoencoding – Danushka Bollegala and Cong Bao.
  • Low-resource Cross-lingual Event Type Detection via Distant Supervision with Minimal Effort – Aldrian Obaja Muis, Naoki Otani, Nidhi Vyas, Ruochen Xu, Yiming Yang, Teruko Mitamura and Eduard Hovy.
  • Lyrics Segmentation: Textual Macrostructure Detection using Convolutions – Michael Fell, Yaroslav Nechaev, Elena Cabrio and Fabien Gandon.
  • Measuring the Diversity of Automatic Image Descriptions – Emiel van Miltenburg, Desmond Elliott and Piek Vossen.
  • Model-Free Context-Aware Word Composition – Bo An, Xianpei Han and Le Sun.
  • Modeling Coherence for Neural Machine Translation with Dynamic and Topic Caches – Shaohui Kuang, Deyi Xiong, Weihua Luo and Guodong Zhou.
  • Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering – Daniil Sorokin and Iryna Gurevych.
  • Modeling with Recurrent Neural Networks for Open Vocabulary Slots – Jun-Seong Kim, Junghoe Kim, SeungUn Park, Kwangyong Lee and Yoonju Lee.
  • Multilevel Heuristics for Rationale-Based Entity Relation Classification in Sentences – Shiou Tian Hsu, Mandar Chaudhary and Nagiza Samatova.
  • Multilingual Neural Machine Translation with Task-Specific Attention – Graeme Blackwood, Miguel Ballesteros and Todd Ward.
  • Multimodal Grounding for Language Processing – Lisa Beinborn, Teresa Botschen and Iryna Gurevych.
  • Neural Activation Semantic Models: Computational lexical semantic models of localized neural activations – Nikos Athanasiou, Elias Iosif and Alexandros Potamianos.
  • Neural Collective Entity Linking – Yixin Cao, Lei Hou, Juanzi Li and Zhiyuan Liu.
  • Neural Machine Translation with Decoding History Enhanced Attention – Mingxuan Wang.
  • Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering – Wuwei Lan and Wei Xu.
  • Neural Relation Classification with Text Descriptions – Feiliang Ren, Di Zhou, Zhihui Liu, Yongcheng Li, Rongsheng Zhao, Yongkang Liu and Xiaobo Liang.
  • Neural Transition-based String Transduction for Limited-Resource Setting in Morphology – Peter Makarov and Simon Clematide.
  • Novelty Goes Deep. A Deep Neural Solution To Document Level Novelty Detection – Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, Srinivasa Satya Sameer Kumar Chivukula and George Tsatsaronis.
  • On Adversarial Examples for Character-Level Neural Machine Translation – Javid Ebrahimi, Daniel Lowd and Dejing Dou.
  • One-shot Learning for Question-Answering in Gaokao History Challenge – Zhuosheng Zhang and Hai Zhao.
  • Open Information Extraction from Conjunctive Sentences – Swarnadeep Saha and Mausam.
  • Open Information Extraction on Scientific Text: An Evaluation – Paul Groth, Mike Lauruhn, Antony Scerri and Ron Daniel, Jr.
  • Pattern-revising Enhanced Simple Question Answering over Knowledge Bases – Yanchao Hao, Hao Liu, Shizhu He, Kang Liu and Jun Zhao.
  • Personalized Text Retrieval for Learners of Chinese as a Foreign Language – Chak Yan Yeung and John Lee.
  • Predicting Stances from Social Media Posts using Factorization Machines – Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki and Kentaro Inui.
  • Punctuation as Native Language Interference – Ilia Markov, Vivi Nastase and Carlo Strapparava.
  • Quantifying training challenges of dependency parsers – Lauriane Aufrant, Guillaume Wisniewski and François Yvon.
  • Recognizing Humour using Word Associations and Humour Anchor Extraction – Andrew Cattle and Xiaojuan Ma.
  • Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs – Wenpeng Yin, Yadollah Yaghoobzadeh and Hinrich Schütze.
  • Relation Induction in Word Embeddings Revisited – Zied Bouraoui, Shoaib Jameel and Steven Schockaert.
  • Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from Modern Hebrew – Adam Amram, Anat Ben-David and Reut Tsarfaty.
  • Rethinking the Agreement in Human Evaluation Tasks – Jacopo Amidei, Paul Piwek and Alistair Willis.
  • RNN Simulations of Grammaticality Judgments on Long-distance Dependencies – Shammur Absar Chowdhury and Roberto Zamparelli.
  • Self-Normalization Properties of Language Modeling – Jacob Goldberger and Oren Melamud.
  • Semi-Supervised Disfluency Detection – Feng Wang, Zhen Yang, Wei Chen, Shuang Xu, Bo Xu and Qianqian Dong.
  • Semi-Supervised Lexicon Learning for Wide-Coverage Semantic Parsing – Bo Chen, Bo An, Le Sun and Xianpei Han.
  • Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding – Yutai Hou, Yijia Liu, Wanxiang Che and Ting Liu.
  • SGM: Sequence Generation Model for Multi-label Classification – Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang.
  • Simple Algorithms For Sentiment Analysis On Sentiment Rich, Data Poor Domains – Prathusha K Sarma and William Sethares.
  • Sprucing up the trees – Error detection in treebanks – Ines Rehbein and Josef Ruppenhofer.
  • Stress Test Evaluation for Natural Language Inference – Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose and Graham Neubig.
  • Structure-Infused Copy Mechanisms for Abstractive Summarization – Kaiqiang Song, Lin Zhao and Fei Liu.
  • Structured Dialogue Policy with Graph Neural Networks – Lu Chen, Bowen Tan, Sishan Long and Kai Yu.
  • Subword-augmented Embedding for Cloze Reading Comprehension – Zhuosheng Zhang, Yafang Huang and Hai Zhao.
  • Systematic Study of Long Tail Phenomena in Entity Linking – Filip Ilievski, Piek Vossen and Stefan Schlobach.
  • The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities – Marco Del Tredici and Raquel Fernández.
  • They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking – Ethan Zhou and Jinho D. Choi.
  • Topic or Style? Exploring the Most Useful Features for Authorship Attribution – Yunita Sari, Mark Stevenson and Andreas Vlachos.
  • Towards a unified framework for bilingual terminology extraction of single-word and multi-word terms – Jingshu Liu, Emmanuel Morin and Peña Saldarriaga.
  • Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies – Taraka Rama and Søren Wichmann.
  • Transition-based Neural RST Parsing with Implicit Syntax Features – Nan Yu, Meishan Zhang and Guohong Fu.
  • Treat us like the sequences we are: Prepositional Paraphrasing of Noun Compounds using LSTM – Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya and Girish Palshikar.
  • Triad-based Neural Network for Coreference Resolution – Yuanliang Meng and Anna Rumshisky.
  • Two Local Models for Neural Constituent Parsing – Zhiyang Teng and Yue Zhang.
  • Unsupervised Morphology Learning with Statistical Paradigms – Hongzhi Xu, Mitchell Marcus, Charles Yang and Lyle Ungar.
  • Using J-K-fold Cross Validation To Reduce Variance When Tuning NLP Models – Henry Moss, David Leslie and Paul Rayson.
  • Variational Attention for Sequence-to-Sequence Models – Hareesh Bahuleyan, Lili Mou, Olga Vechtomova and Pascal Poupart.
  • What represents “style” in authorship attribution? – Kalaivani Sundararajan and Damon Woodard.
  • Who is Killed by Police: Introducing Supervised Attention for Hierarchical LSTMs – Minh Nguyen and Thien Nguyen.
  • Word-Level Loss Extensions for Neural Temporal Relation Classification – Artuur Leeuwenberg and Marie-Francine Moens.
  • Zero Pronoun Resolution with Attention-based Neural Network – Qingyu Yin, Yu Zhang, Wei-Nan Zhang, Ting Liu and William Yang Wang.
  • A Dataset for Building Code-Mixed Goal Oriented Conversation Systems – Suman Banerjee, Nikita Moghe, Siddhartha Arora and Mitesh M. Khapra.
  • A Deep Dive into Word Sense Disambiguation with LSTM – Minh Le, Marten Postma, Jacopo Urbani and Piek Vossen.
  • A Full End-to-End Semantic Role Labeler, Syntactic-agnostic Over Syntactic-aware? – Jiaxun Cai, Shexia He, Zuchao Li and Hai Zhao.
  • A LSTM Approach with Sub-Word Embeddings for Mongolian Phrase Break Prediction – Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang and Yonghe Wang.
  • A Neural Question Answering Model Based on Semi-Structured Tables – Hao Wang, Xiaodong Zhang, Shuming Ma, Xu Sun, Houfeng Wang and Mengxiang Wang.
  • A Nontrivial Sentence Corpus for the Task of Sentence Readability Assessment in Portuguese – Sidney Evaldo Leal, Magali Sanches Duran and Sandra Maria Aluísio.
  • A Pseudo Label based Dataless Naive Bayes Algorithm for Text Classification with Seed Words – Ximing Li and Bo Yang.
  • A Reassessment of Reference-Based Grammatical Error Correction Metrics – Shamil Chollampatt and Hwee Tou Ng.
  • A review of Spanish corpora annotated with negation – Salud María Jiménez-Zafra, Roser Morante, Maite Martin and L. Alfonso Urena Lopez.
  • A Review on Deep Learning Techniques Applied to Answer Selection – Tuan Manh Lai, Trung Bui and Sheng Li.
  • A Survey of Domain Adaptation for Neural Machine Translation – Chenhui Chu and Rui Wang.
  • A Survey on Open Information Extraction – Christina Niklaus, Matthias Cetto, André Freitas and Siegfried Handschuh.
  • A Survey on Recent Advances in Named Entity Recognition from Deep Learning models – Vikas Yadav and Steven Bethard.
  • Adaptive Learning of Local Semantic and Global Structure Representations for Text Classification – Jianyu Zhao, Zhiqiang Zhan, Qichuan Yang, Yang Zhang, Changjian Hu, Zhensheng Li, Liuxin Zhang and Zhiqiang He.
  • Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text – Junjie Xing, Kenny Zhu and Shaodian Zhang.
  • Adaptive Weighting for Neural Machine Translation – Yachao Li, Junhui Li and Min Zhang.
  • Addressee and Response Selection for Multilingual Conversation – Motoki Sato, Hiroki Ouchi and Yuta Tsuboi.
  • Adversarial Feature Adaptation for Cross-lingual Relation Classification – Bowei Zou, Zengzhuang Xu, Yu Hong and Guodong Zhou.
  • AMR Beyond the Sentence: the Multi-sentence AMR corpus – Tim O’Gorman, Michael Regan, Kira Griffitt, Martha Palmer, Ulf Hermjakob and Kevin Knight.
  • An Analysis of Annotated Corpora for Emotion Classification in Text – Laura Ana Maria Bostan and Roman Klinger.
  • An Empirical Investigation of Error Types in Vietnamese Parsing – Quy Nguyen, Yusuke Miyao, Hiroshi Noji and Nhung Nguyen.
  • An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization – Gongbo Tang, Fabienne Cap, Eva Pettersson and Joakim Nivre.
  • An Interpretable Reasoning Network for Multi-Relation Question Answering – Mantong Zhou, Minlie Huang and Xiaoyan Zhu.
  • An Operation Network for Abstractive Sentence Compression – Naitong Yu, Jie Zhang, Minlie Huang and Xiaoyan Zhu.
  • Ant Colony System for Multi-Document Summarization – Asma Al-Saleh and Mohamed El Bachir Menai.
  • Argumentation Synthesis following Rhetorical Strategies – Henning Wachsmuth, Manfred Stede, Roxanne El Baff, Khalid Al Khatib, Maria Skeppstedt and Benno Stein.
  • Arguments and Adjuncts in Universal Dependencies – Adam Przepiórkowski and Agnieszka Patejuk.
  • Arrows are the Verbs of Diagrams – Malihe Alikhani and Matthew Stone.
  • Assessing Quality Estimation Models for Sentence-Level Prediction – Hoang Cuong and Jia Xu.
  • Attributed and Predictive Entity Embedding for Fine-Grained Entity Typing in Knowledge Bases – Hailong Jin, Lei Hou, Juanzi Li and Tiansi Dong.
  • Author Profiling for Abuse Detection – Pushkar Mishra, Marco Del Tredici, Helen Yannakoudakis and Ekaterina Shutova.
  • Authorship Identification for Literary Book Recommendations – Haifa Alharthi, Diana Inkpen and Stan Szpakowicz.
  • Automatic Assessment of Conceptual Text Complexity Using Knowledge Graphs – Sanja Štajner and Ioana Hulpus.
  • Automatically Creating a Lexicon of Verbal Polarity Shifters: Mono- and Cross-lingual Methods for German – Marc Schulder, Michael Wiegand and Josef Ruppenhofer.
  • Automatically Extracting Qualia Relations for the Rich Event Ontology – Ghazaleh Kazeminejad, Claire Bonial, Susan Windisch Brown and Martha Palmer.
  • Bridging resolution: Task definition, corpus resources and rule-based experiments – Ina Roesiger, Arndt Riester and Jonas Kuhn.
  • Butterfly Effects in Frame Semantic Parsing: impact of data processing on model ranking – Alexandre Kabbach, Corentin Ribeyre and Aurélie Herbelot.
  • Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy – Deepak Gupta, Rajkumar Pujari, Asif Ekbal, Pushpak Bhattacharyya, Anutosh Maitra, Tom Jain and Shubhashis Sengupta.
  • Character-Level Feature Extraction with Densely Connected Networks – Chanhee Lee, Young-Bum Kim, Dongyub Lee and Heuiseok Lim.
  • Clausal Modifiers in the Grammar Matrix – Kristen Howell and Olga Zamaraeva.
  • Combining Information-Weighted Sequence Alignment and Sound Correspondence Models for Improved Cognate Detection – Johannes Dellert.
  • Convolutional Neural Network for Universal Sentence Embeddings – Xiaoqi Jiao, Fang Wang and Dan Feng.
  • Corpus-based Content Construction – Balaji Vasan Srinivasan, Pranav Maneriker, Kundan Krishna and Natwar Modani.
  • Correcting Chinese Word Usage Errors for Learning Chinese as a Second Language – Yow-Ting Shiue, Hen-Hsen Huang and Hsin-Hsi Chen.
  • Cross-lingual Knowledge Projection Using Machine Translation and Target-side Knowledge Base Completion – Naoki Otani, Hirokazu Kiyomaru, Daisuke Kawahara and Sadao Kurohashi.
  • Cross-media User Profiling with Joint Textual and Social User Embedding – Jingjing Wang, Shoushan Li, Mingqi Jiang, Hanqian Wu and Guodong Zhou.
  • Crowdsourcing a Large Corpus of Clickbait on Twitter – Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen and Benno Stein.
  • Deconvolution-Based Global Decoding for Neural Machine Translation – Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su and Qi Su.
  • Deep Neural Networks at the Service of Multilingual Parallel Sentence Extraction – Ahmad Aghaebrahimian.
  • deepQuest: A Framework for Neural-based Quality Estimation – Julia Ive, Frédéric Blain and Lucia Specia.
  • Diachronic word embeddings and semantic shifts: a survey – Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski and Erik Velldal.
  • DIDEC: The Dutch Image Description and Eye-tracking Corpus – Emiel van Miltenburg, Ákos Kádár, Ruud Koolen and Emiel Krahmer.
  • Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning – Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He and Min Zhang.
  • Document-level Multi-aspect Sentiment Classification by Jointly Modeling Users, Aspects, and Overall Ratings – Junjie Li, Haitong Yang and Chengqing Zong.
  • Double Path Networks for Sequence to Sequence Learning – Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin and Tie-Yan Liu.
  • Dynamic Feature Selection with Attention in Incremental Parsing – Ryosuke Kohita, Hiroshi Noji and Yuji Matsumoto.
  • Embedding WordNet Knowledge for Textual Entailment – Yunshi Lan and Jing Jiang.
  • Encoding Sentiment Information into Word Vectors for Sentiment Analysis – Zhe Ye, Fang Li and Timothy Baldwin.
  • Enhancing General Sentiment Lexicons for Domain-Specific Use – Tim Kreutz and Walter Daelemans.
  • Enriching Word Embeddings with Domain Knowledge for Readability Assessment – Zhiwei Jiang, Qing Gu, Yafeng Yin and Daoxu Chen.
  • Ensure the Correctness of the Summary: Incorporate Entailment Knowledge into Abstractive Sentence Summarization – Haoran Li, Junnan Zhu, Jiajun Zhang and Chengqing Zong.
  • Evaluating the text quality, human likeness and tailoring component of PASS: A Dutch data-to-text system for soccer – Chris van der Lee, Bart Verduijn, Emiel Krahmer and Sander Wubben.
  • Evaluation of Unsupervised Compositional Representations – Hanan Aldarmaki and Mona Diab.
  • Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia – Michael Azmy, Peng Shi, Ihab Ilyas and Jimmy Lin.
  • Fast and Accurate Reordering with ITG Transition RNN – Hao Zhang, Axel Ng and Richard Sproat.
  • Few-Shot Charge Prediction with Discriminative Legal Attributes – Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu and Maosong Sun.
  • Fine-Grained Arabic Dialect Identification – Mohammad Salameh and Houda Bouamor.
  • Generating Reasonable and Diversified Story Ending Using Sequence to Sequence Model with Adversarial Training – Zhongyang Li, Xiao Ding and Ting Liu.
  • Generic refinement of expressive grammar formalisms with an application to discontinuous constituent parsing – Kilian Gebhardt.
  • Genre Identification and the Compositional Effect of Genre in Literature – Joseph Worsham and Jugal Kalita.
  • Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions – Lori Moon, Christos Christodoulopoulos, Cynthia Fisher, Sandra Franco and Dan Roth.
  • Graph Based Decoding for Event Sequencing and Coreference Resolution – Zhengzhong Liu, Teruko Mitamura and Eduard Hovy.
  • HL-EncDec: A Hybrid-Level Encoder-Decoder for Neural Response Generation – Sixing Wu, Dawei Zhang, Ying Li, Xing Xie and Zhonghai Wu.
  • How Predictable is Your State? Leveraging Lexical and Contextual Information for Predicting Legislative Floor Action at the State Level – Vladimir Eidelman, Anastassia Kornilova and Daniel Argyle.
  • Identifying Emergent Research Trends by Key Authors and Phrases – Shenhao Jiang, Animesh Prasad, Min-Yen Kan and Kazunari Sugiyama.
  • If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions – Caroline Pasquer, Agata Savary, Carlos Ramisch and Jean-Yves Antoine.
  • Improving Feature Extraction for Pathology Reports with Precise Negation Scope Detection – Olga Zamaraeva, Kristen Howell and Adam Rhine.
  • Improving Named Entity Recognition by Jointly Learning to Disambiguate Morphological Tags – Onur Gungor, Suzan Uskudarli and Tunga Gungor.
  • Incorporating Argument-Level Interactions for Persuasion Comments Evaluation using Co-attention Model – Lu Ji, Zhongyu Wei, Xiangkun Hu, Yang Liu, Qi Zhang and Xuanjing Huang.
  • Incorporating Deep Visual Features into Multiobjective based Multi-view Search Result Clustering – Sayantan Mitra, Mohammed Hasanuzzaman and Sriparna Saha.
  • Incorporating Image Matching Into Knowledge Acquisition for Event-Oriented Relation Recognition – Yu Hong, Yang Xu, Huibin Ruan, Bowei Zou, Jianmin Yao and Guodong Zhou.
  • Incorporating Syntactic Uncertainty in Neural Machine Translation with a Forest-to-Sequence Model – Poorya Zaremoodi and Gholamreza Haffari.
  • Incremental Natural Language Processing: Challenges, Strategies, and Evaluation – Arne Köhn.
  • Indigenous language technologies in Canada: Assessment, challenges, and successes – Patrick Littell, Anna Kazantseva, Roland Kuhn, Aidan Pine, Antti Arppe, Christopher Cox and Marie-Odile Junker.
  • Information Aggregation via Dynamic Routing for Sequence Encoding – Jingjing Gong, Xipeng Qiu, Shaojing Wang and Xuanjing Huang.
  • Integrating Tree Structures and Graph Structures with Neural Networks to Classify Discussion Discourse Acts – Yasuhide Miura, Ryuji Kano, Motoki Taniguchi, Tomoki Taniguchi, Shotaro Misawa and Tomoko Ohkuma.
  • Interaction-Aware Topic Model for Microblog Conversations through Network Embedding and User Attention – Ruifang He, Xuefei Zhang, Di Jin, Longbiao Wang, Jianwu Dang and Xiangang Li.
  • Interpretation of Implicit Conditions in Database Search Dialogues – Shunya Fukunaga, Hitoshi Nishikawa, Takenobu Tokunaga, Hikaru Yokono and Tetsuro Takahashi.
  • Investigating the Working of Text Classifiers – Devendra Sachan, Manzil Zaheer and Ruslan Salakhutdinov.
  • iParaphrasing: Extracting Visually Grounded Paraphrases via an Image – Chenhui Chu, Mayu Otani and Yuta Nakashima.
  • ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents – Stefano Mezza, Alessandra Cervone, Evgeny Stepanov, Giuliano Tortoreto and Giuseppe Riccardi.
  • Joint Learning from Labeled and Unlabeled Data for Information Retrieval – Bo Li, Ping Cheng and Le Jia.
  • Joint Neural Entity Disambiguation with Output Space Search – Hamed Shahbazi, Xiaoli Fern, Reza Ghaeini, Chao Ma, Rasha Mohammad Obeidat and Prasad Tadepalli.
  • JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features – Hongru Liang, Haozheng Wang, Jun Wang, Shaodi You, Zhe Sun, Jin-Mao Wei and Zhenglu Yang.
  • Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection – Erik-Lân Do Dinh, Steffen Eger and Iryna Gurevych.
  • LCQMC: A Large-scale Chinese Question Matching Corpus – Xin Liu, Qingcai Chen, Chong Deng, Huajun Zeng, Jing Chen, Dongfang Li and Buzhou Tang.
  • Learning Emotion-enriched Word Representations – Ameeta Agrawal, Aijun An and Manos Papagelis.
  • Learning Multilingual Topics from Incomparable Corpora – Shudong Hao and Michael J. Paul.
  • Learning Semantic Sentence Embeddings using Sequential Pair-wise Discriminator – Badri Narayana Patro, Vinod Kumar Kurmi, Sandeep Kumar and Vinay Namboodiri.
  • Learning to Progressively Recognize New Named Entities with Sequence to Sequence Model – Lingzhen Chen and Alessandro Moschitti.
  • Learning to Search in Long Documents Using Document Structure – Mor Geva and Jonathan Berant.
  • Learning Visually-Grounded Semantics from Contrastive Adversarial Samples – Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang and Jian Sun.
  • Learning What to Share: Leaky Multi-Task Network for Text Classification – Liqiang Xiao, Honglun Zhang, Wenqing Chen, Yongkun Wang and Yaohui Jin.
  • Learning with Noise-Contrastive Estimation: Easing training by learning to scale – Matthieu Labeau and Alexandre Allauzen.
  • Leveraging Meta-Embeddings for Bilingual Lexicon Extraction from Specialized Comparable Corpora – Amir Hazem and Emmanuel Morin.
  • Lexi: A tool for adaptive, personalized text simplification – Joachim Bingel, Gustavo Paetzold and Anders Søgaard.
  • Local String Transduction as Sequence Labeling – Joana Ribeiro, Shashi Narayan, Shay B. Cohen and Xavier Carreras.
  • Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models – Hussein Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin and Amit Sheth.
  • MCDTB: A Macro-level Chinese Discourse TreeBank – Feng Jiang, Sheng Xu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu and Guodong Zhou.
  • MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation – Meng Zou, Xihan Li, Haokun Liu and Zhihong Deng.
  • Modeling Multi-turn Conversation with Deep Utterance Aggregation – Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu and Hai Zhao.
  • Modeling the Readability of German Targeting Adults and Children: An empirically broad analysis and its cross-corpus validation – Zarah Weiß and Detmar Meurers.
  • Multi-layer Representation Fusion for Neural Machine Translation – Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li and Jingbo Zhu.
  • Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension – Liang Wang, Sujian Li, Wei Zhao, Kewei Shen, Meng Sun, Ruoyu Jia and Jingming Liu.
  • Multi-Source Multi-Class Fake News Detection – Hamid Karimi, Proteek Roy, Sari Saba-Sadiya and Jiliang Tang.
  • Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling – Ryo Masumura, Tomohiro Tanaka, Ryuichiro Higashinaka, Hirokazu Masataki and Yushi Aono.
  • Multi-task dialog act and sentiment recognition on Mastodon – Christophe Cerisara, Somayeh Jafaritazehjani, Adedayo Oluokun and Hoa T. Le.
  • Multi-Task Learning for Sequence Tagging: An Empirical Study – Soravit Changpinyo, Hexiang Hu and Fei Sha.
  • Multi-Task Neural Models for Translating Between Styles Within and Across Languages – Xing Niu, Sudha Rao and Marine Carpuat.
  • Narrative Schema Stability in News Text – Dan Simonson and Anthony Davis.
  • Natural Language Interface for Databases Using a Dual-Encoder Model – Ionel Alexandru Hosu, Radu Cristian Alexandru Iacob, Florin Brad, Stefan Ruseti and Traian Rebedea.
  • Neural Machine Translation Incorporating Named Entity – Arata Ugawa, Akihiro Tamura, Takashi Ninomiya, Hiroya Takamura and Manabu Okumura.
  • Neural Math Word Problem Solver with Reinforcement Learning – Danqing Huang, Jing Liu, Chin-Yew Lin and Jian Yin.
  • NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager – Idris Yusupov and Yurii Kuratov.
  • One vs. Many QA Matching with both Word-level and Sentence-level Attention Network – Lu Wang, Shoushan Li, Changlong Sun, Luo Si, Xiaozhong Liu, Min Zhang and Guodong Zhou.
  • Open-Domain Event Detection using Distant Supervision – Jun Araki and Teruko Mitamura.
  • Par4Sim — Adaptive Paraphrasing for Text Simplification – Seid Muhie Yimam and Chris Biemann.
  • Parallel Corpora for bi-lingual English-Ethiopian Languages Statistical Machine Translation – Michael Melese, Solomon Teferra Abate, Martha Yifiru Tachbelie, Million Meshesha, Wondwossen Mulugeta, Yaregal Assibie, Solomon Atinafu, Binyam Ephrem, Tewodros Abebe, Hafte Abera, Amanuel Lemma, Tsegaye Andargie, Seifedin Shifaw and Wondimagegnhue Tsegaye.
  • Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource – Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto and David Chiang.
  • Personalizing Lexical Simplification – John Lee and Chak Yan Yeung.
  • Pluralizing Nouns across Agglutinating Bantu Languages – Joan Byamugisha, C. Maria Keet and Brian DeRenzi.
  • Point Precisely: Towards Ensuring the Precision of Data in Generated Texts Using Delayed Copy Mechanism – Liunian Li and Xiaojun Wan.
  • Projecting Embeddings for Domain Adaption: Joint Modeling of Sentiment Analysis in Diverse Domains – Jeremy Barnes, Roman Klinger and Sabine Schulte im Walde.
  • Reading Comprehension with Graph-based Temporal-Causal Reasoning – Yawei Sun, Gong Cheng and Yuzhong Qu.
  • Real-time Change Point Detection using On-line Topic Models – Yunli Wang and Cyril Goutte.
  • Refining Source Representations with Relation Networks for Neural Machine Translation – Wen Zhang, Jiawei Hu, Yang Feng and Qun Liu.
  • Representation Learning of Entities and Documents from Knowledge Base Descriptions – Ikuya Yamada, Hiroyuki Shindo and Yoshiyasu Takefuji.
  • Reproducing and Regularizing the SCRN Model – Olzhas Kabdolov, Zhenisbek Assylbekov and Rustem Takhanov.
  • Responding E-commerce Product Questions via Exploiting QA Collections and Reviews – Qian Yu, Wai Lam and Zihao Wang.
  • ReSyf: a French lexicon with ranked synonyms – Mokhtar Boumedyen Billami, Thomas François and Nuria Gala.
  • Retrofitting Distributional Embeddings to Knowledge Graphs with Functional Relations – Ben Lengerich, Andrew Maas and Christopher Potts.
  • Revisiting the Hierarchical Multiscale LSTM – Ákos Kádár, Marc-Alexandre Côté, Grzegorz Chrupała and Afra Alishahi.
  • Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging – Andrew Matteson, Chanhee Lee, Youngbum Kim and Heuiseok Lim.
  • Robust Lexical Features for Improved Neural Network Named-Entity Recognition – Abbas Ghaddar and Phillippe Langlais.
  • RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian – Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas and Alex Gribov.
  • Scoring and Classifying Implicit Positive Interpretations: A Challenge of Class Imbalance – Chantal van Son, Roser Morante, Lora Aroyo and Piek Vossen.
  • Semantic Parsing for Technical Support Questions – Abhirut Gupta, Anupama Ray, Gargi Dasgupta, Gautam Singh, Pooja Aggarwal and Prateeti Mohapatra.
  • Sensitivity to Input Order: Evaluation of an Incremental and Memory-Limited Bayesian Cross-Situational Word Learning Model – Sepideh Sadeghi and Matthias Scheutz.
  • Sentence Weighting for Neural Machine Translation Domain Adaptation – Shiqi Zhang and Deyi Xiong.
  • Seq2seq Dependency Parsing – Zuchao Li, Jiaxun Cai, Shexia He and Hai Zhao.
  • Sequence-to-Sequence Learning for Task-oriented Dialogue with Dialogue State Representation – Haoyang Wen, Yijia Liu, Wanxiang Che, Libo Qin and Ting Liu.
  • SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors – Luis Espinosa Anke and Steven Schockaert.
  • Simple Neologism Based Domain Independent Models to Predict Year of Authorship – Vivek Kulkarni, Yingtao Tian, Parth Dandiwala and Steve Skiena.
  • Sliced Recurrent Neural Networks – Zeping Yu and Gongshen Liu.
  • SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions – Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney and Nazli Goharian.
  • Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language – He Bai, Yu Zhou, Jiajun Zhang, Liang Zhao, Mei-Yuh Hwang and Chengqing Zong.
  • Stance Detection with Hierarchical Attention Network – Qingying Sun, Zhongqing Wang, Qiaoming Zhu and Guodong Zhou.
  • Structured Representation Learning for Online Debate Stance Prediction – Chang Li, Aldo Porco and Dan Goldwasser.
  • Style Detection for Free Verse Poetry from Text and Speech – Timo Baumann, Hussein Hussein and Burkhard Meyer-Sickendiek.
  • Style Obfuscation by Invariance – Chris Emmery, Enrique Manjavacas Arevalo and Grzegorz Chrupała.
  • Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings – Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong and Fang Chen.
  • Synonymy in Bilingual Context: The CzEngClass Lexicon – Zdenka Uresova, Eva Fucikova, Eva Hajicova and Jan Hajic.
  • Tailoring Neural Architectures for Translating from Morphologically Rich Languages – Peyman Passban, Andy Way and Qun Liu.
  • Task-oriented Word Embedding for Text Classification – Qian Liu, Heyan Huang, Yang Gao, Xiaochi Wei, Yuxin Tian and Luyang Liu.
  • The APVA-TURBO Approach To Question Answering in Knowledge Base – Yue Wang, Richong Zhang, Cheng Xu and Yongyi Mao.
  • Toward Better Loanword Identification in Uyghur Using Cross-lingual Word Embeddings – Chenggang Mi, Yating Yang, Lei Wang, Xi Zhou and Tonghai Jiang.
  • Towards a Language for Natural Language Treebank Transductions – Carlos A. Prolo.
  • Towards an argumentative content search engine using weak supervision – Ran Levy, Ben Bogin, Shai Gretz, Ranit Aharonov and Noam Slonim.
  • Transfer Learning for a Letter-Ngrams to Word Decoder in the Context of Historical Handwriting Recognition with Scarce Resources – Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou and Christian Viard-Gaudin.
  • Transfer Learning for Entity Recognition of Novel Classes – Juan Diego Rodriguez, Adam Caldwell and Alexander Liu.
  • Twitter corpus of Resource-Scarce Languages for Sentiment Analysis and Multilingual Emoji Prediction – Nurendra Choudhary, Rajat Singh, Vijjini Anvesh Rao and Manish Shrivastava.
  • Urdu Word Segmentation using Conditional Random Fields (CRFs) – Haris Bin Zia, Agha Ali Raza and Awais Athar.
  • User-Level Race and Ethnicity Predictors from Twitter Text – Daniel Preoţiuc-Pietro and Lyle Ungar.
  • Using Formulaic Expressions in Writing Assistance Systems – Kenichi Iwatsuki and Akiko Aizawa.
  • Using Word Embeddings for Unsupervised Acronym Disambiguation – Jean Charbonnier and Christian Wartena.
  • Visual Question Answering Dataset for Bilingual Image Understanding: A Study of Cross-Lingual Transfer Using Attention Maps – Nobuyuki Shimizu, Na Rong and Takashi Miyazaki.
  • Vocabulary Tailored Summary Generation – Kundan Krishna, Aniket Murhekar, Saumitra Sharma and Balaji Vasan Srinivasan.
  • What’s in Your Embedding, And How It Predicts Task Performance – Anna Rogers, Shashwath Hosur Ananthakrishna and Anna Rumshisky.
  • Who Feels What and Why? Annotation of a Literature Corpus with Semantic Roles of Emotions – Evgeny Kim and Roman Klinger.
  • Why does PairDiff work? – A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection – Huda Hakami, Kohei Hayashi and Danushka Bollegala.
  • WikiRef: Wikilinks as a route to recommending appropriate references for scientific Wikipedia pages – Abhik Jana, Pranjal Kanojiya, Pawan Goyal and Animesh Mukherjee.
  • Word Sense Disambiguation Based on Word Similarity Calculation Using Word Vector Representation from a Knowledge-based Graph – Dongsuk O, Sunjae Kwon, Kyungsun Kim and Youngjoong Ko.

COLING 2018 Best papers

There are multiple categories of awards at COLING 2018, as we laid out in an earlier blog post. We received 44 nominations for best papers over ten categories, and conferred best paper awards in the following categories:

  • Best error analysis: SGM: Sequence Generation Model for Multi-label Classification, by Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang.
  • Best evaluation: SGM: Sequence Generation Model for Multi-label Classification, by Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu and Houfeng Wang.
  • Best linguistic analysis: Distinguishing affixoid formations from compounds, by Josef Ruppenhofer, Michael Wiegand, Rebecca Wilm and Katja Markert.
  • Best NLP engineering experiment: Authorless Topic Models: Biasing Models Away from Known Structure, by Laure Thompson and David Mimno.
  • Best position paper: Arguments and Adjuncts in Universal Dependencies, by Adam Przepiórkowski and Agnieszka Patejuk.
  • Best reproduction paper: Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering, by Wuwei Lan and Wei Xu.
  • Best resource paper: AnlamVer: Semantic Model Evaluation Dataset for Turkish – Word Similarity and Relatedness, by Gökhan Ercan and Olcay Taner Yıldız.
  • Best survey paper: A Survey on Open Information Extraction, by Christina Niklaus, Matthias Cetto, André Freitas and Siegfried Handschuh.
  • Most reproducible: Design Challenges and Misconceptions in Neural Sequence Labeling, by Jie Yang, Shuailong Liang and Yue Zhang.

Note that, as announced last year, for open science and reproducibility COLING 2018 did not confer best paper awards on papers whose code/resources could not be made publicly available by camera-ready time. This means you can ask the best paper authors for the associated data and programs right now, and they should be able to provide you with a link.

In addition, we would like to note the following papers as “Area Chair Favorites”, which were nominated by reviewers and recognised as excellent by chairs.

  • Visual Question Answering Dataset for Bilingual Image Understanding: A study of cross-lingual transfer using attention maps. Nobuyuki Shimizu, Na Rong and Takashi Miyazaki
  • Using J-K-fold Cross Validation To Reduce Variance When Tuning NLP Models. Henry Moss, David Leslie and Paul Rayson
  • Measuring the Diversity of Automatic Image Descriptions. Emiel van Miltenburg, Desmond Elliott and Piek Vossen
  • Reading Comprehension with Graph-based Temporal-Causal Reasoning. Yawei Sun, Gong Cheng and Yuzhong Qu
  • Diachronic word embeddings and semantic shifts: a survey. Andrey Kutuzov, Lilja Øvrelid, Terrence Szymanski and Erik Velldal
  • Transfer Learning for Entity Recognition of Novel Classes. Juan Diego Rodriguez, Adam Caldwell and Alexander Liu
  • Joint Modeling of Structure Identification and Nuclearity Recognition in Macro Chinese Discourse Treebank. Xiaomin Chu, Feng Jiang, Yi Zhou, Guodong Zhou and Qiaoming Zhu
  • Unsupervised Morphology Learning with Statistical Paradigms. Hongzhi Xu, Mitchell Marcus, Charles Yang and Lyle Ungar
  • Challenges of language technologies for the Americas indigenous languages. Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra and Ivan Meza-Ruiz
  • A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis. Yicheng Zou, Tao Gui, Qi Zhang and Xuanjing Huang
  • From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources. Ilia Kuznetsov and Iryna Gurevych
  • The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities. Marco Del Tredici and Raquel Fernández
  • Relation Induction in Word Embeddings Revisited. Zied Bouraoui, Shoaib Jameel and Steven Schockaert
  • Learning with Noise-Contrastive Estimation: Easing training by learning to scale. Matthieu Labeau and Alexandre Allauzen
  • Stress Test Evaluation for Natural Language Inference. Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose and Graham Neubig
  • Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs. Wenpeng Yin, Yadollah Yaghoobzadeh and Hinrich Schütze
  • SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions. Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney and Nazli Goharian
  • Automatically Extracting Qualia Relations for the Rich Event Ontology. Ghazaleh Kazeminejad, Claire Bonial, Susan Windisch Brown and Martha Palmer
  • What represents “style” in authorship attribution? Kalaivani Sundararajan and Damon Woodard
  • Semantic Vector Networks. Luis Espinosa Anke and Steven Schockaert
  • GenSense: A Generalized Sense Retrofitting Model. Yang-Yin Lee, Ting-Yu Yen, Hen-Hsen Huang, Yow-Ting Shiue and Hsin-Hsi Chen
  • A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task. Qian Li, Ziwei Li, Jin-Mao Wei, Yanhui Gu, Adam Jatowt and Zhenglu Yang
  • Abstract Meaning Representation for Multi-Document Summarization. Kexin Liao, Logan Lebanoff and Fei Liu
  • Cooperative Denoising for Distantly Supervised Relation Extraction. Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan and Ying Shen
  • Dialogue Act Driven Conversation Model: An Experimental Study. Harshit Kumar, Arvind Agarwal and Sachindra Joshi
  • Dynamic Multi-Level, Multi-Task Learning for Sentence Simplification. Han Guo, Ramakanth Pasunuru and Mohit Bansal
  • A Knowledge-Augmented Neural Network Model for Implicit Discourse Relation Classification. Yudai Kishimoto, Yugo Murawaki and Sadao Kurohashi
  • Abstractive Multi-Document Summarization using Paraphrastic Sentence Fusion. Mir Tafseer Nayeem, Tanvir Ahmed Fuad and Yllias Chali
  • They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking. Ethan Zhou and Jinho D. Choi
  • A Comparison of Transformer and Recurrent Neural Networks on Multilingual NMT. Surafel Melaku Lakew, Mauro Cettolo and Marcello Federico
  • Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media. Isabel Cachola, Eric Holgate, Daniel Preoţiuc-Pietro and Junyi Jessy Li
  • On Adversarial Examples for Character-Level Neural Machine Translation. Javid Ebrahimi, Daniel Lowd and Dejing Dou
  • Neural Transition-based String Transduction for Limited-Resource Setting in Morphology. Peter Makarov and Simon Clematide
  • Structured Dialogue Policy with Graph Neural Networks. Lu Chen, Bowen Tan, Sishan Long and Kai Yu

We would like to recognise with exceptional thanks our best paper committee.

Acceptance rate

As we noted in a previous post, the acceptance rate is an important metric of competitiveness for authors with accepted papers.

… for individual researchers, especially those employed in or hoping to be employed in academia, acceptance of papers to COLING and similar venues is very important for job prospects/promotion/etc. Furthermore, it isn’t simply a matter of publishing in peer-reviewed venues, but in high-prestige, competitive venues. Where the validation view of peer review would view it as a binary question (does this paper make a validatable contribution or not?), the prestige view instead speaks to ranking—where we end up with best papers, strong papers, borderline papers that get in, borderline papers that don’t get in, and papers that were easy to decide to reject. (And, for full disclosure, it is in the interest of a conference to strive to become and maintain status as a high-prestige, competitive venue.)

Not surprisingly, we’ve received several requests for the acceptance rate for COLING 2018. It turns out that determining that number is not straightforward. We initially had 1017 submissions, but some of those (129) were withdrawn, either early in the process (the authors never in fact completed the paper) or later, usually in light of acceptance at another venue, per the COLING 2018 dual submission policy. The denominator for our acceptance rate excludes these papers as it hardly seems fair to include papers that either weren’t reviewed, or were withdrawn because they were accepted elsewhere. Conversely, we decided to include the papers desk rejected (n=33) in the denominator.

With 332 papers accepted for publication out of 888 (1017 submissions minus 129 withdrawals), the acceptance rate is 37.4%.
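The arithmetic behind that figure can be sketched as follows. The function name is ours and purely illustrative; it simply encodes the rule described above: withdrawn papers leave the denominator, while desk rejects stay in it.

```python
def acceptance_rate(submissions: int, withdrawn: int, accepted: int) -> float:
    """Accepted papers over submissions minus withdrawals.

    Desk-rejected papers are NOT subtracted: they remain in the denominator.
    """
    denominator = submissions - withdrawn  # 1017 - 129 = 888 for COLING 2018
    return accepted / denominator


rate = acceptance_rate(submissions=1017, withdrawn=129, accepted=332)
print(f"{rate:.1%}")  # 37.4%
```

Excluding the 33 desk rejects from the denominator as well would push the rate up, which is why the choice of what counts as "submitted" matters when comparing venues.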

PC chairs report back: On the effectiveness of author response

The utility of the author response part of the conference review process is hotly debated. At COLING 2018, we decided to have the author response be addressed only to the area chairs (and PC co-chairs), and not the reviewers. The purpose of this blog post is to report back on our experience with this model (largely positive, from the PC perspective!) and also to share with the community what we have learned, inhabiting this role, about what makes an effective author response.

For background, here is a description of the decision making process at the PC level. Keep in mind that COLING 2018 received 1017 submissions, of which 880 were still ‘active’ at the point of these decisions.  (The difference is a combination of desk rejects and papers withdrawn, the latter mostly in light of acceptance to other venues with earlier notifications.)

Outline of our process

Final accept/reject decisions for COLING 2018 were made as follows:

We asked the ACs for each area to provide a ranking of the papers in their area and to indicate recommendations of accept, maybe accept, maybe reject, or reject. We specifically instructed the ACs not to use the reviewer scores to sort the papers, but rather to come to their own ranking based on their judgment, given the reviews, discussion among reviewers, author responses, and (where necessary) reading the papers.

Our role as PCs was to turn those recommendations into decisions. To do so, we first looked at each area’s report and determined which papers had clear recommendations and which were borderline.  For the former, we went with the AC recommendations directly. The borderline cases were either papers that the ACs marked as ‘maybe accept’ or ‘maybe reject’, or, for areas that only used ‘accept’ and ‘reject’, the last two ‘accept’ papers and the first two ‘reject’ papers in the ACs’ ranking. This gave us a bit over 200 papers to consider.
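The rule for picking out borderline papers in each area can be sketched roughly as below. This is a hypothetical illustration, not what START or we actually ran: the `Paper` record and `borderline` function are invented names, and we assume an area "used" the maybe labels if any paper in it carries one.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    paper_id: str
    recommendation: str  # "accept", "maybe accept", "maybe reject" or "reject"


def borderline(ranked: list[Paper]) -> list[Paper]:
    """Return the borderline papers for one area, given the AC ranking (best first)."""
    maybes = [p for p in ranked if p.recommendation in ("maybe accept", "maybe reject")]
    if maybes:
        # Areas that used the "maybe" labels: those papers are the borderline set.
        return maybes
    # Areas that used only accept/reject: the last two accepts and first two rejects.
    accepts = [p for p in ranked if p.recommendation == "accept"]
    rejects = [p for p in ranked if p.recommendation == "reject"]
    return accepts[-2:] + rejects[:2]


area = [Paper("a1", "accept"), Paper("a2", "accept"), Paper("a3", "accept"),
        Paper("r1", "reject"), Paper("r2", "reject"), Paper("r3", "reject")]
print([p.paper_id for p in borderline(area)])  # ['a2', 'a3', 'r1', 'r2']
```

Every paper not returned by such a rule had a clear recommendation, and the AC decision stood as is.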

We divided the areas into two sets, one for each of us. (We were careful at this point to put the areas containing papers with which one of us had COIs into the other PC’s stack.) Area by area, we looked at the borderline papers, considering the reviews, the reviewer discussion (if any), the author response, comments from the ACs, and sometimes the papers (to clarify particular points; we didn’t read the papers in full). Although the PC role on START allows us to see the authors of all submissions, we worked out ways to look at all the information we needed to do this without seeing the author names (or institutions, etc).

Of the 200 or so papers we looked at, there were 23 for which we wanted to have further discussion. This was done over Skype, despite the 9 hour time difference! These papers were evenly distributed between Emily’s and Leon’s areas, but clustered towards the start of each of our respective stacks; our analysis is that as we worked our way through the process, we each gained a better sense of how to make the decisions and found less uncertainty. (Discussion of COI papers was done with the General Chair, Pierre Isabelle, not the other PC, per our COI policy.)

As a final step to verify data entry (to make sure what is entered in START actually matches our intentions), we went through and looked at both the accepted papers with the lowest reviewer scores and the rejected papers with the highest reviewer scores. 98 papers with an average score of 3 or higher were rejected. 27 papers with an average score lower than 3 were accepted. (Remember, it’s not just about the numbers!) For each of these, we went back to our notes to check that the right information was entered (it was), and in so doing we found that, for the majority of the papers which were accepted despite low reviewer scores (and correspondingly harsh reviews), our notes reflected effective author responses. This is furthermore consistent with our subjective sense that the author responses really did make a difference in the case of difficult decisions, that is, the papers we were looking at.
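That consistency check amounts to listing the decisions that disagree with the average reviewer score. A minimal sketch, assuming a hypothetical record format of (paper id, reviewer scores, accepted) and the threshold of 3 mentioned above:

```python
from statistics import mean

# Hypothetical record format: (paper_id, reviewer_scores, accepted).
Record = tuple[str, list[float], bool]


def score_mismatches(papers: list[Record], threshold: float = 3.0) -> tuple[list[str], list[str]]:
    """Return (rejected papers scoring >= threshold, highest first,
    accepted papers scoring < threshold, lowest first)."""
    high_rejected = sorted(
        (p for p in papers if not p[2] and mean(p[1]) >= threshold),
        key=lambda p: mean(p[1]), reverse=True)
    low_accepted = sorted(
        (p for p in papers if p[2] and mean(p[1]) < threshold),
        key=lambda p: mean(p[1]))
    return [p[0] for p in high_rejected], [p[0] for p in low_accepted]


papers = [("p1", [4, 3, 3], False), ("p2", [2, 2, 3], True), ("p3", [5, 4], True),
          ("p4", [1, 2], False), ("p5", [3, 4], False)]
print(score_mismatches(papers))  # (['p5', 'p1'], ['p2'])
```

Each id in either list is then checked by hand against the chairs' notes; the script only finds candidates, it does not judge them.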

What makes an effective author response?

The effective author responses all had certain characteristics in common. They were written in a tone that was respectful, calm and confident (but not arrogant). They had specific answers to reviewers’ specific questions or specific replies to reviewers’ criticisms. For example, if a reviewer pointed out that a paper failed to discuss important related work, an effective author response would either acknowledge the omission and indicate that it will be addressed in the final version, or clearly state why the indicated paper isn’t in fact relevant. Effective author responses to reviewer questions about points that aren’t clear were short and to the point (and specific). This gave us confidence that the answers would be incorporated in the final version. In many cases, authors related the results of experiments they hadn’t had space for, or ran the analyses during the response period; this is much more effective than an ephemeral promise to add the content. Author responses could also be effective in indicating that reviewers misunderstood key points of the paper or the background into which it fits, but only if they were written in the calm, confident tone mentioned above.

Many effective author responses also expressed gratitude for the reviewers’ feedback. This was nice to see, but it wasn’t a problem when it wasn’t there.

What makes an ineffective author response?

Ineffective author responses, on the other hand, seemed to be written from a place of anger. We understand where authors are coming from when this happens! Reviews, especially negative reviews, can sting. But an author response that comes across as angry, condescending, or combative is not effective at persuading the ACs & PCs that the reviewers have things the wrong way around, nor does it provide good evidence that the paper will be improved for the camera ready version.

Best practices for writing author responses

Here we try to distill our experience of reading the author responses for ~200 papers (not all papers had them, but most did) into some helpful tips.

For conference organizers

We definitely recommend setting up an author response process, but having the author responses go to the ACs (and PCs) only, not the reviewers.  Two ways to improve on what we did:

  • Clarify the word count constraints better than we did. We asked for no more than 400 words total, but the way START enforced that was no more than 400 words per review (since there were separate author response boxes for each review).
  • Don’t make the mistake we made of sending authors who wanted to do a late author response to their ACs … in the very small number of cases where that happened, it compromised anonymity of authors to ACs.
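The gap between the limit we intended (400 words in total) and the one the form enforced (400 words per box) is easy to see in a small sketch. We do not know exactly how START counts words; the whitespace-based count below is an assumption, and the function names are hypothetical.

```python
LIMIT = 400


def word_count(text: str) -> int:
    # Assumption: words are whitespace-delimited tokens.
    return len(text.split())


def check_response(boxes: list[str], limit: int = LIMIT) -> tuple[bool, bool]:
    """Return (passes the per-box limit, passes the total limit)."""
    per_box_ok = all(word_count(box) <= limit for box in boxes)
    total_ok = sum(word_count(box) for box in boxes) <= limit
    return per_box_ok, total_ok


# Three reviews, 300 words of response to each: fine per box, far over in total.
print(check_response(["word " * 300] * 3))  # (True, False)
```

Under a whitespace-based count like this one, a whole sentence glued together with underscores counts as a single word, which is exactly the loophole described in the tips for authors.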

For authors

  • Read the reviews and write the angry version. Then set it aside and write a calmer one.
  • If you can, show your author response to someone who will read it for you and let you know where it sounds angry/arrogant/petty.
  • Try starting with “Thank you for the helpful feedback”—this isn’t necessary, and you can edit it out afterwards for space, but it might help you get off on the right foot regarding tone.
  • Don’t play the reviewers off each other (“R1 says this paper is hard to read, but that’s clearly wrong, because R2 said it was easy to follow.”) Rest assured that the ACs will read all of the reviews; they’ll have seen R2’s comments too.
  • Similarly, don’t feel obliged to reply to everything in the reviews. General negative comments (e.g. “I found this paper hard to read”) don’t require a response and there probably isn’t a response that would be helpful. Either the paper really is unclear or the reviewer doesn’t have sufficient background / didn’t leave enough time to read the paper carefully. Which scenario this is will likely be evident from the rest of the reviews and the author response.
  • Don’t promise the moon and the stars in the final version. It’s hard to accept a borderline paper based on promises alone.
  • Do indicate specific answers to key questions, in a way that is obviously easily incorporated in the final version. (And in that case it’s fine to say “We will add clarification along these lines”, or similar.)
  • Do concisely demonstrate mastery of the area, if reviewers probe issues you have considered during your research and you have the answers to hand.
  • Don’t play games with the word count. We saw two author responses where the authors got around the software’s restriction to 400 words (per box!) by_joining_whole_sentences_with_underscores. This does not make a good impression.

Ultimately, even a calm and confident author response doesn’t necessarily push a paper on the borderline over into accept. Sometimes the paper just isn’t ready and it’s not reasonable to try to fix what needs fixing or add what needs adding for the final version. Nonetheless, we found that the above patterns do make author responses more effective, and so we wanted to share them.



COLING schedule construction: Next steps

We are proud to have sent out the acceptance notifications for COLING 2018 ahead of schedule! But, our work as chairs is not done. Here are our next steps:

Program construction

We have prepared a schedule “frame”, with plenary sessions (opening, keynotes, best papers, closing), parallel sessions (talks and posters), all fit in around coffee breaks, lunch and the excursions. Our task now is to group the accepted papers into coherent talk and poster sessions. In doing so, we will consider:

  • Author preferences (as indicated in START)
  • Area chair recommendations
  • Thematic coherence of sessions
  • Suitability of each topic for each format

Our goal is to have the program constructed by June 13. That timing is partially dependent on the best paper award process, outlined below.

Planning ahead

In an event of this size, it is inevitable that some number of presenters may be unable to attend at the last minute. In that case, we hope that speakers will be able to arrange to present remotely (per the inclusion policy).  If that is not possible, and an oral presentation is being pulled, we will seek to replace it with the most thematically similar poster available.

Best paper awards

We have 10 award categories, the 9 listed in our previous post on this topic, plus ‘Best error analysis’, which we really should have thought of initially! We have 11 scholars who have agreed to be on this committee. And we have 41 papers which have been nominated, each for one of the specific awards.

We will shortly be creating subcommittees of the best paper committee to consider each award. Each award will be considered by two committee members and most committee members will be working on two award types. The exception is the “Best NLP engineering experiment” award, as that award type has the most nominations (being the most common paper type among our submissions). The committee members working on that type will focus only on it. We are open to the possibility that some awards may go unallocated (if this is warranted) and also that a paper may end up with a different award than the one it was nominated for.


May 17: Nominated papers to best paper committee
June 1: Each subcommittee reports to the whole BPC with their nomination and a handful of alternates; the BPC then discusses results
June 8: The committee confirms up to ten best paper awards for nomination to the PC co-chairs
June 13: Best papers confirmed, and authors notified


In order to preserve anonymity in the best paper award selection process, we will not post the list of accepted papers until the selection is done. Individual authors are of course free at this point to post their own information, but we trust our best paper committee won’t go hunting for it.


As mentioned in our requirements post, only papers that have made the resources/code publicly available by camera ready time will be considered for best paper awards; those that rely on code or data, but haven’t made it available, will be taken out of the running.

Best paper committee

Our responsive, expert committee members are:

Publication preparation

Going from drafts to papers in proceedings is a massive undertaking, for you and for us. Our hard-working publication chairs, Xiaodan Zhu and Zhiyuan Liu, are directing and supporting the process of getting hundreds of main-conference papers (and later, hundreds more workshop papers) into a form where they can be easily and freely downloaded by anyone. This collection of published papers is a huge part of the output of COLING. Creating it involves getting the proceedings to compile properly, which, as you may know from experience, is tough enough for a single paper, let alone 300+ in one volume. So please support them in this critical, painstaking work by getting your paper as tight and well formatted as possible.

A window into the decision process

We are aware that the decision process for a large conference like COLING can be quite opaque from the point of view of authors, especially those who have not served in the role of AC or PC in the past. In this post, we aim to demystify what we are doing (and why it takes so long from submission to decision!). As always, we believe that more transparency leads both to a better process (since we are committed to doing what we lay out, and what we lay out has to be justified in writing) and to a better understanding of the outcomes.


Many of our authors are probably aware that reviews were due on April 10, and reviews are seen as the primary determinant of acceptance, so you might well wonder why you won’t be hearing about acceptance decisions until May 17. What could possibly take so long?

We (the PC co-chairs) met in Seattle last July to lay out a detailed timeline, making sure to build in time for careful decision making as well as buffers to handle the near-certainty that some things would go wrong. The portion between April 10 and May 17 looks like this:

April 10 Reviews due
April 11 ACs request reviewer discussion, chase missing reviews
April 15 Reviewer discussion ends
April 16 ACs request fixes to problematic reviews (too short, inappropriate tone)
April 19 Deadline for reviews to be updated based on AC feedback
April 20 Reviews available to authors; author response begins
April 25 Author response ends
April 26 AC discussion starts
May 3 Reviewer identities revealed to co-reviewers
May 4 AC recommendations due to PC co-chairs
May 16 Signatures revealed for signed reviews
May 17 Acceptance notifications

As you can see, the time between the initial deadline for reviews and the final acceptance notification is largely dedicated to two things: making sure all reviews are present and appropriate, and leaving time for thoughtful consideration by both ACs and PC co-chairs in the decision making process.

Of course, not everything goes according to plan. As of April 25, we still have a handful of missing or incomplete reviews. In many of these cases, ACs (including our Special Circumstances ACs) are stepping in to provide the missing reviews. That this can be done blind is another benefit of keeping author identities from the ACs! (It’s not quite double blind, as authors can probably work out who the ACs are for their track, but that direction is less critical in this case.)

How did we end up with missing reviews? In some cases, this was not the fault of the reviewers at all. There were a handful of cases where START had the wrong email addresses for committee members, and we only discovered this when the ACs emailed those committee members from outside START, only to find that they hadn’t received their assignments! In other cases, committee members agreed to review, submitted bids, and then didn’t turn in their reviews. While we absolutely understand that things come up, if someone can’t complete their reviewing assignment, the best way to minimize the impact on others (authors, other reviewers asked to step in, and the ACs/PCs managing the process) is simply to say so as soon as possible.

Instructions to ACs

In our very first post to this PC blog we laid out our goals for the COLING 2018 program:

Our goals for COLING 2018 are (1) to create a program of high quality papers which represent diverse approaches to and applications of computational linguistics written and presented by researchers from throughout our international community; (2) to facilitate thoughtful reviewing which is both informative to ACs (and to us as PC co-chairs) and helpful to authors; and (3) to ensure that the results published at COLING 2018 are as reproducible as possible.

The process by which reviews are turned into acceptance decisions is a key part of the first of those goals (but not the only part—recruiting a strong, diverse pool of submissions was a key first step, as well as the design of the review process). Accordingly, these are the directions we have given to ACs, as they consider each paper in their area:

Please, please do not simply rank papers by overall score. Three reviewers is just not enough to get a reliable estimate of a paper’s quality. Maybe one reviewer didn’t read the paper, another one didn’t understand it and reacted poorly, and a final reviewer always gives negative scores; maybe one reviewer has warped priorities and another doesn’t know the area as well. There’s too much individual variance for a tiny number of reviewers (i.e. 3) to precisely judge a paper.


In fact, don’t even sort papers like this to start out with; glancing at that list will unconsciously bias perception of the papers, and that’ll mean poor decisions. Save yourself: don’t let knowledge of that ranking make a nuanced review go unread.


However, as an area chair, you know your area well and have a good idea of the technical merits of individual works in that area. You should be able to understand the technical content when needed and to judge the reviews’ quality for yourself. Once the scores are in, you’ll also have a good idea of which reviewers generally grade low (or high).


Try to order the papers so that the ones you like most are at the top, the ones that shouldn’t appear are at the bottom, and each paper is preferable to the one below it. You can split this work with your co-AC as you prefer; some will take half the papers each and then merge, but if you do this, it’s important to realise that the split won’t be perfect: you won’t be able to interleave the two resulting rankings one-by-one. In any event, both you and your co-AC must explicitly agree on the final ranking.


Use the reviews and author feedback as the evidence for the ranking, and be sure and confident about every decision. If you’re not yet confident, there are a few options: ask the reviewers to clarify, or to examine a point; ask your co-AC for their opinion; find another reviewer for an extra opinion, if this can be done quickly; or ask us to send over resources.


Once you have an ordering, think about which of that set you’d recommend for acceptance, and send us the rankings along with your recommendations. You should also write a short report on your area: the process and the trends you saw there. Between you and your co-chair, this should be around 100-500 words.

As you can see, we are emphasizing holistic understanding of the merits of each paper, and de-emphasizing the numerical scores. Which brings up the obvious question: Why not rely on the scores?

It’s not just about the scores

Scoring is far too unreliable to serve as an acceptance recommendation. We have only three reviewers per paper, each biased in their own way. You won’t get good statistics from a population of 3, and we don’t expect to. This isn’t the reviewers’ fault; it’s just plain statistics. Instead, each review has to be considered on its own terms: its overall bias, the reviewer’s expertise, and how well the reviewer understood the paper.
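To make the “plain statistics” point concrete, here is a minimal Python sketch under loudly hypothetical assumptions that are ours, not COLING’s: every reviewer score is the paper’s true quality plus independent Gaussian noise with a standard deviation of 1 point (a made-up stand-in for reviewer bias and misreading), each paper gets its own reviewers, and two papers differ in true quality by 0.3 points. It computes how often ranking by mean score would put the weaker paper above the stronger one.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def swap_probability(quality_gap: float, reviewer_sd: float, n_reviewers: int) -> float:
    """Probability that the weaker of two papers receives the higher mean score.

    Hypothetical model: score = true quality + independent N(0, reviewer_sd**2) noise.
    The difference between the two papers' mean scores is then normal with
    mean quality_gap and standard deviation reviewer_sd * sqrt(2 / n_reviewers).
    """
    sd_of_difference = reviewer_sd * math.sqrt(2.0 / n_reviewers)
    return normal_cdf(-quality_gap / sd_of_difference)


if __name__ == "__main__":
    # Hypothetical numbers: a 0.3-point true gap and 1-point reviewer noise.
    for n in (3, 10, 30):
        p = swap_probability(quality_gap=0.3, reviewer_sd=1.0, n_reviewers=n)
        print(f"{n:>2} reviewers per paper: ranking flipped {p:.0%} of the time")
```

Under these made-up numbers, three reviewers per paper flip the order roughly a third of the time; the model ignores calibration across reviewers entirely, which is exactly the extra information an AC can bring.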

So, in the words of Jason Eisner, from his fantastic “How to Serve as Program Chair of a Conference” guide:

How not to do it: Please, please, please don’t just sort the papers by the 3 reviewers’ average overall recommendation! There is too much variance in these scores for n=3 to be a large enough sample. Maybe reviewer #1 tends to give high scores to everyone, reviewer #2 has warped priorities, and reviewer #3 barely read the paper or barely knows the area. Whereas another paper drew a different set of 3 reviewers.

How still not to do it: Even as a first step, don’t sort the papers by average recommendation. Trust me — this noisy and uncalibrated ranking isn’t even a good way to triage the papers into likely accepts, likely rejects, and borderline papers that deserve a closer look. Don’t risk letting it subtly influence the final decisions, or letting it doom some actual, nuanced reviews to go unread.

What I told myself: When you’re working with several hundred papers, a single paper with an average score of 3.8 may seem to merit only a shrug and a coin flip. But a single false negative might harm a poor student’s confidence, delay her progress to her next project, or undermine her advisor’s grant proposal or promotion case. Conversely, a single false positive wastes the time of quite a lot of people in your audience.

Doing this step fairly for the 872 papers still undecided, then, requires considerable effort.

The dual role of peer reviewed conferences

As we (the PC co-chairs) work to oversee this process and then construct a final program out of AC recommendations, we are mindful of the dual role that a full-paper peer-review conference like COLING 2018 is playing.

On the one hand, peer review is meant to be an integral part of the process of doing science. If something is published in a peer-reviewed venue, that indicates that it has been read critically by a set of reviewers and found to make a worthwhile contribution to the field of inquiry. This doesn’t ensure that it is correct, or even that most people up to date with the field would find it reliable, but it is an indication of scientific value. (This is all the more difficult in interdisciplinary fields, as we discussed in an earlier blog post.) This aspect of peer review fits well with the interests of the conference audience as stakeholders: the audience benefits from having vetted papers curated for them at the event.

On the other hand, for individual researchers, especially those employed in or hoping to be employed in academia, acceptance of papers to COLING and similar venues is very important for job prospects, promotion, and so on. Furthermore, it isn’t simply a matter of publishing in peer-reviewed venues, but in high-prestige, competitive venues. Where the validation view of peer review treats acceptance as a binary question (does this paper make a validatable contribution or not?), the prestige view instead speaks to ranking: we end up with best papers, strong papers, borderline papers that get in, borderline papers that don’t get in, and papers that were easy to decide to reject. (And, in full disclosure, it is in the interest of a conference to strive to attain and maintain status as a high-prestige, competitive venue.)

While we understand our role in the validation aspect of peer review, we are indeed viewing it as a ranking rather than a binary process, for several reasons. First, reviewers are human too, and no group of 3-5 humans can definitively decide whether any given paper (roughly in their field) is ‘valid’ or ‘invalid’ as a scientific contribution. Second, even if we did have a perfect oracle for validity, there is no reason the number of available spots in a given conference would exactly match the number of ‘valid’ papers among the submissions. When there are more worthy papers than spots, decisions have to be made somehow, and we believe that somehow should include both measures of degree of interest in the paper and overall diversity of approaches and topics in the program. (Conversely, we will not be aiming to ‘fill up’ a certain number of spots just because we have them.) Finally, we work with the understanding that COLING is not the only conference available, and that authors whose work is not accepted to COLING will in most cases be able to improve the presentation and/or underlying methodology and submit to another conference.

That ranking is ultimately binarized into accept/reject (modulo best paper awards), and we understand (and have our own personal experiences with!) the way a rejection can seem to convey ‘this research is not valid/not worthy.’ Alternatively, authors with relatively high headline scores on a paper that is nonetheless rejected might feel that the ‘true’ or ‘correct’ result for their paper was overridden by the ACs or PC. We hope that this blog post will help to dispel those notions by providing a broader view of the process.

PC process once we have the AC reports

Once the ACs provide us with their rankings and reports, on May 4, we (PC co-chairs) will have the task of building from them a (nearly) complete conference program—the one outstanding piece will be the selection of best papers from among the accepted papers. Ahead of time, we have blocked out a ‘frame’ for the overall program so we have upper limits on how many oral presentations and poster presentations we can accept.

As a first step, we will look at how the total number of acceptances recommended by the ACs compares to the total number of spots available. However, it is not our role simply to accept the ACs’ recommendations, but rather to review them and ensure that the decisions as a whole are consistent (to the extent feasible, given that the whole process is noisy) and that the resulting program meets our goals of diversity in regard to topics and approaches (again, to the extent feasible, given the submission pool). We have also asked ACs to recommend a mode of presentation (oral or poster), with the understanding that oral presentations are not ‘better papers’ than posters, but rather that some topics are more likely to be successful in each mode of presentation.

Though the author identities have been hidden from ACs, they haven’t been hidden from us. Nonetheless, as we work with the AC reports, we will have paper numbers and titles (but not author lists) to work from and will not go out of our way to associate author identities. Furthermore, the final accept/reject decisions for any papers with which either of us has a conflict of interest (COI) will be handled by the other PC co-chair together with the conference general chair (GC).

Review statistics

So far, there are many aspects of our review process at COLING that we can measure. Here are a few.

First, it’s interesting to see how often reviewers recommend that the authors cite the reviewers’ own work. We can’t evaluate how appropriate this is, but it happened in 68 out of 2806 reviews (2.4%).

Best paper nominations are quite rare in general, which gives the best paper committee very little signal to work with. To gain more information, in addition to asking whether a paper warranted further recognition, we asked reviewers to say whether a given paper was the best out of those they had reviewed. This worked well for 747 reviewers, but 274 reviewers (26.8%) said that no paper they reviewed was the best of their reviewing allocation.
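For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the two percentages above from the raw counts. One assumption is ours: we take the denominator for the “best of allocation” question to be the 747 + 274 = 1021 reviewers who answered it, which is a reading of the text rather than a figure stated in it.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


# Counts quoted in this post.
SELF_CITATION_REQUESTS = 68
TOTAL_REVIEWS = 2806
NAMED_A_BEST_PAPER = 747
NAMED_NO_BEST_PAPER = 274

if __name__ == "__main__":
    print(f"Reviews asking authors to cite the reviewer: "
          f"{share(SELF_CITATION_REQUESTS, TOTAL_REVIEWS)}%")
    # Assumption: everyone who answered falls into exactly one of the two groups.
    answered = NAMED_A_BEST_PAPER + NAMED_NO_BEST_PAPER
    print(f"Reviewers naming no best paper: {share(NAMED_NO_BEST_PAPER, answered)}%")
```

Both values come out at the figures quoted above, 2.4% and 26.8%.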

Mean score and mean confidence can be broken down by paper type, as follows.

Paper type                                   Mean score   Mean confidence
Computationally-aided linguistic analysis    2.85         3.42
NLP engineering experiment paper             2.86         3.51
Position paper                               2.41         3.36
Reproduction paper                           2.92         3.54
Resource paper                               2.76         3.50
Survey paper                                 2.93         3.58

We can see that reviewers were least confident with position papers, and were both most confident and most pleased with survey papers—though reproduction papers came in a close second in regard to mean score. This fits the general expectation that position papers are hard to evaluate.

The overall distribution of scores follows.

Anonymity and Review

Anonymous review is one way of achieving a fairer process. Ongoing discussion in our field led us to examine how well it was really working, and to rethink how anonymity was implemented for COLING this year.

One step we took was to make sure that area chairs did not know who the authors were. This is important because area chairs are the ones putting forward recommendations based on reviews: they are the people who mediate between borderline papers and acceptance, and who assess whether reviewer ratings have put a paper on the wrong side of the acceptance boundary. This is a critical and powerful role. So, if a venue has chosen to run an anonymized process, we should be extra sure that the area chairs don’t see the authors’ names.

This policy caused a little initial surprise, but everyone adapted quickly. For it to work, authors must continue to hide their identity, especially in the author response to chairs, which is the current stage of the process.

We also increased anonymity in reviewer discussion: reviewers did not, and still do not, know each other’s identities. To keep the review tone professional, we will reveal reviewer identities to each other later in the process. So if you are one of our generous program committee members, you will be able to see, for submissions you also reviewed, who perhaps wrote the excellent review you saw, and also who left the blank one.

It’s well established that authors generally find signed reviews (that is, reviews that include the reviewer’s name) to be of better quality and tone. We gave reviewers the option to sign their reviews; this time, 121 out of 1020 active reviewers (11.9%) took it up.

On the topic of anonymity, there have been a few rejections due to poor or absent anonymization. To help future authors, here are some ways anonymity can be broken.

  • Linking to a personal or institutional GitHub account and making it clear in the prose that it belongs to the authors (e.g. “We make this available at”).
  • Describing and citing prior work as “we showed”, “our previous work”, and so on.
  • Leaving names and affiliations on the front page.
  • Including unpublished papers in the bibliography.

Some of these can be avoided simply by citing one’s own past work only in the camera-ready copy and holding it back at review time, which is a strategy we recommend. Of course this is not always possible, but in most of the cases we saw, refraining from self-citing would not have damaged the narrative and would have left the paper compliant.

The final step in the review process, from the author side, is the author response to chairs. Please remember to keep yourself anonymous here: the chairs know neither author nor reviewer identities, which helps them be impartial.

Author response

The value of the author response mechanism is frequently debated in our field, and it can be a source of stress for authors. On the one hand, when our work is being reviewed by others, it can feel disempowering not to have the opportunity to respond to those reviews. On the other hand, there is the perennial question of whether author responses ever “help” (in the sense of taking a paper over the line from “reject” to “accept”). (On that point, see this very thoughtful analysis by Hal Daumé III of the NAACL 2013 process.) And finally, author responses must be turned around in a short time and can be tricky to write: how do we strike the right tone (firm, polite, confident; not pleading or angry), especially when we might still be feeling the sting of negative reviews? As reviewers, we have seen both very effective author responses (expressing gratitude for feedback and pointing out sources of misunderstanding) and very ineffective ones (pure vitriol, or long lists of promises of what will be accomplished before the camera-ready version).

In light of all of this, what we settled on for COLING 2018 is an optional author response to be seen by the area chairs only, not by the reviewers. Thus we are providing authors with the opportunity to flag reviewer misunderstandings for area chairs and to answer questions raised by reviews. The latter should only be done when the information is already available and can be given in a short statement (e.g. “Indeed, we did set the random seed and will include this information in the camera ready” but not “That is an interesting idea for a further experiment, we will run that one and include the numbers in the camera ready”). We also note that author response is optional, and area chairs will not read anything into the lack of an author response.

Author response will run from 20-25 April.

Why this route? Well, the quantitative evidence is that pointing out reviewer mistakes rarely leads to a change in scores, and the folk knowledge has long been that responses are really used by ACs to detect misaligned reviews. So rather than encourage an intrinsically difficult communication that has had little to no effect in the past, we instead direct the replies to the authoritative party they are relevant to. This gives ACs a little extra work, but as they’re acting in pairs and areas are of roughly the same compact size, our hope is that time can be spent more on working out the dialog around a paper and less on administering a huge set of authors and reviewers.

Lessons Learned

The role of PC chair is interesting in many ways. It provides a perhaps unparalleled opportunity to influence the way research is approached and presented in our field. For COLING 2018, we have been taking this responsibility very seriously and working hard, both through our decisions about the review process and by publicizing those ideas in this blog, to push the field in directions that we believe will be fruitful, including stronger interdisciplinarity and more reproducibility.

On the flip side, the role of PC chair comes with some serious downsides. One is the heart-rending process of deciding on, and then informing authors of, desk rejects. We did our utmost to do this as fairly as humanly possible, starting with publicizing our desk reject policy. We hoped that this would reduce the number of desk rejects, and it may have, but a handful of papers were still rejected without review under the policy.

By a long way, the most common reason for a desk reject was the paper’s length (i.e., documents submitted with more than 9 content pages). Papers in the wrong template were also desk rejected (we saw e.g. NAACL, ACL, and NIPS formats), as were those with squashed line spacing, reduced font size, removed author boxes, and so on. Another reason for desk rejection was bad anonymisation; some papers, for example, linked to the authors’ private GitHub repository, which is the sort of thing that can really wait until camera ready. One paper was rejected for breaking the arXiv embargo period, having been posted there fewer than 30 days before the COLING deadline. No edits were allowed after the deadline had passed. This was a very unpleasant process overall: desk rejects are desperately unpleasant to send, and probably even worse to receive. We can only plead with authors to follow the guidelines so that their work gets the attention it needs instead of rejection without feedback. That way there need not be any desk rejects at all.

In this blog post, we wanted to briefly reflect on what we have learned about the practices that back people into the corners where desk-reject mistakes happen. In general, we see a culture of last-minutism in our field. Deadlines can inspire people to get things done that otherwise seem impossible, but doing things in a rush also has downsides. Here are some DOs and DON’Ts of paper submission that we hope will spare people some pain in the future:

  • Do access the submission system early, so you know what awaits.
  • Do read the CFP carefully. Such documents can be intimidating, especially for first-time submitters, but the information there all has a purpose, and it’s easier to make use of if you get it early.
  • Don’t leave submitting your final paper until the absolute last minute. If something goes wrong (e.g. you submit the wrong PDF or lose your internet connection), you’ll miss the deadline. This happens regularly and is wasteful. Sometimes you might not find out it was the wrong PDF until after the deadline, or you might be so rushed that the paper spills over the page limit unnoticed. Either way, the hard work has to wait for another conference.

And finally a couple of thoughts on interacting with PC chairs, especially in large conferences:

  • Please don’t ask the PC chairs to upload a PDF for you after the deadline. The deadline is a deadline. Asking for it to be bent is asking the PC chairs to not apply policies evenly and fairly.
  • Do be aware that the PC chairs in a conference this size are communicating with ~1000 authors and ~1000 reviewers, and keep that in mind as you make requests.