This guest post is by Nicoletta Calzolari.
Undocumented Language Resources (LRs) don’t exist!
The LRE Map of Language Resources (data and tools) (http://lremap.elra.info) is an innovative instrument introduced at LREC 2010 with the aim of monitoring the wealth of data and technologies developed and used in our field. Why “Map”? Because we aimed to represent the relevant features of a large territory, including aspects not covered by the official catalogues of the major players in the field. But we had other purposes too: we wanted to draw attention to the importance of the LRs that lie behind many of our papers, and also to map the “use” of LRs, in order to understand the purposes for which LRs are developed.
Its collaborative, bottom-up creation was critical: we conceived the Map as a means to influence a “change of culture” in our community, whereby everyone is asked to make a minimal effort to document the LRs that they use or create, and so comes to understand the need for proper documentation. By spreading the LR documentation effort across many people instead of leaving it only in the hands of the distribution centres, we also encourage awareness of the importance of metadata and proper documentation. Documenting a resource is the first step towards making it identifiable, which in turn is the first step towards reproducibility.
With all these purposes in mind, we thought we could exploit the great opportunity offered by LREC and the involvement of so many authors from so many countries, working on different modalities and in so many areas of NLP. The Map was subsequently also used by other major conferences, in particular COLING, which provides another opportunity for useful comparisons.
The number of LRs currently described in the Map is 7453 (instances), collected from 17 different conferences. The major conferences for which we have data on a regular basis are LREC and COLING.
With initiatives such as the LRE Map and “Share your LRs” (introduced in 2014) we want to encourage in the field of LT and LRs what is already in use in more mature disciplines, i.e. proper documentation and reproducibility as normal practice. We think that research is also strongly affected by such infrastructural (meta-research) activities, and therefore we continue to promote – also through such initiatives – greater visibility of LRs, easier sharing of LRs and the reproducibility of research results.
Here is the vision: it must become common practice in our field too that, when you submit a paper to either a conference or a journal, you are offered the opportunity to document and upload the LRs related to your research. This is even more important in a data-intensive discipline like NLP. The small cost that each of us pays to document, share, etc. should be repaid by the benefit we draw from others’ efforts.
What do we ask of colleagues submitting to COLING 2018? Please document all the LRs mentioned in your paper!