Publication of Results
The task received submissions from eighteen teams, of which two involved one of the co-organizers, and another three do not participate in the official ranking because they arrived after the closing deadline or made use of extra training data (beyond the list of white-listed resources for the task). An overview of participating teams, their approaches to meaning representation parsing, detailed result statistics, and in-depth system descriptions for each of the submissions are presented in the MRP 2019 proceedings volume, which is published through the (Anthology of the) Association for Computational Linguistics.
Sunday, November 3, 2019 |
|
14:00–14:30 | MRP 2019: Cross-Framework Meaning Representation Parsing Stephan Oepen, Omri Abend, Jan Hajic, Daniel Hershcovich, Marco Kuhlmann, Tim O’Gorman, Nianwen Xue, Jayeol Chun, Milan Straka and Zdenka Uresova |
14:30–14:33 | TUPA at MRP 2019: A Multi-Task Baseline System Daniel Hershcovich and Ofir Arviv |
14:33–14:36 | The ERG at MRP 2019: Radically Compositional Semantic Dependencies Stephan Oepen and Dan Flickinger |
14:39–14:42 | ShanghaiTech at MRP 2019: Sequence-to-Graph Transduction with Second-Order Edge Inference for Cross-Framework Meaning Representation Parsing |
14:42–14:45 | Saarland at MRP 2019: Compositional parsing across all graphbanks Lucia Donatelli, Meaghan Fowlie, Jonas Groschwitz, Alexander Koller, Matthias Lindemann, Mario Mina and Pia Weißenhorn |
14:45–14:48 | HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding Wanxiang Che, Longxu Dou, Yang Xu, Yuxuan Wang, Yijia Liu and Ting Liu |
for Cross-Framework Meaning Representation Parsing |
|
14:51–14:54 | JBNU at MRP 2019: Multi-level Biaffine Attention for Semantic Dependency Parsing Seung-Hoon Na, Jinwoon Min, Kwanghyeon Park, Jong-Hun Shin and Young-Kil Kim |
14:54–14:57 | CUHK at MRP 2019: Transition-Based Parser with Cross-Framework Variable-Arity Resolve Action Sunny Lai, Chun Hei Lo, Kwong Sak Leung and Yee Leung |
14:57–15:00 | Hitachi at MRP 2019: Unified Encoder-to-Biaffine Network for Cross-Framework Meaning Representation Parsing Yuta Koreeda, Gaku Morio, Terufumi Morishita, Hiroaki Ozaki and Kohsuke Yanai |
15:00–15:03 | ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task Milan Straka and Jana Straková |
15:03–15:06 | Amazon at MRP 2019: Parsing Meaning Representations with Lexical and Phrasal Anchoring |
15:06–15:09 | SUDA-Alibaba at MRP 2019: Graph-Based Models with BERT Yue Zhang, Wei Jiang, Qingrong Xia, Junjie Cao, Rui Wang, Zhenghua Li and Min Zhang |
15:09–15:12 | ÚFAL-Oslo at MRP 2019: Garage Sale Semantic Parsing Kira Droganova, Andrey Kutuzov, Nikita Mediankin and Daniel Zeman |
15:12–15:15 | Peking at MRP 2019: Factorization- and Composition-Based Parsing for Elementary Dependency Structures Yufei Chen, Yajie Ye and Weiwei Sun |
15:15–15:30 | Final Discussion: Towards MRP 2020 (Everyone) |
16:30–18:00 | Poster Session (All Participating Teams) |
Evaluation Data
Parsers will be evaluated on unseen, held-out data for which the gold-standard target graphs will not be available to participants before the end of the evaluation period (please see below). For some of the parser inputs used in evaluation, target annotations are available in multiple frameworks; a shared subset of 100 sentences has been annotated with gold-standard target graphs in all five frameworks.
| | DM | PSD | EDS | UCCA | AMR |
|---|---|---|---|---|---|
| Text Type | mixed | mixed | mixed | mixed | mixed |
| Sentences | 3,359 | 3,359 | 3,359 | 1,131 | 1,998 |
| Tokens | 64,853 | 64,853 | 64,853 | 21,647 | 39,520 |
The MRP evaluation data will be distributed in the same file format as the training graphs, but without the nodes, edges, and tops values (essentially presenting empty graphs, which participants are expected to fill in). Thus, the input property on each evaluation ‘graph’ provides the string to be parsed, and an additional top-level property targets indicates which output framework(s) to predict. The evaluation data will be packaged as a single file input.mrp (containing a total of 6,288 strings to be parsed), but in principle each sentence can be processed in isolation.
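As a minimal sketch of reading such a file: the code below assumes that the MRP serialization holds one JSON object per line and that each object carries an `id` field next to `input` and `targets` (neither is spelled out in this section). The helper name `read_inputs` and the sample record are purely illustrative.

```python
import io
import json


def read_inputs(stream):
    """Yield (id, input string, targets) for each evaluation 'graph'.

    Assumes one JSON object per line; `id` is an assumed field name,
    while `input` and `targets` follow the description above.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        graph = json.loads(line)
        yield graph["id"], graph["input"], graph.get("targets", [])


# Made-up record for illustration; real records come from input.mrp.
sample = io.StringIO(
    '{"id": "example-1", "input": "A sample sentence.", '
    '"targets": ["dm", "eds"]}\n'
)
records = list(read_inputs(sample))
```

Because every record is self-contained, a reader of this kind can hand individual inputs to separate parser processes.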
For each parser input and each of its targets values, participating systems are expected to output one complete semantic graph in MRP format (for a total of 13,206 predicted graphs in a complete submission).
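The total follows from the per-framework sentence counts in the table above, since each sentence contributes one graph for every framework in which it is a target. A quick check, with the counts copied from that table:

```python
# Sentence counts per framework, copied from the evaluation data table.
sentences = {"DM": 3359, "PSD": 3359, "EDS": 3359, "UCCA": 1131, "AMR": 1998}

# One predicted graph per (parser input, target framework) pair.
total_graphs = sum(sentences.values())
print(total_graphs)  # 13206
```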
The MRP evaluation data will be bundled with some of the same ‘companion’ resources as the training data, viz. state-of-the-art morpho-syntactic dependency parses (as a separate file udpipe.mrp). Unlike for the training data, however, companion AMR ‘alignments’ (i.e. partial anchorings, in MRP terms) cannot be provided for the evaluation data, seeing as these would presume knowledge of the gold-standard AMR graphs.
System Ranking
The primary evaluation metric for the task will be cross-framework MRP F1 scores. Participating parsers will be ranked based on average F1 across all evaluation data and target frameworks. For broader comparison, additional per-framework scores will be published, both in the MRP metric and in applicable framework-specific metrics. Although not the primary goal of the task, ‘partial’ submissions are possible, in the sense of not providing parser outputs for all target frameworks. The training and evaluation setup in MRP 2019 differs from previous tasks for all frameworks involved; thus, single-framework submissions can help make connections to previously published results.
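To make the notion of an averaged F1 concrete, here is a rough sketch. It assumes an unweighted mean over per-framework F1 scores; how the official scorer weights frameworks, and how it treats frameworks missing from a partial submission, is not specified in this section. The function names and all counts are made up for illustration.

```python
def f1(correct, predicted, gold):
    """F1 from counts of matched, predicted, and gold items."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1(per_framework):
    """Unweighted mean over frameworks (an assumption, see above)."""
    return sum(per_framework.values()) / len(per_framework)


# Hypothetical counts: (correct, predicted, gold) per framework.
scores = {"dm": f1(90, 100, 100), "amr": f1(60, 80, 100)}
average = mean_f1(scores)
```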
System Development
The task operates as what is at times called a closed track. Beyond the training and ‘companion’ data provided by the co-organizers, participants are restricted in which additional data and pre-trained models they may legitimately use in system development. These constraints are imposed to improve comparability of results and overall fairness: beyond resources explicitly ‘white-listed’ for the task, no additional data or other knowledge sources may be used in system development, training, or tuning.
Evaluation Period
The evaluation period of the task will run from Monday, July 8, to Thursday, July 25, 2019, 12:00 noon in Central Europe (CEST). At the start of the evaluation period, the data will again be distributed through the Linguistic Data Consortium (LDC), as a new archive available for download by registered participants who have previously obtained the MRP training data from the LDC. The LDC expects to enable download of the evaluation data starting at 10:00 in the morning on the US East Coast (EST) on July 8, 2019.
Participants will be expected to prepare their submission by processing all parser inputs using the same general parsing system. All parser outputs have to be serialized in the MRP common interchange format, as multiple, separate graphs for each input string that calls for predicting multiple target frameworks. Team registration and submissions will be hosted on the CodaLab service, where basic validation will be applied to each submission using the mtool validator. Access to CodaLab will require at least one team member to self-register for the task (called a ‘competition’ in CodaLab terms), but it should be possible for multiple CodaLab users to jointly form a team. Participants must agree to putting their submitted parser outputs into the public domain, such that all submissions can be made available for general download after completion of the evaluation period.
System Submission
To make a submission, participants need to obtain the evaluation data package via the LDC (see above) and process parser inputs (from the MRP file input.mrp) according to the instructions in the README.txt file included with the data. While it is possible to process individual parser inputs separately (for example to parallelize parsing), all parser outputs to be submitted must be concatenated into a single MRP file (for example output.mrp) and compressed into a ZIP archive prior to uploading the submission to the CodaLab site. On a Un*x system, for example, an archive file submission.zip suitable for upload to CodaLab can be created as follows:
$ zip submission.zip output.mrp
  adding: output.mrp (deflated 88%)
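Where parsing was parallelized, the concatenation and compression steps can also be scripted. The following is a sketch only: the function name `package_submission` and the idea of per-shard files are assumptions for illustration, not part of the task instructions.

```python
import zipfile
from pathlib import Path


def package_submission(parts, output="output.mrp", archive="submission.zip"):
    """Concatenate per-shard MRP files, then compress the result into a ZIP."""
    with open(output, "w", encoding="utf-8") as out:
        for part in parts:
            text = Path(part).read_text(encoding="utf-8")
            if text and not text.endswith("\n"):
                text += "\n"  # keep one graph per line across shard borders
            out.write(text)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file under its bare name, without directory components.
        zf.write(output, arcname=Path(output).name)
    return archive
```

A call such as `package_submission(["shard-1.mrp", "shard-2.mrp"])` (hypothetical file names) would then leave a `submission.zip` containing a single `output.mrp`.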
Validation of parser outputs in MRP serialization is supported in mtool (the Swiss Army Knife of Meaning Representation), and it is strongly recommended that participants validate their graphs prior to submission to CodaLab.
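Independently of mtool, and not as a substitute for its validator, a simple completeness check can catch missing graphs before upload. The sketch below assumes one JSON object per line, with `id` and `framework` fields on output graphs and a `targets` list on inputs; the function name `missing_graphs` and the sample records are hypothetical.

```python
import json


def missing_graphs(input_lines, output_lines):
    """Return (id, framework) pairs requested by the inputs but not produced."""
    expected = set()
    for line in input_lines:
        if line.strip():
            graph = json.loads(line)
            expected.update((graph["id"], t) for t in graph.get("targets", []))
    produced = set()
    for line in output_lines:
        if line.strip():
            graph = json.loads(line)
            produced.add((graph["id"], graph["framework"]))
    return sorted(expected - produced)


# Made-up records: one input asking for two frameworks, one output graph.
inputs = ['{"id": "1", "input": "x", "targets": ["dm", "ucca"]}']
outputs = ['{"id": "1", "framework": "dm", "tops": [], "nodes": [], "edges": []}']
missing = missing_graphs(inputs, outputs)
```

An empty result means that every requested (input, framework) pair has a corresponding output graph; it says nothing about whether the graphs themselves are well-formed.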
It is possible to make multiple submissions throughout the evaluation period. For each team, only the most recent submission made on or before the closing date of the evaluation period, Thursday, July 25, 2019, 12:00 noon in Central Europe (CEST), will be considered for scoring; evaluation of multiple, different ‘runs’ (or system configurations) will not be possible during the official evaluation period.
Registration of Intent
To make your interest in the MRP 2019 task known and to receive updates on data and software, please self-subscribe to the mailing list for (moderated) MRP announcements. The mailing list archives are publicly available. To obtain the training data for the task, please (a) make sure your team is subscribed to the above mailing list and (b) fill in and return to the LDC the no-cost license agreement for the task. A more formal registration of participating teams will be required in early July, as the evaluation period nears (please see the task schedule and below).