Uniform Graph Interchange Format

Besides differences in anchoring, the frameworks also vary in how they label nodes and edges, and in the degree to which they allow multiple edges between two nodes, multiple outgoing edges with the same label, or multiple instances of the same property on a node.  Node labels for Flavor (0) graphs are typically lemmas, optionally combined with a (morpho-syntactic) part of speech and a (syntactico-semantic) sense or frame identifier.  Node labels for the other graph flavors tend to be more abstract, i.e. they are interpreted as concept or relation identifiers (where for the vast majority, of course, there too is a systematic relationship to lemmas, lexical categories, and (sub-)senses).  Graph nodes in UCCA are formally unlabeled, and anchoring is used to relate leaf nodes of these graphs to input sub-strings.  Conversely, edge labels in all cases come from a fixed and relatively small inventory of (semantic) argument names, though there is stark variation in label granularity (ranging from about a dozen in UCCA to around 90 and 100 in PSD and AMR, respectively).  For the shared task, we have for the first time repackaged the five graph banks into a uniform and normalized abstract representation with a common serialization format.

The common interchange format for semantic graphs implements the abstract model of Kuhlmann & Oepen (2016) as a JSON-based serialization for graphs across frameworks.  This format describes general directed graphs, with structured node and edge labels, and optional anchoring and ordering of nodes.  JSON is easily manipulated in all programming languages and offers parser developers the option of ‘in situ’ augmentation of the graph representations from the task with system-specific additional information, e.g. by adding private properties to the JSON objects.  The MRP serialization is based on the JSON Lines format, where a stream of objects is serialized with line breaks as the separator character.  Each MRP graph is represented as a JSON object with top-level properties nodes and edges; these are discussed in more detail below.  The input property on all graphs presents the ‘raw’ surface string corresponding to this graph; thus, parser inputs for the task are effectively assumed to be sentence-segmented but not pre-tokenized.  Additional information about each graph is provided as properties id (a string), flavor (an integer in the range 0–2), framework (a string), version (a decimal number), and time (a string, encoding when the graph was serialized).
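As a rough sketch of this layout, the following constructs and serializes one minimal MRP-style graph object as a single JSON line; the identifiers, labels, and anchor values here are invented for illustration and do not come from the task data.

```python
import json

# A minimal MRP-style graph object (illustrative values only; real
# graphs carry framework-specific labels and many more nodes/edges).
graph = {
    "id": "20001001",
    "flavor": 1,
    "framework": "eds",
    "version": 0.9,
    "time": "2019-04-10",
    "input": "Pierre Vinken will join the board.",
    "nodes": [
        {"id": 0, "label": "proper_q"},
        # anchors point at sub-string indices of the input property
        {"id": 1, "label": "_join_v_1", "anchors": [{"from": 19, "to": 23}]},
    ],
    "edges": [
        {"source": 1, "target": 0, "label": "ARG1"},
    ],
}

# JSON Lines: one graph per line, with line breaks as the separator.
line = json.dumps(graph)
print(line)
```

Because each graph is one self-contained JSON object per line, a parser can stream through a graph bank without loading the whole file, and can round-trip its own private properties alongside the task-defined ones.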

The nodes and edges values on graphs each are list-valued, but the order among list elements is only meaningful for the nodes of Flavor (0) graphs.  Node objects have an obligatory id property (an integer) and optional properties called label, properties and values, as well as anchors.  The label (a string) has a distinguished status in evaluation; the properties and values are both list-valued, such that elements of the two lists correspond by position.  Together, the two lists present a framework-specific, non-recursive attribute–value matrix (where duplicate properties are in principle allowed).  The anchors list, if present, contains pairs of from–to sub-string indices into the input string of the graph.  Finally, the edge objects in the top-level edges list all have two integer-valued properties: source and target, which identify the start and end nodes of the edge, respectively.  All edges in the MRP collection further have a (string-valued) label property, although formally this is considered optional.  Parallel to graph nodes, edges can carry framework-specific properties and values lists; in MRP 2019, only the UCCA framework makes use of edge properties.
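A reader for this format can be sketched in a few lines: the function below parses a JSON Lines stream, pairs up the parallel properties and values lists on each node, and checks that every edge's source and target reference a node id in the same graph.  The sample record and the helper name `load_mrp` are assumptions for illustration, not part of the official tooling.

```python
import json

def load_mrp(stream):
    """Parse a JSON Lines stream of MRP graphs (sketch; one graph per line)."""
    for line in stream:
        graph = json.loads(line)
        node_ids = set()
        for node in graph.get("nodes", []):
            node_ids.add(node["id"])
            # properties and values are parallel lists forming a flat
            # attribute-value matrix; note that collapsing them into a
            # dict would silently drop the duplicate properties that the
            # format in principle allows.
            node["avm"] = dict(zip(node.get("properties", []),
                                   node.get("values", [])))
        for edge in graph.get("edges", []):
            # source and target must reference node ids in this graph
            assert edge["source"] in node_ids
            assert edge["target"] in node_ids
        yield graph

# An invented Flavor (0) example: lemma labels, a pos property, anchors.
sample = ('{"id": "0", "flavor": 0, "framework": "dm", '
          '"input": "Sun rises.", '
          '"nodes": [{"id": 0, "label": "Sun", "properties": ["pos"], '
          '"values": ["NNP"], "anchors": [{"from": 0, "to": 3}]}, '
          '{"id": 1, "label": "rise", "anchors": [{"from": 4, "to": 9}]}], '
          '"edges": [{"source": 1, "target": 0, "label": "ARG1"}]}')
graphs = list(load_mrp([sample]))
```

Since the file object returned by `open()` iterates over lines, the same function works unchanged on an on-disk graph bank.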

Training Data

The table below summarizes the training data that will be provided for the task.  No additional semantic annotations can be used during system development.  In other words, the task formally operates as what is often called a closed track, i.e. there will be a fixed inventory of data and tools that are legitimate for participants to use (e.g. pre-trained word embeddings, syntactic analyzers, and such).  However, the organizers welcome suggestions for additional ‘companion’ resources to sanction, with a closing date of May 13, 2019.  While some of the semantic graph frameworks in the task continue to evolve and continuously make available revised and extended data, we anticipate that these selections will provide stable reference points for empirical comparison for at least a couple of years following the task.

            DM         PSD        EDS        UCCA       AMR
Text Type   newspaper  newspaper  newspaper  Wikipedia  mixed
Sentences   35,656     35,656     35,656     4,113      57,885
Tokens      802,717    802,717    802,717    124,935    1,054,772

The DM and PSD data sets are annotations over the exact same selection of texts, which for the previous SemEval tasks have been aligned at the sentence and token levels.  As DM was originally derived from EDS, the EDS graphs cover the same texts.  The training data for these frameworks draws from a homogeneous source, the venerable WSJ text first annotated in the Penn Treebank (PTB), notably Sections 00–20.  As a common point of reference, the task organizers have released a sample of 100 WSJ sentences annotated in all five frameworks in early April 2019.

UCCA training annotations are mostly over text from the English Web Treebank and from English Wikipedia articles on celebrities.  While in principle UCCA structures are not confined to a single sentence (about 0.18% of edges cross sentence boundaries), passages are split into individual sentences, discarding inter-relations between them, to create a standard setting across the frameworks.  AMR annotations are drawn from a wide variety of texts, with the majority of sentences coming from on-line discussion forums.  The training corpus also contains newswire, folktales, fiction, and Wikipedia articles.

Because some of the semantic graph banks involved in the shared task have originally been released by the Linguistic Data Consortium (LDC), we will rely on the LDC to distribute the training data to participants under no-cost evaluation licenses.  Registration for the task will be a prerequisite to data access.  Upon completion of the competition, we will package all task data (including system submissions and evaluation results) for general release by the LDC, as well as make available those subsets that are copyright-free for public, open-source download.

Companion Data

At a technical level, training (and evaluation) data will be distributed in two formats, (a) as sequences of ‘raw’ sentence strings and (b) in pre-tokenized, PoS-tagged, and lemmatized form.  For the latter, we will seek to provide premium-quality English morpho-syntactic analyses to participants, by training a state-of-the-art dependency parser on the union of available syntactic training data for English and using jack-knifing (where required) to avoid overlap of morpho-syntactic training data with the texts underlying the semantic graph banks of the task.  At least for approaches to meaning representation parsing that assume explicit syntactic structure as their point of departure, these morpho-syntactic analyses will offer community value in their own right.

Evaluation Data

For all five frameworks, there are established in-domain evaluation sets, which will also serve as test data in the shared task.  Additionally, there are common out-of-domain evaluation sets for DM, PSD, EDS, and UCCA (where training data is mostly homogeneous), and the task organizers will prepare an additional (smallish) test set with gold-standard annotations in all frameworks.

Graph Analysis Software

Last updated: 2019-04-11 (10:04)